create isolation kernel and svm_iso #4

Open
frankiethull opened this issue Jul 23, 2024 · 3 comments
Labels
engine engine topic enhancement New feature or request help wanted Extra attention is needed kernel kernel binding

Comments

@frankiethull
Owner

I think it would be cool to build an isolation kernel for {maize}. It would fall under specialty kernels, but it needs to be implemented first and then given bindings to parsnip.

I think this would be a multistep process:

  1. Create a custom kernel in kernlab called something like isodot. This would need to be of the kernel class. Within this kernel, call isotree::isolation.forest(); I think this would be easiest.
  2. Add this kernel to a new script called kernels, where it would be compatible with ksvm, the main binding of {maize}.
  3. Create a new function and bindings via svm_iso and svm_iso_data, referencing the new isodot in the defaults.
  4. Merge into main.

I think that's all that is necessary (?)

@frankiethull frankiethull added the enhancement New feature or request label Jul 23, 2024
@frankiethull
Owner Author

Ting, Kai Ming, Yue Zhu, and Zhi-Hua Zhou. "Isolation kernel and its effect on SVM." Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2018.

@frankiethull frankiethull added the help wanted Extra attention is needed label Jul 31, 2024
@frankiethull
Owner Author

I partially solved this puzzle based on the Python code.

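iso_mat <- function(){
  # predictors only: drop the outcome column
  x <- maize::corn_data |> 
    dplyr::select(-type)
  iso <- isotree::isolation.forest(x, 
                                   ntrees = 200, 
                                   missing_action = "impute")
  # register the training rows as reference points (isotree updates `iso` in place)
  isotree::isotree.set.reference.points(iso, x, with_distances = TRUE)
  
  # type = "dist" gives distances from each row of x to the reference points
  k <- predict(iso, x, type = "dist", use_reference_points = TRUE)  
  d <- 1 - k # 1 - distance turns the distances into similarities
  return(d) # I ran this in R and substituted the corn data in Python, confirmed they match the Python format and values
}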

# woohoo! but not good enough, need to get this working as a dot
# (named iso_kernel rather than matrix, to avoid shadowing base::matrix)
iso_kernel <- iso_mat()
ksvm(x = iso_kernel, y = maize::corn_data$type, kernel = "matrix", C = 10)

But I would rather do this as a true kernel, not a precomputed matrix, as that breaks the formula aspect of ksvm(). The way it's set up right now, I am almost tempted to create a step_isolation() for isolation kernel support, but I am posting my test here to see if others have an idea for this.

When I run this, it fails because one row is fed into isolation.forest at a time.

isodot <- function(x, y){
  iso <- isotree::isolation.forest(x, ntrees = 100, missing_action = "impute")
  isotree.set.reference.points(iso, x, with_distances = TRUE)
  
  k <- predict(iso, x, type = "score", use_reference_points = TRUE)  
  d <- 1 - k
  return(crossprod(d, y))
}
class(isodot) <- "kernel"

ksvm(type ~., data = maize::corn_data, kernel = isodot, C = 10)
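
The failure above is consistent with how kernlab evaluates a user-defined kernel: as far as I can tell, it calls the function on one pair of row vectors at a time rather than on the full data. A minimal sketch to check this, using a hypothetical probe kernel and iris as stand-in data (neither is part of {maize}):

```r
library(kernlab)

# Hypothetical probe kernel: a plain dot product that also records the
# length of each argument it receives from kernlab.
seen <- integer(0)
probe <- function(x, y) {
  seen <<- c(seen, length(x), length(y))
  sum(x * y)
}
class(probe) <- "kernel"

# Stand-in data (assumption): 10 rows, 4 numeric columns
x <- as.matrix(iris[1:10, 1:4])

# kernlab builds the kernel matrix by calling probe() on pairs of rows
K <- kernelMatrix(probe, x)

dim(K)        # one row and one column per observation
unique(seen)  # lengths of the vectors each call received
```

If every recorded length equals ncol(x), then a forest fitted inside the kernel only ever sees a single row, which suggests the isolation forest would have to be fitted once, outside the kernel function.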

@frankiethull frankiethull added kernel kernel binding engine engine topic labels Oct 2, 2024
@frankiethull
Owner Author

Also discussed here: cornucopia
