Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broaden MLJ target_scitype only when using TemplateExpression #392

Merged
merged 6 commits into from
Jan 3, 2025

Conversation

MilesCranmer
Copy link
Owner

Fixes #390

@ablaom how is this?

@MilesCranmer MilesCranmer force-pushed the narrow-mlj-scitype branch 3 times, most recently from 37e7021 to a394da2 Compare December 21, 2024 03:29
Copy link
Contributor

github-actions bot commented Dec 21, 2024

Benchmark Results

master 9f82c13... master/9f82c13666a326...
search/multithreading 17.6 ± 1 s 17.4 ± 1 s 1.01
search/serial 29.5 ± 0.18 s 29.1 ± 0.31 s 1.01
utils/best_of_sample 1.7 ± 0.49 μs 1.7 ± 0.57 μs 1
utils/check_constraints_x10 11.4 ± 2.9 μs 11.1 ± 2.7 μs 1.04
utils/compute_complexity_x10/Float64 2.07 ± 0.12 μs 2.09 ± 0.11 μs 0.99
utils/compute_complexity_x10/Int64 2.05 ± 0.12 μs 2.06 ± 0.11 μs 0.995
utils/compute_complexity_x10/nothing 1.47 ± 0.12 μs 1.41 ± 0.12 μs 1.04
utils/insert_random_op_x10 5.86 ± 1.9 μs 5.87 ± 1.9 μs 0.998
utils/next_generation_x100 0.353 ± 0.094 ms 0.356 ± 0.099 ms 0.993
utils/optimize_constants_x10 0.0347 ± 0.008 s 0.0354 ± 0.0095 s 0.979
utils/randomly_rotate_tree_x10 5.4 ± 0.63 μs 5.21 ± 0.59 μs 1.04
time_to_load 1.89 ± 0.015 s 1.9 ± 0.03 s 0.994

Benchmark Plots

A plot of the benchmark results have been uploaded as an artifact to the workflow run for this PR.
Go to "Actions"->"Benchmark a pull request"->[the most recent run]->"Artifacts" (at the bottom).

@coveralls
Copy link

coveralls commented Dec 21, 2024

Pull Request Test Coverage Report for Build 12594637511

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 20 of 22 (90.91%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.04%) to 94.776%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/MLJInterface.jl 16 18 88.89%
Totals Coverage Status
Change from base Build 12521558752: 0.04%
Covered Lines: 3157
Relevant Lines: 3331

💛 - Coveralls

@ablaom
Copy link

ablaom commented Dec 22, 2024

Thanks for this. Unfortunately, I'm getting a new problem. It looks like certain kinds of table are not supported. And training is extraordinarily slow.

using SymbolicRegression
using MLJBase
using Tables

M = SRRegressor

X = (x1 = rand(5), x2 = rand(5))
y = rand(5)

mach = machine(M(), X, y)
@time fit!(mach, verbosity=0)  # second attempt
# 6.551545 seconds (209.99 M allocations: 10.681 GiB, 39.67% gc time, 0.10% compilation time: 100% of which was recompilation)

X = Tables.rowtable(X)
machine(M(), X, y) |> fit!
# julia> machine(M(), X, y) |> fit!
# [ Info: Training machine(SRRegressor(defaults = nothing, …), …).
# ┌ Error: Problem fitting the machine machine(SRRegressor(defaults = nothing, …), …).
# └ @ MLJBase ~/.julia/packages/MLJBase/7nGJF/src/machines.jl:694
# [ Info: Running type checks...
# [ Info: Type checks okay.
# ERROR: MethodError: no method matching haskey(::Vector{@NamedTuple{x1::Float64, x2::Float64}}, ::Symbol)
# The function `haskey` exists, but no method is defined for this combination of argument types.
  [a7f614a8] MLJBase v1.7.0
  [697918b4] MLJTestIntegration v0.5.2
  [8254be44] SymbolicRegression v1.5.0 `https://github.com/MilesCranmer/SymbolicRegression.jl.git#narrow-mlj-scitype`
  [bd369af6] Tables v1.12.0
julia> versioninfo()
Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 12 × Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 12 default, 0 interactive, 6 GC (on 12 virtual cores)
Environment:
  JULIA_LTS_PATH = /Applications/Julia-1.10.app/Contents/Resources/julia/bin/julia
  JULIA_PATH = /Applications/Julia-1.11.app/Contents/Resources/julia/bin/julia
  JULIA_EGLOT_PATH = /Applications/Julia-1.7.app/Contents/Resources/julia/bin/julia
  JULIA_NUM_THREADS = 12
  JULIA_NIGHTLY_PATH = /Applications/Julia-1.11.app/Contents/Resources/julia/bin/julia

@MilesCranmer
Copy link
Owner Author

Is this related to this PR or is it a different issue? Is this PR itself ok to merge though? I can work on that issue afterwards

@MilesCranmer
Copy link
Owner Author

MilesCranmer commented Dec 22, 2024

If I run your example I get

julia> @time fit!(mach, verbosity=0)
  3.370594 seconds (202.80 M allocations: 10.300 GiB, 22.58% gc time, 1.83% compilation time)

Doesn't seem too bad? SRRegressor doesn't change the default parameters if the dataset is small.

@MilesCranmer
Copy link
Owner Author

Regarding Tables.rowtable(X), is there a way I can test that and other types with MLJTestInterface? Currently the MLJTestInterface is passing so I guess there is a blindspot for types it should be compatible with.

@ablaom
Copy link

ablaom commented Dec 26, 2024

Regarding Tables.rowtable(X), is there a way I can test that and other types with MLJTestInterface? Currently the MLJTestInterface is passing so I guess there is a blindspot for types it should be compatible with.

MLJTestIntegration.jl provides more comprehensive testing, and a second signature for test that autogenerates some datasets - both column tables and row tables. It also has higher dependencies. However, as these datasets may not be sufficient for all models, we currently suggest packages create their own "comprehensive" collection of datasets and test these using MLJTestInterface.jl.test, which should suffice to test just the "interface".

However, to improve coverage, you could add MLJTestIntegration tests following the following example, to test with the auto-generated datasets

using MLJTestInterface
using SymbolicRegression
using MLJTestIntegration

MLJTestIntegration.test(
    M = SRRegressor;
    verbosity=0, # bump to debug
    throw=true,
    level=2,
)

Use level > 2 for more than just interface tests - up to level=4. Full docstring is below. I'm running this code with level = 4 in the MLJ integration tests for all models, and this is how I am finding your bugs.

(Some of the advanced tests require that models be truly reproducible with default hyperparameters. However, as I found this is not always the case, we check this manually before proceeding; for slow models, this can add a bit of test time.)

test(models, data...; mod=Main, level=2, throw=false, verbosity=1)
test(model; mod=Main, level=2, throw=false, verbosity=1)

The first signature above applies a battery of MLJ integration tests to a collection of
models, using data for training. Here mod should be the module from which test is
called (generally, mod=@__MODULE__ will work). Here models is either:

  1. A collection of model types implementing the MLJ model interface.

  2. A collection of named tuples, where each tuple includes :name and :package_name as keys, and whose corresponding values point to a model type appearing in the MLJ Model Registry. MLJ.models(...) always returns such a collection.

The second signature applies the same tests to a single model (of either type above) but
applies them using all MLJTestIntegration.jl datasets that might conceivable apply. If
none appear to apply, an exception is thrown.

Ordinarily, code defining the model types to be tested must already be
loaded into the module mod. An exception is described under "Testing
with automatic code loading" below.

The extent of testing is controlled by level:

level description tests (full list below)
1 test code loading :model_type
2 (default) basic test of model interface first four tests
3 comprehensive CPU1() all non-accelerated tests
4 comprehensive all tests

By default, exceptions caught in tests are not thrown. If
throw=true, testing will terminate at the first execption
encountered, after throwing that exception (useful to obtain stack
traces).

Return value

The first signature returns (failures, summary) where:

  • failures: table of exceptions thrown

  • summary: table summarizing the outcomes of each test, where
    outcomes are indicated as below:

entry interpretation
test succesful
× test unsuccessful
n/a skipped because not applicable
- test skipped for some other reason

In the special case of operations, an empty entry, "", indicates that there don't
appear to be any operations implemented.

The second signature returns a vector of exceptions encountered.

Testing with automatic code loading

World Age issues pose challenges for testing Julia code if some code
is to be loaded "on demand". Nevertheless, in case 2 mentioned above,
model types to be tested need not be loaded, provided testing is
carried out in two stages, as shown in the second example below. In
this case, however, the necessary model interface packages need
to be listed in the current Julia environment, and the test calls
must appear in global scope.

Examples

Testing models in a new MLJ model interface implementation

The following tests the model interface implemented by some model type
MyClassifier, as might appear in tests for a package providing that
type:

import MLJTestIntegration
using Test
X, y = MLJTestIntegration.MLJ.make_blobs()
failures, summary = MLJTestIntegration.test([MyClassifier, ], X, y, verbosity=1, mod=@__MODULE__)
@test isempty(failures)

Testing models after filtering models in the registry

The following applies comprehensive integration tests to all
regressors provided by the package GLM.jl appearing in the MLJ Model
Registry. Since GLM.jl models are provided through the interface
package MLJGLMInterface, this must be in the current environment:

Pkg.add("MLJGLMInterface")
import MLJBase, MLJTestIntegration
using DataFrames # to view summary
X, y = MLJTestIntegration.MLJ.make_regression();
regressors = MLJTestIntegration.MLJ.models(matching(X, y)) do m
    m.package_name == "GLM"
end

# to test code loading *and* load code:
MLJTestIntegration.test(regressors, X, y, verbosity=1, mod=@__MODULE__, level=1)

# comprehensive tests:
failures, summary =
    MLJTestIntegration.test(regressors, X, y, verbosity=1, mod=@__MODULE__, level=3)

summary |> DataFrame

List of tests

Tests are applied in sequence. When a test fails, subsequent tests for
that model are skipped. The following are applied to all models:

  • :model_type: Load model type using registry (if named tuples are
    provided) or using load_path(model_type) (if types are provided, to
    check load_path trait is correctly overloaded).

  • :model_instance: Create a default instance.

  • :fitted_machine: Bind instance to data in a machine and fit!. Call report and
    fitted_params on the machine.

  • :operations: Call implemented operations, such as predict and transform

These additional tests are applied to Supervised models:

  • :threshold_prediction: If the model is Probablisitic and
    scitype(data[2]) <: Finite{2} (binary classification) then wrap
    model using BinaryThresholdPredictor and fit!.

  • :evaluation: Assuming MLJ is able to infer a suitable measure
    (metric), evaluate the performance of the model using evaluate!
    and and cross-validation.

  • :accelerated_evaluation: Assuming the model appears to make
    repeatable predictions on retraining, repeat the :evaluation test
    using CPUThreads() acceleration and check agreement with CPU1() case.

  • :tuned_pipe_evaluation: Repeat the :evauation test but first
    insert model in a pipeline with a trivial pre-processing step
    (applies the identity transformation) and wrap in TunedModel (only
    the default instance is actually evaluated).

  • :ensemble_prediction: Wrap the mode as EnsembleModel, train, and
    attempt a predict call.

  • :iteration_prediction: If the model is iterable, repeat the
    :evaluation test but first wrap as an IteratedModel.

  • :stack_evaluation: test a Stack within a Stack, with the model
    being tested appearing at two levels, and evaluate the
    Stack. (Other base models and adjudicators in the double stack are
    instances of KNNClassifier or KNNRegressor.)
    This test is only applied to single target supervised models that
    are probabilistic classifiers or deterministic regressors.

  • :accelerated_stack_evaluation: If the model appears to make
    repeatable predictions on retraining, check consistency of
    evaluations for Stack(acceleration=CPU1(), ...) and
    Stack(acceleration=CPUThreads(), ...) (in the double stack above).

@ablaom
Copy link

ablaom commented Dec 26, 2024

I should add that, in our experience, it is rare to find models that pass the interface tests but fail the integrations tests.

@MilesCranmer
Copy link
Owner Author

Got it, thanks. I will likely add those integration tests in the future

(Some of the advanced tests require that models be truly reproducible with default hyperparameters. However, as I found this is not always the case, we check this manually before proceeding; for slow models, this can add a bit of test time.)

Just for reference, the SRRegressor can be made reproducible with the following setup:

model = SRRegressor(
    parallelism=:serial,
    deterministic=true,
    seed=0  # or whatever other seed
)

It's a bit slower but it does guarantee reproducibility. The default parallelism=:multithreading prevents determinism simply due to the non-deterministic nature of CPU thread scheduling.

Even things like SIMD operations mess with determinism because of things like

julia> (0.1 + 0.2) - 0.3
5.551115123125783e-17

julia> 0.1 + (0.2 - 0.3)
2.7755575615628914e-17

@ablaom
Copy link

ablaom commented Dec 26, 2024

julia> (0.1 + 0.2) - 0.3
5.551115123125783e-17

The test for reproducabiliity is not that fussy, it uses isapprox. But yeah, your default model still fails the repeatability test. I think this just rules out some multithreading options in things like cross-validation, but I forget the details for now.

@MilesCranmer
Copy link
Owner Author

MilesCranmer commented Dec 26, 2024

Oh what I meant by (0.1 + 0.2) - 0.3 != 0.1 + (0.2 - 0.3) was just explaining internal reasons for why SymbolicRegression.jl isn't reproducible in multithreaded mode. But there's a lot of different reasons, in addition to summation order depending on the CPU job scheduler, for why this is true.

@MilesCranmer MilesCranmer force-pushed the narrow-mlj-scitype branch 2 times, most recently from 1be49f4 to 29ddf75 Compare December 26, 2024 23:06
@MilesCranmer
Copy link
Owner Author

Ok I merged #393 here

@MilesCranmer
Copy link
Owner Author

Going to merge now but let me know if there are any issues and we can make a new PR

@MilesCranmer MilesCranmer enabled auto-merge January 3, 2025 08:03
@MilesCranmer MilesCranmer disabled auto-merge January 3, 2025 08:03
@MilesCranmer MilesCranmer enabled auto-merge January 3, 2025 08:03
@MilesCranmer MilesCranmer merged commit 6368a95 into master Jan 3, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]: [MLJ Interface] SRRegressor likely has too broad a target_scitype
3 participants