Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add metaexample for CHAIDTree Regression #5065

Merged
merged 11 commits into from
Jun 25, 2020
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion data
22 changes: 11 additions & 11 deletions doc/ipython-notebooks/multiclass/Tree/DecisionTrees.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -1405,9 +1405,9 @@
"source": [
"def train_chaidtree(dependent_var_type,feature_types,num_bins,feats,labels):\n",
" # create CHAID tree object\n",
" c = sg.create_machine(\"CHAIDTree\", m_dependent_vartype=dependent_var_type,\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perfect!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you forgot some, I remember seeing one name in a "get" call...double check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to this:

" tree = sg.create_machine(\"C45ClassifierTree\", labels=labels, m_nominal=types)\n",

I don't think we should change this right? As it isn't related to CHAID tree? As then we will have to make some other change in the source code related to C45ClassifierTree

Also, the other one is :

" output_certainty=tree.get('m_certainty')\n",

This one too, is an instance of C45ClassifierTree. Should I change these two instances too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah sorry. of course you are right!
We will see it in the CI for the notebooks

" m_feature_types=feature_types,\n",
" m_num_breakpoints=num_bins,\n",
" c = sg.create_machine(\"CHAIDTree\", dependent_vartype=dependent_var_type,\n",
" feature_types=feature_types,\n",
" num_breakpoints=num_bins,\n",
" labels=labels)\n",
" # train using training features\n",
" c.train(feats)\n",
Expand Down Expand Up @@ -1560,9 +1560,9 @@
"feature_types = np.array([2 for i in range(13)],dtype=np.int32) \n",
"\n",
"# setup CHAID tree - dependent variable is nominal(0), feature types set, number of bins(20)\n",
"chaid = sg.create_machine(\"CHAIDTree\", m_dependent_vartype=0,\n",
" m_feature_types=feature_types,\n",
" m_num_breakpoints=20)"
"chaid = sg.create_machine(\"CHAIDTree\", dependent_vartype=0,\n",
" feature_types=feature_types,\n",
" num_breakpoints=20)"
]
},
{
Expand Down Expand Up @@ -1644,10 +1644,10 @@
" feature_types[9]=1 \n",
"\n",
" # setup CHAID-tree\n",
" chaid = sg.create_machine(\"CHAIDTree\", m_dependent_vartype=2,\n",
" m_feature_types=feature_types,\n",
" m_num_breakpoints=10,\n",
" m_max_tree_depth=10)\n",
" chaid = sg.create_machine(\"CHAIDTree\", dependent_vartype=2,\n",
" feature_types=feature_types,\n",
" num_breakpoints=10,\n",
" max_tree_depth=10)\n",
"\n",
" # set evaluation criteria - mean squared error\n",
" accuracy = sg.create_evaluation(\"MeanSquaredError\")\n",
Expand Down Expand Up @@ -1727,4 +1727,4 @@
},
"nbformat": 4,
"nbformat_minor": 1
}
}
3 changes: 1 addition & 2 deletions examples/meta/src/regression/cartree.sg.in
Original file line number Diff line number Diff line change
Expand Up @@ -14,8 +14,7 @@ ft[0] = False
#![set_attribute_types]

#![create_machine]
Machine cartree = create_machine("CARTree", nominal=ft, mode=enum EProblemType.PT_REGRESSION, folds=5, apply_cv_pruning=True, seed=1)
cartree.set_labels(labels_train)
Machine cartree = create_machine("CARTree", labels=labels_train, nominal=ft, mode=enum EProblemType.PT_REGRESSION, folds=5, apply_cv_pruning=True, seed=1)
#![create_machine]

#![train_and_apply]
Expand Down
27 changes: 27 additions & 0 deletions examples/meta/src/regression/chaid_tree.sg.in
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
File f_feats_train = read_csv("@SHOGUN_DATA@/regression_1d_linear_features_train.dat")
File f_feats_test = read_csv("@SHOGUN_DATA@/regression_1d_linear_features_test.dat")
File f_labels_train = read_csv("@SHOGUN_DATA@/regression_1d_linear_labels_train.dat")

#![create_features]
Features feats_train = create_features(f_feats_train)
Features feats_test = create_features(f_feats_test)
Labels labels_train = create_labels(f_labels_train)
#![create_features]

#![set_feature_types]
IntVector ft(1)
ft[0] = 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karlnapf this is causing issues. In octave this becomes a scalar value :( I wrote a fix for this but it's in another branch, not yet merged.. Also this throws an error in the meta example, but ctest doesn't pick this up (I had this issue before) and I am not sure why. The test only fails when comparing the serialised outputs in the integration test, because this will not have serialised anything because of the exception thrown when you put ft

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, unless we merge your fix this meta example will fail only in Octave, right? Could it be possible to merge this PR anyway, but somehow excluding it from testing with Octave (since it is broken atm)? Just to not have to put this on hold indefinitely...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe @Hephaestus12 can just fix it here? All you need to do is replace

#ifdef SWIGR

with #if defined(SWIGR) || defined(SWIGOCTAVE)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll do this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope this works :D

#![set_feature_types]

#![create_machine]
Machine chaidtree = create_machine("CHAIDTree", labels=labels_train, dependent_vartype=2, feature_types=ft, num_breakpoints=50)
#![create_machine]

#![train_and_apply]
chaidtree.train(feats_train)
Labels labels_predict = chaidtree.apply(feats_test)
#![train_and_apply]

#![extract_weights_labels]
RealVector labels_vector = labels_predict.get_real_vector("labels")
#![extract_weights_labels]
44 changes: 0 additions & 44 deletions examples/undocumented/python/regression_chaidtree.py

This file was deleted.

51 changes: 0 additions & 51 deletions examples/undocumented/python/regression_randomforest.py

This file was deleted.

20 changes: 10 additions & 10 deletions src/shogun/multiclass/tree/CHAIDTree.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1392,14 +1392,14 @@ void CHAIDTree::init()
m_cont_breakpoints=SGMatrix<float64_t>();
m_num_breakpoints=0;

SG_ADD(&m_weights,"m_weights", "weights", ParameterProperties::READONLY);
SG_ADD(&m_weights_set,"m_weights_set", "weights set", ParameterProperties::READONLY);
SG_ADD(&m_feature_types,"m_feature_types", "feature types", ParameterProperties::SETTING);
SG_ADD(&m_dependent_vartype,"m_dependent_vartype", "dependent variable type", ParameterProperties::SETTING);
SG_ADD(&m_max_tree_depth,"m_max_tree_depth", "max tree depth", ParameterProperties::HYPER);
SG_ADD(&m_min_node_size,"m_min_node_size", "min node size", ParameterProperties::SETTING);
SG_ADD(&m_alpha_merge,"m_alpha_merge", "alpha-merge", ParameterProperties::HYPER);
SG_ADD(&m_alpha_split,"m_alpha_split", "alpha-split", ParameterProperties::HYPER);
SG_ADD(&m_cont_breakpoints,"m_cont_breakpoints", "breakpoints in continuous attributes", ParameterProperties::SETTING);
SG_ADD(&m_num_breakpoints,"m_num_breakpoints", "number of breakpoints", ParameterProperties::HYPER);
SG_ADD(&m_weights,"weights", "weights", ParameterProperties::READONLY);
SG_ADD(&m_weights_set,"weights_set", "weights set", ParameterProperties::READONLY);
SG_ADD(&m_feature_types,"feature_types", "feature types", ParameterProperties::SETTING);
SG_ADD(&m_dependent_vartype,"dependent_vartype", "dependent variable type", ParameterProperties::SETTING);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes the notebooks to fail:
https://dev.azure.com/shogunml/shogun/_build/results?buildId=3629&view=logs&j=089c709a-44eb-5f6e-96e7-15e9ee1ff5bf&t=2da3e16b-a2b2-5f01-2cbe-a20d9528195b&l=1849

Should be simple to fix: open the notebook and edit the name in there as well :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I run test the notebooks on my local machine? Does make test do that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a script for doing that https://github.com/shogun-toolbox/shogun/blob/develop/scripts/test_notebooks.sh.

The link Heiko pasted here above will show also how to use it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesnt. You would have to compile shogun with the python interface, make sure you can load it from python, and then open the notebook in jupyter notebook.

However, you might be able to do a simple hack here:

  1. Open the notebook in a texteditor
  2. Search for m_
  3. If it is one of the varnames, change it to the values you updated them to
  4. Save the file in the texteditor and submit

As this is such a simple change, that should do it without the need for you running it locally

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have already pushed the code after making those changes. :)

SG_ADD(&m_max_tree_depth,"max_tree_depth", "max tree depth", ParameterProperties::HYPER);
SG_ADD(&m_min_node_size,"min_node_size", "min node size", ParameterProperties::SETTING);
SG_ADD(&m_alpha_merge,"alpha_merge", "alpha-merge", ParameterProperties::HYPER);
SG_ADD(&m_alpha_split,"alpha_split", "alpha-split", ParameterProperties::HYPER);
SG_ADD(&m_cont_breakpoints,"cont_breakpoints", "breakpoints in continuous attributes", ParameterProperties::SETTING);
SG_ADD(&m_num_breakpoints,"num_breakpoints", "number of breakpoints", ParameterProperties::HYPER);
}