Ytk-learn is built with Apache Maven. To build it, run `sh tool/package.sh`. The `ytk-learn.zip` package is created in the `target` directory. Copy it to wherever you want to run it and run `unzip ytk-learn.zip`; you will then see the following directories in `ytk-learn`:
- `bin`: running scripts, including training scripts, data format conversion scripts, and the offline prediction script.
- `config`: configuration files, including log4j and model configurations.
- `log`: when you run scripts, the corresponding logs are generated in this directory.
- `demo`: several demos for each model.

`ytk-learn.zip` is also available for download.
Ytk-learn can run in the following environments:
- single machine
- common cluster
- Spark cluster (YARN)
- Hadoop cluster (YARN)
- easily extended to other computation platforms
| | Linux | Mac OS | Windows (cygwin) |
|---|---|---|---|
| single machine | local_optimizer.sh | local_optimizer.sh | win_local_optimizer.bat/cygwin_local_optimizer.sh |
| common cluster | cluster_optimizer.sh | no | no |
| spark cluster | spark_optimizer.sh | no | no |
| hadoop cluster | hadoop_optimizer.sh | no | no |
Ytk-learn uses a master-slave communication mode based on ytk-mp4j. The master node coordinates the slave nodes, and the slave nodes do the actual work. If you want to run more than one training task on the same host at the same time, each task must use a different master port.
The thread number, master host, master port, slave hosts, model name, configuration path, data transformation, and running commands can all be set in these scripts.
On Windows, double-click `win_local_optimizer.bat` to start training.
Supported file systems:
- local disk
- HDFS
- easily extended to other storage systems
| | single machine | common cluster | spark cluster | hadoop cluster |
|---|---|---|---|---|
| local filesystem | yes | yes | no | no |
| hdfs filesystem | yes | yes | yes | yes |
Most optimization methods use data parallelism in cluster training, so splitting the train/test data sensibly is very important. On a Spark/Hadoop cluster, the training data is already split uniformly across executors/reducers, but on a common cluster you must specify how the data is split. If you have not assigned the data to slaves yourself (assigned : false), there are two options to choose from: "lines_avg" and "files_avg".
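The exact behavior of "lines_avg" is not spelled out here. Assuming it means that the lines of the data are divided as evenly as possible among the slaves, the arithmetic can be sketched as follows. The function name `line_ranges` is hypothetical and is not part of ytk-learn; it only illustrates an even split.

```shell
# Hypothetical helper (not part of ytk-learn): print the 1-based line range
# each worker would read if TOTAL lines were divided as evenly as possible.
# Assumption: "lines_avg" behaves roughly like this even split.
line_ranges() {
  total=$1
  workers=$2
  base=$((total / workers))    # lines every worker gets
  extra=$((total % workers))   # first `extra` workers get one more line
  start=1
  i=0
  while [ "$i" -lt "$workers" ]; do
    size=$base
    if [ "$i" -lt "$extra" ]; then
      size=$((size + 1))
    fi
    end=$((start + size - 1))
    echo "worker $i: lines $start-$end"
    start=$((end + 1))
    i=$((i + 1))
  done
}

line_ranges 10 3
```

With 10 lines and 3 workers, the first worker receives 4 lines and the other two receive 3 each. By the same reading, "files_avg" would distribute whole files rather than lines, which is again an assumption.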
The configurations for our models mainly consist of four parts: data, model, feature and optimization.
- Data: data-related configuration, such as training data path, testing data path and data format.
- Model: model-related configuration, such as model path, user-provided feature dict path and whether to continue training.
- Feature: feature-processing-related configuration.
- Optimization: training-related configuration, such as hyper parameters.
Logs in ytk-learn are very useful. You can use them to monitor task progress, see important information such as evaluation results, and find detailed error information when the program does not run as expected.
After training starts, you can use `tail -f log/master.log` to watch its progress; most errors and exceptions are printed in this log file. If training is blocked, or no error or exception can be found in `master.log`, check `slave.log` or `slave_error.log`. If there is a `ConnectionException`, try setting `master_host` to `127.0.0.1`. On Spark/Hadoop YARN, use the `yarn logs -applicationId your_application_id` command to get the slaves' logs.
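The triage order described above (master log first, then the slave logs) can be wrapped in a small helper. This is a minimal sketch, not a script shipped with ytk-learn: the function name `recent_errors` is hypothetical, and it assumes that error lines contain the word "error" or "exception" in some letter case.

```shell
# Hypothetical helper (not part of ytk-learn): show the most recent lines that
# mention an error or exception in the master and slave logs.
# The first argument is the log directory; it defaults to ./log.
recent_errors() {
  log_dir=${1:-log}
  for name in master.log slave.log slave_error.log; do
    file="$log_dir/$name"
    # Skip logs that have not been created yet.
    [ -f "$file" ] || continue
    echo "== $name =="
    grep -i -E 'error|exception' "$file" | tail -n 5
  done
}

recent_errors log
```

Unlike `tail -f`, the helper returns immediately, so it can be run after a job has already stalled or finished.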
log file | details | relevant scripts |
---|---|---|
log/master.log | most logs are saved in this file including master startup logs, slave connecting logs, slave reported logs(info, error, exception). | all training shell scripts |
log/master_error.log | master error logs, including most slave error logs | all training shell scripts |
log/master_debug.log | master debug logs, including most slave debug logs | all training shell scripts |
log/slave.log | slave local logs(most slave logs are sent to master) | local_optimizer.sh/cluster_optimizer.sh |
log/slave_error.log | slave local error logs(most slave error logs are sent to master) | local_optimizer.sh/cluster_optimizer.sh |
log/slave_debug.log | slave local debug logs(most slave debug logs are sent to master) | local_optimizer.sh/cluster_optimizer.sh |
log/info.log | other info logs | predict.sh/libsvm_convert_2_ytklearn.sh |
log/error.log | other error logs | predict.sh/libsvm_convert_2_ytklearn.sh |
log/debug.log | other debug logs | predict.sh/libsvm_convert_2_ytklearn.sh |
log/yarn_${master_port}.log | yarn job logs(spark, hadoop) | spark_optimizer.sh/hadoop_optimizer.sh |
- `tail -f log/master.log | grep "train loss"`: check train loss.
- `tail -f log/master.log | grep "test loss"`: check test loss.
- `tail -f log/master.log | grep "auc\|rmse\|confusion_matrix\|mae"`: check metrics.
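When a one-off snapshot is more convenient than following the log, the same patterns can be used without `tail -f`. The sketch below is an illustration only: `last_matches` is a hypothetical helper name, and it relies solely on the substrings used in the commands above, not on any particular log line layout.

```shell
# Hypothetical helper (not part of ytk-learn): print the last COUNT lines of a
# log file that match an extended regular expression, then exit.
last_matches() {
  log_file=$1
  pattern=$2
  count=${3:-5}
  grep -E "$pattern" "$log_file" | tail -n "$count"
}

# Only query the master log if it already exists.
if [ -f log/master.log ]; then
  last_matches log/master.log "train loss|test loss" 3
  last_matches log/master.log "auc|rmse|confusion_matrix|mae" 3
fi
```

Because `grep -E` is used, alternatives are written as `a|b` rather than the `a\|b` form needed by plain `grep`.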