Small refactor #14

Merged
merged 16 commits into from
May 17, 2024

@diegomarvid diegomarvid commented May 4, 2024

Generate

  • Enable DataFrame input for prediction. This allows inference on in-memory, real-time DataFrames without first writing CSV or Parquet files. We can now call pipeline.predict(df).
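
A minimal sketch of how a loader could accept either kind of input, assuming pandas. `load_input` is a hypothetical helper written for illustration, not the project's actual code, and the extension checks are an assumption about how file paths might be dispatched.

```python
from pathlib import Path
from typing import Union

import pandas as pd


def load_input(source: Union[str, Path, pd.DataFrame]) -> pd.DataFrame:
    """Return a DataFrame from an in-memory frame or a CSV/Parquet path (hypothetical helper)."""
    if isinstance(source, pd.DataFrame):
        # Copy so that later steps cannot mutate the caller's frame.
        return source.copy()
    path = Path(source)
    if path.suffix == ".csv":
        return pd.read_csv(path)
    if path.suffix == ".parquet":
        return pd.read_parquet(path)
    raise ValueError(f"Unsupported input type or file extension: {source!r}")


# An in-memory frame goes straight through, with no file on disk.
df = pd.DataFrame({"date": ["2024-05-04"], "value": [1.0]})
loaded = load_input(df)
```

With a dispatch like this, the same entry point can serve batch files and real-time frames, which is presumably what makes pipeline.predict(df) possible.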

Calculate Features

  • Automatically ignore datetime_columns for training
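
One way this could work, sketched with pandas under stated assumptions: `training_features` is a hypothetical function, and accepting either a single column name or a list is a guess based on the string value used in the configuration example.

```python
from typing import List, Union

import pandas as pd


def training_features(
    df: pd.DataFrame, target: str, datetime_columns: Union[str, List[str]]
) -> pd.DataFrame:
    """Drop the raw datetime columns and the target, keeping derived features (hypothetical)."""
    if isinstance(datetime_columns, str):
        cols = [datetime_columns]
    else:
        cols = list(datetime_columns)
    # Only drop columns that are actually present in the frame.
    to_drop = [c for c in cols + [target] if c in df.columns]
    return df.drop(columns=to_drop)


df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-05-04", "2024-05-17"]),
        "year": [2024, 2024],
        "month": [5, 5],
        "day": [4, 17],
        "target": [0, 1],
    }
)
X = training_features(df, target="target", datetime_columns="date")
```

The derived year, month, and day columns stay in the training matrix while the raw date column is left out; the original frame is not modified.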

Fit Model

  • Merge with PredictStep
  • Rename to ModelStep
  • Rename model_params to model_parameters
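
A rough sketch of what a merged fit-and-predict step might look like. Everything here is illustrative: `MeanModel` is a toy stand-in for a real model, and the `execute` signature, the `train` flag, and the dictionary keys are assumptions rather than the project's API.

```python
from typing import Any, Dict, List


class MeanModel:
    """Toy stand-in for a real model such as XGBoost: predicts the training mean."""

    def __init__(self, **model_parameters: Any) -> None:
        self.model_parameters = model_parameters
        self.mean_ = None

    def fit(self, X: List[list], y: List[float]) -> None:
        self.mean_ = sum(y) / len(y)

    def predict(self, X: List[list]) -> List[float]:
        if self.mean_ is None:
            raise RuntimeError("Model must be fitted before predicting.")
        return [self.mean_] * len(X)


class ModelStep:
    """Hypothetical merged step: fits when training, then always predicts."""

    def __init__(self, model: MeanModel) -> None:
        self.model = model

    def execute(self, data: Dict[str, Any], train: bool) -> Dict[str, Any]:
        if train:
            self.model.fit(data["X_train"], data["y_train"])
        data["predictions"] = self.model.predict(data["X"])
        return data


step = ModelStep(MeanModel(n_estimators=3, max_depth=3))
trained = step.execute(
    {"X_train": [[1], [2]], "y_train": [1.0, 3.0], "X": [[5]]}, train=True
)
served = step.execute({"X": [[7], [8]]}, train=False)
```

Keeping fitting and prediction in one step means the same object holds the fitted model in both modes, so a separate predict step has nothing left to do.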

Predict

  • Delete the step; it is now unnecessary because prediction is handled by ModelStep

Model Registry

  • Rename XGBoostModel to XGBoost
  • Improve ModelClassNotFoundError error logging
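
As an illustration of the kind of error message that helps here, the sketch below shows a registry lookup that lists the registered names when a class is missing. `ModelRegistry`, its methods, and the message wording are assumptions; the exception and the `XGBoost` class are local stand-ins that only reuse the names from this PR.

```python
from typing import Dict


class ModelClassNotFoundError(Exception):
    """Stand-in for the project's exception of the same name."""


class ModelRegistry:
    """Hypothetical registry mapping class names to model classes."""

    def __init__(self) -> None:
        self._models: Dict[str, type] = {}

    def register(self, cls: type) -> type:
        # Usable as a decorator: stores the class under its own name.
        self._models[cls.__name__] = cls
        return cls

    def get(self, name: str) -> type:
        try:
            return self._models[name]
        except KeyError:
            available = ", ".join(sorted(self._models)) or "<none>"
            raise ModelClassNotFoundError(
                f"Model class '{name}' not found. Available: {available}"
            ) from None


registry = ModelRegistry()


@registry.register
class XGBoost:
    """Placeholder class; the real one would wrap an XGBoost estimator."""


try:
    registry.get("XGBoostModel")  # the old name is no longer registered
except ModelClassNotFoundError as exc:
    message = str(exc)
```

Listing the available names in the message makes a stale configuration, such as one still using the old XGBoostModel name, easy to diagnose.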

README.md

  • Update the README with new configuration examples

New configuration example

{
    "pipeline": {
        "name": "XGBoostTrainingPipeline",
        "description": "Training pipeline for XGBoost models.",
        "parameters": {
            "save_data_path": "test.pkl",
            "target": "target"
        },
        "steps": [
            {
                "step_type": "GenerateStep",
                "parameters": {
                    "train_path": "examples/ocf/data/dummy.csv",
                    "test_path": "examples/ocf/data/test.csv",
                    "predict_path": "examples/ocf/data/predict.csv"
                }
            },
            {
                "step_type": "TabularSplitStep",
                "parameters": {
                    "train_percentage": 0.95
                }
            },
            {
                "step_type": "CleanStep"
            },
            {
                "step_type": "CalculateFeaturesStep",
                "parameters": {
                    "datetime_columns": "date",
                    "features": [
                        "year",
                        "month",
                        "day"
                    ]
                }
            },
            {
                "step_type": "EncodeStep"
            },
            {
                "step_type": "ModelStep",
                "parameters": {
                    "model_class": "XGBoost",
                    "model_parameters": {
                        "n_estimators": 3,
                        "max_depth": 3
                    }
                }
            },
            {
                "step_type": "CalculateMetricsStep"
            }
        ]
    }
}
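
For readers who want to see how such a configuration could be consumed, here is a small sketch using only the standard library. `read_steps` is a hypothetical helper, the embedded configuration is a trimmed copy of the example above, and treating a missing "parameters" key as an empty dictionary is an assumption.

```python
import json
from typing import Any, Dict, List, Tuple

# Trimmed copy of the configuration example, kept short for illustration.
CONFIG_TEXT = """
{
    "pipeline": {
        "name": "XGBoostTrainingPipeline",
        "steps": [
            {"step_type": "CleanStep"},
            {
                "step_type": "ModelStep",
                "parameters": {
                    "model_class": "XGBoost",
                    "model_parameters": {"n_estimators": 3, "max_depth": 3}
                }
            }
        ]
    }
}
"""


def read_steps(config_text: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (step_type, parameters) pairs in pipeline order (hypothetical helper)."""
    config = json.loads(config_text)
    steps = []
    for entry in config["pipeline"]["steps"]:
        # Steps such as CleanStep carry no "parameters" key; default to {}.
        steps.append((entry["step_type"], entry.get("parameters", {})))
    return steps


steps = read_steps(CONFIG_TEXT)
```

Each pair could then be handed to whatever factory builds the step objects, with model_class and model_parameters passed through to ModelStep.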

@diegomarvid diegomarvid self-assigned this May 4, 2024
@diegomarvid diegomarvid changed the base branch from main to refactor May 4, 2024 00:25
@diegomarvid diegomarvid requested a review from Ludecan May 5, 2024 00:37
@diegomarvid diegomarvid changed the base branch from refactor to experiments May 16, 2024 15:09
Base automatically changed from experiments to main May 16, 2024 15:42
@ovejabu ovejabu left a comment

Nice refactor! LGTM

@diegomarvid diegomarvid merged commit c20ec7a into main May 17, 2024
1 check passed
@diegomarvid diegomarvid deleted the small-refactor branch May 17, 2024 23:10
3 participants