What does the "df" parameter really mean in the NeuralForecast.predict() method? #1218

Closed
eotich32 opened this issue Nov 27, 2024 · 3 comments

eotich32 commented Nov 27, 2024

Does it mean any historical time steps outside the training set that are used to predict the future?
When I fit the model with h=5 and input_size=20, what does passing a df with len(df)=10, len(df)=20, or len(df)=30 mean at prediction time?
I found that with len(df)=20 and len(df)=30 the predictions are the same, so it seemed that df[20:30] was not used; but when I passed a df with len(df)=10, the prediction was completely wrong.
So I am quite confused...
Thank you for your help.

@marcopeix
Contributor

Hello! In the predict method, df is the data you want the model to use as input. If you don't pass anything, the input sequence comes from the training set you used to fit, and the predictions start right after that training set.

By passing a df to predict, you specify that you want this data to be used as the input sequence, and so the predictions will start after this data.

So, suppose you fit on data that ends on "2024-01-01" (YYYY-MM-DD) and you have a daily frequency. Then calling predict() will make predictions starting on "2024-01-02". If you instead call predict(df) and the input dataframe ends on "2024-02-01", then predictions will start on "2024-02-02".
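The date arithmetic in this example can be sketched in plain Python. This is only an illustration of where the forecast horizon starts, not NeuralForecast code: `forecast_dates` is a hypothetical helper, and the horizon of 5 is taken from the h=5 in the question.

```python
from datetime import date, timedelta


def forecast_dates(last_input_date, h):
    """Hypothetical helper: the h daily timestamps that follow the last input date."""
    return [last_input_date + timedelta(days=step) for step in range(1, h + 1)]


# predict() with no df: the input ends where the training set ends.
after_training = forecast_dates(date(2024, 1, 1), h=5)

# predict(df): the input ends where the passed dataframe ends.
after_new_df = forecast_dates(date(2024, 2, 1), h=5)

print(after_training[0], after_training[-1])  # 2024-01-02 2024-01-06
print(after_new_df[0], after_new_df[-1])      # 2024-02-02 2024-02-06
```

In both cases the first forecast is one step after the last timestamp of whatever data serves as input.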

In your case, you specified input_size=20, so the model needs 20 values to make the next 5 predictions. If you pass a df of 20 values or more, it will always use the last 20 as input to make predictions.
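A rough sketch of that "last 20 values" behaviour, using plain lists rather than the library itself. `input_window` is a hypothetical name, and the integer series is made-up illustration data.

```python
def input_window(values, input_size):
    """Hypothetical helper: keep only the most recent `input_size` observations."""
    return values[-input_size:]


history_30 = list(range(30))    # a series with 30 observations
history_20 = history_30[-20:]   # the same series, truncated to its last 20

window_from_30 = input_window(history_30, 20)
window_from_20 = input_window(history_20, 20)

# Both inputs reduce to the same 20 values, so a model fed either one
# would see an identical input window.
print(window_from_30 == window_from_20)  # True
print(window_from_30[0])                 # 10
```

Anything before the last input_size values is simply not part of the model's input.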

However, if you pass fewer than 20 values, the input is padded and performance degrades, because the model needs 20 values to make predictions.
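To make the padding case concrete, here is a minimal sketch under stated assumptions: it pads on the left with zeros and returns a mask marking real observations. The padding side, the pad value, and the mask are assumptions made for illustration; the thread only says that padding is done, and `pad_input` is a hypothetical helper, not a library function.

```python
def pad_input(values, input_size, pad_value=0.0):
    """Hypothetical helper: left-pad a short series up to `input_size` values."""
    window = list(values[-input_size:])
    missing = input_size - len(window)   # 0 when enough history is available
    padded = [pad_value] * missing + window
    mask = [0] * missing + [1] * len(window)  # 1 marks a real observation
    return padded, mask


short_history = [float(v) for v in range(1, 11)]  # only 10 observations
padded, mask = pad_input(short_history, input_size=20)

print(len(padded), sum(mask))  # 20 10
print(padded[:10])             # ten pad values in front of the real data
```

With len(df)=10, half of the window carries no real information, which is consistent with the degraded predictions described in the question.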

I hope this helps, let me know if it answers your question!

@eotich32
Author

Your answer is really helpful; I fully understand it now. Thanks a lot!

@marcopeix
Contributor

No problem!
