The FFT, as computed here, should not be considered a proper feature.
The FFT is computed over the whole dataset, so the value at every earlier time step is affected by future data.
For example, if you remove the last row of the price data, all of the FFT values will be different.
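A minimal sketch of that check, assuming a synthetic random-walk series in place of the real price data. The helper name `fft_lowpass_abs` and the choice of `keep=3` are illustrative assumptions, not taken from the project; the truncation `coeffs[keep:-keep] = 0` mirrors the low-pass step discussed in this issue.

```python
import numpy as np

def fft_lowpass_abs(series, keep):
    """Magnitude of `series` rebuilt from the coefficients left after zeroing coeffs[keep:-keep]."""
    coeffs = np.fft.fft(series)
    coeffs[keep:-keep] = 0              # drop everything except the lowest frequencies
    return np.abs(np.fft.ifft(coeffs))

rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(size=200))   # synthetic stand-in for 'close' prices

full = fft_lowpass_abs(prices, keep=3)            # feature built from all 200 rows
truncated = fft_lowpass_abs(prices[:-1], keep=3)  # same feature with the last row removed

# Compare the feature on the 199 rows that both versions share.
changed = ~np.isclose(full[:-1], truncated)
print(f"{changed.sum()} of {changed.size} earlier values changed")
```

If the feature were causal, dropping the last row would leave every earlier value untouched and the count would be zero.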
If the author achieves good accuracy, it is mainly due to data leakage.
This data leakage alone can invalidate the whole project: every step after it is garbage in, garbage out (GIGO).
Even a decent MLP model can get a good result with such data leakage.
A possible solution:
import numpy as np

df = ...  # The price data
periods = [3, 6, 9]

index_data = []
data = {}
for p in periods:
    data[f'abs_{p}'] = []
    data[f'angle_{p}'] = []

# Calculate the FFT using only the rows seen before row i.
# Caution: range(1, len(df)) should start later in practice, because the
# earliest windows contain too little data for a meaningful FFT value.
for i in range(1, len(df)):
    window = df[:i]['close']          # rows 0..i-1 only, no future data
    index_data.append(df.index[i])
    fft_close = np.fft.fft(window.values)
    for p in periods:
        fft_list = np.copy(fft_close)
        fft_list[p:-p] = 0            # keep only the lowest-frequency components
        final_fft = np.fft.ifft(fft_list)
        # Keep only the most recent value of the filtered signal.
        data[f'abs_{p}'].append(np.abs(final_fft)[-1])
        data[f'angle_{p}'].append(np.angle(final_fft)[-1])
Computed this way, the features differ hugely from the original ones and WILL NOT capture the same movement as in the author's figure. This shows that the project's reported performance is based on data leakage.
Caution: Separating the training and testing data before applying the author's original FFT feature will still cause data leakage. The FFT can only be calculated on data already 'seen' at each time step. Otherwise, each value is still computed from all rows passed to the FFT, including future ones.
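One way to test a feature for this kind of leakage is to check that its values in the training period do not change when the later rows are removed. The sketch below is an illustration under assumptions: the helper `causal_fft_abs`, the `min_window=32` warm-up, the split position, and the synthetic prices are all hypothetical and not part of the project.

```python
import numpy as np

def causal_fft_abs(close, keep, min_window=32):
    """Low-pass FFT magnitude for row i, computed from close[:i] only (hypothetical helper)."""
    out = np.full(len(close), np.nan)   # rows before the warm-up stay NaN
    for i in range(min_window, len(close)):
        coeffs = np.fft.fft(close[:i])  # only rows strictly before i
        coeffs[keep:-keep] = 0          # same truncation as the snippet in this issue
        out[i] = np.abs(np.fft.ifft(coeffs))[-1]
    return out

rng = np.random.default_rng(1)
close = 100 + np.cumsum(rng.normal(size=300))  # synthetic stand-in for 'close' prices
split = 240                                    # assumed train/test boundary

feat_all = causal_fft_abs(close, keep=3)              # features built with test rows present
feat_train_only = causal_fft_abs(close[:split], keep=3)  # features built without them

# Leakage check: training-period features must be identical in both cases.
leak_free = np.allclose(feat_all[:split], feat_train_only, equal_nan=True)
print("training features independent of test rows:", leak_free)
```

Running the same comparison on a whole-series FFT feature would be expected to fail, because every value there depends on all rows passed to the transform.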
nova-land changed the title from "Feature Extraction Bug: FFT Data Leakage" to "Feature Extraction Bug: FFT Data Leakage causing Fake Result" on May 12, 2023