The "Time-reversed" Problem of Some Crawled Data #13

Open
MiracleXYZ opened this issue Feb 7, 2018 · 0 comments

When I downloaded historical S&P 500 data (^GSPC) from Yahoo Finance, I found that the data was time-reversed, i.e. the latest entries were at the top of the DataFrame. The same is true of nearly all of the files in the provided data archive (stock-data-lilianweng.tar.gz), except SP500.csv and _SP500.csv.
Here is the problem: the code never sorts the data by time, even though the LSTM model needs its input in chronological order. In data_model.py, lines 25 to 35:

        # Read csv file
        raw_df = pd.read_csv(os.path.join("data", "%s.csv" % stock_sym))

        # Merge into one sequence
        if close_price_only:
            self.raw_seq = raw_df['Close'].tolist()
        else:
            self.raw_seq = [price for tup in raw_df[['Open', 'Close']].values for price in tup]

        self.raw_seq = np.array(self.raw_seq)
        self.train_X, self.train_y, self.test_X, self.test_y = self._prepare_data(self.raw_seq)

The close prices are simply extracted from the DataFrame without checking the time order. For the reversed files this means the sequence runs backward in time, so the test set is the earliest 10% of the data instead of the latest 10%, which is unreasonable.

Maybe we should sort the data by date before extracting the closing prices, or otherwise make sure the data is read in the right order and in a consistent format.
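
A minimal sketch of the sorting idea, not a patch against the repository. It assumes the CSV has a `Date` column (as Yahoo Finance exports usually do); the helper name `load_sorted_prices` and the sample rows are made up for illustration.

```python
import io

import pandas as pd


def load_sorted_prices(csv_source, date_col="Date"):
    """Read a price CSV and return it sorted oldest-first.

    `date_col` is an assumption: adjust it if the file names
    its date column differently.
    """
    raw_df = pd.read_csv(csv_source, parse_dates=[date_col])
    # Sort ascending by date so the last rows are the most recent ones.
    return raw_df.sort_values(date_col).reset_index(drop=True)


# Made-up sample in "latest first" order, like the reversed files.
sample_csv = io.StringIO(
    "Date,Open,Close\n"
    "2018-02-06,3.0,3.5\n"
    "2018-02-05,2.0,2.5\n"
    "2018-02-02,1.0,1.5\n"
)

sorted_df = load_sorted_prices(sample_csv)
close_seq = sorted_df["Close"].tolist()
print(close_seq)
```

With the rows in chronological order, taking the last 10% of the sequence as the test set would pick the most recent prices, which is what the train/test split is meant to do.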
