
Sounds overlap between test set and train set #5

Open
amiryanj opened this issue Oct 17, 2018 · 2 comments

Comments

@amiryanj

The train set and test set overlap almost completely.

In create_dataset.py, lines 118-120, the frame indices are shuffled and then split into train_idx and test_idx.
Suppose frame 10 is in test_idx and frame 9 is in train_idx.
Then calling get_samples_from_frame(f_i=9) uses the information of frame 10 to generate its samples (lines 81-82).
As a result, the test data exists almost entirely in the train set as well!
Did I misunderstand something?
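
To make the concern above concrete, here is a minimal sketch that counts how many test frames are also read by training samples when frame indices are shuffled before splitting. It does not use the repository's code: `frames_used`, `split_indices`, `leaked_test_frames`, the one-frame look-ahead (`horizon=1`), the 1000-frame count, and the 0.8 train ratio are all assumptions chosen for illustration.

```python
import random


def frames_used(f_i, horizon=1):
    """Hypothetical stand-in for sample generation: a sample built at
    frame f_i also reads the next `horizon` frames."""
    return set(range(f_i, f_i + horizon + 1))


def split_indices(frame_num, train_ratio, shuffle, seed=0):
    """Split frame indices into train/test, optionally shuffling first."""
    idx = list(range(frame_num))
    if shuffle:
        random.Random(seed).shuffle(idx)
    cut = int(frame_num * train_ratio)
    return idx[:cut], idx[cut:]


def leaked_test_frames(train_idx, test_idx, horizon=1):
    """Return the test frames that at least one training sample reads."""
    train_reads = set()
    for f_i in train_idx:
        train_reads |= frames_used(f_i, horizon)
    return sorted(train_reads & set(test_idx))


# Assumed sizes, not taken from the repository.
train_idx, test_idx = split_indices(1000, 0.8, shuffle=True)
leaked = leaked_test_frames(train_idx, test_idx)
print(len(leaked), "of", len(test_idx), "test frames are read by training samples")
```

Under these assumptions, a test frame leaks whenever the frame just before it landed in the train set, which happens for most test frames after a shuffle. Calling `split_indices(1000, 0.8, shuffle=False)` instead leaves only the single frame at the train/test boundary exposed.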

@amiryanj amiryanj changed the title Sounds overlap in the test set and train set Sounds overlap between test set and train set Oct 17, 2018
@piaozhx
Member

piaozhx commented Oct 18, 2018

Actually, you are right. This is an old version of the function create_GC_train_test_data, which we included by mistake when we released create_dataset.py.

An easy solution is:

# Wrong: shuffling the frame indices causes overlap between the test set and the train set.
# random_idx = random.sample(range(frame_num), frame_num)

# Keep the frames in chronological order instead.
random_idx = range(frame_num)
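
The idea behind this fix can be sketched as a contiguous, chronological split. The helper below is hypothetical and is not the repository's create_GC_train_test_data: `chronological_split`, `train_ratio`, and `gap` are assumed names, and the `gap` of dropped frames at the boundary is an extra precaution beyond the suggested one-line change, for the case where a training sample looks ahead by up to `gap` frames.

```python
def chronological_split(frame_num, train_ratio=0.8, gap=1):
    """Split range(frame_num) into contiguous train and test blocks.

    The last `gap` frames before the cut are dropped from the train block,
    so a training sample that reads up to `gap` frames ahead never reaches
    a test frame. `train_ratio` and `gap` are illustrative assumptions.
    """
    cut = int(frame_num * train_ratio)
    train_idx = list(range(0, max(cut - gap, 0)))
    test_idx = list(range(cut, frame_num))
    return train_idx, test_idx


train_idx, test_idx = chronological_split(100, train_ratio=0.8, gap=1)
print(train_idx[-1], test_idx[0])
```

With `gap=0` this reduces to a plain ordered split, in which only the last training frame can still read into the first test frame.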

@amiryanj
Author

Yes, I fixed that mistake, but the results now seem to converge to a different value.
I ran the algorithm for 350 epochs (10,000 is too many!) and the error does not drop below 0.017.
[Image: cidnn_result]
