How to get data #36

Open
grst opened this issue May 18, 2024 · 5 comments

Comments

@grst
Collaborator

grst commented May 18, 2024

In the "Getting started" section you describe two parameters pointing to data directories:

    data_dir_bulk = "/user/benchmarking/datasets/bulks"
    data_dir_sc = "/user/benchmarking/datasets/single_cell"

Could you include a description of where to get these data?
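
To illustrate why this matters, here is a minimal sketch of an early check that fails fast when the two directories are missing. The helper name `missing_data_dirs` is hypothetical and not part of the project; the paths are simply the placeholders from "Getting started".

```python
from pathlib import Path

# Placeholder paths copied from the "Getting started" section; adjust to your setup.
data_dir_bulk = "/user/benchmarking/datasets/bulks"
data_dir_sc = "/user/benchmarking/datasets/single_cell"


def missing_data_dirs(*dirs):
    """Return the given paths that do not exist as directories.

    Hypothetical helper for illustration only; not part of the project.
    """
    return [d for d in dirs if not Path(d).is_dir()]


if __name__ == "__main__":
    # Report every directory that still needs to be downloaded or created.
    for d in missing_data_dirs(data_dir_bulk, data_dir_sc):
        print(f"Data directory not found: {d}")
```

A check like this only tells you that the data are absent; it does not replace a description of where to obtain them.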

alex-d13 added a commit that referenced this issue May 24, 2024
@grst
Collaborator Author

grst commented May 24, 2024

Why not just include the count matrices on figshare as well? Retrieving and preprocessing them from the original data sources is a huge hurdle for anyone trying to reproduce or extend your analysis.

@alex-d13
Collaborator

I will check this with the others, but I am not sure whether we are allowed to re-upload all of the datasets ourselves. I think we agreed to upload only our annotations.

@alex-d13 alex-d13 reopened this May 24, 2024
@grst
Collaborator Author

grst commented May 24, 2024

Unless you are working with protected-access datasets, such as those on dbGaP, I don't see any issues.

@alex-d13
Collaborator

Would we need some kind of license for these, for example for the lung cancer dataset?

@grst
Collaborator Author

grst commented May 26, 2024

To be honest, I don't know what license applies to a dataset published on, e.g., GEO or ArrayExpress. But you can obviously use such data to create derivative works in publications, and I don't see why a preprocessed dataset wouldn't count as a derivative work. At least I have done it in the past and nobody complained.

Of course, this is not possible for protected-access datasets (such as those on dbGaP).
