Somatic update #156

eudesbarbosa · 2022-06-21T13:20:24Z

eudesbarbosa · 2022-06-28T08:37:45Z

For 'Unify output filenames for Control Freec and CopyWriter':
one should also consider unifying the cnvetti_on_target_postprocess output. It currently uses _ to separate the file extension:

output/bwa.cnvetti_on_target_postprocess.P00{i}-T{t}-DNA1-WGS1/out/bwa.cnvetti_on_target_postprocess.P00{i}-T{t}-DNA1-WGS1_{ext}

Line 290 - method CnvettiStepPartBase._get_output_files_postprocess()
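To make the suggestion concrete, here is a minimal sketch of a path template that joins the extension with a dot instead of an underscore. The function name and the `library_name` placeholder are hypothetical and are not taken from `_get_output_files_postprocess()`; only the directory layout follows the path shown above.

```python
def postprocess_output_template(mapper: str = "bwa", sep: str = ".") -> str:
    """Return a path template for cnvetti_on_target_postprocess outputs.

    `sep` is the separator placed before the extension; the proposal in
    this thread is "." instead of the current "_".
    `library_name` and `ext` stay as placeholders to be filled in later.
    """
    # Hypothetical placeholder name; the real method may use another key.
    base = f"{mapper}.cnvetti_on_target_postprocess.{{library_name}}"
    return f"output/{base}/out/{base}{sep}{{ext}}"


# Fill the placeholders for one library and one (illustrative) extension.
template = postprocess_output_template()
path = template.format(library_name="P001-T1-DNA1-WGS1", ext="txt")
```

Passing `sep="_"` reproduces the current layout, which may help when comparing old and new output paths side by side.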

eudesbarbosa · 2022-06-28T09:29:34Z

I don't understand the changes to snappy_wrappers/wrappers/control_freec/transform/snappy-convert-control_freec.R:
-> They seem to be the reverse of the changes to the workflow.

-                                   ratios_fn=paste("freec.", sample_name, ".ratio.txt", sep=""),
-                                   log2_fn=paste("freec.", sample_name, ".log2.txt", sep=""),
-                                   call_fn=paste("freec.", sample_name, ".call.txt", sep=""),
-                                   segments_fn=paste("freec.", sample_name, ".segments.txt", sep=""),
+                                   ratios_fn=paste("freec.", sample_name, "_ratio.txt", sep=""),
+                                   log2_fn=paste("freec.", sample_name, "_gene_log2.txt", sep=""),
+                                   call_fn=paste("freec.", sample_name, "_gene_call.txt", sep=""),
+                                   segments_fn=paste("freec.", sample_name, "_segments.txt", sep=""),

For the function postProcess in snappy_wrappers/wrappers/copywriter/call/snappy-copywriter-call.R:
-> Can't the output filenames be passed as arguments, similar to control_freec_write_files, so the function doesn't have to build them without knowledge of the rest of the code?
Example:

-                         paste( mapper, ".copywriter.", fullID, "_segments.txt", sep="" ) ),
+                         paste( mapper, ".copywriter.", fullID, ".segments.txt", sep="" ) ),

eudesbarbosa · 2022-06-28T09:42:23Z

Another point about control_freec_write_files: it seems to have default values, but the code will stop if the files are not accessible. Are the defaults really necessary? Can't they just be defined in, or taken from, the wrapper?

ericblanc20 · 2022-06-28T10:04:11Z

I am afraid I have created a monster here. I should have stuck to a naming convention for all CNV steps:

copywriter in somatic_targeted_seq_cnv_calling,
CNVkit in somatic_targeted_seq_cnv_calling & somatic_wgs_cnv_calling,
Control-FREEC in somatic_wgs_cnv_calling
cnvetti (germline)

You are welcome to unify the naming convention.

On a side note, I found that output filenames are configured differently in different places, even within the same step & tool (for example, names are constructed both with an underscore and with a dot).

I don't know which is the right way to go. Ideally, I would like to:

merge the somatic_targeted_seq_cnv_calling with the somatic_wgs_cnv_calling (there is no logical need to keep them apart, as the WES & WGS somatic snvs & indel calling are grouped in a single step)
unify using the dot as separator in filenames, the same way that the tools are separated.

This is probably a lot of unnecessary work, but perhaps we should consider moving to dot separation across the somatic CNV steps.

ericblanc20 · 2022-06-28T10:14:21Z

Regarding the control_freec_write_files, I assume you are referring to the Bioconductor packages org.Hs.eg.db, TxDb.Hsapiens.UCSC.hg19.knownGene & BSgenome.Hsapiens.1000genomes.hs37d5.

This is a real problem, because these packages are huge (genome sequence, genome feature annotations, gene ids & functional annotations), and they are only valid for one genome release. If we want to use GRCh38, or mouse data, then we need to use other versions of these packages.

The problem arises because the packages are downloaded using the wrapper's conda environment. So the only solution (at the moment) is to put annotation packages for several genome releases and species in the conda environment.

We discussed with Clemens a way around this problem. We would have an initial sub-step which creates a sub-directory in $PWD/work in which the required packages are installed. This would then be used as a supplementary library path for the R scripts, provided by the wrapper. I have a pretty good idea how we could do that, but I need a few days to work on it.
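One way the supplementary library path could be handed to the R scripts is through the `R_LIBS` environment variable, which R reads at startup as a path-separated list of library directories. The sketch below is only an illustration of that idea from the Python side of a wrapper; the helper name and the `R_packages` sub-directory are assumptions, not part of the plan described above.

```python
import os
from pathlib import Path


def r_env_with_work_library(work_dir, base_env=None):
    """Return an environment dict whose R_LIBS starts with a work-dir library.

    The directory name "R_packages" is hypothetical; the actual sub-step
    may install the annotation packages somewhere else under work/.
    """
    env = dict(os.environ if base_env is None else base_env)
    lib_dir = Path(work_dir) / "R_packages"
    existing = env.get("R_LIBS", "")
    # Prepend the new library so it takes precedence over existing entries.
    env["R_LIBS"] = os.pathsep.join(p for p in (str(lib_dir), existing) if p)
    return env
```

The returned dictionary could then be passed as the `env` argument of `subprocess.run` when the wrapper launches `Rscript`, leaving the conda environment itself free of genome-specific packages.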

eudesbarbosa · 2022-06-29T09:21:23Z

Name pattern: Probably a good idea to stick to one across all workflows. I think I went over all the ones present in the original PR, but it might be something to gradually fix.

Workflow distinction: I suggest we keep the separation. Too much work for very little value.

control_freec_write_files: Way beyond my knowledge of the topic. I was just referring to the fact that there seem to be default values for the output arguments. Since you needed to modify them in this case, you might consider having no defaults at all, which would avoid changes like this one:

-                                   ratios_fn=paste("freec.", sample_name, ".ratio.txt", sep=""),
+                                   ratios_fn=paste("freec.", sample_name, "_ratio.txt", sep=""),

ericblanc20 · 2022-06-29T10:04:34Z

Sorry, I misunderstood your comment. You are absolutely right: ratios_fn should be obtained from the wrapper.
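As a rough sketch of what "obtained from the wrapper" could look like, the wrapper might collect the four output filenames and render them as explicit keyword arguments for the R call, failing early if one is missing instead of falling back on a default. The helper below is hypothetical; only the argument names ratios_fn, log2_fn, call_fn and segments_fn come from the diff quoted in this thread.

```python
def render_output_args(outputs):
    """Render explicit filename arguments for control_freec_write_files.

    `outputs` maps argument names to paths chosen by the wrapper.
    No default is provided: a missing name raises KeyError.
    Paths are inserted verbatim, so they must not contain double quotes.
    """
    required = ("ratios_fn", "log2_fn", "call_fn", "segments_fn")
    missing = [key for key in required if key not in outputs]
    if missing:
        raise KeyError(f"missing output filenames: {', '.join(missing)}")
    return ", ".join(f'{key}="{outputs[key]}"' for key in required)
```

With this shape the naming convention lives in one place (the workflow), and the R function no longer needs to know whether the separator is a dot or an underscore.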

eudesbarbosa assigned messersc and ericblanc20 Jun 21, 2022

eudesbarbosa linked a pull request Jun 28, 2022 that will close this issue

Somatic Update #159

Draft

eudesbarbosa mentioned this issue Jun 29, 2022

Refactor workflow cbioportal_export #129

Open

8 tasks
