Tool request: Fast/Ortho/Finder #1168
You may also want to consider OrthoFinder; it is believed to be more accurate than OrthoMCL/FastOrtho.
For our project, we should build a little benchmark with those 3 tools and an old homemade script we plan to replace.
@lecorguille OrthoFinder: I'm on it.
@Mataivic We've already implemented a basic but working wrapper at https://github.com/NetBiol/hackathon2017/blob/master/galaxy-tools/orthofinder.xml . I can open a pull request here, and maybe you can work from that?
@nsoranzo At first glance, it looks like your tools here https://github.com/NetBiol/hackathon2017/tree/master/galaxy-tools may complement my tools here https://github.com/gregvonkuster/galaxy_tools/tree/master/tools/plant_tribes. I'll keep an eye on your work. ;)
@lecorguille Cool, no problem! Planning to submit it here?
Yes! This tool should be integrated into a bigger workflow developed in a dedicated GitHub repository.
Hello, I have been away for two weeks and started to really work on the wrapper this week. Here is what I've done so far: https://github.com/abims-sbr/tools-iuc/tree/orthofinder/tools/orthofinder
A whole bunch of tool options are implemented, but this draft does not yet handle incompatible options. Issues remain about dataset collections:
@Mataivic I have built several Galaxy wrappers, available here https://github.com/gregvonkuster/galaxy_tools/tree/master/tools/plant_tribes, for the PlantTribes analysis pipelines here https://github.com/dePamphilis/PlantTribes. These tools do similar things to yours, so hopefully our work will be complementary. I have dealt with the same issues you face with regard to outputs: my tools also produce directories of files, but I don't define the outputs as dataset collections because in most cases my outputs consist of multiple Galaxy datatypes, and dataset collections assume 1 datatype. Also, in some cases my tools produce multiple directory levels on output, and I'm not sure if/how dataset collections would handle directory hierarchies like this.
I've taken the approach of defining new Galaxy datatypes for these tools which are subclasses of the HTML datatype; they are in this PR: galaxyproject/galaxy#3999. These datatypes allow the directories of files to be placed in the primary dataset's extra_files_path. The primary dataset is rendered with these directories of files as clickable items, and multiple levels of directories can be browsed as well. Here is an example: the center panel is rendered when the
These tools form a workflow that typically proceeds in this order:
Tools downstream from those that produce these directories of files are written to consume them as inputs. The "end-point" tools in the workflow produce typical Galaxy-like datatypes so that general Galaxy features work on them. For example, the GeneFamilyPhylogenyBuilder tool produces a dataset collection of
The elements of this dataset collection can then be rendered with the recently introduced Phylocanvas chart. Although these tools wrap the PlantTribes analysis pipelines, they can be used to perform this same analysis on any genome. It would be great if you found these datatypes useful for your tools as well, as it would help define a more standard approach for handling these directories of files.
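To make the extra_files_path idea a bit more concrete, here is a minimal Python sketch of the kind of HTML index a primary dataset could hold: one relative link per file found anywhere under a directory tree, so nested directories stay browsable. The helper name `build_html_index` is hypothetical, and this is not the code of the datatypes in galaxyproject/galaxy#3999; it only assumes that all output files sit under a single root directory.

```python
import html
from pathlib import Path
from urllib.parse import quote


def build_html_index(root_dir):
    """Return an HTML page with one relative link per file under root_dir.

    Hypothetical helper: subdirectories are walked recursively, and each link
    is relative to root_dir so the page works when served next to the tree.
    """
    root = Path(root_dir)
    # Relative POSIX paths, sorted so the listing is stable between runs.
    rel_paths = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
    items = []
    for rel in rel_paths:
        href = html.escape(quote(rel), quote=True)  # URL-encode, then HTML-escape
        label = html.escape(rel)
        items.append(f'<li><a href="{href}">{label}</a></li>')
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>\n"
```

In a tool, the returned string could be written to the primary output file, with the files themselves placed under that dataset's extra_files_path; how Galaxy links the two is the datatype's job, not this sketch's.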
@gregvonkuster Thank you, I'll have a look at it. I don't really know how to deal with Galaxy datatypes yet, but I'll do my best to learn quickly. What do you mean by "dataset collections assume 1 datatype"? My output collections contain files with various file extensions (.txt, .csv and .faa), so I guess it means something other than file extensions?
Here is a good explanation https://galaxyproject.org/learn/datatypes/. If you choose to use my datatypes for your tool, you will just define your tool outputs to use one of those datatypes using the
In the tool outputs section, dataset collections are defined with a format as well, something like this:
The
Galaxy "loosely" uses file extensions to categorize Galaxy datatypes, with each Galaxy datatype class having a file extension via the
Can you provide some details about your outputs? It will help me tell you which of my existing datatypes can be used, or whether you will need an additional datatype.
@gregvonkuster Thank you for the details!
Well, the outputs are several dataset collections, which correspond to the output files of different steps of the tool:
Should I consider splitting each collection? Each resulting collection would contain a single datatype, but I guess that would make a lot of outputs...
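One way to picture the split being asked about is to group a step's output files by extension, on the assumption that extension is a good enough proxy for datatype. The sketch below does that; `EXT_TO_DATATYPE` and `split_by_datatype` are hypothetical names, and mapping `.faa` to `fasta` is an assumption to check against the datatypes your Galaxy instance actually registers.

```python
from collections import defaultdict
from pathlib import Path

# Assumed mapping from file extension to a Galaxy datatype name (hypothetical;
# adjust to the datatypes registered on your Galaxy instance).
EXT_TO_DATATYPE = {".txt": "txt", ".csv": "csv", ".faa": "fasta"}


def split_by_datatype(filenames):
    """Group output file names into one sorted list per datatype.

    Each group could back its own single-datatype dataset collection.
    Files whose extension is not in EXT_TO_DATATYPE go under "unknown".
    """
    groups = defaultdict(list)
    for name in sorted(filenames):
        datatype = EXT_TO_DATATYPE.get(Path(name).suffix.lower(), "unknown")
        groups[datatype].append(name)
    return dict(groups)
```

For a step that writes .txt, .csv and .faa files, this yields at most three groups, so the number of outputs grows with the number of datatypes rather than with the number of files.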
From this description, it sounds like you have a single tool that produces outputs at multiple steps, which implies that steps following an output step will consume the output, do further processing, and produce more outputs. If this is the case, perhaps your tool should be split into multiple tools?
Except for your third item, it looks like your directories of files are fairly easy to handle. I'm not quite sure of the best approach since I don't have the context about the analyses your tool is attempting to perform. Do tools (or tool processing steps) that consume outputs assume the files are all in the same directory? My tools do. If so, you can probably still use dataset collections, but you'll need to account for a couple of important items.
The other approach would be to use one of the new datatypes I've created in the PR discussed above or add a new one yourself. My datatypes categorize the data in this way.
If any of your outputs consist of datasets that are defined by any of those descriptions, that datatype could be used for your output. Or you could define a new datatype if needed. A very important caveat regarding this approach is that these datatypes cannot currently be tested with the Travis test environment defined for this tools-iuc repository. The current Galaxy functional test framework does not accommodate datatypes that represent dynamic numbers of files of multiple datatypes contained within directory hierarchies. In fact, I'm still working to get some functional tests built for several of my tools that use these datatypes. My approach for this is to incorporate Galaxy workflows for testing the tools. I have taken a look at this project https://github.com/phnmnl/wft4galaxy, but ran into this issue phnmnl/wft4galaxy#2, so I haven't pursued it. Instead, I'm trying to use planemo for testing the workflows, but I have yet to get this approach working.
Based on the testing issues I've discussed above, this may be your best approach. But I only see this working for your first 2 items. I don't see how it will work for your third item, which is a hierarchy of directories of files of multiple datatypes. I'm not sure dataset collections will work for this (ping @jmchilton). For your first 2 items, if you use dataset collections, you'll only need 2, I think, so there won't be "a lot of outputs", only 2 collections. Of course, the number of elements in each collection may be very large, but that's ok.
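A quick way to tell which of these two situations an output directory falls into is to check it for subdirectories and for mixed file extensions. This Python sketch does only that; `describe_output_dir` and `fits_flat_collection` are hypothetical names, and it again treats file extension as a stand-in for datatype. A directory that fails the check is not necessarily unrepresentable in Galaxy; that is the open question raised for the third item.

```python
from pathlib import Path


def describe_output_dir(path):
    """Return (is_flat, extensions) for the directory at `path`.

    is_flat is False if any subdirectory exists below `path`;
    extensions is the sorted list of distinct file suffixes found anywhere below it.
    """
    entries = list(Path(path).rglob("*"))
    is_flat = not any(p.is_dir() for p in entries)
    extensions = sorted({p.suffix.lower() for p in entries if p.is_file()})
    return is_flat, extensions


def fits_flat_collection(path):
    """True if the directory is flat and holds at most one file extension."""
    is_flat, extensions = describe_output_dir(path)
    return is_flat and len(extensions) <= 1
```

A flat directory of .faa files passes, while a tree with subdirectories or with .txt and .csv files side by side does not and would need either a custom datatype or a more elaborate collection layout.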
OrthoFinder is in IUC and up to date now.
Live from Gitter:
It seems that this tool is needed by different groups.
I propose to put Victor (@Mataivic), a student, on this subject.
But @nsoranzo suggested integrating it into "a small hackathon coming up next week at my institute".
So we can take over afterwards, or at least test the output.
Also involved in the thread: @abretaud @pvanheus