R Package - Workflow Designer for Non-target Screening and Advanced Data Analysis

Lifecycle: experimental

The StreamFind R package is developed within the project “Flexible data analysis and workflow designer to identify chemicals in the water cycle”, funded by the German Federal Ministry of Education and Research (BMBF). Development is carried out by the Institut für Umwelt & Energie, Technik & Analytik e. V. (IUTA), the Forschungszentrum Informatik (FZI) and supporting partners. StreamFind is intended as a platform for assembling processing workflows for different types of data (e.g. mass spectrometry and spectroscopy), with applications in different fields (e.g. environmental and quality studies of the water cycle, and quality control of pharmaceuticals). It aims to stimulate the use of advanced data analysis (e.g. non-target screening and statistical analysis) in routine studies, to promote standardisation of data structure and processing, and to facilitate retrospective data evaluation. The platform is aimed at scientists and also at technicians, supported by its comprehensive documentation, its well-categorised set of integrated modular functions and its embedded graphical user interface.

StreamFind development is ongoing. Please contact us with questions or for collaboration.

Installation

The prerequisites for StreamFind are R and, for Windows users only, RTools. RTools is needed to compile the C++ code used in the StreamFind R package. StreamFind also uses Python scripts for some of its functionality, so it is recommended to have the latest Python installed and, on Windows, added to the environment variables. Assuming that R, RTools and Python are installed, the StreamFind R package can be installed from the GitHub repository via BiocManager.

if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("odea-project/StreamFind")
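Before running the installation, it can help to confirm that the prerequisites named above are reachable from the command line. The following is a minimal sketch for POSIX shells (Linux, macOS, or a Unix-like shell on Windows). The helper name `check_tool` is hypothetical, and the executable name `python3` is an assumption; on some systems the interpreter is called `python` instead.

```shell
#!/bin/sh
# Report whether a command is available on PATH.
# Prints "found: <name>" or "missing: <name>" and returns 0 only when found.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Rscript ships with R. The name python3 is an assumption and may need to be
# changed to python depending on the system.
for tool in Rscript python3; do
  check_tool "$tool" || echo "  install $tool or add it to PATH before installing StreamFind"
done
```

The script only reports what it finds and does not install anything. RTools is not checked here because it applies to Windows only.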

Other dependencies

StreamFind depends on other open source software to process different types of analytical data. For instance, for non-target screening using mass spectrometry, StreamFind uses the patRoon R package and its dependencies. Installation instructions for patRoon and its dependencies can be found in the patRoon documentation. When a required dependency is not installed, StreamFind displays a warning message with installation instructions at the point where the dependency is needed. Consult the documentation for the dependencies of other data types.

Supplementary data

The supplementary StreamFindData R package holds the data used in the examples and other documentation assets of StreamFind. It can also be installed from its GitHub repository.

if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("odea-project/StreamFindData")

Docker Setup

StreamFind can also be used via Docker. The Docker image provides a pre-configured environment with all the necessary dependencies installed. The image can be built and a container started with the following commands, run from the directory containing the Dockerfile.

Build the Docker image:
docker build -t my-r-app .

Start the Docker container:
docker run -it -p 3838:3838 -p 8787:8787 -v "$(pwd)":/app my-r-app

Once the container is up, you’ll be prompted to select the service you want to run:

  • Option 1: Starts the Shiny application, accessible at http://localhost:3838.
  • Option 2: Starts the RStudio Server, accessible at http://localhost:8787.
    • The default username is rstudio and the default password is rstudio.
  • Option 3: Starts both the Shiny App and RStudio Server
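To check from the host that the chosen service is responding, the options above can be mapped to their URLs and probed. This is a minimal sketch, not part of StreamFind: the function names `service_urls` and `probe` are hypothetical, and it assumes `curl` is installed on the host.

```shell
#!/bin/sh
# Print the URL(s) exposed by a given menu option (1, 2 or 3).
service_urls() {
  case "$1" in
    1) echo "http://localhost:3838" ;;
    2) echo "http://localhost:8787" ;;
    3) echo "http://localhost:3838"
       echo "http://localhost:8787" ;;
    *) echo "unknown option: $1" >&2
       return 1 ;;
  esac
}

# Probe each URL for the given option and report whether it answers.
# Assumes curl is available; HTTP errors (status >= 400) count as not reachable.
probe() {
  for url in $(service_urls "$1"); do
    if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
      echo "up: $url"
    else
      echo "not reachable: $url"
    fi
  done
}
```

For example, after selecting Option 3 in the container, running `probe 3` in a second terminal reports the state of both services. The services may need some time to start before they respond.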

Documentation

The function documentation of the StreamFind R package can be found on the reference page of the package webpage, and usage examples can be found in its articles.

References

StreamFind is open source because it is publicly funded and because it builds extensively on the scientific literature and on existing open source software. Below, we reference the research and software used within StreamFind. Please note that each piece of open source software or research that StreamFind uses relies in turn on other contributions. We therefore recommend consulting each citation for those further contributions.

Benton, H. Paul, Elizabeth J. Want, and Timothy M. D. Ebbels. 2010. “Correction of Mass Calibration Gaps in Liquid Chromatography-Mass Spectrometry Metabolomics Data.” Bioinformatics 26: 2488.

Chambers, M. C., B. Maclean, R. Burke, D. Amodei, D. L. Ruderman, S. Neumann, L. Gatto, et al. 2012. “A Cross-Platform Toolkit for Mass Spectrometry and Proteomics.” Nature Biotechnology 30 (10): 918–20. https://doi.org/10.1038/nbt.2377.

Gatto, Laurent, Sebastian Gibb, and Johannes Rainer. 2020. “MSnbase, Efficient and Elegant R-Based Processing and Visualisation of Raw Mass Spectrometry Data.” bioRxiv.

Gatto, Laurent, and Kathryn Lilley. 2012. “MSnbase - an R/Bioconductor Package for Isobaric Tagged Mass Spectrometry Data Visualization, Processing and Quantitation.” Bioinformatics 28: 288–89.

Helmus, Rick, Thomas L. ter Laak, Annemarie P. van Wezel, Pim de Voogt, and Emma L. Schymanski. 2021. “patRoon: Open Source Software Platform for Environmental Mass Spectrometry Based Non-Target Screening.” Journal of Cheminformatics 13 (1). https://doi.org/10.1186/s13321-020-00477-w.

Helmus, Rick, Bas van de Velde, Andrea M. Brunner, Thomas L. ter Laak, Annemarie P. van Wezel, and Emma L. Schymanski. 2022. “patRoon 2.0: Improved Non-Target Analysis Workflows Including Automated Transformation Product Screening.” Journal of Open Source Software 7 (71): 4029. https://doi.org/10.21105/joss.04029.

Ji, Hongchao, Fanjuan Zeng, Yamei Xu, Hongmei Lu, and Zhimin Zhang. 2017. “KPIC2: An Effective Framework for Mass Spectrometry-Based Metabolomics Using Pure Ion Chromatograms.” Analytical Chemistry 89 (14): 7631–40. https://doi.org/10.1021/acs.analchem.7b01547.

Kapoulkine, Arseny. 2022. “Pugixml 1.13: Light-Weight, Simple and Fast XML Parser for C++ with XPath Support.” Copyright (C) 2006-2018. http://pugixml.org.

Keller, Andrew, Jimmy Eng, Ning Zhang, Xiao-jun Li, and Ruedi Aebersold. 2005. “A Uniform Proteomics MS/MS Analysis Platform Utilizing Open XML File Formats.” Mol Syst Biol.

Kessner, Darren, Matt Chambers, Robert Burke, David Agus, and Parag Mallick. 2008. “ProteoWizard: Open Source Software for Rapid Proteomics Tools Development.” Bioinformatics 24 (21): 2534–36. https://doi.org/10.1093/bioinformatics/btn323.

Kucheryavskiy, Sergey. 2020. “Mdatools – R Package for Chemometrics.” Chemometrics and Intelligent Laboratory Systems 198: 103937. https://doi.org/10.1016/j.chemolab.2020.103937.

Kuhl, C., R. Tautenhahn, C. Boettcher, T. R. Larson, and S. Neumann. 2012. “CAMERA: An Integrated Strategy for Compound Spectra Extraction and Annotation of Liquid Chromatography/Mass Spectrometry Data Sets.” Analytical Chemistry 84: 283–89. http://pubs.acs.org/doi/abs/10.1021/ac202450g.

Martens, Lennart, Matthew Chambers, Marc Sturm, Darren Kessner, Fredrik Levander, Jim Shofstahl, Wilfred H Tang, et al. 2010. “MzML - a Community Standard for Mass Spectrometry Data.” Mol Cell Proteomics. https://doi.org/10.1074/mcp.R110.000133.

Meringer, Markus, Stefan Reinker, Juan Zhang, and Alban Muller. 2011. “MS/MS Data Improves Automated Determination of Molecular Formulas by Mass Spectrometry.” MATCH Commun. Math. Comput. Chem 65 (2): 259–90.

Pedrioli, Patrick G A, Jimmy K Eng, Robert Hubley, Mathijs Vogelzang, Eric W Deutsch, Brian Raught, Brian Pratt, et al. 2004. “A Common Open Representation of Mass Spectrometry Data and Its Application to Proteomics Research.” Nat Biotechnol 22 (11): 1459–66. https://doi.org/10.1038/nbt1031.

Reuschenbach, Max, Lotta L. Hohrenk-Danzouma, Torsten C. Schmidt, and Gerrit Renner. 2022. “Development of a Scoring Parameter to Characterize Data Quality of Centroids in High-Resolution Mass Spectra.” Analytical and Bioanalytical Chemistry 414 (July): 6635–45. https://doi.org/10.1007/s00216-022-04224-y.

Röst, Hannes L., Timo Sachsenberg, Stephan Aiche, Chris Bielow, Hendrik Weisser, Fabian Aicheler, Sandro Andreotti, et al. 2016. “OpenMS: A Flexible Open-Source Software Platform for Mass Spectrometry Data Analysis.” Nature Methods 13 (9): 741–48. https://doi.org/10.1038/nmeth.3959.

Ruttkies, Christoph, Steffen Neumann, and Stefan Posch. 2019. “Improving MetFrag with Statistical Learning of Fragment Annotations.” BMC Bioinformatics 20 (1): 1–14.

Ruttkies, Christoph, Emma L Schymanski, Nadine Strehmel, Juliane Hollender, Steffen Neumann, Antony J Williams, and Martin Krauss. 2019. “Supporting Non-Target Identification by Adding Hydrogen Deuterium Exchange MS/MS Capabilities to MetFrag.” Analytical and Bioanalytical Chemistry 411: 4683–4700.

Ruttkies, Christoph, Emma L Schymanski, Sebastian Wolf, Juliane Hollender, and Steffen Neumann. 2016. “MetFrag Relaunched: Incorporating Strategies Beyond in Silico Fragmentation.” Journal of Cheminformatics 8 (1): 1–16.

Sheehy, Guillaume, Fabien Picot, Frédérick Dallaire, Katherine Ember, Tien Nguyen, Kevin Petrecca, Dominique Trudel, and Frédéric Leblond. 2023. “Open-sourced Raman spectroscopy data processing package implementing a baseline removal algorithm validated from multiple datasets acquired in human tissue and biofluids.” Journal of Biomedical Optics 28 (2): 025002. https://doi.org/10.1117/1.JBO.28.2.025002.

Smith, C. A., E. J. Want, G. O’Maille, R. Abagyan, and G. Siuzdak. 2006. “XCMS: Processing Mass Spectrometry Data for Metabolite Profiling Using Nonlinear Peak Alignment, Matching and Identification.” Analytical Chemistry 78: 779–87.

Tautenhahn, Ralf, Christoph Boettcher, and Steffen Neumann. 2008. “Highly Sensitive Feature Detection for High Resolution LC/MS.” BMC Bioinformatics 9: 504.

Tisler, Selina, David I. Pattison, and Jan H. Christensen. 2021. “Correction of Matrix Effects for Reliable Non-Target Screening LC–ESI–MS Analysis of Wastewater.” Analytical Chemistry 93 (24): 8432–41. https://doi.org/10.1021/acs.analchem.1c00357.

Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S. Fourth. New York: Springer. https://www.stats.ox.ac.uk/pub/MASS4/.

Windig, Willem, Neal B. Gallagher, Jeremy M. Shaver, and Barry M. Wise. 2005. “A New Approach for Interactive Self-Modeling Mixture Analysis.” Chemometrics and Intelligent Laboratory Systems 77 (1): 85–96. https://doi.org/10.1016/j.chemolab.2004.06.009.

Wolf, Sebastian, Stephan Schmidt, Matthias Müller-Hannemann, and Steffen Neumann. 2010. “In Silico Fragmentation for Computer Assisted Identification of Metabolite Mass Spectra.” BMC Bioinformatics 11: 1–12.

Zhang, Zhi-Min, Shan Chen, and Yi-Zeng Liang. 2010. “Baseline Correction Using Adaptive Iteratively Reweighted Penalized Least Squares.” Analyst 135: 1138–46. https://doi.org/10.1039/B922045C.
