This is the supplementary code for the paper "Meta analysis on gene popularity; less than half gene citations stem from gene regulatory networks" by Ionut Sebastian Mihai , Debojyoti Das, Gabija Maršalka and Johan Henriksson (corresponding).
Laboratory of Molecular Infection Medicine Sweden (MIMS), Umeå Centre for Microbial Research, Department of Molecular Biology, Umeå University, Umeå, Sweden
The features are produced in feature_summarize.R. The smaller input files are already in the input/-folder. Bigger files need to be downloaded. There are files in this directory describing this process. The generated features are otherwise provided with the paper supplementary. Some of the input files need to be processed with the Java code provided
The linear model is provided in model_linear.R - it also has some rendering for the nonlinear models.
The nonlinear model fitting is done by the Jupyter notebooks: model_neuralnetwork.ipynb and model_xgboost.ipynb
Many of the figures in the paper are produced by small_plots.R.
In the website directory, simply run "make" (assumes you have copied required files into its data folder; this likely needs our guidance). All the dependencies are in requirements.txt
- The majority of the analysis was done in R using many fairly standard packages. SQLdf was used plenty
- Machine learning was done in Python using PyTorch and XGBoost
- There is some Greta code here for piece-wise linear model fitting not mentioned in the paper
- Some steps were done in Java for performance
- Webserver uses Dash
- Figures were made mainly in Inkscape
- Paper written using google docs and the paperpile reference manager
This project is licensed under the MIT License.