Sorghum bicolor bicolor Phenophase Bayesian Belief Network in R & Python

This project uses:

Rocker Group's Tidyverse R 4.0 Ubuntu 18 LTS Docker container image
data from the TERRA-REF project, accessed through the traits R package
JAGS for Gibbs-sampled MCMC modeling
causalnex to implement the NO TEARS directed acyclic graph structure learning algorithm as described here
- causalnex has dependencies: pandas, scikit-learn, and igraph

The goal is to develop a causal Bayesian network, also known as a Bayesian Belief Network, that predicts growth rate as a phenotype from the Sorghum bicolor biomass accumulation panel.

This analysis produces a causal inference Bayesian Belief Network in the style of Judea Pearl's work, where the nodes (vertices) of the network represent variables and the edges (arcs) represent dependencies quantified by conditional probabilities.

Methods

Docker Setup

To run any part of this analysis, it is recommended that you have Docker installed on the host machine. Alternatively, use singularity-ce to run the containers on high-performance computing clusters.

Running the Analyses with Docker

All R scripts detailed below, including the growth rate modeling, can be run with the container image cyversevice/rstudio-bayes-cpu:4.0-ubuntu-jags
All Python code runs from the command line in the Docker container image used in the example below, and is written so that this repository is mounted as a volume in the container at /work/phenophasebbn/
- Ex. docker run --rm -it -v /local/path/to/phenophasebbn/:/work/phenophasebbn/ rbartelme/pytorch-causalnex:0.10.0 python /work/phenophasebbn/bbn/bbn_structure.py (See note below)
- The current Dockerfile for this image is contained in this repository at /causal_nex/Dockerfile
A JupyterLab Docker container image has been created to facilitate exploration of the Python codebase
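
For scripted runs, the docker invocation above can be assembled programmatically. The following is a minimal sketch that only builds the argument list; the helper name build_docker_command and its default script path are illustrative assumptions, not part of this repository.

```python
import shlex
from pathlib import PurePosixPath

# Image name and in-container mount point are taken from the example above.
IMAGE = "rbartelme/pytorch-causalnex:0.10.0"
MOUNT_POINT = PurePosixPath("/work/phenophasebbn")


def build_docker_command(local_repo: str, script: str = "bbn/bbn_structure.py") -> list[str]:
    """Return the argv list for running one repository script in the container.

    local_repo is the host path to the cloned repository (supplied by the
    caller); script is a path relative to the repository root.
    """
    host_path = local_repo.rstrip("/") + "/"
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{host_path}:{MOUNT_POINT}/",
        IMAGE,
        "python", str(MOUNT_POINT / script),
    ]


if __name__ == "__main__":
    cmd = build_docker_command("/local/path/to/phenophasebbn/")
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting problems when the local path contains spaces.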

Initial Graph Embedding

In order to speed up the directed acyclic graph generation for the Bayesian Belief Network, an initial graph was instantiated using lists of tuples that reference the edge/node connections and directions outlined in the conceptual diagram above.

NOTE: Learning the graph structure with the NO TEARS implementation in causalnex, without any expert-knowledge graph encodings and without GPU acceleration, is computationally intensive and may not solve the graph structure when the Sorghum gene data are included in these analyses.
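
To illustrate the shape of such an expert-knowledge encoding, the sketch below stores directed edges as (parent, child) tuples and checks that they form a directed acyclic graph. It uses only the Python standard library rather than causalnex, and the node names are hypothetical placeholders, not the nodes of the conceptual diagram.

```python
from collections import defaultdict, deque

# Hypothetical expert-knowledge edges as (parent, child) tuples.
# These node names are placeholders, not the nodes used in this repository.
EXPERT_EDGES = [
    ("season", "temperature"),
    ("temperature", "growth_rate"),
    ("precipitation", "growth_rate"),
    ("genotype", "growth_rate"),
]


def is_acyclic(edges):
    """Return True if the directed edge list contains no cycle (Kahn's algorithm)."""
    children = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        in_degree[child] += 1
        nodes.update((parent, child))

    # Start from nodes with no incoming edges and peel the graph layer by layer.
    queue = deque(n for n in nodes if in_degree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            in_degree[child] -= 1
            if in_degree[child] == 0:
                queue.append(child)

    # Every node is visited only when the graph has no cycle.
    return visited == len(nodes)
```

Checking acyclicity before handing the edges to a structure model catches encoding mistakes early, since a Bayesian network cannot be fitted on a graph that contains a cycle.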

Network Workflow Description

How the contents of this repository were used to generate the analysis.

1. Processing raw data:

Weather & phenotype data processing:
- Code: /bnprocess_functional.R
- Exports (TSV):
  - /season4_combined.txt
  - /season6_combined.txt
  - /ksu_combined.txt (No longer used in final analysis)
Genomic Data:
- Code to process the SNP frequency by Sorghum bicolor gene table from this repository can be found in /genomic_preprocessing/snp_normalization.R
- Exports (TSV):
  - /genomic_preprocessing/genewise_snp_relative_abundance.txt where the relative abundance of single nucleotide polymorphisms is calculated relative to the Sorghum bicolor biomass accumulation panel population
Development work:
- Notes and pseudocode are in /sandbox/ and /bnprocess_mac.R
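
As an illustration of the gene-wise SNP relative abundance mentioned above, here is a minimal sketch of one plausible normalization: each cultivar's SNP count for a gene divided by that gene's total across the panel. This is an assumption made for illustration; the actual calculation in snp_normalization.R may differ, and the gene and cultivar identifiers are hypothetical.

```python
def relative_abundance(snp_counts):
    """Scale each gene's per-cultivar SNP counts by that gene's panel-wide total.

    snp_counts maps gene -> {cultivar: SNP count}. Genes with no SNPs anywhere
    in the panel get 0.0 for every cultivar to avoid dividing by zero.
    """
    result = {}
    for gene, by_cultivar in snp_counts.items():
        total = sum(by_cultivar.values())
        result[gene] = {
            cultivar: (count / total if total else 0.0)
            for cultivar, count in by_cultivar.items()
        }
    return result


# Hypothetical gene and cultivar identifiers, for illustration only.
counts = {
    "gene_A": {"cultivar_1": 3, "cultivar_2": 1},
    "gene_B": {"cultivar_1": 0, "cultivar_2": 0},
}
abundance = relative_abundance(counts)
```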

2. Model Growth Rate by Sorghum bicolor Cultivar using JAGS in R:

/jags/ contains the development code for the growth rate modeling listed below; these scripts and files are used in the BBN structure learning model
Full logistic growth rate modeling by Jessica Guo
Summary plots of the logistic growth models can be found in /data_figs/
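
For readers unfamiliar with logistic growth curves, the sketch below evaluates a three-parameter logistic function and the maximum growth rate it implies. The parameterization (asymptote, rate, midpoint) is an assumption chosen for illustration and is not necessarily the one used in the JAGS models; the Bayesian fitting itself is not shown.

```python
import math


def logistic_size(t, asymptote, rate, midpoint):
    """Three-parameter logistic curve: asymptote / (1 + exp(-rate * (t - midpoint)))."""
    return asymptote / (1.0 + math.exp(-rate * (t - midpoint)))


def max_growth_rate(asymptote, rate):
    """Slope of the logistic curve at its midpoint, where growth is fastest."""
    return asymptote * rate / 4.0
```

The second function follows from the logistic derivative, rate * y * (1 - y / asymptote), evaluated at y = asymptote / 2.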

3. Prepare dataset for structure learning in R & Python:

Join genomic, environmental, and phenotypic data
- This is done with the Rscript /bbn/join_datasets.R
Exports:
- /bbn/rgr_snp_joined.csv
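
The join itself is done in R, but the idea can be sketched in Python with the standard library. The example below inner-joins two small tab-separated tables on a shared key; the column names, the sample values, and the choice of an inner join are assumptions for illustration and may not match /bbn/join_datasets.R.

```python
import csv
import io


def inner_join(left_rows, right_rows, key):
    """Inner-join two lists of dict rows on a shared key column."""
    right_index = {}
    for row in right_rows:
        right_index.setdefault(row[key], []).append(row)
    joined = []
    for left in left_rows:
        # Rows whose key is missing from the right table are dropped.
        for right in right_index.get(left[key], []):
            joined.append({**left, **right})
    return joined


# Hypothetical miniature tables; column names are placeholders.
growth_tsv = "genotype\tgrowth_rate\nPI_001\t0.12\nPI_002\t0.09\n"
snp_tsv = "genotype\tgene_A\nPI_001\t0.75\nPI_003\t0.10\n"

growth = list(csv.DictReader(io.StringIO(growth_tsv), delimiter="\t"))
snps = list(csv.DictReader(io.StringIO(snp_tsv), delimiter="\t"))
joined = inner_join(growth, snps, key="genotype")
```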

4. BBN Structure Learning in Python with NO TEARS algorithm:

Ingest the joined data /bbn/rgr_snp_joined.csv and learn structure with:
- /bbn/bbn_structure.py
Process categorical data with LabelEncoder from scikit-learn
Encode expert knowledge into graph structure via a list of tuples in the first invocation of StructureModel()
- PNG exported as /bbn/init_graph.png (as of 10-25-2021, writing the PNG takes a long time and is commented out of the code; a pickle of this graph is available at /bbn/expert_sm.pickle and can be unpickled for the CPD fitting in step 5)
Optional: learn the graph structure with NO TEARS using the from_pandas function from causalnex, blacklisting spurious node and edge connections with a second list of tuples
Exports:
- categorical label encodings for genotype (or cultivar) /bbn/genotype_map.json & /bbn/season_map.json
- The NO TEARS solver currently does not finish solving the graph structure, so only the expert-knowledge-encoded graph is available
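
The categorical label encoding and its JSON export can be sketched as follows. This standard-library version assigns integer codes in sorted label order, which is the same ordering scikit-learn's LabelEncoder uses; the genotype names and the layout of the JSON map are assumptions for illustration, not the contents of /bbn/genotype_map.json.

```python
import json


def encode_labels(values):
    """Map each distinct label to an integer code, in sorted label order."""
    mapping = {label: code for code, label in enumerate(sorted(set(values)))}
    codes = [mapping[v] for v in values]
    return codes, mapping


# Hypothetical genotype labels, for illustration only.
genotypes = ["PI_003", "PI_001", "PI_003", "PI_002"]
codes, genotype_map = encode_labels(genotypes)

# Serialize the label-to-code map so the encoding can be reversed later.
genotype_json = json.dumps(genotype_map, indent=2)
```

Keeping the map alongside the encoded data lets node states in the fitted network be translated back to cultivar names.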

5. Discretized Data Mapping & Conditional Probability Distribution Fitting:

Import the Bayesian network structure from the structure model pickle
Instantiate the Bayesian network with the BayesianNetwork() constructor from causalnex
Map continuous variables into categories
A detailed checklist of these steps can be found in this GitHub issue
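
The two ideas in this step, binning a continuous variable and estimating a conditional probability distribution from the binned data, can be sketched without causalnex. The example below uses fixed cut points and plain counting (a maximum likelihood estimate); the variable names, cut points, and observations are hypothetical, and this is not the repository's fitting code.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def discretize(value, cut_points, labels):
    """Return the label of the bin that value falls into.

    cut_points must be sorted ascending and labels needs one more entry than
    cut_points. A value equal to a cut point goes into the upper bin.
    """
    return labels[bisect_right(cut_points, value)]


def fit_cpd(rows, child, parent):
    """Estimate P(child | parent) from rows of categorical data by counting."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[parent]][row[child]] += 1
    return {
        parent_state: {
            state: n / sum(child_counts.values())
            for state, n in child_counts.items()
        }
        for parent_state, child_counts in counts.items()
    }


# Hypothetical observations: (temperature in degrees C, growth-rate class).
raw = [(18.0, "slow"), (21.0, "slow"), (29.0, "fast"), (31.0, "fast"), (33.0, "slow")]
rows = [
    {"temperature": discretize(t, [20.0, 30.0], ["cool", "mild", "hot"]), "growth": g}
    for t, g in raw
]
cpd = fit_cpd(rows, child="growth", parent="temperature")
```

With few observations per parent state, counting alone gives extreme probabilities such as 1.0, which is one reason Bayesian estimators with a prior are often preferred for fitting.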
