lindenb edited this page Nov 6, 2014 · 5 revisions

##Motivation

Group VCF data by gene/transcript. By default, it tries to use the annotation data from VEP and SnpEff.

##Compilation

See also Compilation.

$  ant groupbygene

##Options

| Option | Description |
|---|---|
| -X | XML output. |
| -T (tag) | Also use this tag in the INFO column to get the name of the gene. Optional. Can be used multiple times. |
| -h | Get help (this screen). |
| -v | Print version and exit. |
| -L (level) | Log level. One of java.util.logging.Level. |
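
The -T option reads a gene name from a user-chosen tag in the INFO column. As a rough illustration only (this is not the tool's actual parser), such a lookup could resemble the Python sketch below. The function name `info_tag_value` and the `GENE` tag are hypothetical, and how GroupByGene handles multi-valued tags is not described on this page, so the sketch simply returns the raw value.

```python
def info_tag_value(info_column, tag):
    """Return the raw value of `tag` in a VCF INFO column, or None if absent.

    Illustrative helper only; it is not part of jvarkit.
    """
    for field in info_column.split(";"):
        key, sep, value = field.partition("=")
        # Flag-style entries (no '=') carry no value, so they are skipped.
        if key == tag and sep:
            return value
    return None


# Hypothetical INFO column; GENE is an invented tag name for this example.
info = "DP=35;GENE=ASAH2;AF=0.5"
print(info_tag_value(info, "GENE"))
```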

##History

  • 13 Dec 2013 : Fixed major bug in prediction parser

##Example

###Delimited output

$ curl -s -k "https://raw.github.com/arq5x/gemini/master/test/test4.vep.snpeff.vcf" |\
java -jar dist/groupbygene.jar |\
head | column  -t

#chrom  min.POS    max.POS    gene.name  gene.type         samples.affected  count.variations  M10475  M10478  M10500  M128215
chr10   52004315   52004315   ASAH2      snpeff-gene-name  2                 1                 0       0       1       1
chr10   52004315   52004315   ASAH2      vep-gene-name     2                 1                 0       0       1       1
chr10   52497529   52497529   ASAH2B     snpeff-gene-name  2                 1                 0       1       1       0
chr10   52497529   52497529   ASAH2B     vep-gene-name     2                 1                 0       1       1       0
chr10   48003992   48003992   ASAH2C     snpeff-gene-name  3                 1                 1       1       1       0
chr10   48003992   48003992   ASAH2C     vep-gene-name     3                 1                 1       1       1       0
chr10   126678092  126678092  CTBP2      snpeff-gene-name  1                 1                 0       0       0       1
chr10   126678092  126678092  CTBP2      vep-gene-name     1                 1                 0       0       0       1
chr10   135336656  135369532  CYP2E1     snpeff-gene-name  3                 2                 0       2       1       1
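
The delimited output can be post-processed with a few lines of Python. The sketch below assumes the raw output is tab-separated (the listing above is aligned by `column -t`) and that the first line is the `#chrom` header. The function name `read_groupbygene` is hypothetical, and the sample rows are copied from the listing above.

```python
import io


def read_groupbygene(handle):
    """Yield one dict per gene row, keyed by the header column names.

    Assumes tab-separated input whose first line is the '#chrom ...' header.
    All values are kept as strings.
    """
    lines = (line.rstrip("\n") for line in handle)
    header = next(lines).lstrip("#").split("\t")
    for line in lines:
        if line:
            yield dict(zip(header, line.split("\t")))


# Two rows taken from the example output, rebuilt with tab separators.
SAMPLE = "\n".join("\t".join(row) for row in [
    ["#chrom", "min.POS", "max.POS", "gene.name", "gene.type",
     "samples.affected", "count.variations",
     "M10475", "M10478", "M10500", "M128215"],
    ["chr10", "52004315", "52004315", "ASAH2", "snpeff-gene-name",
     "2", "1", "0", "0", "1", "1"],
    ["chr10", "48003992", "48003992", "ASAH2C", "snpeff-gene-name",
     "3", "1", "1", "1", "1", "0"],
]) + "\n"

rows = list(read_groupbygene(io.StringIO(SAMPLE)))
# Keep genes for which at least three samples are affected.
hits = [r["gene.name"] for r in rows if int(r["samples.affected"]) >= 3]
print(hits)
```

In a real pipeline the same function could be given `sys.stdin` in place of the `io.StringIO` wrapper.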

###XML output

$ curl -s -k "https://raw.github.com/arq5x/gemini/master/test/test4.vep.snpeff.vcf" |\
java -jar dist/groupbygene.jar -X |\
xmllint --format -
<?xml version="1.0" encoding="UTF-8"?>
<genes>
  <!-- Command line: -X-->
  <!--Version 2c13f6f369faf3d076ccc9420b5284cd990c6892-->
  <samples count="4">
    <sample>M10475</sample>
    <sample>M10478</sample>
    <sample>M10500</sample>
    <sample>M128215</sample>
  </samples>
  <gene name="ASAH2" type="snpeff-gene-name" chrom="chr10" min.POS="52004315" max.POS="52004315" affected="2" variations="1">
    <sample name="M10500" count="1">
      <genotype pos="52004315" ref="T" A1="C" A2="C"/>
    </sample>
    <sample name="M128215" count="1">
      <genotype pos="52004315" ref="T" A1="C" A2="C"/>
    </sample>
  </gene>
  <gene name="ASAH2" type="vep-gene-name" chrom="chr10" min.POS="52004315" max.POS="52004315" affected="2" variations="1">
    <sample name="M10500" count="1">
(...)
    <sample name="M10475" count="1">
      <genotype pos="72057435" ref="C" A1="C" A2="T"/>
    </sample>
  </gene>
  <gene name="ENST00000572003" type="vep-ensembl-transcript-name" chrom="chr16" min.POS="72057435" max.POS="72057435" affected="1" variations="1">
    <sample name="M10475" count="1">
      <genotype pos="72057435" ref="C" A1="C" A2="T"/>
    </sample>
  </gene>
  <gene name="ENST00000572887" type="vep-ensembl-transcript-name" chrom="chr16" min.POS="72057435" max.POS="72057435" affected="1" variations="1">
    <sample name="M10475" count="1">
      <genotype pos="72057435" ref="C" A1="C" A2="T"/>
    </sample>
  </gene>
  <gene name="ENST00000573843" type="vep-ensembl-transcript-name" chrom="chr16" min.POS="72057435" max.POS="72057435" affected="1" variations="1">
    <sample name="M10475" count="1">
      <genotype pos="72057435" ref="C" A1="C" A2="T"/>
    </sample>
  </gene>
  <gene name="ENST00000573922" type="vep-ensembl-transcript-name" chrom="chr16" min.POS="72057435" max.POS="72057435" affected="1" variations="1">
    <sample name="M10475" count="1">
      <genotype pos="72057435" ref="C" A1="C" A2="T"/>
    </sample>
  </gene>
  <gene name="ENST00000574309" type="vep-ensembl-transcript-name" chrom="chr16" min.POS="72057435" max.POS="72057435" affected="1" variations="1">
    <sample name="M10475" count="1">
      <genotype pos="72057435" ref="C" A1="C" A2="T"/>
    </sample>
  </gene>
</genes>
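
The XML output can be read with Python's standard `xml.etree.ElementTree` module. The sketch below is one possible way to list, for each `<gene>` element, the samples that carry a variation. The function name `affected_samples` is hypothetical, and the embedded document is a shortened copy of the output shown above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML output shown above (one gene only).
XML = """<genes>
  <samples count="4">
    <sample>M10475</sample>
    <sample>M10478</sample>
    <sample>M10500</sample>
    <sample>M128215</sample>
  </samples>
  <gene name="ASAH2" type="snpeff-gene-name" chrom="chr10" min.POS="52004315" max.POS="52004315" affected="2" variations="1">
    <sample name="M10500" count="1">
      <genotype pos="52004315" ref="T" A1="C" A2="C"/>
    </sample>
    <sample name="M128215" count="1">
      <genotype pos="52004315" ref="T" A1="C" A2="C"/>
    </sample>
  </gene>
</genes>"""


def affected_samples(xml_text):
    """Map (gene name, gene type) to the list of affected sample names."""
    root = ET.fromstring(xml_text)
    result = {}
    # Only direct <gene> children are visited, so the top-level
    # <samples> list is not mistaken for per-gene data.
    for gene in root.findall("gene"):
        key = (gene.get("name"), gene.get("type"))
        result[key] = [s.get("name") for s in gene.findall("sample")]
    return result


print(affected_samples(XML))
```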

Main code:

https://github.com/lindenb/jvarkit/blob/master/src/main/java/com/github/lindenb/jvarkit/tools/groupbygene/GroupByGene.java

##See also
