Hello,
Thank you for developing this tool.
It appears that genNullSeqs can generate negative sequences whose regions overlap the test set. I generated a test set BED file by taking the top 25,000 MACS2 peaks ranked by decreasing fold enrichment. For simplicity in this test, I kept only the first 200 bp of each peak:
sort -k 7,7nr "${in_file}" | head -25000 | awk '{print $1"\t"$2"\t"$2+200}' > test.bed
I then ran the following R code:
library(gkmSVM)
library(BSgenome.Hsapiens.UCSC.hg38.masked)
genNullSeqs("test.bed",genome=BSgenome.Hsapiens.UCSC.hg38.masked,
batchsize=100000,length_match_tol=0)
Strangely, there are hundreds of negative regions that overlap test regions:
intersectBed -u -a negSet.bed -b test.bed | wc -l
result: 428
To make sure these overlaps weren't edge cases caused by 0-based versus 1-based coordinates, I repeated the check requiring each overlap to cover at least 50% of the negative region (-f 0.5):
intersectBed -u -f 0.5 -a negSet.bed -b test.bed | wc -l
result: 223
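For anyone who wants to cross-check these counts without bedtools, here is a minimal sketch in plain awk. It is an illustration only, not part of gkmSVM or bedtools: count_overlaps is a hypothetical helper name, and it assumes both files are 3-column BED with 0-based, half-open coordinates, so regions that merely touch end to start are not counted. The minimum fraction is measured against the length of each region in the first file, which is how I understand -f to behave.

```shell
# count_overlaps A.bed B.bed [min_frac]
# Hypothetical helper: prints how many regions in A overlap at least one
# region in B by >= 1 bp and by >= min_frac of the length of the A region.
# Assumes BED coordinates (0-based start, exclusive end) and a non-empty B.
count_overlaps() {
  awk -v min_frac="${3:-0}" '
    # First file on the command line (B): store intervals per chromosome.
    NR == FNR {
      n[$1]++
      s[$1, n[$1]] = $2 + 0
      e[$1, n[$1]] = $3 + 0
      next
    }
    # Second file (A): stop at the first B interval with enough overlap.
    {
      a_s = $2 + 0; a_e = $3 + 0
      for (i = 1; i <= n[$1]; i++) {
        lo = (a_s > s[$1, i]) ? a_s : s[$1, i]
        hi = (a_e < e[$1, i]) ? a_e : e[$1, i]
        ov = hi - lo
        if (ov > 0 && ov >= min_frac * (a_e - a_s)) { hits++; break }
      }
    }
    END { print hits + 0 }
  ' "$2" "$1"
}

# Usage with the files from this report:
#   count_overlaps negSet.bed test.bed        # any overlap of >= 1 bp
#   count_overlaps negSet.bed test.bed 0.5    # >= 50% of the negative region
```

The inner loop scans every stored interval on the same chromosome, so it is slow compared with bedtools, but it should be enough to confirm whether the overlaps are real rather than coordinate artifacts.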
I see the same behavior with a smaller batchsize of 50,000.
Thanks for any help you can provide,
Kevin