Does anyone have a good handle on the best algorithms for mapping and SNP calling with very deep coverage of multiple (10 or so) haplotypes in a pooled sample? I'm aware of the MAQ approach and the BWA/samtools approach, but I was surprised by how little their calls overlap. I also suspect that both may be confounded by the extremely deep coverage from sequence capture (>200x at most sites). Some of the calls looked a little fishy when I pulled them up in IGV, and I'm wondering whether something better has come along. Is there a consensus on the right way to do this, or at least a right way? (This may already be answered in the forums somewhere, but I couldn't resist trying out the new Q&A format.)
No, maq/samtools are not designed for deep coverage. I usually recommend that people write their own caller based on the pileup output.
@lh3 can you provide a link or reference showing that samtools is not designed for deep coverage? What about VarScan?
(16 Jun '10, 23:38)
brentp
@nilshomer, good to know. Thanks. :)
(17 Jun '10, 10:13)
brentp
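A minimal sketch, in Python, of the kind of home-grown caller suggested above. It assumes the standard single-sample six-column text output of `samtools mpileup -f ref.fa in.bam` (the successor of the older `samtools pileup` command): chromosome, position, reference base, depth, read bases, base qualities. It tallies A/C/G/T per site and reports the most common non-reference base when it passes two cutoffs. The names `count_bases`, `call_site`, `call_pileup`, `min_alt` and `min_freq` are hypothetical, and the default cutoffs are placeholders, not recommended values.

```python
import re
from collections import Counter

# Indel marker in the read-base column: '+' or '-', then the indel length.
INDEL = re.compile(r"[+-](\d+)")


def count_bases(ref: str, bases: str) -> Counter:
    """Tally A/C/G/T observations in one mpileup read-base string."""
    ref = ref.upper()
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            # Read start: the next character is a mapping quality; skip both.
            i += 2
            continue
        if c in "+-":
            # Indel: skip the sign, the length digits and the indel bases.
            m = INDEL.match(bases, i)
            i = m.end() + int(m.group(1))
            continue
        if c in ".,":
            counts[ref] += 1            # matches reference (forward / reverse)
        elif c.upper() in "ACGT":
            counts[c.upper()] += 1      # mismatch, strands collapsed
        # '$' (read end), '*'/'#' (deleted base), '>'/'<' (ref skip), 'N' ignored
        i += 1
    return counts


def call_site(line: str, min_alt: int = 5, min_freq: float = 0.02):
    """Return (chrom, pos, ref, alt, alt_reads, depth, alt_freq) or None.

    min_alt and min_freq are hypothetical placeholder cutoffs.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, bases = fields[0], int(fields[1]), fields[2].upper(), fields[4]
    counts = count_bases(ref, bases)
    depth = sum(counts.values())
    alts = [(b, n) for b, n in counts.items() if b != ref]
    if depth == 0 or not alts:
        return None
    alt, n = max(alts, key=lambda item: item[1])
    freq = n / depth
    if n >= min_alt and freq >= min_freq:
        return chrom, pos, ref, alt, n, depth, round(freq, 3)
    return None


def call_pileup(lines):
    """Yield a call for every pileup line that passes the cutoffs."""
    for line in lines:
        hit = call_site(line)
        if hit is not None:
            yield hit
```

To run it on real data, pipe `samtools mpileup` output in and loop with `for hit in call_pileup(sys.stdin): print(*hit, sep="\t")`. It deliberately ignores base and mapping qualities and strand balance; at >200x in a pooled capture those filters probably matter more than the counting itself, so treat this only as a starting point.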
The CRISP algorithm by Vikas Bansal, which is designed for pooled samples, might be a good way to look at your data. It seems to be aimed at high coverage of the pool, working out to 20-50x coverage per haplotype: http://bioinformatics.oxfordjournals.org/cgi/content/full/26/12/i318?etoc "The average coverage of the two pools, based on the alignments, was ~2080x (42x per haplotype) and 2500x (50x per haplotype), respectively."
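The quoted figures imply about 50 haplotypes per pool (2080/42 ≈ 50 and 2500/50 = 50). Below is a rough Python sketch of the same arithmetic for the pool in the question, plus the simplest possible check of whether an allele carried by a single haplotype stands out from sequencing error. This is a plain single-pool binomial test, not CRISP's statistical model; the 1% error rate and the function names are illustrative assumptions.

```python
from math import exp, lgamma, log, log1p


def per_haplotype_coverage(pool_depth: float, n_haplotypes: int) -> float:
    """Average read depth contributed by each haplotype in the pool."""
    return pool_depth / n_haplotypes


def error_tail_prob(alt_reads: int, depth: int, error_rate: float) -> float:
    """P(X >= alt_reads) for X ~ Binomial(depth, error_rate), 0 < error_rate < 1.

    Simplifying assumption: each read independently shows this particular
    wrong base with probability error_rate. Real error rates vary with base
    quality, read position and strand. Log-space terms avoid overflow at
    depths in the thousands.
    """
    def log_pmf(k: int) -> float:
        return (lgamma(depth + 1) - lgamma(k + 1) - lgamma(depth - k + 1)
                + k * log(error_rate) + (depth - k) * log1p(-error_rate))

    return sum(exp(log_pmf(k)) for k in range(alt_reads, depth + 1))


# The pool in the question: ~200x spread over ~10 haplotypes.
depth, n_hap = 200, 10
print(per_haplotype_coverage(depth, n_hap))  # 20.0

# A variant carried by one haplotype in ten should appear in ~20 reads.
expected_alt = depth // n_hap
error_rate = 0.01  # hypothetical per-base rate for one specific wrong base
print(error_tail_prob(expected_alt, depth, error_rate))  # far below 1e-6
print(error_tail_prob(5, depth, error_rate))  # roughly 5%: weak evidence alone
```

Under these assumptions a single-haplotype allele at ~20x per haplotype is easy to separate from random error, while a handful of alternate reads is not. In a pool of n haplotypes, true allele frequencies should also sit near multiples of 1/n (here 0.1, 0.2, ...), so many confident calls falling well between those values can hint at contamination or strong capture/mapping bias.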
Wow, I wish I had noticed this sooner, but using IGV I now see that my samples are highly contaminated... so much for that data set...