|
Dear SEQanswers community, I am working on Illumina RNA-Seq data from human samples. While it is quite clear to me how to map the reads against the human reference while accounting for splice junctions with tools such as TopHat or MapSplice, I have an additional requirement that makes things more interesting. In my data, the sampled tissue is infected by one or more viruses that may induce the expression of viral genes. I would now like to identify these expressed viral transcripts together with the human transcripts. The idea is that viral transcripts can be detected by finding all reads that cannot be mapped to the human reference and re-mapping them against all known viral genomes (perhaps with BLAST or even Smith-Waterman, since viral genomes tend to be short and highly variable). However, this strikes me as inelegant, and it also requires post-processing of the main mapping to identify unaligned reads. Therefore my question is: are there any mapping tools geared towards RNA-Seq that allow multiple reference genomes and map each read to the most likely genome while also considering splice junctions? This would be a special case of RNA-Seq for metagenomics data, with one large mammalian reference genome and many very small reference genomes. Bonus points if the mapper can deal with a higher number of alignment errors (viral transcripts can be quite variable). Thanks a bunch for any leads. |
|
We deal a lot with viral infection systems, and our solution is to add the viral genomes of interest to the genome of our host system and then use that combined file to make the Bowtie index. We have found a surprisingly large number of viral transcripts, and we have reported that this number can vary with the type of host even when the virus is the same (for example, in different strains of mice). |
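
For anyone who wants to try the combined-reference approach, here is a minimal sketch of the concatenation step in Python. It is only an illustration, not the exact procedure used by the poster above: the function name `combine_fasta`, the file names, and the `virus_` prefix are all made up for this example. The prefix is there so that viral sequences are easy to tell apart from host chromosomes in the alignment output.

```python
import os
import tempfile


def combine_fasta(host_path, viral_paths, out_path, viral_prefix="virus_"):
    """Concatenate one host FASTA and several viral FASTA files.

    Viral sequence names get `viral_prefix` (a hypothetical convention)
    so they can be separated from host sequences after mapping.
    Returns the sequence names written, in order.
    """
    names = []
    sources = [(host_path, "")] + [(p, viral_prefix) for p in viral_paths]
    with open(out_path, "w") as out:
        for path, prefix in sources:
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip("\n")
                    if not line:
                        continue  # drop blank lines between records
                    if line.startswith(">"):
                        line = ">" + prefix + line[1:]
                        # the name is the first whitespace-delimited token
                        names.append(line[1:].split()[0])
                    out.write(line + "\n")
    return names


if __name__ == "__main__":
    # Tiny toy inputs, only to show the call; real genomes go here instead.
    workdir = tempfile.mkdtemp()
    host = os.path.join(workdir, "host.fa")
    virus = os.path.join(workdir, "virus.fa")
    combined = os.path.join(workdir, "combined.fa")
    with open(host, "w") as fh:
        fh.write(">chr1\nACGTACGT\n")
    with open(virus, "w") as fh:
        fh.write(">toy_virus\nGGCCGGCC\n")
    print(combine_fasta(host, [virus], combined))
```

The combined file can then be indexed in the usual way, for example with `bowtie-build combined.fa combined_index`, and the spliced mapper pointed at that index. Reads that land on sequences carrying the prefix are the viral candidates.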