|
I have fastq and bam files. How can I get basic read statistics. Number of raw reads Raw data reads(Mb) Number of mapped reads Mapped bases(Mb) Mean depth(coverage) |
|
Also check samtools flagstat. 1
Number of raw reads ---> grep -c "^>" fastq.filename Raw data read(MB) ---> Not yet... Number of mapped reads ---> samtools flagstat Mapped bases(Mb) ---> Not yet... Mean depth(coverage) ---> using GATK DepthOfCoverage Thank you.
(21 Jul '10, 09:24)
hongiiv
|
|
For the number of raw reads, using the following command
should do it. Getting the Mb count of your reads involves getting the length of each read and adding them together. This can be done in Perl fairly easily. It can also be included in Java as described below, by adding all the unmapped read lengths. The rest of the statistics are available using the samtools libraries. The samtools utilities probably have some of these functions, but I haven't spent much time with them. The Java implementation of samtools (Picard) is one of the nicest libraries I've ever used. For counting the number of mapped reads you want to loop over all reads in the bam file and count if getReadUnmappedFlag is false (i.e. the read mapped). To get a mapped bases count, add getReadLength for each read that mapped. For the mean depth you can take the mapped bases count and divide by the total number of nucleotide positions. All that should take a few hours for an experienced Java programmer, and a few days for a novice. It should run in less than an hour. |
|
I found GATK for Mean depth(coverage).
|