Evaluates the quality of sequencing data, especially RNA-seq data, using RSeQC.
For overall sequence quality statistics, it outputs:
- A CSV table of statistics from samtools flagstat and RSeQC (read_distribution.py).
- A LaTeX report with more detailed information from the selected modules. bam_stat.py (read mapping statistics) is included by default.
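The component's own code for building the statistics table is not shown on this page. As a rough sketch of where the samtools numbers come from, the Python function below parses the default plain-text output of samtools flagstat into a dictionary of (QC-passed, QC-failed) counts. The function name and the example text (made-up counts) are illustrative assumptions; newer samtools versions print extra lines, which this parser simply picks up as additional keys.

```python
import re

# One flagstat line: "<QC-passed> + <QC-failed> <description>".
LINE_RE = re.compile(r"^(\d+) \+ (\d+) (.+)$")
# Trailing percentage annotation, e.g. " (95.00% : N/A)".
PCT_RE = re.compile(r"\s*\([^()]*%[^()]*\)$")


def parse_flagstat(text):
    """Map each flagstat description to a (qc_passed, qc_failed) tuple."""
    stats = {}
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # skip blank or unrecognised lines
        passed, failed, desc = int(match.group(1)), int(match.group(2)), match.group(3)
        desc = PCT_RE.sub("", desc)  # "mapped (95.00% : N/A)" -> "mapped"
        if desc.startswith("in total"):
            desc = "in total"  # drop the "(QC-passed reads + QC-failed reads)" note
        stats[desc] = (passed, failed)
    return stats


# Made-up example output, for illustration only.
EXAMPLE = """\
1000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
950 + 0 mapped (95.00% : N/A)
10 + 0 with mate mapped to a different chr
3 + 0 with mate mapped to a different chr (mapQ>=5)
"""

if __name__ == "__main__":
    stats = parse_flagstat(EXAMPLE)
    print(stats["mapped"][0] / stats["in total"][0])  # 0.95
```

Only percentage annotations are stripped, so the two "with mate mapped to a different chr" lines stay distinct keys.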
Select the modules to run with the modules parameter by giving their numbers from the following list:
1. read_quality.py - Read quality based on Phred scores, plotted as a boxplot or heatmap
2. read_duplication.py - Duplicated reads: reads with exactly the same sequence or mapped to the same genomic location
3. read_GC.py - GC content of reads
4. geneBody_coverage2.py - Read coverage over gene body
5. inner_distance.py - Calculate the inner distance (or insert size) between two paired RNA reads
6. junction_annotation.py - Annotated and novel junctions
7. junction_saturation.py - Check whether the current sequencing depth is deep enough to perform alternative splicing analyses
8. infer_experiment.py - Check strand specificity
9. RNA_fragment_size.py - Calculate fragment size for each gene/transcript
10. tin.py - Evaluate RNA integrity at transcript level
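To make a few of these metrics concrete, the toy Python functions below compute a per-read GC fraction (module 3), a duplication histogram (module 2) and the inner distance between two mates (module 5). They are simplified illustrations written for this page, not RSeQC's implementation: RSeQC reads BAM records and can use the gene model when computing inner distance, whereas this sketch works on plain strings and genomic coordinates and ignores introns.

```python
from collections import Counter


def gc_fraction(seq):
    """Fraction of G and C bases in one read; every other symbol (A, T, N, ...) counts toward the length."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def duplication_histogram(keys):
    """Map occurrence count -> number of distinct keys seen that many times.

    Pass read sequences for sequence-based duplication, or
    (chrom, start, strand) tuples for mapping-based duplication.
    """
    per_key = Counter(keys)
    return dict(sorted(Counter(per_key.values()).items()))


def inner_distance(a_start, a_end, b_start, b_end):
    """Gap between two mates in 0-based half-open genomic coordinates.

    Negative when the mates overlap. Introns are ignored, unlike a
    transcript-aware calculation.
    """
    if a_start <= b_start:
        return b_start - a_end
    return a_start - b_end


if __name__ == "__main__":
    print(gc_fraction("GGCCAATT"))                          # 0.5
    print(duplication_histogram(["ACGT", "ACGT", "TTGA"]))  # {1: 1, 2: 1}
    print(duplication_histogram([("chr1", 100, "+")] * 3))  # {3: 1}
    print(inner_distance(100, 150, 200, 250))               # 50
    print(inner_distance(100, 150, 140, 190))               # -10
```

The same histogram function covers both duplication flavours listed for module 2; only the key changes.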
Version | 1.2 |
---|---|
Bundle | sequencing |
Categories | |
Authors | Alejandra Cervera (alejandra.cervera@helsinki.fi), Ping Chen (ping.chen@helsinki.fi) |
Source files | component.xml function.scala |

Inputs:

Name | Type | Mandatory | Description |
---|---|---|---|
alignment | BAM | Mandatory | The aligned RNA-seq reads in SAM or BAM format. |
reference | FASTA | Mandatory | Reference genome in FASTA format. The folder containing the reference must also contain its .fai index and .dict sequence dictionary. |
annotation | GTF | Mandatory | GTF file defining transcripts. The contig names ("1", "2", etc. or "chr1", "chr2", etc.) must match those in the BAM file. |
refgene | BED | Optional | Reference gene model in BED format. For faster results, a model containing only housekeeping genes can be provided. If not given, the supplied annotation file is converted to BED format. |
chromsize | TextFile | Optional | Chromosome size file: a tab- or space-separated text file with two columns, chromosome name in the first and chromosome size in the second. |
log | BinaryFolder | Optional | Statistics (log files) from the TopHat or STAR aligner to include in the report. |
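Both the contig-name requirement for the annotation and the chromsize file can be checked against the BAM header. Assuming the header has been dumped to text (for example with samtools view -H), the Python sketch below reads the @SQ lines (SN = sequence name, LN = length), writes a two-column chromosome size table in the format described for chromsize, and lists GTF contigs that do not appear in the BAM. The helper names and example inputs are hypothetical; the component itself may build these files differently.

```python
def sq_sizes(sam_header):
    """(name, length) for each @SQ line of a SAM header, in header order."""
    sizes = []
    for line in sam_header.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Fields after the record type are TAG:VALUE pairs.
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        sizes.append((tags["SN"], int(tags["LN"])))
    return sizes


def chromsize_text(sizes):
    """Two-column, tab-separated chromosome size file contents."""
    return "".join(f"{name}\t{length}\n" for name, length in sizes)


def gtf_contigs_missing_from_bam(gtf_text, sizes):
    """GTF sequence names (column 1) that do not occur in the BAM header."""
    known = {name for name, _ in sizes}
    gtf = {
        line.split("\t", 1)[0]
        for line in gtf_text.splitlines()
        if line and not line.startswith("#")  # skip blanks and comment lines
    }
    return sorted(gtf - known)


# Made-up inputs: a header as printed by `samtools view -H` and a GTF that
# uses Ensembl-style names ("1") instead of UCSC-style names ("chr1").
HEADER = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n@SQ\tSN:chr2\tLN:500\n"
GTF = '1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1";\nchr2\tsrc\texon\t5\t50\t.\t-\t.\tgene_id "g2";\n'

if __name__ == "__main__":
    sizes = sq_sizes(HEADER)
    print(chromsize_text(sizes), end="")
    print(gtf_contigs_missing_from_bam(GTF, sizes))  # ['1']
```

A non-empty result from the last function signals the "1" versus "chr1" naming mismatch that the annotation description warns about.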

Outputs:

Name | Type | Description |
---|---|---|
report | HTMLFile | Report with the figures produced by the selected modules. |
stats | CSV | Sequence quality statistics. |
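The sample parameter exists so that statistics tables from several samples can be combined. The column layout of the stats CSV is not documented here, so the Python sketch below assumes, hypothetically, one row per sample with an identifier column (called sample in the code) and one column per statistic; adjust it to the real layout. Rows are stacked, and any statistic missing for a sample is left blank.

```python
import csv
import io


def join_stats(tables, key="sample"):
    """Stack CSV tables (given as text) into one, taking the union of columns.

    Assumes (hypothetically) one row per sample with an identifier column
    named `key`; cells missing from a table are written as empty strings.
    """
    rows, columns = [], [key]
    for text in tables:
        for row in csv.DictReader(io.StringIO(text)):
            rows.append(row)
            for column in row:
                if column not in columns:
                    columns.append(column)  # keep first-seen column order
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    s1 = "sample,mapped\nS1,10\n"          # made-up values
    s2 = "sample,mapped,tin\nS2,20,0.8\n"
    print(join_stats([s1, s2]), end="")
    # sample,mapped,tin
    # S1,10,
    # S2,20,0.8
```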

Parameters:

Name | Type | Default | Description |
---|---|---|---|
memory | int | 35000 | Memory passed to the geneBodyCoverage module. |
modules | string | "1,2,3,4,5,6,7,8,9,10" | Comma-separated numbers of the modules to run, taken from the module list above. By default all ten modules are run. |
sample | string | "SampleID" | Identifier for the sample; useful when joining statistics tables from different samples |
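The modules string is a comma-separated list of numbers from the module list above. A minimal Python sketch of turning it into RSeQC script names follows; the function name, the rejection of unknown numbers and the removal of duplicates are assumptions made for illustration, not documented behaviour of this component. bam_stat.py is left out because it is included by default.

```python
# Module numbers as listed in this component's documentation.
MODULES = {
    1: "read_quality.py",
    2: "read_duplication.py",
    3: "read_GC.py",
    4: "geneBody_coverage2.py",
    5: "inner_distance.py",
    6: "junction_annotation.py",
    7: "junction_saturation.py",
    8: "infer_experiment.py",
    9: "RNA_fragment_size.py",
    10: "tin.py",
}


def select_modules(spec):
    """'1,3,10' -> script names in the order given, without duplicates.

    Raises ValueError for non-numeric tokens or numbers outside 1-10
    (an assumed policy; the real component may behave differently).
    """
    selected = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty entries such as a trailing comma
        number = int(token)
        if number not in MODULES:
            raise ValueError(f"unknown module number: {number}")
        script = MODULES[number]
        if script not in selected:
            selected.append(script)
    return selected


if __name__ == "__main__":
    print(select_modules("1,3,10"))  # ['read_quality.py', 'read_GC.py', 'tin.py']
```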

Test cases:

Test case | Parameters | IN alignment | IN reference | IN annotation | IN refgene | IN chromsize | IN log | OUT report | OUT stats |
---|---|---|---|---|---|---|---|---|---|
case1 | (missing) | alignment | reference | annotation | (missing) | chromsize | (missing) | (missing) | (missing) |