STAR: Transcriptome Alignment

STAR

A brief record of commonly used STAR commands.

Index

STAR --runMode genomeGenerate \
--genomeDir /path/to/store/index/ \
--genomeFastaFiles /path/to/ref \
--runThreadN <processorsN> \
--sjdbGTFfile /path/to/gtf

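--sjdbGTFfile: annotation GTF file; STAR extracts the annotated splice junctions from it and builds them into the index, which improves the accuracy of spliced alignments

--genomeFastaFiles: location of the reference sequence(s) used to generate the index

--genomeDir: directory in which the index is stored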

With limited computational resources

STAR --runThreadN 32 \
--runMode genomeGenerate \
--genomeDir /path/to/store/index/ \
--genomeFastaFiles /path/to/ref \
--limitGenomeGenerateRAM 96000000000 \
--genomeChrBinNbits 16

Add --limitGenomeGenerateRAM to set the maximum RAM (in bytes) available for index generation when STAR terminates because the default limit is too small.

Add --genomeChrBinNbits when STAR fails with the error: terminate called after throwing an instance of 'std::bad_alloc'

from https://github.com/alexdobin/STAR/issues/103#issuecomment-173009628

If you are using a genome with a large (>5,000) number of references (chromosomes/scaffolds), you may need to reduce the --genomeChrBinNbits to reduce RAM consumption. The following scaling is recommended: --genomeChrBinNbits = min(18, log2(GenomeLength/NumberOfReferences)). For example, for 3 gigaBase genome with 100,000 chromosomes/scaffolds, this is equal to 15.
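The scaling rule quoted above can be sketched as a small shell helper. The function name suggest_chr_bin_nbits is made up for illustration, and rounding to the nearest integer is an assumption chosen because it reproduces the quoted example (log2(30000) is about 14.87, reported as 15).

```shell
# Hypothetical helper (not part of STAR): suggest a --genomeChrBinNbits value.
# Usage: suggest_chr_bin_nbits <genome length in bases> <number of references>
suggest_chr_bin_nbits() {
    awk -v len="$1" -v n="$2" 'BEGIN {
        bits = log(len / n) / log(2)   # log2(GenomeLength/NumberOfReferences)
        bits = int(bits + 0.5)         # round to nearest integer (assumption)
        if (bits > 18) bits = 18       # apply the min(18, ...) cap
        print bits
    }'
}

# The example from the quote: 3 gigabases spread over 100,000 scaffolds
suggest_chr_bin_nbits 3000000000 100000
```

If the reference has a samtools faidx index, the genome length is the sum of the second column of the .fai file and the number of references is its line count, so both inputs can be read from that file.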

Aligning

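# PAIR must be set to "true" (paired-end) or "false" (single-end).
# --outFileNamePrefix is prepended verbatim to every output file name,
# so end it with "/" to write into a directory.
if [ "${PAIR}" = true ]; then
    # paired-end mode
    STAR --genomeDir /path/to/index \
        --readFilesIn <fq1> <fq2> \
        --runThreadN <processorsN> \
        --outSAMtype SAM \
        --outFileNamePrefix /path/to/output
else
    # single-end mode
    STAR --genomeDir /path/to/index \
        --readFilesIn <fq1> \
        --runThreadN <processorsN> \
        --outSAMtype SAM \
        --outFileNamePrefix /path/to/output
fi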

--readFilesIn: input FASTQ file(s); in paired-end mode, give the read 1 file followed by the read 2 file, separated by a space
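Since the two modes differ only in the number of files passed to --readFilesIn, the branching can also be folded into one helper. This is a minimal sketch: star_align_cmd is a hypothetical name, the sample file names are placeholders, and the function only prints the command rather than running STAR, so the result can be inspected first. It assumes paths without spaces.

```shell
# Hypothetical helper: print the STAR alignment command for one sample.
# Usage: star_align_cmd <index dir> <output prefix> <threads> <fq1> [fq2]
star_align_cmd() {
    index=$1; prefix=$2; threads=$3
    shift 3
    # "$*" is "<fq1>" in single-end mode and "<fq1> <fq2>" in paired-end mode
    echo "STAR --genomeDir $index --readFilesIn $* --runThreadN $threads --outSAMtype SAM --outFileNamePrefix $prefix"
}

# Paired-end example with placeholder file names
star_align_cmd /path/to/index /path/to/output 8 sample_1.fq sample_2.fq
```

Once the printed command looks right, it can be pasted into a terminal or a job script.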

Output

By default, STAR writes the following files:

- -rw-rw-r-- Aligned.out.sam
- -rw-rw-r-- Log.final.out
- -rw-rw-r-- Log.out
- -rw-rw-r-- Log.progress.out
- -rw-rw-r-- SJ.out.tab
- drwx------ _STARtmp

Aligned.out.sam holds the alignment results

samtools view Aligned.out.sam |head -n1
A00582:731:HHGVJDSX2:1:1106:3586:16845 99 chr11 74699355 255 149M = 74699383 24655 GTCAAGCTTATTTGATATAGTGGTATGTCCCTCCAGAAAAATCAAAAGTTGTGATCCCTGGATTTGAATTAAATATGCCACTATGTGGCTTCCACAGAGGGAAAAATGATTCTTTTTTTCAAGTGAATCCAATCAGCAACCAGTCAACA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:294 nM:i:0
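In the record above, the fields from the twelfth onward are optional SAM tags in TAG:TYPE:VALUE form. In STAR's output, NH is the number of loci the read maps to, and a MAPQ of 255 marks a uniquely mapped read. A rough sketch for pulling one tag out of every record follows. sam_tag is a hypothetical helper name, and it assumes the tag value contains no ":" character.

```shell
# Hypothetical helper: read SAM records on stdin and print the value of
# one optional tag (for example NH or AS) for each record that has it.
# Usage: samtools view Aligned.out.sam | sam_tag NH
sam_tag() {
    awk -F'\t' -v tag="$1" '
        !/^@/ {                              # skip header lines
            for (i = 12; i <= NF; i++) {     # optional tags start at field 12
                split($i, parts, ":")        # TAG:TYPE:VALUE
                if (parts[1] == tag) { print parts[3]; break }
            }
        }'
}
```

For instance, samtools view Aligned.out.sam | sam_tag NH | sort -n | uniq -c would tabulate how many alignment records fall into each NH class.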

Log.final.out summarizes the mapping statistics once the run has finished

cat Log.final.out
Started job on | Sep 15 17:26:21
Started mapping on | Sep 15 17:27:00
Finished on | Sep 15 17:30:05
Mapping speed, Million of reads per hour | 318.71

Number of input reads | 16378402
Average input read length | 288
UNIQUE READS:
Uniquely mapped reads number | 14279182
Uniquely mapped reads % | 87.18%
Average mapped length | 287.27
Number of splices: Total | 14764233
Number of splices: Annotated (sjdb) | 0
Number of splices: GT/AG | 14660617
Number of splices: GC/AG | 82087
Number of splices: AT/AC | 6046
Number of splices: Non-canonical | 15483
Mismatch rate per base, % | 0.17%
Deletion rate per base | 0.01%
Deletion average length | 2.28
Insertion rate per base | 0.01%
Insertion average length | 1.86
MULTI-MAPPING READS:
Number of reads mapped to multiple loci | 1337685
% of reads mapped to multiple loci | 8.17%
Number of reads mapped to too many loci | 54922
% of reads mapped to too many loci | 0.34%
UNMAPPED READS:
Number of reads unmapped: too many mismatches | 0
% of reads unmapped: too many mismatches | 0.00%
Number of reads unmapped: too short | 342060
% of reads unmapped: too short | 2.09%
Number of reads unmapped: other | 364553
% of reads unmapped: other | 2.23%
CHIMERIC READS:
Number of chimeric reads | 0
% of chimeric reads | 0.00%
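When many samples are processed, single values from this summary are easier to collect with a script than by eye. A minimal sketch, assuming each statistic sits on one line as "name | value" like in the listing above. star_log_value is a hypothetical helper name, not a STAR tool.

```shell
# Hypothetical helper: print the value of one field from a STAR Log.final.out.
# Usage: star_log_value <log file> <exact field name>
star_log_value() {
    awk -F'|' -v key="$2" '
        {
            name = $1; value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim the field name
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim the value
            if (name == key) { print value; exit }
        }' "$1"
}
```

With the log shown above, star_log_value Log.final.out "Uniquely mapped reads %" would print 87.18%.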

Ref:

https://github.com/alexdobin/STAR

End.