
R: Extracting all genes under a GO category

Problem description

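Sometimes we want to know which genes are associated with a particular GO annotation category, so we need a way to extract every gene annotated to that GO term.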

Solution

After some searching, I found that the following code does the job:

library(tidyverse)
library(org.Hs.eg.db)
# GOID is a GO term ID given as a string, e.g. 'GO:0006260'
GOgeneID <- get(GOID, org.Hs.egGO2ALLEGS) %>% mget(org.Hs.egSYMBOL) %>% unlist()

Below, the biological process DNA replication (GO:0006260) serves as an example, using the human GO annotation.

library(tidyverse)
library(org.Hs.eg.db)
# GO ID --> Entrez gene ID
DNA_geneID <- get('GO:0006260', org.Hs.egGO2ALLEGS)
head(DNA_geneID)
#   TAS   IEA   TAS   IMP   TAS   ISS
#  "94" "466" "472" "545" "545" "546"
length(DNA_geneID)
# [1] 421

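org.Hs.egGO2ALLEGS maps each GO ID to the Entrez gene IDs annotated to that term or to any of its child terms. The names of the returned vector give the evidence code behind each annotation, which is one of the following: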

IMP: inferred from mutant phenotype

IGI: inferred from genetic interaction

IPI: inferred from physical interaction

ISS: inferred from sequence similarity

IDA: inferred from direct assay

IEP: inferred from expression pattern

IEA: inferred from electronic annotation

TAS: traceable author statement

NAS: non-traceable author statement

ND: no biological data available

IC: inferred by curator

The full list of evidence codes is documented at:
http://geneontology.org/docs/guide-go-evidence-codes/
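Because the evidence codes are stored as the names of the vector, they can be used to filter the gene list. The sketch below is only an illustration: it re-creates the six entries printed by head(DNA_geneID) above rather than querying the database, and the variable names (DNA_geneID_head, curated, curated_ids, evidence_counts) are made up for this example. Dropping IEA is one possible filter, not a recommendation from the original post.

```r
# The six entries shown above: names are evidence codes, values are Entrez IDs
DNA_geneID_head <- c(TAS = "94", IEA = "466", TAS = "472",
                     IMP = "545", TAS = "545", ISS = "546")

# Drop annotations inferred purely electronically (IEA); an optional filter
curated <- DNA_geneID_head[names(DNA_geneID_head) != "IEA"]

# A gene appears once per evidence code, so deduplicate the Entrez IDs
curated_ids <- unique(unname(curated))

# Count how many annotations carry each evidence code
evidence_counts <- table(names(DNA_geneID_head))
```

The same two lines of filtering and deduplication should work unchanged on the full DNA_geneID vector.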

Going one step further, we can convert the Entrez IDs to gene symbols:

# Entrez gene ID --> gene symbol
DNA_geneSYMBOL <- mget(DNA_geneID, org.Hs.egSYMBOL) %>% unlist()
head(DNA_geneSYMBOL)
#       94      466      472      545      545      546
# "ACVRL1"   "ATF1"    "ATM"    "ATR"    "ATR"   "ATRX"

That's all.

Ref:
https://davetang.org/muse/2011/05/20/extract-gene-names-according-to-go-terms/
https://www.ebi.ac.uk/QuickGO/term/GO:0006260
http://geneontology.org/docs/guide-go-evidence-codes/