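Problem
Sometimes we want to know which genes are associated with a particular GO annotation category, so we need a way to extract every gene annotated to that GO term.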
Solution
After some searching, I found that the following code does the job:
library(tidyverse)      # provides the %>% pipe
library(org.Hs.eg.db)
# GOID is a placeholder for a GO term ID string, e.g. 'GO:0006260'
GOgeneID <- get(GOID, org.Hs.egGO2ALLEGS) %>% mget(org.Hs.egSYMBOL) %>% unlist()
The walkthrough below uses the biological process DNA replication (GO:0006260) as an example, with the human GO annotation.
library(tidyverse)
library(org.Hs.eg.db)
# GO ID --> gene Entrez ID
DNA_geneID <- get('GO:0006260', org.Hs.egGO2ALLEGS)
> head(DNA_geneID)
  TAS   IEA   TAS   IMP   TAS   ISS
 "94" "466" "472" "545" "545" "546"
> length(DNA_geneID)
[1] 421
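org.Hs.egGO2ALLEGS holds the mapping between GO IDs and Entrez IDs, covering genes annotated to the term itself or to any of its child terms. The names of the returned vector give the evidence code for each annotation, so a gene can appear more than once (545 appears above with both IMP and TAS). The evidence codes include the following categories: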
IMP: inferred from mutant phenotype
IGI: inferred from genetic interaction
IPI: inferred from physical interaction
ISS: inferred from sequence similarity
IDA: inferred from direct assay
IEP: inferred from expression pattern
IEA: inferred from electronic annotation
TAS: traceable author statement
NAS: non-traceable author statement
ND: no biological data available
IC: inferred by curator
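Because the evidence codes are stored as the names of the vector, they can be used to filter the gene list. The following is a minimal sketch in base R, not part of the original recipe. It uses a toy vector (the hypothetical name `DNA_geneID_head`) that copies the six entries printed by head() above, so it runs without the annotation package; the same indexing should apply to the full `DNA_geneID`.

```r
# Toy stand-in for DNA_geneID: the six entries shown by head() above.
DNA_geneID_head <- c(TAS = "94", IEA = "466", TAS = "472",
                     IMP = "545", TAS = "545", ISS = "546")

# Evidence codes are stored as the names of the vector.
evidence <- names(DNA_geneID_head)

# Drop electronically inferred annotations (IEA) and keep the rest.
curated <- DNA_geneID_head[evidence != "IEA"]

# A gene can appear once per evidence code, so deduplicate the IDs.
curated_ids <- unique(unname(curated))
print(curated_ids)
```

Whether to exclude IEA depends on the analysis; it is shown here only because it is the one code in the list assigned without curator review.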
The full list of evidence codes is documented at: http://geneontology.org/docs/guide-go-evidence-codes/
As a further step, we can convert the Entrez IDs to gene symbols:
DNA_geneSYMBOL <- mget(DNA_geneID, org.Hs.egSYMBOL) %>% unlist()
> head(DNA_geneSYMBOL)
      94      466      472      545      545      546
"ACVRL1"   "ATF1"    "ATM"    "ATR"    "ATR"   "ATRX"
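As an alternative to the get()/mget() calls, the same lookup can probably be done in one step with the `select()` interface of AnnotationDbi. This is a sketch under assumptions rather than the method used in this post: it assumes org.Hs.eg.db is installed and that the `GOALL` keytype and `EVIDENCEALL` column are available in your version (check with `keytypes(org.Hs.eg.db)` and `columns(org.Hs.eg.db)`). The variable names `go_genes` and `gene_symbols` are made up for illustration.

```r
library(org.Hs.eg.db)  # attaches AnnotationDbi as a dependency

# Assumption: keytype "GOALL" includes annotations to child terms,
# matching what org.Hs.egGO2ALLEGS returns in the post.
go_genes <- AnnotationDbi::select(
  org.Hs.eg.db,
  keys    = "GO:0006260",
  keytype = "GOALL",
  columns = c("ENTREZID", "SYMBOL", "EVIDENCEALL")
)

# The table can contain several rows per gene (one per evidence code),
# so deduplicate to get a plain gene list.
gene_symbols <- unique(go_genes$SYMBOL)

head(go_genes)
length(gene_symbols)
```

The call is written as `AnnotationDbi::select()` because loading tidyverse also attaches dplyr, whose own `select()` can mask the AnnotationDbi one.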
That's all.
Ref:
https://davetang.org/muse/2011/05/20/extract-gene-names-according-to-go-terms/
https://www.ebi.ac.uk/QuickGO/term/GO:0006260
http://geneontology.org/docs/guide-go-evidence-codes/