## page was renamed from UpToDateGeneSets/EmGeneSetsReadme
#acl All:read
= Enrichment Map Genesets =

== Summary ==
 * Enrichment Map Genesets are a set of gene set files in GMT format (compatible with [[http://www.broadinstitute.org/gsea/index.jsp| GSEA]]), updated '''monthly''' from the original source locations and available with:
  1. Entrez gene ids
  1. !UniProt accessions
  1. Gene symbols
 * The GMT file format contains one gene set per line. Each line contains:
  * Name (tab) Description (tab) Gene (tab) Gene (tab) ...
 * In our format:
  * Name = Gene set Name | Gene set Source | Gene set Source identifier
   * example --> ATP-dependent protein binding|GO|GO:0043008 '''OR''' arginine biosynthesis IV|HUMANCYC|ARGININE-SYN4-PWY
  * Description = Gene set Name
   * example --> ATP-dependent protein binding '''OR''' arginine biosynthesis IV
  * Gene = identified by one of the three possible identifiers (Entrez gene id, !UniProt accession or gene symbol)

== Sources ==
 * '''Human'''
 || '''Source''' || '''File Origin''' || '''File Type''' || '''ID extracted''' || '''Frequency source is updated''' || '''Number of pathways''' ||
 || [[http://www.genome.jp/kegg/|KEGG]] || KEGG ftp site (July 2011) || gmt || symbol || static as of July 1, 2011 || 236 ||
 || [[http://www.broadinstitute.org/gsea/msigdb/index.jsp|Msigdb - c2]] <<BR>> (other + Biocarta) || static (needs to be updated manually) || gmt || Entrez gene || sporadically || Biocarta - 217 <<BR>> Other - 47 ||
 || [[http://pid.nci.nih.gov/|NCI]] || [[http://pid.nci.nih.gov/download.shtml|NCI]] || biopax || Entrez gene || sporadically || 219 pathways ||
 || IOB || directly from IOB - static (July 2011) || biopax || Entrez gene || sporadically || 35 pathways - <<BR>> 10 are the same as !CellMap, <<BR>> 1 is the same as !NetPath ||
 || [[www.netpath.org/browse|NetPath]] || [[www.netpath.org/browse]] (scripted grab of files numbered 1-25) || biopax || Entrez gene || static || 25 pathways - <<BR>> 12 are cancer pathways (10 are !CellMap) <<BR>> 13 are immunity pathways ||
 || [[http://humancyc.org/|HumanCyc]] || scripted grab of zipped release from password-protected website || biopax || !UniProt || updated periodically || 249 pathways ||
 || [[http://www.reactome.org/ReactomeGWT/entrypoint.html|Reactome]] || scripted grab of zipped release from website || biopax || !UniProt || updated release || 1117 pathways (release 37) ||
 || [[http://www.ebi.ac.uk/GO/|GO]] || scripted grab from EBI ftp site (human) || GAF || !UniProt || released once a month || 13,034 no GO IEA <<BR>> 15,181 with GO IEA ||
 || [[http://www.broadinstitute.org/gsea/msigdb/index.jsp|Msigdb - c3]] <<BR>> Specialty GMTs <<BR>> miRs, transcription factors || grab from Msigdb || gmt || Entrez gene || sporadically || 221 miRs <<BR>> 616 TFs ||

 * '''Mouse'''
 || '''Source''' || '''File Origin''' || '''File Type''' || '''ID extracted''' || '''Frequency source is updated''' || '''Number of pathways''' ||
 || [[http://www.reactome.org/ReactomeGWT/entrypoint.html|Reactome]] || scripted grab of zipped release from website || biopax || !UniProt || updated release || 946 pathways (release 37) ||
 || GO || scripted grab from MGI ftp site (mouse) || GAF || MGI || released once a month || 14,563 no GO IEA <<BR>> 15,041 with GO IEA ||
 || [[http://www.genome.jp/kegg/|KEGG]] || ''translated from Human using Homologene'' || gmt || Entrez gene || static as of July 1, 2011 || 236 ||
 || [[http://www.broadinstitute.org/gsea/msigdb/index.jsp|Msigdb - c2]] <<BR>> (other + Biocarta) || ''translated from Human using Homologene'' || gmt || Entrez gene || sporadically || total 880: <<BR>> Kegg - 186 <<BR>> Reactome - 430 <<BR>> Biocarta - 217 <<BR>> Other - 47 ||
 || [[http://pid.nci.nih.gov/|NCI]] || ''translated from Human using Homologene'' || gmt || Entrez gene || sporadically || 219 pathways ||
 || IOB || ''translated from Human using Homologene'' || gmt || Entrez gene || sporadically || 35 pathways - <<BR>> 10 are the same as !CellMap, <<BR>> 1 is the same as !NetPath || The biopax pathways need to be fixed so that the species info is correct, but the information is still extractable. ||
 || [[www.netpath.org/browse|NetPath]] || ''translated from Human using Homologene'' || gmt || Entrez gene || static || 25 pathways - <<BR>> 12 are cancer pathways (10 are !CellMap) <<BR>> 13 are immunity pathways ||
 || [[http://humancyc.org/|HumanCyc]] || ''translated from Human using Homologene'' || gmt || Entrez gene || updated periodically || 249 pathways ||

== File Structure ==
'''< > denotes directory'''

 * Directory named according to the date the sets were updated.
  * Directory named by identifier type (either Entrez gene, !UniProt, or gene symbol).
   * BP = biological process
   * MF = molecular function
   * CC = cellular component
   * All = BP + MF + CC
   * no_GO_IEA - indicates that the file '''excludes''' GO annotations with evidence codes 'IEA' (inferred from electronic annotation), 'ND' (no biological data available) and 'RCA' (inferred from reviewed computational analysis)
   * with_GO_IEA - indicates that the file '''includes''' GO annotations with evidence codes 'IEA' (inferred from electronic annotation), 'ND' (no biological data available) and 'RCA' (inferred from reviewed computational analysis)
 * In each directory there are amalgamated gene set files:
  * AllPathways - contains all pathway sources in the Pathways directory
  * GOPathways - contains all GO (MF, BP, CC) and all pathway sources in the Pathways directory

== Creating customized Genesets ==
 1. Download the gene set files you would like to use in your customized set (for example, Human_IOB_Entrezgene.gmt and Human_NetPath_Entrezgene.gmt).
 1. Concatenate the downloaded files into a single GMT file:
 {{{
cat Human_IOB_Entrezgene.gmt Human_NetPath_Entrezgene.gmt > MyCustomizedSet.gmt
}}}
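
The `cat` step above can also be written so that a gene set appearing in more than one source file is kept only once. The following is a minimal sketch, not part of the original pipeline. It assumes that the first tab-separated column (the Name | Source | Source identifier field described in the Summary) uniquely identifies a gene set. Only the file names come from the example above; the gene set names and gene ids in the toy input files are invented for illustration. This approach removes only lines whose name is identical, so the same pathway published under different names by two sources is not detected.

```shell
# Toy input files: gene set names and gene ids below are invented for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'SET_A|IOB|A1\tSET_A\t1\t2\t3\n'  > Human_IOB_Entrezgene.gmt
printf 'SET_B|IOB|B1\tSET_B\t4\t5\n'    >> Human_IOB_Entrezgene.gmt
printf 'SET_C|NETPATH|C1\tSET_C\t6\t7\n' > Human_NetPath_Entrezgene.gmt
printf 'SET_A|IOB|A1\tSET_A\t1\t2\t3\n' >> Human_NetPath_Entrezgene.gmt

# Concatenate the files, keeping only the first line seen for each name (column 1).
awk -F'\t' '!seen[$1]++' Human_IOB_Entrezgene.gmt Human_NetPath_Entrezgene.gmt > MyCustomizedSet.gmt

# List each gene set as: name, source, source identifier, number of genes.
awk -F'\t' '{ split($1, part, "[|]"); print part[1] "\t" part[2] "\t" part[3] "\t" (NF - 2) }' MyCustomizedSet.gmt
```

The second `awk` call is only a quick check of the result: it splits the Name column on `|` and counts the gene columns that follow the Description column.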