## page was renamed from UpToDateGeneSets/EmGeneSetsReadme
#acl All:read

= Enrichment Map Gene Sets =

<<TableOfContents>>

== Summary ==
 * Gene Set files can be downloaded from: [[http://download.baderlab.org/EM_Genesets/]]
 * Enrichment Map Gene Sets are a collection of Gene Set files in GMT format (compatible with [[http://www.broadinstitute.org/gsea/index.jsp|GSEA]]), updated '''monthly''' from their original source locations. Each file is available with one of three identifier types:
  1. Entrez gene ids
  1. !UniProt accessions
  1. Gene symbols
 * The GMT file format contains one Gene Set per line. Each line contains:
  * Name (tab) Description (tab) Gene (tab) Gene (tab) ...
 * In our format:
  * Name = Gene Set Name | Gene Set Source | Gene Set Source identifier
   * Example --> ATP-dependent protein binding|GO|GO:0043008 '''OR''' arginine biosynthesis IV|HUMANCYC|ARGININE-SYN4-PWY
  * Description = Gene Set Name
   * Example --> ATP-dependent protein binding '''OR''' arginine biosynthesis IV
  * Gene = identified by one of the three possible identifiers (Entrez gene id, !UniProt accession or gene symbol)

== Sources ==
 * '''Human'''
|| '''Source''' || '''File Origin''' || '''File Type''' || '''ID extracted''' || '''Frequency source is updated''' || '''Number of pathways''' ||
|| [[http://www.genome.jp/kegg/|KEGG]] ([[#ref1|1]]) || KEGG ftp site (July 2011) || GMT || Symbol || static as of July 1, 2011 || 236 ||
|| [[http://www.broadinstitute.org/gsea/msigdb/index.jsp|MSigDB - c2]] ([[#ref2|2]]) <<BR>> (other + Biocarta) || manual download from MSigDB || GMT || Entrez gene || sporadically || Biocarta - 217<<BR>> Other - 47 ||
|| [[http://pid.nci.nih.gov/|NCI]] ([[#ref3|3]]) || scripted download of zipped release from website || BioPAX || Entrez gene || sporadically || 219 pathways ||
|| IOB || received directly from IOB - static (July 2011) || BioPAX || Entrez gene || sporadically || 35 pathways - <<BR>> 10 are the same as !CellMap,<<BR>> 1 is the same as !NetPath ||
|| [[http://www.netpath.org/browse/|NetPath]] ([[#ref4|4]]) || scripted download of files numbered 1-25 || BioPAX || Entrez gene || static || 25 pathways - <<BR>> 12 are cancer pathways (10 are !CellMap) <<BR>> 13 are immunity pathways ||
|| [[http://humancyc.org/|HumanCyc]] ([[#ref5|5]]) || scripted download of zipped release from password-protected website || BioPAX || !UniProt || updated periodically || 249 pathways ||
|| [[http://www.reactome.org/ReactomeGWT/entrypoint.html|Reactome]] ([[#ref6|6]]) || scripted download of zipped release from website || BioPAX || !UniProt || updated release || 1117 pathways (release 37) ||
|| [[http://www.ebi.ac.uk/GO/|GO]] ([[#ref7|7]]) || scripted download from EBI ftp site (human) || GAF || !UniProt || released once a month || 13,034 no GO IEA <<BR>> 15,181 with GO IEA ||
|| [[http://www.broadinstitute.org/gsea/msigdb/index.jsp|MSigDB - c3]] ([[#ref2|2]]) <<BR>> Specialty GMTs <<BR>> miRs, transcription factors || manual download from MSigDB || GMT || Entrez gene || sporadically || 221 miRs <<BR>> 616 TFs ||

 * '''Mouse'''
|| '''Source''' || '''File Origin''' || '''File Type''' || '''ID extracted''' || '''Frequency source is updated''' || '''Number of pathways''' ||
|| [[http://www.reactome.org/ReactomeGWT/entrypoint.html|Reactome]] ([[#ref6|6]]) || scripted download of zipped release from website || BioPAX || !UniProt || updated release || 946 pathways (release 37) ||
|| [[http://www.informatics.jax.org/mgihome/GO/project.shtml|GO]] ([[#ref7|7]]) || scripted download from MGI ftp site (mouse) || GAF || MGI || released once a month || 14,563 no GO IEA <<BR>> 15,041 with GO IEA ||
|| [[http://www.genome.jp/kegg/|KEGG]] ([[#ref1|1]]) || ''translated from Human using Homologene'' || GMT || Entrez gene || static as of July 1, 2011 || 236 ||
|| [[http://www.broadinstitute.org/gsea/msigdb/index.jsp|MSigDB - c2]] ([[#ref2|2]]) <<BR>> (other + Biocarta) || ''translated from Human using Homologene'' || GMT || Entrez gene || sporadically || total 880:<<BR>> KEGG - 186<<BR>> Reactome - 430<<BR>> Biocarta - 217<<BR>> Other - 47 ||
|| [[http://pid.nci.nih.gov/|NCI]] ([[#ref3|3]]) || ''translated from Human using Homologene'' || GMT || Entrez gene || sporadically || 219 pathways ||
|| IOB || ''translated from Human using Homologene'' || GMT || Entrez gene || sporadically || 35 pathways - <<BR>> 10 are the same as !CellMap,<<BR>> 1 is the same as !NetPath ||
|| [[http://www.netpath.org/browse/|NetPath]] ([[#ref4|4]]) || ''translated from Human using Homologene'' || GMT || Entrez gene || static || 25 pathways - <<BR>> 12 are cancer pathways (10 are !CellMap) <<BR>> 13 are immunity pathways ||
|| [[http://humancyc.org/|HumanCyc]] ([[#ref5|5]]) || ''translated from Human using Homologene'' || GMT || Entrez gene || updated periodically || 249 pathways ||

== File Structure ==
'''< > denotes directory'''
 * - directory is named according to the date the sets were updated.
 *
 * - (either Entrez gene, !UniProt, Gene symbol)
 *
 * BP = biological process
 * MF = molecular function
 * CC = cellular component
 * All = BP + MF + CC
 * no_GO_IEA - indicates that the file '''excludes''' GO annotations with evidence codes 'IEA' (inferred from electronic annotation), 'ND' (no biological data available) and 'RCA' (inferred from reviewed computational analysis)
 * with_GO_IEA - indicates that the file '''includes''' GO annotations with evidence codes 'IEA' (inferred from electronic annotation), 'ND' (no biological data available) and 'RCA' (inferred from reviewed computational analysis)
 *
 *
 *
 *
 * In each directory there are amalgamated Gene Set files:
 * !AllPathways - contains all pathway sources in the Pathways directory
 * GOPathways - contains all GO (MF, BP, CC) and all pathway sources in the Pathways directory.

== Creating customized Gene Sets ==
 1. Download the gene set files you would like to use in your customized set and concatenate them.<<BR>>For example, to combine Human_IOB_Entrezgene.gmt and Human_NetPath_Entrezgene.gmt, you can use the following Linux command:
 {{{
cat Human_IOB_Entrezgene.gmt Human_NetPath_Entrezgene.gmt > MyCustomizedSet.gmt
 }}}

== References ==
 1. <<Anchor(ref1)>> Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. '''KEGG for integration and interpretation of large-scale molecular data sets.''' Nucleic Acids Res. 2011 Nov 10. PMID: 22080510 <<BR>> [[http://www.ncbi.nlm.nih.gov/pubmed/22080510|PubMed]]
 2. <<Anchor(ref2)>> Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. '''Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.''' Proc Natl Acad Sci U S A. 2005 Oct 25;102(43):15545-50. PMID: 16199517 <<BR>> [[http://www.ncbi.nlm.nih.gov/pubmed/16199517|PubMed]]
 3. <<Anchor(ref3)>> Schaefer CF, Anthony K, Krupa S, Buchoff J, Day M, Hannay T, Buetow KH. '''PID: the Pathway Interaction Database.''' Nucleic Acids Res. 2009 Jan;37(Database issue):D674-9. PMID: 18832364 <<BR>> [[http://www.ncbi.nlm.nih.gov/pubmed/18832364|PubMed]]
 4. <<Anchor(ref4)>> Kandasamy K, ''et al.'' '''!NetPath: a public resource of curated signal transduction pathways.''' Genome Biol. 2010 Jan 12;11(1):R3. PMID: 20067622 <<BR>> [[http://www.ncbi.nlm.nih.gov/pubmed/20067622|PubMed]]
 5. <<Anchor(ref5)>> Romero P, Wagg J, Green ML, Kaiser D, Krummenacker M, Karp PD. '''Computational prediction of human metabolic pathways from the complete human genome.''' Genome Biol. 2005;6(1):R2. Epub 2004 Dec 22. PMID: 15642094 <<BR>> [[http://www.ncbi.nlm.nih.gov/pubmed/15642094|PubMed]]
 6. <<Anchor(ref6)>> Croft D, O'Kelly G, Wu G, Haw R, Gillespie M, Matthews L, Caudy M, Garapati P, Gopinath G, Jassal B, Jupe S, Kalatskaya I, Mahajan S, May B, Ndegwa N, Schmidt E, Shamovsky V, Yung C, Birney E, Hermjakob H, D'Eustachio P, Stein L. '''Reactome: a database of reactions, pathways and biological processes.''' Nucleic Acids Res. 2011 Jan;39(Database issue):D691-7. PMID: 21067998 <<BR>> [[http://www.ncbi.nlm.nih.gov/pubmed/21067998|PubMed]]
 7. <<Anchor(ref7)>> Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. '''Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.''' Nat Genet. 2000 May;25(1):25-9. PMID: 10802651 <<BR>> [[http://www.ncbi.nlm.nih.gov/pubmed/10802651|PubMed]]
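
== Appendix: reading and combining GMT files (sketch) ==
The GMT layout described in the Summary (Name, Description, then genes, all tab separated, with Name = Gene Set Name | Source | Source identifier) and the concatenation step from "Creating customized Gene Sets" can be illustrated with a short Python sketch. This is not part of the Enrichment Map tooling: the names `GeneSet`, `parse_gmt_line` and `combine_gmt` are hypothetical, and skipping gene sets whose Name field was already written is an added assumption (a plain `cat` keeps every line).

```python
from typing import Iterable, List, NamedTuple


class GeneSet(NamedTuple):
    """One line of a GMT file (hypothetical helper type)."""
    name: str         # Gene Set Name
    source: str       # Gene Set Source, e.g. GO or HUMANCYC
    source_id: str    # Gene Set Source identifier
    description: str  # second GMT column
    genes: List[str]  # remaining columns


def parse_gmt_line(line: str) -> GeneSet:
    """Split one GMT line into its tab-separated parts."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 2:
        raise ValueError("a GMT line needs at least a name and a description")
    full_name, description, *genes = fields
    # Split from the right so a '|' inside the gene set name is preserved.
    parts = full_name.rsplit("|", 2)
    if len(parts) == 3:
        name, source, source_id = parts
    else:
        # Assumption: names without the 'Name|Source|ID' form are kept whole.
        name, source, source_id = full_name, "", ""
    return GeneSet(name, source, source_id, description, [g for g in genes if g])


def combine_gmt(paths: Iterable[str], out_path: str) -> int:
    """Concatenate GMT files, keeping the first line seen for each Name field.

    Returns the number of gene sets written.
    """
    seen = set()
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for path in paths:
            with open(path, encoding="utf-8") as handle:
                for line in handle:
                    if not line.strip():
                        continue  # ignore blank lines
                    key = line.split("\t", 1)[0]
                    if key in seen:
                        continue  # assumption: duplicates are dropped
                    seen.add(key)
                    out.write(line.rstrip("\r\n") + "\n")
                    written += 1
    return written
```

For example, `combine_gmt(["Human_IOB_Entrezgene.gmt", "Human_NetPath_Entrezgene.gmt"], "MyCustomizedSet.gmt")` would produce a file like the one from the `cat` command above, minus any repeated gene set names.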