
Enrichment Map User Guide

Overview

The Enrichment Map Cytoscape Plugin allows you to visualize the results of gene-set enrichment as a network. It works with generic enrichment results as well as, specifically, with Gene Set Enrichment Analysis (GSEA) results. Nodes represent gene-sets and edges represent mutual overlap; in this way, highly redundant gene-sets are grouped together as clusters, dramatically improving the capability to navigate and interpret enrichment results.

Gene-set enrichment is a data analysis technique taking as input

  1. a (ranked) gene list, from a genomic experiment

  2. gene-sets, grouping genes on the basis of a-priori knowledge (e.g. Gene Ontology) or experimental data (e.g. co-expression modules)

and generating as output the list of enriched gene-sets, i.e. the sets that best summarize the gene-list. It is common to refer to gene-set enrichment as functional enrichment because functional categories (e.g. Gene Ontology) are commonly used as gene-sets.



Installation

The Enrichment Map Plugin requires Cytoscape version 2.6.x. If you don't have Cytoscape, or have an older version (2.5 or older), please download the latest release from http://www.cytoscape.org/ and install it on your computer.


Quick Start Guide

Creating an Enrichment Map

You have a few different options:

The only difference between the above modes is the structure of the enrichment table(s). In either case, to use the plugin you will need the following files:

(*) GSEA saves the enrichment table as a .xls file; however, these are not true Excel files but tab-separated text files with a modified extension. Enrichment Map does not work with "true" Excel .xls files.
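
Because a GSEA .xls table is plain tab-separated text, one quick way to check a file before loading it is to look at its first bytes: legacy binary Excel workbooks begin with the OLE2 compound-file signature, and .xlsx workbooks are zip archives. The following is a minimal sketch, not part of Enrichment Map; the function names are hypothetical, and treating "a tab on the first line" as evidence of a tab-separated table is a simplifying assumption.

```python
import csv

# Leading bytes of binary spreadsheet formats (GSEA's .xls tables are neither).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary Excel .xls
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx (zip container)


def is_tab_separated_text(path):
    """Return True if the file looks like a tab-separated text table."""
    with open(path, "rb") as fh:
        head = fh.read(4096)
    if head.startswith(OLE2_MAGIC) or head.startswith(ZIP_MAGIC):
        return False  # a "true" Excel file, which Enrichment Map cannot read
    first_line = head.split(b"\n", 1)[0]
    return b"\t" in first_line


def read_gsea_table(path):
    """Read a GSEA-style .xls table as a list of row dictionaries."""
    if not is_tab_separated_text(path):
        raise ValueError(f"{path} is not a tab-separated text file")
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))
```

A workbook that was opened and re-saved in Excel as .xls or .xlsx fails the check, while the file exactly as written by GSEA passes.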

If your enrichment results were generated from GSEA, you will just have to pick the right files from your results folder. If you have generated the enrichment results using another method, you will have to go to the Full User Guide, File Format section, and make sure that the file format complies with Enrichment Map requirements.

You can use the parameter defaults. For a more careful choice of the parameter settings, please go to the Full User Guide, Tips on Parameter Choice.

Graphical Mapping of Enrichment

Exploring the Enrichment Map

Advanced tips


Full User Guide

File Formats

Gene sets file (GMT file)

Expression Data file (GCT, TXT or RNK file) [OPTIONAL]

GCT (GSEA file type)

RNK (GSEA file type)

Additional Information on GSEA File Formats can be found here

TXT

Enrichment Results files

GSEA result files

Additional Information on GSEA File Formats can be found here

Generic results files

Notes:

  1. description and FDR columns can have empty or NA values, but the column and the column header must exist
  2. if no value is provided under phenotype, Enrichment Map will assume there is only one phenotype, and will map enrichment p-values to red
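
To make the two notes concrete, here is a minimal parsing sketch for one data row. It assumes the first five tab-separated columns are gene-set name, description, p-value, FDR and phenotype (+1 or -1), in that order; that column order, the function name and the GenericResult type are assumptions for illustration, not Enrichment Map's own code.

```python
from dataclasses import dataclass
from typing import Optional

MISSING = {"", "NA"}  # values treated as "no value"


@dataclass
class GenericResult:
    name: str
    description: Optional[str]
    pvalue: float
    fdr: Optional[float]
    phenotype: int  # +1 or -1


def parse_generic_row(line):
    """Parse one tab-separated data row of a generic enrichment results file."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 4:
        # Description and FDR may be empty, but their columns must exist.
        raise ValueError("expected name, description, p-value and FDR columns")
    name, description, pvalue, fdr = (f.strip() for f in fields[:4])
    phenotype = fields[4].strip() if len(fields) > 4 else ""
    return GenericResult(
        name=name,
        description=None if description in MISSING else description,
        pvalue=float(pvalue),
        fdr=None if fdr in MISSING else float(fdr),
        # No phenotype value: assume a single phenotype (+1), mapped to red.
        phenotype=1 if phenotype in MISSING or float(phenotype) > 0 else -1,
    )
```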

See here for examples

DAVID Enrichment Result File

Notes:

  1. The DAVID option expects a file as generated by the DAVID web interface.
  2. DISCLAIMER: In the absence of a GMT file, gene sets are constructed from the Genes field in the DAVID output. This only considers the genes entered in your query set and not the genes in your background set, which will drastically affect the amount of overlap you see in the resulting Enrichment Map.

See here for a tutorial on how to generate DAVID output files for Enrichment Maps.

BiNGO Enrichment Result File

Notes:

  1. The BiNGO option expects a file as generated by the BiNGO Cytoscape Plugin.
  2. DISCLAIMER: In the absence of a GMT file, gene sets are constructed from the Genes field in the BiNGO output. This only considers the genes entered in your query set and not the genes in your background set, which will drastically affect the amount of overlap you see in the resulting Enrichment Map.

See here for a tutorial on how to generate BiNGO output files for Enrichment Maps.
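
In both the DAVID and BiNGO cases, each gene set is built only from the genes listed in a result row, so every member is a gene from your query. A minimal sketch of that construction, assuming the genes column holds a comma-separated list; the function name is hypothetical, and the column names "Term" and "Genes" in the usage are placeholders to adjust to your file's actual headers.

```python
def gene_sets_from_results(rows, name_col, genes_col, sep=","):
    """Build gene sets from an enrichment table's genes column.

    rows: iterable of dicts (e.g. from csv.DictReader).
    Only the genes listed in each row are used, i.e. genes from the query
    set; genes annotated to the term but absent from the query are missing.
    """
    gene_sets = {}
    for row in rows:
        genes = {g.strip() for g in row[genes_col].split(sep) if g.strip()}
        gene_sets[row[name_col]] = genes
    return gene_sets


rows = [
    {"Term": "GO:0006915~apoptosis", "Genes": "TP53, BAX, CASP3"},
    {"Term": "GO:0008283~cell proliferation", "Genes": "MYC, TP53"},
]
sets = gene_sets_from_results(rows, name_col="Term", genes_col="Genes")
```

Here the two sets share only TP53, because that is the only query gene annotated to both terms; background genes annotated to both terms never enter the comparison.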

RPT files

                producer_class    xtools.gsea.Gsea
                producer_timestamp        1367261057110
                param   collapse        false
                param   cls       WHOLE_PATH_TO_FILE/EM_EstrogenMCF7_TestData/ES_NT.cls#ES24_versus_NT24
                param   plot_top_x      20
                param   norm    meandiv
                param   save_rnd_lists  false
                param   median  false
                param   num     100
                param   scoring_scheme  weighted
                param   make_sets       true
                param   mode    Max_probe
                param   gmx       WHOLE_PATH_TO_FILE/EM_EstrogenMCF7_TestData/Human_GO_AllPathways_no_GO_iea_April_15_2013_symbol.gmt
                param   gui     false
                param   metric  Signal2Noise
                param   rpt_label ES24vsNT24
                param   help    false
                param   order   descending
                param   out       WHOLE_PATH_TO_FILE/EM_EstrogenMCF7_TestData
                param   permute gene_set
                param   rnd_type        no_balance
                param   set_min 15
                param   include_only_symbols    true
                param   sort    real
                param   rnd_seed        timestamp
                param   nperm   1000
                param   zip_report      false
                param   set_max 500
                param   res       WHOLE_PATH_TO_FILE/EM_EstrogenMCF7_TestData/MCF7_ExprMx_v2_names.gct

                file    WHOLE_PATH_TO_FILE/EM_EstrogenMCF7_TestData/ES24vsNT24.Gsea.1367261057110/index.html

Enrichment File 1 --> {out} + File.separator + {rpt_label} + "." + {producer_class} + "." + {producer_timestamp} + File.separator + "gsea_report_for_" + phenotype1 + "_" + timestamp + ".xls"
Enrichment File 2 --> {out} + File.separator + {rpt_label} + "." + {producer_class} + "." + {producer_timestamp} + File.separator + "gsea_report_for_" + phenotype2 + "_" + timestamp + ".xls"
Ranks File --> {out} + File.separator + {rpt_label} + "." + {producer_class} + "." + {producer_timestamp} + File.separator + "ranked_gene_list_" + phenotype1 + "_versus_" + phenotype2 + "_" + timestamp + ".xls"
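
The path rules above can be sketched in Python. This assumes the .rpt lines are tab-separated (the listing above shows them with spaces), that {producer_class} contributes only its last component (xtools.gsea.Gsea becomes Gsea, as in the file line of the example), that the trailing timestamp in the file names equals producer_timestamp, and that the two phenotype names are the parts of the cls label after "#" on either side of "_versus_", as in ES24_versus_NT24. The function names are hypothetical; this is not Enrichment Map's own parser.

```python
import os


def parse_rpt(text):
    """Split an .rpt file into top-level fields and 'param' fields.

    Assumes tab-separated lines such as
    'producer_timestamp<TAB>1367261057110' and 'param<TAB>out<TAB>/path'.
    """
    top, params = {}, {}
    for line in text.splitlines():
        fields = line.strip().split("\t")
        if fields[0] == "param" and len(fields) >= 3:
            params[fields[1]] = fields[2]
        elif len(fields) >= 2:
            top[fields[0]] = fields[1]
    return top, params


def expected_gsea_files(top, params):
    """Derive the two enrichment files and the ranks file for a GSEA run."""
    simple_class = top["producer_class"].rsplit(".", 1)[-1]  # xtools.gsea.Gsea -> Gsea
    ts = top["producer_timestamp"]
    # The cls value looks like '.../ES_NT.cls#ES24_versus_NT24'.
    pheno1, pheno2 = params["cls"].split("#", 1)[1].split("_versus_")
    folder = os.path.join(params["out"], f"{params['rpt_label']}.{simple_class}.{ts}")
    return {
        "enrichment1": os.path.join(folder, f"gsea_report_for_{pheno1}_{ts}.xls"),
        "enrichment2": os.path.join(folder, f"gsea_report_for_{pheno2}_{ts}.xls"),
        "ranks": os.path.join(folder, f"ranked_gene_list_{pheno1}_versus_{pheno2}_{ts}.xls"),
    }
```

With the example values above, the folder part resolves to WHOLE_PATH_TO_FILE/EM_EstrogenMCF7_TestData/ES24vsNT24.Gsea.1367261057110, the same folder as in the file line.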

EDB File (GSEA file type)

  * NOTE: The gene_sets.gmt file contained in the edb directory is filtered according to the expression file. If you are doing a two-dataset analysis where the expression files are from different platforms or contain different sets of genes, the edb gene_sets.gmt file cannot be used, as genes found in one analysis might be missing from the other. In this case, use the original GMT file (prior to GSEA filtering) and EM will filter the gene sets separately for each dataset.
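
The per-dataset filtering described in the note can be pictured as intersecting each gene set from the original GMT file with the genes present in each expression file. A minimal sketch, assuming the usual GMT layout (set name, description, then member genes, all tab-separated); the function names are hypothetical, and dropping sets that become empty is a choice of this sketch rather than documented Enrichment Map behaviour.

```python
def parse_gmt(lines):
    """Parse GMT lines: name<TAB>description<TAB>gene1<TAB>gene2..."""
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        gene_sets[fields[0]] = {g for g in fields[2:] if g}
    return gene_sets


def filter_for_dataset(gene_sets, dataset_genes):
    """Restrict every gene set to the genes measured in one dataset."""
    measured = set(dataset_genes)
    filtered = {name: genes & measured for name, genes in gene_sets.items()}
    return {name: genes for name, genes in filtered.items() if genes}  # drop emptied sets


gmt = parse_gmt(["SET_A\tdesc\tTP53\tMYC\tESR1", "SET_B\tdesc\tGREB1\tPGR"])
dataset1 = filter_for_dataset(gmt, ["TP53", "ESR1", "GREB1"])
dataset2 = filter_for_dataset(gmt, ["MYC", "PGR"])
```

Each dataset gets its own view of SET_A, which is exactly what a single pre-filtered gene_sets.gmt cannot provide.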

Advanced Settings - Additional Files

Parameters

Node (Gene Set inclusion) Parameters

P-value

FDR Q-value

Edge (Gene Set relationship) Parameters

Jaccard Coefficient

                Jaccard Coefficient = [size of (A intersect B)] / [size of (A union B)]

Overlap Coefficient

                Overlap Coefficient = [size of (A intersect B)] / [minimum(size of A, size of B)]

Combined Coefficient

                Jaccard Coefficient = [size of (A intersect B)] / [size of (A union B)]
                Overlap Coefficient = [size of (A intersect B)] / [minimum(size of A, size of B)]

                Combined Constant = k

                Combined Coefficient = (k * Overlap) + ((1-k) * Jaccard)
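
The three formulas translate directly into set operations. A minimal sketch, assuming gene sets are Python sets of gene identifiers and both sets are non-empty (otherwise the divisions fail); the function names and example genes are illustrative only.

```python
def jaccard(a, b):
    """size of (A intersect B) / size of (A union B)"""
    return len(a & b) / len(a | b)


def overlap(a, b):
    """size of (A intersect B) / minimum(size of A, size of B)"""
    return len(a & b) / min(len(a), len(b))


def combined(a, b, k):
    """(k * Overlap) + ((1 - k) * Jaccard)"""
    return k * overlap(a, b) + (1 - k) * jaccard(a, b)


A = {"TP53", "MYC", "ESR1", "GREB1"}
B = {"ESR1", "GREB1", "PGR"}
print(jaccard(A, B))        # 2 shared genes / 5 genes in the union
print(overlap(A, B))        # 2 shared genes / 3 genes in the smaller set
print(combined(A, B, 0.5))  # halfway between the two
```

Note that the overlap coefficient reaches 1 whenever the smaller set is entirely contained in the larger one, while Jaccard stays below 1 unless the sets are identical.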

Tips on Parameter Choice

P-value and FDR Thresholds

GSEA can be used with two different significance estimation settings: gene-set permutation and phenotype permutation. Gene-set permutation was used for Enrichment Map application examples.

Gene-set Permutation

Here are different sets of thresholds you may consider for gene-set permutation:

For high-quality, high-coverage transcriptomic data, the number of enriched terms at the very conservative threshold is usually 100-250 when using gene-set permutation.

Phenotype Permutation

In general, we recommend using permissive thresholds only if you are having a hard time finding any enriched terms.

Jaccard vs. Overlap Coefficient

Overlap Thresholds

Jaccard Thresholds

Interfaces

The Input Panel

Screenshot EnrichmentMap InputPanel

  1. Analysis Type

    • There are three distinct types of Enrichment Map analysis: GSEA, Generic, and DAVID.
      • GSEA - takes as inputs the output files created in a GSEA analysis. File formats are specific to files created by GSEA. The main difference between this and generic is the number and format of the Enrichment results files. GSEA analysis always has two enrichment results files, one for each of the phenotypes compared.

      • Generic - takes as inputs the same file formats as a GSEA analysis, except that the enrichment results file has a different format and there is only one enrichment file (see Generic results files).

      • DAVID - (implemented in v1.0 and higher) has no GMT or expression file requirement and takes as input an enrichment results file as produced by DAVID (see DAVID Enrichment Result File).

  2. Genesets - path to the GMT file describing the gene sets. The user can browse the hard drive for the file by pressing the ... button.

  3. Dataset 1 - The user can specify expression and enrichment files or, alternatively, an rpt file, which will populate all the fields in the Genesets, Dataset # and Advanced sections.

  4. Advanced - Initially collapsed (expand by clicking on arrow head directly next to Advanced), users have the option of modifying the phenotype labels or loading gene rank files.

  5. Parameters - The user can specify p-value, FDR and overlap/Jaccard cutoffs (see Tips on Parameter Choice).

  6. Actions - The user has three choices, Reset (clears input panel), Close (closes input panel), and Build Enrichment map (takes all parameters in panel and builds an Enrichment map)
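
The Build action combines the parameters on this panel. The following minimal sketch shows one plausible way they fit together, assuming a gene set must pass both the P-value and the FDR Q-value cutoff to become a node, and that an edge is kept when the Jaccard coefficient of two nodes is at or above the similarity cutoff. The default values are the documented property defaults (see Customizing Defaults with Cytoscape Properties); the function name and data layout are hypothetical, and this is not the plugin's implementation.

```python
from itertools import combinations


def build_enrichment_map(results, gene_sets, p_cutoff=0.05, q_cutoff=0.25,
                         jaccard_cutoff=0.25):
    """Return (nodes, edges) for a minimal enrichment map.

    results:   {gene_set_name: (p_value, fdr_q_value)}
    gene_sets: {gene_set_name: set_of_genes}
    """
    # Node inclusion: the gene set must pass both significance cutoffs.
    nodes = sorted(name for name, (p, q) in results.items()
                   if name in gene_sets and p <= p_cutoff and q <= q_cutoff)
    edges = []
    # Edge inclusion: pairwise gene overlap measured with the Jaccard coefficient.
    for a, b in combinations(nodes, 2):
        union = gene_sets[a] | gene_sets[b]
        if not union:
            continue
        similarity = len(gene_sets[a] & gene_sets[b]) / len(union)
        if similarity >= jaccard_cutoff:
            edges.append((a, b, similarity))
    return nodes, edges
```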

The Data Panel

Expression Viewer

Screenshot Heatmap Expression Viewer Panel

Node Attributes

Edge Attributes

The Results Panel

Parameters pane

Screenshot Results Parameter Panel

  1. Phenotype 1
  2. Phenotype 2
  3. P-value Cutoff tuner - allows you to adjust the p-value threshold used to filter the gene sets.

    • By moving the slider to the left you can decrease the p-value threshold causing nodes (and their edges) to be removed from the network.
    • Moving the slider back to the right will restore the nodes (and their edges).
    • You can NOT increase the p-value threshold above what you specified when you built the network.
  4. Q-value Cutoff tuner - allows you to adjust the q-value threshold used to filter the gene sets.

    • By moving the slider to the left you can decrease the q-value threshold causing nodes (and their edges) to be removed from the network.
    • Moving the slider back to the right will restore the nodes (and their edges).
    • You can NOT increase the q-value threshold above what you specified when you built the network.
  5. Similarity Cutoff tuner - allows you to adjust the similarity threshold used to filter the gene set overlaps (edges).

    • By moving the slider to the right you can increase the similarity threshold causing edges to be removed from the network.
    • Moving the slider back to the left will restore the edges.
    • You can NOT decrease the similarity threshold below what you specified when you built the network.
  6. Button to launch index of GSEA results in a web browser.
  7. List of parameters used to create the EM.
  8. Heatmap Autofocus

    • selected by default
    • When you click on any node or edge in the network, EM automatically updates the expression viewer and brings the overlap expression viewer into focus in the Data Panel. When other plugins are used in conjunction with EM, this feature can become cumbersome.

    • To turn this off unselect "heatmap Autofocus".
  9. Default Sorting order - in the expression viewer, genes can be sorted by Hierarchical clustering, Ranks, Columns, or No sort. To set the default, change the selection.

  10. Default Distance Metric - for hierarchical clustering, three distance metrics are available for computing distances between genes. By default this is set to Pearson correlation. Update this parameter if you wish to use one of the other distance metrics.
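
The behaviour of the cutoff tuners (items 3 to 5) can be summarised with two filter functions: the P-value (or Q-value) slider can only hide nodes that passed the cutoff used at build time, and the similarity slider can only hide edges. This is a minimal sketch of the observable behaviour, not the plugin's implementation; the function names, and treating a value exactly at the threshold as passing, are assumptions.

```python
def visible_nodes(node_pvalues, built_cutoff, slider_cutoff):
    """Nodes shown for a given P-value (or Q-value) slider position.

    node_pvalues holds only the nodes that passed built_cutoff when the map
    was built, so the slider can hide nodes but never add new ones.
    """
    effective = min(slider_cutoff, built_cutoff)
    return {name for name, p in node_pvalues.items() if p <= effective}


def visible_edges(edges, nodes, built_cutoff, slider_cutoff):
    """Edges shown for a given similarity slider position.

    An edge disappears if either endpoint is hidden or if its similarity
    falls below the slider; the slider cannot go below built_cutoff.
    """
    effective = max(slider_cutoff, built_cutoff)
    return [(a, b, s) for a, b, s in edges
            if a in nodes and b in nodes and s >= effective]
```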

PostAnalysis Input Panel

To access Post Analysis go to the menu path: Apps > Enrichment Map > Load Post Analysis Panel.

There are currently two types of Post Analysis available: Known Signature and Signature Discovery. The contents of the panel change depending on the type of analysis chosen. Known Signature mode calculates post analysis edges for a small subset of known gene-sets. Signature Discovery mode filters a large set of potential signatures to help uncover the most likely sets.

The result of running Post Analysis is a new node for each signature gene set (yellow triangle) and edges from the signature gene set to each existing gene set when the similarity passes the cutoff test.

A new visual style is also created and applied to the network after post analysis runs. This visual style is the same as the enrichment map style but with a few additions. Signature edges are pink, signature nodes are yellow triangles, and edge width mapping is calculated differently.

Screenshot PostAnalysis InputPanel SignatureHubs

Known Signature

  1. Post Analysis Type

    • Known Signature: Calculates the overlap between gene-sets of the current Enrichment Map and all the gene sets contained in the provided signature file.

  2. Gene Sets

    • SigGMT: The gmt file with the signature-genesets. These will be compared against the gene-sets from the current Enrichment Map.
  3. Edge Weight Parameters: Choose a method for generating an edge between a signature-geneset and an enrichment geneset. Described in detail below.

  4. Actions: Reset (clears input panel), Close (closes input panel), and Run (takes all parameters in panel and performs the Post-Analysis)

Screenshot PostAnalysis InputPanel

Signature Discovery

  1. Post Analysis Type

    • Signature Discovery: Calculates the overlap between gene-sets of the current Enrichment Map and the selected genesets.

  2. Gene-Sets

    • The gmt file with the signature-genesets.
    • Filter: Genesets from the gmt file that do not pass the filter test will not be loaded.
    • Load Gene-Sets: Press after the gmt file and filter have been chosen to load the signature-genesets.
  3. Available Signature Genesets: Once the genesets have been loaded this box will contain a list of all the genesets in the SigGMT file (that passed the filter).

    • To highlight more than one geneset at a time, hold the Shift, Command or Ctrl key while clicking with the mouse.
  4. Selected Signature Genesets: The analysis will be performed with all genesets in this list. Use the down- and up-buttons to move highlighted genesets from one list to the other.

  5. Edge Weight Parameters: Choose a method for generating an edge between a signature-geneset and an enrichment geneset. Described in detail below.

  6. Actions: Reset (clears input panel), Close (closes input panel), and Run (takes all parameters in panel and performs the Post-Analysis)

Screenshot PostAnalysis InputPanel

Edge Weight Parameters

  1. Test: Select the type of statistical test to use for edge width.

  2. Cutoff: Edges with a similarity value lower than the cutoff will not be created.

  3. Data Set: If the enrichment map contains multiple data sets choose the one to use here.

  4. Notes:

    • The results of the calculations will be available in the edge table after post analysis runs.
    • The edge “interaction type” will be sig.

    • The hypergeometric test is always calculated, even if it is not used for the cutoff. The results are made available in the edge table.
  5. Available Tests

    • Hypergeometric Test is the probability (p-value) of finding an overlap of k or more genes between a signature geneset and an enrichment geneset by chance.

      • Formula (Hypergeometric Test):
          $p = \sum_{i=k}^{\min(n,m)} \frac{\binom{m}{i}\binom{N-m}{n-i}}{\binom{N}{n}}$
        with:
        k (successes in the sample) : size of the Overlap,
        n (size of the sample) : size of the Signature geneset
        m (total number of successes) : size of the Enrichment Geneset
        N (total number of elements) : size of the union of all Enrichment Genesets

      • Advanced Hypergeometric Universe: Allows you to choose the value of N (GMT: all the genes in the original gmt file; Expression Set: number of genes in the expression set; Intersection: number of genes in the intersection of the gmt file and expression set; User Defined: manually enter a value).
    • Overlap has at least X genes

      • The number of genes in the overlap between the enrichment map gene set and the signature gene set must be at least X for the edge to be created.
    • Overlap is X percent of EM gs

      • The size of the overlap must be at least X percent of the size of the Enrichment Map gene set.
    • Overlap is X percent of Sig gs

      • The size of the overlap must be at least X percent of the size of the Signature gene set.
    • Mann-Whitney (Two-sided, one-sided greater, one-sided less)

      • Note: The Mann-Whitney test requires ranks. It will not be available if the enrichment map was created without ranks.
      • Calculates the p-value using the Mann-Whitney U test where the first sample is the ranks in the overlap and the second sample is all of the ranks in the expression set.
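
The hypergeometric p-value and the three overlap-based tests can be computed from set sizes alone. A minimal sketch using the symbols defined above (k, n, m, N), with the default universe taken as the union of all enrichment gene sets; the function names are hypothetical, and the Mann-Whitney test is not covered here.

```python
from math import comb


def hypergeometric_pvalue(k, n, m, N):
    """P(X >= k): chance of an overlap of k or more genes.

    k: size of the overlap
    n: size of the signature gene set
    m: size of the enrichment gene set
    N: universe size (by default, the union of all enrichment gene sets)
    """
    if not (0 <= n <= N and 0 <= m <= N):
        raise ValueError("gene set sizes must lie between 0 and N")
    # comb(N - m, n - i) is 0 when n - i > N - m, so impossible terms vanish.
    hits = sum(comb(m, i) * comb(N - m, n - i)
               for i in range(max(k, 0), min(n, m) + 1))
    return hits / comb(N, n)


def default_universe_size(em_gene_sets):
    """Size of the union of all enrichment gene sets (the default N)."""
    return len(set().union(*em_gene_sets))


def overlap_stats(em_gene_set, sig_gene_set):
    """Quantities behind the 'at least X genes' and 'X percent' tests."""
    k = len(em_gene_set & sig_gene_set)
    return {
        "overlap_size": k,
        "percent_of_em_gs": 100.0 * k / len(em_gene_set),
        "percent_of_sig_gs": 100.0 * k / len(sig_gene_set),
    }
```

For example, with N = 10 and two gene sets of 5 genes each, observing all 5 genes in the overlap has probability 1/252 under this model.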

Screenshot PostAnalysis InputPanel

Edge Width

Additional Features

Launch Enrichment Map from the command line

  enrichmentmap build: Build an enrichment map from GSEA results (in an edb directory)
    Arguments:
      [edbdir=value] --> REQUIRED
      [expressionfile=value] --> OPTIONAL
      [overlap=value] --> OPTIONAL
      [pvalue=value] --> OPTIONAL
      [qvalue=value] --> OPTIONAL
      [similaritymetric=value] --> OPTIONAL
      [combinedconstant=value] --> OPTIONAL

enrichmentmap build edbdir="{path_to_edb_directory}" pvalue=0.01 qvalue=0.1 overlap=0.5 similaritymetric="jaccard" expressionfile="{path_to_expression_file}"
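
If you script many builds, the command string can be assembled from the arguments listed above. A minimal sketch, assuming the quoting convention of the example (string values in double quotes, numbers as-is); the function name is hypothetical, the sketch does not validate values such as similaritymetric, and how the string is then passed to Cytoscape is outside its scope.

```python
def build_em_command(edbdir, expressionfile=None, pvalue=None, qvalue=None,
                     overlap=None, similaritymetric=None, combinedconstant=None):
    """Assemble an 'enrichmentmap build' command string.

    Only edbdir is required; optional arguments left as None are omitted.
    String values are wrapped in double quotes, numbers are written as-is.
    """
    if not edbdir:
        raise ValueError("edbdir is required")
    args = [("edbdir", edbdir), ("pvalue", pvalue), ("qvalue", qvalue),
            ("overlap", overlap), ("similaritymetric", similaritymetric),
            ("combinedconstant", combinedconstant),
            ("expressionfile", expressionfile)]
    parts = ["enrichmentmap build"]
    for name, value in args:
        if value is None:
            continue
        text = f'"{value}"' if isinstance(value, str) else str(value)
        parts.append(f"{name}={text}")
    return " ".join(parts)


cmd = build_em_command("/data/edb", pvalue=0.01, qvalue=0.1, overlap=0.5,
                       similaritymetric="jaccard", expressionfile="/data/expr.txt")
```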

Distinct Species or Platform Analysis

Bulk Enrichment Map Build

Calculate Gene set relationships

GSEA Leading Edge Functionality

"the subset of members that contribute most to the ES. For a positive ES, the leading edge subset is the set of members that appear in the ranked list prior to the peak score. For a negative ES, it is the set of members that appear subsequent to the peak score."
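
To make the definition concrete, here is a minimal sketch that computes a GSEA-style running enrichment score and extracts the leading-edge subset. It assumes the standard weighted running sum (hits weighted by |score| to the power 1, corresponding to GSEA's "weighted" scoring scheme, as used in the example .rpt above) and a ranked list already sorted from the most positive to the most negative score. It is not Enrichment Map's or GSEA's own code, and the function names are hypothetical.

```python
def running_enrichment_score(ranked, gene_set, weight=1.0):
    """Running-sum statistic over a ranked list of (gene, score) pairs.

    Hits step up by |score|**weight / N_R; misses step down by 1 / (N - N_H).
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_hits = sum(hits)
    if n_hits == 0 or n_hits == len(ranked):
        raise ValueError("gene set must contain some, but not all, ranked genes")
    n_r = sum(abs(score) ** weight for (_, score), hit in zip(ranked, hits) if hit)
    miss_step = 1.0 / (len(ranked) - n_hits)
    running, total = [], 0.0
    for (_, score), hit in zip(ranked, hits):
        total += abs(score) ** weight / n_r if hit else -miss_step
        running.append(total)
    return running


def leading_edge(ranked, gene_set, weight=1.0):
    """Return (ES, leading-edge genes) following the definition quoted above."""
    running = running_enrichment_score(ranked, gene_set, weight)
    peak = max(range(len(running)), key=lambda i: abs(running[i]))
    es = running[peak]
    if es >= 0:  # positive ES: members up to and including the peak
        members = [g for i, (g, _) in enumerate(ranked) if g in gene_set and i <= peak]
    else:        # negative ES: members after the peak
        members = [g for i, (g, _) in enumerate(ranked) if g in gene_set and i > peak]
    return es, members
```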

GSEA EM leading edge

  1. To access GSEA leading edge information click on an individual Node. Leading edge information is currently only available when looking at a single gene set.
  2. In the Data Panel the expression profile for the selected gene set should appear in the EM GenesetExpression viewer tab.

  3. Change the Normalization to your desired metric.
  4. Change the Sorting method to GSEARanking.

  5. Genes that are part of the leading edge are highlighted in yellow.

Customizing Defaults with Cytoscape Properties

The Enrichment Map Plugin evaluates a number of Cytoscape Properties with which a user can define some customized default values.
These can be added and changed with the Cytoscape Preferences Editor (Edit / Preferences / Properties...) or by directly editing the file cytoscape.props within the .cytoscape folder in the User's HOME directory.

Supported Cytoscape Properties are:

EnrichmentMap.default_pvalue
Default P-value cutoff for building Enrichment Maps.
Default value: 0.05
Valid values: float > 0.0 and < 1.0

EnrichmentMap.default_qvalue
Default Q-value cutoff for building Enrichment Maps.
Default value: 0.25
Valid values: float > 0.0 and < 1.0

EnrichmentMap.default_overlap
Default Overlap coefficient cutoff for building Enrichment Maps.
Default value: 0.50
Valid values: float > 0.0 and < 1.0

EnrichmentMap.default_jaccard
Default Jaccard coefficient cutoff for building Enrichment Maps.
Default value: 0.25
Valid values: float > 0.0 and < 1.0

EnrichmentMap.default_overlap_metric
Default choice of similarity metric for building Enrichment Maps.
Default value: Jaccard
Valid values: Jaccard, Overlap

EnrichmentMap.default_sort_method
Default sorting in the legend/parameters panel: Hierarchical Clustering, Ranks (defaults to the first rank file; if there are no ranks, no sorting is applied), Columns (defaults to the first column) or No Sort.
Default value: Hierarchical Cluster
Valid values: Hierarchical Cluster, Ranks, Columns, No Sort

EnrichmentMap.hieracical_clusteting_theshold
Maximum number of genes that are clustered without confirmation; above this threshold, a dialog asks whether clustering should be performed.
Default value: 1000
Valid values: Integer

nodelinkouturl.MSigDb.GSEA Gene sets
LinkOut URL for MSigDb GSEA gene sets.
Default value: http://www.broad.mit.edu/gsea/msigdb/cards/%ID%.html
Valid values: URL

EnrichmentMap.disable_heatmap_autofocus
Flag to override the automatic focus on the heatmap when a node or edge is selected.
Default value: FALSE
Valid values: TRUE, FALSE
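
If you generate or check cytoscape.props from a script, the cutoff and metric properties above can be read with a few lines of code. A minimal sketch, assuming simple key=value lines (real Java properties files also allow ':' separators, escapes and continuation lines, which this reader ignores); falling back to the documented default when a value is missing or outside its valid range is a choice of this sketch, not documented plugin behaviour, and the function names are hypothetical.

```python
# Documented defaults for the cutoff properties listed above.
CUTOFF_DEFAULTS = {
    "EnrichmentMap.default_pvalue": 0.05,
    "EnrichmentMap.default_qvalue": 0.25,
    "EnrichmentMap.default_overlap": 0.50,
    "EnrichmentMap.default_jaccard": 0.25,
}
METRIC_KEY = "EnrichmentMap.default_overlap_metric"
VALID_METRICS = {"Jaccard", "Overlap"}


def parse_props(text):
    """Very small key=value reader (no escapes, ':' separators or continuations)."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def em_defaults(text):
    """Return EM cutoffs and metric, falling back to the documented defaults."""
    props = parse_props(text)
    result = {}
    for key, default in CUTOFF_DEFAULTS.items():
        try:
            value = float(props.get(key, default))
        except ValueError:
            value = default
        # Valid cutoffs are floats strictly between 0.0 and 1.0.
        result[key] = value if 0.0 < value < 1.0 else default
    metric = props.get(METRIC_KEY, "Jaccard")
    result[METRIC_KEY] = metric if metric in VALID_METRICS else "Jaccard"
    return result
```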

FAQ

Software/EnrichmentMap/UserManual (last edited 2017-10-23 18:21:13 by RuthIsserlin)
