#acl All:read
[[TableOfContents(5)]]

= User Manual for the Enrichment Map Cytoscape Plugin =

== Overview ==

The Enrichment Map Cytoscape Plugin lets you visualize the results of ''gene-set enrichment'' as a network. Nodes represent gene-sets and edges represent mutual overlap; in this way, highly redundant gene-sets are grouped together as clusters, which makes the results much easier to navigate and interpret.

=== Gene Set Enrichment ===

Gene-set enrichment is a data analysis technique that takes as input

    * a (ranked) gene list, from a genomic experiment
    * gene-sets, grouping genes on the basis of a-priori knowledge (e.g. Gene Ontology) or experimental data (e.g. co-expression modules) 

and produces as output the list of enriched gene-sets, i.e. those that best summarize the gene list.
It is common to refer to gene-set enrichment as ''functional enrichment'' because functional categories (e.g. Gene Ontology) are commonly used as gene-sets.

== Installation ==

The Enrichment Map Plugin requires Cytoscape version 2.6.x. If you don't have Cytoscape, or if you have an older version (2.5 or older), please download the latest release from [http://www.cytoscape.org/] and install it on your computer.

    * Download the Enrichment Map plugin from [Software/EnrichmentMaps/EnrichmentMap.jar] and manually place the file `EnrichmentMap.jar` in the `Cytoscape/plugins` folder. 

== Quick Start Guide ==

=== Creating an Enrichment Map ===

You have two main options:

    * Load GSEA Results
    * Load Generic Results 

In either case, to use the plugin you will need the following files:

    * file.gmt: gene-set to gene ID
    * file.txt or .gct: expression matrix
    * file.txt or .xls (*): enrichment table(s) 

~-(*) GSEA saves the enrichment table as a .xls file; however, these are not true Excel files but tab-separated text files with a modified extension. Enrichment Map does not work with "true" Excel .xls files.-~
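
As a rough way to tell the two kinds of `.xls` file apart before loading, the sketch below inspects the first bytes of a file. It is only an illustration and not part of the plugin: the function names are hypothetical, and the check relies on the well-known signatures of binary Excel workbooks (OLE2 compound files) and of ZIP-based `.xlsx` files.

```python
# Leading bytes of binary spreadsheet containers.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx (ZIP container)


def looks_like_tab_separated(head: bytes) -> bool:
    """Return True if the first bytes of a file look like tab-separated text."""
    if head.startswith(OLE2_MAGIC) or head.startswith(ZIP_MAGIC):
        return False  # a "true" Excel workbook
    first_line = head.split(b"\n", 1)[0]
    return b"\t" in first_line


def check_file(path: str) -> bool:
    """Read the start of the file at `path` and apply the check above."""
    with open(path, "rb") as handle:
        return looks_like_tab_separated(handle.read(4096))
```

A file for which `check_file` returns `False` has probably been re-saved from a spreadsheet program and would need to be exported again as tab-delimited text.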

If your enrichment results were generated from GSEA, you will just have to pick the right files from your results folder. If you have generated the enrichment results using another method, you will have to go to the Full User Guide, File Format section, and make sure that the file format complies with Enrichment Map requirements.

You can use the parameter defaults. For a more careful choice of the parameter settings, please go to the Full User Guide, Tips on Parameter Choice.

=== Graphical Mapping of Enrichment ===

    * Nodes represent gene-sets.
    * Edges represent mutual overlap.
    * Enrichment significance (p-value) is conveyed as node color intensity.
    * The enriched phenotype is conveyed by node color hue.[[BR]]
      ~-Note: In standard two-class designs, where two phenotypes are compared (e.g. treated vs untreated), the color hue conveys the enriched phenotype. This is equivalent to mapping enrichment in up- and down-regulated genes if one of the two phenotypes is taken as the reference (e.g. untreated) and the other is the phenotype of interest: enrichment in the phenotype of interest means up, and enrichment in the reference phenotype means down.-~
    * Node size represents how many genes are in the gene-set. 

=== Exploring the Enrichment Map ===

    * The “Parameters” tab in the “Results Panel” on the right side of the window contains a legend mapping the colours to the phenotypes and displaying the parameters used to create the map (cut-off values and data files).
    * The “Network” tab in the “Control Panel” on the left lists all available networks in the current session. At the bottom it shows an overview of the current network; dragging the blue rectangle (the current view) over this overview lets you navigate the network easily, even at higher zoom levels.
    * Clicking on a node (the circle that represents a gene set) will open the “EM Geneset Expression Viewer” tab in the “Data Panel” showing a heatmap of the expression values of all genes in the selected gene set.
    * Clicking on an edge (the line between two nodes) will open the “EM Overlap Expression Viewer” tab in the “Data Panel” showing a heatmap of the expression values of all genes that the two gene sets connected by this edge have in common.
    * If several nodes and edges are selected (e.g. by dragging a selection box around the desired gene sets) the “EM Geneset Expression Viewer” will show the union of all genes in the selected gene sets and the “EM Overlap Expression Viewer” will show only those genes that all selected gene sets have in common.
    * The “Geneset Summary” tab in the “Results Panel” on the right contains information about which nodes and edges are selected. 

=== Advanced tips ===

    * With large networks and low zoom levels, Cytoscape automatically reduces the level of detail (for example, by hiding node labels and node borders). To override this mechanism, click on “View / Show Graphics Details”.
    * The VizMapper and the Node- and Edge Attribute Browser open up a lot more visualization options like linking the label size to Enrichment Scores or p-values. Refer to the Cytoscape manual at www.cytoscape.org for more information.
    * If you have used gene sets from GSEA's MSigDB, you can access additional information for each gene set by adding a new property: [[BR]]
      ''(Edit / Preferences / Properties... / Add -> enter property name: nodelinkouturl.MSigDb -> enter property value: `http://www.broad.mit.edu/gsea/msigdb/cards/%ID%.html` -> [ (./) ] Make Current Cytoscape Properties default -> (OK) )''.
      Now you can right-click on a node and choose LinkOut/MSigDb to open the database entry of the gene set represented by that node in your browser. 

== Full User Guide ==

=== File Formats ===

==== Gene sets file (GMT file) ====

    * The gene set file describes the genesets used for the analysis. These files can be obtained
          * directly downloading gene-sets collected in the [http://www.broad.mit.edu/gsea/msigdb/index.jsp MSigDB][[BR]]
            Note: if you use MSigDB Gene Ontology gene-sets, please consider that they do not include all annotations, as an evidence code filter is applied; if you are interested in achieving maximum coverage, download the original annotations.
          * converting gene annotations / pathways from public databases[[BR]]
            ~-Note: if you are an R user, [http://www.bioconductor.org/ Bioconductor] offers annotation packages such as `GO.db`, `org.Hs.eg.db`, `KEGG.db`-~
    * Each row of the geneset file represents one geneset and consists of:
          * geneset name (--tab--) description (--tab--) a list of tab-delimited genes that are part of that geneset. 
    * The geneset names must be unique. 
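
The row layout described above can be read with a few lines of Python. The following is a minimal sketch rather than the plugin's own parser; the function name `parse_gmt` and the placeholder gene-set and gene names are made up for illustration.

```python
from typing import Dict, Iterable, Set, Tuple


def parse_gmt(lines: Iterable[str]) -> Dict[str, Tuple[str, Set[str]]]:
    """Parse GMT rows: name <tab> description <tab> gene <tab> gene ..."""
    gene_sets: Dict[str, Tuple[str, Set[str]]] = {}
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(
                f"line {number}: expected name, description and at least one gene"
            )
        name, description = fields[0], fields[1]
        if name in gene_sets:
            # geneset names must be unique
            raise ValueError(f"line {number}: duplicate gene-set name {name!r}")
        genes = {gene for gene in fields[2:] if gene}
        gene_sets[name] = (description, genes)
    return gene_sets


# Placeholder rows, for illustration only.
example = [
    "SET_A\tfirst example set\tGENE_1\tGENE_2\tGENE_3\n",
    "SET_B\tsecond example set\tGENE_2\tGENE_4\n",
]
gene_sets = parse_gmt(example)
print(sorted(gene_sets["SET_A"][1]))
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly.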

==== Expression Data file (GCT, or TXT file) ====

    * The expression data can be loaded in two different formats: gct or txt.
          * Gct differs from txt only because two additional lines are required at the top part of the file. 
    * Each line of the expression file contains:
          * name (tab) description (tab) followed by a list of tab delimited expression values. 
    * The first line in the txt file and the third line in the gct file consist of column headings.
    * The GCT file contains two additional lines at the top of the file.
          * The first line contains #1.2.
          * The second line contains the number of data rows (tab) the number of data columns. 
    * If the GCT file contains probe set IDs as primary keys (e.g. because you had GSEA collapse your data file to gene symbols), you need to convert the gct file to use the same primary key as the gene sets file (GMT file). Until this feature is implemented in the [:Software/EnrichmentMaps: EnrichmentMapPlugin], this can be done with the Python script attachment:replace_probeSetIDs.py using the Chip platform file that was used by GSEA.

{{{
 $ replace_probeSetIDs.py --help
 Usage: replace_probeSetIDs.py [options] -i input.gct -o output.gct -c platform.chip

 Options:
   --version             show program's version number and exit
   -h, --help            show this help message and exit
   -i FILE, --input=FILE
                         input .gct file
   -o FILE, --output=FILE
                         output .gct file
   -c FILE, --chip=FILE  Chip File}}}
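
Independently of that script, the GCT and TXT layouts described in this section can be checked with a short reader such as the sketch below. It assumes only what is stated above (a `#1.2` line and a row/column count line for GCT, then a heading line, then name, description and values); the function name `read_expression` and the placeholder data are hypothetical.

```python
from typing import Dict, List, Tuple


def read_expression(lines: List[str]) -> Tuple[List[str], Dict[str, List[float]]]:
    """Return (sample names, {row name: values}) from GCT or TXT lines."""
    rows = [line.rstrip("\r\n") for line in lines if line.strip()]
    declared = None
    if rows and rows[0].startswith("#1.2"):
        # GCT only: version line, then "<data rows> <tab> <data columns>"
        n_rows, n_cols = (int(x) for x in rows[1].split("\t")[:2])
        declared = (n_rows, n_cols)
        rows = rows[2:]
    header = rows[0].split("\t")
    samples = header[2:]  # the first two columns are name and description
    data: Dict[str, List[float]] = {}
    for row in rows[1:]:
        fields = row.split("\t")
        data[fields[0]] = [float(value) for value in fields[2:]]
    found = (len(data), len(samples))
    if declared is not None and declared != found:
        raise ValueError(f"GCT header declares {declared}, found {found}")
    return samples, data


# Placeholder GCT content, for illustration only.
gct_lines = [
    "#1.2\n",
    "2\t3\n",
    "NAME\tDESCRIPTION\tS1\tS2\tS3\n",
    "GENE_A\tna\t1.0\t2.5\t3.0\n",
    "GENE_B\tna\t0.5\t0.1\t4.2\n",
]
samples, data = read_expression(gct_lines)
print(samples, data["GENE_A"])
```

Dropping the first two lines of `gct_lines` gives the equivalent TXT input, which the same function reads without the count check.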

==== Enrichment Results files ====

===== GSEA result files =====

    * For each analysis, GSEA produces two output files: one representing the gene sets enriched in phenotype A and the other representing the gene sets enriched in phenotype B.
    * These files are usually named "gsea_report_for_phenotypeA.Gsea.########.xls" and "gsea_report_for_phenotypeB.Gsea.########.xls".
    * The files should be loaded as is and require no pre-processing. 

===== Generic results files =====

    * The generic results file is a tab delimited file with enriched terms and their corresponding p-values (and, optionally, FDR corrections).
    * Each row of the enrichments file needs:
          * a term (must match the name in the gmt file),
          * a description (can be empty, but the 2nd column is always assumed to be the description),
          * a p-value,
          * an FDR correction value (optional). 
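
Results produced by another method can be checked against this layout before loading. The sketch below is one way to do so and is not the plugin's own code: the names `Enrichment` and `parse_generic_results` are hypothetical, and it assumes the input contains data rows only, since this section does not say whether a header row is expected.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class Enrichment(NamedTuple):
    term: str
    description: str
    p_value: float
    fdr: Optional[float]  # None when the optional 4th column is absent


def parse_generic_results(lines: Iterable[str], gmt_names: Set[str]) -> List[Enrichment]:
    """Validate rows of: term <tab> description <tab> p-value [<tab> FDR]."""
    results: List[Enrichment] = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {number}: need term, description and p-value")
        term, description = fields[0], fields[1]
        if term not in gmt_names:
            raise ValueError(f"line {number}: term {term!r} is not in the GMT file")
        p_value = float(fields[2])
        fdr = float(fields[3]) if len(fields) > 3 and fields[3] else None
        results.append(Enrichment(term, description, p_value, fdr))
    return results


# Placeholder rows, for illustration only.
rows = ["SET_A\tfirst example set\t0.001\t0.02\n", "SET_B\t\t0.04\n"]
parsed = parse_generic_results(rows, {"SET_A", "SET_B"})
print(parsed)
```

The second placeholder row shows an empty description and a missing FDR value, both of which the layout above allows.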

==== Additional Information on GSEA File Formats ====

Additional information on GSEA file formats can be found [http://www.broad.mit.edu/cancer/software/gsea/wiki/index.php/Data_formats here].

=== The 'Load GSEA Results' Dialog ===

=== The 'Load Generic Results' Dialog ===

=== Tips on Parameter Choice ===

==== P-value and FDR Thresholds ====
Here are different sets of thresholds you may consider for GSEA:
    * Very permissive:
          * p-value < 0.05
          * FDR < 0.25 
    * Moderately permissive:
          * p-value < 0.01
          * FDR < 0.1 
    * Moderately conservative:
          * p-value < 0.005
          * FDR < 0.075 
    * Conservative:
          * p-value < 0.001
          * FDR < 0.05 

We recommend using permissive thresholds only if you're having a hard time finding any enriched terms. For high-quality, high-coverage transcriptomic data, the number of enriched terms at the most conservative threshold is usually 100-250.
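
To see how many terms survive each of the threshold sets listed above, the results table can be filtered outside the plugin. The sketch below assumes that a term must pass both the p-value and the FDR cut-off; that combination rule, the names `THRESHOLDS` and `count_enriched`, and the placeholder values are assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple

# (maximum p-value, maximum FDR) for each threshold set listed above.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "very permissive": (0.05, 0.25),
    "moderately permissive": (0.01, 0.1),
    "moderately conservative": (0.005, 0.075),
    "conservative": (0.001, 0.05),
}


def count_enriched(results: Iterable[Tuple[str, float, float]], preset: str) -> int:
    """Count (term, p-value, FDR) rows that pass both cut-offs of a preset."""
    max_p, max_fdr = THRESHOLDS[preset]
    return sum(1 for _, p_value, fdr in results if p_value < max_p and fdr < max_fdr)


# Placeholder results, for illustration only.
results = [
    ("SET_A", 0.0005, 0.01),
    ("SET_B", 0.004, 0.06),
    ("SET_C", 0.03, 0.2),
    ("SET_D", 0.2, 0.5),
]
for preset in THRESHOLDS:
    print(preset, count_enriched(results, preset))
```

Comparing the counts across presets gives a quick indication of whether a more permissive setting is actually needed.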

==== Jaccard vs. Overlap Coefficient ====
    * The Overlap Coefficient is recommended when relations are expected to occur between large-size and small-size gene-sets, as in the case of the Gene Ontology.
    * The Jaccard Coefficient is recommended in the opposite case.
    * When the gene-sets are about the same size, the Jaccard Coefficient is about half of the Overlap Coefficient for gene-set pairs with a small intersection, whereas it is about the same as the Overlap Coefficient for gene-set pairs with a large intersection.
    * When the Overlap Coefficient generates a map with several large gene-sets overly connected to many other gene-sets, we recommend switching to the Jaccard Coefficient. 
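
The behaviour described in these points can be reproduced with the standard definitions of the two coefficients: Jaccard is the size of the intersection divided by the size of the union, and Overlap is the size of the intersection divided by the size of the smaller set. The sketch below assumes the plugin uses these standard definitions; the function names and the integer "genes" are placeholders.

```python
from typing import Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """|A and B| / |A or B|"""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap(a: Set[int], b: Set[int]) -> float:
    """|A and B| / min(|A|, |B|)"""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


base = set(range(100))
small_shared = set(range(90, 190))  # same size, 10 genes shared
large_shared = set(range(10, 110))  # same size, 90 genes shared
nested_small = set(range(20))       # small set fully contained in a large one
nested_large = set(range(200))

for name, (a, b) in {
    "equal size, small intersection": (base, small_shared),
    "equal size, large intersection": (base, large_shared),
    "small set inside large set": (nested_small, nested_large),
}.items():
    print(f"{name}: overlap={overlap(a, b):.3f} jaccard={jaccard(a, b):.3f}")
```

For the equal-size pairs this gives an Overlap of 0.1 against a Jaccard of about 0.053, and 0.9 against about 0.818. For the nested pair the Overlap is 1.0 while Jaccard is only 0.1, which is why the Overlap Coefficient connects small gene-sets to the large gene-sets that contain them.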

==== Overlap Thresholds ====
    * 0.5 is moderately conservative, and is recommended for most analyses.
    * 0.3 is permissive, and might result in a messier map. 

==== Jaccard Thresholds ====
    * 0.5 is very conservative
    * 0.25 is moderately conservative 

=== The Data Panel ===
 * ''under construction''

=== EM Overlap Expression viewer ===
 * ''under construction''

=== EM Geneset Expression viewer ===
 * ''under construction''

=== Node Attributes ===
 * ''under construction''

=== Edge Attributes ===
 * ''under construction''

=== The Results Panel ===
 * ''under construction''

=== Parameters pane ===
 * ''under construction''

=== Geneset Summary panel ===
    * ''will be removed in the next version''