
ExpressionCorrelation Documentation

ExpressionCorrelation is a plug-in built for [http://cytoscape.org Cytoscape] that computes a similarity network from either the genes or conditions in an expression matrix.


1. Introduction

The ExpressionCorrelation plugin computes a similarity network from either the genes or the conditions in an expression matrix. Nodes in a similarity network represent genes or conditions. In a gene correlation network, a link represents the similarity between the expression vectors of two genes across all given conditions; in a condition correlation network, a link represents the similarity between the expression vectors of two conditions across all genes. The plugin lets the user select an expression matrix of microarray data directly from Cytoscape and convert it into an interaction network that can be viewed in Cytoscape. Similarity is computed using the Pearson correlation coefficient. A histogram tool helps the user choose a similarity threshold so that the resulting network has a reasonable size. No statistical significance measure is currently computed for the similarity links.
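The Pearson correlation coefficient between two expression vectors can be sketched as follows. This is a minimal illustration, not the plugin's source code; the class name `PearsonSketch` and its methods are hypothetical. It assumes a matrix with genes as rows and conditions as columns, so a gene network compares rows, and a condition network compares columns (rows of the transposed matrix).

```java
import java.util.Arrays;

// Hypothetical sketch of Pearson correlation; not the plugin's actual code.
final class PearsonSketch {

    /** Pearson r of x and y; both arrays must have the same length >= 2. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("vectors must have equal length >= 2");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        // A constant vector has zero variance, so r is undefined: report NaN.
        if (sxx == 0 || syy == 0) {
            return Double.NaN;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Transposes genes x conditions into conditions x genes (for a condition network). */
    static double[][] transpose(double[][] m) {
        double[][] t = new double[m[0].length][m.length];
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[0].length; j++) {
                t[j][i] = m[i][j];
            }
        }
        return t;
    }

    public static void main(String[] args) {
        // Rows are genes, columns are conditions (toy values).
        double[][] expr = {
            {1.0, 2.0, 3.0, 4.0},
            {2.0, 4.0, 6.0, 8.0},
            {4.0, 3.0, 2.0, 1.0},
        };
        // Gene correlation: compare rows.
        System.out.println(pearson(expr[0], expr[1])); // 1.0
        System.out.println(pearson(expr[0], expr[2])); // -1.0
        // Condition correlation: compare columns, i.e. rows of the transpose.
        double[][] byCondition = transpose(expr);
        System.out.println(Arrays.toString(byCondition[0])); // [1.0, 2.0, 4.0]
    }
}
```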

2. About

ExpressionCorrelation is freely available, open-source molecular profile visualization software. Gene expression data loaded into Cytoscape can be used to create a gene or a condition correlation matrix. Any correlation above the high threshold or below the low threshold is displayed in Cytoscape as an 'edge' between two 'nodes' (the two genes or conditions that are correlated). Because a correlation matrix over n genes has n(n-1)/2 distinct pairs, it can be very large and often cannot be stored in memory, so the plugin saves only the relevant correlations as they are calculated. Calculation of the correlation matrix itself is relatively fast.
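The idea of keeping only the relevant correlations, rather than storing the whole matrix, can be sketched as below. This is an illustrative sketch under stated assumptions, not the plugin's implementation: the names `ThresholdedEdges`, `Edge`, and `correlate` are hypothetical, and treating the cutoffs as inclusive (`r <= low || r >= high`) is an assumption. Each pair of rows is visited once and only edges that pass a cutoff are kept, so memory grows with the number of retained edges instead of with n(n-1)/2.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not the plugin's source code.
final class ThresholdedEdges {

    /** A retained correlation: an undirected edge between rows i and j. */
    static final class Edge {
        final int i;
        final int j;
        final double r;

        Edge(int i, int j, double r) {
            this.i = i;
            this.j = j;
            this.r = r;
        }

        @Override
        public String toString() {
            return i + "-" + j + " (r=" + r + ")";
        }
    }

    /** Pearson correlation; NaN when either vector is constant. */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int k = 0; k < n; k++) {
            mx += x[k];
            my += y[k];
        }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int k = 0; k < n; k++) {
            double dx = x[k] - mx, dy = y[k] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return (sxx == 0 || syy == 0) ? Double.NaN : sxy / Math.sqrt(sxx * syy);
    }

    /**
     * Visits each unordered pair of rows once and keeps only correlations at or
     * below low, or at or above high. The full matrix is never stored.
     * NaN correlations fail both comparisons and are therefore dropped.
     */
    static List<Edge> correlate(double[][] rows, double low, double high) {
        List<Edge> edges = new ArrayList<>();
        for (int i = 0; i < rows.length; i++) {
            for (int j = i + 1; j < rows.length; j++) {
                double r = pearson(rows[i], rows[j]);
                if (r <= low || r >= high) {
                    edges.add(new Edge(i, j, r));
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        double[][] genes = {
            {1, 2, 3, 4},
            {2, 4, 6, 8},
            {4, 3, 2, 1},
            {1, 3, 2, 4},
        };
        // Default cutoffs from this documentation: -0.95 and 0.95.
        for (Edge e : correlate(genes, -0.95, 0.95)) {
            System.out.println(e);
        }
    }
}
```

In this toy input, the fourth gene correlates with the others at r = ±0.8, so none of its pairs pass the default cutoffs and only the three pairs among the first three genes are kept.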

One problem with this approach is that the cutoffs cannot be loosened without recalculating the entire correlation matrix, because correlations that did not pass the original cutoffs were never stored. The cutoffs could in principle be tightened, but no method for doing so is implemented here; instead, Cytoscape can be set up not to display edges with weaker correlations. A second problem is that Cytoscape begins to have trouble displaying networks with more than several tens of thousands of edges. Good cutoff values must therefore be chosen before the network is created: they should display as much of the network as possible without exhausting memory or producing a cluttered network.
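Why tightening is possible but loosening is not can be shown with a small sketch. The plugin does not implement this; the class `RaiseCutoffs` and the method `tighten` are hypothetical. It assumes the previously retained correlation values are available as an array and that cutoffs are inclusive. Tightening simply re-filters the stored values, while loosening is rejected because the values between the old cutoffs no longer exist.

```java
import java.util.Arrays;

// Hypothetical sketch: the plugin does not provide this operation.
final class RaiseCutoffs {

    /**
     * Re-filters correlations already retained under (oldLow, oldHigh) against
     * stricter cutoffs (newLow <= oldLow, newHigh >= oldHigh). Loosening is
     * rejected: correlations between the old cutoffs were never stored, so
     * they can only be recovered by recalculating the matrix.
     */
    static double[] tighten(double[] retained, double oldLow, double oldHigh,
                            double newLow, double newHigh) {
        if (newLow > oldLow || newHigh < oldHigh) {
            throw new IllegalArgumentException(
                "cutoffs can only be tightened; recalculate the matrix to loosen them");
        }
        return Arrays.stream(retained)
                .filter(r -> r <= newLow || r >= newHigh)
                .toArray();
    }

    public static void main(String[] args) {
        // Correlations kept by an earlier run with cutoffs -0.9 and 0.9.
        double[] retained = {-0.99, -0.92, 0.91, 0.97};
        double[] stricter = tighten(retained, -0.9, 0.9, -0.95, 0.95);
        System.out.println(Arrays.toString(stricter)); // [-0.99, 0.97]
    }
}
```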

To help users choose good cutoff values, we added a histogram feature, which shows the number of edges associated with particular cutoff values, and vice versa. Viewing the histogram requires calculating the correlation matrix, so the matrix is calculated twice (once for the histogram and once for network creation), which can make the entire process take up to twice as long. The full doubling occurs when the matrix calculation is the rate-limiting step, which is usually the case for networks with a few thousand edges or fewer. However, once more than a few thousand edges are created, node and edge creation quickly becomes the rate-limiting step (in this case the process could take 100 times longer rather than just twice as long).
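A correlation histogram that can be read in both directions (cutoff to number of edges, and number of edges to cutoff) might look like the sketch below. The bin layout (equal-width bins spanning [-1, 1]) and the names `CorrelationHistogram`, `bin`, `histogram`, `lowerEdge`, and `edgesFromBin` are assumptions for illustration, not the plugin's design. Only the high cutoff is shown; the low side works the same way from the bottom bins upward.

```java
// Hypothetical sketch of a correlation histogram; bin layout is an assumption.
final class CorrelationHistogram {

    /** Bin index of r among numBins equal-width bins spanning [-1, 1]. */
    static int bin(double r, int numBins) {
        int b = (int) ((r + 1.0) / 2.0 * numBins);
        // Clamp so r == 1.0 (and tiny rounding overshoots) land in the end bins.
        return Math.max(0, Math.min(numBins - 1, b));
    }

    /** Counts correlations per bin, skipping undefined (NaN) values. */
    static int[] histogram(double[] correlations, int numBins) {
        int[] counts = new int[numBins];
        for (double r : correlations) {
            if (!Double.isNaN(r)) {
                counts[bin(r, numBins)]++;
            }
        }
        return counts;
    }

    /** Lower edge of bin b, i.e. the high cutoff that bin b represents. */
    static double lowerEdge(int b, int numBins) {
        return -1.0 + 2.0 * b / numBins;
    }

    /** Edges a high cutoff at lowerEdge(firstBin) would yield: bins firstBin and above. */
    static int edgesFromBin(int[] counts, int firstBin) {
        int total = 0;
        for (int b = firstBin; b < counts.length; b++) {
            total += counts[b];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] correlations = {-0.97, -0.5, 0.13, 0.55, 0.92, 0.96, 0.99};
        int numBins = 20; // bins of width 0.1
        int[] counts = histogram(correlations, numBins);
        // Read the table in either direction: cutoff -> edges, or edges -> cutoff.
        for (int b = numBins - 1; b >= numBins - 5; b--) {
            System.out.printf("high cutoff %.2f -> %d edges%n",
                    lowerEdge(b, numBins), edgesFromBin(counts, b));
        }
    }
}
```

Once the histogram exists, choosing a cutoff costs only a sum over bins; the expensive part is computing the correlations that fill it, which is why the matrix must be calculated once for the histogram and again for the network.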

If the user does not know the distribution of the correlation values, it is recommended to use the histogram to limit the network to a few thousand edges.

When using the histogram to limit network size, note that, depending on the cutoffs specified, the number of edges in the generated network may differ from the number of interactions reported in the histogram. This is because network edges are computed exactly, while the histogram's interaction count is based on the counts in each bin (corresponding to the cutoffs). Since each bin counts all correlations falling within a range of values, the bin that contains a cutoff can include correlations that do not actually pass the cutoff, which leads to the discrepancy.
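The discrepancy can be reproduced in a few lines. This sketch assumes 20 equal-width bins over [-1, 1] and that the histogram count includes the whole bin containing the cutoff; the names `HistogramDiscrepancy`, `exactCount`, and `binnedCount` are hypothetical. A cutoff of 0.95 falls inside the [0.9, 1.0) bin, so the bin-based count also includes 0.91 and 0.93.

```java
// Hypothetical sketch of exact vs. bin-based edge counts.
final class HistogramDiscrepancy {

    /** Bin index of r among numBins equal-width bins spanning [-1, 1]. */
    static int bin(double r, int numBins) {
        int b = (int) ((r + 1.0) / 2.0 * numBins);
        return Math.max(0, Math.min(numBins - 1, b));
    }

    /** Exact number of correlations at or above the cutoff (what the network gets). */
    static int exactCount(double[] rs, double cutoff) {
        int n = 0;
        for (double r : rs) {
            if (r >= cutoff) {
                n++;
            }
        }
        return n;
    }

    /**
     * Bin-based estimate: every correlation in the bin containing the cutoff is
     * counted, including those just below the cutoff.
     */
    static int binnedCount(double[] rs, double cutoff, int numBins) {
        int cutoffBin = bin(cutoff, numBins);
        int n = 0;
        for (double r : rs) {
            if (bin(r, numBins) >= cutoffBin) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        double[] rs = {0.91, 0.93, 0.96, 0.99};
        // With 20 bins of width 0.1, a 0.95 cutoff falls inside the [0.9, 1.0) bin.
        System.out.println(exactCount(rs, 0.95));      // 2
        System.out.println(binnedCount(rs, 0.95, 20)); // 4
    }
}
```

Choosing cutoffs that coincide with bin boundaries (0.9 in this example) makes the two counts agree.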

Future directions include weighting the correlation calculations to reduce data noise and to down-weight multiple Affymetrix probe set IDs. Other similarity metrics and a statistical significance score for similarity links will also be considered.

3. Installation Instructions

To use the ExpressionCorrelation Plugin, the user must first obtain a copy of Cytoscape, Version 2.0 or greater. The user can download a copy from: http://www.cytoscape.org/download_list.php.

Once the user has downloaded Cytoscape and verified that it works, the user can install the ExpressionCorrelation Plugin in one of two ways:

1. Download the plugin: attachment:ExpressionCorrelation.jar and copy the ExpressionCorrelation.jar file to the user's [Cytoscape_Home]/plugins directory.
2. Open Cytoscape. From the 'Plugins' menu, select 'Manage Plugins' to open the plugin manager. Find and select the 'Expression Correlation' plugin under the (TODO: enter category here) folder, and click the 'Install' button.

The Plugin installation is now complete.

4. Using the ExpressionCorrelation Plugin

To use the ExpressionCorrelation Plugin:

Start Cytoscape. This can be done by double-clicking the Cytoscape icon in your [Cytoscape_Home] directory, or from the command line:
- On Unix/Linux or Mac OS X, run: cytoscape.sh
- On Windows, run: cytoscape.bat
From the Main Menu, select "File" ---> "Import" ---> "Attribute/Expression Matrix...", select the desired file, and click the "Import" button.
From the Main Menu, select "Plugins" ---> "Expression Correlation Network", then choose one of the following options:
1. "Construct Correlation Network"
  This option creates the condition network and the gene network simultaneously, using either the default cutoffs "-0.95 & 0.95" or the cutoffs the user selected in the previous run of the ExpressionCorrelation Plugin; it does not create the histogram of the data distribution. The two network file name extensions, along with the cutoffs used, will appear in the 'Network' frame of the Cytoscape panel. If a network has fewer nodes than the number specified in the Cytoscape viewThreshold property, a view is created automatically and the network appears in the right frame of Cytoscape; otherwise, no view is created. In that case, to view the network, select it by clicking its file name extension (it will turn green), and from the Main Menu select "Edit" ---> "Create View". The viewThreshold property can be modified in the "Cytoscape Preferences Editor" from the Main Menu by selecting "Edit" ---> "Preferences" ---> "Properties".
2. "Advanced Options"
  1. "Condition Network: Preview Histogram"
     This option calculates and displays the histogram of the condition correlation values. In the histogram window, the user can set the low and high cutoffs by typing them into the corresponding "Cutoff" text boxes, or use only one cutoff by deselecting the "low" or "high" checkbox. Instead of selecting cutoffs, the user can specify the number or percentage of interactions to display by typing a value into the "Enter" text box and choosing "Number of Interactions" or "Percent of Interactions". Select "OK" to create the condition network using the specified parameters, which are saved for the rest of the Cytoscape session.
  2. "Condition Network: Using Defaults"
     This option creates the condition network using the default cutoffs, or the cutoffs the user selected in the previous run of the ExpressionCorrelation Plugin in this Cytoscape session.
  3. "Gene Network: Preview Histogram"
     This option calculates and displays the histogram of the gene correlation values, and creates the gene network according to the parameters specified by the user.
  4. "Gene Network: Using Defaults"
     This option creates the gene network using the default cutoffs, or the cutoffs the user selected in the previous run of the ExpressionCorrelation Plugin in this Cytoscape session.
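Selecting a number or percentage of interactions instead of a cutoff amounts to finding the cutoff that keeps that many correlations. The sketch below shows one way to do this for the high cutoff; it is not documented how the plugin performs this conversion. The names `CutoffFromCount`, `possiblePairs`, `interactionsFromPercent`, and `highCutoffForTopN` are hypothetical, and measuring the percentage against all n(n-1)/2 possible pairs is an assumption. The low cutoff would be found symmetrically from the smallest values.

```java
import java.util.Arrays;

// Hypothetical sketch; the plugin's own conversion is not documented here.
final class CutoffFromCount {

    /** Number of unordered pairs among n genes or conditions: n(n-1)/2. */
    static long possiblePairs(int n) {
        return (long) n * (n - 1) / 2;
    }

    /** Converts a percentage of all possible pairs into a number of interactions. */
    static long interactionsFromPercent(double percent, int n) {
        return Math.round(percent / 100.0 * possiblePairs(n));
    }

    /**
     * High cutoff that keeps the topN largest correlations: the topN-th largest
     * value. With an inclusive cutoff, ties at that value can add extra edges.
     */
    static double highCutoffForTopN(double[] correlations, int topN) {
        double[] sorted = Arrays.stream(correlations)
                .filter(r -> !Double.isNaN(r)) // undefined correlations are ignored
                .sorted()                      // ascending order
                .toArray();
        if (topN < 1 || topN > sorted.length) {
            throw new IllegalArgumentException("topN must be between 1 and " + sorted.length);
        }
        return sorted[sorted.length - topN];
    }

    public static void main(String[] args) {
        int genes = 5; // 10 possible pairs
        double[] correlations = {0.99, 0.97, 0.9, 0.5, 0.1, -0.2, -0.6, -0.93, -0.96, 0.3};
        long n = interactionsFromPercent(20.0, genes); // 20% of 10 pairs
        double cutoff = highCutoffForTopN(correlations, (int) n);
        System.out.println(n + " interactions -> high cutoff " + cutoff);
    }
}
```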

5. Biological Relevance

The ExpressionCorrelation Plugin allows the comparison of multiple gene similarity networks derived from different subsets of conditions. It may be used to define modules (in the simplest form, sets of genes, i.e. network nodes) that can differentiate between stages or types of cancer. The differences between the networks can be computed using an existing Cytoscape plugin, Diff.

6. Sample Data

Sample data containing 300 expression experiments from the Rosetta yeast compendium is packaged with the plugin, or it can be downloaded from here: attachment:Rosetta.mrna

7. Contacts

This plugin is maintained by Laetitia Morrison and Shirley Hui in the [http://baderlab.org Bader Lab, University of Toronto].

This plugin was originally developed by Elena Potylitsine, Weston Whitaker and Gary Bader in the [http://www.cbio.mskcc.org/ Sander Group, Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York City] and has been updated by Shirley Hui and Laetitia Morrison in the Bader lab.

This software is made available under the LGPL (Lesser General Public License), which means that you can freely use it within your own software, but if you alter the code itself and distribute it, you must make the source code alterations freely available as well.

Source code is available at http://chianti.ucsd.edu/svn/csplugins/trunk/mskcc/summerstudents/ExpressionCorrelation/

This product includes jmathplot, developed by Yann Richet (http://jmathplot.sourceforge.net/).
