## page was renamed from GSoC2010_ParameterTutorial
#acl All:read LaylaOesper:write,delete,revert
= WordCloud Parameter Tutorial =
== Outline ==
This tutorial will guide a user through how to use and manipulate the parameters associated with the WordCloud plugin using the Cytoscape session file provided. See the WordCloud [[Software/WordCloudPlugin/BasicTutorial|Basic Tutorial]] for an introduction to using the basic functionality of the WordCloud plugin.

Prerequisites:

 * Cytoscape >= 2.6.3 must be installed
 * The WordCloud plugin must be in the Cytoscape-v2.x.x/plugins folder
 * Download the test data

Go to this page to [[Software/WordCloudPlugin|download the plugin and test data]].

== Instructions ==
=== Version 0.5 or newer ===
1. Open Cytoscape

2. Open the provided [[Software/WordCloudPlugin|sample data file]] (File / Open / select the file AlzheimerEM.cys)

3. Do not change the set of selected nodes in the network titled "EM1_Enrichment Map", as doing so will change the results that you get.

'''The example network with the correct set of nodes selected:'''

''' {{attachment:Selected_Nodes.png|Selected_Nodes.jpg}} '''

4. Create a cloud using the nodes already pre-selected in the network.  Change the node attribute used for the semantic analysis to EM1_GS_DESCR and update the cloud.  From here on we will refer to this as the "Original Cloud".

'''Expected Original Cloud:'''

{{attachment:Original_Cloud.png|Original_Cloud.jpg}}

5. Expand the Advanced section of the Input Panel.  Change the '''Max Num of Words''' from the default of 250 to 5 and create a new cloud.  This will cause only the top 5 most significant words to appear in your cloud.

 * Word significance is correlated directly with the size of the word in the display.  If you have a cloud display style selected that includes clustering (which you do for this example), ties are broken using cluster membership.  Also, notice that clusters are organized in decreasing order of importance, where importance is determined using both the number of words appearing in a cluster and their size.
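
The effect of '''Max Num of Words''' in step 5 can be pictured with a short Python sketch.  This is only an illustration: the `top_words` helper and the scores are hypothetical, not the plugin's code or data, and ties are broken alphabetically here rather than by cluster membership.

```python
def top_words(scores, max_words):
    """Return the max_words highest-scoring (word, score) pairs, largest first."""
    # Sort by descending score; ties are broken alphabetically (a simplification).
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:max_words]


# Hypothetical significance scores, for illustration only.
example_scores = {"cancer": 9.0, "pathway": 7.5, "signaling": 6.0,
                  "cell": 4.0, "leukemia": 3.5, "glioma": 2.0}

top5 = top_words(example_scores, 5)
print([word for word, _ in top5])
```

With a limit of 250 the sketch returns all six words, because the limit is only an upper bound.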

'''Expected Result:'''

''' {{attachment:Max_Word_Cloud.png|Max_Word_Cloud.jpg}} '''

6. Select the original cloud from your list of clouds.  Change the '''Word Aggregation Cutoff''' from 1 to 50 and create a new cloud.

 * Setting the Word Aggregation Cutoff to 50 for this cloud places this value higher than the word aggregation value for all pairs of words that appear in the selected nodes. As a result, each word will be in its own cluster for this example.
 * In general, a higher Word Aggregation Cutoff value means that the requirements for clustering are more stringent and as a result there will be more, smaller clusters.
 * In general, a lower Word Aggregation Cutoff value (minimum of 0) means that the requirements for clustering are less stringent and as a result there will be fewer, larger clusters. However, since our clustering algorithm takes into account the order in which the words appear, it is unlikely that a Word Aggregation Cutoff value of 0 will result in a single large cluster.
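
The relationship between the cutoff and the number of clusters can be sketched in Python.  The greedy rule below is a simplified stand-in and should not be read as the plugin's actual clustering algorithm; the `cluster_words` helper, the word list, and the pair scores are all hypothetical.  The sketch assumes that a word joins the current cluster only if the ordered pair (previous word, word) has a score at or above the cutoff.

```python
def cluster_words(words, pair_scores, cutoff):
    """Greedily group consecutive words whose ordered pair score meets the cutoff."""
    clusters = []
    for word in words:
        if clusters:
            previous = clusters[-1][-1]
            # Keys are ordered: (previous, word) is not the same as (word, previous).
            score = pair_scores.get((previous, word))
            if score is not None and score >= cutoff:
                clusters[-1].append(word)
                continue
        # No qualifying pair score: start a new cluster.
        clusters.append([word])
    return clusters


# Hypothetical words (already ordered) and pair scores, for illustration only.
words = ["breast", "cancer", "cell", "cycle", "pathway"]
pair_scores = {("breast", "cancer"): 12.0, ("cell", "cycle"): 30.0}

for cutoff in (0, 20, 50):
    print(cutoff, cluster_words(words, pair_scores, cutoff))
```

In this toy data a cutoff of 50 exceeds every pair score, so each word ends up alone.  A cutoff of 0 still yields three clusters rather than one, because pairs that never occur in order have no score at all.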

'''Expected Result:'''

{{attachment:Word_Aggregation_Cutoff.png|Word_Aggregation_Cutoff.jpg}}

7. Select the original cloud from your list of clouds.  Create a new cloud from the original cloud.  For the newly created cloud, select the checkbox titled "Normalize word size using selection/network ratios".  This will cause a slider bar titled '''Network Normalization''' to appear.  Previously, the size of words in the word tag cloud was based entirely on the selected nodes.  Network Normalization allows the size of words to also be calculated using the make-up of the entire network.  Try dragging the slider bar all the way from 0.0 to 1.0 and watch how the word tag cloud changes in real time.

 * Setting the Network Normalization to 0 means that the size at which a word appears in the cloud is directly proportional to how often it appears in the selected nodes; no weight is given to how often it appears in the whole network.  In this example, Cancer is the largest word in the cloud, which means that it is the most frequently appearing word in the selected nodes.
 * Since changing the Network Normalization parameter affects the relative importance for each word, changing its value also affects how clustering occurs.  A user should expect that changing this parameter will likely change how the words for a cloud are clustered.
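
One way to picture the slider is as an exponent k applied to a word's frequency in the whole network.  The Python sketch below assumes the score has the form (selection ratio) / (network ratio)^k.  That form is an assumption made for illustration, chosen because it reduces to plain selection frequency at k = 0 as described above; the `word_score` helper and all counts are hypothetical.

```python
def word_score(sel_count, sel_total, net_count, net_total, k):
    """Selection frequency divided by network frequency raised to the power k."""
    selection_ratio = sel_count / sel_total
    network_ratio = net_count / net_total
    return selection_ratio / (network_ratio ** k)


# Hypothetical counts: (nodes in the selection containing the word,
# nodes in the whole network containing the word).
SEL_TOTAL, NET_TOTAL = 10, 100
counts = {"cancer": (8, 80), "glioma": (3, 3)}

for k in (0.0, 0.5, 1.0):
    scores = {word: word_score(sel, SEL_TOTAL, net, NET_TOTAL, k)
              for word, (sel, net) in counts.items()}
    largest = max(scores, key=scores.get)
    print(k, largest)
```

In this toy data "cancer" is the largest word at k = 0 because it is the most frequent word in the selection.  At k = 1 "glioma" overtakes it, since "glioma" is rare in the network as a whole.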

'''Expected Result with Network Normalization = 1.0:'''

''' {{attachment:Network_Normalization.png|Network_Normalization.jpg}} '''

8. Select the original cloud from your list of clouds.  In the '''Layout''' section of the input panel select Clustered-Boxes as the Cloud Style and create a new cloud.

'''Expected Result:'''

''' {{attachment:Boxes.png|Boxes.jpg}} '''

9. Select the original cloud from your list of clouds.  In the '''Layout''' section of the input panel select No-Clustering as the Cloud Style and create a new cloud.

'''Expected Result:'''

{{attachment:No_Clustering.png|No_Clustering.jpg}}

10. Select the original cloud from your list of clouds.  In the '''Layout''' section of the input panel press the "Export Cloud to Network" button.  This will create a Cytoscape network whose nodes represent the words in the cloud and whose edges represent the co-occurrence of words.

'''Expected Result:'''

{{attachment:Network_View.png}}

Make sure to re-select the EM1_Enrichment Map network before continuing.
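
The word network exported in step 10 can be thought of as a table of word pairs.  The Python sketch below counts how many descriptions contain both words of each pair.  Treating "co-occurrence" as "appearing in the same description" is an assumption made for illustration; the `cooccurrence_edges` helper and the sample descriptions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(descriptions):
    """Count how many descriptions contain each unordered pair of words."""
    edges = Counter()
    for text in descriptions:
        # Sort the unique words so each pair always has the same key order.
        unique_words = sorted(set(text.lower().split()))
        for pair in combinations(unique_words, 2):
            edges[pair] += 1
    return edges


# Hypothetical node descriptions, for illustration only.
descriptions = ["small cell lung cancer",
                "non small cell lung cancer",
                "cell cycle"]

edges = cooccurrence_edges(descriptions)
for (first, second), weight in sorted(edges.items()):
    print(first, second, weight)
```

Each key would correspond to an edge between two word nodes, and the count could serve as an edge weight.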

11. Select the original cloud from your list of clouds.  In the '''Word Exclusion List''' section of the input panel add the word "cancer" to be excluded (hit the add button after typing the word).  Create a new cloud.

 * The word cancer will no longer appear in the newly created cloud.

'''Expected Result:'''

''' {{attachment:Add_Cancer.png|Add_Cancer.jpg}} '''

12. Select the original cloud from your list of clouds.  In the '''Word Exclusion List''' section of the input panel expand the word removal list.  Under the section with the heading --Flagged Words-- select the word "kegg".  Hit the Remove button and create a new cloud.

 * The word "kegg" is no longer being filtered out and will now appear in the word tag cloud.
 * Since the word exclusion list is stored at the network level, the word "cancer" is still excluded and will not appear in the newly created cloud.
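
Steps 11 and 12 amount to editing one shared set of excluded words and re-filtering.  The Python sketch below illustrates this under simple assumptions: the `filter_words` helper and the word list are hypothetical, and a plain set stands in for the network-level exclusion list.

```python
def filter_words(words, excluded):
    """Drop every word that appears in the excluded set (case-insensitive)."""
    return [word for word in words if word.lower() not in excluded]


# A stand-in for the network-level list; "kegg" plays the role of a flagged word.
excluded = {"kegg"}
words = ["kegg", "bladder", "cancer", "pathway"]

excluded.add("cancer")        # step 11: add "cancer" to the exclusion list
after_step_11 = filter_words(words, excluded)

excluded.discard("kegg")      # step 12: stop filtering "kegg"
after_step_12 = filter_words(words, excluded)

print(after_step_11)
print(after_step_12)
```

Because both steps modify the same set, "cancer" stays excluded after "kegg" is restored.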

'''Expected Result:'''

''' {{attachment:Remove_KEGG.png|Remove_KEGG.jpg}} '''

13. Select the original cloud from your list of clouds.  In the '''Word Tokenization''' section of the input panel expand the Remove Delimiter List.  Under the section with the heading --Common Delimiter-- select the "space" option.  Hit the Remove button and create a new cloud.

 * The space character is no longer used as a word delimiter during tokenization.  As a result, you can create your cloud based on multi-word phrases.
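
The effect of removing the space delimiter can be sketched in Python.  This is an illustration only: the `tokenize` helper, the delimiter lists, and the sample text are hypothetical and do not reflect the plugin's default delimiter set.

```python
import re


def tokenize(text, delimiters):
    """Split text on any of the single-character delimiters, dropping empty tokens."""
    if not delimiters:
        return [text] if text else []
    # Build a character class such as "[\ ;]+" from the delimiter list.
    pattern = "[" + re.escape("".join(delimiters)) + "]+"
    return [token for token in re.split(pattern, text) if token]


# Hypothetical attribute value, for illustration only.
text = "bladder cancer;small cell lung cancer"

with_space = tokenize(text, [" ", ";"])
without_space = tokenize(text, [";"])

print(with_space)
print(without_space)
```

With the space delimiter the text splits into six single words.  Without it, the two phrases "bladder cancer" and "small cell lung cancer" survive as single tokens.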

'''Expected Result:'''

''' {{attachment:Remove_Space_Delim.png}} '''

14. Select the original cloud from your list of clouds and add the space character back to be used for tokenization.  In the Word Stemming section of the input panel, select the option to "Enable Stemming" and create a new cloud.

 * Words are now all mapped to their stem using the Porter Stemming Algorithm.  This will allow words like "cell" and "cells" to both be mapped to their common stem "cell" in the cloud display.
 * However, the user should notice that the stem chosen for a word may be somewhat unexpected.  For example, in the cloud used throughout this tutorial the word "endometrial" will now be displayed as "endometri" because the suffix has been removed in order to isolate the word stem.  Also, the word "pathway" is now represented by the stem "pathwai".
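
The unexpected stems follow from individual rules of the Porter algorithm.  The Python sketch below applies only three of those rules (plural removal, final "y" to "i", and removal of the suffix "al"), which is enough to reproduce the examples above.  It is deliberately incomplete and is not the plugin's stemmer; the name `partial_porter_stem` is hypothetical, and for other words its output can differ from a full Porter implementation.

```python
VOWELS = "aeiou"


def is_consonant(word, i):
    """Porter's definition: 'y' counts as a vowel only when it follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-to-consonant transitions (the 'm' in Porter's rules)."""
    count = 0
    previous_was_vowel = False
    for i in range(len(stem)):
        consonant = is_consonant(stem, i)
        if consonant and previous_was_vowel:
            count += 1
        previous_was_vowel = not consonant
    return count


def contains_vowel(stem):
    """True if any position in the stem is a vowel under Porter's definition."""
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def partial_porter_stem(word):
    """Apply three Porter rules only; NOT a complete stemmer."""
    word = word.lower()
    # Step 1a: plural endings ("sses" -> "ss", "ies" -> "i", "s" -> "").
    if word.endswith("sses") or word.endswith("ies"):
        word = word[:-2]
    elif word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    # Step 1c: a final "y" becomes "i" when the rest of the word has a vowel.
    if word.endswith("y") and contains_vowel(word[:-1]):
        word = word[:-1] + "i"
    # Step 4 (one suffix of many): drop "al" when the remaining stem has m > 1.
    if word.endswith("al") and measure(word[:-2]) > 1:
        word = word[:-2]
    return word


for example in ("cells", "cell", "pathway", "endometrial"):
    print(example, "->", partial_porter_stem(example))
```

The "y" to "i" rule is what turns "pathway" into "pathwai", and the "al" rule is what shortens "endometrial" to "endometri".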

'''Expected Result:'''

''' {{attachment:stemmingExample.png}} '''