Software/BRAIN - Bader Lab @ The University of Toronto

BRAIN Project

The Biologically Relevant Analysis of Interaction Networks (BRAIN) is a set of algorithms for predicting and analyzing protein domain-peptide ligand interactions based on experimentally known binding evidence (e.g. from protein chip or phage display experiments).

BRAIN can be accessed as a Cytoscape plugin, which reads peptide binding profiles and displays the predicted interactions as a Cytoscape network.

BRAIN consists of a library (API) and a Cytoscape plugin.


BRAIN Library

The BRAIN library holds the algorithms and methods for interaction prediction. This library is required to build the BRAIN plugin.

BRAIN Plugin

The BRAIN plugin runs BRAIN algorithms from the Cytoscape user interface: it performs a prediction analysis and presents the results visually as an interactive Cytoscape network.

Latest Build

Brain Plugin: Version 1.0.5 alpha (2007 May 23) - attachment:brainPlugin.jar

Brain Library: Version 1.1 (2007 July 26) - attachment:brainlib-1.1.jar

Brain Dependencies: Additional JARs required by BRAIN - attachment:brainDeps.tar.gz

Release Notes

BRAIN Library 1.1 (2007 July 26)

Added PDZ symbol style for PDZ-type residue colouring in the sequence logo generation code (required by the LOLA tool)

1.0.6 (2007 June 12)

brain.jar: Added the ability to retrieve additional database references from a ProteinProfile object (previously, only the first ID in the Accession field of the peptide file could be retrieved).

1.0.5 (2007 May 23)

Code reorganization: brainPlugin.jar now holds only those classes related to the Cytoscape plugin. All other code has been moved to a new Brain Library project.

1.0.4 (2007 May 15)

Network node can represent a single domain or a protein containing multiple domain instances (via new Advanced Options tab)
Lower scoring motif hits are now reported
New node attribute Domain Name to hold a semantic domain identifier (Gene Name + Domain Number, for now)

1.0.3 (2007 April 23)

Support for the new peptide file format version 1.1 (backward-compatible with format version 1.0)
New node attributes Sequence Start, Sequence Stop, Comment for profiles generated from new peptide file format

1.0.2 (2007 April 11)

Report SNPs for entire protein sequence and for domain subsequence only.
Option to specify codon bias file
Option to load unique peptides only from peptide file(s)

1.0.1 (2007 February 21)

Port to Cytoscape API 2.4 and Java SE 5.0

Running the Plugin

Place the plugin (brain.jar) in your Cytoscape 'plugins' folder
Launch Cytoscape
Set parameters (Plugins-->BRAIN-->Set Parameters)
Run (Plugins-->BRAIN-->Run BRAIN)

Source

SVN path: svn1.ccbr.utoronto.ca/svn/baderlab/csplugins/brain

Tagged releases are available under svn1.ccbr.utoronto.ca/svn/baderlab/csplugins/tags/.

Resources

Human RefSeq Database (GenPept Format, 2007 Feb 21): attachment:human.protein.gpff.gz

Peptide file (new 1.1 format): attachment:peptide_file_v1.1.txt

A peptide file contains a set of experimentally observed ligands for a given domain-containing query protein. Information in the peptide file includes query protein annotation (database identifiers), domain sequence, ligand sequence, and ligand expression data.

Peptide file conversion script: attachment:ConvertBrainPeptideFile.pl

Converts peptide files from format version 1.0 to version 1.1. The script takes two arguments: an input file or directory path, and an output directory path.

e.g. ConvertBrainPeptideFile.pl ~/data/oldfiles/ ~/data/newfiles/

This converts all peptide files in ~/data/oldfiles/ to the new format and writes the converted files to ~/data/newfiles/. To convert a single file, pass the path to that peptide file (including the filename) as the first argument. If the output directory does not exist, it is created. Make sure the input file or directory is readable and the output directory is writable. Paths can be relative or absolute.

For usage, run the script with no arguments.
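The argument handling described above can be sketched in Python. This is a hypothetical helper (`plan_conversion`), not code taken from ConvertBrainPeptideFile.pl: it assumes every regular file in an input directory is a peptide file and that converted files keep their original names, and it leaves out the actual 1.0 to 1.1 field conversion, which this page does not describe.

```python
from pathlib import Path


def plan_conversion(input_path, output_dir):
    """Return (source, destination) pairs for converting peptide files.

    Hypothetical helper: it mirrors the documented argument handling of
    ConvertBrainPeptideFile.pl but does not perform the conversion itself.
    """
    src = Path(input_path).expanduser()
    out = Path(output_dir).expanduser()
    if src.is_dir():
        # Assumption: every regular file in the directory is a peptide file.
        sources = sorted(p for p in src.iterdir() if p.is_file())
    elif src.is_file():
        sources = [src]
    else:
        raise FileNotFoundError(f"input path not found: {src}")
    # The output directory is created if it does not exist yet.
    out.mkdir(parents=True, exist_ok=True)
    # Assumption: converted files keep their original file names.
    return [(p, out / p.name) for p in sources]
```

For the example above, `plan_conversion("~/data/oldfiles/", "~/data/newfiles/")` would pair each file in the old directory with a same-named target in the new one; relative and absolute paths are handled alike.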

LOLA

Requirements

Java Runtime Environment (JRE) 1.5 or later is required to run LOLA. All other dependencies are included in the download.

Installing and Running

Extract the TAR file. This creates a directory named "lola".
On Linux, open a command shell and run "lola.sh" from the "lola" directory.
On Windows, double-click "lola.bat".
On Mac, double-click "lola-1.0-beta.jar".
You can open either a single peptide file or a project file that links to multiple peptide files.

Here's a view of LOLA after opening a PDZ domain project file:

attachment:lolaScreenShot.png

Input Format

LOLA accepts one or more peptide files in the format shown in the example below. A peptide file describes a protein containing a specific domain and lists known peptide ligands of this domain obtained by an experimental technique.

The peptide file consists of a Header Section that describes the protein and domain sequence, and a Peptide Section that lists and describes the peptide ligands.

Example:

Gene Name       DLG1
Accession       Refseq:NP_004078
Organism        Homo Sapiens (Human)
NCBITaxonomyID  9606
Domain Number   3
Domain Type     PDZ
Interpro ID     IPR001478
Technique       Phage Display High Valency
Domain sequence KVVLHRGSTGLGFNIVGGEDGEGIFISFILAGGPADLSGELRKGDRIISVNSVDLRAASHEQAAAALKNAGQAVTIVA
Domain Range    466-525
Comment
PeptideName     Peptide CloneFrequency  QuantData       ExternalIdentifier
1       XLHFWRESSV      66
2       XXRLWKQTSL      3
3       ILKIWRETSL      3
4       KRTIWRETSL      2
A       KNLRSNSMLG      2
6       HLKFWRSTRV      2
7       AHSKWRSTSV      2
8       XXXHRRETTV      1
9       VISRWRQTSL      1
10      TTWLGRQTRV      1
11      SRSSYRETSV      1
12      XXXSRRETSV      1
13      RLFRYRETSL      1
B       PIRKRWTMTL      1
15      XXXNHRETSV      1
16      KIVRWKNTSV      1
17      KHRTWYETSV      1
18      XXXXFKQTSV      1
19      ARPKWRTTRV      1
20      ALPRRRETSV      1

Header Section

Describes the protein, domain, and experiment. Required fields are indicated with a *.

NOTE: This section is in a two-column format. Field names must be separated from their values by a single TAB character; multiple TABs or spaces are not accepted.

Gene Name:* An identifier that represents the gene or protein sequence. Not required to be unique.
Accession:* A space-separated list of database accession identifiers for the protein or corresponding gene.
Organism: A description of the protein's taxon, e.g. Homo Sapiens (Human).
NCBITaxonomyID:* Taxon identifier from NCBI's Taxonomy repository.
Domain Number:* A number that represents the position of the domain sequence within the protein. For proteins containing multiple instances of the domain, this number distinguishes the instances by position. Set to "0" if instance information is not known.
Domain Type:* The formal name of the domain, e.g. WW, PDZ, SH3.
Interpro ID: The InterPro database identifier for the domain.
Technique: The experimental method used to identify potential ligands of the protein.
Domain Sequence:* The amino-acid sequence of the domain region.
Domain Range: The amino-acid position range of the domain region within the protein.
Comment: Notes, additional information, or personal comments about this file.
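A minimal Python sketch of reading the header section under the rules above. The function name `parse_header` and the constant `REQUIRED_HEADER_FIELDS` are hypothetical, not part of LOLA or BRAIN. Field names are matched case-insensitively (an assumption, since the example file writes "Domain sequence" while the field list says "Domain Sequence"), and a line with no TAB, such as an empty Comment, is assumed to have an empty value.

```python
# Hypothetical list of the fields marked * above.
REQUIRED_HEADER_FIELDS = (
    "Gene Name", "Accession", "NCBITaxonomyID",
    "Domain Number", "Domain Type", "Domain Sequence",
)


def parse_header(lines):
    """Parse the two-column header section of a peptide file.

    Reading stops at the 'PeptideName' column-header line. Returns a dict
    keyed by lower-cased field name.
    """
    header = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("PeptideName"):
            break  # start of the peptide section
        parts = line.split("\t")
        if len(parts) > 2:
            # More than one TAB on a line violates the single-TAB rule.
            raise ValueError(f"expected a single TAB separator: {line!r}")
        name = parts[0].strip()
        # Assumption: a line without a TAB is a field with an empty value.
        value = parts[1].strip() if len(parts) == 2 else ""
        header[name.lower()] = value
    missing = [f for f in REQUIRED_HEADER_FIELDS if not header.get(f.lower())]
    if missing:
        raise ValueError(f"missing required header fields: {missing}")
    return header
```

Because Accession is a space-separated list, its individual identifiers can then be obtained with `header["accession"].split()`.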

Peptide Section

Describes the experimentally determined peptide ligands. The peptide sequences must be in multiple alignment format: they should contain no gaps and, where required, be padded with the X symbol on either side so that all sequences have identical length.
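Equal length matters because position-wise statistics, such as those behind a sequence logo, compare the i-th residue of every peptide. The sketch below checks the alignment rules and counts residues per column. It is illustrative only, not LOLA's implementation: the helper names are hypothetical, "-" and "." are assumed to be the gap symbols to reject, and X padding is assumed to be excluded from the counts.

```python
from collections import Counter

# Assumption: "-" and "." are the gap symbols that must not appear.
GAP_SYMBOLS = frozenset("-.")


def check_alignment(peptides):
    """Return the common length of gap-free, X-padded peptide sequences."""
    if not peptides:
        raise ValueError("no peptide sequences given")
    length = len(peptides[0])
    for seq in peptides:
        if len(seq) != length:
            raise ValueError(f"{seq!r} has length {len(seq)}, expected {length}")
        if GAP_SYMBOLS.intersection(seq):
            raise ValueError(f"{seq!r} contains a gap symbol")
    return length


def column_counts(peptides):
    """Count residues at each aligned position, skipping X padding."""
    length = check_alignment(peptides)
    return [
        Counter(seq[i] for seq in peptides if seq[i] != "X")
        for i in range(length)
    ]
```

For the first three peptides of the DLG1 example, the last column counts two L and one V, while the first column counts only the I, because the two leading X symbols are padding.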

NOTE: This section is in a 5-column format. Column headers and values must be separated with a single TAB character. Multiple TABs or spaces are not accepted.

Required fields are indicated with a *.

PeptideName:* A numerical identifier assigned to each peptide ligand. To omit a peptide, set its name to a non-numeric value (e.g. "A"). Values in this column must be unique.
Peptide:* The peptide ligand sequence.
CloneFrequency: Applies only to phage display data: the observed frequency of the peptide in the cloning step.
QuantData: A number that quantifies the protein-ligand interaction in relative or absolute terms, e.g. the optical density (OD) from a protein chip experiment.
ExternalIdentifier: A database identifier for the peptide.
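The peptide section can be read in a similar way. In this hypothetical `parse_peptides` sketch, rows with a non-numeric PeptideName are skipped as omitted peptides, duplicate names are rejected, and missing optional columns (as in the example above, whose rows fill only three of the five columns) become `None`. Values are kept as strings; converting CloneFrequency or QuantData to numbers is left to the caller.

```python
def parse_peptides(lines):
    """Parse the TAB-separated peptide section of a peptide file.

    Returns one dict per retained peptide, keyed by column header.
    """
    rows = iter(lines)
    # Skip the header section until the 'PeptideName' column-header line.
    for raw in rows:
        if raw.startswith("PeptideName"):
            columns = raw.rstrip("\r\n").split("\t")
            break
    else:
        raise ValueError("no 'PeptideName' column header found")

    peptides, seen = [], set()
    for raw in rows:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) > len(columns):
            raise ValueError(f"too many columns: {line!r}")
        # Missing trailing optional columns become None.
        fields += [""] * (len(columns) - len(fields))
        record = {col: (val or None) for col, val in zip(columns, fields)}
        name = record.get("PeptideName")
        if name is None:
            raise ValueError(f"missing PeptideName: {line!r}")
        if name in seen:
            raise ValueError(f"duplicate PeptideName: {name!r}")
        seen.add(name)
        if not name.isdigit():
            continue  # a non-numeric name marks the peptide as omitted
        if record.get("Peptide") is None:
            raise ValueError(f"missing Peptide sequence: {line!r}")
        peptides.append(record)
    return peptides
```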

Using Project Files

To open several peptide files at once, simply link them all in a single project file. A project file is a text file containing the absolute paths of multiple peptide files. Opening the project file in LOLA will open each of the underlying peptide files in a single step, allowing logos to be constructed for multiple profiles.

Example:

#ProjectFile
/Users/moyez/research/ppi/profiles/PDZ/Human/SidhuPhage/APBA3-1.pep.txt
/Users/moyez/research/ppi/profiles/PDZ/Human/SidhuPhage/CASK-1.pep.txt
/Users/moyez/research/ppi/profiles/PDZ/Human/SidhuPhage/DLG1-1.pep.txt
/Users/moyez/research/ppi/profiles/PDZ/Human/SidhuPhage/DLG1-2.pep.txt
/Users/moyez/research/ppi/profiles/PDZ/Human/SidhuPhage/DLG1-3.pep.txt
/Users/moyez/research/ppi/profiles/PDZ/Human/SidhuPhage/DLG2-3.pep.txt
/Users/moyez/research/ppi/profiles/PDZ/Human/SidhuPhage/DLG3-2.pep.txt
/Users/moyez/research/ppi/profiles/PDZ/Human/SidhuPhage/DLG4-3.pep.txt
/Users/moyez/research/ppi/profiles/PDZ/Human/SidhuPhage/DVL2-1.pep.txt
/Users/moyez/research/ppi/profiles/PDZ/Human/SidhuPhage/ERBB2IP-1-hi.pep.txt

NOTE: The first line of the project file must contain the text "#ProjectFile".
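Reading a project file amounts to checking the first line and collecting the listed paths. A minimal sketch with a hypothetical function name (`read_project_file`); skipping blank lines and rejecting relative paths are assumptions based on the description above, not documented LOLA behaviour.

```python
from pathlib import Path


def read_project_file(path):
    """Return the peptide-file paths listed in a LOLA project file."""
    with open(path, encoding="utf-8") as fh:
        lines = [ln.strip() for ln in fh]
    # The first line must be the "#ProjectFile" marker.
    if not lines or lines[0] != "#ProjectFile":
        raise ValueError("first line must be '#ProjectFile'")
    # Assumption: blank lines are ignored.
    paths = [Path(ln) for ln in lines[1:] if ln]
    # Project files list absolute paths, so reject anything relative.
    relative = [p for p in paths if not p.is_absolute()]
    if relative:
        raise ValueError(f"paths must be absolute: {relative}")
    return paths
```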

Future Developments

Generate a "logo tree" by hierarchically clustering logos [Initial PDZ (no alignment) use case completed in version 1.1]
Allow colours to be selected for individual amino-acids
Add support for nucleic acids
Additional visualization options (e.g. font, axis labels)

Contact

If you have any questions or feedback, please email Moyez Dharsee at mdharsee@infochromics.com.