Table of Contents


Goals

Strategy

Status

Tasks

  1. Learn SVN, Brain code (ResidueResidueCorrelation)

  2. Literature review related to domain specificity (background activity), PDZ domains (from Ioana's project)
  3. Run ResidueResidue correlation analysis on PDZ domain data: 1-1 version + try others e.g. 1-2 (Requires: PDZ profiles from Gary)

  4. MSA subproject
    1. Learn basics of multiple sequence alignment (Baxevanis, chapter 12)

    2. Find and evaluate MSA algorithms (compare notes with Stacy) + evaluate Superfamily, PFAM databases of protein family alignments
    3. Try different multiple sequence alignment (MSA) algorithms on the PDZ domain sequences to see if they affect the correlation results.
  5. Benchmark/validate correlation subproject
    1. Known correlation to use as a positive control: H (PDZ domain) with T at position -2 (peptide)
    2. Look at structures (e.g. 1N7T and 1BE9) to see if correlated residues/positions are close to each other and compatible (physicochemically). We need to focus on PDZ structures that have bound peptides (search in PDB)
    3. Build set of known true and false correlations for use in evaluating prediction algorithm (Note: also ask Dev Sidhu, when available). See [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&dopt=AbstractPlus&list_uids=10871264 Baldi et al. review]

  6. Amino acid group subproject
    1. Learn about amino acid groups
    2. Define an initial aa grouping (reasonable grouping from Levy paper)
    3. Add a new feature to the ResidueResidueCorrelation class so that it considers grouping, then run it on the PDZ data. This involves implementing the groups as a reduced alphabet (amino acids in a group are considered equivalent)

    4. Try all groupings to see how they affect the results (from Levy paper)
    5. See if we can incorporate aa similarity defined by substitution matrix approach (e.g. BLOSUM, PAM, GONNET) into our method, instead of grouping
    6. Similarly, evaluate aa similarity defined by factor analysis (Atchley et al. paper)
  7. Think about new PDZ domain features that can be used for prediction.
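
The reduced-alphabet idea in task 6.3 can be sketched roughly as follows. This is only an illustration under stated assumptions: the grouping in `EXAMPLE_GROUPS` is made up for the example and is not the grouping from the Levy paper, the function names are hypothetical, and mutual information is used as a stand-in correlation measure, which may differ from what ResidueResidueCorrelation actually computes.

```python
import math
from collections import Counter

# Hypothetical grouping, for illustration only (NOT the Levy paper grouping).
EXAMPLE_GROUPS = {
    "aliphatic": "AVLIMC",
    "aromatic": "FWYH",
    "polar": "STNQ",
    "positive": "KR",
    "negative": "DE",
    "special": "GP",
}

# Map every amino acid to one representative letter (the first member of its group).
AA_TO_GROUP = {aa: members[0] for members in EXAMPLE_GROUPS.values() for aa in members}


def reduce_sequence(seq):
    """Rewrite a sequence in the reduced alphabet.

    Symbols that are not in any group (e.g. alignment gaps) pass through unchanged.
    """
    return "".join(AA_TO_GROUP.get(aa, aa) for aa in seq.upper())


def mutual_information(col_a, col_b):
    """Mutual information in bits between two equally long columns of symbols.

    Stand-in correlation measure, assumed here for the sake of the example.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same length")
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    return sum((c / n) * math.log2(c * n / (pa[a] * pb[b])) for (a, b), c in pab.items())


# Toy columns: one domain position and one peptide position across four interactions.
domain_col = "HHVI"
peptide_col = "TTDE"
mi_full = mutual_information(domain_col, peptide_col)
mi_reduced = mutual_information(reduce_sequence(domain_col), reduce_sequence(peptide_col))
```

On these toy columns, grouping merges V with I and D with E, so the reduced-alphabet score is lower than the full-alphabet score. Whether grouping helps or hurts on the real PDZ data is exactly what tasks 6.3 and 6.4 are meant to find out.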

Ideas

Courses

Biology

Protein Structure

Machine Learning

Committee Meetings

Team

Tools/Resources

Domains

Databases

Sequence Alignment

Multiple

Hierarchical Methods

Non-Hierarchical Methods

Probabilistic Methods

Viewers

Background Literature

More General

Substitution Matrices

Specificity Prediction/Inference

Amino Acid Alphabets

Other


CategoryProject

DomainSpecificityPredictionProject (last edited 2007-09-17 16:08:15 by ShirleyHui)
