The Fraenkel Lab Biological Engineering MIT
 
The following motif discovery programs are offered through the WebMOTIFS interface:

De novo Motif Discovery

Bayesian Motif Discovery

All programs are run with double-stranded analysis, without masking, and with background files appropriate to the species and sequence type being analyzed. The parameters used for each program are given below.
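Double-stranded analysis means a candidate site counts whether it appears on the given strand or on its reverse complement, and a background model gives the chance of seeing that site by accident. The sketch below illustrates both ideas with a simple zero-order (per-nucleotide) background; the frequencies in `BG` are hypothetical placeholders, not the contents of WebMOTIFS's actual background files, and the function names are illustrative only.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical zero-order background; the real WebMOTIFS background files are
# species- and sequence-type-specific and are not reproduced here.
BG = {"A": 0.31, "C": 0.19, "G": 0.19, "T": 0.31}


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_both_strands(seq, word):
    """Count windows of `seq` that match `word` on either strand."""
    rc = reverse_complement(word)
    w = len(word)
    # A palindromic word (word == rc) is counted once per window, not twice.
    return sum(seq[i:i + w] in (word, rc) for i in range(len(seq) - w + 1))


def expected_count(seq_len, word, bg=BG):
    """Expected number of matching windows on either strand under `bg`."""
    rc = reverse_complement(word)
    p = math.prod(bg[b] for b in word)
    # A window equal to `word` cannot also equal a different `rc`,
    # so the two probabilities add; a palindrome is counted only once.
    if rc != word:
        p += math.prod(bg[b] for b in rc)
    return (seq_len - len(word) + 1) * p


seq = "CACGTTTCGTGA"
print(count_both_strands(seq, "CACG"), expected_count(len(seq), "CACG"))
```

Comparing the observed count with the background expectation is the basic reason species-appropriate background files matter: a word that is common in AT-rich sequence is less surprising there.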

De novo Motif Discovery

WebMOTIFS uses the programs AlignACE, MDscan, MEME, and Weeder. We gratefully acknowledge the authors of these programs.

Users of WebMOTIFS may be bound by the copyrights and user agreements of these programs. Please see the following web pages for details:

AlignACE: http://arep.med.harvard.edu/mrnadata/mrnasoft.html
MDscan: Available under the MIT license http://motif.stanford.edu/distributions/mit_license.html
MEME: http://meme.sdsc.edu/meme/COPYRIGHT.html
Weeder: http://159.149.109.16:8080/WeederNew/Register.html

AlignACE   AlignACE (Aligns Nucleic Acid Conserved Elements) finds motifs in a set of DNA sequences using a Gibbs sampling strategy. AlignACE can be found at http://atlas.med.harvard.edu/.

Parameters: AlignACE does double-stranded analysis, with no masking. It searches for motifs of width 10 and permits multiple occurrences of a motif in a sequence.
10 iterations of AlignACE are run, and the results from all runs are combined.
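The Gibbs sampling strategy can be illustrated with a minimal site sampler: repeatedly pick one sequence, build a motif profile from the current sites in all the other sequences, and resample that sequence's site in proportion to how well each window fits the profile. This is a simplified textbook variant (one site per sequence, single strand, uniform background, fixed width), not AlignACE's own algorithm, which also allows multiple sites per sequence and scans both strands; `gibbs_sites` and its parameters are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"


def profile(sites, pseudo=1.0):
    """Per-column base probabilities (with pseudocounts) of aligned sites."""
    cols = []
    for k in range(len(sites[0])):
        counts = Counter(site[k] for site in sites)
        total = len(sites) + 4 * pseudo
        cols.append({b: (counts[b] + pseudo) / total for b in BASES})
    return cols


def gibbs_sites(seqs, w, iters=500, seed=0):
    """Simplified one-site-per-sequence Gibbs sampler; returns start positions.

    Assumes at least two uppercase A/C/G/T sequences, each at least w long.
    """
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iters):
        i = rng.randrange(len(seqs))  # sequence whose site is resampled
        # Profile built from the current sites of every *other* sequence.
        others = [s[p:p + w] for j, (s, p) in enumerate(zip(seqs, starts)) if j != i]
        prof = profile(others)
        seq = seqs[i]
        # Weight each candidate start by its profile/background likelihood
        # ratio, assuming a uniform background of 0.25 per base.
        weights = []
        for p in range(len(seq) - w + 1):
            ratio = 1.0
            for k in range(w):
                ratio *= prof[k][seq[p + k]] / 0.25
            weights.append(ratio)
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


seqs = ["TTGACGTCAAT", "GGTGACGTCAG", "ATTTGACGTCA", "CTGACGTCACC"]
print(gibbs_sites(seqs, w=8))
```

Because the sampler is stochastic, different seeds can settle on different motifs; running it several times and pooling the results is loosely analogous to combining multiple AlignACE runs.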

References: Hughes, JD, Estep, PW, Tavazoie, S, and Church, GM. "Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae." Journal of Molecular Biology 2000 Mar 10;296(5):1205-14.

Roth, FP, Hughes, JD, Estep, PW, and Church, GM. "Finding DNA Regulatory Motifs within Unaligned Non-Coding Sequences Clustered by Whole-Genome mRNA Quantitation." Nature Biotechnology 1998 Oct;16(10):939-45.
 

MDscan   MDscan (Motif Discovery Scan) is a motif discovery program that combines word enumeration and position-specific weight matrix updating. MDscan can be found at http://ai.stanford.edu/~xsliu/MDscan/

Parameters: MDscan does double-stranded analysis, with no masking. It searches for motifs at least 8 nucleotides wide and at most a user-specified maximum width (8, 10, or 12). It allows multiple occurrences of a motif per sequence.
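The combination of word enumeration and weight-matrix updating can be sketched as follows: enumerate all w-mers, take the most frequent one as a seed, collect every w-mer within a mismatch threshold of the seed, turn those sites into a log-odds position-specific weight matrix, and then rebuild the matrix from all w-mers that score above zero. This is a simplified illustration assuming a uniform background and a single strand; MDscan itself differs in its details (for example, it draws seeds from the highest-ranked sequences), and all names here are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"
BG = 0.25  # assumed uniform background probability per base


def kmers(seq, w):
    """All overlapping substrings of length w."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def mismatches(a, b):
    return sum(x != y for x, y in zip(a, b))


def build_pwm(sites, pseudo=0.5):
    """Log2-odds weight matrix (vs. the uniform background) from aligned sites."""
    pwm = []
    for j in range(len(sites[0])):
        col = Counter(site[j] for site in sites)
        total = len(sites) + 4 * pseudo
        pwm.append({b: math.log2((col[b] + pseudo) / total / BG) for b in BASES})
    return pwm


def score(word, pwm):
    """Sum of per-column log-odds; higher means more motif-like."""
    return sum(col[b] for col, b in zip(pwm, word))


def seed_and_refine(seqs, w, max_mm=1, rounds=3):
    """Seed with the most frequent w-mer, then iteratively update the PWM."""
    counts = Counter(k for s in seqs for k in kmers(s, w))
    seed = counts.most_common(1)[0][0]
    # Initial sites: every w-mer within max_mm mismatches of the seed.
    sites = [k for s in seqs for k in kmers(s, w) if mismatches(k, seed) <= max_mm]
    pwm = build_pwm(sites)
    for _ in range(rounds):
        # Update step: rebuild the matrix from all w-mers scoring above zero.
        sites = [k for s in seqs for k in kmers(s, w) if score(k, pwm) > 0]
        if not sites:
            break
        pwm = build_pwm(sites)
    return seed, pwm


seed, pwm = seed_and_refine(["AATGACTCAT", "CCTGACTCAG", "GTGACTCAAA"], w=7)
print(seed, round(score(seed, pwm), 2))
```

In WebMOTIFS terms, `w` would range from 8 up to the user-specified maximum.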

Reference: Liu, XS, Brutlag, DL, and Liu, JS. "An algorithm for finding protein-DNA binding sites with applications to chromatin immunoprecipitation microarray experiments." Nature Biotechnology 2002;20(8):835-9.
 

MEME   MEME is a motif discovery program that uses expectation maximization to fit a two-component finite mixture model to a set of sequences. MEME can be found at http://meme.sdsc.edu/meme/intro.html

Parameters: MEME does double-stranded analysis, with no masking. It searches for motifs of length at least 6 and at most 18. It assumes zero or one occurrence of the motif per sequence and returns the top 6 discovered motifs.
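To make "fitting a two-component mixture model by expectation maximization" concrete, here is a minimal EM sketch under the zero-or-one-occurrence assumption described above: each sequence either contains one motif site at an unknown position or contains none. The E-step computes, for each sequence, the posterior probability of a site at each position versus no site; the M-step re-estimates the motif matrix from those expected sites. It is a simplified illustration assuming a fixed width, one strand, a uniform background, and a caller-supplied starting matrix; it is not MEME's implementation, which also searches over starting points and widths. `em_zoops` is a hypothetical name.

```python
BASES = "ACGT"
BG = 0.25  # assumed uniform background probability per base


def site_ratio(seq, j, theta):
    """P(letters at j..j+w-1 | motif) / P(same letters | background)."""
    r = 1.0
    for k, col in enumerate(theta):
        r *= col[seq[j + k]] / BG
    return r


def em_zoops(seqs, theta, gamma=0.5, iters=50, pseudo=0.1):
    """Refine a motif matrix `theta` (list of {base: prob}) by ZOOPS EM.

    gamma is the prior probability that a sequence contains a site.
    Returns (theta, gamma, z), where z[i][j] is the posterior probability
    that sequence i has its site starting at position j.
    Assumes every sequence is at least len(theta) long.
    """
    w = len(theta)
    z = []
    for _ in range(iters):
        # E-step: posterior over site positions, with "no site" as the
        # other mixture component (weight 1 - gamma).
        z = []
        for s in seqs:
            m = len(s) - w + 1
            lam = gamma / m  # prior for a site at any single position
            r = [lam * site_ratio(s, j, theta) for j in range(m)]
            denom = (1.0 - gamma) + sum(r)
            z.append([x / denom for x in r])
        # M-step: expected base counts in each motif column.
        counts = [{b: pseudo for b in BASES} for _ in range(w)]
        for s, zi in zip(seqs, z):
            for j, p in enumerate(zi):
                for k in range(w):
                    counts[k][s[j + k]] += p
        theta = [{b: c[b] / sum(c.values()) for b in BASES} for c in counts]
        gamma = sum(sum(zi) for zi in z) / len(seqs)
    return theta, gamma, z


init = [{b: (0.7 if b == c else 0.1) for b in BASES} for c in "TATAAA"]
theta, gamma, z = em_zoops(["CCCCTATAAACC", "GGTATAAAGGGG"], init)
print(round(gamma, 3), [max(range(len(zi)), key=zi.__getitem__) for zi in z])
```

The "no site" term `1 - gamma` in the denominator is what distinguishes this zero-or-one model from a one-occurrence-per-sequence model, where every sequence is forced to contribute a site.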

Reference: Bailey, TL and Elkan, C. "Fitting a mixture model by expectation maximization to discover motifs in biopolymers." Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994.
 

Weeder   Weeder uses word enumeration to find possible motifs in a set of DNA sequences, and then determines which of these motifs are most likely to be correct. Weeder can be found at: http://159.149.109.16:8080/weederWeb/

Parameters: Weeder does double-stranded analysis, with no masking. It allows zero or one occurrence of a motif per sequence, and each motif must appear in at least half of the sequences.
If the user indicates that motifs are expected to be less than 10 nucleotides long, Weeder searches for motifs of width 6 with at most 1 error and motifs of width 8 with at most 2 errors.
If the user indicates that motifs are expected to be less than 12 nucleotides long, Weeder searches for all of the above, plus motifs of width 10 with at most 3 errors.
If the user indicates that motifs are expected to be less than 14 nucleotides long, Weeder searches for all of the above, plus motifs of width 12 with at most 4 errors.
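The width/error settings and the "at least half of the sequences" requirement can be summarized in a small sketch. `SEARCHES` encodes the table above keyed by the user's length bound; `occurs` checks a candidate word on both strands, treating an error as a single-base substitution (an assumption about what "error" means); `passes_quorum` applies the half-of-the-sequences rule. These helpers are hypothetical and do not reproduce Weeder's own enumeration or ranking.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# (width, max_errors) pairs searched for each length bound described above.
SEARCHES = {
    10: [(6, 1), (8, 2)],
    12: [(6, 1), (8, 2), (10, 3)],
    14: [(6, 1), (8, 2), (10, 3), (12, 4)],
}


def occurs(seq, word, max_err):
    """True if `word` or its reverse complement occurs in `seq` with at most
    `max_err` substitutions (errors assumed to be mismatches only)."""
    rc = word.translate(COMPLEMENT)[::-1]
    w = len(word)
    for i in range(len(seq) - w + 1):
        window = seq[i:i + w]
        for target in (word, rc):
            if sum(a != b for a, b in zip(window, target)) <= max_err:
                return True
    return False


def passes_quorum(seqs, word, max_err):
    """Keep a candidate only if it occurs in at least half of the sequences."""
    hits = sum(occurs(s, word, max_err) for s in seqs)
    return 2 * hits >= len(seqs)


print(SEARCHES[12])
print(passes_quorum(["AAACGTGAAA", "TTTTTTTTTT"], "ACGTGA", 1))
```

Allowing more errors for wider motifs keeps the per-base error rate roughly constant (about one error per two to three positions), which is why the error budget grows with the width.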

Reference: Pavesi, G, et al. "Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes." Nucleic Acids Research 2004;32:199-203.

Bayesian Motif Discovery

THEME   THEME takes initial hypotheses for motifs and refines them. THEME was developed by the Fraenkel lab and can be found at http://fraenkel.mit.edu/THEME.

Parameters: THEME does double-stranded analysis, with no masking. It is run with 2-fold cross-validation and with possible beta values of 0.1, 0.22, 0.33, 0.5, and 1, from which the optimal weighting of the initial hypothesis is selected.
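Assuming cross-validation is what chooses among the candidate beta values, the selection can be sketched as a generic grid search: split the sequences into two halves, refine the hypothesis on one half with each beta, score the refined motif on the held-out half, swap the halves, and keep the beta with the best average held-out score. `refine` and `held_out_score` are hypothetical placeholders for THEME's own model fitting and scoring, which are not reproduced here.

```python
import random
from typing import Any, Callable, Sequence

# Candidate beta values listed above.
BETAS = (0.1, 0.22, 0.33, 0.5, 1.0)


def choose_beta(
    seqs: Sequence[str],
    refine: Callable[[Sequence[str], float], Any],
    held_out_score: Callable[[Any, Sequence[str]], float],
    betas: Sequence[float] = BETAS,
    seed: int = 0,
) -> float:
    """Return the beta with the best mean held-out score under 2-fold CV."""
    order = list(range(len(seqs)))
    random.Random(seed).shuffle(order)
    half = len(order) // 2
    fold_a = [seqs[i] for i in order[:half]]
    fold_b = [seqs[i] for i in order[half:]]
    best_beta, best_score = betas[0], float("-inf")
    for beta in betas:
        total = 0.0
        # Train on one fold, evaluate on the other, then swap.
        for train, test in ((fold_a, fold_b), (fold_b, fold_a)):
            model = refine(train, beta)
            total += held_out_score(model, test)
        mean = total / 2
        if mean > best_score:
            best_beta, best_score = beta, mean
    return best_beta


# Toy stand-ins: the "model" is just beta, and the score peaks at 0.33.
print(choose_beta(["ACGT"] * 4, lambda train, b: b, lambda m, test: -(m - 0.33) ** 2))
```

Scoring on held-out sequences guards against picking a small beta that lets the refined motif drift to fit noise in the training half.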

Reference: MacIsaac, KD, et al. "A hypothesis-based approach for identifying the binding specificity of regulatory proteins from chromatin immunoprecipitation data." Bioinformatics 2006;22(4):423-429.


Questions or comments? Please email tamo@mit.edu.

Website created by Katherine Romer, MIT class of 2008
Last updated 1/15/2007