The final output from WebMOTIFS is a table of motifs, with graphical logos and scores. The result is provided on our server, and can be downloaded. The outputs from de novo motif discovery and Bayesian motif discovery are given separately.
The output from de novo motif discovery is a table of clusters of similar motifs. Each cluster is represented by an average motif, a number of scores, and a table of all the individual motifs in the cluster.
The output from Bayesian motif discovery is a table of motifs, each refined from one of the selected initial hypotheses. Each motif is given with a number of scores, the motif from which it was refined, and the DNA-binding domain most likely to bind the motif.
Output from de novo motif discovery:
(1) Input sequences for motif discovery.
On submission, the user entered a list of gene or probe names. This link gives the sequences corresponding to the promoters of these genes, in FASTA format.
(2) Output from each motif discovery program run.
This is the intermediate output from motif discovery. It contains all motifs found, both significant and insignificant, in text-only format. For each program, we provide both the raw output and text-based output in the TAMO format (the same for all programs). For a description of the format of this output, see here.
(3) For details on WebMOTIFS' significance filtering of results from de novo motif discovery, see here. For details on WebMOTIFS' clustering of results from de novo motif discovery, see here.
Each cluster of motifs is represented by one line in the table below. Each cluster contains a number of similar motifs, often found by a variety of motif discovery programs. These motifs are averaged, as described here, to produce a single motif. All the motifs in the cluster are variants of this one average motif.
(4) A graphical sequence logo representation of the average motif for the cluster.
The height of the letters at each position is proportional to the information content at that position. In other words, given a multiple sequence alignment for a motif, the total height at a given position in the sequence logo reflects the sequence conservation at that position, and the relative height of the different letters at a position reflects the relative frequency of each nucleotide at that position.
For more information on sequence logos, see: Schneider TD, Stephens RM. "Sequence Logos: A New Way to Display Consensus Sequences." Nucleic Acids Res. 18 (1990): 6097-6100.
The logo is generated with WebLogo. Reference for WebLogo: Crooks GE, Hon G, Chandonia JM, Brenner SE. "WebLogo: A sequence logo generator." Genome Research 14 (2004): 1188-1190.
(5) For details on WebMOTIFS' scoring methods for de novo motif discovery, see here.
The enrichment score, bits, and group specificity score for the cluster's average motif are calculated in the same way as the scores for motifs found directly by the motif discovery programs. The median enrichment z-score is calculated from the z-scores of all the motifs in the cluster.
(6) A list of motif discovery programs that found this motif.
If a motif discovery program found at least one motif in this cluster, that program is listed here.
(7) The number of motifs in this cluster (the motifs that were averaged together and used to derive the sequence logo and scores).
The link goes to a table of all the motifs that were placed in this cluster.
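The relationship between information content and letter height described in item (4) can be sketched in Python. This is a minimal sketch under the standard definition for DNA (2 bits minus the Shannon entropy of the column), without the small-sample correction discussed by Schneider and Stephens. It is not the code used by WebLogo or WebMOTIFS, and the function names and example columns are hypothetical.

```python
import math


def column_information(freqs):
    """Information content, in bits, of one DNA motif column.

    freqs maps each base (A, C, G, T) to its relative frequency;
    the frequencies are assumed to sum to 1.
    Assumption: 2 - H with no small-sample correction.
    """
    # Shannon entropy of the column; zero-frequency bases contribute nothing.
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return 2.0 - entropy


def letter_heights(freqs):
    """Height of each letter: its frequency times the column's information."""
    total = column_information(freqs)
    return {base: p * total for base, p in freqs.items()}


# Hypothetical example columns.
conserved = {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0}
two_way = {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0}
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

for name, column in [("conserved", conserved), ("two_way", two_way), ("uniform", uniform)]:
    print(name, column_information(column), letter_heights(column))
```

A fully conserved position reaches the maximum of 2 bits, while a position where all four bases are equally frequent carries no information and has zero height in the logo.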
This table shows part of a single cluster. It contains the significant motifs that were combined to produce that cluster. Each line in the table represents a different motif.
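As an illustration of how the cluster-level entries in items (5) to (7) relate to the individual motifs in this table, the following Python sketch derives the median enrichment z-score, the list of programs, and the motif count from a list of cluster members. The record layout, the program names, and the z-score values are all hypothetical placeholders; the actual TAMO output format is described in the linked documentation.

```python
from statistics import median


def summarize_cluster(motifs):
    """Summarize one cluster from its member motifs.

    Each motif is a dict with hypothetical keys:
      "program": name of the motif discovery program that found it
      "zscore":  enrichment z-score of that motif
    """
    return {
        # Item (7): number of motifs averaged together in the cluster.
        "n_motifs": len(motifs),
        # Item (6): every program that found at least one motif in the cluster.
        "programs": sorted({m["program"] for m in motifs}),
        # Item (5): median of the members' enrichment z-scores.
        "median_zscore": median(m["zscore"] for m in motifs),
    }


# Illustrative values only, not real WebMOTIFS results.
cluster = [
    {"program": "programA", "zscore": 4.0},
    {"program": "programB", "zscore": 6.0},
    {"program": "programA", "zscore": 5.0},
]

summary = summarize_cluster(cluster)
print(summary)
```

The other cluster scores (enrichment score, bits, group specificity) are computed on the average motif itself and are not reproduced in this sketch.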
Output from Bayesian Motif Discovery
(1) Input sequences for motif discovery.
On submission, the user entered a list of gene or probe names. This link gives the sequences corresponding to the promoters of these genes, in FASTA format.
(2) Text-only output for top three starting hypotheses.
The user gave a list of starting hypotheses: transcription factor families that might regulate the input genes. For each chosen transcription factor family, WebMOTIFS retrieves a few characteristic binding sites and refines these characteristic motifs. Then, the chosen transcription factor families are ranked by the cross-validation error of the best motif derived from each family. For each family, WebMOTIFS provides two text output files: the raw output from THEME and a list of discovered motifs in TAMO format. To read about the format of these output files, see here.
(3) Motifs are ranked by cross-validation (cv) error: a lower cv error is better. Motifs with a cv error less than or equal to 0.4 are in the "Most significant motifs" table. Motifs with a cv error greater than 0.4 are rarely significant: these are in the "Less significant motifs" table.
(4) A graphical sequence logo representation of the refined motif.
The height of the letters at each position is proportional to the information content at that position. In other words, given a multiple sequence alignment for a motif, the total height at a given position in the sequence logo reflects the sequence conservation at that position, and the relative height of the different letters at a position reflects the relative frequency of each nucleotide at that position.
For more information on sequence logos, see: Schneider TD, Stephens RM. "Sequence Logos: A New Way to Display Consensus Sequences." Nucleic Acids Res. 18 (1990): 6097-6100.
The logo is generated with WebLogo. Reference for WebLogo: Crooks GE, Hon G, Chandonia JM, Brenner SE. "WebLogo: A sequence logo generator." Genome Research 14 (2004): 1188-1190.
(5) To read how WebMOTIFS scores results from Bayesian motif discovery, see here.
(6) Transcription factor family from which this motif was derived.
WebMOTIFS retrieves a few characteristic binding sites for each chosen transcription factor family. Then, WebMOTIFS refines each of these characteristic motifs to better match repeated patterns in the input sequences. The motif logo represents the refined motif; the "Starting hypothesis" shows the unrefined motif and the transcription factor family for which it is a characteristic binding site.
Questions or comments? Please email tamo@mit.edu.
Website created by Katherine Romer, MIT class of 2008
Last updated 1/18/2007