PPMD usage:

    python ${TAMO}/TAMO/MD/PPMD.py input_sequences.fasta [-width W] [-kmerseeds] [-prior seedfile] [-pp pos_prior.score] [-bg genome.freq]

The JBD motif discovery tool requires 2 types of input: sequence data and a seeding method. Sequence data is provided as a Fasta-formatted file. Seeding information can be provided in a number of ways, described below.

    input_sequences.fasta:  a Fasta-formatted file containing the sequence data set
    [-width W]:             the width of the motif to search for
    [-kmerseeds]:           use the most statistically enriched kmers to seed EM during motif discovery
    [-prior seedfile]:      provide custom-computed seeds as text strings or in a TAMO-formatted file
    [-pp pos_prior.score]:  provide positional priors on motif position in the file pos_prior.score
    [-bg genome.freq]:      kmer background frequencies for use as the motif discovery background model

NOTES:

A seeding method must be specified.

The -kmerseeds option scans the input file for statistically enriched kmers and uses the top 20 kmers as seeds. If the -kmerseeds option is specified, the user must also specify a width using the [-width] option. Statistical enrichment is assessed relative to a library of all yeast intergenic regions.

Custom-computed seeds can also be specified as text strings or provided as TAMO-formatted motif files using the [-prior] option (see examples below). If custom seeds are provided, no width need be specified.

Positional prior data provided to the PPMD program must have sequence identifiers consistent with the IDs in the Fasta-formatted sequences file.

The default Markov background file is based on S. cerevisiae intergenic regions. Custom kmer frequencies can be provided in a separate file by using the [-bg genome.freq] option. The file genome.freq should be formatted as a simple 2-column tab-delimited text file, with kmers (in uppercase) in the first column and floating point kmer frequencies in the second column.
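The two-column format described above for genome.freq can be produced with a short script. The following is a minimal sketch, not part of PPMD or TAMO: the function names (kmer_frequencies, format_freq_lines, write_freq_file) are hypothetical, and it assumes that frequencies are normalized over all counted kmers of a single width k. Check which kmer widths and which normalization your background model expects before using the output with -bg.

```python
from collections import Counter


def kmer_frequencies(sequences, k):
    """Count every overlapping k-mer and return kmer -> relative frequency.

    Assumption: frequencies are normalized over all counted k-mers of width k.
    K-mers containing characters other than A, C, G, T are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()  # the file format calls for uppercase kmers
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def format_freq_lines(freqs):
    """Return one 'KMER<TAB>frequency' line per kmer, sorted by kmer."""
    return [f"{kmer}\t{freqs[kmer]:.6g}" for kmer in sorted(freqs)]


def write_freq_file(freqs, path):
    """Write the 2-column tab-delimited file (hypothetical helper)."""
    with open(path, "w") as out:
        for line in format_freq_lines(freqs):
            out.write(line + "\n")
```

For example, write_freq_file(kmer_frequencies(my_sequences, 3), "custom_bg.freq") would write 3-mer frequencies for a list of sequence strings held in my_sequences (a placeholder name).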
Examples:

To run motif discovery on the sequences in GCN4.fasta using statistically enriched 8-mers as seeds:

    python PPMD.py GCN4.fasta -width 8 -kmerseeds

To run motif discovery on the sequences in GCN4.fasta using ATGACTCA and CTGAGTCA as seeds, and using pre-computed positional priors from the file GCN4.scores:

    python PPMD.py GCN4.fasta -prior ATGACTCA -prior CTGAGTCA -pp GCN4.scores

To run motif discovery on the sequences in GCN4.fasta using position-weight matrices in GCN4.tamo as seeds, and using the custom Markov background specified in custom_bg.freq:

    python PPMD.py GCN4.fasta -prior GCN4.tamo -bg custom_bg.freq
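When PPMD is driven from another script, the option rules in the notes (a seeding method is required, and -kmerseeds needs -width) can be checked before the command is launched. The sketch below is a hypothetical wrapper, not part of PPMD: build_ppmd_command and its parameter names are assumptions, and it only assembles the argument list using the flags documented here.

```python
def build_ppmd_command(fasta, width=None, kmerseeds=False, priors=(),
                       pos_prior=None, bg=None, script="PPMD.py"):
    """Assemble a PPMD command line as a list of arguments.

    Hypothetical helper: it enforces only the rules stated in the notes
    and does not inspect any of the input files.
    """
    if not kmerseeds and not priors:
        raise ValueError("a seeding method must be specified "
                         "(-kmerseeds or at least one -prior)")
    if kmerseeds and width is None:
        raise ValueError("-kmerseeds requires a width (-width W)")

    cmd = ["python", script, fasta]
    if width is not None:
        cmd += ["-width", str(width)]
    if kmerseeds:
        cmd.append("-kmerseeds")
    for seed in priors:  # one -prior flag per seed string or seed file
        cmd += ["-prior", seed]
    if pos_prior is not None:
        cmd += ["-pp", pos_prior]
    if bg is not None:
        cmd += ["-bg", bg]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) from the Python standard library. For instance, build_ppmd_command("GCN4.fasta", priors=["ATGACTCA", "CTGAGTCA"], pos_prior="GCN4.scores") reproduces the second example.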