Since transcription factor binding sites are usually degenerate, we first generate all possible permuted motifs of a given length and number of mismatches. Preliminary results show that a search space of 7-mers (1 mismatch), 8-mers (1 mismatch) and 9-mers (2 mismatches) is a good starting point for many transcription factors. However, the program can easily be modified to search for longer motifs and to incorporate prior knowledge about the motif of interest.
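The enumeration step described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the program's actual code: the function names (`all_kmers`, `mismatch_neighborhood`) and the `SEARCH_SPACE` constant are hypothetical, a mismatch is taken to mean a single-base substitution over the alphabet A, C, G, T, and "N mismatches" is read as "up to N mismatches".

```python
from itertools import combinations, product

ALPHABET = "ACGT"

# Hypothetical encoding of the search space named in the text:
# (motif length, allowed mismatches)
SEARCH_SPACE = [(7, 1), (8, 1), (9, 2)]


def all_kmers(k):
    """Yield every DNA sequence of length k (4**k sequences in total)."""
    for letters in product(ALPHABET, repeat=k):
        yield "".join(letters)


def mismatch_neighborhood(seed, max_mismatches):
    """Return the set of sequences within max_mismatches substitutions of seed.

    Assumption: mismatches are substitutions only (no insertions or deletions).
    The seed itself (zero mismatches) is always included.
    """
    variants = {seed}
    for n in range(1, max_mismatches + 1):
        # Choose which positions to change, then which bases to place there.
        for positions in combinations(range(len(seed)), n):
            for bases in product(ALPHABET, repeat=n):
                candidate = list(seed)
                for pos, base in zip(positions, bases):
                    candidate[pos] = base
                # The set removes duplicates that arise when a "substituted"
                # base equals the original base.
                variants.add("".join(candidate))
    return variants


if __name__ == "__main__":
    for k, m in SEARCH_SPACE:
        seed = next(all_kmers(k))  # "AAA...A", used only as an example seed
        print(k, m, len(mismatch_neighborhood(seed, m)))
```

With substitutions only, a 7-mer with 1 mismatch has 1 + 7 × 3 = 22 variants, and a 9-mer with 2 mismatches has 1 + 27 + 324 = 352, which shows how quickly the search space grows with motif length and mismatch count.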
To make full use of the biological information, the program improves the signal-to-noise ratio by setting a high cut-off: at least 85% of the input sequences must contain the motif of interest. The program then summarizes the search results in a unique way, taking advantage of the fact that the real motif will be represented multiple times among the short permuted motifs.
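The 85% cut-off can be illustrated with a minimal Python sketch. It is an assumption-laden illustration rather than the program's implementation: the helper names (`hamming`, `contains_motif`, `coverage`, `passes_cutoff`) are hypothetical, only the given strand is scanned (no reverse complement), and a sequence counts as containing the motif if any window of the motif's length differs from it at no more than the allowed number of positions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def contains_motif(sequence, motif, max_mismatches):
    """True if some window of the sequence matches the motif within the budget."""
    k = len(motif)
    return any(
        hamming(sequence[i:i + k], motif) <= max_mismatches
        for i in range(len(sequence) - k + 1)
    )


def coverage(sequences, motif, max_mismatches):
    """Fraction of input sequences that contain the motif."""
    if not sequences:
        return 0.0
    hits = sum(contains_motif(s, motif, max_mismatches) for s in sequences)
    return hits / len(sequences)


def passes_cutoff(sequences, motif, max_mismatches, cutoff=0.85):
    """Keep a candidate motif only if enough input sequences contain it.

    The default of 0.85 mirrors the 85% threshold stated in the text.
    """
    return coverage(sequences, motif, max_mismatches) >= cutoff


if __name__ == "__main__":
    # Toy input, invented purely for illustration.
    seqs = ["TTACGTACGTT", "ACGTTCG", "GGGGGGG"]
    print(coverage(seqs, "ACGTACG", 1))
    print(passes_cutoff(seqs, "ACGTACG", 1))
```

In this toy input, two of the three sequences contain the motif within one mismatch, so the candidate falls below the cut-off and would be discarded.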
Please contact Jichao Chen for details.