Fly TF motifs
This page provides motif models reported in Down, Bergman, Su & Hubbard (2007) Large scale discovery of promoter motifs in Drosophila melanogaster. PLoS Computational Biology 3:e7.
Included here are motif models in the form of position weight matrices (PWMs) for 56 Drosophila melanogaster transcription factors curated from 51 primary references. These curated data should be of use for verifying the results of computational motif inference projects in the genus Drosophila (e.g. the Tiffin database), and are derived from two different sources: 1) in vitro binding site selection experiments (e.g. SELEX-like methods), and 2) consensus sequences derived from compiled genomic binding site sequences. PWMs are reported as frequencies (not raw counts) to standardize across publications and are not rescaled relative to a background model.
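Because the PWMs are given as frequencies and are not rescaled against a background model, anyone who wants log-odds scores has to do that conversion themselves. The following is a minimal sketch of one way to do it, assuming a uniform background of 0.25 per base and a small pseudocount to avoid taking the log of zero. The function name, the pseudocount value and the example matrix are all hypothetical and are not taken from the paper or the .xms files.

```python
import math

BASES = "ACGT"

def to_log_odds(freq_columns, background=None, pseudocount=0.01):
    """Convert per-column base frequencies into log2-odds scores.

    freq_columns: list of dicts mapping each base to its frequency.
    background:   dict of background base frequencies (assumed uniform
                  if not given; this is an illustrative assumption).
    pseudocount:  small value added to every frequency before
                  renormalising, so zero frequencies stay finite.
    """
    if background is None:
        background = {b: 0.25 for b in BASES}
    scores = []
    for col in freq_columns:
        # Smooth the column and renormalise so it still sums to 1.
        total = sum(col[b] + pseudocount for b in BASES)
        scores.append({
            b: math.log2(((col[b] + pseudocount) / total) / background[b])
            for b in BASES
        })
    return scores

# Hypothetical two-column matrix: one fixed position, one uninformative.
example = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
log_odds = to_log_odds(example)
```

With a uniform background, an uninformative column scores close to zero for every base, while a fixed position scores positive for the observed base and negative for the others. A genome-specific background could be passed in place of the uniform default.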
A graphical representation of the data in these curated motif models can be found here.
The curated TF motifs reported here have been incorporated into the following external resources:
Matrices are presented in .xms format, an XML-like format for motif models developed by Thomas Down at the Sanger Institute, which can be viewed using his MotifExplorer tool distributed in the NestedMICA package. Tags in these files provide information such as the source, primary reference and (where available) IDs for similar matrices in the Transfac and JASPAR databases. Because the data presented here were curated independently, some motifs differ slightly from their counterparts in JASPAR/Transfac; these discrepancies were resolved here to reflect how the motifs are reported in the primary literature. Of the 62 PWMs reported here, 13 can be found in JASPAR (core) and 24 can be found in Transfac 7.0. Since the JASPAR motifs are a subset of the Transfac motifs, 38 of the motifs reported here are found in neither of these other online resources as of 2007.
General notes: Two of the 56 factors have PWMs for different DNA binding domains in the same protein (prd-HD/prd-PD and shn-ZFP1/shn-ZFP2), four PWMs are from different isoforms of the same gene (br-Z1, br-Z2, br-Z3 & br-Z4), two PWMs are for heterodimeric factors (EcR-usp and dif-Rel), and two PWMs are reported for the same factor (dl-A and dl-B) from different experiments, giving a total of 62 PWMs. Site selection experiments were excluded from this dataset where a segment of the oligo was held constant to a partial recognition sequence (e.g. Dfd and ftz). Frequencies for compiled data are derived from the reported IUPAC consensus string, not the frequencies of the alignment block, since many of these are derived from small samples.
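To illustrate how a frequency matrix can be derived from an IUPAC consensus string, here is a minimal sketch. It assumes that each ambiguity code splits its frequency equally among the bases it allows, which may not match the exact convention used for the curated files. The function name and the example consensus are hypothetical.

```python
# Standard IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def consensus_to_frequencies(consensus):
    """Turn an IUPAC consensus string into per-column base frequencies.

    Assumption (not from the source): bases allowed by an ambiguity
    code share the column equally; all other bases get frequency 0.
    """
    columns = []
    for symbol in consensus.upper():
        allowed = IUPAC[symbol]
        columns.append({
            base: (1.0 / len(allowed) if base in allowed else 0.0)
            for base in "ACGT"
        })
    return columns

# Hypothetical consensus, used only to show the mapping.
pwm = consensus_to_frequencies("TAATYN")
```

Under this assumption, the `Y` column gives C and T a frequency of 0.5 each, and the `N` column gives all four bases 0.25.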
In addition to these curated motif models, .xms files corresponding to Supplemental Figures 1-3 of Down et al. (2007) Large scale discovery of promoter motifs in Drosophila melanogaster. PLoS Computational Biology 3:e7 are provided here to allow re-use of the Supplemental data in this paper.
Download All Curated Motif Models: A flatfile in multi-xms format for 62 curated motif models (38 “selex” and 24 “consensus”). Note: this file includes 12 PWMs that were redundant with 13 JASPAR core PWMs not included in Supplemental File 3 below.
Download Selex Fasta Sequences: A .tar archive of available sequence data from in vitro site selection experiments used to generate 26 of the selex PWMs above.
Sequence data was kindly provided by Yasuko Akiyama-Oda, Olivier Bardot, Mark van Doren, Naoyuki Fuse, Mark Garfinkel, Shigeo Hiyashi, Jim Posakony, Kate Senger, David Wilson and Riqiang Yan. Please email firstname.lastname@example.org for questions or comments.
Please consider these data open access, and please cite Down, Bergman, Su & Hubbard (2007) Large scale discovery of promoter motifs in Drosophila melanogaster. PLoS Computational Biology 3:e7 if you use or redistribute these data.