Jonathan A. Eisen, Institute For Genomic Research
Robert S. Coyne, Institute For Genomic Research
Martin Wu, Institute For Genomic Research
Dongying Wu, Institute For Genomic Research
Mathangi Thiagarajan, Institute For Genomic Research
Jennifer R. Wortman, Institute For Genomic Research
Jonathan H. Badger, Institute For Genomic Research
Qinghu Ren, Institute For Genomic Research
Paolo Amedeo, Institute For Genomic Research
Kristie M. Jones, Institute For Genomic Research
Luke J. Tallon, Institute For Genomic Research
Arthur L. Delcher, Institute For Genomic Research
Steven L. Salzberg, Institute For Genomic Research
Joana C. Silva, Institute For Genomic Research
Brian J. Haas, Institute For Genomic Research
William H. Majoros, Institute For Genomic Research
Maryam Farzad, Institute For Genomic Research
Jane M. Carlton, Institute For Genomic Research
Roger K. Smith, Institute For Genomic Research
Jyoti Garg, York University
Ronald E. Pearlman, York University
Kathleen M. Karrer, Marquette University
Lei Sun, Marquette University
Gerard Manning, The Salk Institute for Biological Studies
Nels C. Elde, University of Chicago
Aaron P. Turkewitz, University of Chicago
David J. Asai, Harvey Mudd College
David E. Wilkes, Harvey Mudd College
Yufeng Wang, University of Texas at San Antonio
Hong Cai, University of Texas at San Antonio
Kathleen Collins, University of California - Berkeley
B. Andrew Stewart, University of California - Berkeley
Suzanne R. Lee, University of California - Berkeley
Katarzyna Wilamowska, University of Washington - Seattle Campus
Zasha Weinberg, University of Washington - Seattle Campus
Walter L. Ruzzo, University of Washington - Seattle Campus
Dorota Wloga, University of Georgia
Jacek Gaertig, University of Georgia
Joseph Frankel, University of Iowa
Che-Chia Tsao, University of Rochester
Martin A. Gorovsky, University of Rochester
Patrick J. Keeling, University of British Columbia
Ross F. Waller, University of British Columbia
Nicola J. Patron, University of British Columbia
J. Michael Cherry, Stanford University
Nicholas A. Stover, Stanford University
Cynthia J. Krieger, Stanford University
Christina del Toro, University of California - Santa Barbara
Hilary F. Ryder, University of California - Santa Barbara
Sondra C. Williamson, University of California - Santa Barbara
Rebecca A. Barbeau, University of California - Santa Barbara
Eileen P. Hamilton, University of California - Santa Barbara
Eduardo Orias, University of California - Santa Barbara

Document Type




Format of Original

23 p.

Publication Date

September 2006

Publisher

Public Library of Science

Source Publication

PLoS Biology

Source ISSN


Original Item ID

doi: 10.1371/journal.pbio.0040286


The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct nuclei within a single cell. The germline-like micronucleus (MIC) has its genome held in reserve for sexual reproduction. The soma-like macronucleus (MAC), which possesses a genome processed from that of the MIC, is the center of gene expression and does not directly contribute DNA to sexual progeny. We report here the shotgun sequencing, assembly, and analysis of the MAC genome of T. thermophila, which is approximately 104 Mb in length and composed of approximately 225 chromosomes. Overall, the gene set is robust, with more than 27,000 predicted protein-coding genes, 15,000 of which have strong matches to genes in other organisms. The functional diversity encoded by these genes is substantial and reflects the complexity of processes required for a free-living, predatory, single-celled organism. This is highlighted by the abundance of lineage-specific duplications of genes with predicted roles in sensing and responding to environmental conditions (e.g., kinases), using diverse resources (e.g., proteases and transporters), and generating structural complexity (e.g., kinesins and dyneins). In contrast to the other lineages of alveolates (apicomplexans and dinoflagellates), no compelling evidence could be found for plastid-derived genes in the genome. UGA, the only T. thermophila stop codon, is used in some genes to encode selenocysteine, thus making this organism the first known with the potential to translate all 64 codons in nuclear genes into amino acids. We present genomic evidence supporting the hypothesis that the excision of DNA from the MIC to generate the MAC specifically targets foreign DNA as a form of genome self-defense. 
The combination of the genome sequence, the functional diversity encoded therein, and the presence of some pathways missing from other model organisms makes T. thermophila an ideal model for functional genomic studies to address biological, biomedical, and biotechnological questions of fundamental importance.


Published version. PLoS Biology, Vol. 4, No. 9 (September 2006): 1620-1642. DOI: 10.1371/journal.pbio.0040286. © 2006 Public Library of Science. Published under Creative Commons Attribution License 2.5.

(fs1).pdf (115 kB): FIGURE S1 (A) Scaffolds larger than 1 Mb were sorted by size and concatenated to make a pseudo molecule. Statistics of nucleotide composition were calculated for 2,000 bp sliding windows with a shift length of 1,000 bp. Yellow, GC skew; blue, GC%; purple, χ² score. The green lines delimit the scaffolds (long) or contigs within each scaffold (short). (B) Analysis of three T. thermophila scaffolds of diverse size. Red boxes, genes on forward strand; green boxes, genes on reverse strand; blue, χ² score; orange, GC%; brown, GC skew; salmon, AT skew. The vertical light gray lines delimit contigs within each scaffold. Scaffold sizes: 8254645, 1,076 kb; 8254654, 510 kb; 8254072, 37.3 kb.
(fs2).pdf (92 kB): FIGURE S2 Using scaffolds larger than 100 kb, the percentage of predicted gene coding sequence was calculated within 10-kb windows. For the overall gene density (black bars), a sliding 10-kb window was applied at 2-kb intervals. Gray bars represent gene density in the 10-kb adjacent to each telomere.
(fs3).pdf (17 kB): FIGURE S3 Comparison of the percentage of introns in various size classes for both ab initio predicted genes (gray bars) and introns confirmed by EST sequencing (black bars).
(fs4).pdf (420 kB): FIGURE S4 (A) tRNA charging and expression. Total RNA was harvested from T. thermophila in log-phase growth (lanes 1 and 2) or after resuspension in 10 mM Tris starvation buffer for the times indicated. Total RNA samples were resolved by acid/urea acrylamide gel electrophoresis and transferred to nylon membrane; the same total RNA sample either untreated or deacylated at alkaline pH was used for lanes 1 and 2. Probing was performed using end-radiolabeled oligonucleotides specific for the tRNA of interest. (B) Expression levels of ncRNAs under various conditions. Total RNA was harvested from T. thermophila under the growth or development conditions indicated, resolved, transferred, and probed as in (A). As an internal control for even loading, the same blot was hybridized to detect tRNA-Sec and SRP RNA (RNA PolIII transcripts found predominantly in the cytoplasm and involved in translation) and also to U1 and U2 snRNAs (RNA PolII transcripts found predominantly in the nucleus and involved in mRNA splicing).
(fs5).pdf (30 kB): FIGURE S5 Orange points represent scaffolds that have been capped with telomeres at both ends.
(fs6).pdf (71 kB): FIGURE S6 Neighbor-joining tree built from ClustalW alignment of polo kinase domains. Species abbreviations: Hs, H. sapiens; Dm, D. melanogaster; Ce, Caenorhabditis elegans; Sc, S. cerevisiae; Dd, D. discoideum; Tt, T. thermophila. Note that T. thermophila has multiple members of both the polo and sak subfamilies, and that even within the T. thermophila–specific cluster, sequences are as divergent as orthologs from vertebrates and lower metazoans. The bar indicates scale of average substitutions per site.
(fs7).pdf (39 kB): FIGURE S7 Unrooted neighbor-joining tree for Rab GTPases. Bootstrap values over 40% (from 100 replicates) are indicated near corresponding branches. Predicted T. thermophila genes are in bold. Other Rabs are from H. sapiens (Hs), D. melanogaster (Dm), and S. cerevisiae (Sc). Proposed Rab families [157] are shown in colored blocks. Asterisks indicate Rabs for which there is functional evidence (**) or at least localization data (*) consistent with their groupings. T. thermophila genes cluster with the members of each Rab family except VII and IV (not shown in a box). There are three clades composed exclusively of T. thermophila gene predictions (clades I, II, and III) shown in dark gray boxes.
(ts1).doc (28 kB): TABLE S1
(ts2).doc (52 kB): TABLE S2
(ts3).doc (352 kB): TABLE S3
(ts4).doc (167 kB): TABLE S4
(ts5).doc (30 kB): TABLE S5
(ts6).doc (1022 kB): TABLE S6 (A) 5S. (B) tRNA. (C) Other ncRNAs. (D) tRNA gene IDs.
(ts7).doc (388 kB): TABLE S7
(ts8).doc (114 kB): TABLE S8
(ts9).doc (73 kB): TABLE S9
(ts10).doc (1966 kB): TABLE S10
(ts11).doc (3611 kB): TABLE S11 (A) Kinases. (B) Membrane transporters. (C) Proteases. (D) Cytoskeletal related.
(ts12).doc (90 kB): TABLE S12
(ts13).doc (134 kB): TABLE S13
(ts14).doc (59 kB): TABLE S14
(ts15).doc (159 kB): TABLE S15
(ts16).doc (25 kB): TABLE S16
(ts17).doc (93 kB)
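
The Figure S1 caption describes nucleotide composition statistics computed over 2,000 bp sliding windows with a 1,000 bp shift. The following is a minimal sketch of that kind of calculation, not the authors' pipeline. The function name `window_stats` is hypothetical, and the GC skew formula (G − C)/(G + C) is the conventional definition, assumed here because the record does not state which one was used. The χ² score is left out because its definition is not given in this record.

```python
def window_stats(seq, window=2000, step=1000):
    """Yield (start, gc_percent, gc_skew) for each full sliding window.

    Assumptions (not taken from the paper): GC% is computed over the
    full window length, and GC skew is (G - C) / (G + C).
    """
    seq = seq.upper()
    # Only full-length windows are reported; a trailing partial window is skipped.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_percent = 100.0 * (g + c) / window
        # Avoid division by zero in windows with no G or C at all.
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        yield start, gc_percent, gc_skew


if __name__ == "__main__":
    # Toy sequence for illustration only; real input would be a scaffold sequence.
    toy = "G" * 1000 + "C" * 1000 + "A" * 1000
    for start, gc, skew in window_stats(toy):
        print(start, gc, skew)
```

The same windowing loop could be adapted to the 10-kb windows at 2-kb intervals mentioned for Figure S2 by changing `window` and `step`, although computing gene density would additionally require gene coordinates, which are not part of this sketch.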

Included in

Biology Commons