The ciliate Tetrahymena thermophila is a model organism for molecular and cellular biology. Like other ciliates, this species has separate germline and soma functions that are embodied by distinct nuclei within a single cell. The germline-like micronucleus (MIC) has its genome held in reserve for sexual reproduction. The soma-like macronucleus (MAC), which possesses a genome processed from that of the MIC, is the center of gene expression and does not directly contribute DNA to sexual progeny. We report here the shotgun sequencing, assembly, and analysis of the MAC genome of T. thermophila, which is approximately 104 Mb in length and composed of approximately 225 chromosomes. Overall, the gene set is robust, with more than 27,000 predicted protein-coding genes, 15,000 of which have strong matches to genes in other organisms. The functional diversity encoded by these genes is substantial and reflects the complexity of processes required for a free-living, predatory, single-celled organism. This is highlighted by the abundance of lineage-specific duplications of genes with predicted roles in sensing and responding to environmental conditions (e.g., kinases), using diverse resources (e.g., proteases and transporters), and generating structural complexity (e.g., kinesins and dyneins). In contrast to the other lineages of alveolates (apicomplexans and dinoflagellates), no compelling evidence could be found for plastid-derived genes in the genome. UGA, the only T. thermophila stop codon, is used in some genes to encode selenocysteine, thus making this organism the first known with the potential to translate all 64 codons in nuclear genes into amino acids. We present genomic evidence supporting the hypothesis that the excision of DNA from the MIC to generate the MAC specifically targets foreign DNA as a form of genome self-defense. 
The combination of the genome sequence, the functional diversity encoded therein, and the presence of some pathways missing from other model organisms makes T. thermophila an ideal model for functional genomic studies to address biological, biomedical, and biotechnological questions of fundamental importance.
FIGURE S1 (A) Scaffolds larger than 1 Mb were sorted by size and concatenated to make a pseudomolecule. Statistics of nucleotide composition were calculated for 2,000-bp sliding windows with a shift length of 1,000 bp. Yellow, GC skew; blue, GC%; purple, χ² score. The green lines delimit the scaffolds (long) or contigs within each scaffold (short). (B) Analysis of three T. thermophila scaffolds of diverse size. Red boxes, genes on forward strand; green boxes, genes on reverse strand; blue, χ² score; orange, GC%; brown, GC skew; salmon, AT skew. The vertical light gray lines delimit contigs within each scaffold. Scaffold sizes: 8254645, 1,076 kb; 8254654, 510 kb; 8254072, 37.3 kb.
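The windowed composition statistics in Figure S1 can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names are hypothetical, GC skew and AT skew are assumed to follow the conventional definitions (G − C)/(G + C) and (A − T)/(A + T), and the χ² score is omitted because the caption does not define it.

```python
def sliding_windows(seq, window=2000, step=1000):
    """Yield (start, subsequence) for each full-length window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


def composition_stats(window_seq):
    """Return (GC%, GC skew, AT skew) for one window.

    Assumed definitions (not stated in the caption):
      GC skew = (G - C) / (G + C), AT skew = (A - T) / (A + T).
    Ambiguous bases such as N are ignored.
    """
    s = window_seq.upper()
    g, c, a, t = s.count("G"), s.count("C"), s.count("A"), s.count("T")
    total = g + c + a + t
    gc_percent = 100.0 * (g + c) / total if total else 0.0
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    return gc_percent, gc_skew, at_skew


def composition_profile(seq, window=2000, step=1000):
    """Return a list of (start, GC%, GC skew, AT skew), one entry per window."""
    return [
        (start, *composition_stats(w))
        for start, w in sliding_windows(seq, window, step)
    ]
```

With the default arguments, a 4,000-bp sequence yields three overlapping windows starting at positions 0, 1,000, and 2,000, matching the 2,000-bp window and 1,000-bp shift described in the caption.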
karrer.bio.plos.4-9.2006 (fs2).pdf (92 kB)
FIGURE S2 Using scaffolds larger than 100 kb, the percentage of predicted gene coding sequence was calculated within 10-kb windows. For the overall gene density (black bars), a sliding 10-kb window was applied at 2-kb intervals. Gray bars represent gene density in the 10-kb adjacent to each telomere.
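The windowed gene-density calculation in Figure S2 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, coding regions are assumed to be given as half-open (start, end) coordinates on a single scaffold, and overlapping predictions are merged so that no base is counted twice.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def coding_density(scaffold_length, coding_intervals, window=10_000, step=2_000):
    """Return (window_start, percent coding) for each sliding window.

    Defaults follow the caption: 10-kb windows applied at 2-kb intervals.
    """
    merged = merge_intervals(coding_intervals)
    densities = []
    for w_start in range(0, scaffold_length - window + 1, step):
        w_end = w_start + window
        # Sum the overlap between this window and every merged coding interval.
        covered = sum(
            max(0, min(end, w_end) - max(start, w_start))
            for start, end in merged
        )
        densities.append((w_start, 100.0 * covered / window))
    return densities
```

On a telomere-capped scaffold, the first and last windows returned correspond to the 10 kb adjacent to each telomere, provided the scaffold length minus the window size is a multiple of the step; otherwise the final window would need to be computed separately from the scaffold end.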
karrer.bio.plos.4-9.2006 (fs3).pdf (17 kB)
FIGURE S3 Comparison of the percentage of introns in various size classes for both ab initio predicted genes (gray bars) and introns confirmed by EST sequencing (black bars).
karrer.bio.plos.4-9.2006 (fs4).pdf (420 kB)
FIGURE S4 (A) tRNA charging and expression. Total RNA was harvested from T. thermophila in log-phase growth (lanes 1 and 2) or after resuspension in 10 mM Tris starvation buffer for the times indicated. Total RNA samples were resolved by acid/urea acrylamide gel electrophoresis and transferred to a nylon membrane; for lanes 1 and 2, the same total RNA sample was used either untreated or after deacylation at alkaline pH. Probing was performed using end-radiolabeled oligonucleotides specific for the tRNA of interest. (B) Expression levels of ncRNAs under various conditions. Total RNA was harvested from T. thermophila under the growth or development conditions indicated, then resolved, transferred, and probed as in (A). As an internal control for even loading, the same blot was hybridized to detect tRNA-Sec and SRP RNA (RNA PolIII transcripts found predominantly in the cytoplasm and involved in translation) and also U1 and U2 snRNAs (RNA PolII transcripts found predominantly in the nucleus and involved in mRNA splicing).
karrer.bio.plos.4-9.2006 (fs5).pdf (30 kB)
FIGURE S5 Orange points represent scaffolds that have been capped with telomeres at both ends.
karrer.bio.plos.4-9.2006 (fs6).pdf (71 kB)
FIGURE S6 Neighbor-joining tree built from ClustalW alignment of polo kinase domains. Species abbreviations: Hs, H. sapiens; Dm, D. melanogaster; Ce, Caenorhabditis elegans; Sc, S. cerevisiae; Dd, D. discoideum; Tt, T. thermophila. Note that T. thermophila has multiple members of both the polo and sak subfamilies, and that even within the T. thermophila–specific cluster, sequences are as divergent as orthologs from vertebrates and lower metazoans. The bar indicates scale of average substitutions per site.
karrer.bio.plos.4-9.2006 (fs7).pdf (39 kB)
FIGURE S7 Unrooted neighbor-joining tree for Rab GTPases. Bootstrap values over 40% (from 100 replicates) are indicated near corresponding branches. Predicted T. thermophila genes are in bold. Other Rabs are from H. sapiens (Hs), D. melanogaster (Dm), and S. cerevisiae (Sc). Proposed Rab families are shown in colored blocks. Asterisks indicate Rabs for which there is functional evidence (**) or at least localization data (*) consistent with their groupings. T. thermophila genes cluster with the members of each Rab family except VII and IV (not shown in a box). Three clades composed exclusively of T. thermophila gene predictions (clades I, II, and III) are shown in dark gray boxes.
karrer.bio.plos.4-9.2006 (ts1).doc (28 kB)
karrer.bio.plos.4-9.2006 (ts2).doc (52 kB)
karrer.bio.plos.4-9.2006 (ts3).doc (352 kB)
karrer.bio.plos.4-9.2006 (ts4).doc (167 kB)
karrer.bio.plos.4-9.2006 (ts5).doc (30 kB)
karrer.bio.plos.4-9.2006 (ts6).doc (1022 kB)
TABLE S6 (A) 5S. (B) tRNA. (C) Other ncRNAs. (D) tRNA gene IDs.
karrer.bio.plos.4-9.2006 (ts7).doc (388 kB)
karrer.bio.plos.4-9.2006 (ts8).doc (114 kB)
karrer.bio.plos.4-9.2006 (ts9).doc (73 kB)
karrer.bio.plos.4-9.2006 (ts10).doc (1966 kB)
karrer.bio.plos.4-9.2006 (ts11).doc (3611 kB)
TABLE S11 (A) Kinases. (B) Membrane transporters. (C) Proteases. (D) Cytoskeletal related.
karrer.bio.plos.4-9.2006 (ts12).doc (90 kB)
karrer.bio.plos.4-9.2006 (ts13).doc (134 kB)
karrer.bio.plos.4-9.2006 (ts14).doc (59 kB)
karrer.bio.plos.4-9.2006 (ts15).doc (159 kB)
karrer.bio.plos.4-9.2006 (ts16).doc (25 kB)
karrer.bio.plos.4-9.2006 (ts17).doc (93 kB)