Tree Construction
Homologs of each of the 31 phylogenetic marker genes (dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf) were identified from the 578 bacterial genomes that were complete at the time of the study. The corresponding protein sequences were retrieved, aligned, trimmed, and then concatenated by species into a mega-alignment [15]. A maximum likelihood tree was then constructed from the mega-alignment using PHYML. The model selected based on the likelihood ratio test was the Whelan and Goldman (WAG) model of amino acid substitution with gamma-distributed rate variation (5 categories) and a proportion of invariable sites. The shape of the gamma distribution and the proportion of invariable sites were estimated by the program.
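The concatenation step can be illustrated with a minimal sketch. It assumes each marker gene has already been aligned and trimmed, and that each alignment is held as a mapping from species name to aligned sequence. The function name `concatenate_alignments` and the choice to pad a missing gene with gap characters are assumptions made for illustration; the text does not state how missing genes were handled.

```python
from typing import Dict, List


def concatenate_alignments(
    gene_alignments: Dict[str, Dict[str, str]],
    gene_order: List[str],
) -> Dict[str, str]:
    """Join per-gene alignments into one concatenated sequence per species.

    gene_alignments maps gene name -> {species name -> aligned sequence}.
    A species lacking a gene is padded with '-' for that gene's columns
    (an assumption for this sketch, not a documented step of the study).
    """
    # Collect every species seen in any gene alignment.
    species = sorted({sp for aln in gene_alignments.values() for sp in aln})
    parts: Dict[str, List[str]] = {sp: [] for sp in species}

    for gene in gene_order:
        aln = gene_alignments[gene]
        # All sequences within one alignment must share the same width.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"alignment for {gene} is empty or ragged")
        width = lengths.pop()
        for sp in species:
            parts[sp].append(aln.get(sp, "-" * width))

    return {sp: "".join(chunks) for sp, chunks in parts.items()}


# Toy input with two genes and two hypothetical species.
toy = {
    "rplA": {"A": "MK-L", "B": "MKAL"},
    "rpsB": {"A": "GT"},
}
mega = concatenate_alignments(toy, ["rplA", "rpsB"])
```

Keeping the gene order fixed ensures that every species contributes its columns in the same positions, which is what allows the concatenated sequences to be treated as a single alignment.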
Tree Labeling
The number of ribosomal operons in each genome and the size of the genome were obtained from the NCBI website (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi). A small number of bacteria are considered to have multiple chromosomes. In these cases, the total number of operons across all chromosomes was used, and the combined size of the chromosomes was used as the genome size. In addition, in some genomes the number of copies differs among the rRNA genes. This is most frequent for 5S rRNA, which may be present in an extra copy. In these cases, the number of 16S rRNA genes was used as the number of operons, because in most practical applications it is the 16S rRNA that is examined. The tree was combined with the operon and genome size information and written in Newick format (http://en.wikipedia.org/wiki/Newick) such that each node is specified by "species-name*genome-size*rRNA-operon-count". The organism names on the tree were colored according to either operon number or genome size. In each case, the color generally becomes darker as the parameter increases. For the operons, 14 colors were used. For 0 to 6 operons, shades of yellow, orange, or red were used, with darker colors indicating larger numbers of operons. For 7 to 10 operons, shades of blue were used, and shades of green were used for 11 or more. In the case of genome size, 12 colors were used to depict the size ranges. The first range was 0-1 MB, with subsequent ranges in increments of 0.5 MB. The final range was for genomes greater than 6 MB in size. The final tree was created in the .esp format using ATV [16].
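The labeling and binning rules described above can be sketched as follows. All function names are hypothetical, and several details are assumptions rather than documented choices: spaces in species names are replaced with underscores so the label is valid as an unquoted Newick label, range boundaries are treated as inclusive at the upper end, and only the color family (not the exact shade) is returned for operon counts, since the specific 14 shades are not listed in the text.

```python
import math
from typing import List


def make_label(species: str, chromosome_sizes_mb: List[float],
               rrna_16s_counts: List[int]) -> str:
    """Build a 'species-name*genome-size*rRNA-operon-count' node label.

    Sizes and 16S rRNA gene counts are summed over all chromosomes.
    """
    total_size = sum(chromosome_sizes_mb)
    total_operons = sum(rrna_16s_counts)
    # Assumption: underscores replace spaces for unquoted Newick labels.
    name = species.replace(" ", "_")
    return f"{name}*{total_size:g}*{total_operons}"


def genome_size_bin(size_mb: float) -> int:
    """Return a bin index from 0 to 11 for the 12 genome size ranges.

    Bin 0 is 0-1 MB, bins 1-10 are 0.5 MB steps up to 6 MB,
    and bin 11 is anything greater than 6 MB.
    """
    if size_mb <= 1.0:
        return 0
    if size_mb > 6.0:
        return 11
    return math.ceil((size_mb - 1.0) / 0.5)


def operon_color_family(operons: int) -> str:
    """Return the color family used for a given operon count."""
    if operons <= 6:
        return "yellow/orange/red"
    if operons <= 10:
        return "blue"
    return "green"


# Hypothetical two-chromosome genome (values invented for illustration).
label = make_label("Genus species", [2.96, 1.07], [8, 0])
```

A label built this way can be split on `*` after the tree is parsed, so the size and operon count can be recovered for coloring without a separate lookup table.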