Rhomboid homologs in mycobacteria: insights from phylogeny and genomic analysis

Background Rhomboids are ubiquitous proteins with diverse functions in all life kingdoms, and are emerging as important factors in the biology of some pathogenic apicomplexa and Providencia stuartii. Although most prokaryotic genomes contain one rhomboid, actinobacteria can have two or more copies whose sequences have not been analyzed for the presence of putative rhomboid catalytic signatures. We report detailed phylogenetic and genomic analyses devoted to prokaryotic rhomboids of an important genus, Mycobacterium. Results Many mycobacterial genomes contained two phylogenetically distinct active rhomboids orthologous to Rv0110 (rhomboid protease 1) and Rv1337 (rhomboid protease 2) of Mycobacterium tuberculosis H37Rv, which were acquired independently. There was a genome-wide conservation and organization of the orthologs of Rv1337 arranged in proximity with glutamate racemase (mur1), while the orthologs of Rv0110 appeared evolutionarily unstable and were lost in Mycobacterium leprae and the Mycobacterium avium complex. The orthologs of Rv0110 clustered with eukaryotic rhomboids and contained eukaryotic motifs, suggesting a possible common lineage. A novel nonsense mutation at the Trp73 codon split the rhomboid of Mycobacterium avium subsp. paratuberculosis into two hypothetical proteins (MAP2425c and MAP2426c) that are identical to MAV_1554 of Mycobacterium avium. Mycobacterial rhomboids contain putative rhomboid catalytic signatures, with the protease active site stabilized by phenylalanine. The topology and transmembrane helices of the Rv0110 orthologs were similar to those of eukaryotic secretase rhomboids, while those of Rv1337 orthologs were unique. Transcription assays indicated that both mycobacterial rhomboids are possibly expressed. Conclusions Mycobacterial rhomboids are active rhomboid proteases with different evolutionary history.
The Rv0110 (rhomboid protease 1) orthologs represent prokaryotic rhomboids whose progenitor may be the ancestors of eukaryotic rhomboids. The Rv1337 (rhomboid protease 2) orthologs appear more stable and are conserved nearly in all mycobacteria, possibly alluding to their importance in mycobacteria. MAP2425c and MAP2426c provide the first evidence for a split homologous rhomboid, contrasting whole orthologs of genetically related species. Although valuable insights to the roles of rhomboids are provided, the data herein only lays a foundation for future investigations for the roles of rhomboids in mycobacteria.


Background
The genus Mycobacterium consists of ~148 species [1], of which some are leading human and animal pathogens. Tuberculosis (TB), the most important mycobacterial disease, is caused by genetically related species commonly referred to as "the Mycobacterium tuberculosis Complex" (MTC: Mycobacterium tuberculosis; M. bovis, also the causative agent of bovine TB; M. bovis BCG; M. africanum; M. canettii and M. microti [2]). M. leprae and M. ulcerans are respectively the causative agents of two other important diseases, leprosy and Buruli ulcer [3,4]. Besides the three major diseases, M. avium subsp. paratuberculosis causes Johne's disease (a fatal disease of dairy cattle [5]) and is also suspected to cause Crohn's disease in humans [5]. In addition, M. avium and other non-tuberculous mycobacteria (NTM) have become important opportunistic pathogens of immunocompromised humans and animals [6,7].
Mycobacteria have versatile lifestyles and habitats, complexities also mirrored by their physiology. While some can be obligate intracellular pathogens (e.g. the MTC species) [8], others are aquatic inhabitants, which can utilize polycyclic aromatic hydrocarbons (e.g. M. vanbaalenii) [9]. The biology of pathogenic mycobacteria remains an enigma, despite their importance in human and veterinary medicine. Except for the mycolactone of M. ulcerans, and the glycolipids (such as PDIMs) and proteins (such as ESAT-6) of MTC species [10,11], pathogenic mycobacteria, in contrast to most bacterial pathogens, largely lack obvious virulence factors, and the mechanisms by which they cause disease are still obscure [12]. Genome sequencing projects have provided invaluable tools that are accelerating the understanding of the biology of pathogenic mycobacteria. As such, genome sequencing data has guided the characterization of genes/pathways for microbial pathogens, accelerating discovery of novel control methods for the intractable mycobacterial diseases [5,13-16].
The rhomboid protein family exists in all life kingdoms and has rapidly progressed to represent a ubiquitous family of novel proteins. The knowledge and the universal distribution of rhomboids was engendered and accelerated by functional genomics [17]. The first rhomboid gene was discovered in Drosophila melanogaster as a mutation with an abnormally rhomboid-shaped head skeleton [17,18]. Genome sequencing data later revealed that rhomboids occur widely in both eukaryotes and prokaryotes [17]. Many eukaryotic genomes contain several copies of rhomboid-like genes (seven to fifteen) [19], while most bacteria contain one homolog [19].
Despite biochemical similarity in mechanism and specificity, rhomboid proteins function in diverse processes including mitochondrial membrane fusion, apoptosis and stem cell differentiation in eukaryotes [20]. Rhomboid proteases are also involved in life cycles of some apicomplexan parasites, where they participate in red blood cell invasion [21][22][23][24][25]. Rhomboids are now linked to general human diseases such as early-onset blindness, diabetes and pathways of cancerous cells [20,26,27]. In bacteria, aarA of Providencia stuartii was the first rhomboid homolog to be characterized, and was shown to mediate a non-canonical type of quorum sensing in this Gram-negative species [28][29][30]. Since then, further bacterial rhomboids have been characterized, albeit at a low rate; gluP of Bacillus subtilis is involved in cell division and glucose transport [31], while glpG of Escherichia coli [17,32] was the first rhomboid to be crystallized, paving the way for delineation of the mechanisms of action of rhomboid proteases [33,34].
Although universally present in all kingdoms, not all rhomboids are active proteases [19,35]. Lemberg and Freeman [35] defined the rhomboid family as genes identified by sequence homology alone, and the rhomboid proteases as a subset that includes only genes with all necessary features for predicted proteolytic activity. As such, rhomboid-like genes in eukaryotic genomes are classified into the active rhomboids, inactive rhomboids (known as the iRhoms) and a diverse group of other proteins related in sequence but predicted to be catalytically inert. The eukaryotic active rhomboids are further divided into two subfamilies: the secretase rhomboids that reside in the secretory pathway or plasma membrane, and the PARL subfamily, which are mitochondrial [35].
Despite their presence in virtually all eubacteria, there is a paucity of information about the functions of bacterial rhomboids. Hitherto, full phylogenetic analysis of rhomboids from the complex and populous prokaryotes has not been done; although it can provide important functional and evolutionary insights [17,35], it is a huge and difficult task to perform at once. Many species of mycobacteria contain two copies of rhomboid homologs whose sequences have not been investigated for the presence of functional signatures. Furthermore, actinobacteria can have up to five copies of rhomboids, the significance of which is currently not known. This study aimed at determining the distribution and evolutionary trends of rhomboids from an important genus, Mycobacterium, and at analyzing them bioinformatically.
Herein we report that mycobacterial rhomboids are active proteases with different evolutionary history, with Rv0110 orthologs representing a group of prokaryotic rhomboids whose progenitor may be the ancestor for eukaryotic rhomboids.

Results and discussion
A quest for the role(s) of rhomboids in mycobacteria is overshadowed by their diverse functions across kingdoms and even within species. Their presence across kingdoms implies that rhomboids are unusually useful factors that originated early in the evolution of life and have been conserved [20]. However, neither the reason for their implied significance nor the path of their evolution is understood; the key to answering these questions is rooted in understanding not only the sequence distribution of these genes, but more importantly, their functions across evolution [17,20]. This study reports that mycobacterial rhomboids are active rhomboid-serine-proteases with different evolutionary history. Reverse Transcriptase-PCRs on mycobacterial mRNA indicate that both copies of rhomboids are transcribed.
The distribution of rhomboids in mycobacteria: a nearly conserved rhomboid with unique genome organization across the genus
In determining the distribution of rhomboid homologs in mycobacteria, we used the two rhomboids of M. tuberculosis H37Rv, Rv0110 (rhomboid protease 1) and Rv1337 (rhomboid protease 2) as reference and query sequences. Many mycobacterial genomes contained two rhomboids, which were orthologous either to Rv0110 or Rv1337. However, there was only one homolog in the genomes of the MAC (Mycobacterium avium complex) species, M. leprae and M. ulcerans, which were orthologous either to Rv1337 (MAC and M. leprae rhomboids) or Rv0110 (M. ulcerans rhomboid). M. ulcerans was the only mycobacterial species with an ortholog of Rv0110 as a sole rhomboid. Thus, with the exception of M. ulcerans which had a rhomboid-like element (MUL_3926, pseudogene), there is a genome-wide conservation of the rhomboids orthologous to Rv1337 (rhomboid protease 2) in mycobacteria (figure 1).
Despite evolutionary differences across the genus, the Rv1337 mycobacterial orthologs shared a unique genome organization at the rhomboid locus, with many of the rhomboid surrounding genes conserved (figure 1). Typically, upstream and downstream of the rhomboid were cysM (cysteine synthetase) and mur1 (glutamate racemase) encoding genes. Since Rv1337 orthologs are almost inseparable from mur1 and cysM, it is likely that they are co-transcribed (polycistronic) or functional partners. As such, we may consider the cluster containing mycobacterial Rv1337 orthologs as a putative operon. According to Sassetti et al [36,37], many of the rhomboid surrounding genes are essential while others (including rhomboid protease 2, Rv1337) are required for the survival of the tubercle bacillus in macrophages [38].
Despite massive gene decay in M. leprae, the ML1171 rhomboid had a genome arrangement similar to that observed for other mycobacterial species. Upstream of ML1171 were gene elements (pseudogenes) ML1168, ML1169 and ML1170 (the homolog of cysM, which is conserved upstream of most Rv1337 orthologs). Similar to M. leprae, the MAC species also had an ortholog of Rv1337 as a sole rhomboid; perhaps the ortholog of Rv0110 was lost in the progenitor for MAC and M. leprae (these species are phylogenetically related and appear more ancient in comparison to M. marinum, M. ulcerans and MTC species [39]). In contrast to most mycobacterial genomes, cysM was further upstream of the M. marinum rhomboid (MMAR_4059); and despite being genetically related to MTC species [40], MMAR_4059 does not share much of the genome organization observed for Rv1337 MTC orthologs (figure 1).
The rhomboid-like element of M. ulcerans (MUL_3926, pseudogene) was nearly identical to MMAR_4059 (~96% similarity), differing by a 42 bp insertion at the beginning and eight single nucleotide polymorphisms (SNPs). Perhaps the insertion disrupted the open reading frame (ORF) of MUL_3926, converting it into a pseudogene. Interestingly, MUL_3926 nearly assumed the unique organization observed for mycobacterial orthologs of Rv1337, in which the rhomboid element was upstream of mur1.
The functional and evolutionary significance of the unique organization of the Rv1337 orthologs in mycobacteria is not clear. Since physiological roles are not yet ascribed to mycobacterial rhomboids, it is not certain whether MUL_3926 (pseudogene) would have similar roles, given that it almost assumed a similar genomic organization (note: functions have been ascribed to certain pseudogenes [41][42][43]). However, because M. ulcerans is a new species (recently evolved from M. marinum [40]) that has undergone reductive evolution, MUL_3926 could be a consequence of these recent phenomena [44]. Interestingly, MUL_3926 was the only rhomboid-like element in mycobacteria.
In contrast, the genome organization for Rv0110 orthologs was not conserved, and mirrored the genetic relatedness of mycobacteria (figure 2). As such, the orthologs from MTC species, M. marinum and M. ulcerans, which are genetically related and are assumed to have the same M. marinum-like progenitor [39,40,45,46], had a similar organization for the Rv0110 ortholog. Downstream and upstream of the rhomboid were, respectively, the transmembrane acyltransferase and the Proline-Glutamate polymorphic GC rich-repetitive sequence (PE-PGRS) encoding genes. PE-PGRS occurs widely in M. marinum and MTC genomes [39] but it was a pseudogene upstream of MUL_4822 of M. ulcerans. The distances between MTC Rv0110 orthologs and the neighboring genes were long, in contrast to the short distances between Rv1337 rhomboids and their neighboring genes.
The genome organization for the Rv0110 orthologs of M. gilvum, M. vanbaalenii and Mycobacterium species Jls, Kms and Mcs was likewise shared among these species. Upstream and downstream of the rhomboid were, respectively, the glyoxalase/bleomycin resistance protein/dioxygenase encoding gene and a gene that encodes a hypothetical protein. In contrast to MTC species, the Rv0110 orthologs in these species were close to or contiguous with the neighboring genes (figure 2). The genome organization of MAB_0026 of M. abscessus and MSMEG_5036 of M. smegmatis were unique to these species (not shown).
Many bacterial genomes contain a single copy of rhomboid. However, filamentous actinobacteria such as Streptomyces coelicolor and Streptomyces scabiei have as many as four or five copies of rhomboid-like genes. Since multi-copy rhomboids in prokaryotic genomes are not yet characterized, it is not certain whether prokaryotic rhomboids can also have diverse functions, similar to multi-copy rhomboids in eukaryotic genomes. Mycobacteria and actinobacteria at large exhibit diverse physiological and metabolic properties. It remains to be determined whether the diversity in number, nature and functions of rhomboids can contribute to the complex lifestyles of these organisms [8].

Similarity between the two mycobacterial rhomboid paralogs
Across the genus, the similarity between the two mycobacterial rhomboid paralogs was as low as that between prokaryotic and eukaryotic rhomboids (~10-20% identity) [19]. Since paralogs perform biologically distinct functions [47], the two mycobacterial rhomboids may have distinct roles. Eukaryotic rhomboid paralogs are also dissimilar and differ in functions in a particular species [17]. In contrast, the orthologs had significantly high homology (see table 1), with an average identity of 74%. Rv0110 orthologs within the MTC and MAC species had an identity of ~100% while those from other mycobacterial species had identities ranging from 61 to 78% (table 1). The exception was MAB_0026 of M. abscessus, which shared a significantly low homology with Rv0110 (38% identity at 214 amino acid overlap). This could be due to the large evolutionary distance between M. abscessus and other mycobacteria. Since proteins of ~70% identity or more are likely to have similar functions [48], MAB_0026 may have unique roles.
Figure 1 Genomic arrangement for Rv1337 mycobacterial orthologs. Unique genome organization occurs for Rv1337 orthologs across the genus. mur1 was downstream and cysM upstream of the rhomboids (except M. marinum and MAC species). Colored block arrows: blue, cysM; green, rhomboid homologs; purple, mur1; black, rhomboid surrounding genes; white, pseudogene. White boxes indicate distances between rhomboids and upstream and downstream genes. Boxed (blue) are the species with similar arrangement for the rhomboids.
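The identity figures quoted in this section (for example, 38% identity at a 214 amino acid overlap) are ratios of matching positions to aligned, ungapped positions. The following Python sketch illustrates that calculation on a pair of pre-aligned sequences. It is not the procedure used in the study; the function name and the toy sequences are hypothetical and chosen only for illustration.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    overlap = 0   # columns where both sequences carry a residue
    matches = 0   # overlap columns where the residues are the same
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue
        overlap += 1
        if a == b:
            matches += 1
    if overlap == 0:
        raise ValueError("no overlapping columns")
    return 100.0 * matches / overlap


# Hypothetical toy alignment: 4 matches over a 5-residue overlap.
toy_identity = percent_identity("GASHG-", "GASAGL")
print(f"{toy_identity:.1f}% identity")
```

Gapped columns are excluded from the denominator here, which is one common convention; alignment tools may report identity over a different denominator, so values should only be compared when computed the same way.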

The two mycobacterial rhomboids were acquired independently
To determine the evolutionary relationship between the two rhomboid paralogs, phylogenetic analysis was done and included distant eukaryotic and prokaryotic rhomboids. The mycobacterial rhomboids clustered into two distinct clades with high Bootstrap values (99-100%), indicating that the rhomboids could have been acquired independently (figure 3A). Each clade consisted of rhomboids orthologous either to Rv0110 or Rv1337, grouped according to genetic relatedness of mycobacteria [39], with MAB_0026 of M. abscessus appearing the most distant. The phylogenetic analysis confirmed that the two mycobacterial rhomboids are paralogs, but their progenitor could not be determined. Thus, the mycobacterial rhomboid paralogs may be "outparalogs" (i.e. they could have resulted from duplication(s) preceding a speciation event [47]), while the orthologs could have originated from a single ancestral gene in the last common ancestor [47]. The Neighbor-Joining and Minimum Evolution phylogenetic trees were compared and gave comparable results.
The Rv0110 (rhomboid protease 1) mycobacterial orthologs (boxed blue) clustered with eukaryotic secretase and PARL rhomboids with a high Bootstrap value (85%, figure 3A). When grouped with eukaryotic iRhoms, the Bootstrap value for this clade increased to 90%, with iRhoms forming a distinct clade (not shown). The Rv0110 mycobacterial orthologs may represent prokaryotic rhomboids with a similar lineage to, or the progenitor of, eukaryotic active rhomboids. This was previously noted by Koonin et al [19], who hinted at a subfamily of eukaryotic rhomboids that clustered with rhomboids of Gram-positive bacteria. Indeed, the Rv0110 mycobacterial orthologs contained extra eukaryotic motifs and have topologies similar to that of rho-1 of Drosophila. Koonin et al [19] suggested that rhomboids could have emerged in a bacterial lineage and were eventually widely disseminated (to other life kingdoms) by horizontal transfer [19]. Conversely, the Rv1337 mycobacterial orthologs (boxed red) formed a distinct clade, different from Rv0110 mycobacterial orthologs. These rhomboids appeared evolutionarily stable and did not cluster with eukaryotic rhomboids.
MAB_0026 of M. abscessus, which had low homology with Rv0110, also appeared distant and clustered poorly with mycobacterial orthologs, in contrast with its paralog MAB_1481 (figure 3A). Since orthologs have an ancestral gene in the last common ancestor [47], MAB_0026 could be a "pseudoortholog" (i.e. it is a distant paralog that appears orthologous due to differential, lineage-specific gene loss [47]). In phylogenetic analysis of mycobacterial rhomboids orthologous to Rv0110, MAB_0026 was also distant from rhomboids of other actinobacteria (figure 3B). Since M. abscessus is one of the earliest species to diverge of all mycobacterial species [39], the low homology could reflect evolutionary distance or stability of this rhomboid. However, the high homology of MAB_1481 (62% identity with Rv1337) contrasts with the low homology of MAB_0026 (38% identity with Rv0110), which argues against the notion of evolutionary distance and instead favors evolutionary stability of MAB_0026.

Mycobacterial rhomboids are active rhomboid-serine-proteases
Multiple sequence alignment revealed that all mycobacterial rhomboids contain the putative rhomboid catalytic residues Gly199, Ser201 and His254. The mycobacterial rhomboids also contained two additional C-terminal histidines (His145 and His150, which together with His254 are universally conserved in the rhomboid proteins [19]) and five invariant transmembrane residues (Gly202, Gly257, Gly261, Asn154 and Ala200) that are also conserved in many rhomboid proteins [33]. However, in mycobacteria, Ala252, which occurs in many eukaryotic and prokaryotic rhomboids, was substituted by Gly (figure 4). Furthermore, Tyr205, which stabilizes the rhomboid protease active site of glpG [17,33] and of many rhomboid proteases, was only conserved in MAB_0026 of M. abscessus, being substituted by Phe in the other mycobacterial rhomboids (figure 4). Thus, Phe is the stabilizing residue in the protease active site for the majority of mycobacterial rhomboids (Phe is an additional stabilizing residue for rhomboid proteases [17]).
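The catalytic serine sits in a GxSx motif (Gly199 and Ser201 in the numbering above, x being any residue). As a rough illustration of how such a signature can be located in a protein sequence, the Python sketch below scans a sequence with a regular expression. It is an illustrative aid rather than the alignment-based analysis reported here; the function name and the toy sequence are hypothetical, and a motif hit alone does not establish catalytic activity.

```python
import re

# Lookahead so that overlapping GxSx occurrences are all reported.
GXSX = re.compile(r"(?=(G[A-Z]S[A-Z]))")


def find_gxsx(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched motif) for every GxSx occurrence."""
    return [(m.start() + 1, m.group(1)) for m in GXSX.finditer(protein.upper())]


# Hypothetical toy sequence, not a real rhomboid.
toy_hits = find_gxsx("MKLGASFGYSGALA")
for position, motif in toy_hits:
    print(position, motif)
```

In practice the position of each hit would be checked against the predicted transmembrane helices and the conserved histidine, since GxSx is short enough to occur by chance.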
The nature of the transmembrane helices (TMHs) formed by mycobacterial rhomboids was analyzed to determine whether they conform to those of active rhomboid proteases. Mycobacterial orthologs of Rv0110 formed seven TMHs and topologies similar to those of eukaryotic rhomboid rho-1 of Drosophila (see figure 5). As in rho-1, the rhomboid catalytic residues GxSx and H (Gly199, Ser201 and His254, x being any residue) were localized, respectively, in TMH4 and TMH6 (see figure 5 and details in additional file 1). In mycobacterial orthologs of Rv0110, the two C-terminal histidines and the asparagine (His145, His150 and Asn154) were localized in TMH2, in contrast to eukaryotic rhomboid proteases, which have these residues in TMH3 [17,19,23]. However, in our analyses, we found His145, His150 and Asn154 in TMH2 in rho-1, similar to Rv0110 (see additional file 2). Despite the proteins being evolutionarily diverse, other studies found the overall structure of TMHs of rhomboid proteases conserved, with eukaryotic rhomboid proteases containing seven TMHs while archaea and eubacteria contain six [23,49]. It is not clear whether these similarities imply evolutionary or functional significance; similar topologies with eukaryotic rhomboids could imply occurrence of a common bacterial universal progenitor for the eukaryotic rhomboids [19]. Nevertheless, prokaryotic and eukaryotic integral transmembrane proteins can have similar architecture, with striking similarity in the amino acid frequency distribution in their TMHs [50].
In contrast, the mycobacterial orthologs of Rv1337 formed either six or five TMHs, as observed in most bacterial and archaeal rhomboids [19]. The orthologs of pathogenic mycobacteria formed six TMHs, while those of non-pathogenic mycobacteria formed five (see figure 5). The GxSx and H catalytic residues were found, respectively, either in TMH4 and TMH6 (for Rv1337 orthologs of pathogenic mycobacteria with six TMHs, see details in additional file 3) or in TMH3 and TMH5 (for Rv1337 orthologs of non-pathogenic mycobacteria with five TMHs, see additional file 4). The mycobacterial orthologs with six TMHs had the two C-terminal His and Asn residues in TMH2, as in the Rv0110 orthologs; however, in the orthologs with five TMHs, these residues were outside the TMHs (see additional file 4). Although His145, His150 and Asn154 are not essential for catalytic activity [33], it is not clear whether their absence in TMHs can affect functionality. This seems unlikely in that functions have been ascribed to the catalytically inert eukaryotic iRhoms lacking the minimum catalytic sites [26,27]. Alternatively, the observed differences may imply functional divergence, with rhomboids of pathogenic mycobacteria being functionally different from those of non-pathogenic mycobacteria. Indeed, Rv1337 was essential for the survival of the tubercle bacilli in macrophages [38]. Nevertheless, experimental evidence will be necessary for validation of these assertions.
Figure 5 The topology of mycobacterial rhomboids. Boxed (yellow) are the transmembrane domains containing the rhomboid catalytic residues and locations for the C-termini conserved residues. The Rv0110 mycobacterial orthologs formed topologies similar to those of the secretase eukaryotic rhomboid rho-1. The Rv1337 mycobacterial orthologs formed either six or five TMHs. The orthologs of pathogenic mycobacteria formed six TMHs while the orthologs of non-pathogenic mycobacteria formed five TMHs.

Extra protein domains in mycobacterial rhomboids
Mycobacterial rhomboids contained extra protein motifs, many of which were eukaryotic. The orthologs of Rv0110 contained diverse eukaryotic motifs, while the Rv1337 orthologs maintained a fairly constant number and type of motifs, either fungal cellulose binding domain or bacterial putative redox-active protein domains (table 2). It is difficult to account for the origin of eukaryotic motifs in mycobacterial rhomboids; nevertheless, extra protein motifs are common in eukaryotic rhomboids where their significance is also not known [17]. Since eukaryotic rhomboids are presumed to have been acquired from bacteria through horizontal gene transfer mechanisms [19], the extra protein motifs may have originated from prokaryotic progenitors. Mycobacterial rhomboids also contained N-signal peptides and eukaryotic subcellular localization target signals which were either mitochondrial or secretory (see table 2), with scores higher than or comparable to those of rho-7 and PARL. These observations further allude to a common ancestor for mycobacterial and eukaryotic active rhomboids [17].
Table 2 footnotes (partial): Cbb3-type cytochrome oxidase component. 5: Bacterial ABC transporter protein. 6: In Between Ring 'IBR' fingers. 7: Dynactin p62 family. 8: Tim17/Tim22/Tim23 family. 9: Fungal cellulose binding domain. 10: Putative redox-active protein. 11: Predicted membrane protein.
A novel nonsense mutation at the Trp73 codon split the MAP rhomboid into two hypothetical proteins
[...] codon]). Usually, nonsense mutations disrupt ORFs, resulting in truncated and non-functional proteins; however, this rare scenario resulted in two unique ORFs of MAP, providing the first evidence of a split rhomboid, contrasting whole orthologs of genetically related species. Although the significance of this is currently not known, cDNA was amplified from both ORFs, implying that both hypothetical proteins may be expressed (see figure 6).
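To make the effect of a nonsense mutation at a tryptophan codon concrete, the following Python sketch translates a short toy reading frame before and after a single-base change that converts TGG (Trp) into the stop codon TAG. It is only an illustration under stated assumptions: the sequences are hypothetical and are not the MAP rhomboid, the specific base change shown is chosen for the example, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate codon by codon from position 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy ORF: Met-Ala-Trp-Lys-Leu-stop.
wild_type = "ATGGCTTGGAAACTGTAA"
# Change the middle base of the third codon: TGG (Trp) becomes TAG (stop).
mutant = wild_type[:7] + "A" + wild_type[8:]

print(translate(wild_type))  # full-length toy product
print(translate(mutant))     # product truncated at the new stop codon
```

In this toy case the mutation simply truncates the product. For the downstream portion to be annotated as a second ORF, as reported for MAP, a suitable start codon would also have to be present after the new stop codon, which this sketch does not model.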
What are the lengths of MTC rhomboids?
In genome databases, the lengths for annotated sequences of rhomboids from genetically related mycobacteria vary, and initially we thought this reflected strain diversity. For instance, lengths for Rv0110 orthologs of MTC species are either 249 or 284 residues, while Rv1337 orthologs from the same species are 240 residues. In contrast, MT1378 (ortholog of Rv1337) of M. tuberculosis CDC 1551 is 227 amino acids, 13 residues shorter at the NH2-terminus. Thus, we aimed to validate the sizes of rhomboids from related strains/species. Genomic analyses at the rhomboid loci for the sequenced MTC genomes revealed that MTC rhomboid orthologs are 100% identical and are of equal length. Rhomboids were PCR-amplified from MTC with common primer sets for each ortholog (see methods), and sequencing data confirmed that MTC rhomboid orthologs are identical and are of the same size (284 residues for Rv0110 orthologs and 240 residues for Rv1337 orthologs). Rhomboid sequences were deposited in GenBank and accession numbers were assigned (see table 3).

Putative gene clusters for mycobacterial rhomboids
To determine putative functional coupling between mycobacterial rhomboids and other genes, genes in clusters formed by mycobacterial rhomboids at the KEGG database [51] were analyzed. The gene cluster formed by Rv1337 was conserved across the genus and extended to other actinobacteria such as Nocardia and Corynebacteria. This cluster included 58 genes (Rv1311 to Rv1366, see additional file 5) of which some are essential and others are required for the growth of M. tuberculosis in macrophages [38], a necessary step during pathogenesis of the tubercle bacillus. Conversely, the Rv0110 orthologs formed clusters reflecting the genetic relatedness of mycobacteria. Thus, the orthologs from MTC species and M. marinum formed similar clusters consisting of 61 genes (Rv0080 to Rv0140, see additional file 6). These clusters also included essential genes and those required for survival of the tubercle bacillus in macrophages. However, MUL_4822 of the genetically related M. ulcerans was not included in the MTC/M. marinum cluster, and formed a unique cluster consisting of only 19 genes (MUL_4791 to MUL_4824) with two genes upstream of the rhomboid (MUL_4823 and MUL_4824, see additional file 7). It is not certain whether this [...] The gene clusters of Rv0110 orthologs of M. vanbaalenii, M. gilvum and Mycobacterium species Jls, Kms and Mcs were also similar, and consisted of 48 genes (Mjls_5512 to Mjls_5559, see additional file 8), whose orthologs in MTC species are required for the growth of the tubercle bacillus in macrophages [38]. Conversely, the cluster for MAB_0026 of M. abscessus consisted of only three genes (MAB_0024, MAB_0025 and MAB_0026), shared with actinobacteria other than mycobacteria. Many MTC orthologs in the gene clusters of MUL_4822, Mjls_5529 and MAB_0026 are required for the growth of the bacillus in macrophages, the implication of which requires further study. There was no gene cluster formed by MSMEG_5036 of M. smegmatis.
The essential genes in mycobacterial rhomboid gene clusters are described in additional file 9.

Transcription analysis
Due to their ubiquity in eubacteria, we aimed to determine the expression of mycobacterial rhomboids in a preliminary fashion by screening for in vivo transcription. RT-(Reverse Transcriptase) PCRs amplified rhomboid cDNAs from mycobacterial mRNA, indicating that both copies of mycobacterial rhomboids are transcribed, and possibly expressed (see figure 6).

Functional insights
Signal transduction and metabolite transport
Since mycobacterial rhomboids contain rhomboid catalytic signatures, they may be functionally similar to aarA and rho-1, rescuing phenotypes associated with deletion of these genes in P. stuartii and D. melanogaster rhomboid mutants [52]. Due to their diverse functions, rhomboids appear to be good candidates for investigation in studies elucidating inter/intra-species and inter-kingdom signaling mechanisms [29,53-55].
Furthermore, gluP (contains a rhomboid domain) of B. subtilis is involved in sugar transport [17,32], while aarA activates the TatA protein transporter in P. stuartii [31]. As such, the putative gene clusters for mycobacterial rhomboids contained putative metabolite transporters and transcriptional regulators. Since genes in clusters for transport and signal transduction genes tend to have similar roles [56], mycobacterial rhomboids may have such roles.

Roles in pathogenesis?
In a TraSH analysis by Rengarajan et al, Rv1337 was required for the survival of M. tuberculosis H37Rv in macrophages [38], a necessary step during the development of TB. The genome-wide conservation of Rv1337 alludes to a possibly important protein. The pathogenesis of M. ulcerans (the only mycobacterium lacking the Rv1337 ortholog) is known, and it culminates in skin ulcerations caused by the plasmid-encoded polyketide toxin, mycolactone [4,40,44,57]. Buruli ulcer contrasts with the tuberculous nature of lesions formed by many pathogenic mycobacteria, whose pathogenesis is not well understood and remains a vast field of study.

Moonlighting properties?
It is possible to predict functional coupling between genes based on conservation of gene clusters among genomes [56,58]. Since proteins encoded by conserved gene pairs appear to interact physically [58], the evolutionary conservation of the Rv1337 genome arrangement might have functional implications. mur1 is a moonlighting protein (ability to perform multiple independent functions [59]) that exhibits both racemization and DNA gyrase activities [59]. Since rhomboids are known for diverse functions, the proximity of Rv1337 orthologs with a moonlighting protein makes them suspects for moonlighting properties.

Mycobacterial rhomboids have different evolutionary history
The two mycobacterial rhomboids are phylogenetically distinct and could have been acquired independently. The mycobacterial orthologs of Rv0110 (rhomboid protease 1) appear to be under evolutionary pressure; hence they were lost in the MAC species and M. leprae. These orthologs represent prokaryotic rhomboids whose progenitor may be the ancestor of eukaryotic rhomboids. The mycobacterial orthologs of Rv1337 (rhomboid protease 2) appear more stable and are conserved in nearly all mycobacteria, possibly alluding to their importance in mycobacteria.
MAP2425c and MAP2426c provide the first evidence of a split rhomboid, contrasting with the whole orthologs of genetically related species.

Mycobacterial rhomboids are active rhomboid proteases
Mycobacterial rhomboids are active rhomboid proteases, with the active site being stabilized by Phe. Although valuable insights into the roles of rhomboids are provided, the data herein only lay a foundation for future investigations into the roles of rhomboids in mycobacteria.

PCR conditions
Chromosomal DNA was extracted from mycobacteria by boiling heat-killed cells for 10 min and centrifuging briefly at 5000 × g to obtain the supernatant containing DNA. Amplification reactions contained 20 pmol each of the rhomboid-specific forward and reverse primers (see below), 1.5 U of high-fidelity Taq polymerase (Roche Applied Science, Mannheim, Germany), Custom PCR Master Mix (Thermo Scientific, Surrey, UK), ~200 ng template DNA and nuclease-free water in a reaction volume of 10 μL. The reactions were performed in a Peltier thermocycler (MJ Research, Waterman, MA, USA) under the following conditions: initial denaturation at 94°C for 5 min, followed by 30 cycles each consisting of 94°C for 0.5 min, 60°C for 0.3 min and 72°C for 1 min, with a final extension at 72°C for 10 min. Following amplification, the amplicons were purified with the QIAquick PCR purification kit (Qiagen, Hilden, Germany) and sequenced at ACGT (Wheeling, IL, USA). After analysis with BioEdit software and BLAST similarity searches, rhomboid sequences were deposited in the GenBank database (see Table 3 for accession numbers).
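
As a rough check on the cycling programme above, the following minimal sketch tallies the programmed hold times and converts the primer amount into a concentration. The constant and function names are hypothetical, and ramp times between temperature steps are ignored, so the real run time on any thermocycler would be longer.

```python
# Sketch: programmed hold time and primer concentration for the PCR described above.
# Names are illustrative only; ramp times between temperature steps are ignored.

INITIAL_DENATURATION_MIN = 5.0          # 94°C
CYCLES = 30
CYCLE_STEPS_MIN = {
    "denature_94C": 0.5,
    "anneal_60C": 0.3,
    "extend_72C": 1.0,
}
FINAL_EXTENSION_MIN = 10.0              # 72°C


def total_hold_minutes():
    """Sum of all programmed hold times, in minutes."""
    per_cycle = sum(CYCLE_STEPS_MIN.values())
    return INITIAL_DENATURATION_MIN + CYCLES * per_cycle + FINAL_EXTENSION_MIN


def primer_concentration_uM(pmol, volume_uL):
    """pmol per microlitre is numerically equal to micromolar."""
    return pmol / volume_uL


if __name__ == "__main__":
    print(f"hold time: {total_hold_minutes():.0f} min")
    print(f"each primer: {primer_concentration_uM(20, 10):.1f} uM")
```

With the values given in the text, the hold times add up to roughly 69 min, and 20 pmol of primer in 10 μL corresponds to 2 μM.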

Transcription assays
mRNA was purified from mycobacteria with the Oligotex mRNA mini kit (Qiagen, Hilden, Germany) and ~60 ng/μl (in 15 μl) mRNA used as template for cDNA synthesis. Reverse Transcriptase-PCRs were performed with the Titan One Tube RT-PCR System (Roche Applied Science, Mannheim, Germany) to amplify Rv0110 and Rv1337 cDNAs in separate reactions. Except for the initial cDNA synthesis step (50°C for 30 min), PCR conditions were similar to those described above. RT-PCRs were repeated with primers (1337int1: TGGACGTCAACGGCATCAG, forward, and 1337int2: CCAGCCCAATGACGATATCCC, reverse) that amplify an internal fragment (~350 bp) of Rv1337 orthologs.

[62]. These sequences were used as queries in BLAST searches for rhomboid homologs from an array of mycobacterial genome databases: "tuberculist" [63], GIB-DDBJ [64] and J. Craig Venter Institute [65].
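
For the internal Rv1337 primers quoted in the transcription assays (1337int1 and 1337int2), a short sketch such as the one below can report length, GC content and an approximate melting temperature. The Wallace rule used here (2°C per A/T, 4°C per G/C) is an assumption chosen for illustration; the source does not say how the primers were designed, and the helper names are hypothetical.

```python
# Sketch: basic properties of the internal Rv1337 primers quoted in the text.
# The Wallace rule is a rough estimate and is NOT stated in the source as the
# method used for primer design; all helper names are illustrative.

PRIMERS = {
    "1337int1": "TGGACGTCAACGGCATCAG",    # forward
    "1337int2": "CCAGCCCAATGACGATATCCC",  # reverse
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Approximate melting temperature in °C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), f"GC={gc_fraction(seq):.2f}", f"Tm~{wallace_tm(seq)}")
```

By this estimate the two primers come out at about 60°C and 66°C, which is at least compatible with the 60°C annealing step described for the PCR.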

Sequence analysis
The similarity between mycobacterial rhomboids was determined using specialized BLAST bl2seq for comparing two or more sequences [66]. Multiple sequence alignments were performed with ClustalW [67] or MUSCLE [68]. Mycobacterial rhomboids were examined for the presence of rhomboid family domains and catalytic signatures (GxSx). The TMH predictions were done using the TMHMM Server v. 2.0 [69]. The data generated were fed into TMRPres2D [70] to generate high-resolution images. Cellular localization signals were predicted using the TargetP 1.1 server [71].
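
The search for the GxSx catalytic signature mentioned above can be illustrated with a simple pattern scan. This is a minimal sketch, not the procedure used in the study: the function name and the toy sequence are made up, and a real analysis would also check that the motif lies within a predicted transmembrane helix.

```python
import re

# Sketch: locate GxSx tetrapeptides (x = any residue) in a protein sequence.
# A lookahead is used so that overlapping occurrences are all reported.
GXSX = re.compile(r"(?=(G[A-Z]S[A-Z]))")


def find_gxsx(protein):
    """Return (1-based position, tetrapeptide) for every GxSx occurrence."""
    seq = protein.upper()
    return [(m.start() + 1, m.group(1)) for m in GXSX.finditer(seq)]


if __name__ == "__main__":
    toy_sequence = "MKLLGASGPWVHAAG"  # made-up sequence, for illustration only
    for position, motif in find_gxsx(toy_sequence):
        print(position, motif)
```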

Phylogenetic analysis
Phylogenetic analysis was conducted using MEGA4 software [72]. The evolutionary history of mycobacterial rhomboids was determined using the Neighbor-Joining method. The percentage of replicate trees in which the associated taxa clustered together was determined using the Bootstrap test (1000 replicates). The evolutionary distances were computed using the Poisson correction method and are in units of the number of amino acid substitutions per site. All positions containing gaps and missing data were eliminated from the dataset (complete deletion option). For comparison of evolutionary history, trees were also constructed using the "Minimum Evolution" and "Maximum Parsimony" methods.
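
The distance calculation described above can be sketched as follows. Under the Poisson correction, the distance between two aligned protein sequences is d = -ln(1 - p), where p is the proportion of sites that differ. The code below is an illustration only and is not MEGA4's implementation; the function names are hypothetical, and treating '-' as a gap and '?' as missing data is an assumption about the input encoding.

```python
import math

# Sketch: Poisson-corrected amino acid distance after complete deletion.
# Not MEGA4's code; '-' (gap) and '?' (missing) are assumed symbols.
GAP_OR_MISSING = "-?"


def complete_deletion(aligned):
    """Drop every alignment column in which any sequence has a gap or missing residue."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    keep = [i for i in range(length)
            if all(s[i] not in GAP_OR_MISSING for s in aligned)]
    return ["".join(s[i] for i in keep) for s in aligned]


def p_distance(a, b):
    """Proportion of sites at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def poisson_distance(a, b):
    """Poisson correction: d = -ln(1 - p), in substitutions per site."""
    p = p_distance(a, b)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)


if __name__ == "__main__":
    alignment = ["MKV-LA", "MRVQLA", "MKVQ?A"]  # toy alignment
    cleaned = complete_deletion(alignment)
    print(cleaned)
    print(poisson_distance(cleaned[0], cleaned[1]))
```

A matrix of such pairwise distances is the input that a Neighbor-Joining procedure would then use to build the tree.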

Functional predictions
To predict possible roles for mycobacterial rhomboids, sequences were analyzed in the KEGG database [51] for genome arrangement, presence of extra protein domains, nature of gene clusters, orthologs and paralogs. Other parameters used to glean functions from mycobacterial rhomboid sequences included analysis of their topologies. To predict functional relatedness among genes within mycobacterial rhomboid clusters, sequences in the clusters were aligned with ClustalW, and Neighbor-Joining trees were deduced using default settings.