Our objective was to build an open-access reference database that provides access to proteins related to T4SS. To date, the AtlasT4SS holds 134 ortholog clusters. Their features are shown in Additional file 1: Table S1, which includes the presence of signal peptides and transmembrane regions, subcellular location and genomic location. These features were extracted from PubMed references, as indicated in the table, or from prediction algorithms.
How to access the AtlasT4SS
By “List of Biological sources”: The list of biological sources contains 58 Bacteria (49 Gram-negative and 9 Gram-positive), one archaeon and 11 plasmids, all known to carry at least one T4SS-related protein. The list provides the NCBI TaxID of each source and a link to the NCBI Taxonomy database.
By “Genes by Clusters and Genes by Biological sources”: The table of genes by clusters displays the 1st T4SS category, the list of clusters, the biological sources composing each cluster, the annotated product name, the NCBI gene ID, and the CDS size. Conversely, the table of genes by biological sources gives nearly the same information, sorted by biological source instead of by cluster.
We used a controlled vocabulary to annotate the names of genes and products. For product names, we used two major denominations: (i) “Type IV secretion system protein”, for all proteins involved in effector translocation, T-DNA translocation or DNA uptake/release processes, or (ii) “Conjugal transfer protein”, for all proteins involved in the conjugation process. These denominations follow the nomenclature used in the reference databases (UniProtKB/Swiss-Prot, COG, KEGG) or in the cited literature. We added “homolog” as a final tag of the product name to describe an ortholog of a given archetypal T4SS system. For almost all gene names, we used the existing denomination found in NCBI or UniProtKB/Swiss-Prot.
The "1st category": We defined the first category according to the four well-known T4SS groups, as follows: (i) the F-T4SS group contains the Tra/Trb orthologs that form the conjugal transfer system encoded on the F plasmid identified in E. coli; (ii) the P-T4SS group includes the Tra/Trb proteins that are encoded on plasmids belonging to the incompatibility group IncP. This group also contains the orthologs of the archetypal A. tumefaciens VirB/D4 system, including the proteins Mpf (VirB subunits of the mating pair formation complex), T4CP (coupling protein VirD4), and Dtr (Tra, VirC and VirD proteins that are involved in DNA processing and its transfer to the Mpf/T4CP complex); (iii) the I-T4SS group includes ortholog clusters related to the archetypal L. pneumophila, C. burnetii and/or plasmid ColIb-P9 Dot/Icm systems; and (iv) the GI-T4SS group contains orthologs encoded on the genomic islands of H. influenzae, P. aeruginosa and Salmonella enterica.
The "2nd category": The second category describes either a well-known protein family or an uncharacterized protein family (UPF). At present, the AtlasT4SS contains a total of 119 annotated protein families.
The "3rd category": The last category gives a classification based broadly on the function of a particular type IV secretion system. We defined ten functional categories. When the function of a T4SS is well known, we annotated it as either (i) conjugation, (ii) effector translocator, (iii) T-DNA translocator, or (iv) DNA uptake/release. When there is experimental evidence of bifunctional proteins, we annotated them with both functions, as follows: (v) conjugation and effector translocator or (vi) effector and T-DNA translocator. Some systems remain uncharacterized; for these we assigned a probable function based on similarity data (subject and query coverage ≥80% and similarity ≥80%) and on the phylogenetic tree, as follows: (vii) probable effector translocator, (viii) probable conjugation or (ix) probable effector translocator and DNA uptake/release. Finally, when the function of a given system could not be predicted, we annotated it as (x) unknown.
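The similarity criterion used for the "probable" annotations can be sketched in Python. This is only a minimal illustration under stated assumptions, not the AtlasT4SS implementation: the `Hit` record, its field names and the `probable_function` helper are hypothetical, and picking the most similar qualifying hit is our simplification. Only the ≥80% thresholds come from the text, and the phylogenetic evidence that was used alongside them is not modeled.

```python
from dataclasses import dataclass
from typing import List

# Thresholds stated in the text: coverage >= 80% and similarity >= 80%.
MIN_COVERAGE = 80.0
MIN_SIMILARITY = 80.0


@dataclass
class Hit:
    """Hypothetical summary of one alignment to a characterized T4SS protein."""
    query_coverage: float    # percent of the query covered by the alignment
    subject_coverage: float  # percent of the subject covered by the alignment
    similarity: float        # percent similarity of the aligned region
    subject_function: str    # known function of the subject, e.g. "conjugation"


def passes_thresholds(hit: Hit) -> bool:
    """True when both coverages and the similarity reach the 80% cut-offs."""
    return (hit.query_coverage >= MIN_COVERAGE
            and hit.subject_coverage >= MIN_COVERAGE
            and hit.similarity >= MIN_SIMILARITY)


def probable_function(hits: List[Hit]) -> str:
    """Label from the most similar qualifying hit, or 'unknown' if none qualifies.

    Assumption: the best qualifying hit decides the label; tree evidence is ignored.
    """
    qualifying = [h for h in hits if passes_thresholds(h)]
    if not qualifying:
        return "unknown"
    best = max(qualifying, key=lambda h: h.similarity)
    return f"probable {best.subject_function}"
```

In this sketch a hit that fails any one of the three cut-offs is discarded, so a system with no qualifying hit falls through to category (x), unknown.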
The current version of the AtlasT4SS database contains 119 families distributed across 134 clusters. Each protein family can be related to one cluster (e.g. F-T4SS TraA-F family), two clusters (e.g. I-T4SS DotA family), three clusters (e.g. P-T4SS VirB7 family), or up to eight clusters (e.g. P-T4SS VirB2/TrbC family). Figure 3 shows the distribution of protein family sizes in the database, with the functional categories highlighted for each family. This figure allows simple identification of the functional categories within a given family. For example, the largest protein families (more than 10 members), in particular those belonging to the P-T4SS group, contain several annotated functional categories, including the unknown function. The number of functional categories ranges from four for the Endonuclease_MobA/VirD2 family to eight for several VirB-related families and nine for the VirB6/TrbL family.
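The family-to-cluster relationship described above is a one-to-many mapping, which can be tallied as in the following Python sketch. The function names and the cluster identifiers (`c1`, `c2`, ...) are placeholders introduced for illustration; only the family names and their cluster counts are taken from the examples in the text.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def clusters_per_family(pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count the distinct clusters assigned to each family.

    `pairs` holds (cluster_id, family) tuples; duplicates are counted once.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for cluster_id, family in pairs:
        seen[family].add(cluster_id)
    return {family: len(ids) for family, ids in seen.items()}


def cluster_count_distribution(counts: Dict[str, int]) -> Counter:
    """Map 'number of clusters' to 'number of families with that many clusters'."""
    return Counter(counts.values())


# Placeholder cluster IDs; per-family counts follow the examples quoted in the text.
pairs = (
    [("c1", "F-T4SS TraA-F")]
    + [(f"c{i}", "I-T4SS DotA") for i in (2, 3)]
    + [(f"c{i}", "P-T4SS VirB7") for i in (4, 5, 6)]
    + [(f"c{i}", "P-T4SS VirB2/TrbC") for i in range(7, 15)]
)
counts = clusters_per_family(pairs)
distribution = cluster_count_distribution(counts)
```

Applied to the full database, a tally of this kind would be expected to cover 134 clusters across 119 families, the totals reported above.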
Clustering search mode
This mode is an advanced search that lets the user retrieve selected T4SS data using one or more filtering parameters. It also works as a comparative mode, since the user can select biological sources of interest from the whole list. The user can retrieve T4SS records by entering the product, the gene name or a synonym (the NCBI gene ID), and can restrict the search to one or more selected biological sources or run it against the whole list. Figure 4 shows an example of a search: T4SS proteins involved in conjugation belonging to the VirD4/TraG family in A. tumefaciens C58 Cereon, Rhizobium etli CFN 42 and Mesorhizobium loti R7A. It is also possible to run the BLASTP or BLASTX algorithm with a query amino acid or nucleotide sequence, respectively, against the AtlasT4SS clusters (Figures 5 and 6).
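The combination of filters in this search mode can be pictured with a short Python sketch. It is an illustration of the filtering logic only: the `Record` class, the `search` function, their parameter names and the three example records (including the gene IDs) are invented placeholders and do not reflect the actual AtlasT4SS schema or interface.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Record:
    """Hypothetical flat view of one database entry."""
    gene_id: str
    gene_name: str
    product: str
    family: str    # 2nd category
    function: str  # 3rd category
    source: str    # biological source


def search(records: Iterable[Record],
           term: Optional[str] = None,
           family: Optional[str] = None,
           function: Optional[str] = None,
           sources: Optional[Set[str]] = None) -> List[Record]:
    """Keep records matching every filter that is set; unset filters match all."""
    term_lc = term.lower() if term else None
    hits = []
    for r in records:
        # Free-text term is matched against product, gene name and gene ID.
        if term_lc and not any(term_lc in field.lower()
                               for field in (r.product, r.gene_name, r.gene_id)):
            continue
        if family and r.family != family:
            continue
        if function and r.function != function:
            continue
        if sources and r.source not in sources:
            continue
        hits.append(r)
    return hits


# Placeholder records; gene IDs and annotations are invented for illustration.
RECORDS = [
    Record("gene_0001", "traG", "Conjugal transfer protein TraG",
           "VirD4/TraG", "conjugation", "Rhizobium etli CFN 42"),
    Record("gene_0002", "virD4", "Type IV secretion system protein VirD4",
           "VirD4/TraG", "effector translocator", "Mesorhizobium loti R7A"),
    Record("gene_0003", "virB11", "Type IV secretion system protein VirB11",
           "VirB11", "conjugation", "Rhizobium etli CFN 42"),
]

# A query shaped like the Figure 4 example: family + function + selected sources.
result = search(RECORDS, family="VirD4/TraG", function="conjugation",
                sources={"Rhizobium etli CFN 42", "Mesorhizobium loti R7A"})
```

Leaving `sources` unset corresponds to searching the whole list of biological sources, while passing a set of names corresponds to the comparative selection described above.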
Phylogenetic analysis
Using the concatenated amino acid sequences of the ortholog clusters containing three or more predicted proteins, we generated a neighbor-joining (NJ) midpoint-rooted tree for each ortholog cluster. A total of 108 phylogenetic trees are displayed in the AtlasT4SS. Overall, the clusters represent a mixture of described functions, including effector translocators, DNA uptake/release and conjugation systems. However, a closer examination of the major trees resulting from the alignment of amino acid sequences encoded by VirB1/AvhB1, VirB2/AvhB2, VirB3/AvhB3, VirB4/AvhB4/TrbE/CagE, VirB6/AvhB6/TrbL, VirB8/AvhB8, VirB9/AvhB9/TrbG, AvhB10/VirB10/TrbI, AvhB11/VirB11/TrbB/GspE, VirD4/AvhD4/TraG and their homologues revealed that single branches grouped proteins with the same functional classification.
Accordingly, these T4SS trees display two categories of functions: some single branches group effector translocator systems, and the others group conjugation systems. For example, the midpoint-rooted phylogenetic tree of the AvhB11/VirB11/TrbB/GspE cluster [39] contains the highest number of sequences, totaling 206, including 142 paralogs. As mentioned before, VirB11 proteins belong to the ATPase VirB11 family, which contains the Type II secretion system protein E domain, also found in the DotB family. Consequently, the BBH approach merged VirB11, TrbB and also the type II GspE proteins (e.g., GeneID: lpg1522 and product: Type IV fimbrial assembly protein pilB) into the same cluster, but these sequences were not included in this tree. It is important to note that the VirB11 homolog from Campylobacter jejuni (CJJ81176pTet0039), involved in DNA uptake/release, is closer to the conjugative TrbB proteins, which is also observed in the VirB4 phylogenetic tree [40].
There is only one discrepancy in the grouping of functions at the final branches: the VirB11 from Brucella suis (BRA0059), which belongs to an effector translocator system, was grouped on the same branch as the TraM protein from a possible conjugative plasmid, pSB102. This discrepancy is observed in all phylogenetic trees of the P-T4SS clusters.
A case study: T4SS in Rhizobium etli CFN42
The genome of R. etli strain CFN42, a nitrogen-fixing bacterium, consists of one chromosome and six plasmids, and contains three copies of the T4SS: the plasmid p42a carries two T4SSs (VirB/D4p42a and Tra/Trbp42a), and the symbiotic plasmid p42d carries one VirB/D4p42d system [41].
The Tra/Trbp42a is involved in conjugal transfer of the self-transmissible plasmid p42a, and can mobilize the symbiotic plasmid p42d. On the other hand, the VirB/D4p42d is probably not a functional conjugation system [41]. Concerning the function of the third T4SS, the VirB/D4p42a, we hypothesized that this system is a possible effector translocator. By examining the phylogeny of the ortholog clusters, we observed that all VirB/D4p42a subunits grouped together with the effector translocator systems VirB/D4Ti of A. tumefaciens and VirB/D4pR7 of Mesorhizobium loti. The alphaproteobacterium M. loti, which belongs to the order Rhizobiales, establishes symbiotic relationships for biological nitrogen fixation with Lotus spp., including Lotus corniculatus and the model legume plant L. japonicus. The M. loti VirB/D4pR7 is encoded in the symbiotic island of plasmid R7A, and was proven to be an effector translocator system essential for plant symbiosis [42, 43]. To date, two substrates transferred by the VirB/D4pR7 to the host plant have been identified in vitro, one being the product of ORF msi059 and the other the product of ORF msi061 [42]. This T4SS is the first example of a type IV secretion system involved in a mutualistic symbiotic relationship.
Interestingly, looking for msi059 and msi061 homologues in the R. etli CFN42 genome, we found two ORFs in the plasmid p42a. One is RHE_PA00030 (270 aa), belonging to the Peptidase C48 family, which is similar to a domain of msi059 (61% BLASTP over 15% of the length of the protein). The other is RHE_PA00040 (203 aa) (annotated as VirF1), which is similar to msi061 (54% BLASTP over 42% of the length of the protein) and to VirF (52% BLASTP over 78% of the length of the protein), a protein transferred by the VirB/D4Ti and required for A. tumefaciens virulence [44].
Consequently, based on the evidence from our analysis, we suggest experimental investigation of VirB/D4p42a in order to elucidate its probable effector translocator function and its involvement in the R. etli CFN42 symbiosis. Through T4SS analysis of symbiotic bacteria, it may be possible to verify the role of this system in the host relationship. Perhaps in these bacteria the T4SS can take over the secretion function otherwise mediated by another system, such as the type III secretion system.
Future development and perspectives
Currently, we are working to include new systems and the related substrates for the effector translocator systems in the database. Also, we will perform an upgrade of the database to incorporate more systems from Gram-negative and Gram-positive Bacteria and Archaea.