New adhesin functions of surface-exposed pneumococcal proteins

Background Streptococcus pneumoniae is a widely distributed commensal Gram-positive bacterium of the upper respiratory tract. Pneumococcal colonization can progress to invasive and potentially lethal disease, which is why antibiotics and vaccines have been designed to limit the dramatic effects of the bacterium in such cases. As a consequence, the pneumococcus has developed efficient antibiotic resistance, and the use of vaccines covering a limited number of serotypes, such as Pneumovax® and Prevnar®, results in the expansion of non-covered serotypes. Pneumococcal surface proteins represent attractive candidates for the development of new therapeutic targets against the bacterium. Despite the number of described virulence factors, we believe that the majority of them remain to be characterized, which may explain why pneumococcal invasion processes are still largely unknown. Results The availability of genome sequences facilitated the identification of pneumococcal surface proteins bearing characteristic motifs, such as choline-binding proteins (Cbps) and peptidoglycan-binding (LPXTG) proteins. We designed a medium-throughput approach to systematically test interactions between these pneumococcal surface proteins and host proteins (extracellular matrix proteins, circulating proteins and immunity-related proteins). We cloned, expressed and purified 28 pneumococcal surface proteins. Interactions were tested in a solid-phase assay, which identified 23 protein-protein interactions, 20 of which are new. Conclusions We conclude that whereas peptidoglycan-binding proteins do not appear to be major adhesins, most of the choline-binding proteins interact with host proteins (elastin and C-reactive protein are the major Cbp partners). These newly identified interactions open the way to a better understanding of host-pneumococcal interactions.


Background
Streptococcus pneumoniae is a common bacterium of the commensal flora that, together with other bacterial species, colonizes the nasopharyngeal niche and upper respiratory tract. Pneumococcal colonization is mostly asymptomatic but can progress to respiratory or even systemic disease; the pneumococcus causes the majority of community-acquired pneumonia and of invasive diseases such as meningitis and bacteremia. Risk groups include young children, elderly people and patients with immunodeficiencies. In the USA and Europe, the annual incidence of invasive pneumococcal infections ranges from 10 to 100 per 100 000, with a mortality rate of 10 to 50%; the highest incidence concerns people older than 65 years [1]. The burden of pneumococcal pneumonia is very high in developing countries, where it is estimated to kill more than 1 million children under the age of five every year. The current seven-valent conjugate vaccine for children is effective against invasive pneumococcal diseases caused by vaccine-type strains. However, as more than 90 serotypes have been described, vaccine coverage is limited, and replacement by non-vaccine serotypes is a serious threat for the near future [2]. The search for new vaccine candidates that would elicit protection against a broader range of pneumococcal strains, or for new drugs to counter invasive pneumococcal disease, is therefore of great interest.
Over the past 20 years, the importance of proteins for S. pneumoniae virulence has become clear. Research has been stimulated by the observation that pneumococcal proteins, and more precisely surface-exposed proteins, are promising candidates for the development of vaccines that could be common to all pneumococcal serotypes [3]. The mechanisms and pneumococcal factors that enable host epithelial and tissue barriers to be breached during the progression from colonization to invasive infection are still poorly understood. The role of the capsular polysaccharides in virulence has long been studied [4]. To better understand the pathogenic processes of the pneumococcus, screens based on very diverse methodologies have been conducted, which identified proteins potentially involved in host-pathogen interactions [5][6][7][8][9]. It is now clear that cell-surface proteins participate in many stages of the colonization process and/or the transition to disease.
One of the first identified virulence factors of the pneumococcus is the toxin pneumolysin [10], which interferes with the immune system [11,12] and directly destabilizes host membranes [13]. The interactions of PspA with lactoferrin and of CbpA with factor H, as well as the proteolysis of IgA1, play important roles in escape from the innate immune system [14][15][16]. The pneumococcal glycosidases NanA, NanB and SpnHL cleave terminal sugars from human glycoconjugates, which might reveal receptors for bacterial adherence and/or help the bacteria spread [17]. Unlike for other pathogenic bacteria, very few interactions of pneumococcal proteins with extracellular matrix components have been described; one example is the interaction of PavA with fibronectin [18]. Direct adherence of pneumococci to epithelial cells was shown to be mediated by choline-binding protein A (CbpA) and PsaA, which bind to the polymeric Ig receptor and E-cadherin, respectively [19][20][21][22]. Finally, one way to progress into host tissue is to recruit the host protease plasmin at the bacterial surface. We recently demonstrated that the pneumococcal surface-exposed CbpE is a receptor for plasminogen (as are enolase [23] and GAPDH [24]); activation of plasminogen into plasmin facilitates traversal of S. pneumoniae through (i) a reconstituted basement membrane and (ii) epithelial and endothelial cell barriers via a pericellular route [25,26].
Besides secreted or membrane-anchored proteins carrying an N-terminal signal peptide, three major groups of pneumococcal cell-surface proteins have been identified from specific sequence motifs, which correspond to three different modes of attachment to the cell wall, composed of peptidoglycan, teichoic acids and lipoteichoic acids. First, teichoic and lipoteichoic acids are decorated with phosphorylcholine (PCho) residues that anchor a group of proteins, the choline-binding proteins (Cbps). These proteins harbor repeated sequences of approximately 20 amino acids, the choline-binding modules, generally located in the C-terminal part of the protein. Two to twelve such modules form the choline-binding domain, which attaches non-covalently to PCho. Outside the choline-binding domain, the amino acid sequences vary greatly, and various enzymatic activities or binding properties have been identified for some Cbps. The virulence factors PspA, CbpA, LytA and CbpE belong to this protein family. Second, in Gram-positive bacteria, proteins can be covalently linked to the peptide moiety of the peptidoglycan [27]. Transpeptidases called sortases catalyze this anchoring at a specific amino acid motif, LPXTG. This motif can deviate from the canonical LPXTG sequence, as in the pilin proteins (RrgA: YPRTG; RrgB: IPQTG; RrgC: VPDTG). The pneumococcal glycosidases NanA and SpnHL are members of this LPXTG protein family. Third, cell-surface lipoproteins are covalently linked to membrane phospholipids through an N-terminal LXXC motif recognized by signal peptidase II; PsaA is a lipoprotein.
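As an illustration of how the sequence signatures described above can be searched for in a protein sequence, the following Python sketch scans for the canonical LPXTG motif, the pilin variants listed in the text, and an N-terminal LXXC lipobox. It is a deliberate simplification assumed for illustration: real predictors also weigh the motif position, the hydrophobic segment and charged tail that follow sortase motifs, and a fuller lipobox consensus. The 35-residue window and the function names are hypothetical choices.

```python
import re

# Sortase-anchoring motifs named in the text: canonical LPXTG (X = any residue)
# and the variants reported for the pilin proteins RrgA, RrgB and RrgC.
SORTASE_MOTIFS = {
    "canonical LPXTG": re.compile(r"LP.TG"),
    "RrgA-like YPRTG": re.compile(r"YPRTG"),
    "RrgB-like IPQTG": re.compile(r"IPQTG"),
    "RrgC-like VPDTG": re.compile(r"VPDTG"),
}

# Lipobox as stated in the text: L-X-X-C near the N-terminus (simplified).
LIPOBOX = re.compile(r"L..C")


def find_sortase_motifs(seq):
    """Return (motif name, 1-based start position) for every motif hit, ordered by position."""
    hits = []
    for name, pattern in SORTASE_MOTIFS.items():
        for match in pattern.finditer(seq):
            hits.append((name, match.start() + 1))
    return sorted(hits, key=lambda hit: hit[1])


def has_n_terminal_lipobox(seq, window=35):
    """True if an L-X-X-C motif occurs within the first `window` residues (assumed window)."""
    return LIPOBOX.search(seq[:window]) is not None
```

For example, `find_sortase_motifs("MKKLPETGEKKYPRTG")` reports the canonical motif at position 4 and the RrgA-like variant at position 12. A hit alone does not prove surface anchoring; it only flags candidates for closer inspection.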
The availability of genomic sequence data for pneumococcal strains has facilitated the identification of additional pneumococcal surface proteins through searches for specific signatures in open reading frame sequences. For example, the initial analysis of the genomic sequence of the TIGR4 strain [28] identified 70 genes encoding proteins predicted to be exposed at the pneumococcal surface through one of the three attachment modes: 19 predicted proteins with the peptidoglycan-anchoring LPXTG-like motif, 15 predicted Cbps and 36 proteins with putative lipid-attachment motifs (predicted lipoproteins) [28]. A comparable set of proteins with bacterial surface motifs is found in the R6 strain, although in different numbers: 13 LPXTG proteins linked to the peptidoglycan, 10 Cbps and 109 lipoproteins (the number of lipoproteins differs from that of TIGR4, probably because different algorithms were used to predict them). The authors also reported that overall 471 proteins contain a predicted signal peptide, an indication of their localization at the bacterial surface, either through membrane anchoring or by secretion into the extracellular space followed by some form of association with the cell wall [29].
To date, about 15 pneumococcal surface proteins in total (mainly those described above) have been shown to act as virulence factors in colonization and disease. Given the large number of predicted surface-exposed proteins and the lack of knowledge of key aspects of pneumococcal physiopathology, we assume that the understanding of pneumococcal disease might greatly benefit from the study of surface-exposed proteins that have not yet been characterized. To identify new host-pneumococcal interactions that may play roles in colonization and disease progression, we designed a global screening strategy. We first evaluated the ability of the pneumococcus to adhere to host components. We then cloned and expressed pneumococcal proteins from the Cbp and LPXTG protein families to systematically test their interactions with host proteins. We thus obtained a map of the interactions of pneumococcal surface proteins with twelve mammalian proteins likely to be encountered during the colonization and/or invasion stages. This work identified new protein-protein interactions between Cbps, LPXTG proteins and host proteins, gives a renewed view of the respective roles of Cbps and LPXTG proteins, and opens the route for in-depth study of the interactions uncovered.

Binding of pneumococcal strains R6 and TIGR4 to host proteins
We first investigated the ability of pneumococcal strains to interact with a wide range of host proteins likely to be encountered by bacterial pathogens [30]: extracellular matrix proteins (collagens, elastin, fibronectin, laminin, mucin), circulating plasma proteins acting in the coagulation cascade (fibrinogen, plasminogen) and proteins involved in innate immune defense (lactoferrin, CRP, SAP, factor H). Binding of the R6 strain to these host proteins was tested in a solid-phase assay. Host proteins, or bovine serum albumin (BSA) as a negative control, were coated on a multi-well plate. FITC-labeled pneumococci were added, and the FITC signal was measured after washing the plate to compare binding of the pneumococcus to BSA and to host components (Fig. 1a). The threshold for a positive interaction was set at twice the BSA negative control. Using this criterion, no significant binding of R6 bacteria was detected to collagen type IV, to a mix of different collagens or to elastin. A low binding level (two to three times the BSA binding level) was observed for CRP, fibrinogen, fibronectin, mucin and SAP, while a higher level of binding was detected to laminin, lactoferrin, plasminogen and factor H (Fig. 1a). A similar experiment was performed with the encapsulated TIGR4 strain (Fig. 1b). The TIGR4 strain showed no or very low binding to collagen type IV, fibronectin, mucin and SAP, and slightly higher binding to CRP, fibrinogen, laminin, collagens and elastin. A high binding level of the TIGR4 strain was measured to lactoferrin, plasminogen and factor H (Fig. 1b). Both R6 and TIGR4 bind strongly to lactoferrin and factor H. The strong binding of R6 to laminin and plasminogen is weaker for TIGR4, whereas TIGR4 recognizes elastin better than R6 does.
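The scoring rule above (positive at twice the BSA control, "low" between two and three times the control) can be written as a small function. This is a sketch that assumes anything above three times the BSA signal counts as a higher binding level; the text gives no explicit upper cut-off, and the signals in the test are arbitrary numbers, not data from Fig. 1.

```python
def classify_binding(signal, bsa_signal):
    """Categorize a whole-bacteria binding signal relative to the BSA negative control.

    < 2x BSA     -> not significant (below the positivity threshold)
    2x to 3x BSA -> low binding
    > 3x BSA     -> high binding (assumed cut-off, not stated in the text)
    """
    ratio = signal / bsa_signal
    if ratio < 2.0:
        return "not significant"
    if ratio <= 3.0:
        return "low"
    return "high"
```

Expressing the rule as a ratio makes plates with different absolute fluorescence levels comparable, since each host protein is judged only against the BSA wells of the same experiment.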
Interactions of pneumococcal cells with laminin [31], CRP [32], fibronectin [33] and mucin [34] have been described in the literature. The other interactions identified here had not been described to date. To investigate these interactions at the molecular level, we designed an approach to systematically test interactions between selected pneumococcal surface proteins and host proteins.

Identification, expression and purification of choline-binding proteins (Cbps)
We built a list of the Cbps present in the R6 and TIGR4 genomes using published data [28,29]. According to these data, 10 genes encoding Cbps were predicted in the R6 genome and 15 in the TIGR4 genome (Fig 2). We systematically compared the TIGR4 and R6 protein databases derived from their complete genome sequences to obtain a list of orthologs between the two organisms, a task facilitated by the high conservation of gene organization between the two genomes. This analysis identified two Cbps in the R6 genome that had not been identified in the initial study [29], namely spr0583 and spr1274 (Fig 2). To harmonize the nomenclature, we named these newly identified choline-binding proteins CbpL (encoded by spr0583 and SP0667 in the R6 and TIGR4 strains, respectively) and CbpM (encoded by spr1274 in R6, the TIGR4 SP1417 locus being a pseudogene). The R6 ortholog of CbpG [35] (SP0390) is split into two proteins: spr0349 contains a peptidase domain, and spr0350 is a very small protein (42 aa) with a single predicted choline-binding domain. Thus, CbpG does not seem to exist as a Cbp in the R6 strain. Taking all these data together, we conclude that the R6 and TIGR4 genomes encode 12 and 14 Cbps, respectively. Figure 2 gives a comprehensive overview of the Cbps in Streptococcus pneumoniae strains R6 and TIGR4. This classification shows that the names previously used for the Cbps were confusing. For instance, the TIGR4 protein PcpC (SP0377) has an R6 ortholog named CbpF (spr0337), whereas the TIGR4 protein CbpF (SP0391) has an R6 ortholog named PcpC (spr0351). As CbpF was studied in R6 [36] under that name, we chose to rename SP0391 and spr0351 CbpK. PcpA was also renamed CbpN. We did not rename well-studied Cbps such as PspA, LytA, LytB and LytC. A similar analysis was performed with the strains G54 (serotype 19F) and Hungary 19A-6 (serotype 19A) (Table S1).
The G54 strain contains 14 Cbps, lacking only CbpJ, while 12 Cbps were identified in the Hungary 19A-6 strain, which lacks CbpI, CbpJ and CbpG.
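Because the old and new Cbp names cross over between strains, a lookup from locus tag to harmonized name can help when reading older literature. The sketch below includes only the mappings stated in this section and in the Figure 2 listing (PcpA at SP2136/spr1945, renamed CbpN); the dictionary and function name are illustrative and not part of any published resource.

```python
# Harmonized Cbp names keyed by locus tag (TIGR4 "SP" tags, R6 "spr" tags).
# Only mappings given in the text are included; anything else returns None.
HARMONIZED_CBP = {
    "SP0377": "CbpF", "spr0337": "CbpF",   # TIGR4 name was PcpC
    "SP0391": "CbpK", "spr0351": "CbpK",   # formerly CbpF (TIGR4) / PcpC (R6)
    "SP0667": "CbpL", "spr0583": "CbpL",   # newly identified in R6
    "spr1274": "CbpM",                     # TIGR4 SP1417 is a pseudogene
    "SP0390": "CbpG",                      # split into spr0349 and spr0350 in R6
    "SP2136": "CbpN", "spr1945": "CbpN",   # formerly PcpA
}


def harmonized_name(locus_tag):
    """Return the harmonized Cbp name for a locus tag, or None if not listed."""
    return HARMONIZED_CBP.get(locus_tag)
```

Keying on locus tags rather than on protein names avoids the ambiguity described above, where "CbpF" and "PcpC" refer to different orthologs depending on the strain.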
The level of sequence identity between R6 and TIGR4 Cbp orthologs was determined with Kalign (http://msa.sbc.su.se/cgi-bin/msa.cgi) and ranged between 84% and 99%, except for PspA, with 63% sequence identity. Some Cbps show slight differences in their general topology: TIGR4 CbpK is larger than its R6 ortholog and has 3 more choline-binding domains, and TIGR4 CbpN is reduced […]. Cbps can be separated into three classes. Some have no predicted domain other than the choline-binding domain, such as CbpI, PspA, CbpF, CbpJ, CbpK, CbpM (the shortest, with 129 aa) and CbpN (the longest, with 690 aa and 10 choline-binding domains). Others have additional domains with identified enzymatic functions (CbpG, CbpE, Lyt proteins). Finally, some Cbps have additional predicted domains of unknown function (CbpL, CbpA, CbpD). All the genes encoding Cbps were cloned except those coding for the Lyt proteins, whose roles are well documented. CbpE had already been cloned in the laboratory [25]. PspA, CbpN and CbpD were not expressed. CbpG and CbpK were expressed in insoluble form and were not studied further. CbpA, CbpE, CbpF, CbpI, CbpJ, CbpL and CbpM were successfully purified.
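Percent identity depends on how gaps are counted, and the exact convention applied to the Kalign alignments is not stated here. The sketch below assumes one common convention: identical positions divided by the alignment columns in which neither sequence has a gap. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two equal-length aligned sequences ('-' marks a gap).

    Identity is computed over columns where neither sequence has a gap;
    other conventions (e.g. dividing by the shorter sequence length) give
    different values for the same alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)
```

For instance, `percent_identity("MKT-LLV", "MKTALIV")` compares six gap-free columns, five of which match, giving about 83.3%.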

Expression and purification of LPXTG proteins
A comparable analysis was conducted for the LPXTG proteins (Fig 3). Genes for 19 and 13 LPXTG family members have been identified in the TIGR4 and R6 genomes, respectively [28,29]. Ten LPXTG proteins are common to the R6 and TIGR4 genomes, meaning that the others are specific to either the R6 or the TIGR4 strain. Five LPXTG proteins are specific to TIGR4, among which the pilin proteins encoded at loci SP0462, SP0463 and SP0464, which are thought to be covalently associated with each other via their LPXTG-like motifs by specific pilus sortases [37]. Because these particular LPXTG proteins are not linked to the peptidoglycan by the housekeeping sortase A, they were not included in this study. Two other LPXTG proteins are present in the TIGR4 strain and absent from the R6 strain: the metalloprotease ZmpC and PsrP, a very large protein (4776 aa) essentially composed of a serine-rich region [38]. Three new R6 orthologs were identified: EndoD (SP0498 = spr0440), ZmpB (SP0664 = spr0581) and ZmpA (SP1154 = spr1042) (Fig 3). NanA (spr1536) and PclA (spr1403) are present in the R6 strain but not in TIGR4. spr0400, initially reported as an LPXTG protein [29], has neither an LPXTG motif nor a Gram-positive anchor and was thus excluded from our study. CbpA (SP2190) is annotated both as a Cbp and as an LPXTG protein in TIGR4. As we did not find an LPXTG motif in SP2190, it was excluded from the LPXTG protein list and kept with the Cbps (Figs 2 and 3). The initial, inaccurate annotation as an LPXTG protein likely originates from an allelic variant of CbpA that harbors an LPXTG motif in some pneumococcal strains [15,39]. In total, the R6 strain has 15 genes encoding LPXTG proteins, compared with 18 for the TIGR4 strain. Protein sizes range from 202 aa (MucB) to 4776 aa (PsrP). Some of these proteins are enzymes (Fig 3), while others may be involved in molecular recognition (for example, SpuA and SpnHL harbor carbohydrate-binding modules). The sequence identity between LPXTG orthologs of the R6 and TIGR4 strains ranged between 89% and […].

Figure 2 Streptococcus pneumoniae choline-binding proteins. The topology of the Cbps was analyzed by a SMART search of PFAM domains (http://smart.embl-heidelberg.de/), using the R6 protein when it exists and otherwise the TIGR4 protein. The resulting general topology of each protein is shown, with domains named according to PFAM nomenclature. YSIRK stands for the Gram-positive signal peptide (Pfam entry: PF04650). * refers to proteins for which the number of choline-binding repeats has been determined by crystallography and was thus used in the table [36,45-47]. The cloned part of each protein is shown in the grey box. Protein and locus nomenclature, together with the common names of the proteins and references for their original discovery, are listed in the second column. The third column gives the construct boundaries and the size of the complete protein (NC: not cloned). The last columns display the positive or negative results of expression and solubility of the corresponding proteins.
As LPXTG proteins are often large, selected domains were cloned for expression for most of them (Fig 3). All cloning attempts were successful except for PclA. All the constructs expressed protein and yielded soluble recombinant forms.

Protein interactions screening by solid-phase assay
To study on a large scale the interactions of pneumococcal choline-binding proteins and LPXTG proteins with host components, we designed and automated a solid-phase assay screening for interactions between the purified His-tagged pneumococcal proteins and host components. The chosen mammalian proteins, already tested with whole pneumococci (Fig. 1), were either extracellular matrix components (collagens, fibronectin, laminin, mucin, elastin) or circulating proteins (CRP, lactoferrin, fibrinogen, plasminogen, factor H, SAP). These proteins were coated on a 96-well plate, and binding of the purified recombinant His-tagged pneumococcal proteins was detected with an HRP-coupled anti-His-tag antibody revealed by chemiluminescence. Each interaction experiment was conducted at least three times using two or more different protein preparations. Interactions observed in the majority of at least three independent experiments were considered positive (Table 1).

Interaction profile of the choline-binding proteins
Elastin is the extracellular matrix component with the largest number of Cbp partners (CbpI, CbpL and CbpF), while collagens interact only with CbpL and laminin only with CbpE (Table 1). The most frequent interactions were observed with circulating proteins such as CRP, factor H and plasminogen. Four different Cbps interact with CRP: CbpI, CbpM, CbpJ and CbpL. CbpE and CbpA interact with factor H, the latter interaction confirming previous results [40]. Plasminogen interacts with CbpE and CbpF (Table 1). The interactions of CbpE with laminin and plasminogen confirm our previous observations [25], and we identify factor H as a new CbpE partner herein.

Interaction profile of the LPXTG proteins
Even though all expressed LPXTG proteins were produced as soluble recombinant proteins, some of them gave poor purification yields or poor signal detection during the screen. For these reasons, PavB, ZmpA, MucB and PsrP were dropped from the screening assay. The most common interactions of the LPXTG candidates involved collagen IV (PrtA, ZmpB, NanA and spr1806) and plasminogen (SpuA, Eng, PrtA and spr1806) (Table 1). NanA also interacts with collagens and fibrinogen (Table 1). Contrary to a previous observation [17], the interaction level of NanA with lactoferrin was not significant in our assay.

Dose-response curves
We chose to investigate the dose-response of three previously unstudied Cbps for which the solid-phase screen revealed host-protein binding: CbpL interacts with collagens, elastin and CRP, CbpI binds elastin and CRP, and CbpM binds only CRP. In this experiment, 1 μg of each mammalian protein was coated and increasing amounts of pneumococcal protein were added, from 0.8 to 200 pmol per well. For all three Cbps analyzed, the interaction with mammalian proteins is dose-dependent (Fig 4). Compared with the BSA negative control, CbpL binds most strongly to elastin and gives intermediate responses with collagens and CRP (Fig 4). These data confirm the results of the screen and also support the semi-quantitative information on binding levels obtained from it. The interactions of CbpI with elastin and CRP yielded the highest response levels in the dose-response measurements, in agreement with the screen, but a significant level of interaction was also observed with collagens (Fig 4). The only interaction of CbpM that emerged from the screen, with CRP, was confirmed in the dose-response analysis; this more detailed characterization further suggests that CbpM interacts with elastin, but too weakly to be scored as positive in the screen (Fig 4). Altogether, these results validate the procedure used to select the interactions emerging from the screen.
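A crude way to check dose-dependence as described here is to verify that the signal does not decrease as the amount of pneumococcal protein increases, and that the highest dose clears the same twofold-over-BSA threshold used in the screen. This is an assumed criterion for illustration, not the analysis behind Fig 4; a fitted binding curve would be more robust to measurement noise, and the numbers in the usage line are arbitrary.

```python
def is_dose_dependent(doses, signals, bsa_signal, fold=2.0):
    """Crude dose-dependence check (assumed criterion, for illustration only).

    Returns True if the signal never decreases as the dose increases and the
    signal at the highest dose is at least `fold` times the BSA control.
    """
    # Order signals by increasing dose so the check does not depend on input order.
    ordered = [signal for _, signal in sorted(zip(doses, signals))]
    non_decreasing = all(later >= earlier for earlier, later in zip(ordered, ordered[1:]))
    return non_decreasing and ordered[-1] >= fold * bsa_signal
```

With arbitrary values, `is_dose_dependent([0.8, 4, 20, 100, 200], [10, 15, 30, 60, 80], bsa_signal=20)` returns True, while a single dip in the series makes it return False, which is why a strict monotonicity test is only a first-pass filter.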

Discussion
We have presented an experimental setup that allowed the analysis of the binding properties of 19 surface-exposed pneumococcal proteins, leading to the screening of more than 200 interactions, most of which had never been reported before. The validity of this approach is strengthened by the fact that known interactions were "rediscovered"; for example, we confirmed the interaction between CbpA and factor H [40]. Complementary ELISA analyses of selected protein-protein interactions further confirmed the validity of our procedure. From this screen, we conclude that whereas LPXTG proteins do not appear to be major adhesins, Cbps seem to be more important players in adhesion processes. One explanation may be that most Cbps are not associated with enzymatic functions (except the Lyt proteins, CbpD, CbpE and CbpG; see Fig 2). The main function of the Cbps (except the Lyt proteins) probably lies in host-pathogen interaction and adhesion processes. Most LPXTG proteins do exhibit complex 'multi'-functions (enzymatic domains plus various binding domains; see Fig 3), which makes it plausible that they have more diverse functions at the bacterial surface. Indeed, our results tend to minimize their roles in adhesion processes. However, one has to keep in mind that often only part of each LPXTG protein was tested, as these proteins are usually larger than the Cbps, and this bias may have caused us to miss significant interactions. Moreover, only protein-protein interactions were tested in the screen. Although carbohydrates are important host components, they were not included in this study and could be important targets of the LPXTG proteins, in particular those bearing carbohydrate-binding modules, as recently shown for SpuA [41].
Finally, this screen addressed a small fraction of the host factors potentially involved in interactions with the pneumococcus. Our screen thus gives an overview of some protein-protein interactions, and extending this work would require higher-throughput techniques such as chip-based approaches. Interestingly, some of the LPXTG proteins found to be adhesins in this screen are proteases, such as PrtA and ZmpB. One tempting hypothesis, already proposed for PrtA [42], is that these proteins cleave host proteins in order to penetrate tissues or escape the immune system. Future research will have to address these questions, in particular the fate of the mammalian proteins after the interaction. In the course of the screen, we identified 3 Cbps, CbpI, CbpL and CbpM, that interact with elastin. To the best of our knowledge, this is the first report of interactions between pneumococcal proteins and elastin. Elastin is a major component of the lungs and blood vessels and is thus probably frequently encountered by the bacteria. CbpI and CbpL are only expressed in the TIGR4 strain and bind strongly to elastin, while CbpM is specific to the R6 strain and binds elastin weakly. These data are consistent with the bacterial binding pattern: no interaction of the R6 strain with elastin was observed, whereas the TIGR4 strain binds elastin significantly, suggesting that in this strain, despite the presence of the capsule, elastin recognition might be due to CbpI and CbpL (Fig. 1). These newly characterized interactions open the way to a better understanding of the contribution of choline-binding proteins to the invasion process.
Considering the general interest in the identification and validation of new protein vaccine candidates, that would elicit protection against a broader range of pneumococcal strains and/or play a significant role in the virulence process, it is interesting to note that all the identified recombinant proteins that positively interact with the host proteins are also present in the G54 and Hungary 19A-6 strains, except CbpJ in both strains and CbpI in the latter strain.
We also observed interactions between some Cbps and CRP. The interaction between Streptococcus pneumoniae and CRP is one of the first host-pathogen interactions identified at the molecular level [32]. CRP stands for C-reactive protein, where C refers to the pneumococcal C polysaccharide, which contains the teichoic and lipoteichoic acids of the pneumococcus. In fact, CRP binds the phosphorylcholine (PCho) residues [43] carried by teichoic and lipoteichoic acids. It is possible that the Cbps retain enough PCho bound to their choline-binding domains to reproduce this interaction. However, not every purified Cbp interacted with CRP, which leaves open the question of a direct interaction between Cbps and CRP.

Conclusions
We have presented an experimental design that allowed the analysis of the binding properties of 19 surface-exposed pneumococcal proteins, leading to the discovery of 20 new interactions with host proteins. This screen opens the route for in-depth study of the role of these surface-exposed proteins in virulence processes.

Cloning of the cbp and lpxtg genes
Oligonucleotides were designed to amplify the required fragments from R6 or TIGR4 genomic DNA (ATCC BAA334D). R6 genes were preferentially cloned when they existed. To maximize the chances of obtaining soluble proteins in the E. coli cytoplasm, we systematically removed predicted signal peptides, transmembrane domains or Gram-positive anchors when present, as for CbpA (Fig 2). The ligation-independent cloning (LIC) technique was chosen to facilitate high-throughput cloning [44]; LIC extensions were therefore included in the primers. PCR amplification was performed using Phusion polymerase (Finnzymes, #F530L). The amplified gene fragments were cloned into the pLIM01 or pLIM12 LIC vectors (PX'Therapeutics, Grenoble), yielding N-terminal His-tag fusion proteins. Plasmids were transformed into E. coli DH5α, and inserts were sequenced to verify the absence of undesired mutations (Cogenics, Grenoble). The E. coli strain BL21-CodonPlus® (DE3)-RIL (Stratagene #230245) was used for protein expression.

Protein expression and purification
Transformed bacteria were precultured overnight at 37°C in 3 mL of Terrific Broth (TB) with the appropriate antibiotics: chloramphenicol 34 μg/mL together with either ampicillin 100 μg/mL (pLIM01 vector) or kanamycin 50 μg/mL (pLIM12 vector). A volume of 250 mL of TB medium (with ampicillin or kanamycin only) was inoculated with the overnight culture, and bacteria were grown at 37°C until the OD at 600 nm reached 2. Protein expression was induced with 1 mM IPTG, and incubation was continued at 15°C for about 18 hours.
The bacterial culture was centrifuged and the pellet resuspended in a buffer composed of 50 mM Hepes pH 7.0 or 50 mM Tris pH 8.0 (depending on the pI of the expressed protein), 150 mM NaCl, 40 mM imidazole and a cocktail of protease inhibitors (Complete EDTA-free, Roche). After cell lysis by sonication, the recombinant proteins were recovered from the soluble fraction and loaded onto a 1 mL prepacked HisTrap™ HP column (17-5247-01, GE Healthcare) or a HIS-Select® High Flow cartridge (Sigma #H7788), equilibrated in the lysis buffer. After extensive washing, recombinant proteins were eluted with a 20 to 500 mM imidazole gradient. The eluted fractions were analyzed by denaturing SDS-polyacrylamide gel electrophoresis. When necessary (generally when protein purity appeared to be below 90% on the gel), purification was continued with ion-exchange and/or size-exclusion chromatography. Protein concentrations were determined from the absorbance at 280 nm with a spectrophotometer (NanoVue, GE Healthcare). For the choline-binding proteins, yields ranged between 5 mg (CbpF) and 120 mg (CbpM, CbpJ) per liter of E. coli culture, with a purity estimated on SDS-PAGE above 90%. Cbps are often more stable when stored in the elution buffer of the affinity column than in PBS. Purification yields of LPXTG proteins ranged from less than 1 mg to 60 mg per liter of E. coli culture, with a purity estimated on SDS-PAGE of at least 75%.
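Concentration from A280 follows the Beer-Lambert law, A = ε·c·l. The sketch below converts an A280 reading to mg/mL given a molar extinction coefficient and a molecular weight (both assumed inputs, typically predicted from the protein sequence), then scales the amount recovered to a yield per liter of culture, assuming the 250 mL cultures described above. The eluate volume and the function names are hypothetical.

```python
def concentration_mg_per_ml(a280, ext_coeff_m, mw_da, path_cm=1.0):
    """Protein concentration from A280 via Beer-Lambert (A = epsilon * c * l).

    ext_coeff_m: molar extinction coefficient at 280 nm (M^-1 cm^-1)
    mw_da: molecular weight in Da (g/mol)
    path_cm: optical path length; 1 cm assumed
    """
    molar = a280 / (ext_coeff_m * path_cm)  # concentration in mol/L
    return molar * mw_da                    # g/L, numerically equal to mg/mL


def yield_mg_per_liter(conc_mg_per_ml, eluate_ml, culture_ml=250.0):
    """Total purified protein (concentration x volume) scaled to 1 L of culture."""
    return conc_mg_per_ml * eluate_ml * 1000.0 / culture_ml
```

For example, an A280 of 0.5 for a hypothetical 50 kDa protein with ε = 25 000 M⁻¹ cm⁻¹ gives 1.0 mg/mL; 5 mL of such an eluate from a 250 mL culture corresponds to 20 mg per liter.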
Streptococcus pneumoniae strain R6 was cultured in Todd Hewitt broth (BD) to an OD of 0.3, harvested and washed in PBS. One mg of FITC (Sigma, F7250) was dissolved in 1 mL of PBS and centrifuged, and the supernatant was used to resuspend the R6 pellet. The bacteria were kept in the dark for 20 minutes. Free FITC was then removed by several washes in PBS (usually 5 or 6 centrifugations at 4000 g for 2 min). FITC-labeled bacteria (10⁸ cfu) were then added to each well (in 50 μL of PBS containing 0.2% BSA) and left to interact for 2 h at 37°C before eight washes with 100 μL of PBS. The fluorescence signal was read in a fluorimeter (FLUOstar Optima, BMG Labtech).

Protein interactions screening by solid-phase assay
White 96-well plates (Greiner 655074) were coated overnight at 4°C with 1 μg (in 100 μL PBS pH 7.0) of the same mammalian proteins as in the whole-bacteria binding assay: collagen IV, collagens, elastin, fibronectin, laminin, fibrinogen, mucin, plasminogen, lactoferrin, CRP, SAP, factor H, and BSA as a control. The following steps were conducted at room temperature in a Microstar® lab robot (Hamilton). Wells were blocked for 1 h with 200 μL of PBS containing 2% BSA (Sigma, A7030). His-tagged recombinant pneumococcal surface proteins (200 pmol in 100 μL PBS) were added to each well and left for 2 h, followed by three 10-minute washes with 200 μL of PBS containing 0.03% Tween. The HRP-coupled anti-His antibody (Sigma, A7058) was diluted 1000-fold in PBS containing 0.03% Tween and 0.2% BSA, and 100 μL were added to each well. Three washes with 200 μL of PBS containing 0.03% Tween followed this step. The antibody signal was revealed with 100 μL of ECL substrate (Pierce, 32106), and luminescence was immediately read in a FLUOstar OPTIMA (BMG Labtech). Each condition was tested in triplicate wells. The threshold for a positive interaction was twice the BSA negative control.
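The per-experiment decision can be sketched as a comparison of the triplicate host-protein wells with the BSA control wells. Whether the triplicates were averaged or compared well by well is not stated; this sketch assumes the means are compared, and the function name and well values are hypothetical.

```python
from statistics import mean


def call_interaction(host_wells, bsa_wells, fold=2.0):
    """Per-experiment positive/negative call (assumed to use triplicate means).

    Positive if the mean luminescence of the host-protein wells is at least
    `fold` times the mean of the BSA control wells.
    """
    return mean(host_wells) >= fold * mean(bsa_wells)
```

With arbitrary readings, `call_interaction([300, 320, 310], [100, 110, 105])` is positive (mean 310 against a threshold of 210), whereas `[150, 160, 155]` against the same controls is not.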
To maximize accuracy in the interpretation of the results, we built a dedicated protocol for the global analysis of the data and retained as positive the interactions observed in the majority of data sets, obtained from at least 3 independent experiments.
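The majority rule across independent experiments can be sketched as follows, assuming each experiment has already been reduced to a positive/negative call and that "majority" means strictly more than half. The function name is hypothetical.

```python
def is_positive(calls, min_experiments=3):
    """Retain an interaction if it is positive in a strict majority of
    at least `min_experiments` independent experiments."""
    if len(calls) < min_experiments:
        return False  # too few independent experiments to decide
    return sum(calls) > len(calls) / 2
```

Under this reading, two positives out of three experiments are retained, while a two-out-of-four split is not, since a tie is not a majority.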

Dose-response curves
The same protocol was used, except that increasing quantities of pneumococcal His-tagged proteins, from 0.8 to 200 pmol, were used in the interaction step. Dose-response curves are therefore presented on a logarithmic scale.
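The text gives only the dose range (0.8 to 200 pmol). One series consistent with it is a two-fold serial dilution from 200 pmol, whose eighth dilution is 0.78 pmol; the sketch below assumes that scheme (it is not stated in the text) and computes the log10 values used for a logarithmic x-axis.

```python
import math


def twofold_series(top_pmol=200.0, steps=8):
    """Doses of a hypothetical two-fold serial dilution, lowest dose first."""
    return [top_pmol / 2 ** k for k in range(steps, -1, -1)]


doses = twofold_series()                     # 0.78125 ... 200.0 pmol (9 points)
log_doses = [math.log10(d) for d in doses]   # x-axis values for a log-scale plot
```

On a log scale the points of a two-fold series are evenly spaced (log10(2) ≈ 0.30 apart), which is the practical reason dose-response data spanning more than two orders of magnitude are plotted this way.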

Additional material
Authors' contributions
CF participated in the design of the study, carried out and analyzed all the experiments. The Robiomol platform (BG and MNS) participated in the gene cloning procedures. BG conceived the program for the Hamilton robot. MB and LR participated in protein purification and ELISA experiments. AMDG and CF conceived the study; AMDG and TV coordinated the study; CF, AMDG and TV drafted the manuscript. All authors read and approved the final manuscript.