DNA-Microarray-based Genotyping of Clostridium difficile

Background Clostridium difficile can cause antibiotic-associated diarrhea, and the possibility of outbreaks in hospital settings warrants molecular typing. A microarray was designed that included toxin genes (tcdA/B, cdtA/B), genes related to antimicrobial resistance, the slpA gene and additional variable genes. Results DNA of six reference strains and 234 clinical isolates from South-Western and Eastern Germany was subjected to linear amplification and labeling with dUTP-linked biotin. Amplicons were hybridized to microarrays providing information on the presence of target genes and on their alleles. Tested isolates were assigned to 37 distinct profiles that clustered mainly according to MLST-defined clades. Three additional profiles were predicted from published genome sequences, although they were not found experimentally. Conclusions The microarray-based assay allows rapid and high-throughput genotyping of clinical C. difficile isolates including toxin gene detection and strain assignment. Overall hybridization profiles correlated with MLST-derived clades. Electronic supplementary material The online version of this article (doi:10.1186/s12866-015-0489-2) contains supplementary material, which is available to authorized users.


Background
Clostridium difficile is a component of the human colonic flora. If the physiological bacterial flora in the colon is altered or damaged by antibiotics, especially by clindamycin, fluoroquinolones, cephalosporins, or amoxicillin/clavulanic acid [1,2], C. difficile is able to multiply and to cause damage due to its production of several toxins. Resulting conditions are antibiotic-associated diarrhea and pseudomembranous colitis (for a recent review, see [1]). Severe cases might progress to toxic megacolon and end fatally [3].
Important virulence factors are the secreted toxins TcdA and TcdB, encoded by the genes tcdA and tcdB [4], which form a pathogenicity locus together with regulatory genes (tcdC and tcdD) and a gene (tcdE) encoding a holin-like pore-forming protein [5]. TcdA and TcdB irreversibly modify GTPases from the Ras superfamily, resulting in disruption of vital signaling pathways of the cell and in cell death [4]. In addition, some C. difficile strains harbor a binary toxin encoded by cdtA and cdtB. The binary toxin appears to modify actin via its ADP-ribosyltransferase activity. Its clinical significance is not yet fully elucidated [4,6,7].
The therapy of C. difficile infection includes rehydration, discontinuation of the antibiotics triggering the condition, oral administration of vancomycin or metronidazole, and surgical intervention in severe cases [1]. Relapses are common, due either to surviving spores or to re-infection. A possible role of probiotics is still being investigated, as is the concept of transplanting feces in order to restore the physiological flora [8,9]. With increasing numbers of patients who receive long-term, broad-spectrum antibiotic therapies, C. difficile has become an increasingly important problem in healthcare. Case numbers as well as fatality rates are increasing, with the latter being attributed to the emergence of more virulent strains [10].
Transmissions of C. difficile and even outbreaks within hospital settings are common, given that spores are able to survive in a clinical environment and are resistant to alcoholic disinfectants [1]. Hospitalization or residence in nursing homes is a significant risk factor for acquisition of C. difficile, and 50 % of patients who stayed in hospital for more than one month acquired C. difficile [11]. Transmissions within healthcare settings justify infection control measures, in analogy to, e.g., methicillin-resistant S. aureus. Besides barrier nursing, isolation, disinfection, etc., these should also include molecular typing in order to trace chains of infection. A variety of methods, including multilocus sequence typing (MLST), sequencing of the slpA gene, multilocus variable-number tandem-repeat analysis and ribotyping, have been described previously [12][13][14][15][16][17], and genome sequencing might become an option in the future.
Microarray-based rapid typing proved to be a convenient tool for MRSA genotyping [18], allowing both virulence and resistance gene detection and molecular typing within one experiment. Therefore, a microarray-based assay was designed to prove this concept for C. difficile.

Profile- and MLST-based clade assignment
Data for a subset of the most relevant target genes are presented in Table 1; full data are provided in Additional file 1.
Isolates were clustered into hybridization profiles (HP) or strains based on overall hybridization profiles, with emphasis on tcdA/B and slpA alleles. Isolates or strains were regarded as one HP if they showed at least 88 % identity of positive/ambiguous/negative classifications across all probe positions covered and, in addition, carried identical tcdA/B and slpA alleles. Possibly mobile resistance markers were counted for the score, but, in contrast to tcdA/B and slpA, they were not considered for the definition of hybridization profiles or strains. It still needs to be clarified whether these genes could be used as subtyping markers for isolates within one HP (i.e., for outbreak investigations).
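The grouping rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the call labels ("pos", "amb", "neg"), the probe names and the function names are hypothetical, and the sketch is not the software used in the study.

```python
from typing import Dict

# Hypothetical encoding: each isolate maps probe name -> "pos", "amb" or "neg".
Calls = Dict[str, str]


def identity(a: Calls, b: Calls) -> float:
    """Fraction of probe positions covered in both isolates with the same call."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    same = sum(1 for probe in shared if a[probe] == b[probe])
    return same / len(shared)


def same_profile(a: Calls, b: Calls,
                 alleles_a: Dict[str, str], alleles_b: Dict[str, str],
                 threshold: float = 0.88) -> bool:
    """Apply the rule from the text: identical tcdA/B and slpA alleles
    plus at least 88 % identical probe classifications."""
    return alleles_a == alleles_b and identity(a, b) >= threshold


if __name__ == "__main__":
    ref = {f"probe{i}": "pos" for i in range(100)}
    test = dict(ref)
    for i in range(10):            # ten discordant calls -> 90 % identity
        test[f"probe{i}"] = "neg"
    alleles = {"tcdA": "R20291", "tcdB": "630", "slpA": "allele_x"}  # made-up values
    print(identity(ref, test), same_profile(ref, test, alleles, alleles))
```

Mobile resistance markers are simply part of the probe dictionary here, so they count towards the score without entering the allele comparison, which mirrors the description in the text.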
Applying this approach, tested isolates and reference strains clustered into 37 distinct hybridization profiles (HPs; Table 1 and Fig. 1). Three additional profiles were predicted from published genome sequences, although they were not found experimentally. If several isolates with identical hybridization profiles were subjected to MLST, they yielded identical or related sequence types. Occasionally, several ribotypes (RTs) were observed within one cluster and some ribotypes were present in different, although similar or related, clusters.
In C. difficile, MLST-derived sequence types (STs) cluster into five major clades [19]. Hybridization profiles also can be clustered into these clades when analyzing their similarities (see Fig. 1).
Clade I encompasses a variety of sequence types including ST-03, ST-45, ST-54 and others [19]. It was found to correspond to the largest and most diverse cluster of hybridization profiles (HP) comprising HP-1 to 30.
Clade II comprised ST-01/RT-027 strains [19]. It matched hybridization profiles 31 and 32. Besides reference strains, only two isolates were assigned to this clade, indicating that the emergence and spread of ST-01/RT-027 strains [20,21] had not yet reached the Dresden region at the time when the samples were taken.
Clade III includes ST-05/RT-023 strains [19], corresponding to HP-33 and -34. Clade IV consists of ST-37/RT-017 or HP-35 and -36 strains, while Clade V includes ST-11/RT-078, corresponding to HP-37 to HP-39. ST-127-like STs might form an additional clade according to eBurst analysis (with ST254 as predicted founder), putatively named "Clade VI" herein. It included the genome sequence of Strain 6503 (GenBank prefix ADEI), which translated into a 40th hybridization profile. This profile was not identified experimentally.
In the visualization using SplitsTree (see Methods as well as Fig. 1), the tcdA/B negative isolates appear to form a separate clade. This, however, can be regarded as an artifact related to the relatively high number of probes recognizing the tcd locus (see Discussion).

Alleles of slpA
The gene slpA encodes the surface layer protein. Fifty-four probes were designed to distinguish slpA alleles that are currently represented in GenBank, with one or two probes recognizing one allele. Table 2 shows the predicted patterns and the respective GenBank entries as well as the corresponding ribotyping and/or MLST data for isolates identified within this study. The analysis predicted twenty-eight patterns; twenty-one were found. Additionally, two patterns were observed which probably represent truncated variants of known alleles.
Five isolates (2.1 %) yielded no positive slpA signals. Based on their overall hybridization profiles, they clustered into two distinct Clade I strains (HP-06 and HP-30). However, in HP-30, ambiguous signals for one probe were observed, which might indicate the presence of a truncated variant or divergent allele.
There was no direct correlation of slpA alleles, ribotyping and MLST, with isolates of some ribotypes or STs yielding different slpA alleles.

Alleles of tcdA/tcdB
Four probes allowed two tcdA alleles to be distinguished. Both alleles, tcdA R20291 and tcdA CF5, were found in this study, with the former being more common and detected in more diverse lineages. Table 3 shows corresponding GenBank entries, HPs, RTs, MLST types and slpA types. Nineteen isolates were tcdA-negative.
For tcdB, seven alleles were distinguished using nine probes (Table 4), but only three, tcdB R20291, tcdB 630 and tcdB CF5, were experimentally identified. Allele tcdB 630 was the most common and widespread one. Nineteen isolates were negative for tcdB; its absence correlated with the absence of tcdA.
Co-localized genes tcdC and tcdE were interrogated with one probe each. They were absent from all tcdA/B-negative strains, but they also frequently yielded negative or ambiguous results in other isolates. This might be attributed to sub-optimal binding conditions for these individual probes, unappreciated sequence variation or a technical problem during probe synthesis, and should be overcome in future by re-design.

Table 1 (continued) Detected hybridization pattern types and their association with ribo- and sequence types as well as toxin gene alleles and resistance markers. Full hybridization profiles are provided as Additional file 2. Asterisk indicates in silico analysis only.

Binary toxin
Two alleles of the A component of the binary toxin (cdtA R20291 and cdtA 630) were predicted from published sequences and were also identified experimentally with four different oligonucleotide probes. Isolates of RT-023/MLST Clade III yielded an additional pattern for which no matching GenBank entry was identified. It is putatively named "cdtA Clade III" in Tables 1 and 5. For the B component (cdtB), three alleles (cdtB M120, cdtB R20291 and cdtB 630) were distinguishable with six probes. The variant cdtA 630 + cdtB 630 was the most ubiquitous one, in accordance with the predominance of Clade I, although some isolates completely lacked cdtA/B. In Clade I isolates, ambiguous signals were frequently detected, apparently due to poor performance of two probes (as discussed above for tcdC and tcdE). Clade II strains harbored a distinct variant, cdtA R20291 + cdtB R20291. Isolates of RT-023 or MLST Clade III yielded "cdtA Clade III", while cdtB signals in these isolates were indistinguishable from the cdtB R20291 allele. Clade V isolates carried cdtA R20291 and a characteristic cdtB allele, cdtB M120. Finally, no cdtA/B was detected in Clade IV isolates, and a "Clade VI" genome sequence (Strain 6503, ADEI) also did not include these genes.

Ubiquitous resistance markers
The gene bcrA, encoding the bacitracin ATP binding cassette transporter BcrA, was present in all C. difficile isolates but four. Three probes could be used to identify three different alleles.
Allele bcrA 630 (GenBank AM180355.1; 767,494 to 768,420; probe 1072) was present in all Clade I and Clade II isolates. Clade V isolates carried allele bcrA NAP07 (GenBank ADVM01000079.1; 10,507 to 11,100; probes 1071 and 1073). Clades IV and VI harbor bcrA CF5 (GenBank FN665652.1; 715,979 to 716,905), which also yielded a signal with probe 1071, while the binding site of probe 1073 was more similar to the equivalent site in bcrA 630 (differing in one base from bcrA 630 but in five from bcrA NAP07). Three tested Clade III isolates appeared bcrA-negative. Since no published genome sequence was available for that clade, it is not clear whether this lineage lacks the gene entirely or harbors an unknown allele.
The gene lmrB, associated with lincomycin/clindamycin resistance, was detected in all tested isolates and in all published genome sequences analyzed. Two probes were used to identify two different alleles. Allele lmrB 630 (GenBank AM180355.1; 2,893,512 to 2,894,912) was detected in the vast majority of isolates. In isolates associated with Clade V, another allele, lmrB NAP07 (GenBank ADVM01000028.1; 28,036 to 29,436), was found.
Likewise, vatA (synonym sat), encoding a virginiamycin/streptogramin A acetyltransferase, was found ubiquitously in tested isolates as well as in analyzed genome sequences. Two alleles were differentiated using two probes: vatA NAP07 (GenBank ADVM01000028.1; 23 to 655) in Clade V isolates and vatA 630 (AM180355.1; 2,576,453 to 2,577,085) in all others.

Variable/mobile resistance markers
The presence of cat (chloramphenicol acetyl transferase), erm(B) (RNA methyl-transferase, conferring resistance to macrolides and clindamycin) and tet(M), encoding tetracycline resistance, was variable. The gene cat was found in 18 isolates (i.e., in 7.5 % of tested strains and isolates). The gene erm(B) was detected in two reference strains, BI-9 and 630, as well as in 78 isolates (30 %). The gene tet(M) was present in two reference strains, M120 and 630, and in 33 isolates (14.6 %). Carriage rates ranged widely between C. difficile strains, with isolates of certain hybridization profiles (e.g., HP-25 to -27) being virtually always positive for erm(B) and/or tet(M).
For tet(M), five probes reacted in different combinations (Additional file 2). An assignment to alleles was not performed because of several possible sources of error. These might include i) the simultaneous presence of different plasmids in one strain, ii) the existence of chimeric forms (for instance, the 5′- and 3′-ends in AJ973139.1, AJ973141.1 and FN665653.1 are identical to ADNX01000070.1 while the middle parts are identical to AM180355.1) and iii) possible irregular patterns for low-copy-number plasmids with an effective target concentration around the detection limit of the linear amplification procedure.

Other markers
Two genes, vncS and vexP1, encoding a histidine kinase and a permease, were found to always occur together. Some similar strains (e.g., HP-31 and -32, or HP-35 and -36) could be distinguished by their presence or absence.
Several other markers contributed to specific profiles, showing different alleles that were uniform within an HP but could vary within a clade (Additional file 2). These included genes encoding septum formation initiation protein (divC), flagellin subunit C (fliC), and cell wall proteins 66 and 84.

Discussion
A rapid, reproducible and convenient method for molecular typing of C. difficile was developed. It is based on a linear multiplex amplification followed by array hybridization. Target genes were resistance genes localized in published C. difficile genome sequences and toxin genes with their different alleles. In addition to these markers, other genes were selected based on the variability of their presence (e.g., vncS/vexP1) or their sequence (divC, fliC, bcrA, lmrB, vatA, genes encoding cell wall proteins 66 and 84). On their own, these genes would not be suitable typing markers, but taken together, they can be used to generate stable profiles or fingerprints that allow assignment to clusters or clades as defined by other methods.
Genes that show clade-specific allelic variations also include the toxin genes. Therefore, a topic for a future study could be a possible correlation of toxin alleles and/or of clonal complex affiliations to clinical severity. In order to check whether a possible higher virulence is caused by the actual toxin alleles, or by some other factor linked to phylogenetic background, a high number of isolates from defined conditions need to be typed and their toxin alleles need to be determined. The proposed system might be a suitable platform for such a task.
It can be assumed that ribotyping, slpA typing, MLST and array hybridization yield comparable phylogenetic information, i.e., strains that are recognized as similar/ related by one method will also appear as similar/related by the other methods. However, there is no complete correlation. One ribotype might be associated with two similar array profiles or related MLST types and vice versa. Single and multilocus typing schemes by design tend to emphasize subtle differences. Isolates that are identical belong by definition to the same ST, but single locus variants, and even those that differ in a single base exchange are defined to belong to another ST. STs are numbered chronologically (i.e., by date of submission to the database curator) so that their numbers yield no phylogenetic information. Thus, STs with very different numbers might be still very similar. In order to cluster related STs, clonal complexes (as in, e.g., Staphylococcus aureus, [22]) or Clades [19] were introduced giving a more structured overview on the phylogeny of the target species. In C. difficile there are five major clades, at least one minor clade and several "singletons", i.e., STs that have no known links to others [19].
When converting HPs to a SplitsTree graph, its topology is strikingly similar to a SplitsTree graph of MLST sequences as presented by Dingle et al. [19]. The only significant difference is that all tcdA/B negatives are categorized as one "branch". This is an artifact caused by the high number of probes associated with this locus (n = 15, out of which nine to ten normally are positive). The loss of this locus would thus significantly impact the overall hybridization profile overriding other features affecting a smaller number of probes. Negative results of other markers, such as for slpA, would not have this effect because of the smaller number of probes involved.
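The size of this effect can be illustrated with simple arithmetic. The sketch below assumes that each of the 135 spotted probes contributes equally to a profile comparison; this is a simplifying assumption made for illustration and not a description of the SplitsTree input used in the study. The function name is hypothetical.

```python
TOTAL_PROBES = 135          # probes spotted on the array (see Array design)
HP_TOLERANCE = 1.0 - 0.88   # divergence still tolerated within one HP


def fraction_changed(n_probes: int, total: int = TOTAL_PROBES) -> float:
    """Fraction of the profile that changes if n_probes switch their call."""
    return n_probes / total


if __name__ == "__main__":
    # Loss of the tcd locus: nine to ten normally positive probes turn negative.
    for n in (9, 10):
        print(f"tcd locus, {n} probes: {fraction_changed(n):.1%} of the profile")
    # Loss of slpA: one or two probes recognize a given allele.
    for n in (1, 2):
        print(f"slpA, {n} probe(s): {fraction_changed(n):.1%} of the profile")
    print(f"tolerated divergence within one HP: {HP_TOLERANCE:.0%}")
```

Under this assumption, losing the tcd locus alone changes roughly 7 % of the profile, i.e., more than half of the 12 % divergence tolerated within one HP, whereas a negative slpA result changes about 1 %.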
With regard to practicalities, a major advantage of the array-based approach is that isolate typing as well as toxin gene detection and allele identification can be performed within one experiment by a single amplification reaction starting from clonal colony material. The amplification follows linear kinetics, utilizing one primer per target. This has the advantage of facilitating unlimited "multiplexing", i.e., the simultaneous detection of multiple targets, and of being resistant to contamination by amplicons from previous experiments. The disadvantage is a reduced sensitivity compared to standard, exponential PCR. However, since the assay was designed to characterize cultured and cloned bacterial isolates (as opposed to native patient samples), this is not of relevance; sequencing-based typing methods would also lead to nonsensical results when applied to polyclonal samples. In practical terms, protocol and time requirements, including hands-on time, of the linear amplification are the same as for normal PCR. The subsequent hybridization procedure can be performed within half a day, which is more rapid than ribotyping. The assay as well as analysis and interpretation can largely be automated. The set of probes can, possibly combined with MLST markers and slpA sequences, also be mapped to "conventional" or "next generation" sequence data in order to rapidly obtain clinically relevant typing information out of an abundance of data and to create a database that encompasses both in silico and in vitro typing data.
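The sensitivity difference between linear and exponential amplification follows from the kinetics. With one primer per target, products cannot serve as templates, so yield grows in proportion to the cycle number; with a primer pair, every product is a new template and yield grows geometrically. The following sketch uses idealized models; the template number, cycle numbers and efficiency parameter are hypothetical and are not taken from the protocol.

```python
def linear_copies(templates: float, cycles: int, efficiency: float = 1.0) -> float:
    """One primer per target: each cycle copies only the original templates."""
    return templates * cycles * efficiency


def exponential_copies(templates: float, cycles: int, efficiency: float = 1.0) -> float:
    """Primer pair: products of earlier cycles are copied as well."""
    return templates * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    start = 1_000  # hypothetical number of template molecules
    print(f"linear, 50 cycles:      {linear_copies(start, 50):.3g}")
    print(f"exponential, 30 cycles: {exponential_copies(start, 30):.3g}")
```

The same property explains the resistance to carry-over contamination: a stray product molecule from an earlier run is copied at most once per cycle instead of being amplified geometrically.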

Conclusions
The microarray-based assay allows rapid and high-throughput genotyping of clinical C. difficile isolates including toxin gene detection and strain assignment. Overall hybridization profiles correlated with MLST-derived clades, and target genes that showed clade-specific allelic variations also included the toxin genes.

Ethics statement
Isolates were obtained as part of routine diagnostics and were analyzed retrospectively and anonymously. No patient data were used. Ethical approval and informed consent were thus not required.

Culture and DNA preparation
Isolates were kept frozen at -80°C using cryobank tubes (Microbank, Pro-Lab Diagnostics, Richmond Hill, Canada). Prior to use, they were inoculated on pre-reduced Schaedler haemin-cysteine blood agar and incubated at 37°C for 48 hours. Then, harvested culture material was transferred into 200 μl Lysis buffer/enzyme mix (A1 + A2; from Alere StaphyType Kit, Alere Technologies, Jena, Germany). After 60 min incubation at 37°C and 550 rpm, 200 μl AL buffer and 25 μl Proteinase K (from the QIAamp DNA Mini Kit, Qiagen, Hilden, Germany) were added, followed by another incubation step of 60 min at 56°C and 550 rpm. After addition of ethanol, DNA was purified using spin columns (QIAamp DNA Mini Kit, Qiagen). Finally, DNA was eluted in 50 μl water and heated for 10 minutes at 85°C in order to evaporate trace contaminants of ethanol. The DNA concentration was determined spectrophotometrically at 260 nm. If necessary, DNA was concentrated to 150 ng/μl by evaporation.

Array design
The array was designed to include toxin genes (tcdA/B, cdtA/B), genes related to antimicrobial resistance (cat, erm(B), tet(M)), known typing markers (slpA) as well as genes for which the analysis of published genome sequences showed either a variable occurrence, or the occurrence of distinct alleles. A complete list of targets and primer/probe sequences is provided in Additional file 1. First, all GenBank entries for any given target were retrieved. One entry was selected as reference, and its coding sequence was excised. All resulting BLAST hits were downloaded and re-annotated into a local database excising and aligning all valid open reading frames. Sequences were classified into paralogues and allelic variants based on similarity. Consensus regions from the alignments were chosen for the probe and primer design. Probe sequences were selected for specificity and for similar GC content, length, and melting temperature. Resulting probe sequences were re-blasted against all available sequences to check for false negativity or cross-reactivity.
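Screening candidate probes for similar length, GC content and melting temperature can be sketched as follows. The source does not state which melting temperature model or acceptance windows were used, so the basic GC-based approximation Tm = 64.9 + 41 (G + C - 16.4) / N and the GC and Tm windows below are assumptions chosen for illustration; only the length range of 24 to 34 bases reflects the probes on the array. All function names are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in an oligonucleotide."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(seq: str) -> float:
    """Rough melting temperature in degrees C from the basic GC-based formula.
    An approximation only; nearest-neighbor models are more accurate."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def passes_filters(seq: str,
                   length_range=(24, 34),      # probe lengths on the array
                   gc_range=(0.30, 0.60),      # assumed window
                   tm_range=(50.0, 65.0)) -> bool:  # assumed window
    """True if a candidate probe falls inside all three windows."""
    return (length_range[0] <= len(seq) <= length_range[1]
            and gc_range[0] <= gc_content(seq) <= gc_range[1]
            and tm_range[0] <= basic_tm(seq) <= tm_range[1])


if __name__ == "__main__":
    candidate = "ACGTACGTAGCTAGCTAGGATCCA"  # made-up 24-mer, not a real probe
    print(len(candidate), gc_content(candidate), round(basic_tm(candidate), 1))
    print(passes_filters(candidate))
```

Keeping these properties within narrow windows is what allows all probes to be hybridized under one set of conditions on the same array.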
One hundred thirty-five probes were spotted in triplicate on arrays that were mounted into ArrayStrips (http://alere-technologies.com/en/products/lab-solutions/platform-components/arraystrip-as.html). The length of the probes ranged from 24 to 34 bases (mean length, 27 bases; median length, 28 bases). There were 140 primers. Their lengths ranged from 18 to 25 bases (mean and median lengths, 20 and 21 bases, respectively).

Protocol optimization
For validation of the array and for the optimization of the protocol, completely sequenced strains (see above) were used. Hybridization profiles were predicted by comparing the probe sequences with their known genome sequences.
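Predicting a hybridization profile from a genome sequence amounts to searching each probe on both strands and converting the best match into a call. The sketch below is a minimal, brute-force illustration; the mismatch thresholds for positive and ambiguous calls are hypothetical (the source does not state them), and a real analysis would use an indexed search such as BLAST rather than a sliding window.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def min_mismatches(probe: str, genome: str) -> int:
    """Smallest number of mismatches between the probe and any window
    of equal length on either strand of the genome."""
    probe, genome = probe.upper(), genome.upper()
    best = len(probe)
    for target in (genome, reverse_complement(genome)):
        for start in range(len(target) - len(probe) + 1):
            window = target[start:start + len(probe)]
            mismatches = sum(1 for a, b in zip(probe, window) if a != b)
            best = min(best, mismatches)
            if best == 0:
                return 0
    return best


def predict_call(probe: str, genome: str,
                 max_pos: int = 1, max_amb: int = 3) -> str:
    """Translate the best match into a call; thresholds are assumptions."""
    mismatches = min_mismatches(probe, genome)
    if mismatches <= max_pos:
        return "pos"
    if mismatches <= max_amb:
        return "amb"
    return "neg"


if __name__ == "__main__":
    genome = "TTTT" + "ACGTACGTAGCTAGCTAGGATCCA" + "GGGG"  # toy sequence
    print(predict_call("ACGTACGTAGCTAGCTAGGATCCA", genome))
    print(predict_call("A" * 24, genome))
```

Comparing such predicted calls with the signals measured for the sequenced reference strains is one way to identify probes that underperform.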