Assessment of various parameters to improve MALDI-TOF MS reference spectra libraries constructed for the routine identification of filamentous fungi

Background: The poor reproducibility of matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) spectra limits the effectiveness of the MALDI-TOF MS-based identification of filamentous fungi with highly heterogeneous phenotypes in routine clinical laboratories. This study aimed to enhance the MALDI-TOF MS-based identification of filamentous fungi by assessing several architectures of reference spectrum libraries.
Results: We established reference spectrum libraries that included 30 filamentous fungus species with various architectures characterized by distinct combinations of the following: i) technical replicates, i.e., the number of analyzed deposits for each culture used to build a reference meta-spectrum (RMS); ii) biological replicates, i.e., the number of RMS derived from distinct subcultures of each strain; and iii) the number of distinct strains of a given species. We then compared the effectiveness of each library in the identification of 200 prospectively collected clinical isolates, including 38 species in 28 genera. In a multivariate analysis, identification effectiveness was improved by increasing both the number of RMS per strain (p < 10^-4) and the number of strains for a given species (p < 10^-4).
Conclusion: Addressing the heterogeneity of MALDI-TOF spectra derived from filamentous fungi by increasing the number of RMS obtained from distinct subcultures of the strains included in the reference spectra library markedly improved the effectiveness of the MALDI-TOF MS-based identification of clinical filamentous fungi.


Background
The identification of mold in the clinical laboratory is classically based on macroscopic and microscopic examination of the colonies grown on mycological culture media. It is a slow and complex process requiring highly skilled mycologists, and misidentifications may occur, even in experienced reference laboratories [1]. Additionally, some distinct species, which are identified via DNA sequence analysis, are morphologically indistinguishable [2][3][4]. Therefore, multilocus DNA sequence analysis represents the recommended approach to accurately identify these microorganisms. Nevertheless, the DNA sequence-based identification of filamentous fungi is primarily limited by the following: i) low DNA extraction yields because mold cells are difficult to lyse, ii) the presence of PCR inhibitors, iii) the presence of misidentified sequences in non-curated public DNA sequence databases, and iv) the cost and time required for sequencing. Currently, only some clinical laboratories routinely use a molecular approach for microorganism identification, which is primarily due to the cost and application constraints [5,6].
Recently, matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry (MS) has been applied to rapidly identify bacteria and yeasts in the clinical microbiology laboratory setting [7]. This technique is used to analyze microorganism content (primarily ribosomal proteins), thereby generating a spectrum that is considered the fingerprint of the microorganism [8]. Using this technique, an unknown organism is identified by comparing its spectrum to a reference library of spectra. When establishing a reference library for microbial identification purposes, many authors have used reference mass spectra, sometimes referred to as "metaspectra" or "superspectra", which are generated by combining a varying number of individual spectra corresponding to technical replicates of a given sample. Previous studies have indicated that MS could be used to identify various filamentous fungi taxa of clinical interest, including Fusarium spp. [9][10][11], dermatophytes [12,13], Aspergillus spp. [14,15], and Pseudallescheria/Scedosporium spp. [16]; those of industrial interest, including Penicillium spp. [17,18], Verticillium spp. [19], and Trichoderma spp. [20]; and various filamentous fungal contaminants frequently isolated in the clinical laboratory [21,22].
The heterogeneous morphological phenotypes of filamentous fungi affect the identification process. As shown in Figure 1, the same heterogeneity exists for MALDI-TOF mass spectra, between different strains of the same species as well as between subcultures of the same strain, which negatively impacts the reproducibility of the spectra. To address this issue, we accounted for this heterogeneity when establishing the reference meta-spectra (RMS) library (MSL). We hypothesized that MS identification effectiveness could be enhanced by increasing both the number of RMS of a given strain included in the reference library and the number of deposits used to generate each RMS. The primary objective of this study was to test the effectiveness of distinct reference spectra library architectures for the MALDI-TOF MS-based identification of filamentous fungi. More precisely, we assessed the influence on identification effectiveness of the following: i) the number of technical replicates, i.e., the number of analyzed deposits (spots) from one culture used to generate an RMS; ii) the number of biological replicates, i.e., the number of RMS derived from distinct subcultures for each strain; and iii) the number of distinct strains of one species used to construct the library.

Phenotypic and genotypic identification of clinical isolates
The results of the classical and DNA sequence-based identification of 200 clinical isolates (Table 1) were applied to classify the isolates into two groups: isolates included and isolates excluded from the MSL. The MS results of both groups are summarized in Table 2. The isolates belonged to 28 different genera and 38 different species. Moreover, 174 isolates corresponded to 18 species, which were represented among those used to construct the eight libraries, whereas the 26 remaining isolates belonged to 20 species that were not represented in the libraries.

Figure 1 Comparison of mass spectra obtained from four subcultures of a strain of Aspergillus flavus. The Aspergillus flavus 1027804 strain was subcultured on four different agar plates. Spectra A, B, C, and D display the first spectrum acquired from the subcultures 1, 2, 3 and 4, respectively. Spectra A to D display many common peaks; however, a few varying peaks are also clearly visible and characteristic of one of the subcultures.

Reference MS library validation
All 104 spectra derived from the 26 clinical isolates for which the species was not included in the seven MS libraries (4 raw spectra per clinical isolate) yielded low Log Scores (LS) ranging from 0.45 to 1.79, regardless of the library utilized (only 1/104 spectra yielded LS>1.7: Penicillium aurantiogriseum identified instead of Geotrichum candidum); these values are markedly below the manufacturer-recommended threshold of 2.00 for a valid identification. The number of correct identifications among the 706 remaining spectra (i.e., corresponding to the species included in the libraries) and the corresponding LS values were statistically different depending on the mass spectra library used for identification (Figures 2 and 3). Notably, the number of identifications concordant with the molecular biology or microscopic identification and the LS values significantly increased when the library included an increased number of both RMS per strain and strains per species. In contrast, constructing RMS from 40 raw spectra (B5) instead of 10 raw spectra or reducing the number of raw spectra used to build RMS of the B1 library from 10 to 4 (B0) failed to significantly alter the performance of the identification process (Table 3, Figure 3). Overall, the best results were obtained using library B7, which combined the highest number of RMS per strain and the highest number of strains per species. Using this library, we obtained 611 (87%) concordant identifications, with LS values higher than 1.700 in 80.85% (494/611) of the cases and LS values higher than 2.000 in 50.90% (311/611) of the cases. Conversely, all 91 (13%) non-concordant identifications exhibited LS values less than 1.700, a value below which MS identification results should not be taken into account. These results were dramatically improved compared with those obtained using library B1, which included only one isolate per species and one subculture per isolate.
Indeed, using the B1 library, we only obtained 449 (64%) concordant identifications, 40.09% of which displayed LS values higher than 1.700 (180/449) and only 15.59% of which displayed LS values higher than 2.000 (70/449). Modulating the MSP creation parameters for the B1 library tended to show that the performance of the database, in terms of both the number of concordant identifications and the mean Log Score of the first identification (LS1), could be improved by increasing the minimum peak frequency. However, when these parameters were applied to the B7 library, we observed the opposite result (Table 4).
Considering Aspergillus fumigatus isolates separately, the results ranged from 79% (B0/B1) to 97% (B7) concordant identifications, whereas for other species, the percentage of concordant identifications ranged from 56% (B0/B1) to 79% (B7) (Table 3). Finally, the identification of a clinical isolate, regardless of the species, was not improved by creating a metaspectrum (MSP) from the 4 spectra of the isolate and comparing it against the various libraries (Table 3).
The multivariate analysis findings (Table 5) indicate that concordant identification rates increased significantly with the number of both RMS per strain and raw spectra per RMS. Similarly, the LS values significantly increased (p < 10^-4) with the independent effect of the numbers of RMS per strain and raw spectra per RMS (data not shown). The independent effect of the number of raw spectra per RMS was weaker than the effect of the number of RMS per strain. The percentage of concordant identifications significantly increased exclusively when the number of raw spectra per RMS exceeded 20 (i.e., 40 raw spectra per RMS).
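The way concordant identifications are broken down by Log Score in this section can be illustrated with a minimal Python sketch. It assumes that "higher than" means a strict inequality at the 1.7 and 2.0 cut-offs quoted in the text; the function names and the four example spectra are hypothetical and are not study data.

```python
from collections import Counter

# Cut-offs quoted in the text; strict ">" comparison is an assumption.
LOW_CUTOFF = 1.7
VALID_CUTOFF = 2.0


def score_band(log_score):
    """Return the Log Score (LS) band of a single identification."""
    if log_score > VALID_CUTOFF:
        return "LS > 2.0"
    if log_score > LOW_CUTOFF:
        return "1.7 < LS <= 2.0"
    return "LS <= 1.7"


def summarize(results):
    """Summarize (log_score, concordant) pairs, one pair per spectrum.

    Returns the percentage of concordant spectra and a Counter of the
    LS bands observed among the concordant spectra only.
    """
    results = list(results)
    concordant_scores = [ls for ls, concordant in results if concordant]
    bands = Counter(score_band(ls) for ls in concordant_scores)
    percent = 100.0 * len(concordant_scores) / len(results) if results else 0.0
    return percent, bands


# Hypothetical spectra for illustration only.
example = [(2.21, True), (1.85, True), (1.40, True), (1.32, False)]
percent, bands = summarize(example)
print(f"{percent:.0f}% concordant", dict(bands))
```

Counting the bands among concordant spectra only mirrors how the percentages above are reported (for example, 494/611 rather than 494/706).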

Discussion
In contrast with recurrent efforts to improve the reproducibility of the MS-based identification of filamentous fungi by standardizing the pre-treatment procedures, we report the first study aiming to improve identification by comparing the effectiveness of distinct RMS library architectures. However, in a recently published study aiming to identify filamentous fungi using MS, de Carolis et al. [22] have shown that some of the mass spectra data obtained during routine diagnosis matched preferentially with the RMS obtained from either young or mature cultures of the same species. Regarding Scedosporium identification, Coulibaly et al. [16] have shown that both the culture media and the duration of culture had a significant impact on MALDI-TOF assay results. However, the standard recommendation to address problems associated with the heterogeneity of microorganism species is merely to increase the number of strains per species in the library. Our findings confirm this hypothesis; however, it is particularly challenging to increase the number of well-characterized strains included in the RMS library for each fungal species. Numerous species have been described to play a role in human infections and, in many cases, only a single strain or a few strains of the same species are preserved in international collections. In the current study, we demonstrated that increasing the number of mass spectra generated from distinct subcultures of a given strain yields a significant improvement in the process of filamentous fungi identification and can partially offset the relatively low number of specific strains available to construct RMS libraries. Modulating MSP creation parameters yielded discrepant results depending on the database that was taken into account. As the B7 database appears ideal for filamentous fungi identification, Bruker's default parameters for the MSP creation method seem to be more suitable for library construction.
Conversely, varying the number of spectra derived from a strain (4, 10, 20, or 40) that were used to construct each RMS did not markedly improve identification performance. Increasing the number of subcultures per strain therefore represents a straightforward optimization of RMS library architecture that significantly enhanced identification effectiveness.
In this study, we used quadruplicates of the clinical samples to test the various RMS libraries. By taking only the spectrum with the highest LS value into account, we observed an increased percentage of concordant identifications (e.g., from 87% to 90% with library B7). In parallel, using the four clinical replicates to construct an MSP and then comparing it to the various libraries did not alter the results but instead tended to complicate the procedure, as this cannot be performed with RTC software during routine analyses.
The use of standardized conditions (incubation time, temperature, and culture medium) [10,15,16,17,18] reduces filamentous fungi pleomorphism but does not preclude the heterogeneity of the mass spectra derived from a given isolate. For example, Chen et al. [17] have improved the accuracy of Penicillium identification by assessing the presence or absence of different species-specific peaks in the mass spectrum data obtained when analyzing Penicillium spores; however, separating spores from hyphae significantly complicates the pre-processing step. Conversely, some authors have shown that mass spectra heterogeneity is reduced using non-sporulating hyphae obtained in broth culture conditions [21][22][23]. Unfortunately, the more stringent the method, the less suited it is for high-throughput routine diagnoses. Furthermore, certain impediments are difficult to avoid in routine culture conditions, such as inter-technician variations, variation in protocol, and minor variations (temperature, humidity, or light), when aiming to standardize such protocols.

Conclusion
Overall, this study provides useful insight into the architectural design of reference MS libraries utilized for the MALDI-TOF MS-based identification of filamentous fungi in routine clinical laboratories. Our results show that both incorporating an increased number of subcultures from each strain and increasing the number of strains representing each species are key to improving the architecture of RMS libraries. These findings should be taken into account to construct a more effective library in clinical laboratories.

Fungal strains
The 90 reference filamentous fungus strains corresponding to 30 distinct species that were used to construct the eight distinct reference mass spectrum libraries are detailed in Table 6. Of the 90 reference strains, 63 strains were graciously provided by the BCCM/IHEM (Belgian coordinated collection of microorganisms, Scientific Institute of Public Health, Mycology and Aerobiology Section, Brussels, Belgium), and 3 strains were provided by the Pasteur Institute (Paris, France). The remaining 24 strains were clinical isolates from the Marseille University Hospital mycology laboratory, which were accurately identified via DNA sequence analysis as described below. All strains used to construct the reference database are preserved in the BCCM/IHEM collection. The identification performance of each reference library was tested using 200 clinical isolates from the Marseille University Hospital mycology laboratory.

Culture
Each reference strain was subcultured on four Sabouraud Gentamicin Chloramphenicol agar plates (AES, France) at 30°C. The strains used to construct the reference libraries and the isolates obtained from clinical samples were analyzed as soon as a fungal colony grew on the agar (usually after 48-72 hours). The clinical isolates were identified via morphological assessment, DNA sequencing, and MALDI-TOF MS as described below.

Clinical isolate identification
All 200 clinical isolates were identified in parallel by two trained mycologists following the identification keys of the Atlas of Clinical Fungi [24]. If the morphological identification was impossible or conflicted with the MALDI-TOF MS-based identification results, the isolate was further analyzed using DNA sequencing. DNA sequence-based identification was performed by analyzing the ITS 2 (primer ITS3: GCA TCG ATG AAG AAC GCA GC and primer ITS4c: TCC TCC GCT TAT TGA TAT GC) and D1-D2 (primer D1: AAC TTA AGC ATA TCA ATA AGC GGA GGA and primer D2: GGT CCG TGT TTC AAG ACG G) variable regions of the 28S unit of the rRNA gene as described by de Hoog et al. [24]. DNA extraction was performed using a QIAamp DNA kit (QIAGEN, Courtaboeuf, France). The reaction mixture was subjected to 35 cycles of 30 s denaturation at 94°C, 30 s primer annealing at 53°C, and 1 min primer extension at 72°C for the ITS 2 region and 40 cycles of 20 s denaturation at 94°C, 30 s primer annealing at 58°C, and 1 min primer extension at 72°C for the D1-D2 region. The sequencing reactions were performed using the same primers used for amplification. In both cases, the sequencing mixture was subjected to 25 cycles of 10 s denaturation at 96°C, 5 s primer annealing at 50°C, and 4 min primer extension at 60°C. Purification of the sequences was performed using BigDye® XTerminator™ (Applied Biosystems, Inc., Courtaboeuf, France), and the different reactions were processed using a 3130 Genetic Analyzer (Applied Biosystems, Inc., Courtaboeuf, France). The resulting sequences were then compared using the Medical Fungi pairwise sequence alignment tool (http://www.cbs.knaw.nl/Medical/BioloMICSSequences.aspx). Identification was validated when the sequence was at least 300 nucleotides long and the similarity percentage was over 98%.
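For readers who wish to check the cycling programs and the acceptance rule described above, the following Python sketch transcribes them as data. It is only an illustration: the names are hypothetical, the totals count the stated hold times only (ramping and any initial or final steps are not described in the text and are therefore excluded), and "over 98%" is assumed to be a strict inequality.

```python
# Cycling programs transcribed from the text:
# name -> (number of cycles, [(step, seconds, temperature in deg C), ...])
PROGRAMS = {
    "ITS2 PCR": (35, [("denaturation", 30, 94), ("annealing", 30, 53), ("extension", 60, 72)]),
    "D1-D2 PCR": (40, [("denaturation", 20, 94), ("annealing", 30, 58), ("extension", 60, 72)]),
    "sequencing": (25, [("denaturation", 10, 96), ("annealing", 5, 50), ("extension", 240, 60)]),
}

# Acceptance rule from the text: at least 300 nt and similarity over 98%.
MIN_LENGTH_NT = 300
MIN_SIMILARITY_PCT = 98.0


def cycling_seconds(program):
    """Total hold time of a program in seconds (ramping not included)."""
    cycles, steps = program
    return cycles * sum(seconds for _, seconds, _ in steps)


def identification_is_valid(sequence_length_nt, similarity_pct):
    """Apply the sequence-based acceptance rule to one alignment result."""
    return sequence_length_nt >= MIN_LENGTH_NT and similarity_pct > MIN_SIMILARITY_PCT


for name, program in PROGRAMS.items():
    print(f"{name}: {cycling_seconds(program) / 60:.1f} min of hold time")
```

Under these assumptions, the ITS 2 and D1-D2 amplifications correspond to 4200 s and 4400 s of hold time, respectively, and the sequencing reaction to 6375 s.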

Raw mass spectra acquisition
The colonies were gently scraped with sterile plastic pliers to obtain an aliquot (approximately 3-4 mm in diameter) of fungal spores and hyphae. This sample was first suspended in 75% ethanol HPLC. Next, the hydroalcoholic solution was removed via 10 min centrifugation at 13,000 g, and the pellet was suspended in 10 μL of 70% formic acid (Sigma-Aldrich, France) by vigorously pipetting the sample up and down. After a 5-min incubation, 10 μL of acetonitrile HPLC (VWR International S.A.S., Fontenay-sous-Bois, France) was added, and the mixture was incubated at room temperature for 5 min. Finally, the sample was centrifuged for 2 min at 13,000 g. One microliter of the supernatant (consisting of a mixture of fungal proteins) was deposited for each reference strain subculture in 10 replicates on a polished steel target (MTP384, Bruker Daltonics GmbH, Bremen, Germany) and air-dried. Each deposit was then covered with 1 μL of a freshly prepared solution of α-cyano-4-

Constructing the reference mass spectra (RMS)
The RMS were established by combining i) 4 raw spectra obtained from one subculture (RMS4); ii) 10 raw spectra obtained from one subculture (RMS10); iii) 20 raw spectra, 10 from two subcultures each (RMS20); or iv) 40 raw spectra, 10 from four subcultures each (RMS40) of a given reference strain using the "MSP creation" function of the MALDI Biotyper v2.1 software ( The number of peaks and the desired minimum peak frequency among the MSP creation parameters were first modulated using the B1 library, and the modified parameters were then tested on the B7 database (Table 4).
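The general idea of merging replicate raw spectra into a single reference spectrum with a maximum number of peaks and a minimum peak frequency can be illustrated with the simplified Python sketch below. This is not the algorithm implemented in the MALDI Biotyper "MSP creation" function, whose details are not given here; the fixed-width m/z binning, the function name, and the default parameter values are all assumptions chosen for illustration.

```python
from collections import defaultdict


def build_consensus_peaks(spectra, min_frequency=0.25, max_peaks=70, bin_width=2.0):
    """Merge replicate peak lists into one consensus peak list.

    spectra: list of raw spectra, each a list of (mz, intensity) peaks.
    A peak is kept if it occurs in at least `min_frequency` of the spectra;
    the `max_peaks` most intense consensus peaks are returned sorted by m/z
    as (mean_mz, mean_intensity, frequency) tuples.
    Note: fixed-width bins can split peaks that straddle a bin boundary.
    """
    if not spectra:
        return []
    bins = defaultdict(list)  # bin index -> one (mz, intensity) per spectrum
    for spectrum in spectra:
        strongest = {}
        for mz, intensity in spectrum:
            key = round(mz / bin_width)
            # Within one spectrum, keep only the most intense peak per bin.
            if key not in strongest or intensity > strongest[key][1]:
                strongest[key] = (mz, intensity)
        for key, peak in strongest.items():
            bins[key].append(peak)

    consensus = []
    for peaks in bins.values():
        frequency = len(peaks) / len(spectra)
        if frequency >= min_frequency:
            mean_mz = sum(mz for mz, _ in peaks) / len(peaks)
            mean_intensity = sum(i for _, i in peaks) / len(peaks)
            consensus.append((mean_mz, mean_intensity, frequency))

    consensus.sort(key=lambda peak: peak[1], reverse=True)  # most intense first
    return sorted(consensus[:max_peaks])  # final list ordered by m/z


# Hypothetical replicate peak lists (m/z, intensity); not study data.
replicates = [
    [(3000.2, 10.0), (5000.1, 5.0)],
    [(3000.4, 12.0), (7000.0, 3.0)],
    [(2999.9, 8.0), (5000.3, 7.0)],
    [(3000.1, 10.0)],
]
reference = build_consensus_peaks(replicates, min_frequency=0.5)
print(reference)
```

In this toy example, raising the minimum frequency to 0.5 discards the peak near m/z 7000, which appears in only one of the four replicates. This illustrates why the frequency setting interacts with the number and heterogeneity of the spectra being combined.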

Architecture of the eight mass spectral libraries
The same fungal species were included in the eight libraries, which differed in the number of raw spectra used to build each RMS (described above), the number of RMS included for each reference strain, and the number of strains included. The characteristics of the various libraries are detailed in Table 2.

MALDI-TOF MS-based identification of clinical isolates
Raw mass spectra were obtained from clinical isolates using the same procedure as for the reference strains with the exception that the supernatant was deposited in quadruplicate. The deposits, referred to as spots 1, 2, 3, and 4, correspond to the first, second, third, and fourth extraction supernatant deposit of each sample, respectively. The raw MS data for each spot were successively matched to the eight reference libraries, and the resulting "best match" LS values were calculated using MALDI Biotyper software. An alternate identification process was assessed by constructing an MSP with the four spots corresponding to each of the clinical isolates and comparing the isolate MSP with each of the RMS in the libraries.
The interpretation of the results was initially performed independently of the LS value. If the MS identification was identical to the microscopic identification or the sequencing analysis results, the identification was considered concordant, regardless of the LS value; otherwise, it was considered a non-concordant identification. Next, the LS values were used to compare the performance of the various libraries. As approximately half of the clinical isolates corresponded to the Aspergillus fumigatus species, a comparison was also performed between the libraries when either considering or disregarding this dominant species.
Library performance was also compared regarding the method by which the clinical quadruplicates were considered as follows: i) each spectrum was treated independently, ii) only the spectrum with the highest LS was taken into account, regardless of whether it was concordant, and iii) an MSP of the four spectra was constructed, and the clinical MSP was compared to each library.
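The first two ways of handling the clinical quadruplicates can be sketched in Python as follows. This is a minimal illustration with hypothetical function names and made-up match lists; strategy iii) is not reproduced because it relies on the MSP creation step of the vendor software.

```python
def top_hit(matches):
    """Return the (species, log_score) pair with the highest log score."""
    return max(matches, key=lambda match: match[1])


def calls_per_spot(spots):
    """Strategy i): each spot is identified independently by its best match."""
    return [top_hit(matches) for matches in spots]


def call_from_best_spot(spots):
    """Strategy ii): keep only the spot whose best match has the highest LS,
    regardless of whether that identification is concordant."""
    return top_hit(calls_per_spot(spots))


def is_concordant(call, reference_species):
    """Concordance is judged on the species name only, not on the LS value."""
    return call[0] == reference_species


# Hypothetical library matches for the four spots of one isolate.
spots = [
    [("Aspergillus fumigatus", 1.92), ("Aspergillus flavus", 1.10)],
    [("Aspergillus fumigatus", 2.05), ("Penicillium chrysogenum", 0.90)],
    [("Aspergillus flavus", 1.35), ("Aspergillus fumigatus", 1.30)],
    [("Aspergillus fumigatus", 1.71), ("Aspergillus flavus", 1.20)],
]
reference = "Aspergillus fumigatus"  # morphological or sequence-based result

independent = [is_concordant(call, reference) for call in calls_per_spot(spots)]
best = call_from_best_spot(spots)
print(independent, best, is_concordant(best, reference))
```

In this made-up case, three of the four independent calls are concordant, whereas the best-spot strategy yields a single concordant call, which is consistent with the higher concordance rates expected when only the highest-scoring spectrum is retained.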

Ambiguous MS identifications
Some of the species included in this study are known to be difficult to distinguish, even via ITS sequencing. Reference spectra were included in the libraries, but concordance could neither be confirmed nor contradicted. The species included were Penicillium aurantiogriseum and Penicillium chrysogenum. Both MS identifications were then considered concordant with the other identification methods.

Reference mass spectra library architecture assessment
Analyzing 200 clinical isolates, we tested the influence of the following parameters on identification effectiveness: i) the number of raw spectra used to build a reference MS, ii) the number of reference MS included per strain, and iii) the number of strains per species included in the library. The various reference spectrum architectures were compared with respect to the number of correct and false identifications as well as the mean LS values of both correct and false identifications.

Statistical analysis
The concordant and non-concordant identification results were compared two by two using the paired and non-parametric McNemar's test. The results of the quantitative variable LS analysis were compared using the non-parametric Kruskal-Wallis rank sum test. When the results of the Kruskal-Wallis test indicated a statistical difference between the LS values derived from the different mass spectral libraries, a post hoc statistical analysis was performed, which involved a pairwise comparison of the LS values obtained from each library using the Wilcoxon signed-rank test with Bonferroni adjustment. These analyses were performed using R software (http://www.r-project.org/) with the MASS and ROCR packages. To further examine the influence of library architecture on the probability of obtaining a correct identification, a multivariate analysis was conducted with the Genmod procedure of the SAS 9.2 (Cary, NC, USA) statistical software using the generalized estimating equations option to account for the non-independence of identification results obtained from the same isolate tested against distinct libraries. These analyses were performed to identify the optimal reference library architecture; therefore, the results obtained with isolates for which the species was not included in the library were excluded from this multivariate analysis. All statistical tests were two-sided with a p≤ 0.05 significance level.
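As an illustration of the paired comparison and the multiplicity adjustment mentioned above, the Python sketch below implements an exact (binomial) form of McNemar's test and a Bonferroni adjustment using only the standard library. The analyses in this study were run in R and SAS; whether the exact or the chi-square form of McNemar's test was used is not stated, so the exact form is an assumption, and all names and counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar test for paired binary outcomes.

    b: spectra concordant with library X only; c: concordant with library Y only.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical discordant counts for two libraries; not study data.
p_value = mcnemar_exact(0, 5)
adjusted = bonferroni([0.01, 0.04, 0.5])
n_pairs = comb(8, 2)  # pairwise comparisons if all eight libraries are compared
print(p_value, adjusted, n_pairs)
```

If every pair of the eight libraries were compared, the Bonferroni factor would be 28, so an unadjusted p-value would need to be at most 0.05/28 to remain significant at the stated level.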

Availability of supporting data
These data are included in Table 6 entitled "Details of the 90 reference strains included in the reference libraries".