NGS technology offers a molecular diagnostic tool that allows pathogen detection in complex sample material without a prior specific suspicion, provided that an adequate sequencing depth can be guaranteed. What constitutes an adequate sequencing depth for metagenomic analyses is not easily answered, particularly if the proportion of pathogen DNA within a sample is unknown. Most recently, Hillmann et al. (https://www.biorxiv.org/content/biorxiv/early/2018/05/12/320986.full.pdf, last accessed on 1 August 2018) suggested that shallow metagenomic analysis effectively probes species diversity down to a sequencing depth of ~500 k reads per sample. Our sequencing approach achieved an even greater sequencing depth for all described samples.
The technological approaches of NGS are varied [9, 19,20,21,22,23,24,25], and some are still being developed or optimized. Hasman and colleagues provided a descriptive overview of NGS for the diagnosis of infectious diseases [26]. A previous study confirmed an association between infectious agents and a disease of unknown origin [14]. Further, a previous “proof-of-principle” investigation demonstrated NGS-based detection of bacterial pathogens in two-thirds of the tested urine samples [26]. NGS is also suitable for detecting polymicrobial infections, as shown for sample material from brain abscesses [27]. NGS provides the most reliable diagnostic information from primarily sterile sample material, where even a few reads can be used for pathogen diagnostics. For example, Wilson and colleagues demonstrated Leptospira-induced meningoencephalitis by NGS on the basis of only 475 specific reads out of more than 3 million [28]. Pathogen identification by NGS-based analysis of RNA (ribonucleic acid) in the sample material is also possible: the so-called UMERS (“unbiased metagenomic nontargeted RNA sequencing”) approach detected RNA viruses such as influenza virus in respiratory samples [29].
Although NGS technology is still expensive, sequencing costs have dropped dramatically. For example, technological progress reduced the cost of sequencing a human genome from about 100,000 euros to about 1000 euros within a few years [9]. In particular, the introduction of small automated sequencers (about the size of laser printers) has made NGS technology attractive for diagnostic purposes. An earlier comparative evaluation of these small “workbench” sequencers showed that the MiSeq system (Illumina), which was used in this study, outperformed its competitors, the Ion Torrent PGM (Life Technologies, Carlsbad, CA, USA) and the no-longer-available 454 GS Junior (Roche, Basel, Switzerland), in terms of its low rate of sequencing errors [30].
The analysis of sequence information, which has so far been complex and not user-friendly, is currently one of the major obstacles to the wide diagnostic application of NGS technology [31]. Further automation and standardization are essential to overcome these problems before NGS can be applied in routine diagnostics. The same applies to the quality and accessibility of the underlying databases.
Although the application of NGS to formalin-fixed, paraffin-embedded tissue is not new [17, 18], the NGS-based detection of etiologically relevant pathogens from such materials is a diagnostic challenge. Extending previous experiments, we therefore conducted a real-life assessment with sample materials from patients with rare and tropical invasive infections, for which no comparable experience is available. Non-pathogen-specific molecular diagnostic approaches such as NGS are easily affected by contamination with environmental microorganisms that are, for example, embedded in the wax together with the sample. As shown for Bartonella spp. DNA some years ago [32], DNA cross-contamination during tissue processing in a multispecies histopathological laboratory is highly likely. The current, still unpublished, EORTC (European Organization for Research and Treatment of Cancer) criteria (personal communication with Professor Ralf Bialek) for the PCR-based detection of fungal infections from paraffin-embedded tissue explicitly state that the detection of specific fungal DNA in paraffin-embedded tissues shall only be used as proof of infection if fungal elements are also seen in histopathological assessments. This ensures that possible contamination of the paraffin with ubiquitous fungal spores, for example of Aspergillus spp., is not mistaken for evidence of invasive mycosis. Although protocols for optimizing the use of FFPE tissues in molecular epidemiology by reducing the contamination risk have been introduced [33], the initial tissue processing and waxing of the samples assessed here had been performed in a standard histopathological laboratory where no special precautions against DNA contamination had been enforced. During the cutting of the sections for the molecular analyses, protective procedures against contamination, such as discarding the first cuts of each block, were applied as detailed elsewhere [7, 13].
However, such precautions cannot undo contamination with fungal spores or pathogen DNA that has already occurred during the initial processing and waxing of the tissue. This problem was also evident in the present study, affecting both the pan-fungal PCRs and the NGS approach: traces of DNA, even from rare tropical pathogens, were identified within the samples. Species-specific PCRs [34,35,36,37,38,39,40,41] are potential alternatives to pan-fungal PCR approaches, but choosing among them requires a specific diagnostic suspicion.
Traditional histology is not always reliable for invasive fungal infections either. Its reliability depends on a variety of factors, including a critical minimum density of pathogens in the examined tissue and a high level of expertise on the part of the examining physician. Comparative studies of histology and culture (the latter cannot be performed from formalin-fixed tissues) showed a concordance of less than 80% [42], so histological diagnoses of invasive mycoses have to be interpreted with caution [36]. In this study, the histological evaluation was performed by pathologists with professional experience in tropical infectious diseases [13]. Particularly in view of the large number of genera and species that, as shown in the “Material and Methods” section, may account for the assessed invasive fungal infections, one has to bear in mind that histologically indistinguishable findings may be caused by different agents. In most cases of invasive mycosis in this study, histology did not allow a species-specific diagnosis but only micromorphological descriptions such as chromoblastomycosis, mucormycosis, or mycetoma. The lack of culture and serological results makes the interpretation of such findings challenging, which is an undeniable limitation of this study. Molecular approaches can be very useful here if culture is not possible. Even when sampling conditions allow culture, not all invasive fungi can be grown, and growth takes between several days and several weeks depending on the species, as summarized elsewhere [13]. These factors reduce the diagnostic value of fungal culture.
A first important precondition for reliable molecular diagnostic findings is the quality of the nucleic acid extraction, which in this study was unacceptable for several samples that had been stored for long periods. In line with this, partial PCR inhibition was observed in some of the assessed samples, as shown elsewhere [13]. Comparative testing of alternative nucleic acid purification methods [43, 44] might have helped to further optimize nucleic acid preparation in this study, but this was impossible because of the small amount of available sample material, which is an undeniable limitation of the study. For the samples that could be included in the NGS assessment, no significant Spearman rank correlation between sample age and number of detected reads was found. However, the heterogeneity of the sample materials makes this finding difficult to interpret. Of note, no samples older than 31 years were included.
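The Spearman rank correlation mentioned above is the Pearson correlation of the two rank vectors, with tied values receiving their average rank. The following is a minimal sketch, assuming plain Python lists of sample ages and read counts; the function names are hypothetical, not part of the study's workflow. It returns only the coefficient, not the p-value needed for a significance statement. In practice, scipy.stats.spearmanr returns both.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when all values in a sequence are tied")
    return cov / (sx * sy)
```

Called as spearman_rho(sample_ages, read_counts) with hypothetical per-sample lists, a value near 0 is consistent with, but does not by itself establish, the absence of a monotonic relationship.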
Because the formalin-fixed tissues were stored in paraffin blocks for years without any special protection against the deposition of fungal spores, contamination with environmental fungal spores can be regarded as highly probable. The high levels of contamination with environmental fungi are therefore not unexpected. Contamination of the paraffin itself is an alternative explanation.
The high degree of contamination, which was expected from the previously applied pan-fungal PCRs [13], was a challenge for the NGS analysis. Because NGS analyzes DNA fragments in a completely nonspecific manner, the challenge lies in discriminating contaminants from etiologically relevant pathogens. The histological results of the samples from patients with invasive mycosis provided hints but no etiological clarification at the species level.
To overcome this problem, the mean and standard deviation of the percentages of species-specific sequence fragments (reads) were determined across the assessed samples for each potentially etiologically relevant species. We then investigated how many standard deviations above the mean a sample's percentage had to lie for agreement with the histological results to be expected.
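The procedure can be written out as follows. The notation is ours, since the text gives no formulas: r_{i,s} is the number of reads assigned to species s in sample i, R_i is the reference count for that sample (all reads, or only fungus- or bacteria-specific reads), and n is the number of samples contributing to the background. The text does not state whether the population (1/n) or the sample (1/(n-1)) standard deviation was used; the population form is shown here as an assumption.

```latex
\begin{aligned}
p_{i,s} &= 100\,\frac{r_{i,s}}{R_i},\\
\bar{p}_s &= \frac{1}{n}\sum_{i=1}^{n} p_{i,s},\\
\sigma_s &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(p_{i,s}-\bar{p}_s\bigr)^2},\\
z_{i,s} &= \frac{p_{i,s}-\bar{p}_s}{\sigma_s}.
\end{aligned}
```

In this notation, a percentage "above the fourth standard deviation" corresponds to p_{i,s} > \bar{p}_s + 4\sigma_s, i.e., z_{i,s} > 4.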
A high rate of agreement between histology and NGS results was found only for percentages above the fourth standard deviation, both relative to the total number of reads and relative to the number of fungus-specific reads; in such cases, the NGS results clearly resembled the histological findings. Comparing the percentages relative to all reads in a sample with the percentages relative to the fungal reads in that sample revealed considerable deviations, which can be explained by the large differences in the proportions of assignable reads and of eukaryotic, bacterial, and viral reads. For samples in which none of the assessed species reached the fourth standard deviation, no reliable assignment of etiological relevance was possible. Of the 6 tested samples from patients with invasive amebiasis, NGS-based detection of E. histolytica succeeded in only a single sample, which had also been positive by histology and clearly positive by PCR.
The approach of comparing NGS results from nonsterile patient samples with results from a healthy population to define etiological relevance is not new. Other authors have proposed comparison with negative control samples, based on a specific subtraction of reads, as a method for identifying pathogens of potential etiological relevance. In this way, Shiga toxin-producing Escherichia coli was detected in 67% of stool samples from patients during an outbreak [15].
A different approach was chosen for the sample collection assessed in this study. Unlike in the recently described study [15], historical sample materials were used in the real-life assessment presented here. Because the samples had been collected and stored as part of routine diagnostics rather than for study purposes, no matched standardized negative control samples had been prepared. Collecting corresponding materials from completely healthy control subjects would also have posed an ethical problem where the materials were derived from highly invasive sampling procedures, e.g., samples of lung tissue, spinous process tissue, or tricuspid valve tissue. In any case, it is obviously impossible to retrospectively standardize samples that were prepared, paraffin-embedded, and stored under unknown and presumably variable conditions in comparatively low-tech laboratory environments, sometimes a considerable time ago. Although randomly selected histopathologically negative blocks from a similar time frame might have helped to establish an expected background, this approach was not chosen for the reasons given above.
To compensate for the lack of standardized negative controls, the mean percentages of specific reads across all samples, including those histologically positive and negative for the various assessed species, were used as proxy-negative control values representing an averaged background. Repeatedly adding the standard deviation to this mean and comparing the result with the percentage measured in each sample allowed an estimation of how far, in standard deviations, the specific reads in each sample exceeded the proxy-negative control. Accordingly, a standard deviation-based rather than a subtraction-based approach [15] was chosen.
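The repeated summing of standard deviations described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the proxy-negative control is taken as the mean percentage over all samples (including the tested one), the population standard deviation is used, and the function name and example values are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def sd_steps_above_control(
    percentages: Sequence[float], max_steps: int = 10
) -> Tuple[float, float, List[int]]:
    """For one species, return the proxy-negative control (mean over all samples),
    the standard deviation, and for each sample the number of whole standard
    deviations by which its percentage exceeds the control (capped at max_steps)."""
    control = mean(percentages)
    # Assumption: population SD; the text does not state which SD was used.
    sd = pstdev(percentages)
    steps = []
    for p in percentages:
        k = 0
        # Add one more SD to the control while the sample still lies above it.
        while sd > 0 and k < max_steps and p > control + (k + 1) * sd:
            k += 1
        steps.append(k)
    return control, sd, steps


# Hypothetical percentages of one species' reads in 20 samples (not study data):
# 19 samples at a low background level and one clearly elevated sample.
example = [0.01] * 19 + [1.0]
control, sd, steps = sd_steps_above_control(example)
flagged = [i for i, k in enumerate(steps) if k >= 4]  # above the 4th SD
print(round(control, 4), steps[-1], flagged)
```

The stepwise loop mirrors the "repeated summing" wording; computing floor-like values directly from (p - control) / sd would give the same counts except at exact boundaries.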
The rationale of the standard deviation-based approach is the assumption that the likelihood of a true infection increases with the number of standard deviations by which the percentage of specific reads measured in a sample exceeds the proxy-negative control. If a value lies far above the mean plus several standard deviations, the risk that this percentage arose by chance, i.e., from contamination, is low. For bacteria and fungi, these comparisons were carried out not only against all reads within the samples but also against bacteria- or fungus-specific reads. This was done to reduce the effects of the slightly different proportions of viral, bacterial, fungal, and other eukaryotic reads in the different sample materials. As amebae are neither fungi nor bacteria, such an approach was not possible for their assessment. As an indication of potential contamination, the percentages of specific reads for all species of the genus Entamoeba and for non-pathogenic amebae such as E. dispar were assessed.
For the assessed fungi and bacteria, comparing the species-specific reads with the total number of reads and with the fungus- or bacteria-specific reads, respectively, led to slightly different results. For example, in a sample with the histological diagnosis of chromoblastomycosis, values above the 2nd standard deviation were found for Cladophialophora psammophila relative to the total number of reads, but for both Cladophialophora psammophila and Chaetomium globosum relative to the fungus-specific reads. Such differences are mathematical artifacts resulting from slightly different proportions of fungus-specific reads in the different sample materials. Such examples demonstrate the vulnerability of the model, which is a particular problem with low sample numbers, where slight variances have large effects.
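The effect of the reference count can be illustrated with a short sketch. The read counts below are invented for illustration and the helper name is hypothetical: two samples carry the same number of reads for one fungal species and the same total read count, but differ tenfold in their fungus-specific reads.

```python
from typing import Dict, Tuple


def species_percentages(counts: Dict[str, int]) -> Tuple[float, float]:
    """Percentage of species-specific reads relative to (a) all reads and
    (b) fungus-specific reads of the same sample."""
    species, total, fungal = counts["species"], counts["total"], counts["fungal"]
    return 100.0 * species / total, 100.0 * species / fungal


# Invented counts for illustration only (not study data).
samples = {
    "A": {"species": 40, "total": 1_000_000, "fungal": 2_000},
    "B": {"species": 40, "total": 1_000_000, "fungal": 20_000},
}
for name, counts in samples.items():
    of_total, of_fungal = species_percentages(counts)
    print(f"sample {name}: {of_total:.3f}% of all reads, {of_fungal:.1f}% of fungal reads")
```

Relative to all reads, both samples look identical. Relative to fungal reads, sample A appears ten times more enriched, so its position relative to the proxy-negative control, and hence the number of standard deviations it reaches, can differ between the two comparisons.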
An undeniable limitation of the standard deviation-based approach is that the reliability of the proxy-negative control depends on the number of assessed samples. However, subtraction-based approaches [15] are also affected by sample numbers when it comes to excluding major chance variations.
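One way to see why the number of samples matters is as follows. Assume that the tested sample is itself part of the proxy-negative control and that the population standard deviation is used. Then a single sample can lie at most sqrt(n - 1) standard deviations above the mean of n samples (Samuelson's inequality), however high its read percentage. The sketch below uses hypothetical values and a hypothetical helper name, and it reaches that bound because all other samples share the same background level.

```python
from math import sqrt
from statistics import mean, pstdev
from typing import Sequence


def z_of_last(values: Sequence[float]) -> float:
    """Distance of the last value from the mean of all values, in population SDs.
    Assumption: the tested sample is included in the control it is compared with."""
    return (values[-1] - mean(values)) / pstdev(values)


# Hypothetical: n - 1 samples at a uniform background of 0.01% and one sample at 50%.
for n in (5, 10, 17, 30):
    values = [0.01] * (n - 1) + [50.0]
    print(n, round(z_of_last(values), 3), round(sqrt(n - 1), 3))
```

For small n, the bound is restrictive. With n = 10, for example, no sample can lie more than 3 population SDs above the mean. With the sample SD (1/(n - 1)), the bound is (n - 1)/sqrt(n), which is lower still.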
It is likely that the variety of anatomical sampling sites influences the quality of the proxy-negative control. However, samples from primarily sterile body compartments were also heavily contaminated with DNA of various non-human species. This suggests that procedures after sample acquisition, e.g., processing, paraffin embedding, and storage, contributed more to the measured contamination than the anatomical sampling site did. Accordingly, the anatomical site was not specifically considered when defining the proxy-negative control for the assessed formalin-fixed, paraffin-embedded tissue samples. For the medical interpretation of diagnostic NGS results, however, the natural occurrence of environmental microorganisms at primarily non-sterile sampling sites has to be considered. Thus, NGS does not remove the need for medical validation and interpretation of diagnostic findings.
No target enrichment, e.g., by specific PCR, was attempted or evaluated, because the study assessed the performance of diagnostic NGS without specific suspicion. Depletion of human DNA prior to the NGS runs was also not attempted. The initial DNA quantities in the historical samples were so low that such technical strategies might also have reduced the recovery of the residual target DNA. As an example of this sensitivity concern, HIV proviral DNA, which would have been anticipated in samples from patients with invasive and tropical mycoses, was never detected in any of these samples. The sensitivity concern is particularly important because several matches with the histological diagnoses were achieved with the standard deviation-based attribution of etiological relevance even though the total numbers of specific reads were very low. In contrast, etiologically irrelevant environmental fungi dominated among the most frequently detected fungal reads in nearly all assessed samples.
Another indication of unlikely etiological relevance and an increased likelihood of contamination is the frequent detection of very rare pathogens in various samples. An example is the frequent detection of Cladophialophora yegresii, which occurs on living cactus plants [45]. Although Cladophialophora spp. can in rare cases be associated with human disease, e.g., chromoblastomycosis [45], comparatively high DNA concentrations frequently occurred in samples without any histological indication of chromoblastomycosis. This makes contamination from cactus plants in the diagnostic institute the more likely explanation.
Furthermore, interpretation can be difficult if increased quantities of sequences are detected for a species that has rarely or never been associated with clinical disease. Cryptococcus carnescens is one such example. C. carnescens is part of the Cryptococcus laurentii complex [46]. A recent review of non-neoformans cryptococcal infections reported only 20 cases of infection with the C. laurentii complex [47], without detailed differentiation within the complex. The etiological relevance of the C. carnescens sequences identified by NGS in sample 5, from a patient with the histological diagnosis of histoplasmosis or cryptococcosis, is therefore uncertain.
NGS yielded potentially useful diagnostic information for 5 of 17 samples from patients with invasive fungal infection (29.4%) and for 1 of 6 samples from patients with invasive amebiasis (16.7%). Although this is only a modest result, it must be interpreted in relation to the complexity of the sample materials. The sensitivity of the procedure is, undeniably, still unacceptably poor. By comparison, the molecular gold standard method of pan-fungal PCR with subsequent Sanger sequencing allowed conclusive pathogen detection in only 2 of 17 fungal samples (11.8%), and even then in only 3 of 10 PCR reactions for those 2 samples [13]. In contrast, NGS analysis not only confirmed the pan-fungal PCR detections of Histoplasma capsulatum and Madurella mycetomatis but also gave hints of infections due to Rhizopus spp., Cryptococcus spp., and Fusarium spp. As previously reported [36, 42], histology was of limited value for the diagnosis of invasive fungal infections in the study described here, particularly for assignments at the genus and species levels. For the detection of Entamoeba histolytica in intestinal biopsies, however, specific PCR proved superior to NGS analysis.
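For reference, the proportions quoted above follow directly from the sample counts:

```latex
\frac{5}{17} \approx 0.294 \;(29.4\,\%), \qquad
\frac{1}{6} \approx 0.167 \;(16.7\,\%), \qquad
\frac{2}{17} \approx 0.118 \;(11.8\,\%).
```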
Accordingly, NGS analysis can help to improve the molecular discrimination of fungal pathogens in formalin-fixed, paraffin-embedded tissues compared with contamination-sensitive pan-fungal PCR with subsequent Sanger sequencing. However, its sensitivity appears inferior to that of specific PCR approaches, as the experiments with the ameba-containing samples suggest. For the invasive fungi, quality-controlled specific PCRs were available in the study participants' laboratories only for histoplasmosis and mucormycosis. Specific analyses for all fungal pathogens could therefore not be performed, an admitted limitation of the study.
Among the samples for which results of specific PCR and Sanger sequencing were available, sample 4, from a patient with mucormycosis, is notable. For this sample, PCR with subsequent Sanger sequencing suggested Lichtheimia/Absidia corymbifera, whereas NGS gave strong hints of Rhizopus oryzae. A likely explanation is that the PCR primers preferentially amplified Lichtheimia/Absidia corymbifera DNA, while NGS identified the more abundant Rhizopus oryzae-specific DNA. Preferential binding of multispecies primers to certain microorganisms is a well-known problem affecting amplification-based diagnostic approaches [48].
With regard to the study hypothesis, we showed that hypothesis-free genomic detection of rare invasive infections by NGS in polymicrobially contaminated, formalin-fixed, paraffin-embedded tissue samples is feasible and can provide hints on likely causative agents. However, the technique is costly, its technical and bioinformatic procedures are demanding, and its results are uncertain to interpret. For these reasons, it currently still plays a subordinate role in the diagnostic workflow and should only be considered if other, less demanding procedures do not lead to conclusive results.
It should be noted that assigning potential etiological relevance on the basis of the percentage of specific NGS reads is far from standardized and requires further evaluation. Among other factors, the number of negative control samples used to calculate the average percentage of reads will necessarily affect the size of the standard deviation and thus the potential attribution of etiological relevance in contaminated sample materials. Standardization prior to diagnostic use is therefore obligatory. From this perspective, the results presented here can only be considered hypothesis-forming. Further studies are needed to define standards for the medical interpretation of NGS-based pathogen identification directly from sample material. This applies even more to contamination-prone sample materials such as formalin-fixed, paraffin-embedded tissue samples.
For such contamination-prone sample materials, there is a considerable risk of spurious false-positive results, e.g., from contamination events restricted to the processing of individual samples. The standard deviation approach based on the proxy-negative control cannot control for such events. Accordingly, the procedure we have introduced can only produce hypothesis-forming results. Such results should prompt the clinician in charge to consider, as differential diagnoses, clinically matching infectious diseases that had not been considered before the nonspecific NGS assessment. Without consideration of the clinical findings, NGS results from such materials cannot be interpreted. If these limitations are accepted, however, NGS can help to identify potentially etiologically relevant infectious agents that were not considered during the initial clinical assessment of a patient. With this aim, the technique can be applied when there is no clear candidate for the etiology of an infectious disease patient's clinical presentation.