1H NMR spectroscopy-based metabolomics analysis for the diagnosis of symptomatic E. coli-associated urinary tract infection (UTI)

Background Urinary tract infection (UTI) is one of the most common diagnoses in girls and women, and to a lesser extent in boys and men younger than 50 years. Escherichia coli, followed by Klebsiella spp. and Proteus spp., cause 75-90% of all infections. Infection of the urinary tract is identified by significant growth of a single species in the urine, in the presence of symptoms. Urinary culture is an accurate diagnostic method but takes several hours or days to be carried out. Metabolomics analysis aims to identify biomarkers that are capable of speeding up diagnosis. Methods Urine samples from 51 patients with a prior diagnosis of Escherichia coli-associated UTI, from 21 patients with UTI caused by other pathogens (bacteria and fungi), and from 61 healthy controls were analyzed. The 1H-NMR spectra were acquired and processed. Multivariate statistical models were applied and their performance was validated using a permutation test and ROC curve analysis. Results Orthogonal Partial Least Squares-Discriminant Analysis (OPLS-DA) showed good separation (R2Y = 0.76, Q2 = 0.45, p < 0.001) between UTI caused by Escherichia coli and healthy controls. Acetate and trimethylamine were identified as discriminant metabolites. The concentrations of both metabolites were calculated and used to build the ROC curves. The discriminant metabolites identified were also evaluated in urine samples from patients with infections caused by other pathogens to test their specificity. Conclusions Acetate and trimethylamine were identified as optimal candidate biomarkers for UTI diagnosis. The results support the possibility of a fast diagnostic test for Escherichia coli-associated UTI using acetate and trimethylamine concentrations. Electronic supplementary material The online version of this article (10.1186/s12866-017-1108-1) contains supplementary material, which is available to authorized users.


Background
Uncomplicated urinary tract infection (UTI) is one of the most common diagnoses, especially in sexually active young women; although less frequent, it is still reasonably common in older women, pregnant women and men [1,2]. Uncomplicated cystitis and pyelonephritis are mainly caused by Escherichia coli (75-95%) and occasionally by other species of Enterobacteriaceae (such as Proteus mirabilis and Klebsiella pneumoniae) and by Staphylococcus saprophyticus [3]. Other gram-negative and gram-positive species are rarely isolated in uncomplicated UTIs. Signs and symptoms of acute uncomplicated UTI include dysuria, urinary frequency or urinary urgency, loin pain, and occasionally hematuria. The occurrence of other symptoms, such as vaginitis and urethritis, suggests alternative diagnoses. The diagnosis of UTI is generally a clinical diagnosis based on symptoms and signs, and urine culture is usually not required to manage uncomplicated infections [1]. In fact, recent expert guidelines [4] state that for women without a history of a laboratory-confirmed UTI, a visit to a clinic or ambulatory care facility for urinalysis or dipstick testing is appropriate. However, it is recommended that in symptomatic patients a negative dipstick result should be confirmed by a urine culture, urinalysis, or both [1,4]. Urine culture is the gold standard for the microbiological diagnosis of UTI, and it is routinely used in most clinical laboratories [5]. However, the method is labour intensive and time-consuming, and the turnaround time for culture results often exceeds the time available before antimicrobial treatment must be started. Moreover, only half of the symptomatic women have a UTI if infection is defined as more than 10^5 colony-forming units (CFU)/mL.
In addition, neither urinalysis nor culture is able to predict clinical outcome in most women with clinical signs of infection, and recent evidence-based guidelines suggest that treating adult symptomatic women without laboratory testing does not increase adverse outcomes [6,7]. Accordingly, given the cost and time constraints, routine urine culture should be avoided in the management of most uncomplicated UTIs. Finally, prescriptions for UTI treatment have a significant impact on total antibiotic consumption and are associated both with increased healthcare expenditure and with the spread of antibiotic resistance. Clearly, better and more rapid diagnosis of UTI might prevent patients from being unnecessarily treated [4]. In the light of these recommendations, there is a need to develop a simpler, faster and cheaper tool to predict the causative organisms of bacterial UTI, or at least those suspected as the most likely source of UTI. For these reasons, any improvement in the diagnostic process of UTI would greatly impact future healthcare.
Recently, the availability of new technologies such as identification by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF) and the development of automated solutions designed for microbiology have had a positive impact on patient management and hospital costs, increasing productivity and quality, and reducing turnaround time and laboratory costs [8,9]. Furthermore, the application of high-resolution nuclear magnetic resonance spectroscopy (NMR), mass spectrometry (MS) and ultra-performance liquid chromatography, coupled with analytical and bioinformatics techniques, has enabled the metabolomic profiles of a number of diseases to be comprehensively investigated [10]. In particular, in recent years 1H-NMR spectrometry has been used in the diagnosis of bacterial UTI, reaching an accuracy of nearly 99.5% using a multivariable prediction model. With this approach, a panel of biomarkers could act as indicators of UTI, thus providing a feasible method for diagnosis [11-16]. Here we confirm that NMR-based urinalysis is a useful and rapid method for the etiological diagnosis of E. coli-associated UTI that could be translated into clinical practice.

Results
Based on the urine culture results, 51 patients with E. coli-associated UTI (E. coli-pos), 21 patients with non-E. coli UTI, and 61 healthy controls (CTRLs) were selected for the metabolomics analysis. Clinical and demographic characteristics of the patients and the healthy subjects are summarized in Table 1. Amongst the 72 patients positive for microbial infection, only 72% were positive for nitrite on dipstick urinalysis. On the other hand, about 93% tested positive for the nonspecific inflammation marker leukocyte esterase. All urine samples underwent NMR analysis. A representative 1H NMR spectrum of urine obtained from an E. coli-pos sample is reported in Additional file 1: Figure S1. In order to perform the multivariate statistical analysis, the data from E. coli-pos and CTRLs samples were split into a training set and a validation set. The training set was composed of 72 samples (31 E. coli-pos + 41 CTRLs) and the validation set of 40 samples. The multivariate statistical analysis was carried out using binned (bucketed) data.
To generate an overview of the dataset variation, a Principal Components Analysis (PCA) was first performed based on the normalized NMR spectral data obtained from urine samples. In order to obtain a better discrimination between the groups, orthogonal partial least squares-discriminant analysis (OPLS-DA) was then applied. The OPLS-DA showed a clear separation of samples into two distinct groups, indicating that the E. coli-pos and CTRLs samples had significantly different metabolic profiles. Score plots using the first two components were used to present a 2D representation of variations among the spectra (Fig. 1a).
The OPLS-DA model was established with one predictive and one orthogonal component, and showed good values of R2Y (0.76) and Q2 (0.45) and a p value < 0.001. Then, in order to test the validity of the OPLS-DA model, a permutation test (200 times) on the corresponding PLS-DA model was performed using the same number of components (Additional file 2: Figure S2).
The R2 and Q2 values derived from the permuted data were lower than the original values, and the Q2 regression line intercepted the y-axis below zero (Q2 intercept of -0.131), supporting the validity of the PLS-DA model. By analyzing the color-coded coefficient loading plot between the Controls and E. coli-associated UTI, we selected the regions of the spectra with a key role in the OPLS-DA model (Fig. 1b). The validation set was then used: classifying an external validation dataset with model parameters derived from the training set provides information about the generalizability of the model. The predicted value of the Y variable based on OPLS-DA was calculated for all samples in the prediction set and used as a classifier. Receiver operating characteristic (ROC) curve analysis was applied to provide a measure of clinical utility. The area under the ROC curve (AUC) calculated using the Y-predicted value was 0.79 (95% CI 0.61-0.98) (Fig. 2a).
Fig. 1 a) Distribution of E. coli-associated UTI and healthy controls obtained with an OPLS-DA model. Controls (open circle), E. coli-associated UTI (full circle). b) Color-coded coefficient loadings plot between the Controls and E. coli-associated UTI
Using the Chenomx software, the discriminant metabolites were identified and quantified. This method compares the integral of a known reference signal (trimethylsilyl propanoic acid, TSP) with signals derived from a documented database of about 350 compounds in order to determine concentrations relative to the reference signal. The univariate statistics were calculated using the Mann-Whitney U test, with each metabolite concentration normalized to the creatinine concentration. The discriminant metabolites, their chemical shifts and the p-values are listed in Table 2. Acetate (Ac) and trimethylamine (TMA) were shown to be the most discriminatory metabolites. Box plots of relative concentrations for both metabolites are shown in Fig. 3, indicating the diversity of individual metabolites among different groups. The ROC curve built using both metabolites gave an AUC of 0.938 (95% CI 0.86-1.02) (Fig. 2b).
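The principle of reference-based quantification and creatinine normalization described above can be illustrated with a minimal sketch. It assumes plain signal integration, whereas the Chenomx software fits library spectra, so it approximates the idea rather than the software's algorithm. The function names and all numeric values are hypothetical.

```python
def concentration_from_integral(signal_integral, n_protons,
                                ref_integral, ref_protons, ref_conc_mM):
    """Estimate a metabolite concentration (mM) from its signal integral
    relative to a reference signal of known concentration."""
    # Each integral is divided by the number of protons that generate it,
    # so that signals from different groups become comparable.
    per_proton_signal = signal_integral / n_protons
    per_proton_ref = ref_integral / ref_protons
    return ref_conc_mM * per_proton_signal / per_proton_ref


def creatinine_normalized(metabolite_mM, creatinine_mM):
    """Express a urinary concentration relative to creatinine to
    compensate for differences in urine dilution."""
    return metabolite_mM / creatinine_mM


# Hypothetical integrals: the TSP singlet arises from 9 equivalent protons,
# the acetate methyl singlet from 3 protons.
acetate_mM = concentration_from_integral(6.0, 3, 9.0, 9, ref_conc_mM=0.5)
acetate_ratio = creatinine_normalized(acetate_mM, creatinine_mM=5.0)
print(acetate_mM, acetate_ratio)
```

Normalizing by the number of contributing protons is what allows a single reference compound to calibrate every other signal in the spectrum.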
The concentrations of these metabolites were also evaluated in urine samples from patients with infections caused by other pathogens to test the specificity of their detection. These metabolites were used as classifiers and the area under the ROC curve was evaluated. The ROC analysis results are shown in Table 3. The AUC from the ROC curve built comparing E. coli-pos vs. CTRLs samples was 0.92 (95% CI 0.81-1.0) for acetate, 0.89 (95% CI 0.79-0.99) for TMA, and 0.94 (95% CI 0.86-1.0) when combining both metabolites. This indicates that classifiers based on a single metabolite concentration do not perform as well as a classifier using both together. The use of both acetate and trimethylamine concentrations shows almost 100% sensitivity and 100% specificity. The comparison between E. coli-pos samples and the group "All" (CTRLs + non-E. coli) shows an AUC of 0.89 (95% CI 0.80-0.98) for Ac, 0.80 (95% CI 0.67-0.94) for TMA, and 0.89 (95% CI 0.80-0.99) using both. The AUC values are lower when comparing E. coli-pos and non-E. coli samples.

Discussion
Urinary tract infections are a common problem in adults that frequently need prescriptions of laboratory tests and   [17]. Dipstick urinalysis provides some support through leukocyte esterase and nitrite information. These parameters can be nonspecific, since leukocyte esterase is a marker of granulocytes, and insensitive, since not all urinary pathogens produce nitrite; the reported sensitivity of nitrite ranges between 37 and 59% [18]. In our study, the two parameters performed better, though, as expected, they did not reach sufficient levels of analytical sensitivity.
Most human diseases have characteristic modifications in the metabolite profiles of body fluids prior to and during the development of clinical symptoms [19]. Thus, metabolomics offers a unique tool to investigate the complete set of small molecules derived from biochemical changes due to ongoing disease. The identification of these low-molecular-weight biomarkers is expected to result in earlier and more accurate diagnosis and care of disease and, hopefully, in effective therapeutic treatment [20-22]. NMR-based urinalysis for the screening of UTI with high accuracy and reproducibility has previously been described. E. coli-associated UTI, the most common scenario among all UTIs, has benefited from NMR analysis, which provides a valuable presumptive diagnosis to support the clinician's judgment. Bacterial metabolic end-products have invariably been observed in contaminated urine samples to date [23], and a set of metabolites has been investigated and proposed as markers for bacterial UTI: in most studies, acetate and TMA were indicated as ideal urine biomarkers for bacterial UTI and E. coli-associated UTI, in that their presence can be attributed to the metabolic effect of bacterial contamination of urine in the presence of ongoing disease [14,15]. In particular, it is known that TMA production is the result of integrated metabolism between the host primary metabolome and microbes that reduce trimethylamine N-oxide (TMAO) using the bacterial trimethylamine N-oxide reductase in the urinary bladder, making TMA a specific marker for bacterial metabolic activity and a valuable marker of E. coli-associated UTI [14,15]. Because of its origin, the use of TMA as a biomarker rules out contamination as the cause of a positive test. In this study, using a metabolomics approach, the urinary profile of UTI caused by the gram-negative E. coli could be distinguished from that of the healthy controls, and also from that caused by other non-E. coli pathogens. We focused our study on E. coli-positive patients in order to validate the metabolomics approach in a well-defined group of UTI caused by the most frequently detected pathogen. Indeed, we have shown that the method could be used to discriminate the metabolomic profiles of UTI caused by other pathogens, and further investigation will aim to validate the specificity of the approach in different clinical contexts (especially in intensive care units). Acetate and trimethylamine concentrations appear to be optimal candidate biomarkers for E. coli-associated UTI diagnosis in the clinical setting, and this may find application in the selection of antibiotic treatment for patients.
In fact, the results suggest that NMR-based analysis is a rapid, simple and safe test for the diagnosis of UTI and enables the prescription of medication before the microbiological diagnosis of UTI is confirmed. This advancement in the clinical utility of non-culture-based diagnosis should be regarded as a paradigm shift in clinical medicine and in microbiological diagnosis.

Conclusions
The study further confirms the value of NMR spectrometry as a diagnostic tool for E. coli-associated UTI. The method is validated for clinical purposes, easy to use, delivers reliable results with a rapid turnaround time, and has a high sensitivity and specificity compared to routine tests such as urinalysis or dipstick testing. The discriminative model enabled acetate and trimethylamine to correctly provide information on bacterial contamination with E. coli with a high predictive ability. Clinical application of NMR spectroscopy has been previously assessed, and its role as an in vitro diagnostic test has been highlighted in a great variety of diseases, including infections. One of the goals is to restrict the use of empirical antibiotic treatments to patients with UTI and to curtail the overuse of drugs, which increases the spread of antibiotic resistance and public health costs. A rapid, sensitive and cheap biochemical test is likely to be well accepted in the future, once it is standardized and a precise cut-off level is defined for the identification and/or differentiation of specific UTIs.

Patients
Urine samples, collected over the course of two years, 2013-2014, at Policlinico Universitario di Monserrato-Cagliari (Italy) from a total of 133 subjects were analyzed: 51 patients with E. coli-associated UTI (E. coli-pos), 21 patients with UTI caused by other pathogens (Enterococcus spp. = 6; Staphylococcus spp. = 4; Proteus spp. = 3; Candida spp. = 8), and 61 healthy controls (CTRLs). All samples had previously been collected for routine mandatory diagnostic analysis from patients with acute uncomplicated cystitis manifesting symptoms of dysuria, urinary frequency, or urinary urgency, and in a few cases, hematuria. Urine samples with negative or low colony counts and without evidence of inflammatory disease were used as the control group (Table 1). The following parameters were recorded for each patient: the collection date, age, sex, identification of the bacterial strain associated with UTI, and the results of the antimicrobial susceptibility test. Mid-stream urine samples were obtained before starting antimicrobial therapy and analyzed using standard microbiological methods. Samples were collected and immediately aliquoted for the analysis after the addition of sodium azide 0.1% (w/v) to stop bacterial growth.

Microbiological assays
Samples for urine culture were analysed within one hour of sampling; otherwise, they were stored at 4°C and processed within 24 h of collection. All samples were inoculated on CLED agar as well as MacConkey agar and Sabouraud dextrose agar (all from bioMérieux, Marcy-l'Étoile, France), and were incubated under aerobic conditions for 18-48 h at 35°C. A positive culture was defined as a colony count ≥10^3 CFU/ml when associated with clinically significant signs in symptomatic patients and in light of the patient's immunological status. Non-UTI controls were samples with no microbial growth from subjects with a negative history of UTI. The identification and antimicrobial susceptibility testing of the isolated strains were performed using the automated Vitek2 system (bioMérieux, Marcy-l'Étoile, France). E. coli ATCC 25922 was used for quality control, and susceptibility was defined in accordance with EUCAST recommendations (http://www.eucast.org/clinical_breakpoints/).
After sample collection, fractions of the urine samples were supplemented with a 0.1% sodium azide solution, then centrifuged for 10 min at 4°C at 12000 rpm to remove whole cells and debris and to avoid contaminants. The supernatant was used for subsequent NMR analysis.

NMR spectroscopy analysis
For each sample, 630-μl aliquots were collected from the supernatant and processed as previously described [24]. Samples were then transferred into a 5 mm NMR tube for analysis. The 1H-NMR spectra were acquired using a Varian Unity Inova 500 MHz spectrometer (Agilent Technologies, Santa Clara, CA, USA). The acquisition conditions of the NMR spectra were the following: standard temperature of 27°C, 1D NOESY sequence with a 90° pulse of 10.4 μs, acquisition time of 1.5 s and spectral width of 6000 Hz. The free induction decay (FID) was acquired 128 times to increase the signal-to-noise ratio. FIDs were weighted by an exponential function with a 0.5-Hz line broadening factor prior to Fourier transformation. Finally, acquired spectra were phase and baseline corrected (Version 7.1.2, Mestrelab Research S.L.).
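The exponential weighting applied before Fourier transformation can be sketched as follows. The sketch assumes the usual convention in which a line broadening of LB Hz corresponds to multiplying the FID by exp(-π·LB·t); the function name and the toy constant-amplitude FID are hypothetical, and the 9000 points follow from an acquisition time of 1.5 s at a spectral width of 6000 Hz.

```python
import math


def exponential_apodization(fid, acquisition_time_s, line_broadening_hz):
    """Multiply each FID point by exp(-pi * LB * t), where t is the time
    at which the point was sampled."""
    n_points = len(fid)
    dwell_time = acquisition_time_s / n_points  # time between two points
    return [
        value * math.exp(-math.pi * line_broadening_hz * k * dwell_time)
        for k, value in enumerate(fid)
    ]


# Toy FID with constant amplitude, used only to expose the weighting curve.
n_points = int(1.5 * 6000)  # acquisition time (s) x spectral width (Hz)
toy_fid = [1.0] * n_points
weighted = exponential_apodization(toy_fid, acquisition_time_s=1.5,
                                   line_broadening_hz=0.5)
print(weighted[0], weighted[-1])
```

The weighting leaves the start of the FID, where the signal dominates, almost untouched and suppresses the noisy tail, at the cost of slightly broader lines in the spectrum.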

Statistical analysis
The 1H-NMR spectra were reduced to bins of 0.04 ppm width in the region 0.5-9.5 ppm, excluding the urea and residual water regions. The total area of the bins of each spectrum was normalized to a constant sum of 100 to minimize the effects of variable concentration among different samples [25].
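A minimal sketch of the binning and constant-sum normalization step is given below. The limits of the excluded water and urea regions are not reported in the text, so the values used here (4.6-5.0 and 5.5-6.0 ppm) are assumptions, as are the function name and the synthetic spectrum.

```python
def bin_and_normalize(ppm, intensity, lo=0.5, hi=9.5, width=0.04,
                      excluded=((4.6, 5.0), (5.5, 6.0)), total=100.0):
    """Sum intensities into fixed-width bins, drop bins whose center lies
    in an excluded region, and scale the remaining bins to a fixed sum."""
    n_bins = int(round((hi - lo) / width))
    sums = [0.0] * n_bins
    for p, y in zip(ppm, intensity):
        if lo <= p < hi:
            index = min(int((p - lo) / width), n_bins - 1)
            sums[index] += y
    kept = []
    for i, s in enumerate(sums):
        center = round(lo + (i + 0.5) * width, 3)
        # Assumed exclusion limits for residual water and urea.
        if not any(a <= center <= b for a, b in excluded):
            kept.append((center, s))
    area = sum(s for _, s in kept)
    if area == 0:
        raise ValueError("spectrum has no signal in the retained region")
    return [(center, total * s / area) for center, s in kept]


# Synthetic spectrum: flat baseline plus one peak near 1.92 ppm.
ppm_axis = [0.5 + 0.001 * i for i in range(9000)]
signal = [1.0 + (50.0 if abs(p - 1.92) < 0.005 else 0.0) for p in ppm_axis]
bins = bin_and_normalize(ppm_axis, signal)
print(len(bins), max(bins, key=lambda b: b[1]))
```

Because every spectrum is scaled to the same total, differences between samples reflect the relative composition of the urine rather than its overall dilution.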
The resulting dataset was then analyzed using SIMCA-P+ (Version 13.0, Umetrics, Umeå, Sweden). Multivariate statistical models were validated as follows: one-third of the samples were removed from the total set and used as a validation set, and the rest of the samples were used to construct the training set. The class of the samples in the validation set was then predicted based on the model built from the training set.
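The hold-out scheme can be sketched as below. Whether the split was stratified by class is not stated, so drawing one-third from each class separately is an assumption, and the resulting sizes (75 training and 37 validation samples) differ slightly from the 72 and 40 reported in the Results. The function name and label strings are hypothetical.

```python
import random


def stratified_split(labels, validation_fraction=1 / 3, seed=0):
    """Return (training indices, validation indices), drawing the same
    fraction of samples from every class for the validation set."""
    rng = random.Random(seed)
    indices_by_class = {}
    for index, label in enumerate(labels):
        indices_by_class.setdefault(label, []).append(index)
    training, validation = [], []
    for class_indices in indices_by_class.values():
        rng.shuffle(class_indices)
        n_validation = round(len(class_indices) * validation_fraction)
        validation.extend(class_indices[:n_validation])
        training.extend(class_indices[n_validation:])
    return sorted(training), sorted(validation)


# Group sizes taken from the study: 51 E. coli-pos and 61 controls.
labels = ["E. coli-pos"] * 51 + ["CTRL"] * 61
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))
```

Fixing the random seed makes the split reproducible, so that the same samples are held out every time the model is rebuilt.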
PCA was performed to get an overview of similarities/differences between sample profiles and to detect possible outliers.
Additionally, PLS-DA was applied to maximize the separation between samples and to identify subsets (linear combinations) of metabolic features associated with a specific sample class. OPLS-DA was applied to achieve a better interpretation of PLS-DA models, as it removes systematic variation unrelated to class from the data by placing it in orthogonal components, while maximizing class separation in the predictive component [26]. A permutation test was performed using 200 permutations to check for overfitting of the PLS-DA model. Once the supervised model has been estimated from the training set, it can be used to predict new observations: for every observation in the prediction set, the model yields a predicted value of the Y variable, which can then be used to build a classifier.
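The logic of the permutation test can be sketched with a one-component PLS model written from scratch. This is not the SIMCA-P+ implementation: SIMCA plots R2 and Q2 of the permuted models against their correlation with the original labels, whereas this simplified variant only compares the cross-validated Q2 of the real labels with the Q2 values obtained after shuffling. The data are synthetic and all function names are hypothetical.

```python
import random


def pls1_fit(X, y):
    """Fit a one-component PLS regression of y on X (both mean-centered)."""
    n, p = len(X), len(X[0])
    mean_x = [sum(row[j] for row in X) / n for j in range(p)]
    mean_y = sum(y) / n
    Xc = [[row[j] - mean_x[j] for j in range(p)] for row in X]
    yc = [v - mean_y for v in y]
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sum(v * v for v in w) ** 0.5 or 1.0
    w = [v / norm for v in w]                      # unit weight vector
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]   # scores
    b = sum(ti * yi for ti, yi in zip(t, yc)) / (sum(v * v for v in t) or 1.0)
    return mean_x, mean_y, w, b


def pls1_predict(model, x):
    mean_x, mean_y, w, b = model
    return mean_y + b * sum((xj - mj) * wj for xj, mj, wj in zip(x, mean_x, w))


def q2(X, y, folds=7):
    """Cross-validated Q2 = 1 - PRESS / total sum of squares."""
    n = len(X)
    mean_y = sum(y) / n
    ss_total = sum((v - mean_y) ** 2 for v in y)
    press = 0.0
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        test = [i for i in range(n) if i % folds == f]
        model = pls1_fit([X[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - pls1_predict(model, X[i])) ** 2 for i in test)
    return 1.0 - press / ss_total


def permutation_test(X, y, n_permutations=200, seed=0):
    """Compare the Q2 of the real labels with Q2 values of shuffled labels."""
    rng = random.Random(seed)
    observed = q2(X, y)
    permuted = []
    for _ in range(n_permutations):
        shuffled = y[:]
        rng.shuffle(shuffled)
        permuted.append(q2(X, shuffled))
    exceed = sum(1 for value in permuted if value >= observed)
    return observed, permuted, (exceed + 1) / (n_permutations + 1)


# Synthetic data: 20 controls (0) and 20 cases (1); two of five variables
# are shifted in the cases.
data_rng = random.Random(1)
y = [0.0] * 20 + [1.0] * 20
X = [[data_rng.gauss(0, 1) + (3.0 if (label == 1.0 and j < 2) else 0.0)
      for j in range(5)] for label in y]
observed_q2, permuted_q2, p_value = permutation_test(X, y)
print(round(observed_q2, 2), round(p_value, 3))
```

If the model were merely fitting noise, shuffled labels would yield Q2 values comparable to the observed one; a clearly higher observed Q2 argues against overfitting.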
Six metabolites were identified from the S-plot as discriminant for the separation between the E. coli-pos and CTRLs samples and were then quantified using Chenomx NMR Suite 7.1 (Chenomx Inc., Canada) [27]. Univariate statistical summaries and tests were performed on the creatinine-normalized concentrations. The univariate statistics were calculated using the Mann-Whitney U test to estimate the significance of group differences. P-values of less than 0.05 were considered statistically significant.
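The univariate comparison can be sketched with a self-contained Mann-Whitney U test. The p-value below uses the tie-corrected normal approximation without continuity correction, which is an assumption; for groups as small as in this example an exact test would normally be preferred. The concentration values are hypothetical and are not taken from the study.

```python
import math


def mann_whitney_u(group_a, group_b):
    """Return (U statistic of group_a, two-sided p-value) using midranks
    and the tie-corrected normal approximation."""
    pooled = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    n_a, n_b = len(group_a), len(group_b)
    n = n_a + n_b
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0      # mean of the ranks i+1 .. j
        ties = j - i
        tie_term += ties ** 3 - ties
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b / 12.0 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u_a - mean_u) / sd_u
    return u_a, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical creatinine-normalized acetate concentrations.
ecoli_pos = [0.30, 0.45, 0.52, 0.61, 0.75, 0.90]
controls = [0.02, 0.03, 0.05, 0.04, 0.06, 0.08]
u_stat, p = mann_whitney_u(ecoli_pos, controls)
print(u_stat, p < 0.05)
```

Being rank-based, the test makes no assumption about the distribution of the concentrations, which suits the skewed values typical of urinary metabolites.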
The discriminant metabolites were also quantified in a group of 21 urine samples from patients with infections caused by other pathogens in order to test E. coli specificity. Each metabolite was used to build a classifier to discriminate between E. coli-pos versus CTRLs, E. coli-pos versus non-E. coli and E. coli-pos versus "All" (CTRLs + non-E. coli).
The performance of the classifiers was assessed by ROC curve analysis performed using GraphPad Prism version 7.00 (GraphPad Software, La Jolla, California, USA, http://www.graphpad.com). ROC curves are summarized in a single value, the area under the curve (AUC), which ranges from 0 to 1.0 [28].
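How a ROC curve and its AUC are obtained from a continuous classifier, such as a metabolite concentration or a Y-predicted value, can be sketched as follows. This is a generic illustration rather than the GraphPad Prism procedure; the scores and labels are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs obtained by
    lowering the decision threshold over every observed score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        true_pos = sum(1 for s, l in zip(scores, labels)
                       if s >= threshold and l == 1)
        false_pos = sum(1 for s, l in zip(scores, labels)
                        if s >= threshold and l == 0)
        points.append((false_pos / n_neg, true_pos / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical classifier outputs: label 1 = E. coli-pos, 0 = control.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
curve = roc_points(scores, labels)
auc = auc_trapezoid(curve)
print(auc)  # 0.875
```

An AUC of 0.5 corresponds to a classifier that performs no better than chance, whereas 1.0 indicates perfect separation of the two groups.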