Bacterial infections are one of the major causes of mortality among humans and animals worldwide. Understanding how bacterial pathogens adapt to dynamic and hostile environments is crucial for improving therapies for infectious diseases. Bacteria associated with chronic infections in patients suffering from, e.g., AIDS, burn wound sepsis, diabetes and cystic fibrosis (CF) are ideal subjects for studying bacterial adaptation.
In the airways of CF patients, mucus forms a stationary, thickened gel that adheres to the epithelial lining fluid of the airway surfaces, which affects the mucociliary escalator and impairs the clearance of inhaled microbes. CF patients suffer from chronic and recurrent respiratory tract infections, which eventually lead to lung failure and death. Pseudomonas aeruginosa is one of the major pathogens in CF patients and the principal cause of their morbidity and mortality. Early P. aeruginosa infection in CF patients is characterized by a diversity of strains whose phenotypes resemble those of environmental isolates [4, 5]. In contrast, adapted, dominant epidemic strains are often identified in chronically infected patients from different CF centers [4, 6, 7]. Once adapted, P. aeruginosa can persist for several decades in the respiratory tracts of CF patients, overcoming host defense mechanisms as well as intensive antibiotic therapies.
Because the P. aeruginosa genome has been sequenced, transcriptome profiling (e.g. microarray analysis and RNA-Seq) has become a convenient approach for characterizing biological differences among P. aeruginosa clinical isolates from CF patients. Transcriptome profiling enables researchers to measure genome-wide gene expression in a high-throughput manner and can thus provide valuable information about P. aeruginosa adaptation during infection. However, interpreting transcriptomic data is a great challenge because of the complexity and noise of the data. Clinical strains isolated from different patients have adapted to distinct host environments, since patients vary in age, infection history and medical treatment (e.g. the kinds and dosages of antibiotics). Therefore, researchers need to reduce the dimensionality of the multivariate transcriptomic dataset and extract its underlying features.
Principal component analysis (PCA) is a classic projection method that is widely used to accomplish these tasks. PCA transforms a number of correlated variables into a smaller number of uncorrelated variables called principal components (PCs). The first PC captures as much of the variability in the data as possible, and each succeeding PC captures as much of the remaining variability as possible. However, the constraint of mutual orthogonality of components implied in classical PCA methods may not be appropriate for biological systems. Recently, independent component analysis (ICA), which decomposes input data into statistically independent components, was shown to be able to classify gene expression into biologically meaningful groups and relate them to specific biological processes. ICA has been successfully applied by different research groups to analyze transcriptomic data from yeast, cancer and Alzheimer's disease samples, and has been shown to be more powerful at feature extraction than PCA and other traditional methods for microarray data analysis [11–13]. In a study by Zhang et al., ICA was used to extract specific gene expression patterns of normal and tumor tissues, which can serve as biomarkers for the molecular diagnosis of human cancer types. Yet, to the best of our knowledge, there have been no reports of the application of ICA to bacterial transcriptomic data from chronic infections.
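The PCA properties described above (uncorrelated components, ordered by the variance they capture) can be illustrated with a minimal NumPy sketch. The data are synthetic and purely illustrative: the function name `pca`, the 26 x 200 matrix shape and the three hidden factors are assumptions made for this example, not the dataset or software used in this study.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD for a samples x variables matrix X (illustrative sketch)."""
    Xc = X - X.mean(axis=0)                      # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates on the PCs
    explained = s**2 / np.sum(s**2)              # fraction of variance per PC
    return scores, Vt[:n_components], explained[:n_components]

rng = np.random.default_rng(0)
# Synthetic stand-in for an expression matrix (assumption): 26 samples,
# 200 correlated variables driven by 3 hidden factors plus small noise.
hidden = rng.normal(size=(26, 3))
mixing = rng.normal(size=(3, 200))
X = hidden @ mixing + 0.1 * rng.normal(size=(26, 200))

scores, components, explained = pca(X, n_components=5)
print(np.round(explained, 3))  # variance fractions, largest first
```

In this sketch the rows of `components` are mutually orthogonal by construction, which is the constraint that ICA replaces with statistical independence.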
In this study, we applied ICA to project the transcriptomic data of 26 P. aeruginosa isolates from CF patients onto independent components. P. aeruginosa genes are clustered in an unsupervised manner into non-mutually exclusive groups. Each retrieved independent component is considered a putative adaptation process, which is revealed by the functional annotations of the genes that load heavily on the component.
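As a rough illustration of how genes with heavy loadings can form non-mutually exclusive groups, the sketch below thresholds a genes x components loading matrix. Everything here is hypothetical: the function `heavy_loading_genes`, the gene identifiers, the random loading matrix and the cutoff of three standard deviations are assumptions chosen for the example and do not describe the selection criterion used in this study.

```python
import numpy as np

def heavy_loading_genes(S, gene_ids, z_cutoff=3.0):
    """Group genes by component using standardized loadings.

    S is a genes x components loading matrix. A gene joins the group of
    every component on which its standardized loading exceeds z_cutoff
    in absolute value, so one gene may belong to several groups.
    The cutoff is an illustrative assumption.
    """
    z = (S - S.mean(axis=0)) / S.std(axis=0)
    groups = {}
    for k in range(S.shape[1]):
        idx = np.flatnonzero(np.abs(z[:, k]) > z_cutoff)
        groups[k] = [gene_ids[i] for i in idx]
    return groups

rng = np.random.default_rng(1)
gene_ids = [f"gene_{i:04d}" for i in range(1000)]  # hypothetical identifiers
S = rng.normal(size=(1000, 4))                      # synthetic loadings
S[10, 0] = 12.0    # gene_0010 loads heavily on component 0 ...
S[10, 2] = -12.0   # ... and, with opposite sign, on component 2
S[20, 1] = 12.0    # gene_0020 loads heavily on component 1

groups = heavy_loading_genes(S, gene_ids)
for k, genes in groups.items():
    print(k, genes)
```

Because membership is decided per component, `gene_0010` appears in two groups here, which is what "non-mutually exclusive" means in practice. The genes in each group would then be passed to functional annotation.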