Modern sequencing technologies can decipher the entire genetic material of a bacterium, its genome, in a short time. This produces large amounts of data covering thousands of genes. However, analysing the bacterial properties encoded by these genes is time-consuming. Bioinformaticians at the Braunschweig Integrated Centre of Systems Biology (BRICS), a joint facility of the Helmholtz Centre for Infection Research (HZI) and the Technische Universität Braunschweig, have now developed software that can predict a total of 67 traits of a bacterium from its genomic data. These traits include preferred nutrient sources, antibiotic resistance and motility. The scientists have made their software, called "Traitar", freely available online and described it in the international journal mSystems.
The physiological properties of an organism are encoded in its genetic information, i.e. in its genome. The genome contains the blueprints for a multitude of proteins, which underlie all functions in the organism's cells and therefore define its characteristic features. In the study of pathogenic bacteria especially, it is important to identify these features in order to investigate the bacteria in more detail and to develop suitable treatments for infections. For this purpose, scientists need to know which nutrient sources a pathogen uses, whether the bacterium depends on oxygen and whether it is resistant to antibiotics. Gathering this information often requires labour-intensive experiments, especially for bacterial strains that have not yet been characterised. Nowadays, the first step in this process, sequencing the bacterial genome, is very quick. Analysing the resulting genomic data, however, remains challenging. The new "Traitar" software developed by the bioinformaticians from Braunschweig considerably helps scientists characterise bacterial strains: based on genomic data, the programme derives a whole range of relevant properties in just a few minutes.
"The current version of Traitar tests the genomic data of a bacterium for 67 different phenotypes, i.e. properties," says Aaron Weimann, a PhD student in Prof Alice McHardy's "Bioinformatics of Infection Research" Department and a member of the team that developed the new software. "Bacteria usually possess between 3000 and 6000 genes that determine their traits. Based on the genetic data, Traitar uses machine learning algorithms to identify the protein families that are indicative of the predicted phenotype." As a result, the software also reports which proteins or protein families underlie each predicted trait. The name of the software derives from the word "trait", meaning a bacterial property. Traitar can be downloaded free of charge from github.com/hzi-bifo/traitar and can be used without special programming knowledge. Smaller data sets can be analysed directly with an online tool at http://research.bifo.helmholtz-hzi.de/webapps/wa-webservice/pipe.php?pr=traitar.
To develop Traitar, Weimann and his colleagues first searched the literature and databases for information about the phenotypes of bacteria whose genomes had already been sequenced. They then systematically compared the genome data with the curated phenotype data, applying machine learning methods to search for patterns and combinations of protein families that allow phenotypes to be predicted accurately. "To build the software, we used training data for which both the genome and the resulting phenotypes were known," Aaron Weimann explains. "Later on, we used additional datasets to validate Traitar and to improve the accuracy of the phenotype models based on which earlier predictions were correct and which were not." Training and validation drew on the bacterial encyclopedia Bergey's Manual of Systematic Bacteriology and on the GIDEON (Global Infectious Diseases and Epidemiology Online Network) database, which summarises a wealth of clinically relevant bacterial traits. The project was supported by the German Center for Infection Research (Deutsches Zentrum für Infektionsforschung, DZIF).
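To make the idea of "protein families that are indicative of a phenotype" more concrete, here is a minimal toy sketch in Python. It assumes each genome has already been reduced to the set of protein families it contains. The family names, genomes and labels are hypothetical, and a simple perceptron is swapped in as the learning method; it is not Traitar's own model, which is described in the mSystems paper.

```python
# Toy sketch of phenotype prediction from protein-family profiles.
# ASSUMPTIONS: the family names, genomes and labels below are hypothetical,
# and a plain perceptron stands in for Traitar's actual classifiers.
from typing import List, Set, Tuple

# Hypothetical protein families used as features.
FAMILIES = ["famA", "famB", "famC", "famD"]

# Hypothetical training genomes: families detected in each genome, plus a
# label that is +1 if the trait (e.g. motility) was observed, otherwise -1.
TRAINING: List[Tuple[Set[str], int]] = [
    ({"famA", "famB"}, 1),
    ({"famA", "famC"}, 1),
    ({"famB", "famC"}, -1),
    ({"famC", "famD"}, -1),
    ({"famA", "famD"}, 1),
    ({"famB"}, -1),
]


def to_vector(families: Set[str]) -> List[int]:
    """Encode a genome as a 0/1 presence/absence vector over FAMILIES."""
    return [1 if f in families else 0 for f in FAMILIES]


def train_perceptron(
    data: List[Tuple[List[int], int]], epochs: int = 20
) -> Tuple[List[float], float]:
    """Fit a linear classifier; weights grow for families linked to the trait."""
    weights = [0.0] * len(FAMILIES)
    bias = 0.0
    for _ in range(epochs):
        for x, y in data:
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # wrong or undecided: nudge towards the label
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
    return weights, bias


def predict(weights: List[float], bias: float, x: List[int]) -> int:
    """Return +1 (trait predicted) or -1 (trait not predicted)."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1


encoded = [(to_vector(g), y) for g, y in TRAINING]
weights, bias = train_perceptron(encoded)

# The family with the largest positive weight is the most indicative one.
top_family = max(zip(FAMILIES, weights), key=lambda pair: pair[1])[0]
print("most indicative family:", top_family)
print("new genome:", predict(weights, bias, to_vector({"famA", "famC", "famD"})))
```

With this made-up data, famA occurs in every genome that shows the trait and in none that lacks it, so it ends up with the largest weight, and the new genome containing it is predicted to have the trait. Reading off the learned weights is what lets such a model point to the protein families behind a prediction. Note that a perceptron only settles on a final answer when, as here, the two groups of genomes can be separated by a linear rule.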
"The prediction performance is not equally high for all 67 phenotypes, because the quality of the underlying data varies across phenotypes," says Dr Andreas Bremges, a postdoctoral scientist in Alice McHardy's team. "We continuously update the software as new data become available, and it can be extended to include additional phenotypes at any time. However, the primary objective of Traitar is to give scientists initial evidence of interesting traits that the bacteria under investigation may have." The software thus enables a rapid, targeted start to characterising a strain and can reduce the cost of the laboratory experiments required.
The software is already in use, for example in a recently started cooperation project with Prof Susanne Häußler from the HZI. Häußler's "Molecular Bacteriology" Department has characterised a large number of strains of the pathogen Pseudomonas aeruginosa, which can cause severe acute infections and is highly resistant to many antibiotics. The detailed phenotypic and genetic data that Häußler's team collected for this pathogen will now be analysed with Traitar. "In the future, we will need automated systems that accurately predict the antibiotic resistance of Pseudomonas aeruginosa so that we can select a suitable therapy for the patient," says Häußler. "In addition, we hope that Traitar will help us uncover previously unknown resistance mechanisms."
Original publication:
Aaron Weimann, Kyra Mooren, Jeremy Frank, Phillip B. Pope, Andreas Bremges, Alice C. McHardy: From Genomes to Phenotypes: Traitar, the Microbial Trait Analyzer. mSystems, 2016, DOI: 10.1128/mSystems.00101-16