
Jan P Meier-Kolthoff, Alexander F Auch, Hans-Peter Klenk, and Markus Göker (2013)

Highly parallelized inference of large genome-based phylogenies

Concurrency and Computation: Practice and Experience.

Genome Blast Distance Phylogeny (GBDP) infers distances and phylogenetic relationships between organisms from completely or partially sequenced genomes. It is well suited for parallelization because pairwise distances are calculated independently. As exemplar data for a high-performance cluster implementation that executes many pairwise genome comparisons in parallel, we used sequences from the Genomic Encyclopedia of Bacteria and Archaea project. Phylogenies were inferred from genome-scale nucleotide and amino acid data with all variants of GBDP, including novel adaptations to amino acid sequences and approaches yielding trees with branch support. We examined in detail how phylogenetic accuracy, average branch support, and performance indicators such as running time and disk space consumption depend on the details of genome comparison, distance calculation, and phylogenetic inference. If combined with conservative measures for branch support, GBDP appears to infer reasonable phylogenetic relationships of microorganisms at a comparatively low computational cost. Owing to the linear speed-up of the cluster, benchmarks show an overall computation time of less than 24 h for the 7750 pairwise genome/proteome comparisons of the Genomic Encyclopedia of Bacteria and Archaea data set, compared with an estimated running time of about 30 days for the non-parallelized version.
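
The independence of pairwise distances noted in the abstract can be illustrated with a minimal Python sketch. It is not the GBDP implementation: `toy_distance` is a hypothetical stand-in (a Jaccard distance on k-mer sets) for the BLAST-based genome comparison, the three sequences are made-up toy data, and a thread pool stands in for the cluster scheduler. As a purely arithmetic observation, 7750 equals the number of unordered pairs among 125 items; the abstract itself does not state the number of genomes.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import comb


def kmer_set(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_distance(seq_a, seq_b, k=3):
    """Hypothetical stand-in for a genome comparison: Jaccard distance on k-mers."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(genomes, workers=4):
    """Compute every unordered pair independently and collect the results."""
    pairs = list(combinations(sorted(genomes), 2))

    def compare(pair):
        # Each comparison reads only its own two inputs, so no task
        # depends on the result of any other task.
        return toy_distance(genomes[pair[0]], genomes[pair[1]])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(pairs, pool.map(compare, pairs)))


# Made-up toy sequences, not real genome data.
toy_genomes = {"g1": "ACGTACGTAC", "g2": "ACGTACGTAC", "g3": "TTTTGGGGCC"}
distances = pairwise_distances(toy_genomes)

# Number of unordered pairs among n items is n * (n - 1) / 2.
print(comb(125, 2))
```

Threads are used here only to keep the sketch short and self-contained; for CPU-bound comparisons in Python they give little real speed-up, and a cluster setting would more plausibly dispatch each pair as a separate process or batch job. Because no task waits on another, the work can in principle be spread across as many workers as there are pairs, which is the property that makes a near-linear speed-up plausible.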

