The "Genome BLAST Distance Phylogeny" (GBDP) is a whole-genome phylogeny method introduced in Henz et al. (2005). It consists of the following steps:
- All-against-all homology search using well-known local alignment search tools such as NCBI-BLAST, WU-BLAST or BLASTZ.
- Calculation of dissimilarity matrices from the resulting matches using a set of dissimilarity functions.
- Application of tree reconstruction algorithms such as Neighbor-Joining, BioNJ or FastME.
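To make the second step more concrete, the following is a minimal sketch of how a dissimilarity value could be derived from local alignment matches between two genomes. It is a simplified, coverage-based illustration and not necessarily one of the functions implemented in GBDP; the function names, the tuple layout of a match and the example numbers are assumptions made for this example.

```python
def covered_length(intervals):
    """Number of positions covered by a set of inclusive (start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # A new block begins; account for the block that was just closed.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or directly adjacent interval: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def coverage_distance(matches, len_a, len_b):
    """Illustrative dissimilarity: 1 minus the covered fraction of both genomes.

    matches: list of (a_start, a_end, b_start, b_end) tuples (hypothetical layout).
    Matches on the reverse strand may have start > end, so coordinates are
    normalised before merging.
    """
    on_a = [(min(m[0], m[1]), max(m[0], m[1])) for m in matches]
    on_b = [(min(m[2], m[3]), max(m[2], m[3])) for m in matches]
    return 1.0 - (covered_length(on_a) + covered_length(on_b)) / (len_a + len_b)


if __name__ == "__main__":
    # Two overlapping matches between two hypothetical genomes of length 200.
    example = [(1, 50, 11, 60), (41, 100, 61, 120)]
    print(coverage_distance(example, 200, 200))
```

Computing such a value for every pair of genomes yields the dissimilarity matrix that the tree reconstruction step takes as input.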
A web service is also available to calculate in silico DDH (DNA-DNA hybridization) values based on GBDP.
Legal stuff
This software is currently under development and only available as a beta version. Use at your own risk!
There is no support for this version. If you prefer a user-friendly application, please use the GGDC web server.
If you want to use this application, please cite the following paper:
Auch AF, Henz SR, Holland BR, and Göker M (2006). Genome BLAST distance phylogenies inferred from whole plastid and whole mitochondrion genome sequences. BMC Bioinformatics, 7:350.
GBDP is currently implemented in Java (version 1.6), but it has only been tested under Linux. A JAR file containing the GBDP classes can be downloaded. The snakeyaml 1.8 library is also required.
GBDP is organized in two separate parts: a tool that handles the all-against-all BLAST runs (GBDPblaster), and another that conducts the distance transformation (Matches2Distances).
- Running GBDPblaster
A config file named gbdpblaster.ini must be present in the directory from which GBDPblaster is run. All settings, including input and output files, can be changed in this config file.
The application can then be launched via:
java -Xmx512M -cp bg.jar:snakeyaml-1.8.jar auch.disttrans.GBDPblaster
- Running Matches2Distances
See the description of the command line options.
To apply Greedy with trimming to the previously generated data, use the following command:
java -Xmx512M -cp bg.jar:snakeyaml-1.8.jar auch.disttrans.Matches2Distances -greedy2 -multiChromosomesOff lengths.zip cgviz.zip -outn outfile-prefix
The switch "-multiChromosomesOff" should be used whenever different filenames indicate different taxa. By default, all files in the same subdirectory of the genome search path are assumed to be different chromosomes of the same organism.
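The effect of this switch can be illustrated with a short sketch that groups sequence files into taxa in both ways. This is not GBDP's actual code; the function name, the parameter name and the file paths are hypothetical and only mirror the behaviour described above.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_into_taxa(paths, multi_chromosomes=True):
    """Group sequence files into taxa (illustration only, not GBDP code).

    multi_chromosomes=True  mimics the default: files sharing a subdirectory
                            are treated as chromosomes of one organism.
    multi_chromosomes=False mimics "-multiChromosomesOff": every file is
                            treated as its own taxon.
    """
    taxa = defaultdict(list)
    for p in paths:
        path = PurePosixPath(p)
        # Use the parent directory name as the taxon key by default,
        # otherwise the full file path so that each file stays separate.
        key = path.parent.name if multi_chromosomes else str(path)
        taxa[key].append(path.name)
    return dict(taxa)


if __name__ == "__main__":
    # Hypothetical layout of a genome search path.
    files = [
        "genomes/ecoli/chr1.fas",
        "genomes/ecoli/plasmid.fas",
        "genomes/bsub/chr1.fas",
    ]
    print(group_into_taxa(files))         # two taxa: ecoli, bsub
    print(group_into_taxa(files, False))  # three taxa, one per file
```

If all genome files sit in a single flat directory, the default grouping would merge them into one organism, which is the situation where "-multiChromosomesOff" is needed.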