Distance Methods for Phylogenetic Prediction

David W. Mount

doi:10.1101/pdb.top33

Adapted from “Phylogenetic Prediction,” Chapter 7, in Bioinformatics: Sequence and Genome Analysis, 2nd edition, by David W. Mount. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, USA, 2004.

INTRODUCTION

Phylogenetic analysis of a multiple sequence alignment (msa) can be performed using distance methods, which are based on the genetic distances between pairs of sequences in the msa. The genetic distance between two sequences is the fraction of aligned positions at which the two sequences differ. In contrast, sequence identity is the fraction of aligned positions that are identical. Gaps may be ignored in distance calculations or treated like substitutions. A scoring or substitution matrix may also be used, which makes the calculation slightly more complicated, although the principle is the same. Sequence pairs that have the smallest distances are “neighbors.” On a tree, these sequences share a node, or common ancestor position, and each is joined to that node by a branch.
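
The distance calculation described above can be illustrated with a minimal Python sketch. It is not taken from the original chapter: the function names (p_distance, closest_pair), the gap character "-", the gaps parameter, and the four toy sequences are all assumptions made for the example. The sketch counts mismatched columns, either skipping columns that contain a gap or counting a gap opposite a residue as a change, and then reports the pair with the smallest distance as "neighbors." It does not use a substitution matrix. When distance and identity are computed over the same set of positions, identity is simply 1 minus the distance.

```python
from itertools import combinations

GAP = "-"  # assumed gap character for this example


def p_distance(a, b, gaps="ignore"):
    """Fraction of compared aligned positions at which a and b differ.

    gaps="ignore"       : skip any column in which either sequence has a gap
    gaps="substitution" : count a gap opposite a residue as a change
    Columns in which both sequences have a gap are always skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    if gaps not in ("ignore", "substitution"):
        raise ValueError("gaps must be 'ignore' or 'substitution'")
    compared = 0
    changed = 0
    for x, y in zip(a, b):
        if x == GAP and y == GAP:
            continue
        if (x == GAP or y == GAP) and gaps == "ignore":
            continue
        compared += 1
        if x != y:
            changed += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return changed / compared


def closest_pair(msa, gaps="ignore"):
    """Return the names of the two sequences with the smallest distance."""
    return min(
        combinations(sorted(msa), 2),
        key=lambda pair: p_distance(msa[pair[0]], msa[pair[1]], gaps),
    )


# Toy alignment (hypothetical data, for illustration only).
msa = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "AC-TTCGA",
    "D": "TCGTTGGA",
}

d_ab = p_distance(msa["A"], msa["B"])  # 1 change in 8 positions
identity_ab = 1 - d_ab                 # identity over the same positions
neighbors = closest_pair(msa)          # pair with the smallest distance
print(d_ab, identity_ab, neighbors)
```

In this toy alignment, sequences A and B differ at one of eight positions, which is the smallest distance among all pairs, so they would be the first pair joined to a common node. Treating gaps as substitutions raises the distance between A and C from 2/7 to 3/8, which shows how the choice of gap handling changes the result.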