Misleading results of likelihood‐based phylogenetic analyses in the presence of missing data

Mark P. Simmons

DOI:10.1111/j.1096-0031.2011.00375.x
Corpus ID: 53123024

@article{Simmons2012MisleadingRO,
  title={Misleading results of likelihood‐based phylogenetic analyses in the presence of missing data},
  author={Mark P. Simmons},
  journal={Cladistics},
  year={2012},
  volume={28},
  url={https://api.semanticscholar.org/CorpusID:53123024}
}

Published in Cladistics 1 April 2012
Biology

This study uses contrived and simulated examples to demonstrate that likelihood, even when applied to simple matrices with little or no homoplasy, homogeneous evolution across groups of characters, perfect model fit, and hundreds or thousands of variable characters, can provide strong support for incorrect topologies when missing data are non‐randomly distributed across all partitions.

113 Citations

Limitations of locally sampled characters in phylogenetic analyses of sparse supermatrices.

Mark P. Simmons

Biology

Molecular phylogenetics and evolution

2014

Radical instability and spurious branch support by likelihood when applied to matrices with non-random distributions of missing data.

Mark P. Simmons

Biology

Molecular phylogenetics and evolution

2012

Disparate parametric branch-support values from ambiguous characters.

Mark P. Simmons, C. P. Randle

Biology

Molecular phylogenetics and evolution

2014

A confounding effect of missing data on character conflict in maximum likelihood and Bayesian MCMC phylogenetic analyses.

Mark P. Simmons

Biology

Molecular phylogenetics and evolution

2014

The Impact of Missing Data on Species Tree Estimation.

Zhenxiang Xi, Liang Liu, Charles C. Davis

Biology

Molecular biology and evolution

2016

It is demonstrated that concatenation (RAxML), gene-tree-based coalescent (ASTRAL, MP-EST, and STAR), and supertree (matrix representation with parsimony [MRP]) methods perform reliably, so long as missing data are randomly distributed and a sufficiently large number of genes is sampled.

Do missing data influence the accuracy of divergence-time estimation with BEAST?

Yuchi Zheng, J. Wiens

Biology

Molecular phylogenetics and evolution

2015

Differences between hard and soft phylogenetic data

Robert S. Sansom, M. Wills

Biology

Proceedings of the Royal Society B: Biological…

2017

When building the tree of life, variability of phylogenetic signal is often accounted for by partitioning gene sequences and testing for differences. The same considerations, however, are rarely…

Phylogenetic inference using discrete characters: performance of ordered and unordered parsimony and of three-item statements

Anaïs Grand, Adèle Corvez, Lina María Vélez, M. Laurin

Biology

2013

The results suggest that the hierarchical character representation not only results in the greatest resolving power, but also in the highest artefactual resolution, both with the simulated and empirical data.

Divergence and support among slightly suboptimal likelihood gene trees

Mark P. Simmons, John Kessenich

Biology, Computer Science

Cladistics : the international journal of the…

2020

Contemporary phylogenomic studies frequently incorporate two‐step coalescent analyses wherein the first step is to infer individual‐gene trees, generally using maximum‐likelihood implemented in the…

The effects of subsampling gene trees on coalescent methods applied to ancient divergences.

Mark P. Simmons, Daniel B. Sloan, J. Gatesy

Biology

Molecular phylogenetics and evolution

2016

73 References

Missing data in phylogenetic analysis: reconciling results from simulations and empirical data.

J. Wiens, Matthew Morrill

Biology

Systematic biology

2011

Previous simulation and empirical studies showing that taxa with extensive missing data can be accurately placed in phylogenetic analyses and that adding characters with missing data can be beneficial (at least under some conditions) are confirmed.

Missing data and the design of phylogenetic analyses

J. Wiens

Biology

J. Biomed. Informatics

2006

Does Adding Characters with Missing Data Increase or Decrease Phylogenetic Accuracy?

J. Ohn

Biology

2003

The results show that the addition of a set of characters with missing data is generally more likely to increase phylogenetic accuracy than decrease it, but the potential benefits of adding these characters quickly disappear as the proportion of missing data increases, and it is suggested that accuracy can be increased to a surprising degree.

PROBLEMS DUE TO MISSING DATA IN PHYLOGENETIC ANALYSES INCLUDING FOSSILS: A CRITICAL REVIEW

M. Kearney, James M. Clark

Biology

2003

Missing data simply represent the unknown and should not be viewed as an impediment to considering all available evidence in phylogenetic analyses, nor used as justification for excluding specific taxa or characters.

Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous

Bryan D. Kolaczkowski, J. W. Thornton

Biology

Nature

2004

It is shown that maximum likelihood and BMCMC can become strongly biased and statistically inconsistent when the rates at which sequence sites evolve change non-identically over time.

Missing data, incomplete taxa, and phylogenetic accuracy.

J. Wiens

Biology

Systematic biology

2003

In this study, simulations are used to show that the reduced accuracy associated with including incomplete taxa is caused by these taxa bearing too few complete characters rather than too many missing data cells, and suggest a more effective strategy for dealing with incomplete taxa.

Efficiently resolving the basal clades of a phylogenetic tree using Bayesian and parsimony approaches: a case study using mitogenomic data from 100 higher teleost fishes.

Mark P. Simmons, M. Miya

Biology

Molecular phylogenetics and evolution

2004

The Effect of Ambiguous Data on Phylogenetic Estimates Obtained by Maximum Likelihood and Bayesian Inference

Alan R. Lemmon, Jeremy M. Brown, K. Stanger-Hall, E. Lemmon

Biology

Systematic biology

2009

The results of this study have major implications for all analyses that rely on accurate estimates of topology or branch lengths, including divergence time estimation, ancestral state reconstruction, tree-dependent comparative methods, rate variation analysis, phylogenetic hypothesis testing, and phylogeographic analysis.

Effects of data incompleteness on the relative performance of parsimony and Bayesian approaches in a supermatrix phylogenetic reconstruction of Mustelidae and Procyonidae (Carnivora)

M. Wolsan, J. Sato

Biology

Cladistics : the international journal of the…

2010

Parsimony and Bayesian analyses on a mustelid–procyonid molecular supermatrix found no compelling evidence in support of a relationship between the inferior performance of parsimony and taxon incompleteness, and the relatively good performance of the analyses may be related to the large number of sampled characters.

Quantification of the success of phylogenetic inference in simulations

Mark P. Simmons, C. Webb

Biology

2006

This method represents an improvement relative to the commonly used approaches of quantifying the percentage of clades that are correctly resolved in the inferred trees or presenting the Robinson–Foulds distance between the inferred trees and the correct tree.
