Publications & Software

Publications
Click here for a complete list of my publications on Google Scholar.

Books
The 2nd edition of my book “Weight-of-evidence for Forensic DNA Profiles” (Wiley, 1st ed. 2005) will appear in mid-2015 (Chris Steele is now a co-author). Corrections and comments on the 1st edition, and a glossary of acronyms, can be found here.

Handbook of Statistical Systems Biology, Mark Girolami, DJB, Michael Stumpf (eds), Wiley (2011)

Handbook of Statistical Genetics (3rd ed), co-edited by myself, Martin Bishop and Chris Cannings (Wiley, 2007)

Software

  • Doug Speed’s LDAK software for calculating kinship coefficients from genome-wide SNP data, adjusted for linkage disequilibrium, is available here.
  • Will Astle’s FastMixedModel software for fast inference in linear mixed effects models is distributed in the MixABEL module of GenABEL, which is easy to install from R via CRAN. Its capabilities are described in our review paper: Astle W, Balding DJ (2009) Population Structure and Cryptic Relatedness in Genetic Association Studies. Statistical Science 24(4): 451-471, doi:10.1214/09-STS307
  • HYPERLASSO software for simultaneous analysis of all SNPs and covariates in a GWAS can be found at the Bargen project webpage at the EBI. See Hoggart et al., PLoS Genet., 2008, and also Cordell and Ayers, Genet. Epi. 34:879-91 (2010); these authors performed a simulation study comparing several penalized regression methods for SNP association and found that HYPERLASSO performed best overall.
  • FREGENE C++ software for simulating sequence-like data in large genomic regions in entire populations (say, 1 Mb in 100K individuals, or 20 Mb in 5K individuals), under a flexible range of scenarios for recombination (including gene conversion), demography (population growth and structure) and selection (directional and balancing), is available from the Bargen project webpage at the EBI. See Hoggart et al., Genetics, 2007 and Chadeau-Hyam, BMC Bioinformatics (full references above).
  • HAPCLUSTER software for confirming and fine-mapping genetic associations in candidate regions, using haplotype clustering, is available from Thomas Mailund’s homepage. The original R code described in Waldron, Whittaker and Balding (2006) analysed only phased haplotype data, and implemented a general risk model (a separate risk for each genotype at the postulated causal locus). Thomas has recoded it in C++ and extended it to handle unphased genotype data, to output the Bayes factor assessing the evidence against the null hypothesis of no association, and to implement an allelic disease model. The allelic model should give more power for near-multiplicative disease risks, but may be affected by deviations from Hardy-Weinberg equilibrium. The latest version (2.1.5) also allows unphenotyped individuals, to facilitate imputation, e.g. from HapMap individuals.
  • BAYESFST software for Bayesian hierarchical inference for Fst, described in Theoretical Population Biology 63(3): 221-230, 2003 and Molecular Ecology 13: 969-980, 2004, is available here.
  • MAC5 software for phylogenetic inference, described in McGuire, Denham & Balding (2001a,b), may be downloaded here.
  • C programs described in Ayres & Balding, Heredity, 1998, and Ayres & Balding, Genetics, 2001, are available here.
  • BATWING software, a development of that described in Wilson & Balding, Genetics, 1998, may be downloaded from Ian Wilson’s home page.