Publications & Software



The Handbook of Statistical Genomics (4th ed), co-edited with Ida Moltke (Copenhagen) and John Marioni (Cambridge), was published by Wiley in 2019 (online and in print).  The three editions of the Handbook of Statistical Genetics (1st ed 2001, 2nd ed 2003, 3rd ed 2007) were co-edited with Martin Bishop and Chris Cannings.  Sadly, Chris passed away in 2018.

Handbook of Statistical Systems Biology, Mark Girolami, DJB, Michael Stumpf (eds), Wiley (2011)

  • Doug Speed’s LDAK software for calculating kinship coefficients from genome-wide SNP data, adjusted for linkage disequilibrium, is available here.
  • Will Astle’s FastMixedModel software for fast inference in linear mixed effects models is distributed in the MixABEL module of GenABEL, which is easy to install from R via CRAN. Its capabilities are described in our review paper: Astle W, Balding DJ (2009) Population Structure and Cryptic Relatedness in Genetic Association Studies. Statistical Science 24(4), 451-471, doi:10.1214/09-STS307.
  • HYPERLASSO software for simultaneous analysis of all SNPs and covariates in a GWAS can be found at the Bargen project webpage at the EBI. See Hoggart et al., PLoS Genet., 2008, and also Cordell and Ayers, Genet. Epidemiol. 34:879-91 (2010), who performed a simulation study comparing several penalized regression methods for SNP association and found that the HYPERLASSO performed best overall.
  • FREGENE C++ software for simulating sequence-like data in large genomic regions in entire populations (say, 1 Mb in 100K individuals, or 20 Mb in 5K individuals), under a flexible range of scenarios for recombination (including gene conversion), demography (population growth and structure) and selection (directional and balancing), is available from the Bargen project webpage at the EBI.  See Hoggart et al., Genetics, 2007, and Chadeau-Hyam et al., BMC Bioinformatics (full references above).
  • HAPCLUSTER software for confirming and fine-mapping genetic associations in candidate regions, using haplotype clustering, is available from Thomas Mailund’s homepage.  The original R code described in Waldron, Whittaker and Balding (2006) analysed only phased haplotype data and implemented a general risk model (a separate risk for each genotype at the postulated causal locus).  Thomas has recoded it in C++ and extended it to handle unphased genotype data, to output the Bayes factor assessing the evidence against the null hypothesis of no association, and to implement an allelic disease model.  The allelic model should give more power for near-multiplicative disease risks, but may be affected by deviations from Hardy-Weinberg equilibrium.  The latest version (2.1.5) also allows unphenotyped individuals, to facilitate imputation, for example from HapMap individuals.
  • BAYESFST software for Bayesian hierarchical inference for Fst, described in Theoretical Population Biology 63(3): 221-230, 2003 and Molecular Ecology 13: 969-980, 2004, is available here.
  • MAC5 software for phylogenetic inference, described in McGuire, Denham & Balding (2001a,b), may be downloaded from here.
  • C programs described in Ayres & Balding, Heredity, 1998, and Ayres & Balding, Genetics, 2001, are available here.
  • BATWING software, a development of that described in Wilson & Balding, Genetics, 1998, may be downloaded from Ian Wilson’s home page.