Predicting species distributions from herbarium collections: does climate bias in collection sampling influence model outcomes?
Journal of Biogeography
Aim: Species distribution models and geographical information system (GIS) technologies are becoming increasingly important tools in conservation planning and decision-making. Often the rich databases of museums and herbaria serve as the primary data for predicting species distributions. Yet key assumptions about the primary data are often untested, and violation of such assumptions may have consequences for model predictions. For example, users of primary data assume that sampling has been random with respect to geography and environmental gradients. Here we evaluate the assumption that plant voucher specimens adequately sample the climatic gradient and test whether violation of this assumption influences model predictions.
Location: Bolivia and Ecuador.
Methods: Using 323,711 georeferenced herbarium collections and nine climatic variables, we predicted the distribution of 76 plant species using maximum entropy models (MAXENT) with training points that sampled the climate environments randomly and training points that reflected the climate bias in the herbarium collections. To estimate the distribution of species, MAXENT finds the distribution of maximum entropy (i.e. closest to uniform) subject to the constraint that the expected value for each environmental variable under the estimated distribution matches its empirical average. The experimental design included species that differed in geographical range and elevation; all species were modelled with 20 and 100 training points. We examined the influence of the number of training points, climate bias in training points, elevation and range size on model performance using analysis of variance models.
Results: We found that significant parts of the climatic gradient were poorly represented in herbarium collections for both countries. For the most part, existing climatic bias in collections did not greatly affect distribution predictions when compared with an unbiased data set. Although the effects of climate bias on prediction accuracy were found to be greater where geographical ranges were characterized by high spatial variation in the degree of climate bias (i.e. ranges where the bias of the various climates sampled by collections deviated considerably from the mean bias), the greatest influence on model performance was the number of presence points used to train the model.
Main conclusions: These results demonstrate that predictions of species distributions can be quite good despite existing climatic biases in primary data found in natural history collections, if a sufficiently large number of training points is available. Because the models consistently overpredicted, these results also confirm the importance of validating models with independent data or expert opinion. Failure to include independent model validation, especially in cases where training points are limited, may lead to grave errors in conservation decision-making and planning.
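The maximum entropy principle described in the Methods can be illustrated with a minimal sketch. It is not the MAXENT software itself, which adds regularization, derived feature classes and its own optimizer. It assumes a hypothetical grid of cells with two synthetic climate variables, and fits a distribution of the form p(x) ∝ exp(λ·f(x)) by plain gradient ascent until the expected value of each variable under p matches the average over the presence cells. The function name `fit_maxent`, the step size, the iteration count and all data are illustrative assumptions.

```python
import numpy as np

def fit_maxent(features, presence_idx, lr=0.1, n_iter=5000):
    """Fit an unregularized maximum entropy distribution over grid cells.

    features     : (n_cells, n_vars) array of environmental values per cell
    presence_idx : indices of the cells that hold presence records
    Returns the cell probabilities p and the fitted weights lam.
    """
    # Standardize each variable so one step size suits all of them.
    z = (features - features.mean(axis=0)) / features.std(axis=0)
    # Empirical average of each variable over the presence cells.
    target = z[presence_idx].mean(axis=0)
    lam = np.zeros(z.shape[1])  # lam = 0 gives the uniform distribution
    for _ in range(n_iter):
        logits = z @ lam
        logits -= logits.max()          # guard against overflow in exp
        p = np.exp(logits)
        p /= p.sum()
        # Gradient of the log-likelihood: empirical mean minus model mean.
        lam += lr * (target - p @ z)
    return p, lam

# Synthetic example (assumed data, not from the study): 500 cells,
# 20 presence records drawn only from the warmer half of the gradient.
rng = np.random.default_rng(0)
temperature = rng.uniform(0.0, 30.0, size=500)        # degrees C
precipitation = rng.uniform(200.0, 3000.0, size=500)  # mm per year
features = np.column_stack([temperature, precipitation])
warm_cells = np.flatnonzero(temperature > 15.0)
presence_idx = rng.choice(warm_cells, size=20, replace=False)

p, lam = fit_maxent(features, presence_idx)
model_means = p @ features                          # expectation under p
empirical_means = features[presence_idx].mean(axis=0)
```

After fitting, `model_means` and `empirical_means` should agree closely, which is the constraint stated in the Methods. Among all distributions satisfying that constraint, the exponential form is the one closest to uniform.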