TY - JOUR
T1 - To clean or not to clean
T2 - Cleaning open‐source data improves extinction risk assessments for threatened plant species
AU - Panter, Connor
AU - Clegg, Rosemary
AU - Moat, Justin
AU - Bachman, Steven
AU - Klitgård, Bente
AU - White, Rachel
PY - 2020/11/17
Y1 - 2020/11/17
N2 - Plants are under‐represented in conservation efforts, with only 9% of described species published on the IUCN Red List. Biodiversity aggregators including the Global Biodiversity Information Facility (GBIF) and the more recent Botanical Information and Ecology Network (BIEN) contain a wealth of potentially useful occurrence data. We investigate the influence of these data in accelerating plant extinction risk assessments for 225 endemic, near‐endemic, and socioeconomic Bolivian plant species. Geo‐referenced herbarium voucher specimens verified by taxonomic experts comprised our control data set. Open‐source data for 77 species were subjected to a two‐stage cleaning protocol (using an automated R package followed by a manual clean) and threat categories were computed based on extent of occurrence thresholds. Accuracy was highest using cleaned GBIF data (76%) and uncleaned BIEN data (79%). Sensitivity was highest for cleaned GBIF (73%) and BIEN (80%) data, suggesting our cleaning protocol was essential to maximize sensitivity rates. Comparisons between the control, GBIF and BIEN data sets revealed a paucity of occurrence data for 148 species (66%), 72% of which qualified for a threatened category. Balancing data quantity and accuracy must be considered when using open‐source data. Filling data gaps for threatened species is a conservation priority to improve the coverage of threatened species within biodiversity aggregators.
AB - Plants are under‐represented in conservation efforts, with only 9% of described species published on the IUCN Red List. Biodiversity aggregators including the Global Biodiversity Information Facility (GBIF) and the more recent Botanical Information and Ecology Network (BIEN) contain a wealth of potentially useful occurrence data. We investigate the influence of these data in accelerating plant extinction risk assessments for 225 endemic, near‐endemic, and socioeconomic Bolivian plant species. Geo‐referenced herbarium voucher specimens verified by taxonomic experts comprised our control data set. Open‐source data for 77 species were subjected to a two‐stage cleaning protocol (using an automated R package followed by a manual clean) and threat categories were computed based on extent of occurrence thresholds. Accuracy was highest using cleaned GBIF data (76%) and uncleaned BIEN data (79%). Sensitivity was highest for cleaned GBIF (73%) and BIEN (80%) data, suggesting our cleaning protocol was essential to maximize sensitivity rates. Comparisons between the control, GBIF and BIEN data sets revealed a paucity of occurrence data for 148 species (66%), 72% of which qualified for a threatened category. Balancing data quantity and accuracy must be considered when using open‐source data. Filling data gaps for threatened species is a conservation priority to improve the coverage of threatened species within biodiversity aggregators.
U2 - 10.1111/csp2.311
DO - 10.1111/csp2.311
M3 - Article
SN - 2578-4854
VL - 2
JO - Conservation Science and Practice
JF - Conservation Science and Practice
IS - 12
M1 - e311
ER -