Summary. Conservation Evidence (CE) has screened 1.6m+ scientific papers on conservation, and manually summarised 8600+ studies relating to conservation actions. However, progress is limited by the specialised skills needed to screen and summarise relevant studies -- it took more than 75 person-years to manually curate the current database, and only a few hundred papers can be added each year! We are working on AI-driven techniques to accelerate the addition of robust evidence to the CE database via automated literature scanning, LLM-based co-pilots and scanning of grey literature. We aim to provide co-pilots that augment human decision making so that interventions can be categorised much more quickly and accurately, ultimately accelerating the positive impact of conservation actions.
Aims. The goal of the Conservation Evidence project is to transform conservation so that evidence is routinely embedded in decisions to improve outcomes for biodiversity and society. CE is becoming the authoritative, most comprehensive, freely available platform for evidence-led conservation and is starting to profoundly change the way in which conservationists access and use evidence for improving the state of the planet.
The CE collation and synthesis work has significantly improved the availability of evidence for use in conservation practice. It remains the only resource of evidence synopses for biodiversity conservation, and the largest database of effectiveness reviews of actions outside the field of medicine. Carrying out reviews on an industrial scale means that CE can produce them for a fraction (~2%) of the cost in comparable fields, such as medicine. Using subject-wide evidence synthesis, CE systematically searches the literature and summarises results from (and provides citations for) each study testing the effectiveness of an action. As of April 2024, CE has read 1.6 million paper titles in 17 languages (326 non-English journals) and reviewed evidence for >3600 conservation actions, all freely available on their website, with collaboration from over 380 international academics and practitioners.
We got involved from computer science in 2023 as part of the AI@CAM competition to harness the momentum behind machine learning to accelerate conservation actions. Our overall aim is to help CE dramatically accelerate their data searching and data extraction pipelines. Currently, the searching of literature and summarising of key data is undertaken by human experts. Although this method of working is time-consuming, it does benefit from being thorough and replicable. The main difficulties lie in the subtleties of deciphering study designs and methodologies, and in judging whether controls are actually appropriate for testing the effectiveness of the specified action. Any LLM-based automation that we deploy must account for these subtleties as part of the validation pipeline.
The collaboration originally began in 2022 as part of the Computer Science 1B group projects, when Bill Sutherland, Sam Reynolds and Alec Christie from Zoology proposed a group project related to CE. A team of undergraduate students (including Jamie Cao) trained an ML model to facilitate searching for papers and indexing relevant articles by species and habitat. After the group project completed with encouraging results, Sadiq Jaffer and I joined the collaboration and -- with help from the Cambridge Office for Scholarly Communication -- built up a comprehensive (and legal!) corpus of millions of academic papers related to conservation evidence.
In the summer of 2024 and beyond, we are performing the first cut of training the classifiers across this corpus, and building out an LLM-based co-pilot that can assist human experts. We are joined in the summer of 2024 by three CST undergraduates, Radhika Iyer, Shrey Biswas and Kacper Michalik, who are working on various elements of the system.
Can Large Language Models facilitate evidence-based decision support for conservation?