Conservation Evidence Copilots

The Conservation Evidence team at the University of Cambridge has spent years screening more than 1.6 million scientific papers on conservation, and has manually summarised over 8,600 studies relating to conservation actions. Progress is limited by the specialised skills needed to screen and summarise relevant studies: curating the current database took more than 75 person-years of manual effort, and only a few hundred papers can be added each year! We are working on AI-driven techniques to accelerate the addition of robust evidence to the CE database via automated literature scanning, LLM-based copilots and the scanning of grey literature. We aim to provide copilots that augment human decision-making, helping experts categorise interventions much more quickly and accurately, and ultimately accelerating the positive impact of conservation actions.

The goal of the Conservation Evidence project is to transform conservation so that evidence is routinely embedded in decisions to improve outcomes for biodiversity and society. CE is becoming the authoritative, most comprehensive, freely available platform for evidence-led conservation and is starting to profoundly change the way in which conservationists access and use evidence for improving the state of the planet.

Team AICN in the CCI building, Feb 2024

The CE collation and synthesis work has significantly improved the availability of evidence for use in conservation practice. It remains the only resource of evidence synopses for biodiversity conservation, and the largest database of effectiveness reviews of actions outside the field of medicine. Carrying out reviews at an industrial scale means they cost a fraction of what reviews do in comparable fields, such as medicine. Using subject-wide evidence synthesis, CE systematically searches the literature and summarises the results of (and provides citations for) each study testing the effectiveness of an action. As of April 2024, CE has read 1.6 million paper titles in 17 languages (326 non-English journals) and reviewed evidence for more than 3,600 conservation actions, freely available on their website, with collaboration from over 380 international academics and practitioners.

1 Accelerating literature surveys with LLMs

We in the Computer Science department got involved in 2023 as part of the AI@CAM competition to harness the momentum behind machine learning to accelerate conservation actions. Our overall aim is to help CE dramatically accelerate their data searching and data extraction pipelines. Currently, the searching of literature and summarising of key data is undertaken by human experts. Although this method of working is time-consuming, it benefits from being thorough and replicable. The main difficulties lie in deciphering the subtleties of study designs and methodologies, and in judging whether controls are actually appropriate for testing the effectiveness of the specified action. Any LLM-based automation that we deploy must account for these as part of the validation pipeline.

Our evaluation of LLM performance against human experts on conservation intervention questions showed that properly configured LLMs with retrieval augmentation can achieve competitive performance on evidence synthesis tasks. However, out-of-the-box general LLMs performed poorly and risk misinforming decision-makers, reinforcing our commitment to careful validation and human-in-the-loop approaches.
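To make "retrieval augmentation" concrete, here is a minimal sketch of the idea: retrieve the evidence summaries most similar to a question, then ground the model's prompt in them. Everything here is illustrative, not the evaluation setup from the paper; the toy corpus, the bag-of-words similarity, and names like `retrieve` and `build_prompt` are invented for this example (a real system would use a neural encoder and a full evidence database).

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a neural encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank evidence summaries by similarity to the question.
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, corpus: list[str]) -> str:
    # Ground the model in retrieved evidence before asking the question.
    passages = retrieve(question, corpus)
    context = "\n".join(f"- {p}" for p in passages)
    return (f"Evidence:\n{context}\n\n"
            f"Question: {question}\nAnswer using only the evidence above.")

corpus = [
    "Installing nest boxes increased occupancy by cavity-nesting birds.",
    "Fencing reduced grazing pressure but had mixed effects on plant diversity.",
    "Artificial reefs attracted fish but effects on populations were unclear.",
]
print(build_prompt("Do nest boxes help cavity-nesting birds?", corpus))
```

The key property, echoed in the evaluation results above, is that the model answers from curated evidence rather than from its pretraining alone.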

The collaboration originally began in 2022 as part of the Computer Science 1B group projects, when Bill Sutherland, Sam Reynolds and Alec Christie from Zoology proposed a group project related to CE. A team of undergraduate students (including Jamie Cao) trained an ML model to facilitate searching for papers and indexing relevant articles by species and habitat. After the group project completed with encouraging results, Sadiq Jaffer and I joined the collaboration and, with help from the Cambridge Office for Scholarly Communication, built up a comprehensive (and legal!) corpus of millions of academic papers related to conservation evidence.

Through 2024, we evaluated ten different LLMs against human experts using the CE database, leading to our LLM evaluation paper showing promising but cautious results. We were joined in the summer of 2024 by three CST undergraduates, Radhika Iyer, Shrey Biswas and Kacper Michalik, who built out various elements of the system. Moving into 2025, our focus shifts to production deployment of hybrid retrieval systems while maintaining rigorous validation against the expanding challenges of AI contamination in scientific literature.

2 Living Evidence Databases

In October 2025, we published AI-assisted Living Evidence Databases for Conservation Science describing a complete, end-to-end pipeline for maintaining living evidence databases. Traditional systematic reviews become outdated quickly, but living evidence databases offer a dynamic alternative by continuously processing new evidence as it emerges. Our pipeline, designed to operate on local infrastructure using self-hosted models, ingests and normalizes documents from academic publishers, screens them for relevance using a multi-stage process, and extracts structured data according to predefined schemas.
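The staged structure described above can be sketched as a chain of progressively stricter (and costlier) filters, with schema-guided extraction only running on the survivors. This is a minimal illustration under invented assumptions, not the pipeline from the paper: the stage names, keyword list, and the stubbed relevance and extraction functions all stand in for real components (the relevance stage would call a self-hosted LLM, and extraction would follow the predefined schemas).

```python
from dataclasses import dataclass, field

@dataclass
class Paper:
    title: str
    abstract: str
    meta: dict = field(default_factory=dict)

def stage_keyword(p: Paper) -> bool:
    # Cheap first pass: keep anything mentioning a conservation action term.
    terms = ("restoration", "nest box", "fencing", "reintroduction")
    text = (p.title + " " + p.abstract).lower()
    return any(t in text for t in terms)

def stage_relevance(p: Paper) -> bool:
    # Stand-in for an LLM relevance call; here a stub that checks whether
    # the abstract reports a tested intervention.
    text = p.abstract.lower()
    return "effect" in text or "compared" in text

def extract(p: Paper) -> dict:
    # Stand-in for schema-guided extraction: emit fields from a predefined
    # schema (here just title and action, stubbed from metadata).
    return {"title": p.title, "action": p.meta.get("action", "unknown")}

def pipeline(papers: list[Paper]) -> list[dict]:
    # Cheap filters run first over everything; the expensive extraction
    # step only sees the small set of papers that pass every stage.
    kept = [p for p in papers if stage_keyword(p)]
    kept = [p for p in kept if stage_relevance(p)]
    return [extract(p) for p in kept]
```

Ordering stages by cost is what makes it feasible to run continuously over incoming literature on local infrastructure.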

The system features a hybrid retrieval model combining keyword search with semantic understanding, and integrates a human-AI collaborative process for refining inclusion criteria from complex protocols. We also incorporated an established, statistically-principled stopping rule to ensure efficiency. In baseline evaluation against a prior large-scale manual review, the fully automated pipeline achieved 97% recall and identified significant numbers of relevant studies not included in the original review, demonstrating its viability as a foundational tool for maintaining living evidence databases. This work was earlier presented at a workshop on AI for evidence synthesis in March 2025 that brought together policymakers, researchers, and practitioners to discuss the responsible integration of AI into evidence synthesis workflows.
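One common way to combine a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF); the paper describes a hybrid retrieval model, but the specific fusion method below is an assumption made for illustration, as are the document names and example rankings.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Reciprocal rank fusion: each ranking contributes 1/(k + rank) per
    # document, so items ranked highly by either retriever float to the top.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_rank = ["d2", "d1", "d3"]   # e.g. a BM25 keyword ordering
semantic_rank = ["d1", "d4", "d2"]  # e.g. a dense-embedding ordering
fused = rrf([keyword_rank, semantic_rank])
```

A virtue of rank-based fusion is that it needs no score normalisation across retrievers, which differ wildly in scale between lexical and embedding models.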

3 Challenges in the AI-Evidence Era

Our work on LLM-based evidence synthesis has gained urgency in 2025 as the body of scientific literature faces challenges from AI-contaminated papers. The publication of ever-larger numbers of problematic papers, including fake ones generated by AI, represents what our Nature paper "Will AI speed up literature reviews or derail them entirely?" argues is an existential crisis for the established way of doing evidence synthesis. Recent analysis suggests that up to 2% of submitted papers may be AI-generated, with some estimates potentially much higher. This contamination poses particular risks for evidence databases like CE, as fake papers with plausible-sounding conservation interventions could mislead decision-makers if incorporated without rigorous validation.

The paper argues that while AI-generated papers pose serious threats to literature reviews, a new approach using AI might also offer solutions. As AI generation becomes more sophisticated, our validation pipeline needs to evolve beyond simple detection to robust experimental design verification and cross-reference validation. The paradox is that AI is both the problem and potentially part of the solution; but only if we build systems with rigorous validation, traceability, and human oversight at their core. See also A FAIR Case for a Live Computational Commons for the importance of FAIR principles throughout science.

4 Technology for Conservation, Not Division

We also conducted a horizon scan in 2024, which sparked discussions about ensuring AI serves conservation equitably. In response to concerns raised by Katie Murray and colleagues about the potential for AI to divide conservation, we published a response in Trends in Ecology & Evolution. As outlined in both that response and our reflections on technology uniting conservation, we are committed to developing CE tools that:

  • Respect and amplify human expertise through "human-in-the-loop" methods, rather than replacing it
  • Follow participatory design principles with conservation practitioners
  • Maintain open source and open data approaches with thorough documentation to facilitate reproducible outputs
  • Address capacity building needs, particularly in the Global South with respect to AI capability
  • Keep conservation goals, not short-term technology trends, at the centre of our research

Our AI@CAM interview highlights this approach: we are building detailed models of the world that can be queried by policy makers to help make informed decisions. The technology serves the evidence, and the evidence serves conservation practices.
