Accurate summarisation of threats for conservation evidence literature
This is an idea proposed in 2024 as a Cambridge Computer Science Part III or MPhil project, and is currently being worked on by Kittson Hamill. It is supervised by Anil Madhavapeddy and Sadiq Jaffer.
At the Conservation Evidence Copilots project, we are interested in constructing a taxonomy of threats to wildlife from the literature. This involves scanning the body of conservation literature and gathering and synthesising evidence for conservation interventions from a threats perspective. Once the text has been retrieved, it needs to be summarised in a way that is accurate, concise and relevant, and the summary then needs to be verified by human experts. This is particularly important for conservation evidence, where the key findings need to be communicated clearly to inform policy and practice.
This project therefore investigates how to generate threat summaries from the CE literature using LLMs and RAG pipelines, and how to verify the accuracy of what they generate. Our goal is to develop a pipeline that can reliably go from extracting relevant information from text to a summary that a human can verify as correct.
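To make the verification step concrete, here is a minimal sketch of one way a generated summary could be prepared for human checking. It is not the project's method: it swaps in a deliberately simple lexical-overlap heuristic in place of an LLM or embedding-based check, and the function names (`support_score`, `review_queue`), the 0.6 threshold and the example passages are all hypothetical, made up for illustration. Each summary sentence is paired with the retrieved passage that best supports it, and weakly supported sentences are flagged so an expert can look at those first.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens, dropping words of three letters or fewer."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def support_score(sentence: str, passage: str) -> float:
    """Fraction of the sentence's tokens that also occur in the passage."""
    sent = tokens(sentence)
    if not sent:
        return 0.0
    return len(sent & tokens(passage)) / len(sent)


def review_queue(summary: str, passages: list[str], threshold: float = 0.6):
    """Pair each summary sentence with its best-supporting passage.

    Sentences scoring below the (assumed) threshold are marked for review.
    """
    if not passages:
        raise ValueError("at least one retrieved passage is required")
    rows = []
    for sentence in split_sentences(summary):
        scores = [support_score(sentence, p) for p in passages]
        best = max(range(len(passages)), key=scores.__getitem__)
        rows.append({
            "sentence": sentence,
            "best_passage": passages[best],
            "score": scores[best],
            "needs_review": scores[best] < threshold,
        })
    return rows


# Toy, invented text standing in for retrieved passages and an LLM summary.
passages = [
    "Road mortality reduced amphibian populations near wetlands.",
    "Installing tunnels under roads lowered amphibian deaths.",
]
summary = (
    "Road mortality reduced amphibian populations. "
    "Invasive predators caused declines in seabirds."
)

for row in review_queue(summary, passages):
    print(row["needs_review"], round(row["score"], 2), row["sentence"])
```

In this toy run the first sentence is fully covered by a passage, while the second has no supporting passage and is flagged. A real pipeline would likely replace the overlap score with something stronger, such as an embedding or entailment-based measure, but the shape of the output (claim, supporting evidence, flag) is what lets a human verify the summary efficiently.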
Related Reading
- The Ragas framework for RAG evaluation
- CheckEmbed: Effective Verification of LLM Solutions to Open Ended Tasks, arxiv:2406.02524v2, June 2024
- Calibrating Sequence Likelihood Improves Conditional Language Generation, arxiv:2210.00045, September 2022
Related News
- Conservation Evidence Copilots / Jan 2024