Spatial and multi-modal extraction from conservation literature
This is an idea proposed in 2024 as a Cambridge Computer Science Part III or MPhil project, and is available to be worked on. It will be supervised by Anil Madhavapeddy, Sadiq Jaffer, Alec Christie and Bill Sutherland.
The Conservation Evidence Copilots database contains information on numerous conservation actions and their supporting evidence. We also have access to a large corpus of academic literature detailing species presence and threats, which we have assembled in Cambridge in collaboration with various journal publishers.
This MPhil project aims to combine these published literature resources with geographic information to propose conservation interventions. The goal is to identify actions that are likely to be effective based on prior evidence and have the potential to produce significant gains in biodiversity. This approach should enhance the targeting and impact of future conservation efforts and make them more evidence-driven.
To realize this project, several key components need to be developed, each of which could constitute an MPhil project in its own right:
- Firstly, a pipeline needs to be constructed to extract actions, threats, and species information from the literature, aligning with the Conservation Evidence taxonomy. This would draw on natural language processing and information-extraction techniques, and could make use of LLMs.
- Secondly, the project requires multimodal models capable of analyzing both text and visual elements (such as maps and graphs) in scientific papers to identify relevant conservation data.
- Thirdly, a predictive model needs to be developed to assess the potential efficacy of conservation interventions. This model would be based on the Conservation Evidence database and should provide reasoning for its predictions, potentially utilizing techniques in explainable AI and causal inference.
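To make the first component concrete, here is a minimal sketch of the alignment step in such an extraction pipeline: an LLM is assumed to return a JSON record for each paper, and the free-text action mention is then matched against the controlled taxonomy, with low-confidence matches flagged for human review. The taxonomy labels, the JSON shape, and the fuzzy-matching cutoff are all illustrative assumptions, not the Conservation Evidence schema itself.

```python
import json
from difflib import get_close_matches

# Toy stand-in for the Conservation Evidence action taxonomy
# (hypothetical labels, chosen only for illustration).
TAXONOMY = [
    "Install bat boxes",
    "Create artificial nesting sites",
    "Restore riparian vegetation",
    "Control invasive predators",
]

def align_to_taxonomy(free_text_action, taxonomy, cutoff=0.6):
    """Map a free-text action mention from a paper to the closest taxonomy entry.

    Returns None when nothing matches above the cutoff, so uncertain
    extractions can be routed to a human reviewer rather than forced
    into the wrong category.
    """
    matches = get_close_matches(free_text_action, taxonomy, n=1, cutoff=cutoff)
    return matches[0] if matches else None

def parse_llm_extraction(raw_json, taxonomy):
    """Validate a hypothetical LLM response of the form
    {"species": ..., "threat": ..., "action": ...} and align the action."""
    record = json.loads(raw_json)
    aligned = align_to_taxonomy(record.get("action", ""), taxonomy)
    return {
        "species": record.get("species"),
        "threat": record.get("threat"),
        "action": aligned,                  # None => needs manual review
        "raw_action": record.get("action"),
    }

example = ('{"species": "Myotis daubentonii", '
           '"threat": "habitat loss", "action": "installing bat boxes"}')
print(parse_llm_extraction(example, TAXONOMY))
```

A production pipeline would replace the string matching with embedding similarity or a constrained-decoding step, but the shape of the problem is the same: free text in, taxonomy-aligned records (plus a review queue) out.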
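For the third component, a very simple baseline illustrates what "providing reasoning for its predictions" could mean: weight each study's effect by its size and return a plain-text rationale alongside the score. The record fields and weighting scheme are assumptions for the sketch, not the structure of the Conservation Evidence database.

```python
from dataclasses import dataclass

@dataclass
class EvidenceRecord:
    # One study supporting an action (hypothetical fields).
    action: str
    effect: float   # standardised effect size; positive = beneficial
    n_sites: int    # study size, used here as a crude weight

def score_action(records):
    """Return a size-weighted mean effect plus a human-readable rationale,
    so the prediction is inspectable rather than opaque."""
    total_weight = sum(r.n_sites for r in records)
    if total_weight == 0:
        return 0.0, "no evidence available"
    score = sum(r.effect * r.n_sites for r in records) / total_weight
    rationale = (
        f"{len(records)} studies covering {total_weight} sites; "
        f"weighted mean effect {score:+.2f}"
    )
    return score, rationale

evidence = [
    EvidenceRecord("Install bat boxes", 0.4, 12),
    EvidenceRecord("Install bat boxes", 0.1, 30),
]
print(score_action(evidence))
```

The actual project would go well beyond this, e.g. using causal-inference methods to adjust for study context, but even the baseline makes the explainability requirement tangible.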
If you're interested in applying machine learning and LLM techniques to global conservation, then get in touch about the above or any other ideas you might have.
Related Reading
- The Ragas framework for RAG evaluation
- CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks, arxiv:2406.02524v2, June 2024
- Calibrating Sequence Likelihood Improves Conditional Language Generation, arxiv:2210.00045, October 2022
Related News
- Conservation Evidence Copilots / Jan 2024