Applying Multimodal Foundation Models to Identify Small and Not Well-Defined Objects
Entry requirements
Months of entry
Anytime
Course content
The field of computer vision has witnessed remarkable advancements, fuelled by the development of large-scale foundation models. These models, capable of processing and understanding multiple modalities such as text and images, have opened up new possibilities for a wide range of applications. One particularly promising area is the identification of small and not well-defined objects, which has significant implications for fields like medical imaging, healthcare, and remote sensing.
Objectives:
· Our first objective is to design and optimize a multi-modal vision model that integrates both visual and textual data to enhance the detection of small, ambiguous objects in medical images. Specifically, we aim to improve the accuracy of identifying early-stage lesions by at least 10% over current benchmarks. This research will leverage state-of-the-art fine-tuning techniques and data fusion methods, with initial development and testing scheduled within the first 18 months. By advancing detection capabilities in medical imaging, we hope to contribute directly to early disease diagnosis and improved patient outcomes.
· The second objective focuses on developing a robust, end-to-end computer vision pipeline tailored for remote sensing applications. Our goal is to accurately identify subtle features such as minor infrastructural elements and environmental changes, reducing false positives by approximately 15%. By incorporating recent advancements in model interpretability and multi-modal learning, this pipeline will be designed to deliver transparent and reliable results that domain experts can trust. We plan to validate and benchmark the system within the first two years, ensuring its practical impact in environmental monitoring and disaster management.
· Our third objective is to assess and enhance the generalizability of large-scale foundation models across diverse domains, ranging from medical imaging to remote sensing. We aim to achieve an average detection performance improvement of 15% on multiple benchmark datasets by systematically adapting and refining these models for different applications. Through comprehensive cross-validation and iterative model enhancements throughout the PhD programme, this research will offer valuable insights into model adaptability, helping to ensure that these advanced models are effective across various high-impact real-world scenarios.
Fees and funding
This programme is self-funded.
Qualification, course duration and attendance options
- PhD
- Full time: 36 months
- Campus-based learning is available for this qualification
- Part time: 60 months
- Campus-based learning is available for this qualification
Course contact details
- Name: SEE PGR Support
- Email: PGR-SupportSSEE@salford.ac.uk