| Qualification Type: | PhD |
|---|---|
| Location: | Exeter |
| Funding for: | UK Students |
| Funding amount: | £22,113 UK tuition fees and a tax-free stipend of £22,113 per year |
| Hours: | Full Time |
| Placed On: | 11th March 2026 |
| Closes: | 3rd April 2026 |
| Reference: | 5814 |
CD3 is a new multidisciplinary, multi-institutional strategic national research programme dedicated to using data to transform our understanding of cancer risk and enable early interception of cancers. It is a major, multi-million-pound flagship investment funded through a strategic programme award by Cancer Research UK, the National Institute for Health and Care Research (NIHR), the Engineering and Physical Sciences Research Council (EPSRC), and the Peter Sowerby Foundation, in partnership with Health Data Research UK (HDR UK) and the Economic and Social Research Council’s Administrative Data Research UK programme (ADR UK). This studentship is one of several attached to the programme and one of three linked projects addressing missing data.
Early cancer diagnosis is often challenging in patients presenting with vague, non-specific symptoms that may be linked to multiple cancer sites. This project aims to improve diagnostic decision-making for such patients by understanding how cancer risk prediction models are affected by missing and incomplete symptom data in electronic health records (EHRs), and by developing methods to address the resulting problems. Unlike standard missing data problems (e.g., a missing height or lab result), researchers often cannot tell when symptom information is missing. The usual approach is to assume that a symptom was absent if no code for it is recorded in the dataset. However, some clinicians are more likely than others to record symptoms in coded form, so the absence of a code does not reliably indicate the absence of a symptom.
Using large-scale linked electronic health record data, the project will use mixed-effects models to quantify how much symptom recording varies between general practices and between individual clinicians. Temporal analyses will assess how these recording patterns change over time. Building on these findings, the project will quantify how different patterns of missingness affect the performance and calibration of risk prediction models. Novel methods will be developed to incorporate incomplete or uncertain information, including delta-adjustment imputation and other approaches that explicitly model symptom recording probabilities.
Applicants should be able to demonstrate excellent analytical and programming skills (for example in Stata, R or Python), experience working with data, and an enthusiasm for interdisciplinary research that bridges data science, healthcare, and population health.
The student will have the opportunity to attend the structured Early Detection Training Programme, run in partnership with the Alliance for Cancer Early Detection (ACED), which gives PhD students a comprehensive foundation in cancer early detection.