| Qualification Type | PhD |
| --- | --- |
| Location | Exeter |
| Funding for | UK Students, International Students |
| Funding amount | For eligible students the studentship will cover Home or International tuition fees plus an annual tax-free stipend of at least £19,237 for 3.5 years full-time, or pro rata for part-time study. |
| Hours | Full Time |
| Placed On | 13th May 2025 |
| Closes | 9th June 2025 |
| Reference | 5467 |
Generative machine learning models have made significant progress in recent years. Typical examples include high-quality image and video generation using diffusion models (e.g., Stable Diffusion) and large language models (LLMs) based on the transformer architecture [6] (e.g., ChatGPT). In general, these generative models require a considerable amount of computational resources, in terms of GPUs and/or time, for both training and inference, which poses considerable challenges to academia and industry for widespread access and deployment.
In particular, the sampling process of diffusion models usually needs to query a pre-trained model many times sequentially to generate high-quality images or videos, which is time-consuming. The training process of diffusion models is also demanding in terms of computational resources and complexity. Common techniques for reducing computational cost include, for example, solving the ordinary differential equations (ODEs) in the diffusion sampling process more accurately, learning a student diffusion model from a teacher model for fast sampling, and incorporating auxiliary information to assist the sampling process. See recent results [1,3,4] on solving such ODEs by Dr Guoqiang Zhang.
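To give a concrete sense of why the number of sequential model calls dominates sampling cost, below is a minimal sketch of a deterministic, DDIM-style first-order solver for the diffusion ODE. The names `eps_model` (a hypothetical pre-trained noise predictor) and `alpha_bars` (a variance-preserving cumulative noise schedule) are illustrative assumptions, not part of any specific codebase; higher-order ODE solvers aim to reach comparable quality with far fewer such steps.

```python
import numpy as np

def ddim_sample(eps_model, x_T, alpha_bars, n_steps=20):
    """Deterministic DDIM-style sampling: a first-order solver for the
    diffusion ODE. Each step makes one call to the (costly) pre-trained
    noise predictor, so fewer, more accurate steps mean faster sampling."""
    T = len(alpha_bars) - 1
    # Coarse, evenly spaced timesteps from T down to 0.
    timesteps = np.linspace(T, 0, n_steps + 1).round().astype(int)
    x = x_T
    for t, t_prev in zip(timesteps[:-1], timesteps[1:]):
        a_t, a_prev = alpha_bars[t], alpha_bars[t_prev]
        eps = eps_model(x, t)  # one sequential network evaluation
        # Predict the clean sample, then step the ODE towards t_prev.
        x0_hat = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps
    return x
```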
Training or fine-tuning LLMs generally requires many GPUs to accommodate both the model parameters and the training data, which are not always available to researchers. One active research topic is reducing memory consumption during training so that the number of GPUs can be reduced without performance degradation. Common techniques include, for example, parameter quantization, learning a student LLM from a teacher LLM for downstream tasks, and the design of reversible LLMs. A reversible LLM allows for online back-propagation when updating the model parameters, saving memory by avoiding the storage of intermediate activation values. See recent research results obtained by Dr Guoqiang Zhang in [2,5,6].
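As a rough illustration of the memory-saving idea behind reversible architectures, the sketch below uses hypothetical toy sub-networks `f` and `g` (stand-ins for real attention/MLP blocks) in an additive coupling. Because the layer input can be reconstructed exactly from its output, activations can be recomputed on the fly during back-propagation rather than stored.

```python
import numpy as np

def f(x):  # hypothetical sub-network (stand-in for an attention or MLP block)
    return np.tanh(x)

def g(x):  # second hypothetical sub-network
    return np.tanh(0.5 * x)

def reversible_forward(x1, x2):
    """Additive coupling (RevNet-style): the output alone determines the input."""
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def reversible_inverse(y1, y2):
    """Reconstruct the layer input from its output during back-propagation,
    trading a little extra compute for a large reduction in activation memory."""
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

# Check that inversion is exact (up to floating-point error).
x1, x2 = np.random.randn(4), np.random.randn(4)
y1, y2 = reversible_forward(x1, x2)
r1, r2 = reversible_inverse(y1, y2)
assert np.allclose(x1, r1) and np.allclose(x2, r2)
```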
This PhD project concerns effective and efficient training and inference of diffusion models and LLMs. The applicant should have a strong background in machine learning and mathematics.