Funding for: UK Students, EU Students
Placed On: 26th July 2021
Closes: 4th October 2021
One of the main challenges in AI today is autonomous sequential decision-making: how can we give algorithms the ability to decide what actions to take whilst interacting with an uncertain environment in order to achieve a goal? Remarkable developments in this direction over the last few years have relied on deep reinforcement learning (DRL), which is based on the mathematical formalism of Markov decision processes and uses artificial neural networks as flexible function approximators.
Many real-world applications are characterised by the interplay of multiple decision-makers that operate in the same shared-resources environment and need to accomplish goals cooperatively. Some of the most advanced industrial multi-agent systems in the world today are assembly lines and warehouse management systems. Whether the agents are robots, autonomous vehicles or clinical decision-makers, there is a strong desire for and increasing commercial interest in these systems: they are attractive because they can operate on their own in the world, alongside humans, under realistic constraints.
Multi-agent reinforcement learning has been studied since the 1990s; however, the last five years have seen a remarkable boost in academic and commercial activity, fuelled by ground-breaking advances in deep neural networks along with the increasing power and decreasing cost of computing. The fast-developing area of multi-agent deep reinforcement learning (MADRL) has emerged to extend DRL to teams of autonomous agents. However, apart from a handful of highly specialised systems, the number of real-world applications powered by MADRL remains limited.
As part of this PhD project, which sits within a UKRI Turing AI Acceleration Fellowship, you will contribute to the emerging area of MADRL with a view to unleashing its full potential. You will consider the cooperative MADRL problem, in which a system of several learning agents must jointly optimise a single reward signal – the team reward – accumulated over time. Each agent has local autonomy: it can access its local observations and choose actions from its own action space. One of the most significant challenges in this context is how to foster collaborative behaviour within the system. The fundamental enabler of cooperative multi-agent skills is the ability to develop adequate communication. In previous work, we have demonstrated how explicit communication patterns emerge in systems equipped with a differentiable memory learned end-to-end through policy gradient methods. Even when every agent has access to every other agent's observations, communication mechanisms still need to be learned for the task at hand: the information an agent holds at a given time may be noisy or not relevant to other agents' decisions, so learning what to communicate improves coordination.
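To make the cooperative setting concrete, the following is a minimal, purely illustrative sketch (all names are hypothetical and not part of the project's actual codebase) of the problem structure described above: each agent sees only a local observation and chooses from its own action space, yet the environment returns a single team reward shared by all agents, so no agent can succeed from its private information alone.

```python
import random


class ToyCooperativeEnv:
    """Hypothetical toy environment illustrating cooperative MADRL structure.

    Two agents each observe a private bit (their local observation) and pick
    an action from their own action space {0, 1}. The team earns one shared
    reward of +1 only when both actions equal the XOR of the two bits, which
    neither agent can compute from its observation alone: coordination (or
    communication) must be learned.
    """

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def reset(self):
        self.bits = (self.rng.randint(0, 1), self.rng.randint(0, 1))
        # Each agent receives only its own bit: local observations.
        return [self.bits[0], self.bits[1]]

    def step(self, actions):
        target = self.bits[0] ^ self.bits[1]
        # A single team reward shared by all agents, not per-agent rewards.
        return 1.0 if all(a == target for a in actions) else 0.0


def rollout(env, policies, episodes=100):
    """Average the team reward accumulated by the given per-agent policies."""
    total = 0.0
    for _ in range(episodes):
        obs = env.reset()
        actions = [pi(o) for pi, o in zip(policies, obs)]
        total += env.step(actions)
    return total / episodes
```

Policies that ignore their observations (e.g. always playing 0) succeed only when the hidden bits happen to agree, which is the gap that learned coordination is meant to close.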
In this PhD project, you will develop a general graph-based framework to facilitate efficient multi-agent communication, enable learning using sparse rewards and build a relational representation of the environment. You will be joining a larger research team based at WMG at the University of Warwick working on various deep reinforcement learning problems and will support the development of an open-source library of multi-agent tasks with strong connections to industry.
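One common realisation of graph-based multi-agent communication is a round of message passing over an adjacency structure: each agent aggregates its neighbours' features before acting. The sketch below (function and parameter names are illustrative assumptions, not the framework this project will build) shows the basic operation.

```python
import numpy as np


def communicate(features, adjacency, w_msg, w_self):
    """One round of graph-based message passing between agents.

    Illustrative sketch only; the project's actual framework will differ.

    features  : (n_agents, d) array of per-agent hidden features
    adjacency : (n_agents, n_agents) 0/1 matrix; adjacency[i, j] = 1 means
                agent i receives a message from agent j
    w_msg, w_self : (d, d) weight matrices (learnable in a real system)
    """
    # Each agent's incoming message: mean over its neighbours' features.
    degree = adjacency.sum(axis=1, keepdims=True)
    degree = np.maximum(degree, 1)  # avoid division by zero for isolated agents
    messages = (adjacency @ features) / degree
    # Combine own features with aggregated messages, then a nonlinearity.
    return np.tanh(features @ w_self + messages @ w_msg)
```

An agent with no neighbours receives a zero message and falls back on its own features, which is one reason the graph structure itself (who talks to whom) matters as much as the message contents.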
Candidates should have an MSc in Statistics, Computer Science, Engineering or a similar quantitative discipline, and very strong, demonstrable programming skills, especially in Python.
For informal enquiries please contact Professor Giovanni Montana: email@example.com.
Funding - WMG
Funding Duration - 3.5 years
Stipend - Standard PhD at UKRI rates: £15,285