| Qualification Type | PhD |
| --- | --- |
| Location | Coventry |
| Funding for | UK Students, EU Students, International Students |
| Funding amount | Not Specified |
| Hours | Full Time |
| Placed On | 2nd September 2025 |
| Closes | 25th October 2025 |
Can We Teach AI to Outsmart Humans in the Werewolf Game—Without Changing the AI Itself?
Large Language Models (LLMs) have dazzled us with their ability to converse, code, and create—but they still struggle in areas where humans excel: reasoning about other players, forming alliances, and making long-term strategic decisions. A prime example? The social deduction game Werewolf (also known as Mafia). Even the most advanced AI systems falter against skilled human players.
That’s about to change.
In the same way AlphaGo revolutionised board game AI by teaching itself to play Go at a superhuman level, our project seeks to bring self-learning to LLMs. But there’s a catch—unlike Go, there’s no easy way to score an LLM’s conversational move. In Go, the score is clear. In open-ended language games? Not so much.
The Breakthrough Idea
Instead of trying to score every AI utterance directly, we focus on the game's outcome: whether the villagers win or the werewolves outwit everyone. We break each playthrough into partial game logs and link them to the final result. This gives us grounded, reliable feedback: we know which sequences of actions led to a win or a loss.
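As a minimal sketch of how such outcome-labelled data might be constructed (the `Turn`, `GameLog`, and `make_training_pairs` names are illustrative assumptions, not the project's actual code):

```python
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str
    utterance: str

@dataclass
class GameLog:
    turns: list[Turn]
    villagers_won: bool  # the final, grounded outcome of the playthrough

def make_training_pairs(log: GameLog) -> list[tuple[list[Turn], float]]:
    """Link every prefix of a game to its final result.

    Each partial log inherits the outcome of the full game, giving
    grounded feedback without scoring individual utterances.
    """
    outcome = 1.0 if log.villagers_won else 0.0
    return [(log.turns[:t], outcome) for t in range(1, len(log.turns) + 1)]
```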
From these partial logs, we learn a “hidden” (latent) state representation of the game and a way to map that state to a value—essentially, how good the situation is for the AI at any given point. Once we have this, the AI can sample a range of possible next actions and evaluate their likely impact before deciding which to play.
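One way such a value model could look is the PyTorch sketch below; the recurrent encoder, embedding size, and sigmoid win-probability head are all assumptions made for illustration rather than the project's chosen architecture:

```python
import torch
import torch.nn as nn

class GameStateValue(nn.Module):
    """Encode a tokenised partial game log into a latent state and
    map that state to a value: the estimated chance of winning."""

    def __init__(self, vocab_size: int, latent_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, latent_dim)
        self.encoder = nn.GRU(latent_dim, latent_dim, batch_first=True)
        self.value_head = nn.Linear(latent_dim, 1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) token IDs of a partial game log
        _, hidden = self.encoder(self.embed(token_ids))
        latent_state = hidden[-1]  # the learned "hidden" game state
        return torch.sigmoid(self.value_head(latent_state))  # win probability
```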
Here’s the twist: all this happens outside the LLM. We don’t fine-tune the model, retrain it, or alter its weights. We simply wrap it with a clever layer of reasoning and evaluation. It’s like giving the AI a strategic co-pilot that helps it think ahead without changing its core personality.
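A hedged sketch of that outer loop, assuming a hypothetical `llm_sample` helper that draws a candidate utterance from the frozen LLM and a hypothetical `encode` helper that tokenises a log for the value model above; the key point is that only sampling and scoring happen here, and the LLM's weights are never touched:

```python
import torch

def choose_action(llm_sample, encode, value_model, game_log, n_candidates=8):
    """Sample candidate next utterances from the frozen LLM and play
    the one whose resulting state the value model rates highest."""
    candidates = [llm_sample(game_log) for _ in range(n_candidates)]
    with torch.no_grad():  # evaluation only; the LLM itself is not retrained
        scores = [
            value_model(encode(game_log + [c])).item() for c in candidates
        ]
    return max(zip(scores, candidates), key=lambda pair: pair[0])[1]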
Why This Matters
The implications reach far beyond a single parlour game. This approach gives LLMs the ability to account for the long-term consequences of their actions—something they currently find challenging. Imagine AI collaborators that:
* Negotiate more effectively by anticipating the downstream effects of each statement.
* Support complex decision-making in domains where success depends on multi-step strategy.
* Learn to adapt through experience without costly retraining.
By proving the method in the challenging, high-interaction world of Werewolf, we create a benchmark for measuring and improving AI strategic reasoning. If it works there, it can work in corporate negotiations, policy simulations, cooperative robotics, and beyond.
Why Werewolf?
It’s a perfect storm for testing AI intelligence: incomplete information, shifting alliances, deceptive moves, and the need to read subtle cues in conversation. Winning isn’t about calculating a single best move—it’s about thinking several moves ahead, predicting how others will respond, and adjusting on the fly. That’s exactly the kind of capability we want LLMs to develop.
Join the Next Leap in AI Learning
This project offers a new pathway to grow AI intelligence—one that builds foresight into systems without modifying their underlying architecture. We’re bridging the gap between raw language ability and deep strategic reasoning.
Just as AlphaGo changed our understanding of what was possible in AI for games, we aim to change what’s possible for AI in collaboration, persuasion, and long-term planning. And it all starts with a simple, deceptively difficult question: can we teach AI to outsmart humans in the Werewolf game, without changing the AI itself?