| Qualification Type: | PhD |
|---|---|
| Location: | Birmingham |
| Funding for: | UK Students |
| Funding amount: | A fully funded studentship is available for UK applicants covering fees and stipend. |
| Hours: | Full Time |
| Placed On: | 2nd February 2026 |
| Closes: | 30th April 2026 |
The availability of historical time series data on transport service provision is crucial for research into a range of topics. However, for most of the period in which public transport services have operated, such data is available only in paper timetables or, at best, as scanned image files. Quantitative analysis of the tabular data contained in paper timetables requires translating it into a machine-readable format, but no automated methods currently exist that can perform this translation. Historical analysis therefore depends on manual digitisation of timetable data, which is extremely time-consuming and consequently very limited in scope.
This PhD project will address this gap in knowledge by developing automated methods for converting scanned image files of historic timetable data into machine-readable formats suitable for mapping and quantitative analysis, such as GTFS (General Transit Feed Specification) files. The analysis will be based on a case study of overnight trains in Europe, for which no time series data on service development suitable for computational analysis currently exists. However, the methods developed should be applicable more generally to a range of public transport timetables. The project will investigate a range of possible methods to identify the most suitable option for this purpose, including, for example, the application or extension of existing OCR software, the training of custom OCR models, the use of large language models, or multi-step processes combining different models. Depending on the speed of progress with digitisation, the project could also include analysis of the historical development of overnight and/or long-distance trains in Europe based on the output GTFS datasets.
Applicants must be able to demonstrate, based on their previous experience, the capability to undertake the analysis required for this project. Proficiency in Python is essential, and experience of using machine learning techniques would be desirable. Familiarity with TensorFlow (or similar), knowledge of railway systems in Europe, and the ability to read paper railway timetables would all be advantageous but are not essential.
Funding notes:
A fully funded studentship is available for UK applicants covering fees and stipend. Applicants require a minimum of a good 2:1 degree at undergraduate level in a relevant subject area.