Ph.D. Research Project Proposal - Queen's University Belfast

Optimising Access to CERN’s Exabyte-Scale Data Archive

This Ph.D. is offered under CERN’s Doctoral Student Programme [1,2]. The principal supervisor is from the student’s home institute, with a second supervisor at CERN. The student will be funded for a maximum of 36 months to conduct the research at CERN, near Geneva, Switzerland.

Keywords: Data Storage, Data Archiving, Hierarchical Storage Management, Tape Libraries, High-Performance Computing, Complexity, Optimization, Algorithms.

The High Energy Physics (HEP) experiments at CERN generate a deluge of data which must be efficiently archived for later retrieval and analysis [3]. The custodial copy of the data is stored on magnetic tape in CERN’s Tier-0 Data Centre. The total volume of archived data is currently in excess of 200 Petabytes [4] and will exceed one Exabyte within about five years.

The data stored on tape at CERN form an active archive: data are recorded over a long period of time to be analyzed later, and any data item can be accessed at any time. Tape remains a popular medium for archival systems because it is more cost-effective than disk, has huge capacity and is very durable. The main drawback of tape systems is the high time-to-first-byte latency. When data are requested, a robotic picker moves the cartridge from its slot to an available tape drive and loads it, and the drive then threads and winds the tape. This process can take several minutes.

Research Problems

The CERN data storage system must scale with the anticipated growth in data volume. Efficient use of the tape libraries can be viewed as an optimization problem with limited resources (libraries, drives, robot pickers) and many free parameters (number of tapes per library, location of the tapes within a library, tape mount policy, caching policy, etc.). This suggests several research areas which could be investigated:

Data-to-Tier Assignment and Caching: In a Hierarchical Storage Management (HSM) system, disk storage is typically used as cache for the tape archive. Research topics could include the data-to-tier assignment and cache eviction strategies.
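As an illustration of what a cache eviction strategy looks like, the following is a minimal sketch of a least-recently-used (LRU) disk cache placed in front of a tape archive. LRU is only one common baseline, chosen here for brevity; the class name, the byte-based capacity and the request trace are all assumptions made for the example and do not describe CERN's actual system.

```python
from collections import OrderedDict


class LRUDiskCache:
    """Toy disk cache in front of a tape archive (hypothetical model)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.files = OrderedDict()  # file name -> size, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, name, size_bytes):
        """Return True on a cache hit, False if the file must be recalled from tape."""
        if name in self.files:
            self.files.move_to_end(name)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if size_bytes <= self.capacity_bytes:
            # Evict least recently used files until the new file fits.
            while self.used_bytes + size_bytes > self.capacity_bytes:
                _, evicted_size = self.files.popitem(last=False)
                self.used_bytes -= evicted_size
            self.files[name] = size_bytes
            self.used_bytes += size_bytes
        return False


# Replay an invented request trace of unit-sized files against a 3-unit cache.
cache = LRUDiskCache(capacity_bytes=3)
trace = ["a", "b", "c", "a", "d", "b"]
results = [cache.read(name, 1) for name in trace]
```

Replaying recorded access traces through alternative policies in this way is one simple method of comparing their hit rates, since every miss corresponds to a tape recall.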

Data Colocation and Cartridge Placement: Data created at the same time are often accessed together. Correlations between datasets could be exploited to optimize the placement of cartridges in tape library slots or to build a predictive model for a prefetch policy.
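One simple way to expose such correlations is to count how often pairs of datasets are requested within the same session. The sketch below assumes a hypothetical access log already grouped into sessions; the function names, the dataset names and the `min_count` threshold are invented for illustration, and a real predictive model would likely be considerably more elaborate.

```python
from collections import Counter
from itertools import combinations


def co_access_counts(sessions):
    """Count how many sessions request each unordered pair of datasets."""
    counts = Counter()
    for session in sessions:
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    return counts


def prefetch_candidates(counts, dataset, min_count=2):
    """Datasets co-accessed with `dataset` at least `min_count` times, strongest first."""
    partners = []
    for (a, b), n in counts.items():
        if n >= min_count and dataset in (a, b):
            partners.append((n, b if a == dataset else a))
    return [name for n, name in sorted(partners, key=lambda p: (-p[0], p[1]))]


# Invented sessions: each inner list is the set of datasets requested together.
sessions = [["run1", "run2"], ["run1", "run2", "run3"], ["run3", "run4"]]
counts = co_access_counts(sessions)
candidates = prefetch_candidates(counts, "run1")
```

The same pair counts could serve as edge weights when deciding which cartridges to place near one another in a library.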

Mount Policy: The Mount Policy determines whether tape mount requests should be granted or deferred, to optimize access to data and avoid unnecessary mounts. The research could investigate optimal mount policies.
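To make the grant-or-defer decision concrete, here is a minimal sketch of a threshold-based rule: a tape is mounted only if a drive is free and either enough data is queued for it to justify the mount, or its oldest request has waited too long. The `TapeQueue` structure and the threshold values are hypothetical, chosen only to illustrate the trade-off, and are not taken from any production policy.

```python
from dataclasses import dataclass


@dataclass
class TapeQueue:
    """Pending read requests for one cartridge (hypothetical model)."""
    tape_id: str
    queued_bytes: int
    oldest_request_age_s: float


def should_mount(queue, free_drives, min_bytes=100 * 10**9, max_wait_s=4 * 3600):
    """Grant the mount if a drive is free and the mount is worthwhile or overdue."""
    if free_drives <= 0:
        return False  # nothing to mount on, so the request is deferred
    if queue.queued_bytes >= min_bytes:
        return True   # enough queued data to amortise the mount latency
    return queue.oldest_request_age_s >= max_wait_s  # avoid starving old requests


# Three invented queues: one large, one small and recent, one small but overdue.
large = TapeQueue("T1", queued_bytes=500 * 10**9, oldest_request_age_s=60)
small = TapeQueue("T2", queued_bytes=1 * 10**9, oldest_request_age_s=60)
stale = TapeQueue("T3", queued_bytes=1 * 10**9, oldest_request_age_s=5 * 3600)
decisions = [should_mount(q, free_drives=1) for q in (large, small, stale)]
```

Deferring small requests lets more reads accumulate per mount, at the cost of higher latency for the requests that wait; the two thresholds are exactly the kind of free parameters the research could tune.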

Combinatorial Optimization Problems: Several of the problems outlined above are computationally intractable, but a near-optimal solution can often be found by placing constraints on the problem space, by providing an approximate solution, or by using simulations.
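As an example of the approximation route, consider distributing tape read jobs across a fixed number of drives so that the last drive finishes as early as possible. This is the classic makespan scheduling problem, which is NP-hard in general. The sketch below uses the well-known Longest Processing Time first (LPT) greedy heuristic; the job durations and drive count are invented, and the mapping of this textbook problem onto tape drives is an assumption made purely for illustration.

```python
import heapq


def lpt_schedule(job_durations, num_drives):
    """Greedy LPT: give each job, longest first, to the currently least-loaded drive."""
    heap = [(0, drive) for drive in range(num_drives)]  # (current load, drive index)
    heapq.heapify(heap)
    assignment = {drive: [] for drive in range(num_drives)}
    for duration in sorted(job_durations, reverse=True):
        load, drive = heapq.heappop(heap)
        assignment[drive].append(duration)
        heapq.heappush(heap, (load + duration, drive))
    makespan = max(sum(jobs) for jobs in assignment.values())
    return assignment, makespan


# Six invented jobs (durations in minutes) spread over two drives.
assignment, makespan = lpt_schedule([7, 5, 4, 3, 3, 2], num_drives=2)
```

The heuristic runs in O(n log n) time and carries a known worst-case guarantee relative to the optimum, which is the kind of trade-off between solution quality and tractability that this research area would explore.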

[1] CERN (https://home.cern/) is the European Organization for Nuclear Research, home to the Large Hadron Collider and birthplace of the World Wide Web. CERN was founded in 1954 and has 22 member states.

[2] https://jobs.web.cern.ch/join-us/doctoral-student-programme

[3] http://cern.ch/go/hRC7

[4] http://cern.ch/go/7TVs

GENERAL INFORMATION:

This is a 3-year Ph.D. studentship, potentially funded by the Northern Ireland Department for the Economy (DfE), with eligibility for both fees and maintenance (£14,786 for 2018/19). This will be topped up to CHF 44,148 (approx. £33K) while the student is resident at CERN. Applicants must have been resident in the UK for at least 3 years, or be EU residents who have lived permanently in the UK for the 3 years immediately preceding the start of the studentship. Non-UK residents who hold EU residency may also apply but, if successful, may receive fees and top-up only.

The student may take up to four years to complete their Ph.D., of which three years would be based at CERN.

Contact details:

Supervisor Name: Professor Dimitrios Nikolopoulos  Tel: +44 28 9097 4620 Email: wd.nikolopoulos@qub.ac.uk

Web: https://www.qub.ac.uk/schools/eeecs/Research/

QUB Address:

Queen’s University Belfast

School of Electronics, Electrical Engineering and Computer Science

High Performance and Distributed Computing

Computer Science Building

01.007, 14-18 Malone Road

Belfast BT9 5BN

Deadline for submission of applications is 23rd February 2018.

Applicants should apply electronically through the Queen’s online application portal at: https://dap.qub.ac.uk/portal/