SAIL Databank: A World-Class Trusted Research Environment (TRE)

What is SAIL Databank?

SAIL stands for Secure Anonymised Information Linkage. The SAIL Databank is a world-class flagship for the robust, secure storage and use of anonymised person-based data for research to improve health, wellbeing and public services. Originally a repository of health data, SAIL’s data assets now include a range of administrative datasets, creating even greater opportunities to build rich longitudinal cohorts for research. Recognised internationally as one of the broadest and best-characterised population databanks, SAIL hosts comprehensive data about the Welsh population and is increasingly being entrusted to manage data representing UK populations. Backed and endorsed by the Government, the SAIL Databank receives core funding from the Welsh Government’s Health and Care Research Wales and UK Research and Innovation.

What challenges needed to be overcome by SAIL?

There were many challenges to address in setting out to establish the SAIL Databank. When it started in 2007 there were few other countries with data linkage infrastructures from which to learn, with the exception of notable examples in Australia and Canada. Consequently, even dialogues on this topic with government, professional bodies, health and social care organisations, regulators and the public were often pioneering activities. So pragmatically, the work of the SAIL Databank began as a pilot in one local authority area in the West Wales region.

  • Global Access. Until 2009, access to SAIL data could only take place onsite using dedicated terminals under the supervision of SAIL Databank staff. Although this was adequate at first, it wasn’t sufficient to meet the increasing demands for SAIL data.
  • Secure Data Repository. SAIL needed to create a central data repository, as opposed to a distributed, federated data access model, to maximise data utility, maintain data quality and manage a governance system of strict controls for data access and analysis.
  • High Performance Computing. SAIL needed an IT system that was sufficiently stable to cope with data access and processing at source, and able to develop technical processes to minimise the demand on data providers transferring their data to SAIL. SAIL also requires the storage and handling of complex data types, such as genomic, imaging and free-text data, for emerging areas of research.
  • Analytical Tools. SAIL required a solution that could support a range of analytical software to meet the various data provider requirements and offer the tools most familiar to data users, maximising efficiency and utility.
  • Strong Governance. SAIL’s ‘Privacy-by-design’ model is an important concept to ensure that an appropriate set of control measures is built in, as opposed to ‘bolt-on’ solutions, and applied at all operational stages while maintaining flexibility and upgradability.
  • Data Linkage. Data linkage is a cornerstone of SAIL’s success. SAIL required an extremely accurate data linkage solution to link together its extensive range of population data and identify patterns across entire populations, giving a much broader picture.

How does SeRP provide solutions to these challenges?

    The early SeRP technology stack helped to develop the SAIL Gateway as a remote access technology and analysis platform. SeRP’s remote access technology enables approved researchers to use data within the SAIL Databank virtual environment from their own desktop, wherever they are in the world. The protocols in place allow user authentication and monitoring, and prevent the alteration or removal of SAIL data by users.

    SeRP’s technical controls provide carefully designed Information Governance to ensure person-based data with high privacy risk is managed to the highest ISO 27001 standards. ISO 27001 is an internationally recognised best-practice standard for an Information Security Management System (ISMS). As a result, SAIL Databank is entrusted by organisations like the NHS and the Office for National Statistics and is endorsed by the British Medical Association.

    The SAIL privacy-by-design model encompasses a suite of physical, technical and procedural controls applied to the data and the data environment. SeRP’s data access controls ensure that researchers can only view the data and cannot extract data from the Databank.

    SeRP’s high-powered data management and sharing technology is infinitely scalable to suit a range of use cases, including imaging, genomics and analysis of free text, and so supports SAIL’s breadth of data types. This environment, as well as supporting a customisable suite of tools for data analysis, offers computing boosts to remote desktops and storage, and facilitates projects involving AI, Machine Learning and Natural Language Processing.

    The ability to link together multiple sources of data that relate to, for example, a particular individual, a geographical location or an event brought a new dimension to answering research questions. SeRP offers award-winning LinXmart data linkage software, allowing researchers to use existing collections of extensive data that have been routinely collected and stored securely to identify patterns across entire populations. This state-of-the-art data linkage mechanism also has the added advantage of highlighting any irregularities and improving the quality and consistency of data for research.

    SAIL’s Impact During COVID-19

    Owing to its wealth of data, well-established governance protocols and agility to respond to public health crises, all underpinned by SeRP’s advanced technology, SAIL Databank has been an important tool to support the efforts to tackle the COVID-19 pandemic…

    SAIL Databank has facilitated a secure, anonymised data pipeline to deliver information from a new COVID-19 symptom tracking app into the NHS, supporting the response to the pandemic.

    A new International COVID-19 Data Research Alliance and Workbench has been established by Health Data Research (HDR) UK and partners to support the rapid development of therapies to combat the global effects of COVID-19, following funding announced by the COVID-19 Therapeutics Accelerator and the Gates Foundation.

    A new research study investigating the risks of COVID-19 to black, Asian and minority ethnic (BAME) healthcare workers has been launched, after evidence emerged that higher proportions of associated deaths were recorded within these groups, more than twice that of the white population.

    Research found no cases of the rare blood disorder in the COVID-19 vaccinated population of Wales. A rapid evaluation of the Welsh healthcare data was undertaken to respond to an urgent request for information on COVID vaccine-related blood clots.

    More information and technical details on the SAIL Databank/SeRP partnership are available from the following publication,