Press coverage highlights from story:
- PublicTechnology.net, Kings College turns to IBM for genetic research
After being awarded the status of Biomedical Research Centre in 2006, King’s College London and Guy’s and St Thomas’ NHS Foundation Trust needed to bolster their High Performance Computing capabilities. Their genome sequencing work required a more powerful cluster and a more stable storage solution. The Centre chose OCF to design, deploy and manage a new IBM-based solution, cutting data analysis time from days to hours and revolutionising the way the research team works together.
In December 2006, Guy’s and St Thomas’ NHS Foundation Trust, along with its academic partner King’s College London, was awarded the right to become one of five new National Institute for Health Research (NIHR) comprehensive Biomedical Research Centres. The Centres have a strong focus on ‘translational research’, which takes advances in medical research out of the lab and into the clinical setting.
“The sequence of the human genome has been known for ten years now so we are using new sequencing technologies to sequence specific regions of the genome in large numbers of people in order to help understand the contributory factors to a variety of common complex disorders and developmental defects such as bone diseases and cancer,” says Dr. Rebecca Oakey, Reader in Epigenetics, Department of Medical & Molecular Genetics, School of Medicine, King’s College London.
The new Biomedical Research Centre (BRC) at Guy’s Hospital in London had a genomics facility equipped with instruments for genome sequencing, genotyping and gene expression studies. At that time, the facility housed three Illumina Genome Analyser IIx (GA IIx) sequencers, which collectively generated up to 50 billion base pairs (As, Ts, Gs and Cs) of usable DNA sequence data (the equivalent of 17 human genomes) every 10 days. This amounts to around 400GB of usable data (the equivalent of 585 CDs).
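The two comparisons above can be reproduced with a little arithmetic. The sketch below assumes a human genome of roughly 3 billion base pairs, a 700MB CD and 1GB counted as 1,024MB; these reference sizes are assumptions chosen because they match the quoted figures, not numbers supplied by the facility.

```python
# Rough check of the facility's throughput comparisons.
# Assumed reference sizes (not stated in the article):
GENOME_BP = 3_000_000_000   # approx. human genome length in base pairs
CD_MB = 700                 # capacity of a standard CD-R in megabytes
MB_PER_GB = 1024            # binary convention assumed for GB -> MB

bases_per_10_days = 50_000_000_000   # up to 50 billion usable base pairs
data_gb_per_10_days = 400            # around 400GB of usable data

genomes = bases_per_10_days / GENOME_BP              # ~16.7
cds = data_gb_per_10_days * MB_PER_GB / CD_MB         # ~585.1

print(f"~{genomes:.0f} human genomes, ~{cds:.0f} CDs")
```

Under these assumptions the script prints `~17 human genomes, ~585 CDs`, in line with the figures quoted; with 1GB taken as 1,000MB instead, the CD count would come out nearer 571.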
The genomics facility was using an Illumina Pipeline Analysis Server to provide the processing power needed to analyse the data generated by the three Illumina sequencers. The facility also relied on multiple local PCs and servers spread across different research groups. As demand for processing power continued to increase across all research groups, the existing disparate, siloed architecture left researchers constrained by the processing power available to them. In addition, administering the storage, backup and archiving of analysed data was logistically very difficult. At times it was a challenge to establish what data was stored, whether it had been backed up, and how, how often, where (for example, on a memory stick or a PC) and by whom.
In April 2010, medical researchers working across all research groups at the facility took ownership of a central, shared server cluster and storage system to replace the Illumina Pipeline Analysis Server and the multiple servers and local PCs. The system’s bespoke design, rapid implementation and configuration were handled by OCF, a provider of data processing, data management and data storage services.
As the BRC cluster and storage began to prove themselves among the research teams, usage by different groups grew. Around 10-15 research groups now use the sequencing hardware and the cluster; some are just starting out with next-generation sequencing, while others are more experienced.
In 2011, the BRC increased its number of sequencing machines to five. The two additional sequencers (Illumina HiSeq 2000) produce five times more data (with longer read lengths and higher read quality) than the existing GA IIx sequencers: around 1TB of data per sequencing run, every 10 days. This means research projects are being completed with greater accuracy, but it also creates a need for larger on-site storage capacity.
The server and storage cluster enables researchers to analyse the data generated during their quest to understand the role of genetics in a range of common health issues more quickly. The system can reduce the time needed to analyse sequencing data 20-fold or more: analysis that used to take days on the Illumina Pipeline Analysis Server now takes just hours.
Julian Fielden, managing director of OCF, says: “The comprehensive Biomedical Research Centre is a great example of an organisation that acknowledges that data on its own delivers little or no value; organisations must analyse and extract value from their data as quickly as possible so that the findings can be translated into improved patient care at the earliest opportunity. In many cases, this analysis is best performed using a server cluster.”
For the administration team, the new system makes it possible to take control and deliver effective storage, backup and archiving of data. Don says: “We are currently using around 50% of our 180TB of storage. We can now automate data backup and are doing so daily. Because of user error, we have had to restore some files, and this was easy and painless. We can also manually archive data in accordance with our policies. We have two separate archives, one on-site and one off-site, which reduces the risk of data loss from system malfunction, disaster or other unplanned events. We will ‘run out’ of storage space within the next 4-5 months, so we’re actively looking to increase our storage capacity with around 500TB of additional secondary (slower) storage.”
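These capacity figures imply a rough rate of data growth. The back-of-envelope sketch below assumes that usage grows linearly and that ‘running out’ means the full 180TB is in use; it then applies the same hypothetical rate to the proposed 500TB of secondary storage. Both the linear model and the resulting runway are illustrative assumptions, not projections made by the facility, and the arrival of the higher-output HiSeq sequencers could well make growth faster than linear.

```python
# Back-of-envelope storage runway estimate, assuming linear growth.
TOTAL_TB = 180.0          # current primary storage capacity
USED_FRACTION = 0.5       # "around 50%" in use
MONTHS_LEFT = (4, 5)      # expected to run out in 4-5 months
EXTRA_TB = 500.0          # proposed additional secondary storage

free_tb = TOTAL_TB * (1 - USED_FRACTION)   # 90TB still free

# Implied monthly growth: free space divided by months remaining.
# Fewer months left means a faster rate, so 4 months gives the high bound.
rate_high = free_tb / MONTHS_LEFT[0]   # 90 / 4 = 22.5 TB/month
rate_low = free_tb / MONTHS_LEFT[1]    # 90 / 5 = 18.0 TB/month

# Months the extra capacity would last at each (hypothetical) constant rate.
runway_short = EXTRA_TB / rate_high    # ~22.2 months
runway_long = EXTRA_TB / rate_low      # ~27.8 months

print(f"growth ~{rate_low:.1f}-{rate_high:.1f} TB/month")
print(f"500TB lasts ~{runway_short:.0f}-{runway_long:.0f} months")
```

On these assumptions the facility is adding roughly 18-22.5TB a month, and 500TB of extra capacity would last somewhere around two years.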