Paradigm Shift in Genetic Data Analysis and Connectivity
Imagine a child with a bad cough that takes a sudden turn for the worse. You watch with helpless foreboding as he or she is whisked off to the emergency room in the infectious diseases unit of your local hospital. Questions stream through your mind: What is causing this? Is it viral or bacterial? Can the doctors help? Is your child safe? What tests are the doctors doing? An epidemic may or may not be at your door. Is there another child in Atlanta or Ecuador with the same bug? Organizations like the CDC and WHO need to know as soon as possible to avoid a public health emergency. They are monitoring in real time across the world and are continually innovating on policies for a timely and effective response. Genetic analysis tool providers like Thermo Fisher Scientific (us) make the enabling technologies and instruments to detect and identify pathogens and other disease conditions, generating the raw data that defines the disease.
Even a few years ago, while our instruments generated the data, the organization of the data was local to the researcher. The analysis solutions were desktop-based. This severely limited the computation, storage, and collation capabilities that are vital to discovery. Researchers resorted to using huge Excel spreadsheets and painstakingly mapped the associations to find meaningful insights. It took days to weeks of precious time to analyze information. Particularly in the infectious disease scenario, with its rapidly evolving strains, this was a great roadblock to responding to an emergency in time. People began to realize that an incredible amount of data needs to be stitched together quickly, in a time-sensitive manner, to tease out an understanding of the scientific truths of interest. This typically translates to millions of records in databases requiring sophisticated algorithmic processing, cross-application analysis, interactive visualizations, and infrastructure for collaboration.
The writing was on the wall: A biotechnologist needs unlimited storage, compute, memory and scalability to do science at the pace and scale commensurate with the genetic analysis data deluge happening today.
What We Saw for the Future
The writing was on the cloud! We made the decision to move all our analysis to the cloud. We aimed to give every scientist across the world unlimited storage, compute, memory and scalability, essentially providing them a supercomputer on their desk. We envisioned an ecosystem with genetic analysis results from large numbers of subjects, complete with the metadata about the experiments needed to draw context-relevant conclusions. Partners across the globe can contribute to and draw on that information source. And it will be possible to monitor the evolving fingerprint and footprint of disease in real time, empowering health organizations to react to disease trajectories in unprecedented ways. We wanted a reality in which, when your child visits that hospital wing with a bad cough, the blood analysis results can already be correlated with reports of the same disease from other places, and the best possible information is accessible to the doctors, maximizing your child's chances of survival. This vision is no longer far-fetched but a reality in the making.
In the span of just eighteen months, we have brought more than 10 apps to the Thermo Fisher Cloud, presenting solutions to analyze the data from CE (Capillary Electrophoresis) and qPCR (quantitative Polymerase Chain Reaction) instruments for gene expression, genotyping and Sanger sequencing. The challenge before us was providing a scalable, cost-effective solution that could handle thousands of samples, each interrogated at thousands of gene loci, translating to millions of records of raw signals. In terms of the architectural requirements, this translated to a solution along four dimensions of complexity:
• Storage – Ability to store millions of records
• Compute – Real time analysis of huge data sets
• Performance – Response time of 2-3 seconds while manipulating complex visualizations
• Scalability – Platform to scale for thousands of users and millions of studies
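To get a feel for the storage dimension, here is a rough back-of-the-envelope sketch in Python. It only illustrates how thousands of samples and thousands of loci multiply into millions of records; the sample count, locus count, and bytes-per-record figure are hypothetical values chosen for illustration, not measurements from our platform, and the function names are our own.

```python
def estimate_records(samples: int, loci_per_sample: int) -> int:
    """Raw-signal records if every sample is interrogated at every locus."""
    return samples * loci_per_sample


def estimate_storage_gb(records: int, bytes_per_record: int) -> float:
    """Approximate storage in gigabytes (1 GB = 10**9 bytes)."""
    return records * bytes_per_record / 1e9


if __name__ == "__main__":
    # Hypothetical study size: illustrative numbers only.
    samples = 2_000
    loci = 3_000
    bytes_per_record = 500  # assumed average size of one raw-signal record

    records = estimate_records(samples, loci)
    print(f"records: {records:,}")
    print(f"storage: {estimate_storage_gb(records, bytes_per_record):.1f} GB")
```

Even a single study of this assumed size lands in the millions of records, which is why local, desktop-based storage and compute quickly become the bottleneck.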
Our offering is built on top of the AWS (Amazon Web Services) platform, the largest cloud-computing platform in the world. The solution we built on AWS’s service framework required several iterations of architecting and engineering to deliver on the final goal of a seamless experience: from running the samples on our instruments, to analysis, to collating other available data with the results of that analysis, to generating scientific insights.
Our Solution Architecture
The architecture diagram shows the Amazon services we deployed to meet the needs of our use case scenarios. The diagram alongside ties the elements of our use case to the technological solutions we chose to deploy. Thermo Fisher Cloud is now a PaaS (Platform as a Service) that enables customers to manage, store, analyze and share data effortlessly as we usher in a new era of data analysis power for our customers.
The cloud solutions are revolutionizing the way we do science. The graph below shows the performance improvements and time savings we have already achieved across our qPCR and CE applications. What was often weeks’ worth of work is now a coffee break away, with 10X improvements in performance. The cloud is enabling studies across data volumes that could not be analyzed together before. And these results can be reliably and securely shared, cross-analyzed, collated, and cross-correlated with ease, paving the way for validating findings of interest and turning them into breakthrough insights.
Track Evolution of Disease by Collating All Related Discoveries in Real Time
Not only is this revolutionizing the way we do science, it is allowing us to redirect R&D dollars into newer and more radical initiatives as we drive down our operating costs for development and sustaining engineering. Deploying software updates, sharing them, and reconciling the differences between versions all become easy to manage. License administration becomes convenient. And the built-in visibility into our software usage patterns and failure modes is allowing us to learn about our customers’ conscious and unconscious needs, positioning us to innovate on the next generation of product offerings with maximum chances of success.
Of course, we have only just begun. Our Software team gave a talk at the recent AWS re:Invent conference, the premier cloud conference in the world with over 13,000 attendees. The talk describes how cloud computing is enhancing our customer workflows by optimizing applications with orders-of-magnitude improvements in performance and scalability.