Scaling Genomics in the Cloud with Microsoft Azure

A deep-dive into building scalable architectures in Azure for bioinformatics workloads.

Why cloud?

The cloud offers some key benefits for research organizations.

Why Azure?

Microsoft Azure is one of the leaders in the cloud space, providing a ton of platform-as-a-service (PaaS) offerings that let you build performant architectures to fit your unique needs.

What’s in the book?

Part 1: Data Platform

The first half of the book focuses on setting up the data platform architecture for housing your genomics data. This includes the creation of your genomics data lake and building out a variant data warehouse in Azure Synapse Analytics. We round off this half of the book by covering data orchestration using Azure Data Factory.

Part 2: Compute

The latter half focuses on the compute side of things, specifically how to scale your bioinformatics pipelines and machine learning capabilities. I start by covering tools like Azure Databricks and Azure Machine Learning, providing examples of how they can be used to scale bioinformatics-specific tasks.

How can I get the book? #iwantitnow

The book is being published by O’Reilly Media this fall, and it will be available here and on Amazon. However, if you want to get your hands on the first few chapters, you can join O’Reilly’s Early Release Program!



Colby T. Ford, PhD

Cloud genomics and AI guy and aspiring polymath. I am a recovering academic in machine learning and bioinformatics, and I sometimes write things here.