What does ChatGPT think about using the Azure cloud for genomics?

Colby T. Ford, Ph.D.
6 min read · Jan 13

In this quick post, I asked ChatGPT a few things about the cloud and how it may be useful for genomics. Spoiler alert: ChatGPT seems to have read my content 😎. Its responses sound very much like the answers I would give.

Query 1: What are the top three reasons the cloud should be used for genomics research?

1. Scalability: The cloud allows for the storage and processing of large genomic data sets, which can be difficult to handle using on-premises infrastructure. This can help researchers analyze more data, faster.

2. Accessibility: The cloud makes it easy for researchers to access and share genomic data from anywhere, which can help to facilitate collaboration and increase the speed of discovery.

3. Cost-efficiency: Using the cloud can help researchers reduce the costs associated with storing and processing large amounts of data. Additionally, cloud providers often offer pay-as-you-go pricing models, which can help researchers manage their budget more effectively.

In my book, Genomics in the Azure Cloud, I say that the top three things are Scalability, Collaboration (which maps to “Accessibility” above), and Automation. So, maybe there are really four top things?

Query 2: How could the Azure cloud give a life science organization a competitive advantage in genomics research?

Azure cloud provides several features that can give a life science organization a competitive advantage in genomics research, including:

High-performance computing: Azure offers a range of high-performance computing options, such as Azure CycleCloud and Azure N-series virtual machines, that can accelerate large-scale genomics research and analysis.

Big Data analytics: Azure offers a range of big data analytics tools, such as Azure Synapse Analytics and Azure Data Lake Storage, that allow researchers to process, store and analyze large genomics data sets.

Machine Learning and AI: Azure offers a range of machine learning and AI tools, such as Azure Machine Learning and Azure Cognitive Services, that can be used to develop predictive models and identify patterns in genomics data.

Compliance and Security: Azure provides compliance and security features such as Azure Key Vault, Azure Policy, Azure Active Directory, and Azure Security Center, that can help organizations meet regulatory and compliance requirements and protect sensitive genomics data.

Collaboration and sharing: Azure offers a range of collaboration and sharing tools such as Azure DevOps and Azure SharePoint that can help researchers share and collaborate on genomics data and projects.

These features can help life science organizations to improve their genomics research by providing more powerful computing, better data management, more accurate analysis and more secure data collaboration.

Query 3: What is lacking from the Azure cloud that still needs to be developed for bioinformatics and genomics workloads?

While Azure cloud offers a range of features that can support bioinformatics and genomics workloads, there are still some areas that could be further developed:

Interoperability: There may be a lack of standardization and interoperability between the different tools and services offered by Azure, which can make it difficult for researchers to integrate and analyze data from different sources.

Data privacy and security: The handling, storage and sharing of sensitive genomics data requires a high level of data privacy and security, and there may be areas where Azure’s current offerings could be further developed to better meet these needs.

Workflow management: Bioinformatics and genomics research often involves complex data processing workflows that can be difficult to manage and monitor. Azure could benefit from more advanced workflow management tools to make it easier for researchers to manage and monitor these workflows.

Cost-efficient and flexible pricing: Bioinformatics and genomics research can consume high amount of data and computation resources, Azure could benefit from more cost-efficient and flexible pricing options to meet the needs of researchers.

Domain-specific services: Azure offers a wide range of services, but there may be areas where domain-specific services and tools tailored specifically to bioinformatics and genomics research could be beneficial.

These are the current areas that Azure could develop to improve its bioinformatics and genomics workloads capabilities. However, it’s worth noting that Azure is constantly updating and introducing new services, so it is always worth checking to see if they have already addressed these areas.

I disagreed with some of the points in the response to Query 3, so I asked a more pointed question. (Specifically, I disagree with the “Data privacy and security” item above, as Azure has top-notch security and compliance offerings.) I leaned into the “Domain-specific services” idea to see what it thought Microsoft should add to Azure for genomics.

Query 4: Are there any services that you think Microsoft should add to Azure specifically for bioinformatics and genomics?

Here are a few services that Microsoft could add to Azure specifically for bioinformatics and genomics:

Genomic data management services: Azure could benefit from services that are specifically designed for the management, storage, and sharing of genomic data, such as data warehousing, data lineage tracking, and data annotation services.

Genomic data analysis services: Azure could benefit from services that are specifically designed for the analysis of genomic data, such as variant calling, annotation, and functional analysis.

Clinical genomics services: Azure could benefit from services that are specifically designed for the analysis of clinical genomics data, such as services for analyzing electronic health records (EHRs), integrating genomics data with clinical data, and developing predictive models.

Drug discovery services: Azure could benefit from services that are specifically designed for drug discovery and drug repurposing, such as services for protein-ligand binding prediction, drug-target interaction prediction, and drug-induced adverse event prediction.

Workflow management and integration services: Azure could benefit from services that are specifically designed for the management and integration of workflows in bioinformatics and genomics research, such as services for creating, scheduling, and monitoring workflows, and services for integrating workflows with other Azure services.

These are just a few examples of services that Microsoft could add to Azure specifically for bioinformatics and genomics research, but it’s worth noting that the field is rapidly evolving and there may be other services that would be beneficial as well.

My Thoughts?

I appreciated the answers to the fourth question. I agree that Microsoft could do a little work in operationalizing some services that would help researchers scale their workloads more easily. In other words, reduce the barrier to entry for bioinformaticians who don’t happen to also be cloud architects.

Workflow systems like Azure Data Factory (or integration pipelines in Azure Synapse) work well for moving data around and parsing text-based files, but they are not necessarily suited to building end-to-end bioinformatics workloads. Tools like Azure Machine Learning and Azure Databricks let you write custom, complex pipelines, but they require some knowledge of those services to get working. Adding bioinformatics support modules to “simpler” services like Azure Logic Apps or Power Automate could enable researchers to do things in the cloud without learning a ton of nuanced service intricacies.

There is a Microsoft Genomics service in Azure that was developed by Microsoft Research. It is basically a cloud version of a GATK pipeline from the Broad Institute. However, it’s underused and fairly specific to those who exclusively use GATK tooling. In my experience, I’ve always cobbled together various open source tools into a pipeline, and not all of them are Broad-approved. On the topic of “Genomic data analysis services” above, it would be nice to see support for mixing and matching popular tools to cover a wider range of bioinformatics work.
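To make the “mix and match” idea a bit more concrete, here is a minimal Python sketch of what I mean by cobbling tools together. It is not anything Azure or Microsoft Genomics provides. The `Step` class, the `render_commands` helper, the three-step pipeline, and the file names are all hypothetical and only for illustration. The sketch simply describes each stage as a shell command template (using bwa, samtools, and bcftools as stand-ins for whatever tools you prefer) and renders the commands for a sample without running anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One pipeline stage: a label plus a shell command template."""
    name: str
    template: str


# Hypothetical three-step pipeline mixing tools from different projects.
# File naming ({sample}_1.fq, {sample}.bam, ...) is an assumption for illustration.
PIPELINE = [
    Step("align", "bwa mem {ref} {sample}_1.fq {sample}_2.fq > {sample}.sam"),
    Step("sort", "samtools sort -o {sample}.bam {sample}.sam"),
    Step("call", "bcftools mpileup -f {ref} {sample}.bam | bcftools call -mv -o {sample}.vcf"),
]


def render_commands(pipeline, sample, ref):
    """Fill in each template and return (step name, command) pairs.

    Nothing is executed here; the commands are only rendered as strings.
    """
    return [(s.name, s.template.format(sample=sample, ref=ref)) for s in pipeline]


if __name__ == "__main__":
    for name, cmd in render_commands(PIPELINE, sample="sample01", ref="ref.fa"):
        print(f"[{name}] {cmd}")
```

Swapping a tool in or out is just a matter of editing one entry in the list, which is roughly the flexibility I would like a managed service to offer out of the box.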

One thing I found interesting is that “Compliance and Security” is listed as an advantage in the response to Query 2, while “Data privacy and security” is listed as lacking in Azure in the response to Query 3. So ChatGPT isn’t entirely consistent in its opinion on this matter.

In general, I really agree with a lot of what ChatGPT is saying. As I hinted at the beginning, a lot of it seems like the content I’ve written in the past. It’s nice that it shares my opinions. 😅

If you want to try ChatGPT for yourself, visit https://chat.openai.com.

What else should we ask ChatGPT concerning genomics in the cloud?

Stay Curious…
