Deploying Azure OpenAI and Building a Custom Science Article Recommender App with ChatGPT

Colby T. Ford, PhD
6 min read · Mar 22


A few days ago, my friend and former colleague, Sam Edelstein, posted a really cool project where he used ChatGPT to recommend books from his GoodReads list.

I’ve been working on revisions to a new SARS-CoV-2 paper (which you can read here). In doing so, I’ve found that tracking down papers to cite that are directly related to the paper you’re working on can be challenging. So, I thought it would be interesting to see if ChatGPT could recommend additional scientific articles that I could read/cite based on an input list of the other papers I’m citing.

Today, I’ll be showing you how to deploy OpenAI in Azure and use ChatGPT in a Streamlit app to build your own custom recommender.

Deploying Azure OpenAI

From the Azure Portal, click + Create a Resource and search for “OpenAI” in the Marketplace.

Alternatively, just go here: https://portal.azure.com/#create/Microsoft.CognitiveServicesOpenAI

This will open the form to deploy the OpenAI service. Fill in your Subscription, Resource group, Region, and service Name info and click Review + Create.

If you see a message stating that access to the service is limited, click the link and fill out the form to request access to the OpenAI service. I had to do this, and they got back to me in under 24 hours.

Once the deployment is complete, you can navigate to the service, which will give you a few links to tutorials, etc.

For the recommender app we’re going to build, we’ll need a few pieces of info.

Navigate to the Keys and Endpoint screen and copy your key (KEY 1 or KEY 2) and Endpoint to a notepad for use in a bit.

Next, on the Model deployments screen, create a new deployment of the `gpt-35-turbo` model. This is the GPT-3.5 model we will use in our app.

You can name yours anything you like. Copy the model deployment name to your notepad.

That’s it! Now we can start making our app.

Creating a Streamlit application

I’ve expanded/adapted my friend Sam’s Streamlit code to work with Azure’s OpenAI service rather than the OpenAI-hosted ChatGPT API. You’ll want to clone this repository: GitHub — colbyford/scipapers_chatgpt

My SciPapers app recommends new papers that a researcher could cite based on an input list of other papers and a title of the current research.

In this repository, you’ll see a scipapersgpt.py file and a credentials.yaml file.

Credentials

In the credentials.yaml file, update the values to the endpoint, key, and model name you gathered in the last section.

---
# Azure OpenAI - API Connection Information
## Referenced from: https://github.com/Azure/openai-samples/blob/main/ChatGPT/chatGPT_managing_conversation.ipynb
api_type : azure
## The base URL for your Azure OpenAI resource. e.g. "https://<your resource name>.openai.azure.com"
api_base: https://<resource>.openai.azure.com/
## The API key for your Azure OpenAI resource.
api_key: 12345abcdef
## Currently, the only option that is available is: 2022-12-01
api_version: '2022-12-01'
## The name of your deployed model
chatgpt_model_name: gpt-35_deployment_001
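The app reads these values at startup. As a minimal sketch (assuming PyYAML is installed), here's how the YAML above parses into a Python dict whose values then get assigned to the openai library's `api_type`, `api_base`, `api_key`, and `api_version` settings before any API call:

```python
import yaml

# Parse the same YAML shown above; in the app you'd read it from
# the file instead: creds = yaml.safe_load(open("credentials.yaml"))
creds_text = """
api_type: azure
api_base: https://<resource>.openai.azure.com/
api_key: 12345abcdef
api_version: '2022-12-01'
chatgpt_model_name: gpt-35_deployment_001
"""
creds = yaml.safe_load(creds_text)

print(creds["api_type"])     # azure
print(creds["api_version"])  # 2022-12-01
```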

scipapersgpt.py

In scipapersgpt.py, no modifications are necessary unless you want to adapt this app and make it your own. I won’t bore you with the details of building a Streamlit app (I’ll link to some resources at the bottom), but I will walk through the rather magical/hacky way we turn a general ChatGPT-like model into something specific to our needs.

In this app, I have users input a CSV file with paper titles (in the “Title” column) and websites (in the “URL” column). There is an articles.csv example file in the repo.

In the Streamlit code, we’re using an OpenAI Python library that will make it easier to send inputs to the GPT-based model.

We first create a concatenated list of paper titles from the CSV input and then add this into a prompt that will automatically get sent to the GPT model.

## Load in CSV
articles_df = pd.read_csv(articles)
articles_df = articles_df[["Title", "URL"]]
## Element-wise concatenation (an f-string here would embed the whole Series, not each row's values)
articles_df["Source"] = articles_df["Title"] + " (From: " + articles_df["URL"] + ")"

## Create Title-URL text (concatenated)
articles_read = ', '.join(articles_df['Source'].astype(str).values.flatten())

## Create question
question = ('''Based on this list of my previously read scientific articles, please recommend other journal articles I can cite for a paper titled "'''
            + paper_title + '''" and paste a link to the article so I can read it.
Here is my list of journal articles I have read: ''' + articles_read)
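To make the concatenation concrete, here is the same logic run on two hypothetical rows (the titles and URLs below are made up purely for illustration):

```python
import pandas as pd

# Two made-up rows standing in for the uploaded CSV
articles_df = pd.DataFrame({
    "Title": ["Paper A", "Paper B"],
    "URL": ["https://example.org/a", "https://example.org/b"],
})

# Element-wise string concatenation builds one "Title (From: URL)" entry per row
articles_df["Source"] = articles_df["Title"] + " (From: " + articles_df["URL"] + ")"
articles_read = ", ".join(articles_df["Source"].astype(str).values.flatten())

print(articles_read)
# Paper A (From: https://example.org/a), Paper B (From: https://example.org/b)
```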

Next, we provide some context to the AI system so that it knows what its persona is and what we’re expecting as a response.

base_system_message = "You are a helpful assistant that recommends scientific journal articles to read based on lists."
system_message = f"<|im_start|>system\n{base_system_message.strip()}\n<|im_end|>"

messages = [
    {
        "sender": "user",
        "text": question
    },
    {
        "sender": "user",
        "text": "Please list in bullet points the recommended journal articles, their URLs, and why you are recommending them"
    }
]

As you can see, we add some extra instructions to stabilize the responses. These messages aren’t shown to the user, but they help ensure the model returns helpful information for our specific use case.

Lastly, we create a helper function to combine the messages with the input query (that contains our list of papers) and format it for the model.

## Make Prompt object
def create_prompt(system_message, messages):
    prompt = system_message
    for message in messages:
        prompt += f"\n<|im_start|>{message['sender']}\n{message['text']}\n<|im_end|>"
    prompt += "\n<|im_start|>assistant\n"
    return prompt

prompt = create_prompt(system_message, messages)
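Putting the pieces together, the assembled prompt is just ChatML-delimited text. A self-contained sketch (with a shortened stand-in question) shows the shape the model receives:

```python
def create_prompt(system_message, messages):
    # Wrap each message in ChatML <|im_start|>/<|im_end|> tokens,
    # then leave an open "assistant" turn for the model to complete
    prompt = system_message
    for message in messages:
        prompt += f"\n<|im_start|>{message['sender']}\n{message['text']}\n<|im_end|>"
    prompt += "\n<|im_start|>assistant\n"
    return prompt

system_message = "<|im_start|>system\nYou are a helpful assistant that recommends scientific journal articles to read based on lists.\n<|im_end|>"
messages = [{"sender": "user", "text": "Recommend articles similar to: Paper A"}]

prompt = create_prompt(system_message, messages)
print(prompt)
```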

Using the OpenAI Python library, we can submit this prompt to the Completion part of the API.

Note that we could also tweak the temperature (randomness) and length of the output (number of tokens). The more tokens we use, the more Azure will charge us to use the API (a few cents/1k tokens).
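As a rough back-of-the-envelope (the per-token price below is an assumption for illustration; check your Azure pricing page for the actual rate):

```python
# Hypothetical rate: $0.002 per 1,000 tokens (verify against current Azure pricing)
price_per_1k_tokens = 0.002

prompt_tokens = 1500       # assumed size of our paper-list prompt
completion_tokens = 500    # matches the max_tokens setting we pass to the API
total_tokens = prompt_tokens + completion_tokens

cost = total_tokens / 1000 * price_per_1k_tokens
print(f"${cost:.4f}")  # $0.0040
```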

## Run OpenAI Code
response = openai.Completion.create(
    prompt=prompt,
    engine=chatgpt_model_name,
    temperature=0.5,
    max_tokens=500,
    top_p=0.9,
    frequency_penalty=0,
    presence_penalty=0,
    stop=['<|im_end|>']
)

answer = response["choices"][0].text

Running `streamlit run .\scipapersgpt.py` from your terminal will launch the SciPapers tool at localhost:8501.

Now you can import your articles from a CSV, tell the tool what the title of your working paper is (or just what your research project is about) and it will generate a few article recommendations for you to read!

Pretty cool, huh? I used this extensively while testing the app for this post, and it worked exceptionally well.

Imagine the other use cases for a tool like this. Using a few leading context prompts totally transforms the GPT model’s behavior to meet your needs.

Now it’s your turn. If you make something unique with the Azure OpenAI service, share it with me on LinkedIn!

Resources

Stay curious…


Colby T. Ford, PhD

Cloud genomics and AI guy and aspiring polymath. I am a recovering academic from machine learning and bioinformatics and I sometimes write things here.