TL;DR
In this article, we explore how to build a conversational AI agent using climate change data from the excellent Probable Futures API and the new OpenAI Assistants API. The AI agent can answer questions about how climate change might affect a specific location and also perform basic data analysis. AI assistants may be well suited for tasks like this, providing a promising channel for presenting complex data to non-technical users.
I was recently chatting with a neighbor about how climate change could affect us and how best to prepare homes for extreme weather events. There are some amazing websites that provide information related to this in the form of maps, but I wondered if sometimes people might just want to ask questions like “How will my home be affected by climate change?” and “What can I do about it?” and get a concise summary with tips on how to prepare. So I decided to explore some of the AI tools that have become available in recent weeks.
AI agents powered by large language models like GPT-4 are emerging as a way for people to interact with documents and data through conversation. These agents interpret what the person asks, call APIs and databases to obtain data, and generate and execute code to perform analysis, before presenting the results to the user. Frameworks like LangChain and AutoGen are leading the way, providing easy-to-deploy agent patterns. Recently, OpenAI joined the party with the release of GPTs as a no-code way to create agents, which I explored in this article. They are very well designed and open the way to a much wider audience, but they have some limitations. They require an API with an openapi.json specification, which means they do not currently support standards such as GraphQL. They also do not support the ability to register functions, which is to be expected in a no-code solution, but may limit their capabilities.
Enter OpenAI’s other recent release: the Assistants API.
The Assistants API (in beta) is a programmatic way to configure OpenAI Assistants that supports functions, web browsing, and knowledge retrieval from uploaded documents. Functions make a big difference compared to GPTs, as they allow for more complex interaction with external data sources. Functions are where large language models (LLMs) like GPT-4 recognize that some user input should result in a call to a code function. The LLM generates a response in JSON format with the exact parameters needed to call the function, which can then be executed locally. To see how they work in detail with OpenAI, see here.
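To make the pattern concrete, here is a minimal sketch of how a function call returned by the model might be executed locally. The get_weather function and the payload shapes are made up for illustration; this is not the real API response format, just the dispatch idea:

```python
import json

# Illustrative stand-in for a real data-fetching function
def get_weather(city, units="celsius"):
    return f"Weather for {city} in {units}"

# Registry mapping function names (as declared to the model) to local callables
available_functions = {"get_weather": get_weather}

def execute_function_call(name, arguments_json):
    """Look up the function the LLM asked for and call it with the
    JSON-encoded arguments the LLM generated."""
    func = available_functions[name]
    arguments = json.loads(arguments_json)
    return func(**arguments)

# The model might respond with name="get_weather", arguments='{"city": "London"}'
print(execute_function_call("get_weather", '{"city": "London"}'))
```

The function's result is then passed back to the LLM so it can compose its answer to the user.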
In order to create an AI agent that helps us prepare for climate change, we need a good source of climate change data and an API to extract that information. Any such resource must apply a rigorous approach to combining General Circulation Model (GCM) predictions.
Luckily, the folks at Probable Futures have done an amazing job!
Probable Futures is “a nonprofit climate literacy initiative that makes practical tools, stories, and resources available online to everyone, everywhere,” and provides a series of maps and data based on the CORDEX-CORE framework, a standardization of the outputs of the REMO2015 and RegCM4 regional climate models. (Side note: I am not affiliated with Probable Futures.)
Importantly, they provide a GraphQL API for accessing this data, which I was able to use after requesting an API key.
Based on the documentation, I created functions which I saved to a file assistant_tools.py
…
import os
import requests

pf_api_url = "https://graphql.probablefutures.org"
pf_token_audience = "https://graphql.probablefutures.com"
pf_token_url = "https://probablefutures.us.auth0.com/oauth/token"

def get_pf_token():
    client_id = os.getenv("CLIENT_ID")
    client_secret = os.getenv("CLIENT_SECRET")
    response = requests.post(
        pf_token_url,
        json={
            "client_id": client_id,
            "client_secret": client_secret,
            "audience": pf_token_audience,
            "grant_type": "client_credentials",
        },
    )
    access_token = response.json()["access_token"]
    return access_token

def get_pf_data(address, country, warming_scenario="1.5"):
    variables = {}
    location = f"""
        country: "{country}"
        address: "{address}"
    """
    query = (
        """
        mutation {
            getDatasetStatistics(input: { """
        + location
        + """ \
            warmingScenario: \"""" + warming_scenario + """\"
            }) {
                datasetStatisticsResponses{
                    datasetId
                    midValue
                    name
                    unit
                    warmingScenario
                    latitude
                    longitude
                    info
                }
            }
        }
        """
    )
    print(query)
    access_token = get_pf_token()
    url = pf_api_url + "/graphql"
    headers = {"Authorization": "Bearer " + access_token}
    response = requests.post(
        url, json={"query": query, "variables": variables}, headers=headers
    )
    return str(response.json())
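As an aside, building the mutation by string concatenation works, but GraphQL also supports passing values through the variables payload, which avoids quoting and escaping issues. A hypothetical variant is sketched below; the String! type declarations are an assumption about the Probable Futures schema and are untested here:

```python
# Same mutation sketched with GraphQL variables instead of string concatenation.
# The $-variable type declarations (String!) are assumptions about the schema.
query_with_variables = """
mutation ($country: String!, $address: String!, $warmingScenario: String!) {
    getDatasetStatistics(input: {
        country: $country
        address: $address
        warmingScenario: $warmingScenario
    }) {
        datasetStatisticsResponses {
            datasetId
            midValue
            name
            unit
        }
    }
}
"""

def build_payload(address, country, warming_scenario="1.5"):
    # This dict would be posted as json={"query": ..., "variables": ...}
    return {
        "query": query_with_variables,
        "variables": {
            "country": country,
            "address": address,
            "warmingScenario": warming_scenario,
        },
    }
```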
I intentionally excluded datasetId in order to retrieve all indicators, so that the AI agent has a wide range of information to work with.
The API is robust in that it accepts towns and cities as well as full addresses. For example …
get_pf_data(address="New Delhi", country="India", warming_scenario="1.5")
Returns a JSON record with climate change information for the location…
{'data': {'getDatasetStatistics': {'datasetStatisticsResponses': [{'datasetId': 40601, 'midValue': '17.0', 'name': 'Change in total annual precipitation', 'unit': 'mm', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40616, 'midValue': '14.0', 'name': 'Change in wettest 90 days', 'unit': 'mm', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40607, 'midValue': '19.0', 'name': 'Change in dry hot days', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40614, 'midValue': '0.0', 'name': 'Change in snowy days', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40612, 'midValue': '2.0', 'name': 'Change in frequency of “1-in-100-year” storm', 'unit': 'x as frequent', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40101, 'midValue': '28.0', 'name': 'Average temperature', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40901, 'midValue': '4.0', 'name': 'Climate zones', 'unit': 'class', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {'climateZoneName': 'Dry semi-arid (or steppe) hot'}}, {'datasetId': 40613, 'midValue': '49.0', 'name': 'Change in precipitation “1-in-100-year” storm', 'unit': 'mm', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40701, 'midValue': '7.0', 'name': 'Likelihood of year-plus extreme drought', 'unit': '%', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40702, 'midValue': '30.0', 'name': 'Likelihood of year-plus drought', 'unit': '%', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40704, 'midValue': '5.0', 'name': 'Change in wildfire danger days', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6,
'longitude': 77.2, 'info': {}}, {'datasetId': 40703, 'midValue': '-0.2', 'name': 'Change in water balance', 'unit': 'z-score', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40201, 'midValue': '21.0', 'name': 'Average nighttime temperature', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40205, 'midValue': '0.0', 'name': 'Freezing days', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40301, 'midValue': '71.0', 'name': 'Days above 26°C (78°F) wet-bulb', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40302, 'midValue': '24.0', 'name': 'Days above 28°C (82°F) wet-bulb', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40303, 'midValue': '2.0', 'name': 'Days above 30°C (86°F) wet-bulb', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40102, 'midValue': '35.0', 'name': 'Average daytime temperature', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40103, 'midValue': '49.0', 'name': '10 hottest days', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40104, 'midValue': '228.0', 'name': 'Days above 32°C (90°F)', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40105, 'midValue': '187.0', 'name': 'Days above 35°C (95°F)', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40106, 'midValue': '145.0', 'name': 'Days above 38°C (100°F)', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40202, 'midValue': '0.0', 'name': 'Frost nights', 'unit': 'nights', 'warmingScenario': '1.5',
'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40304, 'midValue': '0.0', 'name': 'Days above 32°C (90°F) wet-bulb', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40305, 'midValue': '29.0', 'name': '10 hottest wet-bulb days', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40203, 'midValue': '207.0', 'name': 'Nights above 20°C (68°F)', 'unit': 'nights', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40204, 'midValue': '147.0', 'name': 'Nights above 25°C (77°F)', 'unit': 'nights', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}]}}}
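As a side note, a response in this shape is easy to flatten for analysis, which is essentially what the assistant's code interpreter does later. A quick sketch using pandas, with two records abbreviated from the output above:

```python
import pandas as pd

# Two abbreviated records in the shape returned by getDatasetStatistics
sample_response = {
    "data": {
        "getDatasetStatistics": {
            "datasetStatisticsResponses": [
                {"datasetId": 40601, "midValue": "17.0",
                 "name": "Change in total annual precipitation", "unit": "mm"},
                {"datasetId": 40101, "midValue": "28.0",
                 "name": "Average temperature", "unit": "°C"},
            ]
        }
    }
}

# Flatten the list of indicator records into a DataFrame
records = sample_response["data"]["getDatasetStatistics"]["datasetStatisticsResponses"]
df = pd.DataFrame(records)
df["midValue"] = df["midValue"].astype(float)  # values arrive as strings
print(df[["name", "midValue", "unit"]])
```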
Next, we need to create the AI assistant using the beta API. There are some good resources in the documentation and also the very useful OpenAI Cookbook. However, being so new and in beta, there isn't that much information out there yet, so it was a bit of trial and error at times.
First, we need to set up the tools the assistant can use, such as the function to get climate change data. Following the documentation …
get_pf_data_schema = {
    "name": "get_pf_data",
    "parameters": {
        "type": "object",
        "properties": {
            "address": {
                "type": "string",
                "description": "The address of the location to get data for",
            },
            "country": {
                "type": "string",
                "description": "The country of location to get data for",
            },
            "warming_scenario": {
                "type": "string",
                "enum": ["1.0", "1.5", "2.0", "2.5", "3.0"],
                "description": "The warming scenario to get data for. Default is 1.5",
            },
        },
        "required": ["address", "country"],
    },
    "description": """
        This is the API call to the Probable Futures API to get predicted climate change indicators for a location
    """,
}
You’ll notice that we’ve provided text descriptions for each function parameter. From experimentation, the agent seems to use these when populating parameters, so be as clear as possible and note any idiosyncrasies so the LLM can adjust. From these we define the tools …
tools = [
    {
        "type": "function",
        "function": get_pf_data_schema,
    },
    {"type": "code_interpreter"},
]
You’ll notice that I included code_interpreter, giving the assistant the ability to run the code needed for data analysis.
Next, we must specify a set of instructions for the assistant (a system message). These are absolutely key to tailoring the assistant's performance to our task. Based on quick experimentation, I came up with this set …
instructions = """
"Hello, Climate Change Assistant. You help people understand how climate change will affect their homes"
"You will use Probable Futures Data to predict climate change indicators for a location"
"You will summarize perfectly the returned data"
"You will also provide links to local resources and websites to help the user prepare for the predicted climate change"
"If you don't have enough address information, request it"
"You default to warming scenario of 1.5 if not specified, but ask if the user wants to try others after presenting results"
"Group results into categories"
"Always link to the probable futures website for the location using URL and replacing LATITUDE and LONGITUDE with location values: https://probablefutures.org/maps/?selected_map=days_above_32c&map_version=latest&volume=heat&warming_scenario=1.5&map_projection=mercator#9.2/LATITUDE/LONGITUDE"
"GENERATE OUTPUT THAT IS CLEAR AND EASY TO UNDERSTAND FOR A NON-TECHNICAL USER"
"""
You can see that I added instructions for the assistant to provide resources, such as websites, to help users prepare for climate change. This is a bit open-ended; for a production assistant we would probably want tighter curation of these.
One wonderful thing now possible is that we can also instruct on general tone, in the above case requesting that output be clear to a non-technical user. Obviously, all of this requires some systematic prompt engineering, but it's interesting to note how we now "program" in part through persuasion.
OK, now that we have our tools and instructions, let's create the assistant …
import os
import asyncio
import sys

from openai import AsyncOpenAI
from dotenv import load_dotenv

load_dotenv()

api_key = os.environ.get("OPENAI_API_KEY")
assistant_id = os.environ.get("ASSISTANT_ID")
model = os.environ.get("MODEL")
client = AsyncOpenAI(api_key=api_key)
name = "Climate Change Assistant"

try:
    my_assistant = await client.beta.assistants.retrieve(assistant_id)
    print("Updating existing assistant ...")
    assistant = await client.beta.assistants.update(
        assistant_id,
        name=name,
        instructions=instructions,
        tools=tools,
        model=model,
    )
except Exception:
    print("Creating assistant ...")
    assistant = await client.beta.assistants.create(
        name=name,
        instructions=instructions,
        tools=tools,
        model=model,
    )
    print(assistant)
    print("Now save the ID in your .env file")
The above assumes that we have defined our API keys and agent ID in a .env file. The code first checks whether the agent exists using the ASSISTANT_ID in the .env file and updates it if so; otherwise, it creates a new agent, and the generated ID must be copied into the .env file. Without this, I was creating a LOT of assistants!
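For reference, the .env file would look something like this. The variable names come from the code above; the values shown are placeholders (the actual values are secret and specific to your accounts):

```
OPENAI_API_KEY=<your OpenAI API key>
ASSISTANT_ID=<set this after the assistant is first created>
MODEL=gpt-4-1106-preview
CLIENT_ID=<Probable Futures API client id>
CLIENT_SECRET=<Probable Futures API client secret>
```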
Once the assistant is created, it becomes visible in the OpenAI user interface, where it can be tried out in the Playground. Since most of the development and debugging related to function calls actually calling code, I didn't find the Playground very useful for this analysis, but it is nicely designed and could be useful in other work.
For this analysis, I decided to use the new GPT-4 Turbo model by setting model to "gpt-4-1106-preview".
We want to be able to create a full chatbot, so I started with this Chainlit cookbook example, tweaking it slightly to separate the agent code into a dedicated file and to access it via …
import assistant_tools as at
Chainlit is very concise and the user interface is easy to configure; you can find the application code here.
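At the heart of any Assistants API app is the run loop: when a run's status becomes requires_action, the API hands back the tool calls the model wants, our code executes the matching local function, and the outputs are submitted back via submit_tool_outputs. Below is a simplified, self-contained sketch of the dispatch step; for brevity it uses plain dicts where the real API returns objects with attributes, and the function registry is a stand-in for the real get_pf_data:

```python
import json

# Stand-in registry; in the real app this would map to get_pf_data
available_functions = {
    "get_pf_data": lambda address, country, warming_scenario="1.5": (
        f"climate data for {address}, {country} at {warming_scenario} degrees"
    )
}

def handle_tool_calls(tool_calls):
    """Execute each function the model requested and build the
    tool_outputs payload to submit back to the run."""
    outputs = []
    for call in tool_calls:
        name = call["function"]["name"]
        args = json.loads(call["function"]["arguments"])
        func = available_functions.get(name)
        result = func(**args) if func else f"Unknown function: {name}"
        outputs.append({"tool_call_id": call["id"], "output": result})
    return outputs
```

In the Chainlit app this dispatch sits inside a polling loop that retrieves the run, checks its status, and (assuming the beta Assistants API) submits the payload with client.beta.threads.runs.submit_tool_outputs.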
Putting it all together (see code here), we start the agent with a simple chainlit run app.py
…
Let’s ask for a location…
Note that in the above I intentionally misspelled Mombasa.
Then the agent sets to work, calling the API and processing the JSON response (this took about 20 seconds) …
Following our instructions, it then finishes with …
But is it correct?
Let’s call the API and check the result…
get_pf_data(address="Mombassa", country="Kenya", warming_scenario="1.5")
This queries the API with …
mutation {
getDatasetStatistics(input: {
country: "Kenya"
address: "Mombassa"
warmingScenario: "1.5"
}) {
datasetStatisticsResponses{
datasetId
midValue
name
unit
warmingScenario
latitude
longitude
info
}
}
}
This gives the following (truncated to show just a few)…
{
"data": {
"getDatasetStatistics": {
"datasetStatisticsResponses": [
{
"datasetId": 40601,
"midValue": "30.0",
"name": "Change in total annual precipitation",
"unit": "mm",
"warmingScenario": "1.5",
"latitude": -4,
"longitude": 39.6,
"info": {}
},
{
"datasetId": 40616,
"midValue": "70.0",
"name": "Change in wettest 90 days",
"unit": "mm",
"warmingScenario": "1.5",
"latitude": -4,
"longitude": 39.6,
"info": {}
},
{
"datasetId": 40607,
"midValue": "21.0",
"name": "Change in dry hot days",
"unit": "days",
"warmingScenario": "1.5",
"latitude": -4,
"longitude": 39.6,
"info": {}
},
{
"datasetId": 40614,
"midValue": "0.0",
"name": "Change in snowy days",
"unit": "days",
"warmingScenario": "1.5",
"latitude": -4,
"longitude": 39.6,
"info": {}
},
{
"datasetId": 40612,
"midValue": "1.0",
"name": "Change in frequency of \u201c1-in-100-year\u201d storm",
"unit": "x as frequent",
"warmingScenario": "1.5",
"latitude": -4,
"longitude": 39.6,
"info": {}
},
... etc
]
}
}
}
Spot-checking a few values, it appears that the agent captured them correctly and presented the user with an accurate summary.
The AI agent's presentation of information can be improved with some instructions.
One of these was to always generate a link to the map view on the Probable Futures website, which when clicked goes to the correct location …
Another instruction asked the agent to always prompt the user to try other warming scenarios. By default, the agent produces results for a predicted global temperature rise of 1.5°C, but we allow the user to explore other (and somewhat depressing) scenarios.
Since we gave the AI agent code interpreter skills, it should be able to execute Python code to perform basic data analysis. Let's try this.
I first asked how climate change would affect London and New York, to which the agent provided summaries. Then I asked …
This resulted in the agent using the code interpreter to generate and execute Python code to create a plot …
Not bad!
Using the Probable Futures API and an OpenAI assistant, we were able to create a conversational interface through which people can ask questions about climate change and get advice on how to prepare. The agent was able to make API calls as well as perform some basic data analysis. This offers another channel for climate awareness, one which may be more engaging for some non-technical users.
Of course, we could have developed a chatbot with intent/entity classification and hand-written code to handle the API, but this is more work and would need revisiting whenever the API changes or new APIs are added. By contrast, an LLM agent does a good job of interpreting user input and summarizing results with very limited development, and takes things to another level by being able to execute code and perform basic data analysis. Our particular use case seems especially well suited to an AI agent because the task is limited in scope.
There are some challenges, however: the technique is a bit slow (queries took 20-30 seconds to complete), and LLM token costs were not analyzed for this article and may be prohibitive.
That said, the OpenAI Assistants API is in beta and the agent was not tuned in any way, so with more work it is likely that common tasks, performance, and cost could be further optimized for this exciting new technique.
This article is based on data and other content made available by Probable Futures, a project of the SouthCoast Community Foundation, and some of that data may have been provided to Probable Futures by Woodwell Climate Research Center, Inc. or The Coordinated Regional Climate Downscaling Experiment (CORDEX).
The code for this analysis can be found here.
You can find more of my articles here.