Preparing for Climate Change with an AI Assistant
TL;DR
In this article, we explore how to create a conversational AI agent using climate change data from the excellent Probable Futures API and the new OpenAI Assistants API. The AI agent is able to answer questions about how climate might affect a specified location and also perform basic data analysis. AI assistants can be well-suited to tasks like this, providing a promising channel for presenting complex data to non-technical users.
I was recently chatting with a neighbor about how climate change might affect us and how best to prepare homes for extreme weather events. There are some amazing websites that provide information related to this in map form, but I wondered if sometimes people might simply want to ask questions like "How will my home be affected by climate change?" and "What can I do about it?" and get a concise summary with tips on how to prepare. So I decided to explore some of the AI tools made available in the last few weeks.
OpenAI's Assistants API
AI agents powered by large language models like GPT-4 are emerging as a way for people to interact with documents and data through conversation. These agents interpret what the person is asking, call APIs and databases to get data, and generate and run code to carry out analysis before presenting results back to the user. Brilliant frameworks like LangChain and AutoGen are leading the way, providing patterns for easily implementing agents. Recently, OpenAI joined the party with their launch of GPTs as a no-code way to create agents, which I explored in this article. These are designed very well and open the way for a much wider audience, but they do have a few limitations. They require an API with an openapi.json specification, which means they don't currently support standards such as GraphQL. They also don't support the ability to register functions, which is to be expected for a no-code solution but can limit their capabilities.
Enter OpenAI's other recent launch: the Assistants API.
The Assistants API (in beta) is a programmatic way to configure OpenAI Assistants, with support for functions, code interpretation, and knowledge retrieval from uploaded documents. Functions are a big difference compared to GPTs, as they enable more complex interaction with external data sources. With functions, Large Language Models (LLMs) like GPT-4 are made aware that some user input should result in a call to a code function. The LLM generates a response in JSON format with the exact parameters needed to call the function, which our own code can then use to execute the function locally. For details on how functions work with OpenAI, see here.
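To make that flow concrete, here is a minimal sketch of the local side of a function call: the model names a function and supplies its arguments as a JSON string, and our code looks the function up and runs it. The get_weather function, the registry, and the example payload are all hypothetical and made up for illustration; the real payload returned by OpenAI carries more fields than shown here.

```python
import json


def get_weather(city, units="metric"):
    """Hypothetical local function the LLM can ask us to call."""
    return f"Weather for {city} in {units} units"


# Registry mapping the function names the model may use to local callables
AVAILABLE_FUNCTIONS = {"get_weather": get_weather}


def dispatch_function_call(name, arguments_json):
    """Run the local function named by the model with its JSON-encoded arguments."""
    if name not in AVAILABLE_FUNCTIONS:
        raise ValueError(f"Unknown function requested: {name}")
    arguments = json.loads(arguments_json)
    return AVAILABLE_FUNCTIONS[name](**arguments)


# Illustrative stand-in for what the model might return
example_call = {"name": "get_weather", "arguments": '{"city": "New Delhi"}'}
result = dispatch_function_call(example_call["name"], example_call["arguments"])
print(result)
```

Keeping an explicit registry means the model can only trigger functions we have chosen to expose, rather than arbitrary code.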
A Comprehensive API for Climate Change – Probable Futures
To create an AI agent that helps with preparing for climate change, we need a good source of climate change data and an API to extract that information. Any such resource must apply a rigorous approach to combining General Circulation Model (GCM) predictions.
Luckily, the folks at Probable Futures have done an amazing job!

Probable Futures is "a non-profit climate literacy initiative that makes practical tools, stories, and resources available online to everyone, everywhere", and they provide a series of maps and data based on the CORDEX-CORE framework, a standardization for climate model output from the REMO2015 and RegCM4 regional climate models. (Side note: I am not affiliated with Probable Futures.)
Importantly, they provide a GraphQL API for accessing this data, which I could use after requesting an API key.
Based on the documentation, I created functions which I saved into a file, assistant_tools.py:
…
import os

import requests

pf_api_url = "https://graphql.probablefutures.org"
pf_token_audience = "https://graphql.probablefutures.com"
pf_token_url = "https://probablefutures.us.auth0.com/oauth/token"


def get_pf_token():
    """Exchange the client credentials for a Probable Futures access token."""
    client_id = os.getenv("CLIENT_ID")
    client_secret = os.getenv("CLIENT_SECRET")
    response = requests.post(
        pf_token_url,
        json={
            "client_id": client_id,
            "client_secret": client_secret,
            "audience": pf_token_audience,
            "grant_type": "client_credentials",
        },
    )
    access_token = response.json()["access_token"]
    return access_token


def get_pf_data(address, country, warming_scenario="1.5"):
    """Query climate indicators for a location and return them as a string."""
    variables = {}

    location = f"""
        country: "{country}"
        address: "{address}"
    """

    # The warming scenario must be wrapped in double quotes inside the
    # GraphQL query, hence the escaped quotes around the concatenation.
    query = (
        """
        mutation {
            getDatasetStatistics(input: { """
        + location
        + """
                warmingScenario: \""""
        + warming_scenario
        + """\"
            }) {
                datasetStatisticsResponses{
                    datasetId
                    midValue
                    name
                    unit
                    warmingScenario
                    latitude
                    longitude
                    info
                }
            }
        }
        """
    )
    print(query)

    access_token = get_pf_token()
    url = pf_api_url + "/graphql"
    headers = {"Authorization": "Bearer " + access_token}
    response = requests.post(
        url, json={"query": query, "variables": variables}, headers=headers
    )
    # Return a string so the result can be passed straight back to the LLM
    return str(response.json())
I intentionally left datasetId out of the query input in order to retrieve all indicators, so that the AI agent has a wide range of information to work with.
The API is robust in that it accepts towns and cities as well as full addresses. For example …
get_pf_data(address="New Delhi", country="India", warming_scenario="1.5")
Returns a JSON record with climate change information for the location …
{'data': {'getDatasetStatistics': {'datasetStatisticsResponses': [{'datasetId': 40601, 'midValue': '17.0', 'name': 'Change in total annual precipitation', 'unit': 'mm', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40616, 'midValue': '14.0', 'name': 'Change in wettest 90 days', 'unit': 'mm', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40607, 'midValue': '19.0', 'name': 'Change in dry hot days', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40614, 'midValue': '0.0', 'name': 'Change in snowy days', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40612, 'midValue': '2.0', 'name': 'Change in frequency of "1-in-100-year" storm', 'unit': 'x as frequent', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40101, 'midValue': '28.0', 'name': 'Average temperature', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40901, 'midValue': '4.0', 'name': 'Climate zones', 'unit': 'class', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {'climateZoneName': 'Dry semi-arid (or steppe) hot'}}, {'datasetId': 40613, 'midValue': '49.0', 'name': 'Change in precipitation "1-in-100-year" storm', 'unit': 'mm', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40701, 'midValue': '7.0', 'name': 'Likelihood of year-plus extreme drought', 'unit': '%', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40702, 'midValue': '30.0', 'name': 'Likelihood of year-plus drought', 'unit': '%', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40704, 'midValue': '5.0', 'name': 'Change in wildfire danger days', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 
'longitude': 77.2, 'info': {}}, {'datasetId': 40703, 'midValue': '-0.2', 'name': 'Change in water balance', 'unit': 'z-score', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40201, 'midValue': '21.0', 'name': 'Average nighttime temperature', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40205, 'midValue': '0.0', 'name': 'Freezing days', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40301, 'midValue': '71.0', 'name': 'Days above 26°C (78°F) wet-bulb', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40302, 'midValue': '24.0', 'name': 'Days above 28°C (82°F) wet-bulb', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40303, 'midValue': '2.0', 'name': 'Days above 30°C (86°F) wet-bulb', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40102, 'midValue': '35.0', 'name': 'Average daytime temperature', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40103, 'midValue': '49.0', 'name': '10 hottest days', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40104, 'midValue': '228.0', 'name': 'Days above 32°C (90°F)', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40105, 'midValue': '187.0', 'name': 'Days above 35°C (95°F)', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40106, 'midValue': '145.0', 'name': 'Days above 38°C (100°F)', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40202, 'midValue': '0.0', 'name': 'Frost nights', 'unit': 'nights', 'warmingScenario': '1.5', 
'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40304, 'midValue': '0.0', 'name': 'Days above 32°C (90°F) wet-bulb', 'unit': 'days', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40305, 'midValue': '29.0', 'name': '10 hottest wet-bulb days', 'unit': '°C', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40203, 'midValue': '207.0', 'name': 'Nights above 20°C (68°F)', 'unit': 'nights', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}, {'datasetId': 40204, 'midValue': '147.0', 'name': 'Nights above 25°C (77°F)', 'unit': 'nights', 'warmingScenario': '1.5', 'latitude': 28.6, 'longitude': 77.2, 'info': {}}]}}}
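The raw response is long, so it can help to condense it before reading it yourself. Below is a minimal sketch, assuming the string has the shape shown above. The summarize_pf_response helper is hypothetical and not part of the original assistant_tools.py; the sample uses two records copied from the New Delhi output.

```python
import ast


def summarize_pf_response(response_text):
    """Map each indicator name to its 'value unit' string."""
    # get_pf_data returns str(dict), which is a Python literal rather than
    # strict JSON, so ast.literal_eval is used instead of json.loads.
    payload = ast.literal_eval(response_text)
    records = payload["data"]["getDatasetStatistics"]["datasetStatisticsResponses"]
    return {r["name"]: f"{r['midValue']} {r['unit']}" for r in records}


# Two records taken from the New Delhi response shown above
sample = str(
    {
        "data": {
            "getDatasetStatistics": {
                "datasetStatisticsResponses": [
                    {"datasetId": 40101, "midValue": "28.0", "name": "Average temperature",
                     "unit": "°C", "warmingScenario": "1.5", "latitude": 28.6,
                     "longitude": 77.2, "info": {}},
                    {"datasetId": 40202, "midValue": "0.0", "name": "Frost nights",
                     "unit": "nights", "warmingScenario": "1.5", "latitude": 28.6,
                     "longitude": 77.2, "info": {}},
                ]
            }
        }
    }
)

summary = summarize_pf_response(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```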
Creating an OpenAI Assistant
Next, we need to build the AI assistant using the beta API. There are some good resources in the documentation and also the very useful OpenAI Cookbook. However, being so new and in beta, there isn't that much information around yet so at times it was a bit of trial and error.
First, we need to configure tools the assistant can use such as the function to get climate change data. Following the documentation …
get_pf_data_schema = {
    "name": "get_pf_data",
    "parameters": {
        "type": "object",
        "properties": {
            "address": {
                "type": "string",
                "description": ("The address of the location to get data for"),
            },
            "country": {
                "type": "string",
                "description": ("The country of location to get data for"),
            },
            "warming_scenario": {
                "type": "string",
                "enum": ["1.0", "1.5", "2.0", "2.5", "3.0"],
                "description": ("The warming scenario to get data for. Default is 1.5"),
            },
        },
        "required": ["address", "country"],
    },
    "description": """
        This is the API call to the probable futures API to get predicted climate change indicators for a location
    """,
}
You'll notice we've provided text descriptions for each parameter in the function. From experimentation, these seem to be used by the agent when populating parameters, so take care to be as clear as possible and to note any idiosyncrasies so the LLM can adjust. From this we define the tools …
tools = [
    {
        "type": "function",
        "function": get_pf_data_schema,
    },
    {"type": "code_interpreter"},
]
You'll notice I left code_interpreter in, giving the assistant the ability to run code needed for data analysis.
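Because the model generates the arguments for get_pf_data from the schema, it can be worth checking them before making the API call. The following is a rough sketch of such a check, not something the Assistants API requires. The validate_arguments helper is hypothetical, and the schema here is a trimmed copy of get_pf_data_schema so the example runs on its own.

```python
def validate_arguments(schema, arguments):
    """Return a list of problems found in model-generated arguments (empty if none)."""
    params = schema["parameters"]
    problems = []
    # Every required parameter must be present
    for name in params.get("required", []):
        if name not in arguments:
            problems.append(f"missing required parameter: {name}")
    # Every supplied parameter must be known and respect any enum
    for name, value in arguments.items():
        spec = params["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected parameter: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems


# Trimmed copy of get_pf_data_schema (descriptions omitted)
schema = {
    "name": "get_pf_data",
    "parameters": {
        "type": "object",
        "properties": {
            "address": {"type": "string"},
            "country": {"type": "string"},
            "warming_scenario": {
                "type": "string",
                "enum": ["1.0", "1.5", "2.0", "2.5", "3.0"],
            },
        },
        "required": ["address", "country"],
    },
}

good = validate_arguments(schema, {"address": "New Delhi", "country": "India"})
bad = validate_arguments(schema, {"address": "New Delhi", "warming_scenario": "4.0"})
print(good)
print(bad)
```

Any problems found could be returned to the assistant as the function output, giving it a chance to ask the user for the missing detail.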
Next, we need to specify a set of instructions (a system prompt). These are absolutely key in tailoring the assistant's performance to our task. Based on some quick experimentation I arrived at this set …
instructions = """
"Hello, Climate Change Assistant. You help people understand how climate change will affect their homes"
"You will use Probable Futures Data to predict climate change indicators for a location"
"You will summarize perfectly the returned data"
"You will also provide links to local resources and websites to help the user prepare for the predicted climate change"
"If you don't have enough address information, request it"
"You default to warming scenario of 1.5 if not specified, but ask if the user wants to try others after presenting results"
"Group results into categories"
"Always link to the probable futures website for the location using URL and replacing LATITUDE and LONGITUDE with location values: https://probablefutures.org/maps/?selected_map=days_above_32c&map_version=latest&volume=heat&warming_scenario=1.5&map_projection=mercator#9.2/LATITUDE/LONGITUDE"
"GENERATE OUTPUT THAT IS CLEAR AND EASY TO UNDERSTAND FOR A NON-TECHNICAL USER"
"""
You can see I've added instructions for the assistant to provide resources such as websites to help users prepare for climate change. This is a bit 'open'; for a production assistant we'd probably want tighter curation of this.
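One small step toward tighter control, sketched here under assumptions, would be to build the Probable Futures map link in code rather than asking the model to substitute the coordinates. The build_map_url helper is hypothetical. The URL pattern is copied from the instructions; making the warming_scenario query parameter configurable is my assumption, since the instructions fix it at 1.5.

```python
# URL pattern taken from the assistant instructions, with placeholders
MAP_URL_TEMPLATE = (
    "https://probablefutures.org/maps/?selected_map=days_above_32c"
    "&map_version=latest&volume=heat&warming_scenario={warming_scenario}"
    "&map_projection=mercator#9.2/{latitude}/{longitude}"
)


def build_map_url(latitude, longitude, warming_scenario="1.5"):
    """Fill the map URL template with a location and warming scenario."""
    return MAP_URL_TEMPLATE.format(
        latitude=latitude,
        longitude=longitude,
        warming_scenario=warming_scenario,
    )


# Coordinates from the New Delhi response
url = build_map_url(28.6, 77.2)
print(url)
```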
One wonderful thing that's now possible is that we can also give instructions about general tone, in the above case requesting that output is clear to a non-technical user. Obviously, all of this needs some systematic prompt engineering, but it's interesting to note how we now 'program' in part through persuasion.