My Telegram calendar bot converts natural language inputs into calendar invites. It currently uses GPT-3.5 Turbo and I wanted to see how the recently launched GPT-4o mini compares. Not only is the new model 60% cheaper, but it should also be more intelligent. Here’s a breakdown of the journey and the notable differences between these models.
The Evolution
- Version 1: Basic hardcoded text parser with limited functionality. It worked quite well, and I’m still using it as a fallback when the API returns errors.
- Version 2: Upgraded to GPT-3.5 Turbo. It can now extract event titles from pretty much anything. However, getting relative dates right required a lot of hacking.
- Version 3: The new GPT-4o mini.
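The fallback arrangement from Version 1 could look roughly like the sketch below. The post does not show the bot's actual parser, so `parse_with_rules`, `parse_event`, and the `llm_parser` callable are hypothetical names, and the rules (an `HH:MM` time plus "today"/"tomorrow") are only an assumed minimal subset.

```python
import re
from datetime import date, timedelta


def parse_with_rules(text, today):
    """Hypothetical rule-based parser: understands 'HH:MM' and 'today'/'tomorrow' only."""
    time_match = re.search(r"\b([01]?\d|2[0-3]):([0-5]\d)\b", text)
    day = today + timedelta(days=1) if "tomorrow" in text.lower() else today
    # Whatever remains after removing the time and day words becomes the title.
    title = re.sub(r"\b(\d{1,2}:\d{2}|today|tomorrow)\b", "", text, flags=re.IGNORECASE)
    return {
        "date": day.isoformat(),
        "time": f"{int(time_match.group(1)):02d}:{time_match.group(2)}" if time_match else None,
        "event_title": " ".join(title.split()),
    }


def parse_event(text, today, llm_parser):
    """Try the model first; fall back to the rule-based parser on any failure."""
    try:
        result = llm_parser(text, today)
        if result:
            return result
    except Exception:
        pass  # API error, timeout, invalid JSON, and so on
    return parse_with_rules(text, today)
```

Passing the model call in as `llm_parser` keeps the fallback logic testable without network access.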
Parsing dates
Let’s examine how these models handle relative time without the additional tweaks I used for GPT-3.5 Turbo.
Example JSON Output:
{
"date": "2022-01-01",
"time": "14:00",
"event_title": "Lunch with friends",
"event_description": "",
"event_location": "Sushi Töölö",
"duration_in_minutes": 60
}
- This JSON output is used to generate a calendar event that's shared with people.
- The date key can only be of the following format: "2022-01-01".
- Estimate event duration
- Only add description if it provides extra context
Current date: 2024-07-19
Input: EMMA museum 12:00 today
JSON Output:
GPT-3.5 Turbo
{
"date": "2024-07-21",
"time": "12:00",
"event_title": "EMMA museum visit",
"event_description": "",
"event_location": "EMMA museum",
"duration_in_minutes": 120
}
GPT-4o mini
{
"date": "2024-07-20",
"time": "12:00",
"event_title": "Visit to EMMA Museum",
"event_description": "A day out at the Espoo Museum of Modern Art",
"event_location": "EMMA Museum",
"duration_in_minutes": 120
}
The difference is drastic. GPT-3.5 Turbo misinterpreted the date (Sunday, not Saturday) and provided minimal event details. The new model easily handles relative days and even added a helpful event description.
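The weekdays behind these dates are easy to confirm with the standard library. This is only a quick check of the three dates that appear above; the `weekday_name` helper is introduced here for illustration.

```python
from datetime import date

# Index matches date.weekday(): 0 is Monday, 6 is Sunday.
WEEKDAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday",
]


def weekday_name(day):
    """Return the English weekday name, independent of the system locale."""
    return WEEKDAY_NAMES[day.weekday()]


print(weekday_name(date(2024, 7, 19)))  # current date in the prompt: Friday
print(weekday_name(date(2024, 7, 21)))  # GPT-3.5 Turbo's answer: Sunday
print(weekday_name(date(2024, 7, 20)))  # GPT-4o mini's answer: Saturday
```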
Verifying results with random dates
import json
import random
from datetime import datetime, timedelta

from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()


def get_random_date():
    """Generate a random date in 2024."""
    # 2024 is a leap year, so offsets 0..365 cover January 1 to December 31.
    return datetime(2024, 1, 1) + timedelta(days=random.randint(0, 365))


def format_date(date):
    """Format the date as YYYY-MM-DD."""
    return date.strftime("%Y-%m-%d")


def call_gpt(model, prompt):
    """Call the OpenAI API and return the parsed JSON, or None if the reply is not valid JSON."""
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {
                    "role": "system",
                    "content": "You are a helpful assistant that creates calendar events from natural language input.",
                },
                {"role": "user", "content": prompt},
            ],
            temperature=0,
        )
        return json.loads(response.choices[0].message.content)
    except (json.JSONDecodeError, TypeError):
        # TypeError covers an empty (None) message content.
        return None


def parse_response_date(response):
    """Return the response's date as a datetime, or None if it is missing or malformed."""
    if not isinstance(response, dict) or "date" not in response:
        return None
    try:
        return datetime.strptime(response["date"], "%Y-%m-%d")
    except (TypeError, ValueError):
        return None


def check_response(response):
    """Check that the response contains a valid date that falls on a Friday.

    Note: this does not verify that it is the Friday closest to the current date.
    """
    response_date = parse_response_date(response)
    # weekday() returns 4 for Friday (0 is Monday).
    return response_date is not None and response_date.weekday() == 4


def main():
    models = {"gpt-3.5-turbo": "GPT-3.5-Turbo", "gpt-4o-mini": "GPT-4o-Mini"}
    stats = {
        "total": 0,
        "gpt-3.5-turbo": 0,
        "gpt-4o-mini": 0,
    }

    for _ in range(100):
        random_date = get_random_date()
        # Use format_date so the prompt shows "2024-07-19" rather than "2024-07-19 00:00:00".
        prompt = f"""
Example JSON Output:
{{
"date": "2022-01-01",
"time": "14:00",
"event_title": "Lunch with friends",
"event_description": "",
"event_location": "Sushi Töölö",
"duration_in_minutes": 60
}}
- This JSON output is used to generate a calendar event that's shared with people.
- The date key can only be of the following format: "2022-01-01".
- Estimate event duration
- Only add description if it provides extra context
Current date: {format_date(random_date)}
Input: EMMA museum 12:00 Friday
JSON Output:"""
        stats["total"] += 1

        for model_id, model_name in models.items():
            response = call_gpt(model_id, prompt)
            is_correct = check_response(response)
            response_date = parse_response_date(response)
            weekday_name = response_date.strftime("%A") if response_date else None
            if is_correct:
                stats[model_id] += 1
            print(f"- {model_name}: {is_correct} - {weekday_name} - {response_date}")

    # Print the share of Friday answers per model.
    print(f"@ Stats ({stats['total']} total):")
    for model_id, model_name in models.items():
        score_percentage = round(stats[model_id] / stats["total"] * 100, 2)
        print(f"@ {model_name}: {score_percentage}%")


if __name__ == "__main__":
    main()
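The check in the script only confirms that the returned date is a Friday, so a Friday several weeks away would also count as correct. A stricter variant could compare against the expected date directly. The sketch below is not part of the original test: `upcoming_weekday` and `is_upcoming_friday` are hypothetical helpers, and it assumes that "Friday" means the first Friday on or after the current date (so on a Friday it means today).

```python
from datetime import date, timedelta

FRIDAY = 4  # date.weekday() value for Friday (0 is Monday)


def upcoming_weekday(current, weekday):
    """Return the first date on or after `current` that falls on `weekday`."""
    return current + timedelta(days=(weekday - current.weekday()) % 7)


def is_upcoming_friday(response_date_str, current):
    """True only if the model's date string is exactly the upcoming Friday."""
    try:
        parsed = date.fromisoformat(response_date_str)
    except (TypeError, ValueError):
        return False  # missing or malformed date
    return parsed == upcoming_weekday(current, FRIDAY)
```

Scores under this check can only be equal to or lower than the Friday-only check, since every date it accepts is also a Friday.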
I might have celebrated too soon. I tried this again with random dates and while the results are better, they are not as accurate as I hoped:
Model | Accuracy |
---|---|
GPT-3.5-Turbo | 54.05% |
GPT-4o-Mini | 87.84% |
So we do have to give the model a little hint. If we add “Current weekdate: {random_date_weekday_humanized}” to the prompt, the results improve considerably. The results below are after 100 calls.
Model | Accuracy |
---|---|
GPT-3.5-Turbo | 90.0% |
GPT-4o-Mini | 100% |
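Building the weekday hint takes only a few lines. This is a minimal sketch of how the two context lines could be generated; `date_context` and `WEEKDAY_NAMES` are names made up for the example, and the weekday names are spelled out explicitly so the result does not depend on the system locale.

```python
from datetime import datetime

# Index matches datetime.weekday(): 0 is Monday, 6 is Sunday.
WEEKDAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday",
]


def date_context(current):
    """Return the date and weekday lines to place in the prompt."""
    weekday = WEEKDAY_NAMES[current.weekday()]
    return f"Current date: {current:%Y-%m-%d}\nCurrent weekdate: {weekday}"


print(date_context(datetime(2024, 7, 19)))
# Current date: 2024-07-19
# Current weekdate: Friday
```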
Previous Workaround for GPT-3.5 Turbo
For reference, here’s the extra prompt previously required for GPT-3.5-Turbo:
- The date key can only be of the following format: "TODAY", "TOMORROW", "MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY", "SUNDAY", "2022-01-01". Prefer exact date if it's provided in the input. Use the current year (%s).
Additional Python code was necessary to convert relative dates to actual dates, a step no longer required with GPT-4o-Mini.
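The conversion code itself is not shown here, so the following is only a sketch of what such a step might look like. `resolve_date` is a hypothetical helper, and it assumes that a weekday keyword refers to its next occurrence, counting today.

```python
from datetime import date, timedelta

# Index matches date.weekday(): 0 is Monday, 6 is Sunday.
WEEKDAYS = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY", "SUNDAY"]


def resolve_date(value, today):
    """Convert a keyword or exact date from the model into a YYYY-MM-DD string."""
    key = value.strip().upper()
    if key == "TODAY":
        return today.isoformat()
    if key == "TOMORROW":
        return (today + timedelta(days=1)).isoformat()
    if key in WEEKDAYS:
        # Assumption: a weekday name means its next occurrence, today included.
        offset = (WEEKDAYS.index(key) - today.weekday()) % 7
        return (today + timedelta(days=offset)).isoformat()
    # Anything else must already be an exact date; raises ValueError if it is not.
    return date.fromisoformat(key).isoformat()
```

For example, with the current date set to 2024-07-19 (a Friday), `resolve_date("SATURDAY", date(2024, 7, 19))` returns `"2024-07-20"`.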