Large language models (LLMs) are great at generating text, but getting a structured result like JSON usually requires clever prompting and hoping the LLM understands. Fortunately, JSON mode is becoming more and more common in LLM frameworks and services. It allows you to define the exact output schema you want.
This post discusses constrained generation using JSON mode. We will work through a complex, nested, realistic JSON schema example to guide LLM frameworks/APIs like Llama.cpp or the Gemini API to generate structured data, specifically tourist location information. This builds on a previous post on constrained generation using Guidance, but focuses on the more widely adopted JSON mode.
Although JSON mode is more limited than Guidance, its broader support makes it more accessible, especially with cloud-based LLM providers.
During a personal project, I found that while JSON mode was straightforward with Llama.cpp, getting it to work with the Gemini API required a few extra steps. In this post, I share those solutions to help you use JSON mode effectively.
Our example schema represents a TouristLocation. It is a non-trivial structure with nested objects, lists, enums, and various data types such as strings and numbers.
Here is a simplified version:
{
  "name": "string",
  "location_long_lat": ["number", "number"],
  "climate_type": {"type": "string", "enum": ["tropical", "desert", "temperate", "continental", "polar"]},
  "activity_types": ["string"],
  "attraction_list": [
    {
      "name": "string",
      "description": "string"
    }
  ],
  "tags": ["string"],
  "description": "string",
  "most_notably_known_for": "string",
  "location_type": {"type": "string", "enum": ["city", "country", "establishment", "landmark", "national park", "island", "region", "continent"]},
  "parents": ["string"]
}
You can write this kind of schema by hand, or you can generate it using the Pydantic library. Here is how to do it with a simplified example:
from typing import List

from pydantic import BaseModel, Field


class TouristLocation(BaseModel):
    """Model for a tourist location"""

    high_season_months: List[int] = Field(
        [], description="List of months (1-12) when the location is most visited"
    )
    tags: List[str] = Field(
        ...,
        description="List of tags describing the location (e.g. accessible, sustainable, sunny, cheap, pricey)",
        min_length=1,
    )
    description: str = Field(..., description="Text description of the location")


# Example usage and schema output
location = TouristLocation(
    high_season_months=[6, 7, 8],
    tags=["beach", "sunny", "family-friendly"],
    description="A beautiful beach with white sand and clear blue water.",
)

schema = location.model_json_schema()
print(schema)
This code defines a simplified version of the TouristLocation data class using Pydantic. It has three fields:
- high_season_months: a list of integers representing the months of the year (1 to 12) in which the location is most visited. The default is an empty list.
- tags: a list of strings describing the location with labels such as "accessible" or "sustainable". This field is required (...) and must have at least one element (min_length=1).
- description: a string field containing a textual description of the location. This field is also required.
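To see these constraints in action, here is a quick standalone check (a sketch assuming Pydantic v2; the class is repeated so the snippet runs on its own):

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


class TouristLocation(BaseModel):
    """Simplified model for a tourist location"""

    high_season_months: List[int] = Field(
        [], description="List of months (1-12) when the location is most visited"
    )
    tags: List[str] = Field(..., min_length=1, description="Tags for the location")
    description: str = Field(..., description="Text description of the location")


# A valid instance: tags has at least one element, description is present
ok = TouristLocation(tags=["beach"], description="A sunny beach.")
print(ok.high_season_months)  # defaults to []

# An empty tags list violates min_length=1 and is rejected
try:
    TouristLocation(tags=[], description="A sunny beach.")
    raised = False
except ValidationError:
    raised = True
print(raised)  # True
```

This is exactly the kind of guarantee the generated schema will later enforce on the LLM side.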
The code then creates an instance of the TouristLocation class and calls model_json_schema() to get the JSON schema representation of the model. This schema defines the structure and data types expected for this class.
model_json_schema() returns:
{'description': 'Model for a tourist location',
 'properties': {'description': {'description': 'Text description of the '
                                               'location',
                                'title': 'Description',
                                'type': 'string'},
                'high_season_months': {'default': [],
                                       'description': 'List of months (1-12) '
                                                      'when the location is '
                                                      'most visited',
                                       'items': {'type': 'integer'},
                                       'title': 'High Season Months',
                                       'type': 'array'},
                'tags': {'description': 'List of tags describing the location '
                                        '(e.g. accessible, sustainable, sunny, '
                                        'cheap, pricey)',
                         'items': {'type': 'string'},
                         'minItems': 1,
                         'title': 'Tags',
                         'type': 'array'}},
 'required': ['tags', 'description'],
 'title': 'TouristLocation',
 'type': 'object'}
Now that we have our schema, let's see how we can implement it: first with Llama.cpp through its Python wrapper, then using the Gemini API.
Llama.cpp is a C++ library for running Llama models locally. It's great for beginners and has an active community. We'll be using it through its Python wrapper.
Here's how to generate TouristLocation data with it:
# Imports
import time

from llama_cpp import Llama

# Model init:
checkpoint = "lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF"

model = Llama.from_pretrained(
    repo_id=checkpoint,
    n_gpu_layers=-1,
    filename="*Q4_K_M.gguf",
    verbose=False,
    n_ctx=12_000,
)

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant that outputs in JSON."
        f" Follow this schema {TouristLocation.model_json_schema()}",
    },
    {"role": "user", "content": "Generate information about Hawaii, US."},
    {"role": "assistant", "content": f"{location.model_dump_json()}"},
    {"role": "user", "content": "Generate information about Casablanca"},
]

response_format = {
    "type": "json_object",
    "schema": TouristLocation.model_json_schema(),
}

start = time.time()

outputs = model.create_chat_completion(
    messages=messages, max_tokens=1200, response_format=response_format
)

print(outputs["choices"][0]["message"]["content"])
print(f"Time: {time.time() - start}")
The code first imports the necessary libraries and initializes the LLM. It then defines a list of messages for a conversation with the model: a system message telling the model to generate output in JSON format according to a specific schema, user requests for information about Hawaii and Casablanca, and an example assistant response that follows the specified schema.
Llama.cpp uses context-free grammars to constrain the structure and generate valid JSON output for a new city.
As output, we get the following generated string:
{'activity_types': ['shopping', 'food and wine', 'cultural'],
 'attraction_list': [{'description': 'One of the largest mosques in the world '
                                     'and a symbol of Moroccan architecture',
                      'name': 'Hassan II Mosque'},
                     {'description': 'A historic walled city with narrow '
                                     'streets and traditional shops',
                      'name': 'Old Medina'},
                     {'description': 'A historic square with a beautiful '
                                     'fountain and surrounding buildings',
                      'name': 'Mohammed V Square'},
                     {'description': 'A beautiful Catholic cathedral built in '
                                     'the early 20th century',
                      'name': 'Casablanca Cathedral'},
                     {'description': 'A scenic waterfront promenade with '
                                     'beautiful views of the city and the sea',
                      'name': 'Corniche'}],
 'climate_type': 'temperate',
 'description': 'A large and bustling city with a rich history and culture',
 'location_type': 'city',
 'most_notably_known_for': 'Its historic architecture and cultural '
                           'significance',
 'name': 'Casablanca',
 'parents': ['Morocco', 'Africa'],
 'tags': ['city', 'cultural', 'historical', 'expensive']}
Which can then be parsed into an instance of our Pydantic class.
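For example, validating a raw JSON string into the simplified model from earlier might look like this (a sketch assuming Pydantic v2; the class is repeated so the snippet is self-contained, and the JSON payload is a made-up stand-in for the LLM's output):

```python
import json
from typing import List

from pydantic import BaseModel, Field


class TouristLocation(BaseModel):
    """Simplified model for a tourist location"""

    high_season_months: List[int] = Field([], description="Months (1-12)")
    tags: List[str] = Field(..., min_length=1, description="Tags")
    description: str = Field(..., description="Text description")


# A raw JSON string, as a constrained LLM would return it
raw = json.dumps(
    {
        "high_season_months": [6, 7, 8],
        "tags": ["beach", "sunny"],
        "description": "A beautiful beach.",
    }
)

# Pydantic v2: validate and parse the JSON string in one step
loc = TouristLocation.model_validate_json(raw)
print(loc.tags)  # ['beach', 'sunny']
```

If the model's output violates the schema, model_validate_json raises a ValidationError instead of silently producing bad data.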
The Gemini API, the LLM service run by Google, states in its documentation that it offers limited JSON mode support for Gemini Flash 1.5. However, it can be made to work with some tweaks.
Here are the general instructions to get it working:
schema = TouristLocation.model_json_schema()
schema = replace_value_in_dict(schema.copy(), schema.copy())
del schema["$defs"]

delete_keys_recursive(schema, key_to_delete="title")
delete_keys_recursive(schema, key_to_delete="location_long_lat")
delete_keys_recursive(schema, key_to_delete="default")
delete_keys_recursive(schema, key_to_delete="minItems")

print(schema)
messages = [
    ContentDict(
        role="user",
        parts=[
            "You are a helpful assistant that outputs in JSON."
            f" Follow this schema {TouristLocation.model_json_schema()}"
        ],
    ),
    ContentDict(role="user", parts=["Generate information about Hawaii, US."]),
    ContentDict(role="model", parts=[f"{location.model_dump_json()}"]),
    ContentDict(role="user", parts=["Generate information about Casablanca"]),
]
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
# Using `response_mime_type` with `response_schema` requires a Gemini 1.5 Pro model
model = genai.GenerativeModel(
"gemini-1.5-flash",
# Set the `response_mime_type` to output JSON
# Pass the schema object to the `response_schema` field
generation_config={
"response_mime_type": "application/json",
"response_schema": schema,
},
)
response = model.generate_content(messages)
print(response.text)
Here's how to overcome Gemini's limitations:
- Replace $ref with complete definitions: Gemini stumbles on schema references ($ref), which appear when you have a nested object definition. Replace them with the full definition from your schema.
def replace_value_in_dict(item, original_schema):
    # Source: https://github.com/pydantic/pydantic/issues/889
    if isinstance(item, list):
        return [replace_value_in_dict(i, original_schema) for i in item]
    elif isinstance(item, dict):
        if list(item.keys()) == ["$ref"]:
            definitions = item["$ref"][2:].split("/")
            res = original_schema.copy()
            for definition in definitions:
                res = res[definition]
            return res
        else:
            return {
                key: replace_value_in_dict(i, original_schema)
                for key, i in item.items()
            }
    else:
        return item
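As an illustration, here is the function applied to a toy schema with a single $ref (the function body is repeated so the snippet runs standalone; the Attraction definition is a made-up example):

```python
def replace_value_in_dict(item, original_schema):
    # Source: https://github.com/pydantic/pydantic/issues/889
    if isinstance(item, list):
        return [replace_value_in_dict(i, original_schema) for i in item]
    elif isinstance(item, dict):
        if list(item.keys()) == ["$ref"]:
            # "#/$defs/Attraction" -> ["$defs", "Attraction"]
            definitions = item["$ref"][2:].split("/")
            res = original_schema.copy()
            for definition in definitions:
                res = res[definition]
            return res
        else:
            return {
                key: replace_value_in_dict(i, original_schema)
                for key, i in item.items()
            }
    else:
        return item


# Hypothetical miniature schema with one nested definition
schema = {
    "$defs": {
        "Attraction": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
        }
    },
    "properties": {"attraction": {"$ref": "#/$defs/Attraction"}},
    "type": "object",
}

inlined = replace_value_in_dict(schema.copy(), schema.copy())
# The $ref is now replaced by the full Attraction definition
print(inlined["properties"]["attraction"]["type"])  # object
```

After this step the "$defs" key is no longer needed and can be deleted, which is exactly what the `del schema["$defs"]` line above does.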
- Remove unsupported keys: Gemini does not yet handle keys like "title", "anyOf", or "minItems". Remove them from your schema. The result is a less readable and less restrictive schema, but there is no other choice if we insist on using Gemini.
def delete_keys_recursive(d, key_to_delete):
    if isinstance(d, dict):
        # Delete the key if it exists
        if key_to_delete in d:
            del d[key_to_delete]
        # Recursively process all values in the dictionary
        for k, v in d.items():
            delete_keys_recursive(v, key_to_delete)
    elif isinstance(d, list):
        # Recursively process all items in the list
        for item in d:
            delete_keys_recursive(item, key_to_delete)
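A quick sketch of this helper on a hypothetical miniature schema (the function body is repeated so the snippet runs standalone):

```python
def delete_keys_recursive(d, key_to_delete):
    if isinstance(d, dict):
        # Delete the key if it exists
        if key_to_delete in d:
            del d[key_to_delete]
        # Recursively process all values in the dictionary
        for v in d.values():
            delete_keys_recursive(v, key_to_delete)
    elif isinstance(d, list):
        # Recursively process all items in the list
        for item in d:
            delete_keys_recursive(item, key_to_delete)


# Hypothetical schema fragment containing keys Gemini rejects
schema = {
    "title": "TouristLocation",
    "properties": {"tags": {"title": "Tags", "type": "array", "minItems": 1}},
}

delete_keys_recursive(schema, "title")
delete_keys_recursive(schema, "minItems")
print(schema)  # {'properties': {'tags': {'type': 'array'}}}
```

Note that the function mutates the schema in place, so it should be applied to a copy if the original schema is still needed.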
- Few-shot prompting for enums: Sometimes Gemini struggles with enums, outputting all possible values instead of a single selection, with the values separated by "|" in a single string, which makes them invalid according to our schema. Use few-shot prompting, providing a correctly formatted example, to guide it toward the desired behavior.
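As a sketch, a one-shot example turn with correctly formatted enum values (the message contents here are hypothetical) can steer the model away from emitting pipe-separated strings like "tropical|desert|temperate":

```python
import json

# A one-shot example reply that shows a SINGLE enum member per field,
# instead of the invalid "city|country|..." strings Gemini sometimes emits
example_reply = json.dumps(
    {"name": "Hawaii", "location_type": "island", "climate_type": "tropical"}
)

# Hypothetical conversation: the model turn demonstrates the desired format
messages = [
    {"role": "user", "content": "Generate information about Hawaii, US."},
    {"role": "model", "content": example_reply},
    {"role": "user", "content": "Generate information about Casablanca"},
]

# The example parses cleanly and contains exactly one enum value per field
parsed = json.loads(example_reply)
print(parsed["location_type"])  # island
```

The same idea is already used in the Gemini code above, where the assistant turn contains location.model_dump_json() as a well-formed example.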
By applying these transformations and providing clear examples, you can successfully generate structured JSON output with the Gemini API.
JSON mode allows you to get structured data directly from your LLMs, making it more useful for practical applications. While frameworks like Llama.cpp offer straightforward implementations, you may run into issues with cloud services like Gemini API.
Hopefully, this blog gave you a better practical understanding of how JSON mode works and how you can use it even when using the Gemini API, which is only partially supported so far.
Now that I've managed to get Gemini to work somewhat with JSON mode, I can complete the implementation of my LLM workflow where it is necessary to have data structured in a specific way.
You can find the main code for this post here: https://gist.github.com/CVxTz/8eace07d9bd2c5123a89bf790b5cc39e