AI¶
The lilya.contrib.ai package gives Lilya a provider-agnostic integration layer for LLM applications.
It is designed for teams that want to use AI inside Lilya applications without coupling their codebase to one vendor, one SDK, or one transport style.
With this contrib package you can:
- configure a provider with typed dataclasses
- call models through one stable `AIClient` API
- switch between providers without rewriting endpoint logic
- inject the AI client into Lilya handlers with dependency injection
- stream model output token-by-token or chunk-by-chunk
- support OpenAI-compatible vendors like OpenAI, Groq, and Mistral
- support providers with different wire protocols such as Anthropic
Why This Exists¶
Most teams eventually want AI features inside their application, but the raw integration surface quickly becomes messy:
- provider-specific SDK imports leak everywhere
- request payloads differ from vendor to vendor
- streaming code gets duplicated in route handlers
- swapping vendors becomes expensive
- tests become tightly coupled to one external API format
lilya.contrib.ai solves that by separating the problem into two layers:
- Application layer: your handlers, services, and dependencies talk to `AIClient`.
- Provider layer: provider adapters translate Lilya's normalized request objects into the wire format required by each vendor.
That means your Lilya code stays clean even if your infrastructure changes later.
Installation¶
Install the AI extra:
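The exact extra name below is an assumption based on the package path (`lilya.contrib.ai`); check your Lilya release notes if it differs.

```shell
# Assumed extra name; installs Lilya together with httpx for provider calls.
pip install "lilya[ai]"
```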
At the moment, the base AI integration uses httpx for outbound provider calls.
Supported Provider Families¶
Lilya's AI contrib intentionally distinguishes between:
1. OpenAI-compatible providers¶
These providers expose a request/response contract close to POST /v1/chat/completions.
Examples:
- OpenAI
- Groq
- Mistral
- any self-hosted or gateway provider that mirrors the same API surface
Use:
- `OpenAICompatibleProvider`
- `OpenAIProvider`
- `GroqProvider`
- `MistralProvider`
2. Non-compatible providers¶
Some vendors use different wire formats, headers, and streaming event shapes.
Example:
- Anthropic Messages API
Use:
AnthropicProvider
Architecture¶
flowchart LR
A["Lilya Endpoint"] --> B["AI Dependency or app.state.ai"]
B --> C["AIClient"]
C --> D["PromptRequest"]
D --> E["Provider Adapter"]
E --> F["External AI API"]
F --> E
E --> G["AIResponse or AIResponseChunk"]
G --> A
Main building blocks¶
| Component | Responsibility |
|---|---|
| `ChatMessage` | Provider-agnostic chat message representation |
| `PromptRequest` | Normalized AI request object |
| `AIClient` | Main API used by Lilya apps |
| `AIResponse` | Non-streaming response object |
| `AIResponseChunk` | Streaming chunk object |
| `setup_ai()` | Attaches a configured AI client to the Lilya app |
| `AI` | Dependency injection helper that resolves the configured client |
| Provider classes | Translate Lilya request objects into vendor-specific HTTP payloads |
Quick Start¶
OpenAI¶
from lilya.apps import Lilya
from lilya.contrib.ai import AIClient, OpenAIConfig, OpenAIProvider, setup_ai
provider = OpenAIProvider(
OpenAIConfig(
api_key="your-openai-key",
)
)
client = AIClient(
provider,
default_model="gpt-4o-mini",
default_system_prompt="You are a concise assistant for Lilya users.",
)
app = Lilya()
setup_ai(app, client=client)
Groq¶
from lilya.contrib.ai import AIClient, GroqConfig, GroqProvider
provider = GroqProvider(
GroqConfig(
api_key="your-groq-key",
)
)
client = AIClient(provider, default_model="llama-3.3-70b-versatile")
Mistral¶
from lilya.contrib.ai import AIClient, MistralConfig, MistralProvider
provider = MistralProvider(
MistralConfig(
api_key="your-mistral-key",
)
)
client = AIClient(provider, default_model="mistral-small-latest")
Anthropic¶
from lilya.contrib.ai import AIClient, AnthropicConfig, AnthropicProvider
provider = AnthropicProvider(
AnthropicConfig(
api_key="your-anthropic-key",
)
)
client = AIClient(provider, default_model="claude-sonnet-4-20250514")
Full Application Example¶
If you want to see the full shape in one place, this is the most useful mental model:
- configure a provider
- build one shared `AIClient`
- attach it with `setup_ai()`
- inject it into routes with `AI`
from lilya.apps import Lilya
from lilya.contrib.ai import AI, AIClient, ChatMessage, OpenAIConfig, OpenAIProvider, setup_ai
provider = OpenAIProvider(
OpenAIConfig(
api_key="your-openai-key",
timeout=20.0,
)
)
client = AIClient(
provider,
default_model="gpt-4o-mini",
default_system_prompt="You are a helpful assistant for Lilya applications.",
)
app = Lilya()
setup_ai(app, client=client)
@app.post("/summarize", dependencies={"ai": AI})
async def summarize(ai: AI):
result = await ai.chat(
[
ChatMessage.user(
"Summarize the following incident in three bullets: "
"API latency increased after deployment. Error rate stayed low. "
"A cache misconfiguration caused most of the slowdown."
)
]
)
return {
"provider": result.provider,
"model": result.model,
"text": result.text,
}
This gives you one central AI integration point for the entire app instead of scattering provider calls across handlers.
Core Types¶
ChatMessage¶
Use ChatMessage to build provider-neutral conversations.
from lilya.contrib.ai import ChatMessage
messages = [
ChatMessage.system("You are a release note generator."),
ChatMessage.user("Summarize the latest deployment."),
]
Convenience constructors are available:
- `ChatMessage.system(...)`
- `ChatMessage.user(...)`
- `ChatMessage.assistant(...)`
- `ChatMessage.tool(...)`
PromptRequest¶
PromptRequest is the normalized request shape sent from AIClient to providers.
It contains:
- `messages`
- `model`
- `system_prompt`
- `temperature`
- `max_tokens`
- `top_p`
- `stop_sequences`
- `metadata`
- `extra`
You usually do not instantiate PromptRequest directly in application code. AIClient builds it for you.
AIResponse¶
Returned for non-streaming calls.
Fields:
- `text`
- `model`
- `provider`
- `finish_reason`
- `usage`
- `raw`
AIResponseChunk¶
Returned during streaming.
Fields:
- `text`
- `delta`
- `model`
- `provider`
- `finish_reason`
- `raw`
Configuration Dataclasses¶
Shared base config¶
Base fields:
- `api_key`
- `base_url`
- `timeout`
- `headers`
OpenAI-compatible config¶
Additional fields:
- `provider_name`
- `organization`
- `project`
Provider-specific convenience configs¶
Available convenience configs:
- `OpenAIConfig`
- `GroqConfig`
- `MistralConfig`
- `AnthropicConfig`
AnthropicConfig also includes:
- `anthropic_version`
- `default_max_tokens`
Using AIClient¶
Simple prompt¶
result = await client.prompt(
"Write a short release note for the latest deployment.",
)
print(result.text)
Chat conversation¶
from lilya.contrib.ai import ChatMessage
result = await client.chat(
[
ChatMessage.system("You are a customer support assistant."),
ChatMessage.user("A customer cannot log in after resetting their password."),
]
)
print(result.text)
Override model or request settings¶
result = await client.prompt(
"Create three title options for a product page.",
model="gpt-4o-mini",
temperature=0.9,
max_tokens=200,
)
Pass provider-specific extras¶
The extra field lets you pass through advanced provider parameters without contaminating the common API.
result = await client.prompt(
"Return a JSON object with summary and priority.",
extra={"response_format": {"type": "json_object"}},
)
This is useful when:
- one provider supports a feature not yet normalized by Lilya
- you want to experiment without changing the contrib core
- you need provider-specific tuning knobs
Endpoint Integration Patterns¶
Most users will consume this feature through Lilya endpoints, not through standalone scripts, so here are the most practical route shapes.
1. Basic JSON endpoint¶
from lilya.contrib.ai import AI
@app.post("/rewrite", dependencies={"ai": AI})
async def rewrite(ai: AI):
result = await ai.prompt(
"Rewrite this sentence to sound more professional: "
"'hey, your payment failed, try again later.'"
)
return {"text": result.text}
Use this when:
- the caller expects one final answer
- you want a simple HTTP request/response interaction
- you do not need token streaming
2. Endpoint with explicit user input¶
In a real application, the prompt usually comes from the request body, query, or resolved domain data.
from lilya.contrib.ai import AI, ChatMessage
@app.post("/support/reply", dependencies={"ai": AI})
async def draft_support_reply(ai: AI, request):
payload = await request.json()
customer_message = payload["message"]
result = await ai.chat(
[
ChatMessage.system("You are a senior support engineer."),
ChatMessage.user(
f"Draft a helpful reply to this customer message:\n\n{customer_message}"
),
],
temperature=0.4,
)
return {"reply": result.text}
That pattern is usually clearer than creating raw provider payloads in the handler.
3. Endpoint with domain context¶
Most real AI features combine user input with business context from your own system.
from lilya.contrib.ai import AI, ChatMessage
@app.get("/accounts/{account_id}/health-summary", dependencies={"ai": AI})
async def account_health_summary(account_id: str, ai: AI):
metrics = {
"open_incidents": 2,
"last_deployment": "2026-04-09T10:00:00Z",
"error_rate": "0.08%",
"latency_p95": "420ms",
}
result = await ai.chat(
[
ChatMessage.system("You produce short, executive-friendly account summaries."),
ChatMessage.user(
"Write a concise health summary for this account using the following metrics: "
f"{metrics}"
),
]
)
return {"account_id": account_id, "summary": result.text}
This is the most common production pattern:
- fetch your own data first
- shape it into a prompt
- keep the AI call as the final transformation step
Startup Integration¶
Attach the AI client to your Lilya app with setup_ai().
from lilya.apps import Lilya
from lilya.contrib.ai import AIClient, OpenAIConfig, OpenAIProvider, setup_ai
provider = OpenAIProvider(OpenAIConfig(api_key="..."))
client = AIClient(provider, default_model="gpt-4o-mini")
app = Lilya()
setup_ai(app, client=client)
What setup_ai() does:
- stores the client on `app.state.ai`
- optionally registers startup and shutdown handlers
- keeps AI setup in one place instead of scattering it across routes
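The storage step, reduced to a sketch (the real helper also handles lifecycle hooks; the `FakeApp` class here is a stand-in, not Lilya's application object):

```python
# Illustrative sketch of the storage step only: one shared client lives on
# application state, so every handler resolves the same instance.
class _State:
    pass

class FakeApp:  # stand-in for a Lilya application
    def __init__(self) -> None:
        self.state = _State()

def setup_ai_sketch(app: FakeApp, client: object) -> None:
    app.state.ai = client

app = FakeApp()
setup_ai_sketch(app, client="shared-ai-client")
print(app.state.ai)
```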
Dependency Injection¶
Use the AI dependency to inject the configured AIClient into handlers.
from lilya.contrib.ai import AI, ChatMessage
@app.post("/summary", dependencies={"ai": AI})
async def generate_summary(ai: AI):
result = await ai.chat(
[ChatMessage.user("Summarize today's customer issues in three bullets.")]
)
return {"summary": result.text}
This is the recommended integration style because it:
- keeps handlers easy to test
- avoids direct provider construction inside routes
- follows the same pattern Lilya already uses for other contrib services
When to use request.app.state.ai¶
request.app.state.ai is still valid, but prefer the dependency helper in handlers.
Use app.state.ai directly when:
- you are in startup code
- you are wiring custom services
- you are outside a normal handler dependency flow
Use AI when:
- you are inside endpoints
- you want easy test overrides
- you want the clearest Lilya-native route signature
How-To Recipes¶
These are the most common things developers try to do when first integrating AI.
How to switch providers without rewriting handlers¶
Keep your handlers dependent on AIClient, not on a specific provider class.
# handler code stays the same
@app.post("/classify", dependencies={"ai": AI})
async def classify(ai: AI):
result = await ai.prompt("Classify this text as billing, support, or abuse.")
return {"label": result.text}
Only the startup wiring changes:
from lilya.contrib.ai import AIClient, GroqConfig, GroqProvider
provider = GroqProvider(GroqConfig(api_key="..."))
client = AIClient(provider, default_model="llama-3.3-70b-versatile")
setup_ai(app, client=client)
How to use a self-hosted or gateway provider¶
Use OpenAICompatibleProvider when the gateway exposes a compatible chat completions surface.
from lilya.contrib.ai import AIClient, OpenAICompatibleConfig, OpenAICompatibleProvider
provider = OpenAICompatibleProvider(
OpenAICompatibleConfig(
provider_name="internal-gateway",
api_key="gateway-token",
base_url="https://gateway.example.com/v1",
headers={"X-Workspace": "ops"},
)
)
client = AIClient(provider, default_model="meta-llama/llama-4")
How to add per-request system behavior¶
Use system_prompt= when the instruction should be request-specific.
result = await ai.prompt(
"Summarize the latest build output.",
system_prompt="Respond with no more than 4 bullet points.",
)
Use default_system_prompt on AIClient when the instruction should apply globally across the app.
How to return usage metadata to clients¶
AIResponse.usage is normalized when the provider sends token counts.
@app.post("/token-aware", dependencies={"ai": AI})
async def token_aware(ai: AI):
result = await ai.prompt("Explain what ASGI is.")
return {
"text": result.text,
"usage": {
"input_tokens": result.usage.input_tokens if result.usage else None,
"output_tokens": result.usage.output_tokens if result.usage else None,
"total_tokens": result.usage.total_tokens if result.usage else None,
},
}
How to keep request handlers small¶
Move prompt construction into a service function and keep the endpoint thin.
from lilya.contrib.ai import AIClient, ChatMessage
async def generate_release_summary(ai: AIClient, release_notes: str) -> str:
response = await ai.chat(
[
ChatMessage.system("You write concise engineering release summaries."),
ChatMessage.user(f"Summarize these notes:\n\n{release_notes}"),
]
)
return response.text
@app.post("/releases/summary", dependencies={"ai": AI})
async def release_summary(ai: AI, request):
payload = await request.json()
summary = await generate_release_summary(ai, payload["notes"])
return {"summary": summary}
This tends to produce cleaner handlers and much easier tests.
Streaming¶
Streaming is essential for chat UIs, assistant surfaces, terminals, and progressive rendering.
Stream from a simple prompt¶
async for chunk in client.stream("Write a short changelog entry."):
    print(chunk.delta, end="")
Stream from a chat conversation¶
async for chunk in client.stream_chat(
[
ChatMessage.system("You are a pair-programming assistant."),
ChatMessage.user("Help me debug a 502 gateway error."),
]
):
print(chunk.delta, end="")
Streaming in a Lilya endpoint¶
from lilya.responses import StreamingResponse
@app.get("/stream", dependencies={"ai": AI})
async def stream_answer(ai: AI):
async def body():
async for chunk in ai.stream("Write a short changelog entry."):
if chunk.delta:
yield chunk.delta
return StreamingResponse(body(), media_type="text/plain")
Streaming chat to a browser client¶
For browser-like consumers, you will often want newline-delimited chunks or SSE-style output.
from lilya.responses import StreamingResponse
@app.get("/chat/stream", dependencies={"ai": AI})
async def chat_stream(ai: AI):
async def body():
async for chunk in ai.stream_chat(
[
ChatMessage.system("You are a debugging assistant."),
ChatMessage.user("Help me understand why a health check might flap."),
]
):
if chunk.delta:
yield f"{chunk.delta}\n"
return StreamingResponse(body(), media_type="text/plain")
The important part is that your handler decides how to frame streamed chunks for the client:
- plain text
- newline-delimited text
- SSE
- custom JSON fragments
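For example, SSE framing is just a matter of wrapping each delta before yielding it; the helper below is an illustrative sketch, not part of the contrib package.

```python
# Illustrative sketch: wrap each streamed delta in Server-Sent Events framing.
# Pair this with media_type="text/event-stream" on the StreamingResponse.
def to_sse(delta: str, event: str = "message") -> str:
    return f"event: {event}\ndata: {delta}\n\n"

frame = to_sse("Hello")
print(repr(frame))
```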
Real-World Usage Patterns¶
1. Internal knowledge assistant¶
Use Lilya as the HTTP edge while the AI contrib handles the provider abstraction.
Flow:
- authenticate the user
- retrieve domain context from your own database or search system
- build messages with that context
- call `AIClient`
- return or stream the result
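The flow above, sketched end to end. The retrieval function and the client are stand-ins so the sketch is self-contained; only the overall shape mirrors the documented steps.

```python
import asyncio

# Illustrative sketch of the flow above; retrieval and the client are
# stand-ins, only the overall shape mirrors the documented steps.
async def fetch_context(question: str) -> str:
    # In a real app this would hit your database or search system.
    return "Known issue: a cache misconfiguration slows checkout."

class FakeAI:
    async def chat(self, messages: list[dict[str, str]]):
        class Response:  # minimal stand-in for AIResponse
            text = f"summary built from {len(messages)} messages"
        return Response()

async def answer(ai: FakeAI, question: str) -> str:
    context = await fetch_context(question)
    messages = [
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
    ]
    result = await ai.chat(messages)
    return result.text

print(asyncio.run(answer(FakeAI(), "Why is checkout slow?")))
```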
2. Support ticket summarization¶
ticket_text = """
Customer cannot access billing.
Payment method was updated yesterday.
They now receive a 403 in the billing portal.
"""
result = await ai.prompt(
f"Summarize this support ticket and suggest the likely next troubleshooting step:\n\n{ticket_text}"
)
2a. Support endpoint example¶
@app.post("/tickets/{ticket_id}/summary", dependencies={"ai": AI})
async def summarize_ticket(ticket_id: str, ai: AI):
ticket_text = """
Customer cannot access billing.
Payment method was updated yesterday.
They now receive a 403 in the billing portal.
"""
result = await ai.prompt(
"Summarize this ticket in two bullets and add one likely next action:\n\n"
f"{ticket_text}"
)
return {"ticket_id": ticket_id, "summary": result.text}
3. Structured draft generation¶
Use extra to pass provider-specific structured output options while keeping the rest of your code provider-neutral.
result = await ai.prompt(
"Return a JSON object with keys `title`, `priority`, and `summary` for this incident: "
"checkout latency increased after a cache flush",
extra={"response_format": {"type": "json_object"}},
)
4. Content moderation or classification front door¶
Even if your final application logic is not conversational, AIClient is still useful for:
- categorization
- summarization
- extraction
- rewriting
- intent detection
5. Coding assistants for internal tools¶
Because the API is streaming-capable and dependency-injection-friendly, it fits well for:
- internal IDE plugins
- review helpers
- changelog drafting
- support reply drafting
- documentation generation
OpenAI-Compatible Providers in Detail¶
The OpenAICompatibleProvider is one of the most important pieces of this contrib package.
It exists because many vendors intentionally mirror the OpenAI chat completions contract.
That means the same application code can often work with:
- OpenAI
- Groq
- Mistral
- compatible gateways
- self-hosted proxies exposing the same interface
Example:
from lilya.contrib.ai import AIClient, OpenAICompatibleConfig, OpenAICompatibleProvider
provider = OpenAICompatibleProvider(
OpenAICompatibleConfig(
provider_name="my-gateway",
base_url="https://llm-gateway.internal/v1",
api_key="internal-token",
headers={"X-Tenant": "alpha"},
)
)
client = AIClient(provider, default_model="meta-llama/llama-4")
This is the recommended extension path when a vendor already follows the same payload shape.
Anthropic in Detail¶
Anthropic uses a different messages API than OpenAI-compatible vendors, so Lilya ships a dedicated adapter.
Important differences:
- authentication headers differ
- the `anthropic-version` header is required
- `system` content is a top-level field
- response and streaming event shapes differ
The Lilya adapter hides those differences behind the same AIClient methods.
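To make the shape difference concrete, here is an illustrative sketch of the payload translation (not the shipped adapter): Anthropic's Messages API takes `system` as a top-level field and requires `max_tokens`.

```python
from typing import Any

# Illustrative sketch: the Anthropic Messages API takes `system` as a
# top-level field and requires `max_tokens`, unlike OpenAI-style payloads.
def to_anthropic_payload(
    system_prompt: str,
    messages: list[dict[str, str]],
    model: str,
    max_tokens: int = 1024,
) -> dict[str, Any]:
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": system_prompt,
        "messages": messages,
    }

payload = to_anthropic_payload(
    "Be concise.",
    [{"role": "user", "content": "Explain ASGI in one line."}],
    model="claude-sonnet-4-20250514",
)
print(sorted(payload))
```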
Error Handling¶
The package exposes a small exception hierarchy:
- `AIError`
- `AIConfigurationError`
- `ProviderNotConfigured`
- `AIProviderError`
- `AIResponseError`
Recommended usage:
from lilya.contrib.ai import AIProviderError
try:
result = await ai.prompt("Summarize this page.")
except AIProviderError:
return {"error": "The AI provider is currently unavailable."}
Testing¶
You do not need real provider calls in most tests.
Test at three levels:
1. Route or service tests¶
Inject a fake AIClient dependency.
2. Client-level tests¶
Use a fake provider that records PromptRequest objects.
3. Provider adapter tests¶
Mock the HTTP transport and assert:
- request path
- request headers
- request payload translation
- response parsing
- streaming behavior
This repo includes dedicated tests for all of these patterns.
Example: testing a Lilya endpoint with a fake AI client¶
from lilya.dependencies import Provide
class FakeAIClient:
async def prompt(self, prompt: str, **kwargs):
class Response:
text = "fake summary"
provider = "fake"
model = "fake-model"
usage = None
return Response()
fake_ai = FakeAIClient()
async def resolve_fake_ai(request, **kwargs):
return fake_ai
FakeAI = Provide(resolve_fake_ai)
@app.post("/summary", dependencies={"ai": FakeAI})
async def summary(ai):
result = await ai.prompt("Summarize this.")
return {"text": result.text}
This style makes route tests deterministic and avoids external provider calls.
Production Guidance¶
Keep provider setup centralized¶
Create the provider and client once during startup.
Prefer dependency injection over direct global access¶
Use AI in handlers rather than reaching into request.app.state.ai everywhere.
Use timeouts intentionally¶
Long-running generation can tie up requests for a long time if you do not set reasonable timeout values.
Stream for interactive workloads¶
If the user is waiting in a chat-like interface, prefer stream() or stream_chat().
Log provider failures at the boundary¶
The best place to log provider failures is where application context still exists:
- route handler
- service layer
- background job wrapper
Troubleshooting¶
RuntimeError: httpx is required for lilya.contrib.ai¶
Install the extra:
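Assuming the extra is named after the package path (verify against your Lilya version):

```shell
# Assumed extra name; pulls in httpx for outbound provider calls.
pip install "lilya[ai]"
```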
No model was provided¶
Set a default model on AIClient or pass model= on the individual request.
No AIClient configured¶
Make sure you called setup_ai(app, client=...).
The endpoint works locally but not in tests¶
Usually one of these is happening:
- the route did not declare `dependencies={"ai": AI}`
- the app did not call `setup_ai(...)`
- the test needs to override the AI dependency with a fake provider or fake client
Streaming returns nothing¶
Check:
- the provider supports streaming for that model
- your route is returning a streaming response correctly
- you are consuming `chunk.delta`
The provider is compatible, but not fully¶
Use OpenAICompatibleProvider for providers that are mostly compatible, then pass vendor-specific options through extra. If the wire format is genuinely different, add a dedicated provider adapter instead of overloading the common one.
Reference Summary¶
| API | Purpose |
|---|---|
| `AIClient.prompt()` | Single-prompt, non-streaming call |
| `AIClient.chat()` | Multi-message, non-streaming call |
| `AIClient.stream()` | Single-prompt streaming call |
| `AIClient.stream_chat()` | Multi-message streaming call |
| `setup_ai()` | Registers the client on app state and lifecycle |
| `AI` | Injects the configured client into Lilya handlers |
| `OpenAICompatibleProvider` | Adapter for OpenAI-style compatible providers |
| `AnthropicProvider` | Adapter for the Anthropic Messages API |
Provider Defaults Verified¶
The built-in defaults in this module are based on the vendors' official documentation:
- OpenAI chat completions under `https://api.openai.com/v1`
- Groq OpenAI compatibility under `https://api.groq.com/openai/v1`
- Mistral chat completions under `https://api.mistral.ai/v1`
- Anthropic Messages API under `https://api.anthropic.com/v1/messages` with `anthropic-version: 2023-06-01`
Those defaults are still fully overridable through the configuration dataclasses if your deployment uses a proxy, gateway, or self-hosted compatibility layer.