AI¶
The lilya.contrib.ai package gives Lilya a provider-agnostic integration layer for LLM applications.
It is designed for teams that want to use AI inside Lilya applications without coupling their codebase to one vendor, one SDK, or one transport style.
With this contrib package you can:
- configure a provider with typed dataclasses
- call models through one stable `AIClient` API
- switch between providers without rewriting endpoint logic
- inject the AI client into Lilya handlers with dependency injection
- stream model output token-by-token or chunk-by-chunk
- support OpenAI-compatible vendors like OpenAI, Groq, and Mistral
- support providers with different wire protocols such as Anthropic
Why This Exists¶
Most teams eventually want AI features inside their application, but the raw integration surface quickly becomes messy:
- provider-specific SDK imports leak everywhere
- request payloads differ from vendor to vendor
- streaming code gets duplicated in route handlers
- swapping vendors becomes expensive
- tests become tightly coupled to one external API format
lilya.contrib.ai solves that by separating the problem into two layers:
- Application layer: your handlers, services, and dependencies talk to `AIClient`.
- Provider layer: provider adapters translate Lilya's normalized request objects into the wire format required by each vendor.
That means your Lilya code stays clean even if your infrastructure changes later.
Installation¶
Install the AI extra:
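The exact extra name below is an assumption based on the package path (`lilya.contrib.ai`); check your Lilya release notes if it differs.

```shell
# Assumed extra name; installs Lilya together with httpx for provider calls.
pip install "lilya[ai]"
```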
At the moment, the base AI integration uses httpx for outbound provider calls.
Supported Provider Families¶
Lilya's AI contrib intentionally distinguishes between:
1. OpenAI-compatible providers¶
These providers expose a request/response contract close to POST /v1/chat/completions.
Examples:
- OpenAI
- Groq
- Mistral
- any self-hosted or gateway provider that mirrors the same API surface
Use:
- `OpenAICompatibleProvider`
- `OpenAIProvider`
- `GroqProvider`
- `MistralProvider`
2. Non-compatible providers¶
Some vendors use different wire formats, headers, and streaming event shapes.
Example:
- Anthropic Messages API
Use:
AnthropicProvider
Architecture¶
flowchart LR
A["Lilya Endpoint"] --> B["AI Dependency or app.state.ai"]
B --> C["AIClient"]
C --> D["PromptRequest"]
D --> E["Provider Adapter"]
E --> F["External AI API"]
F --> E
E --> G["AIResponse or AIResponseChunk"]
G --> A
Main building blocks¶
| Component | Responsibility |
|---|---|
| `ChatMessage` | Provider-agnostic chat message representation |
| `PromptRequest` | Normalized AI request object |
| `AIClient` | Main API used by Lilya apps |
| `AIResponse` | Non-streaming response object |
| `AIResponseChunk` | Streaming chunk object |
| `setup_ai()` | Attaches a configured AI client to the Lilya app |
| `AI` | Dependency injection helper that resolves the configured client |
| Provider classes | Translate Lilya request objects into vendor-specific HTTP payloads |
Quick Start¶
OpenAI¶
from lilya.apps import Lilya
from lilya.contrib.ai import AIClient, OpenAIConfig, OpenAIProvider, setup_ai
provider = OpenAIProvider(
OpenAIConfig(
api_key="your-openai-key",
)
)
client = AIClient(
provider,
default_model="gpt-4o-mini",
default_system_prompt="You are a concise assistant for Lilya users.",
)
app = Lilya()
setup_ai(app, client=client)
Groq¶
from lilya.contrib.ai import AIClient, GroqConfig, GroqProvider
provider = GroqProvider(
GroqConfig(
api_key="your-groq-key",
)
)
client = AIClient(provider, default_model="llama-3.3-70b-versatile")
Mistral¶
from lilya.contrib.ai import AIClient, MistralConfig, MistralProvider
provider = MistralProvider(
MistralConfig(
api_key="your-mistral-key",
)
)
client = AIClient(provider, default_model="mistral-small-latest")
Anthropic¶
from lilya.contrib.ai import AIClient, AnthropicConfig, AnthropicProvider
provider = AnthropicProvider(
AnthropicConfig(
api_key="your-anthropic-key",
)
)
client = AIClient(provider, default_model="claude-sonnet-4-20250514")
Full Application Example¶
If you want to see the full shape in one place, this is the most useful mental model:
- configure a provider
- build one shared `AIClient`
- attach it with `setup_ai()`
- inject it into routes with `AI`
from lilya.apps import Lilya
from lilya.contrib.ai import AI, AIClient, ChatMessage, OpenAIConfig, OpenAIProvider, setup_ai
provider = OpenAIProvider(
OpenAIConfig(
api_key="your-openai-key",
timeout=20.0,
)
)
client = AIClient(
provider,
default_model="gpt-4o-mini",
default_system_prompt="You are a helpful assistant for Lilya applications.",
)
app = Lilya()
setup_ai(app, client=client)
@app.post("/summarize", dependencies={"ai": AI})
async def summarize(ai: AI):
result = await ai.chat(
[
ChatMessage.user(
"Summarize the following incident in three bullets: "
"API latency increased after deployment. Error rate stayed low. "
"A cache misconfiguration caused most of the slowdown."
)
]
)
return {
"provider": result.provider,
"model": result.model,
"text": result.text,
}
This gives you one central AI integration point for the entire app instead of scattering provider calls across handlers.
Core Types¶
ChatMessage¶
Use ChatMessage to build provider-neutral conversations.
from lilya.contrib.ai import ChatMessage
messages = [
ChatMessage.system("You are a release note generator."),
ChatMessage.user("Summarize the latest deployment."),
]
Convenience constructors are available:
- `ChatMessage.system(...)`
- `ChatMessage.user(...)`
- `ChatMessage.assistant(...)`
- `ChatMessage.tool(...)`
PromptRequest¶
PromptRequest is the normalized request shape sent from AIClient to providers.
It contains:
- `messages`
- `model`
- `system_prompt`
- `temperature`
- `max_tokens`
- `top_p`
- `stop_sequences`
- `metadata`
- `extra`
You usually do not instantiate PromptRequest directly in application code. AIClient builds it for you.
AIResponse¶
Returned for non-streaming calls.
Fields:
- `text`
- `model`
- `provider`
- `finish_reason`
- `usage`
- `raw`
AIResponseChunk¶
Returned during streaming.
Fields:
- `text`
- `delta`
- `model`
- `provider`
- `finish_reason`
- `raw`
Configuration Dataclasses¶
Shared base config¶
Base fields:
- `api_key`
- `base_url`
- `timeout`
- `headers`
OpenAI-compatible config¶
Additional fields:
- `provider_name`
- `organization`
- `project`
Provider-specific convenience configs¶
Available convenience configs:
- `OpenAIConfig`
- `GroqConfig`
- `MistralConfig`
- `AnthropicConfig`
AnthropicConfig also includes:
- `anthropic_version`
- `default_max_tokens`
Using AIClient¶
Simple prompt¶
result = await client.prompt(
"Write a short release note for the latest deployment.",
)
print(result.text)
Chat conversation¶
from lilya.contrib.ai import ChatMessage
result = await client.chat(
[
ChatMessage.system("You are a customer support assistant."),
ChatMessage.user("A customer cannot log in after resetting their password."),
]
)
print(result.text)
Override model or request settings¶
result = await client.prompt(
"Create three title options for a product page.",
model="gpt-4o-mini",
temperature=0.9,
max_tokens=200,
)
Pass provider-specific extras¶
The extra field lets you pass through advanced provider parameters without contaminating the common API.
result = await client.prompt(
"Return a JSON object with summary and priority.",
extra={"response_format": {"type": "json_object"}},
)
This is useful when:
- one provider supports a feature not yet normalized by Lilya
- you want to experiment without changing the contrib core
- you need provider-specific tuning knobs
Endpoint Integration Patterns¶
Most users will consume this feature through Lilya endpoints, not through standalone scripts, so here are the most practical route shapes.
1. Basic JSON endpoint¶
from lilya.contrib.ai import AI
@app.post("/rewrite", dependencies={"ai": AI})
async def rewrite(ai: AI):
result = await ai.prompt(
"Rewrite this sentence to sound more professional: "
"'hey, your payment failed, try again later.'"
)
return {"text": result.text}
Use this when:
- the caller expects one final answer
- you want a simple HTTP request/response interaction
- you do not need token streaming
2. Endpoint with explicit user input¶
In a real application, the prompt usually comes from the request body, query, or resolved domain data.
from lilya.contrib.ai import AI, ChatMessage
@app.post("/support/reply", dependencies={"ai": AI})
async def draft_support_reply(ai: AI, request):
payload = await request.json()
customer_message = payload["message"]
result = await ai.chat(
[
ChatMessage.system("You are a senior support engineer."),
ChatMessage.user(
f"Draft a helpful reply to this customer message:\n\n{customer_message}"
),
],
temperature=0.4,
)
return {"reply": result.text}
That pattern is usually clearer than creating raw provider payloads in the handler.
3. Endpoint with domain context¶
Most real AI features combine user input with business context from your own system.
from lilya.contrib.ai import AI, ChatMessage
@app.get("/accounts/{account_id}/health-summary", dependencies={"ai": AI})
async def account_health_summary(account_id: str, ai: AI):
metrics = {
"open_incidents": 2,
"last_deployment": "2026-04-09T10:00:00Z",
"error_rate": "0.08%",
"latency_p95": "420ms",
}
result = await ai.chat(
[
ChatMessage.system("You produce short, executive-friendly account summaries."),
ChatMessage.user(
"Write a concise health summary for this account using the following metrics: "
f"{metrics}"
),
]
)
return {"account_id": account_id, "summary": result.text}
This is the most common production pattern:
- fetch your own data first
- shape it into a prompt
- keep the AI call as the final transformation step
Startup Integration¶
Attach the AI client to your Lilya app with setup_ai().
from lilya.apps import Lilya
from lilya.contrib.ai import AIClient, OpenAIConfig, OpenAIProvider, setup_ai
provider = OpenAIProvider(OpenAIConfig(api_key="..."))
client = AIClient(provider, default_model="gpt-4o-mini")
app = Lilya()
setup_ai(app, client=client)
What setup_ai() does:
- stores the client on `app.state.ai`
- optionally registers startup and shutdown handlers
- keeps AI setup in one place instead of scattering it across routes
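The storage step, reduced to a sketch (the real helper also handles lifecycle hooks; the `FakeApp` class here is a stand-in, not Lilya's application object):

```python
# Illustrative sketch of the storage step only: one shared client lives on
# application state, so every handler resolves the same instance.
class _State:
    pass

class FakeApp:  # stand-in for a Lilya application
    def __init__(self) -> None:
        self.state = _State()

def setup_ai_sketch(app: FakeApp, client: object) -> None:
    app.state.ai = client

app = FakeApp()
setup_ai_sketch(app, client="shared-ai-client")
print(app.state.ai)
```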
Dependency Injection¶
Use the AI dependency to inject the configured AIClient into handlers.
from lilya.contrib.ai import AI, ChatMessage
@app.post("/summary", dependencies={"ai": AI})
async def generate_summary(ai: AI):
result = await ai.chat(
[ChatMessage.user("Summarize today's customer issues in three bullets.")]
)
return {"summary": result.text}
This is the recommended integration style because it:
- keeps handlers easy to test
- avoids direct provider construction inside routes
- follows the same pattern Lilya already uses for other contrib services
When to use request.app.state.ai¶
request.app.state.ai is still valid, but prefer the dependency helper in handlers.
Use app.state.ai directly when:
- you are in startup code
- you are wiring custom services
- you are outside a normal handler dependency flow
Use AI when:
- you are inside endpoints
- you want easy test overrides
- you want the clearest Lilya-native route signature
How-To Recipes¶
These are the most common things developers try to do when first integrating AI.
How to switch providers without rewriting handlers¶
Keep your handlers dependent on AIClient, not on a specific provider class.
# handler code stays the same
@app.post("/classify", dependencies={"ai": AI})
async def classify(ai: AI):
result = await ai.prompt("Classify this text as billing, support, or abuse.")
return {"label": result.text}
Only the startup wiring changes:
from lilya.contrib.ai import AIClient, GroqConfig, GroqProvider
provider = GroqProvider(GroqConfig(api_key="..."))
client = AIClient(provider, default_model="llama-3.3-70b-versatile")
setup_ai(app, client=client)
How to use a self-hosted or gateway provider¶
Use OpenAICompatibleProvider when the gateway exposes a compatible chat completions surface.
from lilya.contrib.ai import AIClient, OpenAICompatibleConfig, OpenAICompatibleProvider
provider = OpenAICompatibleProvider(
OpenAICompatibleConfig(
provider_name="internal-gateway",
api_key="gateway-token",
base_url="https://gateway.example.com/v1",
headers={"X-Workspace": "ops"},
)
)
client = AIClient(provider, default_model="meta-llama/llama-4")
How to add per-request system behavior¶
Use system_prompt= when the instruction should be request-specific.
result = await ai.prompt(
"Summarize the latest build output.",
system_prompt="Respond with no more than 4 bullet points.",
)
Use default_system_prompt on AIClient when the instruction should apply globally across the app.
How to return usage metadata to clients¶
AIResponse.usage is normalized when the provider sends token counts.
@app.post("/token-aware", dependencies={"ai": AI})
async def token_aware(ai: AI):
result = await ai.prompt("Explain what ASGI is.")
return {
"text": result.text,
"usage": {
"input_tokens": result.usage.input_tokens if result.usage else None,
"output_tokens": result.usage.output_tokens if result.usage else None,
"total_tokens": result.usage.total_tokens if result.usage else None,
},
}
How to keep request handlers small¶
Move prompt construction into a service function and keep the endpoint thin.
from lilya.contrib.ai import AIClient, ChatMessage
async def generate_release_summary(ai: AIClient, release_notes: str) -> str:
response = await ai.chat(
[
ChatMessage.system("You write concise engineering release summaries."),
ChatMessage.user(f"Summarize these notes:\n\n{release_notes}"),
]
)
return response.text
@app.post("/releases/summary", dependencies={"ai": AI})
async def release_summary(ai: AI, request):
payload = await request.json()
summary = await generate_release_summary(ai, payload["notes"])
return {"summary": summary}
This tends to produce cleaner handlers and much easier tests.
Streaming¶
Streaming is essential for chat UIs, assistant surfaces, terminals, and progressive rendering.
Stream from a simple prompt¶
async for chunk in client.stream("Write a short changelog entry."):
    print(chunk.delta, end="")
Stream from a chat conversation¶
async for chunk in client.stream_chat(
[
ChatMessage.system("You are a pair-programming assistant."),
ChatMessage.user("Help me debug a 502 gateway error."),
]
):
print(chunk.delta, end="")
Streaming in a Lilya endpoint¶
from lilya.responses import StreamingResponse
@app.get("/stream", dependencies={"ai": AI})
async def stream_answer(ai: AI):
async def body():
async for chunk in ai.stream("Write a short changelog entry."):
if chunk.delta:
yield chunk.delta
return StreamingResponse(body(), media_type="text/plain")
Streaming chat to a browser client¶
For browser-like consumers, you will often want newline-delimited chunks or SSE-style output.
from lilya.responses import StreamingResponse
@app.get("/chat/stream", dependencies={"ai": AI})
async def chat_stream(ai: AI):
async def body():
async for chunk in ai.stream_chat(
[
ChatMessage.system("You are a debugging assistant."),
ChatMessage.user("Help me understand why a health check might flap."),
]
):
if chunk.delta:
yield f"{chunk.delta}\n"
return StreamingResponse(body(), media_type="text/plain")
The important part is that your handler decides how to frame streamed chunks for the client:
- plain text
- newline-delimited text
- SSE
- custom JSON fragments
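For example, SSE framing is just a matter of wrapping each delta before yielding it; the helper below is an illustrative sketch, not part of the contrib package.

```python
# Illustrative sketch: wrap each streamed delta in Server-Sent Events framing.
# Pair this with media_type="text/event-stream" on the StreamingResponse.
def to_sse(delta: str, event: str = "message") -> str:
    return f"event: {event}\ndata: {delta}\n\n"

frame = to_sse("Hello")
print(repr(frame))
```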
Real-World Usage Patterns¶
1. Internal knowledge assistant¶
Use Lilya as the HTTP edge while the AI contrib handles the provider abstraction.
Flow:
- authenticate the user
- retrieve domain context from your own database or search system
- build messages with that context
- call `AIClient`
- return or stream the result
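The flow above, sketched end to end. The retrieval function and the client are stand-ins so the sketch is self-contained; only the overall shape mirrors the documented steps.

```python
import asyncio

# Illustrative sketch of the flow above; retrieval and the client are
# stand-ins, only the overall shape mirrors the documented steps.
async def fetch_context(question: str) -> str:
    # In a real app this would hit your database or search system.
    return "Known issue: a cache misconfiguration slows checkout."

class FakeAI:
    async def chat(self, messages: list[dict[str, str]]):
        class Response:  # minimal stand-in for AIResponse
            text = f"summary built from {len(messages)} messages"
        return Response()

async def answer(ai: FakeAI, question: str) -> str:
    context = await fetch_context(question)
    messages = [
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
    ]
    result = await ai.chat(messages)
    return result.text

print(asyncio.run(answer(FakeAI(), "Why is checkout slow?")))
```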
2. Support ticket summarization¶
ticket_text = """
Customer cannot access billing.
Payment method was updated yesterday.
They now receive a 403 in the billing portal.
"""
result = await ai.prompt(
f"Summarize this support ticket and suggest the likely next troubleshooting step:\n\n{ticket_text}"
)
2a. Support endpoint example¶
@app.post("/tickets/{ticket_id}/summary", dependencies={"ai": AI})
async def summarize_ticket(ticket_id: str, ai: AI):
ticket_text = """
Customer cannot access billing.
Payment method was updated yesterday.
They now receive a 403 in the billing portal.
"""
result = await ai.prompt(
"Summarize this ticket in two bullets and add one likely next action:\n\n"
f"{ticket_text}"
)
return {"ticket_id": ticket_id, "summary": result.text}
3. Structured draft generation¶
Use extra to pass provider-specific structured output options while keeping the rest of your code provider-neutral.
result = await ai.prompt(
"Return a JSON object with keys `title`, `priority`, and `summary` for this incident: "
"checkout latency increased after a cache flush",
extra={"response_format": {"type": "json_object"}},
)
4. Content moderation or classification front door¶
Even if your final application logic is not conversational, AIClient is still useful for:
- categorization
- summarization
- extraction
- rewriting
- intent detection
5. Coding assistants for internal tools¶
Because the API is streaming-capable and dependency-injection-friendly, it fits well for:
- internal IDE plugins
- review helpers
- changelog drafting
- support reply drafting
- documentation generation
OpenAI-Compatible Providers in Detail¶
The OpenAICompatibleProvider is one of the most important pieces of this contrib package.
It exists because many vendors intentionally mirror the OpenAI chat completions contract.
That means the same application code can often work with:
- OpenAI
- Groq
- Mistral
- compatible gateways
- self-hosted proxies exposing the same interface
Example:
from lilya.contrib.ai import AIClient, OpenAICompatibleConfig, OpenAICompatibleProvider
provider = OpenAICompatibleProvider(
OpenAICompatibleConfig(
provider_name="my-gateway",
base_url="https://llm-gateway.internal/v1",
api_key="internal-token",
headers={"X-Tenant": "alpha"},
)
)
client = AIClient(provider, default_model="meta-llama/llama-4")
This is the recommended extension path when a vendor already follows the same payload shape.
Anthropic in Detail¶
Anthropic uses a different messages API than OpenAI-compatible vendors, so Lilya ships a dedicated adapter.
Important differences:
- authentication headers differ
- the `anthropic-version` header is required
- `system` content is a top-level field
- response and streaming event shapes differ
The Lilya adapter hides those differences behind the same AIClient methods.
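To make the shape difference concrete, here is an illustrative sketch of the payload translation (not the shipped adapter): Anthropic's Messages API takes `system` as a top-level field and requires `max_tokens`.

```python
from typing import Any

# Illustrative sketch: the Anthropic Messages API takes `system` as a
# top-level field and requires `max_tokens`, unlike OpenAI-style payloads.
def to_anthropic_payload(
    system_prompt: str,
    messages: list[dict[str, str]],
    model: str,
    max_tokens: int = 1024,
) -> dict[str, Any]:
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": system_prompt,
        "messages": messages,
    }

payload = to_anthropic_payload(
    "Be concise.",
    [{"role": "user", "content": "Explain ASGI in one line."}],
    model="claude-sonnet-4-20250514",
)
print(sorted(payload))
```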
Error Handling¶
The package exposes a small exception hierarchy:
- `AIError`
- `AIConfigurationError`
- `ProviderNotConfigured`
- `AIProviderError`
- `AIResponseError`
Recommended usage:
from lilya.contrib.ai import AIProviderError
try:
result = await ai.prompt("Summarize this page.")
except AIProviderError:
return {"error": "The AI provider is currently unavailable."}
Testing¶
You do not need real provider calls in most tests.
Test at three levels:
1. Route or service tests¶
Inject a fake AIClient dependency.
2. Client-level tests¶
Use a fake provider that records PromptRequest objects.
3. Provider adapter tests¶
Mock the HTTP transport and assert:
- request path
- request headers
- request payload translation
- response parsing
- streaming behavior
This repo includes dedicated tests for all of these patterns.
Example: testing a Lilya endpoint with a fake AI client¶
from lilya.dependencies import Provide
class FakeAIClient:
async def prompt(self, prompt: str, **kwargs):
class Response:
text = "fake summary"
provider = "fake"
model = "fake-model"
usage = None
return Response()
fake_ai = FakeAIClient()
async def resolve_fake_ai(request, **kwargs):
return fake_ai
FakeAI = Provide(resolve_fake_ai)
@app.post("/summary", dependencies={"ai": FakeAI})
async def summary(ai):
result = await ai.prompt("Summarize this.")
return {"text": result.text}
This style makes route tests deterministic and avoids external provider calls.
Production Guidance¶
Keep provider setup centralized¶
Create the provider and client once during startup.
Prefer dependency injection over direct global access¶
Use AI in handlers rather than reaching into request.app.state.ai everywhere.
Use timeouts intentionally¶
Long-running generation can tie up requests for a long time if you do not set reasonable timeout values.
Stream for interactive workloads¶
If the user is waiting in a chat-like interface, prefer stream() or stream_chat().
Log provider failures at the boundary¶
The best place to log provider failures is where application context still exists:
- route handler
- service layer
- background job wrapper
Troubleshooting¶
RuntimeError: httpx is required for lilya.contrib.ai¶
Install the extra:
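Assuming the extra is named after the package path (verify against your Lilya version):

```shell
# Assumed extra name; pulls in httpx for outbound provider calls.
pip install "lilya[ai]"
```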
No model was provided¶
Set a default model on AIClient or pass model= on the individual request.
No AIClient configured¶
Make sure you called setup_ai(app, client=...).
The endpoint works locally but not in tests¶
Usually one of these is happening:
- the route did not declare `dependencies={"ai": AI}`
- the app did not call `setup_ai(...)`
- the test needs to override the AI dependency with a fake provider or fake client
Streaming returns nothing¶
Check:
- the provider supports streaming for that model
- your route is returning a streaming response correctly
- you are consuming `chunk.delta`
The provider is compatible, but not fully¶
Use OpenAICompatibleProvider for providers that are mostly compatible, then pass vendor-specific options through extra. If the wire format is genuinely different, add a dedicated provider adapter instead of overloading the common one.
Reference Summary¶
| API | Purpose |
|---|---|
| `AIClient.prompt()` | Single-prompt, non-streaming call |
| `AIClient.chat()` | Multi-message, non-streaming call |
| `AIClient.stream()` | Single-prompt streaming call |
| `AIClient.stream_chat()` | Multi-message streaming call |
| `setup_ai()` | Registers the client on app state and lifecycle |
| `AI` | Injects the configured client into Lilya handlers |
| `OpenAICompatibleProvider` | Adapter for OpenAI-style compatible providers |
| `AnthropicProvider` | Adapter for the Anthropic Messages API |
Provider Defaults Verified¶
The built-in defaults in this module are based on the vendors' official documentation:
- OpenAI chat completions under `https://api.openai.com/v1`
- Groq OpenAI compatibility under `https://api.groq.com/openai/v1`
- Mistral chat completions under `https://api.mistral.ai/v1`
- Anthropic Messages API under `https://api.anthropic.com/v1/messages` with `anthropic-version: 2023-06-01`
Those defaults are still fully overridable through the configuration dataclasses if your deployment uses a proxy, gateway, or self-hosted compatibility layer.