
Interactive Chat

The chat endpoint provides an agentic conversational interface where an LLM orchestrator can create and manage Kubernetes resources on the user's behalf. It accepts natural language, reasons about what tasks to create, and autonomously executes them using the platform.

Endpoints

POST /api/v1/chat — Send a message (SSE streaming or JSON response)
GET /api/v1/chat/config — Get chat configuration and available tools
DELETE /api/v1/chat/:sessionId — Cancel a chat session

Architecture

User ──POST /api/v1/chat──▶ API Server ──▶ Concurrency Semaphore
                            │
                            ├─▶ context.WithTimeout (--chat-max-duration, default 30m)
                            ├─▶ Resolve Provider CRD → get LLM client
                            ├─▶ Load/create chat session (prefix: chat-session-)
                            ├─▶ Build structured system prompt (cached per request)
                            ├─▶ Call LLM with tools
                            │     │
                            │     ├─▶ create_ai_task
                            │     ├─▶ create_container_task
                            │     ├─▶ create_agent_task
                            │     ├─▶ check_task_progress
                            │     ├─▶ fetch_task_output
                            │     ├─▶ wait_for_task
                            │     ├─▶ cancel_task
                            │     ├─▶ list_agents / list_tools / list_tasks
                            │     ├─▶ create_agent / update_agent / delete_agent (on-demand)
                            │     ├─▶ create_tool / delete_tool (on-demand)
                            │     └─▶ delete_session (on-demand)
                            │
                            ├─▶ Stream response via SSE (with heartbeats)
                            └─▶ Detect client disconnect → cleanup

Request Format

{
"message": "Create an AI task that summarizes Kubernetes best practices",
"sessionId": "my-session",
"provider": "anthropic",
"model": "claude-sonnet-4-20250514",
"namespace": "default",
"temperature": 0.7,
"maxTokens": 4096,
"systemPrompt": "Focus on security topics",
"agentRef": "my-agent"
}

Only message is required. All other fields are optional:

  • sessionId: Existing session for continuity (auto-created if omitted)
  • provider / model: Override the default LLM provider and model
  • agentRef: Use an Agent CRD for provider/model/temperature defaults
  • systemPrompt: Appended to the built-in orchestrator prompt
  • namespace: Namespace for task operations

Response Formats

SSE Streaming (default)

All SSE events use a structured envelope:

id: <monotonic-seq>
event: <type>
data: {"sessionId":"...","content":"...","toolCall":{...},"error":{...},"usage":{...}}

Event types:

Event        Description
status       Stream opened: confirms session, provider, model
message      Text content delta
tool_call    Tool invocation (name + args)
tool_result  Tool execution result
error        Error with code and message
done         Stream complete with usage stats
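
A client has to split the stream into events before it can act on the event types above. The following is a minimal sketch of such a parser, not the platform's own client. It assumes standard SSE framing (a blank line ends an event) and that heartbeats arrive as comment lines starting with ":", which this page does not specify. The name parse_sse is hypothetical.

```python
import json


def parse_sse(lines):
    """Yield (id, event, data) tuples from an iterable of SSE text lines."""
    event_id, event_type, data_lines = None, "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event_id, event_type, json.loads("\n".join(data_lines))
            event_id, event_type, data_lines = None, "message", []
            continue
        if line.startswith(":"):
            # Comment line; assumed here to be how heartbeats are sent.
            continue
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "id":
            event_id = value
        elif field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
```

A caller would typically switch on the event type, appending message deltas to the display and stopping at done or error.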

JSON Response

Send Accept: application/json for a blocking JSON response:

{
"sessionId": "chat-session-abc123",
"message": "I created an AI task...",
"toolCalls": [{"name": "create_ai_task", "args": {...}, "result": {...}}],
"usage": {"inputTokens": 4200, "outputTokens": 1800, "llmCalls": 3}
}

Live Coverage

PR-blocking live CI covers the chat endpoint against a real backend in both:

  • SSE mode via Accept: text/event-stream
  • JSON mode via Accept: application/json

The live suite verifies transport behavior, session creation/persistence, and usage reporting. Exact model wording is still covered primarily by deterministic chat tests in test/e2e/ and unit tests.

Available Tools

Core Tools (always loaded)

Tool                   Description
create_ai_task         Create an AI/LLM-powered task
create_container_task  Create a container task for shell commands
create_agent_task      Create a task with CLI runtime (Copilot/Claude)
check_task_progress    Get task phase/status conditions
fetch_task_output      Get completed task result (truncated to 2K chars)
wait_for_task          Wait for task completion (max 60s per call, non-blocking)
cancel_task            Cancel/delete a task
list_agents            List Agent CRDs with projected summaries
list_tools             List Tool CRDs and built-in tools
list_tasks             List tasks with optional status filter

Management Tools (loaded on demand)

These are only included when the user's message signals CRUD intent:

Tool            Description
create_agent    Create an Agent CRD
update_agent    Update an Agent CRD
delete_agent    Delete an Agent CRD
create_tool     Create a Tool CRD with HTTP endpoint
delete_tool     Delete a Tool CRD
delete_session  Delete a session and transcript

System Prompt

The system prompt is split into XML-delimited sections, which is intended to improve tool-calling accuracy:

<identity>
You are the Orka orchestrator — an AI assistant that manages
Kubernetes-native task execution.
</identity>

<capabilities>...</capabilities>
<task_types>container, ai, agent with usage guidance</task_types>
<available_agents>...dynamically injected...</available_agents>
<available_tools>...dynamically injected...</available_tools>
<rules>Operational invariants</rules>
<examples>Complete multi-step tool-calling traces</examples>

Dynamic context (agents, tools) is built once at request start and cached for the tool loop duration.
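
As a rough illustration of the build-once behavior, the sketch below assembles the two dynamic sections into a single string that a caller could reuse for every iteration of the tool loop. The function names and the "(none)" placeholder are assumptions. Only the tag names and the fact that the request's systemPrompt is appended come from this page.

```python
def render_section(tag, body):
    """Wrap body text in an XML-style section."""
    return f"<{tag}>\n{body}\n</{tag}>"


def build_system_prompt(agents, tools, user_prompt=None):
    """Assemble the dynamic sections once; the result is reused for the whole tool loop."""
    sections = [
        render_section("available_agents", "\n".join(agents) or "(none)"),
        render_section("available_tools", "\n".join(tools) or "(none)"),
    ]
    if user_prompt:
        # The request's systemPrompt is appended after the built-in sections.
        sections.append(user_prompt)
    return "\n\n".join(sections)
```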

Safety Mechanisms

Concurrency Control

  • Bounded semaphore (--chat-max-concurrent, default 10)
  • Returns 429 Too Many Requests when full
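
The admission check can be pictured as a non-blocking acquire on a bounded semaphore. This is a sketch under assumptions, not the server's code: ChatLimiter and handle_chat are hypothetical names, and the real server is free to implement the limit differently.

```python
import threading


class ChatLimiter:
    """Caps the number of chat requests running at once."""

    def __init__(self, max_concurrent=10):
        self._sem = threading.BoundedSemaphore(max_concurrent)

    def try_acquire(self):
        # Non-blocking: a full semaphore rejects the request instead of queueing it.
        return self._sem.acquire(blocking=False)

    def release(self):
        self._sem.release()


def handle_chat(limiter, run):
    """Return (status, body); 429 when no slot is free."""
    if not limiter.try_acquire():
        return 429, "Too Many Requests"
    try:
        return 200, run()
    finally:
        # Always free the slot, even if run() raises.
        limiter.release()
```

Rejecting immediately rather than queueing keeps a burst of requests from piling up behind long-running sessions.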

Timeouts

  • --chat-max-duration (default 30m): Wall-clock timeout per request
  • --chat-tool-timeout (default 60s): Per-tool execution timeout
  • --chat-max-iterations (default 50): Max tool execution loops
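
The wall-clock and iteration limits interact as two independent exits from the same loop. The sketch below shows one way to combine them. run_tool_loop, the step callback, and the returned tuples are all hypothetical, and the per-tool timeout is left out for brevity.

```python
import time


def run_tool_loop(step, max_iterations=50, max_duration_s=30 * 60, clock=time.monotonic):
    """Call step() until it returns a result or a limit is hit.

    step() returns None to continue and any other value to stop.
    """
    deadline = clock() + max_duration_s
    for iteration in range(1, max_iterations + 1):
        if clock() >= deadline:
            # Report how many iterations completed before the deadline passed.
            return ("timeout", iteration - 1)
        result = step()
        if result is not None:
            return ("done", result)
    return ("max_iterations", max_iterations)
```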

Resource Limits

  • --chat-max-tasks-per-turn (default 5): Max tasks created per chat turn
  • Task names use session-scoped prefix to prevent collisions
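
A per-turn budget with session-scoped names might look like the following. The class name and the <sessionId>-<base>-<n> name format are illustrative assumptions. This page states only that names carry a session-scoped prefix and that a turn is capped at a configurable number of tasks.

```python
class TurnBudget:
    """Tracks task creation within one chat turn."""

    def __init__(self, session_id, max_tasks=5):
        self.session_id = session_id
        self.max_tasks = max_tasks
        self.created = 0

    def next_task_name(self, base):
        """Reserve a slot and return a session-scoped task name."""
        if self.created >= self.max_tasks:
            raise RuntimeError(f"task limit of {self.max_tasks} per turn reached")
        self.created += 1
        # Hypothetical name format: <sessionId>-<base>-<n>
        return f"{self.session_id}-{base}-{self.created}"
```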

Stuck-State Detection

  • Repetition detector: When the same tool is called with identical args 3 times, a warning is injected and a 5-iteration penalty is applied
  • Progress assertion: Every 5 iterations, LLM must summarize progress
  • Graceful termination: On iteration exhaustion, LLM must provide final summary
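
The repetition check reduces to counting identical (tool, args) pairs. The sketch below assumes the count is cumulative over the loop rather than limited to consecutive calls, which this page does not state. RepetitionDetector is a hypothetical name.

```python
import json
from collections import Counter


class RepetitionDetector:
    """Flags a tool call once it has been repeated with identical args."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self._seen = Counter()

    def record(self, tool_name, args):
        """Return True once the same call has been seen `threshold` times."""
        # sort_keys makes the key independent of dict insertion order.
        key = (tool_name, json.dumps(args, sort_keys=True))
        self._seen[key] += 1
        return self._seen[key] >= self.threshold
```

When record returns True, the loop would inject its warning and apply the iteration penalty.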

Error Handling

Structured error responses help the LLM self-correct:

{
"success": false,
"error": "Agent 'my-agent' not found in namespace 'default'",
"errorType": "not_found",
"suggestion": "Use list_agents to see available agents"
}
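
A tool implementation could produce this shape with a small helper. The sketch below is illustrative only: tool_error and agent_not_found are hypothetical names, and the field names are taken from the example above.

```python
def tool_error(message, error_type, suggestion=None):
    """Build a structured tool failure that the LLM can act on."""
    result = {"success": False, "error": message, "errorType": error_type}
    if suggestion:
        # A concrete next step gives the model something to try instead.
        result["suggestion"] = suggestion
    return result


def agent_not_found(name, namespace):
    """Convenience wrapper for the not_found case shown above."""
    return tool_error(
        f"Agent '{name}' not found in namespace '{namespace}'",
        "not_found",
        "Use list_agents to see available agents",
    )
```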

Session Management

  • Chat sessions use prefix chat-session- and type chat in the session store
  • Sessions store message summaries, not full tool outputs
  • Auto-truncation when a session exceeds --chat-max-session-size (default 512000 bytes, i.e. 500 KiB)
  • First user message is always preserved for context
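
One way to truncate while preserving the first user message is to drop the oldest messages that follow it until the session fits. The sketch below assumes messages are dicts with role and content keys and that size is measured as serialized JSON bytes. Neither detail is specified on this page, and truncate_session is a hypothetical name.

```python
import json


def truncate_session(messages, max_bytes=512000):
    """Drop the oldest messages after the first user message until under max_bytes."""

    def size(msgs):
        return len(json.dumps(msgs).encode("utf-8"))

    first_user = next(
        (i for i, m in enumerate(messages) if m.get("role") == "user"), None
    )
    kept = list(messages)
    # Index of the oldest message that may be dropped.
    drop_at = 0 if first_user is None else first_user + 1
    while size(kept) > max_bytes and len(kept) > drop_at:
        del kept[drop_at]
    return kept
```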

Namespace Scoping

  • Operations default to the namespace from the ChatRequest or the authenticated user's namespace
  • kube-system, kube-public, and the operator's namespace are blocked
  • All orchestrator-created resources get labels: orka.ai/created-by: orchestrator, orka.ai/chat-session: <sessionId>
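
These rules can be sketched as a namespace resolver plus a label builder. The function names are hypothetical, and the precedence shown (request first, then the user's namespace) is an assumption. The blocked namespaces and the label keys are the ones listed above.

```python
BLOCKED_NAMESPACES = {"kube-system", "kube-public"}


def resolve_namespace(requested, user_namespace, operator_namespace):
    """Pick the target namespace and reject blocked ones."""
    # Assumed precedence: the request's namespace wins when it is set.
    namespace = requested or user_namespace
    if namespace in BLOCKED_NAMESPACES or namespace == operator_namespace:
        raise PermissionError(f"namespace '{namespace}' is not allowed")
    return namespace


def orchestrator_labels(session_id):
    """Labels applied to every orchestrator-created resource."""
    return {
        "orka.ai/created-by": "orchestrator",
        "orka.ai/chat-session": session_id,
    }
```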

Configuration Flags

Flag                       Default  Description
--chat-enabled             true     Enable/disable chat endpoint
--chat-provider            ""       Default Provider CRD name
--chat-model               ""       Default model
--chat-max-iterations      50       Max tool execution loops
--chat-max-duration        30m      Max wall-clock time per request
--chat-tool-timeout        60s      Max time per tool execution
--chat-max-concurrent      10       Max concurrent chat sessions
--chat-max-tasks-per-turn  5        Max tasks per chat turn
--chat-max-session-size    512000   Session size soft limit (bytes)

Example Usage

# Chat with SSE streaming
curl -N http://localhost:8080/api/v1/chat \
-H "Authorization: Bearer $(kubectl create token orka-client)" \
-H "Content-Type: application/json" \
-d '{
"message": "Create an AI task that summarizes Kubernetes best practices",
"sessionId": "my-session"
}'

# Chat with JSON response
curl http://localhost:8080/api/v1/chat \
-H "Authorization: Bearer $(kubectl create token orka-client)" \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-d '{"message": "List all available agents"}'

# Get chat configuration
curl http://localhost:8080/api/v1/chat/config \
-H "Authorization: Bearer $(kubectl create token orka-client)"

# Cancel a chat session
curl -X DELETE http://localhost:8080/api/v1/chat/my-session \
-H "Authorization: Bearer $(kubectl create token orka-client)"
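
For programmatic clients, the same calls can be built with Python's standard library. The sketch below only constructs the request and leaves sending it (for example with urllib.request.urlopen) to the caller. chat_request is a hypothetical helper, and the token is assumed to be obtained as in the curl examples.

```python
import json
import urllib.request


def chat_request(base_url, token, message, stream=True, session_id=None):
    """Build a POST /api/v1/chat request for SSE or blocking JSON mode."""
    body = {"message": message}
    if session_id:
        body["sessionId"] = session_id
    return urllib.request.Request(
        f"{base_url}/api/v1/chat",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # The Accept header selects the response format.
            "Accept": "text/event-stream" if stream else "application/json",
        },
    )
```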