Anthropic-Compatible API
Orka exposes an Anthropic-compatible Messages API at /anthropic/v1/messages, so Anthropic-compatible clients (Claude Code, etc.) can point at Orka instead of the upstream API.
Requests are forwarded to whichever LLM provider is configured in your cluster, with credentials managed via Kubernetes Secrets and Provider CRDs. See also OpenAI Compatibility for the OpenAI-compatible proxy.
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /anthropic/v1/messages | Create a message (streaming & non-streaming) |
| GET | /anthropic/v1/models | List available models from configured providers |
PR-blocking live CI exercises this API directly against a live Claude-family backend by checking /anthropic/v1/models and both non-streaming and streaming /anthropic/v1/messages requests. Those live checks keep the default Orka tool-loop behavior enabled unless a client explicitly sets X-Orka-Tools: disabled.
Authentication
Two authentication methods are supported:
- x-api-key: <orka-token> (Anthropic convention; recommended for Anthropic clients)
- Authorization: Bearer <orka-token> (standard Bearer token)
Both use a Kubernetes ServiceAccount token as the value.
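As a small illustration of the two header styles, the following sketch builds the headers a client would attach. The function name `auth_headers` and the `style` argument are hypothetical and exist only for this example.

```python
def auth_headers(token: str, style: str = "x-api-key") -> dict[str, str]:
    """Return HTTP headers carrying an Orka token in one of the two styles."""
    if style == "x-api-key":
        # Anthropic convention
        return {"x-api-key": token}
    if style == "bearer":
        # Standard Bearer token
        return {"Authorization": f"Bearer {token}"}
    raise ValueError(f"unknown auth style: {style!r}")
```

In both cases `token` is the ServiceAccount token, for example the value exported as ORKA_TOKEN in the prerequisites.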
Model Name Format
The model field supports two formats:
- provider/model: e.g., anthropic/claude-sonnet-4-20250514. The part before / matches a Provider CRD name, and the part after is the model name sent to that provider.
- model: e.g., claude-sonnet-4-20250514. Uses the default provider (from the --chat-provider flag or a Provider CRD named default).
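The routing rule can be sketched as a small parser. This is an illustration, not Orka's implementation: `parse_model` and `ModelRef` are hypothetical names, and splitting on the first `/` only (so any later slashes stay in the model name) is an assumption.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    provider: str  # name of the Provider CRD to route to
    model: str     # model name sent to that provider


def parse_model(name: str, default_provider: str = "default") -> ModelRef:
    """Split a `model` field into (provider, model).

    Assumption: only the first "/" separates provider from model.
    """
    provider, sep, model = name.partition("/")
    if not sep:
        # Bare model name: fall back to the default provider.
        return ModelRef(default_provider, name)
    if not provider or not model:
        raise ValueError(f"malformed model name: {name!r}")
    return ModelRef(provider, model)
```

For example, `parse_model("anthropic/claude-sonnet-4-20250514")` yields provider `anthropic`, while `parse_model("claude-sonnet-4-20250514")` falls back to the default provider.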
Prerequisites
- Provider CRD configured in the cluster:
apiVersion: core.orka.ai/v1alpha1
kind: Provider
metadata:
  name: anthropic
  namespace: default
spec:
  type: anthropic
  secretRef:
    name: anthropic-secret
    key: api-key
  defaultModel: claude-sonnet-4-20250514
- Secret with the API key:
apiVersion: v1
kind: Secret
metadata:
  name: anthropic-secret
  namespace: default
type: Opaque
stringData:
  api-key: sk-ant-...
- ServiceAccount token for authentication:
# Create a service account
kubectl create serviceaccount orka-client
# Bind it to the orka viewer role (or a custom role)
kubectl create clusterrolebinding orka-client-binding \
  --clusterrole=orka-task-viewer \
  --serviceaccount=default:orka-client
# Get a token
export ORKA_TOKEN=$(kubectl create token orka-client)
Using with Claude Code
Configure Claude Code to route all API calls through Orka:
export ANTHROPIC_BASE_URL=https://orka.example.com/anthropic
export ANTHROPIC_API_KEY=$(kubectl create token orka-client)
# Claude Code will now route all API calls through Orka
Using with curl
Non-streaming
curl -X POST https://orka.example.com/anthropic/v1/messages \
  -H "x-api-key: $ORKA_TOKEN" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
Streaming
curl -X POST https://orka.example.com/anthropic/v1/messages \
  -H "x-api-key: $ORKA_TOKEN" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
List models
curl https://orka.example.com/anthropic/v1/models \
  -H "x-api-key: $ORKA_TOKEN"
Supported Features
| Feature | Supported |
|---|---|
| Messages API | Yes |
| Streaming (SSE) | Yes |
| Tool use (function calling) | Yes |
| Extended thinking (thinking content blocks with budget_tokens) | Yes |
| System messages (string format) | Yes |
| System messages (content block array format) | Yes |
| max_tokens | Yes |
| temperature | Yes |
| stop_sequences | Yes |
| Image inputs | Not yet |
| PDF inputs | Not supported |
Server-Side Tool Execution
By default, the Anthropic endpoint enables server-side tool execution — it injects Orka's built-in tools into the request and runs an autonomous tool loop. When the LLM returns tool_use content blocks, the proxy intercepts them, executes the tools, feeds results back to the LLM, and repeats until a final text response is produced. Clients never need to execute tools locally.
To disable this and use the endpoint as a transparent proxy, set the X-Orka-Tools: disabled header:
X-Orka-Tools: disabled
When this header is set, requests are forwarded to the LLM and responses are returned without intercepting tool calls. The client manages its own tool execution loop.
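A client that switches between the two modes might build its request as in the sketch below. It uses only the Python standard library; `build_messages_request` and its parameters are hypothetical names for this example, and the base URL is the placeholder host used elsewhere in this document.

```python
import json
import urllib.request


def build_messages_request(base_url, token, model, prompt,
                           server_tools=True, max_tokens=1024):
    """Build (but do not send) a POST to the Anthropic-compatible endpoint."""
    headers = {
        "x-api-key": token,
        "Content-Type": "application/json",
        "anthropic-version": "2023-06-01",
    }
    if not server_tools:
        # Opt out of the default tool loop; the client runs its own tools.
        headers["X-Orka-Tools"] = "disabled"
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/anthropic/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; with `server_tools=True` the header is simply omitted, which leaves the default behavior in place.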
Available Tools
By default (without the X-Orka-Tools: disabled header), the proxy automatically injects these built-in tools into the request:
| Tool | Description |
|---|---|
| web_search | Search the web for information |
| code_exec | Execute code snippets in a sandbox |
| file_read | Read file contents from workspace |
| file_write | Write files to workspace |
| web_fetch | Fetch and extract content from URLs |
Additionally, any Tool CRDs defined in the user's namespace are automatically included as custom HTTP tools.
Client-provided tools in the request are preserved and merged with the injected tools.
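One way to picture the merge is the sketch below. The source does not say how a name collision between a client tool and an injected tool is resolved; keeping the client's definition here is purely an assumption, and `merge_tools` is a hypothetical helper. Tool entries follow the Anthropic tool shape (`name`, `description`, `input_schema`).

```python
def merge_tools(client_tools, injected_tools):
    """Combine injected and client-provided tool definitions by name.

    Assumption (not stated in the source): on a name collision the
    client's definition replaces the injected one.
    """
    merged = {tool["name"]: tool for tool in injected_tools}
    for tool in client_tools:
        merged[tool["name"]] = tool
    return list(merged.values())
```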
How It Works
1. Client sends a POST /anthropic/v1/messages request (tools are injected automatically)
2. Proxy injects Orka tools into the request and forwards to the LLM
3. If the LLM returns tool_use blocks:
   - Proxy executes each tool server-side
   - Tool results are appended to the conversation
   - Proxy calls the LLM again with updated context
4. Steps 2-3 repeat until the LLM returns a text-only response
5. Final response is returned to the client
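The steps above can be sketched as a loop over a stand-in LLM. This is a simplified illustration, not Orka's code: `run_tool_loop`, `call_llm`, and `fake_llm` are hypothetical, tool injection is reduced to a dictionary of Python callables, and the sketch raises at the iteration limit instead of making the closing call described under Limits and Timeouts.

```python
import json


def run_tool_loop(call_llm, tools, messages, max_iterations=50):
    """Call the LLM, execute any tool_use blocks, and repeat until text-only.

    call_llm(messages) returns a list of Anthropic-style content blocks;
    tools maps a tool name to a callable taking keyword arguments.
    """
    messages = list(messages)  # do not mutate the caller's list
    for _ in range(max_iterations):
        blocks = call_llm(messages)
        tool_calls = [b for b in blocks if b["type"] == "tool_use"]
        if not tool_calls:
            return blocks  # text-only response ends the loop
        messages.append({"role": "assistant", "content": blocks})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": call["id"],
                "content": json.dumps(tools[call["name"]](**call["input"])),
            }
            for call in tool_calls
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("iteration limit reached")


def fake_llm(messages):
    """Stand-in LLM: requests one tool call, then answers with its result."""
    last = messages[-1]["content"]
    if isinstance(last, str):
        return [{"type": "tool_use", "id": "tu_1", "name": "add",
                 "input": {"a": 2, "b": 3}}]
    return [{"type": "text", "text": f"The sum is {last[0]['content']}"}]


if __name__ == "__main__":
    final = run_tool_loop(fake_llm, {"add": lambda a, b: a + b},
                          [{"role": "user", "content": "What is 2 + 3?"}])
    print(final[0]["text"])  # prints "The sum is 5"
```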
Streaming Behavior
When stream: true, the proxy streams Anthropic SSE events throughout the entire tool loop:
- message_start: Emitted once at the beginning
- content_block_start/delta/stop: Streamed for each text and tool_use block from the LLM
- Tool result blocks: After executing each tool, the result is streamed as a text content block (e.g., [Tool web_search result]: ...)
- message_delta + message_stop: Emitted once at the end
This means clients see real-time progress as tools are called and results are produced, even across multiple LLM round-trips.
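On the client side, consuming such a stream amounts to parsing SSE frames and concatenating text deltas. The sketch below works on an iterable of already-decoded lines; `iter_sse_events` and `collect_text` are hypothetical helpers, and the payload shape (`delta.type == "text_delta"`) follows the Anthropic streaming format rather than anything Orka-specific.

```python
import json


def _field_value(line):
    """Return the text after the first ':' minus one optional leading space."""
    value = line.partition(":")[2]
    return value[1:] if value.startswith(" ") else value


def iter_sse_events(lines):
    """Yield (event_name, payload_dict) for each SSE frame in `lines`."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line terminates a frame
            if data:
                yield event, json.loads("\n".join(data))
            event, data = None, []
        elif line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        elif line.startswith("event:"):
            event = _field_value(line)
        elif line.startswith("data:"):
            data.append(_field_value(line))
    if data:  # stream ended without a trailing blank line
        yield event, json.loads("\n".join(data))


def collect_text(lines):
    """Concatenate every text delta seen in the stream."""
    parts = []
    for event, payload in iter_sse_events(lines):
        delta = payload.get("delta", {})
        if event == "content_block_delta" and delta.get("type") == "text_delta":
            parts.append(delta["text"])
    return "".join(parts)
```

Because tool results are streamed as text content blocks, a collector like this would also pick up the `[Tool ... result]` lines between LLM round-trips.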
Limits and Timeouts
| Setting | Default | Description |
|---|---|---|
| Max iterations | 50 | Maximum number of LLM calls per request |
| Max duration | 30 minutes | Overall request timeout |
| Tool timeout | 60 seconds | Per-tool execution timeout |
| Max session size | 500 KB | Conversation size budget (triggers truncation) |
These values come from the chat configuration and apply to both streaming and non-streaming requests.
When the iteration limit is reached, the proxy injects a summary prompt and makes one final LLM call without tools to produce a closing response.
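To make the interplay of these limits concrete, here is a sketch of a per-iteration check using the default values from the table. `LoopLimits` and `next_action` are hypothetical names; treating 500 KB as 500 × 1024 bytes and checking the limits in this particular order are assumptions, not documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopLimits:
    max_iterations: int = 50
    max_duration_s: float = 30 * 60       # overall request timeout
    tool_timeout_s: float = 60            # per-tool execution timeout
    max_session_bytes: int = 500 * 1024   # assumption: KB means 1024 bytes


def next_action(limits, iterations, elapsed_s, session_bytes):
    """Decide what the loop should do before its next LLM call."""
    if elapsed_s >= limits.max_duration_s:
        return "return_partial"            # hand back progress made so far
    if iterations >= limits.max_iterations:
        return "final_call_without_tools"  # summary prompt, then closing reply
    if session_bytes > limits.max_session_bytes:
        return "truncate"                  # shrink the conversation first
    return "continue"
```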
Repetition Detection
If the LLM calls the same tool with identical arguments 3 or more times, the proxy injects a warning message asking it to try a different approach. This prevents infinite loops where the LLM repeatedly calls a failing tool.
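A detector of this kind can be sketched by counting calls keyed on the tool name and a canonical form of its arguments. `RepetitionDetector` is a hypothetical class; counting all identical calls within a request (rather than only consecutive ones) is an assumption, since the source does not specify which is used.

```python
import json
from collections import Counter


class RepetitionDetector:
    """Flag a tool call once it has been seen `threshold` times."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self._counts = Counter()

    def record(self, tool_name, arguments):
        """Record one call; return True if a warning should be injected."""
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} identical.
        key = (tool_name, json.dumps(arguments, sort_keys=True))
        self._counts[key] += 1
        return self._counts[key] >= self.threshold
```

The caller would append the warning message to the conversation whenever `record` returns True.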
Error Handling
- Tool execution errors: Wrapped as JSON results ({"success": false, "error": "..."}) and fed back to the LLM, which can decide how to recover
- LLM errors: If the LLM returns a context-too-long error, the proxy truncates the conversation to ~50% and retries once. Other LLM errors terminate the loop and return an Anthropic error response
- Timeout: If the overall request timeout is reached, the proxy returns whatever progress has been made
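The first two behaviors can be sketched as follows. Only the error shape `{"success": false, "error": "..."}` comes from the text above; the success shape, the `ContextTooLong` exception, and truncating by keeping the most recent half of the messages are assumptions made for illustration.

```python
import json


class ContextTooLong(Exception):
    """Hypothetical marker for a context-too-long error from the LLM."""


def run_tool(tool, arguments):
    """Run a tool and always return a JSON string the LLM can read."""
    try:
        # Assumption: successful results use a matching {"success": true} shape.
        return json.dumps({"success": True, "result": tool(**arguments)})
    except Exception as exc:
        return json.dumps({"success": False, "error": str(exc)})


def call_with_truncation(call_llm, messages):
    """Call the LLM; on context-too-long, keep ~50% and retry exactly once."""
    try:
        return call_llm(messages)
    except ContextTooLong:
        keep = max(1, len(messages) // 2)
        # Assumption: the most recent messages are the ones retained.
        return call_llm(messages[-keep:])
```

A second `ContextTooLong` raised by the retry propagates to the caller, matching the "retries once" wording.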
Example: curl with Server-Side Tools
Server-side tool execution is enabled by default — no special header needed:
curl -X POST https://orka.example.com/anthropic/v1/messages \
  -H "x-api-key: $ORKA_TOKEN" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "anthropic/claude-sonnet-4-20250514",
    "max_tokens": 4096,
    "messages": [{"role": "user", "content": "Search the web for Kubernetes 1.32 release highlights and summarize them."}],
    "stream": true
  }'
To use as a transparent proxy instead (client manages tools), add X-Orka-Tools: disabled.
Architecture
┌─────────────┐     ┌──────────────────────────────┐     ┌───────────────┐
│ Claude Code │────▶│ Orka API Server              │────▶│ Anthropic API │
│ (or any     │◀────│ /anthropic/v1/messages       │◀────│ OpenAI API    │
│ Anthropic   │     │                              │     │ Azure OpenAI  │
│ client)     │     │ Provider resolution:         │     └───────────────┘
└─────────────┘     │ - Provider CRD lookup        │
                    │ - Secret-based API keys      │
                    │ - Model routing              │
                    │ - Server-side tool execution │
                    └──────────────────────────────┘
Orka injects built-in tools and runs server-side tool execution by default. Set X-Orka-Tools: disabled to use as a transparent proxy where the client manages its own tool execution loop — see Server-Side Tool Execution above.