Testing
Orka's test suite spans unit tests, integration tests (envtest), end-to-end tests (Kind cluster), and frontend tests. The tables below list the covered packages and specs.
Running Tests
# Run test pipeline (manifests, generate, fmt, vet, then Go tests)
make test
# Run Go tests with coverage report
make test
go tool cover -func=cover.out | grep total
# Run frontend tests
make ui-test # or: cd ui && bun run test
make ui-test-coverage # or: cd ui && bun run test:coverage
# Run E2E tests (requires isolated Kind cluster)
make test-e2e
# Run Agent Substrate E2E (requires Docker, Go, git, curl, kind, kubectl, ko, jq)
SUBSTRATE_E2E_EXTENDED=1 bash scripts/agent-substrate-e2e.sh
# Lint
make lint
make lint-fix
make ui-lint
Test Structure
Go Tests
Tests use Ginkgo + Gomega (BDD style) for controller/integration tests and standard Go testing for unit tests.
| Package | Test Files | Coverage Areas |
|---|---|---|
| internal/api/ | handlers_test.go, internal_handlers_test.go, auth_test.go, middleware_test.go, pagination_test.go, server_test.go, openai_compat_test.go | REST API handlers, internal API handlers, memory/session APIs, authentication, middleware, pagination, OpenAI compatibility |
| internal/controller/ | task_controller_test.go, agent_controller_test.go, tool_controller_test.go, session_manager_test.go, job_builder_test.go, repositoryscan_controller_test.go, webhook_test.go | Reconciliation logic, session management, job building, coordination enforcement, repository scan mapper/finding/patch ingestion |
| internal/security/ | security_test.go, contracts_test.go | Repository security artifact contracts, v2 evidence validation, fingerprinting, bounded context manifests, prompt helpers |
| internal/security/slices/ | mapper_test.go | Deterministic review-slice mapper coverage for Go, Node/TypeScript, Python, workflows, scripts, config, path skipping, and stable output |
| internal/store/sqlite/ | security_store_test.go | Repository security store migrations, findings, review slices, dropped finding diagnostics, patch proposals |
| internal/llm/ | provider_test.go | Provider registry |
| internal/llm/anthropic/ | provider_test.go | Anthropic API integration |
| internal/llm/openai/ | provider_test.go | OpenAI API integration |
| internal/metrics/ | metrics_test.go | Prometheus metrics recording |
| internal/tools/ | registry_test.go, memory tool tests, coordination tool tests, PR tool tests, agent-management tool tests, integration_test.go | Built-in tool implementations, memory tools, coordination tools, PR tools, agent management tools |
| internal/worker/ | tool_executor_test.go | Custom Tool CRD executor |
| workers/ai/ | main_test.go | AI worker functions |
| workers/general/ | main_test.go | General worker functions |
| workers/agent/copilot/ | main_test.go | Copilot agent worker |
| workers/agent/claude/ | main_test.go | Claude agent worker |
| workers/agent/codex/ | main_test.go | Codex agent worker |
E2E Tests
End-to-end tests run against a dedicated Kind cluster:
| Test File | Coverage |
|---|---|
| test/e2e/e2e_test.go | Core task lifecycle |
| test/e2e/agent_test.go | Agent task execution |
| test/e2e/agent_copilot_test.go | Copilot runtime |
| test/e2e/agent_claude_test.go | Claude runtime |
| test/e2e/agent_workspace_test.go | Workspace/git clone |
| test/e2e/agent_session_test.go | Session continuity |
| test/e2e/autonomous_mode_test.go | Autonomous iterations, max-iteration stop, Plan API, suspend behavior |
| test/e2e/coordination_advanced_test.go | cancel_task, inter-task messaging, auto-retry, dynamic agent create/delete |
| test/e2e/pr_workflow_test.go | PR tool workflow (create_pull_request, review/comment/merge) and workspace PR env wiring |
| test/e2e/api_coverage_test.go | Sessions, agent update API, single-tool API, auth validation, secrets API, chat delete, non-autonomous plan 404 |
| test/e2e/chat_advanced_test.go | JSON chat mode, agentRef chat routing, management tools via chat |
| test/e2e/security_enforcement_test.go | Non-root execution, read-only filesystem, deny-pattern enforcement, kube-system chat block |
| test/e2e/agent_advanced_test.go | Skills ConfigMap wiring, agent resource propagation, session maxMessages behavior |
| test/e2e/workspace_advanced_test.go | Advanced workspace settings (gitSecretRef, subPath, ref, fork/PR env vars, session init container) |
| test/e2e/provider_advanced_test.go | Provider rate-limit config coverage |
| test/e2e/live_copilot_proxy_test.go | Live Orka Provider + type: ai path using copilot-proxy as the backend harness, including durable memory recall, proposal governance, and transcript search tool execution |
| test/e2e/live_chat_api_test.go | Live chat SSE and JSON transport/session coverage using a proxy-backed Provider |
| test/e2e/live_anthropic_compat_test.go | Live Anthropic-compatible /anthropic/v1/models and /anthropic/v1/messages coverage with default tools-enabled behavior |
| test/e2e/live_agent_runtime_matrix_test.go | Live Orka runtime matrix: Codex+GPT, Claude Code+Claude, Copilot+Gemini |
| .github/workflows/live-agent-sandbox-e2e.yml / scripts/live-agent-sandbox-e2e.sh | Live upstream agent-sandbox Kind validation for Orka agent workspace claim, sandbox execution, delete cleanup, retained-session reuse, and token scrubbing using a fake model-free Claude runtime |
| .github/workflows/live-github-label-trigger-e2e.yml / scripts/live-github-label-trigger-e2e.sh | Manual model-free GitHub label trigger validation for HMAC rejection, signed webhook Task creation, scoped workspace settings, and duplicate delivery idempotency |
| .github/workflows/repository-monitor-smoke.yml | Focused RepositoryMonitor smoke coverage for store CRUD, API handlers, pull request event handling, targeted single-PR inventory runs, controller queue/review flow, blocked status counts, read-only review task job building, result stdout forwarding, create_pr_monitor repository URL and credential validation, GitHub tool repo_url scope enforcement, and PR review marker tooling |
| .github/workflows/security-scan-e2e.yml / scripts/security-scan-e2e.sh | Secret-free repository security scan Kind validation against pinned sozercan/nodejs-goof using the real mapper, deterministic fake Codex analyzer, v2 finding ingestion/drop diagnostics, threat-model rejection, idempotent rescan, and HITL no-auto-patch gating |
| test/e2e/tools_test.go | Built-in tools (including web_fetch, file_write) and custom Tool CRD |
| test/e2e/scheduled_task_test.go | Cron scheduling, suspend, concurrencyPolicy: Forbid, history-limit cleanup |
| test/e2e/task_lifecycle_test.go | Timeout/retry/cancel plus session serialization and lock release |
The Repository Monitor Smoke workflow runs in GitHub Actions on pull requests and pushes that touch the workflow, API, controller, CRD/config, worker, or Go dependency paths. It creates the UI embed stub and runs focused go test selections for the monitor store, API handlers, GitHub pull request event handling, targeted single-PR inventory runs, controller queue/review flow, blocked status counts, read-only review job construction, result stdout forwarding, create_pr_monitor repository URL and credential validation, GitHub tool repo_url scope enforcement, and PR review marker signing/detection tooling. The workflow is secret-free: exact PR event queueing is tested with synthetic signed webhook payloads and fake GitHub clients rather than live repository credentials. The normal Go Tests workflow runs make test for non-doc code changes and covers worker-level PR review diff context generation.
Repository security E2E coverage should include initial deterministic slice creation, incremental scan behavior, invalid v2 evidence being dropped and visible through API, validation task persistence, successful verified patch proposals, and patch proposals with missing or mismatched artifacts staying not ready.
E2E Key Requirements
- E2E_OPENAI_API_KEY: required for LLM-backed tests (AI chat/tasks, coordination, PR workflow orchestration)
- E2E_ANTHROPIC_API_KEY: required for Anthropic-specific e2e cases
- E2E_GITHUB_TOKEN: required for GitHub/Copilot and live Copilot runtime tests
- COPILOT_GITHUB_TOKEN: required by the live copilot-proxy workflow for proxy auth
- The live agent sandbox workflow requires Docker, Kind, kubectl, curl, jq, and network access to install the pinned upstream agent-sandbox release. It does not require model credentials.
- The live GitHub label trigger workflow is manual, model-free, and secret-free. It requires Docker, Kind, kubectl, curl, jq, and Python locally, accepts GITHUB_LABEL_TRIGGER_TARGET_REPO_URL and GITHUB_LABEL_TRIGGER_TARGET_NUMBER overrides, and sends only synthetic webhook payloads to the local Orka API.
- GitHub Actions id-token: write permission: required by the live GitHub OIDC workflow. For local/manual runs of scripts/live-github-oidc-e2e.sh, set ORKA_GITHUB_OIDC_TOKEN to a valid JWT instead. The same workflow also runs a self-contained kontxt TxToken check using an ephemeral key/JWKS fixture, so no external kontxt secret is required.
- E2E_LIVE_COPILOT_PROXY_BASE_URL (or E2E_COPILOT_PROXY_BASE_URL / COPILOT_PROXY_BASE_URL): enables the focused live copilot-proxy spec against a running proxy
- E2E_LIVE_COPILOT_PROXY_SERVICE_NAMESPACE, E2E_LIVE_COPILOT_PROXY_SERVICE_NAME, E2E_LIVE_COPILOT_PROXY_SERVICE_PORT: optional overrides for how the live spec reaches the in-cluster proxy service for /readyz and /v1/models checks
- Structural e2e tests (job/env/volume assertions) run without external model keys
- Security Scan E2E is secret-free and model-free, but requires Docker plus local toolchain dependencies: Go, kind, kubectl, curl, and jq
- Agent Substrate E2E is secret-free, but requires Docker plus local toolchain dependencies: Go, git, curl, kind, kubectl, ko, and jq
The live copilot-proxy E2E path runs in a separate workflow and executes the focused live suites for:
- provider-backed type: ai tasks, including durable memory/tool execution coverage
- chat SSE/JSON flows via /api/v1/chat
- Anthropic-compatible /anthropic/v1/models and /anthropic/v1/messages flows with the default Orka tool loop enabled
- external agent runtimes across codex + GPT, claude + Claude, and copilot + Gemini
This is an Orka live integration suite, not a deep copilot-proxy feature suite. The proxy is test harness infrastructure that gives non-Copilot runtimes access to live GPT, Claude, and Gemini models in CI. The only proxy-specific assertions are smoke checks that the harness is alive and usable:
- /readyz returns healthy
- /v1/models is non-empty
- GPT, Claude, and Gemini model families are present
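The three smoke checks above can be sketched in Go as follows. This is a minimal illustration, not the code in test/e2e. The names smokeCheck, missingFamilies, and newFakeProxy are hypothetical, the model IDs are placeholders, and matching a family by case-insensitive substring is an assumption. The response shape assumes the OpenAI-style `{"data":[{"id":...}]}` list format. A local httptest server stands in for the proxy so the sketch runs offline.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// model and modelList mirror the assumed OpenAI-style list response.
type model struct {
	ID string `json:"id"`
}

type modelList struct {
	Data []model `json:"data"`
}

// missingFamilies returns every family in want for which no model ID
// contains the family name (case-insensitive substring; an assumption).
func missingFamilies(ids, want []string) []string {
	var missing []string
	for _, fam := range want {
		found := false
		for _, id := range ids {
			if strings.Contains(strings.ToLower(id), fam) {
				found = true
				break
			}
		}
		if !found {
			missing = append(missing, fam)
		}
	}
	return missing
}

// smokeCheck runs the three harness checks against baseURL.
func smokeCheck(client *http.Client, baseURL string) error {
	resp, err := client.Get(baseURL + "/readyz")
	if err != nil {
		return fmt.Errorf("readyz: %w", err)
	}
	resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("readyz: status %d", resp.StatusCode)
	}

	resp, err = client.Get(baseURL + "/v1/models")
	if err != nil {
		return fmt.Errorf("models: %w", err)
	}
	defer resp.Body.Close()
	var list modelList
	if err := json.NewDecoder(resp.Body).Decode(&list); err != nil {
		return fmt.Errorf("models: decode: %w", err)
	}
	if len(list.Data) == 0 {
		return fmt.Errorf("models: empty catalog")
	}
	ids := make([]string, 0, len(list.Data))
	for _, m := range list.Data {
		ids = append(ids, m.ID)
	}
	if missing := missingFamilies(ids, []string{"gpt", "claude", "gemini"}); len(missing) > 0 {
		return fmt.Errorf("models: missing families %v", missing)
	}
	return nil
}

// newFakeProxy is an offline stand-in for the live proxy (hypothetical helper).
func newFakeProxy(ids ...string) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/v1/models", func(w http.ResponseWriter, r *http.Request) {
		var list modelList
		for _, id := range ids {
			list.Data = append(list.Data, model{ID: id})
		}
		_ = json.NewEncoder(w).Encode(list)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newFakeProxy("gpt-x", "claude-x", "gemini-x") // placeholder model IDs
	defer srv.Close()
	fmt.Println("smoke check error:", smokeCheck(srv.Client(), srv.URL))
}
```

Returning the missing families in the error keeps a failed run diagnosable from the log line alone.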
The live workflow bootstraps a fresh Kind cluster, deploys the published multi-arch docker.io/sozercan/copilot-proxy:latest image, and injects COPILOT_GITHUB_TOKEN for proxy auth. It requires the live proxy to expose GPT/Claude/Gemini model families, maps that same secret to E2E_GITHUB_TOKEN for the Copilot runtime case, and then runs the focused live suites against the in-cluster proxy.
Model selection is endpoint-specific. Provider-backed type: ai tasks and /api/v1/chat probe the live proxy before choosing an OpenAI-compatible Chat Completions model, because a model can appear in the catalog while still being rejected for that endpoint. The Codex runtime uses GPT models that work with the Responses API, while the Claude and Copilot runtime matrix cases use Claude and Gemini families respectively. Keep these preferences in test/e2e/helpers_test.go aligned with the live proxy's allowed models rather than assuming one model family works across every endpoint.
The live agent sandbox workflow (.github/workflows/live-agent-sandbox-e2e.yml) runs scripts/live-agent-sandbox-e2e.sh. It installs upstream agent-sandbox v0.4.6, builds the PR controller image, builds a fake Claude worker image that also hosts the sandbox /execute and file APIs, builds the pinned upstream sandbox router image, and validates that Orka can run an agent Task inside the claimed sandbox without external model access. The script asserts:
- the outer worker re-execs inside the sandbox with ORKA_AGENT_SANDBOX_DEPTH=1 and sandbox recursion disabled
- the staged service account token is available to the inner worker while the command runs
- cleanupPolicy: delete removes the generated SandboxClaim
- cleanupPolicy: retain plus reusePolicy: session reattaches to the deterministic session claim
- retained workspace state persists across tasks
- staged token files are scrubbed before the retained workspace is left behind
The live GitHub label trigger workflow (.github/workflows/live-github-label-trigger-e2e.yml) runs scripts/live-github-label-trigger-e2e.sh from manual workflow_dispatch. It builds the controller from the PR, deploys it to a fresh Kind cluster, configures a generated ORKA_GITHUB_WEBHOOK_SECRET, creates a synthetic runtime Agent, and posts a signed agent:implement issue label payload to /webhooks/github. The script asserts:
- invalid webhook signatures return 401
- a signed label event returns 201 and creates a type: agent Task
- the created Task points at the configured GitHub repository clone URL and default branch
- no push branch or git credential Secret is configured for the synthetic task
- GitHub delivery annotations are recorded on the Task
- a repeated delivery returns 202 with the original task name
The live GitHub OIDC workflow (.github/workflows/live-github-oidc-e2e.yml) runs scripts/live-github-oidc-e2e.sh in GitHub Actions with id-token: write. It builds the controller from the PR, deploys it to a fresh Kind cluster, configures ORKA_OIDC_ISSUER=https://token.actions.githubusercontent.com and the workflow audience, fetches a real GitHub Actions OIDC token, generates a real kontxt TxToken against an in-cluster JWKS endpoint, and validates:
- unauthenticated API requests return 401
- OIDC-authenticated Task creation returns 201
- the OIDC-created Task response and persisted CR contain spec.requestedBy with the GitHub OIDC issuer and a non-empty subject
- the kontxt-created Task response and persisted CR contain spec.requestedBy with the configured kontxt issuer, subject, and scope-derived roles
- top-level requestedBy and nested spec.requestedBy client tampering are rejected with 400
- a tampered kontxt TxToken is rejected with 401
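The requestedBy tampering check above amounts to rejecting any client body that tries to set an identity field the server is supposed to derive from the token. A minimal sketch of such a guard follows. It is an assumption about how the check could be written, not Orka's handler code: hasClientRequestedBy is a hypothetical name, and the prompt and subject fields in the sample bodies are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hasClientRequestedBy reports whether a raw JSON body sets requestedBy
// at the top level or nested under spec.
func hasClientRequestedBy(body []byte) (bool, error) {
	var top map[string]json.RawMessage
	if err := json.Unmarshal(body, &top); err != nil {
		return false, err
	}
	if _, ok := top["requestedBy"]; ok {
		return true, nil
	}
	raw, ok := top["spec"]
	if !ok {
		return false, nil
	}
	var spec map[string]json.RawMessage
	if err := json.Unmarshal(raw, &spec); err != nil {
		return false, err
	}
	_, ok = spec["requestedBy"]
	return ok, nil
}

func main() {
	bodies := []string{
		`{"spec":{"prompt":"hello"}}`,                     // clean request
		`{"requestedBy":{"subject":"mallory"},"spec":{}}`, // top-level tampering
		`{"spec":{"requestedBy":{"subject":"mallory"}}}`,  // nested tampering
	}
	for _, b := range bodies {
		tampered, err := hasClientRequestedBy([]byte(b))
		status := 201 // would proceed to Task creation
		if err != nil || tampered {
			status = 400 // malformed or tampered body
		}
		fmt.Println(status, b)
	}
}
```

Inspecting raw keys rather than decoding into a typed struct catches the field even when its value is null or empty.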
The Agent Substrate workflow (.github/workflows/agent-substrate-e2e.yml) is secret-free and runs scripts/agent-substrate-e2e.sh against a fresh Kind cluster. It pins the Substrate checkout with SUBSTRATE_REF, installs Substrate, initializes the local RustFS snapshot bucket, builds local Orka controller/workspace/worker images, then validates:
- direct Substrate actor create/resume/router/daemon exec/suspend/delete
- Orka SubstrateActorPool reconciliation and density reporting
- Orka Task execution and result submission with the default Substrate workspace provider
- pooled Orka Task placement through spec.execution.workspace.poolRef
- MCP actor-backed Tool execution through a pooled Substrate actor
- MCP actor reuse across forced Tool reconciles without rebooting an already booted actor
- workspace placement, density, and resume-latency status fields
- delete and retained cleanup when the pinned Substrate runtime completes runsc delete
- WorkspaceCleanupFailed is tolerated only after the Task result is available, because the pinned Substrate revision can fail runsc delete after successful Orka execution in GitHub-hosted kind
- a missing ActorTemplate fails predictably
- failure diagnostics include Orka controller logs, worker Job logs, Task YAML, Kubernetes events, and Substrate actor/worker state
Run it locally with:
PATH="$(go env GOPATH)/bin:$PATH" \
SUBSTRATE_E2E_EXTENDED=1 \
bash scripts/agent-substrate-e2e.sh
Frontend Tests
Frontend tests use Vitest + Testing Library + MSW. Coverage thresholds are enforced in vite.config.ts.
cd ui && bun run test:coverage
Testing Patterns
Table-Driven Tests
tests := []struct {
name string
input string
want string
wantErr bool
}{
{"valid", "input", "output", false},
{"invalid", "bad", "", true},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// test logic
})
}
Fake Kubernetes Client
scheme := runtime.NewScheme()
corev1alpha1.AddToScheme(scheme)
corev1.AddToScheme(scheme)
client := fake.NewClientBuilder().WithScheme(scheme).WithObjects(objs...).Build()
HTTP Mocking
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.WriteHeader(http.StatusOK)
w.Write([]byte(`{"result": "ok"}`))
}))
defer server.Close()
Fiber Test App
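app := fiber.New()
app.Get("/test", handler)
req := httptest.NewRequest(http.MethodGet, "/test", nil)
// app.Test runs the request in-process; check the error before using resp.
resp, err := app.Test(req)
if err != nil {
	t.Fatal(err)
}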
Frontend Test Mocking
// Mock zustand persist middleware
vi.mock('zustand/middleware', () => ({ persist: (fn: unknown) => fn }))
// Use test utils with QueryClient wrapper
import { render } from '@/test/test-utils'
Testing with Chat
When testing features via the chat endpoint, use natural prompts — the kind a human would actually type. Never reference internal concepts like agent names, tool names, or implementation details. Describe what you want done, not how the system should do it. The chat should infer the right agents, tools, delegation patterns, and cancellation logic on its own.
Good examples:
- "Research the benefits of Kubernetes and write a technical guide based on the findings."
- "What's the best container orchestration tool? Get me an answer as fast as possible."
- "Draft an outline for a blog post about containers and turn it into a full post."
- "Compare microservices vs monoliths from three angles, then synthesize into a recommendation."
Bad examples:
- "Create a coordinator agent and a researcher agent, then delegate two tasks..."
- "Use the send_message tool to send a message to task msg-receiver..."
- "Have three researchers race to answer..." (users don't think in terms of "researchers")
- "Use the first answer and cancel the others." (the system should infer this automatically)