Agentic OS · Harness Engineering · Sovereignty
By JG · June 2, 2026
Most agent demos start beautiful: a while-loop, the model returns tool calls, the program runs them, pushes results back into messages, repeats. Two hundred lines, elegant. Then real requirements arrive—approval, shell limits, traces, MCP, Slack and WeChat gateways, long-term memory, plugins, skills, cron, post-run learning. Three days later the elegant loop is a black hole nobody dares touch. The lesson, echoed in open architecture studies like the Go harness "Rein," is blunt: the hard part is not calling the model—it is the boundaries.
Decompose by Responsibility, Not by Feature Menu
A durable harness is not a pile of features; it is a set of modules separated by four questions asked of every component: Does it get to change the agent's behavior? Write state? Be exposed to the model? Execute side effects? Group the work into a trusted base (config, secrets, protocol, observation, state, session), a core loop (provider, tools, agent, hooks), a security boundary (policy, execpolicy, sandbox, guardrails), context and extension (memory, skills, context, MCP, plugins), entrypoints and scheduling (runtime, CLI, gateway, cron, learning), and quality (test, eval). Most agent projects fail not because a feature is missing, but because every feature grows inside one agent.Run().
Cut One: The Loop Must Stay Small
The core loop should only assemble the provider request, call the model, validate tool calls through the security chain, write the tool-observation envelope back into logical history, commit the step, and continue. It must not also read config, write the database, run shell, manage gateways, schedule cron, or perform learning. When the loop is small you can fake the provider, the store, the tools, and approval—and test the agent without Docker, network, or live platforms. When it is a grab-bag, you no longer test the agent; you test the whole world.
Cut Two: Model-Visible Observation Is Not System Observation
The tool-observation envelope is what the model sees next turn—the facts a tool call leaves behind. System observation—events, traces, logs, metrics—is a read-only side channel for humans and engineering. Mixing them poisons the context window with runtime noise, or starves the model of facts it needs to reason. The discipline is one question per piece of information: is this for the model, or for people and systems?
Cut Three: Security Is a Chain
Replace if allowed { runTool() } with a chain: policy decides allow/ask/deny; execpolicy decides how (sandbox backend, resource limits, network, secret references); sandbox isolates execution; guardrails decide what output may enter the model; hooks extend only within the controlled lifecycle. Guardrails live in code and configuration—writing "do not delete production data" in a prompt is not a security boundary. Once an agent can run tools, safety is not a boolean; it is a chain.
Cut Four: Durable Facts First, One Writer per Fact
Separate the agent-facing commit facade from the durable state backend, and give each class of fact a single writer. Land runs, events, artifacts, schedules, and learning proposals first; then let cron claim schedules reliably, gateways track platform sessions, and learning replay run history. Without durable state, automation built on top is unreliable.
Cut Five: Learning Proposes—And Why VERTU Builds This Way
The most dangerous learning rewrites the agent mid-run. The disciplined choice is proposal-only: after a run, an analyzer reads durable facts and saves proposals—memory update, skill patch, test case, policy suggestion—for review. Extensions can strengthen the system, but cannot bypass it. This is the same gate even disciplined operators keep: nothing auto-merges to production without review.
VERTU's sovereign Personal AI Harness is built on exactly these boundaries, expressed as the four duties: PROTECT on a hardware root of trust; UNDERSTAND through sovereign memory; HELP by acting across apps from AlphaFold; and ORCHESTRATE cloud models with redacted intent. The Sovereign Private Server (VPS) ERP supplies durable, provenance-checked Context; the security chain runs in code, not prompts. The reason is simple and it is our whole thesis: in a world of commoditized models, Harness > Model—and a harness only earns trust when its boundaries hold as the tenth module is added.
Frequently Asked Questions
Why does the agent loop need to stay small?
If config, provider calls, permissions, shell, database writes, logging, plugins, cron, and learning all live in one Run(), you test the whole world, not the agent. A small loop assembles the request, calls the model, validates and executes tool calls through a security chain, commits the step, and returns—everything else enters via the runtime.
Why is agent security a chain, not a boolean?
A real chain separates policy (allow/ask/deny), execpolicy (how to run), sandbox (isolation), guardrails (what may enter the model), and hooks (controlled extension). Each layer makes only its own decision, and guardrails live in code and config—not in the prompt.
How should an agent harness handle learning safely?
Proposal-only: after a run, an analyzer reads durable facts and saves proposals—memory updates, skill patches, test cases, policy suggestions—for human acceptance. It must not rewrite memory, skills, or policy mid-run. Proposals are auditable and reversible; self-modification is neither.




