Honest limits

What Vortalis does not defend against.

A protocol-level governance proxy has a real defensive perimeter and a real outside of that perimeter. This page is the outside, written for security teams, auditors, and regulators who would rather read the limit than be surprised by it. The corresponding inside is on the rest of this site; a short summary sits at the bottom of this page.

We do not detect goal drift over long-horizon agent trajectories.

Reliable goal-drift detection is an open AI safety research problem. Vortalis provides per-session trajectory indicators (change-point analysis, intent-vs-action divergence), which are useful as forensic signals after the fact. They do not guarantee that an agent will not drift within a session, and they are not guaranteed to detect a patient adversarial drift that stays below the indicators' sensitivity thresholds. We provide the forensics; we do not prevent the drift.
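To make the sensitivity limit concrete, here is a minimal sketch of one common change-point technique, a one-sided CUSUM, applied to a per-step divergence score. It is an illustration only, not Vortalis's detector: the function name, the score series, and the baseline, slack, and threshold values are all assumptions chosen for the example.

```python
from typing import List, Optional


def cusum_change_point(
    scores: List[float], baseline: float, slack: float, threshold: float
) -> Optional[int]:
    """Return the first index where the one-sided CUSUM statistic exceeds
    `threshold`, or None if it never does.

    `scores` is a hypothetical per-step intent-vs-action divergence score.
    """
    s = 0.0
    for i, x in enumerate(scores):
        # Accumulate only the excess above baseline + slack; never go negative.
        s = max(0.0, s + (x - baseline - slack))
        if s > threshold:
            return i
    return None


# An abrupt shift in divergence is flagged shortly after it starts.
abrupt = [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]

# A patient drift that never exceeds baseline + slack accumulates nothing.
patient = [0.1 + 0.001 * i for i in range(100)]

print(cusum_change_point(abrupt, baseline=0.1, slack=0.2, threshold=1.0))
print(cusum_change_point(patient, baseline=0.1, slack=0.2, threshold=1.0))
```

The second series moves steadily away from its starting point yet is never flagged, because every step stays within the slack. Tightening the slack catches it at the cost of more false alarms on benign variation, which is why the indicators are best read as forensic signals.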

We do not defend against governance-aware adversarial agents in the strong sense.

An adversary that models the proxy can probe its denial surface and adapt. Vortalis raises the cost of this attack: denial reasons are sealed from the agent and visible only in the operator audit chain, and rate limits cap the volume of training signal an adversary can collect to fit the policy. These controls increase the work required; they do not close the problem. Continuous detector retraining is the operational answer; the platform's defence improves over time and is not static.
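The two cost-raising controls described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the product's implementation: the class name `SealedDenialGate`, the response strings, and the per-agent denial budget are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SealedDenialGate:
    """Hypothetical gate: the agent sees an opaque verdict, the operator
    audit log sees the reason, and repeated denials are throttled."""

    max_denials: int
    audit_log: List[Tuple[str, str, str]] = field(default_factory=list)
    _denials: Dict[str, int] = field(default_factory=dict)

    def deny(self, agent_id: str, action: str, reason: str) -> str:
        # The reason is recorded for the operator only.
        self.audit_log.append((agent_id, action, reason))
        self._denials[agent_id] = self._denials.get(agent_id, 0) + 1
        # Past the budget, the agent stops receiving per-request verdicts,
        # which limits how much signal probing can extract.
        if self._denials[agent_id] > self.max_denials:
            return "throttled"
        return "denied"


gate = SealedDenialGate(max_denials=2)
responses = [gate.deny("agent-7", "export_table", f"rule-{i}") for i in range(3)]
print(responses)
print(gate.audit_log[0])
```

Even in this sketch the adversary still learns one bit per probe until the budget runs out, which is the sense in which the controls raise cost without closing the problem.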

We do not detect emergent multi-agent coordination via legitimate shared services.

Vortalis sees the delegation graph between agents. Where two agents act in concert by direct delegation, joint-action policy rules can deny the second action. Where two agents coordinate through a neutral third service without any delegation being recorded (shared mutable state on an unrelated system, for example), there is no edge in the delegation graph to evaluate. Correlation-based detection over shared-service activity is on the research roadmap, not in production.
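A small sketch shows why the blind spot follows from the data model. It assumes a hypothetical joint-action rule expressed as forbidden (earlier action, later action) pairs, enforced only between agents connected by recorded delegation. The class, function, agent, and action names are invented for illustration.

```python
from typing import Dict, List, Set, Tuple


class DelegationGraph:
    """Hypothetical record of which agent delegated work to which."""

    def __init__(self) -> None:
        self._edges: Dict[str, Set[str]] = {}

    def delegate(self, parent: str, child: str) -> None:
        self._edges.setdefault(parent, set()).add(child)

    def _reaches(self, src: str, dst: str) -> bool:
        # Depth-first search along recorded delegation edges.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self._edges.get(node, ()))
        return False

    def linked(self, a: str, b: str) -> bool:
        return self._reaches(a, b) or self._reaches(b, a)


def allow_action(
    graph: DelegationGraph,
    history: List[Tuple[str, str]],
    agent: str,
    action: str,
    forbidden_pairs: Set[Tuple[str, str]],
) -> bool:
    """Deny `action` if a delegation-linked agent already performed the
    first half of a forbidden pair."""
    for prior_agent, prior_action in history:
        if (prior_action, action) in forbidden_pairs and graph.linked(prior_agent, agent):
            return False
    return True


graph = DelegationGraph()
graph.delegate("planner", "mailer")
history = [("planner", "read_payroll")]
forbidden = {("read_payroll", "send_external_email")}

print(allow_action(graph, history, "mailer", "send_external_email", forbidden))
print(allow_action(graph, history, "outsider", "send_external_email", forbidden))
```

The rule denies `mailer` because a delegation edge links it to `planner`. The agent `outsider` is allowed, even if it obtained the payroll data through a shared service, because no recorded edge connects it to the earlier action.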

We do not contain AGI.

Protocol-level governance is not the right layer for AGI containment. We will not pretend otherwise. What Vortalis does provide remains meaningful even in the presence of a much more capable agent: forensic evidence of every action, authority termination via scoped kill switches, and a cryptographically attested audit chain that an independent auditor can verify offline. These are forensics and revocation tools. They are not containment.

We do not defend against cross-modal injection at the model-capability layer.

An instruction carried in an image or audio input, invisible to text-only detection, is a model-capability problem. Vortalis supports adapter-level mode whitelists (operators can declare that an agent may not consume image input, for example), which is a hard constraint, not a defence. The defence against the underlying threat lives at the model-provider layer, not at the governance proxy layer.
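As an illustration of what a mode whitelist constrains, the sketch below admits an input only if its modality is declared for the agent and denies by default. The table name, agent identifiers, and mode strings are assumptions for the example, not Vortalis configuration syntax.

```python
from typing import Dict, FrozenSet

# Hypothetical operator declaration: which input modes each agent may consume.
ALLOWED_MODES: Dict[str, FrozenSet[str]] = {
    "billing-agent": frozenset({"text"}),
    "support-agent": frozenset({"text", "image"}),
}


def admit_input(
    agent_id: str, mode: str, allowed: Dict[str, FrozenSet[str]] = ALLOWED_MODES
) -> bool:
    """Return True only if `mode` is explicitly whitelisted for the agent.

    Unknown agents get an empty whitelist, so everything is refused.
    """
    return mode in allowed.get(agent_id, frozenset())


print(admit_input("billing-agent", "image"))
print(admit_input("support-agent", "image"))
```

The check removes a modality entirely for `billing-agent`. It does nothing to inspect the image that `support-agent` is allowed to consume, which is why the whitelist is a constraint on exposure and not a defence against the injection itself.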

We do not protect against model-provider compromise upstream of the agent.

Our scope begins at the agent's first boundary call. A compromise of the underlying model provider, or of the model weights themselves, is outside what a runtime governance proxy can observe or prevent. Where an operator's threat model includes the model provider, additional controls at the model-provider boundary (separate signing keys, dedicated inference infrastructure, attested model artefacts) are needed alongside Vortalis, not within it.

What Vortalis does claim

Inside the perimeter, the claims are concrete.

Every agent action is decided by policy before it executes, not analysed after the fact.
Sensitive fields are shielded out of agent context by default; the agent operates on safe substitutes.
Every decision is signed and chained into an audit record that an independent auditor can verify offline.
A kill switch can pause an agent, a class of action, a session, or the entire estate in seconds.
Human-in-the-loop approval is policy-driven and instrumented for the over-reliance failure mode.
Adapter execution is sandboxed with filesystem and network confinement wherever the sandbox confinement work has shipped; where a capability still carries the EXPERIMENTAL flag, that flag is visible in the product.
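The offline-verifiable audit chain mentioned in the claims above can be pictured with a minimal hash-chain sketch. It rests on loud assumptions: the record layout and function names are hypothetical, and an HMAC from the Python standard library stands in for the signature. A production design would more plausibly use an asymmetric signature so that an auditor can verify records without holding the signing key.

```python
import hashlib
import hmac
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous" value for the first record


def _canonical(body: Dict) -> bytes:
    # Stable serialisation so the same record always produces the same MAC.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def append_record(chain: List[Dict], decision: Dict, key: bytes) -> None:
    """Append a decision, binding it to the previous record's MAC."""
    prev = chain[-1]["mac"] if chain else GENESIS
    body = {"prev": prev, "decision": decision}
    mac = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
    chain.append({**body, "mac": mac})


def verify_chain(chain: List[Dict], key: bytes) -> bool:
    """Recompute every MAC and check each link; needs no network access."""
    prev = GENESIS
    for rec in chain:
        body = {"prev": rec["prev"], "decision": rec["decision"]}
        expected = hmac.new(key, _canonical(body), hashlib.sha256).hexdigest()
        if rec["prev"] != prev or not hmac.compare_digest(rec["mac"], expected):
            return False
        prev = rec["mac"]
    return True


key = b"demo-key-not-for-production"
chain: List[Dict] = []
append_record(chain, {"agent": "a1", "action": "read_invoice", "verdict": "allow"}, key)
append_record(chain, {"agent": "a1", "action": "export_table", "verdict": "deny"}, key)

print(verify_chain(chain, key))
```

Because each record commits to the MAC of the one before it, editing or dropping an earlier decision invalidates every later link, and the check runs entirely on the exported chain.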

This page is maintained as the platform evolves. Some items here are actively on the research roadmap; others are out of scope by design. We will move an item off this page only when there is real engineering behind the move, not when there is real marketing behind it.

Talk to an engineer about your threat model