Show HN: We post-trained a model that pen tests instead of refusing your code

2026-06-09T12:26 · tech

I'm Dimitrios at Cosine. Quick orientation first: the read-only scan is free and you can run it right now, so that's the part to try. The pen-test mode is gated behind written authorisation because it's live offensive testing against real systems. I'll explain that below; it's not a paywall thing.

The reason `cos` exists: most "AI security" tools wrap a general model, so they inherit its refusals. Point one at a real offensive task and it hedges or declines, because the base model was trained to. We went the other way and post-trained our own model for offensive security, so it does the work instead of apologising for it. It's our model, not a wrapper.

Under the hood it's a multi-agent swarm: an orchestrator splits the job across subagents running in parallel, each owning a slice, then synthesises one report. That's what gets a polyglot microservice repo done in one pass.

The fair objection to a model that doesn't refuse, pointed at your code: how is that not reckless? I think refusals are the wrong layer to put safety in. A model that refuses is both useless (won't do the job) and unsafe (you're trusting a probability distribution to hold a hard line). So we don't ask the model to behave; we enforce it in the harness. A runtime guard written in Go intercepts every tool call before it runs. In scan mode it hard-blocks every mutating tool and any non-read-only shell command: the model can decide whatever it wants, but the guard won't let it write. In pen-test mode the same guard pins the agent's network scope to the targets you authorised; it can't reach anything else. Safety is deterministic and sits below the model, not inside it.

Two modes, one CLI:

- Security Scan: read-only audit of a local codebase, every finding tied to a file and line. Free, runnable today.
- Pen Test: the swarm attacks systems you authorise and hands back the request it sent and the response your code gave. Gated behind written authorisation.

Demo target, and I'll be straight about it: Bank of Anthos, Google's open-source reference bank. It's a known app with some intentionally soft bits, which is why I picked it: you can reproduce the run instead of trusting a screenshot. The scan found an integer overflow in the transfer path that would let you forge an account balance, plus the usual injection/auth/secrets classes.

It's a closed binary (brew/curl/winget), runs locally, by Cosine. Run it behind a firewall and `tcpdump` exactly what it does before you trust it on anything real. Install is free; the scan runs on a $20 Cosine subscription; pen test is scoped per engagement.

I'll be in the thread all day. The harness-vs-refusals design is the part I most want torn apart, so tell me where it breaks.

Comments URL: https://news.ycombinator.com/item?id=48460210 · Points: 5 · Comments: 2
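To make the harness idea concrete, here is a minimal sketch of a guard that checks every tool call before it runs. This is not Cosine's code: the `ToolCall` shape, the tool names, the allowlists, and the `NewScanGuard`/`NewPenTestGuard` constructors are all assumptions made up for illustration. It only shows the two behaviours described in the post: default-deny for anything not known to be read-only in scan mode, and a pinned network scope in pen-test mode.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// Mode selects which policy the guard enforces.
type Mode int

const (
	ScanMode    Mode = iota // read-only audit
	PenTestMode             // live testing against authorised targets
)

// ToolCall is a hypothetical shape for one tool invocation proposed by the model.
type ToolCall struct {
	Name string   // tool name, e.g. "read_file", "shell", "http_request"
	Argv []string // command and arguments, used only by the "shell" tool
	Host string   // destination host or IP, used only by network tools
}

// Guard holds the policy. All names here are assumptions for this sketch.
type Guard struct {
	Mode         Mode
	allowedHosts map[string]bool
	allowedNets  []*net.IPNet
}

// Illustrative allowlists: in scan mode anything not listed is denied.
var (
	readOnlyTools    = map[string]bool{"read_file": true, "list_dir": true, "search": true}
	readOnlyCommands = map[string]bool{"cat": true, "ls": true, "grep": true, "head": true, "wc": true}
	networkTools     = map[string]bool{"http_request": true, "tcp_connect": true}
)

// NewScanGuard returns a guard that only permits read-only tool calls.
func NewScanGuard() Guard { return Guard{Mode: ScanMode} }

// NewPenTestGuard pins network scope to the given hostnames and CIDR ranges.
func NewPenTestGuard(hosts, cidrs []string) (Guard, error) {
	g := Guard{Mode: PenTestMode, allowedHosts: map[string]bool{}}
	for _, h := range hosts {
		g.allowedHosts[strings.ToLower(h)] = true
	}
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			return Guard{}, fmt.Errorf("bad CIDR %q: %w", c, err)
		}
		g.allowedNets = append(g.allowedNets, n)
	}
	return g, nil
}

// Check returns nil if the call may run and an error describing the block otherwise.
func (g Guard) Check(c ToolCall) error {
	switch g.Mode {
	case ScanMode:
		if c.Name == "shell" {
			// Only argv[0] is checked here; a real guard would also vet arguments.
			if len(c.Argv) == 0 || !readOnlyCommands[c.Argv[0]] {
				return fmt.Errorf("scan mode: shell command %q is not read-only", c.Argv)
			}
			return nil
		}
		if !readOnlyTools[c.Name] {
			return fmt.Errorf("scan mode: tool %q is not on the read-only allowlist", c.Name)
		}
		return nil
	case PenTestMode:
		if networkTools[c.Name] && !g.inScope(c.Host) {
			return fmt.Errorf("pen-test mode: host %q is outside the authorised scope", c.Host)
		}
		return nil
	}
	return fmt.Errorf("unknown mode %d", g.Mode)
}

// inScope reports whether host is an authorised hostname or an IP inside an authorised range.
func (g Guard) inScope(host string) bool {
	if g.allowedHosts[strings.ToLower(host)] {
		return true
	}
	if ip := net.ParseIP(host); ip != nil {
		for _, n := range g.allowedNets {
			if n.Contains(ip) {
				return true
			}
		}
	}
	return false
}

func main() {
	scan := NewScanGuard()
	fmt.Println(scan.Check(ToolCall{Name: "read_file"}))
	fmt.Println(scan.Check(ToolCall{Name: "shell", Argv: []string{"rm", "-rf", "."}}))

	pen, err := NewPenTestGuard([]string{"staging.example.test"}, []string{"10.0.0.0/24"})
	if err != nil {
		panic(err)
	}
	fmt.Println(pen.Check(ToolCall{Name: "http_request", Host: "10.0.0.7"}))
	fmt.Println(pen.Check(ToolCall{Name: "http_request", Host: "8.8.8.8"}))
}
```

The sketch uses allowlists rather than a blocklist of mutating tools, so a tool the guard has never heard of is denied by default. It also matches hostnames by string only; a guard that has to hold up against a hostile target would presumably need to check the resolved IP at connect time as well, since a name in scope can resolve to an address that is not.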


www.argusred.com
