promptexploit

i'm feeling ★ adversarial ★

$ cat now.md

current threat model: agent trust boundaries

Updated 2026-05-29 from a warm paper terminal somewhere between red-team notes and production hardening.

writing

Turning agent security notes into practical posts with examples that avoid publishing working abuse payloads.

testing

Sketching eval harness patterns that can catch model regressions without leaking the private adversarial corpus.

defending

Designing small trust gates around tool output, retrieval results, and generated actions.

reading

Prompt injection research, browser-agent failure modes, and practical isolation designs for tool-using models.

$ next

Upcoming notes: quarantined LLM architecture, schema rejection ergonomics, and how to review tool calls without turning every workflow into a modal storm.