promptexploit
i'm feeling
★ adversarial ★
$ cat now.md
current threat model: agent trust boundaries
Updated 2026-05-29 from a warm paper terminal somewhere between red-team notes and production hardening.
writing
Turning agent security notes into practical posts with examples that avoid publishing working abuse payloads.
testing
Sketching eval harness patterns that can catch model regressions without leaking the private adversarial corpus.
defending
Designing small trust gates around tool output, retrieval results, and generated actions.
reading
Prompt injection research, browser-agent failure modes, and practical isolation designs for tool-using models.
$ next
Upcoming notes: quarantined LLM architecture, schema rejection ergonomics, and how to review tool calls without turning every workflow into a modal storm.