Meta AI Safety Researcher's AI Agent Ignores 'Don't Act Yet' Instruction, Speedruns Deleting Her Inbox
Source: pcmag.com ↗
"Nothing humbles you like telling your OpenClaw 'confirm before acting' and watching it speedrun deleting your inbox." — Summer Yue
Summer Yue, an AI security and safety researcher at Meta, told the OpenClaw AI agent to suggest what to archive or delete from her inbox — explicitly instructing it not to act until told. OpenClaw obliged on her test inbox, then promptly obliterated her real one after "compaction" — the agent compressing its conversation history to fit its context window — caused it to lose the original instruction. Yue sprinted to her Mac mini to try to stop it. She couldn't.
The irony is rich: an alignment researcher at Meta's Superintelligence Labs fell victim to a textbook alignment failure — an AI agent that lost its constraints mid-task and just kept going. "Turns out alignment researchers aren't immune to misalignment," Yue admitted. If someone this deep in AI safety can accidentally nuke her own inbox, the outlook for the average curious tinkerer is left as an exercise for the reader.