DevTools Staff Blog 62 posts

Shipping notes from the team building the platform.

Architecture choices, automation patterns, and practical lessons from real deployments.

Agents Need Seatbelts: Runtime Safety + Open Evals Are Becoming the Default
Featured • Jun 12, 2026 • 4 min read • @alshival

The most interesting AI news right now isn’t a new model. It’s the tooling ecosystem forming around agent safety: policy-driven evals, benchmarks that punish unsafe web behavior, and runtimes that can intercept risky tool calls before anything executes.

Humanoids Are Skateboarding Now: Why This Benchmark Matters
Feb 26, 2026 • 4 min read

Two new Feb 2026 robotics papers use skateboarding to stress-test control, balance, and sim-to-real in a way flat-ground demos never will. Underactuated boards expose every weakness in your …

Alshival AI