
Agentic AI Engineer

The generalist of the agentic stack. Owns the full path from prompt to production: tool design, retry logic, eval harnesses, latency budgets, cost ceilings. Equally fluent in API contracts and prompt design.

Indicative comp: $180K – $260K base (US, senior)

Ranges are indicative US base salary at senior level. Actual offers depend on company stage, equity, and candidate strength.

What this role actually owns

  • Design and ship multi-step agent loops with tool use, memory, and graceful failure modes.
  • Own evals — define golden sets, write graders, run regression checks before every release.
  • Tune system prompts against measurable rubrics, not vibes.
  • Own the cost and latency profile of agent runs in production.
  • Partner with PM and design on what the agent surfaces, hides, and asks for.

What we screen for

  • 5+ years software engineering, 1+ year shipping production LLM features.
  • Has run an agent in production — not just notebooks. Can talk concretely about a failure mode they fixed.
  • Comfortable with streaming, tool use, and structured output APIs.
  • Strong opinions about evals; can describe a grader they wrote.
  • Bonus: open-source contributions to LangGraph, Inspect, BAML, or similar.

Sample job description

A starting point you can paste into your ATS and adjust. The exact wording matters less than the rubric — the bullets above are what we'll calibrate against during search.

Agentic AI Engineer

Builds production agents end-to-end — tool use, memory, evals, and the unglamorous reliability work in between.

You'll own:

  • Design and ship multi-step agent loops with tool use, memory, and graceful failure modes.
  • Own evals — define golden sets, write graders, run regression checks before every release.
  • Tune system prompts against measurable rubrics, not vibes.
  • Own the cost and latency profile of agent runs in production.

We're looking for:

  • 5+ years software engineering, 1+ year shipping production LLM features.
  • Has run an agent in production — not just notebooks. Can talk concretely about a failure mode they fixed.
  • Comfortable with streaming, tool use, and structured output APIs.
  • Strong opinions about evals; can describe a grader they wrote.
  • Bonus: open-source contributions to LangGraph, Inspect, BAML, or similar.