AI Data Engineer

Data engineer who treats LLM context as a data product. Builds ingest, chunking, embeddings, indexes, and the eval datasets that make retrieval-augmented agents actually accurate.

Indicative comp: $170K – $250K base (US, senior)

Ranges are indicative US base salary at senior level. Actual offers depend on company stage, equity, and candidate strength.

What this role actually owns

Design ingest and embedding pipelines that scale to the company's document volume.
Run experiments on chunking, embedding model, and re-ranking choices.
Build the eval datasets the agent team grades against.
Ingest production traces into a queryable shape that drives prompt improvement.
Own retrieval quality as a measurable property, not a vibe.
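
To make "measurable property" concrete, here is a minimal sketch of the kind of retrieval metric this role might report, assuming a small labeled eval set. The function names, query IDs, and chunk IDs are all hypothetical, and recall@k and mean reciprocal rank are only two common choices, not a prescribed standard.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant chunk IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    runs: Dict[str, List[str]],
    labels: Dict[str, Set[str]],
    k: int = 5,
) -> Dict[str, float]:
    """Average both metrics over every labeled query."""
    if not labels:
        return {"recall_at_k": 0.0, "mrr": 0.0}
    recalls = [recall_at_k(runs.get(q, []), rel, k) for q, rel in labels.items()]
    rranks = [reciprocal_rank(runs.get(q, []), rel) for q, rel in labels.items()]
    n = len(labels)
    return {"recall_at_k": sum(recalls) / n, "mrr": sum(rranks) / n}


# Hypothetical eval set: query ID -> chunk IDs a labeler marked as relevant.
labels = {"q1": {"a", "b"}, "q2": {"c"}}
# Hypothetical retriever output: query ID -> ranked chunk IDs.
runs = {"q1": ["a", "x", "b"], "q2": ["y", "c", "z"]}

scores = evaluate(runs, labels, k=2)
print(scores)
```

Tracking numbers like these per chunking or embedding variant is one way to turn an experiment into a result a candidate can explain.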

What we screen for

5+ years of data engineering, with at least 1 year on retrieval/embeddings work.
Has run an A/B test on chunking or embeddings and can explain the result.
Comfortable with at least one vector DB in production.
Has built or labeled an eval dataset of nontrivial size.
Bonus: contributed to a retrieval framework (LlamaIndex, Haystack) in OSS.

Sample job description

A starting point you can paste into your ATS and adjust. The exact wording matters less than the rubric — the bullets above are what we'll calibrate against during search.

AI Data Engineer

Owns the data pipelines that feed agents — embeddings, retrieval indexes, eval datasets, traces, and the feedback loop back into prompts.

You'll own:

Design ingest and embedding pipelines that scale to the company's document volume.
Run experiments on chunking, embedding model, and re-ranking choices.
Build the eval datasets the agent team grades against.
Ingest production traces into a queryable shape that drives prompt improvement.

We're looking for:

5+ years of data engineering, with at least 1 year on retrieval/embeddings work.
Has run an A/B test on chunking or embeddings and can explain the result.
Comfortable with at least one vector DB in production.
Has built or labeled an eval dataset of nontrivial size.
Bonus: contributed to a retrieval framework (LlamaIndex, Haystack) in OSS.