Developer assessment evidence packets
Agentic Evidence
Review how candidates used AI instead of policing whether they hid it. Provision a real developer VM, let applicants work with Codex or Claude, then review the transcript, command trail, git snapshots, test results, and final diff in a single evidence packet.
Founding team access. Hard caps, no token overage.
Candidate sessions included each month.
Estimated monthly token cap. Requests stop before overage.
Why assessment teams care
Practical coding tasks still work. The missing piece is the AI-era audit trail.
GitHub-based assessments show what candidates can build. Agentic Evidence adds the reviewer packet without proctoring, locked browsers, or fake editors: candidates code in a real VM and reviewers replay the work.
Provisioned VM
Each applicant gets an isolated developer VM that can run Codex, Claude, tests, git, and standard shell tooling.
Session Replay
Codex/Claude transcripts, terminal output, git snapshots, final diff, tests, and reviewer rubric land in one report.
Cost Guardrails
Prompt length, message count, candidate session count, and monthly token usage are all enforced as hard caps: requests stop at the limit instead of incurring overage.
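A minimal sketch of how hard-stop guardrails like these can work. Every name and limit below is illustrative, not the product's actual API: each cap is checked before a request is forwarded, so nothing can run past the budget.

```python
# Hypothetical sketch of hard-stop guardrails; names and limits are
# illustrative assumptions, not the product's real configuration.
from dataclasses import dataclass

@dataclass
class Limits:
    max_prompt_chars: int = 8_000
    max_messages_per_session: int = 200
    max_sessions_per_month: int = 25
    max_tokens_per_month: int = 2_000_000

@dataclass
class Usage:
    messages: int = 0
    sessions: int = 0
    tokens: int = 0

def allow_request(prompt: str, usage: Usage, limits: Limits) -> bool:
    """Return True only if every cap still has headroom.

    The check runs before the request is sent, so a session halts at
    the limit rather than billing past it.
    """
    return (
        len(prompt) <= limits.max_prompt_chars
        and usage.messages < limits.max_messages_per_session
        and usage.sessions <= limits.max_sessions_per_month
        and usage.tokens < limits.max_tokens_per_month
    )

usage = Usage(messages=10, sessions=3, tokens=150_000)
print(allow_request("Refactor the parser", usage, Limits()))  # True: headroom on every cap
print(allow_request("x" * 9_000, usage, Limits()))            # False: prompt too long
```

The key design choice is that the check is a precondition, not a post-hoc alert: a failed cap blocks the request outright rather than logging a warning after tokens have already been spent.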