Show HN: Caliper – pass@k reliability testing for Claude Code and Codex skills
(github.com)
2 points | by edonadei | 3 hours ago | 1 comment
edonadei | 2 hours ago
By the way, here is a really good article on evaluation. I based much of my research and iteration on it:
https://www.anthropic.com/engineering/demystifying-evals-for...