We are looking for an AI Evaluation Scientist to design and execute evaluation processes that ensure our predictive and generative AI systems are accurate, reliable, safe, and aligned with mission requirements. This role is essential for establishing trust in AI solutions and supporting continuous improvement across the AI lifecycle. The AI Evaluation Scientist will work closely with engineers, data scientists, governance analysts, and product teams to develop evaluation metrics, build test harnesses, analyze model behavior, and support responsible deployment.
Ability to hold a position of public trust with the U.S. government.
Bachelor’s degree in Computer Science, Statistics, Machine Learning, Cognitive Science, Human-Computer Interaction, Data Science, or a related field and 5+ years of experience; OR
2+ years of experience evaluating machine learning models, NLP systems, or generative AI models (LLMs preferred).
Familiarity with evaluation metrics, statistical testing, dataset creation, and experimental design for AI systems.
Proficiency in Python and relevant libraries such as PyTorch, Hugging Face, scikit-learn, and LangChain.
Proficiency with AI evaluation frameworks such as Ragas.
Experience analyzing structured and unstructured data, including text, documents, and embeddings.
Understanding of LLM behavior, prompt evaluation, retrieval pipelines, or RAG architectures.
Exposure to responsible AI concepts and governance-aligned evaluation criteria (e.g., fairness, transparency, reliability).
Strong analytical skills with the ability to interpret model weaknesses, extract insights, and recommend actionable improvements.
Excellent written and verbal communication skills, with the ability to present evaluation findings clearly to technical and non-technical stakeholders.
Experience working in agile or iterative development environments is a plus.
Familiarity with the OWASP Top 10 risks for LLM applications.
Relevant certifications (helpful but not required):
NIST AI RMF (AISIC)
INFORMS CAP
AWS/Azure/Google ML certifications
Steampunk relies on several factors to determine salary, including but not limited to geographic location, contractual requirements, education, knowledge, skills, competencies, and experience. The projected compensation range for this position is $105,000 to $145,000, which represents a typical annual salary range for the role. Annual salary is just one aspect of Steampunk’s total compensation package for employees.
Identity Statement
As part of the application process, you are expected to be on camera during interviews and assessments. We reserve the right to take your picture to verify your identity and prevent fraud.
Steampunk is a Change Agent in the Federal contracting industry, bringing new thinking to clients in the Homeland, Federal Civilian, Health, and DoD sectors. Through our Human-Centered delivery methodology, we are fundamentally changing the expectations our Federal clients have for true shared accountability in solving their toughest mission challenges. As an employee-owned company, we focus on investing in our employees to enable them to do the greatest work of their careers – and rewarding them for outstanding contributions to our growth. If you want to learn more about our story, visit http://www.steampunk.com.