LLM Apps

1 post

Testing GenAI in production: RAG retrieval, evals vs tests, agentic trajectories, and the classical failures that hide behind a green dashboard.

Posts in LLM Apps

GENAI_TESTING

Your Evals Are Checks, Not Tests

Air Canada's chatbot cost CAD $812 for an answer evals scored as faithful. Five classical testing patterns catch what your eval dashboard cannot.

JUN 11, 2026 · 37 min read