LLM Apps
1 post
Testing GenAI in production: RAG retrieval, evals vs tests, agentic trajectories, and the classical failures that hide behind a green dashboard.
Posts in LLM Apps
GENAI_TESTING
Your Evals Are Checks, Not Tests
Air Canada's chatbot cost CAD $812 for an answer evals scored as faithful. Five classical testing patterns catch what your eval dashboard cannot.
JUN 11, 2026 · 37 min read