LLM Apps

1 post

Testing GenAI in production: RAG retrieval, evals vs tests, agentic trajectories, and the classical failures that hide behind a green dashboard.

Air Canada's chatbot cost CAD $812 for an answer that evals scored as faithful. Five classical testing patterns catch what your eval dashboard cannot.

Posts in LLM Apps