v1.0 · Open Source · MIT License

RAG Doctor

A deterministic diagnostic CLI for analyzing RAG pipelines, identifying architectural failures, and surfacing root-cause insights.

Quick install

Run npm install -g rag-doctor, then run rag-doctor analyze trace.json to get started.

What is RAG Doctor?

RAG Doctor is an open-source analysis tool for Retrieval-Augmented Generation (RAG) systems. It accepts trace files that describe a RAG pipeline execution — retrieved chunks, relevance scores, token counts, query context — and runs a rule-based engine against them to detect architectural problems.

Unlike evaluation tools that rely on language models to score output quality, RAG Doctor is fully deterministic. Every finding is the result of an inspectable rule with a defined threshold. The same input always produces the same output.
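To make "an inspectable rule with a defined threshold" concrete, here is a minimal sketch of what such a rule could look like. It is not RAG Doctor's actual source or trace schema: the field names (score, tokenCount), the rule name, and the 0.5 threshold are all assumptions chosen for illustration.

```typescript
// Hypothetical trace shape. Field names are illustrative only,
// not RAG Doctor's real trace format.
interface RetrievedChunk {
  id: string;
  text: string;
  score: number; // relevance score, assumed to lie in [0, 1]
  tokenCount: number;
}

interface Trace {
  query: string;
  chunks: RetrievedChunk[];
}

interface Finding {
  rule: string;
  severity: "warning" | "error";
  message: string;
}

// Assumed threshold, picked for the example.
const MIN_AVERAGE_SCORE = 0.5;

// A pure function of the trace: no clock, randomness, network, or model call,
// so the same trace always yields the same findings.
function checkLowRetrievalScore(
  trace: Trace,
  threshold: number = MIN_AVERAGE_SCORE
): Finding[] {
  if (trace.chunks.length === 0) return [];
  const total = trace.chunks.reduce((sum, chunk) => sum + chunk.score, 0);
  const average = total / trace.chunks.length;
  if (average >= threshold) return [];
  return [
    {
      rule: "low-retrieval-score",
      severity: "warning",
      message:
        `Average retrieval score ${average.toFixed(2)} is below ` +
        `threshold ${threshold.toFixed(2)}`,
    },
  ];
}
```

Because the rule is a plain function with an explicit threshold, a reader can see exactly why a finding fired, and running it twice on the same trace cannot produce different results.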

Who is it for?

  • Application engineers building RAG systems with LangChain, LlamaIndex, or custom pipelines who want to diagnose failures without manual log diving.
  • ML engineers designing chunking and retrieval strategies who need objective feedback on architectural decisions.
  • DevOps/platform teams who want to add RAG quality gates to CI pipelines.
  • Open-source contributors who want to extend the rule engine or add support for new trace formats.

What problems does it solve?

RAG systems fail in ways that are hard to observe from the outside. The language model confidently produces output even when retrieval goes wrong. RAG Doctor helps by:

  • Detecting low retrieval scores that indicate the system is pulling irrelevant context.
  • Identifying duplicate or near-identical chunks that pollute the context window with redundant information.
  • Flagging oversized chunks that consume excessive context tokens without proportional information value.
  • Warning about context overload when the total token count approaches the model's context limit.
  • Mapping findings to a root cause and providing concrete recommendations.
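The duplicate, oversized-chunk, and context-overload checks listed above can be sketched as simple deterministic functions. This is an illustration under stated assumptions, not the tool's implementation: the Chunk shape, the word-set Jaccard similarity used for near-duplicates, and every threshold below are hypothetical.

```typescript
// Hypothetical chunk shape, for illustration only.
interface Chunk {
  id: string;
  text: string;
  tokenCount: number;
}

// Assumed thresholds; a real tool would define and document its own.
const MAX_CHUNK_TOKENS = 512;
const CONTEXT_LIMIT_TOKENS = 4096;
const OVERLOAD_RATIO = 0.9;
const NEAR_DUPLICATE_SIMILARITY = 0.9;

// Lowercased set of words in a text, ignoring punctuation.
function wordSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter((w) => w.length > 0));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((word) => {
    if (b.has(word)) shared += 1;
  });
  return shared / (a.size + b.size - shared);
}

// Pairs of chunk ids whose word sets are nearly identical.
function findNearDuplicates(chunks: Chunk[]): [string, string][] {
  const entries = chunks.map((c) => ({ id: c.id, words: wordSet(c.text) }));
  const pairs: [string, string][] = [];
  entries.forEach((a, i) => {
    entries.slice(i + 1).forEach((b) => {
      if (jaccard(a.words, b.words) >= NEAR_DUPLICATE_SIMILARITY) {
        pairs.push([a.id, b.id]);
      }
    });
  });
  return pairs;
}

// Ids of chunks that exceed the per-chunk token budget.
function findOversized(chunks: Chunk[]): string[] {
  return chunks.filter((c) => c.tokenCount > MAX_CHUNK_TOKENS).map((c) => c.id);
}

// True when the total token count is close to the assumed context limit.
function isContextOverloaded(chunks: Chunk[]): boolean {
  const total = chunks.reduce((sum, c) => sum + c.tokenCount, 0);
  return total >= CONTEXT_LIMIT_TOKENS * OVERLOAD_RATIO;
}
```

Word-set Jaccard is only one simple way to approximate "near-identical"; it was chosen here because it needs no embeddings or model calls, which keeps the sketch consistent with a local, deterministic analysis.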

Design principle

RAG Doctor does not call any external APIs, does not require network access, and does not use a language model to produce its output. All analysis is local, deterministic, and fast.

Quick links