Open Source · MIT License · TypeScript

Deterministic diagnostics for RAG systems

RAG Doctor analyzes your retrieval-augmented generation pipelines, identifies architectural failures, and surfaces root-cause insights — without hallucination, without guesswork.

$ npm install -g rag-doctor

Terminal

$ rag-doctor analyze trace.json

Analyzing 3 retrieved chunks across 1 query…
Running rule engine (pack: recommended)…

⚠ duplicate-chunks: 2 near-identical chunks detected (similarity: 0.94)
⚠ low-retrieval-score: Chunk scores below threshold (min: 0.72, found: 0.41)
⚠ context-overload: Context window at 94% capacity

Root cause: low-retrieval-score → hallucination risk

3 findings · 1 critical · 2 warnings

Open Source · Deterministic · CLI-first · CI-friendly · TypeScript · JSON output · Zero config · MIT License

The Problem

RAG failures are architectural problems, not model problems

Most teams blame the language model when RAG pipelines underperform. In reality, the problem is almost always upstream — bad retrieval, poor chunking, or context window mismanagement.

RAG Doctor exists to give teams the visibility they need to diagnose and fix these issues systematically, before they reach production.

RAG systems fail silently

Hallucinations don't come with stack traces. When your retrieval step returns low-quality chunks, the LLM confidently produces plausible-sounding nonsense — and your logs show nothing unusual.

Retrieval quality is invisible

Most teams have no systematic way to measure whether retrieved chunks are actually relevant. Relevance scores are often ignored or misunderstood until something goes wrong in production.

Duplicate context pollutes answers

Near-identical chunks from overlapping data sources fill the context window with redundant information, reducing diversity and wasting tokens on content the model has already seen.

No deterministic observability

Teams resort to ad-hoc logging, manual review, or prompt engineering workarounds instead of systematic architectural analysis. Problems recur because root causes are never identified.

Capabilities

Everything you need to diagnose RAG

Purpose-built tools for the full diagnostic lifecycle — from trace ingestion to root-cause reporting.

Deterministic Rule Engine

No probabilistic scoring or model-based guessing. Every finding is the result of strict, inspectable rules with defined thresholds and conditions.

Developer-First CLI

Run diagnostics from your terminal with a single command. Integrates naturally into CI pipelines, local development, and automated testing workflows.

Structured JSON Output

Machine-readable findings and diagnosis reports. Pipe results to dashboards, alerting systems, or test assertions without scraping human-readable text.

Root Cause Diagnosis

Go beyond surface findings. The diagnostics engine maps findings to primary causes, contributing factors, and actionable recommendations.

Configurable Rule Packs

Start with the recommended pack or switch to strict mode. Override individual rule thresholds to match your pipeline's specific requirements.

Embeddable Architecture

Import the analysis engine directly into your application or testing framework. Use it as a library, not just a CLI.

First-Class Documentation

Comprehensive docs covering architecture, rules, configuration, and contributing — written for engineers who want to understand the system deeply.

Open-Source Extensibility

Write custom rules, add new diagnostic heuristics, or build adapters for your trace format. The plugin architecture is designed for real-world extension.

Pipeline

How RAG Doctor works

A clean six-stage pipeline from raw trace to actionable insights, with no black boxes and no guesswork.

Trace Input

Provide a RAG trace — JSON file, object, or programmatic input. Supports LangChain, LlamaIndex, and custom formats.

Validation & Normalization

The ingestion layer validates trace structure, normalizes field names, and extracts queryable chunk data.
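As a rough illustration of what validation and normalization might involve, the sketch below accepts a raw trace whose field names vary by framework and emits one uniform shape. The type names, the function `normalizeTrace`, and every candidate field name (`question`, `documents`, `page_content`, `similarity`, and so on) are assumptions made for this example, not RAG Doctor's actual schema.

```typescript
// Hypothetical shapes: the real trace schema is defined by @rag-doctor/types
// and is not shown in this document.
interface NormalizedChunk {
  id: string;
  text: string;
  score: number;
}

interface NormalizedTrace {
  query: string;
  chunks: NormalizedChunk[];
}

type RawRecord = Record<string, unknown>;

// Return the first string value found under any of the candidate keys.
function pickString(raw: RawRecord, keys: string[]): string | undefined {
  for (const key of keys) {
    const value = raw[key];
    if (typeof value === "string") return value;
  }
  return undefined;
}

// Return the first finite number found under any of the candidate keys.
function pickNumber(raw: RawRecord, keys: string[]): number | undefined {
  for (const key of keys) {
    const value = raw[key];
    if (typeof value === "number" && isFinite(value)) return value;
  }
  return undefined;
}

// Validate the raw trace and map framework-specific field names to one shape.
// Throws on structural problems so later stages never see malformed input.
function normalizeTrace(raw: unknown): NormalizedTrace {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("trace must be an object");
  }
  const record = raw as RawRecord;
  const query = pickString(record, ["query", "question", "input"]);
  if (query === undefined) throw new Error("trace is missing a query");

  const rawChunks = record["chunks"] ?? record["documents"];
  if (!Array.isArray(rawChunks)) throw new Error("trace is missing a chunk list");

  const chunks = rawChunks.map((item: unknown, index: number) => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`chunk ${index} must be an object`);
    }
    const chunk = item as RawRecord;
    const text = pickString(chunk, ["text", "content", "page_content"]);
    const score = pickNumber(chunk, ["score", "similarity"]);
    if (text === undefined || score === undefined) {
      throw new Error(`chunk ${index} needs text and a numeric score`);
    }
    // Fall back to a positional ID when the trace does not supply one.
    return { id: pickString(chunk, ["id"]) ?? `chunk-${index}`, text, score };
  });
  return { query, chunks };
}
```

Failing fast at this stage means each rule can assume every chunk has an ID, text, and a numeric score.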

Rule Engine

Each active rule evaluates the normalized trace against its threshold conditions. Rules are stateless, composable, and inspectable.
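To make "stateless and inspectable" concrete, here is a minimal sketch of one rule written as a pure function. The `Rule` type, the `Trace` and `Finding` shapes, and the severity value are hypothetical; only the rule ID `low-retrieval-score` and the `minScore` option name come from this page.

```typescript
// Hypothetical types for illustration; the real rule interface lives in
// @rag-doctor/rules and is not shown in this document.
interface Chunk {
  id: string;
  score: number;
}

interface Trace {
  chunks: Chunk[];
}

interface Finding {
  ruleId: string;
  severity: "warning" | "critical";
  message: string;
}

// A rule is a pure function of the trace and its options: no I/O, no shared state.
type Rule<Options> = (trace: Trace, options: Options) => Finding | null;

const lowRetrievalScore: Rule<{ minScore: number }> = (trace, { minScore }) => {
  if (trace.chunks.length === 0) return null;
  const lowest = Math.min(...trace.chunks.map((c) => c.score));
  // The whole decision is this one comparison, which is what makes it inspectable.
  if (lowest >= minScore) return null;
  return {
    ruleId: "low-retrieval-score",
    severity: "warning",
    message: `Chunk scores below threshold (min: ${minScore}, found: ${lowest})`,
  };
};
```

Because the function depends only on its arguments, calling it twice with the same trace and options yields the same finding.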

Findings

Each triggered rule produces a structured finding: rule ID, severity, evidence, and affected chunk references.
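The four fields named above could be modeled roughly as follows. This sketch pairs a hypothetical `Finding` shape with a duplicate-chunk check; the similarity metric (Jaccard over lowercase word sets) is an assumption chosen for brevity, since this page does not say which metric RAG Doctor uses.

```typescript
// Hypothetical finding shape based on the four fields listed above.
interface Finding {
  ruleId: string;
  severity: "info" | "warning" | "critical";
  evidence: Record<string, number | string>;
  chunkIds: string[];
}

interface Chunk {
  id: string;
  text: string;
}

function wordSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

// Assumed metric: Jaccard similarity, |A ∩ B| / |A ∪ B|, over word sets.
function jaccard(a: string, b: string): number {
  const setA = wordSet(a);
  const setB = wordSet(b);
  if (setA.size === 0 && setB.size === 0) return 1;
  const shared = Array.from(setA).filter((w) => setB.has(w)).length;
  return shared / (setA.size + setB.size - shared);
}

// Emit one finding per near-duplicate pair, carrying evidence and chunk references.
function duplicateChunks(chunks: Chunk[], similarityThreshold: number): Finding[] {
  const findings: Finding[] = [];
  for (let i = 0; i < chunks.length; i++) {
    for (let j = i + 1; j < chunks.length; j++) {
      const similarity = jaccard(chunks[i].text, chunks[j].text);
      if (similarity >= similarityThreshold) {
        findings.push({
          ruleId: "duplicate-chunks",
          severity: "warning",
          evidence: { similarity, similarityThreshold },
          chunkIds: [chunks[i].id, chunks[j].id],
        });
      }
    }
  }
  return findings;
}
```

Recording both the measured similarity and the threshold in `evidence` lets a reader verify the finding without rerunning the analysis.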

Root Cause Diagnosis

The diagnostics engine maps findings to primary causes and contributing factors using a deterministic heuristic graph.
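One way to picture a deterministic heuristic graph is a fixed table of weighted edges from rule IDs to causes, ranked with an explicit tie-break. The edges, weights, and the `diagnose` function below are illustrative assumptions; the only mapping taken from this page is the sample output that links `low-retrieval-score` to hallucination risk.

```typescript
// Hypothetical cause graph: edges and weights are illustrative, not RAG Doctor's
// actual mappings.
interface CauseEdge {
  cause: string;
  weight: number;
}

const causeGraph: Record<string, CauseEdge[]> = {
  "low-retrieval-score": [{ cause: "hallucination risk", weight: 3 }],
  "duplicate-chunks": [{ cause: "redundant context", weight: 1 }],
  "context-overload": [{ cause: "redundant context", weight: 1 }],
};

interface Diagnosis {
  primaryCause: string | null;
  contributingFactors: string[];
}

function diagnose(ruleIds: string[]): Diagnosis {
  // Sum the edge weights per cause across all triggered rules.
  const totals: Record<string, number> = {};
  for (const ruleId of ruleIds) {
    for (const edge of causeGraph[ruleId] ?? []) {
      totals[edge.cause] = (totals[edge.cause] ?? 0) + edge.weight;
    }
  }
  // Sort by total weight descending, then by name, so ties always resolve the same way.
  const causes = Object.keys(totals).sort(
    (a, b) => totals[b] - totals[a] || (a < b ? -1 : 1),
  );
  return { primaryCause: causes[0] ?? null, contributingFactors: causes.slice(1) };
}
```

The explicit tie-break matters here: without it, two causes with equal weight could swap places depending on the order in which findings arrive.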

Structured Report

Output as human-readable text or structured JSON — ready for CI assertions, developer review, or dashboard ingestion.
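The two output modes can be thought of as interchangeable functions over the same findings. In this sketch the `Reporter` type and both reporters are hypothetical; the text layout imitates the sample terminal output on this page rather than the tool's actual renderer.

```typescript
// Hypothetical finding shape and reporter signature, for illustration only.
interface Finding {
  ruleId: string;
  severity: "warning" | "critical";
  message: string;
}

type Reporter = (findings: Finding[]) => string;

// Machine-readable output: a stable JSON document for CI or dashboards.
const jsonReporter: Reporter = (findings) => JSON.stringify({ findings }, null, 2);

// Human-readable output: one line per finding plus a summary line.
const textReporter: Reporter = (findings) => {
  const lines = findings.map((f) => `⚠ ${f.ruleId}: ${f.message}`);
  const critical = findings.filter((f) => f.severity === "critical").length;
  const warnings = findings.length - critical;
  lines.push(`${findings.length} findings · ${critical} critical · ${warnings} warnings`);
  return lines.join("\n");
};
```

Keeping reporters as plain functions is one way a custom format could be added without touching the rule engine.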

CLI

Analyze from the terminal

RAG Doctor ships a powerful CLI for local development and CI workflows. One command. Structured output. Actionable insights.

analyze

Run the rule engine against a trace file and surface findings

diagnose

Map findings to root causes and generate a recommendation report

--json

Output machine-readable JSON for piping into CI assertions or dashboards

--config

Load a custom rule pack and threshold configuration

Install

bash

# Install globally via npm
npm install -g rag-doctor
 
# Or use directly with npx
npx rag-doctor --help

Analyze

bash

# Analyze a trace file
rag-doctor analyze ./traces/session.json
 
# Output as structured JSON
rag-doctor analyze ./traces/session.json --json
 
# Use a custom config file
rag-doctor analyze ./traces/session.json --config .rag-doctor.json

Diagnose

bash

# Run root cause diagnosis
rag-doctor diagnose ./traces/session.json
 
# Diagnose with JSON output for CI
rag-doctor diagnose ./traces/session.json --json | jq '.diagnosis.primaryCause'
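For CI jobs that need more than a `jq` one-liner, the JSON report could be checked in a small script. This sketch assumes only that the report carries a `diagnosis.primaryCause` field, as the `jq` path above suggests; the function name `gateOnDiagnosis` and the policy of treating a missing or null primary cause as a pass are assumptions.

```typescript
// Assumption: the report has a `diagnosis.primaryCause` field, as the jq path
// above suggests. Nothing else about the report shape is relied on here.
interface DiagnoseReport {
  diagnosis?: { primaryCause?: unknown };
}

// Returns a process exit code: 0 when no primary cause is reported, 1 otherwise.
function gateOnDiagnosis(reportJson: string): number {
  const report = JSON.parse(reportJson) as DiagnoseReport;
  const cause = report.diagnosis?.primaryCause;
  if (cause === undefined || cause === null) return 0;
  console.error(`RAG Doctor primary cause: ${JSON.stringify(cause)}`);
  return 1;
}
```

In a Node.js script, the report piped from `rag-doctor diagnose ... --json` could be read from standard input and the returned code passed to `process.exit`.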

Config

.rag-doctor.json

{
  "pack": "strict",
  "ruleOptions": {
    "duplicate-chunks": {
      "similarityThreshold": 0.85
    },
    "low-retrieval-score": {
      "minScore": 0.65
    },
    "oversized-chunk": {
      "maxTokens": 512
    }
  }
}
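A config like this presumably layers per-rule overrides on top of the selected pack's defaults. The sketch below shows one way that merge could work; the default values in `packDefaults` are invented for the example, and `resolveOptions` is a hypothetical name.

```typescript
type RuleOptions = Record<string, Record<string, number>>;

interface Config {
  pack: "recommended" | "strict";
  ruleOptions?: RuleOptions;
}

// Hypothetical defaults: the real per-pack thresholds are not listed in this document.
const packDefaults: Record<Config["pack"], RuleOptions> = {
  recommended: {
    "low-retrieval-score": { minScore: 0.5 },
    "oversized-chunk": { maxTokens: 1024 },
  },
  strict: {
    "low-retrieval-score": { minScore: 0.7 },
    "oversized-chunk": { maxTokens: 768 },
  },
};

// Start from the pack defaults, then apply per-rule overrides key by key.
// Copies are made at each step so the defaults themselves are never mutated.
function resolveOptions(config: Config): RuleOptions {
  const resolved: RuleOptions = {};
  const defaults = packDefaults[config.pack];
  const overrides = config.ruleOptions ?? {};
  for (const ruleId of Object.keys(defaults)) {
    resolved[ruleId] = { ...defaults[ruleId] };
  }
  for (const ruleId of Object.keys(overrides)) {
    resolved[ruleId] = { ...resolved[ruleId], ...overrides[ruleId] };
  }
  return resolved;
}
```

With the config shown above, this merge would keep the `strict` pack as the baseline while the three listed thresholds take precedence.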

Philosophy

Why deterministic > magical

Diagnostics tools should be more trustworthy than the systems they analyze. Using another LLM to evaluate your LLM pipeline is circular — and unreliable.

| Aspect | Deterministic (RAG Doctor) | LLM-based evaluation |
| --- | --- | --- |
| How it works | Rule-based thresholds on trace data | Probabilistic model scoring |
| Output consistency | Same input → same output, always | Varies between runs and model versions |
| CI compatibility | Native: pass/fail is deterministic | Unreliable: thresholds shift with model updates |
| Debugging findings | Every finding has an inspectable reason | Black box: findings can't be verified |
| Cost | No API calls, no token spend | Inference cost per analysis run |
| Latency | Milliseconds: pure computation | Seconds: network + inference overhead |

Architecture

Modular monorepo design

Each package has a single, clear responsibility. Embed only what you need and extend without forking.

Foundation

@rag-doctor/types

Shared TypeScript types and interfaces. The single source of truth for trace shapes, findings, and diagnosis structures.

Input layer

@rag-doctor/ingestion

Accepts raw trace objects, validates structure, and emits normalized internal representations.

Analysis

@rag-doctor/rules

The rule library. Each rule is a pure function: context in, finding out. Built-in packs: recommended and strict.

Orchestration

@rag-doctor/core

Wires ingestion, rules, and reporters together. The public API for embedding analysis in your own application.
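The wiring role described here can be sketched as a function that composes rules and a reporter into a single analyzer. Everything in this block is hypothetical, including `createAnalyzer` and the type shapes; it is not the actual @rag-doctor/core API. The thresholds reuse the values from the sample config on this page.

```typescript
// Hypothetical wiring sketch; the real @rag-doctor/core API is not shown here.
interface Chunk {
  id: string;
  score: number;
  tokens: number;
}

interface Trace {
  chunks: Chunk[];
}

interface Finding {
  ruleId: string;
  chunkId: string;
}

type Rule = (trace: Trace) => Finding[];
type Reporter = (findings: Finding[]) => string;

// Compose rules and a reporter into one function from trace to report.
function createAnalyzer(rules: Rule[], reporter: Reporter): (trace: Trace) => string {
  return (trace) => {
    // Rules run in list order, so the findings order is reproducible.
    const findings = rules.reduce<Finding[]>((all, rule) => all.concat(rule(trace)), []);
    return reporter(findings);
  };
}

const oversizedChunk: Rule = (trace) =>
  trace.chunks
    .filter((c) => c.tokens > 512)
    .map((c) => ({ ruleId: "oversized-chunk", chunkId: c.id }));

const lowRetrievalScore: Rule = (trace) =>
  trace.chunks
    .filter((c) => c.score < 0.65)
    .map((c) => ({ ruleId: "low-retrieval-score", chunkId: c.id }));

const analyze = createAnalyzer([oversizedChunk, lowRetrievalScore], (findings) =>
  JSON.stringify(findings),
);
```

An application or test suite could call `analyze(trace)` directly and assert on the returned string, which is the kind of embedding the package description points at.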

Intelligence

@rag-doctor/diagnostics

Aggregates findings and applies the heuristic graph to identify primary causes and generate recommendations.

Output

@rag-doctor/reporters

Formats analysis output as structured JSON or human-readable text. Extensible for custom report formats.

CLI

rag-doctor

The command-line interface. Thin wrapper around @rag-doctor/core with argument parsing and terminal rendering.

Main entry point

Read the architecture docs

Built for contributors

Help shape the future of RAG tooling

RAG Doctor is open source and actively maintained. Whether you want to contribute rules, integrations, diagnostics, or documentation — your work directly improves the tool thousands of engineers depend on.

Custom Rules

Extend the rule engine with domain-specific analysis logic

Diagnostic Heuristics

Improve root cause mappings and recommendation quality

Trace Adapters

Add support for LangSmith, Weave, Phoenix, and more

Docs & Examples

Write guides, improve API docs, add real-world examples

Start diagnosing your RAG pipeline

Install in seconds. Get deterministic findings in minutes. Fix architectural problems before they reach your users.

$ npx rag-doctor analyze trace.json
