# Rules Reference

RAG Doctor ships four built-in diagnostic rules. Each rule is a pure, stateless function that evaluates a normalized trace and returns zero or more findings.
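As a rough TypeScript sketch of that contract (the type and field names here are illustrative; the real definitions live in `@rag-doctor/rules`), a rule takes a normalized trace and returns a list of findings:

```typescript
// Illustrative shapes only -- not the package's actual type definitions.
interface Finding {
  rule: string;
  severity: "warning" | "error";
  message: string;
  evidence: Record<string, unknown>;
}

interface Trace {
  chunks: { id: string; content: string; score?: number; tokens?: number }[];
  contextLimit?: number;
}

// A rule is pure and stateless: the same trace always yields the same findings.
type Rule = (trace: Trace, options?: Record<string, unknown>) => Finding[];

// A rule that finds nothing simply returns an empty array.
const noop: Rule = () => [];
```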

| Rule ID | Default severity | Default threshold |
| --- | --- | --- |
| `duplicate-chunks` | warning | `similarityThreshold: 0.85` |
| `low-retrieval-score` | error | `minScore: 0.72` |
| `oversized-chunk` | warning | `maxTokens: 512` |
| `context-overload` | warning | `maxUtilizationPct: 90` |

## Duplicate Chunks

**Rule ID:** `duplicate-chunks`

### What it detects

Detects pairs of retrieved chunks whose text content is highly similar, as measured by cosine similarity on n-gram representations.
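A minimal sketch of that measure, assuming character trigrams as the n-gram representation (the function names and the trigram choice are illustrative, not RAG Doctor's actual internals):

```typescript
// Build a character n-gram frequency vector for a chunk's text.
function ngrams(text: string, n = 3): Map<string, number> {
  const counts = new Map<string, number>();
  const s = text.toLowerCase().replace(/\s+/g, " ");
  for (let i = 0; i <= s.length - n; i++) {
    const g = s.slice(i, i + n);
    counts.set(g, (counts.get(g) ?? 0) + 1);
  }
  return counts;
}

// Cosine similarity between two sparse frequency vectors, in [0, 1].
function cosineSimilarity(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [g, c] of a) {
    normA += c * c;
    const cb = b.get(g);
    if (cb) dot += c * cb;
  }
  for (const c of b.values()) normB += c * c;
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}
```

Identical texts score 1.0, texts sharing no n-grams score 0.0, and near-duplicates land close to 1.0.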

### Why it matters

Duplicate or near-duplicate chunks waste context window space without adding information diversity. They can bias the LLM toward content that appears multiple times and reduce the effective breadth of retrieved context.

### Default threshold

Similarity of 0.85 or higher (on a 0.0–1.0 scale). Two chunks are flagged when their normalized text similarity is at or above this value.

### Interpreting findings

A finding typically indicates overlapping source documents, aggressive chunk splitting without deduplication, or poor retrieval diversity. Consider deduplicating your corpus or applying a post-retrieval diversity filter.

### Example finding

```json
{
  "rule": "duplicate-chunks",
  "severity": "warning",
  "message": "2 near-identical chunks detected (similarity: 0.94)",
  "evidence": {
    "pairs": [["chunk-002", "chunk-005"]],
    "similarity": 0.94,
    "threshold": 0.85
  }
}
```

## Low Retrieval Score

**Rule ID:** `low-retrieval-score`

### What it detects

Detects retrieved chunks whose relevance score falls below the configured minimum threshold. The score field is expected to be a normalized 0.0–1.0 similarity or relevance value.
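The check itself is a straightforward filter. A minimal sketch, assuming chunks carry an `id` and a normalized `score` (the function name is illustrative):

```typescript
interface ScoredChunk {
  id: string;
  score: number; // normalized 0.0-1.0 relevance value
}

// Emit one finding per chunk whose score falls below the minimum.
function lowRetrievalScore(chunks: ScoredChunk[], minScore = 0.72) {
  return chunks
    .filter((c) => c.score < minScore)
    .map((c) => ({
      rule: "low-retrieval-score",
      severity: "error" as const,
      message: `Chunk score below minimum threshold (${c.score.toFixed(2)} < ${minScore})`,
      evidence: { chunkId: c.id, score: c.score, threshold: minScore },
    }));
}
```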

### Why it matters

Low-scoring chunks indicate the retrieval system pulled context that is not closely related to the query. When the LLM receives irrelevant context, it may hallucinate answers, ignore the context entirely, or generate contradictory responses.

### Default threshold

Minimum score of 0.72. Any chunk with a score below this value triggers a finding. Set it higher (e.g., 0.80) for precision-critical applications.

### Interpreting findings

This is typically the most impactful finding. Primary causes include: embedding model mismatch with your domain, poor query formulation, inadequate index coverage, or low-quality source documents.

### Example finding

```json
{
  "rule": "low-retrieval-score",
  "severity": "error",
  "message": "Chunk score below minimum threshold (0.41 < 0.72)",
  "evidence": {
    "chunkId": "chunk-003",
    "score": 0.41,
    "threshold": 0.72
  }
}
```

## Oversized Chunk

**Rule ID:** `oversized-chunk`

### What it detects

Detects individual chunks whose token count exceeds the configured maximum. Token count is read from the chunk's tokens field or estimated from content length.
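A sketch of that fallback logic. The ~4 characters-per-token heuristic here is a common approximation for English text, not necessarily the estimator RAG Doctor uses:

```typescript
// Prefer an explicit token count; otherwise estimate from content length.
function chunkTokens(chunk: { tokens?: number; content: string }): number {
  return chunk.tokens ?? Math.ceil(chunk.content.length / 4);
}

function isOversized(chunk: { tokens?: number; content: string }, maxTokens = 512): boolean {
  return chunkTokens(chunk) > maxTokens;
}
```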

### Why it matters

Very large chunks consume a disproportionate share of the context window and often contain mixed topics. Smaller, more focused chunks typically improve relevance scores and reduce irrelevant content in context.

### Default threshold

Maximum of 512 tokens per chunk. Chunks exceeding this limit trigger a warning.

### Interpreting findings

Oversized chunks usually indicate a chunking strategy that prioritizes completeness over precision. Consider re-chunking with smaller max sizes, or applying semantic chunking to split on topic boundaries.

### Example finding

```json
{
  "rule": "oversized-chunk",
  "severity": "warning",
  "message": "Chunk exceeds token limit (720 > 512)",
  "evidence": {
    "chunkId": "chunk-004",
    "tokens": 720,
    "threshold": 512
  }
}
```

## Context Overload

**Rule ID:** `context-overload`

### What it detects

Detects when the total token count across all retrieved chunks and query context exceeds a percentage threshold of the model's context window.

### Why it matters

High context utilization leaves little room for the LLM's response and can cause truncation or degraded performance. When context is near capacity, the model may not be able to consider all provided chunks effectively.

### Default threshold

90% of the context window by default. Triggers a warning when totalTokens / contextLimit > 0.90.
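The arithmetic can be sketched as follows (function names are illustrative; the field names match the example finding in this section):

```typescript
// Percentage of the model's context window consumed by retrieved context.
function contextUtilizationPct(totalTokens: number, limit: number): number {
  return (totalTokens / limit) * 100;
}

function isOverloaded(totalTokens: number, limit: number, maxUtilizationPct = 90): boolean {
  return contextUtilizationPct(totalTokens, limit) > maxUtilizationPct;
}
```

For example, 3820 tokens against a 4096-token limit is ~93.3% utilization, which exceeds the 90% default and triggers a warning.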

### Interpreting findings

Reduce the number of retrieved chunks, apply stricter score filtering before context assembly, or switch to a model with a larger context window if the retrieval breadth is genuinely necessary.

### Example finding

```json
{
  "rule": "context-overload",
  "severity": "warning",
  "message": "Context window at 94% capacity",
  "evidence": {
    "totalTokens": 3820,
    "limit": 4096,
    "utilizationPct": 93.3,
    "threshold": 90
  }
}
```

## Adding custom rules

You can register custom rules using registerRule() from @rag-doctor/rules. See the architecture docs for a full example.
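As a hypothetical sketch of what a custom rule body might look like (the trace and finding shapes here are illustrative; check the architecture docs for the real `Rule` type before passing this to `registerRule()`):

```typescript
// A custom rule that flags chunks with no usable text content.
// The shapes below are assumptions for illustration, not the package's types.
interface CustomFinding {
  rule: string;
  severity: "warning" | "error";
  message: string;
  evidence: Record<string, unknown>;
}

function emptyChunkRule(trace: { chunks: { id: string; content: string }[] }): CustomFinding[] {
  return trace.chunks
    .filter((c) => c.content.trim().length === 0)
    .map((c) => ({
      rule: "empty-chunk",
      severity: "warning" as const,
      message: `Chunk ${c.id} has no content`,
      evidence: { chunkId: c.id },
    }));
}
```

Like the built-in rules, it is pure and stateless: it inspects the trace and returns findings without side effects.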