Diagnosis Engine

After the rule engine produces a set of findings, the diagnosis engine applies a deterministic heuristic graph to map those findings to their most likely root cause, identify contributing factors, and generate actionable recommendations.

How diagnosis works

The diagnosis engine reads the complete findings set from an analysis run and evaluates it against a heuristic graph that encodes relationships between rules, severity levels, and known architectural failure patterns in RAG systems.

The graph is acyclic, and evaluating it is deterministic: the same findings always produce the same diagnosis. There is no model inference, no external API call, and no randomness.
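One way to see why an acyclic graph gives a repeatable result: if nodes are visited in a topological order with a fixed tie-breaking rule, the visiting order depends only on the graph itself. The sketch below uses Kahn's algorithm with alphabetical tie-breaking. The node names and edges are hypothetical, chosen for illustration, and are not the engine's actual heuristic graph.

```typescript
// Hypothetical heuristic graph: an edge A -> B means "evaluate A before B".
// Node names are illustrative only; this is not the engine's real graph.
type Graph = Record<string, string[]>;

// Kahn's algorithm with alphabetical tie-breaking, so the same graph always
// yields the same evaluation order. Throws if the graph contains a cycle.
function topologicalOrder(graph: Graph): string[] {
  const inDegree = new Map<string, number>();
  for (const [node, targets] of Object.entries(graph)) {
    if (!inDegree.has(node)) inDegree.set(node, 0);
    for (const target of targets) {
      inDegree.set(target, (inDegree.get(target) ?? 0) + 1);
    }
  }

  // Start from nodes with no incoming edges, kept sorted.
  const ready = Array.from(inDegree.entries())
    .filter(([, degree]) => degree === 0)
    .map(([node]) => node)
    .sort();
  const order: string[] = [];

  while (ready.length > 0) {
    const node = ready.shift()!;
    order.push(node);
    for (const target of graph[node] ?? []) {
      const degree = (inDegree.get(target) ?? 0) - 1;
      inDegree.set(target, degree);
      if (degree === 0) {
        ready.push(target);
        ready.sort();
      }
    }
  }

  // Nodes left unvisited are on, or downstream of, a cycle.
  if (order.length !== inDegree.size) {
    throw new Error("Heuristic graph contains a cycle");
  }
  return order;
}

const exampleGraph: Graph = {
  "low-retrieval-score": ["poor-retrieval-quality"],
  "duplicate-chunks": ["duplicate-context", "poor-retrieval-quality"],
  "duplicate-context": [],
  "poor-retrieval-quality": [],
};

console.log(topologicalOrder(exampleGraph));
```

Rejecting cycles up front is what guarantees the traversal terminates; the sorted ready list is what removes any dependence on object key order.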

Diagnosis structure

```json
{
  "diagnosis": {
    "primaryCause": {
      "id": "low-retrieval-score",
      "label": "Poor retrieval quality",
      "description": "...",
      "severity": "critical"
    },
    "contributingCauses": [
      {
        "id": "duplicate-chunks",
        "label": "Duplicate context",
        "description": "Near-identical chunks are consuming context space..."
      }
    ],
    "evidence": [
      {
        "rule": "low-retrieval-score",
        "severity": "error",
        "message": "Chunk score below minimum threshold (0.41 < 0.72)"
      }
    ],
    "recommendations": [
      "Review your embedding model — low scores often indicate embedding mismatch",
      "Consider re-ranking retrieved chunks before passing to the LLM",
      "Increase the retrieval score threshold to filter out low-quality results",
      "Deduplicate your document corpus or increase chunk diversity in retrieval"
    ],
    "confidence": "high"
  }
}
```
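If you consume this output from TypeScript, describing it with types helps catch mistakes early. The interfaces below are inferred from the example above and are assumptions, not the package's published type definitions. In particular, severity is left as a plain string because the example shows "critical" on the primary cause and "error" on evidence, and warnings are mentioned elsewhere on this page.

```typescript
// Types inferred from the example JSON; not the official typings.
type Confidence = "high" | "medium" | "low";

interface Cause {
  id: string;
  label: string;
  description: string;
  severity?: string; // present on primaryCause in the example, absent on contributing causes
}

interface Evidence {
  rule: string;
  severity: string; // e.g. "error" or "warning"
  message: string;
}

interface Diagnosis {
  primaryCause: Cause | null; // null for a healthy trace
  contributingCauses: Cause[];
  evidence: Evidence[];
  recommendations: string[];
  confidence: Confidence;
}

interface DiagnosisResult {
  diagnosis: Diagnosis;
}

// Narrow parsed JSON to a DiagnosisResult with a few structural checks.
function isDiagnosisResult(value: unknown): value is DiagnosisResult {
  if (typeof value !== "object" || value === null) return false;
  const d = (value as { diagnosis?: unknown }).diagnosis;
  if (typeof d !== "object" || d === null) return false;
  const diag = d as Record<string, unknown>;
  return (
    typeof diag.primaryCause === "object" && // an object or null
    Array.isArray(diag.contributingCauses) &&
    Array.isArray(diag.evidence) &&
    Array.isArray(diag.recommendations) &&
    (diag.confidence === "high" || diag.confidence === "medium" || diag.confidence === "low")
  );
}

const parsed: unknown = JSON.parse(
  '{"diagnosis":{"primaryCause":{"id":"low-retrieval-score","label":"Poor retrieval quality","description":"...","severity":"critical"},"contributingCauses":[],"evidence":[],"recommendations":[],"confidence":"high"}}'
);
if (isDiagnosisResult(parsed)) {
  const cause = parsed.diagnosis.primaryCause;
  console.log(cause ? cause.label : "healthy trace");
}
```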

Primary root cause

The primary cause is the single most impactful finding, determined by a combination of:

  • Finding severity (errors outweigh warnings)
  • Known impact on downstream LLM output quality
  • Priority weights defined in the heuristic graph

If no findings are present, the primary cause is null and the diagnosis result indicates a healthy trace.
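A minimal sketch of this ranking, assuming each finding carries a rule id and a severity of "error" or "warning". PRIORITY_WEIGHTS is a hypothetical stand-in for the graph's weights and for known downstream impact: the numbers are invented, and only their order follows the priority map on this page. Severity is compared first, the weight breaks ties, and an empty findings list yields null.

```typescript
type Severity = "error" | "warning";

interface Finding {
  rule: string;
  severity: Severity;
  message: string;
}

// Hypothetical weights: higher means more impact on LLM output quality.
const PRIORITY_WEIGHTS: Record<string, number> = {
  "low-retrieval-score": 40,
  "context-overload": 30,
  "duplicate-chunks": 20,
  "oversized-chunk": 10,
};

const SEVERITY_RANK: Record<Severity, number> = { error: 2, warning: 1 };

// Returns the most impactful finding, or null for a healthy trace.
function selectPrimary(findings: Finding[]): Finding | null {
  if (findings.length === 0) return null;
  const ranked = [...findings].sort((a, b) => {
    const bySeverity = SEVERITY_RANK[b.severity] - SEVERITY_RANK[a.severity];
    if (bySeverity !== 0) return bySeverity;
    const byWeight = (PRIORITY_WEIGHTS[b.rule] ?? 0) - (PRIORITY_WEIGHTS[a.rule] ?? 0);
    if (byWeight !== 0) return byWeight;
    // Final tie-break on rule id so the chosen rule never depends on input order.
    return a.rule < b.rule ? -1 : a.rule > b.rule ? 1 : 0;
  });
  return ranked[0];
}

const primary = selectPrimary([
  { rule: "duplicate-chunks", severity: "warning", message: "Near-identical chunks found" },
  { rule: "low-retrieval-score", severity: "error", message: "Chunk score below minimum threshold (0.41 < 0.72)" },
]);
console.log(primary?.rule); // low-retrieval-score
```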

Contributing causes

Contributing causes are secondary findings that amplify or worsen the primary issue but are not the root cause on their own. For example, duplicate-chunks combined with low-retrieval-score creates a compounding problem — the LLM receives both irrelevant and redundant context.
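One way to express "amplifies" is as graph edges from a secondary rule to the primary causes it worsens. In the sketch below, AMPLIFIES is a hypothetical edge list: only the duplicate-chunks to low-retrieval-score pairing comes from this page, and the other edges are illustrative. Any finding whose rule has an edge to the primary cause is reported as a contributing cause.

```typescript
interface Finding {
  rule: string;
  severity: "error" | "warning";
}

// Hypothetical edges: key = secondary rule, value = primary causes it amplifies.
const AMPLIFIES: Record<string, string[]> = {
  "duplicate-chunks": ["low-retrieval-score", "context-overload"],
  "oversized-chunk": ["context-overload"],
};

// Collect distinct rules (other than the primary) that amplify the primary cause.
function contributingCauses(primaryRule: string, findings: Finding[]): string[] {
  const result = new Set<string>();
  for (const f of findings) {
    if (f.rule === primaryRule) continue;
    if ((AMPLIFIES[f.rule] ?? []).includes(primaryRule)) result.add(f.rule);
  }
  // Sorted output keeps the list stable regardless of finding order.
  return Array.from(result).sort();
}

console.log(
  contributingCauses("low-retrieval-score", [
    { rule: "low-retrieval-score", severity: "error" },
    { rule: "duplicate-chunks", severity: "warning" },
    { rule: "oversized-chunk", severity: "warning" },
  ])
); // [ 'duplicate-chunks' ]
```

Findings that have no edge to the primary cause (oversized-chunk in this example) are left out rather than listed as contributors.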

Recommendations

Recommendations are generated from the primary cause and contributing causes. They are ordered by expected impact and tailored to the specific combination of findings present. Recommendations are practical engineering actions, not vague suggestions.
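A sketch of how recommendations could be assembled. RECOMMENDATIONS is a hypothetical lookup table (its strings are copied from the example diagnosis on this page). The primary cause's actions come first, contributing causes follow in order, and exact duplicates are dropped. Treating "primary first" as the measure of expected impact is an assumption.

```typescript
// Hypothetical table; the strings are copied from the example diagnosis.
const RECOMMENDATIONS: Record<string, string[]> = {
  "low-retrieval-score": [
    "Consider re-ranking retrieved chunks before passing to the LLM",
    "Increase the retrieval score threshold to filter out low-quality results",
  ],
  "duplicate-chunks": [
    "Deduplicate your document corpus or increase chunk diversity in retrieval",
  ],
};

// Primary-cause actions first, then each contributing cause in order; no repeats.
function buildRecommendations(primary: string | null, contributing: string[]): string[] {
  if (primary === null) return []; // healthy trace: nothing to recommend
  const seen = new Set<string>();
  const out: string[] = [];
  for (const cause of [primary, ...contributing]) {
    for (const rec of RECOMMENDATIONS[cause] ?? []) {
      if (!seen.has(rec)) {
        seen.add(rec);
        out.push(rec);
      }
    }
  }
  return out;
}

console.log(buildRecommendations("low-retrieval-score", ["duplicate-chunks"]));
```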

Heuristic priority map

| Primary cause | Triggered when | Hallucination risk |
|---|---|---|
| low-retrieval-score | Any chunk score < minScore | High |
| context-overload | Utilization > 90% and no score issue | Medium |
| duplicate-chunks | Duplicate pairs found, no score/overload issue | Medium |
| oversized-chunk | Only oversized finding present | Low |
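Because each row's condition excludes the rows above it, the table can be read as an ordered, first-match rule list. A sketch of that lookup follows; TraceStats and its fields are hypothetical summaries of a trace, and the minScore value in the usage is only an example. The 90% utilization cutoff and the risk levels come from the table.

```typescript
type RiskLevel = "High" | "Medium" | "Low";

// Hypothetical summary of a trace; field names are illustrative.
interface TraceStats {
  chunkScores: number[];
  utilization: number; // fraction of the context window in use, 0..1
  duplicatePairs: number;
  oversizedChunks: number;
}

interface PriorityRule {
  cause: string;
  risk: RiskLevel;
  matches: (stats: TraceStats, minScore: number) => boolean;
}

// Rows in table order; the first matching row becomes the primary cause.
const PRIORITY_MAP: PriorityRule[] = [
  { cause: "low-retrieval-score", risk: "High", matches: (s, min) => s.chunkScores.some((x) => x < min) },
  { cause: "context-overload", risk: "Medium", matches: (s) => s.utilization > 0.9 },
  { cause: "duplicate-chunks", risk: "Medium", matches: (s) => s.duplicatePairs > 0 },
  { cause: "oversized-chunk", risk: "Low", matches: (s) => s.oversizedChunks > 0 },
];

function primaryFromMap(stats: TraceStats, minScore: number): PriorityRule | null {
  return PRIORITY_MAP.find((rule) => rule.matches(stats, minScore)) ?? null;
}

const hit = primaryFromMap(
  { chunkScores: [0.81, 0.41], utilization: 0.95, duplicatePairs: 1, oversizedChunks: 0 },
  0.72
);
console.log(hit?.cause, hit?.risk); // low-retrieval-score High
```

Here the trace also exceeds 90% utilization and has a duplicate pair, but the score check sits higher in the list, so those findings would surface as contributing causes rather than the primary one.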

Confidence field

The diagnosis includes a confidence field with three possible values:

  • high: Multiple corroborating findings, or a single critical error. The root cause is well supported.
  • medium: A single non-critical finding, or weak evidence. The diagnosis is likely correct, but consider further investigation.
  • low: An edge case or ambiguous findings. The diagnosis is a best-effort interpretation.
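These bands could be computed from a few signals. The sketch below is one reading of the descriptions above, not the engine's actual rules: it counts the findings that support the primary cause, treats a critical primary cause as enough for high confidence on its own, and uses a hypothetical ambiguous flag (for example, two candidate causes tied on priority) to force a low rating.

```typescript
type Confidence = "high" | "medium" | "low";

interface ConfidenceInput {
  supportingFindings: number; // findings corroborating the primary cause
  primaryIsCritical: boolean; // e.g. primaryCause.severity === "critical"
  ambiguous: boolean; // hypothetical: no clear winner among candidate causes
}

function scoreConfidence(input: ConfidenceInput): Confidence {
  // Ambiguity is checked first, so it caps confidence even for critical causes.
  if (input.ambiguous) return "low";
  if (input.supportingFindings >= 2 || input.primaryIsCritical) return "high";
  if (input.supportingFindings === 1) return "medium";
  return "low"; // no supporting evidence: best-effort only
}

console.log(scoreConfidence({ supportingFindings: 1, primaryIsCritical: false, ambiguous: false })); // medium
```

Letting ambiguity override severity is a design choice: it keeps automated consumers that act only on "high" from acting on a diagnosis the graph could not settle.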

Determinism guarantee

The diagnosis engine produces the same output for the same findings every time. It is safe to use in CI assertions, regression tests, and automated alerting.
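Because the output depends only on the findings, a regression test can run the diagnosis repeatedly, or compare it with a stored snapshot, and require identical output. A minimal sketch follows; the stub function stands in for the engine and is hypothetical. The diagnose() call on this page is asynchronous, so a version wired to it would await each run.

```typescript
// Throws if repeated runs on the same input ever disagree; returns the output.
function assertDeterministic<I, O>(run: (input: I) => O, input: I, runs = 5): O {
  const first = JSON.stringify(run(input));
  for (let i = 1; i < runs; i++) {
    const next = JSON.stringify(run(input));
    if (next !== first) {
      throw new Error(`Run ${i + 1} differed from run 1:\n${next}\nvs\n${first}`);
    }
  }
  return JSON.parse(first) as O;
}

// Throws if the output no longer matches a stored snapshot string.
// Note: the comparison is sensitive to key order, so store snapshots as produced.
function assertMatchesSnapshot<O>(actual: O, snapshot: string): void {
  const current = JSON.stringify(actual);
  if (current !== JSON.stringify(JSON.parse(snapshot))) {
    throw new Error(`Diagnosis changed:\n${current}`);
  }
}

// Hypothetical stand-in for the engine; replace with your real diagnosis call.
const stub = (findings: string[]) => ({ primaryCause: findings[0] ?? null, count: findings.length });

const out = assertDeterministic(stub, ["low-retrieval-score", "duplicate-chunks"]);
assertMatchesSnapshot(out, '{"primaryCause":"low-retrieval-score","count":2}');
console.log("diagnosis is stable");
```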

Running diagnosis programmatically

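diagnose.ts
```typescript
import { readFile } from "node:fs/promises";
import { diagnose } from "@rag-doctor/core";

// Assumption: the trace was saved as JSON in trace.json.
const trace = JSON.parse(await readFile("trace.json", "utf8"));

// Top-level await requires an ES module (e.g. "type": "module" in package.json).
const result = await diagnose(trace, { pack: "recommended" });

// primaryCause is null for a healthy trace, hence the optional chaining.
console.log(result.diagnosis.primaryCause?.label);
// → "Poor retrieval quality"

console.log(result.diagnosis.recommendations[0]);
// → "Review your embedding model..."

if (result.diagnosis.confidence === "high") {
  // Safe to act on this diagnosis in automated systems
}
```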
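In automated systems, the confidence check is often combined with the severity of the primary cause. The sketch below is a hypothetical policy, not something the package ships: it fails a CI step only when the diagnosis is high confidence and the primary cause is critical. It declares a local DiagnosisLike type mirroring the JSON shape on this page so that it runs without the package installed.

```typescript
// Local mirror of the fields this policy reads; not the package's own types.
interface DiagnosisLike {
  diagnosis: {
    primaryCause: { id: string; label: string; severity?: string } | null;
    recommendations: string[];
    confidence: "high" | "medium" | "low";
  };
}

// Hypothetical policy: block only on well-supported, critical root causes.
function shouldFailBuild(result: DiagnosisLike): boolean {
  const { primaryCause, confidence } = result.diagnosis;
  return primaryCause !== null && primaryCause.severity === "critical" && confidence === "high";
}

const example: DiagnosisLike = {
  diagnosis: {
    primaryCause: { id: "low-retrieval-score", label: "Poor retrieval quality", severity: "critical" },
    recommendations: ["Consider re-ranking retrieved chunks before passing to the LLM"],
    confidence: "high",
  },
};

if (shouldFailBuild(example)) {
  console.error(`Blocking: ${example.diagnosis.primaryCause?.label}`);
  for (const rec of example.diagnosis.recommendations) console.error(`  - ${rec}`);
}
```

With a real result object from diagnose(), a Node script would typically also set process.exitCode = 1 in the blocking branch so the CI job fails.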