Benchmarked Graph-RAG vs. Graph-Free Multi-Hop RAG: The graph mostly bought us a massive rebuild bill, not accuracy.
Mirrored from r/LocalLLaMA for archival readability. Support the source by reading on the original site.
We kept hitting the same wall building multi-hop RAG: the systems with the best accuracy (GraphRAG, HippoRAG 2, RAPTOR) all lean on a knowledge graph built offline. The numbers are great until your data changes: every update means re-running an LLM indexing pass to rebuild the graph. For a corpus that moves daily (prices, filings, tickets, news), you're paying that rebuild cost constantly.
So we tested whether the graph is actually necessary. We ran a graph-free dense index with query-time orchestration instead (no graph, no GPU, every component behind a commodity API) against the graph-based systems on HotpotQA, 2WikiMultiHopQA, and MuSiQue.
Against the graph systems, it won on all three benchmarks:
| Benchmark | MOTHRAG (ours) | GraphRAG | HippoRAG 2 | RAPTOR |
|---|---|---|---|---|
| HotpotQA | 78.1 | 68.6 | 75.5 | 69.5 |
| 2WikiMultiHop | 76.3 | 58.6 | 71.0 | 52.1 |
| MuSiQue | 50.5 | 38.5 | 48.6 | 28.9 |
And updates are just embed-and-append, with no rebuild and no retraining. Cost is ~$0.03/query on commodity APIs, with no GPU anywhere.
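To make "embed-and-append" concrete, here is a minimal sketch of an append-only dense index. It is not the actual implementation behind the numbers above: `toy_embed` is a hypothetical hashed bag-of-words stand-in for whatever embedding API you would call, and `AppendOnlyIndex` is an illustrative name. The point is only that adding documents never touches the vectors already stored.

```python
import hashlib
import math

DIM = 256  # assumed toy dimensionality; a real embedding API fixes its own


def toy_embed(text):
    """Hypothetical stand-in for an embedding API call: hashed bag-of-words."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "little") % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class AppendOnlyIndex:
    """Flat dense index: an update is embed + append, never a rebuild."""

    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.vectors = []
        self.docs = []

    def add(self, texts):
        # Only the new texts are embedded; existing vectors are left untouched.
        for text in texts:
            self.vectors.append(self.embed(text))
            self.docs.append(text)

    def search(self, query, k=3):
        # Brute-force cosine similarity (vectors are unit-normalized).
        q = self.embed(query)
        scored = [
            (sum(a * b for a, b in zip(vec, q)), doc)
            for vec, doc in zip(self.vectors, self.docs)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(doc, score) for score, doc in scored[:k]]


index = AppendOnlyIndex()
index.add([
    "ACME share price closed at 41 dollars",
    "The support ticket was resolved yesterday",
])
# A daily update: embed the new document and append it, nothing else changes.
index.add(["ACME share price closed at 45 dollars today"])
top_hits = index.search("ACME share price", k=2)
```

A production setup would swap the brute-force scan for an approximate nearest-neighbor store, but the update path keeps the same shape: the cost of an update scales with the new documents, not with the corpus.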
Against GPU-bound systems that use constrained decoding (NeocorRAG), it's not a clean win. We match them on HotpotQA (78.1 vs 78.3) and 2Wiki (76.3 vs 76.1), but we lose on MuSiQue (50.5 vs 52.6). MuSiQue is our weak spot (retrieval recall bottlenecks there), and we haven't solved it yet.
The takeaway for us: for multi-hop over changing data, the graph overhead mostly buys you a rebuild bill, not accuracy. A graph-free index with good query-time orchestration held up.
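The post doesn't spell out what the query-time orchestration looks like, so treat the following as one common pattern rather than the method benchmarked here: iterative retrieval, where each hop's evidence is folded back into the next query. Everything in it is an assumption for illustration. `retrieve` is a toy token-overlap retriever standing in for dense search, and the query reformulation is plain concatenation where a real system would likely call an LLM.

```python
import re


def tokens(text):
    """Lowercased word set; a toy substitute for an embedding."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, passages, exclude):
    """Toy retriever: best not-yet-used passage by token overlap with the query."""
    q = tokens(query)
    candidates = [p for p in passages if p not in exclude]
    return max(candidates, key=lambda p: len(q & tokens(p)), default=None)


def multi_hop_retrieve(question, passages, hops=2):
    """Chain retrievals at query time instead of walking a prebuilt graph."""
    evidence = []
    query = question
    for _ in range(hops):
        hit = retrieve(query, passages, exclude=evidence)
        if hit is None:
            break
        evidence.append(hit)
        # Hypothetical reformulation step: append the evidence so the next
        # hop can match on the bridge entity found in this hop.
        query = question + " " + " ".join(evidence)
    return evidence


PASSAGES = [
    "Film X was directed by Jane Doe.",
    "Film Y stars John Roe.",
    "Jane Doe was born in Lyon.",
    "Lyon is a city in France.",
]
evidence = multi_hop_retrieve("Where was the director of Film X born?", PASSAGES)
```

The first hop finds the passage naming the director; the second hop can only reach the birthplace passage because the bridge entity from hop one is now part of the query. A graph precomputes that link offline, while this loop discovers it per query, which is why the index itself can stay append-only.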
Curious where others landed on this: is the graph worth the rebuild cost for data that changes?