GraphRAG

GraphRAG is a family of retrieval-augmented generation techniques that put a knowledge graph in front of the language model rather than a flat vector index. The name is most closely associated with the Microsoft Research project of the same name, introduced in early 2024 and published in the arXiv paper From Local to Global: A Graph RAG Approach to Query-Focused Summarization (Edge et al., 2024). But “graph-RAG” is now an architectural pattern in its own right, with implementations from Neo4j, LlamaIndex, LangChain, FalkorDB, and others, and a research line that already includes second-generation variants such as DRIFT search and LazyGraphRAG. This page covers the original Microsoft system, its mechanics, why graphs help RAG in the first place, and the 2026 state of the wider ecosystem.

Why graphs in RAG at all

Classic, “baseline” RAG embeds document chunks into a vector database and retrieves the top-k chunks most semantically similar to a question. It is fast, cheap, and good enough for FAQ-style questions whose answer lies within a single passage. It breaks down in two situations that the Microsoft team highlighted in their original blog post and arXiv paper: connecting disparate facts across documents, and answering holistic, dataset-wide questions such as “what are the main themes?” (microsoft.github.io/graphrag; Edge et al., 2024). In both cases there is no single passage that is semantically similar to the question: the answer lives in the structure of the corpus, not in the content of any one chunk.

A knowledge graph encodes that structure explicitly. Entities (people, organisations, products, concepts, locations) become nodes; relationships become typed edges; text chunks attach to the entities they mention. A retriever can then traverse the graph, not just compare cosines. The PremAI implementation guide phrases the distinction cleanly: “vector search treats your corpus as isolated chunks. GraphRAG treats it as a connected network of facts” (Jalan, 2026).
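
To make that concrete, here is a toy sketch of the structure, using networkx purely for illustration (the Microsoft pipeline materialises the same shape as Parquet tables, described below); the entities and relationships are this page's running examples, not real output:

import networkx as nx

# Entities become typed nodes, with the text chunks that mention them attached.
kg = nx.MultiDiGraph()
kg.add_node("Jane Smith", type="person", chunks=["chunk-17"])
kg.add_node("Acme Corp", type="organization", chunks=["chunk-17", "chunk-42"])
kg.add_node("Q3 Report", type="document", chunks=["chunk-17"])

# Relationships become typed edges.
kg.add_edge("Jane Smith", "Q3 Report", rel="AUTHORED")
kg.add_edge("Jane Smith", "Acme Corp", rel="CFO_OF")
kg.add_edge("Acme Corp", "Beta Inc", rel="ACQUIRED")  # "Beta Inc" node is auto-created

# The traversal a flat vector index cannot express:
# everything within two hops of an entity, regardless of wording.
print(list(nx.bfs_tree(kg.to_undirected(), "Q3 Report", depth_limit=2)))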

The trade-off, which every honest write-up keeps coming back to, is cost: building that graph requires many LLM calls during indexing, and that cost has driven most of the post-2024 evolution of the field.

The Microsoft GraphRAG architecture

Microsoft’s reference implementation, published as the open-source microsoft/graphrag package (33k+ GitHub stars, v3.0.9 as of early 2026), follows a deliberately staged pipeline. The official docs describe it as a “structured, hierarchical approach to RAG, as opposed to naive semantic-search approaches using plain text snippets” (microsoft.github.io/graphrag).

Indexing phase

  1. TextUnit chunking. Source documents are sliced into small, analyzable chunks called TextUnits. Microsoft’s research found that smaller chunks (around 600 tokens) extract nearly twice as many entity references as larger ones, at the cost of more LLM calls and weaker coreference resolution across chunk boundaries (Stackviv, 2026; microsoft.github.io/graphrag).
  2. Entity and relationship extraction. An LLM is prompted on each TextUnit to extract entities (with descriptions and types) and the relationships between them. A single chunk might emit triples like Jane Smith — authored — Q3 Report or Acme Corp — acquired — Beta Inc. (Jalan, 2026).
  3. Entity summarization / coreference merging. The same entity often appears across many chunks under slightly different names (“Jane Smith”, “J. Smith”, “the CFO”). The LLM merges these into one node with a unified description.
  4. Optional claim extraction. A separate LLM pass extracts factual claims attached to entities (extract_claims in the YAML, disabled by default; yaml.md).
  5. Hierarchical community detection. The Leiden algorithm — a successor to Louvain that guarantees well-connected communities — is run recursively on the entity graph. Level 0 is the coarsest, root-level partition; each subsequent level splits those communities into finer-grained sub-communities (microsoft.github.io/graphrag; Stackviv, 2026).
  6. Community report generation. For every community at every level, the LLM writes a structured community report: title, summary, key entities, key relationships, findings. These pre-summarised reports are what make global queries cheap at query time.
  7. Embedding. Entity descriptions, text units, and community reports are embedded and written to a vector store (LanceDB by default, Azure AI Search or Cosmos DB in production) (yaml.md).

The output is a set of Parquet tables plus a vector index: entities, relationships, text_units, communities, community_reports, documents. Everything downstream queries these tables.
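
Assuming the default output directory, inspecting a finished index is plain pandas; the exact column names below reflect recent releases and may differ across versions:

import pandas as pd

entities = pd.read_parquet("output/entities.parquet")
relationships = pd.read_parquet("output/relationships.parquet")
reports = pd.read_parquet("output/community_reports.parquet")

print(len(entities), "entities,", len(relationships), "relationships")
# Each report row carries the community id, its hierarchy level, and the
# LLM-written title/summary that global search consumes.
print(reports[["community", "level", "title"]].head())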

Query phase

Microsoft ships four query modes (microsoft.github.io/graphrag):

  • Local Search — for entity-anchored questions. The system identifies entities in the question, finds them in the graph via vector similarity on entity descriptions, fans out to their neighbours, and pulls in associated text units and community reports.
  • Global Search — for holistic/sense-making questions. A map-reduce over community reports: each shuffled batch of reports yields a rated intermediate response, then a reducer LLM synthesises a final answer (Global Search docs).
  • DRIFT Search — Dynamic Reasoning and Inference with Flexible Traversal. A hybrid mode introduced in October 2024 that uses community reports as a primer to generate follow-up questions, then runs local search to refine them, building a tree of intermediate answers ranked by confidence (DRIFT docs).
  • Basic Search — plain top-k vector search for cases where graph context is not needed.
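
Mode selection is a CLI flag. A minimal end-to-end sketch, driving the documented init / index / query verbs through subprocess (the ./ragtest directory is hypothetical, and an API key is assumed to be configured in the scaffolded .env):

import subprocess

root = "./ragtest"  # hypothetical project directory

# One-time: scaffold settings.yaml and .env, then run the (expensive) indexing pipeline.
subprocess.run(["graphrag", "init", "--root", root], check=True)
subprocess.run(["graphrag", "index", "--root", root], check=True)

# Query time: pick the mode per question ("local", "global", "drift", or "basic").
subprocess.run(
    ["graphrag", "query", "--root", root,
     "--method", "global",
     "--query", "What are the main themes in this corpus?"],
    check=True,
)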

Example settings.yaml

A minimal settings.yaml for the Microsoft package looks roughly like this (extracted from the config reference):

models:
  default_completion_model:
    model_provider: openai
    model: gpt-4.1
    auth_method: api_key
    api_key: ${GRAPHRAG_API_KEY}
  default_embedding_model:
    model_provider: openai
    model: text-embedding-3-large
    auth_method: api_key
    api_key: ${GRAPHRAG_API_KEY}

input:
  type: text
  base_dir: input
  file_pattern: ".*\\.txt$"

chunks:
  type: tokens
  size: 1200            # tokens per TextUnit; smaller sizes extract more entities
  overlap: 100

extract_graph:
  completion_model_id: default_completion_model
  entity_types: [organization, person, geo, event]   # defaults; tune per domain
  max_gleanings: 1      # extra extraction passes to catch missed entities

summarize_descriptions:
  completion_model_id: default_completion_model
  max_length: 500

extract_claims:
  enabled: false        # opt-in; prompts need tuning per domain

cluster_graph:
  max_cluster_size: 10
  use_lcc: true
  seed: 0xDEADBEEF

community_reports:
  completion_model_id: default_completion_model
  max_length: 2000
  max_input_length: 8000

vector_store:
  default:
    type: lancedb
    db_uri: output/lancedb

local_search:
  top_k_entities: 10
  top_k_relationships: 10
  max_context_tokens: 12000

global_search:
  map_max_length: 1000
  reduce_max_length: 2000
  data_max_tokens: 12000

The entity_types list and the summarize_descriptions, community_reports, and extract_claims prompts are the main levers when tuning GraphRAG for a new domain. Microsoft strongly recommends running the Prompt Tuning flow rather than using defaults, because the quality of the extracted graph dominates everything downstream (microsoft/graphrag).

Example global-search prompt shape

The global-search map prompt, simplified, takes a batch of community reports and the user question and asks the model to produce a rated intermediate response in a strict JSON shape — something like:

---SYSTEM---
You are a helpful assistant responding to questions about data in tables.
---
Generate a response of the target length that addresses the user question,
summarizing all reports in the provided data. If you don't know the answer
or if the input does not contain sufficient information, say so. Do not
make anything up. Each response point should include a relevance score
in {0..100}, returned as JSON:
{"points": [{"description": "...", "score": 87}]}
---DATA---
{community_reports}
---USER---
{user_question}

The map step runs this over shuffled batches; the reduce step takes the highest-rated points and writes the final answer (microsoft.github.io/graphrag).
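
A compressed sketch of that map-reduce loop; complete() is a hypothetical stand-in for whatever LLM client you use, and the batching constants are illustrative:

import json
import random

def complete(prompt: str) -> str:
    """Hypothetical LLM call; swap in your client of choice."""
    raise NotImplementedError

def global_search(question: str, reports: list[str], batch_size: int = 8) -> str:
    # Map: each shuffled batch of community reports yields rated points.
    random.shuffle(reports)
    points = []
    for i in range(0, len(reports), batch_size):
        batch = "\n\n".join(reports[i:i + batch_size])
        raw = complete(f"Question: {question}\nReports:\n{batch}\n"
                       'Return JSON: {"points": [{"description": "...", "score": 0-100}]}')
        points.extend(json.loads(raw)["points"])
    # Reduce: keep the highest-rated points and synthesise the final answer.
    points.sort(key=lambda p: p["score"], reverse=True)
    evidence = "\n".join(p["description"] for p in points[:20])
    return complete(f"Question: {question}\nEvidence:\n{evidence}\n"
                    "Write the final answer from this evidence only.")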

When graph context actually helps

The original arXiv paper benchmarks GraphRAG against a vector-RAG baseline on million-token corpora using head-to-head LLM-judged metrics for comprehensiveness, diversity, and empowerment, and reports substantial wins for global, sense-making questions (Edge et al., 2024). Independent guides converge on a similar verdict (Jalan, 2026; Stackviv, 2026; Meilisearch, 2025):

GraphRAG wins when:

  • Multi-hop reasoning. “Which suppliers share components with our delayed orders?” needs a join across suppliers, components, and orders. Vector similarity returns chunks mentioning those words; only graph traversal returns the chain.
  • Global / sense-making queries. “What are the main themes?”, “Who are the dominant actors?”, “What changed between v2 and v3?” — community summaries are pre-built for exactly this.
  • Entity-dense corpora. Contracts that reference contracts, regulations that link to other regulations, codebases with import graphs, biomedical papers with gene/protein networks.
  • Ambiguous terminology. A graph disambiguates “Mercury” (planet vs element vs brand) through its neighbours.
  • Provenance / citation requirements. Each answer cell traces back to specific entity IDs and source TextUnits.

Vector RAG wins when:

  • The question is a single-passage lookup (“what is X’s phone number?”).
  • The corpus is FAQ-style or self-contained per document.
  • The budget for indexing is tight and the corpus is large or changes frequently.
  • The team has no LLM budget to spend on offline preprocessing.

A finance-domain study by BNP Paribas and Neo4j (Barry et al., ACL GenAI-K 2025) reports that graph-enabled approaches (FactRAG and HybridRAG) achieve a 6% reduction in hallucinations and an 80% reduction in token usage versus conventional RAG on the FinanceBench benchmark, especially on regulatory documents where entity relationships matter. Lettria and AWS benchmarks cited in PremAI’s 2026 guide report up to 35% accuracy gains over vector-only retrieval on complex documents, with FalkorDB pushing that toward 90%+ on schema-heavy enterprise queries (these specific numbers are vendor-flavoured and worth treating as upper bounds rather than median outcomes).

Variants and successors

The original Project GraphRAG timeline shows how rapidly the system evolved through 2024–26:

  • April 2024 — original arXiv paper and code drop.
  • July 2024 — microsoft/graphrag goes public on GitHub.
  • September 2024 — GraphRAG auto-tuning for new domains.
  • October 2024 — DRIFT search, developed with the Uncharted research group.
  • November 2024 — dynamic community selection for cheaper global search.
  • November 2024 — LazyGraphRAG.
  • December 2024 — GraphRAG 1.0 (ergonomic and packaging cleanup).
  • March 2025 — Claimify, high-quality claim extraction.
  • June 2025 — BenchmarkQED, automated RAG benchmarking, plus LazyGraphRAG’s integration into Microsoft Discovery and Azure Local services as public preview.
  • August 2025 — VeriTrail, hallucination tracing across multi-step workflows.
  • 2026 — the v3 codebase (current v3.0.9) consolidates these features into a single modular pipeline.

LazyGraphRAG

The most consequential follow-up is LazyGraphRAG (Edge, Trinh, Larson; Nov 25 2024). It removes the expensive LLM-driven indexing and replaces it with cheap NLP noun-phrase extraction, then defers all LLM use to query time:

  • Index — NLP noun-phrase extraction builds a concept co-occurrence graph; graph statistics produce a hierarchical community structure. No LLM calls (a toy sketch of this step follows the list).
  • Refine query — at query time, an LLM splits the user question into 3–5 sub-queries and expands each with concepts from the graph.
  • Match query — a per-sub-query iterative-deepening search combines best-first (vector ranking of chunks) and breadth-first (relevance-rated community expansion) until a relevance test budget is exhausted.
  • Map / reduce — only the chunks that survived the relevance test are sent to the LLM, which extracts subquery-relevant claims and produces the final answer.
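
A toy version of the index step, assuming spaCy for noun-phrase extraction and networkx for the graph; the real implementation's internals differ, and Louvain stands in here for Leiden:

import itertools
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")  # classic NLP pipeline; no LLM anywhere

def build_concept_graph(chunks: list[str]) -> nx.Graph:
    """Noun phrases become concept nodes; co-occurrence in a chunk
    creates or strengthens an edge between them."""
    g = nx.Graph()
    for chunk in chunks:
        concepts = {np.text.lower() for np in nlp(chunk).noun_chunks}
        for a, b in itertools.combinations(sorted(concepts), 2):
            weight = g.get_edge_data(a, b, default={"weight": 0})["weight"]
            g.add_edge(a, b, weight=weight + 1)
    return g

chunks = [
    "Acme Corp acquired Beta Inc after the Q3 report.",
    "The Q3 report was authored by Jane Smith of Acme Corp.",
]
g = build_concept_graph(chunks)

# Graph statistics alone yield the community structure.
print(nx.community.louvain_communities(g, seed=42))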

The headline numbers are striking: “LazyGraphRAG data indexing costs are identical to vector RAG and 0.1% of the costs of full GraphRAG… For comparable query costs to vector RAG, LazyGraphRAG outperforms all competing methods on local queries… The same configuration also shows comparable answer quality to GraphRAG Global Search for global queries, but more than 700 times lower query cost” (Microsoft Research blog, Nov 25 2024). A June 6 2025 update confirms LazyGraphRAG has been integrated into Microsoft Discovery and is available as public preview on Azure Local.

The authors are explicit that full GraphRAG is not obsolete: the pre-summarised entity / relationship / community reports have value outside Q&A (as reports humans can read and share), and the lazy variant cannot answer truly cross-cutting questions as comprehensively at very low budgets.

DRIFT search

DRIFT (Dynamic Reasoning and Inference with Flexible Traversal) is a query-time improvement to local search. Instead of just retrieving an entity neighbourhood, it works in three stages (DRIFT blog and docs; a sketch of the intermediate structure follows the list):

  1. Primes with the top-K most relevant community reports to write a broad first-pass answer and a set of follow-up questions.
  2. Recursively runs local search on each follow-up, producing intermediate answers and new sub-questions, while a confidence score on each node decides whether to keep expanding.
  3. Outputs a hierarchical Q&A tree which is then collapsed into a final answer.
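
A sketch of that intermediate structure only (not Microsoft's classes): a Q&A tree whose nodes carry the confidence score that gates both expansion and final inclusion:

from dataclasses import dataclass, field

@dataclass
class DriftNode:
    question: str
    answer: str = ""
    confidence: float = 0.0
    children: list["DriftNode"] = field(default_factory=list)

def collapse(node: DriftNode, min_confidence: float = 0.5) -> str:
    """Flatten the tree into context for the final answer, keeping only
    branches whose intermediate answers scored above the threshold."""
    lines = [f"Q: {node.question}\nA: {node.answer}"]
    for child in node.children:
        if child.confidence >= min_confidence:
            lines.append(collapse(child, min_confidence))
    return "\n\n".join(lines)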

Microsoft’s comparison on AP News articles shows DRIFT surfacing details (supply-chain provenance, brand-level impact, FDA contamination ratios) that pure local search misses (DRIFT blog).

Dynamic community selection

A November 2024 enhancement narrows global search to only the most relevant community reports per level using an LLM-rated relevance score (dynamic_search_threshold in the YAML), trading a small amount of recall for a dramatic reduction in tokens for global queries (yaml.md).
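
In pseudocode terms, the selection walks the hierarchy top-down; rate() below is a hypothetical LLM-backed relevance rater, and the parent/community keys are illustrative, not the package's schema:

def select_reports(question, reports_by_level, rate, threshold=2):
    """Rate root-level reports first; only the children of communities
    judged relevant get rated at the next level down."""
    keep, frontier = [], reports_by_level[0]
    for level, _ in enumerate(reports_by_level):
        relevant = [r for r in frontier if rate(question, r) >= threshold]
        keep.extend(relevant)
        if level + 1 >= len(reports_by_level):
            break
        relevant_ids = {r["community"] for r in relevant}
        frontier = [r for r in reports_by_level[level + 1]
                    if r["parent"] in relevant_ids]
    return keep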

Other lineages

Outside the Microsoft family, several flavours of graph-RAG matter in 2026:

  • HybridRAG / FactRAG — the BNP Paribas + Neo4j approach that blends a hand-curated KG with vector retrieval for regulatory and financial documents (Barry et al., 2025).
  • Neo4j GraphRAG — a property-graph-centric stack where the KG is a first-class Neo4j database, not a derived artefact (neo4j-graphrag-python).
  • LlamaIndex PropertyGraphIndex — a flexible KG index that supports multiple extractors (LLM, schema-based, ImplicitPathExtractor) and storage backends (LlamaIndex docs).
  • LightRAG, GraphReader, ToG (Think-on-Graph) — academic variants that emphasise lighter graphs, agentic traversal, or path-based reasoning rather than community summaries.

Open-source ecosystem

Four packages dominate practical 2026 deployments.

Microsoft graphrag

The reference implementation. Strengths: full pipeline out of the box, well-tested community summarisation, four query modes, LiteLLM backend (so the same config works with OpenAI, Azure OpenAI, Anthropic, and local models). Weaknesses: indexing-heavy, schema-light (the LLM picks entity types unless constrained), and the README explicitly warns “GraphRAG indexing can be an expensive operation… start small” (microsoft/graphrag). Use it when you need Microsoft’s exact algorithm, especially for global queries.

Neo4j neo4j-graphrag-python

The official Neo4j package (v1.16 as of May 2026) puts a property graph at the centre and exposes a family of retrievers — VectorRetriever, VectorCypherRetriever, HybridRetriever, HybridCypherRetriever, Text2CypherRetriever, ToolsRetriever — each combining vector search with Cypher traversal differently. A KnowledgeGraphBuilder handles entity extraction with optional schema constraints, and Neo4j’s recent Cypher 25 SEARCH clause is now first-class for vector retrievers.

A typical VectorCypherRetriever (do vector search to find seed nodes, then traverse with Cypher to enrich context) looks like:

import neo4j
from neo4j_graphrag.embeddings.openai import OpenAIEmbeddings
from neo4j_graphrag.retrievers import VectorCypherRetriever

URI = "neo4j+s://demo.neo4jlabs.com"
AUTH = ("recommendations", "recommendations")
INDEX_NAME = "moviePlotsEmbedding"

# After vector search seeds 'node' to a Movie, the Cypher traversal
# pulls the cast as additional structured context.
RETRIEVAL_QUERY = """
RETURN node.title       AS movieTitle,
       node.plot        AS moviePlot,
       collect {
         MATCH (actor:Actor)-[:ACTED_IN]->(node)
         RETURN actor.name
       }                AS actors,
       score            AS similarityScore
"""

with neo4j.GraphDatabase.driver(URI, auth=AUTH) as driver:
    retriever = VectorCypherRetriever(
        driver=driver,
        index_name=INDEX_NAME,
        embedder=OpenAIEmbeddings(),
        retrieval_query=RETRIEVAL_QUERY,
    )
    print(retriever.search(query_text="Who were the actors in Avatar?", top_k=5))

The pattern is general: vector search gives you a candidate set of nodes; Cypher traversal lets you reach exactly the structural context you want, instead of hoping the cosine metric picks it up (Neo4j example).

LlamaIndex and LangChain

LlamaIndex offers three relevant abstractions (LlamaIndex KG RAG docs), with a minimal sketch after the list:

  • KnowledgeGraphIndex — build a triple-store KG from documents using an LLM extractor.
  • KnowledgeGraphRAGQueryEngine / PropertyGraphIndex — query an existing KG by extracting entities from the question, retrieving their subgraphs, and synthesising an answer.
  • KnowledgeGraphQueryEngine (Text2Cypher / NL2GraphQuery) — translate natural language directly into a Cypher / nGQL query.
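
A minimal PropertyGraphIndex sketch following the documented quick-start shape (defaults: in-memory property graph, LLM-driven extraction; ./data is a hypothetical folder):

from llama_index.core import PropertyGraphIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()

# Extraction runs at build time; pass explicit kg_extractors to constrain it.
index = PropertyGraphIndex.from_documents(documents)

# include_text=True returns the source chunks alongside the matched triples.
query_engine = index.as_query_engine(include_text=True)
print(query_engine.query("Which suppliers share components with delayed orders?"))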

LangChain has a community-maintained langchain-graphrag package that re-implements Microsoft’s indexing and query flow on top of LangChain primitives, with explicit support for non-OpenAI LLMs and embedding models and a stronger focus on modularity. It is a good entry point if your stack is already LangChain-shaped and you do not want to pin to Microsoft’s LiteLLM-based runtime.

Vendor graph DBs and adjacent tools

  • FalkorDB — a Redis-based property graph optimised for sub-millisecond traversal, marketed specifically for GraphRAG workloads.
  • TigerGraph, Memgraph, Kuzu, Nebula — graph databases with first-party GraphRAG integrations.
  • BenchmarkQED — Microsoft’s open-source benchmarking suite for RAG, with synthetic local/global query generation aligned with the GraphRAG metrics.
  • VeriTrail and Claimify — hallucination-tracing and claim-quality tooling Microsoft built on top of the GraphRAG data model.

Evaluation evidence

The evidence base is healthy but still LLM-judged for most metrics, which is worth keeping in mind:

  • The original arXiv paper uses pairwise LLM judgement on comprehensiveness, diversity, and empowerment across global sense-making questions over million-token podcast transcripts and news corpora, with GraphRAG winning the majority of head-to-heads against a 16k-token semantic-search baseline (Edge et al., 2024).
  • The LazyGraphRAG study uses 100 synthetic queries (50 local, 50 global) over 5,590 AP News articles. At a relevance test budget of 500 — 4% of the cost of GraphRAG Global at community level 2 — LazyGraphRAG significantly outperforms all 8 competing conditions, including DRIFT and 64k-token semantic search (Microsoft Research, Nov 2024).
  • BenchmarkQED, published June 2025, formalises this evaluation methodology and is now the default benchmark for new GraphRAG variants (Project GraphRAG).
  • The BNP Paribas / Neo4j paper provides one of the few independent peer-reviewed results: 6% hallucination reduction and 80% token reduction on FinanceBench (Barry et al., 2025).
  • Third-party vendor benchmarks (Lettria, AWS, FalkorDB cited in PremAI 2026) report 35% to 90%+ accuracy gains, but these are vendor-funded studies on hand-picked schema-heavy corpora and should be treated as upper bounds.

The weak point of the evidence base is that LLM-as-judge metrics correlate with human preference but are not identical to it, and the most commonly used evaluator is the same family of model that generated the graph. BenchmarkQED is the right tool to reach for in 2026 if rigour matters.

Limitations

The Responsible AI FAQ in microsoft/graphrag acknowledges most of these, and the wider literature converges on the same list.

  • Indexing cost. Building the graph requires N + M + K LLM calls (chunks + entity-pair summaries + community reports). On a 1 M-token corpus this is easily a multi-hundred-dollar bill, as the back-of-envelope sketch after this list illustrates; the Microsoft README warns about this prominently. LazyGraphRAG is the explicit response.
  • Freshness and incremental updates. Adding a new document touches many entities and may shift community boundaries. GraphRAG 2.x added incremental indexing with a secondary update_output_storage, but full re-indexing is still the recommended path for major content shifts (yaml.md).
  • Schema rigidity. The default entity-type list (organization, person, geo, event) is generic. Domain-specific corpora need a tuned entity_types, tuned prompts, and ideally a constrained schema; without that, the graph drifts.
  • Hallucinations in summaries. Community reports are LLM-generated narratives over LLM-extracted facts. Errors compound. Microsoft has invested heavily in mitigations: Claimify for claim extraction quality, VeriTrail for tracing provenance across multi-step generation. They reduce, not eliminate, the problem.
  • Coreference and entity resolution. Subtle name variants and pronouns are routinely mis-merged or under-merged. The max_gleanings parameter controls how aggressively the extractor revisits a chunk, at linear cost.
  • Cost of LLM-as-judge benchmarking. Tuning a deployment well requires hundreds of side-by-side judgements, which itself is an LLM bill.
  • Multi-tenant and access-control complexity. Mixing entities from documents with different ACLs into the same community summary risks leaks. Production deployments typically partition graphs per tenant or per access scope, which gives up some of the global-query benefit.
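
A back-of-envelope sketch of that indexing bill; every constant below is an assumption for illustration, not a measured number:

corpus_tokens = 1_000_000
chunk_tokens = 600
extraction_calls = corpus_tokens // chunk_tokens   # N: one extraction per TextUnit
summary_calls = 5_000                              # M: assumed entity/relationship merges
report_calls = 800                                 # K: assumed communities across levels

calls = extraction_calls + summary_calls + report_calls
tokens_per_call = 2_000    # assumed average prompt + completion
usd_per_mtok = 15.00       # assumed blended frontier-model rate

print(f"{calls:,} LLM calls, ~${calls * tokens_per_call / 1e6 * usd_per_mtok:,.0f}")
# → roughly 7,500 calls and a bill in the low hundreds of dollars,
#   consistent with the warning above.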

2026 state of the art

The picture in mid-2026 is:

  • Full GraphRAG is mature, not dominant. It is the right tool when graphs have intrinsic value (reports humans read, regulated provenance, multi-tenant KGs), and when you can afford the indexing run.
  • LazyGraphRAG is the default for new “graph-flavoured” RAG. Vector-RAG-level indexing cost with materially better local and global answers, integrated into Microsoft Discovery and Azure Local. Most new pilots start here unless they have a specific reason to do otherwise (LazyGraphRAG blog, 2024 + Jun 2025 update).
  • Hybrid graph + vector is mainstream. Neo4j’s HybridRetriever, FalkorDB’s combined indexes, and Microsoft’s Basic Search mode all assume both modalities live side by side. The 2026 question is rarely “graph or vector?” but “which retriever for which question?”.
  • Schema-guided extraction is winning over open extraction. Constrained entity types and relationship schemas produce cleaner graphs at lower cost. LlamaIndex’s PropertyGraphIndex and Neo4j’s SchemaBuilder reflect this shift.
  • Provenance tooling is finally serious. VeriTrail-style tracing across the indexing + retrieval + generation chain is becoming table stakes for regulated industries.
  • Long-context models compress the bottom of the market. Models with 1–2 M-token context windows have eaten the simplest “just stuff everything in” use cases, but they have not displaced GraphRAG for million-document or sense-making workloads, where the structure of the corpus is itself the answer.
  • Deprecated / archived. The original “GraphRAG Solution Accelerator” on Azure is archived; Microsoft now routes production usage through Discovery and Azure Local. The 1.x configuration format is gone; v2/v3 require re-running graphrag init between minor versions.

The net effect is that “GraphRAG” as a term increasingly means the architectural pattern — entity-and-relationship extraction plus structured retrieval — rather than the specific Microsoft pipeline. The reference implementation is still the easiest way to think about the moving parts, but a 2026 production system is more likely to be a LazyGraphRAG-style lazy index on a Neo4j or FalkorDB property graph, with hybrid vector + Cypher retrieval and BenchmarkQED-style evaluation gating changes.

When to reach for GraphRAG

A pragmatic checklist:

  1. Are the most valuable questions multi-hop or holistic? If yes, graph-RAG is likely worth it.
  2. Is the corpus stable enough that an indexing run amortises? If yes, full GraphRAG; if no, LazyGraphRAG.
  3. Do you already have, or want, an explicit KG as an asset (reports, browsing, audit)? If yes, Microsoft GraphRAG or Neo4j GraphRAG.
  4. Do you need first-class graph queries from end users (Text2Cypher, schema queries)? If yes, Neo4j GraphRAG or LlamaIndex KnowledgeGraphQueryEngine.
  5. Are you cost-bound with a small or simple corpus? Stay on vector RAG.
  6. Do you need explainable provenance and per-claim citation? GraphRAG plus VeriTrail / Claimify.

For most teams in 2026, the right first move is to prototype with LazyGraphRAG on a representative slice and only graduate to full GraphRAG when global-query quality plateaus and the indexing budget is justified.


Changelog

  • 2026-05-11 — Page created from arXiv paper + Microsoft docs + ecosystem sources (Type A/B, confidence 90)