Graphify¶
What it is¶
Graphify is an MIT-licensed, open-source command-line tool that builds a structural knowledge graph of a code repository so that an AI coding agent can answer questions about the codebase from a compact, pre-computed index instead of re-reading source files on every prompt. The project is hosted at github.com/safishamsi/graphify and shipped on PyPI under the slightly unusual package name graphifyy (double-y, because the obvious slug was already taken). The author maintains a project landing page at graphify.net that mirrors the README.
The shape of the artifact Graphify produces is the point. It is a typed, directed multigraph in which nodes are the structural objects of source code — files, modules, classes, functions, methods, constants — and edges are the relationships between them: imports, calls, inheritance, type references, decorator applications, and module-to-module dependencies. The graph is purely topological. There are no embeddings, no vector store, no cosine similarity step. Edges encode facts that the parser actually saw in the AST, with a confidence tag of EXTRACTED, INFERRED, or AMBIGUOUS to mark the difference between a directly observed call and a heuristically resolved one.
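To make the confidence tags concrete, here is a sketch of what such edge records might look like as plain data. The field names and values below are illustrative assumptions, not Graphify's documented graph.json schema:

```python
# Hypothetical edge records with confidence tags. Field names are
# illustrative, not Graphify's actual schema.
edges = [
    {"src": "app.handlers.request_handler", "dst": "app.auth.validate_token",
     "kind": "calls", "confidence": "EXTRACTED"},    # call site seen in the AST
    {"src": "app.models.User", "dst": "app.models.Base",
     "kind": "inherits", "confidence": "EXTRACTED"},
    {"src": "app.plugins.loader", "dst": "app.plugins.hooks",
     "kind": "imports", "confidence": "AMBIGUOUS"},  # heuristically resolved
]

# A consumer that only wants directly observed facts can filter on the tag:
facts = [e for e in edges if e["confidence"] == "EXTRACTED"]
```

The useful property is that downstream tools can choose their own precision/recall trade-off by filtering on the tag rather than trusting every edge equally.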
That positioning — structural index for code, not semantic index for prose — is what makes Graphify a distinct member of the broader RAG/knowledge-graph cluster. It sits adjacent to general-purpose document-graph tools like Microsoft GraphRAG and compilation-stage knowledge layers like Pinecone Nexus, but it does a narrower thing and does it without paying for an LLM during indexing.
Installation and use¶
The recommended install path uses uv, Astral's Rust-based Python package manager:
uv tool install graphifyy
graphify install
uv tool install puts the graphify CLI on the user's PATH in an isolated environment, and the subsequent graphify install step registers Graphify as a Claude Code slash command, dropping the Skill manifest into .claude/skills/graphify/ (or the project-local equivalent) so that the agent can invoke it. Equivalent paths exist for OpenAI Codex CLI, OpenCode, Cursor, Gemini CLI, GitHub Copilot CLI, VS Code Copilot Chat, Aider, and several smaller agent frontends; the README enumerates the full list.
Inside a Claude Code session, the lifecycle is driven by the /graphify slash command:
/graphify build # one-shot full indexing pass over the repo
/graphify update # cache-aware incremental refresh
/graphify query "where is rate limiting applied?"
/graphify report # regenerate the human-readable Markdown report
/graphify serve # start the MCP server for tool-style queries
A successful build writes a small, predictable bundle of artifacts under .graphify/ at the repo root:
- graph.json — the canonical serialised graph (nodes, edges, attributes, community labels, confidence tags). This is the file every other tool reads.
- graph.html — a self-contained interactive visualisation; nodes are colour-coded by community, edges by relationship type. Useful for humans exploring the codebase, not consumed by the agent.
- GRAPH_REPORT.md — a generated Markdown summary written in deliberately agent-friendly prose: top-level modules, communities and the files they group, "god nodes" (highest-degree symbols, usually the ones to be careful with when refactoring), "surprise edges" (unexpected cross-module links worth investigating), and a per-community gloss.
- cache/ — per-file parse hashes that let graphify update re-parse only what changed since the last run.
- transcripts/ — captured prompt/response transcripts from the agent's interactions with the Graphify MCP server, used for both replay and audit.
When run as an MCP server (python -m graphify.serve), the tool exposes a small surface — query_graph, get_node, get_neighbors, shortest_path — that an agent can call directly rather than going through file-reading roundtrips.
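Two of those operations are easy to picture as plain functions over the serialised graph. The sketch below assumes a minimal graph.json shape ({"nodes": [...], "edges": [...]}) and re-implements neighbour lookup and shortest-path as breadth-first search; the real server's schema and behaviour may differ:

```python
from collections import deque

# Assumed minimal graph.json shape; the real schema lives in graphify.
graph = {
    "nodes": [{"id": "client"}, {"id": "transport"}, {"id": "auth"}],
    "edges": [{"src": "client", "dst": "transport", "kind": "calls"},
              {"src": "client", "dst": "auth", "kind": "calls"},
              {"src": "auth", "dst": "transport", "kind": "calls"}],
}

def get_neighbors(g, node_id):
    """Outgoing neighbours of a node, as (target, edge-kind) pairs."""
    return [(e["dst"], e["kind"]) for e in g["edges"] if e["src"] == node_id]

def shortest_path(g, start, goal):
    """Breadth-first search over directed edges; returns a node list or None."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt, _ in get_neighbors(g, path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

Because the graph is directed, shortest_path(graph, "transport", "client") returns None: there is no call path from the transport layer back up to the client, which is exactly the kind of asymmetry an agent wants to see.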
Pipeline modules¶
Graphify's build step is a deterministic pipeline composed of eight modules that hand off in order. Each is a separate Python module under graphify/pipeline/, which makes it possible to run, test, or replace any stage in isolation.
- detect — walks the repository and classifies each file by language using extension and shebang heuristics, with a content-sniffing fallback for ambiguous files. Files Graphify does not understand are recorded in a "skipped" manifest rather than silently dropped.
- extract — invokes Tree-sitter parsers per language and walks the resulting concrete syntax trees, emitting node records (definitions) and edge records (references). Tree-sitter is GLR-based, incremental, and error-tolerant, which means partially broken files still yield partial graphs rather than failing the whole run.
- validate — sanity-checks the extracted records: every edge endpoint resolves to a known node, every cross-file reference is reachable from at least one root, no node has contradictory type information. Mismatches downgrade the edge confidence to AMBIGUOUS instead of dropping it.
- build — assembles the validated records into a NetworkX MultiDiGraph. NetworkX is in-process, pure-Python, and slow on huge graphs, but it is the canonical Python graph library and integrates cleanly with the next step.
- cluster — runs Leiden community detection via graspologic, Microsoft's open-source graph statistics library. Communities are persisted as a node attribute and become the structural unit of the report.
- analyze — derives summary statistics: degree distributions, betweenness centrality (which yields the "god nodes"), edge-type histograms, cross-community edges (the "surprise" set), and module-level dependency rollups.
- report — renders GRAPH_REPORT.md from the analysis output using a Jinja-style templating layer with deliberately stable headings so that agents can grep it reliably.
- export — writes graph.json, graph.html, and any auxiliary artifacts to disk. The HTML is generated via a vis-network bundle inlined into a single file.
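The extract stage's job — walk a syntax tree, emit node records for definitions and edge records for references — can be sketched with Python's stdlib ast module standing in for Tree-sitter. The record fields are assumed for illustration, and a real extractor handles far more construct types:

```python
import ast

# Toy stand-in for the extract stage, using stdlib ast instead of
# Tree-sitter. Emits node records for function definitions and edge
# records for direct call sites inside them.
SOURCE = """
def validate_token(tok):
    return tok is not None

def request_handler(req):
    return validate_token(req)
"""

def extract(source, path="example.py"):
    tree = ast.parse(source)
    nodes, edges = [], []
    for fn in [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]:
        nodes.append({"id": f"{path}::{fn.name}", "kind": "function"})
        for call in [c for c in ast.walk(fn) if isinstance(c, ast.Call)]:
            if isinstance(call.func, ast.Name):  # direct call, directly observed
                edges.append({"src": f"{path}::{fn.name}",
                              "dst": call.func.id,
                              "kind": "calls", "confidence": "EXTRACTED"})
    return nodes, edges

nodes, edges = extract(SOURCE)
```

On this input the extractor yields two function nodes and one EXTRACTED call edge from request_handler to validate_token; the validate stage would then resolve the bare name "validate_token" to its defining node or downgrade the edge to AMBIGUOUS.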
Why it is not a vector store¶
The cleanest way to understand Graphify is to spell out what it deliberately is not. It does not embed anything. There is no encoder model in the pipeline, no embedding column on nodes, no approximate-nearest-neighbour index, no cosine similarity at query time. The decision is grounded in a claim the README makes explicitly and that matches the nature of source code: in code, calls are facts, similarity is a heuristic.
A semantic similarity search over function bodies will happily put two functions next to each other because they both manipulate strings, even when neither has ever called the other and they live in unrelated subsystems. For prose retrieval that is exactly the desired behaviour. For code understanding, where the question is usually "what actually happens if I change this?" or "where is this called from?", similarity is at best a hint and at worst actively misleading. A topological graph answers the structural question directly: an edge exists if and only if the parser observed a real reference.
This has three concrete consequences. First, Graphify has no notion of false positives from synonymy — if it claims request_handler calls validate_token, it is because Tree-sitter saw the call site. Second, the index is reproducible: the same repository at the same commit produces a byte-equivalent graph.json, which makes the artifact suitable as a checked-in build output. Third, the index is cheap: indexing is parsing plus graph analytics, with zero LLM tokens consumed. The first end-to-end pass over a medium repo takes seconds to a couple of minutes on a laptop; subsequent runs reuse the parse cache.
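Byte-equivalence under re-indexing requires a canonical serialisation. One common way to achieve it (a sketch, not necessarily how Graphify serialises) is to sort nodes and edges, sort keys, and fix separators before writing, so that input ordering cannot leak into the artifact:

```python
import hashlib
import json

def canonical_bytes(graph):
    """Deterministic serialisation: sort records and keys, fix separators."""
    ordered = {
        "nodes": sorted(graph["nodes"], key=lambda n: n["id"]),
        "edges": sorted(graph["edges"],
                        key=lambda e: (e["src"], e["dst"], e["kind"])),
    }
    return json.dumps(ordered, sort_keys=True, separators=(",", ":")).encode()

# The same graph discovered in two different traversal orders...
g1 = {"nodes": [{"id": "b"}, {"id": "a"}],
      "edges": [{"src": "a", "dst": "b", "kind": "calls"}]}
g2 = {"nodes": [{"id": "a"}, {"id": "b"}],
      "edges": [{"src": "a", "dst": "b", "kind": "calls"}]}

# ...hashes to the same bytes, so the artifact can be checked in and diffed.
digest = hashlib.sha256(canonical_bytes(g1)).hexdigest()
assert digest == hashlib.sha256(canonical_bytes(g2)).hexdigest()
```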
Tree-sitter language coverage¶
Graphify ships parsers for roughly twenty-five languages, leaning on the broad ecosystem of Tree-sitter grammars the community has built. The exact set evolves with releases, but the README lists Python, JavaScript, TypeScript (including TSX), Go, Rust, Java, Kotlin, Swift, C, C++, C#, Ruby, PHP, Scala, Elixir, Erlang, Haskell, OCaml, Clojure, Lua, Bash, SQL, HTML, CSS, and Markdown, with several others (Zig, Nix, Dart) present at varying depth.
Depth of extraction is uneven by language and is set per-grammar in the extract module. The strongest extractors are the ones whose grammars expose a clean named-children structure for the relevant constructs: Python, TypeScript, Go, Rust, and Java all yield function/class/method definitions, import statements, call sites, and inheritance edges with high fidelity. JavaScript is close behind, with the usual caveats around dynamic dispatch and string-based requires that no static parser can resolve. C and C++ extractors handle definitions and direct calls reliably but defer macros and template instantiations; those edges are emitted with AMBIGUOUS confidence. Scripting languages (Bash, Lua) and markup (HTML, CSS, Markdown) are indexed at file-and-symbol granularity only, with no call resolution, because the grammars do not give the parser enough to bind references to definitions.
When a file is in a language Graphify does not understand, the detect step falls back to a path-only node — the file appears in the graph as a leaf with no outgoing edges. That preserves directory structure in the visualisation and report without inventing relationships the parser did not see.
Leiden clustering via graspologic¶
Once the graph is built, Graphify runs the Leiden algorithm — the 2019 successor to Louvain by Traag, Waltman, and van Eck — to partition nodes into communities. The implementation it calls is in graspologic, Microsoft Research's graph statistics library, which is also what Microsoft GraphRAG uses for its hierarchical community detection.
Leiden's specific virtue, and the reason both projects converged on it, is that it guarantees well-connected communities: every output community is internally connected, a guarantee Louvain does not make and whose absence produces visibly worse clusters on real graphs. The algorithm optimises a modularity-style objective (typically the Constant Potts Model or modularity itself, depending on the parameters) and is robust to the resolution-limit pathologies that plague hierarchical clustering on sparse graphs.
For a code graph, what a community means in practice is a cohesive subsystem: a cluster of files and symbols that reference each other more densely than they reference the rest of the codebase. Empirically those map onto things humans would call "the auth subsystem", "the request-rate-limiting layer", "the SQL adapter", "the test fixtures". GRAPH_REPORT.md lists each community with its top files, its top exports, and a one-line gloss derived from the dominant file names. That is the layer of structure an agent can actually use: instead of "give me every file that mentions tokens", it can ask "give me the rate-limit community".
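Taking a community-scoped slice is a simple filter over the serialised graph. The sketch below assumes each node record carries a "community" label, as the text describes; the identifiers and field names are hypothetical:

```python
# Hypothetical node/edge records; the "community" label comes from the
# cluster stage. Schema is assumed for illustration.
nodes = [
    {"id": "auth/token.py::validate_token", "community": 2},
    {"id": "auth/middleware.py::AuthMiddleware", "community": 2},
    {"id": "db/adapter.py::SqlAdapter", "community": 5},
]
edges = [
    {"src": "auth/middleware.py::AuthMiddleware",
     "dst": "auth/token.py::validate_token", "kind": "calls"},
]

def community_slice(nodes, edges, community):
    """All nodes in one community, plus the edges internal to it."""
    ids = {n["id"] for n in nodes if n["community"] == community}
    internal = [e for e in edges if e["src"] in ids and e["dst"] in ids]
    return ids, internal

ids, internal = community_slice(nodes, edges, 2)
```

Feeding the agent that slice instead of the whole graph is what makes "give me the rate-limit community" a cheap, well-scoped retrieval operation.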
The clustering step is parameterised by a resolution that controls cluster granularity. The defaults are tuned for repositories in the small-to-medium range; on very large monorepos the resolution typically needs to be raised to avoid one giant cluster swallowing everything.
Concrete examples¶
Two worked examples appear repeatedly in the project's documentation and are useful as calibration points.
The first is httpx, a mid-sized Python HTTP client library. On a recent build, Graphify produced 144 function-level nodes and 330 call edges, partitioned into 6 communities. The communities map cleanly onto the library's modules: a transport community (sync/async transports), a client community (the Client and AsyncClient surface), a request/response community, a config community (timeouts, limits, proxies), an auth community, and a small utilities cluster. Anyone who has worked in httpx will recognise the breakdown. The point of the example is not the absolute numbers but the shape: a few hundred call edges over a hundred-odd functions is enough to give an agent a navigable map of a real library.
The second is the Karpathy mixed corpus — a deliberately heterogeneous bundle of around 52 files and roughly 92,000 words mixing Python, JavaScript, Markdown, and notebooks. Graphify reports 285 nodes, 340 edges, and 53 communities on that corpus, and the headline claim is a token-budget reduction of about 71.5×: a naive "dump the relevant files into the prompt" approach uses roughly 123,000 tokens per query, while feeding the agent the community-scoped graph slice uses about 1,700. That ratio is author-reported and was measured on a workload deliberately chosen to flatter the technique; the structural intuition behind it — that a graph is a much more compact representation than the source it indexes — is correct, but real reductions on production codebases are likely smaller and depend heavily on how cleanly the codebase clusters.
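The arithmetic behind the headline ratio is just the quotient of the two rounded budgets above (which lands slightly above the reported 71.5×, consistent with the figures being approximate):

```python
# Sanity check of the reported token-budget reduction, using the
# rounded per-query figures from the text. The measurement itself is
# author-reported; this only confirms the ratio is in the stated range.
naive_tokens = 123_000  # "dump the relevant files into the prompt" baseline
graph_tokens = 1_700    # community-scoped graph slice

reduction = naive_tokens / graph_tokens
assert 70 < reduction < 75  # consistent with the reported ~71.5x
```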
Comparison to Microsoft GraphRAG¶
Graphify and Microsoft GraphRAG sit in the same architectural family — extract entities and relationships, run Leiden community detection, expose communities as a retrieval surface — but they are not substitutes. GraphRAG is a document-general system: its entity extractor is an LLM prompted on prose chunks, its communities summarise documents, and its query path uses LLM-generated community reports. Indexing cost scales with the size of the corpus times the number of extraction prompts and is dominated by token spend.
Graphify is code-specific. Its entity extractor is a Tree-sitter parser, its communities cluster code modules, and its query path returns subgraphs and Markdown sections. Indexing cost is parser-bound and effectively free in tokens. The two systems can complement each other inside a single agent: Graphify for the codebase, GraphRAG for the design documents, runbooks, and incident postmortems that surround it.
The deeper difference is what each tool is willing to claim. GraphRAG produces community summaries — LLM-written paragraphs that compress what is in the community. Those summaries can hallucinate. Graphify produces community memberships and edge lists; there is nothing in its output to hallucinate, because every edge has a syntactic provenance.
Comparison to classical code-intel tooling¶
The natural reference points outside the LLM world are ctags, cscope, Language Server Protocol implementations, and Sourcegraph. Ctags and cscope produce symbol indexes; LSP implementations expose go-to-definition and find-references through a live process; Sourcegraph is a code-search product with cross-repo navigation.
Where Graphify wins is in the artifact shape. Its output is a JSON graph and a Markdown report — agent-friendly formats that drop into a prompt or an MCP call without further transformation. It produces communities, which none of the classical tools do, and which give an agent something to ground "which subsystem?" questions on. It records transcripts, which makes agent behaviour auditable. And it runs without any IDE or server: a single uv tool install and the index is on disk.
Where it loses is also the artifact shape. There is no IDE integration: nothing hovers over a symbol and queries Graphify. There is no live update beyond cache-aware re-runs; LSP servers reflect edits within a typing-latency window, Graphify does not. There is no cross-repo federation in the way Sourcegraph offers. And on the dimensions where LSP servers excel — accurate type resolution across complex generic boundaries — a static Tree-sitter walk is materially less precise than a real type-checker.
Limitations¶
Graphify is a single-snapshot index. graphify update is cache-aware and re-parses only changed files, but it does not run as a background daemon and does not watch the filesystem; the agent has to invoke it, or the user has to wire it into a pre-commit hook or CI job. On a fast-moving repository the graph can lag the source by minutes to hours, which matters most for symbols recently introduced or removed.
Language coverage is uneven. The strong languages (Python, TypeScript, Go, Rust, Java) produce dense, high-confidence graphs; the weaker ones produce sparser graphs with more AMBIGUOUS edges; some languages (Verilog, Fortran, niche DSLs) are not covered at all. Coverage tracks the upstream Tree-sitter grammars, so grammar quality is the binding constraint.
Very large monorepos stress the design. NetworkX is in-process and has a working-set cost roughly linear in the number of edges; on multi-million-line repos the build step takes long enough that incremental indexing becomes essential, and the resolution parameter has to be tuned to avoid degenerate clusterings. There is no sharded build path today.
Other caveats worth keeping in mind: the headline 71.5× token-reduction figure is author-reported on a workload designed to make the technique look good and should be treated as an upper bound rather than a typical outcome; dynamic dispatch, reflection, and string-based imports are out of reach of any static parser and show up as missing edges; and the project depends on graspologic and Tree-sitter being healthy upstreams.
Where it fits in an agent stack¶
The natural place for Graphify is as a pre-step. Before opening a Claude Code or Codex session on an unfamiliar repository, run graphify build; the agent then has graph.json, GRAPH_REPORT.md, and a live MCP server it can interrogate at no token cost. For long-running sessions on the same repository, schedule graphify update on a pre-commit or post-merge hook so the graph stays roughly current.
Inside a broader retrieval stack, Graphify pairs with vector RAG and document-graph RAG rather than replacing them. Vector RAG covers prose: comments, docstrings, design docs. A document-graph system like Microsoft GraphRAG, or a compilation-stage layer like Pinecone Nexus, covers cross-document reasoning over the surrounding written corpus. Graphify covers the code itself, with the assurance that every edge in its output is a fact the parser observed and not a similarity score that happened to be high.
That division of labour — semantic retrieval for prose, topological retrieval for code, compiled artifacts for everything else — is the shape the 2026 agent stack is converging on, and Graphify is currently the most legible, MIT-licensed, easy-to-run example of the code-graph slot.
Sources¶
- safishamsi/graphify on GitHub — primary source repository and README.
- graphify.net — project landing page mirroring the README.
- graphifyy on PyPI — package distribution and version history.
- Tree-sitter documentation — incremental GLR parsing library used for the extract stage.
- graspologic on GitHub — Leiden community detection implementation.
- Traag, V. A., Waltman, L., & van Eck, N. J. (2019). From Louvain to Leiden: guaranteeing well-connected communities. arXiv:1810.08473.
- microsoft/graphrag on GitHub — reference for the comparison to document-general graph RAG.
- Universal Ctags project — reference for the comparison to classical code-intel tooling.
- Language Server Protocol specification — reference for live code-intel comparison.
Changelog¶
- 2026-05-11 — Page created from Graphify repo + secondary references (Type B, confidence 82)