How Pharaoh Works

Pharaoh turns codebases into queryable knowledge graphs. Three stages: parse, store, query.

1. Parse

When you install the Pharaoh GitHub App, it clones your repository using read-only installation tokens and parses every file with tree-sitter — the same parser used by GitHub's syntax highlighting, Neovim, and Helix.

Tree-sitter extracts structural metadata:

  • Functions: name, signature, parameters, return type, JSDoc, complexity score, line range

  • Imports and exports: what each file imports and what it exports

  • Call chains: which functions call which other functions

  • Endpoints: HTTP routes, their methods, and handler files

  • Cron jobs: scheduled tasks and their handlers

  • Environment variables: which env vars each function uses

Currently supported: TypeScript and Python. The parser is open source — you can audit exactly what gets extracted.

Source code is read during parsing and immediately discarded. Only structural metadata is stored.
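
The extract-then-discard flow can be sketched as follows. Pharaoh uses tree-sitter; to keep this illustration self-contained, Python's stdlib ast module stands in for the parser, and all names are hypothetical:

```python
import ast

def extract_metadata(source: str, path: str) -> dict:
    """Parse source, keep only structural metadata, discard the code.

    Hypothetical sketch: Python's stdlib ast stands in for
    tree-sitter so the example runs without extra dependencies.
    """
    tree = ast.parse(source)
    functions = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            functions.append({
                "name": node.name,
                "params": [a.arg for a in node.args.args],
                "line_range": (node.lineno, node.end_lineno),
                "docstring": ast.get_docstring(node),
            })
    imports = [
        alias.name
        for node in ast.walk(tree)
        if isinstance(node, ast.Import)
        for alias in node.names
    ]
    # Only metadata leaves this function; `source` is never stored.
    return {"file": path, "functions": functions, "imports": imports}

src = '''
import os

def load_config(path):
    """Read config from disk."""
    return os.environ.get(path)
'''
meta = extract_metadata(src, "app/config.py")
```

The returned dict holds names, parameters, and line ranges; the function body itself goes out of scope and is never written anywhere.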

2. Store

The extracted metadata is written to a Neo4j knowledge graph as nodes and relationships:

  • Nodes: Repo, Module, File, Function, Endpoint, CronHandler, EnvVar

  • Relationships: CONTAINS, IMPORTS, EXPORTS, CALLS, DEPENDS_ON, USES_ENV
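
A minimal sketch of how that metadata might become graph writes, assuming a batch of Cypher MERGE statements. The node labels and the CONTAINS relationship come from the lists above; the statement shapes and parameter handling are illustrative, not Pharaoh's actual write path:

```python
def to_cypher(meta: dict) -> list[str]:
    """Turn one file's metadata into illustrative Cypher statements.

    MERGE creates the node or relationship only if it does not
    already exist, so re-parsing the same file is idempotent.
    """
    stmts = [f"MERGE (f:File {{path: '{meta['file']}'}})"]
    for fn in meta["functions"]:
        stmts.append(f"MERGE (fn:Function {{name: '{fn['name']}'}})")
        stmts.append(
            f"MATCH (f:File {{path: '{meta['file']}'}}), "
            f"(fn:Function {{name: '{fn['name']}'}}) "
            "MERGE (f)-[:CONTAINS]->(fn)"
        )
    return stmts

meta = {"file": "app/config.py",
        "functions": [{"name": "load_config"}]}
statements = to_cypher(meta)
```

In production these would be parameterized queries sent through a Neo4j driver rather than interpolated strings.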

This graph captures the architecture — every connection, dependency, and entry point — without storing any source code.

What's in the graph

Function names, file paths, module boundaries, dependency edges, complexity scores, export signatures, endpoint routes, cron schedules, env var references, function body hashes (for duplication detection).
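
The body hashes mentioned above are what let duplication detection coexist with not storing source: a one-way digest of the function body is enough to find exact duplicates but cannot be reversed into code. A sketch — the whitespace normalization here is an assumed detail, not Pharaoh's documented behavior:

```python
import hashlib

def body_hash(body: str) -> str:
    """One-way digest of a function body for duplicate detection.

    Collapsing whitespace (an assumption) makes the hash robust to
    formatting differences; SHA-256 cannot be reversed, so storing
    the digest keeps the no-source-in-graph guarantee.
    """
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode()).hexdigest()

a = body_hash("return x + y")
b = body_hash("return  x +\n    y")  # same logic, different layout
```

Two functions with the same normalized body hash to the same value, so duplicates surface as Function nodes sharing a hash.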

What's NOT in the graph

Source code. Function bodies. Variable values. Comments (except JSDoc). String literals. Business logic. The graph is a table of contents, not the book.

3. Query

Your AI tool connects to Pharaoh via MCP (Model Context Protocol) — the same protocol Claude Code, Cursor, and other AI tools use for external tool integration.

When your AI agent needs architectural context — before writing code, before refactoring, before reviewing a PR — it calls Pharaoh's MCP tools automatically. The tools query the Neo4j graph and return structured results in ~2K tokens instead of the 40K+ tokens it would take to read files manually.
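
To illustrate why the structured result is so much smaller, compare a hypothetical tool response against shipping raw source. The payloads are invented, and the ~4 characters-per-token ratio is a common rule of thumb, not an exact tokenizer:

```python
import json

def rough_tokens(text: str) -> int:
    # Rule-of-thumb estimate: roughly 4 characters per token.
    return len(text) // 4

# Hypothetical structured answer to "what calls load_config?"
tool_result = {
    "function": "load_config",
    "file": "app/config.py",
    "callers": ["app/server.py:start", "app/worker.py:init"],
}
structured = json.dumps(tool_result)

# Reading the involved files would mean sending full source instead.
raw_source = "x = 1\n" * 5000  # stand-in for ~30KB of code

savings = rough_tokens(raw_source) - rough_tokens(structured)
```

The agent gets the answer (the caller list) without ever pulling the files into context.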

The graph is queried live on every tool call. No caching across requests. The graph updates automatically via webhooks on every push to your default branch.
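
The webhook-driven refresh can be sketched as a handler that triggers a re-parse only for pushes to the default branch. The `ref` field follows GitHub's push-event payload shape; the re-parse trigger itself is hypothetical:

```python
def handle_push(event: dict, default_branch: str = "main") -> bool:
    """Re-parse only on pushes to the default branch.

    GitHub push payloads carry the branch as a ref like
    'refs/heads/main'; pushes to other refs (feature branches,
    tags) are ignored so the graph tracks one branch.
    """
    if event.get("ref") != f"refs/heads/{default_branch}":
        return False
    # Hypothetical: kick off the parse -> store pipeline here.
    return True
```

Because the pipeline re-runs on every qualifying push, the graph stays current without any manual re-indexing.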

The pipeline

No config files. No per-repo setup. No maintenance. Install the GitHub App and the pipeline runs automatically.
