# Pharaoh — Complete Documentation

> Pharaoh is an MCP-native codebase intelligence platform. It parses repositories into Neo4j knowledge graphs and exposes structural insights — blast radius, reachability, dead code, dependency analysis, regression risk — through Model Context Protocol (MCP) tools. AI coding agents get 2K tokens of architectural context instead of 40K tokens of blind file exploration.

## The Problem Pharaoh Solves

AI coding tools (Claude Code, Cursor, Windsurf, GitHub Copilot) read codebases file-by-file with no structural understanding. This causes:

- **Duplicate code**: The AI creates a new utility function because it didn't know one already exists three modules away
- **Breaking changes**: The AI refactors a function without knowing 14 downstream callers depend on it
- **Orphaned code**: The AI writes new endpoints or functions that are never wired to any production entry point
- **Blind PRDs**: Planning documents written without awareness of existing architecture, debt, or module boundaries
- **Wasted context**: 40K+ tokens spent reading files to learn what a 2K-token graph query could answer instantly

Pharaoh eliminates these problems by parsing the entire codebase into a knowledge graph and exposing it through MCP tools that any AI coding agent can call.

## How Pharaoh Works

1. **Connect GitHub**: Install the Pharaoh GitHub App on your repositories (read-only access)
2. **Automatic parsing**: Pharaoh's Cartographer uses tree-sitter to parse your codebase — files, functions, imports, exports, call chains, DB access, HTTP endpoints, cron handlers, environment variables
3. **Knowledge graph**: All structural relationships are stored in a Neo4j graph database with tenant isolation
4. **MCP endpoint**: You receive a unique SSE endpoint URL to add to your AI tool's MCP configuration
5. **AI queries automatically**: When you work with Claude, Cursor, or any MCP-compatible client, it can silently query Pharaoh for structural context before writing code

Parsing takes approximately 5 minutes for a typical TypeScript/Python codebase. The graph re-indexes automatically on each push to the default branch via a GitHub webhook.

## Languages Supported {#languages}

- TypeScript (full support — imports, exports, call chains, decorators, barrel files)
- Python (full support — imports, exports, call chains, decorators)
- More languages planned (tree-sitter supports 100+ languages; Pharaoh's Cartographer adds semantic understanding layer by layer)

## Infrastructure {#infrastructure}

- **Parsing**: tree-sitter (deterministic, no LLM hallucination risk)
- **Graph storage**: Neo4j Aura Professional (tenant-isolated databases)
- **Protocol**: Model Context Protocol (MCP) via SSE transport
- **Hosting**: Render (web service)
- **GitHub integration**: GitHub App with read-only repository access + push webhook for auto-refresh

## Getting Started

### Step 1: Sign up

Visit https://pharaoh.so and request beta access.

### Step 2: Connect GitHub

Install the Pharaoh GitHub App and select which repositories to map. Pharaoh requires only read access to repository contents; it never requests write access.

### Step 3: Add MCP endpoint

Add your unique Pharaoh endpoint to your AI tool's MCP configuration.

**Claude Desktop** (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "pharaoh": {
      "url": "https://pharaoh-mcp.onrender.com/sse?token=YOUR_TOKEN"
    }
  }
}
```

**Claude Code** (`.claude/settings.json`):

```json
{
  "mcpServers": {
    "pharaoh": {
      "url": "https://pharaoh-mcp.onrender.com/sse?token=YOUR_TOKEN"
    }
  }
}
```

**Cursor**: Add via Settings > MCP Servers with the same URL.

### Step 4: Start building

Your AI tool now silently queries Pharaoh when it needs structural context. Ask it to refactor a module, write a PRD, or find dead code — it will check the knowledge graph first.
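The JSON entry is identical for every MCP-compatible client, so it can also be merged into an existing config file programmatically. A minimal sketch (the helper name, config path, and `YOUR_TOKEN` are placeholders, not Pharaoh-provided tooling):

```python
import json
from pathlib import Path

PHARAOH_URL = "https://pharaoh-mcp.onrender.com/sse?token=YOUR_TOKEN"

def add_pharaoh_server(config_path: str) -> dict:
    """Merge the Pharaoh MCP server entry into an existing MCP config file."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    # Preserve any servers already configured under "mcpServers".
    config.setdefault("mcpServers", {})["pharaoh"] = {"url": PHARAOH_URL}
    path.write_text(json.dumps(config, indent=2))
    return config
```

Pointing `config_path` at `claude_desktop_config.json` or `.claude/settings.json` adds the entry without disturbing servers that are already configured.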
---

## MCP Tool Reference

Pharaoh exposes 13 analysis tools and 3 operational tools via MCP. Each tool is designed for a specific moment in the AI coding workflow.

### get_codebase_map {#codebase-map}

**Orient yourself in an unfamiliar codebase. Call this FIRST when starting work on a repo.**

Returns all modules with file counts and lines of code, the dependency graph with weights and bidirectional warnings, hot files (most changed in the last 90 days), and all HTTP endpoints with their handler files.

Use when:

- Starting a new task and need to understand codebase structure
- Need to know which modules exist and how they relate
- Want to find the most actively changed files (likely where bugs live)
- Need to see all API endpoints at a glance

Parameters:

- `repo` (required): Repository name
- `include_metrics` (optional): Include LOC, complexity, change frequency

Why not just grep/read files: Manually reading directory trees gives file structure but not dependency relationships, change frequency, or endpoint mappings. This tool gives the full architectural picture in one call instead of 20+ file reads.

### get_module_context {#module-context}

**Get everything about a module BEFORE modifying it or writing a PRD.**

Returns the complete module profile in ~2K tokens: file count, LOC, all exported function signatures with complexity, dependency graph (imports from + imported by), DB table access, HTTP endpoints, cron jobs, env vars, vision spec alignment, and external callers from other modules.

Use when:

- About to change code in a module
- Writing a PRD or design doc and need ground truth about what exists
- Need to know what depends on this module (who breaks if you change it)
- Want to see a module's DB tables, endpoints, cron jobs, or env vars at a glance

Parameters:

- `repo` (required): Repository name
- `module` (required): Module name (e.g., "crons", "slack", "db")

Why not just grep/read files: A module can span dozens of files. Manual exploration burns 10K-40K tokens and still misses cross-module callers, DB access patterns, and vision spec alignment.

### search_functions {#function-search}

**Check if functionality already exists BEFORE writing any new function.**

Searches all functions across the entire codebase by name or partial match. Returns matching functions with file paths, line numbers, module, export status, async flag, complexity scores, and full signatures.

Use when:

- About to create a new function — search first to prevent duplicates
- Need to find where a concept is implemented (e.g., "notify", "validate", "parse")
- Looking for the right function to import instead of reimplementing
- A task says "add X functionality" — verify X doesn't already exist

Parameters:

- `repo` (required): Repository name
- `query` (required): Function name or partial match
- `module` (optional): Filter to a specific module
- `exported_only` (optional): Only show exported functions
- `limit` (optional): Max results (default: 20, max: 50)

Why not just grep/read files: Grep only finds exact string matches and misses re-exports, aliases, and barrel-file indirection. This tool searches the fully resolved dependency graph.

### get_blast_radius {#blast-radius}

**Check what breaks BEFORE refactoring, renaming, or deleting a function, file, or module.**

Returns a risk assessment (LOW/MEDIUM/HIGH), all affected callers grouped by module with file paths, impacted HTTP endpoints, impacted cron jobs, and affected DB operations. Traces up to 5 hops deep through the call graph.
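Conceptually, the hop-limited trace is a breadth-first walk over reverse call edges. A minimal sketch of that idea, using a hypothetical in-memory call graph rather than Pharaoh's Neo4j backend:

```python
from collections import deque

def blast_radius(callers: dict[str, set[str]], entity: str, depth: int = 3) -> set[str]:
    """Collect every transitive caller of `entity` within `depth` hops.

    `callers` maps a function name to the set of functions that call it
    (reverse call edges), mirroring what a graph query would traverse.
    """
    affected: set[str] = set()
    queue = deque([(entity, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == depth:
            continue  # respect the hop limit (default 3, max 5)
        for caller in callers.get(node, set()):
            if caller not in affected:
                affected.add(caller)
                queue.append((caller, hops + 1))
    return affected
```

A grep for `entity` would surface only `callers[entity]`; the queue keeps walking until the hop budget is spent, which is how indirect callers two or three hops away show up.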
Use when:

- About to refactor or rename a function — see every caller that needs updating
- Want to know if a change is safe — check if anything depends on this code
- A PR modifies a shared utility — trace all downstream consumers
- Need to assess risk level before a change

Parameters:

- `repo` (required): Repository name
- `entity` (required): Function name, file path, or module name
- `entity_type` (required): "function", "file", or "module"
- `depth` (optional): How many hops to trace (default: 3, max: 5)

Why not just grep/read files: Grep finds direct callers but misses indirect callers 2-3 hops away. You won't see affected endpoints or cron jobs. This tool traces the full transitive dependency chain.

### query_dependencies {#dependencies}

**Trace how two modules are connected BEFORE splitting, merging, or decoupling them.**

Returns forward and reverse dependency paths between two modules, circular dependency detection with warnings, and all shared dependencies (modules both depend on).

Use when:

- Refactoring and need to know if two modules depend on each other
- Suspect a circular dependency and want to confirm it
- Planning to extract shared code and need to see what both modules use
- Need to understand why changing module A affects module B

Parameters:

- `repo` (required): Repository name
- `from` (required): Source module name
- `to` (required): Target module name

Why not just grep/read files: Import statements show direct dependencies but miss transitive paths (A→B→C→D). This tool traces the full module graph and reveals indirect connections and circular dependencies invisible from file-level inspection.

### check_reachability {#reachability}

**Verify functions are reachable from production entry points (API endpoints, CLI commands, cron jobs, event handlers, MCP tools).**

Returns whether each exported function is reachable from a production entry point, the path from entry point to function, and a classification (entry_point / reachable / unreachable).
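The three-way classification reduces to a forward reachability walk from the entry-point set. A hypothetical sketch over an in-memory call graph (not Pharaoh's actual implementation):

```python
def classify_reachability(
    calls: dict[str, set[str]], entry_points: set[str], functions: set[str]
) -> dict[str, str]:
    """Label each function as entry_point / reachable / unreachable.

    `calls` maps a function to the functions it calls (forward edges).
    """
    seen = set(entry_points)
    stack = list(entry_points)
    while stack:  # depth-first walk from every production entry point
        for callee in calls.get(stack.pop(), set()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return {
        fn: "entry_point" if fn in entry_points
        else "reachable" if fn in seen
        else "unreachable"
        for fn in functions
    }
```

Anything labeled `unreachable` here is exactly the code that compiles, passes review, and still never runs in production.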
Use when:

- After implementing a feature — verify new code is wired into the app
- Reviewing a PR — are all new functions actually reachable?
- Cleaning up dead code — find functions only called by tests
- Before opening a PR — run as a pre-flight check

Parameters:

- `repo` (required): Repository name
- `module` (optional): Filter to a specific module
- `functions` (optional): Specific function names to check
- `include_paths` (optional): Include full reachability paths

### get_vision_docs {#vision}

**Get the documented intent — CLAUDE.md files, PRDs, roadmaps — to understand WHY code exists.**

Returns all vision documents grouped by type (claude_md, prd, skill, roadmap), with each spec's title, section ID, and an implementation status showing which functions implement each spec.

Use when:

- Implementing a feature and need to check if a PRD or spec exists
- Want to understand the original design intent behind existing code
- Need to verify implementation matches documented requirements
- Reviewing code and want to check it against the documented vision

Parameters:

- `repo` (required): Repository name
- `module` (optional): Filter to specs related to this module
- `doc_type` (optional): Filter by type (claude_md, prd, skill, roadmap, all)

### get_vision_gaps {#vision-gaps}

**Find what's missing — specs without code AND complex code without specs.**

Returns two lists: (1) specified-but-not-built: PRD specs with no implementing functions, and (2) built-but-not-specified: complex functions above a threshold with no vision spec.

Use when:

- Planning work and need to find unimplemented features from PRDs
- Want to find complex undocumented functions that need specs or tests
- Need to audit spec-to-code alignment for a module
- Someone asks "what's left to build?" or "what's undocumented?"
Parameters:

- `repo` (required): Repository name
- `module` (optional): Filter to a specific module
- `complexity_threshold` (optional): Min complexity for the undocumented-function flag (default: 5)

### get_cross_repo_audit {#cross-repo}

**Compare two repositories for code duplication, structural overlap, and shared patterns.**

Returns three tiers of function matches: HIGH (exact duplicates), MEDIUM (diverged implementations), LOW (name-only matches). Also reports shared module structure and shared environment variables.

Use when:

- Need to find copy-pasted code across two repos
- Planning a shared package extraction
- Want to compare the structure of two codebases
- Auditing cross-repo duplication before a refactor

Parameters:

- `repo_a` (required): First repository name
- `repo_b` (required): Second repository name
- `exported_only` (optional): Only compare exported functions (default: true)
- `min_loc` (optional): Minimum function LOC (default: 3)

### get_consolidation_opportunities {#consolidation}

**Find code that does the same work in different places — parallel consumers, duplicated call chains, competing DB access, signature twins.**

Returns structural clusters grouped by type (parallel consumers, fan-in duplication, competing DB access, signature twins, convergent imports). Each cluster includes file paths, line numbers, and context for evaluating whether merging makes sense.
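One of those cluster types, signature twins, can be illustrated with a toy grouping pass. The record shapes below are hypothetical; Pharaoh's real clustering runs over the knowledge graph:

```python
from collections import defaultdict

def signature_twins(functions: list[dict]) -> list[list[str]]:
    """Group functions from different modules that share the same signature shape."""
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for fn in functions:
        # Key on (parameter types, return type) so function names don't matter.
        groups[(tuple(fn["param_types"]), fn["returns"])].append(fn)
    return [
        [f["name"] for f in fns]
        for fns in groups.values()
        # A cluster needs 2+ functions spanning more than one module.
        if len(fns) > 1 and len({f["module"] for f in fns}) > 1
    ]
```

Two functions with identical signatures in different modules are not proof of duplication, which is why each cluster ships with file paths and context for a human (or agent) to evaluate.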
Use when:

- Looking for code to consolidate or deduplicate
- Before building something new — check if similar logic already exists
- During refactoring planning — find the highest-impact merge opportunities
- When the codebase feels bloated but you can't pinpoint where

Parameters:

- `repo` (required): Repository name
- `module` (optional): Focus on opportunities involving this module
- `min_shared` (optional): Minimum shared dependencies to flag (default: 3)
- `min_loc` (optional): Minimum function LOC to include (default: 5)
- `include_low_confidence` (optional): Include lower-confidence matches
- `include_same_module` (optional): Include intra-module duplication

### get_unused_code {#dead-code}

**Find dead code — functions not reachable from any production entry point.**

Uses graph reachability plus a text-reference backup layer for high-confidence dead code detection. Returns three tiers:

- **Dead**: Graph-unreachable AND no text references anywhere. Safe to delete.
- **Likely Dead**: Graph-unreachable BUT found as text in other files (may be string-dispatched or dynamically imported). Includes evidence file paths.
- **Alive**: Graph-reachable from entry points. Not reported.

Use when:

- Looking for code safe to delete
- Cleaning up after a refactor
- Reducing codebase surface area
- Auditing unused exports

Parameters:

- `repo` (required): Repository name
- `module` (optional): Filter to a specific module
- `reachability_analysis` (optional): Deep analysis against all entry points
- `include_exported` (optional): Include exported functions (default: true)

### get_test_coverage {#test-coverage}

**See which modules and files have test coverage and which don't.**

Returns a per-module test coverage summary — which files have corresponding test files, and which high-complexity functions lack tests.
Use when:

- Before writing tests — find what's already covered
- During code review — check if changed modules have tests
- Planning test strategy — identify untested high-complexity code

Parameters:

- `repo` (required): Repository name
- `module` (optional): Filter to a specific module

### get_regression_risk {#regression-risk}

**Score functions by regression risk — how likely a change is to break production.**

Returns functions ranked by regression risk score (0-1), with tier (critical/high/medium/low), complexity, entry-point exposure, file churn, and downstream caller count.

Use when:

- Before modifying a function — understand blast radius and risk
- During code review — prioritize review effort on the highest-risk changes
- Planning refactors — identify the riskiest code so it can be changed carefully
- After a regression — find other high-risk functions that need attention

Parameters:

- `repo` (required): Repository name
- `module` (optional): Filter to a specific module

---

## Operational Tools

### pharaoh_account

Manage your subscription, toggle PR Guard, and trigger graph refreshes for your repos.

### pharaoh_feedback

Report false positives in dead code detection or provide feedback on tool results. Feedback directly improves result quality.

### pharaoh_admin

Administrative operations for org management: manage repos, view org status, and perform admin tasks.
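Pharaoh's exact regression-risk weighting is internal to the platform. The sketch below is a hypothetical illustration of how the four factors named above (complexity, entry-point exposure, churn, downstream callers) could combine into a 0-1 score with tiers; the weights and cutoffs are invented for the example:

```python
def regression_risk(complexity: int, entry_exposed: bool, churn: int, callers: int) -> tuple[float, str]:
    """Blend four risk factors into a 0-1 score plus a tier label.

    Each factor is squashed into [0, 1] before weighting. The weights
    and tier cutoffs are illustrative, not Pharaoh's actual model.
    """
    score = (
        0.3 * min(complexity / 20, 1.0)          # cyclomatic complexity
        + 0.2 * (1.0 if entry_exposed else 0.0)  # reachable from a production entry point
        + 0.2 * min(churn / 10, 1.0)             # recent commits touching the file
        + 0.3 * min(callers / 15, 1.0)           # downstream caller count
    )
    for tier, floor in (("critical", 0.75), ("high", 0.5), ("medium", 0.25)):
        if score >= floor:
            return round(score, 2), tier
    return round(score, 2), "low"
```

The design point is that no single factor dominates: a complex function nobody calls and a simple function with thirty callers on a hot endpoint can land in the same tier for different reasons.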
---

## Competitive Positioning

| Tool | What it does | Pharaoh's difference |
|------|--------------|----------------------|
| Sourcegraph | Code search — find code across repos | Pharaoh tells you what *breaks* if you change what you found |
| CodeScene | Code health — file-level quality scores | Pharaoh analyzes cross-module *architectural relationships* |
| SonarQube | Static analysis — line-level bugs and smells | Pharaoh provides *system-level structural intelligence* |
| Snyk | Security scanning — vulnerabilities and dependencies | Pharaoh maps your *own code's* internal structure, not the supply chain |
| GitHub Copilot | Code completion — generates code | Pharaoh gives Copilot (and any AI tool) the *context to generate better code* |

**Unique to Pharaoh** (no other MCP server provides these):

- Graph-based blast radius analysis with transitive caller tracing
- Production reachability verification via entry-point tracing
- Dead code detection combining graph analysis with a text-reference backup
- Cross-repo structural comparison
- Regression risk scoring combining complexity, exposure, churn, and caller count
- Vision-to-implementation alignment checking (PRDs/CLAUDE.md vs actual code)

## Pricing

See https://pharaoh.so/#pricing for current pricing.

## Links

- Website: https://pharaoh.so
- GitHub: https://github.com/0xUXDesign/pharaoh
- MCP Server: SSE transport at https://pharaoh-mcp.onrender.com/sse