# HR LLM Wiki - Your Second Brain

A personal knowledge base powered by a local LLM (Ollama). Inspired by Andrej Karpathy's philosophy: simple systems, raw text, let the LLM do the work.

No vector databases. No embeddings. No cloud. No numpy.
## Folder Structure

```
hr-llm-wiki-ak/
├── data/                 ← Your .md files live here (auto-created on ingest)
├── logs/                 ← Rotating log files (auto-created)
├── cache/                ← File read cache (auto-managed)
│
├── config.py             ← Central config: change model, ports, limits here
├── logger.py             ← Console + rotating file logger shared by all modules
│
├── ingest.py             ← Convert PDF / DOCX / TXT → Markdown
├── file_loader.py        ← Load + mtime-cache .md files
├── chunker.py            ← Split large docs by heading + fixed-size overlap chunks
├── search.py             ← BM25 retrieval (no embeddings)
├── context_builder.py    ← Trim + assemble LLM context within token budget
├── llm.py                ← Ollama (llama3 / mistral / phi3) interface
├── prompt_templates.py   ← Wikipedia JSON, search, and comparison prompts
│
├── api.py                ← FastAPI: /ask /search /compare /reindex /health
├── ui.py                 ← Streamlit: Ask · Search · Compare tabs
├── cli.py                ← Full CLI: ask / search / ingest / list / stats
│
├── requirements.txt
└── README.md
```
## Setup

### 1. Install Python dependencies

```bash
pip install -r requirements.txt
```

### 2. Install & start Ollama

```bash
# Install from https://ollama.com
ollama pull llama3.2:1b   # or: ollama pull mistral / ollama pull phi3
ollama serve              # starts on http://localhost:11434
```

### 3. Ingest your documents

```bash
python ingest.py path/to/your/file.pdf
# Supported: .pdf .docx .txt .md
```
## Run

### API + UI (full stack)

```bash
# Terminal 1
python api.py         # FastAPI at http://localhost:8000

# Terminal 2
streamlit run ui.py   # UI at http://localhost:8501
```
## API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Ollama status + file count |
| GET | `/files` | List indexed .md files |
| POST | `/reindex` | Reload all files from `/data` |
| GET | `/search?q=...` | BM25 keyword search (no LLM) |
| GET | `/ask?q=...` | Full wiki answer via LLM |
| GET | `/compare?q=...` | Cross-document comparison via LLM |

### Examples

```bash
curl "http://localhost:8000/ask?q=What+is+the+sick+leave+policy"
curl "http://localhost:8000/compare?q=How+do+the+cyber+and+conduct+policies+differ+on+email"
curl -X POST "http://localhost:8000/reindex"
```
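The same endpoints can be called from Python. A minimal sketch, assuming the default host and port shown in the curl examples, that builds the URL-encoded query with the standard library (the `ask_url` helper is illustrative, not part of the project):

```python
from urllib.parse import quote_plus

BASE = "http://localhost:8000"  # default API_PORT

def ask_url(question: str) -> str:
    """Build an /ask URL with the question URL-encoded (spaces become '+')."""
    return f"{BASE}/ask?q={quote_plus(question)}"

url = ask_url("What is the sick leave policy")
# Fetch it with urllib.request.urlopen(url) or requests.get(url)
# once api.py is running.
```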
## Architecture

```
[PDF / DOCX / TXT]
        │
        ▼  ingest.py
[.md files in /data]
        │
        ▼  file_loader.py (mtime cache)
[doc dicts: filename, path, content]
        │
        ▼  chunker.py (split by heading → fixed-size overlap chunks)
[chunk dicts: heading, content, search_text]
        │
        ▼  search.py (BM25 scoring + filename/heading bonuses)
[top-k docs with matched sections]
        │
        ▼  context_builder.py (trim to token budget, label sources)
[context string ≤ 14,000 chars]
        │
        ▼  llm.py → Ollama
[structured JSON: title, summary, sections]
        │
        ▼  api.py / cli.py / ui.py
[Wikipedia-style output]
```
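The chunking step in the pipeline above can be sketched roughly as follows. This is a simplified illustration, not the actual `chunker.py`; the function names are made up, and the sizes are the `CHUNK_SIZE` / `CHUNK_OVERLAP` defaults from the configuration table:

```python
import re

CHUNK_SIZE = 1200    # chars per chunk (config.py default)
CHUNK_OVERLAP = 150  # chars shared between consecutive chunks

def split_by_heading(markdown: str) -> list[str]:
    """Split a Markdown document into sections at each heading line."""
    sections, current = [], []
    for line in markdown.splitlines():
        if re.match(r"^#{1,6}\s", line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return sections

def fixed_size_chunks(text: str) -> list[str]:
    """Slice an oversized section into CHUNK_SIZE windows with overlap."""
    if len(text) <= CHUNK_SIZE:
        return [text]
    step = CHUNK_SIZE - CHUNK_OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]

doc = "# Leave\nSick leave text...\n# Conduct\nEmail rules..."
chunks = [c for sec in split_by_heading(doc) for c in fixed_size_chunks(sec)]
```

The overlap means a sentence cut at a chunk boundary still appears whole in the next chunk, so BM25 can match it.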
## Configuration (config.py)

| Setting | Default | Description |
|---|---|---|
| `DEFAULT_MODEL` | `llama3.2:1b` | Ollama model to use |
| `MAX_CONTEXT_CHARS` | `14000` | ~3,500 tokens |
| `DEFAULT_TOP_K` | `3` | Docs retrieved per query |
| `CHUNK_SIZE` | `1200` | Chars per chunk |
| `CHUNK_OVERLAP` | `150` | Overlap between chunks |
| `BM25_K1` | `1.5` | Term-frequency saturation |
| `BM25_B` | `0.75` | Length normalization |
| `API_PORT` | `8000` | FastAPI port |
| `LOG_LEVEL` | `INFO` | DEBUG / INFO / WARNING |
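A `config.py` along these lines would back the table above. This is a sketch of the defaults listed there; the real file may contain additional settings:

```python
# config.py: central knobs for the wiki (defaults from the table above)
DEFAULT_MODEL = "llama3.2:1b"  # Ollama model tag
MAX_CONTEXT_CHARS = 14000      # ~3,500 tokens at roughly 4 chars/token
DEFAULT_TOP_K = 3              # documents retrieved per query
CHUNK_SIZE = 1200              # chars per chunk
CHUNK_OVERLAP = 150            # chars shared between consecutive chunks
BM25_K1 = 1.5                  # term-frequency saturation
BM25_B = 0.75                  # document-length normalization
API_PORT = 8000                # FastAPI port
LOG_LEVEL = "INFO"             # DEBUG / INFO / WARNING
```

Because every module imports from one place, changing the model or port is a one-line edit.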
## Design Decisions

| Problem | Solution | Why |
|---|---|---|
| Retrieval | BM25 in `search.py` | ~60 lines, zero extra deps, strong on the exact-keyword matches where vector RAG often falls short |
| Large docs | Heading + overlap chunker | Only relevant paragraphs reach the LLM |
| Context management | Hard char trim in `context_builder.py` | Never overflows the LLM context window |
| Caching | mtime dict in `file_loader.py` | Re-reads only changed files |
| LLM output | Strict JSON prompt | Structured, parseable, Wikipedia-style |
| Multi-doc queries | `/compare` endpoint + `compare_prompt` | Same retrieval, different prompt |
| Logging | Rotating file + console in `logger.py` | Audit trail for every query |
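The BM25 choice can be illustrated with the classic Okapi scoring formula. A simplified single-term sketch, not the project's `search.py` (which also adds filename/heading bonuses); `BM25_K1` and `BM25_B` match the config defaults:

```python
import math

BM25_K1, BM25_B = 1.5, 0.75

def bm25_term_score(tf: int, doc_len: int, avg_doc_len: float,
                    n_docs: int, docs_with_term: int) -> float:
    """Score one term in one doc: IDF x saturated, length-normalized TF."""
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    norm = 1 - BM25_B + BM25_B * doc_len / avg_doc_len
    return idf * tf * (BM25_K1 + 1) / (tf + BM25_K1 * norm)

# Same term frequency, but the rarer term carries more weight:
common = bm25_term_score(tf=2, doc_len=100, avg_doc_len=100,
                         n_docs=10, docs_with_term=8)
rare = bm25_term_score(tf=2, doc_len=100, avg_doc_len=100,
                       n_docs=10, docs_with_term=1)
```

`K1` caps how much repeated occurrences of a term help; `B` controls how strongly long documents are penalized. Summing this score over the query's terms ranks the docs, with no embeddings involved.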