# HR LLM Wiki - Your Second Brain

A personal knowledge base powered by a local LLM (Ollama). Inspired by Andrej Karpathy's philosophy: simple systems, raw text, let the LLM do the work.

No vector databases. No embeddings. No cloud. No numpy.


๐Ÿ“ Folder Structure

```
hr-llm-wiki-ak/
├── data/                    ← Your .md files live here (auto-created on ingest)
├── logs/                    ← Rotating log files (auto-created)
├── cache/                   ← File read cache (auto-managed)
│
├── config.py                ← ⭐ Central config — change model, ports, limits here
├── logger.py                ← Console + rotating file logger shared by all modules
│
├── ingest.py                ← Convert PDF / DOCX / TXT → Markdown
├── file_loader.py           ← Load + mtime-cache .md files
├── chunker.py               ← Split large docs by heading + fixed-size overlap chunks
├── search.py                ← BM25 retrieval (no embeddings)
├── context_builder.py       ← Trim + assemble LLM context within token budget
├── llm.py                   ← Ollama (llama3 / mistral / phi3) interface
├── prompt_templates.py      ← Wikipedia JSON, search, and comparison prompts
│
├── api.py                   ← FastAPI: /ask  /search  /compare  /reindex  /health
├── ui.py                    ← Streamlit: Ask · Search · Compare tabs
├── cli.py                   ← ⭐ Full CLI: ask / search / ingest / list / stats
│
├── requirements.txt
└── README.md
```

## ⚙️ Setup

### 1. Install Python dependencies

```bash
pip install -r requirements.txt
```

### 2. Install & start Ollama

```bash
# Install from https://ollama.com
ollama pull llama3.2:1b   # or: ollama pull mistral / ollama pull phi3
ollama serve              # starts on http://localhost:11434
```

### 3. Ingest your documents

```bash
python ingest.py path/to/your/file.pdf

# Supported: .pdf  .docx  .txt  .md
```

## 🚀 Run

### API + UI (full stack)

```bash
# Terminal 1
python api.py          # FastAPI at http://localhost:8000

# Terminal 2
streamlit run ui.py    # UI at http://localhost:8501
```

## 🔌 API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/health` | Ollama status + file count |
| GET | `/files` | List indexed `.md` files |
| POST | `/reindex` | Reload all files from `/data` |
| GET | `/search?q=...` | BM25 keyword search (no LLM) |
| GET | `/ask?q=...` | Full wiki answer via LLM |
| GET | `/compare?q=...` | Cross-document comparison via LLM |

```bash
# Examples
curl "http://localhost:8000/ask?q=What+is+the+sick+leave+policy"
curl "http://localhost:8000/compare?q=How+do+the+cyber+and+conduct+policies+differ+on+email"
curl -X POST "http://localhost:8000/reindex"
```
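For programmatic access, the same endpoints can be called from Python with only the standard library. A minimal client sketch (assumes the default `API_PORT` of 8000; the function names here are illustrative, not part of the project):

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # API_PORT default from config.py

def build_url(endpoint: str, query: str) -> str:
    """Build a GET URL for /ask, /search, or /compare."""
    return f"{BASE_URL}{endpoint}?{urllib.parse.urlencode({'q': query})}"

def ask(query: str) -> dict:
    """Call /ask and decode the JSON answer."""
    with urllib.request.urlopen(build_url("/ask", query)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`ask("What is the sick leave policy")` then mirrors the first `curl` example above.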

## 🧠 Architecture

```
[PDF / DOCX / TXT]
       │
       ▼ ingest.py
  [.md files in /data]
       │
       ▼ file_loader.py  (mtime cache)
  [doc dicts: filename, path, content]
       │
       ▼ chunker.py  (split by heading → fixed-size overlap chunks)
  [chunk dicts: heading, content, search_text]
       │
       ▼ search.py  (BM25 scoring + filename/heading bonuses)
  [top-k docs with matched sections]
       │
       ▼ context_builder.py  (trim to token budget, label sources)
  [context string ≤ 14,000 chars]
       │
       ▼ llm.py → Ollama
  [structured JSON: title, summary, sections]
       │
       ▼ api.py / cli.py / ui.py
  [Wikipedia-style output]
```
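The retrieval step is plain Okapi BM25. As a rough sketch of the scoring (using the `BM25_K1` / `BM25_B` defaults from `config.py`; the filename/heading bonuses that `search.py` adds on top are omitted, so this is an illustration, not the project's actual code):

```python
import math
from collections import Counter

BM25_K1, BM25_B = 1.5, 0.75  # defaults from config.py

def bm25_scores(query_tokens, docs):
    """Score every doc (a list of token lists) against the query tokens."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average doc length
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)  # term frequency within this doc
        score = 0.0
        for term in query_tokens:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            num = tf[term] * (BM25_K1 + 1)
            den = tf[term] + BM25_K1 * (1 - BM25_B + BM25_B * len(d) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores
```

Ranking the top-k chunks is then just a sort by score before the context builder takes over.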

## 🔧 Configuration (config.py)

| Setting | Default | Description |
|---------|---------|-------------|
| `DEFAULT_MODEL` | `llama3.2:1b` | Ollama model to use |
| `MAX_CONTEXT_CHARS` | `14000` | ~3,500 tokens |
| `DEFAULT_TOP_K` | `3` | Docs retrieved per query |
| `CHUNK_SIZE` | `1200` | Chars per chunk |
| `CHUNK_OVERLAP` | `150` | Overlap between chunks |
| `BM25_K1` | `1.5` | Term-frequency saturation |
| `BM25_B` | `0.75` | Length normalization |
| `API_PORT` | `8000` | FastAPI port |
| `LOG_LEVEL` | `INFO` | DEBUG / INFO / WARNING |
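To make `CHUNK_SIZE` and `CHUNK_OVERLAP` concrete: adjacent chunks share a band of characters so a sentence cut at a boundary still appears whole in at least one chunk. A minimal fixed-size overlap splitter (the heading-aware split that `chunker.py` performs first is left out; the function name is illustrative):

```python
CHUNK_SIZE, CHUNK_OVERLAP = 1200, 150  # defaults from config.py

def overlap_chunks(text: str, size: int = CHUNK_SIZE,
                   overlap: int = CHUNK_OVERLAP) -> list[str]:
    """Split text into size-char windows; consecutive windows
    share `overlap` chars."""
    step = size - overlap  # advance by size minus the shared band
    return [text[i : i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]
```

With the defaults, a 2,000-char document becomes two chunks whose last/first 150 characters coincide.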

## 💡 Design Decisions

| Problem | Solution | Why |
|---------|----------|-----|
| Retrieval | BM25 in `search.py` | ~60 lines, zero deps, beats most vector RAG on exact-match queries |
| Large docs | Heading + overlap chunker | Only relevant paragraphs reach the LLM |
| Context management | Hard char trim in `context_builder.py` | Never overflow the LLM context window |
| Caching | mtime dict in `file_loader.py` | Re-reads only changed files |
| LLM output | Strict JSON prompt | Structured, parseable, Wikipedia-style |
| Multi-doc queries | `/compare` endpoint + `compare_prompt` | Same retrieval, different prompt |
| Logging | Rotating file + console in `logger.py` | Audit trail for every query |
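The "hard char trim" decision can be sketched roughly like this (field names follow the chunk dicts shown in the architecture section; the `[Source: ...]` label format is an assumption, not the actual one in `context_builder.py`):

```python
def build_context(chunks: list[dict], max_chars: int = 14_000) -> str:
    """Assemble labeled source chunks, hard-trimmed to the char budget
    (MAX_CONTEXT_CHARS in config.py; ~3,500 tokens at ~4 chars/token)."""
    parts, used = [], 0
    for c in chunks:
        block = f"[Source: {c['filename']} - {c['heading']}]\n{c['content']}\n"
        if used + len(block) > max_chars:
            block = block[: max_chars - used]  # hard trim mid-chunk
        parts.append(block)
        used += len(block)
        if used >= max_chars:
            break  # budget exhausted; drop remaining chunks
    return "".join(parts)
```

The trade-off is deliberate: a blunt character cap can truncate a chunk mid-sentence, but it guarantees the prompt never overflows the model's context window.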