# HR LLM Wiki - Your Second Brain

A personal knowledge base powered by a local LLM (Ollama). Inspired by Andrej Karpathy's philosophy: simple systems, raw text, let the LLM do the work.

No vector databases. No embeddings. No cloud. No numpy.
## Folder Structure

```
hr-llm-wiki-ak/
├── data/                 ← Your .md files live here (auto-created on ingest)
├── logs/                 ← Rotating log files (auto-created)
├── cache/                ← File read cache (auto-managed)
│
├── config.py             ← Central config: change model, ports, limits here
├── logger.py             ← Console + rotating file logger shared by all modules
│
├── ingest.py             ← Convert PDF / DOCX / TXT → Markdown
├── file_loader.py        ← Load + mtime-cache .md files
├── chunker.py            ← Split large docs by heading + fixed-size overlap chunks
├── search.py             ← BM25 retrieval (no embeddings)
├── context_builder.py    ← Trim + assemble LLM context within token budget
├── llm.py                ← Ollama (llama3 / mistral / phi3) interface
├── prompt_templates.py   ← Wikipedia JSON, search, and comparison prompts
│
├── api.py                ← FastAPI: /ask /search /compare /reindex /health
├── ui.py                 ← Streamlit: Ask · Search · Compare tabs
├── cli.py                ← Full CLI: ask / search / ingest / list / stats
│
├── requirements.txt
└── README.md
```
## Setup

### 1. Install Python dependencies

```bash
pip install -r requirements.txt
```

### 2. Install & start Ollama

```bash
# Install from https://ollama.com
ollama pull llama3.2:1b   # or: ollama pull mistral / ollama pull phi3
ollama serve              # starts on http://localhost:11434
```

### 3. Ingest your documents

```bash
python ingest.py path/to/your/file.pdf
# Supported: .pdf .docx .txt .md
```
## Run

### API + UI (full stack)

```bash
# Terminal 1
python api.py         # FastAPI at http://localhost:8000

# Terminal 2
streamlit run ui.py   # UI at http://localhost:8501
```
## API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Ollama status + file count |
| GET | `/files` | List indexed .md files |
| POST | `/reindex` | Reload all files from `/data` |
| GET | `/search?q=...` | BM25 keyword search (no LLM) |
| GET | `/ask?q=...` | Full wiki answer via LLM |
| GET | `/compare?q=...` | Cross-document comparison via LLM |

### Examples

```bash
curl "http://localhost:8000/ask?q=What+is+the+sick+leave+policy"
curl "http://localhost:8000/compare?q=How+do+the+cyber+and+conduct+policies+differ+on+email"
curl -X POST "http://localhost:8000/reindex"
```
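The same endpoints can be called from Python. A minimal sketch, assuming the default host and port shown in the curl examples, that builds the URL-encoded query with the standard library (the `ask_url` helper is illustrative, not part of the project):

```python
from urllib.parse import quote_plus

BASE = "http://localhost:8000"  # default API_PORT

def ask_url(question: str) -> str:
    """Build an /ask URL with the question URL-encoded (spaces become '+')."""
    return f"{BASE}/ask?q={quote_plus(question)}"

url = ask_url("What is the sick leave policy")
# Fetch it with urllib.request.urlopen(url) or requests.get(url)
# once api.py is running.
```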
## Architecture

```
[PDF / DOCX / TXT]
        │
        ▼  ingest.py
[.md files in /data]
        │
        ▼  file_loader.py (mtime cache)
[doc dicts: filename, path, content]
        │
        ▼  chunker.py (split by heading → fixed-size overlap chunks)
[chunk dicts: heading, content, search_text]
        │
        ▼  search.py (BM25 scoring + filename/heading bonuses)
[top-k docs with matched sections]
        │
        ▼  context_builder.py (trim to token budget, label sources)
[context string ≤ 14,000 chars]
        │
        ▼  llm.py → Ollama
[structured JSON: title, summary, sections]
        │
        ▼  api.py / cli.py / ui.py
[Wikipedia-style output]
```
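The chunking step in the pipeline above can be sketched roughly as follows. This is a simplified illustration, not the actual `chunker.py`; the function names are made up, and the sizes are the `CHUNK_SIZE` / `CHUNK_OVERLAP` defaults from the configuration table:

```python
import re

CHUNK_SIZE = 1200    # chars per chunk (config.py default)
CHUNK_OVERLAP = 150  # chars shared between consecutive chunks

def split_by_heading(markdown: str) -> list[str]:
    """Split a Markdown document into sections at each heading line."""
    sections, current = [], []
    for line in markdown.splitlines():
        if re.match(r"^#{1,6}\s", line) and current:
            sections.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current))
    return sections

def fixed_size_chunks(text: str) -> list[str]:
    """Slice an oversized section into CHUNK_SIZE windows with overlap."""
    if len(text) <= CHUNK_SIZE:
        return [text]
    step = CHUNK_SIZE - CHUNK_OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]

doc = "# Leave\nSick leave text...\n# Conduct\nEmail rules..."
chunks = [c for sec in split_by_heading(doc) for c in fixed_size_chunks(sec)]
```

The overlap means a sentence cut at a chunk boundary still appears whole in the next chunk, so BM25 can match it.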
## Configuration (config.py)

| Setting | Default | Description |
|---|---|---|
| `DEFAULT_MODEL` | `llama3.2:1b` | Ollama model to use |
| `MAX_CONTEXT_CHARS` | `14000` | ~3,500 tokens |
| `DEFAULT_TOP_K` | `3` | Docs retrieved per query |
| `CHUNK_SIZE` | `1200` | Chars per chunk |
| `CHUNK_OVERLAP` | `150` | Overlap between chunks |
| `BM25_K1` | `1.5` | Term-frequency saturation |
| `BM25_B` | `0.75` | Length normalization |
| `API_PORT` | `8000` | FastAPI port |
| `LOG_LEVEL` | `INFO` | DEBUG / INFO / WARNING |
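A `config.py` along these lines would back the table above. This is a sketch of the defaults listed there; the real file may contain additional settings:

```python
# config.py: central knobs for the wiki (defaults from the table above)
DEFAULT_MODEL = "llama3.2:1b"  # Ollama model tag
MAX_CONTEXT_CHARS = 14000      # ~3,500 tokens at roughly 4 chars/token
DEFAULT_TOP_K = 3              # documents retrieved per query
CHUNK_SIZE = 1200              # chars per chunk
CHUNK_OVERLAP = 150            # chars shared between consecutive chunks
BM25_K1 = 1.5                  # term-frequency saturation
BM25_B = 0.75                  # document-length normalization
API_PORT = 8000                # FastAPI port
LOG_LEVEL = "INFO"             # DEBUG / INFO / WARNING
```

Because every module imports from one place, changing the model or port is a one-line edit.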
## Design Decisions

| Problem | Solution | Why |
|---|---|---|
| Retrieval | BM25 in `search.py` | ~60 lines, zero extra deps, strong on the exact-keyword matches where vector RAG often falls short |
| Large docs | Heading + overlap chunker | Only relevant paragraphs reach the LLM |
| Context management | Hard char trim in `context_builder.py` | Never overflows the LLM context window |
| Caching | mtime dict in `file_loader.py` | Re-reads only changed files |
| LLM output | Strict JSON prompt | Structured, parseable, Wikipedia-style |
| Multi-doc queries | `/compare` endpoint + `compare_prompt` | Same retrieval, different prompt |
| Logging | Rotating file + console in `logger.py` | Audit trail for every query |
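The BM25 choice can be illustrated with the classic Okapi scoring formula. A simplified single-term sketch, not the project's `search.py` (which also adds filename/heading bonuses); `BM25_K1` and `BM25_B` match the config defaults:

```python
import math

BM25_K1, BM25_B = 1.5, 0.75

def bm25_term_score(tf: int, doc_len: int, avg_doc_len: float,
                    n_docs: int, docs_with_term: int) -> float:
    """Score one term in one doc: IDF x saturated, length-normalized TF."""
    idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    norm = 1 - BM25_B + BM25_B * doc_len / avg_doc_len
    return idf * tf * (BM25_K1 + 1) / (tf + BM25_K1 * norm)

# Same term frequency, but the rarer term carries more weight:
common = bm25_term_score(tf=2, doc_len=100, avg_doc_len=100,
                         n_docs=10, docs_with_term=8)
rare = bm25_term_score(tf=2, doc_len=100, avg_doc_len=100,
                       n_docs=10, docs_with_term=1)
```

`K1` caps how much repeated occurrences of a term help; `B` controls how strongly long documents are penalized. Summing this score over the query's terms ranks the docs, with no embeddings involved.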