EventHunter: AI-Powered Event Discovery Engine with Local LLM
I had been away from the Semantic Caching series for a while. During that time, EventHunter was one of the projects I was working on in the background.
As someone who closely follows IT events around the world, I was struggling both to stay current and to plan ahead. After pulling all these events together, as a music lover I thought — "Why not include concerts too?" — and expanded the system. The same engine that tracks IT events today can track any topic you need tomorrow.
What It Does
In short: an event aggregator that automatically scans 34+ sources, enriches results with AI, and presents everything in a single interface.
There are two content categories:
IT Events — conferences, summits, webinars, workshops, hackathons. A wide range spanning platform giants like SAP, Microsoft, AWS, Google, and Oracle; cybersecurity leaders like CrowdStrike, Palo Alto, and Fortinet; analyst firms like Gartner, IDC, and Forrester; and Turkey-based sources like Kommunity, BT Haber, and Devnot.
Concerts & Festivals — city-based tour tracking from Songkick, electronic music events from Resident Advisor, and ticket news from Biletix across Turkey. Genre tagging (rock, metal, electronic, jazz, classical...) makes filtering straightforward.
The same system finds both AWS re:Invent and a metal concert in Istanbul. Architecturally there's no difference between them — only the extraction prompt changes.
Architecture
The data flow looks like this:
Playwright (headless Chromium)
↓
HTML cleaning (script / nav / footer removed)
↓
LiteLLM Proxy
↓
LLM → Structured JSON
↓
URL-based deduplication
↓
PostgreSQL (upsert)
↓
React UICrawler Layer
Playwright renders JavaScript-heavy pages in a real browser. This is a critical detail — the vast majority of modern event pages use client-side rendering rather than SSR; you can't get meaningful content from a plain HTTP request.
For static and RSS-based sources, httpx + feedparser take over. The best method is chosen for each source.
When cleaning noise from HTML, <script>, <style>, <nav>, <header>, and <footer> tags are stripped. The content sent to the LLM should be as lean and signal-dense as possible — this matters for both accuracy and cost.
AI Layer and the Local LLM Choice
All AI inference runs locally on a Mac Studio M4 Max via Ollama. qwen3:32b is the primary model.
There are three concrete reasons for this choice:
Cost. Hundreds of pages are processed every crawl cycle. Each page triggers an LLM call. With a cloud API, this cost accumulates fast and becomes hard to control at scale. With a local model, the token cost is zero.
Privacy. Scraped content never leaves the machine. This becomes critical for enterprise use — especially under KVKK and GDPR.
Control. No rate limits. Deterministic latency. Predictable model behavior.
LiteLLM proxy sits as the single AI entry point. Switching from Ollama to OpenAI, Anthropic, or Groq is a one-line config change — application code stays the same. Fallback scenarios, provider routing, and retry logic are all managed at the proxy layer.
Automation
Celery Beat + RedBeat scheduler kicks off a full crawl every Monday at 03:00 UTC. The schedule is stored in Redis and can be updated at runtime via API — no redeploy needed.
URL-based deduplication ensures the same event is never stored twice. Past events can be purged with POST /events/purge-past.
Key Features
Chatbot
Works in two stages. The first stage resolves the user's intent and determines the appropriate filters. The second stage pulls real events from the database and composes the response.
Questions like "Is there a security webinar in Istanbul next month?" are answered with verified data only. No hallucinations — if data doesn't exist, that's stated clearly.
Recommendation Engine
Every event is scored along two dimensions:
- Quality score: source reliability, category, title analysis, data completeness
- Match score: user's interest areas and preferred city
Final score: quality × 0.4 + match × 0.6
Users define their interests (Cloud, Security, AI...) and music genres once; the system scores every new event against that profile.
Filtering
Region, category, company, city, date range, online/in-person, content type (IT / Concert), and free-text search all work together.
Tech Stack
| Layer | Technology |
|---|---|
| Backend | FastAPI 0.115 + SQLAlchemy 2.0 async |
| Database | PostgreSQL 16 |
| Task queue | Celery 5.4 + Redis 7 |
| Crawler | Playwright + httpx + feedparser |
| AI | LiteLLM Proxy → Ollama / OpenAI / Anthropic |
| Frontend | React 18 + Vite 6 + TypeScript + Tailwind CSS |
| Reverse proxy | Nginx Alpine |
| Orchestration | Docker Compose (7 services) |
| Hardware | Mac Studio M4 Max (Apple Silicon ARM64) |
A single command with Docker Compose:
cp .env.example .env
bash scripts/start.sh
# → http://localhostWhat's Next
While building this, I realized the same engine can be applied far beyond IT events. Fashion trends, competitor price tracking, tech news — the config changes, the infrastructure doesn't.