Side Project & AI

EventHunter: AI-Powered Event Discovery Engine with Local LLM

I had been away from the Semantic Caching series for a while. During that time, EventHunter was one of the projects I was working on in the background.

As someone who closely follows IT events around the world, I was struggling both to stay current and to plan ahead. After pulling all these events together, as a music lover I thought — "Why not include concerts too?" — and expanded the system. The same engine that tracks IT events today can track any topic you need tomorrow.

What It Does

In short: an event aggregator that automatically scans 34+ sources, enriches results with AI, and presents everything in a single interface.

There are two content categories:

IT Events — conferences, summits, webinars, workshops, hackathons. A wide range spanning the world's leading platform companies to cybersecurity leaders, from industry analyst firms to local technology communities in Turkey — all from publicly available event pages.

Concerts & Festivals — city-based tour tracking from global music platforms, electronic music events, and nationwide program from local ticketing channels in Turkey. Genre tagging (rock, metal, electronic, jazz, classical...) makes filtering straightforward.

The same system finds both a major global technology summit and a metal concert in Istanbul. Architecturally there's no difference between them — only the extraction prompt changes.

Architecture

The data flow looks like this:

Playwright (headless Chromium)
    ↓
HTML cleaning (script / nav / footer removed)
    ↓
LiteLLM Proxy
    ↓
LLM → Structured JSON
    ↓
URL-based deduplication
    ↓
PostgreSQL (upsert)
    ↓
React UI

Crawler Layer

Playwright renders JavaScript-heavy pages in a real browser. This is a critical detail — the vast majority of modern event pages use client-side rendering rather than SSR; you can't get meaningful content from a plain HTTP request.

For static and RSS-based sources, httpx + feedparser take over. The best method is chosen for each source.

When cleaning noise from HTML, <script>, <style>, <nav>, <header>, and <footer> tags are stripped. The content sent to the LLM should be as lean and signal-dense as possible — this matters for both accuracy and cost.

AI Layer and the Local LLM Choice

All AI inference runs locally on a Mac Studio M4 Max via Ollama. qwen3:32b is the primary model.

There are three concrete reasons for this choice:

Cost. Hundreds of pages are processed every crawl cycle. Each page triggers an LLM call. With a cloud API, this cost accumulates fast and becomes hard to control at scale. With a local model, the token cost is zero.

Privacy. Scraped content never leaves the machine. This becomes critical for enterprise use — especially under KVKK and GDPR.

Control. No rate limits. Deterministic latency. Predictable model behavior.

LiteLLM proxy sits as the single AI entry point. Switching from Ollama to OpenAI, Anthropic, or Groq is a one-line config change — application code stays the same. Fallback scenarios, provider routing, and retry logic are all managed at the proxy layer.

Automation

Celery Beat + RedBeat scheduler kicks off a full crawl every Monday at 03:00 UTC. The schedule is stored in Redis and can be updated at runtime via API — no redeploy needed.

URL-based deduplication ensures the same event is never stored twice. Past events can be purged with POST /events/purge-past.

Key Features

Chatbot

Works in two stages. The first stage resolves the user's intent and determines the appropriate filters. The second stage pulls real events from the database and composes the response.

Questions like "Is there a security webinar in Istanbul next month?" are answered with verified data only. No hallucinations — if data doesn't exist, that's stated clearly.

Recommendation Engine

Every event is scored along two dimensions:

Quality score: source reliability, category, title analysis, data completeness
Match score: user's interest areas and preferred city

Final score: quality × 0.4 + match × 0.6

Users define their interests (Cloud, Security, AI...) and music genres once; the system scores every new event against that profile.

Filtering

Region, category, company, city, date range, online/in-person, content type (IT / Concert), and free-text search all work together.

Tech Stack

Layer	Technology
Backend	FastAPI 0.115 + SQLAlchemy 2.0 async
Database	PostgreSQL 16
Task queue	Celery 5.4 + Redis 7
Crawler	Playwright + httpx + feedparser
AI	LiteLLM Proxy → Ollama / OpenAI / Anthropic
Frontend	React 18 + Vite 6 + TypeScript + Tailwind CSS
Reverse proxy	Nginx Alpine
Orchestration	Docker Compose (7 services)
Hardware	Mac Studio M4 Max (Apple Silicon ARM64)

A single command with Docker Compose:

cp .env.example .env
bash scripts/start.sh
# → http://localhost

What's Next

While building this, I realized the same engine can be applied far beyond IT events. Fashion trends, competitor price tracking, tech news — the config changes, the infrastructure doesn't.

Blog