Side Project & AI

EventHunter: AI-Powered Event Discovery Engine with Local LLM

I had been away from the Semantic Caching series for a while. During that time, EventHunter was one of the projects I was working on in the background.

As someone who closely follows IT events around the world, I was struggling both to stay current and to plan ahead. After pulling all these events together, as a music lover I thought — "Why not include concerts too?" — and expanded the system. The same engine that tracks IT events today can track any topic you need tomorrow.


What It Does

In short: an event aggregator that automatically scans 34+ sources, enriches results with AI, and presents everything in a single interface.

There are two content categories:

IT Events — conferences, summits, webinars, workshops, hackathons. A wide range spanning platform giants like SAP, Microsoft, AWS, Google, and Oracle; cybersecurity leaders like CrowdStrike, Palo Alto, and Fortinet; analyst firms like Gartner, IDC, and Forrester; and Turkey-based sources like Kommunity, BT Haber, and Devnot.

Concerts & Festivals — city-based tour tracking from Songkick, electronic music events from Resident Advisor, and ticket news from Biletix across Turkey. Genre tagging (rock, metal, electronic, jazz, classical...) makes filtering straightforward.

The same system finds both AWS re:Invent and a metal concert in Istanbul. Architecturally there's no difference between them — only the extraction prompt changes.


Architecture

The data flow looks like this:

Playwright (headless Chromium)
    ↓
HTML cleaning (script / nav / footer removed)
    ↓
LiteLLM Proxy
    ↓
LLM → Structured JSON
    ↓
URL-based deduplication
    ↓
PostgreSQL (upsert)
    ↓
React UI

Crawler Layer

Playwright renders JavaScript-heavy pages in a real browser. This is a critical detail — the vast majority of modern event pages use client-side rendering rather than SSR; you can't get meaningful content from a plain HTTP request.

For static and RSS-based sources, httpx + feedparser take over. The best method is chosen for each source.

When cleaning noise from HTML, <script>, <style>, <nav>, <header>, and <footer> tags are stripped. The content sent to the LLM should be as lean and signal-dense as possible — this matters for both accuracy and cost.

AI Layer and the Local LLM Choice

All AI inference runs locally on a Mac Studio M4 Max via Ollama. qwen3:32b is the primary model.

There are three concrete reasons for this choice:

Cost. Hundreds of pages are processed every crawl cycle. Each page triggers an LLM call. With a cloud API, this cost accumulates fast and becomes hard to control at scale. With a local model, the token cost is zero.

Privacy. Scraped content never leaves the machine. This becomes critical for enterprise use — especially under KVKK and GDPR.

Control. No rate limits. Deterministic latency. Predictable model behavior.

LiteLLM proxy sits as the single AI entry point. Switching from Ollama to OpenAI, Anthropic, or Groq is a one-line config change — application code stays the same. Fallback scenarios, provider routing, and retry logic are all managed at the proxy layer.

Automation

Celery Beat + RedBeat scheduler kicks off a full crawl every Monday at 03:00 UTC. The schedule is stored in Redis and can be updated at runtime via API — no redeploy needed.

URL-based deduplication ensures the same event is never stored twice. Past events can be purged with POST /events/purge-past.


Key Features

Chatbot

Works in two stages. The first stage resolves the user's intent and determines the appropriate filters. The second stage pulls real events from the database and composes the response.

Questions like "Is there a security webinar in Istanbul next month?" are answered with verified data only. No hallucinations — if data doesn't exist, that's stated clearly.

Recommendation Engine

Every event is scored along two dimensions:

  • Quality score: source reliability, category, title analysis, data completeness
  • Match score: user's interest areas and preferred city

Final score: quality × 0.4 + match × 0.6

Users define their interests (Cloud, Security, AI...) and music genres once; the system scores every new event against that profile.

Filtering

Region, category, company, city, date range, online/in-person, content type (IT / Concert), and free-text search all work together.


Tech Stack

LayerTechnology
BackendFastAPI 0.115 + SQLAlchemy 2.0 async
DatabasePostgreSQL 16
Task queueCelery 5.4 + Redis 7
CrawlerPlaywright + httpx + feedparser
AILiteLLM Proxy → Ollama / OpenAI / Anthropic
FrontendReact 18 + Vite 6 + TypeScript + Tailwind CSS
Reverse proxyNginx Alpine
OrchestrationDocker Compose (7 services)
HardwareMac Studio M4 Max (Apple Silicon ARM64)

A single command with Docker Compose:

cp .env.example .env
bash scripts/start.sh
# → http://localhost

What's Next

While building this, I realized the same engine can be applied far beyond IT events. Fashion trends, competitor price tracking, tech news — the config changes, the infrastructure doesn't.

Repo: github.com/EnginSahin-create/EventHunter