AI & Infrastructure
Cutting RAG Costs with Semantic Cache — Part 3: Building the Test Environment
In the previous part I walked through the test environment design and laid out the hypotheses. This part is where we actually build it — installing every component from scratch and verifying that all connections work before moving on.
By the end of this post you'll have a GPU-accelerated embedding engine, a vector-search-capable Redis instance, and a working LLM connection.
Installation Order
The sequence matters. We start with the base tools, then the database, then the Python environment. This avoids dependency conflicts down the line.
Step 1 — Homebrew: Package management on Mac.
Step 2 — Python 3.11+: The runtime for all scripts.
Step 3 — Docker Desktop: Runs Redis Stack in an isolated environment.
Step 4 — Redis Stack: Cache and vector DB in one container.
Step 5 — Python venv: Isolated environment that doesn't touch system Python.
Step 6 — Libraries: sentence-transformers, openai SDK, tiktoken, and others.
Step 7 — Connection test: Verifying everything works together.
Step 1 — Homebrew
Open Terminal (Cmd + Space → "Terminal") and check if Homebrew is already installed:
brew --version
If not:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Your password will be required during installation — admin access is needed. Expect 2-5 minutes.
Step 2 — Python
Check the current version first:
python3 --version
Python 3.10 or higher works, though the scripts in this series target 3.11. If you're on something older:
brew install python@3.11
Step 3 — Docker Desktop
We'll run Redis Stack inside a container. Check if Docker is already installed:
docker --version
If not, go to https://www.docker.com/products/docker-desktop and download the Apple Silicon build. Open the .dmg, drag Docker to Applications, launch it. macOS will ask for permission on first run — grant it.
Verify the installation:
docker run hello-world
"Hello from Docker!" means you're good.
Step 4 — Redis Stack
Redis Stack adds the RediSearch and RedisJSON modules on top of standard Redis. This gives us both key-value caching and vector similarity search from a single container — no need to run two separate services.
docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 -v redis-data:/data redis/redis-stack:latest
Port 6379 is where the Python scripts connect. Port 8001 is the RedisInsight web UI — open http://localhost:8001 in your browser to watch cache entries fill up in real time.
Verify the connection:
docker exec -it redis-stack redis-cli ping
"PONG" means Redis is running.
Useful container commands: docker stop redis-stack, docker start redis-stack, docker logs redis-stack -f.
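One more optional check: list the modules the container loaded. You should see search and ReJSON among them; these are what make the vector queries later in the series possible.
docker exec -it redis-stack redis-cli MODULE LIST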
Step 5 — Python Virtual Environment
Create the project folder and activate the venv:
mkdir ~/semantic-cache-lab && cd ~/semantic-cache-lab
python3 -m venv venv
source venv/bin/activate
When you see (venv) at the start of the terminal prompt, you're in the isolated environment. Note that venv deactivates when you close the terminal — run source ~/semantic-cache-lab/venv/bin/activate again at the start of each new session.
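If you're ever unsure whether the venv is active, print the interpreter prefix. It should point into ~/semantic-cache-lab/venv, not a system path:
python3 -c "import sys; print(sys.prefix)"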
Step 6 — Libraries
With venv active, create requirements.txt:
openai>=1.0.0
sentence-transformers>=2.7.0
torch>=2.0.0
redis>=5.0.0
tiktoken>=0.7.0
pypdf>=4.0.0
langchain>=0.2.0
langchain-community>=0.2.0
pandas>=2.0.0
matplotlib>=3.8.0
python-dotenv>=1.0.0
tqdm>=4.0.0
Then install:
pip install -r requirements.txt
sentence-transformers and torch are large packages — allow 5-15 minutes depending on your connection speed. MPS (Metal Performance Shaders) support on M5 Max is detected automatically; no extra configuration is needed for GPU acceleration.
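Once pip finishes, a one-liner confirms that the heavyweight imports all resolve inside the venv:
python3 -c "import torch, sentence_transformers, redis, openai, tiktoken; print('imports OK')"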
Verify that MPS is active:
python3 -c "import torch; print('MPS:', torch.backends.mps.is_available())"
True means embedding computations will run on the Apple GPU — roughly 3-5x faster than CPU.
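If you'd rather not rely on auto-detection, you can also pin the device explicitly when loading the embedding model. A minimal sketch, where the model name all-MiniLM-L6-v2 is my assumption (chosen to match the dim=384 in the connection test output below):

import torch
from sentence_transformers import SentenceTransformer

# Use the Apple GPU when available, otherwise fall back to CPU
device = "mps" if torch.backends.mps.is_available() else "cpu"
model = SentenceTransformer("all-MiniLM-L6-v2", device=device)

vec = model.encode("warm-up sentence")
print(vec.shape, "on", model.device)  # expect (384,) on mps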
Step 7 — Connection Test
First create your .env file (get your OpenRouter API key at https://openrouter.ai/keys):
OPENROUTER_API_KEY=sk-or-v1-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
LLM_MODEL=google/gemini-2.0-flash-lite-001
REDIS_HOST=localhost
REDIS_PORT=6379
Never commit .env to git. Add a .gitignore:
.env
venv/
__pycache__/
data/
results/
One thing worth flagging before running the test: in OpenRouter responses, choices is a list, so you need to access it by index — response.choices[0].message.content — otherwise you'll hit 'list' object has no attribute 'message'.
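The test script itself is nothing exotic. Here is a minimal sketch of what it can look like (the file name test_connection.py, the embedding model, and the prompt are illustrative assumptions; the structure mirrors the expected output below):

# test_connection.py: minimal sketch of the four-part connection test.
# Assumptions: all-MiniLM-L6-v2 as the embedding model and cl100k_base as
# the tiktoken encoding; swap in whatever your setup actually uses.
import os

import redis
import tiktoken
from dotenv import load_dotenv
from openai import OpenAI
from sentence_transformers import SentenceTransformer

load_dotenv()  # pulls OPENROUTER_API_KEY and friends from .env

print("=" * 50)
print("CONNECTION TEST")
print("=" * 50)

# [1/4] Redis: ping the container started in Step 4
print("[1/4] Redis connection...")
r = redis.Redis(host=os.getenv("REDIS_HOST", "localhost"),
                port=int(os.getenv("REDIS_PORT", "6379")))
r.ping()
print("✅ Redis: OK")

# [2/4] Embedding model: lands on MPS automatically when available
print("[2/4] Loading embedding model...")
model = SentenceTransformer("all-MiniLM-L6-v2")
vec = model.encode("test")
print(f"✅ Embedding: OK (dim={len(vec)}, device={model.device})")

# [3/4] OpenRouter via the openai SDK; note choices[0], as flagged above
print("[3/4] OpenRouter LLM connection...")
client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.getenv("OPENROUTER_API_KEY"))
resp = client.chat.completions.create(
    model=os.getenv("LLM_MODEL"),
    messages=[{"role": "user", "content": "Answer with one word: can you hear me?"}],
)
print("✅ OpenRouter: OK")
print(f"Response: {resp.choices[0].message.content}")
print(f"Token usage: {resp.usage.total_tokens}")

# [4/4] tiktoken token counting
print("[4/4] Token counting...")
enc = tiktoken.get_encoding("cl100k_base")
text = "This is a test sentence."
print("✅ tiktoken: OK")
print(f'"{text}" = {len(enc.encode(text))} tokens')

print("=" * 50)
print("ALL CONNECTIONS SUCCESSFUL")
print("=" * 50)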
Expected output:
==================================================
CONNECTION TEST
==================================================
[1/4] Redis connection...
✅ Redis: OK
[2/4] Loading embedding model...
✅ Embedding: OK (dim=384, device=mps)
[3/4] OpenRouter LLM connection...
✅ OpenRouter: OK
Response: Yes!
Token usage: 27
[4/4] Token counting...
✅ tiktoken: OK
"This is a test sentence." = 7 tokens
==================================================
ALL CONNECTIONS SUCCESSFUL
==================================================
Common Issues
docker: command not found → Make sure Docker Desktop is running (look for the Docker icon in the menu bar).
Connection refused (Redis) → Run docker start redis-stack to bring the container back up.
ModuleNotFoundError → The venv isn't active. Run source venv/bin/activate.
MPS: False → Not an error; processing falls back to CPU, just a bit slower.
401 Unauthorized → Check the API key in your .env file.
What's Next
The infrastructure is in place. In Part 4 we build the RAG pipeline: loading the WEF report, chunking it, embedding the chunks into Redis, and running the first semantic search queries.