Engin Şahin

AI & Infrastructure

Cutting RAG Costs with Semantic Cache — Part 3: Building the Test Environment

In the previous part I walked through the test environment design and laid out the hypotheses. This part is where we actually build it — installing every component from scratch and verifying that all connections work before moving on.

By the end of this post you'll have a GPU-accelerated embedding engine, a vector-search-capable Redis instance, and a working LLM connection.

Installation Order

The sequence matters. We start with the base tools, then the database, then the Python environment. This avoids dependency conflicts down the line.

Step 1 — Homebrew: Package management on Mac.
Step 2 — Python 3.11+: The runtime for all scripts.
Step 3 — Docker Desktop: Runs Redis Stack in an isolated environment.
Step 4 — Redis Stack: Cache and vector DB in one container.
Step 5 — Python venv: Isolated environment that doesn't touch system Python.
Step 6 — Libraries: sentence-transformers, openai SDK, tiktoken, and others.
Step 7 — Connection test: Verifying everything works together.

Step 1 — Homebrew

Open Terminal (Cmd + Space → "Terminal") and check if Homebrew is already installed:

brew --version

If not:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Your password will be required during installation — admin access is needed. Expect 2-5 minutes.

Step 2 — Python

Check the current version first:

python3 --version

Python 3.10 or higher works, but we'll standardize on 3.11. If you're on something older:

brew install python@3.11

Step 3 — Docker Desktop

We'll run Redis Stack inside a container. Check if Docker is already installed:

docker --version

If not, go to https://www.docker.com/products/docker-desktop and download the Apple Silicon build. Open the .dmg, drag Docker to Applications, launch it. macOS will ask for permission on first run — grant it.

Verify the installation:

docker run hello-world

"Hello from Docker!" means you're good.

Step 4 — Redis Stack

Redis Stack adds the RediSearch and RedisJSON modules on top of standard Redis. This gives us both key-value caching and vector similarity search from a single container — no need to run two separate services.

docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 -v redis-data:/data redis/redis-stack:latest

Port 6379 is where the Python scripts connect. Port 8001 is the RedisInsight web UI — open http://localhost:8001 in your browser to watch cache entries fill up in real time.

Verify the connection:

docker exec -it redis-stack redis-cli ping

"PONG" means Redis is running.

Useful container commands: docker stop redis-stack, docker start redis-stack, docker logs redis-stack -f.
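Once the Python libraries from Step 6 are installed, you can also confirm from code that the RediSearch module is loaded and can actually host a vector index. Here's a minimal sketch, assuming redis-py 5.x import paths and a throwaway index name test_idx (the real cache schema comes later in the series):

import redis
from redis.commands.search.field import VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType

r = redis.Redis(host="localhost", port=6379)
print(r.ping())  # True -> container reachable

# Create a temporary 384-dim vector index to prove vector search works
r.ft("test_idx").create_index(
    fields=[VectorField("embedding", "FLAT", {
        "TYPE": "FLOAT32",
        "DIM": 384,
        "DISTANCE_METRIC": "COSINE",
    })],
    definition=IndexDefinition(prefix=["test:"], index_type=IndexType.HASH),
)
print(r.ft("test_idx").info()["index_name"])  # test_idx
r.ft("test_idx").dropindex()  # clean up the throwaway index

If create_index succeeds, both the key-value side and the vector side of Redis Stack are ready.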

Step 5 — Python Virtual Environment

Create the project folder and activate the venv:

mkdir ~/semantic-cache-lab && cd ~/semantic-cache-lab
python3 -m venv venv
source venv/bin/activate

When you see (venv) at the start of the terminal prompt, you're in the isolated environment. Note that venv deactivates when you close the terminal — run source ~/semantic-cache-lab/venv/bin/activate again at the start of each new session.

Step 6 — Libraries

With venv active, create requirements.txt:

openai>=1.0.0
sentence-transformers>=2.7.0
torch>=2.0.0
redis>=5.0.0
tiktoken>=0.7.0
pypdf>=4.0.0
langchain>=0.2.0
langchain-community>=0.2.0
pandas>=2.0.0
matplotlib>=3.8.0
python-dotenv>=1.0.0
tqdm>=4.0.0

Then install:

pip install -r requirements.txt

sentence-transformers and torch are large packages — allow 5-15 minutes depending on your connection speed. MPS (Metal Performance Shaders) support on M5 Max is detected automatically; no extra configuration is needed for GPU acceleration.

Verify that MPS is active:

python3 -c "import torch; print('MPS:', torch.backends.mps.is_available())"

True means embedding computations will run on the Apple GPU — roughly 3-5x faster than CPU.
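As a quick sanity check that embeddings actually run on the GPU, the following sketch loads a model and encodes one sentence. The model name all-MiniLM-L6-v2 is my assumption here; it's a common choice whose 384-dimensional output matches the dim=384 in the connection test below:

from sentence_transformers import SentenceTransformer

# Assumed model: outputs 384-dim vectors; swap in whatever model you settle on
model = SentenceTransformer("all-MiniLM-L6-v2", device="mps")
vec = model.encode("What are the top global risks this year?")
print(vec.shape, model.device)  # e.g. (384,) mps:0

The first run downloads the model weights from Hugging Face, so expect a short delay before the encode completes.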

Step 7 — Connection Test

First create your .env file (get your OpenRouter API key at https://openrouter.ai/keys):

OPENROUTER_API_KEY=sk-or-v1-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
LLM_MODEL=google/gemini-2.0-flash-lite-001
REDIS_HOST=localhost
REDIS_PORT=6379

Never commit .env to git. Add a .gitignore:

.env
venv/
__pycache__/
data/
results/

One thing worth flagging before running the test: in the OpenRouter response object, choices is a list. You need to index into it (response.choices[0].message.content); otherwise you'll hit "'list' object has no attribute 'message'".
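For reference, here's a minimal sketch of what such a test script could look like. The file name test_connection.py, the embedding model, and the exact test prompt are my assumptions; adapt them to your setup:

import os
import redis
import tiktoken
from dotenv import load_dotenv
from openai import OpenAI
from sentence_transformers import SentenceTransformer

load_dotenv()  # reads OPENROUTER_API_KEY, LLM_MODEL, REDIS_HOST, REDIS_PORT from .env

print("[1/4] Redis connection...")
r = redis.Redis(host=os.getenv("REDIS_HOST", "localhost"),
                port=int(os.getenv("REDIS_PORT", "6379")))
r.ping()
print("✅ Redis: OK")

print("[2/4] Loading embedding model...")
model = SentenceTransformer("all-MiniLM-L6-v2", device="mps")  # assumed model
print(f"✅ Embedding: OK (dim={model.get_sentence_embedding_dimension()}, device={model.device})")

print("[3/4] OpenRouter LLM connection...")
client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.getenv("OPENROUTER_API_KEY"))
resp = client.chat.completions.create(
    model=os.getenv("LLM_MODEL"),
    messages=[{"role": "user", "content": "Reply with one word: Yes"}],
)
print("✅ OpenRouter: OK")
print("Response:", resp.choices[0].message.content)  # choices is a list: index it
print("Token usage:", resp.usage.total_tokens)

print("[4/4] Token counting...")
enc = tiktoken.get_encoding("cl100k_base")
sentence = "This is a test sentence."
print(f'✅ tiktoken: OK "{sentence}" = {len(enc.encode(sentence))} tokens')

Run it with:

python3 test_connection.py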

Expected output:

==================================================
CONNECTION TEST
==================================================

[1/4] Redis connection...
✅ Redis: OK

[2/4] Loading embedding model...
✅ Embedding: OK (dim=384, device=mps)

[3/4] OpenRouter LLM connection...
✅ OpenRouter: OK
Response: Yes!
Token usage: 27

[4/4] Token counting...
✅ tiktoken: OK "This is a test sentence." = 7 tokens

==================================================
ALL CONNECTIONS SUCCESSFUL
==================================================

Common Issues

docker: command not found → Make sure Docker Desktop is running (look for the Docker icon in the menu bar).
Connection refused (Redis) → Run docker start redis-stack to bring the container back up.
ModuleNotFoundError → The venv isn't active. Run source venv/bin/activate.
MPS: False → Not an error; processing falls back to CPU, just a bit slower.
401 Unauthorized → Check the API key in your .env file.

What's Next

The infrastructure is in place. In Part 4 we build the RAG pipeline: loading the WEF report, chunking it, embedding the chunks into Redis, and running the first semantic search queries.
