Why Ollama Embeddings Slow on Linux VM ChromaDB Posthog
Fix slow Ollama embeddings in ChromaDB on Linux VM without CUDA: batch requests, disable Posthog telemetry, ignore benign warnings. Run fully local for privacy and speed up email embedding performance.
Why is local Ollama slow for embedding emails into ChromaDB on a Linux VM without CUDA, and why does it connect to an external Posthog server (us.i.posthog.com)?
Setup and Issue Details
- Ollama runs fast in the console: ollama run jina/jina-embeddings-v2-base-de "Hello world" finishes in under 2 seconds.
- The Python script that embeds email bodies is very slow (~21 minutes for a response).
- Logs show a local connection to http://127.0.0.1:11434/api/embed but also https://us.i.posthog.com/batch/.
Sample script logs:
18:24:35 DEBUG : connect_tcp.started host='127.0.0.1' port=11434 ...
...
18:24:36 DEBUG : Starting new HTTPS connection (1): us.i.posthog.com:443
18:24:36 DEBUG : https://us.i.posthog.com:443 "POST /batch/ HTTP/1.1" 200 15
...
18:45:52 INFO : HTTP Request: POST http://127.0.0.1:11434/api/embed "HTTP/1.1 200 OK"
18:45:54 INFO : Done!
Ollama logs:
Jan 03 18:24:51 LLM01 ollama[6039]: init: embeddings required but some input tokens were not marked as outputs -> overriding
Questions
- How do you configure Ollama to run completely locally, without external connections?
- Is there a data privacy issue with Posthog connections?
- What causes the ‘embeddings required but some input tokens were not marked as outputs -> overriding’ warning, and how to resolve it? (Embeddings still appear in ChromaDB.)
Your local Ollama embeddings are slow when batching emails into ChromaDB because the Python script likely sends serialized, one-by-one requests to the Ollama API endpoint (http://127.0.0.1:11434/api/embed), racking up massive overhead on a CPU-only Linux VM without CUDA—think minutes per email instead of seconds. Those sneaky Posthog connections (https://us.i.posthog.com/batch/) aren’t from Ollama itself but from ChromaDB’s built-in telemetry trying (and often failing) to phone home with anonymized stats. The “embeddings required but some input tokens were not marked as outputs -> overriding” warning in Ollama logs is harmless—just an internal nudge that doesn’t block your embeddings from landing in ChromaDB.
Contents
- Why Ollama Embeddings Feel Slow on Linux VM
- ChromaDB’s Posthog Telemetry Exposed
- The Benign Embeddings Warning in Ollama
- Running Ollama Completely Offline
- Speeding Up Ollama Embeddings Performance
- Data Privacy Risks with Telemetry
- Sources
- Conclusion
Why Ollama Embeddings Feel Slow on Linux VM
Ollama runs blazing fast in the console for a single embed like ollama run jina/jina-embeddings-v2-base-de "Hello world"—under 2 seconds on your setup. But switch to a Python script shoving dozens of email bodies into ChromaDB? Suddenly it’s crawling at 21 minutes. What’s the disconnect?
The culprit is request patterns. Console commands hit Ollama once, efficiently. Scripts using ChromaDB’s OllamaEmbeddingFunction often default to individual HTTP POSTs to /api/embed for each document. On a no-CUDA Linux VM, that’s pure CPU grind: context switching, serialization, network stack overhead—even localhost adds latency. Your logs scream it: that final “HTTP Request: POST http://127.0.0.1:11434/api/embed” at 18:45:52 after starting at 18:24? Pure sequential torture.
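The sequential pattern behind those timestamps looks something like this (a sketch: embed_one_by_one and the email list are illustrative names, and /api/embeddings is Ollama's legacy single-prompt endpoint):

```python
import requests

def embed_one_by_one(email_bodies):
    """Anti-pattern: one HTTP round-trip (and one model pass) per email."""
    vectors = []
    for body in email_bodies:
        # Each POST pays connection, serialization, and scheduling overhead,
        # even on localhost -- it adds up fast on a CPU-only VM.
        resp = requests.post(
            "http://127.0.0.1:11434/api/embeddings",
            json={"model": "jina/jina-embeddings-v2-base-de", "prompt": body},
            timeout=120,
        )
        resp.raise_for_status()
        vectors.append(resp.json()["embedding"])
    return vectors
```

With dozens of emails, that loop is exactly the 18:24-to-18:45 wall-clock gap in the logs above.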
And don’t overlook model choice. Jina embeddings are solid, but general LLMs drag here—specialized embedders like nomic-embed-text train for this exact speed demon role, per Chroma’s cookbook.
Batch 'em. More on that later.
ChromaDB’s Posthog Telemetry Exposed
Spot those DEBUG lines? Starting new HTTPS connection (1): us.i.posthog.com:443 right after local Ollama chatter. Ollama stays local—no external pings from it. ChromaDB? That’s the sneaky one.
ChromaDB bundles Posthog for telemetry: collection creates, query stats, anonymized usage. Your ClientCreateCollectionEvent calls actually fail with "capture() takes 1 positional argument but 3 were given", but ChromaDB keeps retrying, chewing cycles and bandwidth. Confirmed in community discussions and Chroma's telemetry docs.
On a VM, this multiplies slowness—failed external HTTPS amid local embedding marathons. Ollama’s innocent; blame the vector DB wrapper.
The Benign Embeddings Warning in Ollama
init: embeddings required but some input tokens were not marked as outputs -> overriding
Scary? Nah. This pops up in Ollama logs (v0.11.11+) with embedding models like jina-embeddings-v2-base-de or snowflake-arctic-embed2; most reports come from Linux/Nvidia setups, but your no-CUDA VM triggers it too. It's Ollama internally adjusting token handling for pure embedding mode: inputs aren't auto-flagged as outputs, so it overrides them safely.
GitHub threads call it noise: embeddings generate fine, ChromaDB stores them. No fix needed—your data’s there. Upgrade Ollama if it bugs you, but ignore otherwise.
Running Ollama Completely Offline
Want zero external chatter? Target ChromaDB, not Ollama.
Set the ANONYMIZED_TELEMETRY=false environment variable before importing ChromaDB. Or upgrade to ChromaDB ≥1.0.15, which fixes the broken Posthog calls. Pin the Posthog dependency: pip install -U 'posthog>=2.4.0,<6.0.0'.
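The import-order detail matters, since ChromaDB reads the flag when the module first loads. A minimal sketch (the Settings alternative shown in the comment is an assumption about newer ChromaDB versions):

```python
import os

# ChromaDB checks this flag at import time, so it must be set before
# the first `import chromadb` anywhere in the process.
os.environ["ANONYMIZED_TELEMETRY"] = "False"

# Newer ChromaDB versions also accept an explicit setting (assumed API):
# import chromadb
# from chromadb.config import Settings
# client = chromadb.PersistentClient(
#     path="./chroma_db",
#     settings=Settings(anonymized_telemetry=False),
# )
```

Setting it in your shell profile or systemd unit works too, as long as it's in the environment before Python starts.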
Docker? Spin ChromaDB with --env ANONYMIZED_TELEMETRY=false. Ollama’s already air-gapped—ollama serve binds localhost, no outbound.
Test: curl -X POST http://127.0.0.1:11434/api/embeddings -d '{"model": "jina/jina-embeddings-v2-base-de", "prompt": "test"}' stays local. Privacy win.
Speeding Up Ollama Embeddings Performance
Linux VM, no CUDA—brutal for embeds, but fixable.
Batch requests first. Swap per-email loops for one batched call: ollama.embed(model="jina/jina-embeddings-v2-base-de", input=email_bodies) returns one vector per input in a single request. Cuts round-trips 90%+, per Open WebUI fixes.
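Batched embedding might look like this; chunked and embed_emails are illustrative names, and the sketch assumes the ollama Python package (pip install ollama), whose embed() accepts a list of inputs:

```python
def chunked(items, size):
    """Split a list into fixed-size batches."""
    return [items[i : i + size] for i in range(0, len(items), size)]

def embed_emails(email_bodies, batch_size=32):
    """Embed many emails in a few requests instead of one request each."""
    import ollama  # deferred import so the helper above works standalone

    vectors = []
    for batch in chunked(email_bodies, batch_size):
        # One POST to /api/embed carries the whole batch; the response
        # holds one vector per input text, in order.
        resp = ollama.embed(model="jina/jina-embeddings-v2-base-de", input=batch)
        vectors.extend(resp["embeddings"])
    return vectors
```

Tune batch_size to your RAM; on a CPU-only VM, 16-64 texts per request is a reasonable starting range.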
Model swap. Ditch generalists; grab nomic-embed-text or mxbai-embed-large—embedding specialists fly on CPU. Ollama’s blog pushes RAG-optimized ones.
Chroma tweaks. Pass the whole document list in a single add/upsert call so the embedding function can batch; with LangChain's wrapper, Chroma(persist_directory="./chroma_db", embedding_function=OllamaEmbeddingFunction(...)) works the same way. Create collections once up front rather than per run.
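Putting it together with ChromaDB's own client (a sketch: load_emails and email_id are illustrative names; it assumes chromadb is installed and Ollama is serving the jina model locally):

```python
import hashlib

def email_id(body):
    """Stable id per email so re-runs upsert instead of duplicating."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()[:16]

def load_emails(email_bodies, persist_dir="./chroma_db"):
    """Persist email embeddings through Chroma's Ollama wrapper."""
    import chromadb
    from chromadb.utils.embedding_functions import OllamaEmbeddingFunction

    ef = OllamaEmbeddingFunction(
        url="http://127.0.0.1:11434/api/embeddings",
        model_name="jina/jina-embeddings-v2-base-de",
    )
    client = chromadb.PersistentClient(path=persist_dir)
    col = client.get_or_create_collection("emails", embedding_function=ef)
    # A single upsert hands the whole list to the embedding function at once,
    # instead of one POST per email.
    col.upsert(ids=[email_id(b) for b in email_bodies], documents=email_bodies)
    return col
```

The content-hash ids also make the loader idempotent: re-running it after a crash updates rows instead of inserting duplicates.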
VM juice. More RAM/cores help, but batching trumps hardware. Your console speed proves Ollama’s capable—scripts just suck at scale.
Real-world? Folks report 10x gains batching 100+ docs.
Data Privacy Risks with Telemetry
Posthog grabs anonymized events: collection names, query counts—no email content leaks, thankfully. But on emails? Metadata like doc counts could fingerprint sensitive corp data.
VM to internet? Potential exposure if firewalled poorly. EU folks: GDPR flags any outbound. US? Varies.
Fixes above kill it cold. Verify with tcpdump or Wireshark: post-fix, only localhost traffic should remain. Chroma's docs cover fully local integration.
Zero risk now.
Sources
- Ollama - Chroma Cookbook
- Embedding models · Ollama Blog
- Ollama Embeddings | Chroma Docs
- ChromaDB attempting to send posthog telemetry · GitHub
- embeddings required but some input tokens… · Ollama GitHub
- Telemetry - Chroma Docs
Conclusion
Ollama embeddings slow down from unbatched ChromaDB requests and telemetry drag on your CUDA-free VM—batch 'em, kill Posthog with ANONYMIZED_TELEMETRY=false, and swap to embed specialists for sub-minute runs. The warning’s just Ollama chatter; ignore it. Now fully local, private, and snappy—perfect for email RAG without the headaches.