# Troubleshooting

Common issues encountered during development and production operation, with diagnosis steps and solutions.
## Database Connectivity

### Neo4j returns 503 on graph endpoints

Symptom: Graph analytics endpoints (`/v1/graph/*`) return 503 Service Unavailable.
Cause: Neo4j is not included in docker-compose.yml by design. It must be configured separately.
Solution:
Option A — Local Neo4j:

```shell
# Install and start Neo4j locally
brew install neo4j   # macOS
neo4j start
```

Option B — Neo4j Aura (cloud):

```shell
# Set these in your .env file
NEO4J_URI=neo4j+s://your-instance.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-password
```

### Redis connection refused
Symptom: `ConnectionRefusedError` on startup or `redis.exceptions.ConnectionError` during requests.
Cause: Redis container is not running.
Solution:
```shell
# Start infrastructure containers (PostgreSQL, Redis, Qdrant)
make docker-up

# Verify Redis is running
docker compose ps redis
```

Impact if Redis is down:
- Cache is disabled (requests hit the database directly)
- Rate limiting fails closed — all requests are blocked (v1.0 will fix this to fail-open)
- ARQ background workers cannot process jobs
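Before digging into application logs, it can help to probe Redis directly. A minimal liveness check, assuming the default localhost:6379 mapping from docker-compose (this helper is illustrative, not part of Valter):

```python
import socket

def redis_is_up(host: str = "localhost", port: int = 6379, timeout: float = 1.0) -> bool:
    """Send a raw RESP PING and look for +PONG, without needing the
    redis-py client to be installed."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            return sock.recv(16).startswith(b"+PONG")
    except OSError:
        # Connection refused or timeout: the container is down or unreachable
        return False
```

If this returns False while `docker compose ps redis` shows the container as up, check the port mapping in docker-compose.yml.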
### PostgreSQL migration failures

Symptom: `alembic upgrade head` fails with schema conflicts or connection errors.
Cause: The database is not running, or the migration history is out of sync.
Solution:
```shell
# Ensure PostgreSQL is running
make docker-up

# Run migrations
make migrate

# If there are conflicts, check the migration history
alembic -c migrations/alembic.ini history

# To see the current database revision
alembic -c migrations/alembic.ini current
```

If a migration fails mid-way:

- Check which revision the database is at with `alembic current`
- Review the failing migration in `migrations/versions/`
- If the migration has a `downgrade()` function, you can roll back: `alembic -c migrations/alembic.ini downgrade -1`
- If the migration is marked irreversible, consult the PR that introduced it for a contingency plan
## Embedding Model

### Model download fails or is slow

Symptom: `make download-model` hangs, times out, or fails with a network error.
Cause: Hugging Face model downloads can be large (~500MB for Legal-BERTimbau) and are affected by network conditions.
Solution:
```shell
# Retry the download
make download-model

# If the download keeps failing, check your network and Hugging Face status
curl -I https://huggingface.co
```

Alternative: use a remote embedding service instead of the local model. Set in `.env`:

```shell
VALTER_EMBEDDING_SERVICE_URL=https://your-embedding-service/encode
```

The model is cached at `~/.cache/huggingface/` after the first successful download. Subsequent starts will use the cache.
### Qdrant dimension mismatch

Symptom: Search returns an error about vector dimensions not matching the collection configuration.
Cause: The Qdrant collection was created with a different embedding dimension than the current model produces. This happens when switching embedding models (e.g., from a 384d model to the 768d Legal-BERTimbau).
Solution:
- Check the current model's dimension:

  ```shell
  python -c "from sentence_transformers import SentenceTransformer; m = SentenceTransformer('rufimelo/Legal-BERTimbau-sts-base'); print(m.get_sentence_embedding_dimension())"
  # Expected output: 768
  ```

- Verify the `VALTER_EMBEDDING_DIMENSION` environment variable matches:

  ```shell
  # In .env
  VALTER_EMBEDDING_DIMENSION=768
  ```

- If the collection was created with the wrong dimension, it must be recreated:

  ```shell
  # WARNING: This deletes all indexed vectors. You will need to re-index.
  python -c "from qdrant_client import QdrantClient; c = QdrantClient('localhost', port=6333); c.delete_collection('valter_documents')"
  ```

After deleting the collection, restart the application — it will recreate the collection with the correct dimension on startup.
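To avoid rediscovering the mismatch at query time, a startup guard can compare the configured dimension against what the loaded model actually produces. A sketch (Valter's real startup hook may already do something equivalent; the function name is ours):

```python
import os

def verify_embedding_dimension(model_dim: int) -> int:
    """Fail fast if VALTER_EMBEDDING_DIMENSION disagrees with the dimension
    the loaded model reports (e.g. 768 for Legal-BERTimbau)."""
    configured = int(os.environ.get("VALTER_EMBEDDING_DIMENSION", model_dim))
    if configured != model_dim:
        raise RuntimeError(
            f"VALTER_EMBEDDING_DIMENSION={configured}, but the model produces "
            f"{model_dim}-dimensional vectors; fix .env or recreate the collection"
        )
    return configured
```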
## OCR Issues

### OCR fails with ImportError

Symptom: `ImportError: No module named 'pytesseract'` or `FileNotFoundError: tesseract is not installed`.
Cause: OCR has two dependencies — the Python package and the system binary. Both must be installed.
Solution:
```shell
# Install the Python OCR extras
pip install -e ".[ocr]"

# Install the system Tesseract binary
# macOS:
brew install tesseract tesseract-lang

# Ubuntu/Debian:
sudo apt-get install tesseract-ocr tesseract-ocr-por
```
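Because the failure can be at either layer, a quick self-check that distinguishes the two cases can save a round of guessing. A sketch (the helper is illustrative; the error class comes from pytesseract):

```python
def ocr_selfcheck() -> str:
    """Report which OCR dependency is missing: the Python package,
    the Tesseract binary, or neither."""
    try:
        import pytesseract
    except ImportError:
        return "missing Python package: pip install -e '.[ocr]'"
    try:
        pytesseract.get_tesseract_version()
    except pytesseract.pytesseract.TesseractNotFoundError:
        return "missing system binary: install tesseract-ocr"
    return "ok"

print(ocr_selfcheck())
```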
```shell
# Verify installation
tesseract --version
```

## MCP Server

### MCP stdio not connecting to Claude Desktop

Symptom: Claude Desktop does not list Valter's tools, or shows a connection error.
Cause: Incorrect `claude_desktop_config.json` configuration.
Solution:
- Verify your Claude Desktop configuration file (location depends on OS):

  ```json
  {
    "mcpServers": {
      "valter": {
        "command": "python",
        "args": ["-m", "valter.mcp.stdio_server"],
        "env": { "PYTHONPATH": "/path/to/Valter/src" }
      }
    }
  }
  ```

- Common issues to check:
  - The `command` must point to the correct Python binary (use the full path if using a virtual environment: `/path/to/Valter/.venv/bin/python`)
  - `PYTHONPATH` must include the `src/` directory
  - Environment variables needed by Valter (database URLs, API keys) must be present in the `env` block or inherited from the shell
Restart Claude Desktop after changing the configuration.
### MCP remote returns 401 Unauthorized

Symptom: Remote MCP client receives 401 Unauthorized when calling tools.
Cause: Invalid or missing API key in the request.
Solution:
- Verify the API keys are configured on the server:

  ```shell
  # In .env — comma-separated list of valid keys
  VALTER_MCP_SERVER_API_KEYS=key1,key2
  ```

- Verify the client is sending the key correctly (as a Bearer token or in the configured header).

- Start the remote MCP server and check logs for auth errors:
  ```shell
  make mcp-remote
  # Watch for 401 entries in the structured log output
  ```

## Development

### ruff not found
Section titled “ruff not found”Symptom: make lint fails with ruff: command not found.
Cause: The virtual environment is not activated, or ruff is not installed.
Solution:
```shell
# Activate the virtual environment
source .venv/bin/activate

# If ruff is not installed
pip install ruff

# Then run lint
make lint
```

### Tests fail with async errors

Symptom: Tests fail with `RuntimeError: no current event loop` or `PytestUnraisableExceptionWarning` related to asyncio.
Cause: pytest-asyncio mode is not configured correctly.
Solution:
Verify that `pyproject.toml` has the correct asyncio mode:

```toml
[tool.pytest.ini_options]
asyncio_mode = "auto"
```

If the setting is correct but tests still fail, check that pytest-asyncio is installed:

```shell
pip install pytest-asyncio
```

### Type checking errors with mypy

Symptom: `make quality` fails at the mypy step with type errors.
Cause: Missing type stubs or strict typing violations.
Solution:
The quality target runs mypy only on a scoped subset of files (defined in MYPY_QUALITY_SCOPE in the Makefile). If you are adding new files to the scope, ensure they have complete type annotations on all public functions.
```shell
# Run mypy on just the scoped files
mypy --follow-imports=silent src/valter/api/deps.py src/valter/api/routes/ingest.py src/valter/mcp/tools.py
```

## Production

### Rate limiting blocks all requests

Symptom: All API requests return 429 Too Many Requests or 503 Service Unavailable, even at low traffic.
Cause: Redis is down and the rate limiter is configured to fail-closed. When Redis is unreachable, no rate limit checks can pass, so all requests are rejected.
Solution (immediate):
```shell
# Restart Redis
docker compose restart redis

# Verify Redis is responding
docker compose exec redis redis-cli ping
# Expected: PONG
```

Solution (permanent): This is tracked as the highest-priority fix for v1.0 — switching the rate limiter to fail-open for valid API keys when Redis is unavailable.
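The intended fail-open behavior can be sketched as follows. This is our illustration of the v1.0 plan, not Valter's actual limiter: `client` stands in for a redis-py connection, and the window logic is deliberately simple (fixed window):

```python
class FailOpenRateLimiter:
    """Fixed-window counter that admits requests when the backend errors,
    instead of rejecting everything while Redis is down."""

    def __init__(self, client, limit: int = 100, window: int = 60):
        self.client = client  # duck-typed: needs incr() and expire()
        self.limit = limit
        self.window = window  # window length in seconds

    def allow(self, api_key: str) -> bool:
        try:
            count = self.client.incr(f"ratelimit:{api_key}")
            if count == 1:
                # First hit in this window: start the TTL clock
                self.client.expire(f"ratelimit:{api_key}", self.window)
            return count <= self.limit
        except Exception:
            # Redis unreachable: fail open for this request rather than
            # turning an infrastructure outage into a full API outage
            return True
```

In production you would catch `redis.exceptions.ConnectionError` specifically rather than bare `Exception`, and likely log or emit a metric each time the limiter falls open.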
### High latency on graph endpoints

Symptom: Graph analytics endpoints take > 10s to respond or time out.
Cause: Complex graph traversals on large subgraphs, or Neo4j is under memory pressure.
Solution:
- Check Neo4j memory allocation — the default may be too low for the ~28,000 node / ~207,000 relationship graph
- Verify that graph indexes exist for frequently-queried properties (decision number, minister name, legal provision)
- For Neo4j Aura: check the instance tier and whether you are hitting query limits
Planned Feature — v1.1 will add a circuit breaker that opens after Neo4j hangs for > 5s, allowing requests to proceed without graph features rather than blocking indefinitely.
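As a rough illustration of that planned behavior (our sketch with assumed threshold and cooldown values, counting consecutive failures rather than detecting the > 5s hang itself; the shipped design may differ), a minimal circuit breaker looks like:

```python
import time

class CircuitBreaker:
    """After `threshold` consecutive failures, skip the graph call for
    `cooldown` seconds so requests proceed without graph features."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # circuit closed: call Neo4j normally
        if time.monotonic() - self.opened_at >= self.cooldown:
            # Half-open: let one request probe whether Neo4j recovered
            self.opened_at = None
            self.failures = 0
            return True
        return False  # circuit open: serve the response without graph data

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
```

Callers would wrap each Neo4j query in `allow()` / `record()`, degrading gracefully instead of blocking the request on a hanging traversal.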