Personal AI Assistant
Production-grade RAG system with Langfuse observability
Semantic search + LLM orchestration with real-time monitoring and analytics
Try It Live
Ask Me Anything
Questions about my background, projects, skills, or experience
💡 Tip: Every interaction is monitored with Langfuse observability
How the System Works
1. Knowledge Base Storage
Six detailed documents are stored in Supabase (PostgreSQL with the pgvector extension). Each document is converted to a 1536-dimension vector using OpenAI's text-embedding-3-small model. Langfuse tracks embedding-generation cost and latency for optimization.
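The ingestion step can be sketched as below. The OpenAI call is stubbed out (`fake_embed` is a deterministic stand-in, not the real API) so the example runs standalone, and the document contents and categories are illustrative:

```python
# Sketch of the ingestion flow: embed each document, then store content,
# metadata, and vector together. In the real pipeline the embedding comes
# from client.embeddings.create(model="text-embedding-3-small", input=text).
import hashlib

EMBEDDING_DIM = 1536  # text-embedding-3-small output size

def fake_embed(text: str) -> list[float]:
    """Deterministic stand-in for the OpenAI embeddings API (assumption)."""
    digest = hashlib.sha256(text.encode()).digest()
    # Repeat the 32-byte digest to fill 1536 floats in [0, 1).
    return [digest[i % 32] / 255.0 for i in range(EMBEDDING_DIM)]

documents = [
    {"content": "James's professional background ...", "category": "bio"},
    {"content": "Project: RAG assistant with Langfuse ...", "category": "projects"},
]

# In production these rows land in a pgvector column; here, a plain list.
knowledge_base = [
    {**doc, "embedding": fake_embed(doc["content"])} for doc in documents
]
print(len(knowledge_base), len(knowledge_base[0]["embedding"]))  # 2 1536
```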
2. Semantic Search with Langfuse
Each question is converted to a vector embedding, and cosine similarity is computed against the stored document vectors. Langfuse spans capture search performance: query time, document ranking, and similarity scores. Query expansion ("education" → "education school university degree") is tracked so its effectiveness can be analyzed.
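The two pieces of this step are simple to show in isolation. A minimal sketch, assuming a small hand-written expansion table (the real mapping lives server-side and is not shown in this page):

```python
import math

# Hypothetical expansion table; "education" is the one example the page gives.
EXPANSIONS = {"education": "education school university degree"}

def expand_query(query: str) -> str:
    """Replace short keywords with richer phrases for better vector matching."""
    return " ".join(EXPANSIONS.get(w, w) for w in query.lower().split())

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

print(expand_query("education"))                       # education school university degree
print(round(cosine_similarity([1, 0], [1, 1]), 4))     # 0.7071
```

In production, pgvector computes the same cosine measure inside the database; the pure-Python version above just makes the math visible.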
3. Context Retrieval & Ranking
The top three most relevant documents (those above a 0.3 similarity threshold) are combined into the context. Langfuse tracks retrieval performance: document ranking, threshold-filtering decisions, and context-quality metrics that flag hallucination risk.
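The threshold-then-rank logic can be sketched as follows (the document IDs and scores are made up for illustration):

```python
def rank_documents(scored, threshold=0.3, top_k=3):
    """Keep documents above the similarity threshold, best-first, top_k max."""
    kept = [d for d in scored if d["similarity"] >= threshold]
    kept.sort(key=lambda d: d["similarity"], reverse=True)
    return kept[:top_k]

scored = [
    {"id": "bio", "similarity": 0.82},
    {"id": "skills", "similarity": 0.41},
    {"id": "hobbies", "similarity": 0.12},   # below threshold: filtered out
    {"id": "projects", "similarity": 0.65},
    {"id": "education", "similarity": 0.55},
]
top = rank_documents(scored)
print([d["id"] for d in top])  # ['bio', 'projects', 'education']
```

If nothing clears the threshold, the list comes back empty, which is the signal to answer "I don't have that information" rather than guess.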
4. AI Response with Monitoring
GPT-4o-mini generates responses using the retrieved context. Langfuse captures the LLM API calls, token usage (prompt vs. completion), latency, cost, temperature settings, and system-prompt effectiveness, enabling debugging of quality issues.
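The shape of the generation call can be sketched by showing how the retrieved context is folded into the message payload. The exact system-prompt wording below is an assumption, not the production prompt:

```python
def build_messages(question, context_docs):
    """Assemble the chat payload; the system prompt pins the model to context."""
    context = "\n\n".join(doc["content"] for doc in context_docs)
    system = (
        "Answer only from the context below. If the answer is not in the "
        "context, say you don't have that information.\n\n" + context
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": question}]

messages = build_messages("What projects has James built?",
                          [{"content": "Project: RAG assistant ..."}])
# In production this payload goes to:
# client.chat.completions.create(model="gpt-4o-mini", messages=messages, ...)
print(messages[0]["role"], messages[1]["role"])  # system user
```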
Technical Stack
Backend: Flask API
- Endpoints: /health, /chat, /chat-stream
- CORS: Enabled for cross-origin requests
- Error Handling: Graceful failures with user-friendly messages
Database & Vector Search
- PostgreSQL: Robust relational database
- pgvector: Native vector similarity search (cosine distance)
- IVFFlat Index: Optimized for fast vector operations
- Metadata Filtering: Category, source, document_type for precision
- Langfuse Tracing: Tracks query latency and retrieval performance
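A rough sketch of the SQL this layer runs. `<=>` is pgvector's cosine-distance operator, and the IVFFlat index accelerates it; table and column names here are illustrative, not the production schema:

```python
# DDL for the approximate-nearest-neighbor index (illustrative schema).
CREATE_INDEX = """
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
"""

def build_search_sql(filter_category: bool = False, top_k: int = 3) -> str:
    """Build a parameterized similarity query with optional metadata filter."""
    where = "WHERE category = %(category)s " if filter_category else ""
    return (
        "SELECT content, 1 - (embedding <=> %(query)s::vector) AS similarity "
        f"FROM documents {where}"
        f"ORDER BY embedding <=> %(query)s::vector LIMIT {top_k};"
    )

print(build_search_sql(filter_category=True))
```

Using `%(...)s` placeholders (psycopg-style) keeps the query safe from SQL injection; only the query vector and category are bound at execution time.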
AI & Observability
- OpenAI SDK: gpt-4o-mini for responses, text-embedding-3-small for vectors
- Langfuse Tracing: Spans for embedding, search, retrieval, generation
- Token Tracking: Monitor prompt/completion ratio and costs
- Latency Monitoring: Identify bottlenecks (search vs LLM)
- Quality Metrics: Similarity scores, threshold decisions, error rates
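The token-tracking arithmetic that a Langfuse dashboard surfaces is worth making concrete. The per-million-token prices below are gpt-4o-mini rates as of this writing and are an assumption; check current pricing before relying on them:

```python
# Back-of-envelope per-call cost from token counts (assumed gpt-4o-mini rates).
PRICE_PER_M = {"prompt": 0.15, "completion": 0.60}  # USD per 1M tokens

def call_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of one LLM call given its token usage."""
    return (prompt_tokens * PRICE_PER_M["prompt"]
            + completion_tokens * PRICE_PER_M["completion"]) / 1_000_000

usage = {"prompt_tokens": 1200, "completion_tokens": 300}
print(f"${call_cost(**usage):.6f}")  # $0.000360
ratio = usage["prompt_tokens"] / usage["completion_tokens"]
print(f"prompt/completion ratio: {ratio:.1f}")  # 4.0
```

A high prompt/completion ratio like this is typical of RAG: retrieved context dominates the prompt, so trimming context is the main cost lever.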
Quality & Safety
- Query Expansion: Short queries auto-expanded for better matching
- Anti-Hallucination: Strict system prompts + similarity thresholds
- Mode Switching: Professional vs creative based on context
- Langfuse Debugging: Identify which inputs cause quality issues
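The mode-switching idea can be sketched as a simple classifier over query signals. The keyword list is an illustrative assumption, not the production heuristic:

```python
# Hypothetical signal words that suggest a creative rather than professional
# answer; the real system may use richer context than keywords.
CREATIVE_HINTS = {"poem", "story", "joke", "imagine"}

def pick_mode(question: str) -> str:
    """Choose the system-prompt style for a query."""
    words = set(question.lower().split())
    return "creative" if words & CREATIVE_HINTS else "professional"

print(pick_mode("What is James's work experience?"))  # professional
print(pick_mode("Write a poem about debugging"))      # creative
```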