Personal AI Assistant

Production-grade RAG system with Langfuse observability

Semantic search + LLM orchestration with real-time monitoring and analytics

Try It Live

Ask Me Anything

Questions about my background, projects, skills, or experience

Try asking:

Hi! I'm an AI assistant that can answer questions about James Mendenhall's professional background, projects, and experience. Ask me anything!

💡 Tip: Every interaction is monitored with Langfuse observability

How the System Works

1. Knowledge Base Storage

Six detailed documents are stored in Supabase PostgreSQL with the pgvector extension. Each document is converted into a 1536-dimension vector using OpenAI's text-embedding-3-small. Langfuse tracks embedding generation cost and latency for optimization.
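The storage step above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `documents` table name, column names, and helper functions are assumptions, and the OpenAI call is shown only in a comment.

```python
# Sketch of the ingestion step, assuming a table with an
# "embedding vector(1536)" column; schema names are illustrative.

def to_pgvector_literal(embedding: list[float]) -> str:
    """Format a Python list as a pgvector text literal, e.g. "[0.1,0.2]"."""
    return "[" + ",".join(str(x) for x in embedding) + "]"

def build_insert(content: str, embedding: list[float]) -> tuple[str, tuple]:
    """Build a parameterized INSERT; a driver such as psycopg2 binds the %s."""
    sql = "INSERT INTO documents (content, embedding) VALUES (%s, %s::vector)"
    return sql, (content, to_pgvector_literal(embedding))

# The embedding itself would come from OpenAI's API, roughly:
#   client.embeddings.create(model="text-embedding-3-small", input=text)
# which returns a 1536-dimension vector for this model.
```

Keeping the SQL parameterized (rather than string-formatted) is what lets the driver handle escaping safely.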

2. Semantic Search with Langfuse

Each question is converted to a vector embedding, and cosine similarity is calculated against the stored documents. Langfuse spans capture search performance: query time, document ranking, and similarity scores. Query expansion ("education" → "education school university degree") is tracked for effectiveness analysis.
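The similarity math behind the search is simple to state. In production pgvector computes this in SQL (its `<=>` operator returns cosine *distance*, i.e. 1 − similarity), but a pure-Python version shows the formula:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|): 1.0 means same direction, 0.0 orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

Because OpenAI embeddings place semantically related text in similar directions, a high cosine similarity between question and document vectors signals topical relevance.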

3. Context Retrieval & Ranking

The top 3 most relevant documents (those above a 0.3 similarity threshold) are combined into the context. Langfuse tracks retrieval performance: document ranking, threshold filtering decisions, and context quality metrics that help identify hallucination risk.
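The "top 3 above 0.3" selection reduces to a filter, a sort, and a slice. A minimal sketch, with a hypothetical `select_context` helper name:

```python
def select_context(results: list[tuple[str, float]],
                   k: int = 3, threshold: float = 0.3) -> list[str]:
    """Keep documents scoring above the threshold, best-first, at most k.

    `results` pairs each document's text with its cosine similarity score.
    """
    kept = [(doc, score) for doc, score in results if score > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in kept[:k]]
```

If nothing clears the threshold this returns an empty list, which is exactly the signal a strict system prompt needs to refuse rather than hallucinate.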

4. AI Response with Monitoring

GPT-4o-mini generates responses using the retrieved context. Langfuse captures LLM API calls, token usage (prompt vs. completion), latency, cost, temperature settings, and system prompt effectiveness, enabling debugging of quality issues.
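The generation step boils down to assembling a grounded prompt from the retrieved documents. A sketch of that assembly; the prompt wording and function name are illustrative, and the actual API call is shown only as a comment:

```python
def build_messages(context_docs: list[str], question: str) -> list[dict]:
    """Assemble a grounded chat request from retrieved context documents."""
    system = (
        "Answer only from the context below. If the context does not "
        "contain the answer, say you don't know.\n\n"
        "Context:\n" + "\n---\n".join(context_docs)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

# The call itself would be roughly:
#   client.chat.completions.create(model="gpt-4o-mini",
#                                  messages=build_messages(docs, question))
```

Putting the context in the system message (with an explicit refusal instruction) is one common anti-hallucination pattern; Langfuse then records the full message payload per trace, so weak answers can be traced back to weak context.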

Technical Stack

Backend: Flask API

  • Endpoints: /health, /chat, /chat-stream
  • CORS: Enabled for cross-origin requests
  • Error Handling: Graceful failures with user-friendly messages
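A minimal sketch of this Flask surface, assuming the route names listed above; the handler bodies are stubs standing in for the real retrieval + generation pipeline:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
# CORS would be enabled in the real service, e.g. with flask_cors.CORS(app).

@app.route("/health")
def health():
    return jsonify({"status": "ok"})

@app.route("/chat", methods=["POST"])
def chat():
    question = (request.get_json(silent=True) or {}).get("message", "")
    if not question:
        # Graceful failure: a user-friendly message instead of a stack trace
        return jsonify({"error": "Please include a 'message' field."}), 400
    return jsonify({"answer": f"(stub) you asked: {question}"})
```

The `/chat-stream` variant would return a streaming response over the same pipeline; it is omitted here to keep the sketch short.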

Database & Vector Search

  • PostgreSQL: Robust relational database
  • pgvector: Native vector similarity search (cosine distance)
  • IVFFlat Index: Optimized for fast vector operations
  • Metadata Filtering: Category, source, document_type for precision
  • Langfuse Tracing: Tracks query latency and retrieval performance
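The pgvector pieces above correspond to a few lines of SQL. The statements below (held as Python strings) show the general shape; the table name, `lists` value, and filter column are illustrative defaults, not the project's actual values:

```python
# Enable the extension and build an IVFFlat index over cosine distance.
CREATE_EXTENSION = "CREATE EXTENSION IF NOT EXISTS vector;"

CREATE_INDEX = """
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
"""

# <=> is pgvector's cosine *distance* operator, so similarity = 1 - distance.
# The WHERE clause shows metadata filtering alongside vector search.
TOP_K_QUERY = """
SELECT content, 1 - (embedding <=> %s::vector) AS similarity
FROM documents
WHERE category = %s
ORDER BY embedding <=> %s::vector
LIMIT 3;
"""
```

IVFFlat trades exact search for speed by clustering vectors into `lists` buckets and probing only the nearest ones at query time.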

AI & Observability

  • OpenAI SDK: gpt-4o-mini for responses, text-embedding-3-small for vectors
  • Langfuse Tracing: Spans for embedding, search, retrieval, generation
  • Token Tracking: Monitor prompt/completion ratio and costs
  • Latency Monitoring: Identify bottlenecks (search vs LLM)
  • Quality Metrics: Similarity scores, threshold decisions, error rates
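The bottleneck analysis in the list above is just a comparison over per-span durations. A sketch of the idea; the span names mirror the pipeline stages, and the example numbers are made up for illustration:

```python
def bottleneck(spans: dict[str, float]) -> str:
    """Return the span name accounting for the most latency in a trace."""
    return max(spans, key=spans.get)

# Per-span durations (ms) of the kind Langfuse records for one trace.
trace = {"embedding": 45.0, "search": 12.0, "generation": 830.0}
```

Aggregated over many traces, this kind of breakdown is what tells you whether to optimize the vector index or the LLM call, since generation latency usually dominates.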

Quality & Safety

  • Query Expansion: Short queries auto-expanded for better matching
  • Anti-Hallucination: Strict system prompts + similarity thresholds
  • Mode Switching: Professional vs creative based on context
  • Langfuse Debugging: Identify which inputs cause quality issues
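Query expansion as described above can be sketched as a small synonym map applied only to short queries. The "education" entry comes from the example earlier in this page; the other entry and the word-count cutoff are assumptions for illustration:

```python
# Illustrative expansion map; the real system's synonym sets are not
# shown here, so the "skills" entry is an assumed example.
EXPANSIONS = {
    "education": "education school university degree",
    "skills": "skills technologies tools languages",
}

def expand_query(query: str, min_words: int = 3) -> str:
    """Expand short queries via the synonym map; leave long queries alone."""
    words = query.lower().split()
    if len(words) >= min_words:
        return query
    return " ".join(EXPANSIONS.get(w, w) for w in words)
```

Expanding only short queries keeps longer, already-specific questions untouched, while one-word queries gain enough surface area to match document embeddings.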