Personal AI Assistant
Production-grade RAG system with Langfuse observability
Semantic search + LLM orchestration with real-time monitoring and analytics
Try It Live
Ask Me Anything
Ask questions about my background, projects, and skills, or upload a job posting for analysis
💡 Tip: Every interaction is monitored with Langfuse observability
How the System Works
1. Knowledge Base Storage
Six detailed documents are stored in Supabase PostgreSQL with the pgvector extension. Each document is converted to a 1536-dimension vector using OpenAI's text-embedding-3-small. Langfuse tracks the cost and latency of embedding generation for optimization.
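A minimal sketch of that ingestion step, assuming a `documents` table with an `embedding vector(1536)` column (the table and column names here are illustrative, not necessarily the production schema):

```python
import numpy as np
import psycopg
from openai import OpenAI
from pgvector.psycopg import register_vector

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ingest(conn: psycopg.Connection, content: str, category: str) -> None:
    # One embedding call per document; text-embedding-3-small returns 1536 dims.
    resp = client.embeddings.create(model="text-embedding-3-small", input=content)
    embedding = np.array(resp.data[0].embedding)
    conn.execute(
        "INSERT INTO documents (content, category, embedding) VALUES (%s, %s, %s)",
        (content, category, embedding),
    )
    conn.commit()

with psycopg.connect("postgresql://...") as conn:  # Supabase connection string
    register_vector(conn)  # adapts numpy arrays to the pgvector column type
    ingest(conn, "James built a production RAG system with ...", "projects")
```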
2. Semantic Search with Langfuse
Each question is converted to a vector embedding and compared against stored documents by cosine similarity. Langfuse spans capture search performance: query time, document ranking, and similarity scores. Query expansion ("education" → "education school university degree") is also tracked so its effectiveness can be analyzed.
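A sketch of the search step, reusing the `client` and schema from the ingestion sketch above; the expansion map is illustrative, and the Langfuse wiring is shown separately further down:

```python
import numpy as np

# Illustrative expansion map: short queries are broadened before embedding.
EXPANSIONS = {"education": "education school university degree"}

def search(conn, query: str, k: int = 10) -> list[tuple[str, float]]:
    # Assumes register_vector(conn) was called on this connection.
    expanded = EXPANSIONS.get(query.lower().strip(), query)
    resp = client.embeddings.create(model="text-embedding-3-small", input=expanded)
    vec = np.array(resp.data[0].embedding)
    # <=> is pgvector's cosine-distance operator, so 1 - distance = similarity.
    return conn.execute(
        """
        SELECT content, 1 - (embedding <=> %s) AS similarity
        FROM documents
        ORDER BY embedding <=> %s
        LIMIT %s
        """,
        (vec, vec, k),
    ).fetchall()
```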
3. Context Retrieval & Ranking
The top 3 documents above a 0.3 similarity threshold are combined into the prompt context. Langfuse tracks retrieval performance: document ranking, threshold-filtering decisions, and context-quality metrics that help flag hallucination risk.
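The filtering logic is then only a few lines; returning no context when nothing clears the threshold lets the model refuse rather than guess:

```python
SIMILARITY_THRESHOLD = 0.3  # below this, a document is treated as irrelevant
TOP_K = 3

def build_context(results: list[tuple[str, float]]) -> str | None:
    # results: (content, similarity) pairs, already ordered best-first.
    kept = [(c, s) for c, s in results if s >= SIMILARITY_THRESHOLD][:TOP_K]
    if not kept:
        return None  # nothing relevant; the caller should refuse, not guess
    return "\n\n---\n\n".join(content for content, _ in kept)
```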
4. AI Response with Monitoring
GPT-4o-mini generates responses from the retrieved context. Langfuse captures LLM API calls, token usage (prompt vs. completion), latency, cost, temperature settings, and system-prompt effectiveness, which makes quality issues debuggable.
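Langfuse ships a drop-in replacement for the OpenAI client that records each call as a generation, including token usage, latency, and cost, without manual instrumentation. A sketch (the system prompt here is illustrative, not the production prompt):

```python
from langfuse.openai import OpenAI  # drop-in replacement for openai.OpenAI

client = OpenAI()

def answer(question: str, context: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0.2,  # low temperature for factual, grounded answers
        messages=[
            {"role": "system",
             "content": "Answer ONLY from the provided context. "
                        "If the context is insufficient, say so.\n\n" + context},
            {"role": "user", "content": question},
        ],
    )
    # Token usage, latency, and cost for this call now appear in Langfuse.
    return resp.choices[0].message.content
```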
Technical Stack
Backend: Flask API
- Endpoints: /health, /chat, /upload-job (a minimal sketch follows this list)
- File Processing: PDF (pypdf), DOCX (python-docx), TXT
- URL Scraping: BeautifulSoup for job-posting analysis
- CORS: Enabled for cross-origin requests
- Error Handling: Graceful failures with user-friendly messages
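A minimal sketch of the endpoint layer, assuming a `run_rag_pipeline` entry point like the one traced in the Langfuse sketch below (/upload-job omitted for brevity):

```python
from flask import Flask, jsonify, request
from flask_cors import CORS

app = Flask(__name__)
CORS(app)  # allow the portfolio frontend to call the API cross-origin

@app.get("/health")
def health():
    return jsonify(status="ok")

@app.post("/chat")
def chat():
    question = (request.get_json(silent=True) or {}).get("message", "").strip()
    if not question:
        return jsonify(error="Please include a 'message' field."), 400
    try:
        reply = run_rag_pipeline(question)  # hypothetical pipeline entry point
    except Exception:
        # In production, log the exception (it also shows up in Langfuse traces).
        return jsonify(error="Something went wrong; please try again."), 500
    return jsonify(reply=reply)
```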
Database & Vector Search
- PostgreSQL: Robust relational database
- pgvector: Native vector similarity search (cosine distance)
- IVFFlat Index: Optimized for fast vector operations (schema sketched after this list)
- Metadata Filtering: Category, source, document_type for precision
- Langfuse Tracing: Tracks query latency and retrieval performance
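The schema and index might look like the following (names illustrative); note that an IVFFlat index clusters existing rows, so it is best created after the initial data load:

```python
import psycopg

with psycopg.connect("postgresql://...") as conn:  # Supabase connection string
    conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS documents (
            id            bigserial PRIMARY KEY,
            content       text NOT NULL,
            category      text,
            source        text,
            document_type text,
            embedding     vector(1536)  -- text-embedding-3-small output size
        )
    """)
    # vector_cosine_ops matches the <=> operator used at query time;
    # 'lists' trades recall for speed (tune upward as the table grows).
    conn.execute("""
        CREATE INDEX IF NOT EXISTS documents_embedding_idx
        ON documents USING ivfflat (embedding vector_cosine_ops)
        WITH (lists = 100)
    """)
    conn.commit()
```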
AI & Observability
- OpenAI SDK: gpt-4o-mini for responses, text-embedding-3-small for vectors
- Langfuse Tracing: Spans for embedding, search, retrieval, and generation (wiring sketched after this list)
- Token Tracking: Monitor prompt/completion ratio and costs
- Latency Monitoring: Identify bottlenecks (search vs. LLM)
- Quality Metrics: Similarity scores, threshold decisions, error rates
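One way to wire those spans together (shown with a v3-style import from the Langfuse Python SDK; a sketch, not the exact production wiring) is the `@observe` decorator, which nests each step under a single trace per request:

```python
from langfuse import observe  # Langfuse Python SDK (v3-style import)

@observe()  # root span: one Langfuse trace per /chat request
def run_rag_pipeline(question: str) -> str:
    results = search(conn, question)   # search sketch above
    context = build_context(results)   # retrieval sketch above
    if context is None:
        return "I don't have information about that in my knowledge base."
    return answer(question, context)   # generation sketch above

# Adding @observe() to search/build_context as well nests their spans under
# this trace, so embedding, search, and LLM latency line up per request.
```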
Quality & Safety
- Query Expansion: Short queries auto-expanded for better matching
- Anti-Hallucination: Strict system prompts + similarity thresholds
- Mode Switching: Professional vs. creative based on context
- Job Analysis: 3 input methods (text/file/URL), dispatched as sketched after this list
- Langfuse Debugging: Identify which inputs cause quality issues
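A sketch of the three-way input dispatch for job analysis, assuming a Flask `request` object; the helpers shown (pypdf, python-docx, BeautifulSoup) match the stack listed above:

```python
from io import BytesIO

import requests
from bs4 import BeautifulSoup
from docx import Document
from pypdf import PdfReader

def extract_job_text(req) -> str:
    """Accept a job posting as an uploaded file, a URL, or raw text."""
    if "file" in req.files:
        f = req.files["file"]
        data = BytesIO(f.read())
        if f.filename.lower().endswith(".pdf"):
            return "\n".join(page.extract_text() or "" for page in PdfReader(data).pages)
        if f.filename.lower().endswith(".docx"):
            return "\n".join(p.text for p in Document(data).paragraphs)
        return data.getvalue().decode("utf-8", errors="replace")  # plain .txt
    payload = req.get_json(silent=True) or {}
    if payload.get("url"):
        html = requests.get(payload["url"], timeout=10).text
        return BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
    return payload.get("text", "")
```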
Why Langfuse Matters
Real-Time Observability
Every LLM call, vector search, and retrieval operation is tracked. Monitor performance, identify bottlenecks, and catch quality issues before users do.
Quality Debugging
Trace which documents caused hallucinations, which queries returned poor results, and which prompts underperformed. Fix issues systematically, not blindly.
Cost Optimization
Track embedding costs, token usage, and API calls. Identify wasteful queries and tune thresholds. Cost currently runs ~$0.001/query, and Langfuse shows exactly where the money goes.
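As a back-of-envelope check (the token counts are assumptions; the prices are OpenAI's published 2024 list prices and may change):

```python
# Illustrative per-query cost estimate.
EMBED_PRICE  = 0.02 / 1_000_000   # text-embedding-3-small, $/token
INPUT_PRICE  = 0.15 / 1_000_000   # gpt-4o-mini prompt tokens, $/token
OUTPUT_PRICE = 0.60 / 1_000_000   # gpt-4o-mini completion tokens, $/token

# Assumed typical query: short question, ~3 retrieved docs, medium answer.
query_tokens, prompt_tokens, completion_tokens = 20, 1_500, 300

cost = (query_tokens * EMBED_PRICE
        + prompt_tokens * INPUT_PRICE
        + completion_tokens * OUTPUT_PRICE)
print(f"${cost:.6f}")  # ≈ $0.000405, comfortably under the ~$0.001/query figure
```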
Production Readiness
Demonstrates enterprise thinking. Real production systems have observability built in from day one, not bolted on after issues appear.
Enterprise Value
Scalable Architecture
This same system powers enterprise RAG applications. Replace my resume with your company's knowledge base, policies, or customer data, and you have production infrastructure ready to scale.
Enterprise Applications
- Customer support knowledge bases
- Internal documentation search
- Compliance policy assistants
- HR & legal document analysis
- Employee onboarding automation
Key Benefits
- Instant answers from company knowledge
- Reduced support ticket volume
- Faster employee onboarding
- Consistent information delivery
- Langfuse monitoring for quality assurance