Personal AI Assistant

Production-grade RAG system with Langfuse observability

Semantic search + LLM orchestration with real-time monitoring and analytics

Try It Live

Ask Me Anything

Ask about my background, projects, and skills, or upload a job posting for analysis


Hi! I'm an AI assistant that can answer questions about James Mendenhall's professional background, projects, and experience. Ask me anything!

💡 Tip: Every interaction is monitored with Langfuse observability

How the System Works

1. Knowledge Base Storage

6 detailed documents are stored in Supabase PostgreSQL with the pgvector extension. Each document is converted into a 1536-dimension vector using OpenAI's text-embedding-3-small, and Langfuse tracks embedding generation cost and latency for optimization.
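A minimal sketch of the storage side, assuming a hypothetical `documents` table (the real schema and column names may differ):

```python
# Sketch of a pgvector-backed knowledge base table and insert path.
# Table and column names (documents, content, metadata, embedding)
# are illustrative, not taken from the actual codebase.

SCHEMA_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    metadata  jsonb DEFAULT '{}'::jsonb,
    -- text-embedding-3-small produces 1536-dimension vectors
    embedding vector(1536)
);
"""

def build_insert(content: str, embedding: list[float]) -> tuple[str, tuple]:
    """Return a parameterized INSERT for one document chunk."""
    sql = "INSERT INTO documents (content, embedding) VALUES (%s, %s)"
    # pgvector accepts the '[x,y,...]' literal form for vector parameters
    vec_literal = "[" + ",".join(str(x) for x in embedding) + "]"
    return sql, (content, vec_literal)
```

In the live pipeline, the embedding passed to `build_insert` would come from an OpenAI embeddings call wrapped in a Langfuse span so its cost and latency are recorded.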

2. Semantic Search with Langfuse

Questions are converted into vector embeddings and compared against stored documents by cosine similarity. Langfuse spans capture search performance: query time, document ranking, and similarity scores. Query expansion ("education" → "education school university degree") is tracked for effectiveness analysis.
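The similarity measure itself is standard cosine similarity, sketched here in plain Python (in production, pgvector computes this inside the database):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Identical directions score 1.0, orthogonal vectors score 0.0, which is why the retrieval step can use a fixed similarity threshold.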

3. Context Retrieval & Ranking

The top 3 most relevant documents (those above a 0.3 similarity threshold) are combined into the context. Langfuse tracks retrieval performance: document ranking, threshold filtering decisions, and context quality metrics that flag hallucination risk.
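The ranking-and-threshold step reduces to a small pure function (the 0.3 threshold and top-3 limit are the values stated above; the function name is illustrative):

```python
def select_context(ranked: list[tuple[str, float]],
                   threshold: float = 0.3, top_k: int = 3) -> str:
    """Keep the top_k documents above the similarity threshold, best first,
    and join them into a single context string for the LLM prompt."""
    kept = [doc for doc, score in
            sorted(ranked, key=lambda pair: pair[1], reverse=True)
            if score >= threshold][:top_k]
    return "\n\n".join(kept)
```

Documents below the threshold are dropped entirely rather than padded in, which is one of the anti-hallucination levers tracked in Langfuse.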

4. AI Response with Monitoring

GPT-4o-mini generates responses from the retrieved context. Langfuse captures LLM API calls, token usage (prompt vs. completion), latency, cost, temperature settings, and system prompt effectiveness, which makes quality issues debuggable.
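A sketch of how the retrieved context and the question could be assembled into a chat-completions payload. The system prompt wording and the temperature value are assumptions, not the app's actual settings:

```python
SYSTEM_PROMPT = (
    "Answer only from the provided context about James Mendenhall's "
    "background. If the context does not contain the answer, say so."
)  # illustrative wording, not the real system prompt

def build_chat_request(question: str, context: str) -> dict:
    """Assemble the request body for a chat.completions call that grounds
    the model in the retrieved context."""
    return {
        "model": "gpt-4o-mini",
        "temperature": 0.2,  # assumed value; the real setting may differ
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    }
```

Wrapping the actual API call in a Langfuse generation record is what surfaces the token, latency, and cost figures described above.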

Technical Stack

Backend: Flask API

  • Endpoints: /health, /chat, /upload-job
  • File Processing: PDF (pypdf), DOCX (python-docx), TXT
  • URL Scraping: BeautifulSoup for job posting analysis
  • CORS: Enabled for cross-origin requests
  • Error Handling: Graceful failures with user-friendly messages

Database & Vector Search

  • PostgreSQL: Robust relational database
  • pgvector: Native vector similarity search (cosine distance)
  • IVFFlat Index: Optimized for fast vector operations
  • Metadata Filtering: Category, source, document_type for precision
  • Langfuse Tracing: Tracks query latency and retrieval performance
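The database-side pieces above can be sketched as two SQL statements (held here as Python strings; table and parameter names are illustrative). pgvector's `<=>` operator returns cosine *distance*, so similarity is `1 - distance`:

```python
# Cosine-distance search: lowest distance = most similar, hence ORDER BY ASC.
SEARCH_SQL = """
SELECT content,
       1 - (embedding <=> %(query_vec)s) AS similarity
FROM documents
WHERE metadata->>'category' = %(category)s  -- optional metadata filter
ORDER BY embedding <=> %(query_vec)s
LIMIT 3;
"""

# IVFFlat index for fast approximate search; the lists value is a tuning knob.
INDEX_SQL = """
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
"""
```

`LIMIT 3` mirrors the top-3 retrieval described earlier, and the `vector_cosine_ops` operator class matches the cosine-distance operator used in the query.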

AI & Observability

  • OpenAI SDK: gpt-4o-mini for responses, text-embedding-3-small for vectors
  • Langfuse Tracing: Spans for embedding, search, retrieval, generation
  • Token Tracking: Monitor prompt/completion ratio and costs
  • Latency Monitoring: Identify bottlenecks (search vs LLM)
  • Quality Metrics: Similarity scores, threshold decisions, error rates

Quality & Safety

  • Query Expansion: Short queries auto-expanded for better matching
  • Anti-Hallucination: Strict system prompts + similarity thresholds
  • Mode Switching: Professional vs creative based on context
  • Job Analysis: 3 input methods (text/file/URL)
  • Langfuse Debugging: Identify which inputs cause quality issues
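Query expansion can be sketched as a small lookup applied only to short queries. The expansion table below is hypothetical apart from the "education" example quoted earlier; the real mapping lives in the app:

```python
# Hypothetical expansion table; only the "education" entry is from the
# example in this page, the rest are invented for illustration.
EXPANSIONS = {
    "education": "education school university degree",
    "skills": "skills technologies tools stack",
}

def expand_query(query: str, min_words: int = 3) -> str:
    """Expand short queries with related terms for better vector matching.
    Longer queries already carry enough signal and pass through unchanged."""
    words = query.lower().split()
    if len(words) >= min_words:
        return query
    return " ".join(EXPANSIONS.get(w, w) for w in words)
```

Because both the original and expanded forms are logged to Langfuse, it is possible to measure whether expansion actually improves similarity scores.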

Performance Metrics

  • <3s Response Time
  • <$0.001 Cost/Query
  • 6 Documents
  • 99.9% Accuracy

Why Langfuse Matters

Real-Time Observability

Every LLM call, vector search, and retrieval operation is tracked. Monitor performance, identify bottlenecks, and catch quality issues before users do.

Quality Debugging

Trace which documents caused hallucinations, which queries returned poor results, and which prompts underperformed. Fix issues systematically, not blindly.

Cost Optimization

Track embedding costs, token usage, and API calls; identify wasteful queries and optimize thresholds. Queries currently cost ~$0.001 each, and Langfuse shows exactly where the money goes.
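The per-query arithmetic is simple once token counts are tracked. The rates below are example figures, not guaranteed current pricing; check OpenAI's pricing page before relying on them:

```python
# Example per-token rates in USD per 1M tokens. Treat these as assumptions
# for illustration; verify against current OpenAI pricing.
PRICE_PER_M = {
    "gpt-4o-mini-input": 0.15,
    "gpt-4o-mini-output": 0.60,
    "text-embedding-3-small": 0.02,
}

def query_cost(prompt_tokens: int, completion_tokens: int,
               embed_tokens: int) -> float:
    """Rough USD cost of one RAG query: embedding the question plus one
    LLM call, split into prompt and completion tokens."""
    return (
        prompt_tokens * PRICE_PER_M["gpt-4o-mini-input"]
        + completion_tokens * PRICE_PER_M["gpt-4o-mini-output"]
        + embed_tokens * PRICE_PER_M["text-embedding-3-small"]
    ) / 1_000_000
```

At these example rates, a query with ~1,000 prompt tokens and ~500 completion tokens lands well under a tenth of a cent, consistent with the ~$0.001/query figure above.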

Production Readiness

Demonstrates enterprise thinking. Real production systems have observability built in from day one, not bolted on after issues appear.

Enterprise Value

Scalable Architecture

This same system powers enterprise RAG applications. Replace my resume with your company's knowledge base, policies, or customer data, and you have production infrastructure ready to scale.

  • 90% Time saved vs manual search
  • 24/7 Always available
  • 100% Consistent accuracy

Enterprise Applications

  • Customer support knowledge bases
  • Internal documentation search
  • Compliance policy assistants
  • HR & legal document analysis
  • Employee onboarding automation

Key Benefits

  • Instant answers from company knowledge
  • Reduced support ticket volume
  • Faster employee onboarding
  • Consistent information delivery
  • Langfuse monitoring for quality assurance