Case Study
Year: 2025-2026

AskMyDoc

A retrieval-augmented document assistant that evolved from a RAG prototype into a reliable multi-user AI product.

Overview

AskMyDoc began as a deliberate RAG learning project: upload a PDF, retrieve relevant passages, and answer questions with grounded context.

Over multiple iterations it evolved from a single-document demo into a multi-user product with authentication, persistent workspaces, contract-first APIs, lifecycle-aware document handling, and evidence-aware answer validation.

How It Evolved

V1

Learn the RAG pipeline

Built the core document workflow end to end: extraction, chunking, embeddings, retrieval, and grounded answer generation.

V2

Productize the experience

Added authentication, user-owned documents, conversation persistence, search, history, and a workspace model, moving the demo closer to a real AI product.

V3

Improve trust and reliability

Focused on contracts, ownership enforcement, lifecycle safety, structured answers, evidence-aware validation, and stronger test coverage.

Why I Built This

I didn't build AskMyDoc to chase novelty. I built it to understand what it actually takes to ship an AI application people can trust.

Each version answered a different engineering question:

V1: Can I build a complete RAG pipeline?
V2: Can that pipeline behave like a real product?
V3: Can the product stay reliable under real-world failure modes?

That progression turned the project from a PDF chatbot into a hands-on study of contracts, trust boundaries, persistence, and trustworthy AI behavior.

Architecture

Authenticate User
Upload + Validate PDF
Extract + Chunk
Embed + Index
Retrieve by Intent
Generate Structured Answer
Validate Evidence + Citations
Persist Workspace State
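
To make the Extract + Chunk step concrete, here is a minimal sketch of overlapping fixed-size chunking. It is an illustration only: `Chunk`, `chunk_text`, and the default `size` and `overlap` values are assumptions, and the project's actual splitting (done through LangChain) may work differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    index: int  # position of the chunk within the document
    start: int  # character offset where the chunk begins
    text: str


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[Chunk]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: sizes are in characters, not tokens.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap  # how far each new window advances
    chunks: list[Chunk] = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append(Chunk(index, start, text[start:start + size]))
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, which tends to help retrieval. Keeping `start` on each chunk is one way to support citations that point back to a location in the source document.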

Key Features

Authentication and ownership-aware document access
Persistent conversations, document history, and workspace navigation
Contract-first frontend/backend integration using OpenAPI-generated TypeScript types
Structured JSON answers with citations, answer status, and retrieval metadata
Evidence-aware fallback behavior when context is weak or validation fails
Lifecycle-aware upload and deletion flows with partial-failure handling
Integration and UI testing for auth, retrieval, recovery, and async state transitions
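
The structured-answer and evidence-aware fallback features can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the `Answer` and `Citation` shapes, the status labels, the fallback wording, and the exact-substring check are all hypothetical and not taken from the project's real schema or validation logic.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    chunk_id: str  # id of the retrieved chunk the quote should come from
    quote: str     # text the model claims supports the answer


@dataclass
class Answer:
    status: str  # assumed labels: "answered" or "insufficient_evidence"
    text: str
    citations: list[Citation] = field(default_factory=list)


# Assumed fallback wording, shown to the user when validation fails.
FALLBACK_TEXT = "I could not find enough support in the document to answer that."


def validate_answer(draft: Answer, retrieved: dict[str, str]) -> Answer:
    """Return the draft only if every citation quotes a retrieved chunk.

    `retrieved` maps chunk ids to the chunk text that was given to the model.
    """
    supported = bool(draft.citations) and all(
        c.chunk_id in retrieved
        and bool(c.quote.strip())
        and c.quote in retrieved[c.chunk_id]
        for c in draft.citations
    )
    if draft.status == "answered" and supported:
        return draft
    # Weak or unverifiable evidence: refuse rather than guess.
    return Answer(status="insufficient_evidence", text=FALLBACK_TEXT)
```

A draft with no citations, a citation to a chunk that was never retrieved, or a quote that does not appear in its chunk all collapse to the same fallback, so the frontend only has to handle two statuses in this sketch.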

Technical Decisions

FastAPI
Next.js
LangChain
ChromaDB
OpenAI
Supabase
Google OAuth
OpenAPI-generated TypeScript

What I Learned

RAG quality is only one part of the problem; product reliability becomes the harder challenge as the system grows.
Trusted, server-derived identity is safer than relying on client-supplied user identifiers for authorization.
Contract-first APIs reduce frontend/backend drift as request and response shapes get more complex.
Uploads, deletions, and cleanup need explicit lifecycle handling instead of simple success-or-failure assumptions.
A trustworthy AI product should surface uncertainty and refuse unsupported answers when evidence is weak.
Testing becomes essential once auth, ownership, persistence, and async recovery behavior enter the system.
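
Two of the lessons above, server-derived identity and explicit lifecycle handling, can be shown together in a small sketch of a deletion flow. Everything here is hypothetical: `DocState`, `Document`, `NotOwner`, `delete_document`, and the idea of passing cleanup steps as callables are illustrative assumptions, not the project's actual code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class DocState(Enum):
    READY = "ready"
    DELETING = "deleting"
    DELETE_FAILED = "delete_failed"  # cleanup incomplete; safe to retry
    DELETED = "deleted"


@dataclass
class Document:
    doc_id: str
    owner_id: str
    state: DocState = DocState.READY


class NotOwner(Exception):
    """Raised when the authenticated user does not own the document."""


def delete_document(
    doc: Document,
    session_user_id: str,  # assumed to come from the verified session, not the request body
    cleanup_steps: list[Callable[[str], None]],
) -> DocState:
    """Run each cleanup step in order and record partial failures."""
    if doc.owner_id != session_user_id:
        raise NotOwner(doc.doc_id)

    doc.state = DocState.DELETING
    for step in cleanup_steps:
        try:
            step(doc.doc_id)
        except Exception:
            # Record the partial failure instead of reporting success.
            doc.state = DocState.DELETE_FAILED
            return doc.state

    doc.state = DocState.DELETED
    return doc.state
```

In this sketch the cleanup steps would be things like removing vectors from the index and deleting the stored file. For a retry from `DELETE_FAILED` to be safe, each step would need to be idempotent.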