## WellJourney Chatbot – Technical Brief and Enhancement Plan

### Overview

The WellJourney chatbot is an assistant for people recovering after a hospital visit. It answers questions, guides users to the right support services, and can suggest local providers based on their needs. It focuses on practical help, simple explanations, and safe answers.
 
Current status: the chatbot answers healthcare questions from its built‑in knowledge base, suggests nearby providers by ZIP code, service, and insurance, responds quickly, remembers conversations when a session is used, and politely redirects off‑topic questions back to healthcare topics.

### What it does today (plain language)

- Answers questions about recovery, follow‑up care, and community services.
- Suggests providers near a ZIP code, filtered by service and (if provided) insurance.
- Delivers answers promptly over a streaming connection (SSE).
- Remembers your conversation when a session is used, so you don’t repeat yourself.
- Stays on topic: it’s for healthcare questions, not general chat.

### Intended Purpose

- Help patients, caregivers, and providers locate appropriate after-hospital services, understand options, and navigate next steps.
- Reduce load on staff by answering common questions from KB context and leveraging tooling for provider discovery.
- Keep conversations safe and on-topic; defer clinical decisions to licensed professionals.

### Who it’s for

- Patients who just left the hospital and want clear next steps.
- Caregivers helping a loved one find the right support.
- Providers who need a quick way to point people to the right services.

### How a chat works (simple view)

1) You ask a question in your own words.
2) The chatbot looks up helpful notes from its knowledge base.
3) If needed, it searches for nearby providers that match your request.
4) It writes an answer in clear language and shows it to you as it’s ready.
5) If you’re using a session, it saves the messages so the conversation can continue later.

### What information it uses

- A curated knowledge base of articles and notes inside the project.
- Provider information from our platform and trusted public sources (for example, government registries and service directories).
- If available, your user profile (to tailor answers), but only when securely provided.

### Safety and boundaries

- Not a clinician: it does not diagnose or prescribe. For urgent or clinical issues, it guides you to appropriate care.
- Stays on healthcare topics. If asked something unrelated, it will politely ask for a healthcare‑related question.
- Keeps answers short and clear, with follow‑up questions only when truly helpful.

### Examples you can try

- “Find physical therapy near 10001 that accepts Aetna.”
- “I was discharged after knee surgery—what home services can help?”
- “I need a mental health counselor near 19104 who takes Medicaid.”
- “What should I bring to my post‑op follow‑up appointment?”
- “Show providers for speech therapy next week in 94110.”

### Known limitations (plain)

- It knows what’s in its knowledge base; if that doesn’t cover your question, it may not have an exact answer.
- Provider information depends on external sources and may change.
- It doesn’t book appointments yet.
- It won’t answer off-topic questions.

### How it can be improved (plain)

- Expand the knowledge base and improve how results are picked.
- Add more tools (insurance verification, appointment scheduling, maps and distances).
- Make safety checks and evaluations more robust.
- Improve the interface with clearer citations and provider cards.

### Suggested next steps

- Set up a larger, regularly updated knowledge base.
- Add appointment and insurance tools.
- Store chat sessions in a database for scale and reliability.
- Add monitoring and quality checks so we know it’s working well.

### Technical appendix

### Architecture (Summary)

- Express server with routers for chat, providers, and sessions.
- LangChain-based agent loop with limited tool-call steps.
- Retriever builds embeddings from local KB at startup; no external vector DB required.
- Frontend demo consumes SSE to render streaming responses.

### Diagrams

For quick reference, the following diagrams mirror those in `chatbot-architecture.md`.

#### Component diagram

```mermaid
graph LR
  %% Groups
  subgraph Frontend
    UI[Next.js Chat UI]
  end

  subgraph "Chatbot Backend (Express)"
    Srv["API Server [server.js]"]
    ChatR["Chat Router [routes/chat.js]<br/>[/api/chat-stream]"]
    ProvR["Providers Router [routes/providers.js]<br/>[/api/providers/search]"]
    SessR["Sessions Router [routes/sessions.js]<br/>[/api/sessions]"]
    RAG["RAG Retriever [rag/retriever.js]"]
    Tool["Provider Search Tool [tools/providerSearchTool.js]"]
    Store["Session Store [storage/chatStore.js]"]
    Agg["Provider Search Aggregator [services/searchAggregator.js]"]
  end

  subgraph "External / Upstream"
    MainAPI["Main Backend API [MAIN_API_BASE]"]
    OpenAI["OpenAI API"]
    NPPES[NPPES]
    CMS[CMS]
    Medicaid[Medicaid]
    In211["211 / Socrata"]
    Trials[ClinicalTrials.gov]
  end

  %% Flows
  UI -->|SSE POST| ChatR
  Srv --> ChatR
  Srv --> ProvR
  Srv --> SessR

  ChatR -->|init on boot| RAG
  ChatR -->|messages| Store
  ChatR -->|LLM call| OpenAI
  ChatR -->|tool call| Tool
  Tool -->|HTTP POST| ProvR
  ProvR --> Agg
  Agg -->|search| MainAPI
  Agg --> NPPES
  Agg --> CMS
  Agg --> Medicaid
  Agg --> In211
  Agg --> Trials

  %% Styling
  class UI ui
  class ChatR,ProvR,SessR router
  class Srv,RAG,Tool,Agg service
  class Store storage
  class MainAPI,OpenAI,NPPES,CMS,Medicaid,In211,Trials external

  classDef ui fill:#E3F2FD,stroke:#1E88E5,color:#0D47A1,stroke-width:2,rx:6,ry:6
  classDef router fill:#FFF3E0,stroke:#FB8C00,color:#E65100,stroke-width:2,rx:6,ry:6
  classDef service fill:#E8F5E9,stroke:#43A047,color:#1B5E20,stroke-width:2,rx:6,ry:6
  classDef storage fill:#F3E5F5,stroke:#8E24AA,color:#4A148C,stroke-width:2,rx:6,ry:6
  classDef external fill:#FCE4EC,stroke:#D81B60,color:#880E4F,stroke-width:2,rx:6,ry:6
```

#### Chat interaction sequence

```mermaid
sequenceDiagram
  autonumber
  box rgba(227,242,253,0.6) Frontend
    participant UI as Chat UI<br/>[chatBotFrontend/app/page.jsx]
  end
  box rgba(232,245,233,0.6) Backend
    participant Chat as Chat Router<br/>[/api/chat-stream]<br/>[routes/chat.js]
    participant RAG as RAG Retriever<br/>[rag/retriever.js]
    participant Store as Session Store<br/>[storage/chatStore.js]
  end
  box rgba(252,228,236,0.6) External
    participant LLM as LLM<br/>[ChatOpenAI]
    participant Tool as Provider Search Tool<br/>[tools/providerSearchTool.js]
    participant Prov as Providers API<br/>[/api/providers/search]<br/>[routes/providers.js]
    participant Agg as Provider Search Aggregator<br/>[services/searchAggregator.js]
    participant Main as Main Backend API<br/>[MAIN_API_BASE]
  end

  UI->>+Chat: POST message (SSE)
  opt Has sessionId
    Chat->>Chat: Load session history
  end

  Chat->>+RAG: getRelevantDocuments(question)
  RAG-->>-Chat: Top-K chunks

  Chat->>+LLM: system + context + history + user
  alt LLM decides to call tool
    LLM-->>Chat: tool_calls: search_providers
    Chat->>+Tool: invoke({zip,date,service,insurance})
    Tool->>+Prov: POST /api/providers/search
    Prov->>+Agg: aggregatedProviderSearch(params)
    Agg->>+Main: POST MAIN_API_BASE/providers/search (Bearer token)
    Main-->>-Agg: results
    Agg-->>-Prov: merged provider list
    Prov-->>-Tool: {count, results}
    Tool-->>-Chat: tool result
    Chat->>LLM: ToolMessage(result)
  else No tool call
    Note over Chat,LLM: Direct completion with RAG context
  end
  LLM-->>-Chat: final answer

  par Stream tokens
    Chat-->>UI: SSE: { token }
    Chat-->>UI: SSE: { token } ...
  and Persist
    Chat->>Store: append user/assistant messages
  end
```

#### Chatbot flow diagram

```mermaid
flowchart LR
  %% Groups
  subgraph FE[Frontend]
    direction TB
    User([User])
    UI[Next.js Chat UI]
    Stream[SSE: stream tokens]
    User --> UI
  end

  subgraph BE[Chatbot Backend]
    direction TB
    Chat[["/api/chat-stream"]]
    HasSess{sessionId provided?}
    LoadSess[Load session history]
    RAG[RAG: retrieve relevant documents]
    Prompt[Compose system + context + history + user]
    LLM[LLM: ChatOpenAI]
    ToolNeeded{Tool call needed?}
    Persist[(Persist messages)]
  end

  subgraph EXT[External Services]
    direction TB
    Tool[Provider Search Tool]
    Prov[["/api/providers/search"]]
    Agg[Aggregate providers]
    OpenAI[(OpenAI API)]
  end

  %% Flows
  UI -->|POST message| Chat
  Chat --> HasSess
  HasSess -- Yes --> LoadSess
  LoadSess --> RAG
  HasSess -- No --> RAG
  RAG --> Prompt
  Prompt --> LLM
  LLM -->|inference| Chat
  Chat --> ToolNeeded
  Chat -->|LLM call| OpenAI
  OpenAI --> Chat
  ToolNeeded -- Yes --> Tool
  Tool --> Prov
  Prov --> Agg
  Agg --> Tool
  Tool --> LLM
  ToolNeeded -- No --> Chat
  Chat -->|SSE| Stream
  Stream --> UI
  Chat -->|append messages| Persist

  %% Styling
  class UI,User,Stream io
  class Chat,Prompt,RAG,LoadSess,LLM process
  class HasSess,ToolNeeded decision
  class Persist storage
  class Tool,Prov,Agg,OpenAI external

  classDef io fill:#E3F2FD,stroke:#1E88E5,color:#0D47A1,stroke-width:2,rx:6,ry:6
  classDef process fill:#E8F5E9,stroke:#43A047,color:#1B5E20,stroke-width:2,rx:6,ry:6
  classDef decision fill:#FFF3E0,stroke:#FB8C00,color:#E65100,stroke-width:2
  classDef storage fill:#F3E5F5,stroke:#8E24AA,color:#4A148C,stroke-width:2,rx:6,ry:6
  classDef external fill:#FCE4EC,stroke:#D81B60,color:#880E4F,stroke-width:2,rx:6,ry:6
```

### Detailed Implementation

#### Server and Configuration (`src/server.js`)

- Initializes CORS with allowlist from `ALLOWED_ORIGINS` (CSV). Allows credentials.
- Parses JSON requests up to 2 MB.
- Health endpoint: `GET /health` → `{ ok: true }`.
- Bootstraps RAG via `initRAG()` before attaching routes.
- Mounts routes under `/api`, `/api/providers`, and `/api/sessions`.
- Listens on `PORT` (default 4001).
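
The CORS allowlist behavior described above could be sketched roughly as follows. The helper names (`parseAllowedOrigins`, `makeOriginCheck`) are illustrative, not taken from the actual source; the callback shape matches what the `cors` package expects for its `origin` option.

```javascript
// Sketch: parse an ALLOWED_ORIGINS CSV into an allowlist and build a
// CORS origin check. Helper names are illustrative, not the real code.
function parseAllowedOrigins(csv) {
  return (csv || '')
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean);
}

// Returns a callback in the shape the `cors` package expects for `origin`.
function makeOriginCheck(allowlist) {
  return (origin, callback) => {
    // Allow non-browser clients (no Origin header) and allowlisted origins.
    if (!origin || allowlist.includes(origin)) return callback(null, true);
    callback(new Error(`Origin not allowed: ${origin}`));
  };
}
```

With `ALLOWED_ORIGINS="https://a.example,https://b.example"`, requests from `https://a.example` pass and others are rejected.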

#### Retrieval-Augmented Generation (RAG) (`src/rag/retriever.js`)

- KB directory: `chatBotbackend/kb` (markdown/text files recursively discovered).
- Chunking: `chunkSize = 1000`, `chunkOverlap = 200` using `RecursiveCharacterTextSplitter`.
- Embeddings: `OpenAIEmbeddings` computed once at startup and kept in memory.
- Similarity: cosine similarity over dense vectors; returns top `k=5` documents by default.
- If `OPENAI_API_KEY` is missing: logs a warning and returns an empty retriever (no context).
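
The retrieval step above amounts to cosine similarity over dense vectors followed by a top-k cut. A minimal sketch (the toy two-dimensional vectors are illustrative; the real ones come from `OpenAIEmbeddings`):

```javascript
// Cosine similarity between two dense vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Score every document against the query vector, sort descending, keep top k.
function topK(queryVec, docs, k = 5) {
  return docs
    .map((d) => ({ ...d, score: cosine(queryVec, d.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

This is the whole "vector DB": a linear scan over in-memory vectors, which is why startup warm-up and corpus size matter.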

#### Chat Orchestration and SSE (`src/routes/chat.js`)

- Input body: `{ message, role?, history?, sessionId? }`.
  - `message` required; truncated to 4001 chars if longer.
  - If `history` missing and `sessionId` provided, loads persisted messages for the session.
- Optional user profile: if `MAIN_API_BASE` and `Authorization: Bearer <token>` are present, fetches `GET {MAIN_API_BASE}/auth/me` and injects JSON profile into the system context.
- System messages:
  - `systemPrompt(role)` provides tone, safety, KB preference, and on-topic guard.
  - `KB Context (citations optional)` message injects retrieved passages in the form `- [source] content`.
- Model: `ChatOpenAI` with `model: 'gpt-4.1'` and `temperature: 0.2`.
- Tools: binds `providerSearchTool` as `search_providers`.
- Tool-calling loop: up to 4 iterations.
  - If AI returns tool calls, pushes the AI tool-call message, executes each tool, and pushes a `ToolMessage` with JSON result (or error) keyed by `tool_call_id`.
  - If no tool calls remain, returns the final text (joins content parts if array).
- SSE response: sends a single `data: { token: finalText }` chunk, then `event: end` and `[DONE]` marker.
- Persistence: if `sessionId` exists, appends both user and assistant messages to the session store.
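
The SSE wire format described above (one data chunk, then an end event with a `[DONE]` marker) could be produced by helpers like these; the function names are illustrative, not the actual code:

```javascript
// Format one SSE data frame carrying a JSON payload.
function sseData(payload) {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Format the terminating end event with the [DONE] marker.
function sseEnd() {
  return 'event: end\ndata: [DONE]\n\n';
}

// With an Express-style response, usage might look like:
//   res.write(sseData({ token: finalText }));
//   res.write(sseEnd());
//   res.end();
```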

#### System Prompt (`src/prompts/systemPrompt.js`)

- Encodes non-clinician boundaries, escalation guidance, KB-first policy, concise clarifications, and on-topic guard for healthcare.
- Includes role-specific tone hints: `patient`, `caregiver`, `provider`, `admin`.

#### Provider Search Tool (`src/tools/providerSearchTool.js`)

- Tool name: `search_providers`.
- Description: searches providers by `zip`, `date`, `service`, `insurance`.
- Input normalization: accepts raw string, `{ input: string }`, `{ input: object }`, or direct object.
- Endpoint: `POST {API_BASE_INTERNAL || http://localhost:PORT}/api/providers/search`.
- Timeout: `PROVIDER_SEARCH_TIMEOUT_MS` (default 8000 ms) with `AbortController`.
- Returns stringified JSON of the endpoint response; extensive logging for parameters, request, status, and counts.
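
The input normalization described above (raw string, `{ input: string }`, `{ input: object }`, or direct object) could be sketched as a single coercion function; exact behavior in the real tool may differ:

```javascript
// Sketch: coerce any of the accepted tool-input shapes into a params object.
function normalizeToolInput(raw) {
  let value = raw;
  // Unwrap { input: ... } wrappers first.
  if (value && typeof value === 'object' && 'input' in value) value = value.input;
  // Parse raw JSON strings; fall back to empty params on garbage.
  if (typeof value === 'string') {
    try {
      value = JSON.parse(value);
    } catch {
      value = {};
    }
  }
  return value && typeof value === 'object' ? value : {};
}
```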

#### Provider Aggregation (`src/services/searchAggregator.js`, `src/routes/providers.js`)

- `POST /api/providers/search` validates that at least `zip` or `service` is provided.
- Sources combined (in parallel; `Promise.allSettled`): internal DB, NPPES, CMS, Medicaid, 211 (Socrata), ClinicalTrials.
- De-duplication by lowercase `(name|address)` key.
- Forwards bearer token to internal DB search when present.
- Response shape: `{ count: number, results: any[] }`.
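
The de-duplication step could be sketched as below: merge the per-source result lists (the outputs of the `Promise.allSettled` fan-out) and drop duplicates keyed by lowercase `name|address`. The `name`/`address` field names are assumptions based on the key described above:

```javascript
// Merge provider lists from multiple sources, de-duplicating by a
// lowercase "name|address" key. First occurrence wins.
function dedupeProviders(lists) {
  const seen = new Set();
  const results = [];
  for (const list of lists) {
    for (const p of list) {
      const key = `${(p.name || '').toLowerCase()}|${(p.address || '').toLowerCase()}`;
      if (seen.has(key)) continue;
      seen.add(key);
      results.push(p);
    }
  }
  return { count: results.length, results };
}
```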

#### Sessions (`src/storage/chatStore.js`, `src/routes/sessions.js`)

- Storage: file-backed JSON under `chatBotbackend/storage/chats`.
- `POST /api/sessions` → `{ id }` (uses `crypto.randomUUID()` when available).
- `GET /api/sessions/:id` → `{ id, createdAt, updatedAt, messages }`.
- `appendMessage(sessionId, { role, content })` appends with timestamp and updates `updatedAt`.

### Notable Constraints

- In-memory embeddings: require warm-up at startup and limit corpus size unless optimized.
- File-based session store: simple but not scalable for multi-instance deployments.
- Single-tool integration path; additional tools require coordination and guardrails.

### Enhancement Opportunities

1) Retrieval quality and scale
   - Move from in-memory to a vector database (e.g., pgvector, Pinecone, Weaviate, Qdrant).
   - Refine the chunking strategy, add semantic metadata filters, and introduce hybrid BM25 + dense search.
   - Scheduled re-embeddings and KB ingestion pipeline with validation.

2) Prompting and grounding
   - Add explicit answer schemas (JSON mode) for structured steps, citations, and UI hints.
   - Provide deterministic refusal templates for out-of-scope queries (already started) and safety incidents.
   - Use reranking before prompt injection to improve relevance.

3) Tooling expansion
   - Insurance plan verification tool; appointment scheduling integration.
   - Geospatial filters and quality scores for providers; wait-time feeds.
   - User profile-aware personalization for eligibility and coverage guidance.

4) Observability and safety
   - Add tracing (OpenTelemetry) and prompt/token logs with PII scrubbing.
   - Add guardrails: sensitive-topic detectors, rate limiting, abuse filters.
   - Automated evaluation harness (golden Q&A, tool-call correctness, refusal accuracy).

5) Production readiness
   - Switch sessions to persistent DB (PostgreSQL) and add authn/z for API routes.
   - Horizontal scaling behind a gateway; sticky sessions or shared store.
   - Caching for provider search results (per-zip/service) with TTL.

6) Frontend UX
   - Rich citations UI with source chips and expand/collapse of KB snippets.
   - Tool result cards (providers) with facets, sorting, and map view.
   - Retry/continue buttons; “narrow my search” guided prompts.
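
The caching idea under "Production readiness" (per-zip/service provider results with a TTL) could start as a small in-process cache like the sketch below; class and method names are illustrative, and a shared store such as Redis would replace this in a multi-instance deployment:

```javascript
// Sketch: TTL cache keyed by lowercase "zip|service".
class TtlCache {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.map = new Map();
  }
  key(zip, service) {
    return `${zip || ''}|${service || ''}`.toLowerCase();
  }
  get(zip, service) {
    const k = this.key(zip, service);
    const entry = this.map.get(k);
    if (!entry) return undefined;
    // Evict entries older than the TTL.
    if (Date.now() - entry.at > this.ttlMs) {
      this.map.delete(k);
      return undefined;
    }
    return entry.value;
  }
  set(zip, service, value) {
    this.map.set(this.key(zip, service), { at: Date.now(), value });
  }
}
```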

### Step-by-Step Enhancement Plan

Phase 1 – Foundations (1–2 sprints)
- Introduce config-driven RAG: env for topK, chunkSize, overlap, rerank toggle.
- Replace file sessions with PostgreSQL tables (`chat_sessions`, `chat_messages`).
- Add request logging, OpenTelemetry traces, and error boundaries in routes.

Phase 2 – Retrieval upgrade (2–3 sprints)
- Adopt pgvector or Qdrant for embeddings; write ingestion CLI.
- Implement hybrid search: lexical + dense + optional rerank.
- Add document lineage metadata; expose citations in UI consistently.

Phase 3 – Tooling and personalization (2–3 sprints)
- Add insurance verification and appointment scheduling tools.
- Extend provider aggregator with quality, distance, and availability signals.
- Pass user profile hints consistently; add consent and profile freshness checks.

Phase 4 – Safety and evaluation (ongoing)
- Add topic and PHI detectors; deterministic refusal messages.
- Build an eval suite: coverage of top intents, refusal scenarios, tool-path accuracy.
- Integrate canary checks into CI with thresholds.

Phase 5 – UX polish (1–2 sprints)
- Streaming UI with partial citations; structured provider cards; map view.
- Session list view; share/export; follow-up suggestions.

### Risks and Mitigations

- Hallucination risk without strong grounding → Improve retrieval, add JSON answer schemas, enforce refusal templates.
- External API reliability → Add timeouts, retries, circuit breakers, and caching.
- Scale and cost → Cache frequent queries, batch embeddings, monitor token usage.

### Appendix: Key Files

- `src/routes/chat.js`: Orchestrates LLM, tools, RAG, sessions, SSE.
- `src/prompts/systemPrompt.js`: Role-aware system message and domain guard.
- `src/rag/retriever.js`: Embedding build and document retrieval.
- `src/tools/providerSearchTool.js`: Internal provider search tool.
- `src/services/searchAggregator.js`: Aggregates external and internal provider sources.
- `src/storage/chatStore.js`: Session storage (file-based).

### API Reference (Exact)

- Chat (SSE)
  - Request: `POST /api/chat-stream`
    - Body: `{ message: string, role?: 'patient'|'caregiver'|'provider'|'admin', history?: {role:'user'|'assistant',content:string}[], sessionId?: string }`
    - Headers: optional `Authorization: Bearer <token>` for `MAIN_API_BASE` user profile fetch.
  - Response: SSE
    - `data: {"token":"<finalText>"}` (single chunk)
    - `event: end` then `data: [DONE]`

- Providers
  - Request: `POST /api/providers/search`
    - Body: `{ zip?: string, date?: 'YYYY-MM-DD', service?: string, insurance?: string }`
  - Response: `{ count: number, results: any[] }`

- Sessions
  - `POST /api/sessions` → `{ id }`
  - `GET /api/sessions/:id` → `{ id, createdAt, updatedAt, messages: {role,content,ts}[] }`

### Environment Variables

- `OPENAI_API_KEY`: required for embeddings and LLM.
- `PORT`: chatbot backend port (default `4001`).
- `ALLOWED_ORIGINS`: CSV allowlist for CORS.
- `MAIN_API_BASE`: base URL of the main backend (for `auth/me` and provider DB search).
- `PROVIDER_SEARCH_TIMEOUT_MS`: tool HTTP timeout (default `8000`).
- `API_BASE_INTERNAL`: overrides base URL used by the provider tool.
- `NEXT_PUBLIC_API_BASE_URL` (frontend): URL of this chatbot backend for the demo UI.

### Limitations and Known Behaviors

- SSE currently returns one final chunk; per-token streaming is not implemented.
- Tool loop caps at 4 iterations; returns a generic error if iterations are exhausted.
- Request size limit is 2 MB.
- Provider search requires at least `zip` or `service`.
- When `OPENAI_API_KEY` is unset, RAG returns no context, degrading grounding quality.


