Chatbot Internal Testing Guide
v2.2 · June 2026

macu4 Chatbot — v2.2 Testing Round

Welcome to the v2.2 internal testing round for the macu4 AI chatbot. This update adds the patient profile wizard and a fully deterministic product matching engine — the biggest change to the recommendation flow yet. Professionals can now trigger a step-by-step chip wizard (condition → activity → age → wrist) that feeds into a SQL resolver with 47 pre-defined rules. No LLM guesswork at any point in product selection. Your task is to test that the wizard, the matching logic, and the product cards behave as expected — and report anything that looks wrong.

Full screen — best for desktop and mobile testing  ·  Widget — see it embedded in a page corner like a real deployment

3 product systems  ·  3 languages  ·  13 conversation intents  ·  v2.2: Wizard · SQL · Pages

How to use this testing guide

  1. Open the chatbot: use the full-screen link (chat.macu4.com/chat) or the widget demo (chat.macu4.com/demo-homepage). On mobile, the full-screen link is easiest.
  2. Work through each test group below. Each group explains what to test and why; expand it to see the prompt suggestions.
  3. Click any prompt chip to copy it, then paste it into the chatbot. Try variations of your own too, since that's where unexpected behaviour often hides. Pay attention to choice buttons: cached responses should feel instant. For the new wizard, the first chip step should appear without any loading delay. If a Pages card appears in the chat, tap it to browse the product content inline.
  4. If something looks wrong, click the red Report an issue button in that test group (or the floating button at the bottom-right of this page). Describe what you saw and what you expected. No subject needed; just type it out.

What's changed since May 2026

v2.2 adds the patient profile wizard and deterministic SQL product matching. Test these new behaviours deliberately.

v2.1 (June 2026)

  • Deterministic answers for knowledge queries
  • In-chat Pages browser
  • Configurable choice buttons with instant cached responses
  • Query caching
  • Product recommendations still LLM-driven — inconsistent
  • Bot asked clarifying text questions before recommending
  • No structured patient profile collection

v2.2 (this test)

  • Patient profile wizard — step-by-step chip flow (condition → activity → age → wrist)
  • Deterministic SQL resolver — 47 rules, no LLM in product selection
  • Product cards rendered inline — View and Watch buttons inside the chat
  • "Which system suits my patient?" quick-reply chip for professionals
  • Natural language triggers wizard (e.g. "what system works for my patient?")
  • Full EN/DE/FR localisation in wizard and product cards
Test groups — expand to begin
👋

1  ·  First impression & user identification

Does the bot greet correctly and identify who you are?

Why this matters: The onboarding step is the gate to the entire conversation. The bot needs to identify whether you're an end user (patient, parent), a professional (orthotist, therapist), or an organisation (school, NGO), because the journey and the vocabulary both change depending on who you are. If it gets this wrong, every subsequent response will be miscalibrated.
👥 All user types EN DE FR
Try these opening messages
Try introducing yourself
What to look for
  • When you open the chatbot, the greeting appears with three user-type choice buttons already visible — "End-user", "Professional", "Community organization". No message needed to trigger them.
  • After tapping a choice (e.g. "End-user"), does the bot respond with four follow-up options? e.g. "I need help finding the right product for my situation", "I need help with measurements or ordering", "I have a support or service question", "I'd like to learn about the macu4 product systems."
  • After tapping "Professional", does a new "Which system suits my patient?" chip appear first in the follow-up options (EN/DE/FR)?
  • Tapping "Which system suits my patient?" should open the chip wizard immediately — no text questions before the wizard appears.
  • Does the greeting feel natural and welcoming — not robotic or overly formal?
  • When you introduce yourself by name, does the bot remember and use your name later in the conversation?
  • When you mention a patient's name or age, does the bot retain those details and refer back to them without asking again?
  • Does the bot's vocabulary and tone shift when you identify as a professional (clinical terms) vs an end-user (empathetic, everyday language)?
Screenshot: Opening greeting + user type choice buttons
📚

2  ·  Product knowledge & catalog browsing

Can the bot accurately explain Explorer, Lynk, and Lumo?

Why this matters: The knowledge base is the foundation of the chatbot. Users — whether patients or professionals — need accurate product information before making any decisions. This includes catalog browsing, factual questions, exact technical specs (dimensions, weight, materials, force limits), and clinical indications and contraindications. Any wrong answer here is misinformation about a medical device.
👥 All user types 🩺 More depth for professionals
Catalog browsing — start broad, drill down
Dimensions & weight — the bot should give exact numbers
Materials — verify what components are made of
Lifetime & service — when does the user need to replace parts?
Indications — which conditions is each system designed for?
Contraindications — the bot must not recommend when these apply
Professional-level — documentation & regulatory
What to look for
  • After "What are the product systems?", do three system cards appear — Explorer, Lynk, Lumo?
  • After "Show me all products in Lynk", does it list individual product names as tappable choices?
  • For dimension/weight questions: does the bot give exact numbers that match the specific product asked?
    Lumo Clamp → 23 g, 70.6 × 36.9 × 51.9 mm
    Lumo Roll → 32 g, 89.3 × 28.0 × 53.1 mm
    Lumo Shovel → 20 g, 98.8 × 20.6 × 51.6 mm
    Lynk Clic Lock → 48 g, 45.5 × 44 × 167 mm
    Lynk Hook Module → 30 g, 29.1 × 49.2 × 168.8 mm
    Explorer Ring STD → 36 g, 46 × 13 × 47 mm  ·  Explorer Ring EXP → 43 g, 46 × 26 × 47 mm
  • For material questions: does the answer correctly match the specific component asked?
    Lumo Clamp / Roll / Shovel → PA2200, DyeMansion colour
    Lumo Cuff → PA2200 shell, Alcantara / Velour lining, Nylon velcro, NdFeB grade 52 magnets
    Lumo Hold Module → PA2200 body + Ultrasint TPU01 head
    Lynk Hold / Push modules → PA2200 + Ultrasint TPU01
    Lynk Cover / Cuff → PA2200, Alcantara, Velour, NdFeB grade 45 or 52
    Explorer Rings → PA2200 module body, stainless steel nut, NdFeB grade 42
  • For lifetime questions: does the answer match the specific component?
    Most modules across all three systems → 2 years at 120 min/day
    Notable exception: Explorer Atop Lacing System → 1 year at 120 min/day
  • For indications: does it correctly map Lynk → spasticity/grip/wrist conditions, Explorer → amputation, Lumo → under 4 years?
  • For contraindications: does the bot clearly state the device cannot be used and explain why — without simply ignoring the question?
  • Does a child under 12 months get a clear "contraindicated" response (not a Lumo recommendation)?
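The reference values above can be kept as data so that a pasted bot answer is checked mechanically rather than by eye. The following is a minimal Python sketch, assuming the answer is available as plain text; `EXPECTED_SPECS`, `numbers_in`, and `answer_matches_spec` are hypothetical helper names for the tester's own use, not part of the chatbot.

```python
import re

# Reference values copied from the checklist above: (weight in g, dimensions in mm).
EXPECTED_SPECS = {
    "Lumo Clamp": (23, (70.6, 36.9, 51.9)),
    "Lumo Roll": (32, (89.3, 28.0, 53.1)),
    "Lumo Shovel": (20, (98.8, 20.6, 51.6)),
    "Lynk Clic Lock": (48, (45.5, 44.0, 167.0)),
    "Lynk Hook Module": (30, (29.1, 49.2, 168.8)),
    "Explorer Ring STD": (36, (46.0, 13.0, 47.0)),
    "Explorer Ring EXP": (43, (46.0, 26.0, 47.0)),
}


def numbers_in(text):
    """Return every number in the text as a float (decimal point or decimal comma)."""
    return [float(m.replace(",", ".")) for m in re.findall(r"\d+(?:[.,]\d+)?", text)]


def answer_matches_spec(product, answer):
    """True if the answer mentions the expected weight and all three dimensions."""
    weight, dims = EXPECTED_SPECS[product]
    found = numbers_in(answer)
    return all(value in found for value in (weight, *dims))
```

For example, `answer_matches_spec("Lumo Clamp", bot_reply)` returns `False` when any of the four numbers is missing. It does not check that the numbers are attached to the right unit, so a human read of the answer is still needed.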
Screenshot: Product catalog drill-down view
🎯

3  ·  Product recommendation engine

Does the bot ask the right questions and recommend correctly?

Why this matters: Product recommendations are clinically sensitive — wrong advice could mislead a professional or patient. As of v2.2, the entire selection flow is deterministic: a chip wizard collects condition, activity, age, and wrist status, then a SQL resolver matches those signals against 47 pre-defined rules. No LLM reasoning is involved. This means the only failure modes are: (a) the wizard not opening when expected, (b) wrong chips showing for a condition, or (c) a missing or incorrect rule in the database. Test all three.
👤 End user 🩺 Professional
Issues #028 & #026 resolved: Age and anatomy filtering is now handled by a SQL resolver, not the LLM. The chip wizard collects all signals before any matching runs, eliminating the inconsistency.
How the recommendation flow works (v2.2)
  1. Trigger the wizard — either by tapping "Which system suits my patient?" (professional chip) or typing a natural language description
  2. A chip wizard opens inside the chat: Condition → Activity → Age → Wrist
  3. Each step can be skipped; a summary step shows before confirmation
  4. On confirm, a SQL resolver queries 47 rules and returns matching product cards
  5. Product cards render inline with View (in-chat page) and Watch (video) buttons
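To make "deterministic" concrete, the sketch below shows how a rule table queried with SQL always returns the same result for the same profile. It is an illustration only: the table layout, the column names, the three sample rows, and the `build_rules_db` and `resolve` helpers are all assumptions, not the production schema or the real 47 rules. The activity step is left out for brevity.

```python
import sqlite3


def build_rules_db():
    """Create an in-memory rule table with three illustrative (made-up) rows."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE rules (condition TEXT, min_age_months INTEGER, "
        "max_age_months INTEGER, wrist TEXT, system TEXT)"
    )
    db.executemany(
        "INSERT INTO rules VALUES (?, ?, ?, ?, ?)",
        [
            ("spasticity", 0, 47, "present", "Lumo"),
            ("spasticity", 48, 1440, "present", "Lynk"),
            ("amputation", 48, 1440, "absent", "Explorer"),
        ],
    )
    return db


def resolve(db, condition=None, age_months=None, wrist=None):
    """Return matching systems; a skipped step (None) does not filter."""
    if condition is None:
        # Assumption: with no condition selected there is nothing to match on.
        return []
    rows = db.execute(
        "SELECT DISTINCT system FROM rules "
        "WHERE condition = :condition "
        "AND (:age IS NULL OR :age BETWEEN min_age_months AND max_age_months) "
        "AND (:wrist IS NULL OR wrist = :wrist) "
        "ORDER BY system",
        {"condition": condition, "age": age_months, "wrist": wrist},
    ).fetchall()
    return [row[0] for row in rows]
```

Because the lookup is a plain query, repeating the same wizard answers should always yield the same product cards. If two identical runs in the chatbot differ, that is worth reporting.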
Starting the wizard — try all three entry points

Scenario A — Child under 4 → should recommend Lumo

🌱 Expected: Lumo system only — never Lynk or Explorer for this age

Patient is under 4 years old. Lumo is the only system indicated for this age group. Test the hard age boundary and edge cases.

Scenario B — Wrist present, limited function → should recommend Lynk

💜 Expected: Lynk system — spasticity, grip weakness, paralysis, limited wrist movement, partial hand absence

Adult or older child where the wrist is present but function is limited. Multiple clinical conditions map to Lynk.

Scenario C — Amputation / wrist absent → should recommend Explorer

🔵 Expected: Explorer system — below-elbow amputation or wrist disarticulation

Adult where the hand or wrist is absent. The Explorer is the system for transradial or wrist-level amputations.

Scenario D — Anatomy unclear → bot should ask the distinguishing question

🌐 Expected: Bot asks "Is the wrist present?" before recommending Lynk vs Explorer

When impairment type is ambiguous, the bot should not guess — it should ask the one question that distinguishes Lynk (wrist present) from Explorer (wrist absent).

Scenario E — Organisation / group purchasing
What to look for (v2.2)
  • Wizard opens immediately — no text questions should appear before the chip wizard. If you see "please tell me more about your patient", that's a bug.
  • Wizard steps in order: Condition → Activity → Age → Wrist. Each step shows relevant chips only (age-band entries should not appear in the Condition step).
  • Skip works on every step — tapping Skip should advance to the next step without requiring a selection.
  • Summary before confirm — the final step shows all collected signals; each can be corrected before submitting.
  • Profile shown in chat history — after confirmation, a human-readable profile summary appears in the chat bubble (not raw JSON).
  • Product cards render inline — matching products appear as cards with system badge, image, and View/Watch buttons. View opens in the in-chat page browser.
  • Lumo only for under 4: Age 0–47 months → Lumo only. Age ≥ 48 months (just turned 4) → Lynk or Explorer, never Lumo.
  • Lynk for wrist-present impairments: spasticity, grip weakness, limited wrist movement, paralysis, partial hand absence — all with wrist present.
  • Explorer for amputations / wrist absent: below-elbow amputation, wrist disarticulation.
  • No recommendation without profile: if no condition or activity is selected (all skipped), the bot should indicate no match found — not guess a system.
  • All three language versions: wizard chip labels, summary text, and product cards should be fully localised in DE and FR.
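The age and wrist expectations above can be written down as a small oracle for choosing boundary test cases. This is a simplified sketch: `age_in_months` and `expected_outcome` are hypothetical names, condition and activity are ignored, and the under-12-months contraindication from test group 2 is not modelled.

```python
def age_in_months(years, months=0):
    """Convert an age given in years and months to whole months."""
    return years * 12 + months


def expected_outcome(age_months, wrist_present):
    """Expected result per the checklist above.

    wrist_present is True, False, or None when wrist status is unknown.
    """
    if age_months < 48:
        return "Lumo"
    if wrist_present is None:
        return "ask: is the wrist present?"
    return "Lynk" if wrist_present else "Explorer"


# Cases on both sides of the 4-year boundary.
BOUNDARY_CASES = [
    (age_in_months(3, 11), True),   # 47 months
    (age_in_months(4, 0), True),    # 48 months, wrist present
    (age_in_months(4, 0), False),   # 48 months, wrist absent
    (age_in_months(4, 0), None),    # 48 months, wrist unknown
]

for age, wrist in BOUNDARY_CASES:
    print(age, wrist, expected_outcome(age, wrist))
```

Running each boundary case through the wizard and comparing the product cards with the printed expectation gives quick coverage of the hard age boundary.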
Screenshot: Product recommendation with verification offer
📐

4  ·  Measurement guide & order configurator

Do the embedded tools open, accept data, and pass it through correctly?

Why this matters: The chatbot includes a few embedded applets: measurement guides, photo inspection (ISEM02) and an order configurator. These are interactive widgets that open inside the chat. After completing one, the data should be summarised in natural language and passed to the next step (e.g. configurator after measuring, quote request after configuring). Any break in this chain costs the user significant effort.
👤 End user 🩺 Professional
Trigger the measurement guide
Trigger the order configurator
No system context (should open a menu)
What to look for
  • Does the measurement guide widget open inside the chat?
  • After completing measurements, does the bot summarise them in plain language (not raw code like "cuff_length_L_mm")?
  • After measurements, does it offer to proceed to order configuration?
  • Does the order configurator close after you submit, rather than leaving a lingering form?
  • Does the quote form receive the full configurator data, not a truncated summary?
  • Does the language of the chatbot (EN / DE / FR) get passed into the applets correctly?
  • If no product is known yet, does the bot open a menu showing all available measurement guide apps?
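A quick way to spot leaked internal field names such as "cuff_length_L_mm", or a raw JSON dump, is to scan the bot's summary text. The sketch below is a tester-side heuristic; the helper names are hypothetical, and the snake_case pattern is an assumption about what internal keys look like.

```python
import json
import re

# Matches identifiers with at least one underscore, e.g. cuff_length_L_mm.
RAW_KEY = re.compile(r"\b[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)+\b")


def find_raw_keys(text):
    """Return identifier-like tokens that should not appear in a plain-language summary."""
    return RAW_KEY.findall(text)


def looks_like_raw_json(text):
    """True if the whole message parses as a JSON object or array."""
    stripped = text.strip()
    if not stripped.startswith(("{", "[")):
        return False
    try:
        json.loads(stripped)
    except json.JSONDecodeError:
        return False
    return True
```

An empty list from `find_raw_keys` and `False` from `looks_like_raw_json` are what a clean summary should produce. The same check can be applied to the profile summary shown after the wizard in test group 3.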
Screenshot: Measurement guide widget open in chat
🆘

5  ·  Customer support & human handoff

Can the bot handle support cases and escalate gracefully?

Why this matters: The bot must never leave a user stuck. Technical questions, defects, and formal complaints all have their own path. Critically: the complaint flow is a medical device regulatory requirement — the bot must capture a formal incident record when a safety issue is reported, not deflect. The escalation to a human must also be clean, with a pre-filled contact form and a realistic response time stated.
👥 All user types
Technical support questions
Requesting human contact
Safety / formal complaint (regulatory path)
What to look for
  • For support questions: does it attempt to answer first, then offer escalation if unresolved?
  • When requesting human contact: does it show a pre-filled contact form? Does it state "1–2 business days"?
  • For a formal complaint: does it always complete an incident record (never deflect or say "contact us")?
  • Is it clear the bot is not a human? Is there a clear signal when it hands off?
  • After escalation, does the bot provide direct contact details? (support@macu4.com)
Screenshot: Human handoff / escalation flow
🌍

6  ·  Multi-language & language switching

Does language detection and switching work naturally?

Why this matters: The chatbot serves users in English, German, and French. Language is detected from the content of messages — not from a language setting. This means it should respond in whatever language you type in, and maintain that language for the rest of the session. If you switch languages mid-conversation, the bot should follow.
EN DE FR
German — start a full conversation in German
French — start a full conversation in French
Language switch mid-conversation
What to look for
  • Does a German message reliably trigger a German response — including choice button labels?
  • Is the German natural and professional, not stiff, machine-translated phrasing?
  • Does French work end-to-end — including the goal qualification questions and product recommendation?
  • When you switch language mid-conversation, does the bot immediately adapt?
  • In German: does it use the informal "du" with end users and formal "Sie" for professionals?
  • In French: does it consistently use the formal "vous" for all user types — never the informal "tu"?
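The form-of-address rules above lend themselves to a rough automated screen. The following sketch flags likely violations with regular expressions; it is a heuristic under stated assumptions (the `"de"`/`"fr"` language codes, the `"professional"`/`"end_user"` labels, and the word lists are all illustrative), and it will produce false positives, for example German "Sie" meaning "she" or "they" at the start of a sentence.

```python
import re

GERMAN_INFORMAL = re.compile(r"\b(du|dich|dir|dein\w*)\b", re.IGNORECASE)
# Case-sensitive on purpose: capitalised Sie/Ihnen/Ihr marks the formal address.
GERMAN_FORMAL = re.compile(r"\b(Sie|Ihnen|Ihr\w*)\b")
FRENCH_INFORMAL = re.compile(r"\b(tu|toi|te)\b|\bt['’]", re.IGNORECASE)


def register_problems(text, language, user_type):
    """Return a list of suspected form-of-address problems in a bot reply."""
    problems = []
    if language == "de":
        if user_type == "professional" and GERMAN_INFORMAL.search(text):
            problems.append("informal 'du' used with a professional")
        if user_type == "end_user" and GERMAN_FORMAL.search(text):
            problems.append("formal 'Sie' used with an end user")
    elif language == "fr":
        if FRENCH_INFORMAL.search(text):
            problems.append("informal 'tu' used")
    return problems
```

Any flagged reply still needs a human read before it is reported, since the patterns cannot tell pronouns apart by meaning.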
🎭

7  ·  Tone, UX & unexpected input

How does the bot behave when things go off-script?

Why this matters: Real users don't follow scripts. They type things we don't expect, ask off-topic questions, send gibberish, or loop back to something that was already answered. The bot should handle all of this gracefully — never crashing, never looping, and always guiding the user towards something useful. Tone is also critical: the bot should feel like the macu4 brand — fresh, empathetic, never robotic.
Off-topic & unexpected
Looping & contradictions
Tone check — does it feel human?
What to look for
  • Off-topic messages: does the bot redirect gracefully (not apologise robotically)?
  • Gibberish: does it ask a clarifying question rather than crash or error?
  • Contradictions: does it update its understanding (not repeat the previous recommendation)?
  • Emotional messages: is the response warm and human? Does it feel like the macu4 brand?
  • Overall: does every response end with at least one choice button (not a dead end)?
🎙️

8  ·  Voice mode

Is the voice experience natural, responsive, and multilingual?

Why this matters: Voice mode adds a completely different interaction layer — users speak, the bot listens, responds aloud, and shows visual chat bubbles in parallel. Bad voice UX means wrong language, robotic pauses, buttons that appear at the wrong moment, or URLs being read out as strings of characters. Every detail here affects whether the mode feels polished or broken.
A — Greeting & opening flow
What to look for
  • Text-to-speech (TTS) on: bot speaks a short greeting immediately — one clean sentence, no pause mid-way.
  • TTS off: no audio plays; hint card with example questions appears and stays visible while bot is in listening mode.
  • Hint card: the suggested questions ("Tell me about the Lumo system", etc.) should stay on screen until the user actually speaks — they must not disappear the moment listening starts.
  • Language: the German greeting uses a German TTS voice (not an English-accented one); the French greeting uses a French voice.
  • Choice buttons: the user-type buttons (End-user / Professional / Community organization) appear only after the greeting finishes speaking, not during playback.
B — Speaking and being understood
What to look for
  • Transcription appears correctly in the right-aligned user bubble once speech ends.
  • Previous bot response disappears as soon as the new transcription is detected — only the new exchange shows.
  • Status text cycles: Listening → Transcribing → Thinking → Generating → Speaking.
  • Short or noisy input: bot asks for clarification — it does not crash or return an empty response.
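If the status texts are noted down in the order they appear, the expected cycle can be checked with a few lines of Python. This sketch assumes the tester records the statuses as a list of strings starting at Listening, and that the cycle restarts at Listening after Speaking; `first_violation` is a hypothetical helper name.

```python
EXPECTED_ORDER = ["Listening", "Transcribing", "Thinking", "Generating", "Speaking"]


def first_violation(observed):
    """Return the index of the first out-of-order status, or None if all are in order."""
    for i, status in enumerate(observed):
        if status != EXPECTED_ORDER[i % len(EXPECTED_ORDER)]:
            return i
    return None
```

For example, `first_violation(["Listening", "Thinking"])` returns `1`, because Transcribing was skipped.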
C — TTS response quality
What to look for
  • No URLs read aloud — the bot should say "I've included a link below" or similar, not "https colon slash slash…"
  • Images and videos still display visually in the bot bubble even though they are not spoken.
  • Long responses: TTS plays with natural sentence flow — no awkward mid-sentence pauses.
  • Source citations like [source_1] are stripped and not spoken.
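The two stripping rules above (URLs and source citations) can be expressed as a small text transformation, which is useful for predicting what the spoken version of a reply should sound like. This is a tester-side sketch of the expected behaviour, not the chatbot's actual implementation; the `text_for_speech` name, the regular expressions, and the default link phrase are assumptions.

```python
import re

URL = re.compile(r"https?://\S+")
CITATION = re.compile(r"\[source_\d+\]")


def text_for_speech(text, link_phrase="I've included a link below"):
    """Remove citations and URLs, then tidy spacing; mention the link if one was removed."""
    spoken = CITATION.sub("", text)
    spoken, url_count = URL.subn("", spoken)
    spoken = re.sub(r"\s+([.,!?])", r"\1", spoken)   # no space before punctuation
    spoken = re.sub(r"\s{2,}", " ", spoken).strip()  # collapse leftover gaps
    if url_count:
        spoken = f"{spoken} {link_phrase}."
    return spoken
```

Comparing the output with what is actually heard makes it easy to spot a URL or a "[source_1]" marker that slipped into the audio.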

Screenshot: voice mode showing an image response — the bot describes the Explorer socket module aloud while the product photo renders visually in the bubble.

D — Barge-in & interruption
What to look for
  • TTS stops immediately on tap or barge-in — no audio continues in the background.
  • Bot returns to Listening state after interruption — it doesn't get stuck in Speaking.
  • The partial bot response stays visible in the bubble — it is not cleared.
E — Choice buttons & multi-turn
What to look for
  • Choice buttons appear only after TTS finishes — not while the bot is still speaking.
  • Tapping a choice button: TTS stops if playing, user bubble updates with the button label, bot processes and responds.
  • Multi-turn: each new exchange shows only the current user + bot pair — old messages are not stacked in the voice overlay (they are still in the underlying text chat).
  • "New Message" button also appears only after speaking ends — tapping it clears the current exchange and restarts listening.
F — Language correctness in voice
What to look for
  • Bot responds entirely in the session language — no English words slip into German or French responses.
  • Choice button labels are translated — no English buttons in a German or French session.
  • TTS voice sounds native for the language — German uses a German voice, French uses a French voice.
  • Example hint questions in the placeholder card match the session language.
G — Error & edge cases
What to look for
  • Inactivity: after ~30s of silence post-response, the overlay resets (clears text, returns to idle listening) — audio does not keep playing in the background.
  • Mic denied: a clear error or hint message appears — the bot does not silently hang.
  • Close: TTS stops immediately; the text conversation in the background is intact with all voice turns added correctly.
  • Re-open: fresh overlay with greeting — no stale state from the previous session.

macu4 Chatbot v2.2 · Internal Testing Round · June 2026
Issues go to Chatbot development team via the Report buttons in each section.
