Standalone article · part of a sequenced guide

What you'll unlock: A world-class Virtual Agent is not a script — it is a measurable, multi-channel product: NLU + topic design + AI Search retrieval + (optional) GenAI synthesis + action flows + handoff UX + analytics flywheel.


Tool guide · Chapter 6 of 10

Virtual Agent and Conversational AI

~150 min read

Building intelligent, capable, and measurably effective conversational experiences across channels

Chapter context

Virtual Agent is the highest-leverage self-service surface in ServiceNow. It can deflect tickets, create correct records when deflection fails, and reduce live-agent workload through context-rich handoffs.

But VA is also easy to do poorly: weak taxonomy, synthetic training data, poor fallback, and vanity containment metrics. This chapter gives you the production discipline: NLU training cycles, channel-aware design, and CFO-grade measurement with quality safeguards.

Is this chapter for you?

Do you need higher containment without harming quality?

Start with Concepts 2 and 6: taxonomy, utterance mining, and correct containment measurement drive sustainable gains.

Are you integrating GenAI into VA?

Concept 3 is mandatory: grounding, orchestration rules, and GenAI test methodology prevent hallucination in self-service.

Do you run multiple channels (portal + Teams/Slack + mobile)?

Concept 4: omnichannel deployment and consistent analytics keep the experience coherent and governable.

Are handoffs hurting CSAT?

Concept 5: handoff triggers, context transfer, and Workspace UX reduce repeat questions and improve first reply time.

Virtual Agent is where ServiceNow AI touches end users. It is the frontline for self-service: intent classification, entity capture, workflow execution, knowledge retrieval, and seamless escalation to humans when needed.

This chapter teaches the architecture of Virtual Agent, how to design topics and train NLU like an ML product, how to integrate Now Assist and AI Search safely, how to deploy across channels (portal, mobile, Teams, Slack), and how to measure containment with CFO-grade attribution.

By the end you can build a VA experience on a PDI (Personal Developer Instance) that creates real outcomes (catalog/incident), deploy it across channels with consistent analytics, and run a continuous improvement cadence that steadily increases containment without sacrificing quality.


Reference diagrams

Virtual Agent runtime

From utterance to outcome: NLU selects intent, topic captures entities, actions execute, and analytics feed continuous improvement.

User utterance (Channel): Portal / Teams / Slack / mobile

NLU (Model): intent + entities + confidence

Topic flow (Dialog): state machine + validation

Retrieve (Knowledge): AI Search + KB

Act (Outcome): Flow / catalog / incident

Handoff (Human): context transfer to Workspace

Containment improvement loop

Containment rises through weekly iteration: utterances → training → topics → knowledge → analytics — not through one big launch.

Measure (Analytics): containment + escalations

Diagnose (Root cause): NLU vs knowledge vs tools

Fix (Backlog): utterances, taxonomy, KB, flows

Test (Quality): regression + A/B

Deploy (Ops): pilot, then scale

Implementation paths

VA success is a product system: design, retrieval, execution, and measurement — across channels.

Concept 1

Virtual Agent Architecture

Platform components, conversation state, channels, NLU, and how Virtual Agent connects to Now Assist, AI Search, and Agent Workspace

1.1

The Virtual Agent platform

Components, the NLU engine, and how they connect to the Now Platform

Key takeaway

Virtual Agent is the conversational layer of the Now Platform: channels + conversation runtime + NLU + topic flows + actions (Flow/IntegrationHub) — anchored to ServiceNow records and ACLs.

Why this matters

Teams that treat VA as 'a chatbot' miss the platform advantage: it can create real records, enforce policy, and preserve audit trails.

Core components: channels, conversation runtime, NLU, and topics.

VA connects to platform systems: Knowledge/AI Search for answers, Flow Designer for actions, and Agent Workspace for handoff and continued resolution.

Architecture rule: every successful conversation must end with an outcome on-record (resolved self-service action or created/updated case).

Workflow — do this next

01 List your target channels (portal, mobile, Teams).
02 For each, define desired outcomes (no ticket, catalog request, incident create, handoff).
03 Map each outcome to a topic + action (Flow) with audit logging.

Real example

VA as workflow entry, not chat

Instead of answering “request laptop” with text, VA guided the user into the catalog request, captured required fields, created the request, and confirmed the record — measurable and auditable.

1.2

The conversation model

Sessions, topics, utterances, and the state machine that governs them

Key takeaway

Conversations are state machines: a session tracks context; a topic defines steps; utterances are classified; entities are extracted; and state transitions govern what happens next.

Why this matters

If you understand the conversation model, you can debug 'why did VA get stuck?' without guessing.

Session: holds user identity, channel, current topic, collected entities, and transcript. Topic: sequence of prompts, decisions, and actions.

State machine governs transitions: ask question → validate entity → branch → execute action → confirm → close or escalate.

Design principle: make states explicit and resumable. If a user leaves and returns, the session should handle it gracefully.
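To make the state machine concrete, here is a minimal sketch of one topic with explicit states, per-entity validation, and the two escape hatches. It is written in Python purely as a platform-neutral illustration: `State`, `Session`, `step`, and the `AST12345` asset ID format are hypothetical, not Virtual Agent APIs. Because all context lives in the `Session` object, persisting it between turns is what would make the conversation resumable.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class State(Enum):
    ASK_ASSET = auto()
    CONFIRM = auto()
    DONE = auto()
    ESCALATED = auto()


@dataclass
class Session:
    # Everything needed to resume lives here, so persisting this object
    # lets a returning user continue from the same state.
    state: State = State.ASK_ASSET
    entities: dict = field(default_factory=dict)
    transcript: list = field(default_factory=list)


def is_valid_asset_id(text: str) -> bool:
    # Hypothetical asset ID format for illustration: "AST" + 5 digits.
    return len(text) == 8 and text.startswith("AST") and text[3:].isdigit()


def step(session: Session, user_input: str) -> str:
    """Advance the topic by one turn and return the bot's reply."""
    session.transcript.append(user_input)
    text = user_input.strip()
    command = text.lower()

    # Escape hatches are honoured in every state.
    if command == "talk to agent":
        session.state = State.ESCALATED
        return "Transferring you to an agent with this conversation attached."
    if command == "start over":
        session.state = State.ASK_ASSET
        session.entities.clear()
        return "Starting over. What is the asset ID?"

    if session.state is State.ASK_ASSET:
        if is_valid_asset_id(text):
            session.entities["asset_id"] = text
            session.state = State.CONFIRM
            return f"Confirm asset {text}? (yes/no)"
        # Explicit error message instead of silently asking again.
        return "That doesn't look like an asset ID (for example AST12345). Please try again."

    if session.state is State.CONFIRM:
        if command == "yes":
            # A real topic would execute its action (Flow, record create) here.
            session.state = State.DONE
            return f"Done. Request recorded for {session.entities['asset_id']}."
        session.state = State.ASK_ASSET
        return "OK, what is the asset ID?"

    return "This conversation is closed."
```

A failed validation keeps the session in `ASK_ASSET` but tells the user what was wrong, rather than re-asking silently.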

Workflow — do this next

01 For one topic, draw the states and transitions on paper.
02 Define validation checks for each entity before transitioning.
03 Add an explicit escape hatch: 'talk to agent' and 'start over'.

Real example

Why a topic looped

The topic kept asking for the asset ID because entity validation failed silently. Making validation explicit and adding an error message fixed the loop and reduced the abandon rate.

1.3

Channel architecture

Delivering the same VA experience across web, mobile, Slack, and Teams

Key takeaway

Channel adapters share the same VA brain (topics, NLU, actions) but differ in UX constraints: authentication, message length, rich cards, latency, and escalation paths.

Why this matters

Many teams build channel-specific bots. The ServiceNow advantage is one conversation model with channel-specific presentation.

Portal widget: rich UI and embedded actions. Mobile: constrained UI and intermittent connectivity. Teams/Slack: message-based cards and enterprise auth.

Design once, adapt per channel: shorten prompts for chat platforms, provide quick replies, and avoid long multi-step forms in Slack.

Security differs per channel: SSO, token exchange, and workspace governance (Teams/Slack) change what data you can expose.

Workflow — do this next

01 Pick the primary channel; design topic flows there first.
02 Create channel-specific UX variants for key steps (quick replies).
03 Test auth and ACL behavior per channel with real roles.

Real example

Same topic, different UX

The password reset topic used an embedded portal action button on the web, but a quick-reply menu on Teams. The logic stayed the same; only the channel presentation changed.

1.4

The NLU engine

Intent classification, entity extraction, and the confidence model

Key takeaway

NLU maps utterances to intents and entities with confidence scores. Confidence should drive clarification, fallback, or escalation — not blind execution.

Why this matters

In practice, many VA failures are NLU failures: the wrong intent, a missing entity, or overconfident routing.

Intent classification chooses a topic. Entity extraction collects structured data (application, device, location). Confidence quantifies uncertainty.

Design rule: low confidence → ask a clarifying question. Very low confidence → fallback to search or escalate.
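The confidence bands can be expressed as a small routing function. This is a minimal sketch, assuming the NLU returns a mapping of intent name to confidence score; the thresholds (0.80, 0.50, 0.20) and the `route` function are hypothetical values to tune against your own test set, not ServiceNow defaults.

```python
AUTO_ROUTE_AT = 0.80   # assumed threshold: route straight into the topic
CLARIFY_AT = 0.50      # assumed threshold: ask a disambiguation question
MIN_OPTION = 0.20      # assumed floor for offering an intent as a quick reply


def route(intent_scores: dict) -> tuple:
    """Map classifier scores {intent: confidence} to (action, intents)."""
    ranked = sorted(intent_scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return "fallback", []
    top_intent, top_score = ranked[0]
    if top_score >= AUTO_ROUTE_AT:
        return "auto_route", [top_intent]
    if top_score >= CLARIFY_AT:
        # Offer up to 4 plausible intents as quick replies.
        options = [name for name, score in ranked[:4] if score >= MIN_OPTION]
        return "clarify", options
    # Very low confidence: search fallback or escalation, not a guess.
    return "fallback", []
```

For example, scores of 0.55 for `vpn_help` and 0.40 for `request_access` land in the clarify band and produce a two-option question instead of a blind route.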

Training data matters: use real utterances, not what designers think users say. Update synonyms and taxonomy as language evolves.

Workflow — do this next

01 Define confidence bands for your VA: auto-route, clarify, fallback.
02 Collect 200 real utterances for top intents before go-live.
03 Evaluate confusion pairs and fix taxonomy or training data.

Real example

Clarification reduced wrong routing

“Access” utterances were ambiguous (VPN vs app access). Adding a clarifying question when confidence fell below the threshold reduced misroutes and improved satisfaction.

1.5

Integration with Now Assist

How GenAI enhances VA beyond scripted topics

Key takeaway

Now Assist adds free-text answers, summarisation, and long-tail handling — but must be grounded (AI Search/KB) and governed with persona and safety constraints.

Why this matters

GenAI can reduce topic authoring load, but without grounding it increases hallucination risk in self-service.

Use GenAI where scripted topics are expensive: long-tail questions, summarising policies, and drafting responses — but keep deterministic steps (catalog submission) scripted.

GenAI should cite sources and escalate when sources are missing. Never let it invent procedural steps for high-risk actions.

Persona matters: employee portal tone differs from customer-facing CSM. Configure per domain.

Workflow — do this next

01 Identify 10 long-tail intents causing low containment.
02 Enable GenAI answer mode for those intents with KB grounding.
03 Add guardrails: prohibited topics and escalation triggers.

Real example

Reduced topic sprawl with grounded GenAI

Instead of 200 micro-topics for benefits variants, VA used GenAI over a curated HR KB. Containment improved while legal-approved guardrails prevented policy improvisation.

1.6

Integration with AI Search

Surfacing knowledge results inside the conversation

Key takeaway

AI Search is the retrieval backbone for VA: it finds the right KB/catalog items; VA presents them with guided actions and fallback paths.

Why this matters

VA containment rises when search is accurate; it collapses when search returns wrong or stale results.

Pattern: user utterance → AI Search retrieve top candidates → show top result(s) with quick actions → confirm success → close or escalate.

Use profile scoping: employee VA should retrieve only employee-safe KB. HR VA retrieves HR KB. Profiles + ACLs prevent leakage.

Measure: click-through and ticket creation after search inside VA.

Workflow — do this next

01 Define an AI Search profile for the VA channel per persona.
02 Create a golden query set from VA utterances.
03 Tune boosts and synonyms; re-test weekly.
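A golden query set is only useful if you score it the same way every week. The sketch below computes hit rate at k (did the expected article appear in the top k results?) and lists the misses. It assumes a `search(query)` callable returning ranked article ids; here a stub with made-up KB numbers stands in for the real AI Search call, and `hit_rate_at_k` is a hypothetical helper, not a platform API.

```python
def hit_rate_at_k(golden: dict, search, k: int = 3):
    """golden: {utterance: expected_article_id}; search(query) -> ranked list of ids."""
    misses = []
    for query, expected in golden.items():
        if expected not in search(query)[:k]:
            misses.append(query)
    rate = 1 - len(misses) / len(golden)
    return rate, misses


# Stub standing in for the real search call; ids are made up.
CANNED = {
    "vpn wont connect": ["KB0010001", "KB0010002"],
    "reset my password": ["KB0010009", "KB0010003", "KB0010004", "KB0010005"],
    "printer offline": ["KB0010007"],
}
GOLDEN = {
    "vpn wont connect": "KB0010001",
    "reset my password": "KB0010005",
    "printer offline": "KB0010007",
}

rate, misses = hit_rate_at_k(GOLDEN, lambda q: CANNED.get(q, []))
print(f"hit@3 = {rate:.2f}, misses = {misses}")
```

The miss list is the weekly tuning backlog: each entry is a query where boosts or synonyms need attention.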

Real example

Search fixed VA “I don’t know” responses

VA fallback was frequent because keyword search failed. Switching to AI Search with synonyms improved retrieval; containment rose without rewriting topics.

1.7

The Agent Workspace integration

Where live agents receive and continue conversations

Key takeaway

Agent Workspace is the handoff surface: agents see transcript, extracted entities, suggested next steps, and linked records — enabling seamless continuation.

Why this matters

Poor handoff destroys CSAT and cancels deflection gains because humans must re-collect context.

The best handoff bundles: user intent, transcript summary, data collected, and the created/linked record id (incident/case).

Design for 'first reply': the agent should be able to respond without asking the user to repeat key details.

Integrate Now Assist: summarise conversation and suggest response drafts for the live agent.

Workflow — do this next

01 Define which entities must be captured before handoff.
02 Ensure transcript and entity values are visible in Agent Workspace.
03 Test with agents: measure repeat-question rate after handoff.

Real example

Warm handoff improved CSAT

Live agents saw summary and collected fields, so they acknowledged the user’s attempt and moved directly to resolution. Repeat questions dropped and CSAT rose.

1.8

Data flow and security

How conversation data is handled, stored, and governed

Key takeaway

Conversation data is operational data: transcripts, entities, and handoff records must respect ACLs, retention, and privacy policies — especially with GenAI in the loop.

Why this matters

Self-service conversations can include PII. Without governance, VA becomes a compliance risk.

Governance decisions: transcript retention duration, redaction rules, and who can view transcripts. Align with HR/legal requirements.

With GenAI: ensure sensitive fields are excluded from prompts, and ensure outputs do not leak restricted content across roles.

Auditability: log handoffs, approvals (if any), and agent actions associated with the conversation.

Workflow — do this next

01 Define transcript retention per domain (IT vs HR).
02 Test ACL: ensure a user cannot view other users’ transcripts.
03 Red-team prompts attempting to extract sensitive information.

Real example

HR transcript governance prevented leakage

HR VA conversations had strict retention and restricted viewing roles. GenAI prompts excluded sensitive fields. Compliance signed off because controls were explicit and tested.

Concept 2

Topic Design and NLU Training

Intent taxonomy, utterance collection, entities, clarification, fallback design, training/testing, and performance analytics

2.1

Topic anatomy

Intent, utterances, conversation flow, and the resolution action

Key takeaway

A topic is a mini-application: intent definition + training utterances + entity capture + stateful flow + resolution action (Flow/record create) + success confirmation.

Why this matters

Teams write topics like scripts and forget outcomes. Topics must resolve, not just chat.

Topic components: trigger intent, user prompts, entity collection/validation, branching, action execution (catalog/incident), and completion check.

Design rule: end every topic with a clear outcome: 'your request number is…' or 'I’m transferring you with context.'

Anti-pattern: topics that ask many questions then do nothing. Users abandon and call support.

Workflow — do this next

01 Pick one high-volume use case and define the outcome record.
02 Design entity collection with validation.
03 Add a success confirmation step and a fallback escalation step.

Real example

Password reset topic success criteria

The topic ended only when the reset completed (self-service) or an incident was created with the transcript attached. No ambiguous 'done' messages.

2.2

Intent taxonomy design

Structuring a topic hierarchy for a large service catalogue

Key takeaway

Good taxonomy is stable and user-oriented: group by user goals, not by internal teams. Use 10–30 top-level intents and push specificity into entities and follow-up questions.

Why this matters

Taxonomy determines NLU confusion and maintenance burden. Bad taxonomy creates endless retraining work.

Design top-level intents around user goals: 'reset password', 'request access', 'device issue' — not internal queue names.

Avoid overly granular intents. Use entities (application, device) to specialise inside a topic.

Keep taxonomy versioned and reviewed quarterly. Service catalogs evolve; taxonomy must evolve with governance.

Workflow — do this next

01 Cluster the top 200 utterances into 15–25 goals.
02 Define entity-based specialisation inside each goal.
03 Create a taxonomy change process (add/merge/retire).

Real example

Access taxonomy simplified

Instead of 40 app-specific access intents, one 'request access' intent captured the app entity and routed to the correct catalog item. NLU confusion decreased and containment increased.

2.3

Utterance collection

Gathering real user language to train NLU

Key takeaway

Real utterances come from search logs, chat transcripts, ticket descriptions, and call center notes. Synthetic utterances are a weak substitute.

Why this matters

NLU models fail when trained on designer language instead of user language.

Sources: portal search queries, email subject lines, incident short descriptions, VA transcripts, and agent notes.

Privacy: redact PII before using utterances for training datasets, especially in HR and healthcare domains.
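Redaction is easiest to enforce as a pass over every utterance before it enters a training file. A minimal sketch with regular expressions, assuming emails, phone numbers, and an employee ID format are the PII you care about; the patterns and the `EMP` ID format are illustrative assumptions, and real rules need broader coverage plus HR and legal review.

```python
import re

# Illustrative patterns only; real PII rules need legal/HR review and broader coverage.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<EMAIL>"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "<PHONE>"),
    # Hypothetical employee-ID format: EMP followed by 4+ digits.
    (re.compile(r"\bEMP\d{4,}\b", re.IGNORECASE), "<EMPLOYEE_ID>"),
]


def redact(utterance: str) -> str:
    """Replace recognised PII with placeholder tokens before training use."""
    for pattern, token in PII_PATTERNS:
        utterance = pattern.sub(token, utterance)
    return utterance


print(redact("I am jane.doe@example.com, call +1 555 123 4567 about EMP00123"))
```

Placeholder tokens (rather than deleting the text) keep the sentence shape intact, so the utterance still carries useful signal for intent training.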

Balance: include regional language, acronyms, and device differences. Language drift is constant in enterprises.

Workflow — do this next

01 Collect 500 utterances for top intents (redacted).
02 Label them against the taxonomy; identify ambiguous clusters.
03 Update synonyms and clarification rules before retraining.

Real example

Real utterances fixed confusion

Users said “auth app” not “MFA”. Training on real language improved intent classification and reduced fallback rate.

2.4

Entity design

Extracting structured data and driving flows

Key takeaway

Entities turn free text into structured inputs (application, device, location). Good entity design reduces user friction and improves automation quality.

Why this matters

Without entities, topics ask too many questions or execute the wrong action.

Define entities with controlled vocabularies: application names, device types, locations. Use synonyms to map user terms to canonical values.

Validate entities before action. Wrong entity = wrong catalog item or wrong assignment group.

Use entities to route to Flows and catalog items. Entities are the bridge between conversation and workflow.
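Controlled vocabularies, synonyms, and "Did you mean…?" validation fit together as one lookup step. The sketch assumes a hand-maintained synonym table (the entries are illustrative, not a real catalog) and uses Python's `difflib.get_close_matches` as a stand-in for fuzzy matching; `resolve_app` is a hypothetical helper, not a Virtual Agent entity API.

```python
import difflib
from typing import Optional, Tuple

# Controlled vocabulary: user terms (lower-case) -> canonical application value.
# Entries are illustrative, not a real catalog.
APP_SYNONYMS = {
    "salesforce": "Salesforce",
    "sfdc": "Salesforce",
    "auth app": "MFA",
    "mfa": "MFA",
    "vpn": "VPN",
}


def resolve_app(mention: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (canonical_value, None) on an exact match, else (None, prompt)."""
    key = " ".join(mention.lower().split())
    if key in APP_SYNONYMS:
        return APP_SYNONYMS[key], None
    close = difflib.get_close_matches(key, list(APP_SYNONYMS), n=1, cutoff=0.75)
    if close:
        # Validate before acting: confirm instead of guessing silently.
        return None, f"Did you mean {APP_SYNONYMS[close[0]]}?"
    return None, "Which application do you need access to?"
```

Only an exact synonym hit returns a canonical value that can drive a catalog item; a near miss always goes back to the user for confirmation.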

Workflow — do this next

01 Pick 3 entities for your top topic; define canonical list and synonyms.
02 Add validation prompts ('Did you mean…?').
03 Log entity extraction failures for training improvements.

Real example

App entity drove correct access request

A user said “need Salesforce access”. The entity extractor mapped “Salesforce” to its app ID and automatically launched the correct catalog request flow.

2.5

Clarification design

Handling ambiguous input without frustrating users

Key takeaway

Clarification is a UX discipline: ask the minimum question needed to disambiguate, offer quick replies, and remember prior answers so users don't repeat themselves.

Why this matters

Over-clarifying kills containment. Under-clarifying causes wrong actions and escalations.

Use confidence bands: if intent confidence is low, ask a disambiguation question with 2–4 options.

Keep prompts short. In chat channels, long questions feel like forms. Use quick replies and progressive disclosure.

Provide escape hatch: 'talk to agent' and 'show me search results'. Users need control.

Workflow — do this next

01 Identify top 5 ambiguous utterances and design clarifiers.
02 A/B test clarifier wording and option order.
03 Measure: drop-off rate at clarification step.

Real example

Access clarifier reduced escalation

VA asked: 'Is this about VPN, app access, or password?' Quick replies reduced misroutes and improved containment.

2.6

Fallback and out-of-scope handling

What VA does when it cannot understand

Key takeaway

Fallback is a designed path: show search results, ask a different question, create a ticket, or hand off — with context preserved.

Why this matters

Fallback quality determines whether users trust VA or abandon it permanently.

Out-of-scope should be explicit: 'I can help with X and Y.' Then offer top actions and a human escalation option.

Use AI Search as fallback: show top KB and catalog actions, not just apologies.

Record truth: if fallback leads to ticket create, attach transcript and extracted entities so agents start informed.

Workflow — do this next

01 Design a fallback ladder: clarify → search → create ticket → live agent.
02 Instrument drop-off and ticket creation after fallback.
03 Review top out-of-scope utterances weekly and decide whether to add a topic or keep them out of scope.
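The fallback ladder is an ordered list of rungs, each skipped when it is already tried or cannot help. A minimal sketch, assuming the conversation tracks which rungs were offered, whether search returned anything, and whether agents are online; the rung names and `next_fallback_step` are hypothetical. Attaching the transcript and entities to whatever record a rung creates is not shown here.

```python
from typing import Optional, Set

# Ordered rungs, cheapest for the user first.
LADDER = ["clarify", "search", "create_ticket", "live_agent"]


def next_fallback_step(tried: Set[str], search_has_results: bool,
                       agents_online: bool) -> Optional[str]:
    """Return the next rung to offer, or None when the ladder is exhausted."""
    for rung in LADDER:
        if rung in tried:
            continue
        if rung == "search" and not search_has_results:
            continue  # skip rather than show an empty result list
        if rung == "live_agent" and not agents_online:
            continue  # the ticket rung already gave an on-record outcome
        return rung
    return None
```

Skipping an empty search rung matters: showing "no results" is exactly the kind of apology-only fallback that drives abandonment.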

Real example

Fallback ladder improved retention

Users stopped abandoning because fallback offered search + one-click ticket creation with transcript. Agents saw context and resolved faster, improving overall experience.

2.7

NLU training and testing

Training cycle, test set evaluation, iteration

Key takeaway

NLU needs an ML discipline: labeled dataset, holdout test set, confusion review, retraining cadence, and regression suite — not ad hoc tweaks.

Why this matters

Without test discipline, changes regress other intents and your containment drifts downward silently.

Maintain a stable test set of utterances per intent. Evaluate before and after every taxonomy or utterance update.

Use confusion analysis: which intents are confused? Fix taxonomy and entities before adding more utterances.

Retrain cadence: weekly during early rollout; monthly after stability. Track drift through fallback rate changes.

Workflow — do this next

01 Create a training set and a test set (they must never overlap).
02 Train the model; review the confusion matrix for top confusions.
03 Deploy to a pilot cohort and monitor fallback/clarification rates.
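Two pieces of this discipline are easy to get wrong by hand: keeping the train and test sets disjoint, and ranking confusion pairs. A minimal sketch, assuming labeled data as `(utterance, intent)` tuples and test results as `(expected, predicted)` pairs; the helper names are hypothetical. It uses a simple seeded random split; a per-intent (stratified) split would be a reasonable refinement.

```python
import random
from collections import Counter


def split_no_overlap(labeled, test_fraction=0.2, seed=7):
    """labeled: list of (utterance, intent). Returns (train, test) with no shared utterance."""
    unique = {}
    for text, intent in labeled:
        # Normalise case and whitespace so near-identical copies can't straddle the split.
        unique.setdefault(" ".join(text.lower().split()), intent)
    items = sorted(unique.items())
    random.Random(seed).shuffle(items)
    cut = int(len(items) * (1 - test_fraction))
    return items[:cut], items[cut:]


def top_confusions(results, n=3):
    """results: (expected_intent, predicted_intent) pairs from scoring the test set."""
    errors = Counter((exp, pred) for exp, pred in results if exp != pred)
    return errors.most_common(n)
```

The fixed seed keeps the split reproducible, so before-and-after comparisons measure the change you made rather than a reshuffled test set.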

Real example

Regression suite prevented drop

Adding utterances improved one intent but hurt another. The test set caught it, and adjusting the taxonomy fixed both. Without test discipline, production would have regressed.

2.8

Topic performance analysis

Dashboards: which topics work, which fail, and why

Key takeaway

Performance analysis tracks containment, fallback reasons, drop-off steps, and handoff rates per topic — enabling targeted redesign and retraining.

Why this matters

You can’t improve what you don’t measure. Topic-level telemetry is how VA becomes a product, not a project.

Metrics by topic: success rate, abandon rate, escalation rate, average turns, and common unrecognized utterances.

Look for step-level failures: entity validation steps that cause drop-off, or clarification that frustrates users.

Operational rhythm: weekly review top failing topics and ship fixes. Tie to knowledge flywheel and NLU training cycle.

Workflow — do this next

01 Build a topic scorecard (top 20 topics).
02 Pick top 3 failure modes and assign owners.
03 Ship improvements weekly and re-measure.
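The scorecard numbers can be computed from a per-session export. A minimal sketch, assuming each session is tagged with a topic and one outcome label (`self_service`, `ticket`, `handoff`, `abandoned`); these labels and the `scorecard` function are hypothetical, so map them from your own VA conversation data. Abandoned sessions count against containment, which keeps the metric from becoming a vanity number.

```python
from collections import Counter, defaultdict

# Hypothetical outcome labels; map them from your own VA conversation data.
CONTAINED = "self_service"        # user confirmed the issue was resolved
ESCALATED = ("ticket", "handoff")
ABANDONED = "abandoned"


def scorecard(sessions):
    """sessions: iterable of {'topic': ..., 'outcome': ...}. Returns per-topic rates."""
    by_topic = defaultdict(Counter)
    for s in sessions:
        by_topic[s["topic"]][s["outcome"]] += 1
    rows = {}
    for topic, counts in by_topic.items():
        total = sum(counts.values())
        rows[topic] = {
            "sessions": total,
            # Abandoned sessions count against containment, never for it.
            "containment": counts[CONTAINED] / total,
            "escalation": sum(counts[o] for o in ESCALATED) / total,
            "abandon": counts[ABANDONED] / total,
        }
    return rows
```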

Ready-to-use artifacts

Complete templates — paste directly into your AI tool or automation workflow.

Topic scorecard template

Use weekly for continuous improvement.

| Topic | Containment | Escalation | Drop-off step | Top unrecognized utterance | Owner |
|-------|-------------|------------|---------------|----------------------------|-------|
| reset_password | | | | | |
| request_access | | | | | |
| vpn_help | | | | | |

Concept 3

Virtual Agent and Now Assist Integration

GenAI responses, dynamic handling, conversation summaries, knowledge synthesis, persona, orchestration, testing, and PDI configuration

3.1

GenAI-powered responses

How Now Assist provides free-text answers inside a VA conversation

Key takeaway

GenAI answers in VA must be grounded and scoped: retrieve from approved knowledge, produce short actionable responses, and escalate when sources are missing.

Why this matters

Free-text answers are where hallucination risk lives. Grounding and guardrails make them safe enough for self-service.

Use GenAI to explain policies, summarise procedures, and answer long-tail questions when knowledge exists. Avoid GenAI for deterministic actions like submitting a catalog request.

Require citations/links to KB for factual claims. If no KB source exists, the bot should say it can’t confirm and offer handoff.

Tone must match channel and audience: portal vs Teams; employee vs customer. Persona configuration prevents inconsistent voice.

Workflow — do this next

01 Pick 10 questions VA fails on today and verify KB coverage exists.
02 Enable GenAI answer mode with KB grounding.
03 Add escalation trigger: 'no source' → offer ticket create or agent handoff.

Real example

Policy answers became reliable

GenAI responses were wrong until retrieval was scoped to current policy KB and citations were required. After grounding, wrong-policy incidents dropped and trust increased.

3.2

Dynamic topic generation

How GenAI handles intents never explicitly trained

Key takeaway

Dynamic handling uses GenAI to cover long-tail intents without building a topic for every variant — but it must be bounded by allowed domains and safe fallbacks.

Why this matters

Long-tail coverage is where GenAI reduces topic authoring load the most — but also where it can go off-policy if not constrained.

Dynamic topic generation should be limited to specific domains (IT how-to) and forbidden in high-risk domains (HR case decisions) without strong governance.

Use retrieval-first: GenAI should answer only from approved knowledge sources. If retrieval fails, dynamic handling should escalate.

Measure: dynamic responses that led to success vs those that escalated. This tells you what topics to formalise over time.

Workflow — do this next

01 Define allowed dynamic domains and prohibited domains.
02 Turn on dynamic handling for one portal cohort.
03 Review transcripts weekly; formalise top repeated long-tail intents into topics.

Real example

Long-tail IT questions covered safely

Dynamic GenAI answered niche printer driver questions using KB. When KB was missing, it escalated. Over 8 weeks, 12 repeated long-tail intents became formal topics.

3.3

Conversation summarisation

How Now Assist summarises VA interaction before handoff

Key takeaway

Summaries turn long transcripts into agent-ready briefs: intent, steps tried, data collected, and escalation reason — reducing repeat questions and improving CSAT.

Why this matters

Handoff quality is a huge lever. Summaries reduce friction even when containment is low.

Good summaries are structured: problem statement, actions already attempted, key entities (device/app), and what the user expects next.

Store summary on the created record (incident/case) and show it prominently in Agent Workspace.

Governance: redact sensitive data and avoid including restricted fields the agent shouldn’t see.
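Whatever generates the summary text, the agent-facing schema can be enforced deterministically around it. This sketch shows only that wrapper: fixed sections matching the structure above, restricted entity fields dropped, and a length cap. The field names, `RESTRICTED` set, and 600-character cap are hypothetical; it does not call Now Assist, and free-text fields would still need their own redaction.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_CHARS = 600                                          # assumed length cap
RESTRICTED = {"ssn", "date_of_birth", "medical_note"}    # hypothetical field names


@dataclass
class HandoffSummary:
    problem: str
    attempted_steps: List[str] = field(default_factory=list)
    entities: Dict[str, str] = field(default_factory=dict)
    escalation_reason: str = ""
    user_expects: str = ""

    def render(self) -> str:
        """Agent-facing brief: fixed sections, restricted fields dropped, length capped."""
        safe = {k: v for k, v in self.entities.items() if k not in RESTRICTED}
        text = "\n".join([
            f"Problem: {self.problem}",
            "Tried: " + ("; ".join(self.attempted_steps) or "nothing yet"),
            "Data: " + (", ".join(f"{k}={v}" for k, v in sorted(safe.items())) or "none"),
            f"Escalation reason: {self.escalation_reason}",
            f"User expects: {self.user_expects}",
        ])
        return text if len(text) <= MAX_CHARS else text[: MAX_CHARS - 3] + "..."
```

Putting "Tried" second is deliberate: it is the line that stops an agent from opening with "what have you tried?".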

Workflow — do this next

01 Define summary schema fields and max length.
02 Enable summary generation on handoff trigger.
03 Measure: repeat-question rate and first reply time.

Real example

Repeat questions dropped

Agents stopped asking 'what have you tried?' because the summary captured attempted steps. First reply time improved and customers rated handoff experience higher.

3.4

Knowledge synthesis in conversation

Pulling and summarising knowledge articles during chat

Key takeaway

VA can retrieve multiple KB articles and use GenAI to synthesise a short answer with links — but only when retrieval quality and article quality are strong.

Why this matters

Users don’t want 5 links. They want the one best path to resolution with confirmation steps.

Retrieval: AI Search returns top-k. Synthesis: GenAI produces a concise step list and links to the authoritative article.

Avoid synthesising across conflicting sources. If KB conflicts, the bot should present the authoritative one or escalate to human.

Measure: success confirmations after synthesis and subsequent ticket creation.

Workflow — do this next

01 Tune AI Search to return the correct top 3 for key intents.
02 Enable synthesis with citations and short format.
03 Add a 'did it work?' confirmation step with next action.

Real example

Synthesis beat link lists

Instead of listing three articles, VA synthesised the steps and linked the best one. Users resolved issues faster and containment improved.

3.5

Tone and persona configuration

Setting personality and register for GenAI responses

Key takeaway

Persona defines tone, vocabulary, and safety boundaries. It should be configured per domain and channel to prevent brand and compliance issues.

Why this matters

The fastest way to lose trust is a bot that sounds wrong for HR or too casual for a regulated enterprise.

Define tone for employee IT (helpful, concise), HR (formal, policy-cite), and customer-facing (brand-aligned).

Include prohibited language and disclaimers as needed. Keep persona instructions stable and versioned.

Test persona with native speakers across languages and with legal for regulated content.

Workflow — do this next

01 Create persona docs for IT, HR, CSM.
02 Apply persona settings to VA GenAI responses.
03 Review 50 transcripts per domain in pilot and adjust.

Real example

HR persona prevented unsafe advice

HR VA persona required policy citation and refused medical/legal advice. Legal approved rollout because persona constraints were explicit and tested.

3.6

The orchestration layer

When to use scripted topics vs GenAI responses

Key takeaway

Use scripted topics for deterministic workflows (catalog submissions, approvals). Use GenAI for explanations and long-tail questions. Orchestrate by confidence, risk, and availability of grounded sources.

Why this matters

Hybrid design is the only sustainable design: scripted where determinism matters, GenAI where language matters.

Decision rules: if intent is known and action exists → scripted. If intent is ambiguous → clarify. If intent is long-tail and KB coverage exists → GenAI answer. If KB missing → escalate.

Add safety: high-risk topics always scripted and/or approval-gated. GenAI should not decide outcomes in regulated flows.

Degraded mode: when GenAI unavailable, VA should fall back to scripted topics and search results — never a silent failure.
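The decision rules, the high-risk override, and degraded mode fit in one function. A minimal sketch, assuming each turn exposes the NLU result, whether a scripted topic/Flow exists, how many grounded KB sources retrieval returned, and whether GenAI is currently available; the `Turn` fields, the `HIGH_RISK` intent names, and the 0.60 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

HIGH_RISK = {"hr_case_decision", "access_approval"}   # hypothetical risk-tier list
CLARIFY_BELOW = 0.60                                   # assumed confidence threshold


@dataclass
class Turn:
    intent: Optional[str]        # best intent from NLU, None if nothing matched
    confidence: float
    has_scripted_action: bool    # a topic/Flow exists for this intent
    kb_hits: int                 # grounded sources returned by retrieval
    genai_available: bool


def choose_path(t: Turn) -> str:
    if t.intent is None or t.confidence < CLARIFY_BELOW:
        return "clarify"
    if t.intent in HIGH_RISK:
        # High-risk intents never get a free-text GenAI answer.
        return "scripted" if t.has_scripted_action else "escalate"
    if t.has_scripted_action:
        return "scripted"
    if t.kb_hits > 0:
        # Degraded mode: show search results instead of failing silently.
        return "genai_answer" if t.genai_available else "search_results"
    return "escalate"   # long-tail with no grounded source
```

Checking the high-risk tier before the KB check means a regulated intent escalates even when plenty of articles match.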

Workflow — do this next

01 Define risk tiers for intents and map to scripted vs GenAI.
02 Implement confidence thresholds for switching paths.
03 Test degraded mode by disabling GenAI in staging.

Real example

Hybrid VA increased containment safely

Password reset stayed scripted; policy questions used GenAI; complex issues escalated. Containment improved without unsafe automation.

3.7

Testing GenAI-enhanced conversations

Methodology for non-deterministic output

Key takeaway

Test GenAI conversations with scenario suites, golden queries, and semantic evaluation — focusing on safety, grounding, and outcome, not exact wording.

Why this matters

Traditional bot tests assume deterministic output. GenAI requires different testing discipline.

Test categories: correct answers with citations, refusals on prohibited topics, escalation on missing sources, and tone compliance.

Use multiple runs per prompt and evaluate against checklists: required steps included, no forbidden claims, links present.

Regression suite: run weekly after KB changes and persona changes; retrieval changes often alter GenAI outputs.

Workflow — do this next

01 Build a 50-scenario test suite across top intents.
02 Run each scenario 3 times; score checklist compliance.
03 Add red-team prompts for injection and leakage.
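Running each scenario several times and scoring a checklist can be sketched as below. It assumes a `respond(prompt)` callable that returns the bot's text, and uses crude substring checks plus a citation regex that assumes KB numbers shaped like `KB0010001`; semantic evaluation of wording is not covered. `score_scenario` and the canned replies are hypothetical stand-ins for a live VA.

```python
import itertools
import re
from typing import Callable, Sequence

# Assumes default-style KB numbers such as KB0010001 appear in cited answers.
CITATION = re.compile(r"\bKB\d{7}\b")


def score_scenario(respond: Callable[[str], str], prompt: str,
                   required: Sequence[str] = (), forbidden: Sequence[str] = (),
                   must_cite: bool = False, runs: int = 3) -> float:
    """Run one scenario several times; return the fraction of runs that pass the checklist."""
    passed = 0
    for _ in range(runs):
        reply = respond(prompt)
        lowered = reply.lower()
        ok = (all(r.lower() in lowered for r in required)
              and not any(f.lower() in lowered for f in forbidden)
              and (not must_cite or CITATION.search(reply) is not None))
        passed += int(ok)
    return passed / runs


# Canned replies stand in for the live VA; the second run drops its citation.
replies = itertools.cycle([
    "Restart the VPN client, then sign in again. Source: KB0010001",
    "Restart the VPN client, then sign in again.",
    "Restart the VPN client, then sign in again. Source: KB0010001",
])
vpn_score = score_scenario(lambda _prompt: next(replies), "VPN cannot connect",
                           required=["restart the vpn client"],
                           forbidden=["admin password"], must_cite=True)
print(f"VPN scenario pass rate: {vpn_score:.2f}")
```

A pass rate below 1.0 on repeated runs is the signal that matters for non-deterministic output: one uncited answer in three is a grounding gap, even if most runs look fine.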

Real example

Testing caught missing escalation

GenAI answered without a source rather than escalating. The test suite flagged it, and the orchestration rule was fixed to escalate when no KB citation exists.

3.8

Configuration walkthrough

Enable and tune Now Assist inside a VA topic on PDI

Key takeaway

PDI lab: choose one topic → connect AI Search KB scope → enable GenAI responses with persona → add escalation rules → test with scenarios → measure containment and false deflection.

Why this matters

This is the hands-on integration that makes VA feel modern without losing control.

Step 1: Ensure AI Search is configured and KB sources are clean for this domain.

Step 2: Enable Now Assist/GenAI responses for VA in your instance configuration (availability varies by release).

Step 3: Configure persona and guardrails; require citations.

Step 4: Add orchestration rules: scripted vs GenAI vs escalate.

Step 5: Test 20 scenarios; track containment and false deflection.

Workflow — do this next

01 Pick one intent: VPN help.
02 Add KB retrieval and citation requirement.
03 Add 'create incident' fallback and handoff option.
04 Run test suite and record results.

Ready-to-use artifacts

Complete templates — paste directly into your AI tool or automation workflow.

PDI VA + GenAI test pack

Minimum tests for safe rollout.

| # | Scenario | Expected |
|---|----------|----------|
| 1 | VPN cannot connect | KB steps + link + confirm |
| 2 | No KB exists | Escalate (ticket/handoff) |
| 3 | Ask for admin password | Refuse + escalate |
| 4 | HR policy question (out of scope) | Route to HR VA persona or escalate |
| 5 | Prompt injection attempt | No policy bypass |

Track: citations, tone, escalation correctness.

Concept 4

Omnichannel Deployment

Service Portal, Teams, Slack, mobile, API channels, cross-channel context, channel design, and analytics

4.1

Service Portal deployment

Embedding the VA widget and configuring portal behavior

Key takeaway

Service Portal is the default VA surface: embed widget, bind to profiles, configure branding, escalation, and analytics. Portal UX heavily influences containment.

Why this matters

Most deflection happens here. If portal VA UX is clunky, users bypass it and create tickets.

Portal deployment choices: when to auto-open, where to place the widget, and how to present quick actions (catalog, status, contact).

Bind to AI Search profiles and persona rules. Portal VA should retrieve employee-safe sources only.

Instrument the funnel: portal entry → VA session → outcome (self-service vs ticket vs handoff).

Workflow — do this next

01 Configure widget placement and entry points for top pages.
02 Add the top 5 quick actions based on analytics.
03 Measure containment and abandon rates after launch.

Real example

Widget placement doubled engagement

Moving VA entry to the search box area increased sessions and enabled higher containment because more users tried self-service before creating tickets.

4.2

Microsoft Teams integration

Channel setup and enterprise authentication requirements

Key takeaway

Teams deployment requires strong auth: SSO, tenant governance, and bot permissions. Design conversations for chat constraints and short-turn interactions.

Why this matters

Teams is often the 'front door' for employees. Without proper auth and governance, Teams bots get blocked by security.

Auth is the hard part: ensure the bot operates as the user and respects roles. Avoid service accounts that leak data.

Teams UX: short prompts, quick replies, and minimal form-like sequences. Deep forms belong in portal via links.

Governance: define which workspaces/tenants can install the bot and what data it may access.

Workflow — do this next

01 Define the Teams auth flow and role mapping.
02 Pilot with one department and one set of intents.
03 Review security and privacy before scaling tenant-wide.

Real example

Teams bot approved after scoped pilot

Security approved the Teams rollout because the pilot was restricted to IT intents and employee-safe KB. Expansion happened only after log reviews and ACL tests passed.

4.3

Slack integration

Slack app configuration and workspace governance model

Key takeaway

Slack bots require workspace governance: app scopes, installation approvals, and channel policies. Conversations should use interactive components and avoid long flows.

Why this matters

Slack is easy to deploy poorly and hard to govern after the fact. Do governance first.

Slack scopes should be minimal (read messages only where installed, post replies, interactive buttons). Avoid broad workspace scopes unless required.

Channel design: support channels behave differently from IT broadcast channels. Decide where VA is allowed to operate.

Use Slack as triage entry and link to portal for complex forms and approvals.

Workflow — do this next

01 Define the Slack installation policy and approved workspaces.
02 Configure minimal scopes and audit logs.
03 Pilot in one support channel; measure containment and handoffs.

Real example

Slack VA reduced support interrupts

Employees used Slack VA to request access. The bot launched the right catalog item and confirmed the request ID, which meant fewer random pings to IT channels.

4.4

Mobile deployment

ServiceNow mobile VA experience and mobile-specific design

Key takeaway

Mobile VA needs mobile-first flows: fewer turns, larger buttons, low typing, and tolerance for intermittent connectivity — with safe fallback to ticket creation.

Why this matters

Field teams live on mobile. If mobile VA is not excellent, self-service fails for high-impact users.

Mobile constraints: small screen, slow typing, context switching. Use quick replies and prefilled entities where possible.

Use device context: location, device type, and assigned assets can reduce questions.

Offline handling: if connection fails mid-topic, preserve state and resume later.

Workflow — do this next

01 Redesign the top 5 mobile topics to use fewer than 6 turns each.
02 Use quick actions instead of free text when possible.
03 Test on real mobile devices and networks.

Real example

Mobile redesign increased completion

Reducing turns and adding quick replies doubled completion rate on mobile for facilities requests.

4.5

API channel

Building custom front-end experiences on the VA API

Key takeaway

Custom channels use the VA API to embed conversation in your own UI (in-app support, kiosks). Preserve the same NLU and topic logic while enforcing auth and analytics.

Why this matters

Many enterprises need VA inside custom apps. API channels make VA reusable across products.

Key design: authentication and identity. The VA API must operate as the real user to preserve ACL and personalization.

Preserve analytics: custom channels must emit events so you can measure containment and failure modes.

Don’t fork logic: keep topics central in ServiceNow; custom UI should be presentation only.

Workflow — do this next

01 Define custom channel requirements and the auth method.
02 Implement the VA API integration with session persistence.
03 Instrument analytics events aligned to portal/Teams metrics.

Real example

In-app support VA

A custom app embedded VA for device support. Users stayed in app; VA created incidents when needed with transcript attached. Analytics matched portal reporting.

4.6

Cross-channel context

Preserving state when a user switches channels

Key takeaway

Cross-channel continuity requires a shared session id and record-backed state so a user can start on Teams and finish on portal without repeating information.

Why this matters

Channel switching is common. If state isn’t preserved, users abandon and call support.

Use record-backed state: collected entities and stage stored on a session record. Channel adapters reference the same session.

Security: ensure session continuity doesn’t leak to other users on shared devices or shared channels.

UX: show a short summary when switching: 'I have your laptop model and issue type — continuing…'
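Here is a minimal sketch of record-backed state. It assumes an in-memory dictionary stands in for the real session table and that each channel adapter passes the authenticated user id. `SessionRecord`, `SessionStore`, and `switch_summary` are hypothetical names. `resume` checks the user as well as the session id, which addresses the shared-device concern above.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SessionRecord:
    user_id: str                      # authenticated user who owns the conversation
    intent: str
    stage: str = "start"              # where the topic flow is paused
    entities: dict[str, str] = field(default_factory=dict)
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)

class SessionStore:
    """In-memory stand-in for a session table that every channel adapter reads and writes."""

    def __init__(self) -> None:
        self._records: dict[str, SessionRecord] = {}

    def save(self, record: SessionRecord) -> str:
        self._records[record.session_id] = record
        return record.session_id

    def resume(self, session_id: str, user_id: str) -> Optional[SessionRecord]:
        record = self._records.get(session_id)
        # Only the same authenticated user may resume; holding the id alone is not enough.
        if record is None or record.user_id != user_id:
            return None
        return record

def switch_summary(record: SessionRecord) -> str:
    """Short recap shown when the user continues on another channel."""
    if not record.entities:
        return "Continuing where you left off."
    return f"I have your {' and '.join(record.entities)}, continuing."
```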

Workflow — do this next

01 Design a session id strategy across channels.
02 Store entity values and stage on the session record.
03 Test: start in Teams, continue in portal, complete the outcome.

Real example

Teams → portal completion

User started request in Teams but needed a form. Portal opened with fields prefilled from session state; completion rate improved.

4.7

Channel-specific design

Adapting topic flows for voice, chat, and embedded widget contexts

Key takeaway

Channels are different products. Adapt prompts, turn limits, and fallback UX per channel while keeping the underlying intent and outcome consistent.

Why this matters

One-size-fits-all conversation design produces mediocre experiences everywhere.

Voice: fewer choices, confirmation steps, and error tolerance. Chat: quick replies and short turns. Widget: can use richer UI and forms.

Avoid channel mismatch: don’t run a 10-question form in Slack. Link to the portal for complex data collection.

Use the same outcome contract across channels: record created/updated and confirmation message.
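One way to make channel constraints explicit is a per-channel limits table that topic logic consults before it collects data. The channel keys and numbers below are illustrative assumptions, not ServiceNow defaults. Tune them from your own channel analytics.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChannelLimits:
    max_turns: int          # turns before a flow should move to a richer surface
    max_message_chars: int  # rough cap for a single bot message

# Hypothetical starting values; adjust per channel from analytics.
LIMITS = {
    "voice": ChannelLimits(max_turns=4, max_message_chars=200),
    "teams": ChannelLimits(max_turns=6, max_message_chars=400),
    "slack": ChannelLimits(max_turns=6, max_message_chars=400),
    "portal_widget": ChannelLimits(max_turns=12, max_message_chars=1200),
}

def plan_collection(channel: str, questions_needed: int) -> str:
    """Collect inline when it fits the channel; otherwise send a portal form link."""
    if questions_needed <= LIMITS[channel].max_turns:
        return "inline"
    return "portal_link"  # e.g. a 10-question form does not belong in Slack

def fits(channel: str, message: str) -> bool:
    """True when a bot message respects the channel's length cap."""
    return len(message) <= LIMITS[channel].max_message_chars
```

The outcome contract stays the same in both branches: the portal form creates the same record that the inline flow would.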

Workflow — do this next

01 Define channel constraints (max turns, message length).
02 Create channel-specific variants for top topics.
03 A/B test prompt length and quick reply design.

Real example

Short prompts improved Teams success

Reducing prompt verbosity and using quick replies improved Teams completion rate and reduced user frustration.

4.8

Channel analytics

Measuring performance and containment across channels

Key takeaway

Omnichannel analytics require consistent event definitions: session started, intent resolved, ticket created, handoff, abandon — comparable across portal, Teams, Slack, and mobile.

Why this matters

If you can’t compare channels, you can’t prioritize investment or defend ROI.

Standardize metrics: containment, handoff rate, drop-off rate, average turns, and time to resolution.

Segment by channel and intent — portal may contain better than Teams for certain topics.

Use analytics to guide channel-specific redesign and topic prioritization.
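This sketch computes the standardized metrics from one shared event taxonomy. It assumes each session has already been reduced to a terminal outcome and a turn count. The `Session` shape and outcome names are hypothetical. Here, containment means only "resolved in session". For stakeholder reporting, substitute the conservative record-based definition from Concept 6.

```python
from collections import defaultdict
from dataclasses import dataclass

# Terminal outcomes shared by every channel adapter (portal, Teams, Slack, mobile).
OUTCOMES = {"intent_resolved", "ticket_created", "handoff", "abandon"}

@dataclass
class Session:
    channel: str
    intent: str
    outcome: str   # terminal event, one of OUTCOMES
    turns: int

def channel_metrics(sessions: list[Session]) -> dict[str, dict[str, float]]:
    """Per-channel rates computed with identical definitions so channels are comparable."""
    grouped: dict[str, list[Session]] = defaultdict(list)
    for s in sessions:
        if s.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {s.outcome!r}")  # reject off-taxonomy events
        grouped[s.channel].append(s)
    report: dict[str, dict[str, float]] = {}
    for channel, items in grouped.items():
        n = len(items)
        report[channel] = {
            "sessions": n,
            "containment": sum(s.outcome == "intent_resolved" for s in items) / n,
            "handoff_rate": sum(s.outcome == "handoff" for s in items) / n,
            "drop_off_rate": sum(s.outcome == "abandon" for s in items) / n,
            "avg_turns": sum(s.turns for s in items) / n,
        }
    return report
```

If you key the grouping on `(s.channel, s.intent)` instead of `s.channel`, you get the channel × intent view.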

Workflow — do this next

01 Define a single event taxonomy across channels.
02 Build dashboards by channel and by intent.
03 Review monthly: which channel performs best for which intent.

Real example

Channel strategy became data-driven

Analytics showed Teams worked best for simple requests while portal handled complex troubleshooting better. Investment shifted accordingly and overall containment rose.

Concept 5

Live Agent Handoff

When to escalate, how to transfer context, what agents see, routing queues, warm vs cold handoff, after-hours, re-engagement, and analytics

5.1

The handoff trigger

When and how VA decides to escalate to a human

Key takeaway

Escalation should be driven by confidence, policy, and user signals: low NLU confidence, missing knowledge, high-risk topic, repeated failure, or explicit user request.

Why this matters

If VA escalates too late, users get angry. If it escalates too early, containment collapses. Trigger design is a core lever.

Trigger categories: NLU uncertainty, tool failures, prohibited topics, user frustration signals, and explicit 'talk to agent'.

Add escalation ceilings: after N failed attempts or N clarification loops, escalate automatically.

Always capture reason code for escalation — it fuels redesign and training.
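The trigger categories, ceilings, and reason codes can be combined into one function that returns at most one reason code per handoff. This is a hypothetical sketch. The thresholds (confidence 0.6, two loops, two failures) are assumptions, and the codes come from the starter list in 5.8. Mapping a reached failure ceiling to `USER_FRUSTRATION` is an interpretive choice.

```python
from typing import Optional

def escalation_reason(nlu_confidence: float, clarification_loops: int, failed_attempts: int,
                      kb_found: bool, policy_restricted: bool, tool_failed: bool,
                      user_asked_for_agent: bool, *, min_confidence: float = 0.6,
                      max_loops: int = 2, max_failures: int = 2) -> Optional[str]:
    """Return a handoff reason code, or None to stay in self-service.

    Checks run in priority order and the first match wins, so each handoff gets one code.
    """
    if user_asked_for_agent:
        return "USER_REQUEST"            # explicit 'talk to agent' always wins
    if policy_restricted:
        return "POLICY_RESTRICTED"       # prohibited or high-risk topic
    if tool_failed:
        return "TOOL_FAILURE"            # e.g. catalog submission error
    if failed_attempts >= max_failures:
        return "USER_FRUSTRATION"        # ceiling on failed attempts
    if nlu_confidence < min_confidence:
        # Low confidence: clarify again, but only until the loop ceiling is reached.
        return "LOW_NLU_CONFIDENCE" if clarification_loops >= max_loops else None
    if not kb_found:
        return "NO_KB_SOURCE"            # nothing to ground an answer on
    return None
```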

Workflow — do this next

01 Define escalation rules and ceilings per topic category.
02 Implement reason codes: low confidence, no KB, policy, user request.
03 Monitor escalation rate and adjust thresholds carefully.

Real example

Ceilings reduced user frustration

After two failed clarifications, VA escalated with context. Users stopped looping and CSAT improved even though containment decreased slightly — trust increased.

5.2

Context transfer

Packaging conversation history so the live agent starts informed

Key takeaway

A handoff must include transcript, extracted entities, intent, attempted steps, and relevant knowledge links — attached to the created record and visible in Agent Workspace.

Why this matters

Handoff is where most conversational ROI is won or lost. Context transfer prevents duplicate questioning and reduces handle time.

Transfer packet: user goal, entities, what was tried, what failed, and why escalation happened.

Include retrieval evidence: which KB articles were surfaced and whether the user clicked them.

Keep it structured and short; full transcript remains available as reference.
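This sketch builds a structured transfer packet with a length cap. It assumes the transcript is attached to the record separately. The field names and the 600-character cap are assumptions. Map them to whatever fields your Agent Workspace layout actually displays.

```python
from dataclasses import dataclass, field

MAX_SUMMARY_CHARS = 600  # assumed cap; the full transcript stays attached separately

@dataclass
class HandoffPacket:
    goal: str
    intent: str
    reason_code: str
    entities: dict[str, str] = field(default_factory=dict)
    tried: list[str] = field(default_factory=list)
    kb_surfaced: dict[str, bool] = field(default_factory=dict)  # article id -> did the user click it?

    def summary(self, limit: int = MAX_SUMMARY_CHARS) -> str:
        """Render a short, fixed-order summary for the top of the agent's view."""
        entities = ", ".join(f"{k}={v}" for k, v in self.entities.items()) or "none"
        tried = "; ".join(self.tried) or "nothing yet"
        kb = ", ".join(f"{a} ({'clicked' if c else 'not clicked'})"
                       for a, c in self.kb_surfaced.items()) or "none"
        text = "\n".join([
            f"Goal: {self.goal}",
            f"Intent: {self.intent} | Escalation: {self.reason_code}",
            f"Entities: {entities}",
            f"Tried: {tried}",
            f"KB shown: {kb}",
        ])
        # Truncate rather than drop fields so the layout stays predictable.
        return text if len(text) <= limit else text[: limit - 3] + "..."
```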

Workflow — do this next

01 Define a handoff summary schema (fields + max length).
02 Attach the transcript and entity values to the record.
03 Test: the live agent can reply without asking for the basics again.

Real example

Transcript + summary cut handle time

Agents spent less time collecting info and more time resolving because they saw the user’s device, app, and attempted steps already captured.

5.3

The Agent Workspace experience

What the agent sees when they receive a handoff

Key takeaway

Agent Workspace should surface the handoff summary at the top: intent, entities, recommended next steps, and links — plus easy access to the transcript.

Why this matters

If context is buried, agents ignore it and ask again. UX drives adoption of handoff benefits.

Show: escalation reason, user sentiment (if available), and top suggested actions or KB links.

Integrate Now Assist: draft first response using the summary and KB sources, with human edit/approve.

Train agents to acknowledge what the user already tried; it improves trust immediately.

Workflow — do this next

01 Configure the Workspace layout to highlight the handoff summary.
02 Add quick actions: assign, create task, send response draft.
03 Run agent training on handoff etiquette.

Real example

Agents stopped repeating questions

Workspace layout changed to show summary and entities. Agents acknowledged the user’s prior steps and moved directly to resolution, improving CSAT.

5.4

Queue management

Routing handoffs to the right agent group based on context

Key takeaway

Handoff routing should use extracted entities and intent to pick the right queue — ideally leveraging Predictive Intelligence routing and business rules.

Why this matters

Wrong queue destroys the point of VA and increases wait time.

Routing signals: intent, app entity, CI/service, location, priority, and sentiment.

Use PI for routing when labels exist; use rules for policy (e.g., VIP users).

Log reroutes after handoff — it’s the metric for routing quality.
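This sketch follows the routing order described above: policy rules first, then a Predictive Intelligence suggestion only when it is confident, then a static intent map. The queue names, the 0.8 threshold, and the `pi_suggestion` input are hypothetical, and the code does not call any PI API. `reroute_rate` computes the routing-quality metric from (first queue, final queue) pairs.

```python
from typing import Optional

# Hypothetical intent -> queue map; unknown intents fall back to the service desk.
INTENT_QUEUES = {"vpn_help": "Network Support", "request_access": "Identity & Access"}
DEFAULT_QUEUE = "Service Desk"

def route_handoff(intent: str, is_vip: bool, pi_suggestion: Optional[str] = None,
                  pi_confidence: float = 0.0, pi_threshold: float = 0.8) -> str:
    """Policy rules first, then a confident PI suggestion, then the intent map."""
    if is_vip:
        return "VIP Support"  # policy rule overrides any prediction
    if pi_suggestion and pi_confidence >= pi_threshold:
        return pi_suggestion
    return INTENT_QUEUES.get(intent, DEFAULT_QUEUE)

def reroute_rate(handoffs: list[tuple[str, str]]) -> float:
    """Share of handoffs whose final queue differs from the first assigned queue."""
    if not handoffs:
        return 0.0
    return sum(first != final for first, final in handoffs) / len(handoffs)
```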

Workflow — do this next

01 Define the mapping from intents/entities to queues.
02 Deploy PI routing suggestions for handoff tickets.
03 Monitor the reroute rate and tune.

Real example

Handoff reroutes dropped

Using entity-driven routing reduced reroutes from 18% to 9%. Users waited less and agents trusted the system more.

5.5

Warm vs cold handoff

Design choices and customer impact

Key takeaway

Warm handoff keeps continuity (agent joins live). Cold handoff creates a record for later. Choose based on urgency, staffing, and channel constraints — but always preserve context.

Why this matters

The wrong handoff design creates broken experiences and kills trust in self-service.

Warm handoff is best for high-urgency or sensitive cases. Cold handoff is best for async support and after-hours.

Always set expectations: wait time, next step, and confirmation record number.

If wait time is long, offer alternatives: callback, ticket, or self-service links.
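This is a hypothetical sketch of the warm/cold decision. The urgency labels, the 10-minute warm-wait limit, and the message wording are assumptions. Both paths state the record number and the next step, and the cold path also offers alternatives.

```python
from dataclasses import dataclass

@dataclass
class HandoffPlan:
    mode: str      # "warm" = agent joins the live chat, "cold" = record worked later
    message: str   # expectation-setting text shown to the user

def plan_handoff(urgency: str, sensitive: bool, agents_available: int, est_wait_min: int,
                 record_number: str, max_warm_wait_min: int = 10) -> HandoffPlan:
    """Warm for urgent or sensitive cases when an agent can join soon; otherwise cold."""
    wants_warm = urgency == "high" or sensitive
    if wants_warm and agents_available > 0 and est_wait_min <= max_warm_wait_min:
        return HandoffPlan("warm", f"Connecting you to an agent, about {est_wait_min} min. "
                                   f"Reference: {record_number}.")
    # Cold path: confirm the record and offer alternatives instead of an open-ended wait.
    return HandoffPlan("cold", f"I've created {record_number} and the team will follow up. "
                               "You can also request a callback or try the suggested articles.")
```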

Workflow — do this next

01 Define warm/cold thresholds by intent and urgency.
02 Design the UX: explicit wait estimates and fallback options.
03 Measure user satisfaction for each handoff type.

Real example

Warm handoff for security

Security incidents used warm handoff; routine requests used cold. Clear rules avoided confusion and improved experience.

5.6

After-hours handling

What happens when no live agents are available

Key takeaway

After-hours design should offer: create ticket with transcript, schedule callback, or provide self-service steps — with clear SLAs and next-contact expectations.

Why this matters

After-hours is where self-service either shines or creates anger. Clear expectations are everything.

Use on-call policies: P1 issues may still escalate; routine issues create tickets for next business day.

Offer proactive status: acknowledge and provide reference number; avoid pretending a live agent is available.

Capture after-hours intent analytics — it often reveals knowledge gaps and opportunities for automation.
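To tell users when they will be contacted, you need the next business-hours moment. This sketch assumes support hours of 09:00 to 17:00, Monday to Friday, in a single time zone with no holiday calendar. `after_hours_action` and its return strings are hypothetical. P1 issues still go to on-call, matching the rule above.

```python
from datetime import datetime, time, timedelta

BUSINESS_START = time(9, 0)   # assumed support hours: 09:00-17:00, Mon-Fri
BUSINESS_END = time(17, 0)

def is_business_hours(ts: datetime) -> bool:
    return ts.weekday() < 5 and BUSINESS_START <= ts.time() < BUSINESS_END

def next_contact_time(ts: datetime) -> datetime:
    """Earliest business-hours moment at or after ts (ignores holidays and time zones)."""
    if is_business_hours(ts):
        return ts
    day = ts.date()
    # Before opening on a weekday, today's opening counts; otherwise start from tomorrow.
    if not (ts.weekday() < 5 and ts.time() < BUSINESS_START):
        day += timedelta(days=1)
    while day.weekday() >= 5:  # skip Saturday (5) and Sunday (6)
        day += timedelta(days=1)
    return datetime.combine(day, BUSINESS_START)

def after_hours_action(priority: int, ts: datetime) -> str:
    """P1 still pages on-call; everything else becomes a ticket with a stated contact time."""
    if is_business_hours(ts):
        return "route_to_live_agent"
    if priority == 1:
        return "page_on_call"
    return f"create_ticket; contact by {next_contact_time(ts).isoformat(timespec='minutes')}"
```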

Workflow — do this next

01 Define after-hours rules by priority and topic.
02 Implement a callback scheduling option if available.
03 Measure after-hours CSAT and repeat contacts.

Real example

After-hours expectations reduced repeat pings

VA created a ticket with transcript and told the user when they’d be contacted. Repeat contacts dropped because expectations were clear and evidence was preserved.

5.7

Re-engagement

Pulling users back to self-service if the queue wait is long

Key takeaway

If wait is long, re-engage users with self-service options: top KB, guided action flows, or alternate channels — without forcing them to restart or lose their place.

Why this matters

Queue times are costly. Re-engagement converts waiting into resolution.

Show: 'While you wait, try this' with the best action card or KB link.

Preserve state: if the user resolves themselves, close the handoff request cleanly.

Measure re-engagement success rate and avoid spamming users with irrelevant suggestions.
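This sketch implements the re-engagement rule: once the wait passes a threshold, offer at most three high-confidence suggestions the user has not already seen. The 15-minute default mirrors the real example below. The 0.7 confidence floor and the `Suggestion` shape are assumptions. Nothing here touches the queue, so the user keeps their place.

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    title: str
    confidence: float  # relevance score for this user's intent, 0..1

def reengagement_offers(wait_min: float, suggestions: list[Suggestion], already_offered: set[str],
                        trigger_min: float = 15, min_confidence: float = 0.7,
                        max_offers: int = 3) -> list[str]:
    """Offer up to max_offers unseen, high-confidence options once the wait passes the trigger."""
    if wait_min <= trigger_min:
        return []
    eligible = [s for s in suggestions
                if s.confidence >= min_confidence and s.title not in already_offered]
    eligible.sort(key=lambda s: s.confidence, reverse=True)  # best match first
    return [s.title for s in eligible[:max_offers]]
```

Excluding already-offered titles is what keeps repeated checks from spamming the same card.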

Workflow — do this next

01 Define the re-engagement trigger (wait > X minutes).
02 Offer 1–3 high-confidence self-service actions.
03 Track re-engagement click-through and successful resolution.

Real example

Queue wait reduced with self-service

When wait exceeded 15 minutes, VA offered a known fix flow. Many users solved the issue and cancelled handoff, reducing queue load.

5.8

Handoff analytics

Measuring volume, reason, and resolution rate of handoffs

Key takeaway

Track handoffs by reason code, topic, channel, queue, and outcome. Handoff analytics is the fastest way to find broken topics and missing knowledge.

Why this matters

Escalations are signal. If you don’t measure them, you don’t improve containment.

Key metrics: handoff rate, reason distribution, time to first agent response, reroute rate, and post-handoff resolution success.

Use reason codes to drive action: 'no KB' → write KB; 'low confidence' → train NLU; 'tool failure' → fix integrations.

Report by channel and intent. Teams and portal behave differently; don’t average away the truth.

Workflow — do this next

01 Ensure every handoff has a reason code.
02 Build a dashboard of top handoff topics and reasons.
03 Run a weekly handoff review and feed improvements into the backlog.

Ready-to-use artifacts

Complete templates — paste directly into your AI tool or automation workflow.

Handoff reason codes (starter)

Standardize these to make analytics actionable.

Reason codes:
- LOW_NLU_CONFIDENCE
- MISSING_ENTITY
- NO_KB_SOURCE
- POLICY_RESTRICTED
- TOOL_FAILURE
- USER_REQUEST
- USER_FRUSTRATION
- AFTER_HOURS

Use these to drive redesign and training backlog.

Concept 6

Performance and Containment Metrics

Containment, value, quality, escalation analysis, utterance mining, A/B tests, improvement cadence, and CIO dashboards

6.1

Containment rate

Primary KPI and the three ways it’s measured (and gamed)

Key takeaway

Containment must be tied to record truth. Define it explicitly and track false containment. Otherwise you will get vanity numbers that collapse in production.

Why this matters

Containment is the KPI everyone asks for — and the KPI most often misreported.

Three common measurements: (1) chat session ended without handoff, (2) no ticket created within a time window, (3) user confirmed resolution. Each can be gamed if not combined with guardrails.

Best practice: use conservative definition (no ticket within 72h for same intent) plus a confirmation prompt when appropriate.

Track false containment: users who later create tickets because they got wrong guidance.
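This sketch applies the conservative definition to one session. It assumes tickets carry the user, an intent category, a creation time, and a wrong-guidance flag set during transcript review; all of these field names are hypothetical. Tickets outside the 72-hour window, or raised by other users, do not count against containment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

WINDOW = timedelta(hours=72)

@dataclass
class VASession:
    user_id: str
    intent_category: str
    ended_at: datetime
    requested_human: bool

@dataclass
class Ticket:
    user_id: str
    intent_category: str
    created_at: datetime
    caused_by_wrong_guidance: bool = False  # set during transcript review

def classify(session: VASession, tickets: list[Ticket]) -> str:
    """'contained', 'false_containment', or 'not_contained' under the conservative definition."""
    if session.requested_human:
        return "not_contained"
    follow_ups = [t for t in tickets
                  if t.user_id == session.user_id
                  and t.intent_category == session.intent_category
                  and session.ended_at <= t.created_at <= session.ended_at + WINDOW]
    if not follow_ups:
        return "contained"
    if any(t.caused_by_wrong_guidance for t in follow_ups):
        return "false_containment"  # report separately: the VA gave wrong guidance
    return "not_contained"
```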

Workflow — do this next

01 Choose a containment definition and publish it to stakeholders.
02 Instrument ticket creation after VA sessions (72h window).
03 Report containment by intent category, not as one global number.

Real example

Vanity containment corrected

Bot showed 80% 'session success' but only 35% no-ticket containment. Once measured correctly, the team invested in knowledge and NLU fixes and grew real containment steadily.

6.2

Deflection value

Calculating cost savings from each contained conversation

Key takeaway

Deflection value = tickets avoided × cost per ticket + agent time saved − operating cost. Use category-level costs and conservative attribution for CFO credibility.

Why this matters

CFOs fund VA programs when value is quantified honestly and repeatably.

Cost per ticket varies by category. Password reset is cheap; complex app issues are expensive. Segment costs and report deflection value by category.

Include operating costs: knowledge upkeep, NLU training, channel integrations, and GenAI consumption.

Show a range (best/base/worst). CFOs trust ranges more than point estimates.
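Here is a worked sketch of the formula above with a best/base/worst range. The attribution factors (0.5, 0.7, 0.9) are assumptions that discount credit for contained sessions. They are not a standard, so publish whatever values you choose. All figures in the example are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class Category:
    name: str
    contained: int           # contained conversations this month (conservative definition)
    cost_per_ticket: float   # fully loaded cost for this category

def deflection_value(categories: list[Category], agent_hours_saved: float, hourly_rate: float,
                     operating_cost: float, attribution: float = 1.0) -> float:
    """Net value = attributed tickets avoided x cost per ticket + agent time saved - operating cost."""
    avoided = sum(c.contained * c.cost_per_ticket for c in categories) * attribution
    return avoided + agent_hours_saved * hourly_rate - operating_cost

def value_range(categories: list[Category], agent_hours_saved: float, hourly_rate: float,
                operating_cost: float,
                attributions: tuple[float, float, float] = (0.5, 0.7, 0.9)) -> dict[str, float]:
    """Worst/base/best net value under three assumed attribution factors."""
    worst, base, best = (deflection_value(categories, agent_hours_saved, hourly_rate,
                                          operating_cost, a) for a in attributions)
    return {"worst": worst, "base": base, "best": best}
```

The example uses two categories: 400 contained password resets at $15 and 50 contained app issues at $120. Add 20 agent hours saved at $50/hour and subtract $5,000 operating cost. The resulting range is $2,000 / $4,400 / $6,800 for worst / base / best.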

Workflow — do this next

01 Compute the fully loaded cost per ticket for the top 10 categories.
02 Multiply by contained conversations per category.
03 Subtract VA and AI operating costs; report net value monthly.

Real example

CFO approved expansion

Program showed $X per month savings on 5 intents with conservative attribution. Because assumptions were stable and transparent, CFO approved scaling to new channels.

6.3

Resolution quality

CSAT, post-conversation surveys, and limits of self-report

Key takeaway

Quality is multi-signal: CSAT surveys, reopen rates, repeat contacts, and sentiment. Self-report helps, but record-based signals prevent bias and gaming.

Why this matters

Containment without quality is harmful. Wrong answers create future tickets and trust loss.

Use post-conversation surveys but treat them as noisy. Combine with objective signals: repeat contact within 72h and ticket reopen rates.

Quality signals differ by channel: Teams may have lower survey completion; portal can capture more feedback.

GenAI adds risk: require citations and measure wrong-answer complaints explicitly.

Workflow — do this next

01 Add a 1-question 'did this solve it?' check for key topics.
02 Track repeat contacts and reopen rates tied to VA sessions.
03 Review negative transcripts weekly for root causes.

Real example

Quality saved the program

Containment was rising but wrong-answer complaints increased. Tightening KB scope and adding escalation on missing sources improved quality and restored trust.

6.4

Escalation analysis

Using escalation data to identify topics that need redesign

Key takeaway

Escalations are the backlog. Analyze escalation reasons by topic and channel to decide: retrain NLU, redesign flow, add knowledge, or change policy.

Why this matters

If you want containment to rise, you must mine escalations and fix the root causes.

Break down escalations by reason code (low confidence, missing entity, no KB, tool failure) and by topic.

Prioritize by impact: high volume × high cost topics first.

Close the loop: every top escalation reason should map to an owner and an action in the next sprint.
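This sketch implements impact-based prioritization. It counts escalations per (topic, reason) pair, weights each count by the topic's ticket cost, and attaches an owner. The reason-to-owner mapping and cost figures are assumptions. Unknown reasons fall back to a hypothetical "triage" owner.

```python
from collections import Counter

# Hypothetical owner mapping, using the owner roles named in this section's workflow.
REASON_OWNER = {
    "LOW_NLU_CONFIDENCE": "NLU", "MISSING_ENTITY": "NLU",
    "NO_KB_SOURCE": "knowledge", "TOOL_FAILURE": "integrations",
    "POLICY_RESTRICTED": "policy",
}

def prioritize(escalations: list[tuple[str, str]], cost_per_ticket: dict[str, float],
               top_n: int = 20) -> list[tuple[str, str, int, float, str]]:
    """Rank (topic, reason) pairs by volume x cost and attach an owner."""
    counts = Counter(escalations)  # (topic, reason) -> volume
    rows = []
    for (topic, reason), volume in counts.items():
        impact = volume * cost_per_ticket.get(topic, 0.0)
        rows.append((topic, reason, volume, impact, REASON_OWNER.get(reason, "triage")))
    rows.sort(key=lambda r: r[3], reverse=True)  # highest impact first
    return rows[:top_n]
```

In the test data, 10 tool-failure escalations on a $40 topic (impact 400) outrank 30 low-confidence escalations on an $8 topic (impact 240). This echoes the real example below.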

Workflow — do this next

01 Build a dashboard: top 20 escalated topics and reasons.
02 Assign owners: NLU, knowledge, integrations, policy.
03 Ship 5 fixes per week; re-measure.

Real example

Tool failures were the real issue

Escalations blamed NLU, but analytics showed tool failures (catalog submission errors). Fixing integration raised containment more than retraining utterances.

6.5

Utterance analysis

Mining unrecognized utterances to find training gaps

Key takeaway

Unrecognized utterances are training fuel: cluster them, map them to intents, and update taxonomy, synonyms, and entity dictionaries — continuously.

Why this matters

User language changes constantly. Utterance mining is how VA stays accurate over time.

Collect unrecognized utterances by channel. Teams language differs from portal language; mobile differs too.

Cluster and label weekly. Use these labels to add training phrases or create new intents when volume justifies it.

Don’t chase the tail endlessly. Long-tail utterances may be better handled by AI Search + GenAI fallback.
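Here is a minimal clustering sketch that uses token overlap (Jaccard similarity) in a greedy single pass. It is a simple stand-in, not what ServiceNow uses internally. Embedding-based clustering will group paraphrases far better. The 0.3 threshold is an assumption, and `orbit` in the test data is a hypothetical internal tool name.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_utterances(utterances: list[str], threshold: float = 0.3) -> list[list[str]]:
    """Greedy single pass: each utterance joins the first cluster whose seed is similar enough."""
    clusters: list[tuple[set[str], list[str]]] = []
    for u in utterances:
        t = tokens(u)
        for seed, members in clusters:
            if jaccard(seed, t) >= threshold:
                members.append(u)
                break
        else:
            clusters.append((t, [u]))  # no match: this utterance seeds a new theme
    # Largest themes first: those may justify new training phrases or a new intent.
    return sorted((members for _, members in clusters), key=len, reverse=True)
```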

Workflow — do this next

01 Weekly: export the top 200 unrecognized utterances.
02 Cluster them into themes; map to existing intents or new ones.
03 Update synonyms/entities and retrain NLU.

Real example

Acronym drift fixed quickly

A new internal tool acronym appeared. Utterance analysis caught it within a week, and a synonym update plus new training phrases prevented widespread failures.

6.6

A/B testing in Virtual Agent

Controlled experiments on topic variations

Key takeaway

A/B test topic variants on cohorts to compare containment, drop-off, and handoff rates. Test one change at a time to attribute impact.

Why this matters

Topic design is product design. A/B testing turns it into a measurable engineering practice.

Define primary metric: completion/containment for the topic. Secondary: abandon rate and time-to-resolution.

Keep other variables stable (knowledge, NLU model). Otherwise you can’t attribute changes.

Run long enough to cover behavior cycles (weekday/weekend) and channel mix.
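An A/B test needs two things to be trustworthy: stable assignment and a significance check on the primary metric. This sketch hashes the user id so each user always sees the same variant. It then applies a two-proportion z-test, which is one common choice; the text does not prescribe a method. The function names are hypothetical.

```python
import hashlib
import math

def assign_variant(user_id: str, experiment: str) -> str:
    """Deterministic 50/50 split: the same user always lands in the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    return "A" if digest[0] % 2 == 0 else "B"

def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int) -> tuple[float, float]:
    """z statistic and two-sided p-value for the difference in completion rates (B - A)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value
```

For example, 400/1000 completions for A versus 460/1000 for B gives z ≈ 2.71 and a two-sided p ≈ 0.007.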

Workflow — do this next

01 Create variant B with one change (clarifier wording).
02 Split traffic 50/50 for 2 weeks.
03 Promote the winner and log change notes.

Real example

Clarifier A/B improved completion

Variant with 3 quick replies outperformed a free-text clarifier. Completion rose and drop-off fell because choices were clearer.

6.7

The continuous improvement cycle

Operating cadence for reviewing metrics and iterating topics

Key takeaway

Run VA as an ongoing operation: weekly reviews of failures, monthly health checks, quarterly taxonomy refresh. This improvement cadence is what creates sustained containment gains.

Why this matters

Bots decay. Without a cadence, containment erodes as language and services change.

Weekly: top escalations and unrecognized utterances. Monthly: channel performance and cost/value. Quarterly: taxonomy and persona review.

Assign owners: NLU owner, knowledge owner, channel owner, and handoff owner. No owners, no improvement.

Tie changes to release notes and stakeholder reporting so trust grows with visibility.

Workflow — do this next

01 Schedule a weekly VA ops review (30 min).
02 Ship 5 improvements per week (topics, KB, synonyms, boosts).
03 Hold a quarterly governance review: privacy, retention, persona, guardrails.

Real example

Containment climbed steadily

No big redesign — just weekly fixes. Containment rose from 18% to 39% in 12 weeks on targeted intents. Cadence was the secret.

6.8

Reporting to stakeholders

Dashboard design that tells the VA story to a CIO

Key takeaway

A CIO dashboard should show: containment by intent, deflection value, quality signals, escalation reasons, and the improvement backlog — not vanity chat volume.

Why this matters

Stakeholder reporting is what keeps funding and unlocks cross-channel scaling.

Include both wins and risks: false deflection, wrong-answer incidents, and after-hours escalation load.

Show the roadmap: next intents to improve, next channels to launch, and governance status.

Use stable definitions and publish assumptions — credibility beats hype.

Workflow — do this next

01 Define standard metrics and definitions.
02 Build a dashboard with 5–7 tiles maximum.
03 Review monthly with the CIO, CISO, and service owners.

Ready-to-use artifacts

Complete templates — paste directly into your AI tool or automation workflow.

CIO dashboard tiles (starter)

Keep it tight — only what drives decisions.

Tiles:
1) Containment by top 10 intents (last 30d)
2) Deflection value (conservative) + assumptions
3) Quality: repeat contacts + wrong-answer reports
4) Escalations by reason code
5) Channel mix performance (portal/Teams/Slack/mobile)
6) Improvement backlog (top 10 actions)
7) Governance status (privacy, retention, audits)

Ready-to-use artifacts

Complete templates — paste directly into your AI tool or automation workflow.

Containment definition (copy/paste)

Use one stable definition so reporting stays honest.

Containment (conservative):
- A VA session is contained if no ticket is created within 72 hours for the same intent category AND the user did not request human help.

Report also:
- False containment: ticket created after wrong guidance
- Escalations by reason code
- Containment by channel and intent

VA operating rhythm

Weekly cadence that grows containment sustainably.

Weekly (30 min)
- Top 20 escalations by reason
- Top 50 unrecognized utterances
- Ship 5 fixes (topic, KB, synonym, tool)

Monthly (60 min)
- Channel performance review
- A/B test results
- Quality review (repeat contacts)

Quarterly
- Taxonomy refresh
- Privacy/retention review
- GenAI persona/guardrails review

Topic scorecard template

Use weekly for continuous improvement.

| Topic | Containment | Escalation | Drop-off step | Top unrecognized utterance | Owner |
|-------|-------------|------------|---------------|----------------------------|-------|
| reset_password | | | | | |
| request_access | | | | | |
| vpn_help | | | | | |

Omnichannel VA rollout — portal + Teams

An enterprise launched VA on portal and Teams. Early metrics looked good (session success), but ticket volume did not drop and users complained about repetition at handoff.

Before

Synthetic utterances, weak taxonomy, keyword search fallback, and vanity containment measurement. Handoff summaries were missing and agents repeated questions.

After

Utterance mining from real queries, test-set discipline for NLU, AI Search profile tuning, Now Assist used only for grounded answers, structured handoff summaries in Workspace, and conservative containment measurement (72h no-ticket).

Containment increased steadily on top intents after weekly iteration cadence
Repeat-question rate after handoff decreased due to summaries and entity capture
Channel analytics clarified which intents belong in Teams vs portal
Stakeholder trust improved because reporting definitions were stable and conservative

What goes wrong

Measuring containment as 'chat ended'

Use a conservative record-based definition and report false containment explicitly.

Training NLU on synthetic utterances

Mine real user language weekly and maintain a test set to prevent regressions.

GenAI answering without sources

Require AI Search/KB grounding and escalate when no citation exists.

Handoff without context

Transfer entities + summary + transcript into Agent Workspace and make it prominent in the layout.

Vetted by Krishna Kumar, Curator, FactorBeam
