What you'll unlock: Production Cowork is 60–80% automated prep with explicit human gates — reliability and governance are features, not paperwork.
Production-Grade Cowork Operations
Running Cowork as a serious operational system — reliability, governance, and the discipline that makes automation trustworthy
Chapter context
Your Cowork pilots work on your machine; then someone asks, 'Can we bet the Monday board pack on this?' Chapter 5 is the answer: production-grade operations. Without this chapter, automation stays a hero skill; with it, Cowork becomes operational infrastructure that finance and security can endorse.
Is this chapter for you?
Will stakeholders act on output without re-checking sources?
Production readiness gate (§1.1) + validation Skill (§1.2) mandatory.
Does workflow touch confidential data or external sends?
T3 approval + data classification (§2.2–2.4) before schedule.
Are multiple people creating Skills?
Team RBAC + CoE review (§2.6, §3.5) before shared production.
Does leadership ask for ROI on AI spend?
Token ledger (§1.7) + ROI worksheet (§3.6) from month one.
Chapters 1–4 built Skills, schedules, pipelines, and connectors. Chapter 5 is how you run Cowork as production infrastructure, trustworthy enough for leadership and finance to depend on. You will implement validation, alerting, review queues, minimum access, change management, and the organisational patterns that compound over years.
Reference diagrams
Production workflow gates
Nothing reaches production without validation — uncertainty routes to review.
Governance layers
Access, data class, audit — stacked before unattended schedules.
Implementation paths
Trust comes from gates and logs — not from hoping the model behaved.
Concept 1
Reliability & Quality Control
Making Cowork workflows reliable enough to run without supervision — the engineering discipline for automated operations
1.1
The reliability requirements for automated Skills
What you need to be true before a Skill runs without review
Key takeaway
Production gate: five clean runs, defined acceptance tests, failure policy, owner, and HITL tier — all documented before unattended schedule.
Why this matters
Teams schedule on enthusiasm; production requires a checklist security and finance would sign.
Before unattended production, verify: (1) TAR spec signed off. (2) Test matrix passed (Ch 2 §1.8). (3) Output validation Skill attached. (4) Alert on failure configured. (5) Rollback path documented. (6) Token estimate within budget. (7) HITL tier explicit.
Reliability is tiered: T0 auto-deliver only for low-stakes internal transforms; T1 notify; T2 staging review (default); T3 approval flag before external write.
Label workflow specs NATIVE vs PATTERN so reviewers know what the product guarantees vs what your team implements.
Workflow — do this next
- 01 Complete PRODUCTION_READINESS checklist (artifact).
- 02 Stakeholder sign-off on HITL tier.
- 03 Enable schedule only after checklist green.
- 04 Re-run checklist on any Skill version bump.
Ready-to-use artifacts
Complete templates — paste directly into your AI tool or automation workflow.
Production readiness gate
[ ] TAR spec + owner
[ ] Test matrix 5/5
[ ] QA/validation Skill attached
[ ] Failure alerts configured
[ ] HITL tier documented
[ ] Token estimate approved
[ ] Rollback procedure written
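The gate can be enforced mechanically instead of by eye. A minimal Python sketch, assuming each workflow keeps its checklist as a dictionary of booleans; the key names are hypothetical labels for the seven items above, not fields Cowork itself defines.

```python
# Hypothetical keys, one per item in the production readiness gate.
READINESS_ITEMS = (
    "tar_spec_and_owner",
    "test_matrix_5_of_5",
    "validation_skill_attached",
    "failure_alerts_configured",
    "hitl_tier_documented",
    "token_estimate_approved",
    "rollback_procedure_written",
)


def missing_items(checklist: dict) -> list:
    """Return gate items that are absent or not yet ticked, in checklist order."""
    return [item for item in READINESS_ITEMS if not checklist.get(item, False)]


def can_enable_schedule(checklist: dict) -> bool:
    """An unattended schedule is allowed only when every gate item is ticked."""
    return not missing_items(checklist)


if __name__ == "__main__":
    draft = {item: True for item in READINESS_ITEMS}
    draft["rollback_procedure_written"] = False
    print(missing_items(draft))
    print(can_enable_schedule(draft))
```

Running the same check again after every Skill version bump covers the "re-run checklist" step without relying on memory.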
1.2
Output validation
How to verify that a Skill produced acceptable output — the quality check that runs after the Skill
Key takeaway
Validation is deterministic code-style checks on outputs — schema, counts, reconciliations — not 'ask Claude if it looks good.'
Why this matters
LLM self-review of its own output is circular; production needs measurable gates.
VALIDATE_* Skills run after the primary transform and check that required files exist, the JSON schema matches, totals reconcile to source within tolerance, and banned patterns are absent. They emit VALIDATION.json, and the pipeline blocks downstream steps on FAIL.
Separate validation from narrative QA: numbers first, prose second. Golden fixtures from a human baseline become regression oracles.
Workflow — do this next
- 01 List 5–10 deterministic checks per workflow.
- 02 Implement VALIDATE Skill or script gate.
- 03 Block emit/notify on FAIL.
- 04 Log validation results in manifest.
Ready-to-use artifacts
Validation check template
checks:
  - name: schema_match
  - name: row_count_delta < 0.2
  - name: revenue_sum_matches_source
  - name: no_pii_in_output
on_fail: stop_pipeline + notify
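The four template checks translate directly into deterministic code. A minimal sketch, assuming output rows are dictionaries and the schema is the hypothetical field set shown; the 0.01 revenue tolerance is an assumed value, and the SSN-shaped regex is only a crude stand-in for a real PII detector.

```python
import json
import re

REQUIRED_FIELDS = {"account", "period", "revenue"}  # hypothetical output schema
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # crude PII proxy, not a full detector


def validate(source_rows: list, output_rows: list, revenue_tolerance: float = 0.01) -> dict:
    """Run the template checks and return a VALIDATION.json-style result."""
    checks = []

    schema_ok = all(REQUIRED_FIELDS <= set(row) for row in output_rows)
    checks.append({"name": "schema_match", "passed": schema_ok})

    # Relative change in row count between source and output.
    base = max(len(source_rows), 1)
    delta = abs(len(output_rows) - len(source_rows)) / base
    checks.append({"name": "row_count_delta < 0.2", "passed": delta < 0.2})

    src_total = sum(row["revenue"] for row in source_rows)
    out_total = sum(row.get("revenue", 0) for row in output_rows)
    checks.append({
        "name": "revenue_sum_matches_source",
        "passed": abs(src_total - out_total) <= revenue_tolerance,
    })

    serialized = json.dumps(output_rows)
    checks.append({"name": "no_pii_in_output", "passed": not SSN_LIKE.search(serialized)})

    status = "PASS" if all(c["passed"] for c in checks) else "FAIL"
    return {"status": status, "checks": checks}


if __name__ == "__main__":
    src = [{"account": "A", "period": "2024-05", "revenue": 100.0}]
    result = validate(src, src)
    with open("VALIDATION.json", "w") as fh:
        json.dump(result, fh, indent=2)
    if result["status"] == "FAIL":
        raise SystemExit("validation failed: stop pipeline and notify owner")
```

Exiting non-zero on FAIL is one way to implement `stop_pipeline`; how your pipeline runner reacts to that exit code is an assumption about your setup.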
1.3
Failure alerting
How to know immediately when a Skill fails rather than discovering it hours later
Key takeaway
Fail loud, succeed quiet: failure alerts are immediate (paged for critical workflows), while successes are bundled into a digest.
Why this matters
Discovering failure at standup means the automation already failed the business once.
Alert payload: workflow name, run_id, failed Skill + version, error class, partial outputs path, owner @mention. Severity: P1, P2, or P3, assigned per workflow.
No alert on success except daily digest. Test alerts monthly — muted channels kill programs silently.
Workflow — do this next
- 01 Map workflows to P1/P2/P3.
- 02 Wire failure → correct channel per tier.
- 03 Include run_id in every alert.
- 04 Monthly fire-drill: simulate failure.
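The payload and severity routing can be assembled in one place so every alert carries the same fields. A minimal sketch; the channel names in `ROUTES` are hypothetical (the `#cowork-ops` channel echoes the team pattern in §3.4), and actually sending the message is left to whatever chat or paging integration you use.

```python
from dataclasses import dataclass

# Hypothetical routing: P1 pages on-call, P2/P3 post to chat channels.
ROUTES = {
    "P1": "pager:ops-oncall",
    "P2": "slack:#cowork-ops",
    "P3": "slack:#cowork-ops-low",
}


@dataclass
class Failure:
    severity: str
    workflow: str
    run_id: str
    skill: str
    version: int
    error_class: str
    message: str
    partial_paths: list
    owner: str


def format_alert(f: Failure) -> str:
    """Render the failure alert template with every required payload field."""
    partial = ", ".join(f.partial_paths) or "none"
    return (
        f"[{f.severity}] Cowork FAIL: {f.workflow}\n"
        f"run_id: {f.run_id} | Skill: {f.skill}_v{f.version}\n"
        f"error: {f.error_class} - {f.message}\n"
        f"partial: {partial}\n"
        f"owner: @{f.owner}"
    )


def route(f: Failure) -> str:
    """Pick the destination for this severity; unknown tiers are a config error."""
    if f.severity not in ROUTES:
        raise ValueError(f"unknown severity {f.severity!r}")
    return ROUTES[f.severity]
```

Raising on an unknown severity means a mis-tagged workflow fails the monthly fire-drill instead of silently posting nowhere.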
Ready-to-use artifacts
Failure alert template
[P2] Cowork FAIL: {workflow}
run_id: {id} | Skill: {name}_v{n}
error: {class} — {message}
partial: {paths}
owner: @{handle}
1.4
Graceful degradation
Skills that produce partial output gracefully rather than failing silently when inputs are incomplete
Key takeaway
Degraded success is explicit — manifest flags missing inputs; output sections say 'unavailable' — never fabricate or omit silently.
Why this matters
Silent omission on missing Gmail or CSV reads produced board-ready lies — degradation must be visible.
Pattern: per-input try/fetch → on failure, append to degraded_inputs[] in the manifest → primary Skill still runs on available data → Result template includes a Data completeness section.
Do not mark a run SUCCESS if a required (P1) input is missing; use PARTIAL status and route it to the review queue.
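The per-input pattern can be sketched as follows. This assumes each input is fetched by a zero-argument callable (hypothetical; in Cowork the fetches would be connector or file reads), and it takes the PARTIAL option for a missing required input rather than FAIL, since the text allows either.

```python
from typing import Callable, Dict, Iterable


def gather_inputs(fetchers: Dict[str, Callable[[], object]], required: Iterable[str]):
    """Fetch each input independently; a failed fetch is recorded, never hidden."""
    required = set(required)
    data, degraded = {}, []
    for name, fetch in fetchers.items():
        try:
            data[name] = fetch()
        except Exception:  # one broken input must not abort the others
            degraded.append(name)
    if any(name in required for name in degraded):
        status = "PARTIAL"  # required input missing: route to review, never SUCCESS
    else:
        status = "SUCCESS"  # optional gaps stay flagged in degraded_inputs
    manifest = {"status": status, "degraded_inputs": degraded}
    return data, manifest


def completeness_section(manifest: dict) -> str:
    """Render the Data completeness section of the Result template."""
    if not manifest["degraded_inputs"]:
        return "Data completeness: all inputs present."
    lines = ["Data completeness:"]
    lines += [f"- {name}: unavailable this run" for name in manifest["degraded_inputs"]]
    return "\n".join(lines)
```

Because the completeness text is generated from the manifest, the report cannot claim coverage that the run did not have.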
Workflow — do this next
- 01 Classify inputs required vs optional.
- 02 Required missing → FAIL or PARTIAL+review.
- 03 Optional missing → degrade with flag.
- 04 Template section for completeness.
Real example
Monday brief with missing CRM export
manifest: degraded_inputs: [crm_export]. Brief includes Finance + Slack sections; CRM section states 'export not found — pipeline metrics omitted.' Status PARTIAL → review queue.
1.5
The review queue
Designing workflows so uncertain outputs land in a human review queue rather than going straight to a destination
Key takeaway
Review queue = folder or ticket list with REVIEW_BRIEF.md, confidence scores, and SLA: the default destination for T2 workflows.
Why this matters
Uncertainty is normal; routing it to humans is design, not failure.
Route to queue when: validation WARN (not FAIL), classifier confidence < threshold, sensitivity keywords hit, or PARTIAL degraded status. Queue item: output files + REVIEW_BRIEF.md (what to check, 60-second scan guide).
Queue SLA: P1 items 4h, P2 24h. Stale items alert the ops lead. Approve → move to prod path + approval.flag; reject → archive + feedback to Skill owner.
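The routing rules and SLA fit in a small function the pipeline router can call. A sketch under stated assumptions: the 0.8 confidence threshold and keyword list are placeholders, and "deliver" means the output continues along its normal HITL tier path (for T2 workflows that path is the queue anyway).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

CONFIDENCE_THRESHOLD = 0.8  # assumed; tune per workflow
SENSITIVE_KEYWORDS = ("confidential", "salary", "termination")  # assumed examples
QUEUE_SLA = {"P1": timedelta(hours=4), "P2": timedelta(hours=24)}


def route_output(validation: str, confidence: float, text: str, run_status: str):
    """Return (destination, reasons). FAIL blocks; any uncertainty goes to the queue."""
    if validation == "FAIL":
        return "blocked", ["validation_fail"]
    reasons = []
    if validation == "WARN":
        reasons.append("validation_warn")
    if confidence < CONFIDENCE_THRESHOLD:
        reasons.append("low_confidence")
    lowered = text.lower()
    if any(word in lowered for word in SENSITIVE_KEYWORDS):
        reasons.append("sensitive_keyword")
    if run_status == "PARTIAL":
        reasons.append("partial_run")
    if reasons:
        return "review_queue", reasons
    return "deliver", []


def is_stale(enqueued_at: datetime, priority: str, now: Optional[datetime] = None) -> bool:
    """True when a queue item has outlived its SLA and should alert the ops lead."""
    now = now or datetime.now(timezone.utc)
    return now - enqueued_at > QUEUE_SLA[priority]
```

Returning the list of reasons gives REVIEW_BRIEF.md its "Confidence / flags" section for free.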
Workflow — do this next
- 01 Create ~/Cowork/review_queue/{workflow}/.
- 02 Define routing rules in pipeline router.
- 03 REVIEW_BRIEF template per workflow.
- 04 Weekly queue hygiene meeting.
Ready-to-use artifacts
REVIEW_BRIEF.md template
# Review — {workflow} / {run_id}
## 60-second check
- [ ] Key number: ___
## Confidence / flags
## Approve → move to {prod_path}
1.6
Skill regression testing
How to verify that a Skill still performs correctly after you modify it — the regression check before redeployment
Key takeaway
Golden fixtures + diff against expected outputs — rerun full test matrix on every version bump before schedule update.
Why this matters
Small TAR edits can break column order or totals; regression catches drift before Monday 6am.
Maintain REGRESSION_FIXTURES/ per Skill: anonymised inputs + expected outputs (json/md). On vN+1: run all fixtures; diff key fields; human spot-check if the diff tool flags a change. Block promotion if a golden test fails.
Changelog in SKILL_SPEC: what changed, why, which fixtures updated. Pipelines: re-run regression on the entire DAG when any node bumps its major version.
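A key-field diff against golden fixtures can be scripted. This sketch assumes a hypothetical naming convention (`<case>.input.json` next to `<case>.expected.json` in REGRESSION_FIXTURES/) and a hypothetical `run_skill` callable standing in for however you invoke Skill vN+1; numeric fields compare within an optional tolerance, everything else must match exactly.

```python
import json
from pathlib import Path


def diff_key_fields(expected: dict, actual: dict, key_fields, tolerance: float = 0.0) -> list:
    """Return (field, expected, actual) for every key field that differs."""
    mismatches = []
    for field in key_fields:
        exp, act = expected.get(field), actual.get(field)
        if isinstance(exp, (int, float)) and isinstance(act, (int, float)):
            if abs(exp - act) > tolerance:
                mismatches.append((field, exp, act))
        elif exp != act:
            mismatches.append((field, exp, act))
    return mismatches


def run_fixtures(fixture_dir: Path, run_skill, key_fields, tolerance: float = 0.0) -> dict:
    """Run every fixture through the candidate Skill and collect mismatches per fixture."""
    results = {}
    for input_path in sorted(fixture_dir.glob("*.input.json")):
        expected_path = input_path.with_name(input_path.name.replace(".input.json", ".expected.json"))
        expected = json.loads(expected_path.read_text())
        actual = run_skill(json.loads(input_path.read_text()))
        results[input_path.name] = diff_key_fields(expected, actual, key_fields, tolerance)
    return results


def promotion_allowed(results: dict) -> bool:
    """Block promotion if no fixtures ran or any golden test has a mismatch."""
    return bool(results) and all(not mismatches for mismatches in results.values())
```

Treating "no fixtures found" as a failure stops a misconfigured path from passing the gate vacuously.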
Workflow — do this next
- 01 Capture golden fixtures at v1 promotion.
- 02 Automate diff on key fields where possible.
- 03 Regression gate in change management (Ch 5 §2.5).
- 04 Archive old fixtures when schema versions sunset.
Ready-to-use artifacts
Regression run log
| Skill version | fixture | key_fields_match | reviewer | date |
|---------------|---------|------------------|----------|------|
1.7
Token usage monitoring
Tracking how much each Skill costs in tokens — the cost visibility that prevents budget surprises
Key takeaway
Per-workflow token ledger — estimate from manual runs, alert on 2× weekly average, finance digest monthly.
Why this matters
Unmonitored schedules discover budget at invoice — too late for trust.
Track: tokens per run_id, per Skill, per workflow, per week, taken from Cowork run history and exported to TOKEN_LEDGER.csv. Compare actual vs estimate; investigate outliers (duplicate runs, missing pagination caps).
Cost controls: model routing, max_files, chunking, and condition triggers instead of cron runs that fire on empty inputs. ROI section (§3.6) uses token cost as input.
Workflow — do this next
- 01 Record baseline tokens from 5 manual runs.
- 02 Set weekly budget per workflow.
- 03 Alert at 80% and 120% of budget.
- 04 Monthly finance review with ledger export.
Ready-to-use artifacts
TOKEN_LEDGER.csv columns
week, workflow, run_count, tokens_total, tokens_per_run, budget, variance_pct
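The ledger columns can be computed from per-run token counts. A minimal sketch, assuming you already have `(week, workflow, tokens)` tuples from your run-history export (the export format itself is an assumption) and a weekly token budget per workflow; the alert levels follow the 80% and 120% thresholds in the workflow steps.

```python
import csv
import io
from collections import defaultdict

COLUMNS = ["week", "workflow", "run_count", "tokens_total", "tokens_per_run", "budget", "variance_pct"]


def build_ledger(runs, budgets: dict) -> list:
    """runs: iterable of (week, workflow, tokens); budgets: {workflow: weekly token budget}."""
    totals = defaultdict(lambda: [0, 0])  # (week, workflow) -> [run_count, tokens_total]
    for week, workflow, tokens in runs:
        totals[(week, workflow)][0] += 1
        totals[(week, workflow)][1] += tokens
    rows = []
    for (week, workflow), (count, total) in sorted(totals.items()):
        budget = budgets[workflow]
        rows.append({
            "week": week,
            "workflow": workflow,
            "run_count": count,
            "tokens_total": total,
            "tokens_per_run": round(total / count),
            "budget": budget,
            "variance_pct": round(100 * (total - budget) / budget, 1),
        })
    return rows


def budget_alert(row: dict):
    """Return an alert label at 80% and 120% of the weekly budget, else None."""
    used = row["tokens_total"] / row["budget"]
    if used >= 1.2:
        return "over_120pct"
    if used >= 0.8:
        return "warn_80pct"
    return None


def to_csv(rows: list) -> str:
    """Serialise ledger rows with the TOKEN_LEDGER.csv header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```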
1.8
The production Skill audit
The periodic review of your running Skills — what to check, how often, and what to do when a Skill is underperforming
Key takeaway
Quarterly audit every production Skill: still needed, still passing regression, owner active, permissions minimal, token trend OK.
Why this matters
Skill libraries rot — duplicates, stale schedules, and zombie workflows erode trust and budget.
Audit checklist per Skill: last 10 runs success rate, validation pass rate, queue rejection rate, token trend, incident count, owner, downstream dependents. Outcomes: keep, tune, deprecate, merge duplicate.
Underperforming: freeze schedule → root cause (Task, Action, Result, input drift) → fix version → regression → re-enable. Kill Skills unused for 90 days unless compliance-mandated.
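Part of the scorecard can be pre-filled from run stats before the human review. A sketch only: the 90% success and 95% validation thresholds are assumed values, "days since last run" approximates "unused", and the merge-duplicate outcome still needs a person comparing Skills, so it is not automated here.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed thresholds: adjust to your own audit policy.
MIN_SUCCESS_RATE = 0.9
MIN_VALIDATION_RATE = 0.95
UNUSED_DAYS_LIMIT = 90


@dataclass
class SkillStats:
    name: str
    successes_last_10: int
    validation_pass_rate: float
    days_since_last_run: int
    owner: Optional[str]
    regression_passing: bool = True
    compliance_mandated: bool = False


def audit_action(s: SkillStats) -> str:
    """Suggest keep / tune / deprecate for one Skill; a human confirms the outcome."""
    if s.days_since_last_run >= UNUSED_DAYS_LIMIT and not s.compliance_mandated:
        return "deprecate"
    if s.owner is None or not s.regression_passing:
        return "tune"
    if s.successes_last_10 / 10 < MIN_SUCCESS_RATE or s.validation_pass_rate < MIN_VALIDATION_RATE:
        return "tune"
    return "keep"
```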
Workflow — do this next
- 01 Export SKILL_INDEX + run stats quarterly.
- 02 Score each Skill keep/tune/kill.
- 03 Assign owners for tune items.
- 04 Document deprecations with migration path.
Ready-to-use artifacts
Skill audit scorecard
| Skill | success% | validation% | tokens/wk | incidents | action |
|-------|----------|-------------|-----------|-----------|--------|
Concept 2
Governance & Safety
The oversight structures that make autonomous operation safe — what a professional Cowork deployment looks like
2.1
The principle of minimum access
Giving Cowork access to only the systems and files it genuinely needs for each Skill
Key takeaway
Scope per Skill, not per machine: narrow folders, connector allowlists, and minimal tool access are the default.
Why this matters
Broad access simplifies day-one demos and complicates day-90 incident response.
Minimum access matrix: each Skill lists read paths, write paths, connectors, channels. No Skill gets 'whole Drive' or 'all Gmail.' Quarterly permission review removes paths no longer referenced in TAR specs.
Separate the sandbox profile from the production profile.
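Cowork's own settings are what actually enforce access; a pre-flight check in your pipeline code can still catch a Skill reaching outside its row of the access matrix. A minimal sketch, assuming the matrix is stored as the hypothetical dictionary below; it normalises `..` segments but does not resolve symlinks.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical access-matrix entry mirroring the Skill access matrix columns.
ACCESS_MATRIX = {
    "FINANCE_INGEST_v2": {
        "read": ["/cowork/inputs/finance"],
        "write": ["/cowork/staging/finance"],
        "connectors": ["gmail:finance-reports"],
    },
}


def is_allowed(skill: str, kind: str, target: str) -> bool:
    """Deny by default: allow only listed connectors and paths under listed folders."""
    entry = ACCESS_MATRIX.get(skill)
    if entry is None:
        return False  # Skill not in the matrix: no access at all
    allowed = entry.get(kind, [])
    if kind == "connectors":
        return target in allowed
    # Normalise '..' segments so a path cannot climb out of an allowed folder.
    path = PurePosixPath(posixpath.normpath(target))
    for root in allowed:
        root_path = PurePosixPath(root)
        if path == root_path or root_path in path.parents:
            return True
    return False
```

Comparing path components rather than string prefixes keeps `/cowork/inputs/finance_old` from matching `/cowork/inputs/finance`.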
Workflow — do this next
- 01 Document allowlists in every TAR spec.
- 02 Remove unused paths from Cowork settings.
- 03 Quarterly access review with IT.
- 04 Deny by default for new connectors.
Ready-to-use artifacts
Skill access matrix
| Skill | read | write | connectors | HITL |
|-------|------|-------|------------|------|
2.2
Sensitive data handling in Cowork
What to never put in a Skill, how to handle credentials, and the data hygiene for automated workflows
Key takeaway
Never in Skills: secrets, full customer PII dumps, or instructions to exfiltrate data. Credentials live in the connector store only.
Why this matters
Skill files copy to wikis, git, and logs — treat them as published documentation.
Data classes: public, internal, confidential, restricted. Restricted workflows require T3 approval, encrypted staging if policy demands, and retention limits on raw/. Redact logs; manifests record counts, not content, for mail and message bodies.
Prohibited in TAR: API keys, passwords, SSN patterns in examples, 'send all files to external URL.' Train teams on prompt injection via fetched docs (Ch 2 §1.7).
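The quarterly secret scan can start as a short script over the Skill repo. A sketch only: the patterns are illustrative (AWS-style access key IDs, PEM private-key headers, `password:` assignments, SSN-shaped numbers), the assumption that Skills are stored as Markdown files is yours to adjust, and a dedicated secret-scanning tool will cover far more formats.

```python
import re
from pathlib import Path

# Illustrative patterns only; a dedicated secret scanner covers many more formats.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "password_assignment": re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),
    "ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text: str) -> list:
    """Return (line_number, pattern_name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def scan_skill_dir(root: str) -> dict:
    """Scan every Markdown Skill file under root; only files with findings are returned."""
    results = {}
    for path in Path(root).rglob("*.md"):
        hits = scan_text(path.read_text(errors="ignore"))
        if hits:
            results[str(path)] = hits
    return results
```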
Workflow — do this next
- 01 Classify each production workflow data tier.
- 02 Scan Skills for secret patterns quarterly.
- 03 Redact manifest logs for confidential runs.
- 04 Incident playbook if PII leaked to wrong folder.
Ready-to-use artifacts
Cowork data hygiene rules
- No secrets in Skills
- Staging retention per class
- Restricted → T3 + encrypted staging if required
- Manifest: metadata only for mail bodies
2.3
Output destinations and access control
Where Skill outputs go and who can see them — the data governance of automated output
Key takeaway
Three-tier paths: sandbox, staging (review), production — ACLs match human access; automation does not bypass folder permissions.
Why this matters
Cowork writing board packs to a world-readable folder is a governance failure, not a Skill bug.
Map destinations: who can read, who can write, who gets notified. Production paths require validation PASS + HITL tier met. Shared drives: use service account with explicit ACL, not personal home directory.
Version outputs by run_id or date — never overwrite sole copy. Deprecation: archive old reports, do not delete without retention policy check.
Workflow — do this next
- 01 Define path taxonomy in OPS_WIKI.
- 02 ACL review per production destination.
- 03 Automated writes only to staging until promoted.
- 04 Audit random sample of output ACLs quarterly.
2.4
The human approval gate
Designing Skills with an explicit approval step before output is delivered or action is taken
Key takeaway
Approval = explicit artifact (approval.flag, ticket ID, button in review queue) — not 'someone probably saw the Slack message.'
Why this matters
Ambiguous approval is how automated emails and customer-facing docs slip out.
HITL tiers (Ch 1): T0 none (rare), T1 notify, T2 review queue (default), T3 dual approval for external/customer/restricted. SEND_* Skills check that approval/{id}.flag exists and that the signer is not the Skill runner identity.
Approval audit: who approved, when, run_id, output hash. Revoke approval.flag on output edit — require re-approval.
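A SEND_* step can verify the flag before acting. This sketch assumes `output_hash` holds a hex SHA-256 digest of the exact output bytes, optionally prefixed `sha256:` (the spec only shows `sha256...`), and checks a single signer; a T3 dual approval would need two such checks with distinct approvers. Hashing the output is what makes "revoke on edit" automatic: any change to the file breaks the match.

```python
import hashlib
import json
from datetime import datetime

REQUIRED_FIELDS = ("run_id", "approver", "approved_at", "output_hash")


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def check_approval(flag_json: str, run_id: str, output: bytes, runner_identity: str):
    """Return (ok, reason) for one approval.flag against the output about to be sent."""
    try:
        flag = json.loads(flag_json)
    except json.JSONDecodeError:
        return False, "flag is not valid JSON"
    missing = [key for key in REQUIRED_FIELDS if key not in flag]
    if missing:
        return False, f"missing fields: {missing}"
    if flag["run_id"] != run_id:
        return False, "flag belongs to another run"
    if flag["approver"] == runner_identity:
        return False, "approver must differ from the Skill runner identity"
    try:
        datetime.fromisoformat(flag["approved_at"])
    except (TypeError, ValueError):
        return False, "approved_at is not ISO 8601"
    expected = str(flag["output_hash"]).removeprefix("sha256:")
    if expected != sha256_hex(output):
        return False, "output changed since approval; re-approval required"
    return True, "approved"
```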
Workflow — do this next
- 01 Assign HITL tier per workflow in SOP.
- 02 Implement approval.flag pattern for writes/sends.
- 03 Log approvals in manifest.
- 04 Legal sign-off for T3 customer comms.
Ready-to-use artifacts
approval.flag spec
{
"run_id": "...",
"approver": "user@company.com",
"approved_at": "ISO8601",
"output_hash": "sha256..."
}
2.5
Skill change management
How to update a running Skill safely without disrupting downstream workflows
Key takeaway
Version bump → regression → staged rollout → schedule pointer update — never edit production Skill in place without version suffix.
Why this matters
In-place edits break audit trail and leave pipelines on unknown behaviour mid-week.
Change flow: PR to TAR spec → implement SKILL_vN+1 → regression fixtures → parallel run in shadow mode (write sandbox only) → compare outputs → flip workflow to new version → monitor 3 runs → deprecate vN.
Breaking handoff schema: bump schema_version; support N-1 for one sprint; notify downstream Skill owners.
Workflow — do this next
- 01 Never edit v1 in place; create v2.
- 02 Shadow run 3 cycles before cutover.
- 03 Update SKILL_INDEX version column.
- 04 Announce breaking changes in #ops.
Ready-to-use artifacts
Skill change management checklist
[ ] TAR PR reviewed
[ ] Version _vN+1 created
[ ] Regression pass
[ ] Shadow runs OK
[ ] Workflow pointer updated
[ ] 3 prod runs monitored
2.6
Cowork for teams
The governance model for shared Skills and shared connectors — who can create, modify, and schedule
Key takeaway
RBAC: creators draft Skills, owners approve production, schedulers bind triggers, admins manage connectors — four roles minimum.
Why this matters
Shared Cowork without roles becomes nobody's problem until something breaks.
Roles: Creator, Owner, Scheduler, Admin. Shared library in git or drive; changes via PR not direct UI edit for production Skills.
Team connectors use service identity; personal OAuth only for individual sandboxes.
Workflow — do this next
- 01 Define roles in COWORK_GOV.md.
- 02 Assign owner per production Skill.
- 03 Require owner approval for schedule changes.
- 04 Quarterly role recertification.
2.7
Audit logging for Cowork
Maintaining a record of what Cowork did, when, and what it produced — the accountability trail
Key takeaway
Correlate run history + manifest + connector audit + approval flags — answer 'what happened?' in one run_id lookup.
Why this matters
Regulators and security ask for trails; building them after an incident is too late.
Log retention per policy (e.g. 90d run metadata, 1y approval records). Export monthly AUDIT_BUNDLE for compliance. Include: trigger, Skills versions, paths read/written, connector calls, validation result, approver.
Platform caveat: Cowork activity may not yet appear in org Compliance API or all enterprise audit logs. Your AUDIT_BUNDLE is the authoritative trail today alongside connector vendor logs.
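The "one run_id lookup" becomes a simple join once every source carries `run_id`. A minimal sketch, assuming each source (run history export, manifests, connector audit summary, approval flags) has already been loaded as a list of dictionaries; the source names and record fields are hypothetical.

```python
from collections import defaultdict


def index_by_run_id(**sources) -> dict:
    """Each keyword argument is a list of records that all carry a 'run_id' field."""
    index = defaultdict(lambda: {name: [] for name in sources})
    for name, records in sources.items():
        for record in records:
            index[record["run_id"]][name].append(record)
    return index


def what_happened(index: dict, run_id: str):
    """Everything known about one run, grouped by source, or None if never seen."""
    return index.get(run_id)
```

An empty list for a source (for example, no approval record on a T3 run) is itself an audit finding worth surfacing in the quarterly lookup drill.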
Workflow — do this next
- 01 Standardise manifest schema across workflows.
- 02 Monthly audit export to compliance folder.
- 03 Verify Compliance API coverage quarterly (Ch 7 §2.7).
- 04 Test run_id lookup drill quarterly.
- 05 Align retention with legal.
Ready-to-use artifacts
Monthly AUDIT_BUNDLE contents
run_history_export.csv
manifests/{month}/
approval_flags/
connector_audit_summary.json
2.8
The Cowork security review
The periodic assessment of your Cowork configuration for security risks, stale permissions, and unnecessary access
Key takeaway
Semi-annual security review: permissions, connectors, offboarded users, Skill content scan, incident retrospective.
Why this matters
Automation attack surface grows with every connector and scheduled write.
Review agenda: filesystem scope vs SKILL_INDEX, connector OAuth scopes, revoked employees, Skills with web/MCP tools, staging retention, failed auth attempts, injection near-misses. Output: findings + remediation owners + dates.
Pair with IT security; use same checklist as Ch 4 §1.8 connector audit. Red findings block new production schedules until resolved.
Workflow — do this next
- 01 Schedule semi-annual review on calendar.
- 02 Run automated secret scan on Skill repo.
- 03 Verify offboarding revoked Cowork access.
- 04 Publish findings to leadership.
Ready-to-use artifacts
Cowork security review checklist
[ ] Filesystem scope minimal
[ ] Connector scopes current
[ ] Offboarded users revoked
[ ] No secrets in Skills
[ ] Staging retention compliant
[ ] Incident log reviewed
Concept 3
Scaling Your Cowork Operation
Growing from a few helpful Skills to a comprehensive operations layer — the scaling principles and the organisational discipline
3.1
The Cowork inventory
Mapping every recurring task in your role to a potential Cowork Skill — the audit that reveals the full automation opportunity
Key takeaway
Full inventory = every recurring task with frequency, minutes, inputs, automate-now/next/never — the master backlog for your COO layer.
Why this matters
Ad hoc Skill sprawl misses high-ROI work; inventory reveals the 80% you have not touched.
Audit sources: calendar repeats, email templates, weekly reports, standup rituals, close checklists. Columns: task, cadence, mins/run, data source, Skill exists?, priority score. Revisit quarterly as roles change.
Cross-reference Ch 3 §1.8 schedule inventory and Ch 2 SKILL_INDEX — one master COWORK_INVENTORY.md.
Workflow — do this next
- 01 Block 2h: list all recurring tasks from the last 90 days.
- 02 Score frequency × minutes.
- 03 Mark automate-now vs human-only.
- 04 Feed top 10 into roadmap.
Ready-to-use artifacts
COWORK_INVENTORY.md
| Task | Cadence | min/run | Source | Skill? | Priority | Status |
|------|---------|---------|--------|--------|----------|--------|
3.2
The automation priority matrix
Which tasks to automate first based on frequency, time cost, and automation reliability — the sequencing framework
Key takeaway
Prioritise high-frequency × high-minutes × high-reliability — quick wins build trust for harder workflows.
Why this matters
Automating rare complex judgment first kills programs; sequencing matters as much as tooling.
Matrix axes: Impact, Feasibility. Q1 (high impact, high feasibility): automate now. Q2 (high impact, low feasibility): invest in Skill design. Q3 (low impact, high feasibility): batch later. Q4 (low impact, low feasibility): never automate.
Reliability score: can you write acceptance tests? If no, defer until yes.
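Plotting the inventory on the 2×2 can be scripted once each task has a frequency, a duration, and a feasibility judgement. A sketch under stated assumptions: impact is frequency × minutes per month, the 120-minute cut-off is a placeholder, feasibility means "acceptance tests are writable and inputs are stable", and Q3/Q4 follow the usual 2×2 layout (low impact with high or low feasibility).

```python
from dataclasses import dataclass

IMPACT_THRESHOLD_MIN_PER_MONTH = 120  # assumed cut-off; tune to your team


@dataclass
class Task:
    name: str
    runs_per_month: float
    minutes_per_run: float
    acceptance_tests_writable: bool
    inputs_stable: bool

    @property
    def impact_minutes(self) -> float:
        """Frequency x minutes: manual minutes spent per month."""
        return self.runs_per_month * self.minutes_per_run


def quadrant(t: Task) -> str:
    high_impact = t.impact_minutes >= IMPACT_THRESHOLD_MIN_PER_MONTH
    high_feasibility = t.acceptance_tests_writable and t.inputs_stable
    if high_impact and high_feasibility:
        return "Q1: automate now"
    if high_impact:
        return "Q2: invest in Skill design"
    if high_feasibility:
        return "Q3: batch later"
    return "Q4: never automate"
```

With these assumptions the weekly CSV normalise from the example below (about 4 runs × 45 min) lands in Q1, and the yearly contract negotiation lands in Q4.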
Workflow — do this next
- 01 Plot inventory items on 2×2.
- 02 Pick 3 Q1 items for next sprint.
- 03 One connector workflow max per quarter early on.
- 04 Replot after each production launch.
Real example
Q1 vs Q4
Q1: weekly CSV normalise (weekly, 45 min, schema stable). Q4: negotiate enterprise contract (yearly, judgment-heavy) — human-only.
3.3
Skill interdependency mapping
Understanding which Skills depend on others and how to manage the dependency graph as it grows
Key takeaway
Dependency graph: nodes = Skills, edges = handoff manifests — version bumps ripple; map before you change.
Why this matters
An upstream schema change without a downstream update breaks pipelines silently at the join step.
Maintain DEPENDENCY_GRAPH.md or diagram: FINANCE_INGEST → NORMALISE → VARIANCE → MEMO. Tag each edge with schema_version. Before deprecating a Skill, list its dependents and a migration plan.
Critical path: workflows on board-week calendar — change freeze 7 days before. Non-critical Skills can iterate faster.
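Listing dependents before a change is a graph walk over the edge list. A minimal sketch, assuming the graph is kept as hypothetical `(upstream, downstream, artifact, schema_version)` tuples; `schema_compatible` encodes the "support N-1 for one sprint" rule from change management (§2.5).

```python
from collections import defaultdict, deque

# Hypothetical edges: (upstream, downstream, handoff artifact, schema_version).
EDGES = [
    ("FINANCE_INGEST", "NORMALISE", "manifest", 1),
    ("NORMALISE", "VARIANCE", "normalised.json", 2),
    ("VARIANCE", "MEMO", "metrics.json", 1),
]


def downstream_of(skill: str, edges) -> list:
    """Every Skill that directly or transitively consumes skill's output (BFS order)."""
    children = defaultdict(list)
    for upstream, downstream, _artifact, _version in edges:
        children[upstream].append(downstream)
    seen, order, queue = {skill}, [], deque([skill])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def schema_compatible(consumer_version: int, received_version: int) -> bool:
    """N-1 support window: accept the current or the previous schema version."""
    return received_version in (consumer_version, consumer_version - 1)
```

The `seen` set also keeps the walk finite if someone accidentally introduces a cycle.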
Workflow — do this next
- 01 Draw graph for all production pipelines.
- 02 Label schema_version on edges.
- 03 Change freeze policy for critical path.
- 04 Update graph on every new pipeline.
Ready-to-use artifacts
Dependency graph notation
INGEST_v2 --[manifest v1]--> ANALYSE_v3 --[metrics.json]--> REPORT_v1
3.4
Cowork for a team operation
Scaling from personal automation to a shared operational layer — the Skills and governance that work at team scale
Key takeaway
Shared layer = central SKILL_INDEX, shared connectors, named owners, team review queue — not everyone's laptop with different paths.
Why this matters
Personal Cowork genius does not scale; shared library and ops machine do.
Team patterns: dedicated ops host or VM always on for schedules; shared staging/review paths; Slack #cowork-ops for failures; weekly 15min triage rotation. Personal sandboxes fork shared Skills with PATH_MAPPING.
Avoid N duplicate Skills: merge them into one FINANCE_MEMO_v3 for all analysts and parameterise the client folder in Context.
Workflow — do this next
- 01 Stand up shared ops environment.
- 02 Consolidate duplicate Skills.
- 03 Rotate triage duty weekly.
- 04 Single source of truth for SKILL_INDEX.
3.5
The Cowork centre of excellence
Building a small internal capability for Skill development, quality assurance, and library management
Key takeaway
CoE = 1–2 operators + playbook — TAR reviews, regression gates, library hygiene — not a 20-person platform team.
Why this matters
Without CoE, quality variance across creators erodes trust; with CoE, teams ship Skills faster safely.
CoE services: TAR review office hours, regression fixture templates, connector allowlist maintenance, quarterly audits, training onboarding (§3.7). Metrics: production Skills count, success rate, time-to-production for new Skills.
CoE does not own every workflow — domain owners own Skills; CoE owns standards and gates.
Workflow — do this next
- 01 Name CoE lead (often ops or chief of staff).
- 02 Publish standards doc from this playbook.
- 03 Weekly 30min office hours for Skill authors.
- 04 Track CoE metrics monthly.
3.6
Measuring Cowork ROI
How to quantify the time, cost, and quality impact of your Cowork automation — the business case for continued investment
Key takeaway
ROI = (hours saved × loaded rate) − token cost − ops overhead — track monthly per workflow, not vibes.
Why this matters
Finance funds what you measure; hero stories do not survive budget season.
Hours saved: (manual baseline mins − review mins) × runs/month. Quality: error rate, missed deadline count, queue rejection rate. Cost: TOKEN_LEDGER + ops host + CoE time. Report simple dashboard to leadership quarterly.
Include intangibles (faster Monday standup, fewer Sunday nights), but anchor on measurable hours first.
Workflow — do this next
- 01 Baseline manual time before automation.
- 02 Track review time post-automation.
- 03 Subtract token + infra cost.
- 04 Quarterly ROI slide for leadership.
Ready-to-use artifacts
Cowork ROI worksheet
hours_saved_mo = (manual_mins - review_mins) * runs_mo / 60
value_mo = hours_saved_mo * loaded_rate
cost_mo = token_cost_mo + infra_mo
roi_mo = value_mo - cost_mo
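The worksheet as a function, so the quarterly slide can be regenerated from the ledger rather than retyped. A sketch: all inputs are monthly figures you supply, and the optional `ops_overhead_mo` term is an added assumption covering the CoE time that the key takeaway subtracts but the worksheet lines leave out.

```python
def monthly_roi(manual_mins: float, review_mins: float, runs_mo: float, loaded_rate: float,
                token_cost_mo: float, infra_mo: float, ops_overhead_mo: float = 0.0) -> dict:
    """Monthly value of one workflow's automation versus its running cost."""
    hours_saved_mo = (manual_mins - review_mins) * runs_mo / 60
    value_mo = hours_saved_mo * loaded_rate
    cost_mo = token_cost_mo + infra_mo + ops_overhead_mo  # ops overhead, e.g. CoE time
    return {
        "hours_saved_mo": hours_saved_mo,
        "value_mo": value_mo,
        "cost_mo": cost_mo,
        "roi_mo": value_mo - cost_mo,
    }
```

For example, a 90-minute manual task cut to 30 minutes of review, run 4 times a month at a loaded rate of 100, saves 4 hours (value 400); with 50 in tokens and 30 in infra the monthly ROI is 320.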
3.7
Cowork onboarding for new team members
Using the Skill library as a way to transfer operational knowledge — the documentation and training pattern
Key takeaway
Onboarding = read SKILL_INDEX → sandbox golden tests → shadow triage → own one Skill — ops knowledge in artifacts, not tribal chat.
Why this matters
New hires historically relearned Monday rituals by osmosis; Skill library is the curriculum.
Week 1: playbook Ch 1–2, run manual workflows, read TAR specs for team Skills. Week 2: fix one documentation gap, pass one regression fixture. Week 3: triage rotation. Week 4: propose automate-next from inventory.
Pair with PATH_MAPPING for their sandbox; keep share bundles (Ch 2 §3.8) standardised.
Workflow — do this next
- 01 Create ONBOARDING.md linking SKILL_INDEX.
- 02 Assign buddy + one sandbox workflow.
- 03 Golden test completion = gate to prod access.
- 04 30-day feedback to CoE.
Ready-to-use artifacts
Cowork onboarding checklist
[ ] Read playbook Ch 1-2
[ ] Run 3 team workflows manually
[ ] Pass golden test in sandbox
[ ] 1 triage shift
[ ] Update one TAR doc
3.8
The fully automated operation
What it looks like when Cowork is running a significant portion of your operational workload — the vision and the discipline that makes it possible
Key takeaway
Full automation is 60–80% prep and review, not 100% unattended — humans at gates, machines on cadence, audit always on.
Why this matters
Pursuing 100% unattended sets up disappointment; realistic vision sustains multi-year programs.
Mature state: morning triage 15 min; Sunday nights gone; board inputs staged Friday; failures rare and loud; SKILL_INDEX 30+ Skills with owners; ROI positive; security reviews clean. Humans decide; Cowork prepares.
Discipline never ends: regression on change, quarterly audits, inventory refresh, CoE standards, no secret bypass of HITL for 'just this once.'
Workflow — do this next
- 01 Define your 12-month maturity targets.
- 02 Measure % ops hours on review vs manual prep.
- 03 Celebrate gates working, not zero humans.
- 04 Annual retrospective and playbook update.
Real example
Mature ops lead Monday
6:15 triage dashboard — all green. 6:20 skim brief + weekly report in review queue — 2 flags, approve. 6:35 standup. Cowork ran 6 workflows overnight; human time 20 min not 90.
Ready-to-use artifacts
Production readiness gate
Checklist before unattended production schedule.
Test matrix · validation · alerts · HITL · token budget · rollback
COWORK_INVENTORY.md
Master map of recurring tasks to Skills.
Task · cadence · minutes · Skill · priority · status
Cowork ROI worksheet
Monthly value vs token + infra cost.
hours_saved × loaded_rate − tokens − infra
Series B — Cowork as approved ops infrastructure
A 60-person company ran successful Cowork pilots but security blocked company-wide rollout after an over-scoped connector incident.
Before
No validation gates, personal OAuth on production, Skills edited in place, no ROI data, hero-dependent triage.
After
Chapter 5 playbook adoption: production readiness gate, team RBAC, CoE office hours, quarterly security review, ROI dashboard — security approved shared ops host.
- Production incidents → 3 in Q1 to 0 in Q2 after validation gates
- Security review → passed with min-access matrix
- Documented ROI → $42k annualised value vs $8k token+ops cost
- Onboarding → new ops hire productive in 2 weeks via SKILL_INDEX
What goes wrong
Scheduling without validation — confident wrong numbers reach leadership.
§1.1 readiness gate + §1.2 deterministic validation before emit.
Editing production Skills in place — no regression, unknown behaviour.
§2.5 version bump + shadow runs + regression fixtures.
Everyone admin on connectors — security shutdown.
§2.1 min access + §2.6 RBAC + semi-annual §2.8 review.
Automation sprawl — 40 Skills, 12 duplicates, no owners.
§1.8 quarterly audit + §3.1 inventory + §3.5 CoE consolidation.

Vetted by Krishna Kumar, Curator, FactorBeam