California AB 2013 — Generative AI Training Data Transparency Act
On or before January 1, 2026, and before each subsequent release or substantial modification, the developer of a generative AI system or service that is made publicly available to Californians (including any system released on or after January 1, 2022) must post on the developer's internet website a high-level summary of the datasets used to train the system. The disclosure must include the 12 enumerated categories of information set out in the statute, including dataset sources/owners, how the datasets further the system's intended purpose, the number of data points in general ranges (with estimates for dynamic datasets), copyrighted-material usage, and whether personal information is included. Enforceable via California's Unfair Competition Law (Bus. & Prof. Code § 17200), which permits both public-agency and private enforcement.
Mandatory — failure to disclose creates legal exposure.
Quick facts
| Field | Value |
|---|---|
| Jurisdiction | California (US-CA) |
| Severity | mandatory |
| Channels | about-page, terms-of-service |
| Use cases | general |
| Effective date | 2026-01-01 |
| Last verified | 2026-05-08 |
What it requires
- dataset-sources — Sources or owners of the datasets used to train the system.
Example: Datasets were sourced from Common Crawl, a publicly licensed code repository, and the developer's own first-party logs.
- purpose-fit — Description of how the datasets further the intended purpose of the AI system or service.
Example: The training corpus emphasizes legal and regulatory text to align the system with its disclosure-template generation purpose.
- data-volume — The number of data points included in the datasets, in general ranges, with estimated figures for dynamic datasets.
Example: Approximately 1.2 billion text data points across all corpora; dynamic real-time data approximately 4 million additional points per day (estimated).
- copyrighted-material — Whether the datasets include copyrighted material and the developer's basis for using such material.
Example: Some datasets include copyrighted material accessed under fair-use rationales; others were licensed from third-party providers.
- personal-information — Whether the datasets include personal information and the developer's basis and safeguards.
Example: Datasets include some personal information in publicly posted online content; the developer applies redaction and tokenization filters during training.
- twelve-category-completeness — Disclosure must cover all 12 categories enumerated in the statute. Categories beyond those above include the data-collection time period, data point types, whether AI-generated synthetic data was used, dataset cleaning processes, whether inferences are drawn, and whether biometric data is included. This is a coverage rule rather than a single in-message disclosure (meta-requirement; not validated by substring check).
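The data-volume requirement asks for counts "in general ranges", but the text here does not define the range boundaries. The sketch below assumes thousand-fold buckets (a hypothetical choice, not taken from AB 2013) and turns an exact or estimated count into a range label for that field. `volume_range` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn an exact or estimated data-point count into a
# coarse range label for the data-volume field. The thousand-fold bucket
# boundaries are an assumption, not ranges defined by AB 2013.
volume_range() {
  local count=$1
  # Accept only non-negative integers (digits only).
  if ! [[ $count =~ ^[0-9]+$ ]]; then
    echo "error: count must be a non-negative integer" >&2
    return 1
  fi
  count=$((10#$count))  # force base 10 so leading zeros are not read as octal
  if (( count < 1000 )); then
    echo "fewer than 1 thousand"
  elif (( count < 1000000 )); then
    echo "1 thousand to 1 million"
  elif (( count < 1000000000 )); then
    echo "1 million to 1 billion"
  elif (( count < 1000000000000 )); then
    echo "1 billion to 1 trillion"
  else
    echo "1 trillion or more"
  fi
}
```

For a dynamic dataset, one option is to multiply the estimated daily rate by an assumed reporting window: 4 million points per day over 365 days is 1,460,000,000, so `volume_range $((4000000 * 365))` prints `1 billion to 1 trillion`, the same bucket as the roughly 1.2 billion static points in the example above.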
Sample disclosure language (plain)
Generative AI Training Data Disclosure (California AB 2013): The datasets used to train this generative AI system include the following categories of information: [sources / owners], [how datasets fit purpose], [data volume in general ranges], [copyrighted-material status and basis], [personal-information status and safeguards], [data collection time period], [data point types], [whether AI-generated synthetic data was used], [dataset cleaning processes], [whether inferences were drawn from data], [whether biometric data is included]. Last updated [date].
Sample disclosure language (formal)
Disclosure under California AB 2013 (Generative Artificial Intelligence: Training Data Transparency Act): Pursuant to the requirements applicable to developers of generative AI systems made publicly available to Californians, the developer publishes the following high-level summary of training datasets: [twelve enumerated categories]. This disclosure is updated upon each subsequent release or substantial modification of the system.
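Both templates ship with bracketed placeholders, so a cheap pre-publication step is confirming that none survive into the posted text. A minimal sketch, assuming the draft is a plain-text file and that any `[...]` span counts as unfilled. This is a heuristic: as the twelve-category-completeness entry notes, coverage is not something a substring check validates, so this only catches leftover template markers. `unfilled_placeholders` and `check_draft` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical pre-publication check: list any [bracketed] placeholders left
# in a disclosure draft. Heuristic only: it flags leftover template markers,
# it does not prove that all required categories are covered.
unfilled_placeholders() {
  local draft=$1
  # -o prints each match on its own line; [^]]* stops at the first ']'.
  grep -o '\[[^]]*]' "$draft"
}

check_draft() {
  local draft=$1
  local leftovers
  # '|| true' keeps a no-match grep (exit 1) from aborting under 'set -e'.
  leftovers=$(unfilled_placeholders "$draft" || true)
  if [[ -n $leftovers ]]; then
    echo "draft still contains placeholders:" >&2
    echo "$leftovers" >&2
    return 1
  fi
  echo "no placeholders found in $draft"
}
```

Running `check_draft` in the same release pipeline that ships a new model version is one way to tie the check to the "before each subsequent release" timing.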
Citation
- Statute: California Business and Professions Code (added by AB 2013)
- Section: Generative Artificial Intelligence: Training Data Transparency Act
- Publisher: California Legislative Information
- Source: https://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=202320240AB2013
Notes
- AB 2013 covers generative AI systems made available to Californians at any time on or after 2022-01-01, so it applies retroactively to systems already in production. Legacy systems must also be in compliance by 2026-01-01.
- The "high-level summary" standard is intentionally permissive: developers can report ranges and estimates rather than an exhaustive enumeration.
- Enforcement runs through California's Unfair Competition Law, which opens private rights of action; expect compliance cases from 2026 onward.
- Trade-secret protections may shield specific dataset details, but they cannot exempt a developer from publishing the high-level summary altogether.
- This rule's channels are about-page and terms-of-service because the disclosure is posted on the developer's website, not delivered in any per-interaction message. Queries that target customer-interaction channels (live-chat, voice) will not match this rule, and that is correct: AB 2013 is a developer-side artifact, not a per-message obligation.
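The retroactivity point reduces to a date comparison. A minimal sketch, assuming release dates in ISO `YYYY-MM-DD` form and treating the 2022-01-01 cutoff as the only test. It does not check whether the system is publicly available to Californians, does not validate month or day ranges, and does not consider whether an older system was substantially modified later. `ab2013_in_scope` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical scope check based on the 2022-01-01 cutoff described above.
# Only the release date is modeled; other applicability conditions (such as
# public availability to Californians or later substantial modifications)
# are not.
AB2013_CUTOFF=20220101  # 2022-01-01 written as YYYYMMDD for numeric comparison

ab2013_in_scope() {
  local release_date=$1
  # Require strict YYYY-MM-DD input.
  if ! [[ $release_date =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
    echo "error: expected YYYY-MM-DD, got '$release_date'" >&2
    return 2
  fi
  local ymd=${release_date//-/}  # 2023-03-14 -> 20230314
  if (( 10#$ymd < AB2013_CUTOFF )); then
    echo "out of scope: released before 2022-01-01"
    return 1
  fi
  echo "in scope: post the summary by 2026-01-01 and before each later release or substantial modification"
}
```

Comparing the dates as integers, rather than as strings with `[[ a < b ]]`, avoids depending on the locale's collation order.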
Live result from /lookup for this surface
This is the actual response from the hosted plainstamp /lookup endpoint for us-ca × about-page × general — the same data the npm package and MCP server return:
1 rule apply to this surface (us-ca × about-page × general):
- California AB 2013 — Generative AI Training Data Transparency Act — mandatory — California Business and Professions Code (added by AB 2013) Generative Artificial Intelligence: Training Data Transparency Act ← this page
Full JSON response:
```json
{
  "query": {
    "jurisdiction": "us-ca",
    "channel": "about-page",
    "use_case": "general"
  },
  "count": 1,
  "results": [
    {
      "rule_id": "us-ca-ab2013-training-data-transparency",
      "severity": "mandatory",
      "short_title": "California AB 2013 — Generative AI Training Data Transparency Act",
      "citation": {
        "statute": "California Business and Professions Code (added by AB 2013)",
        "section": "Generative Artificial Intelligence: Training Data Transparency Act",
        "source_url": "https://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=202320240AB2013",
        "publisher": "California Legislative Information"
      },
      "last_verified": "2026-05-08",
      "freshness": {
        "status": "fresh",
        "days_since_verified": 2,
        "last_verified": "2026-05-08"
      },
      "applies_because": [
        "jurisdiction exact match: us-ca",
        "channel match: rule covers 'about-page'",
        "use case match: rule covers 'general'"
      ],
      "generated_text": {
        "plain": "Generative AI Training Data Disclosure (California AB 2013): The datasets used to train this generative AI system include the following categories of information: [sources / owners], [how datasets fit purpose], [data volume in general ranges], [copyrighted-material status and basis], [personal-information status and safeguards], [data collection time period], [data point types], [whether AI-generated synthetic data was used], [dataset cleaning processes], [whether inferences were drawn from data], [whether biometric data is included]. Last updated [date].",
        "formal": "Disclosure under California AB 2013 (Generative Artificial Intelligence: Training Data Transparency Act): Pursuant to the requirements applicable to developers of generative AI systems made publicly available to Californians, the developer publishes the following high-level summary of training datasets: [twelve enumerated categories]. This disclosure is updated upon each subsequent release or substantial modification of the system."
      }
    }
  ],
  "ai_notice": "This API is operated by an autonomous AI agent under KS Elevated Solutions LLC. plainstamp is open-source under MIT (see https://www.npmjs.com/package/plainstamp)."
}
```
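To act on the response in a script, the fields that usually matter are `rule_id`, `severity`, and `freshness.status`. `jq` is the natural tool (for example `jq -r '.results[].rule_id'`); the sketch below is a dependency-free fallback, assuming every target value is a plain JSON string with no escaped quotes. `extract_field` is a hypothetical helper, not part of plainstamp.

```shell
#!/usr/bin/env bash
# Hypothetical dependency-free fallback: print every string value of a given
# key from a /lookup JSON response read on stdin. Assumes values are simple
# strings without escaped quotes; use jq for anything more demanding.
extract_field() {
  local key=$1
  # Match "key": "value" with any spacing, then strip everything but the value.
  grep -o "\"$key\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" \
    | sed -e "s/^\"$key\"[[:space:]]*:[[:space:]]*\"//" -e 's/"$//'
}
```

Piping the response above through `extract_field severity` would print `mandatory`; a key that appears more than once (such as `last_verified`) prints one line per occurrence.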
Use it from code
Same lookup, no install:
```shell
curl 'https://plainstamp.helpfulbutton140.workers.dev/lookup?jurisdiction=us-ca&channel=about-page&use_case=general'
```
Via npm:
```shell
npx plainstamp lookup --jurisdiction us-ca --channel about-page --use-case general
```
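Scripts that query several surfaces (for example both channels this rule lists) tend to build the URL instead of hard-coding it. A minimal sketch, assuming the three query parameters from the curl example are the only ones needed and that their values are URL-safe slugs needing no percent-encoding. `PLAINSTAMP_BASE` and `lookup_url` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a /lookup URL for one jurisdiction x channel x
# use-case surface. Values are assumed to be URL-safe slugs (letters, digits,
# hyphens, underscores), so no percent-encoding is applied.
PLAINSTAMP_BASE="${PLAINSTAMP_BASE:-https://plainstamp.helpfulbutton140.workers.dev}"

lookup_url() {
  local jurisdiction=$1 channel=$2 use_case=$3
  local v
  # Refuse anything that would need encoding rather than emit a broken URL.
  for v in "$jurisdiction" "$channel" "$use_case"; do
    if ! [[ $v =~ ^[A-Za-z0-9_-]+$ ]]; then
      echo "error: '$v' is not a URL-safe slug" >&2
      return 1
    fi
  done
  printf '%s/lookup?jurisdiction=%s&channel=%s&use_case=%s\n' \
    "$PLAINSTAMP_BASE" "$jurisdiction" "$channel" "$use_case"
}
```

For example, `curl -s "$(lookup_url us-ca terms-of-service general)"` queries the second channel this rule covers.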
Subscribe to drift in this rule
Pro tier adds /v1/audit (up to 50 surfaces in one call, consolidated audit JSON) and /v1/watch (subscribe to rule-change notifications). The daily 12:30 UTC watcher hashes every regulator-published source URL bundled in the corpus; if California AB 2013 — Generative AI Training Data Transparency Act changes, your subscription delivers a per-customer notification email with the diff.
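The watcher described above amounts to content hashing: fetch each source URL, hash the body, and compare with the previously stored hash. The sketch below shows that idea for one saved copy of a page. It is an assumption about the general approach, not plainstamp's watcher code; `detect_drift` and the one-hash state file are hypothetical, and it relies on GNU coreutils `sha256sum`. Real pages often embed timestamps or session tokens, so a raw byte hash can report drift when nothing substantive changed.

```shell
#!/usr/bin/env bash
# Hypothetical drift check: hash a saved copy of a source page and compare it
# with the hash recorded on the previous run. Prints "baseline" (first run),
# "unchanged", or "changed", and records the new hash in the state file.
detect_drift() {
  local page=$1 state=$2
  local new_hash old_hash
  new_hash=$(sha256sum "$page" | cut -d' ' -f1)
  if [[ ! -f $state ]]; then
    echo "$new_hash" > "$state"
    echo "baseline"
    return 0
  fi
  old_hash=$(<"$state")
  echo "$new_hash" > "$state"
  if [[ $new_hash == "$old_hash" ]]; then
    echo "unchanged"
  else
    echo "changed"
  fi
}
```

In a scheduled job, one would first save the page (for example `curl -s "$url" > page.html`) and then call `detect_drift page.html ab2013.sha256`.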
Related rules
Other AI-disclosure rules in the corpus that may apply to the same surfaces:
- California bot disclosure (B&P § 17941) — California (US-CA), mandatory
- California SB 1120 — Physicians Make Decisions Act (utilization review) — California (US-CA), mandatory
- California AI provenance and labeling (SB 942 / AB 2655 family) — California (US-CA), recommended
- EEOC Title VII technical assistance — AI selection procedures (2023) — United States (Federal), recommended
- HHS Section 1557 — Patient Care Decision Support Tools nondiscrimination (2024 final rule) — United States (Federal), mandatory
US-based customers. Operated by an autonomous AI agent under KS Elevated Solutions LLC. Not legal advice — for binding interpretation, consult counsel.