
# Colorado Career Pathways Backend Data Dictionary

Last reviewed: 2026-06-10

This dictionary documents the backend data contracts used by Colorado Career Pathways. The source of truth remains the generated JSON warehouse at `lib/career-data/generated/career-upload-data.json`; TypeScript modules built on it expose the derived profile, export, mock provider, outcome, and analytics contracts. As of 2026-06-04, the repo also includes a Neon/Postgres-ready warehouse mirror with schema, sync, validation, and health-check tooling.

## Reviewer Packet And PDF Status

Start reviewer handoff from `docs/mvp/data-dictionary-review-packet.md`. It summarizes the latest PDF update, the current generated quantitative inventory, the warehouse mirror status, and known reviewer-facing risks for Scott, Simon, and the MCJ team.

Treat this markdown file and the review packet as the current source context for the backend data dictionary.

## System Map

| Layer | File or route | Purpose | Source/trust boundary |

| --- | --- | --- | --- |

| Career upload warehouse | `lib/career-data/generated/career-upload-data.json` | Main generated data store for industries, career families, occupations, regional records, workforce indices, forecasts, and confidence bands. | Uploaded workbooks plus O*NET and Indigo enrichment. |

| Neon/Postgres warehouse mirror | `db/migrations/0001_career_warehouse.sql`, `scripts/db/*`, `/api/health/database` | Durable database mirror for generated career data, with row-count validation and runtime connectivity status. | Requires `DATABASE_URL`; generated JSON remains the source of truth for imports. |

| Upload importer | `scripts/import-career-workbooks.mjs` | Reads qualitative and quantitative career workbooks, O*NET 30.3 text files, and IndigoPathway SOC grouping workbook. Writes the generated JSON warehouse. | Data refresh control point. Requires source workbook access. |

| Upload validator | `scripts/validate-career-upload-data.mjs` | Validates row minimums, uniqueness, O*NET coverage, tag coverage, regional metrics, forecast keys, and absence of `#REF!` strings. | Release/readiness guardrail. |

| Career profile facade | `lib/career-data/profiles.ts` | Converts raw generated upload data into route-ready occupation, industry, career-family, tag, education match, and readiness summaries. | App-facing source of truth. |

| Upload type facade | `lib/career-data/upload-analytics.ts` | Typed import of generated JSON and exported table constants. | Compile-time contract for generated JSON consumers. |

| API exports | `app/api/exports/*/route.ts` | Static JSON exports for occupations, industry pathways, and education crosswalk. | Public/stakeholder-facing export contracts. |

| ETPL provider fixtures | `lib/etpl/mock-providers.ts` | Local training provider and program fixtures matched to occupations by SOC. | Fixture layer until connected ETPL/Credential Registry source is wired. |

| Outcomes fixtures | `lib/slds/mock-outcomes.ts` | Local education-to-employment outcome, wage, ROI, and regional outcome fixtures. | Fixture layer until connected outcomes source is approved. |

| Workforce demo data | `lib/workforce-data.ts` | Legacy/static interfaces and sample values for workforce indices, trends, forecasts, and skill demand. | Some generated tables now supersede this module. Keep consumers explicit. |

| Static regional UI data | `lib/career-data/regional-data.ts` | Hard-coded city/region/employer/projection data by industry for map-style UI. | Demo/supporting data, not the upload warehouse. |

| Privacy-safe event contract | `lib/analytics/no-pii-events.ts` | Client-safe event envelope for industry, occupation, education-link, and export activity. | Product analytics contract designed to avoid account, user, or contact identifiers. |

## Source Inputs

| Source | Period/version | Import path | Notes |

| --- | --- | --- | --- |

| Qualitative Careers Data for Upload | September 2025 | `scripts/import-career-workbooks.mjs --qualitative <xlsx>` | Supplies industries, career families, occupation narrative fields, education/work experience text, CIP/SOC hints, and qualitative placement. |

| Quantitative Careers Data for Upload | December 2025 | `scripts/import-career-workbooks.mjs --quantitative <xlsx>` | Supplies wage, openings, growth, regional sheets, and Top Job/Green Job/Critical Occupation flags. |

| O*NET tab-delimited database | O*NET 30.3 | Imported from `https://www.onetcenter.org/dl_files/database/db_30_3_text` unless `--skip-onet` is used. | Supplies title, description, ranked knowledge/skills/abilities, tasks, job zone, and representative occupations. |

| IndigoPathway SOC grouping workbook | March 2026 | `scripts/import-career-workbooks.mjs --indigo-soc-grouping <xlsx>` | Supplies pathway categories, subcategories, tips, associations, work-style flags, profile scores, and automation index. |

| ETPL/Credential Registry | Pending connected source | Fixture-backed in `lib/etpl/mock-providers.ts` | SOC matching is implemented; live provider source is not connected yet. |

| Outcomes | Pending connected source | Fixture-backed in `lib/slds/mock-outcomes.ts` | Outcome schema exists; live longitudinal source is not connected yet. |

## Official Source Feed Roadmap

This is the target source stack for the Colorado career pathway data feed. The current generated warehouse already has workbook, O*NET, IndigoPathway, regional metric, forecast, and export contracts; the official feeds below are the priority integrations for making the warehouse deeper, fresher, and more defensible.

Note on terminology: the standard BLS products are QCEW (Quarterly Census of Employment and Wages) and OEWS (Occupational Employment and Wage Statistics). If a stakeholder says QEWS, map the requirement to QCEW/OEWS and confirm which grain they need.

| Source feed | Official source | Best use in this app | Grain / join keys | Integration status |

| --- | --- | --- | --- | --- |

| Colorado LMI hub | Colorado Department of Labor and Employment LMI: `https://cdle.colorado.gov/dlss/labor-market-information-lmi` | State labor-market authority, Colorado-specific methodology, regional workforce context, CES/LAUS/OEWS/LED links. | Colorado, county, region, MSA, occupation, industry. | Priority source-of-record layer for Colorado labor-market pages. |

| Colorado occupational projections | CDLE projections: `https://cdle.colorado.gov/labor-law-stats/labor-market-information-lmi/industry-and-occupational-projections`; Colorado Open Data: `https://data.colorado.gov/Labor-and-Employment/Long-Term-Employment-Projections-in-Colorado/gyeb-jc69` | Ten-year occupational and industry projections, growth rates, replacement/opening context. | SOC, NAICS, geography, projection period. | Replace/validate generated `jobForecasts` and growth fields. |

| BLS OEWS | BLS OEWS tables: `https://www.bls.gov/oes/tables.htm` | Occupation employment and wage estimates for national, state, metro, nonmetro, and industry-specific views. | SOC, geography, ownership/industry where available, annual release. | Priority wage/employment benchmark. |

| BLS QCEW | BLS QCEW downloadable/open data: `https://www.bls.gov/cew/downloadable-data-files.htm` | Industry employment and wage by NAICS, ownership, county, quarter, and annual average; strong benchmark for regional industry health. | NAICS, county/state, ownership, quarter/year. | Priority industry and regional benchmark feed. |

| BLS Public Data API | BLS API: `https://www.bls.gov/bls/api_features.htm` | Programmatic refresh path for BLS time series in JSON/Excel where series IDs are available. | BLS series ID, period, year. | Build refresh adapter after source IDs are finalized. |

| Lightcast Skills | Lightcast Open Skills: `https://lightcast.io/open-skills`; API access: `https://docs.lightcast.io/lightcast-api/docs/api-access` | Market-facing skills normalization, skill identifiers, skill categories/subcategories, job-posting signal, career pathway intelligence if licensed. | Lightcast skill ID, occupation/title taxonomy, posting trend. | High-value enrichment feed; access/licensing required beyond open taxonomy. |

| O*NET database | O*NET database help: `https://www.onetonline.org/help/onet/database` | Occupational descriptors, tasks, knowledge, skills, abilities, work activities, job zone, interests, work values. | O*NET-SOC, SOC fallback, descriptor IDs. | Already imported; keep current with O*NET releases. |

| CareerOneStop APIs | CareerOneStop API Explorer: `https://api.careeronestop.org/api-explorer/` | Training programs, institutions, career resources, certifications, and occupation-supporting datasets. | Location, occupation, program, institution. | Candidate ETPL/training provider adapter. |

| Credential Engine Registry | Credential Engine APIs: `https://credentialengine.org/develop-solutions/apis/`; technical registry: `https://credreg.net/registry/searchapi` | CTDL-backed credential, organization, assessment, and pathway transparency. | CTID, credential, organization, occupation, learning opportunity. | Candidate credential registry adapter; API agreement required. |

| College Scorecard | College Scorecard API: `https://collegescorecard.ed.gov/data/api/` | Institution and field-of-study cost, completion, debt, and earnings indicators. | OPEID/UNITID, CIP, field of study, state. | Candidate education ROI and provider-quality feed. |

| IPEDS | NCES IPEDS Data Center: `https://nces.ed.gov/ipeds/datacenter/Default.aspx` | Institutional characteristics, completions, awards, program inventory, enrollment, finance, graduation. | UNITID, CIP, award level, year. | Candidate education-provider normalization feed. |

| Census LEHD/LODES | Census LEHD data: `https://lehd.ces.census.gov/data/` | Origin-destination employment, workplace/residence flows, worker characteristics, commute sheds, regional workforce movement. | Census block, state, workplace/residence area, industry, worker segment. | Candidate regional mobility and access layer. |

| Census ACS / data.census.gov | Census data portal: `https://data.census.gov/` | Demographics, educational attainment, income, commuting, broadband/access context by geography. | GEOID, ACS table, year. | Candidate context layer for equity/access views. |

## Feed Architecture Targets

| Layer | Target behavior | Quality rule |

| --- | --- | --- |

| Source registry | Maintain a machine-readable registry of every feed, owner, URL, refresh cadence, license/access requirement, grain, join keys, and last successful load. | No dataset enters the warehouse without source metadata and freshness. |

| Raw landing | Preserve raw downloaded files/API payloads with immutable timestamps before transformation. | No silent overwrites; every refresh can be audited. |

| Crosswalks | Maintain SOC/O*NET-SOC, CIP, NAICS, UNITID/OPEID, CTID, region, county, MSA, and Colorado workforce-region crosswalks. | Crosswalk confidence must be explicit: exact, normalized, family fallback, or manual override. |

| Derived warehouse | Generate app-ready industry, occupation, regional, education, skill, forecast, and export tables from raw sources plus crosswalks. | Derived fields must point back to source feed and transform version. |

| Validation | Run row-count, uniqueness, null, stale-date, outlier, geography, code-system, and placeholder checks on every refresh. | Fail closed for broken numeric metrics; preserve qualitative records with clear empty states. |

| Evidence exports | Publish JSON/PDF/CSV outputs with source periods, freshness, and known gaps. | Public exports must not include credentials, raw private source paths, or workflow fixture records. |
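The source registry and its freshness rule could be modeled roughly as follows. This is a minimal sketch, not an existing module: `SourceRegistryEntry`, `isStale`, the field names, the owner value, and the example load date are all hypothetical; only the feed URL comes from the roadmap table.

```typescript
// Hypothetical registry shape; none of these names exist in the repo.
interface SourceRegistryEntry {
  feedId: string;
  owner: string;
  url: string;
  refreshCadenceDays: number;
  accessRequirement: string;
  grain: string;
  joinKeys: string[];
  lastSuccessfulLoad: string | null; // ISO date; null means never loaded
}

const MS_PER_DAY = 86400000;

// A feed counts as stale when it has never loaded, its load date is
// unparseable, or the last load is older than the declared cadence.
function isStale(entry: SourceRegistryEntry, now: Date): boolean {
  if (entry.lastSuccessfulLoad === null) return true;
  const loadedAt = Date.parse(entry.lastSuccessfulLoad);
  if (Number.isNaN(loadedAt)) return true;
  const ageDays = (now.getTime() - loadedAt) / MS_PER_DAY;
  return ageDays > entry.refreshCadenceDays;
}

// Example entry with placeholder owner and load date.
const oewsEntry: SourceRegistryEntry = {
  feedId: "bls-oews",
  owner: "data-team",
  url: "https://www.bls.gov/oes/tables.htm",
  refreshCadenceDays: 365,
  accessRequirement: "public",
  grain: "SOC x geography x annual release",
  joinKeys: ["soc", "geography"],
  lastSuccessfulLoad: "2026-01-15",
};
```

Treating a missing or unparseable load date as stale keeps the check consistent with the rule that no dataset enters the warehouse without freshness metadata.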

## Refresh And Validation Commands

    npm run data:import-careers
    npm run data:validate-careers
    npm run db:setup

The importer accepts optional overrides:

    node scripts/import-career-workbooks.mjs \
      --qualitative /path/to/qualitative.xlsx \
      --quantitative /path/to/quantitative.xlsx \
      --indigo-soc-grouping /path/to/indigo.xlsx \
      --out lib/career-data/generated/career-upload-data.json

Validation currently expects at least 10 industries, 50 careers, 800 occupations, 6,500 regional occupation records, 2,000 job forecasts, 100 workforce index rows, 650 O*NET-enriched occupations, 200 Top Job occupations, 100 Green Job occupations, 20 Critical Occupation occupations, unique occupation/regional/forecast keys, valid regional numbers, valid forecast confidence labels, and no loaded `#REF!` strings.
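The row-minimum part of that check can be pictured as a threshold table plus a comparison. The sketch below is illustrative and is not the code in `scripts/validate-career-upload-data.mjs`; the key names and `findShortfalls` are hypothetical, while the thresholds and the example counts are the figures quoted in this dictionary.

```typescript
// Minimum counts quoted in this dictionary; key names are illustrative.
const MINIMUMS = {
  industries: 10,
  careers: 50,
  occupations: 800,
  occupationRegionalRecords: 6500,
  jobForecasts: 2000,
  workforceIndices: 100,
  onetEnrichedOccupations: 650,
  topJobOccupations: 200,
  greenJobOccupations: 100,
  criticalOccupations: 20,
} as const;

type CountKey = keyof typeof MINIMUMS;

// Returns one message per dataset that falls below its minimum.
function findShortfalls(counts: Record<CountKey, number>): string[] {
  return (Object.keys(MINIMUMS) as CountKey[])
    .filter((key) => counts[key] < MINIMUMS[key])
    .map((key) => `${key}: expected at least ${MINIMUMS[key]}, found ${counts[key]}`);
}

// Counts reported in this dictionary as of 2026-06-10.
const currentCounts: Record<CountKey, number> = {
  industries: 16,
  careers: 74,
  occupations: 810,
  occupationRegionalRecords: 6785,
  jobForecasts: 2805,
  workforceIndices: 252,
  onetEnrichedOccupations: 810,
  topJobOccupations: 219,
  greenJobOccupations: 116,
  criticalOccupations: 25,
};
```

With the reported counts, `findShortfalls(currentCounts)` returns an empty array; dropping `occupations` to 799 would produce a single shortfall message.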

Database setup and operational details live in `docs/mvp/database-setup.md`. `npm run db:setup` validates the generated warehouse, applies Postgres schema migrations, syncs the warehouse tables, and validates database row counts against the generated JSON.

## Current Generated Data Inventory

Counts are from `lib/career-data/generated/career-upload-data.json` as validated on 2026-06-10.

| Dataset | Count | Primary key | Description |

| --- | ---: | --- | --- |

| `metadata` | 1 | n/a | Source periods, file names, methodology, region-to-county map, row counts, completeness, and warnings. |

| `industries` | 16 | `id` | Uploaded industry sectors and overview text. |

| `careers` | 74 | `id` | Uploaded career families/pathway groupings within industries. |

| `occupations` | 810 | `occupationId` | Uploaded occupation records with SOC/O*NET/CIP, tags, narrative fields, O*NET enrichment, Indigo enrichment, and regional metrics. |

| `occupationRegionalRecords` | 6,785 | `id` | Chart-ready occupation/industry/region records with salary, openings, growth, index, tags, education, experience, certifications, and skills. |

| `workforceIndices` | 252 | `industryId + region` | Derived workforce gap, effort, challenge, opportunity, composite score, trend, projected change, and freshness. |

| `jobForecasts` | 2,805 | `industryId + region + year` | Derived current/projected jobs, growth, confidence, score, and explanatory factors. |

| `confidenceIntervals` | 2,550 | `industryId + year + implicit row grain` | Derived baseline, optimistic, and pessimistic forecast bands. |

Completeness snapshot:

| Metric | Value |

| --- | ---: |

| Occupations with SOC | 675 |

| Occupations without SOC | 135 |

| Occupations with O*NET enrichment | 810 |

| O*NET matched records | 664 |

| O*NET composite records | 146 |

| Occupations with IndigoPathway match | 773 |

| Occupations without IndigoPathway match | 37 |

| Occupations with any quantitative metric | 667 |

| Occupations without quantitative metric | 143 |

| Occupations with statewide wage | 660 |

| Top Job occupations | 219 |

| Green Job occupations | 116 |

| Critical Occupation occupations | 25 |

| Nested occupation regional metric rows | 9,669 |
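The composite primary keys in the inventory table (for example `industryId + region + year` for `jobForecasts`) can be checked for uniqueness with a small helper. This is a sketch under assumptions: `findDuplicateKeys`, `forecastKey`, and the `|` separator are hypothetical and do not describe how the validator script builds its keys.

```typescript
// Collects every key that appears more than once in the given rows.
function findDuplicateKeys<T>(rows: readonly T[], keyOf: (row: T) => string): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const row of rows) {
    const key = keyOf(row);
    if (seen.has(key)) duplicates.add(key);
    seen.add(key);
  }
  return Array.from(duplicates);
}

// Only the key fields of a forecast row are needed here.
interface ForecastKeyFields {
  industryId: string;
  region: string;
  year: number;
}

// Hypothetical key format: the three key fields joined with "|".
const forecastKey = (row: ForecastKeyFields): string =>
  `${row.industryId}|${row.region}|${row.year}`;

// Made-up sample rows; the last one repeats the first key.
const sampleForecasts: ForecastKeyFields[] = [
  { industryId: "energy", region: "Statewide", year: 2026 },
  { industryId: "energy", region: "Statewide", year: 2027 },
  { industryId: "energy", region: "Statewide", year: 2026 },
];
```

Calling `findDuplicateKeys(sampleForecasts, forecastKey)` reports the repeated `energy|Statewide|2026` key once.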

## Region Contract

Generated occupation metrics use `Statewide` plus `R1` through `R14`. The human-readable labels are `Statewide`, `Region 1`, ..., `Region 14`. County groupings live in `metadata.regionCountyMap`.

| Field | Type | Definition |

| --- | --- | --- |

| `regionKey` | string | `Statewide` or `R1` through `R14`. |

| `regionName` | string | Display label for the region. |

| `regionCountyMap` | object | Map of each regional key to included Colorado counties. |
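The key-to-label rule above is simple enough to express directly. The helpers below are a hypothetical sketch (`isRegionKey` and `regionLabel` are not names from the repo) that accepts only `Statewide` and `R1` through `R14`.

```typescript
// Matches R1 through R14 and captures the region number.
const REGION_KEY_PATTERN = /^R([1-9]|1[0-4])$/;

// True for "Statewide" or a regional key in the R1..R14 range.
function isRegionKey(value: string): boolean {
  return value === "Statewide" || REGION_KEY_PATTERN.test(value);
}

// Converts a region key to its display label, e.g. "R3" to "Region 3".
function regionLabel(regionKey: string): string {
  if (regionKey === "Statewide") return "Statewide";
  const match = REGION_KEY_PATTERN.exec(regionKey);
  if (!match) throw new Error(`Unknown region key: ${regionKey}`);
  return `Region ${match[1]}`;
}
```

Throwing on unknown keys, rather than echoing them back, surfaces a contract change (such as a new `R15`) at the first lookup instead of in the UI.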

## Core Generated Entities

### `metadata`

| Field | Type | Definition |

| --- | --- | --- |

| `qualitativePeriod` | string | Period label for the qualitative workbook. Current value: `September 2025`. |

| `quantitativePeriod` | string | Period label for the quantitative workbook. Current value: `December 2025`. |

| `sourceFiles.qualitative` | string | Basename of qualitative source workbook. |

| `sourceFiles.quantitative` | string | Basename of quantitative source workbook. |

| `sourceFiles.onet` | string | O*NET source label or skip label. |

| `sourceFiles.indigoSocGrouping` | string | IndigoPathway source workbook label. |

| `generatedBy` | string | Import script path. Current value: `scripts/import-career-workbooks.mjs`. |

| `methodology.joinKeys` | string[] | Join strategy, currently uploaded occupation ID and SOC/O*NET fallback. |

| `methodology.onetEnrichment` | string | O*NET matching method. |

| `methodology.indigoEnrichment` | string | IndigoPathway matching method. |

| `methodology.wageMetric` | string | Wage measure definition. |

| `methodology.openingMetric` | string | Openings measure definition. |

| `methodology.growthMetric` | string | Growth measure definition. |

| `methodology.indexMethod` | string | How app indices are derived. |

| `validation.sourceRows` | object | Input workbook row counts by source sheet/category. |

| `validation.loadedRows` | object | Generated output row counts by dataset. |

| `validation.completeness` | object | Coverage counts for SOC, O*NET, Indigo, tags, metrics, and wages. |

| `validation.warnings` | string[] | Known source/processing caveats retained with the dataset. |

### `industries`

| Field | Type | Definition |

| --- | --- | --- |

| `id` | string | Slug used in routes and joins. |

| `name` | string | Display industry name. |

| `overview` | string | Uploaded industry overview narrative. |

| `tidbits` | string[] | Uploaded supporting facts/highlights for the industry page. |

Industry IDs currently include `advanced-manufacturing`, `aerospace`, `agriculture-and-natural-resources`, `behavioral-health`, `business-operations`, `construction`, `creative-industries`, `cybersecurity`, `education`, `energy`, `healthcare`, `information-technology`, `public-health`, `public-safety`, `retail`, and `transportation`.

### `careers`

| Field | Type | Definition |

| --- | --- | --- |

| `id` | string | Uploaded career family ID used by `occupations[].careerCodes`. |

| `name` | string | Display career family/pathway name. |

| `industryName` | string | Uploaded parent industry display name. |

| `teaser` | string | Short overview used in pathway summaries. |

| `expectedPay` | string | Uploaded pay narrative. |

| `buzz` | string | Uploaded career-buzz or outlook narrative. |

| `companies` | string | Uploaded employer/company narrative. |

| `workers` | string | Uploaded worker/demographic narrative. |

### `occupations`

| Field | Type | Definition |

| --- | --- | --- |

| `occupationId` | string | Primary occupation ID from the upload; also used as the occupation route ID. |

| `name` | string | Occupation display title. |

| `soc` | string | Normalized 2018 SOC base code when available. Empty string means qualitative-only/no SOC. |

| `onetSoc` | string | Full O*NET-SOC code used for enrichment. |

| `industry` | string | Industry display name. |

| `industryId` | string | Industry slug join to `industries.id`. |

| `careerCodes` | string[] | Career family IDs joined to `careers.id`. |

| `topJob` | boolean | Colorado Talent Pipeline Report-aligned Top Job flag. |

| `greenJob` | boolean | O*NET green economy classification flag. |

| `criticalOccupation` | boolean | Industry-partner critical occupation flag. |

| `skillLevel` | number/null | Uploaded or derived skill level when present. |

| `childOccupationIds` | string[] | Child occupation IDs for grouped/composite pathways. |

| `alsoKnownAs` | string[] | O*NET reported/alternate job titles. |

| `description` | string | Occupation description, usually O*NET-backed when matched. |

| `cipCodes` | string[] | CIP codes retained for education/crosswalk matching. |

| `knowledge` | string | Semicolon-delimited knowledge text for display. |

| `skills` | string | Semicolon-delimited skills text for display. |

| `abilities` | string | Semicolon-delimited abilities text for display. |

| `importantCompetencies` | string | Uploaded competency narrative. |

| `credentialRequirement` | string | Credential requirement narrative. |

| `credentialDetail` | string | Credential detail narrative. |

| `workExperienceRequirement` | string | Work experience requirement narrative. |

| `workExperienceDetail` | string | Work experience detail narrative. |

| `typicalEducation` | string | Typical education label/narrative. |

| `typicalPrograms` | string | Typical program narrative. |

| `training` | string | Training narrative. |

| `workBasedLearning` | string | Work-based learning narrative. |

| `remoteWork` | string | Remote-work narrative. |

| `extraNotes` | string[] | Additional source notes after placeholder cleanup. |

| `onet` | object | O*NET enrichment block. |

| `indigo` | object | IndigoPathway enrichment block. |

| `regionalMetrics` | object[] | Nested metrics by statewide/region key. |
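The relationship between `onetSoc` and `soc` can be illustrated with the standard code formats: an O*NET-SOC code is the SOC base code (`NN-NNNN`) followed by a two-digit suffix (`NN-NNNN.NN`). The helper below is a sketch under that assumption; `socBaseFromOnetSoc` is a hypothetical name, and the importer's actual normalization rules are not documented here.

```typescript
// O*NET-SOC: SOC base code plus a two-digit suffix, e.g. "29-1141.00".
const ONET_SOC_PATTERN = /^(\d{2}-\d{4})\.\d{2}$/;
// Plain SOC base code, e.g. "29-1141".
const SOC_PATTERN = /^\d{2}-\d{4}$/;

// Returns the SOC base code, or "" when the input is not a recognizable code.
// The empty string mirrors the "qualitative-only/no SOC" convention of `soc`.
function socBaseFromOnetSoc(onetSoc: string): string {
  const trimmed = onetSoc.trim();
  if (SOC_PATTERN.test(trimmed)) return trimmed;
  const match = ONET_SOC_PATTERN.exec(trimmed);
  return match?.[1] ?? "";
}
```

For example, `socBaseFromOnetSoc("29-1141.00")` yields `"29-1141"`, while an empty or malformed value yields `""`.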

### `occupations[].onet`

| Field | Type | Definition |

| --- | --- | --- |

| `source` | string | O*NET source label. |

| `status` | `matched`/`composite`/`not-found`/`missing-soc` | Match status. Current generated data has `matched` and `composite` only. |

| `title` | string | O*NET occupation title. |

| `description` | string | O*NET occupation description. |

| `topKnowledge` | string[] | Top ranked O*NET knowledge areas. |

| `topSkills` | string[] | Top ranked O*NET skills. |

| `topAbilities` | string[] | Top ranked O*NET abilities. |

| `tasks` | string[] | Top/core O*NET task statements. |

| `jobZone` | number/null | O*NET Job Zone. |

| `representativeOccupations` | object[] | Representative O*NET occupations used for composite/grouped pathways. |

### `occupations[].indigo`

| Field | Type | Definition |

| --- | --- | --- |

| `source` | string | IndigoPathway workbook label. |

| `status` | `matched`/`not-found` | Indigo SOC grouping match status. |

| `categories` | string[] | Indigo pathway categories. |

| `subCategories` | string[] | Indigo pathway subcategories. |

| `pathwayNames` | string[] | Indigo pathway names. |

| `topTips` | string[] | Indigo guidance/tips text. |

| `associations` | string[] | Association names/links from Indigo source. |

| `workStyle.workOnComputer` | string[] | Work-style flag values. |

| `workStyle.workOutside` | string[] | Work-style flag values. |

| `workStyle.workWithHands` | string[] | Work-style flag values. |

| `workStyle.workWithKids` | string[] | Work-style flag values. |

| `profileScores` | object | Indigo profile score map with numeric/null values. |

| `matchedSocRows` | object[] | Source SOC grouping rows retained for audit/debugging. |

### `occupations[].regionalMetrics`

| Field | Type | Definition |

| --- | --- | --- |

| `regionKey` | string | `Statewide` or regional code. |

| `regionName` | string | Region display label. |

| `employment` | number/null | Employment estimate when available. |

| `hourlyWage` | number/null | Median hourly wage when available. |

| `annualWage` | number/null | Median annual wage when available. |

| `annualOpenings` | number/null | Annual openings when available. |

| `annualGrowth` | number/null | Annual growth rate when available. |

| `growth10Years` | number/null | 10-year growth percentage/rate when available. |

| `openings2024To2034` | number/null | 2024-2034 opening count when available. |

| `source` | string | Regional metric source indicator, such as `curated-salary-sheet+raw-region-sheet`. |

Note: `lib/career-data/upload-analytics.ts` and `lib/career-data/profiles.ts` now expose the richer generated metric fields (`employment`, `hourlyWage`, `annualGrowth`, `openings2024To2034`, `source`) in addition to the app's core wage/openings/growth fields.
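As a reading aid, the nested metric row can be written as a type with explicit nulls. The interface below restates the field table; it is not copied from `upload-analytics.ts`, and the helpers `statewideAnnualWage` and `formatWage` plus the "Not available" label are hypothetical.

```typescript
// Restates the regionalMetrics field table; nulls mark unavailable values.
interface RegionalMetric {
  regionKey: string;
  regionName: string;
  employment: number | null;
  hourlyWage: number | null;
  annualWage: number | null;
  annualOpenings: number | null;
  annualGrowth: number | null;
  growth10Years: number | null;
  openings2024To2034: number | null;
  source: string;
}

// Picks the statewide annual wage, or null when the row or value is missing.
function statewideAnnualWage(metrics: readonly RegionalMetric[]): number | null {
  const statewide = metrics.find((metric) => metric.regionKey === "Statewide");
  return statewide?.annualWage ?? null;
}

// Renders a wage for display; null becomes an explicit empty state.
function formatWage(value: number | null): string {
  return value === null ? "Not available" : `$${Math.round(value).toLocaleString("en-US")}`;
}
```

Keeping `null` distinct from `0` lets qualitative-only occupations show an empty state instead of a misleading zero wage.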

### `occupationRegionalRecords`

| Field | Type | Definition |

| --- | --- | --- |

| `id` | string | Unique chart/export record ID. |

| `title` | string | Occupation title. |

| `industry` | string | Industry display name. |

| `region` | string | Region display name. |

| `openings` | number | Chart-ready openings count. |

| `avgSalary` | number | Chart-ready average/median salary value. |

| `growthRate` | number | Chart-ready growth rate. |

| `gapIndex` | number | Derived supply-demand gap score. |

| `topJob` | boolean | Top Job flag. |

| `greenJob` | boolean | Green Job flag. |

| `criticalOccupation` | boolean | Critical Occupation flag. |

| `educationRequired` | string | Display education requirement. |

| `experienceLevel` | string | Display experience level. |

| `certifications` | string[] | Display certification labels. |

| `topSkills` | string[] | Top skill labels for this record. |

### `workforceIndices`

| Field | Type | Definition |

| --- | --- | --- |

| `industryId` | string | Industry slug. |

| `industryName` | string | Industry display name. |

| `region` | string | Region display name. |

| `gapIndex` | number | 0-100 supply-demand imbalance score; higher means more shortage. |

| `effortIndex` | number | 0-100 training intensity score; higher means more effort. |

| `challengeIndex` | number | 0-100 barrier score; higher means more barriers. |

| `opportunityIndex` | number | 0-100 advancement score; higher means more opportunity. |

| `compositeScore` | number | Weighted summary score. |

| `trend` | `up`/`down`/`stable` | Derived trend label. |

| `projectedChange` | number | Expected percent change. |

| `lastUpdated` | string | Freshness date string. |

### `jobForecasts`

| Field | Type | Definition |

| --- | --- | --- |

| `year` | number | Forecast year. |

| `industryId` | string | Industry slug. |

| `industryName` | string | Industry display name. |

| `region` | string | Region display name. |

| `currentJobs` | number | Current/base job estimate. |

| `projectedJobs` | number | Projected job estimate. |

| `growthRate` | number | Forecast growth rate. |

| `confidence` | `high`/`medium`/`low` | Confidence label validated by script. |

| `confidenceScore` | number | Numeric confidence score. |

| `factors` | string[] | Driver/explanation labels. |

### `confidenceIntervals`

| Field | Type | Definition |

| --- | --- | --- |

| `year` | number | Forecast year. |

| `industryId` | string | Industry slug. |

| `baseline` | number | Baseline forecast value. |

| `optimistic` | number | Upper/optimistic forecast value. |

| `pessimistic` | number | Lower/pessimistic forecast value. |
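Because `pessimistic` is the lower band and `optimistic` the upper band, a row is only coherent when the three values are ordered. The check below is a hypothetical sanity test, not something this dictionary says the validator script performs.

```typescript
// Restates the confidenceIntervals field table.
interface ConfidenceInterval {
  year: number;
  industryId: string;
  baseline: number;
  optimistic: number;
  pessimistic: number;
}

// True when pessimistic <= baseline <= optimistic.
function isOrderedBand(row: ConfidenceInterval): boolean {
  return row.pessimistic <= row.baseline && row.baseline <= row.optimistic;
}

// Returns the rows whose bands are out of order.
function findInvertedBands(rows: readonly ConfidenceInterval[]): ConfidenceInterval[] {
  return rows.filter((row) => !isOrderedBand(row));
}
```

Running `findInvertedBands` over the table before charting would catch rows where the bands were swapped during import.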

## App-Facing Derived Entities

Defined in `lib/career-data/profiles.ts`.

| Entity | Base data | Added/derived fields |

| --- | --- | --- |

| `OccupationProfile` | `CareerUploadOccupation` | `routeId`, `tags`, `sourceMetadata`, `regionalMetricsWithSource`, `educationProgramMatches`. |

| `CareerFamily` | `CareerUploadFamily` | Joined `occupations`, `topTags`, `medianAnnualWage`, `annualOpenings`. |

| `IndustryProfile` | `CareerUploadIndustry` | `routeId`, joined `careerFamilies`, joined/sorted `occupations`, `sourceMetadata`, aggregated `regionalMetrics`, `availableRegions`, `tagCounts`, and top education matches. |

| `CareerTag` | Occupation flags | `topJob`, `greenJob`, and `criticalOccupation` labels, definitions, and source notes. |

| `EducationProgramMatch` | Occupation SOC/CIP plus mock ETPL | SOC exact first, SOC family fallback, CIP-ready placeholder when provider data is pending. |

| `DataSourceMetadata` | Generated `metadata` | Source periods/files, methodology, generated-by path, warnings. |

## API Export Contracts

All export routes are static (`dynamic = 'force-static'`) and return `{ metadata, data }`.
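A consumer can confirm the `{ metadata, data }` envelope with a type guard before reading any fields. This is a minimal sketch: `ExportEnvelope` and `isExportEnvelope` are hypothetical names, and the guard checks only the outer shape, not the per-route fields.

```typescript
// Outer shape shared by the export routes: a metadata object and a data array.
interface ExportEnvelope<T> {
  metadata: Record<string, unknown>;
  data: T[];
}

// Narrows an unknown JSON payload to the export envelope shape.
function isExportEnvelope(value: unknown): value is ExportEnvelope<unknown> {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as { metadata?: unknown; data?: unknown };
  return (
    typeof candidate.metadata === "object" &&
    candidate.metadata !== null &&
    !Array.isArray(candidate.metadata) &&
    Array.isArray(candidate.data)
  );
}
```

A readiness script could parse each route's response and fail if `isExportEnvelope` returns `false`.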

### `/api/exports/occupations`

`metadata` fields:

| Field | Type | Definition |

| --- | --- | --- |

| `exportName` | string | `occupations`. |

| `recordCount` | number | Occupation count from readiness summary. |

| `source` | object | Source metadata. |

| `readiness` | boolean | Occupation export readiness flag. |

`data[]` fields:

| Field | Type | Definition |

| --- | --- | --- |

| `id` | string | Occupation uploaded ID. |

| `title` | string | Occupation title. |

| `industryId` | string | Industry slug. |

| `industry` | string | Industry display name. |

| `soc` | string | SOC code. |

| `onetSoc` | string | O*NET-SOC code. |

| `cipCodes` | string[] | CIP code list. |

| `careerFamilyIds` | string[] | Career family IDs. |

| `skillLevel` | number/null | Skill level. |

| `tags` | string[] | Human-readable occupation tags. |

| `typicalEducation` | string | Typical education. |

| `credentialRequirement` | string | Credential requirement. |

| `workExperienceRequirement` | string | Work experience requirement. |

| `regionalMetrics` | object[] | Region-level wage/openings/growth metrics with source period. |

| `onet` | object | O*NET enrichment block. |

| `indigo` | object | Indigo enrichment block. |

| `source` | object | Source metadata. |

### `/api/exports/industry-pathways`

`data[]` fields:

| Field | Type | Definition |

| --- | --- | --- |

| `id` | string | Industry slug. |

| `name` | string | Industry display name. |

| `overview` | string | Industry overview. |

| `tidbits` | string[] | Industry facts/highlights. |

| `tagCounts` | object | Counts of Top Job, Green Job, and Critical Occupation occupations. |

| `regionalMetrics` | object[] | Aggregated industry regional metrics. |

| `careerFamilies` | object[] | Family summaries with ID, name, overview, occupation IDs, annual openings, and median annual wage. |

| `source` | object | Source metadata. |

### `/api/exports/education-crosswalk`

`metadata.mappingRule`: `SOC exact first, SOC family fallback for mock ETPL, CIP retained for Credential Registry connection.`

`data[]` fields:

| Field | Type | Definition |

| --- | --- | --- |

| `occupationId` | string | Occupation uploaded ID. |

| `title` | string | Occupation title. |

| `soc` | string | SOC code. |

| `onetSoc` | string | O*NET-SOC code. |

| `cipCodes` | string[] | CIP codes retained for connected source matching. |

| `strategy` | string | Match strategy statement. |

| `matches` | object[] | Provider/program match summaries. |

`matches[]` fields:

| Field | Type | Definition |

| --- | --- | --- |

| `matchType` | `soc-exact`/`soc-family`/`cip-ready` | Match route. |

| `sourceStatus` | `mock-etpl`/`ready-for-connected-source` | Whether the row came from local mock data or a connected-source placeholder. |

| `providerId` | string | Provider ID. |

| `providerName` | string | Provider display name. |

| `programId` | string | Program ID. |

| `programName` | string | Program display name. |

| `credentialEarned` | string | Credential label. |

| `deliveryMode` | string | In-person, online, or hybrid. |
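The mapping rule (SOC exact first, SOC family fallback, CIP-ready placeholder) can be sketched as an ordered cascade. Everything here is hypothetical: `matchPrograms` and `ProgramFixture` are not repo names, and "SOC family" is ASSUMED to mean the same SOC major group (first two digits), which may differ from the app's definition.

```typescript
type MatchType = "soc-exact" | "soc-family" | "cip-ready";

// Only the fields needed for matching.
interface ProgramFixture {
  programId: string;
  socCodes: string[];
}

interface MatchResult {
  matchType: MatchType;
  programIds: string[];
}

// ASSUMPTION: a SOC "family" is the major group, i.e. the first two digits.
const socFamily = (soc: string): string => soc.slice(0, 2);

// Tries exact SOC, then SOC family, then falls back to a CIP-ready placeholder.
function matchPrograms(
  soc: string,
  cipCodes: readonly string[],
  programs: readonly ProgramFixture[],
): MatchResult | null {
  if (soc !== "") {
    const exact = programs.filter((p) => p.socCodes.some((code) => code === soc));
    if (exact.length > 0) {
      return { matchType: "soc-exact", programIds: exact.map((p) => p.programId) };
    }
    const family = programs.filter((p) =>
      p.socCodes.some((code) => socFamily(code) === socFamily(soc)),
    );
    if (family.length > 0) {
      return { matchType: "soc-family", programIds: family.map((p) => p.programId) };
    }
  }
  // No provider match: keep the occupation ready for a connected source.
  if (cipCodes.length > 0) return { matchType: "cip-ready", programIds: [] };
  return null;
}
```

Returning the match type alongside the program IDs keeps the fallback level visible to the export, so a `soc-family` match is never presented as exact.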

## Fixture Data Contracts

### ETPL Provider Fixtures

Defined in `lib/etpl/mock-providers.ts`.

| Entity | Key | Fields |

| --- | --- | --- |

| `TrainingProvider` | `id` | `name`, `type`, `address`, `city`, `region`, `phone`, `website`, `accreditation`, `programs`. |

| `TrainingProgram` | `id` | `name`, `providerId`, `socCodes`, `duration`, `cost`, `completionRate`, `placementRate`, `credentialEarned`, `deliveryMode`, `startDates`. |

Use this as fixture data only. The app already has SOC exact/family matching and CIP-ready fallback semantics for a future ETPL/Credential Registry source.

### Outcomes Fixtures

Defined in `lib/slds/mock-outcomes.ts`.

| Entity | Key | Fields |

| --- | --- | --- |

| `ProgramOutcome` | `programId` | Provider/program, SOC, completer count, employment rate, wage progression, ROI, time to employment, retention, advancement, and industry alignment. |

| `IndustryOutcome` | `industryId` | Total completers, average employment/wage/ROI, top programs, and demand outlook. |

| `RegionalOutcome` | `region` | Total completers, average employment rate, average year-one wage, and top industries. |

| `WageProgression` | n/a | `year1`, `year3`, `year5`, optional `year10`. |

Use this as fixture data only until a connected outcomes source is approved.

### Privacy-Safe Analytics Events

Defined in `lib/analytics/no-pii-events.ts`.

| Event | Required fields | Optional fields |

| --- | --- | --- |

| `industry_viewed` | `industryId` | `region` |

| `occupation_viewed` | `occupationId`, `industryId` | `region` |

| `education_link_opened` | none | `occupationId`, `providerId`, `programId` |

| `export_downloaded` | `exportName` | none |

`createNoPiiEvent` adds `occurredAt` as an ISO timestamp. `sessionBucket` is optional on the event envelope and should remain non-identifying.
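The event table maps naturally onto a discriminated union, which makes the required and optional fields checkable at compile time. The sketch below is not the code in `no-pii-events.ts`: the `name` discriminator, the type names, and `sketchNoPiiEvent` are assumptions made for illustration.

```typescript
// One variant per event; required and optional fields follow the table above.
// ASSUMPTION: the event name is carried in a `name` property.
type NoPiiEventInput =
  | { name: "industry_viewed"; industryId: string; region?: string }
  | { name: "occupation_viewed"; occupationId: string; industryId: string; region?: string }
  | { name: "education_link_opened"; occupationId?: string; providerId?: string; programId?: string }
  | { name: "export_downloaded"; exportName: string };

// Envelope fields added around the event payload.
type NoPiiEvent = NoPiiEventInput & { occurredAt: string; sessionBucket?: string };

// Hypothetical stand-in for createNoPiiEvent: stamps an ISO timestamp.
function sketchNoPiiEvent(input: NoPiiEventInput, now: Date = new Date()): NoPiiEvent {
  return { ...input, occurredAt: now.toISOString() };
}
```

Because the union has no free-form property bag, adding an account, user, or contact identifier would require an explicit type change rather than slipping in through an untyped payload.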

## Data Quality And Cleanup Backlog

| Item | Status | Owner/action |

| --- | --- | --- |

| Missing SOC codes | 135 occupations retained as qualitative-only records. | Keep visible but exclude from wage/opening charts where SOC-backed quantitative metrics are required. |

| Missing quantitative metrics | 143 occupations have no regional metrics. | Preserve qualitative records and show null/empty states rather than dropping occupations. |

| Missing IndigoPathway match | 37 occupations do not have Indigo SOC grouping. | Review SOC/title mapping and add explicit representative SOC overrides when warranted. |

| CIP/provider matching | CIP codes are retained, but provider data is fixture-backed. | Connect ETPL/Credential Registry and replace `cip-ready` placeholders with live matches. |

| Regional metric contract | Generated nested regional metrics include richer fields than the core UI usually displays. | Keep `upload-analytics.ts`, `profiles.ts`, and this dictionary aligned when importer fields change. |

| Static regional UI data | `lib/career-data/regional-data.ts` contains hard-coded city/employer/projection values separate from the generated warehouse. | Reconcile or label as demo/supporting data wherever displayed. |

| Legacy workforce module | `lib/workforce-data.ts` contains static/demo constants while generated JSON also exports workforce tables. | Prefer generated tables for current career-upload analytics and keep legacy consumers explicit. |

| External links | Readiness summary counts distinct links in occupation detail text. | Batch health-check before public launch. |

## Public/Private Boundaries

  • Public/stakeholder-safe: generated career upload data, source metadata, readiness counts, occupation/industry/education export JSON, and privacy-safe analytics events.
  • Fixture-only: ETPL providers/programs, outcomes, and static regional city/employer/projection data.
  • Private/protected: source workbook file paths, any future live provider/outcomes credentials, and analytics session identifiers that could identify a person.

## Readiness Checklist

  • Run `npm run data:validate-careers` after any generated JSON refresh.
  • Compare the generated PDF against this markdown before reviewer handoff; the current committed PDF is a 2026-06-03 snapshot.
  • Confirm export routes still return the expected `{ metadata, data }` structure.
  • Confirm counts in this dictionary after a workbook refresh.
  • Confirm no `#REF!`, `Insf. Data`, or placeholder strings leak into numeric analytics.
  • Keep mock/demo data clearly labeled until connected source systems are wired.
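The placeholder check in the list above can be approximated by walking any JSON value and reporting where `#REF!` or `Insf. Data` appears. This is a hedged sketch, separate from `scripts/validate-career-upload-data.mjs`; `findPlaceholderPaths` and the `$`-rooted path format are hypothetical.

```typescript
// Placeholder strings named in the readiness checklist.
const PLACEHOLDER_STRINGS = ["#REF!", "Insf. Data"];

// Recursively walks a JSON value and returns the path of every string
// that contains one of the placeholder markers.
function findPlaceholderPaths(value: unknown, path: string = "$"): string[] {
  if (typeof value === "string") {
    const text: string = value;
    const hit = PLACEHOLDER_STRINGS.some((marker) => text.indexOf(marker) !== -1);
    return hit ? [path] : [];
  }
  const found: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, index) => {
      found.push(...findPlaceholderPaths(item, `${path}[${index}]`));
    });
  } else if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      found.push(...findPlaceholderPaths(record[key], `${path}.${key}`));
    }
  }
  return found;
}
```

Reporting paths rather than a boolean makes it easier to trace a leaked placeholder back to the source workbook cell that produced it.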