# Colorado Career Pathways Backend Data Dictionary
Last reviewed: 2026-06-10
This dictionary documents the backend data contracts used by Colorado Career Pathways. The source of truth for backend data remains the generated JSON warehouse at `lib/career-data/generated/career-upload-data.json`, together with the TypeScript modules that expose derived profile, export, mock provider, outcome, and analytics contracts. As of 2026-06-04, the repo also includes a Neon/Postgres-ready warehouse mirror with schema, sync, validation, and health-check tooling.
## Reviewer Packet And PDF Status
Start reviewer handoff from `docs/mvp/data-dictionary-review-packet.md`. It summarizes the latest PDF update, the current generated quantitative inventory, the warehouse mirror status, and known reviewer-facing risks for Scott, Simon, and the MCJ team.
Treat this markdown file and the review packet as the current source context for the backend data dictionary.
## System Map
| Layer | File or route | Purpose | Source/trust boundary |
| --- | --- | --- | --- |
| Career upload warehouse | `lib/career-data/generated/career-upload-data.json` | Main generated data store for industries, career families, occupations, regional records, workforce indices, forecasts, and confidence bands. | Uploaded workbooks plus O*NET and Indigo enrichment. |
| Neon/Postgres warehouse mirror | `db/migrations/0001_career_warehouse.sql`, `scripts/db/*`, `/api/health/database` | Durable database mirror for generated career data, with row-count validation and runtime connectivity status. | Requires `DATABASE_URL`; generated JSON remains the source of truth for imports. |
| Upload importer | `scripts/import-career-workbooks.mjs` | Reads the qualitative and quantitative career workbooks, the O*NET 30.3 text files, and the IndigoPathway SOC grouping workbook, then writes the generated JSON warehouse. | Data refresh control point. Requires source workbook access. |
| Upload validator | `scripts/validate-career-upload-data.mjs` | Validates row minimums, uniqueness, O*NET coverage, tag coverage, regional metrics, forecast keys, and absence of `#REF!` strings. | Release/readiness guardrail. |
| Career profile facade | `lib/career-data/profiles.ts` | Converts raw generated upload data into route-ready occupation, industry, career-family, tag, education match, and readiness summaries. | App-facing source of truth. |
| Upload type facade | `lib/career-data/upload-analytics.ts` | Typed import of generated JSON and exported table constants. | Compile-time contract for generated JSON consumers. |
| API exports | `app/api/exports/*/route.ts` | Static JSON exports for occupations, industry pathways, and education crosswalk. | Public/stakeholder-facing export contracts. |
| ETPL provider fixtures | `lib/etpl/mock-providers.ts` | Local training provider and program fixtures matched to occupations by SOC. | Fixture layer until connected ETPL/Credential Registry source is wired. |
| Outcomes fixtures | `lib/slds/mock-outcomes.ts` | Local education-to-employment outcome, wage, ROI, and regional outcome fixtures. | Fixture layer until connected outcomes source is approved. |
| Workforce demo data | `lib/workforce-data.ts` | Legacy/static interfaces and sample values for workforce indices, trends, forecasts, and skill demand. | Some generated tables now supersede this module. Keep consumers explicit. |
| Static regional UI data | `lib/career-data/regional-data.ts` | Hard-coded city/region/employer/projection data by industry for map-style UI. | Demo/supporting data, not the upload warehouse. |
| Privacy-safe event contract | `lib/analytics/no-pii-events.ts` | Client-safe event envelope for industry, occupation, education-link, and export activity. | Product analytics contract designed to avoid account, user, or contact identifiers. |
## Source Inputs
| Source | Period/version | Import path | Notes |
| --- | --- | --- | --- |
| Qualitative Careers Data for Upload | September 2025 | `scripts/import-career-workbooks.mjs --qualitative <xlsx>` | Supplies industries, career families, occupation narrative fields, education/work experience text, CIP/SOC hints, and qualitative placement. |
| Quantitative Careers Data for Upload | December 2025 | `scripts/import-career-workbooks.mjs --quantitative <xlsx>` | Supplies wage, openings, growth, regional sheets, and Top Job/Green Job/Critical Occupation flags. |
| O*NET tab-delimited database | O*NET 30.3 | Imported from `https://www.onetcenter.org/dl_files/database/db_30_3_text` unless `--skip-onet` is used. | Supplies title, description, ranked knowledge/skills/abilities, tasks, job zone, and representative occupations. |
| IndigoPathway SOC grouping workbook | March 2026 | `scripts/import-career-workbooks.mjs --indigo-soc-grouping <xlsx>` | Supplies pathway categories, subcategories, tips, associations, work-style flags, profile scores, and automation index. |
| ETPL/Credential Registry | Pending connected source | Fixture-backed in `lib/etpl/mock-providers.ts` | SOC matching is implemented; live provider source is not connected yet. |
| Outcomes | Pending connected source | Fixture-backed in `lib/slds/mock-outcomes.ts` | Outcome schema exists; live longitudinal source is not connected yet. |
## Official Source Feed Roadmap
This is the target source stack for a best-in-class Colorado career pathway data feed. The current generated warehouse already has workbook, O*NET, IndigoPathway, regional metric, forecast, and export contracts; these official feeds are the priority source integrations to make the warehouse deeper, fresher, and more defensible.
Note on terminology: the standard BLS products are QCEW (Quarterly Census of Employment and Wages) and OEWS (Occupational Employment and Wage Statistics). If a stakeholder says QEWS, map the requirement to QCEW/OEWS and confirm which grain they need.
| Source feed | Official source | Best use in this app | Grain / join keys | Integration status |
| --- | --- | --- | --- | --- |
| Colorado LMI hub | Colorado Department of Labor and Employment LMI: `https://cdle.colorado.gov/dlss/labor-market-information-lmi` | State labor-market authority, Colorado-specific methodology, regional workforce context, CES/LAUS/OEWS/LED links. | Colorado, county, region, MSA, occupation, industry. | Priority source-of-record layer for Colorado labor-market pages. |
| Colorado occupational projections | CDLE projections: `https://cdle.colorado.gov/labor-law-stats/labor-market-information-lmi/industry-and-occupational-projections`; Colorado Open Data: `https://data.colorado.gov/Labor-and-Employment/Long-Term-Employment-Projections-in-Colorado/gyeb-jc69` | Ten-year occupational and industry projections, growth rates, replacement/opening context. | SOC, NAICS, geography, projection period. | Replace/validate generated `jobForecasts` and growth fields. |
| BLS OEWS | BLS OEWS tables: `https://www.bls.gov/oes/tables.htm` | Occupation employment and wage estimates for national, state, metro, nonmetro, and industry-specific views. | SOC, geography, ownership/industry where available, annual release. | Priority wage/employment benchmark. |
| BLS QCEW | BLS QCEW downloadable/open data: `https://www.bls.gov/cew/downloadable-data-files.htm` | Industry employment and wage by NAICS, ownership, county, quarter, and annual average; strong benchmark for regional industry health. | NAICS, county/state, ownership, quarter/year. | Priority industry and regional benchmark feed. |
| BLS Public Data API | BLS API: `https://www.bls.gov/bls/api_features.htm` | Programmatic refresh path for BLS time series in JSON/Excel where series IDs are available. | BLS series ID, period, year. | Build refresh adapter after the BLS series IDs are finalized. |
| Lightcast Skills | Lightcast Open Skills: `https://lightcast.io/open-skills`; API access: `https://docs.lightcast.io/lightcast-api/docs/api-access` | Market-facing skills normalization, skill identifiers, skill categories/subcategories, job-posting signal, career pathway intelligence if licensed. | Lightcast skill ID, occupation/title taxonomy, posting trend. | High-value enrichment feed; access/licensing required beyond open taxonomy. |
| O*NET database | O*NET database help: `https://www.onetonline.org/help/onet/database` | Occupational descriptors, tasks, knowledge, skills, abilities, work activities, job zone, interests, work values. | O*NET-SOC, SOC fallback, descriptor IDs. | Already imported; keep current with O*NET releases. |
| CareerOneStop APIs | CareerOneStop API Explorer: `https://api.careeronestop.org/api-explorer/` | Training programs, institutions, career resources, certifications, and occupation-supporting datasets. | Location, occupation, program, institution. | Candidate ETPL/training provider adapter. |
| Credential Engine Registry | Credential Engine APIs: `https://credentialengine.org/develop-solutions/apis/`; technical registry: `https://credreg.net/registry/searchapi` | CTDL-backed credential, organization, assessment, and pathway transparency. | CTID, credential, organization, occupation, learning opportunity. | Candidate credential registry adapter; API agreement required. |
| College Scorecard | College Scorecard API: `https://collegescorecard.ed.gov/data/api/` | Institution and field-of-study cost, completion, debt, and earnings indicators. | OPEID/UNITID, CIP, field of study, state. | Candidate education ROI and provider-quality feed. |
| IPEDS | NCES IPEDS Data Center: `https://nces.ed.gov/ipeds/datacenter/Default.aspx` | Institutional characteristics, completions, awards, program inventory, enrollment, finance, graduation. | UNITID, CIP, award level, year. | Candidate education-provider normalization feed. |
| Census LEHD/LODES | Census LEHD data: `https://lehd.ces.census.gov/data/` | Origin-destination employment, workplace/residence flows, worker characteristics, commute sheds, regional workforce movement. | Census block, state, workplace/residence area, industry, worker segment. | Candidate regional mobility and access layer. |
| Census ACS / data.census.gov | Census data portal: `https://data.census.gov/` | Demographics, educational attainment, income, commuting, broadband/access context by geography. | GEOID, ACS table, year. | Candidate context layer for equity/access views. |
## Feed Architecture Targets
| Layer | Target behavior | Quality rule |
| --- | --- | --- |
| Source registry | Maintain a machine-readable registry of every feed, owner, URL, refresh cadence, license/access requirement, grain, join keys, and last successful load. | No dataset enters the warehouse without source metadata and freshness. |
| Raw landing | Preserve raw downloaded files/API payloads with immutable timestamps before transformation. | No silent overwrites; every refresh can be audited. |
| Crosswalks | Maintain SOC/O*NET-SOC, CIP, NAICS, UNITID/OPEID, CTID, region, county, MSA, and Colorado workforce-region crosswalks. | Crosswalk confidence must be explicit: exact, normalized, family fallback, or manual override. |
| Derived warehouse | Generate app-ready industry, occupation, regional, education, skill, forecast, and export tables from raw sources plus crosswalks. | Derived fields must point back to source feed and transform version. |
| Validation | Run row-count, uniqueness, null, stale-date, outlier, geography, code-system, and placeholder checks on every refresh. | Fail closed for broken numeric metrics; preserve qualitative records with clear empty states. |
| Evidence exports | Publish JSON/PDF/CSV outputs with source periods, freshness, and known gaps. | Public exports must not include credentials, raw private source paths, or workflow fixture records. |
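One way to picture a source registry row is as a typed record with a freshness check. This is a minimal sketch, not an existing module: `SourceRegistryEntry`, `isFeedFresh`, and every field name are hypothetical, and measuring freshness as "loaded within the refresh cadence, in days" is an assumption.

```typescript
// Hypothetical registry entry shape; field names are illustrative only.
interface SourceRegistryEntry {
  feedId: string;
  owner: string;
  url: string;
  refreshCadenceDays: number; // assumed unit: days between expected refreshes
  accessRequirement: string; // e.g. "open", "API agreement required"
  grain: string;
  joinKeys: string[];
  lastSuccessfulLoad: string | null; // ISO timestamp, null if never loaded
}

const MS_PER_DAY = 86400000;

// Returns true when the feed loaded successfully within its cadence window.
function isFeedFresh(entry: SourceRegistryEntry, now: Date): boolean {
  if (entry.lastSuccessfulLoad === null) return false;
  const loadedAt = Date.parse(entry.lastSuccessfulLoad);
  if (Number.isNaN(loadedAt)) return false;
  const ageDays = (now.getTime() - loadedAt) / MS_PER_DAY;
  return ageDays >= 0 && ageDays <= entry.refreshCadenceDays;
}
```

A registry shaped like this would let the quality rule in the first row be enforced mechanically: a feed with no `lastSuccessfulLoad` is never treated as fresh.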
## Refresh And Validation Commands
- `npm run data:import-careers`
- `npm run data:validate-careers`
- `npm run db:setup`

The importer accepts optional overrides:
`node scripts/import-career-workbooks.mjs --qualitative /path/to/qualitative.xlsx --quantitative /path/to/quantitative.xlsx --indigo-soc-grouping /path/to/indigo.xlsx --out lib/career-data/generated/career-upload-data.json`
Validation currently expects at least:
- 10 industries, 50 careers, and 800 occupations
- 6,500 regional occupation records, 2,000 job forecasts, and 100 workforce index rows
- 650 O*NET-enriched occupations
- 200 Top Job, 100 Green Job, and 20 Critical Occupation occupations

It also requires unique occupation, regional, and forecast keys; valid regional numbers; valid forecast confidence labels; and no loaded `#REF!` strings.
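The row minimums above can be expressed as a small threshold table. The sketch below is illustrative only: it does not reproduce `scripts/validate-career-upload-data.mjs`, and `MINIMUM_ROWS` and `findRowCountFailures` are hypothetical names. Only the numeric minimums come from this dictionary.

```typescript
// Minimums copied from this dictionary; the real validator may structure them differently.
const MINIMUM_ROWS: Record<string, number> = {
  industries: 10,
  careers: 50,
  occupations: 800,
  occupationRegionalRecords: 6500,
  jobForecasts: 2000,
  workforceIndices: 100,
};

// Returns one human-readable message per dataset that falls below its minimum.
function findRowCountFailures(
  counts: Record<string, number>,
  minimums: Record<string, number> = MINIMUM_ROWS,
): string[] {
  const failures: string[] = [];
  for (const [dataset, minimum] of Object.entries(minimums)) {
    const actual = counts[dataset] ?? 0; // a missing dataset counts as zero rows
    if (actual < minimum) {
      failures.push(`${dataset}: expected at least ${minimum}, found ${actual}`);
    }
  }
  return failures;
}
```

Returning a list of failures, rather than throwing on the first one, lets a single run report every dataset that regressed.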
Database setup and operational details live in `docs/mvp/database-setup.md`. `npm run db:setup` validates the generated warehouse, applies Postgres schema migrations, syncs the warehouse tables, and validates database row counts against the generated JSON.
## Current Generated Data Inventory
Counts are from `lib/career-data/generated/career-upload-data.json` as validated on 2026-06-10.
| Dataset | Count | Primary key | Description |
| --- | ---: | --- | --- |
| `metadata` | 1 | n/a | Source periods, file names, methodology, region-to-county map, row counts, completeness, and warnings. |
| `industries` | 16 | `id` | Uploaded industry sectors and overview text. |
| `careers` | 74 | `id` | Uploaded career families/pathway groupings within industries. |
| `occupations` | 810 | `occupationId` | Uploaded occupation records with SOC/O*NET/CIP, tags, narrative fields, O*NET enrichment, Indigo enrichment, and regional metrics. |
| `occupationRegionalRecords` | 6,785 | `id` | Chart-ready occupation/industry/region records with salary, openings, growth, index, tags, education, experience, certifications, and skills. |
| `workforceIndices` | 252 | `industryId + region` | Derived workforce gap, effort, challenge, opportunity, composite score, trend, projected change, and freshness. |
| `jobForecasts` | 2,805 | `industryId + region + year` | Derived current/projected jobs, growth, confidence, score, and explanatory factors. |
| `confidenceIntervals` | 2,550 | `industryId + year` plus implicit row grain | Derived baseline, optimistic, and pessimistic forecast bands. Rows carry no explicit region field, so `industryId + year` alone may not be unique. |
Completeness snapshot:
| Metric | Value |
| --- | ---: |
| Occupations with SOC | 675 |
| Occupations without SOC | 135 |
| Occupations with O*NET enrichment | 810 |
| O*NET matched records | 664 |
| O*NET composite records | 146 |
| Occupations with IndigoPathway match | 773 |
| Occupations without IndigoPathway match | 37 |
| Occupations with any quantitative metric | 667 |
| Occupations without quantitative metric | 143 |
| Occupations with statewide wage | 660 |
| Top Job occupations | 219 |
| Green Job occupations | 116 |
| Critical Occupation occupations | 25 |
| Nested occupation regional metric rows | 9,669 |
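The paired rows in the completeness snapshot should each sum to the 810 occupations in the inventory. The following sketch checks that arithmetic. The figures are copied from the tables above, while `coveragePairs` and `findUnbalancedPairs` are hypothetical names used only for illustration.

```typescript
// Total occupations from the inventory table.
const totalOccupations = 810;

// Each pair is [count with the attribute, count without it].
const coveragePairs: Record<string, [number, number]> = {
  soc: [675, 135],
  onetStatus: [664, 146], // matched + composite
  indigoMatch: [773, 37],
  quantitativeMetric: [667, 143],
};

// Returns the labels of pairs whose two counts do not add up to the total.
function findUnbalancedPairs(
  total: number,
  pairs: Record<string, [number, number]>,
): string[] {
  return Object.entries(pairs)
    .filter(([, [withValue, withoutValue]]) => withValue + withoutValue !== total)
    .map(([label]) => label);
}
```

Running a check like this after each workbook refresh would catch a snapshot table that was only partially updated.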
## Region Contract
Generated occupation metrics use `Statewide` plus `R1` through `R14`. The human-readable labels are `Statewide`, `Region 1`, ..., `Region 14`. County groupings live in `metadata.regionCountyMap`.
| Field | Type | Definition |
| --- | --- | --- |
| `regionKey` | string | `Statewide` or `R1` through `R14`. |
| `regionName` | string | Display label for the region. |
| `regionCountyMap` | object | Map of each regional key to included Colorado counties. |
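A minimal sketch of the region key contract, assuming the label rule stated above (`R1` maps to `Region 1`, and so on). `REGION_KEYS`, `isRegionKey`, and `regionLabel` are hypothetical helper names, not exports of the existing modules.

```typescript
// "Statewide" plus R1 through R14, as described in the region contract.
const REGION_KEYS: string[] = [
  "Statewide",
  ...Array.from({ length: 14 }, (_, index) => `R${index + 1}`),
];

// True only for the 15 keys the contract allows.
function isRegionKey(value: string): boolean {
  return REGION_KEYS.includes(value);
}

// Converts a region key to its human-readable label, rejecting unknown keys.
function regionLabel(regionKey: string): string {
  if (!isRegionKey(regionKey)) {
    throw new Error(`Unknown region key: ${regionKey}`);
  }
  return regionKey === "Statewide" ? "Statewide" : `Region ${regionKey.slice(1)}`;
}
```

Rejecting unknown keys, instead of passing them through, keeps a stray value such as `R15` from silently appearing in charts.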
## Core Generated Entities
### `metadata`
| Field | Type | Definition |
| --- | --- | --- |
| `qualitativePeriod` | string | Period label for the qualitative workbook. Current value: `September 2025`. |
| `quantitativePeriod` | string | Period label for the quantitative workbook. Current value: `December 2025`. |
| `sourceFiles.qualitative` | string | Basename of qualitative source workbook. |
| `sourceFiles.quantitative` | string | Basename of quantitative source workbook. |
| `sourceFiles.onet` | string | O*NET source label or skip label. |
| `sourceFiles.indigoSocGrouping` | string | IndigoPathway source workbook label. |
| `generatedBy` | string | Import script path. Current value: `scripts/import-career-workbooks.mjs`. |
| `methodology.joinKeys` | string[] | Join strategy, currently uploaded occupation ID and SOC/O*NET fallback. |
| `methodology.onetEnrichment` | string | O*NET matching method. |
| `methodology.indigoEnrichment` | string | IndigoPathway matching method. |
| `methodology.wageMetric` | string | Wage measure definition. |
| `methodology.openingMetric` | string | Openings measure definition. |
| `methodology.growthMetric` | string | Growth measure definition. |
| `methodology.indexMethod` | string | How app indices are derived. |
| `validation.sourceRows` | object | Input workbook row counts by source sheet/category. |
| `validation.loadedRows` | object | Generated output row counts by dataset. |
| `validation.completeness` | object | Coverage counts for SOC, O*NET, Indigo, tags, metrics, and wages. |
| `validation.warnings` | string[] | Known source/processing caveats retained with the dataset. |
### `industries`
| Field | Type | Definition |
| --- | --- | --- |
| `id` | string | Slug used in routes and joins. |
| `name` | string | Display industry name. |
| `overview` | string | Uploaded industry overview narrative. |
| `tidbits` | string[] | Uploaded supporting facts/highlights for the industry page. |
Industry IDs currently include `advanced-manufacturing`, `aerospace`, `agriculture-and-natural-resources`, `behavioral-health`, `business-operations`, `construction`, `creative-industries`, `cybersecurity`, `education`, `energy`, `healthcare`, `information-technology`, `public-health`, `public-safety`, `retail`, and `transportation`.
### `careers`
| Field | Type | Definition |
| --- | --- | --- |
| `id` | string | Uploaded career family ID used by `occupations[].careerCodes`. |
| `name` | string | Display career family/pathway name. |
| `industryName` | string | Uploaded parent industry display name. |
| `teaser` | string | Short overview used in pathway summaries. |
| `expectedPay` | string | Uploaded pay narrative. |
| `buzz` | string | Uploaded career-buzz or outlook narrative. |
| `companies` | string | Uploaded employer/company narrative. |
| `workers` | string | Uploaded worker/demographic narrative. |
### `occupations`
| Field | Type | Definition |
| --- | --- | --- |
| `occupationId` | string | Primary occupation ID from the upload; also used as the occupation route ID. |
| `name` | string | Occupation display title. |
| `soc` | string | Normalized 2018 SOC base code when available. Empty string means qualitative-only/no SOC. |
| `onetSoc` | string | Full O*NET-SOC code used for enrichment. |
| `industry` | string | Industry display name. |
| `industryId` | string | Industry slug join to `industries.id`. |
| `careerCodes` | string[] | Career family IDs joined to `careers.id`. |
| `topJob` | boolean | Colorado Talent Pipeline Report-aligned Top Job flag. |
| `greenJob` | boolean | O*NET green economy classification flag. |
| `criticalOccupation` | boolean | Industry-partner critical occupation flag. |
| `skillLevel` | number/null | Uploaded or derived skill level when present. |
| `childOccupationIds` | string[] | Child occupation IDs for grouped/composite pathways. |
| `alsoKnownAs` | string[] | O*NET reported/alternate job titles. |
| `description` | string | Occupation description, usually O*NET-backed when matched. |
| `cipCodes` | string[] | CIP codes retained for education/crosswalk matching. |
| `knowledge` | string | Semicolon-delimited knowledge text for display. |
| `skills` | string | Semicolon-delimited skills text for display. |
| `abilities` | string | Semicolon-delimited abilities text for display. |
| `importantCompetencies` | string | Uploaded competency narrative. |
| `credentialRequirement` | string | Credential requirement narrative. |
| `credentialDetail` | string | Credential detail narrative. |
| `workExperienceRequirement` | string | Work experience requirement narrative. |
| `workExperienceDetail` | string | Work experience detail narrative. |
| `typicalEducation` | string | Typical education label/narrative. |
| `typicalPrograms` | string | Typical program narrative. |
| `training` | string | Training narrative. |
| `workBasedLearning` | string | Work-based learning narrative. |
| `remoteWork` | string | Remote-work narrative. |
| `extraNotes` | string[] | Additional source notes after placeholder cleanup. |
| `onet` | object | O*NET enrichment block. |
| `indigo` | object | IndigoPathway enrichment block. |
| `regionalMetrics` | object[] | Nested metrics by statewide/region key. |
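Two conventions in this table are easy to mishandle: `knowledge`, `skills`, and `abilities` are semicolon-delimited strings, and an empty `soc` marks a qualitative-only record. The sketch below shows one way to handle both. `splitDisplayList` and `hasSoc` are hypothetical helpers, not functions from `profiles.ts`.

```typescript
// Splits a semicolon-delimited display string into trimmed, non-empty items.
function splitDisplayList(value: string): string[] {
  return value
    .split(";")
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}

// An empty (or whitespace-only) SOC string means the occupation is qualitative-only.
function hasSoc(occupation: { soc: string }): boolean {
  return occupation.soc.trim().length > 0;
}
```

For example, `splitDisplayList("Mathematics; Physics ;; ")` yields `["Mathematics", "Physics"]`, and an empty string yields an empty array.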
### `occupations[].onet`
| Field | Type | Definition |
| --- | --- | --- |
| `source` | string | O*NET source label. |
| `status` | `matched`/`composite`/`not-found`/`missing-soc` | Match status. Current generated data has `matched` and `composite` only. |
| `title` | string | O*NET occupation title. |
| `description` | string | O*NET occupation description. |
| `topKnowledge` | string[] | Top ranked O*NET knowledge areas. |
| `topSkills` | string[] | Top ranked O*NET skills. |
| `topAbilities` | string[] | Top ranked O*NET abilities. |
| `tasks` | string[] | Top/core O*NET task statements. |
| `jobZone` | number/null | O*NET Job Zone. |
| `representativeOccupations` | object[] | Representative O*NET occupations used for composite/grouped pathways. |
### `occupations[].indigo`
| Field | Type | Definition |
| --- | --- | --- |
| `source` | string | IndigoPathway workbook label. |
| `status` | `matched`/`not-found` | Indigo SOC grouping match status. |
| `categories` | string[] | Indigo pathway categories. |
| `subCategories` | string[] | Indigo pathway subcategories. |
| `pathwayNames` | string[] | Indigo pathway names. |
| `topTips` | string[] | Indigo guidance/tips text. |
| `associations` | string[] | Association names/links from Indigo source. |
| `workStyle.workOnComputer` | string[] | Work-style flag values. |
| `workStyle.workOutside` | string[] | Work-style flag values. |
| `workStyle.workWithHands` | string[] | Work-style flag values. |
| `workStyle.workWithKids` | string[] | Work-style flag values. |
| `profileScores` | object | Indigo profile score map with numeric/null values. |
| `matchedSocRows` | object[] | Source SOC grouping rows retained for audit/debugging. |
### `occupations[].regionalMetrics`
| Field | Type | Definition |
| --- | --- | --- |
| `regionKey` | string | `Statewide` or regional code. |
| `regionName` | string | Region display label. |
| `employment` | number/null | Employment estimate when available. |
| `hourlyWage` | number/null | Median hourly wage when available. |
| `annualWage` | number/null | Median annual wage when available. |
| `annualOpenings` | number/null | Annual openings when available. |
| `annualGrowth` | number/null | Annual growth rate when available. |
| `growth10Years` | number/null | 10-year growth percentage/rate when available. |
| `openings2024To2034` | number/null | 2024-2034 opening count when available. |
| `source` | string | Regional metric source indicator, such as `curated-salary-sheet+raw-region-sheet`. |
Note: `lib/career-data/upload-analytics.ts` and `lib/career-data/profiles.ts` now expose the richer generated metric fields (`employment`, `hourlyWage`, `annualGrowth`, `openings2024To2034`, `source`) in addition to the app's core wage/openings/growth fields.
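Because most regional metric fields are `number/null`, consumers need an explicit empty state instead of rendering `0`. A minimal sketch follows, using only a subset of the fields above. `RegionalMetricSketch`, `findRegionalMetric`, `formatAnnualWage`, and the "Not available" label are assumptions for illustration.

```typescript
// Subset of the regional metric fields documented above.
interface RegionalMetricSketch {
  regionKey: string;
  annualWage: number | null;
  annualOpenings: number | null;
}

// Looks up the metric row for one region; undefined when the region has no row.
function findRegionalMetric(
  metrics: RegionalMetricSketch[],
  regionKey: string,
): RegionalMetricSketch | undefined {
  return metrics.find((metric) => metric.regionKey === regionKey);
}

// Formats a wage for display, keeping null distinct from zero.
function formatAnnualWage(value: number | null): string {
  if (value === null) return "Not available"; // hypothetical empty-state label
  return `$${Math.round(value).toLocaleString("en-US")}`;
}
```

Keeping `null` separate from `0` matters for the 143 occupations without quantitative metrics, which should show an empty state and not a zero wage.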
### `occupationRegionalRecords`
| Field | Type | Definition |
| --- | --- | --- |
| `id` | string | Unique chart/export record ID. |
| `title` | string | Occupation title. |
| `industry` | string | Industry display name. |
| `region` | string | Region display name. |
| `openings` | number | Chart-ready openings count. |
| `avgSalary` | number | Chart-ready average/median salary value. |
| `growthRate` | number | Chart-ready growth rate. |
| `gapIndex` | number | Derived supply-demand gap score. |
| `topJob` | boolean | Top Job flag. |
| `greenJob` | boolean | Green Job flag. |
| `criticalOccupation` | boolean | Critical Occupation flag. |
| `educationRequired` | string | Display education requirement. |
| `experienceLevel` | string | Display experience level. |
| `certifications` | string[] | Display certification labels. |
| `topSkills` | string[] | Top skill labels for this record. |
### `workforceIndices`
| Field | Type | Definition |
| --- | --- | --- |
| `industryId` | string | Industry slug. |
| `industryName` | string | Industry display name. |
| `region` | string | Region display name. |
| `gapIndex` | number | 0-100 supply-demand imbalance score; higher means more shortage. |
| `effortIndex` | number | 0-100 training intensity score; higher means more effort. |
| `challengeIndex` | number | 0-100 barrier score; higher means more barriers. |
| `opportunityIndex` | number | 0-100 advancement score; higher means more opportunity. |
| `compositeScore` | number | Weighted summary score. |
| `trend` | `up`/`down`/`stable` | Derived trend label. |
| `projectedChange` | number | Expected percent change. |
| `lastUpdated` | string | Freshness date string. |
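The four index fields are documented as 0-100 scores, which suggests a simple range guard. The sketch below is an assumption about how such a guard could look. `INDEX_FIELDS` and `outOfRangeIndexFields` are hypothetical names, and `compositeScore` is left out because its weighting and range are not stated here.

```typescript
// The four fields documented with a 0-100 range.
const INDEX_FIELDS = ["gapIndex", "effortIndex", "challengeIndex", "opportunityIndex"] as const;
type IndexField = (typeof INDEX_FIELDS)[number];
type WorkforceIndexScores = Record<IndexField, number>;

// Returns the names of index fields that are not finite numbers within 0-100.
function outOfRangeIndexFields(row: WorkforceIndexScores): IndexField[] {
  return INDEX_FIELDS.filter((field) => {
    const value = row[field];
    return !(Number.isFinite(value) && value >= 0 && value <= 100);
  });
}
```

An empty result means every index on the row is within the documented range.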
### `jobForecasts`
| Field | Type | Definition |
| --- | --- | --- |
| `year` | number | Forecast year. |
| `industryId` | string | Industry slug. |
| `industryName` | string | Industry display name. |
| `region` | string | Region display name. |
| `currentJobs` | number | Current/base job estimate. |
| `projectedJobs` | number | Projected job estimate. |
| `growthRate` | number | Forecast growth rate. |
| `confidence` | `high`/`medium`/`low` | Confidence label validated by script. |
| `confidenceScore` | number | Numeric confidence score. |
| `factors` | string[] | Driver/explanation labels. |
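The inventory lists `industryId + region + year` as the forecast key, and validation requires it to be unique. One way to check that is sketched below. `forecastKey`, `findDuplicateForecastKeys`, and the `|` separator are hypothetical choices, not the validator's actual implementation.

```typescript
// Only the fields that make up the composite key.
interface ForecastKeyParts {
  industryId: string;
  region: string;
  year: number;
}

// Builds a single string key; "|" is an arbitrary separator chosen for this sketch.
function forecastKey(row: ForecastKeyParts): string {
  return `${row.industryId}|${row.region}|${row.year}`;
}

// Returns each key that appears more than once, listed once per key.
function findDuplicateForecastKeys(rows: ForecastKeyParts[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const row of rows) {
    const key = forecastKey(row);
    if (seen.has(key)) duplicates.add(key);
    seen.add(key);
  }
  return Array.from(duplicates);
}
```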
### `confidenceIntervals`
| Field | Type | Definition |
| --- | --- | --- |
| `year` | number | Forecast year. |
| `industryId` | string | Industry slug. |
| `baseline` | number | Baseline forecast value. |
| `optimistic` | number | Upper/optimistic forecast value. |
| `pessimistic` | number | Lower/pessimistic forecast value. |
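Since `optimistic` is described as the upper band and `pessimistic` as the lower band, each row should satisfy pessimistic ≤ baseline ≤ optimistic. The sketch below checks that ordering. It is an inferred sanity check and not a rule the existing validator is known to enforce, and `isBandOrdered` is a hypothetical name.

```typescript
// The three band values documented for each confidence interval row.
interface ForecastBand {
  baseline: number;
  optimistic: number;
  pessimistic: number;
}

// True when all three values are finite and ordered lower <= baseline <= upper.
function isBandOrdered(band: ForecastBand): boolean {
  const values = [band.pessimistic, band.baseline, band.optimistic];
  if (!values.every((value) => Number.isFinite(value))) return false;
  return band.pessimistic <= band.baseline && band.baseline <= band.optimistic;
}
```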
## App-Facing Derived Entities
Defined in `lib/career-data/profiles.ts`.
| Entity | Base data | Added/derived fields |
| --- | --- | --- |
| `OccupationProfile` | `CareerUploadOccupation` | `routeId`, `tags`, `sourceMetadata`, `regionalMetricsWithSource`, `educationProgramMatches`. |
| `CareerFamily` | `CareerUploadFamily` | Joined `occupations`, `topTags`, `medianAnnualWage`, `annualOpenings`. |
| `IndustryProfile` | `CareerUploadIndustry` | `routeId`, joined `careerFamilies`, joined/sorted `occupations`, `sourceMetadata`, aggregated `regionalMetrics`, `availableRegions`, `tagCounts`, and top education matches. |
| `CareerTag` | Occupation flags | `topJob`, `greenJob`, and `criticalOccupation` labels, definitions, and source notes. |
| `EducationProgramMatch` | Occupation SOC/CIP plus mock ETPL | SOC exact first, SOC family fallback, CIP-ready placeholder when provider data is pending. |
| `DataSourceMetadata` | Generated `metadata` | Source periods/files, methodology, generated-by path, warnings. |
## API Export Contracts
All export routes are static (`dynamic = 'force-static'`) and return `{ metadata, data }`.
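The `{ metadata, data }` envelope can be checked structurally before a consumer relies on it. A minimal sketch, assuming `metadata` is an object and `data` is an array, as the field tables in this section indicate. `ExportEnvelope` and `isExportEnvelope` are hypothetical names.

```typescript
// Generic envelope shape shared by the export routes.
interface ExportEnvelope<T> {
  metadata: Record<string, unknown>;
  data: T[];
}

// Type guard: verifies the two top-level keys without inspecting row contents.
function isExportEnvelope(value: unknown): value is ExportEnvelope<unknown> {
  if (typeof value !== "object" || value === null) return false;
  const candidate = value as { metadata?: unknown; data?: unknown };
  return (
    typeof candidate.metadata === "object" &&
    candidate.metadata !== null &&
    !Array.isArray(candidate.metadata) &&
    Array.isArray(candidate.data)
  );
}
```

A guard like this covers the readiness step of confirming the envelope shape. Row-level field checks would still need their own validation.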
### `/api/exports/occupations`
`metadata` fields:
| Field | Type | Definition |
| --- | --- | --- |
| `exportName` | string | `occupations`. |
| `recordCount` | number | Occupation count from readiness summary. |
| `source` | object | Source metadata. |
| `readiness` | boolean | Occupation export readiness flag. |
`data[]` fields:
| Field | Type | Definition |
| --- | --- | --- |
| `id` | string | Occupation uploaded ID. |
| `title` | string | Occupation title. |
| `industryId` | string | Industry slug. |
| `industry` | string | Industry display name. |
| `soc` | string | SOC code. |
| `onetSoc` | string | O*NET-SOC code. |
| `cipCodes` | string[] | CIP code list. |
| `careerFamilyIds` | string[] | Career family IDs. |
| `skillLevel` | number/null | Skill level. |
| `tags` | string[] | Human-readable occupation tags. |
| `typicalEducation` | string | Typical education. |
| `credentialRequirement` | string | Credential requirement. |
| `workExperienceRequirement` | string | Work experience requirement. |
| `regionalMetrics` | object[] | Region-level wage/openings/growth metrics with source period. |
| `onet` | object | O*NET enrichment block. |
| `indigo` | object | Indigo enrichment block. |
| `source` | object | Source metadata. |
### `/api/exports/industry-pathways`
`data[]` fields:
| Field | Type | Definition |
| --- | --- | --- |
| `id` | string | Industry slug. |
| `name` | string | Industry display name. |
| `overview` | string | Industry overview. |
| `tidbits` | string[] | Industry facts/highlights. |
| `tagCounts` | object | Counts of Top Job, Green Job, and Critical Occupation occupations. |
| `regionalMetrics` | object[] | Aggregated industry regional metrics. |
| `careerFamilies` | object[] | Family summaries with ID, name, overview, occupation IDs, annual openings, and median annual wage. |
| `source` | object | Source metadata. |
### `/api/exports/education-crosswalk`
`metadata.mappingRule`: `SOC exact first, SOC family fallback for mock ETPL, CIP retained for Credential Registry connection.`
`data[]` fields:
| Field | Type | Definition |
| --- | --- | --- |
| `occupationId` | string | Occupation uploaded ID. |
| `title` | string | Occupation title. |
| `soc` | string | SOC code. |
| `onetSoc` | string | O*NET-SOC code. |
| `cipCodes` | string[] | CIP codes retained for connected source matching. |
| `strategy` | string | Match strategy statement. |
| `matches` | object[] | Provider/program match summaries. |
`matches[]` fields:
| Field | Type | Definition |
| --- | --- | --- |
| `matchType` | `soc-exact`/`soc-family`/`cip-ready` | Match route. |
| `sourceStatus` | `mock-etpl`/`ready-for-connected-source` | Whether the row came from local mock data or a connected-source placeholder. |
| `providerId` | string | Provider ID. |
| `providerName` | string | Provider display name. |
| `programId` | string | Program ID. |
| `programName` | string | Program display name. |
| `credentialEarned` | string | Credential label. |
| `deliveryMode` | string | In-person, online, or hybrid. |
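The mapping rule (SOC exact first, SOC family fallback, then a CIP-ready placeholder) can be sketched as a three-step cascade. This is not the code in `profiles.ts`. `matchPrograms`, `socFamily`, and the sketch types are hypothetical, and treating "family" as the shared two-digit SOC major group is an assumption, since this dictionary does not define the family rule.

```typescript
type MatchType = "soc-exact" | "soc-family" | "cip-ready";

interface ProgramSketch {
  programId: string;
  socCodes: string[];
}

interface MatchSketch {
  matchType: MatchType;
  programId: string | null; // null marks a placeholder with no fixture program
}

// ASSUMPTION: a SOC "family" is the two-digit major group, e.g. "29" for "29-1141".
function socFamily(soc: string): string {
  return soc.slice(0, 2);
}

function matchPrograms(soc: string, programs: ProgramSketch[]): MatchSketch[] {
  const placeholder: MatchSketch[] = [{ matchType: "cip-ready", programId: null }];
  if (soc === "") return placeholder; // qualitative-only occupation

  // Step 1: exact SOC match.
  const exact = programs.filter((program) => program.socCodes.includes(soc));
  if (exact.length > 0) {
    return exact.map((program): MatchSketch => ({ matchType: "soc-exact", programId: program.programId }));
  }

  // Step 2: fall back to programs in the same assumed SOC family.
  const family = programs.filter((program) =>
    program.socCodes.some((code) => socFamily(code) === socFamily(soc)),
  );
  if (family.length > 0) {
    return family.map((program): MatchSketch => ({ matchType: "soc-family", programId: program.programId }));
  }

  // Step 3: nothing matched, so emit the CIP-ready placeholder.
  return placeholder;
}
```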
## Fixture Data Contracts
### ETPL Provider Fixtures
Defined in `lib/etpl/mock-providers.ts`.
| Entity | Key | Fields |
| --- | --- | --- |
| `TrainingProvider` | `id` | `name`, `type`, `address`, `city`, `region`, `phone`, `website`, `accreditation`, `programs`. |
| `TrainingProgram` | `id` | `name`, `providerId`, `socCodes`, `duration`, `cost`, `completionRate`, `placementRate`, `credentialEarned`, `deliveryMode`, `startDates`. |
Use this as fixture data only. The app already has SOC exact/family matching and CIP-ready fallback semantics for a future ETPL/Credential Registry source.
### Outcomes Fixtures
Defined in `lib/slds/mock-outcomes.ts`.
| Entity | Key | Fields |
| --- | --- | --- |
| `ProgramOutcome` | `programId` | Provider/program, SOC, completer count, employment rate, wage progression, ROI, time to employment, retention, advancement, and industry alignment. |
| `IndustryOutcome` | `industryId` | Total completers, average employment/wage/ROI, top programs, and demand outlook. |
| `RegionalOutcome` | `region` | Total completers, average employment rate, average year-one wage, and top industries. |
| `WageProgression` | n/a | `year1`, `year3`, `year5`, optional `year10`. |
Use this as fixture data only until a connected outcomes source is approved.
### Privacy-Safe Analytics Events
Defined in `lib/analytics/no-pii-events.ts`.
| Event | Required fields | Optional fields |
| --- | --- | --- |
| `industry_viewed` | `industryId` | `region` |
| `occupation_viewed` | `occupationId`, `industryId` | `region` |
| `education_link_opened` | none | `occupationId`, `providerId`, `programId` |
| `export_downloaded` | `exportName` | none |
`createNoPiiEvent` adds `occurredAt` as an ISO timestamp. `sessionBucket` is optional on the event envelope and should remain non-identifying.
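The event table maps naturally onto a discriminated union, so required fields are enforced at compile time. The sketch below is illustrative and is not the contents of `lib/analytics/no-pii-events.ts`. The `name` discriminator, `NoPiiEventInput`, and `createNoPiiEventSketch` are assumptions; only the event names, field names, `occurredAt`, and `sessionBucket` come from this section.

```typescript
// One variant per documented event; required fields are non-optional.
type NoPiiEventInput =
  | { name: "industry_viewed"; industryId: string; region?: string }
  | { name: "occupation_viewed"; occupationId: string; industryId: string; region?: string }
  | { name: "education_link_opened"; occupationId?: string; providerId?: string; programId?: string }
  | { name: "export_downloaded"; exportName: string };

type NoPiiEventSketch = NoPiiEventInput & {
  occurredAt: string; // ISO timestamp
  sessionBucket?: string; // must stay non-identifying
};

// Stamps the event with an ISO timestamp; `now` is injectable to keep tests deterministic.
function createNoPiiEventSketch(
  input: NoPiiEventInput,
  now: Date = new Date(),
  sessionBucket?: string,
): NoPiiEventSketch {
  const event: NoPiiEventSketch = { ...input, occurredAt: now.toISOString() };
  if (sessionBucket !== undefined) event.sessionBucket = sessionBucket;
  return event;
}
```

With this shape, omitting `industryId` from an `occupation_viewed` event is a type error, and no variant has a slot for account, user, or contact identifiers.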
## Data Quality And Cleanup Backlog
| Item | Status | Owner/action |
| --- | --- | --- |
| Missing SOC codes | 135 occupations retained as qualitative-only records. | Keep visible but exclude from wage/opening charts where SOC-backed quantitative metrics are required. |
| Missing quantitative metrics | 143 occupations have no regional metrics. | Preserve qualitative records and show null/empty states rather than dropping occupations. |
| Missing IndigoPathway match | 37 occupations do not have Indigo SOC grouping. | Review SOC/title mapping and add explicit representative SOC overrides when warranted. |
| CIP/provider matching | CIP codes are retained, but provider data is fixture-backed. | Connect ETPL/Credential Registry and replace `cip-ready` placeholders with live matches. |
| Regional metric contract | Generated nested regional metrics include richer fields than the core UI usually displays. | Keep `upload-analytics.ts`, `profiles.ts`, and this dictionary aligned when importer fields change. |
| Static regional UI data | `lib/career-data/regional-data.ts` contains hard-coded city/employer/projection values separate from the generated warehouse. | Reconcile or label as demo/supporting data wherever displayed. |
| Legacy workforce module | `lib/workforce-data.ts` contains static/demo constants while generated JSON also exports workforce tables. | Prefer generated tables for current career-upload analytics and keep legacy consumers explicit. |
| External links | Readiness summary counts distinct links in occupation detail text. | Batch health-check before public launch. |
## Public/Private Boundaries
- Public/stakeholder-safe: generated career upload data, source metadata, readiness counts, occupation/industry/education export JSON, and privacy-safe analytics events.
- Fixture-only: ETPL providers/programs, outcomes, and static regional city/employer/projection data.
- Private/protected: source workbook file paths, any future live provider/outcomes credentials, and analytics session identifiers that could identify a person.
## Readiness Checklist
- Run `npm run data:validate-careers` after any generated JSON refresh.
- Compare the generated PDF against this markdown before reviewer handoff; the current committed PDF is a 2026-06-03 snapshot.
- Confirm export routes still return the expected `{ metadata, data }` structure.
- Confirm counts in this dictionary after a workbook refresh.
- Confirm no `#REF!`, `Insf. Data`, or placeholder strings leak into numeric analytics.
- Keep mock/demo data clearly labeled until connected source systems are wired.
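The placeholder check in this list can be approximated with a recursive scan over the parsed JSON. This is a minimal sketch under stated assumptions: `PLACEHOLDER_PATTERNS` covers only the two strings named above (`#REF!` and `Insf. Data`), and `findPlaceholderPaths` is a hypothetical helper, not part of the existing validator.

```typescript
// Patterns for the two placeholder strings named in the checklist.
const PLACEHOLDER_PATTERNS: RegExp[] = [/#REF!/i, /Insf\.\s*Data/i];

// Walks any parsed JSON value and returns the path of every string containing a placeholder.
function findPlaceholderPaths(value: unknown, path: string = "$"): string[] {
  if (typeof value === "string") {
    return PLACEHOLDER_PATTERNS.some((pattern) => pattern.test(value)) ? [path] : [];
  }
  if (Array.isArray(value)) {
    return value.flatMap((item, index) => findPlaceholderPaths(item, `${path}[${index}]`));
  }
  if (typeof value === "object" && value !== null) {
    return Object.entries(value).flatMap(([key, child]) =>
      findPlaceholderPaths(child, `${path}.${key}`),
    );
  }
  return []; // numbers, booleans, null, and undefined cannot hold placeholders
}
```

Reporting paths such as `$.occupations[12].regionalMetrics[0].source` makes it easier to trace a leaked placeholder back to the workbook cell that produced it.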