Migration Assessor

Overview

The Migration Assessor is the strategic planning layer that sits above the Schema Mapper and Payload Translator. While those services handle the mechanics of mapping and transforming data, the Migration Assessor evaluates the full picture: how complex the migration is, what risks are hiding beneath the surface, and what a realistic phased plan looks like. It produces a weighted complexity score, a categorized risk inventory, a four-phase migration plan, and a calibrated effort estimate.

Complexity Scoring

Eight weighted factors, whose weights sum to 1.00, combine into a single complexity score from 0 to 100.

Factor	Weight	Scoring Criteria
Schema size	`0.15`	Total number of fields, types, and nested structures across source and target schemas.
Structural divergence	`0.20`	Degree of difference in hierarchy, grouping, and nesting between source and target.
Type mismatches	`0.15`	Count and severity of fields that require type coercion or lossy conversion.
Enum complexity	`0.10`	Unmapped enum values, many-to-one mappings, and missing default handling.
Nesting depth delta	`0.10`	Difference in maximum nesting depth requiring flatten or expand transformations.
Unmappable field ratio	`0.15`	Percentage of source fields with no viable target counterpart.
Data volume	`0.05`	Expected record count and payload sizes affecting migration window and throughput.
Domain complexity	`0.10`	Business rule density, cross-entity dependencies, and regulatory constraints.
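
As a rough illustration of how the weights in the table could combine, the sketch below computes a plain weighted sum. It assumes each factor is first scored on a 0-100 scale and that the factors combine linearly; neither detail is confirmed here, and the dictionary keys and example scores are hypothetical.

```python
# Weights copied from the factor table. The snake_case keys are hypothetical names.
WEIGHTS = {
    "schema_size": 0.15,
    "structural_divergence": 0.20,
    "type_mismatches": 0.15,
    "enum_complexity": 0.10,
    "nesting_depth_delta": 0.10,
    "unmappable_field_ratio": 0.15,
    "data_volume": 0.05,
    "domain_complexity": 0.10,
}


def complexity_score(factor_scores: dict) -> float:
    """Combine per-factor scores (assumed 0-100 each) into one 0-100 score.

    Assumption: a plain weighted sum. The service may combine factors differently.
    """
    missing = WEIGHTS.keys() - factor_scores.keys()
    if missing:
        raise ValueError(f"missing factor scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= factor_scores[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    total = sum(WEIGHTS[name] * factor_scores[name] for name in WEIGHTS)
    return round(total, 1)


# Hypothetical per-factor scores, for illustration only.
example_scores = {
    "schema_size": 40,
    "structural_divergence": 70,
    "type_mismatches": 60,
    "enum_complexity": 30,
    "nesting_depth_delta": 20,
    "unmappable_field_ratio": 80,
    "data_volume": 50,
    "domain_complexity": 90,
}
print(complexity_score(example_scores))
```

Because the weights sum to 1.00, a migration that scores 100 on every factor lands at 100 overall, and structural divergence (weight 0.20) moves the result more than any other single factor.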

Risk Analysis

Risks are grouped into four categories: data loss, semantic, volume, and regulatory.

Data Loss Risks

Unmapped fields that will be dropped, precision loss from type narrowing (e.g., float64 to float32), and truncation from length constraint differences.
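
Two of these data loss checks are easy to demonstrate. The sketch below is not the assessor's implementation; it only shows how float64 to float32 narrowing and length truncation can be detected for individual values, using hypothetical helper names.

```python
import struct


def to_float32(value: float) -> float:
    """Round-trip a Python float (float64) through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def loses_precision(value: float) -> bool:
    """Return True if storing the value as float32 would change it."""
    try:
        return to_float32(value) != value
    except OverflowError:
        # The value is outside the float32 range, so it cannot be stored at all.
        return True


def is_truncated(value: str, max_length: int) -> bool:
    """Return True if the value exceeds the target field's length limit."""
    return len(value) > max_length


print(loses_precision(0.5))         # exactly representable in float32
print(loses_precision(0.1))         # not exactly representable in float32
print(is_truncated("Alexandria", 8))
```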

Semantic Risks

Fields that share a name but carry different business meanings, divergent enum semantics, and context-dependent value interpretation across systems.

Volume Risks

Large record counts that exceed migration window constraints, payload sizes that challenge network throughput, and batch processing bottlenecks.

Regulatory Risks

PII fields requiring special handling, PHI data subject to HIPAA constraints, cross-border data residency rules, and audit trail requirements.

Migration Plan

A four-phase approach that moves from high-confidence automation through human review and custom development to post-migration validation.

Phase 1: Auto-Mappable Fields

Typically 60-70% of fields fall here. These have a mapping confidence above 0.8 and can be migrated automatically without human review. Examples include exact name matches, directly compatible types, and well-known aliases.

Phase 2: Review-Required Fields

Fields with a mapping confidence between 0.5 and 0.8. The system proposes a mapping but flags it for human verification. These typically involve semantic near-matches, partial type overlaps, or ambiguous field names.

Phase 3: Custom Development

Unmappable fields that require bespoke transformation logic, data enrichment from external sources, or entirely new field derivations. Each item includes a complexity estimate and suggested approach.

Phase 4: Validation & Reconciliation

Post-migration verification including record count reconciliation, checksum validation, referential integrity checks, and business rule assertion testing across the migrated dataset.
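
The confidence thresholds that separate the first three phases can be sketched as a small routing function. This is an illustration under stated assumptions, not the assessor's logic: the `CandidateMapping` type is hypothetical, a confidence of exactly 0.8 is routed to review, and anything below 0.5 is treated as custom development alongside fields with no target.

```python
from dataclasses import dataclass
from typing import Optional

AUTO_THRESHOLD = 0.8    # above this: phase 1 (auto-mappable)
REVIEW_THRESHOLD = 0.5  # from this up to AUTO_THRESHOLD: phase 2 (review)


@dataclass(frozen=True)
class CandidateMapping:
    """Hypothetical record describing one proposed field mapping."""
    source_field: str
    target_field: Optional[str]  # None when no single viable target exists
    confidence: float


def assign_phase(mapping: CandidateMapping) -> int:
    """Return the migration phase (1, 2, or 3) for a proposed mapping."""
    if mapping.target_field is None or mapping.confidence < REVIEW_THRESHOLD:
        return 3  # assumption: low-confidence mappings need custom work
    if mapping.confidence > AUTO_THRESHOLD:
        return 1
    return 2


def build_plan(mappings: list) -> dict:
    """Group source field names by phase."""
    plan = {1: [], 2: [], 3: []}
    for mapping in mappings:
        plan[assign_phase(mapping)].append(mapping.source_field)
    return plan


# Confidence values are made up for illustration.
candidates = [
    CandidateMapping("dob", "date_of_birth", 0.93),
    CandidateMapping("patient_id", "id", 0.71),
    CandidateMapping("full_name", None, 0.0),  # needs a split into two fields
]
plan = build_plan(candidates)
print(plan)
```

Phase 4 does not appear in the routing because validation applies to the migrated dataset as a whole rather than to individual mappings.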

Request & Response

Submit source and target schemas to receive a full migration assessment.

POST /api/v1/migration/assess Request

{
  "source_schema": {
    "format": "json_schema",
    "content": {
      "type": "object",
      "properties": {
        "patient_id": { "type": "integer" },
        "full_name": { "type": "string" },
        "dob": { "type": "string" },
        "ssn": { "type": "string" },
        "diagnosis_codes": {
          "type": "array",
          "items": { "type": "string" }
        }
      }
    }
  },
  "target_schema": {
    "format": "json_schema",
    "content": {
      "type": "object",
      "properties": {
        "id": { "type": "string" },
        "first_name": { "type": "string" },
        "last_name": { "type": "string" },
        "date_of_birth": { "type": "string" },
        "icd10_codes": {
          "type": "array",
          "items": { "type": "string" }
        }
      }
    }
  },
  "options": {
    "depth": "comprehensive",
    "estimated_record_count": 250000
  }
}
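
A request like the one above can be assembled with Python's standard library. The sketch below assumes a placeholder host (`https://api.example.com`) and no authentication, since neither is specified here; only the path, the method, and the body shape come from the sample. The helper names are hypothetical.

```python
import json
import urllib.request

# Hypothetical host; only the path /api/v1/migration/assess comes from the sample.
BASE_URL = "https://api.example.com"


def json_schema(properties: dict) -> dict:
    """Wrap a properties map in the schema envelope used by the sample request."""
    return {
        "format": "json_schema",
        "content": {"type": "object", "properties": properties},
    }


def build_assess_request(source_properties: dict, target_properties: dict,
                         depth: str, estimated_record_count: int) -> urllib.request.Request:
    """Build (but do not send) a POST request for a migration assessment."""
    body = {
        "source_schema": json_schema(source_properties),
        "target_schema": json_schema(target_properties),
        "options": {
            "depth": depth,
            "estimated_record_count": estimated_record_count,
        },
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/migration/assess",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # auth headers, if any, are unknown
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response. Not called in this sketch."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_assess_request(
    source_properties={"patient_id": {"type": "integer"}, "dob": {"type": "string"}},
    target_properties={"id": {"type": "string"}, "date_of_birth": {"type": "string"}},
    depth="comprehensive",
    estimated_record_count=250000,
)
print(request.get_method(), request.full_url)
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.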

200 OK Response

{
  "complexity_score": 62,
  "risk_level": "high",
  "risks": [
    {
      "category": "data_loss",
      "field": "ssn",
      "detail": "No target field found. PII data will be dropped.",
      "severity": "critical"
    },
    {
      "category": "semantic",
      "field": "full_name",
      "detail": "Single field must split into first_name + last_name.",
      "severity": "medium"
    },
    {
      "category": "regulatory",
      "field": "ssn",
      "detail": "PII field requires audit trail for deletion.",
      "severity": "high"
    }
  ],
  "migration_plan": {
    "phase_1_auto": ["dob -> date_of_birth", "diagnosis_codes -> icd10_codes"],
    "phase_2_review": ["patient_id -> id (type coercion)"],
    "phase_3_custom": ["full_name -> first_name + last_name (split)"],
    "phase_4_validate": ["record_count_check", "pii_audit"]
  },
  "effort_estimate": {
    "engineering_days": 8.5,
    "confidence": 0.72,
    "breakdown": {
      "auto_mapping": "0.5 days",
      "review_mapping": "1 day",
      "custom_dev": "3 days",
      "validation": "2 days",
      "buffer": "2 days"
    }
  }
}
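
A single field can carry several risks, as `ssn` does in the response above. One way a client might summarize the `risks` array is to group entries by field and order them by severity. The sketch below is a hypothetical client-side helper; the `"low"` severity level is an assumption, since the sample only shows `critical`, `high`, and `medium`.

```python
# Lower rank sorts first. "low" is assumed; it does not appear in the sample.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def risks_by_field(assessment: dict) -> dict:
    """Group risks by field name, most severe first within each field."""
    ordered = sorted(
        assessment["risks"],
        key=lambda risk: SEVERITY_RANK.get(risk["severity"], len(SEVERITY_RANK)),
    )
    grouped = {}
    for risk in ordered:
        grouped.setdefault(risk["field"], []).append(risk)
    return grouped


def blocking_fields(assessment: dict) -> list:
    """Return the sorted names of fields that carry at least one critical risk."""
    return sorted({r["field"] for r in assessment["risks"] if r["severity"] == "critical"})


# Abbreviated copy of the sample response.
assessment = {
    "risks": [
        {"category": "data_loss", "field": "ssn", "severity": "critical"},
        {"category": "semantic", "field": "full_name", "severity": "medium"},
        {"category": "regulatory", "field": "ssn", "severity": "high"},
    ]
}
for field, risks in risks_by_field(assessment).items():
    print(field, [(r["category"], r["severity"]) for r in risks])
print(blocking_fields(assessment))
```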

Analysis Depth

Choose the level of analysis that fits your planning stage. The sample request selects it through the depth option ("depth": "comprehensive").

Quick

Structural comparison only. Returns complexity score and top-level risks within seconds. Best for initial triage and feasibility checks.

Standard

Structural plus semantic analysis. Includes a migration plan and an effort estimate. The default for most assessments, balancing depth with speed.

Comprehensive

Full analysis with regulatory scanning, consumer impact modeling, and detailed per-field migration playbooks. Recommended for production migrations.

Effort Estimate

The assessor produces a calibrated effort estimate expressed in engineering days. The estimate is broken down by migration phase (auto-mapping, review, custom development, and validation) plus a risk-adjusted buffer. A confidence score from 0 to 1 indicates how reliable the estimate is, based on the completeness of the input schemas and the proportion of unambiguous mappings.

Effort Estimate Sample JSON

{
  "effort_estimate": {
    "engineering_days": 8.5,
    "confidence": 0.72,
    "breakdown": {
      "auto_mapping": "0.5 days",
      "review_mapping": "1 day",
      "custom_dev": "3 days",
      "validation": "2 days",
      "buffer": "2 days"
    },
    "notes": [
      "Custom dev estimate elevated due to name-splitting logic",
      "Buffer accounts for PII audit and regulatory review"
    ]
  }
}
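
In the sample, the breakdown entries add up to the `engineering_days` total (0.5 + 1 + 3 + 2 + 2 = 8.5). A client can verify this with a small check like the sketch below, which assumes every breakdown value is a string of the form "<number> day" or "<number> days", as in the sample; the helper names are hypothetical.

```python
import math


def parse_days(text: str) -> float:
    """Parse a duration such as "0.5 days" or "1 day" into a number of days."""
    amount, unit = text.split()
    if unit not in ("day", "days"):
        raise ValueError(f"unexpected unit in {text!r}")
    return float(amount)


def breakdown_total(effort: dict) -> float:
    """Sum all breakdown entries, in days."""
    return sum(parse_days(value) for value in effort["breakdown"].values())


def is_consistent(effort: dict) -> bool:
    """Check that the breakdown adds up to the reported total."""
    return math.isclose(breakdown_total(effort), effort["engineering_days"])


# Values copied from the sample above.
effort_estimate = {
    "engineering_days": 8.5,
    "confidence": 0.72,
    "breakdown": {
        "auto_mapping": "0.5 days",
        "review_mapping": "1 day",
        "custom_dev": "3 days",
        "validation": "2 days",
        "buffer": "2 days",
    },
}
print(breakdown_total(effort_estimate), is_consistent(effort_estimate))
```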