Seven Quality Dimensions

Each dimension is scored independently on a scale from 0.0 to 1.0, and the seven scores are then combined into a weighted composite.

Request & Response

Submit a dataset with an optional schema for a full quality assessment.

POST /api/v1/quality/score (Request)
{
  "schema": {
    "fields": [
      { "name": "id",    "type": "integer", "required": true,  "unique": true },
      { "name": "email", "type": "string",  "required": true,  "format": "email" },
      { "name": "age",   "type": "integer", "required": false, "min": 0, "max": 150 },
      { "name": "role",  "type": "string",  "required": true,  "enum": ["admin", "user", "viewer"] }
    ]
  },
  "data": [
    { "id": 1, "email": "[email protected]", "age": 29,   "role": "admin" },
    { "id": 2, "email": "[email protected]",  "age": -5,   "role": "user" },
    { "id": 3, "email": "not-an-email",     "age": null, "role": "superadmin" }
  ]
}
200 OK (Response)
{
  "composite_score": 0.82,
  "dimensions": {
    "completeness":         0.92,
    "type_conformance":     0.92,
    "uniqueness":            1.00,
    "format_consistency":   0.67,
    "range_validity":        0.67,
    "referential_integrity": 0.67,
    "semantic_anomaly":      0.67
  },
  "record_count": 3,
  "field_count": 4,
  "issues_found": 4,
  "duration_ms": 12.8
}
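
As a rough illustration of the exchange above, the following Python sketch builds and sends the request using only the standard library. The host `https://api.example.com` and the function names `build_score_request` and `score_dataset` are placeholders invented for this example, not part of the documented API. Authentication headers are omitted because this section does not describe them.

```python
import json
import urllib.request

# Hypothetical base URL; substitute the real host for your deployment.
BASE_URL = "https://api.example.com"


def build_score_request(schema, data, base_url=BASE_URL):
    """Build (but do not send) a POST request for /api/v1/quality/score."""
    payload = {"data": data}
    if schema is not None:  # the schema is optional
        payload["schema"] = schema
    return urllib.request.Request(
        url=f"{base_url}/api/v1/quality/score",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score_dataset(schema, data, base_url=BASE_URL):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_score_request(schema, data, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Separating request construction from sending keeps the payload easy to inspect or log before any network call is made. A caller would then read `result["composite_score"]` and `result["dimensions"]` from the value returned by `score_dataset`.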

Suggested Fixes

The API returns actionable fix suggestions for every issue detected.

Suggested Fixes Example (included in response)
{
  "suggested_fixes": [
    {
      "record": 1,
      "field": "age",
      "issue": "range_violation",
      "severity": "error",
      "message": "Value -5 is below minimum 0",
      "suggestion": "Verify source data. If age is unknown, set to null rather than a negative value."
    },
    {
      "record": 2,
      "field": "email",
      "issue": "format_mismatch",
      "severity": "error",
      "message": "'not-an-email' does not match email format",
      "suggestion": "Validate email format at ingestion. Expected pattern: [email protected]"
    },
    {
      "record": 2,
      "field": "role",
      "issue": "enum_violation",
      "severity": "warning",
      "message": "'superadmin' is not in allowed values [admin, user, viewer]",
      "suggestion": "Map to closest allowed value 'admin' or update the schema enum list."
    },
    {
      "record": 2,
      "field": "age",
      "issue": "missing_optional",
      "severity": "info",
      "message": "Optional field 'age' is null",
      "suggestion": "Consider collecting age data or marking as not applicable."
    }
  ]
}
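
One way a client might consume `suggested_fixes` is to group the entries by severity and pick out the records that need attention first. The sketch below assumes that the three severities seen in the example (`error`, `warning`, `info`) are ordered from most to least severe. That ordering and the helper names are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed ordering, most to least severe, based on the values in the example.
SEVERITY_ORDER = ["error", "warning", "info"]


def group_fixes_by_severity(response):
    """Return {severity: [fix, ...]} with known severities first, in order."""
    grouped = defaultdict(list)
    for fix in response.get("suggested_fixes", []):
        grouped[fix["severity"]].append(fix)
    ordered = [s for s in SEVERITY_ORDER if s in grouped]
    # Keep any severity not listed above instead of silently dropping it.
    ordered += [s for s in grouped if s not in SEVERITY_ORDER]
    return {s: grouped[s] for s in ordered}


def blocking_records(response):
    """Sorted indices of records that have at least one error-level fix."""
    return sorted(
        {
            fix["record"]
            for fix in response.get("suggested_fixes", [])
            if fix["severity"] == "error"
        }
    )
```

Applied to the example response above, `blocking_records` would return `[1, 2]`, since the range violation and the format mismatch are the two error-level entries.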

Scoring Methodology

The composite score is a weighted average of all seven dimensions.

Dimension              Default Weight  Description
completeness           0.20            Non-null field coverage, required fields weighted 2x
type_conformance       0.20            Values matching declared schema types
uniqueness             0.15            Distinct-to-total ratio on key fields
format_consistency     0.15            Uniform formatting within columns
range_validity         0.10            Values within IQR or specified bounds
referential_integrity  0.10            Cross-field and cross-table consistency
semantic_anomaly       0.10            LLM-detected semantic issues
Composite Score Formula (Weighted Average)
// composite_score = sum(dimension_score * weight) / sum(weights)
//
// Example with default weights:
//   (0.92 * 0.20) + (0.92 * 0.20) + (1.00 * 0.15) +
//   (0.67 * 0.15) + (0.67 * 0.10) + (0.67 * 0.10) +
//   (0.67 * 0.10)
//   = 0.184 + 0.184 + 0.150 + 0.101 + 0.067 + 0.067 + 0.067
//   = 0.82
//
// Weights are customizable per request via the
// "weights" object in the request body.
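
The formula can be reproduced client-side with a few lines of Python, which is useful for checking a response or previewing the effect of custom weights. The sketch assumes that a custom `weights` object overrides individual defaults rather than replacing the whole set. The source does not specify this, and the function name `composite_score` is hypothetical.

```python
# Default weights as listed in the table above.
DEFAULT_WEIGHTS = {
    "completeness": 0.20,
    "type_conformance": 0.20,
    "uniqueness": 0.15,
    "format_consistency": 0.15,
    "range_validity": 0.10,
    "referential_integrity": 0.10,
    "semantic_anomaly": 0.10,
}


def composite_score(dimensions, weights=None):
    """Weighted average: sum(score * weight) / sum(weights)."""
    # Assumption: custom weights override defaults key by key.
    merged = {**DEFAULT_WEIGHTS, **(weights or {})}
    total_weight = sum(merged[name] for name in dimensions)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(score * merged[name] for name, score in dimensions.items())
    return weighted_sum / total_weight
```

With the dimension scores from the example response and the default weights, `round(composite_score(dimensions), 2)` gives `0.82`. Because the result is divided by the sum of the weights, custom weights do not need to add up to 1.0.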