Automated quality assessment across seven dimensions. Get a composite score, per-dimension breakdowns, and actionable fix suggestions for any dataset.
Each dimension is scored independently from 0 to 1.0, then combined into a weighted composite.
Share of non-null fields across all records. Required fields carry twice the weight of optional fields, so gaps in critical data are penalized more heavily.
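The 2x weighting can be illustrated with a minimal sketch. The function name `completeness_score` and the exact formula (weighted non-null count divided by weighted total) are assumptions for illustration; the source only states that required fields count twice as much as optional ones.

```python
from typing import Any


def completeness_score(
    records: list[dict[str, Any]],
    required: set[str],
    optional: set[str],
) -> float:
    """Weighted share of non-null fields (assumed formula, not the API's own)."""
    # Required fields weigh 2, optional fields weigh 1.
    weights = {name: 2.0 for name in required}
    weights.update({name: 1.0 for name in optional})

    total = 0.0
    present = 0.0
    for record in records:
        for name, weight in weights.items():
            total += weight
            if record.get(name) is not None:
                present += weight
    # An empty dataset is treated as fully complete here (an assumption).
    return present / total if total else 1.0


rows = [
    {"id": 1, "name": "Ana", "nickname": "A"},
    {"id": 2, "name": None, "nickname": None},
]
score = completeness_score(rows, required={"id", "name"}, optional={"nickname"})
print(round(score, 2))  # 0.7
```

In this sample the missing required `name` costs 2 of the 10 weight units, while the missing optional `nickname` costs only 1.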
Ratio of values that match their declared type in the schema. Detects strings in numeric fields, malformed dates, and other type mismatches.
Duplicate detection on designated key fields. Measures the ratio of distinct values to total values, flagging exact and near-duplicates.
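A minimal sketch of the distinct-to-total ratio, assuming exact matching only. The helper name `uniqueness_score` is hypothetical, and how the service detects near-duplicates is not described in the source, so that part is left out.

```python
from typing import Hashable, Iterable


def uniqueness_score(values: Iterable[Hashable]) -> float:
    """Ratio of distinct non-null values to total non-null values."""
    non_null = [value for value in values if value is not None]
    if not non_null:
        # Nothing to compare, so treat the column as fully unique (an assumption).
        return 1.0
    return len(set(non_null)) / len(non_null)


ids = [1, 2, 2, 3]
ratio = uniqueness_score(ids)
print(ratio)  # 0.75
```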
Detects mixed formats within a single column, such as dates alternating between "MM/DD/YYYY" and "YYYY-MM-DD", or phone numbers with inconsistent separators.
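One way to picture this check is to classify each value by the pattern it matches and report the share held by the most common pattern. The sketch below does that for the two date formats mentioned above; the pattern table, the function name `format_consistency`, and the scoring rule are all assumptions, not the service's documented behavior.

```python
import re
from collections import Counter

# Hypothetical pattern table covering only the two date formats named above.
PATTERNS = {
    "MM/DD/YYYY": re.compile(r"\d{2}/\d{2}/\d{4}"),
    "YYYY-MM-DD": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def format_consistency(values: list[str]) -> tuple[float, Counter]:
    """Return (share of the dominant format, counts per format)."""
    labels = []
    for value in values:
        label = next(
            (name for name, pattern in PATTERNS.items() if pattern.fullmatch(value)),
            "unknown",
        )
        labels.append(label)
    counts = Counter(labels)
    if not labels:
        return 1.0, counts
    dominant_count = counts.most_common(1)[0][1]
    return dominant_count / len(labels), counts


dates = ["2024-01-15", "01/16/2024", "2024-01-17", "2024-01-18"]
consistency, by_format = format_consistency(dates)
print(consistency)      # 0.75
print(dict(by_format))  # {'YYYY-MM-DD': 3, 'MM/DD/YYYY': 1}
```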
Outlier detection using the Interquartile Range (IQR) method. Values below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR are flagged. Custom min/max bounds can also be supplied.
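The IQR rule can be sketched with the standard library. The function name `iqr_outliers`, the choice of the "inclusive" quartile method, and the decision to let custom bounds replace the computed fences are assumptions; the source does not say which quartile definition the service uses or how custom bounds interact with the IQR fences.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5, lower=None, upper=None):
    """Return values outside the IQR fences, or outside custom bounds if given.

    Requires at least two values, since quartiles are undefined otherwise.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Custom bounds, when supplied, replace the computed fence (an assumption).
    low_fence = q1 - k * iqr if lower is None else lower
    high_fence = q3 + k * iqr if upper is None else upper
    return [value for value in values if value < low_fence or value > high_fence]


ages = [22, 25, 27, 29, 31, 34, 36, 120]
print(iqr_outliers(ages))                      # [120]
print(iqr_outliers(ages, lower=0, upper=150))  # []
```

For this sample, Q1 is 26.5 and Q3 is 34.5, so the fences sit at 14.5 and 46.5 and only 120 falls outside them.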
Cross-field and cross-table consistency checks. Validates that foreign key references resolve, enum values are within allowed sets, and dependent fields are logically consistent.
LLM-powered detection of semantically invalid data that passes structural checks. Catches values like a city named "12345" or an email in a phone field.
Submit a dataset with an optional schema for a full quality assessment.
{
  "schema": {
    "fields": [
      { "name": "id", "type": "integer", "required": true, "unique": true },
      { "name": "email", "type": "string", "required": true, "format": "email" },
      { "name": "age", "type": "integer", "required": false, "min": 0, "max": 150 },
      { "name": "role", "type": "string", "required": true, "enum": ["admin", "user", "viewer"] }
    ]
  },
  "data": [
    { "id": 1, "email": "[email protected]", "age": 29, "role": "admin" },
    { "id": 2, "email": "[email protected]", "age": -5, "role": "user" },
    { "id": 3, "email": "not-an-email", "age": null, "role": "superadmin" }
  ]
}
{
  "composite_score": 0.82,
  "dimensions": {
    "completeness": 0.92,
    "type_conformance": 0.92,
    "uniqueness": 1.00,
    "format_consistency": 0.67,
    "range_validity": 0.67,
    "referential_integrity": 0.67,
    "semantic_anomaly": 0.67
  },
  "record_count": 3,
  "field_count": 4,
  "issues_found": 4,
  "duration_ms": 12.8
}
The API returns actionable fix suggestions for every issue detected.
{
  "suggested_fixes": [
    {
      "record": 1,
      "field": "age",
      "issue": "range_violation",
      "severity": "error",
      "message": "Value -5 is below minimum 0",
      "suggestion": "Verify source data. If age is unknown, set to null rather than a negative value."
    },
    {
      "record": 2,
      "field": "email",
      "issue": "format_mismatch",
      "severity": "error",
      "message": "'not-an-email' does not match email format",
      "suggestion": "Validate email format at ingestion. Expected pattern: [email protected]"
    },
    {
      "record": 2,
      "field": "role",
      "issue": "enum_violation",
      "severity": "warning",
      "message": "'superadmin' is not in allowed values [admin, user, viewer]",
      "suggestion": "Map to closest allowed value 'admin' or update the schema enum list."
    },
    {
      "record": 2,
      "field": "age",
      "issue": "missing_required",
      "severity": "info",
      "message": "Optional field 'age' is null",
      "suggestion": "Consider collecting age data or marking as not applicable."
    }
  ]
}
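A client will often want to triage these suggestions by severity. The sketch below groups entries from a parsed response that has the shape shown above; the helper name `group_by_severity` is hypothetical, and the sample is a trimmed copy of the response with only the keys the helper reads.

```python
from collections import defaultdict


def group_by_severity(response: dict) -> dict:
    """Map each severity to a list of (record, field, issue) tuples."""
    grouped = defaultdict(list)
    for fix in response.get("suggested_fixes", []):
        grouped[fix["severity"]].append((fix["record"], fix["field"], fix["issue"]))
    return dict(grouped)


# Trimmed copy of the sample response above.
sample = {
    "suggested_fixes": [
        {"record": 1, "field": "age", "issue": "range_violation", "severity": "error"},
        {"record": 2, "field": "email", "issue": "format_mismatch", "severity": "error"},
        {"record": 2, "field": "role", "issue": "enum_violation", "severity": "warning"},
    ]
}

grouped = group_by_severity(sample)
for severity, items in grouped.items():
    print(severity, items)
```

Note that the `record` values in the sample line up with zero-based positions in the submitted `data` array (the `-5` age belongs to the second record).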
The composite score is a weighted average of all seven dimensions.
| Dimension | Default Weight | Description |
|---|---|---|
| completeness | 0.20 | Non-null field coverage, required fields weighted 2x |
| type_conformance | 0.20 | Values matching declared schema types |
| uniqueness | 0.15 | Distinct-to-total ratio on key fields |
| format_consistency | 0.15 | Uniform formatting within columns |
| range_validity | 0.10 | Values within IQR or specified bounds |
| referential_integrity | 0.10 | Cross-field and cross-table consistency |
| semantic_anomaly | 0.10 | LLM-detected semantic issues |
// composite_score = sum(dimension_score * weight) / sum(weights)
//
// Example with default weights:
// (0.92 * 0.20) + (0.92 * 0.20) + (1.00 * 0.15) +
// (0.67 * 0.15) + (0.67 * 0.10) + (0.67 * 0.10) +
// (0.67 * 0.10)
// = 0.184 + 0.184 + 0.150 + 0.101 + 0.067 + 0.067 + 0.067
// = 0.82
//
// Weights are customizable per request via the
// "weights" object in the request body.