Schema evolution and type casting
Overview
The DataStori app attempts to cast each incoming data column to the type of the matching column in the existing table schema. If a cast fails, it raises a ValueError with detailed error information.
Current State Analysis
The current implementation includes:
- Complex validation logic with regex patterns for String→Numeric conversions
- Precision loss warnings for Double/Float→Integer conversions
- Special handling for NullType columns
- Multiple validation cases before casting
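The pre-cast checks listed above can be sketched in plain Python. This is a minimal illustration, not DataStori's implementation: the function name `validate_cast`, both regex patterns, the warning text, and the use of Spark type names as plain strings are all assumptions.

```python
import re
import warnings
from typing import Iterable, Optional

# Hypothetical patterns; the app's actual regexes are not documented here.
_INTEGER_RE = re.compile(r"[+-]?\d+")
_DECIMAL_RE = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")

_INTEGRAL = {"IntegerType", "LongType"}
_FRACTIONAL = {"DoubleType", "FloatType", "DecimalType"}


def validate_cast(column: str, values: Iterable[Optional[object]],
                  source: str, target: str) -> bool:
    """Run illustrative pre-cast checks on one column.

    Returns False when no cast is needed (NullType source) and True when the
    cast may proceed. Raises ValueError for strings that are not numeric and
    warns when a fractional-to-integer cast would truncate.
    """
    if source == "NullType":
        return False  # all-null column: adopt the target type, nothing to cast
    if target == "NullType":
        raise ValueError(f"column '{column}': casting to NullType is not supported")

    non_null = [v for v in values if v is not None]
    if source == "StringType" and target in _INTEGRAL | _FRACTIONAL:
        # Integer targets accept only whole numbers; fractional targets also
        # accept decimals and exponents.
        pattern = _INTEGER_RE if target in _INTEGRAL else _DECIMAL_RE
        bad = [v for v in non_null if not pattern.fullmatch(str(v).strip())]
        if bad:
            raise ValueError(f"column '{column}': non-numeric values {bad[:5]!r}")
    elif source in {"DoubleType", "FloatType"} and target in _INTEGRAL:
        # Precision-loss warning only; the cast itself still goes ahead.
        if any(not float(v).is_integer() for v in non_null):
            warnings.warn(f"column '{column}': {source} -> {target} truncates fractional values")
    return True
```

The regex check runs over the whole column before any value is converted, so a single bad string rejects the column without a partial cast.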
Type Conversion Matrix
| Source Type | Target Type | Conversion Result | Error Condition |
|------------|-------------|------------------|-----------------|
| NullType | Any Type | Schema evolution (no cast) | Never errors |
| StringType | IntegerType | Parses string to integer | Invalid numeric strings |
| StringType | LongType | Parses string to long | Invalid numeric strings |
| StringType | DoubleType | Parses string to double | Invalid numeric/decimal strings |
| StringType | FloatType | Parses string to float | Invalid numeric/decimal strings |
| StringType | DecimalType | Parses string to decimal | Invalid numeric/decimal strings |
| StringType | DateType | Parses date string | Invalid date format |
| StringType | TimestampType | Parses timestamp string | Invalid timestamp format |
| StringType | BooleanType | Parses boolean string | Invalid boolean string |
| IntegerType | LongType | Widening (safe) | Never errors |
| IntegerType | DoubleType | Widening (safe) | Never errors |
| IntegerType | FloatType | Widening (large values may lose precision) | Never errors |
| IntegerType | DecimalType | Conversion (safe) | Never errors |
| IntegerType | StringType | String representation | Never errors |
| LongType | IntegerType | Narrowing (may overflow) | Value exceeds Integer range |
| LongType | DoubleType | Widening (large values may lose precision) | Never errors |
| LongType | FloatType | Widening (large values may lose precision) | Never errors |
| LongType | DecimalType | Conversion (safe) | Never errors |
| LongType | StringType | String representation | Never errors |
| DoubleType | IntegerType | Truncation (precision loss) | Never errors (truncates) |
| DoubleType | LongType | Truncation (precision loss) | Never errors (truncates) |
| DoubleType | FloatType | Narrowing (precision loss) | Never errors (may lose precision) |
| DoubleType | DecimalType | Conversion | Never errors |
| DoubleType | StringType | String representation | Never errors |
| FloatType | IntegerType | Truncation (precision loss) | Never errors (truncates) |
| FloatType | LongType | Truncation (precision loss) | Never errors (truncates) |
| FloatType | DoubleType | Widening (safe) | Never errors |
| FloatType | DecimalType | Conversion | Never errors |
| FloatType | StringType | String representation | Never errors |
| DecimalType | IntegerType | Truncation (precision loss) | Never errors (truncates) |
| DecimalType | LongType | Truncation (precision loss) | Never errors (truncates) |
| DecimalType | DoubleType | Conversion | Never errors |
| DecimalType | FloatType | Conversion | Never errors |
| DecimalType | StringType | String representation | Never errors |
| DateType | TimestampType | Adds time component (00:00:00) | Never errors |
| DateType | StringType | String representation | Never errors |
| TimestampType | DateType | Removes time component | Never errors |
| TimestampType | StringType | String representation | Never errors |
| BooleanType | StringType | String representation | Never errors |
| BooleanType | IntegerType | true→1, false→0 | Never errors |
| Any Type | NullType | Not supported | Always errors |
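One way to make the matrix machine-checkable is a small lookup that classifies a (source, target) pair by its error condition, with the truncating and narrowing rows kept separate. The sketch below is hypothetical (`classify` and `Outcome` are illustrative names) and covers only the pairs listed in the table; any other pair is reported as unlisted rather than guessed.

```python
from enum import Enum


class Outcome(Enum):
    NO_CAST = "schema evolution, no cast"
    NEVER_ERRORS = "never errors"
    TRUNCATES = "never errors, truncates or narrows"
    MAY_ERROR = "errors on invalid or out-of-range values"
    ALWAYS_ERRORS = "always errors"
    UNLISTED = "not in the matrix"


_STRING_TARGETS = {"IntegerType", "LongType", "DoubleType", "FloatType",
                   "DecimalType", "DateType", "TimestampType", "BooleanType"}

# Rows marked as truncation or narrowing in the matrix.
_TRUNCATES = {(s, t) for s in ("DoubleType", "FloatType", "DecimalType")
              for t in ("IntegerType", "LongType")} | {("DoubleType", "FloatType")}

# Remaining rows whose error condition is "Never errors".
_NEVER_ERRORS = {
    ("IntegerType", "LongType"), ("IntegerType", "DoubleType"),
    ("IntegerType", "FloatType"), ("IntegerType", "DecimalType"),
    ("LongType", "DoubleType"), ("LongType", "FloatType"), ("LongType", "DecimalType"),
    ("FloatType", "DoubleType"), ("FloatType", "DecimalType"),
    ("DoubleType", "DecimalType"),
    ("DecimalType", "DoubleType"), ("DecimalType", "FloatType"),
    ("DateType", "TimestampType"), ("TimestampType", "DateType"),
    ("BooleanType", "IntegerType"),
} | {(s, "StringType") for s in ("IntegerType", "LongType", "DoubleType", "FloatType",
                                  "DecimalType", "DateType", "TimestampType", "BooleanType")}


def classify(source: str, target: str) -> Outcome:
    """Map a (source, target) pair of Spark type names to its matrix row."""
    if source == "NullType":
        return Outcome.NO_CAST  # NullType -> any type: schema evolution
    if target == "NullType":
        return Outcome.ALWAYS_ERRORS  # any type -> NullType: not supported
    if source == "StringType" and target in _STRING_TARGETS:
        return Outcome.MAY_ERROR  # parse failures on invalid strings
    if (source, target) == ("LongType", "IntegerType"):
        return Outcome.MAY_ERROR  # overflow outside the Integer range
    if (source, target) in _TRUNCATES:
        return Outcome.TRUNCATES
    if (source, target) in _NEVER_ERRORS:
        return Outcome.NEVER_ERRORS
    return Outcome.UNLISTED
```

Checking the NullType source before the NullType target resolves the one overlap in the table (NullType to NullType) in favor of schema evolution; that ordering is a choice made for this sketch.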
Strategy-Specific Behavior
Full Refresh (OVERWRITE)
- Behavior: The existing table is completely replaced
- Casting: Incoming data is cast to match the existing schema before the overwrite
- Schema Evolution: Uses `overwriteSchema=true`, so the table schema can change
- Impact: If casting fails, the entire operation fails before anything is written
Full Refresh Append
- Behavior: Incoming data is appended to the existing table
- Casting: Incoming data is cast to match the existing schema before the append
- Schema Evolution: Uses `mergeSchema=true` for new columns
- Impact: If casting fails, the entire operation fails before the append
- Note: Existing rows are preserved; only incoming rows are cast
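The property shared by both full-refresh strategies is ordering: every incoming row is cast before the table is touched. The in-memory sketch below models that ordering with lists of dicts; it is not how the app writes to storage, `full_refresh` and `cast_to_existing` are hypothetical names, and schema-evolution options such as `overwriteSchema` and `mergeSchema` are not modeled.

```python
from typing import Callable

Row = dict[str, object]


def full_refresh(table: list[Row], incoming: list[Row],
                 cast_row: Callable[[Row], Row], append: bool = False) -> list[Row]:
    """Return new table contents for Full Refresh (append=False)
    or Full Refresh Append (append=True). `table` is never mutated."""
    # Cast every incoming row first; a failure raises here, before any write.
    cast_rows = [cast_row(row) for row in incoming]
    if append:
        return table + cast_rows  # existing rows are kept as they are
    return cast_rows  # existing rows are replaced entirely


def cast_to_existing(row: Row) -> Row:
    """Hypothetical caster for an existing schema {id: int, amount: float}."""
    return {"id": int(row["id"]), "amount": float(row["amount"])}
```

Because the cast happens in a separate step that completes before the result is built, a batch containing one bad value (for example an `id` of `"x"`) raises ValueError and the caller's table is left unchanged.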
Incremental Dedupe History (UPSERT)
- Behavior: Updates matching rows and inserts new rows
- Casting: Incoming data is cast to match the existing schema before the merge
- Schema Evolution: Uses `mergeSchema=true` for new columns
- Impact: If casting fails, the entire operation fails before the merge
- Note: Existing rows keep their types; only incoming rows are cast
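The upsert semantics can be modeled in memory as well. In this hypothetical sketch, `key` stands in for whatever merge key the app uses, the existing table is assumed to hold at most one row per key, and when the incoming batch repeats a key the last row wins; the app's actual deduplication rule is not documented here.

```python
from typing import Callable

Row = dict[str, object]


def upsert(table: list[Row], incoming: list[Row], key: str,
           cast_row: Callable[[Row], Row]) -> list[Row]:
    """Sketch of Incremental Dedupe History: update rows whose key matches,
    insert the rest. Assumes `key` is unique within `table`."""
    # Cast the whole batch first so a failure aborts before the merge.
    cast_rows = [cast_row(row) for row in incoming]
    # Existing rows enter the merge as-is; their values are not re-cast.
    merged: dict[object, Row] = {row[key]: row for row in table}
    for row in cast_rows:
        # Matching key: incoming columns overwrite; new key: row is inserted.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return list(merged.values())
```

Since dicts preserve insertion order, updated rows stay in their original position and new rows are appended at the end.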
Incremental Drop and Load
- Behavior: Deletes matching existing rows, then appends the incoming data
- Casting: Incoming data is cast to match the existing schema before the append
- Schema Evolution: Uses `mergeSchema=true` for new columns
- Impact: If casting fails, the entire operation fails before the delete or append, so no existing rows are removed
- Note: Matching existing rows are deleted; incoming rows are cast and appended
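Drop and Load differs from an upsert in that every existing row whose key appears in the batch is removed, even when there are several, and all incoming rows are kept. The sketch below is illustrative only; `drop_and_load` and matching on a single `key` column (for example a date column) are assumptions about how "matching rows" are identified.

```python
from typing import Callable

Row = dict[str, object]


def drop_and_load(table: list[Row], incoming: list[Row], key: str,
                  cast_row: Callable[[Row], Row]) -> list[Row]:
    """Sketch of Incremental Drop and Load: delete every existing row whose
    `key` value appears in the incoming batch, then append the batch."""
    # Cast first: a failure raises before any existing row is deleted.
    cast_rows = [cast_row(row) for row in incoming]
    # Match on cast key values so "2" in the batch matches 2 in the table.
    incoming_keys = {row[key] for row in cast_rows}
    kept = [row for row in table if row[key] not in incoming_keys]
    return kept + cast_rows
```

Matching on the cast values matters: without the cast, a string key `"2"` in the batch would not match the integer key `2` in the table, and the old rows would survive alongside the new ones.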
Error Message Format
```text
ValueError: Failed to cast column '{column_name}' from {source_type} to {target_type}.
Error: {original_error_message}
```
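Wrapping the underlying error preserves both the column context and the original message. The `ValueError:` prefix is what Python prints in a traceback, so the message itself starts at "Failed". A minimal sketch, assuming a newline separates the two parts of the message and that conversion is done by a per-value Python callable (`cast_column` and `convert` are hypothetical names):

```python
from typing import Callable, Iterable, Optional


def cast_column(column_name: str, values: Iterable[Optional[object]],
                source_type: str, target_type: str,
                convert: Callable[[object], object]) -> list:
    """Cast every non-null value with `convert`; nulls pass through unchanged.

    Any conversion failure is re-raised as a ValueError in the documented
    format, chained to the original exception.
    """
    try:
        return [None if v is None else convert(v) for v in values]
    except (TypeError, ValueError, ArithmeticError) as err:
        raise ValueError(
            f"Failed to cast column '{column_name}' from {source_type} to {target_type}.\n"
            f"Error: {err}"
        ) from err
```

Using `raise ... from err` keeps the original exception available as `__cause__`, so the full chain still appears in the traceback.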