
Understanding Pipeline States

This guide helps you understand the different states your data pipelines can be in and how to interpret the status information displayed in your DataStori dashboard.

Table of Contents

  1. Overview
  2. State Hierarchy
  3. Pipeline States
  4. Task States
  5. Data Quality Test States
  6. Understanding State Transitions

Overview

DataStori tracks pipeline execution using a hierarchical state system with three levels:

  1. Pipeline Level - Overall status of your entire pipeline execution
  2. Task Level - Status of individual steps within your pipeline (download, transform, test)
  3. Test Level - Results of data quality checks

Understanding these states helps you monitor your pipeline executions and quickly identify any issues.


State Hierarchy

Your pipeline execution follows this hierarchy:

Pipeline Execution
├── Pipeline Status (RUNNING → COMPLETED/FAILED)
├── Download Task (RUNNING → RETRYING if needed → COMPLETED/FAILED)
├── Transform Task (RUNNING → RETRYING if needed → COMPLETED/FAILED)
└── Test Results (PASSED/FAILED/WARNING)

Understanding Overall vs. Task Status

The dashboard displays both pipeline-level and task-level statuses. Here's how to interpret them:

Status Level    | What It Shows            | Interpretation
Pipeline Status | Overall execution status | The main status of your entire pipeline run
Task Status     | Individual step progress | Status of specific tasks within the pipeline

Key Points:

  • ✅ The pipeline status is the primary indicator of your execution's overall state
  • Task statuses show you the progress of individual steps
  • ✅ Multiple tasks can show as COMPLETED while the pipeline is still RUNNING
  • ✅ Only when the pipeline status shows COMPLETED or FAILED is the entire execution finished
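
One way to picture the three levels in code is the sketch below. It is a minimal Python illustration, not DataStori's implementation: the class names (PipelineState, TaskState, QualityTestState, Execution) and the is_finished helper are hypothetical, and only the state names come from this guide.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class PipelineState(Enum):
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


class TaskState(Enum):
    RUNNING = "RUNNING"
    RETRYING = "RETRYING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


class QualityTestState(Enum):
    PASSED = "PASSED"
    WARNING = "WARNING"
    FAILED = "FAILED"


@dataclass
class Execution:
    """Hypothetical snapshot of one pipeline run (not a DataStori API type)."""
    pipeline: PipelineState
    tasks: Dict[str, TaskState] = field(default_factory=dict)
    tests: Dict[str, QualityTestState] = field(default_factory=dict)


def is_finished(execution: Execution) -> bool:
    # Only the pipeline-level status decides whether the run is over,
    # no matter how many individual tasks already show COMPLETED.
    return execution.pipeline in (PipelineState.COMPLETED, PipelineState.FAILED)


snapshot = Execution(
    pipeline=PipelineState.RUNNING,
    tasks={"download": TaskState.COMPLETED, "transform": TaskState.COMPLETED},
)
print(is_finished(snapshot))  # False
```

The example mirrors the key points above: both tasks show COMPLETED, yet the run is not finished because the pipeline status is still RUNNING.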

Pipeline States

Pipeline states represent the overall status of your data pipeline execution. This is the primary status you'll see on your dashboard.

RUNNING

What it means: Your pipeline has started and is currently processing data.

  • The pipeline's tasks are being executed one after another
  • Data is actively being moved from source to destination
  • The pipeline is working through its configured steps

COMPLETED

What it means: Your pipeline finished successfully without any errors.

  • All tasks completed successfully
  • Data has been transferred and processed
  • The pipeline run is finished
  • You'll see a completion timestamp

What to do next: Review your destination to verify the data arrived as expected.

FAILED

What it means: Your pipeline encountered an error and stopped executing.

  • An error occurred during execution
  • The pipeline could not complete its tasks
  • An error message will be provided to help diagnose the issue
  • You'll see a failure timestamp

What to do next: Check the error message and review the troubleshooting guide or contact support if you need assistance.

Pipeline State Flow

Your pipeline follows this progression:

Pipeline Starts
  ↓
RUNNING
  ├─→ COMPLETED ✓ (Success)
  └─→ FAILED ✗ (Error occurred)

Task States

Each pipeline consists of multiple tasks (such as downloading data, transforming it, and running quality tests). Task states show you the progress of these individual steps.

RUNNING

What it means: A specific task within your pipeline is currently executing.

  • The task is actively processing
  • This is a normal part of pipeline execution
  • You'll see different tasks move through this state as the pipeline progresses

Example tasks: Download, Transform, Data Quality Tests

RETRYING

What it means: A task encountered an issue, but DataStori is automatically retrying it for you.

  • Temporary issues (like network hiccups) can occur
  • DataStori automatically retries failed tasks to ensure data reliability
  • You'll see which retry attempt is in progress
  • Retries happen after a brief delay

What to do: No action needed - the system is handling it automatically. If all retries are exhausted, you'll be notified.

COMPLETED

What it means: An individual task within your pipeline finished successfully.

  • The specific task completed its work
  • The pipeline moves on to the next task
  • Note: Individual tasks completing doesn't mean the entire pipeline is done
  • The pipeline status will show COMPLETED when all tasks finish

FAILED

What it means: A task failed after all retry attempts were exhausted.

  • The task couldn't complete despite multiple attempts
  • The entire pipeline will stop and show a FAILED status
  • An error message will explain what went wrong

What to do: Check the error details and refer to our troubleshooting guide or contact support.

Task State Flow

Individual tasks follow this progression:

Task Starts
  ↓
RUNNING
  ├─→ COMPLETED ✓ (Success on first try)
  └─→ Temporary Issue
        ↓
      RETRYING (automatic retry)
        ├─→ COMPLETED ✓ (Success on retry)
        └─→ FAILED ✗ (All retries exhausted)
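
The task flow can also be read as a small transition table. The following Python sketch encodes only the arrows described in this section. The names ALLOWED_TASK_TRANSITIONS and is_valid_task_history are hypothetical, and the RETRYING → RETRYING entry is an assumption that models several retry attempts in a row.

```python
from typing import Dict, List, Set

# Which task state may follow which, based on the flow described in this guide.
ALLOWED_TASK_TRANSITIONS: Dict[str, Set[str]] = {
    "RUNNING": {"COMPLETED", "RETRYING"},
    "RETRYING": {"RETRYING", "COMPLETED", "FAILED"},  # repeated retries: assumption
    "COMPLETED": set(),  # terminal state
    "FAILED": set(),     # terminal state
}


def is_valid_task_history(states: List[str]) -> bool:
    """Return True if the observed sequence follows the flow above."""
    if not states or states[0] != "RUNNING":
        return False
    return all(
        nxt in ALLOWED_TASK_TRANSITIONS.get(cur, set())
        for cur, nxt in zip(states, states[1:])
    )


print(is_valid_task_history(["RUNNING", "COMPLETED"]))                          # True
print(is_valid_task_history(["RUNNING", "RETRYING", "RETRYING", "COMPLETED"]))  # True
print(is_valid_task_history(["COMPLETED", "RUNNING"]))                          # False
```

A table like this is handy when checking exported status logs: any sequence it rejects counts as unexpected state behavior.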

Understanding Automatic Retries

DataStori automatically retries failed tasks to handle temporary issues:

  • Network interruptions: Brief connectivity issues
  • Source system delays: Temporary unavailability of your data source
  • Rate limiting: When your source system asks us to slow down

Retries help ensure your data pipelines are resilient and reliable without requiring your intervention.
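
To make the retry behavior concrete, here is a minimal Python sketch of a retry loop that records the states a task passes through. It is an illustration under stated assumptions, not DataStori's code: TransientError, run_with_retries, and the fixed delay are hypothetical, and the default of 3 retries follows the default mentioned in the FAQ of this guide.

```python
import time
from typing import Callable, List


class TransientError(Exception):
    """Hypothetical stand-in for a temporary problem such as a network hiccup."""


def run_with_retries(
    task: Callable[[], None],
    max_retries: int = 3,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run the task and return the list of states it went through."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    history = ["RUNNING"]
    for retry in range(max_retries + 1):
        try:
            task()
        except TransientError:
            if retry == max_retries:
                history.append("FAILED")  # all retries exhausted
                return history
            history.append("RETRYING")
            sleep(delay_seconds)  # brief delay before the next attempt
        else:
            history.append("COMPLETED")
            return history
    return history


attempts = {"count": 0}


def flaky_download() -> None:
    # Fails on the first two calls, then succeeds.
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise TransientError("network hiccup")


print(run_with_retries(flaky_download, sleep=lambda seconds: None))
# ['RUNNING', 'RETRYING', 'RETRYING', 'COMPLETED']
```

The sleep function is passed in as a parameter so the example runs instantly. A real scheduler would actually wait between attempts.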


Data Quality Test States

DataStori can run automated data quality tests on your data to ensure it meets your standards. These tests produce their own status indicators.

PASSED

What it means: Your data passed the quality check.

  • The data meets your configured quality standards
  • No issues were detected
  • Your pipeline will continue normally

Common tests that pass:

  • No null values found in required fields
  • All records have unique identifiers
  • Data is fresh and up-to-date

WARNING

What it means: Your data passed the test, but something needs attention.

  • The data is usable but approaching a concerning threshold
  • Your pipeline will continue running
  • You may want to investigate to prevent future failures

Example: Data is 23 hours old when your freshness check is configured to fail at 24 hours.

What to do: Review the warning details and consider if action is needed to prevent future issues.

FAILED

What it means: Your data failed a quality check.

  • The data doesn't meet your configured quality standards
  • An issue was detected that requires attention
  • Depending on your configuration, the pipeline may stop

Common test failures:

  • Null values found in required fields
  • Duplicate records detected when uniqueness is required
  • Data is too old (stale)

What to do: Review the error details to understand what quality issue was found, then check your source data or adjust your pipeline configuration.

Understanding Data Quality Tests

DataStori supports several types of automated data quality checks:

  • Null Checks: Ensures critical fields contain data
  • Uniqueness Checks: Verifies no duplicate records exist
  • Freshness Checks: Confirms your data is recent and up-to-date

You can configure these tests based on your data requirements and business needs.
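
The three check types can be sketched in a few lines of Python. This is an illustration of the idea only: the function names, the row format (a list of dictionaries), and the separate warn_after and fail_after thresholds are assumptions, not DataStori's configuration options.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List

Row = Dict[str, Any]


def null_check(rows: List[Row], required_field: str) -> str:
    # FAILED if any row is missing a value in the required field.
    missing = [row for row in rows if row.get(required_field) is None]
    return "FAILED" if missing else "PASSED"


def uniqueness_check(rows: List[Row], key_field: str) -> str:
    # FAILED if the same key appears more than once.
    keys = [row.get(key_field) for row in rows]
    return "PASSED" if len(keys) == len(set(keys)) else "FAILED"


def freshness_check(
    last_updated: datetime,
    now: datetime,
    warn_after: timedelta,
    fail_after: timedelta,
) -> str:
    # WARNING sits between the two thresholds: usable, but getting stale.
    age = now - last_updated
    if age >= fail_after:
        return "FAILED"
    if age >= warn_after:
        return "WARNING"
    return "PASSED"


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
]
now = datetime(2024, 1, 2, tzinfo=timezone.utc)

print(null_check(rows, "email"))     # FAILED
print(uniqueness_check(rows, "id"))  # PASSED
print(freshness_check(now - timedelta(hours=23), now,
                      warn_after=timedelta(hours=20),
                      fail_after=timedelta(hours=24)))  # WARNING
```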


Understanding State Transitions

Typical Successful Pipeline Execution

Here's what you'll see when a pipeline runs successfully:

Pipeline Starts
  ↓
Pipeline: RUNNING
  ↓
Download Task: RUNNING
  ↓
Download Task: COMPLETED ✓
  ↓
Transform Task: RUNNING
  ↓
Transform Task: COMPLETED ✓
  ↓
Test Task: RUNNING
  ├─→ Null Check: PASSED ✓
  ├─→ Uniqueness Check: PASSED ✓
  └─→ Freshness Check: PASSED ✓
  ↓
Test Task: COMPLETED ✓
  ↓
Pipeline: COMPLETED ✓

Pipeline Execution with Retries

Sometimes tasks need to retry. Here's what that looks like:

Task Starts
  ↓
Task: RUNNING (First Attempt)
  ↓
Temporary Issue Encountered
  ↓
Task: RETRYING (Attempt 2)
  ├─→ Success → Task: COMPLETED ✓
  │     Pipeline continues...
  └─→ Issue Persists
        ↓
      Task: RETRYING (Attempt 3)
        ├─→ Success → Task: COMPLETED ✓
        │     Pipeline continues...
        └─→ All Retries Exhausted
              ↓
            Task: FAILED ✗
              ↓
            Pipeline: FAILED ✗

Reading Your Pipeline Status

When viewing your pipeline execution, remember:

  1. Pipeline Status = Overall Status: This tells you if your entire pipeline succeeded or failed
  2. Task Statuses = Progress Indicators: These show you which steps are complete and which are in progress
  3. Test Results = Data Quality: These tell you if your data meets your quality standards

Example: You might see:

  • Pipeline: RUNNING (overall status - still in progress)
    • Download: COMPLETED ✓ (step 1 done)
    • Transform: RUNNING (step 2 in progress)
    • Tests: Not started yet
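
If you check status from a script rather than the dashboard, the same reading rule applies: keep looking until the pipeline-level status is COMPLETED or FAILED. The Python sketch below shows that rule. The fetch_status callable and the snapshot dictionary shape are hypothetical placeholders for whatever source of status information you have. This guide does not describe a DataStori API for this.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_PIPELINE_STATES = {"COMPLETED", "FAILED"}


def wait_until_finished(
    fetch_status: Callable[[], Dict[str, Any]],  # hypothetical status source
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the pipeline-level status is terminal, then return the snapshot."""
    for _ in range(max_polls):
        snapshot = fetch_status()
        # Task statuses are progress indicators only; the pipeline status decides.
        if snapshot["pipeline"] in TERMINAL_PIPELINE_STATES:
            return snapshot
        sleep(poll_seconds)
    raise TimeoutError("pipeline did not reach COMPLETED or FAILED")


# Canned snapshots standing in for three successive status checks.
snapshots = iter([
    {"pipeline": "RUNNING", "tasks": {"download": "COMPLETED", "transform": "RUNNING"}},
    {"pipeline": "RUNNING", "tasks": {"download": "COMPLETED", "transform": "COMPLETED"}},
    {"pipeline": "COMPLETED", "tasks": {"download": "COMPLETED", "transform": "COMPLETED"}},
])

final = wait_until_finished(lambda: next(snapshots), sleep=lambda seconds: None)
print(final["pipeline"])  # COMPLETED
```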

Common Scenarios

Scenario 1: Successful Pipeline Run

What you'll see:

  1. Pipeline status changes to RUNNING
  2. Each task shows RUNNING then COMPLETED as it finishes
  3. Data quality tests show PASSED results
  4. Pipeline status changes to COMPLETED

What this means: Your data was successfully transferred from source to destination, meeting all quality standards.

Scenario 2: Temporary Network Issue with Recovery

What you'll see:

  1. Pipeline status: RUNNING
  2. Download task: RUNNING
  3. Download task: RETRYING (first retry)
  4. Download task: RETRYING (second retry)
  5. Download task: COMPLETED
  6. Remaining tasks complete normally
  7. Pipeline status: COMPLETED

What this means: A temporary network issue occurred, but DataStori automatically recovered by retrying. No action needed from you.

Scenario 3: Data Quality Issue Detected

What you'll see:

  1. Pipeline status: RUNNING
  2. Download and transform tasks: COMPLETED
  3. Test task: RUNNING
  4. Null check test: FAILED (with error details)
  5. Pipeline status: FAILED (when the pipeline is configured to stop on a failed test)

What this means: Your data was downloaded and transformed, but a quality check found an issue (e.g., missing required values). Review the error details to understand what needs to be fixed in your source data or pipeline configuration.

Scenario 4: Freshness Warning

What you'll see:

  1. Pipeline completes successfully
  2. Most tests: PASSED
  3. Freshness check: WARNING (data is older than preferred but within acceptable limits)

What this means: Your data pipeline worked, but your data might be getting stale. Consider increasing sync frequency or checking your source data update schedule.


Monitoring Your Pipelines

Where to Find State Information

Your pipeline states are visible in the DataStori dashboard:

  1. Pipeline List View: Shows the current status of all your pipelines
  2. Pipeline Detail View: Shows detailed status for each task and test within a pipeline execution
  3. Execution History: View past pipeline runs and their final states

What to Watch For

Green Indicators (All Good):

  • Pipeline: COMPLETED
  • All tasks: COMPLETED
  • All tests: PASSED

Yellow Indicators (Attention Recommended):

  • Task: RETRYING (system is handling it, but worth monitoring)
  • Test: WARNING (pipeline continues, but investigate the warning)

Red Indicators (Action Needed):

  • Pipeline: FAILED
  • Task: FAILED (after all retries)
  • Test: FAILED
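
The color groupings above amount to a simple lookup. Here is a minimal Python sketch of it. The function names and the "neutral" value for states the lists do not color (such as RUNNING) are assumptions made for illustration.

```python
from typing import Iterable

GREEN, NEUTRAL, YELLOW, RED = "green", "neutral", "yellow", "red"
SEVERITY = [GREEN, NEUTRAL, YELLOW, RED]  # least to most urgent


def indicator(state: str) -> str:
    if state == "FAILED":
        return RED
    if state in {"RETRYING", "WARNING"}:
        return YELLOW
    if state in {"COMPLETED", "PASSED"}:
        return GREEN
    return NEUTRAL  # e.g. RUNNING, which the lists above do not color


def worst_indicator(states: Iterable[str]) -> str:
    """Return the most urgent color across pipeline, task, and test states."""
    return max((indicator(s) for s in states), key=SEVERITY.index, default=NEUTRAL)


print(worst_indicator(["COMPLETED", "COMPLETED", "PASSED"]))  # green
print(worst_indicator(["COMPLETED", "PASSED", "WARNING"]))    # yellow
print(worst_indicator(["FAILED", "COMPLETED", "FAILED"]))     # red
```

Taking the most urgent color means a single WARNING or FAILED is never hidden behind a row of green results.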

Timestamps and Duration

Each state includes timing information:

  • Start time: When the pipeline or task began
  • End time: When it completed or failed
  • Duration: How long it took to execute

This helps you:

  • Understand pipeline performance
  • Identify bottlenecks
  • Plan appropriate schedule frequencies
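
As a small worked example, durations follow directly from start and end times, and the longest one points to the bottleneck. The Python sketch below uses made-up timestamps, and task_durations is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import Dict, Tuple


def task_durations(
    timings: Dict[str, Tuple[datetime, datetime]]
) -> Dict[str, timedelta]:
    # Duration = end time - start time, per task.
    return {name: end - start for name, (start, end) in timings.items()}


# Illustrative timestamps only; not taken from a real run.
timings = {
    "download": (datetime(2024, 1, 1, 2, 0), datetime(2024, 1, 1, 2, 12)),
    "transform": (datetime(2024, 1, 1, 2, 12), datetime(2024, 1, 1, 2, 15)),
    "test": (datetime(2024, 1, 1, 2, 15), datetime(2024, 1, 1, 2, 16)),
}

per_task = task_durations(timings)
bottleneck = max(per_task, key=per_task.get)
print(bottleneck, per_task[bottleneck])  # download 0:12:00
```

In this example the download step takes 12 of the run's 16 minutes, so it is the first place to look when tuning the schedule.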

Best Practices

Monitoring Your Pipelines

Check regularly: Review your pipeline statuses periodically to catch issues early.

Set up notifications: Consider setting up alerts for failed pipelines so you can respond quickly.

Review warnings: Don't ignore WARNING states - they often indicate issues that will become failures if not addressed.

Understanding Failures

When a pipeline fails:

  1. Read the error message: Error messages provide specific details about what went wrong
  2. Check the failed task: Identify which step in the pipeline encountered the issue
  3. Review timing: Check when the failure occurred - this can provide context
  4. Look for patterns: If a pipeline fails repeatedly at the same step, it indicates a consistent issue
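
Spotting patterns is easier with a quick tally of which task failed in each run. The Python sketch below assumes you have copied the execution history into a list of records. The record fields (run, status, failed_task) are hypothetical and the data is invented for illustration.

```python
from collections import Counter
from typing import Any, Dict, List


def failures_by_task(history: List[Dict[str, Any]]) -> Counter:
    # Count failed runs per task so repeat offenders stand out.
    return Counter(
        run["failed_task"] for run in history if run["status"] == "FAILED"
    )


history = [
    {"run": 1, "status": "COMPLETED", "failed_task": None},
    {"run": 2, "status": "FAILED", "failed_task": "download"},
    {"run": 3, "status": "FAILED", "failed_task": "transform"},
    {"run": 4, "status": "FAILED", "failed_task": "download"},
]

counts = failures_by_task(history)
print(counts.most_common(1))  # [('download', 2)]
```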

Working with Retries

Automatic retries are normal: Seeing a task in RETRYING state is expected for handling temporary issues.

Persistent retries indicate problems: If you frequently see tasks requiring multiple retries, investigate the root cause:

  • Source system reliability
  • Network connectivity
  • Data volume or complexity

When to Contact Support

Reach out to DataStori support if:

  • Pipelines fail repeatedly with the same error
  • Error messages are unclear or unhelpful
  • You see unexpected state behavior
  • Retries seem excessive or never resolve
  • You need help optimizing pipeline performance

Frequently Asked Questions

Why does my pipeline show RUNNING for a long time?

This is normal for pipelines processing large amounts of data. Check the individual task states to see progress. If a specific task seems stuck, contact support.

Can I cancel a running pipeline?

Yes, you can stop a running pipeline from the dashboard. The pipeline state will update to reflect the cancellation.

What happens to my data if a pipeline fails?

DataStori ensures data consistency. If a pipeline fails:

  • Partial data may be written to your destination (depending on where the failure occurred)
  • You can safely rerun the pipeline
  • Deduplication logic ensures no data is duplicated when rerunning

How many times does DataStori retry failed tasks?

DataStori automatically retries failed tasks up to 3 times by default, with a delay between attempts. If all retries are exhausted, the pipeline fails and you'll be notified.

What's the difference between a WARNING and a FAILED test?

  • WARNING: Your data passed the test but is approaching a concerning threshold. The pipeline continues running.
  • FAILED: Your data didn't meet the quality standards. Depending on your configuration, the pipeline may stop.

Can I see historical pipeline states?

Yes, the dashboard maintains a history of your pipeline executions, including their states, timing, and any errors encountered.


Summary

DataStori provides comprehensive status tracking for your data pipelines through a three-level state system:

  1. Pipeline States: Overall execution status (RUNNING → COMPLETED/FAILED)
  2. Task States: Individual step progress (RUNNING → RETRYING if needed → COMPLETED/FAILED)
  3. Test States: Data quality results (PASSED/FAILED/WARNING)

Key Takeaways

  • Pipeline status is your primary indicator - it tells you if your entire data pipeline succeeded or failed
  • Task states show you the progress of individual steps and where issues occur
  • Automatic retries handle temporary issues without your intervention
  • Data quality tests ensure your data meets your standards before it's marked as complete
  • Comprehensive tracking gives you full visibility into every aspect of your pipeline execution

Next Steps

  • Monitor your pipelines: Regularly check your pipeline statuses in the dashboard
  • Configure data quality tests: Set up tests to ensure your data meets your requirements
  • Set up alerts: Get notified when pipelines fail so you can respond quickly
  • Review performance: Use timing information to optimize your pipeline schedules

For additional support, refer to our other documentation or contact the DataStori support team.