
Data pipelines from SaaS application providers

Data pipelines provided by SaaS applications

ETL/ELT providers in the market sync data between business applications. Each provider has its own unique selling point (USP), and broadly they differ in pricing, security, number of connectors, services, and added support for the tools that cover the rest of the workflows in the data ecosystem.

However, most of these dimensions would become meaningless if SaaS applications started providing data pipelines themselves. What's stopping SaaS owners from providing data pipelines that sync their business data with the customer's cloud? It would be a new revenue stream and would increase user stickiness. In addition, pipelines provided by the app owners would be more robust, and users would have a better experience than with pipelines from the ELT providers.

Of late, we have seen the larger players move in this direction, with Stripe and Salesforce being the most prominent examples.

Why don't application owners provide data pipelines as a service?

We can think of the following reasons why an app owner might not be interested in providing data pipelines as a service:

  1. Data pipelines are brittle, and it takes a lot to keep them running and consistent - Retries, failures, historical loads, data quality and schema checks, etc. can quickly increase support costs and distract the owners from their core offering.

  2. ELT pipelines are not a core business - For mid-scale apps, the product and engineering investment needed to provide ELT pipelines might not be worth it, and they would rather focus on their core product.

  3. Incomplete offering and delayed/extended sale without the rest of the connectors - Let's assume users want to join Stripe and NetSuite data. The Stripe pipelines are of little use unless the NetSuite data is coming in as well.

  4. Varied infrastructure stack and customer requirements - Customers might host their infrastructure on AWS, Azure or GCP. In addition, users can request output in various formats (CSV, Parquet, Delta, etc.) and to different destinations (Azure SQL, Redshift, BigQuery, Fabric Lakehouse, etc.).

  5. Added technical debt and complexity - If ETL pipelines are added to the product suite, core product rollouts can become slow and difficult. In addition, when breaking changes occur, the app owners would have to make sure their ETL customers are well supported.
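
To make the first reason concrete, here is a minimal Python sketch of two of the chores it mentions: retrying a failed extract and checking rows against an expected schema. Everything in it is an assumption made for illustration: the `charges`-style schema, the function names `with_retries` and `schema_violations`, and the choice of exponential backoff are hypothetical and do not come from Stripe, Salesforce or any ELT provider.

```python
import time
from typing import Callable, Iterable

# Hypothetical expected schema for one synced table; not taken from any real API.
EXPECTED_SCHEMA = {"id": str, "amount": int, "currency": str}


def with_retries(fetch: Callable[[], list], attempts: int = 3,
                 base_delay: float = 0.5) -> list:
    """Call fetch(), retrying with exponential backoff on connection errors."""
    for attempt in range(attempts):
        try:
            return fetch()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


def schema_violations(rows: Iterable[dict],
                      schema: dict = EXPECTED_SCHEMA) -> list:
    """Return (row_index, problem) pairs for rows that do not match schema."""
    problems = []
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        extra = row.keys() - schema.keys()
        if missing:
            problems.append((i, f"missing columns: {sorted(missing)}"))
        if extra:
            problems.append((i, f"unexpected columns: {sorted(extra)}"))
        for col, typ in schema.items():
            if col in row and not isinstance(row[col], typ):
                problems.append((i, f"{col} is not {typ.__name__}"))
    return problems
```

Even this toy version leaves out historical backfills, deduplication, alerting and per-customer configuration, which is where much of the support cost described above would likely come from.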

What's next

The larger products, backed by their resources and a large customer base, might take the same path as Stripe and Salesforce. For them, the investment might be worth it, and the data might flow back into their own ecosystem (Salesforce + Tableau, Google Analytics + Google Cloud + BigQuery + Looker). However, for smaller applications, providing ELT services is still a formidable challenge.