Introduction

DataStori is a SaaS application that automates the ingestion and storage of data from cloud-based business applications. DataStori builds data pipelines from the user's source applications to their preferred data stores, and runs them on a schedule or on demand. DataStori is hosted in AWS US. Key features of DataStori:

  • Data Security: DataStori runs data pipelines from source applications and stores the output data in the customer's cloud. Customer data never leaves their cloud, either in processing or storage.

  • Data Sources: DataStori generates data pipelines dynamically from API documentation, emails, SFTP folders and SQL database connections.

  • Data Schema Detection: DataStori automatically defines the schema of stored data based on the API responses, CSV files or database tables from source applications.

  • Data Schema Evolution: When there are changes to the source data schema, DataStori automatically modifies the destination data schema to keep it in sync with the source. It also logs the schema evolution for audit and compliance.

  • Data Transformations: The main transformations that DataStori performs are deduplication and flattening of nested data. To ensure data completeness without duplication, DataStori supports all types of full and incremental data load strategies.
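Schema detection and evolution, as described above, can be sketched in plain Python. The `infer_type`, `infer_schema`, and `evolve_schema` helpers below are hypothetical illustrations of the approach, not DataStori's actual implementation:

```python
def infer_type(values):
    """Map sample column values to a SQL type (hypothetical inference rules)."""
    non_null = [v for v in values if v is not None]
    if non_null and all(isinstance(v, bool) for v in non_null):
        return "BOOLEAN"
    if non_null and all(isinstance(v, int) and not isinstance(v, bool) for v in non_null):
        return "BIGINT"
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        return "DOUBLE"
    return "VARCHAR"

def infer_schema(records):
    """Derive a {column: type} schema from sample API responses or CSV rows."""
    columns = {}
    for rec in records:
        for key, value in rec.items():
            columns.setdefault(key, []).append(value)
    return {col: infer_type(vals) for col, vals in columns.items()}

def evolve_schema(source, destination, audit_log):
    """Add columns that appeared in the source, logging each change for audit."""
    statements = []
    for col, col_type in source.items():
        if col not in destination:
            statements.append(f"ALTER TABLE target ADD COLUMN {col} {col_type}")
            audit_log.append({"action": "add_column", "column": col, "type": col_type})
    return statements

sample = [
    {"id": 1, "amount": 9.99, "status": "paid"},
    {"id": 2, "amount": 14.50, "status": None},
]
detected = infer_schema(sample)
print(detected)  # {'id': 'BIGINT', 'amount': 'DOUBLE', 'status': 'VARCHAR'}

audit = []
print(evolve_schema({**detected, "discount": "DOUBLE"}, detected, audit))
# ['ALTER TABLE target ADD COLUMN discount DOUBLE']
```

Keeping the detected schema and the audit log as plain data makes each evolution step replayable for compliance review.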
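The flattening and deduplication transformations can likewise be illustrated with a minimal sketch. The merge rule shown (keep the latest record per `id`) is an assumption for illustration, not DataStori's documented behaviour:

```python
def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts: {'a': {'b': 1}} becomes {'a_b': 1}."""
    out = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            out.update(flatten(value, new_key, sep))
        else:
            out[new_key] = value
    return out

def deduplicate(records, key="id"):
    """Keep the last record seen for each key value (hypothetical merge rule)."""
    latest = {}
    for rec in records:
        latest[rec[key]] = rec
    return list(latest.values())

rows = [
    {"id": 1, "customer": {"name": "Acme", "tier": "gold"}},
    {"id": 1, "customer": {"name": "Acme", "tier": "silver"}},
]
flat = [flatten(r) for r in rows]
print(deduplicate(flat))
# [{'id': 1, 'customer_name': 'Acme', 'customer_tier': 'silver'}]
```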

How DataStori Works

DataStori orchestrates data pipelines in its own cloud environment, but executes them and processes user data entirely in the user's cloud. The following schematic gives a high-level view of DataStori.

[Schematic: How DataStori works]

Supported Sources

DataStori integrates with APIs, emailed CSV files, SQL databases and SFTP folders.

  • APIs - DataStori reads OpenAPI compliant API documentation and automatically creates data pipelines from it. Live API integrations on DataStori include SAP S/4HANA, NetSuite, TempWorks, HubSpot, RingCentral, Intermedia and ServiceTitan.

  • Emails, SQL and SFTP - For applications that do not expose APIs, DataStori can create data pipelines from email attachments and SFTP folders. It can also connect to and fetch data from SQL databases.
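To illustrate how pipelines can be generated from API documentation, the sketch below walks a hypothetical OpenAPI fragment and collects the GET endpoints that could back extraction pipelines; the spec content and `extract_get_endpoints` helper are illustrative assumptions:

```python
# A hypothetical OpenAPI 3.0 fragment, not a real vendor API.
spec = {
    "openapi": "3.0.0",
    "paths": {
        "/invoices": {"get": {"operationId": "listInvoices"}},
        "/customers": {
            "get": {"operationId": "listCustomers"},
            "post": {"operationId": "createCustomer"},
        },
    },
}

def extract_get_endpoints(openapi_spec):
    """Collect GET operations, the natural candidates for data extraction."""
    endpoints = []
    for path, methods in openapi_spec["paths"].items():
        if "get" in methods:
            endpoints.append({"path": path, "operation": methods["get"]["operationId"]})
    return endpoints

print(extract_get_endpoints(spec))
# [{'path': '/invoices', 'operation': 'listInvoices'},
#  {'path': '/customers', 'operation': 'listCustomers'}]
```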

Supported Destinations

  • AWS S3 / Azure Blob / Google Storage - By default, all data pipeline outputs are written to cloud storage in Delta format.

  • PostgreSQL / Azure SQL / MySQL / Snowflake - Data written to cloud storage (above) can be further copied to any SQLAlchemy-supported database, as needed for further processing and analytics.
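The storage-to-database copy step can be sketched with the standard library. Here `sqlite3` stands in for a SQLAlchemy-supported destination (PostgreSQL, Azure SQL, MySQL, Snowflake), and the `invoices` table and rows are hypothetical examples:

```python
import sqlite3

# Pipeline output rows, as they might look after extraction from cloud storage.
rows = [(1, "Acme", 9.99), (2, "Globex", 14.50)]

# sqlite3 is used in place of a SQLAlchemy-backed warehouse; the
# create-table / bulk-insert pattern is the same for any SQL destination.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", rows)
conn.commit()

total = round(conn.execute("SELECT SUM(amount) FROM invoices").fetchone()[0], 2)
print(total)  # 24.49
```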

Data Security

DataStori is built to process and store all customer data within the customer's cloud. Because customer data never leaves the customer's IT environment, DataStori remains compliant with customer data policies such as:

  • Data sovereignty

  • Data retention

  • Data encryption