Guide: Set up MS Azure
This guide helps you set up your Microsoft Azure account on DataStori.
DataStori integrates with your MS Azure subscription using a Service Principal with specific permissions, and uses Azure Container Instances (ACI) to run pipeline code.
Information and Resources Checklist
Before you begin, please keep at hand the following information and resources from your MS Azure account.
Networking
- Virtual Network (VNet) Name: The VNet where you want to run your code.
- Subnet Name: The specific subnet within the VNet for the container instances.
- Network Security Group (NSG) Name: The security group you want to apply to the container instances running the pipeline code.
Services
- Azure Container Instances (ACI): DataStori spins up container instances to run pipelines and write the output to Azure Blob Storage. You need to define a Resource Group where these resources will be created.
- Azure Blob Storage: Data is stored here. Please be ready with the Storage Account Name and the Container Name where the data is to be stored.
- RDBMS (optional): Connection details for any relational database where pipeline output will be written, in addition to Azure Blob Storage.
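If you are unsure of the exact names in the checklist above, they can be looked up with the Azure CLI. The following is a minimal sketch, assuming the Azure CLI is installed and you are logged in with `az login`. The function name `list_checklist_resources` and its two arguments are illustrative and not part of DataStori.

```shell
# Hypothetical helper: print the names DataStori asks for in the checklist.
# $1 = resource group name, $2 = VNet name (both are values you supply).
list_checklist_resources() {
  local rg="$1" vnet="$2"

  # VNets in the resource group
  az network vnet list --resource-group "$rg" --query "[].name" -o tsv

  # Subnets inside the chosen VNet
  az network vnet subnet list --resource-group "$rg" --vnet-name "$vnet" \
    --query "[].name" -o tsv

  # Network Security Groups in the resource group
  az network nsg list --resource-group "$rg" --query "[].name" -o tsv

  # Storage accounts with their regions
  az storage account list --resource-group "$rg" \
    --query "[].{name:name, region:location}" -o table
}
```

Call it as `list_checklist_resources my-resource-group my-vnet`, replacing both placeholder names with your own.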
Identity / Service Principal
A Service Principal is an application identity within your Azure Active Directory (now named Microsoft Entra ID) tenant. We will create one for DataStori and assign it a custom role with the minimum required permissions.
Step 1: Create a Service Principal
- Navigate to Azure Active Directory (shown as Microsoft Entra ID in current versions of the portal) > App registrations > New registration.
- Give it a name, like datastori-integration-sp.
- Choose "Accounts in this organizational directory only" and click on Register.
- From the overview page, note down the Application (client) ID and Directory (tenant) ID.
- Go to Certificates & secrets, click New client secret, give it a description, and copy the Value immediately. This value will not be shown again.
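The portal steps above can also be approximated from the Azure CLI. This is a sketch, not the documented DataStori procedure: `az ad sp create-for-rbac` creates the app registration, the Service Principal, and a client secret in one call. The wrapper function name `create_datastori_sp` is hypothetical. In recent CLI versions the command assigns no role unless `--role` and `--scopes` are passed, but check the behaviour of your installed version.

```shell
# Hypothetical wrapper around a real Azure CLI command.
# $1 = display name for the Service Principal (defaults to the name used in this guide).
create_datastori_sp() {
  local name="${1:-datastori-integration-sp}"

  # The JSON output contains:
  #   appId    -> Application (client) ID
  #   password -> client secret value (shown only once)
  #   tenant   -> Directory (tenant) ID
  az ad sp create-for-rbac --name "$name"
}
```

Run `create_datastori_sp` and store the three values securely, since the secret cannot be retrieved later.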
Step 2: Create a Custom Role Definition
Create a custom role that grants DataStori permission to manage ACI and access storage. You can create the role with the Azure CLI or by uploading a JSON file in the Access control (IAM) section of your subscription.
Save the following JSON as DataStori-Role-Definition.json. Replace <YOUR_SUBSCRIPTION_ID> with your actual subscription ID.
{
  "Name": "DataStori ACI Runner",
  "IsCustom": true,
  "Description": "Allows DataStori to run container instances and access storage.",
  "Actions": [
    "Microsoft.ContainerInstance/containerGroups/write",
    "Microsoft.ContainerInstance/containerGroups/read",
    "Microsoft.ContainerInstance/containerGroups/delete",
    "Microsoft.ContainerInstance/containerGroups/start/action",
    "Microsoft.Resources/subscriptions/resourceGroups/read",
    "Microsoft.Storage/storageAccounts/listKeys/action"
  ],
  "NotActions": [],
  "AssignableScopes": [
    "/subscriptions/<YOUR_SUBSCRIPTION_ID>"
  ]
}
To create the role, run this Azure CLI command:
az role definition create --role-definition @DataStori-Role-Definition.json
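To avoid editing the subscription ID by hand, the placeholder can be filled in from the command line before the role is created. The sketch below assumes the file was saved under the name used in this guide. The function `fill_subscription_id` is an illustrative helper, not part of the Azure CLI.

```shell
# Hypothetical helper: replace the <YOUR_SUBSCRIPTION_ID> placeholder in the role file.
# $1 = subscription ID, $2 = path to the JSON file (defaults to the name used in this guide).
fill_subscription_id() {
  local sub_id="$1" file="${2:-DataStori-Role-Definition.json}"

  # Write to a temporary file and move it into place, which avoids the
  # differences between GNU and BSD versions of "sed -i".
  sed "s|<YOUR_SUBSCRIPTION_ID>|${sub_id}|g" "$file" > "${file}.tmp" \
    && mv "${file}.tmp" "$file"
}

# Example usage (requires a logged-in Azure CLI):
#   fill_subscription_id "$(az account show --query id -o tsv)"
#   az role definition create --role-definition @DataStori-Role-Definition.json
```

Here `az account show --query id -o tsv` prints the ID of the currently selected subscription, so confirm that the correct subscription is active first.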
Step 3: Assign the Custom Role
- Navigate to the Resource Group where your VNet and Storage Account are located.
- Go to Access control (IAM) > Add > Add role assignment.
- Select the "DataStori ACI Runner" role you just created.
- In the Select box, search for the datastori-integration-sp Service Principal you created and click on Save.
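The same assignment can be sketched with the Azure CLI, scoped to a single Resource Group as in the portal steps above. The function name `assign_datastori_role` and its argument order are assumptions made for illustration. The `az role assignment create` command and its `--assignee`, `--role`, and `--scope` options are standard Azure CLI.

```shell
# Hypothetical wrapper: assign the custom role to the Service Principal on one Resource Group.
# $1 = Application (client) ID, $2 = subscription ID, $3 = Resource Group name.
assign_datastori_role() {
  local app_id="$1" sub_id="$2" rg="$3"

  az role assignment create \
    --assignee "$app_id" \
    --role "DataStori ACI Runner" \
    --scope "/subscriptions/${sub_id}/resourceGroups/${rg}"
}
```

For example, `assign_datastori_role <APP_ID> <YOUR_SUBSCRIPTION_ID> my-resource-group`, with all three values replaced by your own.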
Logging (Optional)
By default, DataStori writes the pipeline logs to Azure Monitor Logs. If you want to customize the logging destination, please share the details of your Log Analytics Workspace with ishan@datastori.io.
Summary
Please have the following information ready to complete the MS Azure infrastructure setup.
- Directory (Tenant) ID
- Application (Client) ID of the Service Principal
- Client Secret Value for the Service Principal
- Subscription ID
- Resource Group Name
- VNet Name
- Subnet Name
- Network Security Group Name
- Storage Account Name
- Storage Container Name
- Storage Account Region
- Log Analytics Workspace details (optional)
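One way to confirm that nothing on this list is missing before sharing it is a small shell check. The sketch below is only an illustration: the variable names are hypothetical and are not defined by DataStori or Azure. The optional Log Analytics Workspace details are left out.

```shell
# Hypothetical variable names, one per required item in the summary list.
REQUIRED_VARS="AZURE_TENANT_ID AZURE_CLIENT_ID AZURE_CLIENT_SECRET AZURE_SUBSCRIPTION_ID RESOURCE_GROUP VNET_NAME SUBNET_NAME NSG_NAME STORAGE_ACCOUNT STORAGE_CONTAINER STORAGE_REGION"

# Print the names of any required variables that are unset or empty,
# separated by spaces. Prints nothing when all values are present.
missing_setup_values() {
  local name value missing=""
  for name in $REQUIRED_VARS; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="${missing}${name} "
    fi
  done
  printf '%s' "${missing% }"
}
```

After setting the variables in your shell, run `missing_setup_values`. Any names it prints still need a value.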