Data Extraction
Extract data from SaaS tools into your data warehouse
Data Extraction Guide
This guide walks you through extracting data from a SaaS tool (such as Shopify) and loading it into your data warehouse.
Prerequisites
- VirtuousAI account with CLI installed
- Credentials for your source system (e.g., Shopify API key)
- (Optional) Data warehouse connection for loading data
Overview
Data extraction in VirtuousAI follows a simple pattern:
| Step | Description |
|---|---|
| Connect | Create a connection with credentials to your source |
| Extract | Run a dlt_extract action to pull data |
| Transform | Data is automatically normalized to the bronze schema |
| Load | Optionally sync to your data warehouse |
Step 1: Create a Source Connection
First, create a connection to your data source. We'll use Shopify as an example:
# Using a template (recommended)
vai connections create \
  --name "Production Shopify" \
  --template-slug shopify \
  --credential api_key=shpat_xxxxx \
  --config shop_url=mystore.myshopify.com
Or send the request with curl:
curl -X POST https://vai-dev.virtuousai.com/api/v1/connections \
  -H "Authorization: Bearer $VAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production Shopify",
    "templateSlug": "shopify",
    "credentials": {"api_key": "shpat_xxxxx"},
    "config": {"shop_url": "mystore.myshopify.com"}
  }'
Verify the connection works:
vai connections verify prod-shopify
# ✓ Connection verified successfully
Step 2: Run an Extraction
Now run a dlt_extract action to pull data from Shopify:
# Extract orders and products
vai extract shopify \
  --connection prod-shopify \
  --resources orders,products \
  --start-date 2026-01-01 \
  --wait
Or use the lower-level action run API:
vai actions run \
  --kind dlt_extract \
  --definition '{
    "source": "shopify",
    "resources": ["orders", "products"],
    "start_date": "2026-01-01"
  }' \
  --wait
Or send the request with curl:
curl -X POST https://vai-dev.virtuousai.com/api/v1/action-runs \
  -H "Authorization: Bearer $VAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "kind": "dlt_extract",
    "definition": {
      "source": "shopify",
      "connectionId": "conn_abc123",
      "resources": ["orders", "products"],
      "start_date": "2026-01-01"
    }
  }'
The extraction runs asynchronously. With the CLI, use --wait to block until completion, or poll the status:
vai actions get run_abc123
# Status: completed
# Duration: 45s
# Records: 1,234 orders, 456 products
Step 3: Check the Results
View the extracted data summary:
vai actions get run_abc123 --output json | jq '.result'
{
  "tables": {
    "orders": {"rows": 1234, "bytes": 2456789},
    "order_line_items": {"rows": 5678, "bytes": 1234567},
    "products": {"rows": 456, "bytes": 789012},
    "product_variants": {"rows": 1234, "bytes": 456789}
  },
  "duration_seconds": 45,
  "extract_date": "2026-01-22T10:15:00Z"
}
Step 4: Sync to Data Warehouse (Optional)
If you have a data warehouse connection configured, sync the extracted data:
# Preview what will be synced
vai sync plan \
  --layer bronze \
  --dataset shopify
# Execute the sync
vai sync run \
  --layer bronze \
  --dataset shopify \
  --yes
Available Sources
VirtuousAI supports extraction from many sources via dlt (data load tool):
| Source | Resources | Incremental |
|---|---|---|
| Shopify | orders, products, customers, inventory | Yes |
| Klaviyo | profiles, events, lists, campaigns, metrics, flows, segments, tags | Partial |
| Salesforce | accounts, contacts, opportunities, leads | Yes |
| HubSpot | contacts, companies, deals, tickets | Yes |
| Stripe | charges, customers, subscriptions, invoices | Yes |
| PostgreSQL | Any table via SQL | Yes |
| REST API | Custom endpoints | Configurable |
Each source has its own template and configuration options. Use vai templates list --type connection to see all available templates.
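Before starting a run, it can help to check the requested resource names against the table above. The following is a minimal client-side sketch, not part of the VirtuousAI CLI or API: the `SUPPORTED_RESOURCES` mapping only mirrors the fixed-resource rows of the table, and the helper name `unknown_resources` is hypothetical.

```python
# Hypothetical client-side check; the mapping mirrors the "Available Sources" table.
SUPPORTED_RESOURCES = {
    "shopify": {"orders", "products", "customers", "inventory"},
    "klaviyo": {"profiles", "events", "lists", "campaigns",
                "metrics", "flows", "segments", "tags"},
    "salesforce": {"accounts", "contacts", "opportunities", "leads"},
    "hubspot": {"contacts", "companies", "deals", "tickets"},
    "stripe": {"charges", "customers", "subscriptions", "invoices"},
}


def unknown_resources(source: str, requested: list[str]) -> list[str]:
    """Return the requested names that the table does not list for `source`."""
    known = SUPPORTED_RESOURCES.get(source.lower())
    if known is None:
        raise ValueError(f"source not in table: {source}")
    # Preserve the caller's order so the message matches what they typed.
    return [name for name in requested if name not in known]


# "refunds" is not listed for Shopify, so it is reported back.
typos = unknown_resources("shopify", ["orders", "refunds"])
```

PostgreSQL and REST API sources are left out of the mapping because their resources are user-defined rather than a fixed list.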
Incremental Extraction
For large datasets, use incremental extraction to pull only new or changed data:
vai extract shopify \
  --connection prod-shopify \
  --resources orders \
  --incremental \
  --wait
Incremental extraction:
- Tracks the last extraction timestamp
- Only fetches records modified since then
- Reduces API calls and processing time, because unchanged records are not fetched again
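The behavior listed above can be illustrated with a small sketch of timestamp-based filtering. This is a conceptual illustration only, not how VirtuousAI or dlt implement it internally: the `updated_at` field name and the `select_changed` helper are assumptions, and a real extraction would pass the timestamp to the source API rather than filter in memory.

```python
from datetime import datetime, timezone


def select_changed(records, last_extracted_at):
    """Return records modified after the stored timestamp, plus the new timestamp."""
    changed = [r for r in records if r["updated_at"] > last_extracted_at]
    # Advance the cursor to the newest change seen; keep it as-is if nothing is new.
    new_cursor = max((r["updated_at"] for r in changed), default=last_extracted_at)
    return changed, new_cursor


# Illustrative data only; `updated_at` is an assumed field name.
orders = [
    {"id": 1, "updated_at": datetime(2026, 1, 20, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2026, 1, 22, tzinfo=timezone.utc)},
]
changed, cursor = select_changed(orders, datetime(2026, 1, 21, tzinfo=timezone.utc))
```

Running the same call again with the returned `cursor` yields no records, which is why repeated incremental runs stay cheap when little has changed.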
Scheduling Extractions
To run extractions on a schedule, create an automation:
vai automations create \
  --name "Daily Shopify Sync" \
  --trigger-type schedule \
  --config '{
    "schedule": "0 6 * * *",
    "action": {
      "kind": "dlt_extract",
      "definition": {
        "source": "shopify",
        "resources": ["orders", "products"],
        "incremental": true
      }
    }
  }'
This runs the extraction every day at 6 AM UTC.
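To sanity-check a schedule string before creating the automation, you can compute when it would next fire. The sketch below assumes the standard five-field cron layout (minute, hour, day of month, month, day of week) evaluated in UTC, and it only handles daily schedules such as `0 6 * * *`. The `next_daily_run` helper is hypothetical and unrelated to how VirtuousAI evaluates schedules.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(cron: str, now: datetime) -> datetime:
    """Next firing time for a cron string of the form 'M H * * *'."""
    minute, hour, day_of_month, month, day_of_week = cron.split()
    if (day_of_month, month, day_of_week) != ("*", "*", "*"):
        raise ValueError("this sketch only handles daily schedules")
    candidate = now.replace(hour=int(hour), minute=int(minute),
                            second=0, microsecond=0)
    # If today's slot has already passed (or is exactly now), use tomorrow's.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


now = datetime(2026, 1, 22, 10, 15, tzinfo=timezone.utc)
upcoming = next_daily_run("0 6 * * *", now)  # 2026-01-23 06:00 UTC
```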
Error Handling
If an extraction fails, inspect the run to see the error:
vai actions get run_abc123
# Status: failed
# Error: Rate limited by Shopify API
Common issues and solutions:
| Error | Solution |
|---|---|
| Rate limited | Wait and retry, or reduce batch size |
| Authentication failed | Verify credentials, check token expiration |
| Connection timeout | Check network, increase timeout |
| Schema mismatch | Review source schema changes |
Retry a failed extraction:
vai actions retry run_abc123 --wait
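For rate-limit failures, the "wait and retry" advice is often implemented as exponential backoff. The following is a minimal sketch under stated assumptions: `RateLimited` is a hypothetical exception, and `run` stands for any callable of yours that triggers the extraction or the retry and raises `RateLimited` when throttled. Neither is part of the VirtuousAI CLI or API.

```python
import time


class RateLimited(Exception):
    """Hypothetical error signalling that the source API throttled the run."""


def run_with_backoff(run, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `run`, doubling the wait after each rate-limit failure."""
    for attempt in range(max_attempts):
        try:
            return run()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

The `sleep` parameter is injectable so the delay logic can be exercised in tests without actually waiting.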