VirtuousAI

Data Extraction

Extract data from SaaS tools into your data warehouse

Data Extraction Guide

This guide walks you through extracting data from a SaaS tool (like Shopify) and loading it into your data warehouse.

Prerequisites

  • A VirtuousAI account with the CLI installed
  • Credentials for your source system (e.g., Shopify API key)
  • (Optional) Data warehouse connection for loading data

Overview

Data extraction in VirtuousAI follows a simple pattern:

Step       Description
Connect    Create a connection with credentials to your source
Extract    Run a dlt_extract action to pull data
Transform  Data is automatically normalized to the bronze schema
Load       Optionally sync to your data warehouse

Step 1: Create a Source Connection

First, create a connection to your data source. We'll use Shopify as an example:

# Using a template (recommended)
vai connections create \
  --name "Production Shopify" \
  --template-slug shopify \
  --credential api_key=shpat_xxxxx \
  --config shop_url=mystore.myshopify.com

Or call the REST API directly:

curl -X POST https://vai-dev.virtuousai.com/api/v1/connections \
  -H "Authorization: Bearer $VAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production Shopify",
    "templateSlug": "shopify",
    "credentials": {"api_key": "shpat_xxxxx"},
    "config": {"shop_url": "mystore.myshopify.com"}
  }'
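
The same request can be assembled from a script. The following is a minimal Python sketch, assuming only the endpoint, headers, and JSON fields shown in the curl example; the helper name build_connection_request is hypothetical, and the request is built but not sent.

```python
import json
import os
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://vai-dev.virtuousai.com/api/v1"


def build_connection_request(name, template_slug, credentials, config, api_key=None):
    """Build (but do not send) the POST /connections request. Hypothetical helper."""
    # Fall back to the VAI_API_KEY environment variable used by the curl example.
    token = api_key or os.environ.get("VAI_API_KEY", "")
    body = {
        "name": name,
        "templateSlug": template_slug,
        "credentials": credentials,
        "config": config,
    }
    return urllib.request.Request(
        f"{API_BASE}/connections",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_connection_request(
    name="Production Shopify",
    template_slug="shopify",
    credentials={"api_key": "shpat_xxxxx"},
    config={"shop_url": "mystore.myshopify.com"},
    api_key="example-token",  # placeholder, not a real key
)
# To actually send it: urllib.request.urlopen(req)
```

Keeping construction separate from sending makes it easy to inspect or log the payload (with credentials redacted) before anything reaches the API.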

Verify the connection works:

vai connections verify prod-shopify
# ✓ Connection verified successfully

Step 2: Run an Extraction

Now run a dlt_extract action to pull data from Shopify:

# Extract orders and products
vai extract shopify \
  --connection prod-shopify \
  --resources orders,products \
  --start-date 2026-01-01 \
  --wait

Or use the lower-level action run API:

vai actions run \
  --kind dlt_extract \
  --definition '{
    "source": "shopify",
    "connectionId": "conn_abc123",
    "resources": ["orders", "products"],
    "start_date": "2026-01-01"
  }' \
  --wait

The equivalent REST API call:

curl -X POST https://vai-dev.virtuousai.com/api/v1/action-runs \
  -H "Authorization: Bearer $VAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "kind": "dlt_extract",
    "definition": {
      "source": "shopify",
      "connectionId": "conn_abc123",
      "resources": ["orders", "products"],
      "start_date": "2026-01-01"
    }
  }'
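
When the request body is generated by a script, it helps to validate inputs before submitting. A small Python sketch, assuming the field names from the REST example above; build_extract_run is a hypothetical helper, not part of any VirtuousAI SDK.

```python
from datetime import date


def build_extract_run(source, connection_id, resources, start_date=None):
    """Return the JSON body for a dlt_extract action run. Hypothetical helper."""
    if not resources:
        raise ValueError("at least one resource is required")
    definition = {
        "source": source,
        "connectionId": connection_id,
        "resources": list(resources),
    }
    if start_date is not None:
        # Raises ValueError if the string is not a valid ISO date,
        # and normalizes the value to YYYY-MM-DD.
        definition["start_date"] = date.fromisoformat(start_date).isoformat()
    return {"kind": "dlt_extract", "definition": definition}


body = build_extract_run("shopify", "conn_abc123", ["orders", "products"], "2026-01-01")
```

The resulting dictionary has the same shape as the JSON passed to curl with -d.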

The extraction runs asynchronously. Use --wait to block until completion, or poll the status:

vai actions get run_abc123
# Status: completed
# Duration: 45s
# Records: 1,234 orders, 456 products
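
If you poll from a script instead of using --wait, the loop might look like the Python sketch below. It assumes the terminal statuses are "completed" and "failed", as shown in this guide, and treats any other value as still in progress. fetch_status is a hypothetical callable that you supply, for example a wrapper around vai actions get; the "running" status in the usage lines is likewise an assumption.

```python
import time


def wait_for_run(fetch_status, run_id, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll fetch_status(run_id) until the run completes, fails, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(run_id)
        if status in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{run_id} still {status!r} after {timeout}s")
        sleep(interval)


# Usage with a stand-in for the real status lookup ("running" is an assumed value).
statuses = iter(["running", "running", "completed"])
final = wait_for_run(lambda run_id: next(statuses), "run_abc123", sleep=lambda s: None)
```

Passing sleep as a parameter keeps the loop testable without real delays; a fixed interval is the simplest choice, and a longer interval reduces load on the API.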

Step 3: Check the Results

View the extracted data summary:

vai actions get run_abc123 --output json | jq '.result'
{
  "tables": {
    "orders": {"rows": 1234, "bytes": 2456789},
    "order_line_items": {"rows": 5678, "bytes": 1234567},
    "products": {"rows": 456, "bytes": 789012},
    "product_variants": {"rows": 1234, "bytes": 456789}
  },
  "duration_seconds": 45,
  "extract_date": "2026-01-22T10:15:00Z"
}
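
For quick checks, the result can be aggregated in a few lines. This Python sketch assumes the result has exactly the shape shown above; summarize is a hypothetical helper name.

```python
import json

# The sample result from above, as returned under .result
result = json.loads("""
{
  "tables": {
    "orders": {"rows": 1234, "bytes": 2456789},
    "order_line_items": {"rows": 5678, "bytes": 1234567},
    "products": {"rows": 456, "bytes": 789012},
    "product_variants": {"rows": 1234, "bytes": 456789}
  },
  "duration_seconds": 45,
  "extract_date": "2026-01-22T10:15:00Z"
}
""")


def summarize(result):
    """Total rows and bytes across all tables, plus the table with the most rows."""
    tables = result["tables"]
    return {
        "total_rows": sum(t["rows"] for t in tables.values()),
        "total_bytes": sum(t["bytes"] for t in tables.values()),
        "largest_table": max(tables, key=lambda name: tables[name]["rows"]),
    }


summary = summarize(result)
# summary["total_rows"] is 8602 for the sample above
```

Note that requesting two resources (orders, products) produced four tables: nested data such as line items and variants lands in its own child table.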

Step 4: Sync to Data Warehouse (Optional)

If you have a data warehouse connection configured, sync the extracted data:

# Preview what will be synced
vai sync plan \
  --layer bronze \
  --dataset shopify

# Execute the sync
vai sync run \
  --layer bronze \
  --dataset shopify \
  --yes

Available Sources

VirtuousAI supports extraction from many sources via dlt (data load tool):

Source      Resources                                                            Incremental
Shopify     orders, products, customers, inventory                               Yes
Klaviyo     profiles, events, lists, campaigns, metrics, flows, segments, tags   Partial
Salesforce  accounts, contacts, opportunities, leads                             Yes
HubSpot     contacts, companies, deals, tickets                                  Yes
Stripe      charges, customers, subscriptions, invoices                          Yes
PostgreSQL  Any table via SQL                                                    Yes
REST API    Custom endpoints                                                     Configurable

Each source has its own template and configuration options. Use vai templates list --type connection to see all available templates.

Incremental Extraction

For large datasets, use incremental extraction to pull only new or changed data:

vai extract shopify \
  --connection prod-shopify \
  --resources orders \
  --incremental \
  --wait

Incremental extraction:

  • Tracks the last extraction timestamp
  • Only fetches records modified since then
  • Reduces API calls and processing time, because unchanged records are skipped
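
The idea behind the points above can be illustrated with a cursor that remembers the newest modification time seen so far. This Python sketch is a conceptual illustration only, not VirtuousAI's or dlt's actual implementation; the updated_at field and the state dictionary are assumptions.

```python
from datetime import datetime, timezone


def extract_incremental(records, state, cursor_field="updated_at"):
    """Return records newer than the stored cursor, then advance the cursor."""
    last_seen = state.get("last_value")
    fresh = [r for r in records if last_seen is None or r[cursor_field] > last_seen]
    if fresh:
        state["last_value"] = max(r[cursor_field] for r in fresh)
    return fresh


def utc(day):
    """Midnight UTC on the given day of January 2026 (sample-data helper)."""
    return datetime(2026, 1, day, tzinfo=timezone.utc)


records = [{"id": 1, "updated_at": utc(20)}, {"id": 2, "updated_at": utc(21)}]
state = {}

first = extract_incremental(records, state)   # first run: nothing stored, takes all
records.append({"id": 3, "updated_at": utc(22)})
second = extract_incremental(records, state)  # second run: only the new record
```

A real source would pass the cursor value to the API as a filter, so that unchanged records are never transferred in the first place.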

Scheduling Extractions

To run extractions on a schedule, create an automation:

vai automations create \
  --name "Daily Shopify Sync" \
  --trigger-type schedule \
  --config '{
    "schedule": "0 6 * * *",
    "action": {
      "kind": "dlt_extract",
      "definition": {
        "source": "shopify",
        "resources": ["orders", "products"],
        "incremental": true
      }
    }
  }'

This runs the extraction every day at 6 AM UTC.
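
The schedule string uses five cron fields: minute, hour, day of month, month, and day of week, so "0 6 * * *" means minute 0 of hour 6 on every day. To check when such a schedule fires next, a Python sketch like the following can help. It handles only the daily "M H * * *" form and is not a general cron parser; next_daily_run is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now, hour=6, minute=0):
    """Next HH:MM UTC strictly after `now` (covers the 'M H * * *' form only)."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


after_six = datetime(2026, 1, 22, 10, 15, tzinfo=timezone.utc)
before_six = datetime(2026, 1, 22, 5, 0, tzinfo=timezone.utc)

next_a = next_daily_run(after_six)   # 2026-01-23 06:00 UTC
next_b = next_daily_run(before_six)  # 2026-01-22 06:00 UTC
```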

Error Handling

If an extraction fails:

vai actions get run_abc123
# Status: failed
# Error: Rate limited by Shopify API

Common issues and solutions:

Error                  Solution
Rate limited           Wait and retry, or reduce batch size
Authentication failed  Verify credentials, check token expiration
Connection timeout     Check network, increase timeout
Schema mismatch        Review source schema changes

Retry a failed extraction:

vai actions retry run_abc123 --wait
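
For transient failures such as rate limiting, "wait and retry" is usually done with an increasing delay between attempts. A minimal Python sketch of exponential backoff follows. run_once is a hypothetical callable that triggers one retry (for example, by invoking vai actions retry) and returns the resulting status; the delay values are illustrative.

```python
import time


def retry_with_backoff(run_once, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call run_once() until it returns 'completed'; return the attempt count."""
    for attempt in range(1, max_attempts + 1):
        status = run_once()
        if status == "completed":
            return attempt
        if attempt < max_attempts:
            # Delay doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"still failing after {max_attempts} attempts")


# Usage with a stand-in that fails twice, then succeeds; delays are recorded, not slept.
outcomes = iter(["failed", "failed", "completed"])
delays = []
attempts = retry_with_backoff(lambda: next(outcomes), sleep=delays.append)
```

Backoff suits rate limits and timeouts. Authentication failures and schema mismatches will not resolve on their own, so fix the underlying cause before retrying.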
