BigQuery Integration: Storing Your Event Data
BigQuery is your data warehouse for all server-side tracking events. Here's how to set it up and use it effectively.
Why BigQuery?
Benefits
- Near-unlimited, petabyte-scale storage
- Fast SQL queries, even over billions of rows
- Advanced analytics (window functions, BigQuery ML)
- Serverless: no infrastructure to manage
- Cost-effective pay-as-you-go pricing
- Native integration with the Google ecosystem (GA4, Looker Studio) and third-party BI tools
Use Cases
- Store all events for analysis
- Create custom reports
- Build data models
- Integrate with BI tools
- Long-term data retention
BigQuery Setup
Step 1: Create Dataset
CREATE SCHEMA IF NOT EXISTS `your-project.tracking_data`
OPTIONS(
description="Server-side tracking event data",
location="EU" -- or US, based on your needs
);
Step 2: Create Event Table
CREATE TABLE IF NOT EXISTS `your-project.tracking_data.events` (
event_name STRING NOT NULL,
event_timestamp TIMESTAMP NOT NULL,
user_id STRING,
session_id STRING,
page_url STRING,
page_title STRING,
referrer STRING,
utm_source STRING,
utm_medium STRING,
utm_campaign STRING,
utm_term STRING,
utm_content STRING,
device_type STRING,
browser STRING,
os STRING,
country STRING,
city STRING,
event_data JSON,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
)
PARTITION BY DATE(event_timestamp)
CLUSTER BY event_name, user_id;
Step 3: Configure GTM to Stream to BigQuery
- Add a BigQuery destination (tag) in your server-side GTM container
- Grant the container's service account write access to the dataset
- Set up streaming inserts to the events table
- Send test events and confirm they appear in BigQuery
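Once streaming is configured, a quick sanity-check query confirms the data flow (table name taken from the setup above):

```sql
-- Verify that events are arriving today, per event type
SELECT
  event_name,
  COUNT(*) AS events_today,
  MAX(event_timestamp) AS latest_event
FROM `your-project.tracking_data.events`
WHERE DATE(event_timestamp) = CURRENT_DATE()
GROUP BY event_name
ORDER BY latest_event DESC;
```

If this returns no rows shortly after sending test events, check the service account permissions and the tag configuration in the server container.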
Data Schema Design
Core Fields
- event_name: Type of event (page_view, form_submit, purchase)
- event_timestamp: When event occurred
- user_id: Pseudonymous user identifier (avoid storing directly identifying PII)
- session_id: Session identifier
- page_url: URL where event occurred
- utm_*: Campaign tracking parameters
Extended Fields
- event_data: JSON column for event-specific data (e.g. order value, form ID)
- device_type, browser, os: device and browser info
- country, city: coarse location
- Custom dimensions: business-specific data (add columns, or keys inside event_data, as needed)
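Because event_data is a JSON column, event-specific fields are extracted at query time with BigQuery's JSON functions. A sketch (the 'value' and 'currency' keys are hypothetical examples of what your events might carry):

```sql
-- Pull purchase-specific fields out of the event_data JSON column
SELECT
  JSON_VALUE(event_data, '$.currency') AS currency,
  SUM(CAST(JSON_VALUE(event_data, '$.value') AS FLOAT64)) AS total_value
FROM `your-project.tracking_data.events`
WHERE event_name = 'purchase'
  AND event_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY currency;
```

Keeping rarely-queried fields in JSON avoids schema migrations every time a new event property is added, at the cost of slightly more verbose queries.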
Querying BigQuery Data
Example Queries
1. Daily Event Count:
SELECT
DATE(event_timestamp) as date,
event_name,
COUNT(*) as event_count
FROM `your-project.tracking_data.events`
WHERE event_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY date, event_name
ORDER BY date DESC, event_count DESC;
2. Conversion Funnel:
SELECT
COUNT(DISTINCT CASE WHEN event_name = 'page_view' THEN user_id END) as visitors,
COUNT(DISTINCT CASE WHEN event_name = 'form_submit' THEN user_id END) as form_submits,
COUNT(DISTINCT CASE WHEN event_name = 'purchase' THEN user_id END) as purchases
FROM `your-project.tracking_data.events`
WHERE event_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY);
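Note that this funnel counts users at each stage independently; it does not require the steps to happen in order. A stricter variant compares each user's first occurrence of each event:

```sql
-- Ordered funnel: each step must occur after the previous one
WITH firsts AS (
  SELECT
    user_id,
    MIN(IF(event_name = 'page_view', event_timestamp, NULL)) AS first_view,
    MIN(IF(event_name = 'form_submit', event_timestamp, NULL)) AS first_submit,
    MIN(IF(event_name = 'purchase', event_timestamp, NULL)) AS first_purchase
  FROM `your-project.tracking_data.events`
  WHERE event_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  GROUP BY user_id
)
SELECT
  COUNT(first_view) AS visitors,
  COUNTIF(first_submit > first_view) AS form_submits,
  COUNTIF(first_purchase > first_submit) AS purchases
FROM firsts;
```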
3. Attribution Analysis:
SELECT
utm_source,
utm_medium,
utm_campaign,
COUNT(DISTINCT user_id) as users,
COUNT(DISTINCT CASE WHEN event_name = 'purchase' THEN user_id END) as conversions
FROM `your-project.tracking_data.events`
WHERE event_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY utm_source, utm_medium, utm_campaign
ORDER BY conversions DESC;
Data Retention & Partitioning
Partitioning Strategy
- Partition by date (event_timestamp)
- Enables efficient queries
- Automatic data management
- Cost optimization
Data Retention
- Set retention policy (e.g., 2 years)
- Archive old data if needed
- Comply with GDPR requirements
- Optimize costs
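Because the table is partitioned by date, a retention policy can be enforced directly on the partitions. A minimal sketch for a two-year policy:

```sql
-- Automatically delete partitions older than ~2 years (730 days)
ALTER TABLE `your-project.tracking_data.events`
SET OPTIONS (partition_expiration_days = 730);
```

Expired partitions are dropped automatically, which covers both cost control and a baseline GDPR retention limit without scheduled cleanup jobs.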
Cost Optimization
BigQuery Pricing
- Storage: about $0.02/GB/month for active storage (long-term storage is cheaper)
- Queries: about $5 per TB scanned (on-demand)
- Streaming inserts: about $0.01 per 200 MB ingested
Prices vary by region and change over time, so check current Google Cloud pricing.
Cost Optimization Tips
- Use partitioning (reduces scan costs)
- Use clustering (improves query performance)
- Limit data retention
- Optimize queries
- Use scheduled queries for reports
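Partitioning and clustering only reduce scan costs when queries actually filter on those columns. A sketch of the difference:

```sql
-- Good: filters on the partition column and selects only needed columns,
-- so BigQuery scans a handful of daily partitions instead of the whole table
SELECT event_name, COUNT(*) AS event_count
FROM `your-project.tracking_data.events`
WHERE DATE(event_timestamp) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
                                AND CURRENT_DATE()
GROUP BY event_name;

-- Bad: SELECT * with no partition filter scans every column of every partition
-- SELECT * FROM `your-project.tracking_data.events`;
```

The query editor's bytes-processed estimate is a quick way to compare the two forms before running them.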
Next Step: Learn about Platform Integrations to connect with GA4, Meta, and LinkedIn.