# CockroachDB Schema Design Rule
## Overview
This rule provides guidance for designing efficient schemas in CockroachDB's distributed environment. It covers primary key selection, partitioning strategies, index optimization, and data distribution patterns that maximize performance while avoiding common pitfalls such as write hotspots and uneven range distribution.
## Implementation
### 1. Primary Key Design Patterns
#### Avoid Sequential Primary Keys
```sql
-- Bad: Sequential primary key causes hotspots
CREATE TABLE orders_bad (
    id SERIAL PRIMARY KEY,
    customer_id INT,
    order_date TIMESTAMP,
    total DECIMAL(10,2)
);
-- Good: UUID primary key for distributed writes
CREATE TABLE orders_good (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id INT,
    order_date TIMESTAMP,
    total DECIMAL(10,2)
);
-- Better: Composite primary key for natural distribution
CREATE TABLE orders_better (
    customer_id INT,
    order_date DATE,
    order_id UUID DEFAULT gen_random_uuid(),
    total DECIMAL(10,2),
    PRIMARY KEY (customer_id, order_date, order_id)
);
```
#### Time-Based Primary Keys
```sql
-- Good: Time-based UUID for chronological ordering
CREATE TABLE events (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_time TIMESTAMPTZ DEFAULT NOW(),
    event_type STRING,
    payload JSONB,
    -- Index for time-based queries
    INDEX idx_events_time (event_time)
);
-- Better: Hash-sharded primary key spreads sequential timestamps
-- across ranges without a manual shard column
CREATE TABLE events_distributed (
    event_time TIMESTAMPTZ DEFAULT NOW(),
    event_id UUID DEFAULT gen_random_uuid(),
    event_type STRING,
    payload JSONB,
    PRIMARY KEY (event_time, event_id) USING HASH WITH (bucket_count = 16)
);
```
### 2. Partitioning Strategies
#### Range Partitioning
```sql
-- Partition by date ranges
CREATE TABLE sales (
    sale_id UUID DEFAULT gen_random_uuid(),
    sale_date DATE,
    customer_id INT,
    amount DECIMAL(10,2),
    region STRING,
    PRIMARY KEY (sale_date, sale_id)
) PARTITION BY RANGE (sale_date) (
    PARTITION p_2024_q1 VALUES FROM ('2024-01-01') TO ('2024-04-01'),
    PARTITION p_2024_q2 VALUES FROM ('2024-04-01') TO ('2024-07-01'),
    PARTITION p_2024_q3 VALUES FROM ('2024-07-01') TO ('2024-10-01'),
    PARTITION p_2024_q4 VALUES FROM ('2024-10-01') TO ('2025-01-01')
);
-- Add future partitions by re-declaring the full partitioning scheme
-- (CockroachDB has no ADD PARTITION; repartitioning replaces the scheme)
ALTER TABLE sales PARTITION BY RANGE (sale_date) (
    PARTITION p_2024_q1 VALUES FROM ('2024-01-01') TO ('2024-04-01'),
    PARTITION p_2024_q2 VALUES FROM ('2024-04-01') TO ('2024-07-01'),
    PARTITION p_2024_q3 VALUES FROM ('2024-07-01') TO ('2024-10-01'),
    PARTITION p_2024_q4 VALUES FROM ('2024-10-01') TO ('2025-01-01'),
    PARTITION p_2025_q1 VALUES FROM ('2025-01-01') TO ('2025-04-01')
);
```
#### Hash-Sharded Indexes
```sql
-- CockroachDB does not support PARTITION BY HASH; hash-sharded
-- indexes provide even write distribution instead
CREATE TABLE user_sessions (
    session_id UUID DEFAULT gen_random_uuid(),
    user_id INT,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    last_activity TIMESTAMPTZ,
    session_data JSONB,
    PRIMARY KEY (user_id, session_id) USING HASH WITH (bucket_count = 8)
);
```
#### List Partitioning
```sql
-- Partition by specific values (regions, categories)
CREATE TABLE products (
    product_id UUID DEFAULT gen_random_uuid(),
    category STRING,
    name STRING,
    price DECIMAL(10,2),
    region STRING,
    PRIMARY KEY (region, product_id)
) PARTITION BY LIST (region) (
    PARTITION p_us VALUES IN ('us-east', 'us-west'),
    PARTITION p_eu VALUES IN ('eu-west', 'eu-central'),
    PARTITION p_asia VALUES IN ('asia-pacific', 'asia-southeast')
);
```
### 3. Index Optimization
#### Covering Indexes
```sql
-- Create covering indexes to avoid key lookups
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    email STRING UNIQUE,
    name STRING,
    city STRING,
    country STRING,
    created_at TIMESTAMPTZ
);
-- Covering index for common query patterns
CREATE INDEX idx_customers_location_covering 
ON customers (country, city) 
STORING (name, email, created_at);
-- Query benefits from covering index
SELECT name, email, created_at 
FROM customers 
WHERE country = 'USA' AND city = 'San Francisco';
```
#### Partial Indexes
```sql
-- Create partial indexes for filtered queries
CREATE TABLE orders (
    order_id UUID PRIMARY KEY,
    customer_id INT,
    status STRING,
    order_date TIMESTAMPTZ,
    total DECIMAL(10,2)
);
-- Partial index for active orders only
CREATE INDEX idx_orders_active 
ON orders (customer_id, order_date) 
WHERE status IN ('pending', 'processing');
-- Partial index for high-value orders
CREATE INDEX idx_orders_high_value 
ON orders (order_date DESC) 
WHERE total > 1000.00;
```
#### Expression Indexes
```sql
-- Index on computed expressions
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    email STRING,
    first_name STRING,
    last_name STRING,
    created_at TIMESTAMPTZ
);
-- Index on computed full name
CREATE INDEX idx_users_full_name 
ON users ((first_name || ' ' || last_name));
-- Index on an extracted date part (the expression needs its own
-- parentheses and must be immutable, hence the UTC conversion)
CREATE INDEX idx_users_created_month 
ON users ((EXTRACT(MONTH FROM created_at AT TIME ZONE 'UTC')));
```
### 4. Data Distribution Patterns
#### Avoiding Hotspots
```sql
-- Bad: Creates hotspots on single node
CREATE TABLE metrics_bad (
    id SERIAL PRIMARY KEY,
    timestamp TIMESTAMPTZ DEFAULT NOW(),
    metric_name STRING,
    value FLOAT,
    tags JSONB
);
-- Good: Distributed primary key
CREATE TABLE metrics_good (
    metric_name STRING,
    timestamp TIMESTAMPTZ,
    instance_id STRING,
    value FLOAT,
    tags JSONB,
    PRIMARY KEY (metric_name, timestamp, instance_id)
);
-- Better: Hash-sharded primary key spreads the time-ordered keys
CREATE TABLE metrics_better (
    metric_name STRING,
    timestamp TIMESTAMPTZ DEFAULT NOW(),
    instance_id STRING,
    value FLOAT,
    tags JSONB,
    PRIMARY KEY (metric_name, timestamp, instance_id) USING HASH WITH (bucket_count = 8)
);
```
#### Multi-Region Schema Design
```sql
-- Configure database for multi-region
ALTER DATABASE myapp SET PRIMARY REGION "us-east1";
ALTER DATABASE myapp ADD REGION "us-west1";
ALTER DATABASE myapp ADD REGION "eu-west1";
-- Regional table for user data
CREATE TABLE users_regional (
    user_id UUID PRIMARY KEY,
    email STRING UNIQUE,
    name STRING,
    -- REGIONAL BY ROW AS requires a column of type crdb_internal_region
    region crdb_internal_region NOT NULL,
    created_at TIMESTAMPTZ
) LOCALITY REGIONAL BY ROW AS region;
-- Global table for reference data
CREATE TABLE countries (
    country_code STRING PRIMARY KEY,
    country_name STRING,
    currency STRING
) LOCALITY GLOBAL;
-- Regional by table for localized content
CREATE TABLE user_preferences (
    user_id UUID PRIMARY KEY,
    preferences JSONB,
    language STRING,
    timezone STRING
) LOCALITY REGIONAL BY TABLE IN "us-east1";
```
### 5. JSON and Semi-Structured Data
#### JSONB Schema Design
```sql
-- Efficient JSONB column design
CREATE TABLE events (
    event_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    event_type STRING,
    event_data JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    -- Index on JSONB fields
    INDEX idx_events_type (event_type),
    INDEX idx_events_user_id ((event_data->>'user_id')),
    INDEX idx_events_timestamp ((event_data->>'timestamp'))
);
-- Inverted (GIN-equivalent) index for full JSONB search
CREATE INVERTED INDEX idx_events_data ON events (event_data);
-- Computed columns for frequently accessed JSON fields
ALTER TABLE events ADD COLUMN user_id INT AS ((event_data->>'user_id')::INT) STORED;
CREATE INDEX idx_events_user_id_computed ON events (user_id);
```
#### Array and Nested Data
```sql
-- Design for array columns
CREATE TABLE products (
    product_id UUID PRIMARY KEY,
    name STRING,
    tags STRING[],
    categories INT[],
    metadata JSONB
);
-- Inverted indexes on array columns
CREATE INVERTED INDEX idx_products_tags ON products (tags);
CREATE INVERTED INDEX idx_products_categories ON products (categories);
-- Query patterns for arrays
SELECT * FROM products WHERE 'electronics' = ANY(tags);
SELECT * FROM products WHERE categories @> ARRAY[1, 2];
```
### 6. Temporal Data Patterns
#### Time-Series Data Design
```sql
-- Time-series table with proper partitioning
CREATE TABLE sensor_data (
    sensor_id INT,
    timestamp TIMESTAMPTZ,
    reading_type STRING,
    value DECIMAL(10,4),
    metadata JSONB,
    PRIMARY KEY (sensor_id, timestamp, reading_type)
) PARTITION BY RANGE (timestamp) (
    PARTITION p_2024_01 VALUES FROM ('2024-01-01') TO ('2024-02-01'),
    PARTITION p_2024_02 VALUES FROM ('2024-02-01') TO ('2024-03-01'),
    PARTITION p_2024_03 VALUES FROM ('2024-03-01') TO ('2024-04-01')
);
-- TTL for automatic data cleanup
ALTER TABLE sensor_data SET (ttl_expire_after = '90 days');
```
#### Change Data Capture Schema
```sql
-- Audit table design
CREATE TABLE audit_log (
    log_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    table_name STRING,
    operation STRING,
    record_id STRING,
    old_values JSONB,
    new_values JSONB,
    changed_by STRING,
    changed_at TIMESTAMPTZ DEFAULT NOW(),
    INDEX idx_audit_table_time (table_name, changed_at),
    INDEX idx_audit_record (record_id)
);
-- Trigger for change tracking (triggers require CockroachDB v24.3+;
-- on earlier versions use CREATE CHANGEFEED for change data capture)
CREATE OR REPLACE FUNCTION audit_trigger_function()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_log (table_name, operation, record_id, old_values, new_values, changed_by)
        VALUES (TG_TABLE_NAME, 'UPDATE', OLD.id::STRING, row_to_json(OLD), row_to_json(NEW), current_user);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```
## Best Practices
### 1. Primary Key Selection
- **Avoid SERIAL/SEQUENCE**: Use UUID or composite keys for distributed writes
- **Consider query patterns**: Design primary keys around most common access patterns
- **Balance distribution**: Ensure even data distribution across nodes
- **Include time component**: For time-based queries, include timestamp in primary key
### 2. Index Strategy
- **Create covering indexes**: Include frequently selected columns in STORING clause
- **Use partial indexes**: Filter indexes for specific query conditions
- **Monitor index usage**: Regularly check and remove unused indexes
- **Consider inverted indexes**: Use GIN indexes for JSONB and array columns
### 3. Partitioning Guidelines
- **Partition large tables**: Use partitioning for tables > 100GB
- **Align with query patterns**: Partition on columns used in WHERE clauses
- **Automate partition management**: Create scripts for adding/dropping partitions
- **Monitor partition sizes**: Keep partitions reasonably sized (< 64GB)
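The "automate partition management" point can be sketched as a small generator that emits quarterly `PARTITION` clauses for a given year, ready to splice into a repartitioning statement (`quarterlyPartitions` is a hypothetical helper; adapt it to your migration tooling):

```go
package main

import (
	"fmt"
	"time"
)

// quarterlyPartitions generates the PARTITION clauses for one year of
// quarterly RANGE partitions on a DATE column.
func quarterlyPartitions(year int) []string {
	parts := make([]string, 0, 4)
	for q := 0; q < 4; q++ {
		from := time.Date(year, time.Month(q*3+1), 1, 0, 0, 0, 0, time.UTC)
		to := from.AddDate(0, 3, 0) // exclusive upper bound, start of next quarter
		parts = append(parts, fmt.Sprintf(
			"PARTITION p_%d_q%d VALUES FROM ('%s') TO ('%s')",
			year, q+1, from.Format("2006-01-02"), to.Format("2006-01-02")))
	}
	return parts
}

func main() {
	for _, p := range quarterlyPartitions(2025) {
		fmt.Println(p)
	}
}
```

A cron job can run such a generator ahead of each year boundary and apply the resulting `ALTER TABLE ... PARTITION BY RANGE` statement.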
### 4. Data Types and Constraints
- **Use appropriate data types**: Choose minimal types that fit your data
- **Add constraints**: Use CHECK constraints for data validation
- **Consider collation**: Use appropriate collation for string comparisons
- **Use ENUM types**: For fixed sets of values
### 5. Multi-Region Considerations
- **Choose appropriate locality**: REGIONAL BY ROW for user data, GLOBAL for reference data
- **Consider latency**: Place data close to users
- **Plan for consistency**: Understand consistency guarantees across regions
- **Monitor cross-region traffic**: Minimize expensive cross-region operations
## Common Issues
### 1. Hotspot Detection
**Problem**: Single node handling all writes
**Solution**: 
```sql
-- Check range distribution and leaseholders for a table
SHOW RANGES FROM TABLE your_table;
-- Monitor statement distribution across nodes
SELECT 
    node_id,
    sum(count) AS stmt_count,
    avg(service_lat_avg) AS avg_service_lat
FROM crdb_internal.node_statement_statistics
GROUP BY node_id
ORDER BY stmt_count DESC;
```
### 2. Large Row Issues
**Problem**: Oversized rows degrading performance (CockroachDB logs writes above the `sql.guardrails.max_row_size_log` threshold)
**Solution**: 
```sql
-- Normalize large columns into a separate table
CREATE TABLE documents (
    doc_id UUID PRIMARY KEY,
    title STRING,
    summary STRING,
    created_at TIMESTAMPTZ
);
CREATE TABLE document_content (
    doc_id UUID REFERENCES documents(doc_id),
    content_type STRING,
    content BYTES,
    PRIMARY KEY (doc_id, content_type)
);
-- Spot-check payload sizes directly
-- (CockroachDB has no built-in per-row size view)
SELECT doc_id, length(content) AS content_bytes
FROM document_content
ORDER BY content_bytes DESC
LIMIT 10;
```
### 3. Index Bloat
**Problem**: Unused or redundant indexes consuming space
**Solution**: 
```sql
-- Find unused indexes via CockroachDB's index usage statistics
SELECT 
    ti.descriptor_name AS table_name,
    ti.index_name,
    us.total_reads,
    us.last_read
FROM crdb_internal.index_usage_statistics us
JOIN crdb_internal.table_indexes ti
  ON us.table_id = ti.descriptor_id
 AND us.index_id = ti.index_id
WHERE us.total_reads = 0;
-- Inspect an index's size through its ranges
SHOW RANGES FROM INDEX orders@idx_orders_active;
```
### 4. Partition Pruning Issues
**Problem**: Queries scanning all partitions
**Solution**: 
```sql
-- Verify partition pruning
EXPLAIN (VERBOSE) 
SELECT * FROM sales 
WHERE sale_date >= '2024-01-01' AND sale_date < '2024-02-01';
-- Ensure partition key in WHERE clause
SELECT * FROM sales 
WHERE sale_date = '2024-01-15'  -- Good: uses partition key
  AND customer_id = 123;
-- Avoid queries without partition key
SELECT * FROM sales 
WHERE customer_id = 123;  -- Bad: scans all partitions
```
### 5. Foreign Key Performance
**Problem**: Foreign key constraints causing contention
**Solution**: 
```sql
-- Consider removing foreign keys in high-throughput scenarios
ALTER TABLE orders DROP CONSTRAINT fk_orders_customer;
-- Use application-level validation instead
CREATE TABLE orders (
    order_id UUID PRIMARY KEY,
    customer_id INT, -- No foreign key constraint
    order_date TIMESTAMPTZ,
    total DECIMAL(10,2)
);
```
Validate in application code instead (Go, using `database/sql`):
```go
// ValidateCustomerExists checks the referenced row exists before insert.
func ValidateCustomerExists(db *sql.DB, customerID int) error {
    var exists bool
    err := db.QueryRow("SELECT EXISTS(SELECT 1 FROM customers WHERE id = $1)", customerID).Scan(&exists)
    if err != nil {
        return err
    }
    if !exists {
        return fmt.Errorf("customer %d does not exist", customerID)
    }
    return nil
}
```
## Schema Evolution
### Safe Schema Changes
```sql
-- Add column (safe)
ALTER TABLE users ADD COLUMN phone STRING;
-- Add index (safe; CockroachDB builds indexes online and
-- accepts CONCURRENTLY as a no-op)
CREATE INDEX CONCURRENTLY idx_users_phone ON users (phone);
-- Add constraint (requires validation)
ALTER TABLE users ADD CONSTRAINT chk_phone_format 
CHECK (phone ~ '^[0-9-+()]+$');
-- Drop column (potentially unsafe)
ALTER TABLE users DROP COLUMN old_field;
```
### Migration Scripts
```sql
-- Example migration script
-- (run each statement separately: CockroachDB restricts schema
-- changes combined with DML inside explicit transactions)
-- Create new table structure
CREATE TABLE users_new (
    user_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email STRING UNIQUE,
    name STRING,
    phone STRING,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Copy data from old table
INSERT INTO users_new (user_id, email, name, created_at)
SELECT user_id, email, name, created_at FROM users;
-- Rename tables
ALTER TABLE users RENAME TO users_old;
ALTER TABLE users_new RENAME TO users;
-- Update application code here
-- Drop old table after verification
-- DROP TABLE users_old;
```
Created: 6/1/2025