TimescaleDB (recently rebranded as TigerData) extends PostgreSQL with automated partitioning for time-series workloads through hypertables, which split large tables into time-based chunks. While this partitioning delivers significant query performance benefits, TimescaleDB's index management has a fundamental limitation: it creates indexes sequentially across chunks rather than exploiting the parallelization opportunities that partitioning naturally provides.
TimescaleDB builds indexes sequentially across chunks, turning what should be minute-long operations into hour-long bottlenecks. The workaround bypasses TimescaleDB's CREATE INDEX entirely:

1. Disable indexes without dropping: UPDATE pg_index SET indislive = FALSE preserves definitions while eliminating update overhead during data ingest.
2. Freeze chunk structure: SELECT timescaledb_pre_restore() prevents race conditions during parallel creation.
3. Create indexes directly on chunks: query _timescaledb_catalog.chunk to discover all chunks, then create indexes concurrently on individual chunk tables instead of the hypertable.
4. Register in TimescaleDB catalog: insert metadata into _timescaledb_catalog.chunk_index so TimescaleDB recognizes the indexes.
5. Re-enable and resume: set indislive = TRUE and call timescaledb_post_restore().
Complete implementation available in the OpenStreetMap-NG project.
When creating an index on a hypertable, TimescaleDB must create corresponding indexes on each underlying chunk. The current implementation iterates through chunks and executes index creation statements sequentially, one chunk at a time. This approach fails to leverage the fundamental advantage of partitioned data: independent chunks can have their indexes built simultaneously without interfering with each other.
This sequential limitation becomes particularly problematic during data migrations, schema changes, or initial data loads. Large deployments with hundreds of chunks can spend hours building indexes that could complete in minutes with proper parallelization and hardware. For organizations managing terabyte-scale deployments, this represents a significant operational burden that directly impacts maintenance windows and system availability.
The workaround avoids TimescaleDB's sequential index creation entirely by operating directly on individual chunks rather than the hypertable. Instead of using the standard CREATE INDEX
on the hypertable, which triggers sequential processing, the approach disables existing indexes during bulk loading, freezes the chunk structure using TimescaleDB's restore mode, then discovers all chunks and creates their indexes concurrently. The parallel-created indexes are subsequently registered in TimescaleDB's catalog to maintain system consistency, effectively achieving the same end result as the built-in mechanism but with dramatically improved performance.
Standard bulk loading practice involves dropping indexes before data insertion to eliminate update overhead, then recreating them afterward. However, this approach conflicts with the proposed solution, which requires the original index definitions to remain accessible in PostgreSQL's catalog. The key insight is to avoid calling CREATE INDEX
directly on the hypertable after data loading, as this would trigger TimescaleDB's sequential index creation process.
The solution uses PostgreSQL's internal indislive
flag to disable indexes without removing their definitions. This approach provides the same performance benefits as dropped indexes while preserving the structural information:
UPDATE pg_index SET indislive = FALSE
WHERE indexrelid = 'your_index_name'::regclass
When indislive
is false, PostgreSQL ignores the indexes completely during data modification operations (see pg_index catalog documentation). Crucially, preserving these index definitions is essential because TimescaleDB uses the hypertable as a template when creating new chunks.
After bulk data loading completes, the process must prevent any structural changes during index creation. TimescaleDB's restore mode provides this capability:
SELECT timescaledb_pre_restore()
This call guarantees that no new chunks will be created while indexes are being built. Without this guarantee, the parallel creation process could miss newly created chunks or encounter race conditions between chunk discovery and index creation. The restore mode functions are documented in the TimescaleDB administration API.
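Because a failure between this point and the final step would leave the database stuck in restore mode, one practical pattern is to wrap the window in a context manager so that timescaledb_post_restore() always runs. This is a sketch assuming psycopg 3, not necessarily how the project structures it:

from contextlib import contextmanager
import psycopg

@contextmanager
def timescaledb_restore_mode(conn: psycopg.Connection):
    # Freeze chunk creation for the duration of the block.
    conn.execute("SELECT timescaledb_pre_restore()")
    conn.commit()
    try:
        yield
    finally:
        # Leave restore mode even if chunk discovery or index creation fails.
        conn.execute("SELECT timescaledb_post_restore()")
        conn.commit()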
With the chunk structure frozen, the process can safely query TimescaleDB's metadata catalog to identify all chunks belonging to each hypertable:
SELECT ck.id, ck.hypertable_id, ck.schema_name, ck.table_name
FROM _timescaledb_catalog.hypertable ht
JOIN _timescaledb_catalog.chunk ck ON ht.id = ck.hypertable_id
WHERE ht.schema_name = 'public'
AND ht.table_name = 'your_hypertable_name'
This metadata query provides the foundation for transforming hypertable index definitions into chunk-specific index statements.
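In a loading script, the discovery step might look like the following sketch (psycopg 3 assumed; reading the hypertable's index definitions back from the pg_indexes view is one option, not necessarily what the project does):

import psycopg

def discover_chunks(conn: psycopg.Connection, schema: str, table: str) -> list[tuple]:
    return conn.execute(
        """
        SELECT ck.id, ck.hypertable_id, ck.schema_name, ck.table_name
        FROM _timescaledb_catalog.hypertable ht
        JOIN _timescaledb_catalog.chunk ck ON ht.id = ck.hypertable_id
        WHERE ht.schema_name = %s AND ht.table_name = %s
        """,
        (schema, table),
    ).fetchall()

def hypertable_index_defs(conn: psycopg.Connection, schema: str, table: str) -> list[tuple]:
    # The disabled template indexes still appear here with their full CREATE INDEX text.
    return conn.execute(
        "SELECT indexname, indexdef FROM pg_indexes"
        " WHERE schemaname = %s AND tablename = %s",
        (schema, table),
    ).fetchall()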
The core technique involves creating indexes on all chunks simultaneously rather than sequentially. For each index on the hypertable, the process generates corresponding index creation statements for every chunk by transforming the original index definition.
The transformation accounts for the naming differences between hypertables and their underlying chunk tables:
= f"{chunk_schema}.{chunk_table_name}"
chunk_fqname = f"{chunk_table_name}_{original_index_name}"
chunk_index_name = (
chunk_index_sql
original_index_sql
.replace(original_index_name, chunk_index_name)'public.your_hypertable_name', chunk_fqname)
.replace( )
Each transformed statement creates an index on a specific chunk table rather than the hypertable. These statements can execute concurrently since they operate on independent table partitions.
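A minimal sketch of the concurrent execution (assuming psycopg 3 and a thread pool; each statement gets its own autocommit connection because an index build occupies its session for the duration):

from concurrent.futures import ThreadPoolExecutor
import psycopg

DSN = "dbname=osm"  # assumed connection string

def create_chunk_index(sql: str) -> None:
    # One dedicated connection per CREATE INDEX keeps the builds independent.
    with psycopg.connect(DSN, autocommit=True) as conn:
        conn.execute(sql)

def create_chunk_indexes(chunk_index_sqls: list[str], workers: int = 8) -> None:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consuming the iterator re-raises any exception from a failed build.
        list(pool.map(create_chunk_index, chunk_index_sqls))

The worker count is a tuning knob: each build consumes maintenance_work_mem and I/O bandwidth, so the useful degree of parallelism depends on the hardware.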
TimescaleDB requires chunk indexes to be registered in its internal catalog for proper hypertable management. The parallel creation process must populate _timescaledb_catalog.chunk_index
with metadata for each created chunk index:
INSERT INTO _timescaledb_catalog.chunk_index (
    chunk_id, index_name, hypertable_id, hypertable_index_name
) VALUES (%s, %s, %s, %s)
This registration step links chunk-specific indexes back to their hypertable counterparts. Without proper catalog integration, TimescaleDB cannot recognize or manage the indexes correctly, potentially leading to inconsistent behavior during future operations.
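In script form, the registration might look like this sketch (psycopg 3 assumed; the chunk id and hypertable id come from the discovery query shown earlier):

import psycopg

def register_chunk_index(
    conn: psycopg.Connection,
    chunk_id: int,
    chunk_index_name: str,
    hypertable_id: int,
    hypertable_index_name: str,
) -> None:
    # Links a parallel-created chunk index back to its hypertable index.
    conn.execute(
        """
        INSERT INTO _timescaledb_catalog.chunk_index (
            chunk_id, index_name, hypertable_id, hypertable_index_name
        ) VALUES (%s, %s, %s, %s)
        """,
        (chunk_id, chunk_index_name, hypertable_id, hypertable_index_name),
    )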
After successfully creating all chunk indexes, the process must re-enable the hypertable indexes that were disabled before data ingestion:
UPDATE pg_index SET indislive = TRUE
WHERE indexrelid = 'your_index_name'::regclass
This step is essential because TimescaleDB relies on these hypertable index definitions as templates when creating new chunks in the future. Without re-enabling them, newly created chunks would lack the required indexes.
Finally, the process returns TimescaleDB to normal operation:
SELECT timescaledb_post_restore()
This call finalizes the process and re-enables normal TimescaleDB functionality, including the ability to insert new data and allocate new chunks as needed.
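Taken together, the tail end of the process is small. A sketch (psycopg 3 assumed, re-enabling every index on the hypertable via pg_index.indrelid rather than by individual name):

import psycopg

def finish(conn: psycopg.Connection, hypertable: str) -> None:
    # Re-enable the template indexes that were disabled before the bulk load.
    conn.execute(
        "UPDATE pg_index SET indislive = TRUE WHERE indrelid = %s::regclass",
        (hypertable,),
    )
    # Resume normal TimescaleDB operation (new chunks, background jobs).
    conn.execute("SELECT timescaledb_post_restore()")
    conn.commit()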
The techniques described were developed as part of the OpenStreetMap-NG project, where bulk loading of geographical data highlighted TimescaleDB's sequential index creation bottleneck. The project required importing massive datasets with complex indexes, making the performance difference between sequential and parallel creation particularly acute.
The project's database loading script provides a complete implementation with robust error handling and resource management at https://github.com/openstreetmap-ng/openstreetmap-ng/blob/main/scripts/db_load.py.
This workaround highlights an area where TimescaleDB could benefit from native parallel index support. The underlying PostgreSQL infrastructure already supports parallel index creation within individual tables, and the partitioned nature of hypertables provides natural parallelization boundaries.
Until such support arrives, this approach offers a practical solution for organizations managing large TimescaleDB deployments where index creation performance is critical. The implementation demonstrates how understanding and carefully manipulating underlying database structures can overcome limitations in higher-level abstractions.