FASTBCP_RGSIZE
Overview
The FASTBCP_RGSIZE environment variable sets the row group size, in rows, for Parquet file exports. Row groups are the primary unit for organizing data within a Parquet file, and their size affects both query performance and compression efficiency.
Syntax
Windows (PowerShell)
$env:FASTBCP_RGSIZE = "500000"
Linux/macOS
export FASTBCP_RGSIZE=500000
Configuration
| Property | Value |
|---|---|
| Type | Integer (> 0) |
| Default | 1000000 (1 million rows) |
| Applies to | Parquet file exports |
| Since | FastBCP 0.30 |
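The constraints in the table (an integer greater than 0, defaulting to 1000000) can be checked before launching an export. The following is a minimal sketch of a wrapper-side check, not FastBCP's own parsing logic: the helper name `resolve_rgsize` is hypothetical, and treating an empty value as "unset" is an assumption made for illustration.

```python
import os

DEFAULT_RGSIZE = 1_000_000  # documented default: 1 million rows


def resolve_rgsize(environ=None):
    """Return the row group size implied by FASTBCP_RGSIZE.

    Hypothetical helper: it mirrors the documented constraints
    (integer > 0, default 1,000,000) and does not reproduce how
    FastBCP itself reads the variable.
    """
    env = os.environ if environ is None else environ
    raw = env.get("FASTBCP_RGSIZE")
    # Assumption: a missing or empty value falls back to the default.
    if raw is None or raw.strip() == "":
        return DEFAULT_RGSIZE
    value = int(raw)  # raises ValueError for non-integer text
    if value <= 0:
        raise ValueError(f"FASTBCP_RGSIZE must be > 0, got {value}")
    return value
```

Calling `resolve_rgsize()` with no argument reads the current process environment; passing a dictionary makes the check easy to exercise in isolation.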
What is a Row Group?
A row group is a horizontal partition of data in a Parquet file:
- Parquet files are divided into one or more row groups
- Each row group contains a subset of rows
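- Within a row group, each column's values are stored and compressed separately as a column chunk
- Query engines can skip entire row groups based on the per-row-group statistics (such as minimum and maximum values) recorded in the file metadata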
Parquet File Structure:
├── Row Group 1 (1M rows)
│   ├── Column A (compressed)
│   ├── Column B (compressed)
│   └── Column C (compressed)
├── Row Group 2 (1M rows)
│   ├── Column A (compressed)
│   ├── Column B (compressed)
│   └── Column C (compressed)
└── Footer (metadata)
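To make the structure above concrete, the sketch below estimates how an export of a given size would be split into row groups. It assumes the writer fills each row group up to the configured size and places any remaining rows in a final, smaller group; this is a common layout, but actual FastBCP output may differ (for example, with parallel exports). The function name `row_group_layout` is hypothetical.

```python
def row_group_layout(total_rows, rgsize=1_000_000):
    """Return the expected number of rows in each row group.

    Assumption: row groups are filled sequentially up to `rgsize`,
    with the remainder going into one last, smaller group.
    """
    if total_rows < 0:
        raise ValueError("total_rows must be >= 0")
    if rgsize <= 0:
        raise ValueError("rgsize must be > 0")
    full_groups, remainder = divmod(total_rows, rgsize)
    sizes = [rgsize] * full_groups
    if remainder:
        sizes.append(remainder)
    return sizes


# 2.5 million rows with the default size: two full groups plus a partial one.
print(row_group_layout(2_500_000))                 # [1000000, 1000000, 500000]
# The same export with FASTBCP_RGSIZE=500000 yields five groups.
print(len(row_group_layout(2_500_000, 500_000)))   # 5
```

Under these assumptions, halving the row group size roughly doubles the number of row groups, which gives query engines finer-grained units to skip at the cost of more metadata entries in the footer.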