Parquet Formatting
Apache Parquet is a columnar storage format optimized for analytics. FastBCP automatically uses Parquet format when the output file has a .parquet extension.
Parquet Compression
Use the --parquetcompression parameter to specify the parquet compression codec.
./FastBCP \
...
--fileoutput "orders.parquet" \
--parquetcompression Snappy \
...
Syntax:
- Long form:
--parquetcompression <algorithm>
Default: Zstd (best compression ratio with good speed)
Available Algorithms:
- None - No compression
- Snappy - Fast, small compression, classic
- Gzip - Smaller files, slower
- Lzo - Slower, medium compression (rarely used)
- Lz4 - Slower, moderate compression
- Zstd - Best compression and fast (default)
Zstd, the default, is the best choice for most cases. It provides the best compression ratio at moderate speed, which makes it well suited to cloud storage, where smaller files reduce storage costs and transfer time.
- Because Zstd is the default, you do not need to set this parameter to use it.
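A wrapper script can check the codec name against the list above before launching an export. The following is a minimal sketch, not part of FastBCP: the function name is_parquet_codec is hypothetical, and it assumes the names must be spelled exactly as listed (whether FastBCP accepts other casings is not stated here).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of FastBCP.
# Succeeds (exit status 0) only for the codec names listed in this section.
# Assumption: names are matched exactly as written in the list.
is_parquet_codec() {
  case "$1" in
    None|Snappy|Gzip|Lzo|Lz4|Zstd) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: check a name before passing it to --parquetcompression.
codec="Snappy"
if is_parquet_codec "$codec"; then
  echo "OK: --parquetcompression $codec"
else
  echo "Unknown codec: $codec" >&2
fi
```

Rejecting an unrecognized name up front keeps a typo from reaching a long-running export.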
Row Group Size Configuration
This feature is available starting from FastBCP version 0.30.1.
You can control the Parquet row group size using the FASTBCP_RGSIZE environment variable. Row groups are the unit of parallelization in Parquet files, and their size can impact query performance and memory usage.
Environment Variable: FASTBCP_RGSIZE
Default Value: 1000000 (1 million rows)
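To get a feel for how the setting affects file layout, the number of row groups can be estimated from the row count. This is a rough sketch under a stated assumption: every row group except the last holds exactly FASTBCP_RGSIZE rows. The helper name row_group_count is hypothetical and not part of FastBCP.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of FastBCP.
# Estimates the number of row groups as ceil(rows / rgsize).
# Assumption: each row group is filled to rgsize rows, except possibly the last.
row_group_count() {
  local rows=$1
  local rgsize=${2:-1000000}   # falls back to the documented default
  echo $(( (rows + rgsize - 1) / rgsize ))
}

row_group_count 2500000 500000   # prints 5
row_group_count 2500000          # prints 3
```

Under this assumption, a smaller row group size yields more row groups for the same export, each holding fewer rows.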
Usage:
- Windows
# Set the row group size to 500,000 rows
$env:FASTBCP_RGSIZE = "500000"
# Run FastBCP
.\FastBCP.exe --connectiontype mssql --server "localhost" ...
- Linux
# Set the row group size to 500,000 rows
export FASTBCP_RGSIZE=500000
# Run FastBCP
./FastBCP --connectiontype mssql --server "localhost" ...
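Since the value is a row count, a script may want to confirm it is a positive integer before exporting it. The guard below is a sketch only: valid_rgsize is a hypothetical name, and how FastBCP itself reacts to an invalid value is not described here.

```shell
#!/usr/bin/env bash
# Hypothetical guard, not part of FastBCP.
# Succeeds only when the argument is a positive integer written in digits.
valid_rgsize() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit such as ","
  esac
  [ "$1" -gt 0 ]
}

rgsize="500000"
if valid_rgsize "$rgsize"; then
  export FASTBCP_RGSIZE="$rgsize"
else
  echo "FASTBCP_RGSIZE must be a positive integer, got '$rgsize'" >&2
fi
```

A value such as "500,000" fails the check because of the comma, which catches a common copy-and-paste mistake.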
Complete Example
- Windows
$env:FASTBCP_RGSIZE = "500000"
.\FastBCP.exe `
--connectiontype mssql `
--server "localhost" `
--database "sales" `
--trusted `
--query "SELECT * FROM orders WHERE OrderDate >= '2024-01-01'" `
--directory "C:\exports" `
--fileoutput "orders.parquet" `
--parquetcompression Snappy
- Linux
export FASTBCP_RGSIZE=500000
./FastBCP \
--connectiontype mssql \
--server "localhost" \
--database "sales" \
--trusted \
--query "SELECT * FROM orders WHERE OrderDate >= '2024-01-01'" \
--directory "/exports" \
--fileoutput "orders.parquet" \
--parquetcompression Snappy