Release Notes 0.30
New Features
Timepartition Parallel Method
Added a new parallel method, Timepartition, for time-based data partitioning. It uses a date or datetime column to split the data into time-based chunks, so that each temporal range can be exported in parallel.
Key features:
- Ideal for time-series data and large temporal datasets
- Supports multiple time granularities: year, month, week, and day
- Each parallel thread exports data for a specific time period
Distribute Key Column Format:
The --distributekeycolumn parameter must be specified in a special format when using Timepartition:
- (datecolumn, year, month) - Partition by year and month
- (datecolumn, year, month, day) - Partition by year, month, and day
- (datecolumn, year) - Partition by year only
- (datecolumn, year, week) - Partition by year and week
Example:
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "tpch_copy" `
--user "FastUser" `
--password "FastPassword" `
--sourceschema "tpch_10" `
--sourcetable "orders_date_sorted" `
--directory "D:\temp\TestPartition" `
--fileoutput "orders.csv" `
--method "Timepartition" `
--distributekeycolumn "(o_orderdate,year,month)" `
--paralleldegree 16 `
--merge "False"
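To make the key format concrete, here is a minimal Python sketch of how a value such as "(o_orderdate,year,month)" could be split into a column name and a granularity, and how a date then maps to the time chunk it belongs to. It is not FastBCP's implementation: the function names `parse_distribute_key` and `partition_key` are hypothetical, and the use of ISO week numbering for the week granularity is an assumption.

```python
from datetime import date

# The four granularities listed in these release notes.
ALLOWED = {
    ("year",),
    ("year", "month"),
    ("year", "month", "day"),
    ("year", "week"),
}

def parse_distribute_key(spec):
    """Split "(o_orderdate,year,month)" into (column, granularity)."""
    text = spec.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError(f"expected a parenthesized list, got {spec!r}")
    parts = [p.strip() for p in text[1:-1].split(",")]
    column = parts[0]
    granularity = tuple(p.lower() for p in parts[1:])
    if granularity not in ALLOWED:
        raise ValueError(f"unsupported granularity: {granularity}")
    return column, granularity

def partition_key(d, granularity):
    """Return the tuple identifying the time chunk that date d falls into."""
    if granularity == ("year", "week"):
        iso = d.isocalendar()  # assumption: ISO year and ISO week number
        return (iso[0], iso[1])
    # year / month / day map directly onto attributes of datetime.date
    return tuple(getattr(d, part) for part in granularity)

column, granularity = parse_distribute_key("(o_orderdate,year,month)")
print(column, partition_key(date(1995, 3, 17), granularity))
```

Under this model, all rows whose date produces the same key belong to the same chunk, and each chunk can be handed to a separate parallel thread.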
See Parallel Parameters for complete details.
Parquet Row Group Size Configuration
This feature is available starting from FastBCP version 0.30.1.
Added support for configuring Parquet row group size through the FASTBCP_RGSIZE environment variable. This allows fine-tuning of Parquet file structure for optimal query performance and compression.
Key features:
- Control row group size for Parquet exports
- Default value: 1,000,000 rows
- Configurable via environment variable
- Impacts parallelization and compression efficiency
Usage:
Windows:
# Set the row group size to 500,000 rows
$env:FASTBCP_RGSIZE = "500000"
# Run FastBCP
.\FastBCP.exe --connectiontype mssql --server "localhost" ...
Linux:
# Set the row group size to 500,000 rows
export FASTBCP_RGSIZE=500000
# Run FastBCP
./FastBCP --connectiontype mssql --server "localhost" ...
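As a rough way to reason about the setting, the Python sketch below reads FASTBCP_RGSIZE with the documented default of 1,000,000 rows and estimates how many row groups a single output file would contain. The helper names and the 15,000,000-row figure are hypothetical, and the ceiling-division model is an assumption about how rows are packed into groups, not a description of FastBCP internals.

```python
import os

DEFAULT_RGSIZE = 1_000_000  # default stated in these release notes

def row_group_size():
    """Read FASTBCP_RGSIZE, falling back to the documented default."""
    return int(os.environ.get("FASTBCP_RGSIZE", DEFAULT_RGSIZE))

def estimated_row_groups(total_rows, rg_size):
    """Ceiling division: a final, partially filled group still counts."""
    return (total_rows + rg_size - 1) // rg_size

rows_in_file = 15_000_000  # hypothetical row count for one Parquet file
print(estimated_row_groups(rows_in_file, row_group_size()))
```

With these assumptions, halving the row group size from 1,000,000 to 500,000 doubles the estimated number of row groups in the file from 15 to 30. Smaller groups give readers more units to skip or scan in parallel, while larger groups generally compress better.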
See Parquet Formatting for complete details.
Config File Parameter
Added the --config parameter to support YAML configuration files. This provides a more structured and readable way to manage export configurations compared to JSON settings files.
Key benefits:
- Human-friendly YAML format with comment support
- Structured sections: connection, source, output, performance, logging
- Version control friendly
- Reduces a long, complex command line to a single parameter
YAML Configuration Sections:
- connection: type, server, database, trusted
- source: schema, table
- output: file, directory, delimiter, encoding
- performance: method, degree, distribute_key_column, merge
- logging: run_id
Example:
# FastBCP – MSSQL to CSV using RangeId parallel method
connection:
  type: mssql
  server: localhost
  database: tpch10_collation_bin2
  trusted: true
source:
  schema: dbo
  table: orders
output:
  file: "mssql_orders_{startdate}.csv"
  directory: 'D:\temp\{database}\{schema}\{table}\full\'
  delimiter: "|"
  decimal_separator: "."
  date_format: "yyyy-MM-dd HH:mm:ss"
  encoding: UTF-8
performance:
  method: RangeId
  degree: -2
  distribute_key_column: o_orderkey
  merge: false
logging:
  run_id: mssql_to_csv_parallel-2_rangeid
Usage:
.\FastBCP.exe --config samples\sample_mssql_to_csv.yaml
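To illustrate what "a single parameter" replaces, the Python sketch below expands a nested configuration into a flat argument list. The correspondence between YAML keys and flags is an assumption inferred from matching names in the examples in these notes (for instance `performance.degree` and `--paralleldegree`); it is not taken from the FastBCP documentation. Keys with no obvious flag in these notes, such as `trusted` and `run_id`, are left out.

```python
# Assumed key-to-flag correspondence, inferred from matching names only.
KEY_TO_FLAG = {
    ("connection", "type"): "--connectiontype",
    ("connection", "server"): "--server",
    ("connection", "database"): "--database",
    ("source", "schema"): "--sourceschema",
    ("source", "table"): "--sourcetable",
    ("output", "file"): "--fileoutput",
    ("output", "directory"): "--directory",
    ("performance", "method"): "--method",
    ("performance", "degree"): "--paralleldegree",
    ("performance", "distribute_key_column"): "--distributekeycolumn",
    ("performance", "merge"): "--merge",
}

def to_cli_args(config):
    """Flatten a nested config dict into [flag, value, flag, value, ...]."""
    args = []
    for (section, key), flag in KEY_TO_FLAG.items():
        value = config.get(section, {}).get(key)
        if value is not None:
            args += [flag, str(value)]
    return args

# Mirrors part of the YAML example, written as a plain dict.
config = {
    "connection": {"type": "mssql", "server": "localhost",
                   "database": "tpch10_collation_bin2"},
    "source": {"schema": "dbo", "table": "orders"},
    "performance": {"method": "RangeId", "degree": -2,
                    "distribute_key_column": "o_orderkey", "merge": False},
}

print(" ".join(to_cli_args(config)))
```

The printed line has nine flag and value pairs for this partial configuration alone, which is the kind of command line that a YAML file keeps readable and reviewable under version control.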
See Advanced Parameters for complete details.