Advanced Scenarios
Advanced use cases, partition functions, and troubleshooting examples.
SQL Server Partition Functions
Use SQL Server partition functions for efficient parallel exports.
Using Partition Function with DataDriven
Leverage existing partition functions to distribute work:
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "tpch10_collation_bin2" `
--trusted `
--sourceschema "dbo" `
--sourcetable "orders_part" `
--directory "D:\temp" `
--fileoutput "orders_partitioned.csv" `
--decimalseparator "." `
--delimiter "|" `
--dateformat "yyyy-MM-dd HH:mm:ss" `
--encoding "UTF-8" `
--parallelmethod "DataDriven" `
--distributekeycolumn "`$partition.PF_DATE(o_orderdate)" `
--paralleldegree 10 `
--merge "False"
Use `$partition.FunctionName(ColumnName)` to reference a partition function. In PowerShell, the backtick escapes the dollar sign so that the literal `$partition` reference is passed through to SQL Server.
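For context, a partition function like the PF_DATE referenced above is defined on the SQL Server side. The definition below is a hypothetical sketch (PF_DATE and its boundary dates are illustrative assumptions, not part of the export command); the second statement shows how `$PARTITION` maps rows to partition numbers, which is what FastBCP uses to distribute work:

```sql
-- Hypothetical definition of PF_DATE; boundary values are examples only.
CREATE PARTITION FUNCTION PF_DATE (date)
AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01', '2025-01-01');

-- $PARTITION returns the partition number a value maps to.
-- Each distinct partition_id becomes one unit of parallel work.
SELECT $PARTITION.PF_DATE(o_orderdate) AS partition_id, COUNT(*) AS row_count
FROM dbo.orders_part
GROUP BY $PARTITION.PF_DATE(o_orderdate);
```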
Date-based Partitioning
Partition by date ranges for time-series data using database partitions:
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "TimeSeriesDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "SensorReadings" `
--directory "C:\temp" `
--fileoutput "sensor_readings.parquet" `
--parallelmethod "DataDriven" `
--distributekeycolumn "`$partition.PF_Monthly(reading_date)" `
--paralleldegree 12 `
--merge "False"
Partition by date ranges for time-series data using the FastBCP Timepartition parallel method:
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "TimeSeriesDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "SensorReadings" `
--directory "s3://mycorpexportbucket/raw/TimeSeriesDB/SensorReadings/" `
--fileoutput "sensor_readings.parquet" `
--cloudprofile "mycorps3profile" `
--parallelmethod "Timepartition" `
--distributekeycolumn "(reading_date,year,month,day)" `
--paralleldegree 12 `
--merge "False"
This technique produces one file per day inside a Hive-style partitioned directory structure (local or cloud): /year=2026/month=11/day=11/sensor_readings.parquet
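To make the directory layout concrete, the snippet below derives the Hive-style path for a sample reading_date. This is an illustration of the path scheme shown above (using GNU `date`), not FastBCP's internal logic:

```shell
# Derive the Hive-style partition path for a sample reading_date value.
# Path layout (year=/month=/day=) taken from the example above.
d="2026-11-11"                        # sample reading_date
year=$(date -d "$d" +%Y)
month=$(date -d "$d" +%m)
day=$(date -d "$d" +%d)
echo "year=${year}/month=${month}/day=${day}/sensor_readings.parquet"
```

Tools such as Spark, DuckDB, or Athena recognize this `key=value` layout and can prune partitions when filtering on year, month, or day.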
After a full export, you can use a query to filter your data and overwrite the files that were already generated for the overlapping period.
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "TimeSeriesDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "SensorReadings_Part" `
--query "SELECT * FROM dbo.SensorReadings_Part WHERE reading_date >= CAST(GETDATE() - 10 AS DATE)" `
--directory "s3://mycorpexportbucket/raw/TimeSeriesDB/SensorReadings/" `
--fileoutput "sensor_readings.parquet" `
--cloudprofile "mycorps3profile" `
--parallelmethod "Timepartition" `
--distributekeycolumn "(reading_date,year,month,day)" `
--paralleldegree 10 `
--merge "False"
Complex Queries
Multi-table Joins
Export data with a query with joins:
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "SalesDB" `
--trusted `
--query "SELECT o.order_id, o.order_date, c.customer_name, od.product_id, od.quantity, od.unit_price FROM orders o INNER JOIN customers c ON o.customer_id = c.customer_id INNER JOIN order_details od ON o.order_id = od.order_id WHERE o.order_date >= '2024-01-01'" `
--directory "C:\temp" `
--fileoutput "sales_detail.csv" `
--decimalseparator "." `
--delimiter "|" `
--dateformat "yyyy-MM-dd HH:mm:ss" `
--encoding "UTF-8" `
--parallelmethod "Ntile" `
--distributekeycolumn "o.order_id" `
--paralleldegree 8 `
--merge "False"
Aggregated Data
Export pre-aggregated data:
.\FastBCP.exe `
--connectiontype "pgsql" `
--server "localhost" `
--port "5432" `
--database "analytics" `
--user "postgres" `
--password "postgres" `
--query "SELECT date_trunc('day', order_date) as order_day, customer_region, COUNT(*) as order_count, SUM(total_amount) as total_sales FROM orders GROUP BY date_trunc('day', order_date), customer_region" `
--directory "C:\temp" `
--fileoutput "daily_sales_by_region.csv" `
--decimalseparator "." `
--delimiter "|" `
--dateformat "yyyy-MM-dd" `
--encoding "UTF-8" `
--parallelmethod "None"
Performance Tuning
Optimal Parallel Degree
- Auto-detect
- Manual
Use negative values to auto-calculate based on CPU cores:
# Use all cores minus 2
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "LargeDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "BigTable" `
--directory "C:\temp" `
--fileoutput "bigtable.parquet" `
--parallelmethod "RangeId" `
--distributekeycolumn "id" `
--paralleldegree -2 `
--merge "False"
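The comment above describes -2 as "all cores minus 2". A minimal sketch of that resolution, assuming the cores-minus-N interpretation (this is not FastBCP's exact internal formula):

```shell
# Sketch: what --paralleldegree -2 might resolve to on this machine,
# assuming a negative value means "CPU cores minus N".
cores=$(nproc)                     # logical CPU count
requested=-2
degree=$((cores + requested))
if [ "$degree" -lt 1 ]; then degree=1; fi   # guard small machines
echo "$degree"
```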
Specify exact thread count:
# Use exactly 16 threads
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "LargeDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "BigTable" `
--directory "C:\temp" `
--fileoutput "bigtable.parquet" `
--parallelmethod "RangeId" `
--distributekeycolumn "id" `
--paralleldegree 16 `
--merge "False"
Parquet RowGroup Size
To reduce memory pressure and, in many cases, improve export performance, you can configure the Parquet row group size by setting the FASTBCP_RGSIZE environment variable.
A smaller FASTBCP_RGSIZE value reduces memory usage during export. However, this comes with a trade-off: smaller row groups may reduce Parquet compression efficiency and can sometimes increase the final file size.
Recommended values range from 1,000 to 1,000,000.
In practice, values between 32,000 and 64,000 often provide a good balance between memory usage, performance, and compression efficiency.
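As a rough back-of-the-envelope check, you can estimate how much row data is buffered before a row group is flushed. The 200-byte average row size and the per-thread buffering model below are assumptions for illustration, not FastBCP internals:

```shell
# Rough memory estimate: RGSIZE rows x average row bytes x parallel threads.
rgsize=32000
avg_row_bytes=200     # assumed average; measure your own table
threads=8
bytes=$((rgsize * avg_row_bytes * threads))
echo "~$((bytes / 1024 / 1024)) MiB buffered across $threads threads"
```

Doubling FASTBCP_RGSIZE roughly doubles this buffer, which is why large values can cause memory pressure on wide tables.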
- Windows
- Linux
$Env:FASTBCP_RGSIZE=32000
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "LargeDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "BigTable" `
--directory "C:\temp" `
--fileoutput "bigtable.parquet" `
--parallelmethod "RangeId" `
--distributekeycolumn "id" `
--paralleldegree -2 `
--merge "False"
On Linux, set the environment variable with export before running FastBCP:
export FASTBCP_RGSIZE=32000
./FastBCP \
--connectiontype "mssql" \
--server "mssqlhost.domain.lan,1433" \
--database "LargeDB" \
--user "FastUSer" \
--password "FastPassword" \
--sourceschema "dbo" \
--sourcetable "BigTable" \
--directory "/data/export/bigtable/" \
--fileoutput "bigtable.parquet" \
--parallelmethod "RangeId" \
--distributekeycolumn "id" \
--paralleldegree 16 \
--merge "False"
Merge vs No-Merge
- With Merge
- Without Merge
Combine parallel outputs into single file:
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "DataDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "Data" `
--directory "C:\temp" `
--fileoutput "data_merged.parquet" `
--parallelmethod "RangeId" `
--distributekeycolumn "id" `
--paralleldegree 8 `
--merge "True"
Result: a single file, data_merged.parquet. Merging is slower and is supported only for on-premises (local) destinations.
Keep separate files for parallel processing:
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "DataDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "Data" `
--directory "C:\temp" `
--fileoutput "data_part.parquet" `
--parallelmethod "RangeId" `
--distributekeycolumn "id" `
--paralleldegree 8 `
--merge "False"
Result: Multiple files data_part_Chunk000.parquet, data_part_Chunk001.parquet, etc.
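Downstream tools can consume the chunked output directly with a glob pattern, so a merge is often unnecessary. A minimal sketch of the naming scheme (using empty placeholder files rather than a real export):

```shell
# Illustrate the chunk naming scheme produced with --merge "False".
dir=$(mktemp -d)
touch "$dir/data_part_Chunk000.parquet" "$dir/data_part_Chunk001.parquet"
# Consumers (Spark, DuckDB, etc.) typically read all chunks via a glob:
ls "$dir"/data_part_Chunk*.parquet
```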
Enable Debug Logging
Get detailed execution information:
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "TestDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "TestTable" `
--directory "C:\temp" `
--fileoutput "debug_export.csv" `
--parallelmethod "Random" `
--distributekeycolumn "id" `
--paralleldegree 4 `
--merge "False" `
--loglevel "Debug" `
--runid "debug_test_run"
Use --loglevel "Debug" to see detailed execution steps, connection strings (without passwords), and query generation details.
Build your command with the Wizard