Advanced Scenarios
Advanced use cases, partition functions, and troubleshooting examples.
SQL Server Partition Functions
Use SQL Server partition functions for efficient parallel exports.
Using Partition Function with DataDriven
Leverage existing partition functions to distribute work:
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "tpch10_collation_bin2" `
--trusted `
--sourceschema "dbo" `
--sourcetable "orders_part" `
--directory "D:\temp" `
--fileoutput "orders_partitioned.csv" `
--decimalseparator "." `
--delimiter "|" `
--dateformat "yyyy-MM-dd HH:mm:ss" `
--encoding "UTF-8" `
--parallelmethod "DataDriven" `
--distributekeycolumn "`$partition.PF_DATE(o_orderdate)" `
--paralleldegree 10 `
--merge "False"
Use `$partition.FunctionName(ColumnName)` to reference a partition function. In PowerShell, the backtick escapes the dollar sign so that the literal `$partition` reference is passed through to SQL Server.
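For context, a partition function like the PF_DATE referenced above is defined on the SQL Server side. The definition below is a hypothetical sketch (PF_DATE and its boundary dates are illustrative assumptions, not part of the export command); the second statement shows how `$PARTITION` maps rows to partition numbers, which is what FastBCP uses to distribute work:

```sql
-- Hypothetical definition of PF_DATE; boundary values are examples only.
CREATE PARTITION FUNCTION PF_DATE (date)
AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01', '2025-01-01');

-- $PARTITION returns the partition number a value maps to.
-- Each distinct partition_id becomes one unit of parallel work.
SELECT $PARTITION.PF_DATE(o_orderdate) AS partition_id, COUNT(*) AS row_count
FROM dbo.orders_part
GROUP BY $PARTITION.PF_DATE(o_orderdate);
```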
Date-based Partitioning
Partition by date ranges for time-series data using database partitions:
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "TimeSeriesDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "SensorReadings" `
--directory "C:\temp" `
--fileoutput "sensor_readings.parquet" `
--parallelmethod "DataDriven" `
--distributekeycolumn "`$partition.PF_Monthly(reading_date)" `
--paralleldegree 12 `
--merge "False"
Partition by date ranges for time-series data using the FastBCP Timepartition parallel method:
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "TimeSeriesDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "SensorReadings" `
--directory "s3://mycorpexportbucket/raw/TimeSeriesDB/SensorReadings/" `
--fileoutput "sensor_readings.parquet" `
--cloudprofile "mycorps3profile" `
--parallelmethod "Timepartition" `
--distributekeycolumn "(reading_date,year,month,day)" `
--paralleldegree 12 `
--merge "False"
This technique produces one file per day inside a Hive-style partitioned directory structure (local or cloud): /year=2026/month=11/day=11/sensor_readings.parquet
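To make the directory layout concrete, the snippet below derives the Hive-style path for a sample reading_date. This is an illustration of the path scheme shown above (using GNU `date`), not FastBCP's internal logic:

```shell
# Derive the Hive-style partition path for a sample reading_date value.
# Path layout (year=/month=/day=) taken from the example above.
d="2026-11-11"                        # sample reading_date
year=$(date -d "$d" +%Y)
month=$(date -d "$d" +%m)
day=$(date -d "$d" +%d)
echo "year=${year}/month=${month}/day=${day}/sensor_readings.parquet"
```

Tools such as Spark, DuckDB, or Athena recognize this `key=value` layout and can prune partitions when filtering on year, month, or day.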
After a full export, you can use a query to filter your data and overwrite the files that were already generated for the overlapping period.
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "TimeSeriesDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "SensorReadings_Part" `
--query "SELECT * FROM dbo.SensorReadings_Part WHERE reading_date >= CAST(GETDATE() - 10 AS DATE)" `
--directory "s3://mycorpexportbucket/raw/TimeSeriesDB/SensorReadings/" `
--fileoutput "sensor_readings.parquet" `
--cloudprofile "mycorps3profile" `
--parallelmethod "Timepartition" `
--distributekeycolumn "(reading_date,year,month,day)" `
--paralleldegree 10 `
--merge "False"
Complex Queries
Multi-table Joins
Export data with a query with joins:
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "SalesDB" `
--trusted `
--query "SELECT o.order_id, o.order_date, c.customer_name, od.product_id, od.quantity, od.unit_price FROM orders o INNER JOIN customers c ON o.customer_id = c.customer_id INNER JOIN order_details od ON o.order_id = od.order_id WHERE o.order_date >= '2024-01-01'" `
--directory "C:\temp" `
--fileoutput "sales_detail.csv" `
--decimalseparator "." `
--delimiter "|" `
--dateformat "yyyy-MM-dd HH:mm:ss" `
--encoding "UTF-8" `
--parallelmethod "Ntile" `
--distributekeycolumn "o.order_id" `
--paralleldegree 8 `
--merge "False"
Aggregated Data
Export pre-aggregated data:
.\FastBCP.exe `
--connectiontype "pgsql" `
--server "localhost" `
--port "5432" `
--database "analytics" `
--user "postgres" `
--password "postgres" `
--query "SELECT date_trunc('day', order_date) as order_day, customer_region, COUNT(*) as order_count, SUM(total_amount) as total_sales FROM orders GROUP BY date_trunc('day', order_date), customer_region" `
--directory "C:\temp" `
--fileoutput "daily_sales_by_region.csv" `
--decimalseparator "." `
--delimiter "|" `
--dateformat "yyyy-MM-dd" `
--encoding "UTF-8" `
--parallelmethod "None"
Performance Tuning
Optimal Parallel Degree
- Auto-detect
- Manual
Use negative values to auto-calculate based on CPU cores:
# Use all cores minus 2
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "LargeDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "BigTable" `
--directory "C:\temp" `
--fileoutput "bigtable.parquet" `
--parallelmethod "RangeId" `
--distributekeycolumn "id" `
--paralleldegree -2 `
--merge "False"
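The comment above describes -2 as "all cores minus 2". A minimal sketch of that resolution, assuming the cores-minus-N interpretation (this is not FastBCP's exact internal formula):

```shell
# Sketch: what --paralleldegree -2 might resolve to on this machine,
# assuming a negative value means "CPU cores minus N".
cores=$(nproc)                     # logical CPU count
requested=-2
degree=$((cores + requested))
if [ "$degree" -lt 1 ]; then degree=1; fi   # guard small machines
echo "$degree"
```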
Specify exact thread count:
# Use exactly 16 threads
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "LargeDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "BigTable" `
--directory "C:\temp" `
--fileoutput "bigtable.parquet" `
--parallelmethod "RangeId" `
--distributekeycolumn "id" `
--paralleldegree 16 `
--merge "False"
Parquet RowGroup Size
To reduce memory pressure and, in many cases, improve export performance, you can configure the Parquet row group size by setting the FASTBCP_RGSIZE environment variable.
A smaller FASTBCP_RGSIZE value reduces memory usage during export. However, this comes with a trade-off: smaller row groups may reduce Parquet compression efficiency and can sometimes increase the final file size.
Recommended values range from 1,000 to 1,000,000.
In practice, values between 32,000 and 64,000 often provide a good balance between memory usage, performance, and compression efficiency.
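As a rough back-of-the-envelope check, you can estimate how much row data is buffered before a row group is flushed. The 200-byte average row size and the per-thread buffering model below are assumptions for illustration, not FastBCP internals:

```shell
# Rough memory estimate: RGSIZE rows x average row bytes x parallel threads.
rgsize=32000
avg_row_bytes=200     # assumed average; measure your own table
threads=8
bytes=$((rgsize * avg_row_bytes * threads))
echo "~$((bytes / 1024 / 1024)) MiB buffered across $threads threads"
```

Doubling FASTBCP_RGSIZE roughly doubles this buffer, which is why large values can cause memory pressure on wide tables.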
- Windows
- Linux
$Env:FASTBCP_RGSIZE=32000
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "LargeDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "BigTable" `
--directory "C:\temp" `
--fileoutput "bigtable.parquet" `
--parallelmethod "RangeId" `
--distributekeycolumn "id" `
--paralleldegree -2 `
--merge "False"
On Linux, set the environment variable with export before running FastBCP:
export FASTBCP_RGSIZE=32000
./FastBCP \
--connectiontype "mssql" \
--server "mssqlhost.domain.lan,1433" \
--database "LargeDB" \
--user "FastUSer" \
--password "FastPassword" \
--sourceschema "dbo" \
--sourcetable "BigTable" \
--directory "/data/export/bigtable/" \
--fileoutput "bigtable.parquet" \
--parallelmethod "RangeId" \
--distributekeycolumn "id" \
--paralleldegree 16 \
--merge "False"
Merge vs No-Merge
- With Merge
- Without Merge
Combine parallel outputs into single file:
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "DataDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "Data" `
--directory "C:\temp" `
--fileoutput "data_merged.parquet" `
--parallelmethod "RangeId" `
--distributekeycolumn "id" `
--paralleldegree 8 `
--merge "True"
Result: a single file, data_merged.parquet. Merging is slower and is supported only for on-premises (local) destinations.
Keep separate files for parallel processing:
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "DataDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "Data" `
--directory "C:\temp" `
--fileoutput "data_part.parquet" `
--parallelmethod "RangeId" `
--distributekeycolumn "id" `
--paralleldegree 8 `
--merge "False"
Result: Multiple files data_part_Chunk000.parquet, data_part_Chunk001.parquet, etc.
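Downstream tools can consume the chunked output directly with a glob pattern, so a merge is often unnecessary. A minimal sketch of the naming scheme (using empty placeholder files rather than a real export):

```shell
# Illustrate the chunk naming scheme produced with --merge "False".
dir=$(mktemp -d)
touch "$dir/data_part_Chunk000.parquet" "$dir/data_part_Chunk001.parquet"
# Consumers (Spark, DuckDB, etc.) typically read all chunks via a glob:
ls "$dir"/data_part_Chunk*.parquet
```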
Enable Debug Logging
Get detailed execution information:
.\FastBCP.exe `
--connectiontype "mssql" `
--server "localhost" `
--database "TestDB" `
--trusted `
--sourceschema "dbo" `
--sourcetable "TestTable" `
--directory "C:\temp" `
--fileoutput "debug_export.csv" `
--parallelmethod "Random" `
--distributekeycolumn "id" `
--paralleldegree 4 `
--merge "False" `
--loglevel "Debug" `
--runid "debug_test_run"
Use --loglevel "Debug" to see detailed execution steps, connection strings (without passwords), and query generation details.
Build your command with the Wizard