Splitting csv files with Powershell

I’ve blogged before about the usefulness of Powershell for data tasks. A few weeks ago I had a requirement at work for merging csv files and recently I needed to split a single csv file into several files.

While this is easy to do using SSIS and a bit of T-SQL it is a little tedious, and more time consuming, that it needs to be for adhoc jobs. Again Powershell came to the rescue! Here’s an example.

First I created a csv called bigCsvfile.csv, with 1, 000 records of data and placed it into my user profile directory; C:\Users\Rhys on my laptop.

splitting_csv_files_with_powershell_1

This Powershell script does require a sequential integer id for each record to work correctly as it uses this to divide up the records. Now assuming you have a csv file, called bigCsvFile.csv, in the correct location then the only thing you should need to change is the $split variable value. This is now many files you would like to split the original file into.

?View Code POWERSHELL
# Author: Rhys Campbell
# 2009-09-22
# Splitting up csv files
 
# Path to csv file. Must contain a sequential unique integer id column 
$csvFile = "$Env:USERPROFILE\bigcsvFile.csv";
# Slit into how many files?
$split = 10;
 
# Get the csv file content
$content = Import-Csv $csvFile;
 
# So we start from Id = 1 in the csv file
$start = 1;
$end = 0;
 
# calc records per file
$records_per_file = [int][Math]::Ceiling($content.Count / $split);
 
for($i = 1; $i -le $split; $i++)
{
	# Set the end value for selecting records
	$end += $records_per_file;
	# Need to cast to int or we get an alphabetic comparison when we want a numeric one
	$content | Where-Object {[int]$_.Id -ge $start -and [int]$_.Id -le $end} | Export-Csv -Path "$Env:USERPROFILE\file$i.csv" -NoTypeInformation;
	# Update start value for selecting records
	$start = $end + 1;
}

Once the script has executed successfully it will create the split csv files in your user profile folder. The new files are named sequentially, i.e. file1.csv, file2.csv, file3.csv etc.

split_csv_files

Depending on the number of records, and the number of files split to, the records may not be the same in each file. Provided the number of records versus file split is reasonable it will be pretty close. Now with a Powershell script it takes just a few moments to perform those previously tedious tasks!


2 Comments

  1. badal says:

    Hey mate, I found your blog – “Splitting CSV files with PowerShell” is really useful.

    In my case, I am fetching the data from some other third party system by consuming web service then converting from XML output to CSV file and uploading the CSV file automatically into SharePoint list.

    The problem is that in the source XML data there is no sequential unique integer id column which is finally converting into CSV.

    I am facing challenge to insert the sequential unique integer id column using PowerShell script. Do you any chance know how can we achieve that would be really appreciate your help mate.

    Thanks.
    Cheers,
    Badal

  2. Rhys says:

    Hi Badal,

    Something like this should work…


    $content = Import-Csv -Path "C:\Path\to\file\message.csv";
    $id = 1;

    foreach ($row in $content)
    {
    $row | Add-Member –MemberType NoteProperty -Name id -Value $id;
    $id++;
    }

    $content | Export-Csv -Path "C:\Path\to\file\tmp.csv" -NoType;

    Although if this files were big I’d probably import them into a database, and then export them with a id added there.

    Cheers,

    Rhys

Leave a Reply