I need to merge multiple .CSV files (using the cat command) but without copying the header for each file.
What's the best way to accomplish this task?
I need to merge multiple .CSV files (using the cat command) but without copying the header for each file.
What's the best way to accomplish this task?
You'll need more than the cat command, as described here:
Say you have 3 CSV-files: file1.csv, file2.csv, and file3.csv and want to join them to bigfile.csv and your header is always (only) the first line, then use
either (keep header from first file "file1.csv"):
cat file1.csv <(tail +2 file2.csv) <(tail +2 file3.csv) > bigfile.csv
or (remove header from all files who's names begin with "file"):
awk 'FNR > 1' file*.csv > bigfile.csv
I agree with the top answer but I suggest to extend it with the following scenario (as I can not comment):
If you want the output file to contain header (once) the correct script is:
awk '(NR == 1) || (FNR > 1)' file*.csv > bigfile.csv
FNR represents the number of the processed record in a single file. And NR represents it globally, so first line is accepted and the rest are ignored as before.
You could also use a group command ({ ; }) instead of process substitution (<()):
{ head -n1 file1.csv; for f in file*.csv; do tail -n+2 "$f"; done; } > new.csv
It also works with CRLF line endings as long as the files end with an empty line (\r\n).
The number-only versions of head and tail were made obsolete by POSIX 1003.1-2001, and they result in warnings in some environments.
Needed to concatenate two large CSVs with identical columns into larger CSV for chunking script (data does not have unique id's).
First took header out of second csv
awk 'FNR > 1' file2.csv > file2_noheading.csv
Next, concatenated via the following
cat file1.csv file2_noheading.csv > newfile.csv
Using the command sequence above resulted in a file looking like this:
header,of,csv1
contents,of,csv1
==> csv2.csv
contents,of,csv2
To make it a proper CSV, with one header line and all the relevant values, I employed the following sed incantation... sed -ie "/^$/d;/^==>/d" bigfile.csv
Easier Solution if you have a ton of files :
awk 'FNR > 1' *.csv > merged.csv
Just go back to edit the big file and add the header back in.
awk 'FNR > 1' file*.csv > bigfile.csv? It's not!
– user3439894
Mar 21 '17 at 00:15
file
– big_smile
Sep 26 '19 at 12:43
Give joinem a try, available via PyPi: python3 -m pip install joinem.
joinem provides a CLI for fast, flexbile concatenation of tabular data using polars. I/O is lazily streamed in order to give good performance when working with numerous, large files.
Pass input files via stdin and output file as an argument.
ls -1 path/to/*.csv | python3 -m joinem out.csv
You can add the --progress flag to get a progress bar.
joinem is also compatible with parquet, JSON, and feather file types. See the project's README for more usage examples and a full command-line interface API listing.
disclosure: I am the library author of joinem.
– Lelon Jun 18 '13 at 20:41cat <(cat file1.csv) <(tail +2 file2.csv) <(tail +2 file3.csv) > bigfile.csvtail -n+2,tail +2wouldn't work – Matthieu Napoli Jan 09 '14 at 16:51