Merging Multiple CSV Files without merging the header

Question

I need to merge multiple .CSV files (using the cat command) but without copying the header for each file.

What's the best way to accomplish this task?

score 45 · Answer 1 · edited Jul 26 '19 at 03:36

45

You'll need more than the cat command, as described here:

Say you have 3 CSV-files: file1.csv, file2.csv, and file3.csv and want to join them to bigfile.csv and your header is always (only) the first line, then use

either (keep header from first file "file1.csv"):

cat file1.csv <(tail +2 file2.csv) <(tail +2 file3.csv) > bigfile.csv

or (remove header from all files who's names begin with "file"):

awk 'FNR > 1' file*.csv > bigfile.csv

edited Jul 26 '19 at 03:36

Lynn

3

answered Jan 31 '13 at 21:16

iolsmit

4,447

4

I found this looking for a generic linux answer, but in my case this didn't exactly work. It would silently ignore file1.csv. I needed to cat that file.
cat <(cat file1.csv) <(tail +2 file2.csv) <(tail +2 file3.csv) > bigfile.csv
– Lelon Jun 18 '13 at 20:41
actually I had to use tail -n+2, tail +2 wouldn't work – Matthieu Napoli Jan 09 '14 at 16:51

score 35 · Answer 2 · answered Jan 15 '19 at 08:40

35

I agree with the top answer but I suggest to extend it with the following scenario (as I can not comment):

If you want the output file to contain header (once) the correct script is:

awk '(NR == 1) || (FNR > 1)' file*.csv > bigfile.csv

FNR represents the number of the processed record in a single file. And NR represents it globally, so first line is accepted and the rest are ignored as before.

answered Jan 15 '19 at 08:40

Marek Grác

451

I keep comming back to this answer everytime I come across a CSV file merge on command line, requiring to keep the header of only one file. – lsdr Dec 17 '20 at 00:31

Lri · Answer 3 · 2013-04-12T10:25:27.517

9

You could also use a group command ({ ; }) instead of process substitution (<()):

{ head -n1 file1.csv; for f in file*.csv; do tail -n+2 "$f"; done; } > new.csv

It also works with CRLF line endings as long as the files end with an empty line (\r\n).

The number-only versions of head and tail were made obsolete by POSIX 1003.1-2001, and they result in warnings in some environments.

edited Apr 12 '13 at 10:25

answered Apr 12 '13 at 10:12

Lri

105,117

score 2 · Answer 4 · edited Mar 11 '17 at 15:15

2

Needed to concatenate two large CSVs with identical columns into larger CSV for chunking script (data does not have unique id's).

First took header out of second csv

awk 'FNR > 1' file2.csv > file2_noheading.csv

Next, concatenated via the following

cat file1.csv file2_noheading.csv > newfile.csv

edited Mar 11 '17 at 15:15

nohillside

100,768

answered Mar 11 '17 at 13:49

mcconnelljk

21

score 1 · Answer 5 · answered Oct 29 '13 at 18:07

Using the command sequence above resulted in a file looking like this:

header,of,csv1
contents,of,csv1
==> csv2.csv

contents,of,csv2

To make it a proper CSV, with one header line and all the relevant values, I employed the following sed incantation... sed -ie "/^$/d;/^==>/d" bigfile.csv

score 1 · Answer 6 · edited Mar 21 '17 at 06:40

1

Easier Solution if you have a ton of files :

awk 'FNR > 1' *.csv > merged.csv

Just go back to edit the big file and add the header back in.

edited Mar 21 '17 at 06:40

nohillside

100,768

answered Mar 20 '17 at 23:51

brian

31

How is your answer any different than what was all ready presented by iolsmit in 2013, being awk 'FNR > 1' file*.csv > bigfile.csv? It's not! – user3439894 Mar 21 '17 at 00:15
Re: how is it any different? It is a more terse answer and the one that I copied and pasted, at least : ) Gets my upvote – Rick Davies Oct 17 '18 at 16:14
1

This is a good answer, because you don’t need all the files to start with file – big_smile Sep 26 '19 at 12:43

score 1 · Answer 7 · answered Feb 20 '24 at 00:19

Give joinem a try, available via PyPi: python3 -m pip install joinem.

joinem provides a CLI for fast, flexbile concatenation of tabular data using polars. I/O is lazily streamed in order to give good performance when working with numerous, large files.

Example Usage

Pass input files via stdin and output file as an argument.

ls -1 path/to/*.csv | python3 -m joinem out.csv

You can add the --progress flag to get a progress bar.

Further Information

joinem is also compatible with parquet, JSON, and feather file types. See the project's README for more usage examples and a full command-line interface API listing.

disclosure: I am the library author of joinem.

Merging Multiple CSV Files without merging the header

7 Answers7

Example Usage

Further Information