35

I want to merge several hundred pdf files in a directory automatically according to their file names.

E.g.

The files 1000.1.pdf 1000.2.pdf 1000.3.pdf 1000.x.pdf should be merged into 1000.pdf

and

2000.abc.pdf 2000.def.pdf 2000.ghi.pdf 2000.jkl.pdf 2000.5.pdf into 2000.pdf.

I don’t want to use solutions based on Preview or Automator. Compared to third-party software like Adobe Acrobat or PDFpen, merging PDF files with them often results in a significant increase in file size, depending on the source documents (see e.g. “What causes PDF file size to increase when saving in Preview?”).

Do you have any recommendations? Thank you!

lejonet
  • 1,530
  • Combining PDFs will always increase file size, what exactly is your issue with that? – nohillside Aug 18 '13 at 13:21
  • 1
    @patrix I am speaking of hundreds of files to be merged. As linked above there can be a significant increase in file size with different tools included in Mac OS X. Why should I want a merged text file with an increase in size of sometimes several hundred percent? – lejonet Aug 18 '13 at 14:47
  • If you merge 10 files with 1 MB each, I'm not too surprised if the resulting file is 10 MB. What else should happen here? – nohillside Aug 18 '13 at 15:35
  • 2
    @patrix, lejonet8 wants an automatic merge of PDF files at a 1+1=2 ratio in file size, not 1+1=5 or more. Furthermore, lejonet8 clearly points out that the use of Apple products is not acceptable due to their poor performance compared to third-party products! I do not know why you delete my comments, but so be it. – Ruskes Aug 18 '13 at 15:39
  • @Buscar웃 That's what I'm trying to find out. The linked article is related to 10.5, it might help if the asker could cite specific sizing examples for his scenario. – nohillside Aug 18 '13 at 15:41
  • 1
    I do not understand the automation request. It is very simple and fast to sort files by name, select all files in the wanted category, and do the merge in one click in any of the available programs, such as those answered here. The resulting file size will depend on the type and content of the PDF files, so 1+1=2 is not possible. lejonet8 seems to be hung up on the argument about why Apple programs create bigger PDF files than others. Good luck in answering that. – Ruskes Aug 18 '13 at 15:59
  • @patrix Thank you for your comment. When I merge 10 files with 1 MB each I don't get a 10 MB document, but sometimes a 30 or 40 MB document, depending on the source files (I said this several times before). You can easily try this out, merge some files with Preview or Automator and you'll often get bloated results while merging in third party software has no comparable side effects. All I want to do is automate merging according to file names using suitable third-party software. – lejonet Aug 18 '13 at 16:51
  • @Buscar웃 There is not much automation in your solution ("very simple"). I don’t want to pick files manually, I did this before with PDFpen Pro and Adobe Acrobat (with a reasonable resulting file size), but it takes ages. – lejonet Aug 18 '13 at 16:54
  • 2
    Can you please edit the question to describe in more details what "automated" means for you (it seems to mean different things for the people who took the time to propose solutions for your problem)? What should trigger the merging of the documents? Which patterns should be used to find matching files? Especially D.W.'s answer seems to be highly automated at first glance but maybe there is more behind your question than we know right now. – nohillside Aug 18 '13 at 17:28
  • For anyone looking at this in 2023, who is looking for a solution without installing anything, you can use Shortcuts, which has a "Make PDF from files" step. – Melissa Jul 12 '23 at 18:37

4 Answers

56

There’s a Python script hidden in Automator.app that joins PDF files:

/System/Library/Automator/Combine\ PDF\ Pages.action/Contents/Resources/join.py --help
Usage: join [--output <file>] [--shuffle] [--verbose]

Example usage:

/System/Library/Automator/Combine\ PDF\ Pages.action/Contents/Resources/join.py --output all.pdf *.pdf
Rob
  • 7,178
akuhn
  • 1,136
  • 2
    Thank you for your answer. I can’t comment on the technical background (there might be a difference between merging in Automator and in Preview). Depending on the source files, there can be a significant increase in file size here as well. I just tested it again, and four files with a total size of 12 MB were joined into a 32 MB document. This is unacceptable. – lejonet Aug 17 '13 at 21:05
  • Sorry, can’t help with that. – akuhn Aug 19 '13 at 15:47
  • 4
    I added this command as an alias in my ~/.bash_profile file like this: alias catpdf="/System/Library/Automator/Combine\ PDF\ Pages.action/Contents/Resources/join.py --output all.pdf *.pdf" so I can just cd into a directory containing PDFs and run catpdf. – Stewart Macdonald Sep 02 '15 at 12:15
  • 2
    Upvoting for ingenious use of a Python script hidden in an app! I decided to use pdftk though, for a more robust solution. – Blairg23 Dec 16 '15 at 00:00
  • 1
    @lejonet The technical background is that both use the same frameworks (Quartz.CoreGraphics on 10.11), as you guessed. This can be seen from the first lines of the `join.py' script (import statements). – hans_meine Jan 04 '16 at 13:56
  • zsh: no such file or directory: /System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py

    Chip: Apple M1, Model Name: MacBook Pro, OS : macos ventura (13)

    – ViKi Vyas Nov 14 '22 at 13:05
27

Try pdftk. It is command-line software that can join PDF files (and do lots of other stuff, too, but that isn't relevant here). You can download it from the official pdftk web page.

Sample syntax:

pdftk old1.pdf old2.pdf old3.pdf cat output new.pdf

will create the file new.pdf that contains the concatenation of the files old1.pdf, old2.pdf, old3.pdf.

To solve your problem, with your example filenames:

pdftk 1000.*.pdf cat output 1000.pdf
pdftk 2000.*.pdf cat output 2000.pdf

and so on. You can use shell scripting to make this completely automatic if desired (but you'll have to spend a little time on your own learning how to write shell scripts).


Assuming all files are named 1000.x.pdf, 2000.x.pdf, etc., a shell script could look something like this:

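#!/bin/bash

for n in {1..9}; do
    # Only act on groups whose first part exists and is readable
    if [[ -r ${n}000.1.pdf ]]; then
        rm -f "${n}000.pdf"
        # "output" must precede the target name; parts go to the Trash only if the merge succeeds
        pdftk "${n}000".*.pdf cat output "${n}000.pdf" && mv "${n}000".*.pdf ~/.Trash/
    fi
done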
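
If the prefixes are not limited to 1000 through 9000, the loop above can be generalized by deriving the prefixes from the file names themselves. The following is only a sketch under a few assumptions: pdftk is on your PATH, every part is named <prefix>.<something>.pdf, and the function names list_prefixes and merge_groups are made up for this example.

```shell
#!/bin/bash
# Sketch: merge every group "<prefix>.<anything>.pdf" into "<prefix>.pdf".
# list_prefixes and merge_groups are hypothetical helper names.

# Print each distinct prefix (the text before the first dot) found in a directory.
list_prefixes() {
    local dir="$1"
    local f name
    for f in "$dir"/*.*.pdf; do
        [ -e "$f" ] || continue        # the glob matched nothing
        name=${f##*/}                  # strip the directory part
        printf '%s\n' "${name%%.*}"    # keep the text before the first dot
    done | sort -u
}

# Merge each group in a directory; the original parts are left in place.
merge_groups() {
    local dir="$1"
    local prefix
    list_prefixes "$dir" | while IFS= read -r prefix; do
        # The shell expands the glob in sorted order, which sets the page order.
        # stdin is redirected so pdftk cannot consume the prefix list.
        pdftk "$dir/$prefix".*.pdf cat output "$dir/$prefix.pdf" < /dev/null
    done
}
```

After sourcing the file, running merge_groups . would process the current directory. The parts are merged in the shell's glob order, which is lexical, so 1000.10.pdf comes before 1000.2.pdf; zero-padding the numbers avoids surprises if page order matters. The originals are left untouched so the sizes can be compared before anything is deleted.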
espinchi
  • 103
D.W.
  • 4,048
  • Thank you for your comment, but I don't see where there is a workflow for my needs (hundreds of files with similar names, merged according to these names). – lejonet Aug 18 '13 at 01:19
  • 1
    @lejonet8, that's exactly where command-line tools shine! I've edited my answer to give you an example of how to do that. Working out further details is probably beyond the scope of this question, and relates more to how to write shell scripts. – D.W. Aug 18 '13 at 01:46
  • Thank you for your answer. Unfortunately running it on my files it resulted in this error message: Error: Unexpected text in page range end, here: 1000.pdf – lejonet Aug 18 '13 at 10:19
  • 1
    @lejonet8 Maybe you could elaborate on your automation requirements a bit more in the question. Are you looking for things like "Folder Actions" or such? What would trigger the concatenation process in the first place? – nohillside Aug 18 '13 at 15:47
  • @patrix Thank you for your comment. It doesn’t matter if I select all files and a script merges them according to file names, or I run a script on a directory or some GUI application does that after adding all files or choosing the directory. – lejonet Aug 18 '13 at 16:58
  • 3
    @lejonet8, You might have to experiment a bit to see how to make pdftk work for you. One error message, with no context, is unfortunately not enough for me to diagnose the problem. Try concatenating some pairs of files. Experiment. See if you can diagnose when pdftk does/doesn't work and the cause. Read the tutorial. Then, post a question about making pdftk work on a suitable Stack Exchange site or other question-and-answer site. FWIW, pdftk has been very reliable for me, working with a broad variety of pdf files generated by many different programs. Of course your experience may vary. – D.W. Aug 18 '13 at 22:25
  • Unfortunately, pdftk does not build anymore on 10.11 "El Capitan". See https://trac.macports.org/ticket/48528 – hans_meine Jan 04 '16 at 13:52
12

You can use pdfunite distributed with poppler. You can install poppler with Homebrew:

brew install poppler

And now use it:

pdfunite input1.pdf input2.pdf input3.pdf output.pdf

poppler also comes with these other commands: pdfdetach, pdffonts, pdfimages, pdfinfo, pdfseparate, pdftocairo, pdftohtml, pdftoppm, pdftops, pdftotext, in addition to pdfunite.
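
Since the question is mostly about the size of the merged file, it may be worth checking the result of each merge. Below is a rough sketch for one group named like the examples in the question (1000.1.pdf, 1000.2.pdf, ...). The helper names total_bytes and merge_and_report are invented for illustration, and whether the output stays close to the sum of the inputs depends on your source files.

```shell
#!/bin/bash
# Sketch: merge one group with pdfunite, then print input and output sizes.
# total_bytes and merge_and_report are hypothetical helper names.

# Sum the sizes in bytes of the files given as arguments.
total_bytes() {
    local total=0
    local size f
    for f in "$@"; do
        size=$(wc -c < "$f")           # wc -c works on both macOS and Linux
        total=$((total + size))
    done
    printf '%s\n' "$total"
}

# Merge <prefix>.*.pdf into <prefix>.pdf and report both sizes.
merge_and_report() {
    local prefix="$1"
    pdfunite "$prefix".*.pdf "$prefix.pdf" || return 1
    printf '%s: %s bytes in, %s bytes out\n' "$prefix.pdf" \
        "$(total_bytes "$prefix".*.pdf)" "$(total_bytes "$prefix.pdf")"
}
```

For example, merge_and_report 1000 would create 1000.pdf from the 1000.*.pdf parts and print the two byte counts, which makes a bloated result easy to spot.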

Flimm
  • 2,016
0

pdftk doesn't work anymore on El Capitan (OS X 10.11)!

An alternative is pagemaster from PDFTron. The syntax would be:

pagemaster -m *.pdf -o output.pdf

It doesn't have the file size increase problem of the Automator solution above, since it uses a custom PDF library.

Note: this is not a free tool. The demo version adds a thin watermark on each page.

antoine
  • 187