
I need to merge about 100 PDF files into one, where each file uses more or less the same unsubsetted fonts. All the options I have tried so far (pdfunite, gs, etc.) are not intelligent about font duplication: the merged PDF ends up with 100 copies of the same font and is therefore much larger than it needs to be.

Is there a way to do any one of the following:

  1. Merge the PDFs without duplicating fonts?
  2. De-duplicate the fonts in the PDF later?
  3. Remove fonts from the PDF entirely?

The ideal solution will have a commercially friendly open source license (e.g. not AGPL).

Kurt Pfeifle
user2771609
  • https://stackoverflow.com/questions/21979200/how-to-create-pdf-with-font-information-and-embed-actual-font-when-merging-them – Tom Brossman Nov 02 '18 at 19:24
  • @TomBrossman iText's `PdfSmartCopy` that the solution you linked to relies on would have been an option, except for the AGPL license. – user2771609 Nov 02 '18 at 20:23
  • @TomBrossman You are not wrong, but please don't make [askubuntu toxic](https://www.google.com/search?q=stackoverflow+toxic) and be polite, you are violating the [code of conduct](https://askubuntu.com/conduct). – user2771609 Nov 03 '18 at 15:38
  • Thank you for identifying this 'toxic' matter, I suggest you flag any code of conduct breaches you identify to the moderators of this site so they can take a look at them. – Tom Brossman Nov 03 '18 at 17:33

1 Answer


Contrary to what you say, recent versions of Ghostscript have become quite efficient at merging multiple PDFs into a single file while avoiding embedding an identical font multiple times.

Inputs

Here are the details about 3 input PDFs, which I'll merge into a single output:

for i in {1..3}; do pdffonts ${i}.pdf ; echo ; done

 name                       type              encoding         emb sub uni object ID
 -------------------------- ----------------- ---------------- --- --- --- ---------
 Helvetica                  Type 1C           WinAnsi          yes no  no       8  0

 name                       type              encoding         emb sub uni object ID
 -------------------------- ----------------- ---------------- --- --- --- ---------
 Helvetica                  Type 1C           WinAnsi          yes no  no       8  0

 name                       type              encoding         emb sub uni object ID
 -------------------------- ----------------- ---------------- --- --- --- ---------
 Helvetica                  Type 1C           WinAnsi          yes no  no       8  0

Merging

Now merge these three PDF input files with the help of pdftk.

pdftk 1.pdf 2.pdf 3.pdf cat output merged.pdf

Output

Now check the font status of the output merged.pdf:

pdffonts merged.pdf

 name                       type              encoding         emb sub uni object ID
 -------------------------- ----------------- ---------------- --- --- --- ---------
 Helvetica                  Type 1C           WinAnsi          yes no  no       5  0
 Helvetica                  Type 1C           WinAnsi          yes no  no      14  0
 Helvetica                  Type 1C           WinAnsi          yes no  no      23  0

Ok, not yet there...
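
To make the duplication in a listing like the one above explicit, the pdffonts output can be tallied per font name. This is only a rough sketch: `font_copies` is a hypothetical helper (not part of poppler), and it assumes that the first two lines of the pdffonts output are the header and the separator, and that font names contain no spaces.

```shell
# font_copies: hypothetical helper, reads pdffonts output on stdin and
# prints "<count> <font name>" for every distinct font name.
# Assumption: lines 1-2 are the header and the dashed separator.
font_copies() {
  awk 'NR > 2 && NF { n[$1]++ } END { for (f in n) print n[f], f }'
}

# Usage (assumes pdffonts from poppler-utils is installed):
#   pdffonts merged.pdf | font_copies
```

A count above 1 for an unsubsetted font suggests that the same font is present more than once in the file.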

Optimize with Ghostscript

gs -o optim.pdf -sDEVICE=pdfwrite merged.pdf 

 GPL Ghostscript GIT PRERELEASE 9.27 (2018-11-20)
 Copyright (C) 2018 Artifex Software, Inc.  All rights reserved.
 This software comes with NO WARRANTY: see the file PUBLIC for details.
 Processing pages 1 through 3.
 Page 1
 Page 2
 Page 3

Check font statuses and file sizes

ls -lh {1..3}.pdf merged.pdf optim.pdf 

 -rw-r--r--  1 kurtpfeifle  staff    51K Dec 31 20:25 1.pdf
 -rw-r--r--  1 kurtpfeifle  staff    51K Dec 31 20:25 2.pdf
 -rw-r--r--  1 kurtpfeifle  staff    51K Dec 31 20:25 3.pdf
 -rw-r--r--  1 kurtpfeifle  staff   147K Dec 31 20:32 merged.pdf
 -rw-r--r--  1 kurtpfeifle  staff   7.5K Dec 31 20:34 optim.pdf
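
Since the pdfwrite device accepts several input files on one command line, the separate pdftk step is probably not needed at all. A minimal sketch, assuming a POSIX shell; `merge_pdfs` is a hypothetical wrapper name, not a Ghostscript feature, and I have not measured whether the one-step result is exactly as small as the two-step one:

```shell
# merge_pdfs: hypothetical wrapper around a single Ghostscript call.
#   $1       output PDF
#   $2...    input PDFs, merged in the order given
merge_pdfs() {
  out=$1
  shift
  # -q suppresses the startup banner; -o sets the output file and
  # implies -dBATCH -dNOPAUSE.
  gs -q -o "$out" -sDEVICE=pdfwrite "$@"
}

# Usage:
#   merge_pdfs optim.pdf 1.pdf 2.pdf 3.pdf
```

Running `pdffonts optim.pdf` afterwards should show whether each font now appears only once.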

Conclusion

I tested this with Ghostscript v9.25.

If this doesn't work for you, you'll need to...

  1. ...tell us the version of Ghostscript you are using;
  2. ...provide a link to (some of) your input PDFs for more detailed analysis.

I'm aware that this answer does not give you a solution that exactly meets your license requirements. But your false statement about Ghostscript prompted me to give this answer anyway, so that other people interested in this topic can still benefit from it...

Kurt Pfeifle