
I am using pdftk to perform some operations on PDFs. Right now I am doing it like this:

pdftk <files> OPERATION <options> output - | \
pdftk - <other files> OPERATION <options> output - | \
... 
pdftk - OPERATION <options> output final.pdf

So basically I am outputting the result of the first operation to stdout and then piping it into another pdftk process and so on until I am done.

Is there a better way of doing this with only one pdftk process?

Operations used are: cat, multistamp, shuffle and range selections on the pages.
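For concreteness, a two-stage version of such a pipeline might look like the sketch below. The function name `build_final`, the `PDFTK` variable, and the file names in the example call are made up for illustration; only `cat`, `multistamp`, `output`, and `-` for stdin/stdout are taken from pdftk itself.

```shell
#!/bin/sh
# Illustrative sketch only: build_final and PDFTK are hypothetical names.
# PDFTK can be overridden to point at a different pdftk binary.
PDFTK="${PDFTK:-pdftk}"

# build_final FRONT BODY STAMP OUT
#   Stage 1: take page 1 of FRONT followed by all pages of BODY,
#            and write the result to stdout ("output -").
#   Stage 2: read that PDF from stdin ("-"), apply STAMP page by page
#            with multistamp, and write the result to OUT.
build_final() {
    "$PDFTK" A="$1" B="$2" cat A1 B output - |
        "$PDFTK" - multistamp "$3" output "$4"
}

# Example call (hypothetical file names, not executed here):
# build_final cover.pdf report.pdf stamp.pdf final.pdf
```

Each stage is still a separate pdftk process; the pipe only avoids writing the intermediate PDF to disk.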

mael'
masgo
  • Oh, I forgot to mention that I am using Linux. The drawback of `FOR` is that it has to write to the disk. I am doing my operations with a lot of PDFs. – masgo Feb 26 '18 at 07:21
  • I am also using pdftk a lot, and was wondering whether it is possible to run a single pdftk server process in the background and feed it commands and PDFs to handle. As the pdftk CLI is called "pdftk server", I would assume it should somehow be possible. Unfortunately, I haven't found an answer either, yet. – Gnudiff Jun 05 '20 at 08:54
  • Why would you want to do that? Startup time? For me, the biggest performance boost was storing the PDFs on a RAM disk. Since I do batch processing, the process is: copy everything to the RAM disk, do the operations, copy the results to the destination. Should the server crash (which is extremely rare), the process needs to start from scratch, since the RAM disk is lost. But server crashes happen about once per year, so that's fine for me. – masgo Jun 05 '20 at 12:38
  • I need to process lots of incoming PDFs daily. Each startup of pdftk creates a new Java VM, as the pdftk executable is not much more than a C++ wrapper for Java libraries, which makes my code slower. I am considering modifying the pdftk C++ code to make it a daemon and pass commands via a socket, but that might be a bit too involved for me. – Gnudiff Jun 10 '20 at 20:59
  • Could it be that you are using pdftk-java instead of the original pdftk? The normal pdftk runs rather fast. Processing ~200 PDFs on a 2-core VM takes only a couple of seconds for me. – masgo Jun 11 '20 at 05:37
  • I am using the standard pdftk package that ships with Ubuntu: https://packages.ubuntu.com/xenial/pdftk . If you look at its source files, it is a compiled C++ program, but the C++ program is not much more than a wrapper around Java libraries, and it does start a Java VM for each execution of the program. – Gnudiff Jun 13 '20 at 14:15
  • You can always throw hardware at a problem, and it is true that I use a rather small virtual machine for the PDF manipulation, which also does a lot of other things. I could increase its resources. However, it nags at me that it (probably) should be much more efficient to have a pdftk daemon running in the background and serving commands one after another, instead of firing up and destroying 60 JVMs per minute. – Gnudiff Jun 13 '20 at 14:15
  • I see your point. As it looks, Java will be the future for pdftk. Work on the original pdftk seems to have more or less stopped. In newer Ubuntu versions it has been replaced by pdftk-java. Maybe you can talk to the authors of pdftk-java and convince them to use something like Nailgun (https://github.com/facebook/nailgun). For me, pdftk is fine as it is: `time pdftk AAAA.pdf BBBB.pdf cat output out.pdf` reports `real 0m0.156s`, `user 0m0.088s`, `sys 0m0.059s`. – masgo Jun 15 '20 at 02:55
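The RAM-disk workflow masgo describes in the comments (copy everything in, run the operations, copy the results out) could be sketched roughly as follows. This is not masgo's actual script: `process_in_ram`, `stamp_one`, `RAM_BASE`, and the directory and file names are hypothetical, and the sketch assumes `/dev/shm` is a tmpfs mount, which is common on Linux but not guaranteed.

```shell
#!/bin/sh
# Illustrative sketch only; all names here are hypothetical.
# Assumption: /dev/shm is backed by RAM (tmpfs). Override RAM_BASE otherwise.
RAM_BASE="${RAM_BASE:-/dev/shm}"

# process_in_ram SRC_DIR DEST_DIR COMMAND [ARGS...]
#   Copies SRC_DIR/*.pdf into a scratch directory under RAM_BASE,
#   runs "COMMAND [ARGS...] INPUT OUTPUT" once per file,
#   then copies the outputs to DEST_DIR and removes the scratch directory.
process_in_ram() {
    src=$1
    dest=$2
    shift 2
    work=$(mktemp -d "$RAM_BASE/pdfwork.XXXXXX") || return 1
    mkdir -p "$work/in" "$work/out" "$dest"
    cp "$src"/*.pdf "$work/in/" || { rm -rf "$work"; return 1; }
    status=0
    for f in "$work"/in/*.pdf; do
        # Remember a failure but keep processing the remaining files.
        "$@" "$f" "$work/out/$(basename "$f")" || status=1
    done
    cp "$work"/out/*.pdf "$dest/" || status=1
    rm -rf "$work"
    return $status
}

# Hypothetical per-file step: stamp every page of $1 with stamp.pdf.
stamp_one() {
    pdftk "$1" multistamp stamp.pdf output "$2"
}

# Example call (hypothetical directories, not executed here):
# process_in_ram ./incoming ./done stamp_one
```

As noted in the comment, anything left in the scratch directory is lost if the machine goes down mid-run, so the batch has to be restartable from the source directory.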

1 Answer


The pdftk package has been removed from Fedora, at least. Consider pdfjam; it is a quite capable PDF mangler (hyperlinks get lost, though). Packages should be available for most Linux distributions and for macOS.

vonbrand
  • From pdfjam's documentation and examples, it looks like it is not a replacement for my use case. I use the stamping, multi-stamping, and watermarking functions a lot. – masgo Aug 25 '19 at 11:54