17

I have a pdf file with some text on each page which I would like to remove.

The text is matched by a regex and I think it comes in one block of the pdf.

I have used pdfedit to select and delete the text with the GUI but I was looking for a way to do this from the terminal.

DrYap

5 Answers

10

You can try pdftk, but it works only a fraction of the time, due to (I believe) a problem with fonts.

It works like this: first you need to uncompress the pdf file,

  pdftk myfile.pdf output unc.pdf uncompress

then you modify it with

  sed 's/oldstring/newstring/g' < unc.pdf > mod_unc.pdf

lastly you recompress it with

  pdftk mod_unc.pdf output myfile_modified.pdf compress

I have had only moderate success with this command, in the sense that sometimes it works, sometimes it doesn't, according to its whim.
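For what it's worth, the three steps above can be wrapped in one shell function. This is only a sketch under some assumptions: `remove_pdf_text` is a made-up helper name (not part of pdftk), the pattern is a basic regex that contains no `/`, and the replacement is empty because the question asks to delete the text. It also checks first whether the pattern shows up at all in the uncompressed bytes, since sed cannot do anything when the text is not stored literally in the file.

```shell
#!/bin/sh
# Sketch only: wraps the uncompress / sed / compress steps in one function.
# remove_pdf_text is a hypothetical helper name, not a pdftk feature.
# Usage: remove_pdf_text input.pdf 'basic-regex' output.pdf
remove_pdf_text() {
    src=$1
    pattern=$2
    dst=$3
    tmp=$(mktemp -d) || return 1

    # Step 1: uncompress the page streams so the text is visible to sed.
    pdftk "$src" output "$tmp/unc.pdf" uncompress || { rm -rf "$tmp"; return 1; }

    # If the pattern never appears in the raw bytes, sed cannot help
    # (the text may be encoded or split up differently inside the PDF).
    if ! LC_ALL=C grep -a -q -e "$pattern" "$tmp/unc.pdf"; then
        echo "pattern not found in uncompressed PDF" >&2
        rm -rf "$tmp"
        return 2
    fi

    # Step 2: delete every match. LC_ALL=C makes sed treat the input as bytes.
    # Assumption: the pattern does not contain '/', the delimiter used here.
    LC_ALL=C sed "s/$pattern//g" < "$tmp/unc.pdf" > "$tmp/mod.pdf"

    # Step 3: recompress into the final file and report pdftk's exit status.
    pdftk "$tmp/mod.pdf" output "$dst" compress
    status=$?
    rm -rf "$tmp"
    return $status
}
```

It would be called as `remove_pdf_text myfile.pdf 'oldstring' myfile_modified.pdf`. A return code of 2 means the pattern was not found after uncompressing, which is probably one of the cases where this approach does not work.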

MariusMatutiae
  • I gave it a go but the uncompress turned most of the text to white which never got recovered. Funnily the only text that wasn't white was what I'm trying to get rid of!!! Thanks for the suggestion. – DrYap Feb 01 '14 at 13:25
  • 3
    One can also use `qpdf`: `qpdf --stream-data=uncompress myfile.pdf unc.pdf`. – Eugene Pakhomov Jul 22 '18 at 15:59
  • 1
    Marius, pdftk is unfortunately not open source any longer. Do you know of another solution? Thank you. – Maxim Sep 09 '18 at 07:36
  • @Maxim I still have it in my Debian (and Ubuntu) repos, which means the version available to Linux users is certainly open source. Why do you say it's not FOSS any longer? There are free and paid versions, but those are for users of Windows and macOS. – MariusMatutiae Sep 10 '18 at 05:51
  • @MariusMatutiae Marius, I took the following from the [Fedora project mailing list](https://lists.fedoraproject.org/pipermail/users/2014-December/thread.html#455977): `Jochen Schmitt 2014-03-04 Because pdftk depends on a gcj-feature which no more supported in Fedora I have retired this package for F20+. So no new version of pdftk will be available on Feodra. whole discussion: https://lists.fedoraproject.org/pipermail/users/2014-December/thread.html#455977 The problem is that libgcj does not exist on F21. So, we can not run pdftk unless we go back to F20` – Maxim Jul 06 '19 at 21:04
  • @MariusMatutiae I have switched from Fedora to Linux Mint since then, and pdftk is no longer in the Mint repositories either. More importantly, `libgcj` is gone from both Fedora and Mint... – Maxim Jul 06 '19 at 21:07
  • Regarding the discussion in the comments, see my [answer here](https://superuser.com/a/1531760/323079). – Hashim Aziz Mar 25 '20 at 20:15
  • @Hashim Thank you, excellent answer, +1 from me. My answer is pretty old... – MariusMatutiae Mar 26 '20 at 09:08
2

On Windows (possibly in a virtual machine) you could install PDF-XChange Editor: https://www.tracker-software.com/product/downloads/enduser/pdf-xchange-editor

In the free version you can remove text (but not add text) without the software adding its watermark; the software itself tells you so.

I had to remove several pieces of text, so sed was too time-consuming and tiring, and sed did not work with umlauts.

Source: https://de.wikipedia.org/wiki/Benutzer:JoKalliauer/PDF

JoKalliauer
1

Copying my answer from a similar but closed question on the main SE site:

changepagestring will do this in a single step, as easy as:

changepagestring -o -v infile.pdf search-regex replace-str outfile.pdf

Finding the right regex can be tricky and even then it may not work with all PDFs, but it's the best option I've found so far.

Brian Z
1

Inkscape 1.2 added support for importing and exporting multi-page PDFs. Coupled with its good PDF object(?) support, it did the job.

yoshco
-4

You can use any PDF editor. Nitro PDF is a good tool for editing PDFs, and there are also many free tools. You can add or remove text with these.

http://www.nitropdf.com/free-pdf-software

PDFEdit is a good option for Linux. Read this link to learn how to install it: cyberciti.biz/tips/open-source-linux-pdf-writer.html

  • 2
    The OP is on Linux and they said they already used PDFEdit. Please read the question before posting an answer. – slhck Feb 13 '14 at 13:21