4

When I try to detect text in my JPEG, it correctly shows all the areas where it suspects text and images, but when I export to ODT it only creates a document with empty text and image frames.

Do I have to configure tesseract somehow?

(I use Ubuntu 14.10, 32-bit)

rubo77

1 Answer

3

Try this:

Open the ocrfeeder program.

Edit the engine: Click Tools - OCR Engine

Select the Tesseract engine and click Edit

In the engine arguments field, replace the existing script with this:

$IMAGE $FILE -l eng -psm 3 > /dev/null 2> /dev/null; cat $FILE.txt; rm $FILE $FILE.txt

To export the document click File - Export

Select the desired output format.

If the document contains pictures, I advise using the HTML format.

If it contains only text, plain text (txt) is the best choice.
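To check that Tesseract itself produces text outside OCRFeeder, the same invocation can be run by hand. The following is only a sketch: `ocr_image` is a hypothetical helper name, `page.jpg` is a placeholder, and it assumes Tesseract 3.x (as shipped with Ubuntu 14.10), where the page segmentation flag is spelled `-psm`.

```shell
#!/bin/sh
# Sketch: mimic the OCRFeeder engine arguments from a terminal.
# ocr_image is a hypothetical helper, not part of OCRFeeder or Tesseract.
ocr_image() {
    image=$1                     # plays the role of $IMAGE
    lang=${2:-eng}               # language id, e.g. eng, deu, fra
    out=$(mktemp) || return 1    # plays the role of $FILE

    # Tesseract appends .txt to the output base name it is given.
    if tesseract "$image" "$out" -l "$lang" -psm 3 > /dev/null 2>&1; then
        cat "$out.txt"
        status=0
    else
        echo "tesseract failed for $image" >&2
        status=1
    fi

    # Clean up the temporary files, as the engine arguments do.
    rm -f "$out" "$out.txt"
    return $status
}

# Usage (page.jpg is a placeholder file name):
#   ocr_image page.jpg eng
```

If this prints the recognized text, the engine works and the problem lies in the OCRFeeder engine settings; if it prints nothing or fails, the language data for the chosen id is probably not installed.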

kyodake
  • 2
    You just need to set up the engine command line in the OCRFeeder settings, replacing `$LANG` with `-l lang_id`, where `lang_id` is the id shown on the corresponding language package. The lang_ids can be found with `apt-get search tesseract-ocr`; for example spa = Spanish, fra = French, deu = German, nld = Dutch, ita = Italian, por = Portuguese, ... If you just want to scan in your own language, you can stick with `$LANG`, which is your system language – rubo77 Jul 04 '15 at 21:17
  • 1
    @rubo77 thanks for the hint! But I think you mean `apt-cache search` – Murmel Oct 14 '15 at 08:20
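
As a rough sketch of the lookup discussed in these comments, the three-letter language ids can be pulled out of the package names. `list_lang_ids` is a hypothetical helper; it assumes the `name - description` line format that `apt-cache search` prints, and it deliberately skips package names that are not a plain three-letter id (for example variants with an extra suffix).

```shell
#!/bin/sh
# Sketch: extract three-letter language ids from package listings on stdin.
# list_lang_ids is a hypothetical helper name.
list_lang_ids() {
    # Keep only names of the form tesseract-ocr-xxx, print the xxx part,
    # then drop the non-language packages "all" and "dev".
    sed -n 's/^tesseract-ocr-\([a-z]\{3\}\) - .*/\1/p' |
        grep -v -x -e all -e dev |
        sort -u
}

# Usage:
#   apt-cache search tesseract-ocr | list_lang_ids
```

Any id printed this way can then be used as `-l lang_id` in the engine arguments, provided the matching package is installed.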