I have a folder with a lot of HTML files. I would like to extract only the text contained in the body of each HTML file into a txt file. How can I do that?

Meds

1 Answer

You can iterate over each file in the directory and use a command-line browser such as lynx or w3m to render the HTML to plain text, then save the output to a text file.

Lynx example:

lynx -dump in.html > out.txt

w3m example:

w3m -dump in.html > out.txt
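
To cover the "iterate over each file" part, here is a minimal sketch of a loop, assuming the files end in .html and that each .txt should be written next to its source. The function name convert_html_dir and the DUMP_CMD variable are made up for this example; DUMP_CMD only exists so that w3m can be swapped in for lynx.

```shell
#!/bin/sh
# Sketch: convert every .html file in a directory to a .txt file beside it.
# convert_html_dir and DUMP_CMD are hypothetical names for this example.
# DUMP_CMD defaults to "lynx -dump"; set it to "w3m -dump" to use w3m instead.
convert_html_dir() {
    dir=${1:-.}                          # default to the current directory
    for f in "$dir"/*.html; do
        [ -e "$f" ] || continue          # the glob matched nothing: skip
        # DUMP_CMD is left unquoted on purpose so "lynx -dump" splits into
        # a command and its option.
        ${DUMP_CMD:-lynx -dump} "$f" > "${f%.html}.txt"
    done
}
```

After defining the function, call it with the folder as argument, for example convert_html_dir /path/to/folder, or run DUMP_CMD="w3m -dump" first to use w3m. Existing .txt files with the same base name are overwritten, and subfolders are not visited.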
rbialon