Extract text from HTML files to a text file

Question

I have a folder with a lot of HTML files. I would like to extract only the text contained in the body of each file and save it to a .txt file. How can I do that?

score 1 · Accepted Answer · answered Oct 04 '15 at 16:30

You can iterate over each file in the directory and use a command-line browser such as lynx or w3m to render the HTML as plain text, then redirect that output into a text file.

Lynx example:

lynx -dump in.html > out.txt

w3m example:

w3m -dump in.html > out.txt
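
The iteration over the directory is not shown above, so here is a minimal sketch of how it could look in POSIX sh. The function name html_to_txt_dir and the variable BROWSER_CMD are made up for illustration; the sketch assumes lynx (or w3m) is installed and that the files end in .html.

```shell
#!/bin/sh
# Hypothetical helper: render every .html file in a directory to a .txt file
# with the same base name. BROWSER_CMD is an assumed variable name;
# set it to "w3m" to use w3m instead of lynx.
BROWSER_CMD="${BROWSER_CMD:-lynx}"

html_to_txt_dir() {
    dir="${1:-.}"
    for f in "$dir"/*.html; do
        # If nothing matches, the pattern stays unexpanded; skip it.
        [ -e "$f" ] || continue
        # page.html -> page.txt, written next to the source file.
        "$BROWSER_CMD" -dump "$f" > "${f%.html}.txt"
    done
}
```

Call it as html_to_txt_dir /path/to/folder, or with no argument for the current directory. Note that lynx -dump appends a numbered list of link targets by default; adding -nolist to the command should leave only the page text.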

rbialon
