Is there a tool that can detect the language of the text of several paragraphs?
Viewed 2,294 times
2 Answers
1
The file utility has a bunch of heuristics for guessing file types, including one that reports "English text". I don't know whether it knows about other human languages, but it could definitely be extended to tell them apart.
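If you want to see what file says about a piece of text, something like the following should work. It is only a sketch: it assumes the file command is installed and on your PATH, the helper name describe_with_file is made up for illustration, and the exact wording of the description differs between versions of file.

```python
import os
import shutil
import subprocess
import tempfile
from typing import Optional


def describe_with_file(text: str) -> Optional[str]:
    """Return the one-line description `file` prints for `text`.

    Returns None when the `file` command is not available.
    (Hypothetical helper, not part of any library.)
    """
    if shutil.which("file") is None:
        return None

    # `file` inspects files on disk, so write the text to a temporary file.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", encoding="utf-8", delete=False
    ) as handle:
        handle.write(text)
        path = handle.name

    try:
        # --brief suppresses the leading "filename:" part of the output.
        result = subprocess.run(
            ["file", "--brief", path],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(describe_with_file("This is a short paragraph of plain text.\n"))
```

Whatever it prints is a guess about the file type, so don't expect it to name a specific human language.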
dmckee --- ex-moderator kitten
1
There are many tools around to do this. The first one that I can think of is Google's own: http://code.google.com/apis/ajax/playground/#language_detect
- In Java: http://textcat.sourceforge.net/
- In Ruby: https://github.com/peterc/whatlanguage
- In Perl: http://search.cpan.org/~ambs/Lingua-Identify-0.29/lib/Lingua/Identify.pm
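If you just want something that runs locally on a batch of paragraphs, here is a toy sketch of character trigram comparison, which is one common way to approach language identification. It is not the API of any of the libraries above. The sample sentences, SAMPLES, and the function names are all made up for illustration, and a real tool would build its profiles from far more text than one sentence per language.

```python
from collections import Counter
from math import sqrt

# Tiny, made-up training samples. One sentence per language is only
# enough for a demonstration, not for reliable detection.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then "
               "the dog runs after the fox through the field",
    "french": "le renard brun saute par-dessus le chien paresseux et puis "
              "le chien court après le renard dans le champ",
    "german": "der schnelle braune fuchs springt über den faulen hund und "
              "dann läuft der hund dem fuchs durch das feld nach",
}


def trigram_profile(text: str) -> Counter:
    """Count the character trigrams in `text` (lowercased, whitespace collapsed)."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


PROFILES = {lang: trigram_profile(sample) for lang, sample in SAMPLES.items()}


def detect_language(text: str) -> str:
    """Return the language whose sample profile is most similar to `text`.

    For empty input every score is 0.0, so the first language in
    PROFILES is returned; callers may want to handle that case themselves.
    """
    profile = trigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine_similarity(profile, PROFILES[lang]))


if __name__ == "__main__":
    paragraphs = [
        "the dog and the fox are in the field",
        "le chien et le renard sont dans le champ",
        "der hund und der fuchs sind auf dem feld",
    ]
    for paragraph in paragraphs:
        print(detect_language(paragraph), "<-", paragraph)
```

For real use you would replace SAMPLES with much larger texts, or use one of the libraries listed above.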
Hope it helps
Mortimer
-
The language_detect tool by Google seems promising, but I have to do this for more than one text. I see some code there, but I don't know whether I can run it on my machine. – Flethuseo Mar 26 '11 at 21:56
-
The Google API probably limits the number of queries you can send, so you might need to use one of the other libraries instead. – Mortimer Apr 01 '11 at 07:53