Info: how to get a list of all articles programmatically?

Question

I'm contemplating improved indexing of Info pages with Sphinx. To do this, I need some way of programmatically extracting all nodes, so that I can then insert the contents of each node into a database via xmlpipe. I don't see a way to do this programmatically. Do you have any suggestions on how to do it?

This is tagged "Emacs" because Emacs can be used as an Info browser, and if the original Info program can't produce such a list, then using Emacs Lisp for that purpose would be OK too.

PS. Someone with more rep., please add the "info" tag; it's hard enough to google it as it is.

Edit: I think I'm getting somewhere:

info --subnodes -o ./info.txt

This is almost what I want, except it gives too much information (not only the node list, but also some description and decoration).

OK, writing it appears to be easier than searching. After all, the point of this entire effort is to make search easier, so I think no harm done! :D

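(require 'cl-lib)

(defun sphinx-info-nodes ()
  "Return the names of all menu entries in the output of `info --subnodes'."
  (let ((tmp-file (make-temp-file "info")))
    (shell-command (format "info --subnodes -o %s"
                           (shell-quote-argument tmp-file)))
    (with-temp-buffer
      (insert-file-contents tmp-file)
      (delete-file tmp-file)
      ;; Menu entries look like "* Name: ..."; skip the "* Menu:" header.
      (cl-loop while (re-search-forward "^\\*\\s-*\\([^:\n]+\\):" nil t)
               unless (string= (match-string 1) "Menu")
               collect (match-string 1)))))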
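
To check what the regexp picks up without invoking `info`, the same loop can be run over a string. This is only a minimal sketch: the helper name `sphinx-info--menu-entries` and the sample text are made up for illustration and are not real `info` output. It also skips the `* Menu:` marker line, which a pattern of the form `* Name:` would otherwise match.

```elisp
(require 'cl-lib)

(defun sphinx-info--menu-entries (text)
  "Return the menu entry names found in TEXT, a string of Info output.
Hypothetical helper for illustration; not part of Emacs or Info."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (cl-loop while (re-search-forward "^\\*[ \t]*\\([^:\n]+\\):" nil t)
             ;; "* Menu:" only introduces a menu; it is not an entry.
             unless (string= (match-string 1) "Menu")
             collect (match-string 1))))

;; Made-up sample resembling a directory node.
(sphinx-info--menu-entries
 "* Menu:\n\n* Bash: (bash).  The GNU shell.\n* Emacs: (emacs).  The editor.\n")
;; => ("Bash" "Emacs")
```

Taking the text as an argument keeps the parsing separate from the call to `info`, so the regexp can be adjusted and tested on small samples.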

I'm a little confused here. By info pages, you mean the GNU `Info` pages with help for programs in general? Or are you referring to the `man` pages? Or are you referring to Python documentation pages? — jcoppens, Jun 29 '15 at 15:55
@jcoppens Info is a program which can display help contents. The files are typically in a special format, which is mostly text, but also use some non-printable characters to aid the reader. These files are typically created by editing Texinfo files (plain text with some markup). Man pages serve a similar purpose, but are in a different format / use a different reader. Here's a more formal introduction: http://www.gnu.org/software/texinfo/manual/info-stnd/info-stnd.html#Top — wvxvw, Jun 29 '15 at 17:31
Yes, I know what Info files and man files are. But I regularly notice confusion (*not* mine) when I read articles. I suppose you know the page about TkInfo? (http://math-www.uni-paderborn.de/~axel/tkinfo/) There are many useful 'readers', including `saminfo`, which converts Info to a tree structure. — jcoppens, Jun 29 '15 at 18:07
@jcoppens nope, I didn't know about this one, and had no idea there are so many of them! On second thought, though, I'd prefer this to be accomplished using programs which I can expect to be installed on a typical Linux desktop (Emacs can be assumed to be one, since this is supposed to be used in Emacs). — wvxvw, Jun 29 '15 at 18:27
TkInfo is based on Tk, which is part of the Tcl/Tk package; these are Unix/Linux-based. Almost every Linux distribution comes with Tcl installed, and many have Tk too. Also, you could easily create the structure you need by visiting each node in `/usr/share/Info`. — jcoppens, Jun 29 '15 at 20:12
@jcoppens `/usr/share/Info` is certainly not enough. For instance, there's also the `$INFOPATH` variable, and there's also `Info-directory-list` in Emacs (for me, there are about ten directories all in all). Plus, most Info files don't map to articles one-to-one; sometimes there are many articles in the same file. But it seems like parsing the output of `info --subnodes` is trivial, so I'll probably just write some regexps and be done with it. — wvxvw, Jun 29 '15 at 20:27
INFOPATH is probably an environment variable, which you can extract from, well, the environment variables (accessible in Python). I suspect that all other files are auto-generated by the installation of a new Info file, based on information read in the newly added file. — jcoppens, Jun 29 '15 at 20:33
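
The two directory sources mentioned in the comments above, `$INFOPATH` and `Info-directory-list`, can also be merged from within Emacs. What follows is a minimal sketch, not a definitive implementation: the function name `sphinx-info-directories` is hypothetical. As far as I know, `info-initialize` already consults `INFOPATH` when it first builds `Info-directory-list`, so the explicit merge mainly covers the case where the list was set by hand.

```elisp
(require 'cl-lib)
(require 'info)

(defun sphinx-info-directories ()
  "Return existing directories from `$INFOPATH' and `Info-directory-list'.
Hypothetical helper for illustration; not part of Emacs."
  ;; `info-initialize' fills `Info-directory-list' if it is still nil.
  (info-initialize)
  (let ((env (getenv "INFOPATH")))
    (cl-remove-if-not
     #'file-directory-p
     (delete-dups
      (mapcar #'file-name-as-directory
              ;; `remq' copies, so `Info-directory-list' is left untouched.
              (remq nil (append (and env (parse-colon-path env))
                                Info-directory-list)))))))
```

Calling `(sphinx-info-directories)` returns a list of directory names with duplicates and nonexistent entries removed; which directories appear depends entirely on the local setup.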
