Tools for converting 8-bit C1 control characters to ESC sequences?

Question

The ECMA-48 ("ANSI escape sequences") standard describes two ways of encoding the C1 set of control codes: as two-character ESC sequences or, alternatively, as 8-bit control characters.

Wikipedia articles explain that the two-character ESC sequences are more appropriate for use with UTF-8.

Quoting from ANSI escape code:

The standard says that in 8-bit environments these two-byte sequences can be merged into single C1 control code in the 0x80–0x9F range. However on modern devices those codes are often used for other purposes, such as parts of UTF-8 or for CP-1252 characters, so only the 2-byte sequence is used.

and from C0 and C1 control codes:

The C1 characters in Unicode require 2 bytes to be encoded in UTF-8 (for instance CSI at U+009B is encoded as the bytes 0xC2, 0x9B in UTF-8). Thus the corresponding control functions are more commonly accessed using the equivalent two byte escape sequence intended for use with systems that have only 7-bit bytes.
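To make the byte-level difference concrete, here is a minimal Python sketch of the three encodings of CSI that the quotes above refer to. The variable names are only illustrative.

```python
# CSI (Control Sequence Introducer) as a Unicode code point.
CSI = "\u009b"

# 8-bit form: a single C1 byte, as in ISO 8859-1 style 8-bit environments.
as_latin1 = CSI.encode("latin-1")

# The same code point encoded as UTF-8 takes two bytes.
as_utf8 = CSI.encode("utf-8")

# 7-bit form: ESC (0x1B) followed by "[" (0x5B).
as_escape = b"\x1b["

print(as_latin1.hex(" "))  # 9b
print(as_utf8.hex(" "))    # c2 9b
print(as_escape.hex(" "))  # 1b 5b
```

Note that a bare 0x9B byte is not valid UTF-8 on its own, which is why a stream that mixes raw C1 bytes with UTF-8 text is ambiguous.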

Are there any command-line tools that can be used to directly convert 8-bit C1 control characters (as specified by ECMA-48) into two-character ESC sequences?

My best attempt so far uses iconv:

$ printf $(echo -en "\x9b") | iconv --from-code=ANSI_X3.4 --to-code=UTF-8 | od -t x1
iconv: illegal input sequence at position 0

For debugging purposes I'm using od -t x1 to render the result back into hexadecimal. The result I'm hoping to get is the same as the output of:

$ printf $(echo -en "\x1b[") | od -t x1
0000000 1b 5b
0000002

In other words, is there a command-line tool that you can pipe a C1 control character like \x9b into and get back an escape sequence like \x1b[ (ESC is 0x1b in hexadecimal, 27 in decimal)?

EDIT: Or as egmont rightly suggests, more appropriately, an interactive tool rather than something you pipe into.
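For reference, the conversion being asked for is simple arithmetic: in ECMA-48 each C1 byte in 0x80 to 0x9F corresponds to ESC followed by the byte 0x40 lower (so 0x9B becomes ESC [). The following is a minimal sketch of such a filter, not an existing tool. The names `c1_to_esc` and `filter_stream` are hypothetical, and it assumes the input is a single-byte 8-bit stream: fed UTF-8 text, it would mangle continuation bytes that fall in the same range.

```python
import io

ESC = 0x1B  # the escape character (decimal 27)


def c1_to_esc(data: bytes) -> bytes:
    """Replace each byte in 0x80..0x9F with ESC followed by (byte - 0x40)."""
    out = bytearray()
    for byte in data:
        if 0x80 <= byte <= 0x9F:
            out.append(ESC)
            out.append(byte - 0x40)
        else:
            out.append(byte)
    return bytes(out)


def filter_stream(src, dst, chunk_size=4096):
    """Copy a buffered binary stream to another, converting as data arrives.

    read1() returns as soon as some data is available instead of waiting
    for a full chunk, and the output is flushed after every chunk.
    """
    while True:
        chunk = src.read1(chunk_size)
        if not chunk:
            break
        dst.write(c1_to_esc(chunk))
        dst.flush()


# Demo on in-memory streams: 8-bit CSI sequences become ESC [ sequences.
sink = io.BytesIO()
filter_stream(io.BytesIO(b"\x9b31mred\x9b0m"), sink)
print(sink.getvalue())  # b'\x1b[31mred\x1b[0m'
```

Used as a pipe filter, the same function could be called as `filter_stream(sys.stdin.buffer, sys.stdout.buffer)`. Because every C1 control is a single byte, chunk boundaries cannot split one, so no state needs to be carried between chunks.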

Could you please give some more context of the overall problem? Where does the data come from? Is it a legacy application that you cannot modify? Does it handle non-ASCII characters? If so, in what encoding? Why do you need to convert from C1 to C0? Does this app not work in some terminal emulator? Are you really looking for a command-line tool that processes "offline" data, or do you need runtime conversion (as e.g. `luit` would do if it had such an option)? What is actually the problem you're trying to fix? — egmont, Jul 24 '18 at 09:23
@egmont luit seems promising. I'll report back if it works for my purposes. I think it would be worth turning into an answer in that case. I'm working on some terminal automation software that uses a pseudo-terminal to connect to a remote machine with legacy software (A little hard to get into the details). Basically looking for a shortcut for converting C1 character set separately so as to stick to modern ESC codes instead. — Rehno Lindeque, Jul 24 '18 at 13:15
For the person downvoting: I spent a huge amount of time on Google reading and searching for the right tool. I also wrote a section motivating why conversion from C1 characters to ESC sequences is desirable (incompatibility with UTF-8). Please let me know if there's something I can improve, thanks! — Rehno Lindeque, Jul 24 '18 at 13:39
By the way, I don't need to convert from C1 to C0. Rather the conversion is from the 8-bit C1 character set (single characters) to the equivalent 2 character escape sequence that was originally intended for 7-bit environments but is now more common due to the fact that the 8-bit C1 character set overlaps with UTF-8. I hope that helps to clarify a little bit. — Rehno Lindeque, Jul 24 '18 at 13:54
Re your latest comment: in that case all you need is an ISO-8859-1 (a.k.a. Latin-1) -> UTF-8 charset conversion, possibly done by `luit`. — egmont, Jul 24 '18 at 14:15
Note that not all terminal emulators support C1 in UTF-8, see e.g. the beginning of http://invisible-island.net/xterm/ctlseqs/ctlseqs.html for why xterm doesn't. — egmont, Jul 24 '18 at 14:16
If you still needed a C1->C0 conversion, I'd probably look at patching `luit`. Any command-line tool like `iconv` or `sed` has the disadvantage that they don't appear as terminals towards the apps that generate the output (the input to these tools); plus they do line-buffering or 4kB-buffering on their output. Instead you'd need something that behaves like a terminal, filters the stream without buffering, and forwards other terminal-related things (e.g. window size) transparently. It's cumbersome to get these right. If you patch luit, you might even contribute your changes upstream. — egmont, Jul 24 '18 at 14:23
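As a rough illustration of the "behaves like a terminal" approach described in the comment above, here is a sketch using Python's standard `pty` module rather than a patched `luit`. It runs a command on a pseudo-terminal and converts the child's output as it is read, so no line or block buffering is added. The names `c1_to_esc`, `master_read` and `run_filtered` are hypothetical, the input is again assumed to be a single-byte 8-bit stream, and as far as I know `pty.spawn` does not forward window size changes, so that part of the comment is not addressed here.

```python
import os
import re

# Any single byte in the C1 range 0x80..0x9F.
C1_BYTE = re.compile(rb"[\x80-\x9f]")


def c1_to_esc(data: bytes) -> bytes:
    """Rewrite each C1 byte as ESC followed by (byte - 0x40)."""
    return C1_BYTE.sub(lambda m: bytes((0x1B, m[0][0] - 0x40)), data)


def master_read(fd: int) -> bytes:
    """Read whatever is available on the pty master and convert it at once."""
    return c1_to_esc(os.read(fd, 1024))


def run_filtered(argv):
    """Run argv on a pseudo-terminal, filtering its output through master_read."""
    import pty  # imported here because the module is Unix-only

    return pty.spawn(argv, master_read)
```

A call such as `run_filtered(["ssh", "legacy-host"])` (the host name is a placeholder) would then show the remote program's output with 7-bit escape sequences in place of 8-bit C1 controls, while keystrokes are passed through unchanged.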
I'm starting to come to the conclusion that neither `iconv` nor `luit` supports the C1 characters. Rather than being part of ISO 8859-1, they appear to be [undefined in the code page layout](https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Code_page_layout). Meanwhile, the `luit` man page says: "Selecting alternate sets of control characters is not supported and will never be." — Rehno Lindeque, Jul 26 '18 at 04:13
On the other hand the [ISO 8859-1 example in luit](https://invisible-island.net/luit/luit-figures.html#iso_charsets) states: "Note that 0x80-0x9f are "n/a" (not available) since the ISO encoding reserves these for C1 controls." — Rehno Lindeque, Jul 26 '18 at 04:14
This has been what my main attempts have looked like, for iconv: `printf "\n\x9B\n" | iconv --from-code=ISO8859-1 --to-code=UTF-8 | od -t x1` and for luit: `printf "\n\x9B\n" | luit -c -encoding ISO-8859-1 | od -t x1`. — Rehno Lindeque, Jul 26 '18 at 04:15
Sorry to hear that `luit` didn't work out as I expected. Maybe patching it to handle these codepoints specially is the simplest way to go. For the conversion only, you could easily build up a `sed` command line that translates all possible C1 bytes individually, but that would suffer from the buffering issues, so it's not suitable for a runtime solution. — egmont, Jul 26 '18 at 08:43
Thanks for the help in any case @egmont, I appreciate it. Not to worry, I have alternative ideas, a standard tool was just my first preference. (Btw, wrt buffering issues, I suppose `sed -u` or `unbuffer -p` might be options.) — Rehno Lindeque, Jul 26 '18 at 15:48
Let us know whenever you manage to come up with something. Good luck! — egmont, Jul 26 '18 at 20:44

0 Answers