Remove Control Return and merge lines in one text file and limit number of characters

Question

I want to remove Control Return, merge all lines of a text file into one, and limit the number of characters.

input.txt containing:

comment 1
comment 2 
...
comment n

output.txt should contain a single string:

comment 1 comment 2 ... comment n

But output.txt should be limited to, e.g., 32 characters:

comment 1 comment 2 comment 3 co

Can I use sed, awk, tr or something else?

By `32 characters` you mean each line should not be more than `32 characters`, right? – George Udosen, Sep 10 '17 at 19:47
Do you really mean *carriage return* or rather *newline*, which is the usual line break in the unix world? — Philippos, Sep 19 '17 at 05:49

David Foerster · Answer 1 · 2017-09-10T11:58:39.230

1

head -c 32 input.txt | tr '\n' ' ' > output.txt

head -c 32 discards all but the first 32 bytes.
tr '\n' ' ' replaces all newline characters with space characters.

If you want to limit characters instead of bytes (the two differ in multi-byte encodings such as UTF-8), you can use grep instead:

tr '\n' ' ' < input.txt | grep -oEe '^.{,32}' > output.txt
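To see why the byte/character distinction matters, here is a small sketch (the helper names `sample` and `cut_bytes_hex` are made up for illustration). It assumes UTF-8 text, where "ä" and "ö" each take two bytes, and dumps what `head -c 3` keeps as hex:

```shell
# "äö" encoded as UTF-8 is 4 bytes: c3 a4 c3 b6. Octal escapes are used
# so the example does not depend on the encoding of the script itself.
sample() { printf '\303\244\303\266\n'; }

# head -c counts bytes: 3 bytes keep "ä" plus only the first half of "ö".
cut_bytes_hex() { sample | head -c 3 | od -An -tx1 | tr -d ' \n'; }

cut_bytes_hex; echo   # c3a4c3 (a dangling c3 byte at the end)
```

The trailing `c3` is an incomplete character, which is what a character-based cut avoids.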

edited Sep 10 '17 at 11:58

answered Sep 10 '17 at 11:50

David Foerster


score 0 · Answer 2 · answered Sep 10 '17 at 11:49


Awk should work fine. One way is:

$ printf 'comment 1\rcomment 2\r...\rcomment n\r' > input.txt
$ awk -v FS="" -v RS="" '{for (i=1;i<=32;i++) printf "%s", ($i == "\r") ? "" : $i}' input.txt > output.txt
$ cat output.txt 
comment 1comment 2...comment

Explanation: by default awk processes input line by line, and each line is called a record. Every record is split into columns, called fields, which you refer to as $1, $2, $3…

Setting the field separator (FS) to "" changes that default so that every character becomes its own field (this works in gawk and mawk; POSIX leaves it unspecified). Setting the record separator (RS) to "" puts awk into paragraph mode, where records are separated by blank lines. Since the input contains none, the whole text is a single record and you can address all of its characters at once, without writing code to handle it line by line.

Finally, loop over the first 32 fields (i.e. characters) and print each one unless it is a carriage return. Note that the limit counts input characters, so every skipped carriage return makes the output one character shorter than 32.
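Because the loop above counts input characters, a variant that strips the carriage returns first and truncates afterwards may be closer to what was asked. A minimal sketch, assuming newline-terminated input that may also contain `\r` (the function name `join_and_trim` is made up for illustration):

```shell
# Hypothetical helper: join all lines of stdin with single spaces, drop
# carriage returns, and print at most $1 characters (default 32).
join_and_trim() {
  awk -v max="${1:-32}" '
    {
      gsub(/\r/, "")     # strip CRs left over from CRLF line endings
      out = out sep $0   # append the current line
      sep = " "          # every later line is preceded by a space
    }
    END { print substr(out, 1, max) }
  '
}

printf 'comment 1\r\ncomment 2\r\ncomment 3\r\ncomment 4\r\n' | join_and_trim 32
# comment 1 comment 2 comment 3 co
```

Whether `substr` counts characters or bytes depends on the awk implementation and locale, so treat the limit as bytes unless you know your awk is multi-byte aware.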

answered Sep 10 '17 at 11:49

Hi-Angel


Why all these carriage-return characters (`\r`) in the input? The escape sequence for newline characters is `\n`. – David Foerster Sep 10 '17 at 11:51
@DavidFoerster OP asked for carriage return, idk why. – Hi-Angel Sep 10 '17 at 11:52

Hmm… you're right. But I think they actually meant line break/newline characters. – David Foerster Sep 10 '17 at 11:52
@DavidFoerster well, `\r` is easy to replace with `\n`, so it's not a big deal. But FTR, my original answer used `\n` ☺ But then I noticed OP's "Control Return", with first letters suspiciously similar to CR, and quickly replaced it. That edit is not recorded because I made it within the 5-minute grace period. – Hi-Angel Sep 10 '17 at 12:24

score 0 · Answer 3 · answered Sep 10 '17 at 19:57


tr '\n' ' ' < in.txt | cut -c -32

tr '\n' ' ': replaces every newline in the input with a space
cut -c -32: limit the output to 32 characters
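If the file was written on Windows, each line ends in `\r\n` and the pipeline above leaves the `\r` characters in the output. A small sketch that deletes them first, assuming the carriage returns are unwanted (`merge_lines` is an illustrative name, not part of any tool):

```shell
# Hypothetical helper: read stdin, drop carriage returns, turn newlines
# into spaces, and keep at most $1 characters (default 32).
merge_lines() {
  tr -d '\r' | tr '\n' ' ' | cut -c "1-${1:-32}"
}

printf 'comment 1\r\ncomment 2\r\ncomment 3\r\ncomment 4\r\n' | merge_lines 32
# comment 1 comment 2 comment 3 co
```

For input without carriage returns the extra `tr -d '\r'` changes nothing, so it is safe to leave in.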

answered Sep 10 '17 at 19:57

George Udosen

