
I want to remove Control Return and merge the lines of a text file into one, limiting the number of characters.

input.txt containing:

comment 1
comment 2 
...
comment n 

output.txt should be one string:

comment 1 comment 2 ... comment n

BUT output.txt should be limited to, e.g., 32 characters:

comment 1 comment 2 comment 3 co

Can I use sed, awk, tr or something else?

George Udosen

3 Answers

1
head -c 32 input.txt | tr '\n' ' ' > output.txt
  • head -c 32 discards all but the first 32 bytes.

  • tr '\n' ' ' replaces all newline characters with space characters.
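As a quick check, the pipeline above can be run against a small sample file. The four-line input below is made up for illustration. Note that the newlines count toward the 32 bytes, because head runs before tr turns them into spaces.

```shell
# Hypothetical sample input: four short lines
printf 'comment 1\ncomment 2\ncomment 3\ncomment 4\n' > input.txt

# Keep the first 32 bytes, then turn newlines into spaces
head -c 32 input.txt | tr '\n' ' ' > output.txt

cat output.txt; echo    # comment 1 comment 2 comment 3 co
wc -c < output.txt      # 32 bytes, no trailing newline
```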

If you want to limit characters instead of bytes (which matters for multi-byte character encodings), you can use grep instead of head:

tr '\n' ' ' < input.txt | grep -oEe '^.{,32}' > output.txt
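To see why the byte/character distinction matters, here is a small sketch of what a byte limit does to a multi-byte character. It assumes UTF-8, where "é" is encoded as the two bytes 0xc3 0xa9; the variable name `first_byte` is only for illustration.

```shell
# "é" in UTF-8 is two bytes (0xc3 0xa9), written here as octal escapes
# so the example does not depend on the encoding of this snippet.
first_byte=$(printf '\303\251' | head -c 1 | od -An -tx1 | tr -d ' \n')

# head -c 1 kept only the first half of the character
echo "$first_byte"    # c3
```

A byte limit that lands inside such a sequence leaves an incomplete character at the end of the output, which is the case the grep variant is meant to avoid.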
David Foerster
0

Awk will do fine. One way is:

$ printf 'comment 1\rcomment 2\r...\rcomment n\r' > input.txt
$ awk -v FS="" -v RS="" '{for (i=1;i<=32;i++) printf "%s", ($i == "\r") ? "" : $i}' input.txt > output.txt
$ cat output.txt 
comment 1comment 2...comment 

Explanation: by default, awk processes input line by line, where each line is called a record; each record is processed column by column, where each column is called a field. Fields are referred to by variables numbered from 1, e.g. $1, $2, $3…

So you change the default behavior by setting the Field Separator to "", which makes awk (gawk, at least) treat every character as a separate field. Then you set the Record Separator to "", which switches awk to paragraph mode: records are separated by blank lines, so a text without blank lines is read as a single record and you can refer to all of its characters at once (i.e. without writing code to handle the input line by line).

Finally, you loop over the first 32 fields (i.e. characters) and print each one unless it is a carriage return. Note that the skipped carriage returns still count toward the limit of 32, which is why the output above is shorter than 32 characters.
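For comparison, here is a sketch of a different awk approach that does not rely on an empty field separator: it joins the lines with spaces and truncates with `substr`. This is not the method described above, just an alternative under the assumption that the input uses newline line endings (optionally preceded by a carriage return). The sample input and the variable names `line` and `sep` are made up for illustration.

```shell
# Hypothetical sample input: four short lines
printf 'comment 1\ncomment 2\ncomment 3\ncomment 4\n' > input.txt

# Strip a trailing carriage return from each line, join the lines with
# single spaces, and print the first 32 characters at the end.
awk '{ sub(/\r$/, ""); line = line sep $0; sep = " " }
     END { print substr(line, 1, 32) }' input.txt > output.txt

cat output.txt    # comment 1 comment 2 comment 3 co
```

Here the separators are removed before the limit is applied, so all 32 characters are visible ones. Whether `substr` counts bytes or characters depends on the awk implementation and the locale.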

Hi-Angel
  • Why all these carriage-return characters (`\r`) in the input? The escape sequence for newline characters is `\n`. – David Foerster Sep 10 '17 at 11:51
  • @DavidFoerster OP asked for carriage return, idk why. – Hi-Angel Sep 10 '17 at 11:52
  • Hmm… you're right. But I think they actually meant line break/newline characters. – David Foerster Sep 10 '17 at 11:52
  • @DavidFoerster well, `\r` is easy to replace with `\n`, so it's not a big deal. But FTR, my original answer used `\n` ☺ But then I noticed OP's "Control Return", with first letters suspiciously similar to CR, and quickly replaced it. That edit is not shown in the history because I made it within the 5-minute timeout. – Hi-Angel Sep 10 '17 at 12:24
0
tr '\n' ' ' < in.txt | cut -c -32
  • tr '\n' ' ': replaces the newlines in the input text with spaces
  • cut -c -32: limits the output to the first 32 characters
George Udosen