
I have a large text file of more than 30 megabytes. I want to remove all the lines that don't match a specific criterion, e.g. lines that don't contain the string 'START'.

What's the easiest way to do this?

Gaff
Shawn

5 Answers

If the pattern is really that simple, grep -v will work:

grep -v START bigfile.txt > newfile.txt

newfile.txt will have everything from bigfile.txt except lines with "START".

(In case it isn't obvious, this is something you'd run in Terminal or another command-line tool.)
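A minimal sketch of the command above, run on a small, made-up stand-in for the 30 MB file (the sample lines are hypothetical, and everything happens in a throwaway directory created with mktemp). grep reads its input as a stream, so file size is not a concern. The result is written to a different file on purpose: redirecting onto bigfile.txt itself would make the shell truncate it before grep reads anything.

```shell
# Work in a scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input standing in for the large file.
printf 'START one\nkeep me\nSTART two\nalso keep\n' > bigfile.txt

# Drop every line containing START and write the rest to a NEW file.
# (Using "> bigfile.txt" here would empty the input before grep reads it.)
grep -v START bigfile.txt > newfile.txt

# Show the result: the two lines without START.
cat newfile.txt
```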

Doug Harris

The original question asked how to remove the lines that don't match a pattern, in other words, how to keep the lines that do match it. So there is no need for -v:

grep START infile.txt > outfile.txt

Note that grep can use regular expressions for much more powerful pattern matching, though the syntax is a bit cryptic.
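A small sketch contrasting a plain substring match with a simple regular expression, on hypothetical sample data. The ^ anchors the pattern to the start of the line, so a line like RESTART is kept by the first command but not by the second. The file name starts.txt is made up for this illustration.

```shell
# Scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical input; RESTART contains START but does not begin with it.
printf 'START one\nkeep me\nRESTART\nSTART two\n' > infile.txt

# Keep lines containing START anywhere (RESTART matches too).
grep START infile.txt > outfile.txt

# Keep only lines that BEGIN with START: ^ anchors the match.
grep '^START' infile.txt > starts.txt
```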

Gaff
whatever

Use GNU sed with the -i argument, which edits the file in place.

Ignacio Vazquez-Abrams
    To make the answer a bit more verbose: sed -n -e '/START/p' inputfile. And it may be a good idea not to use -i while playing around with file-altering commands, just in case. – akira Jun 03 '10 at 02:28
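A sketch combining the answer and the comment, on a hypothetical sample file: preview the result with sed -n first, then edit in place. The .bak backup suffix is a choice made here, not part of the original answer; with it, sed keeps a copy of the original, which covers the "just in case" concern. The attached -i.bak spelling is accepted by both GNU sed and BSD/macOS sed.

```shell
# Scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input.
printf 'START one\nkeep me\nSTART two\n' > inputfile

# -n turns off sed's automatic printing; /START/p prints matching lines.
# Output goes to preview.txt, so inputfile is left untouched.
sed -n -e '/START/p' inputfile > preview.txt

# Once the preview looks right, edit in place. The .bak suffix makes
# sed save the original contents as inputfile.bak first.
sed -i.bak -n -e '/START/p' inputfile
```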
grep -v START inputfile

should work. grep is standard on both macOS and Linux/Unix, and can be installed on MS Windows.

Option -v inverts the match: grep outputs only the lines that do not contain the pattern (the inverse of its usual behaviour).
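To illustrate what -v does, a hypothetical five-line file can be counted both ways with grep -c. Every line either contains START or does not, so the two counts always add up to the total number of lines (grep -c '' counts every line, because the empty pattern matches everything).

```shell
# Scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample input: two lines with START, three without.
printf 'START one\nkeep me\nSTART two\nalso keep\nlast\n' > inputfile

matching=$(grep -c START inputfile)      # lines that contain START
non_matching=$(grep -vc START inputfile) # lines that do not (-v)
total=$(grep -c '' inputfile)            # every line

echo "$matching + $non_matching = $total"
```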

sleske

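For Windows Command Prompt (help find for options):

find /v "START" < original_file.txt > new_file.txt

Redirecting the input with < rather than passing the file name as an argument keeps find from writing a "---------- ORIGINAL_FILE.TXT" header line into new_file.txt.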

For Linux, OS X, etc. (man grep for options):

grep -v "START" original_file.txt > new_file.txt

For more complicated text matching, grep offers a lot more functionality than find. If you are on Windows, you can easily find a port of grep, or you can use Windows' findstr instead of find.
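As an illustration of that extra functionality, here is a sketch combining a few standard grep options on hypothetical sample lines. -i ignores case, -w requires the match to be a whole word, and -E enables extended regular expressions so that alternation with | works. The second pattern, END, is an assumption added only to demonstrate alternation.

```shell
# Scratch directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical input: mixed case, plus RESTART, which contains START
# only as part of a longer word.
printf 'START one\nstart two\nRESTART\nEND\nkeep me\n' > original_file.txt

# -v: drop matching lines; -i: ignore case;
# -w: match only whole words, so RESTART is NOT dropped;
# -E: extended regex, so START|END means "START or END".
grep -v -i -w -E 'START|END' original_file.txt > new_file.txt
```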

Mike Fitzpatrick