I'm writing a bash script that should check whether the exact string 329, exists in myfile. I searched the web and found some answers, but I can't use the -x parameter because myfile contains other numbers besides 329,. And without -x, I can also get the Exists result for 329 where I don't want it.

I tried:

if grep -xqFe "329," "myfile"; then
    echo -e "Exists"
else
    echo -e "Not Exists"
fi

And the output was:

Not Exists

Contents of myfile:

329, 2, 57

How can I solve this?

muru
Erikli

2 Answers

The -x option isn't relevant here. This is what it means (from man grep):

-x, --line-regexp
       Select  only  those  matches that exactly match the whole line.
       For a regular expression pattern, this is  like  parenthesizing
       the pattern and then surrounding it with ^ and $.
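To make that concrete, here is a small sketch, assuming GNU grep as shipped with Ubuntu. It writes the question's line plus a line that is exactly 329, into a throwaway temp file (the second line is made up for illustration) and counts the matching lines with and without -x:

```shell
#!/usr/bin/env bash
# Throwaway file for the demo: the question's line plus a line that is exactly "329,".
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' '329, 2, 57' '329,' > "$tmp"

# Plain fixed-string search: every line that contains "329," somewhere counts.
grep -cF '329,' "$tmp"     # prints 2

# With -x the pattern must match the entire line, so only the bare "329," line counts.
grep -cxF '329,' "$tmp"    # prints 1
```

That is why the question's command printed "Not Exists": the line 329, 2, 57 contains 329, but is not equal to it.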

So it is only useful if you want to find lines that contain nothing other than the exact string you are looking for. The option you want is -w:

-w, --word-regexp
       Select  only  those  lines  containing  matches that form whole
       words.  The test is that the matching substring must either  be
       at  the  beginning  of  the  line,  or  preceded  by a non-word
       constituent character.  Similarly, it must be either at the end
       of  the  line  or followed by a non-word constituent character.
       Word-constituent  characters  are  letters,  digits,  and   the
       underscore.  This option has no effect if -x is also specified.

That will match if your target string appears as a standalone "word", that is, surrounded by "non-word" characters or by the start or end of the line. You also don't need -F here: it is only useful if your pattern contains characters with special meanings in regular expressions (e.g. *) and you want to find them literally. And you don't need -e at all; it is only needed if you want to give more than one pattern (or a pattern that starts with -). So you're looking for:

if grep -wq "329," myfile; then
    echo "Exists"
else
    echo "Does not exist"
fi
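A quick way to see which lines -w accepts is to feed it a few sample lines one at a time. This is a sketch that assumes GNU grep (as on Ubuntu); BSD grep implements -w differently and may not accept a pattern that ends in a comma. check_line is a hypothetical helper name, and the sample lines are made up:

```shell
#!/usr/bin/env bash
# check_line (hypothetical helper): run the -w test above on a single line of text.
check_line() {
    if printf '%s\n' "$1" | grep -wq "329,"; then
        echo "Exists"
    else
        echo "Does not exist"
    fi
}

check_line "329, 2, 57"    # Exists: at the start of the line, followed by a space
check_line "2, 329, 57"    # Exists: preceded and followed by a space
check_line "1329, 2, 57"   # Does not exist: preceded by the word character 1
check_line "2, 57, 329"    # Does not exist: no comma after 329
```

The last case is exactly the one the next part of this answer addresses.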

If you also want to match when the number is the last one on the line, so it has no comma after it, you can use grep -E to enable extended regular expressions and then match either 329 followed by a comma (329,) or 329 at the end of the line (329$). You can combine those like this:

if grep -Ewq "329(,|$)" myfile; then
    echo "Exists"
else
    echo "Does not exist"
fi
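The same kind of check for the extended pattern, again assuming GNU grep; check_line_e is a hypothetical helper and the sample lines are invented for illustration:

```shell
#!/usr/bin/env bash
# check_line_e (hypothetical helper): apply grep -Ew "329(,|$)" to a single line.
check_line_e() {
    if printf '%s\n' "$1" | grep -Ewq "329(,|$)"; then
        echo "Exists"
    else
        echo "Does not exist"
    fi
}

check_line_e "2, 57, 329"    # Exists: 329 at the end of the line
check_line_e "329, 2, 57"    # Exists: 329 followed by a comma
check_line_e "2, 57, 3290"   # Does not exist: 329 is followed by 0
check_line_e "2, 57, 1329"   # Does not exist: -w rejects the leading 1
```

Note that -w is still doing useful work here: without it, the last line would match because it ends in 329.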
wjandrea
terdon
  • Is `-F` not likely to be faster? Though I did read that one (the most common?) grep implementation can detect some fixed-strings and optimize for them – D. Ben Knoble Aug 06 '22 at 23:28
  • @D.BenKnoble the `-F` would only work for the first example, not the second, and given that there are no special regex characters there and we're just searching for a simple string, I doubt it would be any faster. And indeed, I tested with a 2.6G file with one occurrence of `329,` in the middle of 80 million lines of other stuff. I ran `grep -F 329,` and `grep 329,` ten times each and took the average. They both took 0.33 seconds, no significant difference at all. – terdon Aug 07 '22 at 11:57
  • @D.BenKnoble you still need to read from disk. That will most likely drown CPU usage. – Thorbjørn Ravn Andersen Aug 07 '22 at 19:35
  • @ThorbjørnRavnAndersen Disk reads are not really a solid argument when it comes to text processing and `grep` … It will most likely almost certainly not drown CPU usage … Please have a look at the speed tests in my answer [here](https://askubuntu.com/a/1420653), where `grep` outperforms other text processing tools even when it reads the same huge file multiple times, with even less CPU time. – Raffa Aug 07 '22 at 21:01
  • @raffa Interesting. How did you ensure Linux didn't cache `file.dat` in memory? – Thorbjørn Ravn Andersen Aug 08 '22 at 17:06
  • @ThorbjørnRavnAndersen I like the way you think :-) ...To be honest, I didn't care about that as the file used by all utilities `sed`, `grep` and `awk` was the same and I ran the tools in series i.e. `sed` then `grep` then `awk` shuffling that order many times ... So if the file was cached, then it would be cached for all tools ... Those which read it once and others which read it multiple times ... So, reading `file.dat` would be equally available to all tools ... What is left with reading speed neutralized is just how efficient is the tool in processing text ... BTW I had that assumption too – Raffa Aug 08 '22 at 17:21
  • @ThorbjørnRavnAndersen And ... Reading my [previous comment](https://askubuntu.com/questions/1422252/how-can-i-check-if-string-exists-in-file/1422254?noredirect=1#comment2475801_1422254) again ... just now ... I got the impression that it might be a bit on the unfriendly side ... But it wasn't meant to be so at all ... English is not my native language, nor yours ... And knowledge is neither mine (*I'm still learning and will be for a very long time*) nor yours ... So friendly intentions were the drive ... Just to get that out of the way :-) ... Appreciate your reply. – Raffa Aug 08 '22 at 17:38
  • @Raffa "It will most likely almost certainly not drown CPU usage " - on pre-SSD disks it did. For non-complex regular expressions, a good algorithm only needs to look at some of the incoming data, meaning it can basically process the file as fast as the operating system can supply it (SATA cannot provide more than 600 MB/s). SSDs really make a difference here. – Thorbjørn Ravn Andersen Aug 10 '22 at 11:11
  • @ThorbjørnRavnAndersen I understand your point and it's very valid theoretically ... That is assuming nothing else is involved in the background, but the kernel nowadays takes over everything so that it won't allow bottlenecks to happen: it caches data in RAM, moves readily available memory pages to swap, and throttles or even kills over-demanding processes (e.g. the OOM killer). Plus, tools like `grep` and [`sort`](https://askubuntu.com/a/1403115/968501) are decades old and have been refined for performance, so they are unlikely to choke on large files or cause a bottleneck when used right. – Raffa Aug 10 '22 at 12:59

Another alternative might be:

# Put every comma- or space-separated field on its own line,
# then look for a line that is exactly 329.
if tr ', ' '\n\n' < myfile | grep -xqF "329"; then
    echo "Exists"
else
    echo "Not Exists"
fi
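The spaces after the commas matter for this approach. Here is a sketch, assuming GNU tr and grep, that compares splitting on commas only with splitting on commas and spaces, using a made-up line where 329 is not the first field:

```shell
#!/usr/bin/env bash
# Made-up sample line where 329 is not the first field.
line='2, 329, 57'

# Commas only: the fields become "2", " 329" and " 57", so -x finds no exact "329".
printf '%s\n' "$line" | tr ',' '\n' | grep -cxF 329       # prints 0

# Commas and spaces both become newlines: "2", "", "329", "", "57".
printf '%s\n' "$line" | tr ', ' '\n\n' | grep -cxF 329    # prints 1
```

Like the grep -Ew "329(,|$)" variant in the other answer, this treats 329 as found whether or not a comma follows it.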

Regards

wjandrea