
I am trying to write a shell script. The idea is to select a single line at random from a text file and display it as an Ubuntu desktop notification.

But I want a different line to be selected each time I execute the script. Is there any solution to do this? I don't want the entire script, just that one part.

Eliah Kagan
Anandu M Das

8 Answers


You can use the shuf utility to print random lines from a file:

$ shuf -n 1 filename

-n : number of lines to print

Examples:

$ shuf -n 1 /etc/passwd

git:x:998:998:git daemon user:/:/bin/bash

$ shuf -n 2 /etc/passwd

avahi:x:84:84:avahi:/:/bin/false
daemon:x:2:2:daemon:/sbin:/bin/false
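To connect this back to the original goal, the selected line can be passed straight to notify-send for the desktop notification. A minimal sketch, where quotes.txt is a hypothetical stand-in for your text file:

```shell
# sample file standing in for your real text file (made-up name and content)
printf 'line one\nline two\nline three\n' > quotes.txt

# pick one line at random
line=$(shuf -n 1 quotes.txt)
echo "$line"

# to show it as an Ubuntu desktop notification, you would run:
# notify-send "Random line" "$line"
```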
αғsнιη
aneeshep
  • But by using this, I have to change the value of n manually, right? I want the shell to automatically choose another line at random. It doesn't strictly need to be random, just some other line. – Anandu M Das Sep 18 '14 at 13:25
  • @AnanduMDas No, you don't have to: `n` denotes the number of lines to print (i.e., whether you want one line or two), not the line number (i.e., the first line or the second line). – aneeshep Sep 18 '14 at 13:29
  • @AnanduMDas: I have added some examples to my answer. Hope it's clear now. – aneeshep Sep 18 '14 at 13:32
  • Thank you, it's clear now :) I also found another algorithm: store the current time (seconds only, via `date +%S`) in a variable x, and then select the xth line from the text file using the `head` and `tail` commands. Anyway, your method is easier. Thanks – Anandu M Das Sep 18 '14 at 14:07
  • +1: `shuf` is in coreutils so it is available by default. Note: it loads the input file into memory. [There is an efficient algorithm that doesn't require it](http://askubuntu.com/a/527778/3712). – jfs Sep 24 '14 at 06:51

You can also use the sort command to get a random line from the file:

sort -R filename | head -n1
Seth
g_p
  • note: `sort -R` produces different result than [`shuf -n1`](http://askubuntu.com/a/525601/3712) or [`select-random`](http://askubuntu.com/a/527778/3712) if there are duplicate lines in the input. See [@EliahKagan's comment](http://askubuntu.com/questions/525599/how-to-display-a-random-line-from-a-text-file/525624#comment719930_525624). – jfs Sep 24 '14 at 16:37
  • I think that could be solved with `sort -uR` – ibitebyt3s Apr 22 '20 at 21:26
  • Works on Mac too (where `shuf` is not installed by default) – crmpicco Aug 07 '20 at 02:35

Just for fun, here is a pure-bash solution that doesn't use shuf, sort, wc, sed, head, tail or any other external tools.

The only advantage over the shuf variant is that it's slightly faster, since it's pure bash. On my machine, for a file of 1000 lines, the shuf variant takes about 0.1 seconds, while the following script takes about 0.01 seconds ;) So while shuf is the easiest and shortest variant, this is faster.

In all honesty, I would still go for the shuf solution, unless high efficiency is an important concern.

#!/bin/bash

FILE=file.txt

# get the line count for "$FILE" (simulates 'wc -l')
lc=0
while read -r line; do
  ((lc++))
done < "$FILE"

# get a random number between 1 and $lc
rnd=$RANDOM
(( rnd %= lc ))
(( rnd++ ))

# traverse the file and find line number $rnd
i=0
while read -r line; do
  ((i++))
  [ "$i" -eq "$rnd" ] && break
done < "$FILE"

# output the random line
printf '%s\n' "$line"
Malte Skoruppa
  • @EliahKagan Thanks for the suggestions and good points. I'll admit there are quite a few corner cases I hadn't really given too much thought to. I wrote this really more for the fun of it. Using `shuf` is much better anyway. Thinking of it, I don't believe that pure bash is actually more efficient than using `shuf`, as I previously wrote. There may be the tiniest (constant) overhead when firing up an external tool, but then it will run much faster than interpreted bash. So `shuf` certainly scales better. So let's say the script serves an educational purpose: It's nice to see it can be done ;) – Malte Skoruppa Sep 18 '14 at 20:35
  • GNU/Linux/Un*x has a lot of very well road-tested wheels I wouldn't want to re-invent, not unless it was a purely academic exercise. The "shell" was intended to be used to assemble lots of little existing parts that could be (re-)assembled in various ways via input/output & plenty o' options. Anything else is bad form, unless it's for sport (e.g, http://codegolf.stackexchange.com/tour), in which case, play on...! – michael Sep 19 '14 at 06:51
  • @michael_n Yes, I fully agree -- indeed, that's one of the things I tried to bring across in my last comment. Actually, I wrote pretty much the same: "for the fun of it" meaning "it's for sport", and "the script serves an educational purpose" meaning "it was a purely academic exercise" ;-) – Malte Skoruppa Sep 19 '14 at 08:40
  • 2
    @michael_n Though a "pure bash" way is mainly useful for teaching and to modify for other tasks, this is a more reasonable "for real" implementation than it may seem. Bash is widely available, but `shuf`'s GNU Coreutils – specific (e.g., not in FreeBSD 10.0). `sort -R` is portable, but solves a different (related) problem: strings appearing as multiple lines have probability equal to those appearing only once. (Of course, `wc` and other utilities could still be used.) I think the main limitation here is this never picks anything after the 32768th line (and becomes less random somewhat sooner). – Eliah Kagan Sep 19 '14 at 14:59
  • Note: [`$RANDOM % n` may skew your random distribution even if `$RANDOM` itself is ok](http://stackoverflow.com/questions/1194882/generate-random-number#comment36748687_1195035) – jfs Sep 24 '14 at 07:03
  • @EliahKagan: How is `sort -R` different? E.g., even if `a` lines occur twice as much as any other line: `for i in \`seq 1000000\`; do echo -e 'a\nc\na\nb\n' | sort -R | head -n1 ; done | python -c'from collections import Counter; import sys; print(Counter(sys.stdin))'` the result does *not* show skew: `Counter({'\n': 250559, 'a\n': 250100, 'c\n': 249726, 'b\n': 249615})` – jfs Sep 24 '14 at 12:05
  • EliahKagan and @J.F.Sebastian, you two got me thinking about how to implement a good PRNG in bash. I posted another question here: http://askubuntu.com/questions/527900/how-to-efficiently-generate-large-uniformly-distributed-random-integers-in-bas, maybe you have a good idea? – Malte Skoruppa Sep 24 '14 at 13:23
  • 2
    Malte Skoruppa: I see you've moved [the bash PRNG question to U&L](http://unix.stackexchange.com/q/157250). Cool. Hint: `$((RANDOM<<15|RANDOM))` is in 0..2^30-1. @J.F.Sebastian It's `shuf`, not `sort -R`, that skews toward more frequent inputs. Put `shuf -n 1` in place of `sort -R | head -n1` and compare. (Btw 10^3 iterations is quicker than 10^6 and still quite enough to show the difference.) See also [a rougher, more visual demo](http://paste.ubuntu.com/8418343/) and [this bit of silliness showing it works on big inputs where all strings are high frequency](http://paste.ubuntu.com/8418402/). – Eliah Kagan Sep 24 '14 at 14:35
  • @EliahKagan Thank you for adding the U&L link. That `$((RANDOM<<15|RANDOM))` is a neat idea. It overcomes the "small range" limitation of normal `$RANDOM`. It does not overcome (by itself) the "bias" limitation when modulo'ed. Maybe by using a loop instead of a modulo? At any rate, if you want to post that as an answer to my question on U&L, it would complement the other ideas there nicely :) – Malte Skoruppa Sep 24 '14 at 15:30
  • @EliahKagan: `RANDOM`-based prng for 0..2^32-1 fails [dieharder](http://manpages.ubuntu.com/manpages/trusty/man1/dieharder.1.html) tests: `while echo $(( RANDOM << 17 | RANDOM << 2 | RANDOM >> 13 )); do :; done | perl -ne 'print pack "I>"' | dieharder -a -g 200`. I also tried to use lower bits `(RANDOM & 3)` instead of `(RANDOM >> 13)` (higher bits). I'm not optimistic about `$((RANDOM<<15|RANDOM))` for 0..2^30-1 range. – jfs Sep 26 '14 at 09:10
  • 1
    @J.F.Sebastian In that command, the input to `dieharder` seems to be all zeros. Assuming this is not merely some strange mistake on my part, that certainly would explain why it's not random! Do you get good-looking data if you run `while echo $(( RANDOM << 17 | RANDOM << 2 | RANDOM >> 13 )); do :; done | perl -ne 'print pack "I>"' > out` for a while and then examine the contents of `out` with a hex editor? (Or view it however else you like.) I get all zeros, and `RANDOM` isn't the culprit: I get all zeros when I replace `$(( RANDOM << 17 | RANDOM << 2 | RANDOM >> 13 ))` with `100`, too. – Eliah Kagan Sep 26 '14 at 14:23
  • @EliahKagan: you are right `,$_` is missing at the end in the perl command. I might have checked it with `,$_` and edited it out later. Another way to check for zeros is to pipe to hexdump: `| hd` instead of `> out`. diehard_birthdays test is passed with `while echo $(( RANDOM << 17 | RANDOM << 2 | RANDOM >> 13 )); do :; done | perl -ne 'print pack "I>",$_' | dieharder -a -g 200` command – jfs Sep 26 '14 at 18:21
  • @EliahKagan I implemented a solution in pure bash to generate large random numbers using your idea of concatenating bistrings resulting from repeated invocations of `$RANDOM`. [See my answer at U&L](http://unix.stackexchange.com/a/157837). This solves the limitation you mentioned earlier, that a single invocation of `$RANDOM` never picks anything after the 32768th line. And it's still pure bash :-) – Malte Skoruppa Sep 26 '14 at 23:32
  • @EliahKagan: `$(( RANDOM <<15 | RANDOM ))` (30bit generator) fails the `dieharder` tests for some reason. See [`make test-file` test](https://github.com/zed/test-dieharder-bash-random). – jfs Sep 30 '14 at 12:05
  • @J.F.Sebastian Does the 15-bit test fail too? Also, what happens when you don't explicitly seed `RANDOM`? I'm not sure how `bash` gets its seed when you don't, but I suspect it's--at least sometimes--somewhat robust. Of course, I realize its output should look statistically random for nearly any seed value. – Eliah Kagan Sep 30 '14 at 13:16
  • @EliahKagan: 15-bit fails too. It seems failing dieharder tests accept only 32bit input. – jfs Sep 30 '14 at 19:22

Say you have the file notifications.txt. First, count the total number of lines, to determine the range for the random number generator:

$ cat notifications.txt | wc -l

Let's write it to a variable:

$ LINES=$(cat notifications.txt | wc -l)

Now, to generate a number from 1 to $LINES, we will use the RANDOM variable (the + 1 keeps the result in sed's 1-based line range):

$ echo $(( RANDOM % LINES + 1 ))

Let's write it to a variable:

$ R_LINE=$(( RANDOM % LINES + 1 ))

Now we only need to print that line number:

$ sed -n "${R_LINE}p" notifications.txt
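The steps above can be combined into one runnable sketch (the sample data here is made up for illustration; the `+ 1` keeps the line number in sed's 1-based range):

```shell
# create a small sample notifications.txt (made-up content)
printf 'first\nsecond\nthird\n' > notifications.txt

LINES=$(cat notifications.txt | wc -l)       # total line count
R_LINE=$(( RANDOM % LINES + 1 ))             # random line number, 1..LINES
picked=$(sed -n "${R_LINE}p" notifications.txt)
echo "$picked"
```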

About RANDOM:

   RANDOM Each time this parameter is referenced, a random integer between
          0 and 32767 is generated.  The sequence of random numbers may be
          initialized by assigning a value to RANDOM.  If RANDOM is unset,
          it  loses  its  special  properties,  even if it is subsequently
          reset.

Be sure your file has fewer than 32767 lines. See this if you need a bigger random number generator that works out of the box.

Example:

$ od -A n -t d -N 3 /dev/urandom | tr -d ' '
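As a sketch of how such a bigger generator could replace $RANDOM for files beyond the 32767-line limit (assuming GNU od; reading 4 bytes gives an unsigned value in 0..4294967295, and the sample file is made up for illustration):

```shell
# draw an unsigned 32-bit integer from /dev/urandom
rnd=$(od -A n -t u4 -N 4 /dev/urandom | tr -d ' ')

# map it onto a 1-based line number of a small sample file
printf 'a\nb\nc\nd\ne\n' > notifications.txt
LINES=$(cat notifications.txt | wc -l)
R_LINE=$(( rnd % LINES + 1 ))
picked=$(sed -n "${R_LINE}p" notifications.txt)
echo "$picked"
```

As noted in the comments below, applying `% n` introduces a slight modulo bias, but for picking a notification line it is negligible.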
c0rp
  • A stylistic alternative (bash): `LINES=$(wc -l < file.txt); R_LINE=$((RANDOM % LINES)); sed -n "${R_LINE}p" file.txt` – michael Sep 19 '14 at 06:44
  • Note: [`$RANDOM % n` may skew your random distribution even if `$RANDOM` itself is ok](http://stackoverflow.com/questions/1194882/generate-random-number#comment36748687_1195035) – jfs Sep 24 '14 at 07:13
  • e.g., look at the last picture in [Test PRNG using gray bitmap](https://www.wakari.io/sharing/bundle/jfs/Test%20PRNG%20using%20gray%20bitmap) to understand why it is not a good idea to apply `% n` to a random number. – jfs Sep 24 '14 at 07:25

Here's a Python script that selects a random line from input files or stdin:

#!/usr/bin/env python
"""Usage: select-random [<file>]..."""
import random

def select_random(iterable, default=None, random=random):
    """Select a random element from iterable.

    Return default if iterable is empty.
    If iterable is a sequence then random.choice() is used for efficiency.
    If iterable is an iterator, it is exhausted.
    O(n)-time, O(1)-space algorithm.
    """
    try:
        return random.choice(iterable) # O(1) time and space
    except IndexError: # empty sequence
        return default
    except TypeError: # not a sequence
        return select_random_it(iter(iterable), default, random.randrange)

def select_random_it(iterator, default=None, randrange=random.randrange):
    """Return a random element from iterator.

    Return default if iterator is empty.
    iterator is exhausted.
    O(n)-time, O(1)-space algorithm.
    """
    # from https://stackoverflow.com/a/1456750/4279
    # select 1st item with probability 100% (if input is one item, return it)
    # select 2nd item with probability 50% (or 50% the selection stays the 1st)
    # select 3rd item with probability 33.(3)%
    # select nth item with probability 1/n
    selection = default
    for i, item in enumerate(iterator, start=1):
        if randrange(i) == 0: # random [0..i)
            selection = item
    return selection

if __name__ == "__main__":
    import fileinput
    import sys

    random_line = select_random_it(fileinput.input(), '\n')
    sys.stdout.write(random_line)
    if not random_line.endswith('\n'):
        sys.stdout.write('\n') # always append newline at the end

The algorithm is O(n)-time, O(1)-space. It works for files larger than 32767 lines. It doesn't load input files into memory, and it reads each input line exactly once, i.e., you can pipe arbitrarily large (but finite) content into it. Here's an explanation of the algorithm.

jfs
awk 'BEGIN{srand()}{a[NR]=$0}END{x=int(rand()*NR)+1; print a[x]}' notifications.txt
ufopilot

I'm impressed by the work that Malte Skoruppa and others did, but here is a much simpler "pure bash" way to do it:

IFS=$'\012'                # set the field separator to newline only
lines=( $(<test5) )        # slurp the entire file into an array
numlines=${#lines[@]}      # count the array elements
num=$(( $RANDOM$RANDOM$RANDOM % numlines ))   # a (more-or-less) random number in range
line=${lines[$num]}        # select the element for that random number
echo "$line"               # display it (quoted to avoid word splitting)

As some have noted, $RANDOM is not truly random. However, the 32767-line file size limit is overcome by stringing $RANDOMs together as needed.
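A variant of the same idea, using the two-draw trick from the comments on the pure-bash answer above (`$(( RANDOM<<15 | RANDOM ))` gives a 30-bit number) and bash 4's mapfile instead of word splitting; the sample file stands in for the answer's test5:

```shell
# sample file (stand-in for the answer's test5)
printf 'red\ngreen\nblue\n' > test5

mapfile -t lines < test5                      # slurp the file into an array (bash 4+)
numlines=${#lines[@]}                         # count the array elements
num=$(( (RANDOM<<15 | RANDOM) % numlines ))   # 30-bit random index, 0..numlines-1
line=${lines[num]}                            # select that element
echo "$line"
```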

Wastrel
awk '
BEGIN { srand() }
1/NR >= rand() { line = $0 }
END { print line }' notifications.txt
andrew.46