
This is a strange question. I want to copy some large files from one hard drive to another on the same machine (on Linux, from the command line).

The problem is that the copying makes everything else very slow.

I don't mind making the copying much slower and less HDD-intensive. Let the machine take its time with it...

Is there a way to make the copy slower, but leave more of the machine's resources available?

jerrro
  • I am no Linux expert, but it looks to me like the [`nice`](https://en.wikipedia.org/wiki/Nice_(Unix)) command could be useful here. – Andrew Morton Oct 01 '21 at 16:09
  • Instead of trying to do this while still working on the machine, I would recommend to just start the process in the evening and let it run overnight. Then you don’t have to mess with drive speeds at all, and in addition you will not have to deal with slow speeds while using it the next day. – DarkDiamond Oct 01 '21 at 16:21
  • @DarkDiamond I think it will take more than a few days to copy it all, it is in the order of terabytes. – jerrro Oct 01 '21 at 17:07
  • Well, then you should do a quick calculation of how much you can transfer in one night, which will probably be something like 600 to 800 GB, and select that much for each night. Doing so you will still have the day without problems, and run the transfer during the nights. – DarkDiamond Oct 01 '21 at 22:23

3 Answers


To run a command with lower scheduling priority, using fewer resources, you can use the nice command.

The syntax is:

nice [OPTION] [COMMAND [ARG]...] 

where the niceness ranges from -20 (most favorable scheduling) to 19 (least favorable) and is set with the following parameter:

-n, --adjustment=N
    add integer N to the niceness (default 10) 

Instead of launching the program with the default priority, you can use the nice command to launch it with a specific priority. For example, here test.pl is launched with a niceness of 10 instead of the default 0 (note that -10 is the historical shorthand for -n 10, i.e. a positive niceness, not a negative value):

$ nice -10 perl test.pl

This will make the perl process get fewer scheduling slices from the kernel, thus using fewer resources and leaving more for other processes.
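You can verify the effect by looking at the NI column that ps reports. A quick check, assuming the perl process from the example above is the only perl running:

# show PID, niceness and command name of the running perl process
ps -o pid,ni,comm -p $(pidof perl)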

For more information see A brief guide to priority and nice values in the Linux ecosystem.


Regarding I/O scheduling, nice also has some influence. The Wikipedia article Completely fair queueing makes this remark:

using nice(1) also modifies I/O priorities somewhat

For more information, see the article Linux I/O Schedulers.
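As a side note, you can check which I/O scheduler a drive is currently using through sysfs (sda below is just an assumption, substitute your device; the active scheduler is shown in square brackets):

# list the available I/O schedulers for the device; the active one is in [brackets]
cat /sys/block/sda/queue/scheduler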

You could also directly set the I/O priority of a process by using ionice(1):

ionice - set or get process I/O scheduling class and priority

The I/O scheduling class and priority can be modified drastically, all the way down to the lowest class:

Idle
    A program running with idle I/O priority will only get disk
    time when no other program has asked for disk I/O for a
    defined grace period. The impact of an idle I/O process on
    normal system activity should be zero. This scheduling class
    does not take a priority argument. Presently, this scheduling
    class is permitted for an ordinary user (since kernel
    2.6.25).
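Putting both together, a minimal sketch of a low-impact copy might look like this (the paths are placeholders; -c 3 selects the idle class quoted above):

# run cp with the lowest CPU priority and in the idle I/O class,
# so it only gets disk time when nothing else is asking for it
nice -n 19 ionice -c 3 cp -a /mnt/source/bigdir /mnt/destination/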
harrymc
    This will lessen the impact on the CPU but the system is probably most impacted by the I/O transfers and interrupts. These are not affected by the `nice` command. – doneal24 Oct 01 '21 at 18:16
  • @doneal24: Limiting CPU usage should impact everything, as the I/O is initiated by the CPU. Fewer CPU slices mean less time to issue I/O requests. Finding which `nice` value is enough to have an impact is a matter of experimentation. – harrymc Oct 01 '21 at 18:21
  • If you are in an io wait state, it really doesn't matter if the CPU is available or not. – doneal24 Oct 01 '21 at 18:24
  • @doneal24: How does that relate to my comment? – harrymc Oct 01 '21 at 18:25
  • You are I/O bound. Optimizing for CPU does not address the bottleneck. The CPU issues a read or write request. Until that request is satisfied, the CPU cannot proceed on the thread. Telling that thread that it is lower priority doesn't matter, since it cannot do anything no matter what the priority. – doneal24 Oct 01 '21 at 18:31
  • @doneal24: My last word: You're missing the point - if the distance between CPU slices is longer than one I/O request, then I/O is slowed down. – harrymc Oct 01 '21 at 18:36
  • Typically even a slow CPU is so fast that it can handle the I/O operations of multiple HDDs without using up the CPU resources, therefore CPU scheduling won't have much effect on slowing down purely I/O-bound data transfers. The only way would be if the OS has a dedicated I/O scheduler with an I/O nice command. Alternatively the copy process can periodically sleep to allow other processes to get I/O time, but that would slow down the whole copy process. – Robert Oct 01 '21 at 22:37
  • This article is interesting. https://www.admin-magazine.com/HPC/Articles/Linux-I-O-Schedulers I'm a little surprised but it does appear that thread schedulers do consider I/O when granting execution time. – Frank Thomas Oct 01 '21 at 23:31

A disk drive is interrupt driven and will just work to keep up with the data.

So the answer to your question is that you cannot slow down the copy.

If you truly need more resources while copying, then replace both the source and target drives with fast SSDs. That will solve your issue.

I copy 200 GB of machine images from one computer to another monthly or quarterly. Both machines have fast SSDs.

There is no practical slowdown on either machine during the nearly hour-long copy.

I had the resource issue you mention with slow hard drives (in the XP days), and the solution then was to do the large copy when the machines were not otherwise needed.

John

I don't mind making the copying much slower and less HDD-intensive. Let the machine take its time with it...

A manual solution is to limit the transfer rate with pv. For a single file, a basic command can look like:

pv -L 10M /source/file > /destination/file

Note that this simple command won't preserve attributes (like cp -p or GNU cp -a would) and cannot copy file hierarchies. For anything more demanding, use tar. Use the fact that you can pipe tar to tar like this:

(cd /source && tar -c dirA/ dirB/ file1 file2) | (cd /destination && tar -x)

and it works like cp -R. This is useful for copying to multiple destinations, or via SSH, netcat, or any other pipe. To solve your problem, place pv between the tars:

(cd /source && tar -c dirA/ dirB/ file1 file2) | pv -L 10M | (cd /destination && tar -x)

You can use tar -c * (but keep in mind * does not match dot-files); you can also use tar's more advanced options. Don't use compression: you want to throttle the uncompressed stream, otherwise the limit would apply to the compressed bytes while the disks still read and write the larger uncompressed data faster than the limit.
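For example, to copy everything in /source including dot-files, you can archive the current directory itself instead of relying on *:

# tar -c . includes dot-files, unlike tar -c *
(cd /source && tar -c .) | pv -L 10M | (cd /destination && tar -x)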

Now the best part: with pv -R (invoked separately) you can change the settings of a running pv. This way you can adjust the throttle at any time; you just need to know the PID of your rate-limiting pv. Use pidof pv to find the PID; or modify the main command in the first place, so the middle part is:

… | pv -P /dev/tty -L 10M | …

and pv itself will print its PID to the terminal. See man 1 pv for details.

Suppose the PID is 2276. To change the throttle, run pv -R 2276 -L 100M (or 100K, or 1G). Tweak it until you find the setting that works best for you; you can change it at any time.

This answer can be combined with the nice+ionice idea. I believe renice (not nice) and ionice -p will allow you to reprioritize already running processes.
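For instance, assuming the two tar processes from the commands above are the only tars running, something like this should lower both their CPU and I/O priority after the fact:

# give the running tars the lowest CPU priority...
renice -n 19 -p $(pidof tar)
# ...and put them in the idle I/O scheduling class
ionice -c 3 -p $(pidof tar)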

Kamil Maciorowski