6

I'm having some trouble with a specific node. Until I resolve it, I don't want any jobs to run on it. How can I temporarily take this node out of the nodes "pool"?

Gilles 'SO- stop being evil'
David B

4 Answers

6

To disable:

qmod -d *@node_name

To re-enable:

qmod -e *@node_name
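
A minimal sketch of wrapping the two commands above, assuming a POSIX shell. The function name `sge_host_state` and the `DRY_RUN` variable are hypothetical, not part of SGE. The pattern is quoted so the shell hands `*@node_name` to qmod unexpanded; an unquoted `*` can be expanded against files in the current directory, or rejected outright by shells such as zsh when nothing matches.

```shell
#!/bin/sh
# Hypothetical helper (not an SGE tool): disable or enable every queue
# instance on one host. With DRY_RUN=1 (the default here) it only prints
# the command, so it can be checked before touching the cluster.
sge_host_state() {
    action=$1   # "disable" or "enable"
    host=$2     # execution host name, e.g. node_name
    case $action in
        disable) flag=-d ;;
        enable)  flag=-e ;;
        *) echo "usage: sge_host_state disable|enable HOST" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Show the command with the wildcard quoted.
        printf "qmod %s '*@%s'\n" "$flag" "$host"
    else
        # Quoting keeps the shell from globbing the pattern.
        qmod "$flag" "*@$host"
    fi
}
```

Running `sge_host_state disable node_name` prints the command; setting `DRY_RUN=0` runs it.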
Kevin Panko
user322498
  • Why is this downvoted? – Albert Jan 26 '15 at 11:12
  • As advice, I had an issue getting wildcard queue names to work. I ran `qstat -f`, found the queue on the host I wanted to disable, and used that as the argument after **-d** in `qmod -d` – Devin Aug 17 '15 at 16:05
2

If you're running 6.1 or better, here's the best way. Create a new hostgroup called @disabled

qconf -ahgrp @disabled

Create a new resource quota set with

qconf -arqs

and, in the editor it opens, give the set this rule:

limit hosts @disabled to slots=0

Now, to disable a host, just add it to the host group

qconf -aattr hostgroup hostlist MYHOST @disabled

To reenable the host, remove it from the host group

qconf -dattr hostgroup hostlist MYHOST @disabled

This process will stop new jobs from being scheduled to the machine and allow the currently running jobs to complete.
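
The two hostgroup commands above can be sketched as one toggle, assuming a POSIX shell and that the @disabled host group already exists. The function name `sge_hostgroup_toggle` and the `DRY_RUN` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not an SGE tool): add a host to, or remove it from,
# a host group. With DRY_RUN=1 (the default here) it only prints the command.
sge_hostgroup_toggle() {
    action=$1                 # "disable" adds the host, "enable" removes it
    host=$2                   # execution host name
    group=${3:-@disabled}     # host group created beforehand with qconf -ahgrp
    case $action in
        disable) opt=-aattr ;;
        enable)  opt=-dattr ;;
        *) echo "usage: sge_hostgroup_toggle disable|enable HOST [GROUP]" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "qconf $opt hostgroup hostlist $host $group"
    else
        qconf "$opt" hostgroup hostlist "$host" "$group"
    fi
}
```

For example, `sge_hostgroup_toggle disable MYHOST` prints the `-aattr` command shown above.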

Kevin Panko
  • This does not seem to work. Jobs still get executed on the problematic host. What can go wrong here? I can see it was added to @disabled (using qconf -mhgrp @disabled) and I have enabled the quota set. – David B Dec 04 '10 at 12:16
  • By the way, the resource quota set looks like this: `{ name disabled_hosts description created by me enabled TRUE limit hosts @disabled to slots=0 }` – David B Dec 04 '10 at 12:19
  • By the way, this did work: `{ name disabled_hosts description created by me enabled TRUE limit hosts {my_bad_host} to slots=0 }`, so I guess it has something to do with @disabled. – David B Dec 04 '10 at 12:30
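
The per-host form reported as working in the comment above can be sketched as a small generator, assuming a POSIX shell. The function name `rqs_for_host` is hypothetical, and the description is set to NONE here as a placeholder rather than the commenter's text.

```shell
#!/bin/sh
# Hypothetical helper (not an SGE tool): print a resource quota set that
# gives a single named host zero slots, in the per-host form from the comment.
rqs_for_host() {
    host=$1   # execution host to block, e.g. my_bad_host
    cat <<EOF
{
   name         disabled_hosts
   description  NONE
   enabled      TRUE
   limit        hosts {$host} to slots=0
}
EOF
}
```

One way to use it, assuming your version accepts a file via `qconf -Arqs`: run `rqs_for_host my_bad_host > disabled_hosts.rqs`, review the file, then load it with `qconf -Arqs disabled_hosts.rqs`.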
0

gridsuspend - Suspends one or more hosts from executing grid jobs. Example: gridsuspend -s -r "reason comment here" <host_name> 1d

0

Without knowing your SGE version I cannot say for certain that this will achieve the desired outcome. However, qconf -de foo will delete the execution host foo, and qconf -ae foo will then add the host foo back to the execution host list.

Tok