
I'm having problems getting SLURM (for job scheduling) to work with a MySQL database. I was using this as a reference, but perhaps I misunderstood something in it. If someone can let me know what I've missed, that would be great...

This is SLURM 21.08 on Ubuntu 22.10. I'm using MySQL 8.0.32.

I previously had SLURM configured with job completion and accounting stored in a file, and that seemed to work fine: the controller was up and I ran one or two jobs without problems.

Then, I switched to MySQL. My /etc/slurm/slurm.conf had these values updated:

 Job Completion Logging | MySQL
      JobCompLoc | slurm_complete_db
      JobCompHost | localhost
      JobCompPort | <blank>
      JobCompUser | slurm
      JobCompPass | ...some password...
 Job Accounting Storage | SlurmDBD
      AccountingStorageLoc | slurm_acct_db
      AccountingStorageHost | localhost
      AccountingStoragePort | <blank>
      AccountingStorageUser | slurm
      AccountingStoragePass | ...
      AccountingStoreFlags | job_script,job_env

And in /etc/slurm/slurmdbd.conf:

 AuthType=auth/munge
 DbdHost=xps8930
 DebugLevel=info
 StorageHost=xps8930
 StorageLoc=slurm_acct_db
 StoragePass=...
 StorageType=accounting_storage/mysql
 StorageUser=slurm
 LogFile=/var/log/slurm/slurmdbd.log
 PidFile=/run/slurmdbd.pid
 SlurmUser=slurm

I've created two MySQL databases and a user called "slurm", and granted privileges as follows:

CREATE DATABASE slurm_complete_db DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
CREATE DATABASE slurm_acct_db DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;
CREATE USER 'slurm'@'%' IDENTIFIED WITH caching_sha2_password BY '' ;
GRANT ALL ON slurm_complete_db.* TO 'slurm'@'%';
GRANT ALL ON slurm_acct_db.* TO 'slurm'@'%';

I confirmed using the "show engines" command that InnoDB support is enabled.

Since the databases are empty, I believe my next step ought to be configuring the database. In slurm.conf, I called my ClusterName "personal". So, I did this:

$ sacctmgr add cluster personal
sacctmgr: error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused
sacctmgr: error: Sending PersistInit msg: Connection refused

slurmd and slurmdbd are running (SLURM and MySQL are on the same computer):

$ ps -aef | grep slurm
root        1407       1  0 09:42 ?        00:00:08 /usr/sbin/slurmd -D -s
root        1857       1  0 09:43 ?        00:00:03 /usr/sbin/slurmdbd -D -s

In /var/log/slurm/slurmdbd.log, I see this:

[2023-01-26T18:06:02.541] error: mysql_real_connect failed: 2003 Can't connect to MySQL server on 'xps8930:3306' (111)
[2023-01-26T18:06:02.541] error: The database must be up when starting the MYSQL plugin.  Trying again in 5 seconds.
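
The `(111)` at the end of that message is the Linux errno for "connection refused", so nothing accepted the TCP connection at the address `xps8930` resolved to. One way to narrow this down might be to try port 3306 under each name the machine goes by. This is only a minimal sketch, assuming bash and the coreutils `timeout` command are available; `can_connect` is a made-up helper name, not part of SLURM or MySQL:

```shell
#!/usr/bin/env bash
# can_connect is a hypothetical helper: it succeeds if a TCP connection to
# host:port can be opened within 3 seconds, using bash's /dev/tcp redirection.
can_connect() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# xps8930 is the StorageHost from slurmdbd.conf; 3306 is the port shown in the log.
for host in xps8930 127.0.0.1 localhost; do
    if can_connect "$host" 3306; then
        echo "$host:3306 accepts connections"
    else
        echo "$host:3306 refused or unreachable"
    fi
done
```

If `127.0.0.1` is accepted while `xps8930` is not, the server is up but not listening on the address that the hostname resolves to (`getent hosts xps8930` shows which address that is).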

In /var/log/slurm/slurmctld.log, I have this:

[2023-01-26T09:42:33.264] error: Configured MailProg is invalid
[2023-01-26T09:42:33.350] slurmctld version 21.08.5 started on cluster personal
[2023-01-26T09:42:36.121] error: slurm_persist_conn_open_without_init: failed to open persistent connection to host:localhost:6819: Connection refused
[2023-01-26T09:42:36.121] error: Sending PersistInit msg: Connection refused
[2023-01-26T09:42:36.153] accounting_storage/slurmdbd:  clusteracct_storage_p_register_ctld: Registering slurmctld at port 6817 with slurmdbd
[2023-01-26T09:42:36.153] error: Sending PersistInit msg: Connection refused
[2023-01-26T09:42:36.154] error: Sending PersistInit msg: Connection refused
[2023-01-26T09:42:37.456] No memory enforcing mechanism configured.
[2023-01-26T09:42:39.924] error: mysql_real_connect failed: 2002 Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
[2023-01-26T09:42:39.924] fatal: You haven't inited this storage yet.

I'm not sure what I should do next or what steps I'm missing. I guess between slurmdbd and slurmctld, I should focus on slurmdbd first? Once it is working, then either slurmctld should come up and/or I can try to get it working.

Sorry for the long post! Any advice would be appreciated!

PS: The command munge -n | unmunge was successful.

Ray
  • Apparently, MySQL was up when you created the databases, but the `/var/log/slurm/slurmdbd.log` says it's down. What does `systemctl status mysql` say? – Jos Jan 31 '23 at 17:49
  • @Jos Thanks for the suggestion! So, `systemctl status mysql` says it is active, and I can confirm that I can use `mysql -u root -p` to log in on the command line. Seems like the problem is with `slurmdbd`, like I missed something... – Ray Feb 03 '23 at 15:48
  • If Slurm keeps saying MySQL is down, it is either trying to reach the wrong server (xps8930) or the wrong port (3306). – Jos Feb 04 '23 at 10:02
  • One thing worth checking is which interface mysqld binds to - I think by default it is 127.0.0.1, and I see you are trying to connect to xps8930, which I guess is the address of the NIC? If so, then that's why it can't find your MySQL – j4nd3r53n Jul 06 '23 at 12:37
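
The bind-address check suggested in the last comment could be sketched roughly as follows. This assumes the `ss` tool from iproute2 and its usual `ss -ltn` column layout (local address and port in the fourth column); `listen_addrs` is a hypothetical helper name introduced only for this example:

```shell
#!/usr/bin/env bash
# listen_addrs is a hypothetical helper: it reads `ss -ltn` style output on
# stdin and prints the local addresses that are listening on the given port.
listen_addrs() {
    awk -v port="$1" '
        NR > 1 {
            n = split($4, parts, ":")   # the piece after the last ":" is the port
            if (parts[n] == port)
                print substr($4, 1, length($4) - length(parts[n]) - 1)
        }'
}

# Show where something is listening on MySQL's port, if ss is installed.
if command -v ss >/dev/null 2>&1; then
    ss -ltn | listen_addrs 3306
fi
```

If this prints only `127.0.0.1`, connections addressed to whatever `xps8930` resolves to would be refused unless that is also 127.0.0.1. In that case, either pointing `StorageHost` at `localhost` or changing MySQL's `bind-address` setting (on Ubuntu it is usually set in `/etc/mysql/mysql.conf.d/mysqld.cnf`, though that path is worth verifying) would be the things to try.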

0 Answers