I've seen this question asked many times, with no clear answer, so I am asking it myself.

Ever since Ubuntu 18.04.3 (I think it was .3 that started it) through at least 20.04.1, I've been seeing a 2-minute delay on startup, waiting for the network to come up. IPv6 is not configured on our network. The DHCP servers are Windows-based: 2 replicated servers, with IP helpers on the VLANs, although my Ubuntu servers are on the same VLAN as the DHCP servers (all my VMs are on the same VLAN).

The netplan configuration tends to get ignored: the DHCP client insists on using a DUID-based identifier (what I had been calling the GUID), even though the interface YAML in /etc/netplan/ says to use the MAC. Further, the DHCP client doesn't even try to get an address until 10-15 minutes after startup, unless I mark the interface as optional. However, invoking dhclient works perfectly and immediately.

From what I've read, this delay seems to be an IPv6 addressing problem, so I disabled IPv6 in GRUB. In the netplan YAML file I marked the interface as optional and set dhcp6 to false. That solves the 2-minute start-up delay, but the DHCP server returns an unexpected IP, because the dhcp-identifier line from the netplan is being ignored: instead of presenting the MAC address, Ubuntu presents a DUID-based client identifier (the "GUID" mentioned above).

Here is my 01-netcfg.yaml:

```yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    ens160:
      dhcp4: true
      dhcp6: false
      dhcp-identifier: mac
      optional: true
      link-local: [ ]
```

As soon as I generate and apply the netplan, I get disconnected and get the wrong IP address. If I manually invoke `sudo dhclient -r; sudo dhclient`, I get the expected IPv4 address immediately.
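One way to see whether the `dhcp-identifier: mac` setting actually reached systemd-networkd is to inspect the file netplan renders for the interface. The sketch below is a minimal, hedged example: it assumes netplan writes its networkd output to `/run/systemd/network/10-netplan-ens160.network` and that the setting appears there as a `ClientIdentifier=` line. The function name `client_identifier` is made up for illustration.

```shell
#!/bin/sh
# Report which DHCP client identifier a rendered systemd-networkd file requests.
# client_identifier is a hypothetical helper name, not part of netplan or systemd.
client_identifier() {
    file="$1"
    if [ ! -r "$file" ]; then
        echo "unreadable"
        return 1
    fi
    # Keep the last ClientIdentifier= assignment found in the file, if any.
    value=$(sed -n 's/^ClientIdentifier=//p' "$file" | tail -n 1)
    # systemd.network documents "duid" as the default when the key is absent.
    echo "${value:-duid}"
}

# Example (path is an assumption about where netplan renders its output):
#   client_identifier /run/systemd/network/10-netplan-ens160.network
```

If this prints `duid` even though the YAML says `mac`, the setting was lost before networkd ever read it; if it prints `mac`, the problem lies further down the line.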

Ergo, the only way I've been able to make the server get the correct address without delay is to configure the interface as shown above, then run a script on startup that invokes dhclient manually.
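A minimal sketch of what such a startup script could look like is shown here. It only wraps the two `dhclient` invocations mentioned above; the names `IFACE`, `DRY_RUN`, `run`, and `renew_lease` are illustrative choices, and how the script gets launched at boot is left open. Running it for real requires root.

```shell
#!/bin/sh
# Release and re-request a DHCPv4 lease with dhclient.
# IFACE and DRY_RUN are illustrative variables, not part of any tool.
IFACE="${IFACE:-ens160}"

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

renew_lease() {
    run dhclient -r "$1"   # release any existing lease on the interface
    run dhclient "$1"      # request a fresh lease
}

# Example: preview the commands without touching the network.
#   DRY_RUN=1; renew_lease "$IFACE"
```

The dry-run switch is there so the script can be checked safely on a machine where losing the lease would cut off the SSH session.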

This seems like a rather kludgy way to do things, because it works around the problem(s) rather than fixing them.

Can someone please tell me what is happening, and more importantly, how to fix it?

Thanks in advance!

EDIT with new information: If I let the VM just sit, it finally picks up an IP about 8.5 minutes after the login prompt. I'm guessing that is 10 minutes after the interface comes up.

Also, I read in another thread about a similar issue, that having multiple DHCP servers on the segment caused the problem for another user. We do have 2 Windows 2019 DHCP servers on this segment, doing load balancing (scope replication).

EDIT: `sudo netplan --debug generate` outputs the following:

```
** (generate:3137): DEBUG: 14:26:01.499: Processing input file /etc/netplan/00-installer-config.yaml
..
** (generate:3137): DEBUG: 14:26:01.501: starting new processing pass
** (generate:3137): DEBUG: 14:26:01.501: We have some netdefs, pass them through a final round of validation
** (generate:3137): DEBUG: 14:26:01.503: ens160: setting default backend to 1
** (generate:3137): DEBUG: 14:26:01.504: Configuration is valid
** (generate:3137): DEBUG: 14:26:01.505: Generating output files..
** (generate:3137): DEBUG: 14:26:01.505: openvswitch: definition ens160 is not for us(backend 1)
** (generate:3137): DEBUG: 14:26:01.506: NetworkManager: definition ens160 is not for us(backend 1)
(generate:3137): GLib-DEBUG: 14:26:01.507: posix_spawn avoided (fd close requested)
(generate:3137): GLib-DEBUG: 14:26:01.511: posix_spawn avoided (fd close requested)
```

Here is the output of `sudo journalctl | grep ens160`:

```
Nov 08 08:45:30 graylog systemd-networkd[846]: ens160: DHCP lease lost
Nov 08 08:45:44 graylog kernel: vmxnet3 0000:03:00.0 ens160: renamed from eth0
Nov 08 08:45:52 graylog kernel: vmxnet3 0000:03:00.0 ens160: intr type 3, mode 0, 5 vectors allocated
Nov 08 08:45:52 graylog kernel: vmxnet3 0000:03:00.0 ens160: NIC Link is Up 10000 Mbps
Nov 08 08:45:52 graylog systemd-networkd[843]: ens160: Link UP
Nov 08 08:45:52 graylog systemd-networkd[843]: ens160: Gained carrier
Nov 08 08:45:53 graylog cloud-init[851]: ci-info: | ens160 | True |     .     |     .     |   .   | 00:50:56:96:c6:b7 |
Nov 08 08:45:53 graylog cloud-init[851]: ci-info: |   2   |  multicast  |    ::   |   ens160  |   U   |
Nov 08 08:54:29 graylog systemd-networkd[843]: ens160: DHCPv4 address 10.83.1.5/24 via 10.83.1.1
```
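To put a number on the delay in this journal excerpt, the gap between "Gained carrier" and the "DHCPv4 address" line can be computed from the two timestamps. This is a small sketch that assumes GNU `date` (as shipped with Ubuntu); `elapsed_seconds` is a made-up helper name.

```shell
#!/bin/sh
# Seconds between two HH:MM:SS timestamps on the same day (GNU date assumed).
# elapsed_seconds is a hypothetical helper, named here for illustration only.
elapsed_seconds() {
    start=$(date -u -d "1970-01-01 $1" +%s)
    end=$(date -u -d "1970-01-01 $2" +%s)
    echo $((end - start))
}

# Timestamps copied from the journal lines in the question.
carrier="08:45:52"   # ens160: Gained carrier
lease="08:54:29"     # ens160: DHCPv4 address 10.83.1.5/24 via 10.83.1.1

secs=$(elapsed_seconds "$carrier" "$lease")
echo "$secs seconds ($((secs / 60))m $((secs % 60))s)"
# prints: 517 seconds (8m 37s)
```

So in this boot the link had carrier for roughly 8.6 minutes before networkd logged a DHCPv4 address.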
Jay Duff
  • Is this a Ubuntu Desktop or Server installation? – heynnema Dec 09 '20 at 21:56
  • @hetnnema - It's Ubuntu Server. 11 months later, and it still happens on my new VM installs. I think there is a bug in the installer. [Here is the report](https://bugs.launchpad.net/ubuntu/+source/netplan.io/+bug/1881832) – Jay Duff Oct 19 '21 at 20:42
  • Remove dhcp6 and dhcp-identifier and link-local from your .yaml file, then do `sudo netplan generate`, `sudo netplan apply`, and `reboot`. Report back (sooner than 11 months). Start comments to me with @heynnema (note the spelling) or I'll miss them. – heynnema Oct 19 '21 at 22:19
  • @heynnema holy smokes - that got it! Thanks! – Jay Duff Nov 05 '21 at 14:34
  • @heynnema - Got excited too soon, unfortunately. After another reboot, it's back to not getting a DHCP address. I saw someone comment that they had a similar problem because they have 2 DHCP servers on the switch. I also have 2 DHCP servers - they are Windows 2019 servers doing scope replication. I wonder if there is a bug that prevents netplan from getting an address from split DHCP servers... – Jay Duff Nov 05 '21 at 14:40
  • I suspect that one of your DHCP servers is broken, you have them configured incorrectly, or you have them cabled into your network/switch incorrectly. Please describe your network cabling, including the server, switch, DHCP servers, etc. – heynnema Nov 05 '21 at 14:53
  • @heynnema - I have literally thousands of Macs and Windows computers (virtual and physical), as well as iPads and Chromebooks using these servers. I also have a CentOS (vCenter virtual appliance) and a Debian VM (homer/sipcapture) which get their addresses immediately. I even have 3 physical servers, running Ubuntu 20.04.3 on bare metal, that work. I guarantee the DHCPs are both good to go. The only devices that have issues are Ubuntu virtual machines. Also, this worked prior to 18.04.3. Thanks for trying though - I appreciate the effort! – Jay Duff Nov 05 '21 at 15:15
  • OK. Are your Ubuntu virtual machines via VirtualBox, or something else? – heynnema Nov 05 '21 at 15:53
  • @heynnema - They are vSphere VMs (using vCenter). – Jay Duff Nov 05 '21 at 20:01
  • Sorry, but I don't know anything about vSphere VMs. – heynnema Nov 05 '21 at 20:12
  • @heynnema the Ethernet interfaces do have odd nomenclature. Specifically, what one would expect to be eth0 actually is ens160, for example. I saw a bug in subiquity about that causing a problem, but I think it is resolved. Also, I believe the workaround was to simply change the interface name in the netplan. I did that already. Anyway - thanks again for trying to help out! *Respect* – Jay Duff Nov 08 '21 at 14:18
  • Thanks for the update. The "new" device naming is **supposed** to make it easier, as it's based on slot # and port #, but it's really just a PITA. So just renaming the interface fixed it for you? – heynnema Nov 08 '21 at 14:23
  • @heynnema Unfortunately, no. That was the fix for the old bug, but it didn't help me. I'm looking through some of the other answers you gave on other questions with similar issues. – Jay Duff Nov 08 '21 at 14:25
  • I'd look more into fixes/bugs with the vSphere VM. Your .yaml looks fine. – heynnema Nov 08 '21 at 14:29
  • @heynnema - I was thinking that too, but everything worked fine under 18.04.2 and before. Also, I have about 50 VMs, most running some flavor of Windows, but a few are running *nix (centos, and debian), and they are all able to get their DHCP data immediately. – Jay Duff Nov 08 '21 at 14:35
  • Have you checked for BIOS updates on the machines that don't work? `sudo dmidecode -s bios-version`. Then go to the computer/motherboard web site and check for a newer BIOS. You can also boot to a Ubuntu Live 21.10 DVD/USB and see if it works there. – heynnema Nov 08 '21 at 14:39
  • @heynnema I'm not sure if you mean the actual BIOS (UEFI) on the hosts (I keep those updated religiously) or the virtual BIOS, but the output of your command was `6.00`. – Jay Duff Nov 08 '21 at 14:54
  • I was talking actual hardware BIOS. With that command output, you can go to the manufacturer's web site to check for a newer BIOS. If you give me the make/model of the computer or motherboard, I'll check it for you. Did you try the second suggestion in my previous comment? That would eliminate a Ubuntu software problem. – heynnema Nov 08 '21 at 15:06
  • @heynnema - Great idea about the LiveCDs. The 20.04.2 LiveCD does NOT get a network, while the 21.10 one DOES. However, the 21.10 uses the duid instead of the MAC, so it's not using the MAC-based reservation. I'll set up a full VM on 21.10 to see how it works, once I switch the DHCP identifier over to MAC address. The servers are abstracted, so I don't think the physical hosts' mlbs will have any bearing on the guest OS. The network driver is VMXNET3, but I tried e1000 and e1000e as well. All had the same results. I'll post again tomorrow, after installing 21.10 on a new VM. – Jay Duff Nov 08 '21 at 22:06
  • Thanks for the update! Let me know if/when I need to write an answer for you. – heynnema Nov 08 '21 at 22:52
  • OK 21.10 seems to fix everything. I tried changing VMWare drivers, adding my stuff to the netplan to disable ipv6, using the mac as the dhcp-identifier and it all worked flawlessly! `ip a` shows more information under 21.10, that shows the network device being aliased (`altname enps0`). Please put in an answer (upgrade to 21.10) and I'll mark it as correct. Thanks again @heynnema! – Jay Duff Nov 10 '21 at 21:12
  • Great news! You're on your way. I put together a quick answer for this. Please click the checkmark icon that appears just to the left of my answer. Thanks! – heynnema Nov 10 '21 at 21:32

2 Answers

Jay, I had this problem too; I think it started when I switched to 18. One machine even took close to 4 minutes to boot, and at first I thought it wasn't booting at all. As you can see, nobody has come up with a solution that works, so I moved to Xubuntu, which I stumbled on by chance. It is supposed to be a minimal version, but in 2 years I haven't discovered what it is missing, and it boots in 30-45 seconds. It uses the same Ubuntu kernel. I am not an expert user, but I have one system I use as a media player (smart TV) and another I use just for downloads, and Xubuntu supports all my hardware and software for those needs. The only problem I had was graphics support, which is also a problem with Ubuntu that was never fixed. Xubuntu 20.04.1 does fix that and plays video well. Hope that helps. Download the 20.04.1 ISO and do a fresh install if you can; that works well.

KenF

From the question, and the comments...

re: "Ever since Ubuntu Server 18.04.3 through at least 20.04.1, my network has trouble getting an appropriate DHCP IP address. It also takes two minutes to start the network. I'm using netplan."

I suggested booting from an Ubuntu Live 21.10 DVD/USB and retesting.

re: "Ubuntu 21.10 seems to fix everything. I tried changing VMWare drivers, adding my stuff to the netplan to disable ipv6, using the mac as the dhcp-identifier and it all worked flawlessly! ip a shows more information under 21.10, that shows the network device being aliased (altname enps0)."

Adding "optional: true" to the .yaml file solved the long boot times.

I believe that either something was wrong in the original network settings all along, or the newer version of netplan in Ubuntu 21.10 solved the problems.

heynnema
  • I think this particular VM may be having a different issue. I did an in-place upgrade to 21.04 (hirsute), and it was still broken. In place upgraded to 21.10 (impish) and it still isn't getting an address. I think my next step is to do a fresh install on 21.10 and install graylog (the purpose of this particular VM) once more, and see if it's better. But, since it's still broken, I still don't have an answer. – Jay Duff Nov 10 '21 at 22:06
  • @JayDuff I suspect that it'll work with a clean install. Our tests with the Live were just too convincing. Keep me posted. – heynnema Nov 11 '21 at 03:13
  • Migrating an ElasticSearch DB looks *UGLY*. This one is gonna take a while... – Jay Duff Nov 12 '21 at 20:54
  • @JayDuff Can't you try a clean install without having to do an ugly DB migration? I mean, just to test for a proper IP? – heynnema Nov 12 '21 at 20:56
  • Oh - I did that. Worked great! I was referring to the new VM when I said 21.10 fixed everything. – Jay Duff Nov 12 '21 at 20:59
  • @JayDuff Your last comment requires that I ask for some clarification. The *"21.10 fixed everything"*... was that when booted to the Live USB... or after installing it into a new VM container... with/without the DB? Too many variables to fully get the picture. – heynnema Nov 12 '21 at 21:03
  • Of course - no problem. First, I did a boot off the 21.10 livecd, and everything worked great. Then I *installed* 21.10 on a fresh VM, and everything continued to work great, despite my best efforts to break it (changing drivers, etc). When I tried to do an in-place upgrade of the VM that was originally having trouble, first to 21.04, then to 21.10, the problem with DHCP taking 10 minutes persisted. – Jay Duff Nov 12 '21 at 21:36
  • @JayDuff So your original installation fails, and your original installation plus upgrades still fail, but two clean installations worked fine, correct? You know what you have to do... – heynnema Nov 12 '21 at 21:56
  • Yep. I finished the fresh install Friday, and our Network Technician is putting the tweaks into place today some time. Thanks for the help. I think 21.10 solved the problem! – Jay Duff Nov 15 '21 at 17:19
  • @JayDuff Great news! Thanks for the update. – heynnema Nov 15 '21 at 17:23
  • Unfortunately, Ubuntu 22.04 LTS seems to have this DHCP issue with VMs again. The weird part is; it seems to have worked fine until I installed an application on the VM. I wish I could see what changed, but the netplan was stock and it worked with DHCP. It wasn't until I installed a dpkg that it broke. I understand the dpkg could've changed something, but the stock netplan.yaml is unchanged. Where else should I look? – Jay Duff Nov 09 '22 at 21:38
  • If I run networkctl, I see the interface as carrier/configuring. Once I run dhclient, it goes to routable/configured. – Jay Duff Nov 09 '22 at 21:42
  • @JayDuff Please start a new question. Put all of the relevant details in the question. – heynnema Nov 10 '22 at 01:59
  • Done. I'll tag you [there](https://askubuntu.com/questions/1441304/ubuntu-server-22-04-lts-no-dhcp-on-startup). – Jay Duff Dec 07 '22 at 18:01
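
The `networkctl` states mentioned in the comments above (carrier/configuring before running dhclient, routable/configured afterwards) can be pulled out of `networkctl list` for a single link. The sketch below assumes the usual column order of that command (index, link, type, operational state, setup state); `link_state` is a hypothetical helper name.

```shell
#!/bin/sh
# Print "OPERATIONAL/SETUP" for one link, reading `networkctl list` rows on stdin.
# link_state is a hypothetical helper; column positions are an assumption
# about the layout of `networkctl list` (IDX LINK TYPE OPERATIONAL SETUP).
link_state() {
    awk -v link="$1" '$2 == link { print $4 "/" $5 }'
}

# Real use, on a host where systemd-networkd manages the link:
#   networkctl --no-legend list | link_state ens160
```

A result that stays at `carrier/configuring` long after boot matches the symptom described in this thread: the link is up, but networkd has not finished configuring it.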