import site.body

Systemd and network startup

While doing some networking work (detailed in my Anycast article and an upcoming follow-up), I hit an issue where machines with a broken network configuration would take up to 5 minutes to boot. Since I was repeatedly rebooting to check that a machine came up reliably, this drastically increased the time each test took. Luckily, the fix is fairly simple.

Systemd has a simple mechanism for extending both user-specified and built-in units: create a directory named <service name>.d in /etc/systemd/system, and any .conf files placed in it can override or append to options in the original unit file.
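The systemd.unit(5) man page spells out how these fragments combine: .conf files are applied after the main unit file in lexicographic filename order, and if the same filename exists in several directories, only the copy in the highest-priority one (/etc over /run over /usr/lib) is used. Below is a simplified Python model of that ordering. It assumes only those three per-unit directories are searched; real systemd also checks type-wide directories such as service.d/ and prefix directories, which this sketch ignores. dropin_order is a hypothetical helper name.

```python
from pathlib import Path

# Simplified model: only per-unit drop-in dirs, listed highest priority first.
DEFAULT_SEARCH_DIRS = (
    "/etc/systemd/system",
    "/run/systemd/system",
    "/usr/lib/systemd/system",
)


def dropin_order(unit, search_dirs=DEFAULT_SEARCH_DIRS):
    """Return the drop-in files for `unit` in the order they would be applied."""
    chosen = {}  # filename -> path from the highest-priority dir containing it
    for base in search_dirs:
        dropin_dir = Path(base) / f"{unit}.d"
        if not dropin_dir.is_dir():
            continue
        for conf in dropin_dir.glob("*.conf"):
            if conf.is_file():
                # setdefault keeps the first (highest-priority) copy of a name.
                chosen.setdefault(conf.name, conf)
    # Different names are applied by filename, regardless of directory.
    return [chosen[name] for name in sorted(chosen)]
```

This is why the numeric prefix (10-, 50-, ...) matters: it controls which fragment wins when two of them set the same option.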

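The service delaying the boot in our case is networking.service, and a quick systemctl cat networking.service shows a TimeoutStartSec value of 5 minutes: if the service has not finished starting within that time, systemd gives up on it and marks it as failed. Based on the above, adding the drop-in below shrinks this wait to something more manageable (run systemctl daemon-reload afterwards so systemd picks up the new file):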

/etc/systemd/system/networking.service.d/10-timeout.conf:
[Service]
TimeoutStartSec=20s
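For automation, the same change can be scripted: write the file only if its contents differ, then have systemd re-read its unit files. A minimal Python sketch, assuming it runs as root; write_dropin and reload_systemd are hypothetical helper names, and root is a parameter only so the sketch can be tried against a scratch directory.

```python
import subprocess
from pathlib import Path

# Contents of the drop-in shown above.
DROPIN = """[Service]
TimeoutStartSec=20s
"""


def write_dropin(unit, name, content, root="/etc/systemd/system"):
    """Write <root>/<unit>.d/<name>; return True if the file was created or changed."""
    path = Path(root) / f"{unit}.d" / name
    if path.is_file() and path.read_text() == content:
        return False  # already in place, nothing to do
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return True


def reload_systemd():
    # systemd only re-reads unit files and drop-ins on daemon-reload (or reboot).
    subprocess.run(["systemctl", "daemon-reload"], check=True)
```

Call write_dropin("networking.service", "10-timeout.conf", DROPIN) and only run reload_systemd() when it returns True, so repeated runs of the automation are no-ops.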

I could stop the post here, but that would do everyone a disservice. There is a bigger question: why was this set to 5 minutes in the first place?

There are a number of good reasons to set this to 5 minutes. Some devices take a while to come up; in particular, I am referring to things like external network switches. In that case it makes sense to wait a couple of minutes in case everything was powered on at once and the network needs to 'settle' or finish booting.

Another potential problem is a large number of machines attempting to obtain a DHCP lease at once. While not a situation you would encounter at home (very few people run that many devices, myself included), an overloaded DHCP server could take a while to become responsive, and in that case a generous timeout makes sense.

In my case, I either control the hardware or the services are highly available (HA) and geo-distributed, so it is unlikely that these machines will all reboot at the same time. A 5-minute delay would actively hinder manual repair, whereas 20 seconds is far more palatable.

The right value will differ for each environment and setup. Please do not copy this advice blindly; instead, think about what it should actually be set to. Why would your machine fail? What services does it depend on? How long are those delays? If you don't feel like following this advice, feel free to post random values to StackOverflow with no context.

This .d mechanism can be very powerful, and I use it in my own programs as an extension mechanism. Especially when dealing with automation, it is a significantly more user-friendly way to make changes, as one can simply drop in config fragments rather than having to check whether a line is already in a file or template the main config file (which can be problematic with custom overrides).
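A minimal Python sketch of the same pattern in an application, assuming an INI-style config read with the standard configparser module. The names load_config, myapp.conf and myapp.conf.d are hypothetical. The main file is read first, then every fragment in lexical order, and a key set in a later file overrides the same key from an earlier one.

```python
import configparser
from pathlib import Path


def load_config(main_path):
    """Read main_path, then every *.conf in <main_path>.d/ in lexical order."""
    main = Path(main_path)
    frag_dir = main.parent / (main.name + ".d")
    fragments = sorted(frag_dir.glob("*.conf")) if frag_dir.is_dir() else []
    parser = configparser.ConfigParser()
    # read() parses the files in order (silently skipping missing ones);
    # later files override keys set by earlier ones.
    parser.read([main, *fragments])
    return parser
```

Automation then only ever creates or deletes files such as /etc/myapp.conf.d/50-site.conf and never has to edit myapp.conf itself.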