
I need network messages to be sent when a systemd service of mine crashes or hangs (i.e., enters the failed state; I detect hangs using WatchdogSec=). I noticed that newer systemd versions have FailureAction=, but it doesn't allow arbitrary commands, only rebooting or shutting down.

Specifically, I need a way to have one network message sent when systemd detects the program has crashed, and another when it detects it has hung.

I'm hoping for a better answer than "parse the logs", and I need a near-instant response, so I don't think polling is a good fit; it should be triggered by the event itself.

3 Answers


systemd units support OnFailure=, which activates one or more units when the unit enters the failed state. You can put something like this in the [Unit] section:

 OnFailure=notify-failed@%n

Then create a notify-failed@.service template unit, in which you can use whichever specifiers you need (you will probably want at least %i) to launch the script or command that sends the notification.
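A minimal sketch of such a script, assuming the template calls it as `ExecStart=/usr/local/bin/notify-failed %i` (the path, the script name and the NOTIFY_URL endpoint are hypothetical; NOTIFY_URL could be set with Environment= in the template). To tell a crash from a hang, it reads the failed unit's Result property with `systemctl show`: an expired WatchdogSec= should show up as `watchdog`, while a crash typically shows up as `core-dump`, `signal` or `exit-code`. The message is sent with curl here; substitute whatever network notification you actually use.

```shell
#!/bin/sh
# Hypothetical helper: /usr/local/bin/notify-failed <unit-name>
# Started by notify-failed@.service via: ExecStart=/usr/local/bin/notify-failed %i
# Assumption: NOTIFY_URL is an HTTP endpoint you control that accepts form posts.

# Map systemd's Result= value for the failed unit to the kind of event.
classify_result() {
    case "$1" in
        watchdog) echo "hung" ;;                      # WatchdogSec= expired
        core-dump|signal|exit-code) echo "crashed" ;; # died or exited non-zero
        *) echo "failed ($1)" ;;                      # e.g. timeout, start-limit-hit
    esac
}

send_notification() {
    unit=$1
    # Result= should still hold the outcome of the run that just failed.
    result=$(systemctl show -p Result --value "$unit")
    event=$(classify_result "$result")
    # One short HTTP POST; --max-time keeps a dead endpoint from stalling the unit.
    curl -fsS --max-time 5 \
        --data-urlencode "unit=$unit" \
        --data-urlencode "event=$event" \
        --data-urlencode "host=$(uname -n)" \
        "${NOTIFY_URL:?NOTIFY_URL is not set}"
}

# Only act when given a unit name, so the functions can also be sourced for testing.
if [ "$#" -gt 0 ]; then
    send_notification "$1"
fi
```

Because the classification happens in the handler, a single OnFailure= line covers both cases, and the receiving end gets "hung" or "crashed" as separate events.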

You can see a practical example in http://northernlightlabs.se/systemd.status.mail.on.unit.failure

  • There are a couple of corrections needed to the instructions on the linked site. First, notify%n.service is redundant, and will result in notify@my-service.service.service. Second, %i should be used instead of %I, or all dashes in the name will be converted to forward slashes. – orodbhen Jun 22 '16 at 15:42
  • Is there a way to do this for multiple or all units, without modifying their unit files? – Vladimir Panteleev Sep 10 '17 at 12:52
  • @VladimirPanteleev - you don't need to modify the actual unit files - you can just add an override for that specific feature. For example, run systemctl edit my-service.service and in the editor that opens add a line [Unit] followed by OnFailure=notify-failed@%n, save and exit. This will create an override file in /etc/systemd/system/my-service.service.d/override.conf with the added functionality (of course you can automate the creation of such files for multiple services, just don't forget to do systemctl daemon-reload if you modified files not through systemctl). – Guss Feb 06 '22 at 11:41
  • For anybody looking to do this for all service files at once, check Example 3 at the very end of systemd.unit. You need to place a configuration under the service.d directory and it will apply to all services. – Felipe May 19 '22 at 17:59
  • @Felipe - I tried that on an Ubuntu 18.04 system but can't get it to work as advertised. The OnFailure= failure-handler@%n.service does work when attached to the individual service's [Unit] section but not when /etc/systemd/system/service.d/10-all.conf is the only place it is defined. – cueedee Feb 07 '23 at 08:03
  • @Felipe - ...adding to my own comment, it seems that top-level drop-ins need systemd version 244 or newer and Ubuntu 18 only has version 237. – cueedee Feb 09 '23 at 21:09
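Both approaches from these comments can be scripted. A sketch, assuming the handler template is named notify-failed@.service as in the answer: `add_override` writes a per-unit drop-in similar to the override.conf that `systemctl edit` creates (this also works on older systemd such as 237), and `install_toplevel` writes a top-level `service.d` drop-in (systemd 244 or newer, per the comments above). Following Example 3 in the systemd.unit man page, it also masks that drop-in for the handler's own instances by symlinking the same file name to /dev/null, as a safeguard against a failing notification triggering itself. The function names and the `10-onfailure.conf` file name are arbitrary choices.

```shell
#!/bin/sh
# Sketch: attach OnFailure=notify-failed@... to services without editing their unit files.
# The target directory is passed explicitly (normally /etc/systemd/system); run as root.
set -eu

# Per-unit drop-in; UNIT must be the full name, e.g. my-service.service.
add_override() {
    dir=$1 unit=$2
    mkdir -p "$dir/$unit.d"
    # %%n in the printf format writes a literal %n specifier for systemd.
    printf '[Unit]\nOnFailure=notify-failed@%%n\n' > "$dir/$unit.d/10-onfailure.conf"
}

# Top-level drop-in applying to every *.service unit (needs systemd >= 244).
install_toplevel() {
    dir=$1
    mkdir -p "$dir/service.d" "$dir/notify-failed@.service.d"
    printf '[Unit]\nOnFailure=notify-failed@%%N.service\n' > "$dir/service.d/10-onfailure.conf"
    # Mask the same drop-in for the handler's own instances (Example 3 in
    # systemd.unit), so a failed notification does not trigger another one.
    ln -sf /dev/null "$dir/notify-failed@.service.d/10-onfailure.conf"
}

# Usage: script.sh DIR [UNIT...]. With units, add per-unit overrides; with only DIR,
# install the top-level drop-in. Nothing happens when no arguments are given.
if [ "$#" -ge 2 ]; then
    target=$1; shift
    for u in "$@"; do add_override "$target" "$u"; done
elif [ "$#" -eq 1 ]; then
    install_toplevel "$1"
fi
```

For example, `sh onfailure-dropins.sh /etc/systemd/system my-service.service other.service` followed by `systemctl daemon-reload`; `systemctl show -p OnFailure my-service.service` should then list the handler instance.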

Just my way to notify:

/etc/systemd/system/notify-email@.service

[Unit]
Description=Send email

[Service]
Type=oneshot
ExecStart=/usr/bin/bash -c '/usr/bin/systemctl status %i | /usr/bin/mailx -Ssendwait -s "[SYSTEMD_%i] Fail" your_admin@company.blablabla'

[Install]
WantedBy=multi-user.target

Add it to systemd:

systemctl enable /etc/systemd/system/notify-email@.service

In the other services, add:

[Unit]
OnFailure=notify-email@%n

Reload the configuration:

systemctl daemon-reload
tjmcewan
ceinmart
  • Is there a way to avoid triggering it lots of times in a row? In some situations receiving 1K emails about a service that failed at night and tried over and over again to restart itself isn't helpful. – starbeamrainbowlabs Sep 20 '19 at 19:27
  • As far as I know, no, there is no option for this in systemd. You should put some control into the bash command, something like touching a file and checking whether it is more than 10 minutes old... in simple command logic: find -mmin +10 && send email && touch file ; – ceinmart Apr 07 '20 at 14:30
  • Why are you enabling the notification service? It's supposed to be started by other units, no reason to start it on boot. – drrlvn Mar 18 '22 at 08:30
  • /bin/bash instead of /usr/bin/bash – JulianW Oct 18 '22 at 12:32
  • I'm a newbie here, but what I read at https://www.freedesktop.org/software/systemd/man/systemd.unit.html (Example 3. Top level drop-ins with template units) and https://unix.stackexchange.com/a/506374/16256 makes me wonder if the WantedBy=multi-user.target line is unnecessary or unwanted. Would it cause this to send a notification at each boot? – nealmcb May 30 '23 at 04:15
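ceinmart's time-stamp idea can be wrapped in a small function. A sketch, assuming find supports -mmin (GNU and BSD find do) and that /run is writable by the handler; the function name, the stamp path and the 10-minute window are hypothetical choices. It succeeds (send) when the stamp is missing or older than the window, and refreshes the stamp whenever it says yes.

```shell
#!/bin/sh
# Hypothetical rate limiter for a notify-email@.service style handler.

# should_notify STAMP MINUTES
# Succeeds, and refreshes STAMP, if STAMP does not exist or was last modified
# more than MINUTES minutes ago; fails otherwise.
should_notify() {
    stamp=$1 minutes=$2
    if [ ! -e "$stamp" ] || [ -n "$(find "$stamp" -mmin +"$minutes")" ]; then
        touch "$stamp"
        return 0
    fi
    return 1
}

# Example wiring (unit name as first argument, as %i would pass it): at most one
# mail per unit every 10 minutes. Does nothing when called without arguments.
if [ "$#" -gt 0 ]; then
    if should_notify "/run/notify-email-$1.stamp" 10; then
        systemctl status "$1" | mailx -Ssendwait -s "[SYSTEMD_$1] Fail" your_admin@company.blablabla
    fi
fi
```

The handler's ExecStart= would then call this script with %i instead of the inline bash -c command; using one stamp file per unit keeps a flapping service from suppressing mails about other services.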

I came across this utility which seems to provide this: https://github.com/joonty/systemd_mon