===========================
HPE iLO NMI Watchdog Driver
===========================

for iLO based ProLiant Servers
==============================

Last reviewed: 08/20/2018


 The HPE iLO NMI Watchdog driver is a kernel module that provides basic
 watchdog functionality and a handler for the iLO "Generate NMI to System"
 virtual button.

 All references to iLO in this document apply equally to iLO2 and all
 subsequent generations.

 Watchdog functionality is enabled as with any other common watchdog driver:
 an application must be started that starts the watchdog timer and then keeps
 it updated. A basic application, watchdog-test.c, is available in
 tools/testing/selftests/watchdog/. Simply compile the C file and run it. If
 the system gets into a bad state and hangs, the HPE ProLiant iLO timer
 register will not be updated in a timely fashion and a hardware system reset,
 also known as an Automatic Server Recovery (ASR) event, will occur.
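
 The start-and-update cycle described above can be sketched in user space as
 follows. This is a minimal illustration, not a replacement for
 watchdog-test.c. The helper names (wdt_start, wdt_kick, wdt_stop) are
 hypothetical, and the sketch assumes the driver follows the generic
 conventions of the watchdog API: opening the device starts the timer,
 WDIOC_KEEPALIVE updates it, and writing the character 'V' before close
 requests a stop.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Open the watchdog device. Under the generic watchdog API, opening the
 * device starts the timer. Returns a file descriptor, or -1 on error.
 */
int wdt_start(const char *path)
{
	return open(path, O_WRONLY);
}

/* Update (kick) the timer. Returns 0 on success, -1 on error. */
int wdt_kick(int fd)
{
	return ioctl(fd, WDIOC_KEEPALIVE, 0);
}

/*
 * Request an orderly stop by writing the magic character 'V' and closing
 * the device. Assumption: the driver honours magic close; when nowayout
 * is set the timer keeps running regardless.
 */
int wdt_stop(int fd)
{
	if (write(fd, "V", 1) != 1) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```

 A caller would typically open /dev/watchdog with wdt_start(), call
 wdt_kick() in a loop at an interval comfortably shorter than the timeout,
 and call wdt_stop() only on a deliberate shutdown.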

 The hpwdt driver also has the following module parameters:

 ============  ================================================================
 soft_margin   Sets the watchdog timer value, in seconds.
               Default value is 30 seconds.
 timeout       An alias of soft_margin.
 pretimeout    Sets the watchdog pretimeout value.
               This is the number of seconds before timeout when an
               NMI is delivered to the system. Setting the value to
               zero disables the pretimeout NMI.
               Default value is 9 seconds.
 nowayout      Basic watchdog parameter that does not allow the timer to
               be restarted or an impending ASR to be escaped.
               Default value is set when compiling the kernel. If it is set
               to "Y", then there is no way of disabling the watchdog once
               it has been started.
 kdumptimeout  Minimum timeout, in seconds, to apply upon receipt of an NMI
               before calling panic. A value of -1 disables the watchdog.
               When the value is > 0, the timer is reprogrammed with the
               greater of this value and the current timeout value.
 ============  ================================================================
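
 The kdumptimeout rule in the table can be written out as a small function.
 This is a sketch of the documented behavior only, not code taken from the
 driver. The function name is hypothetical. Two points are assumptions: the
 table documents only -1 as the disabling value, while the sketch treats any
 negative value the same way, and the table does not say what a value of 0
 does, so the sketch leaves the current timeout unchanged in that case.

```c
/*
 * Hypothetical helper, not part of the hpwdt driver.
 *
 * Returns the timeout, in seconds, in effect after the NMI is received,
 * or 0 when the watchdog is disabled.
 */
unsigned int kdump_effective_timeout(int kdumptimeout,
				     unsigned int current_timeout)
{
	/* Documented case is -1; other negative values are an assumption. */
	if (kdumptimeout < 0)
		return 0;

	/* Assumption: a value of 0 leaves the timer as it is. */
	if (kdumptimeout == 0)
		return current_timeout;

	/* Documented: the greater of kdumptimeout and the current timeout. */
	if ((unsigned int)kdumptimeout > current_timeout)
		return (unsigned int)kdumptimeout;
	return current_timeout;
}
```

 For example, with a current timeout of 30 seconds, a kdumptimeout of 120
 gives the crash dump 120 seconds, while a kdumptimeout of 15 leaves the
 30 second timeout in place.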

 NOTE:
       More information about watchdog drivers in general, including the ioctl
       interface to /dev/watchdog, can be found in
       Documentation/watchdog/watchdog-api.rst and Documentation/IPMI.txt.

 Due to limitations in the iLO hardware, the NMI pretimeout, if enabled,
 can only be set to 9 seconds.  Attempts to set pretimeout to other
 non-zero values will be rounded, possibly to zero.  Users should verify
 the pretimeout value after attempting to set pretimeout or timeout.
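
 One way to perform this verification from user space is to read the value
 back immediately after setting it. The sketch below uses the generic
 WDIOC_SETPRETIMEOUT and WDIOC_GETPRETIMEOUT ioctls from linux/watchdog.h.
 The function name is hypothetical, and fd is assumed to be an open
 descriptor for /dev/watchdog.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Request a pretimeout and read back the value the driver actually applied.
 *
 * Returns the applied pretimeout in seconds (which may differ from the
 * requested value because of rounding), or -1 if either ioctl fails.
 */
int wdt_set_and_verify_pretimeout(int fd, int requested)
{
	int applied = -1;

	if (ioctl(fd, WDIOC_SETPRETIMEOUT, &requested) < 0)
		return -1;
	if (ioctl(fd, WDIOC_GETPRETIMEOUT, &applied) < 0)
		return -1;
	return applied;
}
```

 A caller should compare the return value with the requested value and
 treat a result of 0 as "pretimeout NMI disabled".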

 Upon receipt of an NMI from the iLO, the hpwdt driver will initiate a
 panic. This allows a crash dump to be collected.  It is incumbent
 upon the user to have properly configured the system for kdump.

 The default Linux kernel behavior upon panic is to print a kernel tombstone
 and loop forever.  This is generally not what a watchdog user wants.

 For those wishing to learn more please see:
	Documentation/admin-guide/kdump/kdump.rst
	Documentation/admin-guide/kernel-parameters.txt (panic=)
	Your Linux distribution-specific documentation.
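
 To check how a running system is currently configured to behave on panic,
 the panic timeout can be read and classified. The sketch below is
 illustrative only. The function and enum names are hypothetical. The
 interpretation of the value follows the panic= entry in
 kernel-parameters.txt: 0 waits forever, a positive value reboots after that
 many seconds, and a negative value reboots immediately. The value is usually
 exposed at /proc/sys/kernel/panic.

```c
#include <stdio.h>

enum panic_action {
	PANIC_LOOP_FOREVER,       /* timeout == 0: the default behavior */
	PANIC_REBOOT_AFTER_DELAY, /* timeout > 0: reboot after N seconds */
	PANIC_REBOOT_IMMEDIATELY  /* timeout < 0 */
};

/* Classify a panic timeout value as described for the panic= parameter. */
enum panic_action panic_action_for(int timeout)
{
	if (timeout == 0)
		return PANIC_LOOP_FOREVER;
	if (timeout > 0)
		return PANIC_REBOOT_AFTER_DELAY;
	return PANIC_REBOOT_IMMEDIATELY;
}

/*
 * Read the panic timeout from a file such as /proc/sys/kernel/panic.
 * Returns 0 and stores the value in *out on success, -1 on error.
 */
int read_panic_timeout(const char *path, int *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%d", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

 A result of PANIC_LOOP_FOREVER on a system that relies on hpwdt is a sign
 that the panic= setting deserves a second look.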

 If the hpwdt driver does not receive the NMI associated with an expiring
 timer, the iLO will proceed to reset the system at timeout if the timer
 hasn't been updated.

--

 The HPE iLO NMI Watchdog Driver and documentation were originally developed
 by Tom Mingarelli.