===========================
HPE iLO NMI Watchdog Driver
===========================

for iLO based ProLiant Servers
==============================

Last reviewed: 08/20/2018


 The HPE iLO NMI Watchdog driver is a kernel module that provides basic
 watchdog functionality and a handler for the iLO "Generate NMI to System"
 virtual button.

 All references to iLO in this document also apply to iLO2 and all
 subsequent generations.

 Watchdog functionality is enabled like any other common watchdog driver. That
 is, an application needs to be started that kicks off the watchdog timer. A
 basic application exists in tools/testing/selftests/watchdog/ named
 watchdog-test.c. Compile the C file and run it. If the system
 gets into a bad state and hangs, the HPE ProLiant iLO timer register will
 not be updated in a timely fashion and a hardware system reset (also known as
 an Automatic Server Recovery (ASR)) event will occur.

 The hpwdt driver also has the following module parameters:

 ============  ================================================================
 soft_margin   allows the user to set the watchdog timer value.
               Default value is 30 seconds.
 timeout       an alias of soft_margin.
 pretimeout    allows the user to set the watchdog pretimeout value.
               This is the number of seconds before timeout when an
               NMI is delivered to the system. Setting the value to
               zero disables the pretimeout NMI.
               Default value is 9 seconds.
 nowayout      basic watchdog parameter that does not allow the timer to
               be restarted or an impending ASR to be escaped.
               Default value is set when compiling the kernel. If it is set
               to "Y", then there is no way of disabling the watchdog once
               it has been started.
 kdumptimeout  Minimum timeout in seconds to apply upon receipt of an NMI
               before calling panic. (-1) disables the watchdog. When value
               is > 0, the timer is reprogrammed with the greater of
               value or current timeout value.
 ============  ================================================================

 NOTE:
       More information about watchdog drivers in general, including the ioctl
       interface to /dev/watchdog, can be found in
       Documentation/watchdog/watchdog-api.rst and Documentation/IPMI.txt.

 Due to limitations in the iLO hardware, the NMI pretimeout, if enabled,
 can only be set to 9 seconds. Attempts to set pretimeout to other
 non-zero values will be rounded, possibly to zero. Users should verify
 the pretimeout value after attempting to set pretimeout or timeout.

 Upon receipt of an NMI from the iLO, the hpwdt driver will initiate a
 panic. This is to allow for a crash dump to be collected. It is incumbent
 upon the user to have properly configured the system for kdump.

 The default Linux kernel behavior upon panic is to print a kernel tombstone
 and loop forever. This is generally not what a watchdog user wants.

 For those wishing to learn more please see:
        - Documentation/admin-guide/kdump/kdump.rst
        - Documentation/admin-guide/kernel-parameters.txt (panic=)
        - Your Linux Distribution specific documentation.

 If the hpwdt does not receive the NMI associated with an expiring timer,
 the iLO will proceed to reset the system at timeout if the timer hasn't
 been updated.

--

 The HPE iLO NMI Watchdog Driver and documentation were originally developed
 by Tom Mingarelli.