===================================
Documentation for /proc/sys/kernel/
===================================

.. See scripts/check-sysctl-docs to keep this up to date


Copyright (c) 1998, 1999,  Rik van Riel <riel@nl.linux.org>

Copyright (c) 2009,        Shen Feng<shen@cn.fujitsu.com>

For general info and legal blurb, please look in :doc:`index`.

------------------------------------------------------------------------------

This file contains documentation for the sysctl files in
``/proc/sys/kernel/`` and is valid for Linux kernel version 2.2.

The files in this directory can be used to tune and monitor
miscellaneous and general things in the operation of the Linux
kernel. Since some of the files *can* be used to screw up your
system, it is advisable to read both documentation and source
before actually making adjustments.

Currently, these files might (depending on your configuration)
show up in ``/proc/sys/kernel``:

.. contents:: :local:


acct
====

::

    highwater lowwater frequency

If BSD-style process accounting is enabled, these values control
its behaviour. If free space on the filesystem where the log lives
goes below ``lowwater``%, accounting suspends. If free space gets
above ``highwater``%, accounting resumes. ``frequency`` determines
how often the amount of free space is checked (value is in
seconds). Default:

::

    4 2 30

That is, suspend accounting if free space drops below 2%; resume it
if it increases to at least 4%; consider information about the amount
of free space valid for 30 seconds.

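The three fields can be read back and interpreted with a few lines of
shell. This is a minimal sketch, assuming a POSIX shell; the helper name
``describe_acct`` is hypothetical and not part of any existing tool::

    # Hypothetical helper: explain an acct triple given as one string,
    # "highwater lowwater frequency".
    describe_acct() {
        # Word-split the line into $1 (highwater), $2 (lowwater)
        # and $3 (frequency).
        set -- $1
        echo "suspend below $2%, resume at $1%, recheck every $3s"
    }

    # Describe the live setting when the file is present and readable.
    if [ -r /proc/sys/kernel/acct ]; then
        describe_acct "$(cat /proc/sys/kernel/acct)"
    fi

Calling ``describe_acct "4 2 30"`` restates the default described above.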

acpi_video_flags
================

See :doc:`/power/video`. This allows the video resume mode to be set,
in a similar fashion to the ``acpi_sleep`` kernel parameter, by
combining the following values:

= =======
1 s3_bios
2 s3_mode
4 s3_beep
= =======

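Because the values are distinct bits, "combining" them means OR-ing
them together. The following sketch, assuming a POSIX shell, decodes a
combined value back into flag names; ``decode_acpi_video_flags`` is a
hypothetical helper name used only for illustration::

    # Hypothetical helper: list the flag names set in an
    # acpi_video_flags value.
    decode_acpi_video_flags() {
        flags=""
        [ $(( $1 & 1 )) -ne 0 ] && flags="$flags s3_bios"
        [ $(( $1 & 2 )) -ne 0 ] && flags="$flags s3_mode"
        [ $(( $1 & 4 )) -ne 0 ] && flags="$flags s3_beep"
        # Drop the leading space left by the first append.
        echo "${flags# }"
    }

    decode_acpi_video_flags 3    # 1 | 2: s3_bios and s3_mode

A value of 3 therefore selects ``s3_bios`` and ``s3_mode`` together.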

auto_msgmni
===========

This variable has no effect and may be removed in future kernel
releases. Reading it always returns 0.
Up to Linux 3.17, it enabled/disabled automatic recomputing of
`msgmni`_
upon memory add/remove or upon IPC namespace creation/removal.
Echoing "1" into this file enabled msgmni automatic recomputing.
Echoing "0" turned it off. The default value was 1.


bootloader_type (x86 only)
==========================

This gives the bootloader type number as indicated by the bootloader,
shifted left by 4, and OR'd with the low four bits of the bootloader
version.  The reason for this encoding is that this used to match the
``type_of_loader`` field in the kernel header; the encoding is kept for
backwards compatibility.  That is, if the full bootloader type number
is 0x15 and the full version number is 0x234, this file will contain
the value 340 = 0x154.

See the ``type_of_loader`` and ``ext_loader_type`` fields in
:doc:`/x86/boot` for additional information.

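The encoding can be reproduced with shell arithmetic. This is only a
sketch of the calculation described above, assuming a POSIX shell;
``encode_bootloader_type`` is a hypothetical helper name::

    # Hypothetical helper: compute the value shown in bootloader_type
    # from the full type number ($1) and the full version number ($2).
    encode_bootloader_type() {
        # Type shifted left by 4, OR'd with the low four version bits.
        echo $(( ($1 << 4) | ($2 & 0xf) ))
    }

    encode_bootloader_type 0x15 0x234    # prints 340, i.e. 0x154

Only the low four bits of the version (here 0x4) survive in this file.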

bootloader_version (x86 only)
=============================

The complete bootloader version number.  In the example above, this
file will contain the value 564 = 0x234.

See the ``type_of_loader`` and ``ext_loader_ver`` fields in
:doc:`/x86/boot` for additional information.


bpf_stats_enabled
=================

Controls whether the kernel should collect statistics on BPF programs
(total time spent running, number of times run...). Enabling
statistics causes a slight reduction in performance on each program
run. The statistics can be seen using ``bpftool``.

= ===================================
0 Don't collect statistics (default).
1 Collect statistics.
= ===================================


cad_pid
=======

This is the pid which will be signalled on reboot (notably, by
Ctrl-Alt-Delete). Writing a value to this file which doesn't
correspond to a running process will result in ``-ESRCH``.

See also `ctrl-alt-del`_.


cap_last_cap
============

Highest valid capability of the running kernel.  Exports
``CAP_LAST_CAP`` from the kernel.


core_pattern
============

``core_pattern`` is used to specify a core dumpfile pattern name.

* max length 127 characters; default value is "core"
* ``core_pattern`` is used as a pattern template for the output
  filename; certain string patterns (beginning with '%') are
  substituted with their actual values.
* backward compatibility with ``core_uses_pid``:

	If ``core_pattern`` does not include "%p" (default does not)
	and ``core_uses_pid`` is set, then .PID will be appended to
	the filename.

* corename format specifiers

	========	==========================================
	%<NUL>		'%' is dropped
	%%		output one '%'
	%p		pid
	%P		global pid (init PID namespace)
	%i		tid
	%I		global tid (init PID namespace)
	%u		uid (in initial user namespace)
	%g		gid (in initial user namespace)
	%d		dump mode, matches ``PR_SET_DUMPABLE`` and
			``/proc/sys/fs/suid_dumpable``
	%s		signal number
	%t		UNIX time of dump
	%h		hostname
	%e		executable filename (may be shortened, could be changed by prctl etc)
	%f		executable filename
	%E		executable path
	%c		maximum size of core file by resource limit RLIMIT_CORE
	%<OTHER>	both are dropped
	========	==========================================

* If the first character of the pattern is a '|', the kernel will treat
  the rest of the pattern as a command to run.  The core dump will be
  written to the standard input of that program instead of to a file.

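To make the substitution rules concrete, the sketch below expands a
template in userspace. It is an illustration only, not the kernel's
implementation: it assumes a POSIX shell, handles just ``%%``, ``%p``,
``%s`` and ``%e``, and drops every other specifier as if it were
``%<OTHER>``. The helper name ``expand_core_pattern`` and its argument
order are hypothetical::

    # Hypothetical helper. Usage: expand_core_pattern PATTERN PID SIGNAL EXE
    expand_core_pattern() {
        pattern=$1
        out=""
        while [ -n "$pattern" ]; do
            case $pattern in
                %%*) out="$out%";  pattern=${pattern#??} ;;
                %p*) out="$out$2"; pattern=${pattern#??} ;;
                %s*) out="$out$3"; pattern=${pattern#??} ;;
                %e*) out="$out$4"; pattern=${pattern#??} ;;
                %?*) pattern=${pattern#??} ;;   # not modelled: both dropped
                %)   pattern="" ;;              # trailing '%' is dropped
                *)   # Copy one ordinary character to the output.
                     rest=${pattern#?}
                     first=${pattern%"$rest"}
                     out=$out$first
                     pattern=$rest ;;
            esac
        done
        printf '%s\n' "$out"
    }

    expand_core_pattern 'core.%e.%p.sig%s' 1234 11 myapp

With the sample arguments shown (all made up for the example), the
template expands to ``core.myapp.1234.sig11``.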

core_pipe_limit
===============

This sysctl is only applicable when `core_pattern`_ is configured to
pipe core files to a user space helper (when the first character of
``core_pattern`` is a '|', see above).
When collecting cores via a pipe to an application, it is occasionally
useful for the collecting application to gather data about the
crashing process from its ``/proc/pid`` directory.
In order to do this safely, the kernel must wait for the collecting
process to exit, so as not to remove the crashing process's proc files
prematurely.
This in turn creates the possibility that a misbehaving userspace
collecting process can block the reaping of a crashed process simply
by never exiting.
This sysctl defends against that.
It defines how many crashing processes may be piped to user space
applications in parallel.
If this value is exceeded, then those crashing processes above that
value are noted via the kernel log and their cores are skipped.
0 is a special value, indicating that unlimited processes may be
captured in parallel, but that no waiting will take place (i.e. the
collecting process is not guaranteed access to ``/proc/<crashing
pid>/``).
This value defaults to 0.

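The rules above amount to a three-way decision for each new crash.
The following sketch restates them, assuming a POSIX shell; the helper
name ``core_pipe_decision`` and its output strings are hypothetical::

    # Hypothetical helper. Usage: core_pipe_decision LIMIT IN_FLIGHT
    #   LIMIT      value of core_pipe_limit
    #   IN_FLIGHT  dumps already being piped, not counting the new one
    core_pipe_decision() {
        if [ "$1" -eq 0 ]; then
            echo "dump, no wait"          # unlimited, helper not waited for
        elif [ "$2" -ge "$1" ]; then
            echo "skip, log"              # limit exceeded, core is skipped
        else
            echo "dump, wait for helper"  # within the limit
        fi
    }

    core_pipe_decision 4 4    # a fifth concurrent dump exceeds a limit of 4

With a limit of 4 and four dumps already in flight, the new core is
skipped and noted in the kernel log.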

core_uses_pid
=============

The default coredump filename is "core".  By setting
``core_uses_pid`` to 1, the coredump filename becomes core.PID.
If `core_pattern`_ does not include "%p" (default does not)
and ``core_uses_pid`` is set, then .PID will be appended to
the filename.


ctrl-alt-del
============

When the value in this file is 0, ctrl-alt-del is trapped and
sent to the ``init(1)`` program to handle a graceful restart.
When, however, the value is > 0, Linux's reaction to a Vulcan
Nerve Pinch (tm) will be an immediate reboot, without even
syncing its dirty buffers.

Note:
  when a program (like dosemu) has the keyboard in 'raw'
  mode, the ctrl-alt-del is intercepted by the program before it
  ever reaches the kernel tty layer, and it's up to the program
  to decide what to do with it.


dmesg_restrict
==============

This toggle indicates whether unprivileged users are prevented
from using ``dmesg(8)`` to view messages from the kernel's log
buffer.
When ``dmesg_restrict`` is set to 0 there are no restrictions.
When ``dmesg_restrict`` is set to 1, users must have
``CAP_SYSLOG`` to use ``dmesg(8)``.

The kernel config option ``CONFIG_SECURITY_DMESG_RESTRICT`` sets the
default value of ``dmesg_restrict``.


domainname & hostname
=====================

These files can be used to set the NIS/YP domainname and the
hostname of your box in exactly the same way as the commands
domainname and hostname, i.e.::

	# echo "darkstar" > /proc/sys/kernel/hostname
	# echo "mydomain" > /proc/sys/kernel/domainname

has the same effect as::

	# hostname "darkstar"
	# domainname "mydomain"

Note, however, that the classic darkstar.frop.org has the
hostname "darkstar" and DNS (Internet Domain Name Server)
domainname "frop.org", not to be confused with the NIS (Network
Information Service) or YP (Yellow Pages) domainname. These two
domain names are in general different. For a detailed discussion
see the ``hostname(1)`` man page.


firmware_config
===============

See :doc:`/driver-api/firmware/fallback-mechanisms`.

The entries in this directory allow the firmware loader helper
fallback to be controlled:

* ``force_sysfs_fallback``, when set to 1, forces the use of the
  fallback;
* ``ignore_sysfs_fallback``, when set to 1, ignores any fallback.


ftrace_dump_on_oops
===================

Determines whether ``ftrace_dump()`` should be called on an oops (or
kernel panic). This will output the contents of the ftrace buffers to
the console.  This is very useful for capturing traces that lead to
crashes and outputting them to a serial console.

= ===================================================
0 Disabled (default).
1 Dump buffers of all CPUs.
2 Dump the buffer of the CPU that triggered the oops.
= ===================================================


ftrace_enabled, stack_tracer_enabled
====================================

See :doc:`/trace/ftrace`.


hardlockup_all_cpu_backtrace
============================

This value controls whether the hard lockup detector gathers further
debug information when a hard lockup condition is detected. If
enabled, arch-specific all-CPU stack dumping will be initiated.

= ============================================
0 Do nothing. This is the default behavior.
1 On detection capture more debug information.
= ============================================


hardlockup_panic
================

This parameter can be used to control whether the kernel panics
when a hard lockup is detected.

= ===========================
0 Don't panic on hard lockup.
1 Panic on hard lockup.
= ===========================

See :doc:`/admin-guide/lockup-watchdogs` for more information.
This can also be set using the ``nmi_watchdog`` kernel parameter.


hotplug
=======

Path for the hotplug policy agent.
Default value is "``/sbin/hotplug``".


hung_task_all_cpu_backtrace
===========================

If this option is set, the kernel will send an NMI to all CPUs to dump
their backtraces when a hung task is detected. This file shows up if
``CONFIG_DETECT_HUNG_TASK`` and ``CONFIG_SMP`` are enabled.

0: Won't show the backtraces of all CPUs when a hung task is detected.
This is the default behavior.

1: Will non-maskably interrupt all CPUs and dump their backtraces when
a hung task is detected.


hung_task_panic
===============

Controls the kernel's behavior when a hung task is detected.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

= =================================================
0 Continue operation. This is the default behavior.
1 Panic immediately.
= =================================================


hung_task_check_count
=====================

The upper bound on the number of tasks that are checked.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.


hung_task_timeout_secs
======================

When a task in D state has not been scheduled for more than
this many seconds, a warning is reported.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

0 means infinite timeout, no checking is done.

Possible values to set are in range {0:``LONG_MAX``/``HZ``}.

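The upper bound depends on the kernel's build-time ``HZ`` value. As a
worked example, the sketch below evaluates ``LONG_MAX``/``HZ`` for a
given ``HZ``. It assumes a 64-bit kernel, where ``LONG_MAX`` is
9223372036854775807, and a shell with 64-bit arithmetic; the helper
name ``hung_task_max_secs`` and the ``HZ`` value of 250 are chosen only
for illustration::

    # Hypothetical helper: largest accepted timeout in seconds for a
    # given HZ, assuming a 64-bit LONG_MAX.
    hung_task_max_secs() {
        echo $(( 9223372036854775807 / $1 ))
    }

    hung_task_max_secs 250    # prints 36893488147419103

On a 32-bit kernel ``LONG_MAX`` is smaller, so the bound shrinks
accordingly.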

hung_task_check_interval_secs
=============================

Hung task check interval. If hung task checking is enabled
(see `hung_task_timeout_secs`_), the check is done every
``hung_task_check_interval_secs`` seconds.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

0 (default) means use ``hung_task_timeout_secs`` as checking
interval.

Possible values to set are in range {0:``LONG_MAX``/``HZ``}.


hung_task_warnings
==================

The maximum number of warnings to report. During a check interval
if a hung task is detected, this value is decreased by 1.
When this value reaches 0, no more warnings will be reported.
This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled.

-1: report an infinite number of warnings.

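The countdown can be modelled in a few lines. This is a sketch of the
behaviour described above, not kernel code; it assumes a POSIX shell,
and the helper name ``after_hung_task`` and its output format are
hypothetical::

    # Hypothetical helper: value of hung_task_warnings after one more
    # hung task is detected, and whether that detection is reported.
    # Usage: after_hung_task CURRENT
    after_hung_task() {
        case $1 in
            -1) echo "-1 reported" ;;              # unlimited warnings
            0)  echo "0 suppressed" ;;             # budget exhausted
            *)  echo "$(( $1 - 1 )) reported" ;;   # consume one warning
        esac
    }

    after_hung_task 10    # prints "9 reported"

Writing a positive value to the file again restores the budget once it
has reached 0.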

hyperv_record_panic_msg
=======================

Controls whether the panic kmsg data should be reported to Hyper-V.

= =========================================================
0 Do not report panic kmsg data.
1 Report the panic kmsg data. This is the default behavior.
= =========================================================


ignore-unaligned-usertrap
=========================

On architectures where unaligned accesses cause traps, and where this
feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN``;
currently, ``arc`` and ``ia64``), controls whether all unaligned traps
are logged.

= =============================================================
0 Log all unaligned accesses.
1 Only warn the first time a process traps. This is the default
  setting.
= =============================================================

See also `unaligned-trap`_ and `unaligned-dump-stack`_. On ``ia64``,
this allows system administrators to override the
``IA64_THREAD_UAC_NOPRINT`` ``prctl`` and avoid logs being flooded.


kexec_load_disabled
===================

A toggle indicating if the ``kexec_load`` syscall has been disabled.
This value defaults to 0 (false: ``kexec_load`` enabled), but can be
set to 1 (true: ``kexec_load`` disabled).
Once true, kexec can no longer be used, and the toggle cannot be set
back to false.
This allows a kexec image to be loaded before disabling the syscall,
allowing a system to set up (and later use) an image without it being
altered.
Generally used together with the `modules_disabled`_ sysctl.


kptr_restrict
=============

This toggle indicates whether restrictions are placed on
exposing kernel addresses via ``/proc`` and other interfaces.

When ``kptr_restrict`` is set to 0 (the default) the address is hashed
before printing.
(This is equivalent to %p.)

When ``kptr_restrict`` is set to 1, kernel pointers printed using the
%pK format specifier will be replaced with 0s unless the user has
``CAP_SYSLOG`` and effective user and group ids are equal to the real
ids.
This is because %pK checks are done at read() time rather than open()
time, so if permissions are elevated between the open() and the read()
(e.g. via a setuid binary) then %pK will not leak kernel pointers to
unprivileged users.
Note, this is a temporary solution only.
The correct long-term solution is to do the permission checks at
open() time.
Consider removing world read permissions from files that use %pK, and
using `dmesg_restrict`_ to protect against uses of %pK in ``dmesg(8)``
if leaking kernel pointer values to unprivileged users is a concern.

When ``kptr_restrict`` is set to 2, kernel pointers printed using
%pK will be replaced with 0s regardless of privileges.

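The three settings can be summarised as a small decision function. The
sketch below only restates the rules above, assuming a POSIX shell; the
helper name ``pk_output``, its arguments and its output words are
hypothetical::

    # Hypothetical helper: how a %pK pointer appears to a reader.
    # Usage: pk_output LEVEL HAS_CAP_SYSLOG IDS_MATCH
    #   LEVEL           value of kptr_restrict (0, 1 or 2)
    #   HAS_CAP_SYSLOG  1 if the reader has CAP_SYSLOG, else 0
    #   IDS_MATCH       1 if effective uid/gid equal the real ids, else 0
    pk_output() {
        case $1 in
            0) echo hashed ;;
            1) if [ "$2" -eq 1 ] && [ "$3" -eq 1 ]; then
                   echo address
               else
                   echo zeros
               fi ;;
            2) echo zeros ;;
            *) echo unknown; return 1 ;;
        esac
    }

    pk_output 1 1 0    # CAP_SYSLOG but mismatched ids: prints "zeros"

The second call pattern shows why a setuid binary does not help at
level 1: the capability alone is not enough when the ids differ.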

modprobe
========

The full path to the usermode helper for autoloading kernel modules,
by default "/sbin/modprobe".  This binary is executed when the kernel
requests a module.  For example, if userspace passes an unknown
filesystem type to mount(), then the kernel will automatically request
the corresponding filesystem module by executing this usermode helper.
This usermode helper should insert the needed module into the kernel.

This sysctl only affects module autoloading.  It has no effect on the
ability to explicitly insert modules.

This sysctl can be used to debug module loading requests::

    echo '#! /bin/sh' > /tmp/modprobe
    echo 'echo "$@" >> /tmp/modprobe.log' >> /tmp/modprobe
    echo 'exec /sbin/modprobe "$@"' >> /tmp/modprobe
    chmod a+x /tmp/modprobe
    echo /tmp/modprobe > /proc/sys/kernel/modprobe

Alternatively, if this sysctl is set to the empty string, then module
autoloading is completely disabled.  The kernel will not try to
execute a usermode helper at all, nor will it call the
``kernel_module_request`` LSM hook.

If ``CONFIG_STATIC_USERMODEHELPER=y`` is set in the kernel configuration,
then the configured static usermode helper overrides this sysctl,
except that the empty string is still accepted to completely disable
module autoloading as described above.


modules_disabled
================

A toggle value indicating if modules are allowed to be loaded
in an otherwise modular kernel.  This toggle defaults to off
(0), but can be set true (1).  Once true, modules can be
neither loaded nor unloaded, and the toggle cannot be set back
to false.  Generally used with the `kexec_load_disabled`_ toggle.


.. _msgmni:

msgmax, msgmnb, and msgmni
==========================

``msgmax`` is the maximum size of an IPC message, in bytes. 8192 by
default (``MSGMAX``).

``msgmnb`` is the maximum size of an IPC queue, in bytes. 16384 by
default (``MSGMNB``).

``msgmni`` is the maximum number of IPC queues. 32000 by default
(``MSGMNI``).


msg_next_id, sem_next_id, and shm_next_id (System V IPC)
========================================================

These three toggles allow specifying the desired id for the next
allocated IPC object: message, semaphore or shared memory, respectively.

By default they are equal to -1, which means generic allocation logic.
Possible values to set are in range {0:``INT_MAX``}.

Notes:
  1) The kernel doesn't guarantee that the new object will have the
     desired id, so it's up to userspace to handle an object with the
     "wrong" id.
  2) A toggle with a non-default value will be set back to -1 by the
     kernel after a successful IPC object allocation. If an IPC object
     allocation syscall fails, it is undefined whether the value remains
     unmodified or is reset to -1.


ngroups_max
===========

Maximum number of supplementary groups, *i.e.* the maximum size which
``setgroups`` will accept. Exports ``NGROUPS_MAX`` from the kernel.


nmi_watchdog
============

This parameter can be used to control the NMI watchdog
(i.e. the hard lockup detector) on x86 systems.

= =================================
0 Disable the hard lockup detector.
1 Enable the hard lockup detector.
= =================================

The hard lockup detector monitors each CPU for its ability to respond to
timer interrupts. The mechanism utilizes CPU performance counter registers
that are programmed to generate Non-Maskable Interrupts (NMIs) periodically
while a CPU is busy. Hence, the alternative name 'NMI watchdog'.

The NMI watchdog is disabled by default if the kernel is running as a guest
in a KVM virtual machine. This default can be overridden by adding::

   nmi_watchdog=1

to the guest kernel command line (see :doc:`/admin-guide/kernel-parameters`).

586*4882a593Smuzhiyun
587*4882a593Smuzhiyunnuma_balancing
588*4882a593Smuzhiyun==============
589*4882a593Smuzhiyun
590*4882a593SmuzhiyunEnables/disables automatic page fault based NUMA memory
591*4882a593Smuzhiyunbalancing. Memory is moved automatically to nodes
592*4882a593Smuzhiyunthat access it often.
593*4882a593Smuzhiyun
594*4882a593SmuzhiyunEnables/disables automatic NUMA memory balancing. On NUMA machines, there
595*4882a593Smuzhiyunis a performance penalty if remote memory is accessed by a CPU. When this
596*4882a593Smuzhiyunfeature is enabled the kernel samples what task thread is accessing memory
597*4882a593Smuzhiyunby periodically unmapping pages and later trapping a page fault. At the
598*4882a593Smuzhiyuntime of the page fault, it is determined if the data being accessed should
599*4882a593Smuzhiyunbe migrated to a local memory node.
600*4882a593Smuzhiyun
The unmapping of pages and the trapping of faults incur additional overhead
that is ideally offset by improved memory locality, but there is no
universal guarantee. If the target workload is already bound to NUMA nodes,
this feature should be disabled. Otherwise, if the system overhead from the
feature is too high, the rate at which the kernel samples for NUMA hinting
faults may be controlled by the `numa_balancing_scan_period_min_ms,
numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
numa_balancing_scan_size_mb`_, and numa_balancing_settle_count sysctls.
609*4882a593Smuzhiyun
610*4882a593Smuzhiyun
611*4882a593Smuzhiyunnuma_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
612*4882a593Smuzhiyun===============================================================================================================================
613*4882a593Smuzhiyun
614*4882a593Smuzhiyun
Automatic NUMA balancing scans a task's address space and unmaps pages to
detect whether pages are properly placed or whether the data should be
migrated to a memory node local to where the task is running.  Every "scan
delay" the task scans the next "scan size" number of pages in its address
space. When the end of the address space is reached, the scanner restarts
from the beginning.
620*4882a593Smuzhiyun
621*4882a593SmuzhiyunIn combination, the "scan delay" and "scan size" determine the scan rate.
622*4882a593SmuzhiyunWhen "scan delay" decreases, the scan rate increases.  The scan delay and
623*4882a593Smuzhiyunhence the scan rate of every task is adaptive and depends on historical
624*4882a593Smuzhiyunbehaviour. If pages are properly placed then the scan delay increases,
625*4882a593Smuzhiyunotherwise the scan delay decreases.  The "scan size" is not adaptive but
626*4882a593Smuzhiyunthe higher the "scan size", the higher the scan rate.
627*4882a593Smuzhiyun
Higher scan rates incur higher system overhead, as page faults must be
trapped and data potentially must be migrated. However, the higher the scan
rate, the more quickly a task's memory is migrated to a local node if the
workload pattern changes, which minimises the performance impact of remote
memory accesses. These sysctls control the thresholds for scan delays and
the number of pages scanned.
634*4882a593Smuzhiyun
``numa_balancing_scan_period_min_ms`` is the minimum time in milliseconds to
scan a task's virtual memory. It effectively controls the maximum scanning
rate for each task.

``numa_balancing_scan_delay_ms`` is the starting "scan delay" used for a task
when it initially forks.

``numa_balancing_scan_period_max_ms`` is the maximum time in milliseconds to
scan a task's virtual memory. It effectively controls the minimum scanning
rate for each task.
645*4882a593Smuzhiyun
646*4882a593Smuzhiyun``numa_balancing_scan_size_mb`` is how many megabytes worth of pages are
647*4882a593Smuzhiyunscanned for a given scan.
648*4882a593Smuzhiyun
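As a rough illustration of how "scan size" and the scan period bound the
scan rate, the sketch below simply divides one by the other. The function
name and the sample values are assumptions chosen for illustration; they are
not read from any system and are not claimed to be the kernel defaults.

```shell
# Hypothetical helper: approximate scan rate in MB per second for a given
# scan size (in MB) and scan period (in ms), using integer arithmetic.
numa_scan_rate_mb_per_s() {
    size_mb=$1
    period_ms=$2
    echo $(( size_mb * 1000 / period_ms ))
}

# Illustrative values only: 256 MB per scan, with the period allowed to
# vary between 1000 ms and 60000 ms.
echo "fastest: $(numa_scan_rate_mb_per_s 256 1000) MB/s"
echo "slowest: $(numa_scan_rate_mb_per_s 256 60000) MB/s"
```

On a kernel built with NUMA balancing, the live values can be read from the
four files named in this section under ``/proc/sys/kernel/`` and passed to
the helper instead of the sample numbers.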
649*4882a593Smuzhiyun
650*4882a593Smuzhiyunoops_all_cpu_backtrace
651*4882a593Smuzhiyun======================
652*4882a593Smuzhiyun
If this option is set, the kernel will send an NMI to all CPUs to dump
their backtraces when an oops event occurs. It should be used as a last
resort in case a panic cannot be triggered (to protect VMs running, for
example) or kdump can't be collected. This file shows up if ``CONFIG_SMP``
is enabled.

0: Won't show all CPUs' backtraces when an oops is detected.
This is the default behavior.

1: Will non-maskably interrupt all CPUs and dump their backtraces when
an oops event is detected.
664*4882a593Smuzhiyun
665*4882a593Smuzhiyun
666*4882a593Smuzhiyunosrelease, ostype & version
667*4882a593Smuzhiyun===========================
668*4882a593Smuzhiyun
669*4882a593Smuzhiyun::
670*4882a593Smuzhiyun
671*4882a593Smuzhiyun  # cat osrelease
672*4882a593Smuzhiyun  2.1.88
673*4882a593Smuzhiyun  # cat ostype
674*4882a593Smuzhiyun  Linux
675*4882a593Smuzhiyun  # cat version
676*4882a593Smuzhiyun  #5 Wed Feb 25 21:49:24 MET 1998
677*4882a593Smuzhiyun
The files ``osrelease`` and ``ostype`` should be clear enough.
``version`` needs a little more clarification, however. The '#5' means
that this is the fifth kernel built from this source base, and the
date following it indicates the time the kernel was built.
The only way to tune these values is to rebuild the kernel :-)
684*4882a593Smuzhiyun
685*4882a593Smuzhiyun
686*4882a593Smuzhiyunoverflowgid & overflowuid
687*4882a593Smuzhiyun=========================
688*4882a593Smuzhiyun
If your architecture did not always support 32-bit UIDs (i.e. arm,
i386, m68k, sh, and sparc32), a fixed UID and GID will be returned to
applications that use the old 16-bit UID/GID system calls if the
actual UID or GID would exceed 65535.
693*4882a593Smuzhiyun
694*4882a593SmuzhiyunThese sysctls allow you to change the value of the fixed UID and GID.
695*4882a593SmuzhiyunThe default is 65534.
696*4882a593Smuzhiyun
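The mapping described above can be sketched as a small shell function. This
is only a model of the documented behaviour, not kernel code; the function
name ``legacy_id`` is hypothetical, and the default of 65534 is the one
stated in the text.

```shell
# Model of what a legacy 16-bit UID/GID system call would report:
# IDs that do not fit in 16 bits are replaced by the overflow value.
legacy_id() {
    id=$1
    overflow=${2:-65534}    # default taken from the text above
    if [ "$id" -gt 65535 ]; then
        echo "$overflow"
    else
        echo "$id"
    fi
}

legacy_id 1000      # fits in 16 bits, reported unchanged
legacy_id 100000    # exceeds 65535, reported as the overflow value
```

A second argument stands in for a changed ``overflowuid`` or ``overflowgid``
setting, for example ``legacy_id 100000 99``.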
697*4882a593Smuzhiyun
698*4882a593Smuzhiyunpanic
699*4882a593Smuzhiyun=====
700*4882a593Smuzhiyun
701*4882a593SmuzhiyunThe value in this file determines the behaviour of the kernel on a
702*4882a593Smuzhiyunpanic:
703*4882a593Smuzhiyun
704*4882a593Smuzhiyun* if zero, the kernel will loop forever;
705*4882a593Smuzhiyun* if negative, the kernel will reboot immediately;
706*4882a593Smuzhiyun* if positive, the kernel will reboot after the corresponding number
707*4882a593Smuzhiyun  of seconds.
708*4882a593Smuzhiyun
709*4882a593SmuzhiyunWhen you use the software watchdog, the recommended setting is 60.
710*4882a593Smuzhiyun
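The three cases above can be summarised in a short shell sketch. The helper
name ``describe_panic_timeout`` is hypothetical; it only interprets a value
and does not change any setting.

```shell
# Translate a value of the panic sysctl into the behaviour listed above.
describe_panic_timeout() {
    case $1 in
        0)  echo "loop forever" ;;
        -*) echo "reboot immediately" ;;
        *)  echo "reboot after $1 seconds" ;;
    esac
}

# Interpret the current setting if the file is readable on this system.
if [ -r /proc/sys/kernel/panic ]; then
    describe_panic_timeout "$(cat /proc/sys/kernel/panic)"
fi
```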
711*4882a593Smuzhiyun
712*4882a593Smuzhiyunpanic_on_io_nmi
713*4882a593Smuzhiyun===============
714*4882a593Smuzhiyun
715*4882a593SmuzhiyunControls the kernel's behavior when a CPU receives an NMI caused by
716*4882a593Smuzhiyunan IO error.
717*4882a593Smuzhiyun
718*4882a593Smuzhiyun= ==================================================================
719*4882a593Smuzhiyun0 Try to continue operation (default).
720*4882a593Smuzhiyun1 Panic immediately. The IO error triggered an NMI. This indicates a
721*4882a593Smuzhiyun  serious system condition which could result in IO data corruption.
722*4882a593Smuzhiyun  Rather than continuing, panicking might be a better choice. Some
723*4882a593Smuzhiyun  servers issue this sort of NMI when the dump button is pushed,
724*4882a593Smuzhiyun  and you can use this option to take a crash dump.
725*4882a593Smuzhiyun= ==================================================================
726*4882a593Smuzhiyun
727*4882a593Smuzhiyun
728*4882a593Smuzhiyunpanic_on_oops
729*4882a593Smuzhiyun=============
730*4882a593Smuzhiyun
731*4882a593SmuzhiyunControls the kernel's behaviour when an oops or BUG is encountered.
732*4882a593Smuzhiyun
733*4882a593Smuzhiyun= ===================================================================
734*4882a593Smuzhiyun0 Try to continue operation.
735*4882a593Smuzhiyun1 Panic immediately.  If the `panic` sysctl is also non-zero then the
736*4882a593Smuzhiyun  machine will be rebooted.
737*4882a593Smuzhiyun= ===================================================================
738*4882a593Smuzhiyun
739*4882a593Smuzhiyun
740*4882a593Smuzhiyunpanic_on_stackoverflow
741*4882a593Smuzhiyun======================
742*4882a593Smuzhiyun
Controls the kernel's behavior when it detects an overflow of the
kernel, IRQ, or exception stacks; user stacks are not covered.
This file shows up if ``CONFIG_DEBUG_STACKOVERFLOW`` is enabled.
746*4882a593Smuzhiyun
747*4882a593Smuzhiyun= ==========================
748*4882a593Smuzhiyun0 Try to continue operation.
749*4882a593Smuzhiyun1 Panic immediately.
750*4882a593Smuzhiyun= ==========================
751*4882a593Smuzhiyun
752*4882a593Smuzhiyun
753*4882a593Smuzhiyunpanic_on_unrecovered_nmi
754*4882a593Smuzhiyun========================
755*4882a593Smuzhiyun
The default Linux behaviour on an NMI caused by either a memory error or
an unknown source is to continue operation. For many environments, such as
scientific computing, it is preferable that the box is taken out and the
error dealt with rather than letting an uncorrected parity/ECC error
propagate.

A small number of systems do generate NMIs for bizarre random reasons
such as power management, so the default is off. This sysctl works like
the existing panic controls already in this directory.
764*4882a593Smuzhiyun
765*4882a593Smuzhiyun
766*4882a593Smuzhiyunpanic_on_warn
767*4882a593Smuzhiyun=============
768*4882a593Smuzhiyun
769*4882a593SmuzhiyunCalls panic() in the WARN() path when set to 1.  This is useful to avoid
770*4882a593Smuzhiyuna kernel rebuild when attempting to kdump at the location of a WARN().
771*4882a593Smuzhiyun
772*4882a593Smuzhiyun= ================================================
773*4882a593Smuzhiyun0 Only WARN(), default behaviour.
774*4882a593Smuzhiyun1 Call panic() after printing out WARN() location.
775*4882a593Smuzhiyun= ================================================
776*4882a593Smuzhiyun
777*4882a593Smuzhiyun
778*4882a593Smuzhiyunpanic_print
779*4882a593Smuzhiyun===========
780*4882a593Smuzhiyun
Bitmask for printing system info when a panic happens. The user can choose
any combination of the following bits:
783*4882a593Smuzhiyun
784*4882a593Smuzhiyun=====  ============================================
785*4882a593Smuzhiyunbit 0  print all tasks info
786*4882a593Smuzhiyunbit 1  print system memory info
787*4882a593Smuzhiyunbit 2  print timer info
788*4882a593Smuzhiyunbit 3  print locks info if ``CONFIG_LOCKDEP`` is on
789*4882a593Smuzhiyunbit 4  print ftrace buffer
790*4882a593Smuzhiyunbit 5  print all printk messages in buffer
791*4882a593Smuzhiyun=====  ============================================
792*4882a593Smuzhiyun
For example, to print tasks and memory info on panic, the user can run::
794*4882a593Smuzhiyun
795*4882a593Smuzhiyun  echo 3 > /proc/sys/kernel/panic_print
796*4882a593Smuzhiyun
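For other combinations, the value can be computed from the bit numbers in
the table above. The helper below is a minimal sketch; its name is
hypothetical and it only prints the resulting number.

```shell
# Build a panic_print value by OR-ing together the given bit numbers (0-5).
panic_print_mask() {
    mask=0
    for bit in "$@"; do
        mask=$(( $mask | (1 << $bit) ))
    done
    echo "$mask"
}

panic_print_mask 0 1      # tasks and memory info: prints 3
panic_print_mask 0 3 5    # tasks, locks and printk buffer: prints 41
```

The printed number is what would be written to
``/proc/sys/kernel/panic_print`` with ``echo``, as in the example above.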
797*4882a593Smuzhiyun
798*4882a593Smuzhiyunpanic_on_rcu_stall
799*4882a593Smuzhiyun==================
800*4882a593Smuzhiyun
When set to 1, calls panic() after RCU stall detection messages. This
is useful for determining the root cause of RCU stalls using a vmcore.
803*4882a593Smuzhiyun
804*4882a593Smuzhiyun= ============================================================
805*4882a593Smuzhiyun0 Do not panic() when RCU stall takes place, default behavior.
806*4882a593Smuzhiyun1 panic() after printing RCU stall messages.
807*4882a593Smuzhiyun= ============================================================
808*4882a593Smuzhiyun
809*4882a593Smuzhiyun
810*4882a593Smuzhiyunperf_cpu_time_max_percent
811*4882a593Smuzhiyun=========================
812*4882a593Smuzhiyun
813*4882a593SmuzhiyunHints to the kernel how much CPU time it should be allowed to
814*4882a593Smuzhiyunuse to handle perf sampling events.  If the perf subsystem
815*4882a593Smuzhiyunis informed that its samples are exceeding this limit, it
816*4882a593Smuzhiyunwill drop its sampling frequency to attempt to reduce its CPU
817*4882a593Smuzhiyunusage.
818*4882a593Smuzhiyun
819*4882a593SmuzhiyunSome perf sampling happens in NMIs.  If these samples
820*4882a593Smuzhiyununexpectedly take too long to execute, the NMIs can become
821*4882a593Smuzhiyunstacked up next to each other so much that nothing else is
822*4882a593Smuzhiyunallowed to execute.
823*4882a593Smuzhiyun
===== ========================================================
0     Disable the mechanism.  Do not monitor or correct perf's
      sampling rate no matter how much CPU time it takes.

1-100 Attempt to throttle perf's sample rate to this
      percentage of CPU.  Note: the kernel calculates an
      "expected" length of each sample event.  100 here means
      100% of that expected length.  Even if this is set to
      100, you may still see sample throttling if this
      length is exceeded.  Set to 0 if you truly do not care
      how much CPU is consumed.
===== ========================================================
836*4882a593Smuzhiyun
837*4882a593Smuzhiyun
838*4882a593Smuzhiyunperf_event_paranoid
839*4882a593Smuzhiyun===================
840*4882a593Smuzhiyun
841*4882a593SmuzhiyunControls use of the performance events system by unprivileged
842*4882a593Smuzhiyunusers (without CAP_PERFMON).  The default value is 2.
843*4882a593Smuzhiyun
For backward compatibility reasons, access to system performance
monitoring and observability remains open for CAP_SYS_ADMIN
privileged processes, but using CAP_SYS_ADMIN for secure system
performance monitoring and observability operations is discouraged
in favour of CAP_PERFMON.
849*4882a593Smuzhiyun
850*4882a593Smuzhiyun===  ==================================================================
851*4882a593Smuzhiyun -1  Allow use of (almost) all events by all users.
852*4882a593Smuzhiyun
853*4882a593Smuzhiyun     Ignore mlock limit after perf_event_mlock_kb without
854*4882a593Smuzhiyun     ``CAP_IPC_LOCK``.
855*4882a593Smuzhiyun
856*4882a593Smuzhiyun>=0  Disallow ftrace function tracepoint by users without
857*4882a593Smuzhiyun     ``CAP_PERFMON``.
858*4882a593Smuzhiyun
859*4882a593Smuzhiyun     Disallow raw tracepoint access by users without ``CAP_PERFMON``.
860*4882a593Smuzhiyun
861*4882a593Smuzhiyun>=1  Disallow CPU event access by users without ``CAP_PERFMON``.
862*4882a593Smuzhiyun
863*4882a593Smuzhiyun>=2  Disallow kernel profiling by users without ``CAP_PERFMON``.
864*4882a593Smuzhiyun===  ==================================================================
865*4882a593Smuzhiyun
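Because the restrictions in the table are cumulative, a given level implies
every restriction listed for the lower levels. The sketch below prints the
restrictions that apply to users without ``CAP_PERFMON`` at a given level;
the function name is hypothetical and the wording is condensed from the
table.

```shell
# List the restrictions implied by a perf_event_paranoid level.
perf_paranoid_restrictions() {
    level=$1
    if [ "$level" -lt 0 ]; then
        echo "none: (almost) all events allowed"
        return 0
    fi
    echo "no ftrace function tracepoint or raw tracepoint access"
    [ "$level" -ge 1 ] && echo "no CPU event access"
    [ "$level" -ge 2 ] && echo "no kernel profiling"
    return 0
}

perf_paranoid_restrictions 2    # the default stated above: all three lines
```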
866*4882a593Smuzhiyun
867*4882a593Smuzhiyunperf_event_max_stack
868*4882a593Smuzhiyun====================
869*4882a593Smuzhiyun
870*4882a593SmuzhiyunControls maximum number of stack frames to copy for (``attr.sample_type &
871*4882a593SmuzhiyunPERF_SAMPLE_CALLCHAIN``) configured events, for instance, when using
872*4882a593Smuzhiyun'``perf record -g``' or '``perf trace --call-graph fp``'.
873*4882a593Smuzhiyun
874*4882a593SmuzhiyunThis can only be done when no events are in use that have callchains
875*4882a593Smuzhiyunenabled, otherwise writing to this file will return ``-EBUSY``.
876*4882a593Smuzhiyun
877*4882a593SmuzhiyunThe default value is 127.
878*4882a593Smuzhiyun
879*4882a593Smuzhiyun
880*4882a593Smuzhiyunperf_event_mlock_kb
881*4882a593Smuzhiyun===================
882*4882a593Smuzhiyun
Controls the size of the per-CPU ring buffer that is not counted against
the mlock limit.

The default value is 512 + 1 page.
886*4882a593Smuzhiyun
887*4882a593Smuzhiyun
888*4882a593Smuzhiyunperf_event_max_contexts_per_stack
889*4882a593Smuzhiyun=================================
890*4882a593Smuzhiyun
891*4882a593SmuzhiyunControls maximum number of stack frame context entries for
892*4882a593Smuzhiyun(``attr.sample_type & PERF_SAMPLE_CALLCHAIN``) configured events, for
893*4882a593Smuzhiyuninstance, when using '``perf record -g``' or '``perf trace --call-graph fp``'.
894*4882a593Smuzhiyun
895*4882a593SmuzhiyunThis can only be done when no events are in use that have callchains
896*4882a593Smuzhiyunenabled, otherwise writing to this file will return ``-EBUSY``.
897*4882a593Smuzhiyun
898*4882a593SmuzhiyunThe default value is 8.
899*4882a593Smuzhiyun
900*4882a593Smuzhiyun
901*4882a593Smuzhiyunpid_max
902*4882a593Smuzhiyun=======
903*4882a593Smuzhiyun
904*4882a593SmuzhiyunPID allocation wrap value.  When the kernel's next PID value
905*4882a593Smuzhiyunreaches this value, it wraps back to a minimum PID value.
906*4882a593SmuzhiyunPIDs of value ``pid_max`` or larger are not allocated.
907*4882a593Smuzhiyun
908*4882a593Smuzhiyun
909*4882a593Smuzhiyunns_last_pid
910*4882a593Smuzhiyun===========
911*4882a593Smuzhiyun
The last pid allocated in the current pid namespace (the one in which the
task using this sysctl lives). When selecting a pid for the next task on
fork, the kernel tries to allocate a number starting from this one.
915*4882a593Smuzhiyun
916*4882a593Smuzhiyun
917*4882a593Smuzhiyunpowersave-nap (PPC only)
918*4882a593Smuzhiyun========================
919*4882a593Smuzhiyun
920*4882a593SmuzhiyunIf set, Linux-PPC will use the 'nap' mode of powersaving,
921*4882a593Smuzhiyunotherwise the 'doze' mode will be used.
922*4882a593Smuzhiyun
923*4882a593Smuzhiyun
926*4882a593Smuzhiyunprintk
927*4882a593Smuzhiyun======
928*4882a593Smuzhiyun
929*4882a593SmuzhiyunThe four values in printk denote: ``console_loglevel``,
930*4882a593Smuzhiyun``default_message_loglevel``, ``minimum_console_loglevel`` and
931*4882a593Smuzhiyun``default_console_loglevel`` respectively.
932*4882a593Smuzhiyun
933*4882a593SmuzhiyunThese values influence printk() behavior when printing or
934*4882a593Smuzhiyunlogging error messages. See '``man 2 syslog``' for more info on
935*4882a593Smuzhiyunthe different loglevels.
936*4882a593Smuzhiyun
937*4882a593Smuzhiyun======================== =====================================
938*4882a593Smuzhiyunconsole_loglevel         messages with a higher priority than
939*4882a593Smuzhiyun                         this will be printed to the console
940*4882a593Smuzhiyundefault_message_loglevel messages without an explicit priority
941*4882a593Smuzhiyun                         will be printed with this priority
942*4882a593Smuzhiyunminimum_console_loglevel minimum (highest) value to which
943*4882a593Smuzhiyun                         console_loglevel can be set
944*4882a593Smuzhiyundefault_console_loglevel default value for console_loglevel
945*4882a593Smuzhiyun======================== =====================================
946*4882a593Smuzhiyun
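The following sketch labels the four fields and applies the rule from the
table that only messages with a higher priority (a numerically lower level)
than ``console_loglevel`` reach the console. Both function names are
hypothetical, and the sample values ``4 4 1 7`` are illustrative rather than
read from a system.

```shell
# Label the four values found in the printk file, in the order given above.
label_printk() {
    echo "console_loglevel=$1 default_message_loglevel=$2" \
         "minimum_console_loglevel=$3 default_console_loglevel=$4"
}

# Succeed if a message of level $1 is printed with console_loglevel $2.
reaches_console() {
    [ "$1" -lt "$2" ]
}

label_printk 4 4 1 7
if reaches_console 3 4; then
    echo "level 3 is printed to the console"
fi
if ! reaches_console 6 4; then
    echo "level 6 is not printed to the console"
fi
```

On a live system the arguments could come from the file itself, for example
``label_printk $(cat /proc/sys/kernel/printk)``.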
947*4882a593Smuzhiyun
948*4882a593Smuzhiyunprintk_delay
949*4882a593Smuzhiyun============
950*4882a593Smuzhiyun
Delay each printk message by ``printk_delay`` milliseconds.

Values from 0 to 10000 are allowed.
954*4882a593Smuzhiyun
955*4882a593Smuzhiyun
956*4882a593Smuzhiyunprintk_ratelimit
957*4882a593Smuzhiyun================
958*4882a593Smuzhiyun
959*4882a593SmuzhiyunSome warning messages are rate limited. ``printk_ratelimit`` specifies
960*4882a593Smuzhiyunthe minimum length of time between these messages (in seconds).
961*4882a593SmuzhiyunThe default value is 5 seconds.
962*4882a593Smuzhiyun
963*4882a593SmuzhiyunA value of 0 will disable rate limiting.
964*4882a593Smuzhiyun
965*4882a593Smuzhiyun
966*4882a593Smuzhiyunprintk_ratelimit_burst
967*4882a593Smuzhiyun======================
968*4882a593Smuzhiyun
969*4882a593SmuzhiyunWhile long term we enforce one message per `printk_ratelimit`_
970*4882a593Smuzhiyunseconds, we do allow a burst of messages to pass through.
971*4882a593Smuzhiyun``printk_ratelimit_burst`` specifies the number of messages we can
972*4882a593Smuzhiyunsend before ratelimiting kicks in.
973*4882a593Smuzhiyun
974*4882a593SmuzhiyunThe default value is 10 messages.
975*4882a593Smuzhiyun
976*4882a593Smuzhiyun
977*4882a593Smuzhiyunprintk_devkmsg
978*4882a593Smuzhiyun==============
979*4882a593Smuzhiyun
980*4882a593SmuzhiyunControl the logging to ``/dev/kmsg`` from userspace:
981*4882a593Smuzhiyun
982*4882a593Smuzhiyun========= =============================================
983*4882a593Smuzhiyunratelimit default, ratelimited
984*4882a593Smuzhiyunon        unlimited logging to /dev/kmsg from userspace
985*4882a593Smuzhiyunoff       logging to /dev/kmsg disabled
986*4882a593Smuzhiyun========= =============================================
987*4882a593Smuzhiyun
988*4882a593SmuzhiyunThe kernel command line parameter ``printk.devkmsg=`` overrides this and is
989*4882a593Smuzhiyuna one-time setting until next reboot: once set, it cannot be changed by
990*4882a593Smuzhiyunthis sysctl interface anymore.
991*4882a593Smuzhiyun
994*4882a593Smuzhiyun
995*4882a593Smuzhiyunpty
996*4882a593Smuzhiyun===
997*4882a593Smuzhiyun
998*4882a593SmuzhiyunSee Documentation/filesystems/devpts.rst.
999*4882a593Smuzhiyun
1000*4882a593Smuzhiyun
1001*4882a593Smuzhiyunrandom
1002*4882a593Smuzhiyun======
1003*4882a593Smuzhiyun
1004*4882a593SmuzhiyunThis is a directory, with the following entries:
1005*4882a593Smuzhiyun
1006*4882a593Smuzhiyun* ``boot_id``: a UUID generated the first time this is retrieved, and
1007*4882a593Smuzhiyun  unvarying after that;
1008*4882a593Smuzhiyun
1009*4882a593Smuzhiyun* ``uuid``: a UUID generated every time this is retrieved (this can
1010*4882a593Smuzhiyun  thus be used to generate UUIDs at will);
1011*4882a593Smuzhiyun
1012*4882a593Smuzhiyun* ``entropy_avail``: the pool's entropy count, in bits;
1013*4882a593Smuzhiyun
1014*4882a593Smuzhiyun* ``poolsize``: the entropy pool size, in bits;
1015*4882a593Smuzhiyun
1016*4882a593Smuzhiyun* ``urandom_min_reseed_secs``: obsolete (used to determine the minimum
1017*4882a593Smuzhiyun  number of seconds between urandom pool reseeding). This file is
1018*4882a593Smuzhiyun  writable for compatibility purposes, but writing to it has no effect
1019*4882a593Smuzhiyun  on any RNG behavior;
1020*4882a593Smuzhiyun
1021*4882a593Smuzhiyun* ``write_wakeup_threshold``: when the entropy count drops below this
1022*4882a593Smuzhiyun  (as a number of bits), processes waiting to write to ``/dev/random``
1023*4882a593Smuzhiyun  are woken up. This file is writable for compatibility purposes, but
1024*4882a593Smuzhiyun  writing to it has no effect on any RNG behavior.
1025*4882a593Smuzhiyun
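The difference between ``boot_id`` and ``uuid`` is easy to observe by
reading each entry twice. The helper below is a small sketch with a
hypothetical name; it prints nothing if the directory is not available, for
instance in a restricted container.

```shell
# Print one entry of the random directory, or nothing if it is unreadable.
read_random_entry() {
    path=/proc/sys/kernel/random/$1
    [ -r "$path" ] && cat "$path"
    return 0
}

read_random_entry boot_id    # same value on every read
read_random_entry boot_id
read_random_entry uuid       # a fresh UUID on every read
read_random_entry uuid
```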
1026*4882a593Smuzhiyun
1027*4882a593Smuzhiyunrandomize_va_space
1028*4882a593Smuzhiyun==================
1029*4882a593Smuzhiyun
1030*4882a593SmuzhiyunThis option can be used to select the type of process address
1031*4882a593Smuzhiyunspace randomization that is used in the system, for architectures
1032*4882a593Smuzhiyunthat support this feature.
1033*4882a593Smuzhiyun
1034*4882a593Smuzhiyun==  ===========================================================================
1035*4882a593Smuzhiyun0   Turn the process address space randomization off.  This is the
1036*4882a593Smuzhiyun    default for architectures that do not support this feature anyways,
1037*4882a593Smuzhiyun    and kernels that are booted with the "norandmaps" parameter.
1038*4882a593Smuzhiyun
1039*4882a593Smuzhiyun1   Make the addresses of mmap base, stack and VDSO page randomized.
1040*4882a593Smuzhiyun    This, among other things, implies that shared libraries will be
1041*4882a593Smuzhiyun    loaded to random addresses.  Also for PIE-linked binaries, the
1042*4882a593Smuzhiyun    location of code start is randomized.  This is the default if the
1043*4882a593Smuzhiyun    ``CONFIG_COMPAT_BRK`` option is enabled.
1044*4882a593Smuzhiyun
1045*4882a593Smuzhiyun2   Additionally enable heap randomization.  This is the default if
1046*4882a593Smuzhiyun    ``CONFIG_COMPAT_BRK`` is disabled.
1047*4882a593Smuzhiyun
1048*4882a593Smuzhiyun    There are a few legacy applications out there (such as some ancient
1049*4882a593Smuzhiyun    versions of libc.so.5 from 1996) that assume that brk area starts
1050*4882a593Smuzhiyun    just after the end of the code+bss.  These applications break when
1051*4882a593Smuzhiyun    start of the brk area is randomized.  There are however no known
1052*4882a593Smuzhiyun    non-legacy applications that would be broken this way, so for most
1053*4882a593Smuzhiyun    systems it is safe to choose full randomization.
1054*4882a593Smuzhiyun
1055*4882a593Smuzhiyun    Systems with ancient and/or broken binaries should be configured
1056*4882a593Smuzhiyun    with ``CONFIG_COMPAT_BRK`` enabled, which excludes the heap from process
1057*4882a593Smuzhiyun    address space randomization.
1058*4882a593Smuzhiyun==  ===========================================================================
1059*4882a593Smuzhiyun
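A quick way to see which mode is active is to read the file and map the
number to the description in the table. This is a sketch with a hypothetical
helper name; the descriptions are condensed from the table above.

```shell
# Map a randomize_va_space value to a short description.
describe_va_randomization() {
    case $1 in
        0) echo "off" ;;
        1) echo "mmap base, stack and VDSO page" ;;
        2) echo "mmap base, stack, VDSO page and heap" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Report the current mode if the file is readable on this system.
if [ -r /proc/sys/kernel/randomize_va_space ]; then
    describe_va_randomization "$(cat /proc/sys/kernel/randomize_va_space)"
fi
```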
1060*4882a593Smuzhiyun
1061*4882a593Smuzhiyunreal-root-dev
1062*4882a593Smuzhiyun=============
1063*4882a593Smuzhiyun
1064*4882a593SmuzhiyunSee :doc:`/admin-guide/initrd`.
1065*4882a593Smuzhiyun
1066*4882a593Smuzhiyun
1067*4882a593Smuzhiyunreboot-cmd (SPARC only)
1068*4882a593Smuzhiyun=======================
1069*4882a593Smuzhiyun
1070*4882a593Smuzhiyun??? This seems to be a way to give an argument to the Sparc
1071*4882a593SmuzhiyunROM/Flash boot loader. Maybe to tell it what to do after
1072*4882a593Smuzhiyunrebooting. ???
1073*4882a593Smuzhiyun
1074*4882a593Smuzhiyun
1075*4882a593Smuzhiyunsched_energy_aware
1076*4882a593Smuzhiyun==================
1077*4882a593Smuzhiyun
1078*4882a593SmuzhiyunEnables/disables Energy Aware Scheduling (EAS). EAS starts
1079*4882a593Smuzhiyunautomatically on platforms where it can run (that is,
1080*4882a593Smuzhiyunplatforms with asymmetric CPU topologies and having an Energy
1081*4882a593SmuzhiyunModel available). If your platform happens to meet the
1082*4882a593Smuzhiyunrequirements for EAS but you do not want to use it, change
1083*4882a593Smuzhiyunthis value to 0.
1084*4882a593Smuzhiyun
1085*4882a593Smuzhiyun
1086*4882a593Smuzhiyunsched_schedstats
1087*4882a593Smuzhiyun================
1088*4882a593Smuzhiyun
1089*4882a593SmuzhiyunEnables/disables scheduler statistics. Enabling this feature
1090*4882a593Smuzhiyunincurs a small amount of overhead in the scheduler but is
1091*4882a593Smuzhiyunuseful for debugging and performance tuning.
1092*4882a593Smuzhiyun
sched_util_clamp_min
====================
1095*4882a593Smuzhiyun
1096*4882a593SmuzhiyunMax allowed *minimum* utilization.
1097*4882a593Smuzhiyun
1098*4882a593SmuzhiyunDefault value is 1024, which is the maximum possible value.
1099*4882a593Smuzhiyun
1100*4882a593SmuzhiyunIt means that any requested uclamp.min value cannot be greater than
1101*4882a593Smuzhiyunsched_util_clamp_min, i.e., it is restricted to the range
1102*4882a593Smuzhiyun[0:sched_util_clamp_min].
1103*4882a593Smuzhiyun
sched_util_clamp_max
====================
1106*4882a593Smuzhiyun
1107*4882a593SmuzhiyunMax allowed *maximum* utilization.
1108*4882a593Smuzhiyun
1109*4882a593SmuzhiyunDefault value is 1024, which is the maximum possible value.
1110*4882a593Smuzhiyun
1111*4882a593SmuzhiyunIt means that any requested uclamp.max value cannot be greater than
1112*4882a593Smuzhiyunsched_util_clamp_max, i.e., it is restricted to the range
1113*4882a593Smuzhiyun[0:sched_util_clamp_max].
1114*4882a593Smuzhiyun
sched_util_clamp_min_rt_default
===============================
1117*4882a593Smuzhiyun
By default, Linux is tuned for performance, which means that RT tasks always
run at the highest frequency and on the most capable (highest capacity) CPU
(in heterogeneous systems).
1121*4882a593Smuzhiyun
1122*4882a593SmuzhiyunUclamp achieves this by setting the requested uclamp.min of all RT tasks to
1123*4882a593Smuzhiyun1024 by default, which effectively boosts the tasks to run at the highest
1124*4882a593Smuzhiyunfrequency and biases them to run on the biggest CPU.
1125*4882a593Smuzhiyun
1126*4882a593SmuzhiyunThis knob allows admins to change the default behavior when uclamp is being
1127*4882a593Smuzhiyunused. In battery powered devices particularly, running at the maximum
1128*4882a593Smuzhiyuncapacity and frequency will increase energy consumption and shorten the battery
1129*4882a593Smuzhiyunlife.
1130*4882a593Smuzhiyun
This knob only affects RT tasks whose requested uclamp.min value has not been
modified by the user via the sched_setattr() syscall.
1133*4882a593Smuzhiyun
1134*4882a593SmuzhiyunThis knob will not escape the range constraint imposed by sched_util_clamp_min
1135*4882a593Smuzhiyundefined above.
1136*4882a593Smuzhiyun
For example, if::

	sched_util_clamp_min_rt_default = 800
	sched_util_clamp_min = 600

then the boost will be clamped to 600 because 800 is outside of the permissible
range of [0:600]. This could happen, for instance, if a powersave mode
restricts all boosts temporarily by modifying sched_util_clamp_min. As soon as
this restriction is lifted, the requested sched_util_clamp_min_rt_default
will take effect.
1147*4882a593Smuzhiyun
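The clamping in the example above amounts to taking the smaller of the two
values. The sketch below models that arithmetic only; the function name is
hypothetical and nothing is read from or written to the system.

```shell
# Effective default boost for RT tasks: the requested default, limited by
# the upper bound that sched_util_clamp_min imposes.
effective_rt_boost() {
    rt_default=$1
    clamp_min=$2
    if [ "$rt_default" -gt "$clamp_min" ]; then
        echo "$clamp_min"
    else
        echo "$rt_default"
    fi
}

effective_rt_boost 800 600     # restricted: prints 600
effective_rt_boost 800 1024    # restriction lifted: prints 800
```
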
1148*4882a593Smuzhiyunseccomp
1149*4882a593Smuzhiyun=======
1150*4882a593Smuzhiyun
1151*4882a593SmuzhiyunSee :doc:`/userspace-api/seccomp_filter`.
1152*4882a593Smuzhiyun
1153*4882a593Smuzhiyun
1154*4882a593Smuzhiyunsg-big-buff
1155*4882a593Smuzhiyun===========
1156*4882a593Smuzhiyun
This file shows the size of the generic SCSI (sg) buffer.
You can't tune it just yet, but you could change it at
compile time by editing ``include/scsi/sg.h`` and changing
the value of ``SG_BIG_BUFF``.
1161*4882a593Smuzhiyun
1162*4882a593SmuzhiyunThere shouldn't be any reason to change this value. If
1163*4882a593Smuzhiyunyou can come up with one, you probably know what you
1164*4882a593Smuzhiyunare doing anyway :)
1165*4882a593Smuzhiyun
1166*4882a593Smuzhiyun
1167*4882a593Smuzhiyunshmall
1168*4882a593Smuzhiyun======
1169*4882a593Smuzhiyun
This parameter sets the total number of shared memory pages that
can be used system-wide. Hence, ``shmall`` should always be at least
``ceil(shmmax/PAGE_SIZE)``.
1173*4882a593Smuzhiyun
1174*4882a593SmuzhiyunIf you are not sure what the default ``PAGE_SIZE`` is on your Linux
1175*4882a593Smuzhiyunsystem, you can run the following command::
1176*4882a593Smuzhiyun
1177*4882a593Smuzhiyun	# getconf PAGE_SIZE
1178*4882a593Smuzhiyun
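The lower bound ``ceil(shmmax/PAGE_SIZE)`` can be computed with integer
arithmetic in the shell. This is a minimal sketch; the function name is
hypothetical, and the sample sizes (a 1 GiB segment limit, 4096-byte pages)
are illustrative.

```shell
# Smallest shmall (in pages) that can hold one segment of shmmax bytes.
min_shmall() {
    shmmax=$1
    page_size=$2
    echo $(( ($shmmax + $page_size - 1) / $page_size ))
}

min_shmall 1073741824 4096    # exact multiple of the page size: prints 262144
min_shmall 4097 4096          # one byte over a page rounds up: prints 2
```

The page size argument can be taken from ``getconf PAGE_SIZE`` as shown
above instead of being hard-coded.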
1179*4882a593Smuzhiyun
1180*4882a593Smuzhiyunshmmax
1181*4882a593Smuzhiyun======
1182*4882a593Smuzhiyun
1183*4882a593SmuzhiyunThis value can be used to query and set the run time limit
1184*4882a593Smuzhiyunon the maximum shared memory segment size that can be created.
1185*4882a593SmuzhiyunShared memory segments up to 1Gb are now supported in the
1186*4882a593Smuzhiyunkernel.  This value defaults to ``SHMMAX``.


shmmni
======

This value determines the maximum number of shared memory segments.
The default is 4096 (``SHMMNI``).


shm_rmid_forced
===============

Linux lets you set resource limits, including how much memory one
process can consume, via ``setrlimit(2)``.  Unfortunately, shared memory
segments are allowed to exist without association with any process, and
thus might not be counted against any resource limits.  If enabled,
shared memory segments are automatically destroyed when their attach
count becomes zero after a detach or a process termination.  Segments
that were created but never attached to are also destroyed when the
creating process exits.  The only use left for ``IPC_RMID`` is to
immediately destroy an unattached segment.  Of course, this breaks the
way things are defined, so some applications might stop working.  Note
that this feature will do you no good unless you also configure your
resource limits (in particular, ``RLIMIT_AS`` and ``RLIMIT_NPROC``).
Most systems don't need this.

Note that if you change this from 0 to 1, already created segments
without users and whose creating process has died will be destroyed.


sysctl_writes_strict
====================

Control how file position affects the behavior of updating sysctl values
via the ``/proc/sys`` interface:

  ==   ======================================================================
  -1   Legacy per-write sysctl value handling, with no printk warnings.
       Each write syscall must fully contain the sysctl value to be
       written, and multiple writes on the same sysctl file descriptor
       will rewrite the sysctl value, regardless of file position.
   0   Same behavior as above, but warn about processes that perform writes
       to a sysctl file descriptor when the file position is not 0.
   1   (default) Respect file position when writing sysctl strings. Multiple
       writes will append to the sysctl value buffer. Anything past the max
       length of the sysctl value buffer will be ignored. Writes to numeric
       sysctl entries must always be at file position 0 and the value must
       be fully contained in the buffer sent in the write syscall.
  ==   ======================================================================


softlockup_all_cpu_backtrace
============================

This value controls whether the soft lockup detector thread gathers
further debug information when a soft lockup condition is detected.
If enabled, each CPU will be issued an NMI and instructed to capture
a stack trace.

This feature is only applicable for architectures which support
NMI.

= ============================================
0 Do nothing. This is the default behavior.
1 On detection capture more debug information.
= ============================================


softlockup_panic
================

This parameter can be used to control whether the kernel panics
when a soft lockup is detected.

= ============================================
0 Don't panic on soft lockup.
1 Panic on soft lockup.
= ============================================

This can also be set using the ``softlockup_panic`` kernel parameter.


soft_watchdog
=============

This parameter can be used to control the soft lockup detector.

= =================================
0 Disable the soft lockup detector.
1 Enable the soft lockup detector.
= =================================

The soft lockup detector monitors CPUs for threads that are hogging the CPUs
without rescheduling voluntarily, and thus prevent the 'watchdog/N' threads
from running. The mechanism depends on the CPU's ability to respond to timer
interrupts, which are needed for the 'watchdog/N' threads to be woken up by
the watchdog timer function. If a CPU cannot respond to them, the NMI
watchdog (if enabled) can detect a hard lockup condition.


stack_erasing
=============

This parameter can be used to control kernel stack erasing at the end
of syscalls for kernels built with ``CONFIG_GCC_PLUGIN_STACKLEAK``.

That erasing reduces the information which kernel stack leak bugs
can reveal and blocks some uninitialized stack variable attacks.
The tradeoff is the performance impact: on a single-CPU system, kernel
compilation sees a 1% slowdown; other systems and workloads may vary.

= ====================================================================
0 Kernel stack erasing is disabled, STACKLEAK_METRICS are not updated.
1 Kernel stack erasing is enabled (default), it is performed before
  returning to the userspace at the end of syscalls.
= ====================================================================


stop-a (SPARC only)
===================

Controls Stop-A:

= ====================================
0 Stop-A has no effect.
1 Stop-A breaks to the PROM (default).
= ====================================

Stop-A is always enabled on a panic, so that the user can return to
the boot PROM.


sysrq
=====

See :doc:`/admin-guide/sysrq`.


tainted
=======

Non-zero if the kernel has been tainted. Numeric values, which can be
ORed together. The letters are seen in the "Tainted" line of Oops reports.

======  =====  ==============================================================
     1  `(P)`  proprietary module was loaded
     2  `(F)`  module was force loaded
     4  `(S)`  SMP kernel oops on an officially SMP incapable processor
     8  `(R)`  module was force unloaded
    16  `(M)`  processor reported a Machine Check Exception (MCE)
    32  `(B)`  bad page referenced or some unexpected page flags
    64  `(U)`  taint requested by userspace application
   128  `(D)`  kernel died recently, i.e. there was an OOPS or BUG
   256  `(A)`  an ACPI table was overridden by user
   512  `(W)`  kernel issued warning
  1024  `(C)`  staging driver was loaded
  2048  `(I)`  workaround for bug in platform firmware applied
  4096  `(O)`  externally-built ("out-of-tree") module was loaded
  8192  `(E)`  unsigned module was loaded
 16384  `(L)`  soft lockup occurred
 32768  `(K)`  kernel has been live patched
 65536  `(X)`  auxiliary taint, defined for and used by distros
131072  `(T)`  the kernel was built with the struct randomization plugin
======  =====  ==============================================================

See :doc:`/admin-guide/tainted-kernels` for more information.

Note:
  Writes to this sysctl interface will fail with ``EINVAL`` if the kernel is
  booted with the command line option ``panic_on_taint=<bitmask>,nousertaint``
  and any of the ORed together values being written to ``tainted`` match
  the bitmask declared in ``panic_on_taint``.
  See :doc:`/admin-guide/kernel-parameters` for more details on that particular
  kernel command line option and its optional ``nousertaint`` switch.
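
Because the values in the table are single bits that are ORed together,
a reading of ``tainted`` can be split back into its letters with a few
lines of shell. This is a minimal sketch, not an official tool: the
function name ``decode_taint`` is made up for this example, and the
letter order is simply the order of the rows in the table above::

    # Hypothetical helper, not part of any kernel tool.
    # $1: numeric value, e.g. the content of /proc/sys/kernel/tainted
    decode_taint() {
        value=$1
        bit=0
        out=""
        # One letter per bit, lowest bit first (1, 2, 4, ... 131072).
        for letter in P F S R M B U D A W C I O E L K X T; do
            if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
                out="$out$letter"
            fi
            bit=$((bit + 1))
        done
        # Print "none" when no bit is set.
        echo "${out:-none}"
    }

    # 4097 = 4096 + 1: out-of-tree module plus proprietary module.
    decode_taint 4097    # prints "PO"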


threads-max
===========

This value controls the maximum number of threads that can be created
using ``fork()``.

During initialization the kernel sets this value such that even if the
maximum number of threads is created, the thread structures occupy only
a part (1/8th) of the available RAM pages.

The minimum value that can be written to ``threads-max`` is 1.

The maximum value that can be written to ``threads-max`` is given by the
constant ``FUTEX_TID_MASK`` (0x3fffffff).

If a value outside of this range is written to ``threads-max`` an
``EINVAL`` error occurs.
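
The accepted range can be checked before writing a value. The sketch
below only mirrors the limits stated above; the function name
``valid_threads_max`` is made up for this example, and the kernel
remains the authority on what it accepts::

    # Hypothetical helper, not part of any kernel tool.
    # $1: candidate value; succeeds if 1 <= $1 <= FUTEX_TID_MASK
    valid_threads_max() {
        futex_tid_mask=$((0x3fffffff))
        [ "$1" -ge 1 ] && [ "$1" -le "$futex_tid_mask" ]
    }

    valid_threads_max 0 || echo "0 is outside the accepted range"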


traceoff_on_warning
===================

When set, disables tracing (see :doc:`/trace/ftrace`) when a
``WARN()`` is hit.


tracepoint_printk
=================

When tracepoints are sent to printk() (enabled by the ``tp_printk``
boot parameter), this entry provides runtime control::

    echo 0 > /proc/sys/kernel/tracepoint_printk

will stop tracepoints from being sent to printk(), and::

    echo 1 > /proc/sys/kernel/tracepoint_printk

will send them to printk() again.

This only works if the kernel was booted with ``tp_printk`` enabled.

See :doc:`/admin-guide/kernel-parameters` and
:doc:`/trace/boottime-trace`.


.. _unaligned-dump-stack:

unaligned-dump-stack (ia64)
===========================

When logging unaligned accesses, controls whether the stack is
dumped.

= ===================================================
0 Do not dump the stack. This is the default setting.
1 Dump the stack.
= ===================================================

See also `ignore-unaligned-usertrap`_.


unaligned-trap
==============

On architectures where unaligned accesses cause traps, and where this
feature is supported (``CONFIG_SYSCTL_ARCH_UNALIGN_ALLOW``; currently,
``arc`` and ``parisc``), controls whether unaligned traps are caught
and emulated (instead of failing).

= ========================================================
0 Do not emulate unaligned accesses.
1 Emulate unaligned accesses. This is the default setting.
= ========================================================

See also `ignore-unaligned-usertrap`_.


unknown_nmi_panic
=================

The value in this file affects how NMIs are handled. When the
value is non-zero, an unknown NMI is trapped and the kernel panics. At
that time, kernel debugging information is displayed on the console.

For example, the NMI switch found on most IA32 servers raises an
unknown NMI.  If a system hangs, try pressing the NMI switch.


unprivileged_bpf_disabled
=========================

Writing 1 to this entry will disable unprivileged calls to ``bpf()``;
once disabled, calling ``bpf()`` without ``CAP_SYS_ADMIN`` or ``CAP_BPF``
will return ``-EPERM``. Once set to 1, this can't be cleared from the
running kernel anymore.

Writing 2 to this entry will also disable unprivileged calls to ``bpf()``;
however, an admin can still change this setting later on, if needed, by
writing 0 or 1 to this entry.

If ``BPF_UNPRIV_DEFAULT_OFF`` is enabled in the kernel config, then this
entry will default to 2 instead of 0.

= =============================================================
0 Unprivileged calls to ``bpf()`` are enabled
1 Unprivileged calls to ``bpf()`` are disabled without recovery
2 Unprivileged calls to ``bpf()`` are disabled
= =============================================================


watchdog
========

This parameter can be used to disable or enable the soft lockup detector
*and* the NMI watchdog (i.e. the hard lockup detector) at the same time.

= ==============================
0 Disable both lockup detectors.
1 Enable both lockup detectors.
= ==============================

The soft lockup detector and the NMI watchdog can also be disabled or
enabled individually, using the ``soft_watchdog`` and ``nmi_watchdog``
parameters.
If the ``watchdog`` parameter is read, for example by executing::

   cat /proc/sys/kernel/watchdog

the output of this command (0 or 1) shows the logical OR of
``soft_watchdog`` and ``nmi_watchdog``.


watchdog_cpumask
================

This value can be used to control on which CPUs the watchdog may run.
The default cpumask is all possible cores, but if ``NO_HZ_FULL`` is
enabled in the kernel config, and cores are specified with the
``nohz_full=`` boot argument, those cores are excluded by default.
Offline cores can be included in this mask, and if the core is later
brought online, the watchdog will be started based on the mask value.

Typically this value would only be touched in the ``nohz_full`` case
to re-enable cores that by default were not running the watchdog,
if a kernel lockup was suspected on those cores.

The argument value is the standard cpulist format for cpumasks,
so for example to enable the watchdog on cores 0, 2, 3, and 4 you
might say::

  echo 0,2-4 > /proc/sys/kernel/watchdog_cpumask
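
To see which cores a cpulist such as ``0,2-4`` names, it can be expanded
in the shell. This is a minimal sketch that handles only single numbers
and simple ``first-last`` ranges, which is all the example above uses;
the function name ``expand_cpulist`` is made up for this example::

    # Hypothetical helper, not part of any kernel tool.
    # $1: cpulist such as "0,2-4"; prints one CPU number per line
    expand_cpulist() {
        old_ifs=$IFS
        IFS=,
        for range in $1; do
            # For a single number both expansions leave it unchanged.
            first=${range%-*}
            last=${range#*-}
            cpu=$first
            while [ "$cpu" -le "$last" ]; do
                echo "$cpu"
                cpu=$((cpu + 1))
            done
        done
        IFS=$old_ifs
    }

    expand_cpulist 0,2-4    # prints 0, 2, 3 and 4, one per line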


watchdog_thresh
===============

This value can be used to control the frequency of hrtimer and NMI
events and the soft and hard lockup thresholds. The default threshold
is 10 seconds.

The softlockup threshold is (``2 * watchdog_thresh``). Setting this
tunable to zero will disable lockup detection altogether.
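
As a worked example of the formula above, the default
``watchdog_thresh`` of 10 gives a softlockup threshold of 20 seconds.
The function name ``softlockup_threshold`` below is made up for this
sketch::

    # Hypothetical helper, not part of any kernel tool.
    # $1: value of watchdog_thresh in seconds
    softlockup_threshold() {
        echo $(( 2 * $1 ))
    }

    softlockup_threshold 10    # prints 20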