Lines Matching +full:cpu +full:- +full:viewed
9 conventions of cgroup v2. It describes all userland-visible aspects
12 v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgroup-v1>`.
17 1-1. Terminology
18 1-2. What is cgroup?
20 2-1. Mounting
21 2-2. Organizing Processes and Threads
22 2-2-1. Processes
23 2-2-2. Threads
24 2-3. [Un]populated Notification
25 2-4. Controlling Controllers
26 2-4-1. Enabling and Disabling
27 2-4-2. Top-down Constraint
28 2-4-3. No Internal Process Constraint
29 2-5. Delegation
30 2-5-1. Model of Delegation
31 2-5-2. Delegation Containment
32 2-6. Guidelines
33 2-6-1. Organize Once and Control
34 2-6-2. Avoid Name Collisions
36 3-1. Weights
37 3-2. Limits
38 3-3. Protections
39 3-4. Allocations
41 4-1. Format
42 4-2. Conventions
43 4-3. Core Interface Files
45 5-1. CPU
46 5-1-1. CPU Interface Files
47 5-2. Memory
48 5-2-1. Memory Interface Files
49 5-2-2. Usage Guidelines
50 5-2-3. Memory Ownership
51 5-3. IO
52 5-3-1. IO Interface Files
53 5-3-2. Writeback
54 5-3-3. IO Latency
55 5-3-3-1. How IO Latency Throttling Works
56 5-3-3-2. IO Latency Interface Files
57 5-3-4. IO Priority
58 5-4. PID
59 5-4-1. PID Interface Files
60 5-5. Cpuset
61 5-5-1. Cpuset Interface Files
62 5-6. Device
63 5-7. RDMA
64 5-7-1. RDMA Interface Files
65 5-8. HugeTLB
66 5-8-1. HugeTLB Interface Files
67 5-9. Misc
68 5-9-1. perf_event
69 5-N. Non-normative information
70 5-N-1. CPU controller root cgroup process behaviour
71 5-N-2. IO controller root cgroup process behaviour
73 6-1. Basics
74 6-2. The Root and Views
75 6-3. Migration and setns(2)
76 6-4. Interaction with Other Namespaces
78 P-1. Filesystem Support for Writeback
81 R-1. Multiple Hierarchies
82 R-2. Thread Granularity
83 R-3. Competition Between Inner Nodes and Threads
84 R-4. Other Interface Issues
85 R-5. Controller Issues and Remedies
86 R-5-1. Memory
93 -----------
102 ---------------
108 cgroup is largely composed of two parts - the core and controllers.
124 hierarchical - if a controller is enabled on a cgroup, it affects all
126 sub-hierarchy of the cgroup. When a controller is enabled on a nested
136 --------
141 # mount -t cgroup2 none $MOUNT_POINT
151 is no longer referenced in its current hierarchy. Because per-cgroup
158 to inter-controller dependencies, other controllers may need to be
180 ignored on non-init namespace mounts. Please refer to the
190 option is ignored on non-init namespace mounts.
199 behavior but is a mount-option to avoid regressing setups
205 --------------------------------
211 A child cgroup can be created by creating a sub-directory::
216 structure. Each cgroup has a read-writable interface file
218 belong to the cgroup one-per-line. The PIDs are not ordered and the
249 0::/test-cgroup/test-cgroup-nested
256 0::/test-cgroup/test-cgroup-nested (deleted)
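The two entries above follow the "/proc/$PID/cgroup" line layout of three colon-separated fields. For cgroup v2 the hierarchy ID is 0 and the controller field is empty, and a trailing " (deleted)" marks a cgroup that has since been removed. A minimal Python sketch of splitting such a line follows; the function name `parse_cgroup_entry` and the returned tuple shape are illustrative choices, not part of any kernel interface.

```python
def parse_cgroup_entry(line):
    """Split one /proc/$PID/cgroup line into (hierarchy_id, controllers, path, deleted).

    Hypothetical helper for illustration only.
    """
    # Split on the first two colons only; the path itself may contain ':'.
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    suffix = " (deleted)"
    deleted = path.endswith(suffix)
    if deleted:
        path = path[: -len(suffix)]
    return int(hierarchy_id), controllers, path, deleted


if __name__ == "__main__":
    for entry in ("0::/test-cgroup/test-cgroup-nested",
                  "0::/test-cgroup/test-cgroup-nested (deleted)"):
        print(parse_cgroup_entry(entry))
```

Returning the deleted flag separately keeps the path usable for string comparison without the suffix.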
282 constraint - threaded controllers can be enabled on non-leaf cgroups
306 - As the cgroup will join the parent's resource domain. The parent
309 - When the parent is an unthreaded domain, it must not have any domain
313 Topology-wise, a cgroup can be in an invalid state. Please consider
316 A (threaded domain) - B (threaded) - C (domain, just created)
331 threads in the cgroup. Except that the operations are per-thread
332 instead of per-process, "cgroup.threads" has the same format and
354 between threads in a non-leaf cgroup and its child cgroups. Each
359 --------------------------
361 Each non-root cgroup has a "cgroup.events" file which contains
362 "populated" field indicating whether the cgroup's sub-hierarchy has
366 example, to start a clean-up operation after all processes of a given
367 sub-hierarchy have exited. The populated state updates and
368 notifications are recursive. Consider the following sub-hierarchy
372 A(4) - B(0) - C(1)
382 -----------------------
391 cpu io memory
396 # echo "+cpu +memory -io" > cgroup.subtree_control
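The write above packs several operations into one string: each controller name carries a '+' prefix to enable it or a '-' prefix to disable it, and the entries are separated by spaces. A hedged Python sketch of building such a string follows. `subtree_control_ops` is a hypothetical helper name; it only formats text and does not touch cgroupfs, and rejecting a controller named in both sets is a choice made for this sketch.

```python
def subtree_control_ops(enable=(), disable=()):
    """Format a space-separated list of '+name' / '-name' controller operations.

    Hypothetical helper for illustration only; it performs no I/O.
    """
    both = set(enable) & set(disable)
    if both:
        # Refuse ambiguous input rather than guess which operation should win.
        raise ValueError(f"controllers listed in both sets: {sorted(both)}")
    ops = [f"+{name}" for name in enable] + [f"-{name}" for name in disable]
    return " ".join(ops)


if __name__ == "__main__":
    # Rebuilds the string used in the echo command above.
    print(subtree_control_ops(enable=["cpu", "memory"], disable=["io"]))
```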
405 Consider the following sub-hierarchy. The enabled controllers are
408 A(cpu,memory) - B(memory) - C()
411 As A has "cpu" and "memory" enabled, A will control the distribution
412 of CPU cycles and memory to its children, in this case, B. As B has
413 "memory" enabled but not "cpu", C and D will compete freely on CPU
418 files in the child cgroups. In the above example, enabling "cpu" on B
419 would create the "cpu." prefixed controller interface files in C and
422 controller interface files - anything which doesn't start with
426 Top-down Constraint
429 Resources are distributed top-down and a cgroup can further distribute
431 parent. This means that all non-root "cgroup.subtree_control" files
441 Non-root cgroups can distribute domain resources to their children
456 refer to the Non-normative information section in the Controllers
469 ----------
489 delegated, the user can build a sub-hierarchy under the directory,
493 happens in the delegated sub-hierarchy, nothing can escape the
497 cgroups in or nesting depth of a delegated sub-hierarchy; however,
504 A delegated sub-hierarchy is contained in the sense that processes
505 can't be moved into or out of the sub-hierarchy by the delegatee.
508 requiring the following conditions for a process with a non-root euid
512 - The writer must have write access to the "cgroup.procs" file.
514 - The writer must have write access to the "cgroup.procs" file of the
518 processes around freely in the delegated sub-hierarchy it can't pull
519 in from or push out to outside the sub-hierarchy.
525 ~~~~~~~~~~~~~ - C0 - C00
528 ~~~~~~~~~~~~~ - C1 - C10
535 will be denied with -EACCES.
540 is not reachable, the migration is rejected with -ENOENT.
544 ----------
552 inherent trade-offs between migration and various hot paths in terms
558 resource structure once on start-up. Dynamic adjustments to resource
591 -------
597 work-conserving. Due to the dynamic nature, this model is usually
608 "cpu.weight" proportionally distributes CPU cycles to active children
613 ------
616 Limits can be over-committed - the sum of the limits of children can
621 As limits can be over-committed, all configuration combinations are
630 -----------
635 soft boundaries. Protections can also be over-committed in which case
642 As protections can be over-committed, all configuration combinations
646 "memory.low" implements best-effort memory protection and is an
651 -----------
654 resource. Allocations can't be over-committed - the sum of the
661 As allocations can't be over-committed, some configuration
666 "cpu.rt.max" hard-allocates realtime slices and is an example of this
674 ------
679 New-line separated values
687 (when read-only or multiple values can be written at once)
713 -----------
715 - Settings for a single feature should be contained in a single file.
717 - The root cgroup should be exempt from resource control and thus
720 - The default time unit is microseconds. If a different unit is ever
723 - A parts-per quantity should use a percentage decimal with at least
724 two digit fractional part - e.g. 13.40.
726 - If a controller implements weight based resource distribution, its
732 - If a controller implements an absolute resource guarantee and/or
741 - If a setting has a configurable default value and keyed specific
755 # cat cgroup-example-interface-file
761 # echo 125 > cgroup-example-interface-file
765 # echo "default 125" > cgroup-example-interface-file
769 # echo "8:16 170" > cgroup-example-interface-file
773 # echo "8:0 default" > cgroup-example-interface-file
774 # cat cgroup-example-interface-file
778 - For events which are not very high frequency, an interface file
785 --------------------
791 A read-write single value file which exists on non-root
797 - "domain" : A normal valid domain cgroup.
799 - "domain threaded" : A threaded domain cgroup which is
802 - "domain invalid" : A cgroup which is in an invalid state.
806 - "threaded" : A threaded cgroup which is a member of a
813 A read-write new-line separated values file which exists on
817 the cgroup one-per-line. The PIDs are not ordered and the
826 - It must have write access to the "cgroup.procs" file.
828 - It must have write access to the "cgroup.procs" file of the
831 When delegating a sub-hierarchy, write access to this file
839 A read-write new-line separated values file which exists on
843 the cgroup one-per-line. The TIDs are not ordered and the
852 - It must have write access to the "cgroup.threads" file.
854 - The cgroup that the thread is currently in must be in the
857 - It must have write access to the "cgroup.procs" file of the
860 When delegating a sub-hierarchy, write access to this file
864 A read-only space separated values file which exists on all
871 A read-write space separated values file which exists on all
878 Space separated list of controllers prefixed with '+' or '-'
880 name prefixed with '+' enables the controller and '-'
886 A read-only flat-keyed file which exists on non-root cgroups.
898 A read-write single value file. The default is "max".
905 A read-write single value file. The default is "max".
912 A read-only flat-keyed file with the following entries:
930 A read-write single value file which exists on non-root cgroups.
953 create new sub-cgroups.
958 CPU
959 ---
961 The "cpu" controller regulates distribution of CPU cycles. This
970 provided by a CPU, as well as the maximum desired frequency, which should not
971 be exceeded by a CPU.
974 the cpu controller can only be enabled when all RT processes are in
978 before the cpu controller can be enabled.
981 CPU Interface Files
986 cpu.stat
987 A read-only flat-keyed file.
992 - usage_usec
993 - user_usec
994 - system_usec
998 - nr_periods
999 - nr_throttled
1000 - throttled_usec
1002 cpu.weight
1003 A read-write single value file which exists on non-root
1008 cpu.weight.nice
1009 A read-write single value file which exists on non-root
1012 The nice value is in the range [-20, 19].
1015 "cpu.weight" and allows reading and setting weight using the
1020 cpu.max
1021 A read-write two value file which exists on non-root cgroups.
1032 cpu.pressure
1033 A read-only nested-keyed file which exists on non-root cgroups.
1035 Shows pressure stall information for CPU. See
1038 cpu.uclamp.min
1039 A read-write single value file which exists on non-root cgroups.
1051 `cpu.uclamp.max`.
1053 cpu.uclamp.max
1054 A read-write single value file which exists on non-root cgroups.
1067 ------
1075 While not completely water-tight, all major memory usages by a given
1080 - Userland memory - page cache and anonymous memory.
1082 - Kernel data structures such as dentries and inodes.
1084 - TCP socket buffers.
1097 A read-only single value file which exists on non-root
1104 A read-write single value file which exists on non-root
1130 A read-write single value file which exists on non-root
1133 Best-effort memory protection. If the memory usage of a
1153 A read-write single value file which exists on non-root
1165 A read-write single value file which exists on non-root
1174 In the default configuration, regular 0-order allocations always
1179 as -ENOMEM or silently ignore in cases like disk readahead.
1186 A read-write single value file which exists on non-root
1196 Tasks with the OOM protection (oom_score_adj set to -1000)
1204 A read-only flat-keyed file which exists on non-root cgroups.
1218 boundary is over-committed.
1238 considered as an option, e.g. for failed high-order
1251 A read-only flat-keyed file which exists on non-root cgroups.
1254 types of memory, type-specific details, and other information
1263 If the entry has no per-node counter (or is not shown in
1264 memory.numa_stat), we use 'npn' (non-per-node) as the tag
1279 Amount of memory used for storing per-cpu kernel
1286 Amount of cached filesystem data that is swap-backed,
1305 Amount of memory, swap-backed and filesystem-backed,
1311 the value for the foo counter, since the foo counter is type-based, not
1312 list-based.
1323 Amount of memory used for storing in-kernel data
1388 A read-only nested-keyed file which exists on non-root cgroups.
1391 types of memory, type-specific details, and other information
1398 application's CPU allocation.
1413 A read-only single value file which exists on non-root
1420 A read-write single value file which exists on non-root
1425 allow userspace to implement custom out-of-memory procedures.
1436 A read-write single value file which exists on non-root
1443 A read-only flat-keyed file which exists on non-root cgroups.
1459 because of running out of swap system-wide or max
1468 A read-only nested-keyed file which exists on non-root cgroups.
1478 Over-committing on high limit (sum of high limits > available memory)
1492 pressure - how much the workload is being impacted due to lack of
1493 memory - is necessary to determine whether a workload needs more
1507 To which cgroup the area will be charged is non-deterministic; however,
1518 --
1523 only if cfq-iosched is in use and neither scheme is available for
1524 blk-mq devices.
1531 A read-only nested-keyed file.
1551 A read-write nested-keyed file which exists only on the root
1563 enable Weight-based control enable
1595 devices which show wide temporary behavior changes - e.g. a
1606 A read-write nested-keyed file which exists only on the root
1619 model The cost model in use - "linear"
1645 generate device-specific coefficients.
1648 A read-write flat-keyed file which exists on non-root cgroups.
1668 A read-write nested-keyed file which exists on non-root
1682 When writing, any number of nested key-value pairs can be
1707 A read-only nested-keyed file which exists on non-root cgroups.
1726 writes out dirty pages for the memory domain. Both system-wide and
1727 per-cgroup dirty memory states are examined and the more restrictive
1765 memory controller and system-wide clean memory.
1798 your real setting, setting at 10-15% higher than the value in io.stat.
1808 - Queue depth throttling. This is the number of outstanding IO's a group is
1812 - Artificial delay induction. There are certain types of IO that cannot be
1859 no-change
1862 none-to-rt
1867 restrict-to-be
1878 +-------------+---+
1879 | no-change | 0 |
1880 +-------------+---+
1881 | none-to-rt | 1 |
1882 +-------------+---+
1883 | rt-to-be | 2 |
1884 +-------------+---+
1885 | all-to-idle | 3 |
1886 +-------------+---+
1890 +-------------------------------+---+
1892 +-------------------------------+---+
1893 | IOPRIO_CLASS_RT (real-time) | 1 |
1894 +-------------------------------+---+
1896 +-------------------------------+---+
1898 +-------------------------------+---+
1902 - Translate the I/O priority class policy into a number.
1903 - Change the request I/O priority class into the maximum of the I/O priority
1907 ---
1926 A read-write single value file which exists on non-root
1932 A read-only single value file which exists on all cgroups.
1942 through fork() or clone(). These will return -EAGAIN if the creation
1947 ------
1950 the CPU and memory node placement of tasks to only the resources
1954 memory placement to reduce cross-node memory access and contention
1965 A read-write multiple values file which exists on non-root
1966 cpuset-enabled cgroups.
1973 The CPU numbers are comma-separated numbers or ranges.
1977 0-4,6,8-10
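A list in this form can be expanded into individual CPU numbers by splitting on commas and treating each "first-last" item as an inclusive range. The Python sketch below illustrates that reading; `parse_cpu_list` is a hypothetical helper name, and rejecting descending ranges is an assumption of this sketch rather than documented kernel behavior.

```python
def parse_cpu_list(text):
    """Expand a list such as "0-4,6,8-10" into a sorted list of CPU numbers.

    Hypothetical helper for illustration only.
    """
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty string (no CPUs listed)
        if "-" in part:
            first, last = part.split("-", 1)
            first, last = int(first), int(last)
            if first > last:
                # Assumption of this sketch: descending ranges are errors.
                raise ValueError(f"descending range: {part!r}")
            cpus.update(range(first, last + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return sorted(cpus)


if __name__ == "__main__":
    print(parse_cpu_list("0-4,6,8-10"))
```

The same parsing applies to the memory node lists, which use the identical comma-and-range notation.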
1980 setting as the nearest cgroup ancestor with a non-empty
1984 and won't be affected by any CPU hotplug events.
1987 A read-only multiple values file which exists on all
1988 cpuset-enabled cgroups.
2001 Its value will be affected by CPU hotplug events.
2004 A read-write multiple values file which exists on non-root
2005 cpuset-enabled cgroups.
2012 The memory node numbers are comma-separated numbers or ranges.
2016 0-1,3
2019 setting as the nearest cgroup ancestor with a non-empty
2027 A read-only multiple values file which exists on all
2028 cpuset-enabled cgroups.
2043 A read-write single value file which exists on non-root
2044 cpuset-enabled cgroups. This flag is owned by the parent cgroup
2049 "root" - a partition root
2050 "member" - a non-root member of a partition
2077 child partitions. There must be at least one CPU left in the
2087 "cpuset.cpus" or CPU hotplug can cause the state of the partition
2091 "member" Non-root member of a partition
2096 above are true and at least one CPU from "cpuset.cpus" is
2104 The CPU affinity of all the tasks in the cgroup will then be
2109 can now be granted by its parent. In this case, the CPU
2117 -----------------
2128 the attempt will succeed or fail with -EPERM.
2133 If the program returns 0, the attempt fails with -EPERM, otherwise
2141 ----
2150 A read-write nested-keyed file that exists for all the cgroups
2171 A read-only file that describes current resource usage.
2180 -------
2197 A read-only flat-keyed file which exists on non-root cgroups.
2208 ----
2219 Non-normative information
2220 -------------------------
2226 CPU controller root cgroup process behaviour
2229 When distributing CPU cycles in the root cgroup each thread in this
2236 appropriately so the neutral - nice 0 - value is 100 instead of 1024).
2252 ------
2271 The path '/batchjobs/container_id1' can be considered as system-data
2276 # ls -l /proc/self/ns/cgroup
2277 lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
2283 # ls -l /proc/self/ns/cgroup
2284 lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
2288 When some thread from a multi-threaded process unshares its cgroup
2300 ------------------
2311 # ~/unshare -c # unshare cgroupns in some cgroup
2319 Each process gets its namespace-specific view of "/proc/$PID/cgroup"
2350 ----------------------
2379 ---------------------------------
2382 running inside a non-init cgroup namespace::
2384 # mount -t cgroup2 none $MOUNT_POINT
2391 the view of cgroup hierarchy by namespace-private cgroupfs mount
2404 --------------------------------
2407 address_space_operations->writepage[s]() to annotate bio's using the
2424 super_block by setting SB_I_CGROUPWB in ->s_iflags. This allows for
2441 - Multiple hierarchies including named ones are not supported.
2443 - None of the v1 mount options are supported.
2445 - The "tasks" file is removed and "cgroup.procs" is not sorted.
2447 - "cgroup.clone_children" is removed.
2449 - /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file
2457 --------------------
2475 as the cpu and cpuacct controllers, made sense to be put on the same
2503 be collapsed from leaf towards root when viewed from specific
2506 to control how CPU cycles are distributed.
2510 ------------------
2518 Generally, in-process knowledge is available only to the process
2519 itself; thus, unlike service-level organization of processes,
2526 sub-hierarchies and control resource distributions along them. This
2527 effectively raised cgroup to the status of a syscall-like API exposed
2537 that the process would actually be operating on its own sub-hierarchy.
2541 system-management pseudo filesystem. cgroup ended up with interface
2544 individual applications through the ill-defined delegation mechanism
2554 -------------------------------------------
2562 The cpu controller considered threads and cgroups as equivalents and
2564 fell flat when children wanted to be allocated specific ratios of CPU
2565 cycles and the number of internal threads fluctuated - the ratios
2581 clearly defined. There were attempts to add ad-hoc behaviors and
2595 ----------------------
2599 was how an empty cgroup was notified - a userland helper binary was
2602 to in-kernel event delivery filtering mechanism further complicating
2624 ------------------------------
2631 global reclaim prefers is opt-in, rather than opt-out. The costs for
2641 becomes self-defeating.
2643 The memory.low boundary on the other hand is a top-down allocated
2681 new limit is met - or the task writing to memory.max is killed.
2690 groups can sabotage swapping by other means - such as referencing its
2691 anonymous memory in a tight loop - and an admin cannot assume full