==============
Cgroup Freezer
==============

The cgroup freezer is useful to batch job management systems, which start
and stop sets of tasks in order to schedule the resources of a machine
according to the desires of a system administrator. This sort of program
is often used on HPC clusters to schedule access to the cluster as a
whole. The cgroup freezer uses cgroups to describe the set of tasks to
be started/stopped by the batch job management system. It also provides
a means to start and stop the tasks composing the job.

The cgroup freezer is also useful for checkpointing running groups
of tasks. The freezer allows the checkpoint code to obtain a consistent
image of the tasks by attempting to force the tasks in a cgroup into a
quiescent state. Once the tasks are quiescent, another task can
walk /proc or invoke a kernel interface to gather information about the
quiesced tasks. Checkpointed tasks can be restarted later should a
recoverable error occur. This also allows the checkpointed tasks to be
migrated between nodes in a cluster by copying the gathered information
to another node and restarting the tasks there.

Sequences of SIGSTOP and SIGCONT are not always sufficient for stopping
and resuming tasks in userspace. Both of these signals are observable
from within the tasks we wish to freeze. While SIGSTOP cannot be caught,
blocked, or ignored, it can be seen by waiting or ptracing parent tasks.
SIGCONT is especially unsuitable since it can be caught by the task. Any
programs designed to watch for SIGSTOP and SIGCONT could be broken by
attempting to use SIGSTOP and SIGCONT to stop and resume tasks. We can
demonstrate this problem using nested bash shells::

	$ echo $$
	16644
	$ bash
	$ echo $$
	16690

	From a second, unrelated bash shell:
	$ kill -SIGSTOP 16690
	$ kill -SIGCONT 16690

	<at this point 16690 exits and causes 16644 to exit too>

This happens because bash can observe both signals and choose how it
responds to them.

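That SIGCONT is visible to the receiving task can be checked with a small
sketch. The function name ``sigcont_is_observable`` is made up for this
illustration, and the sketch assumes a POSIX ``sh`` that provides ``trap``
and a ``kill`` builtin. It only shows that a handler can run on SIGCONT; it
does not reproduce the nested bash behaviour described above.

```shell
# Hypothetical helper, not part of any freezer interface: show that a task
# can install a handler for SIGCONT and therefore notice being resumed.
sigcont_is_observable() {
    # Run in a child shell so the trap does not leak into the caller.
    sh -c '
        trap "echo observed SIGCONT" CONT
        # Deliver SIGCONT to this child shell itself.
        kill -CONT $$
        echo finished
    '
}

sigcont_is_observable
```

A task frozen and thawed through the cgroup freezer receives no such signal,
so a handler like the one above would not run.
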
Another example of a program which catches and responds to these
signals is gdb. In fact, any program designed to use ptrace is likely to
have a problem with this method of stopping and resuming tasks.

In contrast, the cgroup freezer uses the kernel freezer code to
prevent the freeze/unfreeze cycle from becoming visible to the tasks
being frozen. This allows the bash example above and gdb to run as
expected.

The cgroup freezer is hierarchical. Freezing a cgroup freezes all
tasks belonging to the cgroup and all its descendant cgroups. Each
cgroup has its own state (self-state) and the state inherited from the
parent (parent-state). The cgroup is THAWED if and only if both states
are THAWED.

The following cgroupfs files are created by the cgroup freezer:

* freezer.state: Read-write.

  When read, returns the effective state of the cgroup - "THAWED",
  "FREEZING" or "FROZEN". This is the combined self and parent-states.
  If either state is freezing, the cgroup is freezing (FREEZING or FROZEN).

  A FREEZING cgroup transitions into the FROZEN state when all tasks
  belonging to the cgroup and its descendants become frozen. Note that
  a cgroup reverts to FREEZING from FROZEN after a new task is added
  to the cgroup or one of its descendant cgroups until the new task is
  frozen.

  When written, sets the self-state of the cgroup. Two values are
  allowed - "FROZEN" and "THAWED". If FROZEN is written, the cgroup,
  if not already freezing, enters the FREEZING state along with all its
  descendant cgroups.

  If THAWED is written, the self-state of the cgroup is changed to
  THAWED.  Note that the effective state may not change to THAWED if
  the parent-state is still freezing. If a cgroup's effective state
  becomes THAWED, all its descendants which are freezing because of
  the cgroup also leave the freezing state.

* freezer.self_freezing: Read only.

  Shows the self-state. 0 if the self-state is THAWED; otherwise, 1.
  This value is 1 iff the last write to freezer.state was "FROZEN".

* freezer.parent_freezing: Read only.

  Shows the parent-state.  0 if none of the cgroup's ancestors is
  frozen; otherwise, 1.

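The two read-only files make it possible to tell why a cgroup is freezing.
The following is a minimal sketch, assuming the cgroup directory is passed
as the first argument; the function name ``freezing_reason`` and its output
strings are invented for this illustration and are not part of the kernel
interface.

```shell
# Hypothetical helper: explain why a v1 freezer cgroup is (or is not) freezing.
# $1 is the cgroup directory, for example /sys/fs/cgroup/freezer/0.
freezing_reason() {
    fr_dir=$1
    fr_self=$(cat "$fr_dir/freezer.self_freezing") || return 1
    fr_parent=$(cat "$fr_dir/freezer.parent_freezing") || return 1
    # Concatenate the two flags and map each combination to a description.
    case "$fr_self$fr_parent" in
        00) echo "thawed" ;;
        10) echo "self" ;;
        01) echo "parent" ;;
        11) echo "self and parent" ;;
        *)  echo "unexpected values: $fr_self $fr_parent" >&2; return 1 ;;
    esac
}
```

A result of "parent" means that writing THAWED to this cgroup's own
freezer.state would not thaw it, because an ancestor is still freezing.
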
The root cgroup is non-freezable, so the above interface files do not
exist in it.

* Examples of usage::

   # mkdir /sys/fs/cgroup/freezer
   # mount -t cgroup -ofreezer freezer /sys/fs/cgroup/freezer
   # mkdir /sys/fs/cgroup/freezer/0
   # echo $some_pid > /sys/fs/cgroup/freezer/0/tasks

To get the status of the freezer subsystem::

   # cat /sys/fs/cgroup/freezer/0/freezer.state
   THAWED

To freeze all tasks in the container::

   # echo FROZEN > /sys/fs/cgroup/freezer/0/freezer.state
   # cat /sys/fs/cgroup/freezer/0/freezer.state
   FREEZING
   # cat /sys/fs/cgroup/freezer/0/freezer.state
   FROZEN

To unfreeze all tasks in the container::

   # echo THAWED > /sys/fs/cgroup/freezer/0/freezer.state
   # cat /sys/fs/cgroup/freezer/0/freezer.state
   THAWED

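Because a read of freezer.state may return FREEZING before it returns
FROZEN, a script will usually want to poll rather than read once. The sketch
below is one possible way to do that. The function names ``wait_for_state``
and ``freeze_cgroup``, the default of 50 polls, and the 0.1 second interval
are all assumptions made for this illustration, and fractional ``sleep``
arguments are not guaranteed by POSIX.

```shell
# Hypothetical helper: poll freezer.state until it reads the wanted value.
# $1 = cgroup directory, $2 = wanted state, $3 = maximum polls (default 50).
wait_for_state() {
    ws_dir=$1
    ws_want=$2
    ws_tries=${3:-50}
    while [ "$ws_tries" -gt 0 ]; do
        [ "$(cat "$ws_dir/freezer.state")" = "$ws_want" ] && return 0
        ws_tries=$((ws_tries - 1))
        sleep 0.1
    done
    # The wanted state was not reached within the allowed number of polls.
    return 1
}

# Hypothetical helper: request a freeze, then wait for it to complete.
# $1 = cgroup directory, $2 = maximum polls (optional).
freeze_cgroup() {
    echo FROZEN > "$1/freezer.state" || return 1
    wait_for_state "$1" FROZEN "${2:-50}"
}
```

With the cgroup created in the examples above, a call such as
``freeze_cgroup /sys/fs/cgroup/freezer/0`` would return 0 once the state
reads FROZEN and non-zero if the cgroup is still FREEZING when the polls
run out.
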
This is the basic mechanism, which should do the right thing for user space
tasks in a simple scenario.