.. SPDX-License-Identifier: GPL-2.0

====
XFRM
====

The sync patch work is based on initial patches from
Krisztian <hidden@balabit.hu> and others, and on additional patches
from Jamal <hadi@cyberus.ca>.

The end goal for syncing is to be able to insert attributes and generate
events so that the SA can be safely moved from one machine to another
for HA purposes.
The idea is to synchronize the SA so that the takeover machine can
process the SA as accurately as possible if it has access to it.

We already have the ability to generate SA add/del/upd events.
These patches add the ability to sync and keep accurate the lifetime
byte counter (to ensure proper decay of SAs) and the replay counters (to
avoid replay attacks), with as little loss as possible at failover time.
This way a backup stays as closely up to date as an active member.

Because the above items change for every packet the SA receives,
it is possible for a lot of events to be generated.
For this reason, we also add a Nagle-like algorithm to restrict
the events, i.e. we are going to set thresholds to say "let me
know if the replay sequence threshold is reached or 10 secs have passed".
These thresholds are set system-wide via sysctls or can be updated
per SA.
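
As a rough illustration of the throttling rule described above, the
following sketch models the decision in plain C. It is not the kernel
implementation: the structure ``aevent_throttle`` and the function
``aevent_due()`` are hypothetical names, and the caller is assumed to
supply the current time in milliseconds.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical per-SA throttle state; not a kernel structure. */
    struct aevent_throttle {
        uint32_t last_seq;      /* replay sequence reported in the last event */
        uint64_t last_event_ms; /* time of the last event, in milliseconds */
        uint32_t seq_thresh;    /* packets allowed between events */
        uint64_t etime_ms;      /* time allowed between events */
    };

    /*
     * Return true, and record the new state, if an event should be emitted:
     * either the sequence advanced by at least seq_thresh since the last
     * event, or at least etime_ms elapsed since the last event.
     */
    static bool aevent_due(struct aevent_throttle *t, uint32_t seq,
                           uint64_t now_ms)
    {
        bool seq_hit = (uint32_t)(seq - t->last_seq) >= t->seq_thresh;
        bool time_hit = now_ms - t->last_event_ms >= t->etime_ms;

        if (!seq_hit && !time_hit)
            return false;

        t->last_seq = seq;
        t->last_event_ms = now_ms;
        return true;
    }

With ``seq_thresh = 2`` and ``etime_ms = 1000``, a call whose sequence is
two ahead of the last report returns true, as does any call made at least
one second after the last report.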

The identified items that need to be synchronized are:

- the lifetime byte counter. Note that the lifetime time limit is not
  important if you assume the failover machine is known ahead of time,
  since the decay of the time countdown is not driven by packet arrival.
- the replay sequence for both inbound and outbound

1) Message Structure
--------------------

nlmsghdr:aevent_id:optional-TLVs.

The netlink message types are:

XFRM_MSG_NEWAE and XFRM_MSG_GETAE.

An XFRM_MSG_GETAE does not have TLVs.

An XFRM_MSG_NEWAE will have at least two TLVs (as is
discussed further below).

The aevent_id structure looks like::

   struct xfrm_aevent_id {
           struct xfrm_usersa_id           sa_id;
           xfrm_address_t                  saddr;
           __u32                           flags;
           __u32                           reqid;
   };

The unique SA is identified by the combination of xfrm_usersa_id,
reqid and saddr.
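
To make the identification rule above concrete, here is a minimal sketch
of a comparison helper. ``aevent_same_sa()`` is a hypothetical name, not
a kernel or library function, and the sketch assumes that unused bytes of
each address union have been zeroed so that a full ``memcmp()`` is valid.

.. code-block:: c

    #include <stdbool.h>
    #include <string.h>
    #include <linux/xfrm.h>

    /*
     * Hypothetical helper: do two aevent IDs name the same SA?
     * Compares the xfrm_usersa_id fields, reqid and saddr; flags are
     * deliberately ignored because they are not part of the identity.
     */
    static bool aevent_same_sa(const struct xfrm_aevent_id *a,
                               const struct xfrm_aevent_id *b)
    {
        return a->sa_id.spi == b->sa_id.spi &&
               a->sa_id.family == b->sa_id.family &&
               a->sa_id.proto == b->sa_id.proto &&
               a->reqid == b->reqid &&
               memcmp(&a->sa_id.daddr, &b->sa_id.daddr,
                      sizeof(a->sa_id.daddr)) == 0 &&
               memcmp(&a->saddr, &b->saddr, sizeof(a->saddr)) == 0;
    }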

Flags are used to indicate different things. The possible
flags are::

   XFRM_AE_RTHR=1, /* replay threshold */
   XFRM_AE_RVAL=2, /* replay value */
   XFRM_AE_LVAL=4, /* lifetime value */
   XFRM_AE_ETHR=8, /* expiry timer threshold */
   XFRM_AE_CR=16, /* Event cause is replay update */
   XFRM_AE_CE=32, /* Event cause is timer expiry */
   XFRM_AE_CU=64, /* Event cause is policy update */

How these flags are used depends on the direction of the
message (kernel<->user) as well as the cause (config, query or event).
This is described below for the different messages.
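
A listener will typically want to know which cause bit an event carries.
The sketch below shows one way to decode it; ``aevent_cause()`` is a
hypothetical helper and the returned strings are illustrative only.

.. code-block:: c

    #include <linux/xfrm.h>

    /* Hypothetical helper: name the cause bit carried in an event's flags. */
    static const char *aevent_cause(__u32 flags)
    {
        if (flags & XFRM_AE_CR)
            return "replay update";
        if (flags & XFRM_AE_CE)
            return "timer expiry";
        if (flags & XFRM_AE_CU)
            return "update via XFRM_MSG_NEWAE";
        return "none";
    }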

The pid will be set appropriately in netlink to recognize direction
(0 when going to the kernel, and pid = processid that created the event
when going from kernel to user space).

A program needs to subscribe to multicast group XFRMNLGRP_AEVENTS
to get notified of these events.
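
A minimal sketch of such a subscription from user space follows.
``aevent_subscribe()`` is a hypothetical name. The sketch assumes a
standard Linux libc, and joining the group typically needs CAP_NET_ADMIN,
so the call may fail for unprivileged processes.

.. code-block:: c

    #include <sys/socket.h>
    #include <unistd.h>
    #include <linux/netlink.h>
    #include <linux/xfrm.h>

    #ifndef SOL_NETLINK
    #define SOL_NETLINK 270 /* fallback for older libc headers */
    #endif

    /*
     * Open a NETLINK_XFRM socket and join XFRMNLGRP_AEVENTS.
     * Returns the socket descriptor, or -1 on failure.
     */
    static int aevent_subscribe(void)
    {
        struct sockaddr_nl addr = { .nl_family = AF_NETLINK };
        int group = XFRMNLGRP_AEVENTS;
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_XFRM);

        if (fd < 0)
            return -1;

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
            setsockopt(fd, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP,
                       &group, sizeof(group)) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }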

2) TLVs reflect the different parameters:
-----------------------------------------

a) byte value (XFRMA_LTIME_VAL)

This TLV carries the running/current counter for byte lifetime since
last event.

b) replay value (XFRMA_REPLAY_VAL)

This TLV carries the running/current counter for replay sequence since
last event.

c) replay threshold (XFRMA_REPLAY_THRESH)

This TLV carries the threshold being used by the kernel to trigger events
when the replay sequence threshold is exceeded.

d) expiry timer (XFRMA_ETIMER_THRESH)

This is a timer value in milliseconds which is used as the nagle
value to rate limit the events.
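
The TLVs listed above can be located in a received message by walking
the attribute area with the standard ``RTA_OK()`` and ``RTA_NEXT()``
macros. In this sketch ``aevent_find_tlv()`` is a hypothetical helper,
and the caller is assumed to pass a pointer to the first attribute and
the number of attribute bytes that follow the aevent_id.

.. code-block:: c

    #include <stddef.h>
    #include <linux/rtnetlink.h>
    #include <linux/xfrm.h>

    /*
     * Walk a TLV area of 'len' bytes and return the first attribute of
     * the given type (for example XFRMA_REPLAY_THRESH), or NULL if the
     * attribute is absent.
     */
    static struct rtattr *aevent_find_tlv(void *buf, int len,
                                          unsigned short type)
    {
        struct rtattr *rta;

        for (rta = buf; RTA_OK(rta, len); rta = RTA_NEXT(rta, len))
            if (rta->rta_type == type)
                return rta;
        return NULL;
    }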

3) Default configurations for the parameters:
---------------------------------------------

By default these events should be turned off unless there is
at least one listener registered to listen to the multicast
group XFRMNLGRP_AEVENTS.

Programs installing SAs will need to specify the two thresholds. However,
in order not to change existing applications such as racoon,
we also provide default threshold values for these parameters
in case they are not specified.

The two sysctls/proc entries are:

a) /proc/sys/net/core/xfrm_aevent_etime
   used to provide default values for the XFRMA_ETIMER_THRESH in incremental
   units of time of 100ms. The default is 10 (1 second).

b) /proc/sys/net/core/xfrm_aevent_rseqth
   used to provide default values for the XFRMA_REPLAY_THRESH parameter
   in incremental packet count. The default is two packets.
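
As a small worked example of the units described above, the etime
default of 10 corresponds to 10 x 100 ms = 1000 ms. The sketch below
shows the conversion together with a generic reader for a numeric proc
entry. Both function names are hypothetical, and the path is left to the
caller.

.. code-block:: c

    #include <stdio.h>

    /* Hypothetical helper: convert an etime value (100 ms units) to ms. */
    static unsigned int aevent_etime_to_ms(unsigned int etime_units)
    {
        return etime_units * 100u;
    }

    /*
     * Hypothetical helper: read one unsigned value from a proc entry.
     * Returns 0 on success and -1 if the file cannot be opened or parsed.
     */
    static int aevent_read_sysctl(const char *path, unsigned int *out)
    {
        FILE *f = fopen(path, "r");
        int ok;

        if (!f)
            return -1;
        ok = fscanf(f, "%u", out) == 1;
        fclose(f);
        return ok ? 0 : -1;
    }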

4) Message types
----------------

a) XFRM_MSG_GETAE issued by user-->kernel.
   XFRM_MSG_GETAE does not carry any TLVs.

The response is an XFRM_MSG_NEWAE which is formatted based on what
XFRM_MSG_GETAE queried for.

The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.

* if XFRM_AE_RTHR flag is set, then XFRMA_REPLAY_THRESH is also retrieved
* if XFRM_AE_ETHR flag is set, then XFRMA_ETIMER_THRESH is also retrieved
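
A query that asks for both threshold TLVs could be laid out as in the
following sketch. The structure ``aevent_get_req`` and the function
``aevent_build_getae()`` are hypothetical names. Sending the buffer over
a NETLINK_XFRM socket and parsing the reply are left out.

.. code-block:: c

    #include <string.h>
    #include <linux/netlink.h>
    #include <linux/xfrm.h>

    /* Hypothetical request layout: netlink header, then the aevent ID. */
    struct aevent_get_req {
        struct nlmsghdr       nlh;
        struct xfrm_aevent_id id;
    };

    /*
     * Fill 'req' as an XFRM_MSG_GETAE query for the SA described by 'sa',
     * requesting both threshold TLVs in the reply. No TLVs are attached.
     */
    static void aevent_build_getae(struct aevent_get_req *req,
                                   const struct xfrm_aevent_id *sa,
                                   __u32 seq)
    {
        memset(req, 0, sizeof(*req));
        req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(req->id));
        req->nlh.nlmsg_type = XFRM_MSG_GETAE;
        req->nlh.nlmsg_flags = NLM_F_REQUEST;
        req->nlh.nlmsg_seq = seq;
        req->nlh.nlmsg_pid = 0; /* addressed to the kernel */

        req->id = *sa;
        req->id.flags = XFRM_AE_RTHR | XFRM_AE_ETHR;
    }

Leaving ``flags`` at zero instead would, per the rules above, yield a
reply carrying only the XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.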

b) XFRM_MSG_NEWAE is issued by either user space to configure
   or kernel to announce events or respond to an XFRM_MSG_GETAE.

i) user --> kernel to configure a specific SA.

Any of the values or threshold parameters can be updated by passing the
appropriate TLV.

A response is issued back to the sender in user space to indicate success
or failure.

In the case of success, an event with
XFRM_MSG_NEWAE is also issued to any listeners as described in iii).
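
The configuration case can be sketched as a message carrying a single
XFRMA_REPLAY_THRESH TLV. ``aevent_set_req`` and
``aevent_build_set_rthresh()`` are hypothetical names. Setting
NLM_F_REPLACE is an assumption about what the kernel handler expects and
should be checked against the kernel in use.

.. code-block:: c

    #include <string.h>
    #include <linux/netlink.h>
    #include <linux/rtnetlink.h>
    #include <linux/xfrm.h>

    /* Hypothetical buffer: header, aevent ID, room for one __u32 TLV. */
    struct aevent_set_req {
        struct nlmsghdr       nlh;
        struct xfrm_aevent_id id;
        char                  tlv[RTA_SPACE(sizeof(__u32))];
    };

    /* Build an XFRM_MSG_NEWAE that sets the replay threshold of 'sa'. */
    static void aevent_build_set_rthresh(struct aevent_set_req *req,
                                         const struct xfrm_aevent_id *sa,
                                         __u32 thresh)
    {
        struct rtattr rta;

        memset(req, 0, sizeof(*req));
        req->nlh.nlmsg_type = XFRM_MSG_NEWAE;
        /* NLM_F_REPLACE: assumed to be required by the kernel handler. */
        req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_REPLACE;
        req->id = *sa;

        /* Attribute header followed by its 4-byte payload. */
        rta.rta_type = XFRMA_REPLAY_THRESH;
        rta.rta_len = RTA_LENGTH(sizeof(thresh));
        memcpy(req->tlv, &rta, sizeof(rta));
        memcpy(req->tlv + RTA_LENGTH(0), &thresh, sizeof(thresh));

        req->nlh.nlmsg_len = NLMSG_ALIGN(NLMSG_LENGTH(sizeof(req->id))) +
                             rta.rta_len;
    }

Additional TLVs, such as XFRMA_ETIMER_THRESH, would be appended in the
same way, each aligned with ``RTA_ALIGN()``, with ``nlmsg_len`` grown
accordingly.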

ii) kernel->user direction as a response to XFRM_MSG_GETAE

The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.

The threshold TLVs will be included if explicitly requested in
the XFRM_MSG_GETAE message.

iii) kernel->user to report as event if someone sets any values or
     thresholds for an SA using XFRM_MSG_NEWAE (as described in #i above).
     In such a case XFRM_AE_CU flag is set to inform the user that
     the change happened as a result of an update.
     The message will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.

iv) kernel->user to report event when replay threshold or a timeout
    is exceeded.

In such a case either XFRM_AE_CR (replay exceeded) or XFRM_AE_CE (timeout
happened) is set to inform the user what happened.
Note the two flags are mutually exclusive.
The message will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.

Exceptions to threshold settings
--------------------------------

If you have an SA that is getting hit by traffic in bursts such that
there is a period where the timer threshold expires with no packets
seen, then an odd behavior is seen as follows:
The first packet arrival after a timer expiry will trigger a timeout
event, i.e. we don't wait for a timeout period or a packet threshold
to be reached. This is done for simplicity and efficiency reasons.

-JHS