.. SPDX-License-Identifier: GPL-2.0

====
XFRM
====

The sync patches work is based on initial patches from
Krisztian <hidden@balabit.hu> and others and additional patches
from Jamal <hadi@cyberus.ca>.

The end goal for syncing is to be able to insert attributes and generate
events so that the SA can be safely moved from one machine to another
for HA purposes.
The idea is to synchronize the SA so that the takeover machine can
process the SA as accurately as possible if it has access to it.

We already have the ability to generate SA add/del/upd events.
These patches add the ability to sync and keep accurate the lifetime byte
counter (to ensure proper decay of SAs) and the replay counters (to avoid
replay attacks), with minimal loss at failover time.
This way a backup stays as closely up-to-date as an active member.

Because the above items change for every packet the SA receives,
it is possible for a lot of events to be generated.
For this reason, we also add a nagle-like algorithm to restrict
the events, i.e. we are going to set thresholds to say "let me
know if the replay sequence threshold is reached or 10 secs have passed".
These thresholds are set system-wide via sysctls or can be updated
per SA.
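
The "threshold reached or time passed" rule above can be illustrated with
a minimal sketch. This is not the kernel's implementation: the structure
``aevent_throttle``, its fields, and the function ``aevent_due`` are
hypothetical names chosen for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-SA throttle state; not a kernel structure. */
struct aevent_throttle {
	uint32_t last_seq;   /* replay sequence reported in the last event */
	uint64_t last_ms;    /* time of the last event, in milliseconds */
	uint32_t seq_thresh; /* replay sequence threshold, in packets */
	uint64_t timer_ms;   /* timer threshold, in milliseconds */
};

/*
 * Return true when an event is due: either the replay sequence has
 * advanced by at least seq_thresh since the last event, or timer_ms
 * has elapsed since then. On true, the state is reset to the current
 * sequence and time.
 */
static bool aevent_due(struct aevent_throttle *t, uint32_t seq,
		       uint64_t now_ms)
{
	bool seq_hit = (uint32_t)(seq - t->last_seq) >= t->seq_thresh;
	bool time_hit = (now_ms - t->last_ms) >= t->timer_ms;

	if (!seq_hit && !time_hit)
		return false;

	t->last_seq = seq;
	t->last_ms = now_ms;
	return true;
}
```

With a threshold of two packets and a 10 second timer, a single packet
would not produce an event in this sketch, while a second packet or the
passage of 10 seconds would.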

The identified items that need to be synchronized are:

- the lifetime byte counter.
  Note that the lifetime time limit is not important if you assume the
  failover machine is known ahead of time, since the decay of the time
  countdown is not driven by packet arrival.
- the replay sequence for both inbound and outbound

1) Message Structure
----------------------

nlmsghdr:aevent_id:optional-TLVs.

The netlink message types are:

XFRM_MSG_NEWAE and XFRM_MSG_GETAE.

A XFRM_MSG_GETAE does not have TLVs.

A XFRM_MSG_NEWAE will have at least two TLVs (as is
discussed further below).

aevent_id structure looks like::

   struct xfrm_aevent_id {
        struct xfrm_usersa_id           sa_id;
        xfrm_address_t                  saddr;
        __u32                           flags;
        __u32                           reqid;
   };

The unique SA is identified by the combination of xfrm_usersa_id,
reqid and saddr.

flags are used to indicate different things.
The possible flags are::

   XFRM_AE_RTHR=1, /* replay threshold */
   XFRM_AE_RVAL=2, /* replay value */
   XFRM_AE_LVAL=4, /* lifetime value */
   XFRM_AE_ETHR=8, /* expiry timer threshold */
   XFRM_AE_CR=16, /* Event cause is replay update */
   XFRM_AE_CE=32, /* Event cause is timer expiry */
   XFRM_AE_CU=64, /* Event cause is policy update */

How these flags are used is dependent on the direction of the
message (kernel<->user) as well as the cause (config, query or event).
This is described below in the different messages.

The pid will be set appropriately in netlink to recognize direction
(0 to the kernel and pid = processid that created the event
when going from kernel to user space).

A program needs to subscribe to multicast group XFRMNLGRP_AEVENTS
to get notified of these events.

2) TLVS reflect the different parameters:
-----------------------------------------

a) byte value (XFRMA_LTIME_VAL)

This TLV carries the running/current counter for byte lifetime since
last event.

b) replay value (XFRMA_REPLAY_VAL)

This TLV carries the running/current counter for replay sequence since
last event.

c) replay threshold (XFRMA_REPLAY_THRESH)

This TLV carries the threshold being used by the kernel to trigger events
when the replay sequence is exceeded.

d) expiry timer (XFRMA_ETIMER_THRESH)

This is a timer value in milliseconds which is used as the nagle
value to rate limit the events.

3) Default configurations for the parameters:
---------------------------------------------

By default these events should be turned off unless there is
at least one listener registered to listen to the multicast
group XFRMNLGRP_AEVENTS.

Programs installing SAs will need to specify the two thresholds. However,
in order to not change existing applications such as racoon,
we also provide default threshold values for these different parameters
in case they are not specified.

The two sysctls/proc entries are:

a) /proc/sys/net/core/sysctl_xfrm_aevent_etime
used to provide default values for the XFRMA_ETIMER_THRESH in incremental
units of time of 100ms. The default is 10 (1 second).

b) /proc/sys/net/core/sysctl_xfrm_aevent_rseqth
used to provide default values for the XFRMA_REPLAY_THRESH parameter
in incremental packet count. The default is two packets.
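
As a small worked example of the 100ms unit used by the timer default,
the sketch below converts a value as it might be read from the sysctl
entry into milliseconds. The helper names and the ``AEVENT_ETIME_UNIT_MS``
macro are hypothetical and exist only for this illustration; the unit
size and the default of 10 are taken from the text above.

```c
#include <stdint.h>
#include <stdlib.h>

/* One etime unit is 100ms, per the description above (assumed name). */
#define AEVENT_ETIME_UNIT_MS 100U

/* Convert an etime value expressed in 100ms units to milliseconds. */
static uint32_t aevent_etime_to_ms(uint32_t units)
{
	return units * AEVENT_ETIME_UNIT_MS;
}

/*
 * Parse the text read from a sysctl entry (for example "10\n") and
 * return the timer in milliseconds. Returns 0 if no number is found.
 */
static uint32_t aevent_etime_text_to_ms(const char *text)
{
	char *end = NULL;
	unsigned long units = strtoul(text, &end, 10);

	if (end == text)
		return 0;
	return aevent_etime_to_ms((uint32_t)units);
}
```

With the default value of 10, ``aevent_etime_to_ms(10)`` yields 1000
milliseconds, which matches the "1 second" stated for the default.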

4) Message types
----------------

a) XFRM_MSG_GETAE issued by user-->kernel.
   XFRM_MSG_GETAE does not carry any TLVs.

The response is a XFRM_MSG_NEWAE which is formatted based on what
XFRM_MSG_GETAE queried for.

The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.

* if XFRM_AE_RTHR flag is set, then XFRMA_REPLAY_THRESH is also retrieved
* if XFRM_AE_ETHR flag is set, then XFRMA_ETIMER_THRESH is also retrieved

b) XFRM_MSG_NEWAE is issued by either user space to configure
   or kernel to announce events or respond to a XFRM_MSG_GETAE.

i) user --> kernel to configure a specific SA.

Any of the values or threshold parameters can be updated by passing the
appropriate TLV.

A response is issued back to the sender in user space to indicate success
or failure.

In the case of success, an event with
XFRM_MSG_NEWAE is also issued to any listeners as described in iii).

ii) kernel->user direction as a response to XFRM_MSG_GETAE

The response will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.

The threshold TLVs will be included if explicitly requested in
the XFRM_MSG_GETAE message.

iii) kernel->user to report as event if someone sets any values or
     thresholds for an SA using XFRM_MSG_NEWAE (as described in #i above).
     In such a case XFRM_AE_CU flag is set to inform the user that
     the change happened as a result of an update.
     The message will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.

iv) kernel->user to report event when replay threshold or a timeout
    is exceeded.

In such a case either XFRM_AE_CR (replay exceeded) or XFRM_AE_CE (timeout
happened) is set to inform the user what happened.
Note the two flags are mutually exclusive.
The message will always have XFRMA_LTIME_VAL and XFRMA_REPLAY_VAL TLVs.

Exceptions to threshold settings
--------------------------------

If you have an SA that is getting hit by traffic in bursts such that
there is a period where the timer threshold expires with no packets
seen, then an odd behavior is seen as follows:
The first packet arrival after a timer expiry will trigger a timeout
event; i.e. we don't wait for a timeout period or a packet threshold
to be reached. This is done for simplicity and efficiency reasons.

-JHS
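
The burst behavior described under "Exceptions to threshold settings" can
be modeled with a small state machine. This is a simplified sketch under
stated assumptions, not the kernel's code: ``aevent_model``,
``on_timer`` and ``on_packet`` are hypothetical names, and the model
tracks a single replay sequence only.

```c
#include <stdbool.h>
#include <stdint.h>

enum aevent_cause { AEVENT_NONE, AEVENT_REPLAY, AEVENT_TIMEOUT };

/* Hypothetical model state; not a kernel structure. */
struct aevent_model {
	uint32_t seq;          /* current replay sequence */
	uint32_t reported_seq; /* sequence carried by the last event */
	uint32_t seq_thresh;   /* replay threshold, in packets */
	bool timer_deferred;   /* timer expired while nothing had changed */
};

/* Timer expiry: report only if something changed since the last event. */
static enum aevent_cause on_timer(struct aevent_model *m)
{
	if (m->seq == m->reported_seq) {
		m->timer_deferred = true; /* remember the silent expiry */
		return AEVENT_NONE;
	}
	m->reported_seq = m->seq;
	return AEVENT_TIMEOUT;
}

/* Packet arrival: advance the sequence and decide whether to report. */
static enum aevent_cause on_packet(struct aevent_model *m)
{
	m->seq++;
	if (m->seq - m->reported_seq >= m->seq_thresh) {
		m->reported_seq = m->seq;
		m->timer_deferred = false;
		return AEVENT_REPLAY;
	}
	if (m->timer_deferred) {
		/* First packet after a silent expiry: report at once. */
		m->reported_seq = m->seq;
		m->timer_deferred = false;
		return AEVENT_TIMEOUT;
	}
	return AEVENT_NONE;
}
```

In this model, a timer expiry during an idle period produces no event,
and the first packet of the next burst reports a timeout immediately
instead of waiting for the packet threshold or another timer period.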