# RK3588 NPU SRAM Usage Guide

* The RK3588 SoC contains 1 MB of on-chip SRAM, of which 956 KB is available to the IPs on the SoC. Allocating part or all of it to the RKNPU is supported.
* SRAM helps RKNPU applications reduce DDR bandwidth pressure. SRAM can currently be assigned to two memory types: Internal and Weight.

---
I. Board-side environment requirements
---
1. Kernel requirements
* RKNPU driver version >= 0.8.0
* The kernel config must enable CONFIG_ROCKCHIP_RKNPU_SRAM=y
  * Config path for Android:
    ```shell
    <path-to-your-kernel>/arch/arm64/configs/rockchip_defconfig
    ```
  * Config path for Linux:
    ```shell
    <path-to-your-kernel>/arch/arm64/configs/rockchip_linux_defconfig
    ```
* The kernel DTS must allocate a region of the system SRAM to the RKNPU
  * Allocate the required amount of system SRAM to the RKNPU. The maximum is 956 KB, and the size must be 4 KB aligned.
  * Note: the default system may already allocate SRAM to other IPs, such as the video codec. The SRAM regions of different IPs must not overlap; otherwise concurrent reads and writes will corrupt data.
  * The following example allocates all 956 KB to the RKNPU:
    ```dts
    syssram: sram@ff001000 {
        compatible = "mmio-sram";
        reg = <0x0 0xff001000 0x0 0xef000>;

        #address-cells = <1>;
        #size-cells = <1>;
        ranges = <0x0 0x0 0xff001000 0xef000>;
        /* Allocate SRAM for the RKNPU */
        /* start address and size should be 4K aligned */
        rknpu_sram: rknpu_sram@0 {
            reg = <0x0 0xef000>; // 956KB
        };
    };
    ```
  * Attach the allocated SRAM to the RKNPU node by editing the following dtsi file:
    ```shell
    <path-to-your-kernel>/arch/arm64/boot/dts/rockchip/rk3588s.dtsi
    ```
    ```dts
    rknpu: npu@fdab0000 {
        compatible = "rockchip,rk3588-rknpu";
        /* ... */
        /* Add the reference to the RKNPU SRAM */
        rockchip,sram = <&rknpu_sram>;
        status = "disabled";
    };
    ```

2. RKNN SDK requirements
* RKNPU Runtime library (librknnrt.so) version >= 1.3.4b14

---
II. Usage
---
1. Using SRAM for Internal memory:
* Automatic size: tries to allocate enough of the remaining system SRAM for Internal
  * **export RKNN_INTERNAL_MEM_TYPE=sram**
* Specified size: tries to allocate the specified 256 KB of system SRAM for Internal
  * **export RKNN_INTERNAL_MEM_TYPE=sram#256**

2. Using SRAM for Weight memory:
* Automatic size: tries to allocate enough of the remaining system SRAM for Weight
  * **export RKNN_SEPARATE_WEIGHT_MEM=1**
  * **export RKNN_WEIGHT_MEM_TYPE=sram**
* Specified size: tries to allocate the specified 128 KB of system SRAM for Weight
  * **export RKNN_SEPARATE_WEIGHT_MEM=1**
  * **export RKNN_WEIGHT_MEM_TYPE=sram#128**

3. Mixed allocation
* The RKNPU driver manages the SRAM memory, so SRAM can be assigned to Internal and Weight at the same time, for example:
  * **export RKNN_INTERNAL_MEM_TYPE=sram#256**
  * **export RKNN_SEPARATE_WEIGHT_MEM=1**
  * **export RKNN_WEIGHT_MEM_TYPE=sram#128**

---
III. Debugging
---
1. Checking whether SRAM is enabled
* Check the boot-time serial log to see whether SRAM is enabled. The log includes the address range and size of the SRAM assigned to the RKNPU, as shown below:
```shell
rk3588_s:/ # dmesg | grep rknpu -i
RKNPU fdab0000.npu: RKNPU: sram region: [0x00000000ff001000, 0x00000000ff0f0000), sram size: 0xef000
```

2. Querying SRAM usage
* SRAM usage can be queried through a debugfs node
* The bitmap below shows the SRAM with nothing in use; each dot represents 4 KB
```shell
rk3588_s:/ # cat /sys/kernel/debug/rknpu/mm
SRAM bitmap: "*" - used, "." - free (1bit = 4KB)
[000] [................................]
[001] [................................]
[002] [................................]
[003] [................................]
[004] [................................]
[005] [................................]
[006] [................................]
[007] [...............]
SRAM total size: 978944, used: 0, free: 978944
```
* The bitmap below shows the SRAM after 512 KB has been allocated
```shell
rk3588_s:/ # cat /sys/kernel/debug/rknpu/mm
SRAM bitmap: "*" - used, "." - free (1bit = 4KB)
[000] [********************************]
[001] [********************************]
[002] [********************************]
[003] [********************************]
[004] [................................]
[005] [................................]
[006] [................................]
[007] [...............]
SRAM total size: 978944, used: 524288, free: 454656
```

3. Querying the SRAM size through the RKNN API
* Use the RKNN_QUERY_MEM_SIZE command of rknn_query to get the SRAM size information
```C++
typedef struct _rknn_mem_size {
    uint32_t total_weight_size;
    uint32_t total_internal_size;
    uint64_t total_dma_allocated_size;
    uint32_t total_sram_size;  /* total SRAM the system has allocated to the RKNPU */
    uint32_t free_sram_size;   /* SRAM still available to the RKNPU */
    uint32_t reserved[10];
} rknn_mem_size;
```
* total_sram_size is the total amount of SRAM the system has allocated to the RKNPU
* free_sram_size is the amount of SRAM the RKNPU can still use

4. Viewing a network's SRAM usage
* On the board, set the following environment variable before running the RKNN application to print the predicted SRAM usage:
```shell
export RKNN_LOG_LEVEL=3
```
* The per-layer SRAM usage for Internal memory is shown in the following log:
```shell
---------------------------------------------------------------------------
Total allocated Internal SRAM Size: 524288, Addr: [0xff3e0000, 0xff460000)
---------------------------------------------------------------------------
---------------------------------------------------------------------+----------------------------------+-----------
ID  User           Tensor   DataType OrigShape      NativeShape      |     [Start       End)       Size |    SramHit
---------------------------------------------------------------------+----------------------------------+-----------
1   ConvRelu       input0   INT8     (1,3,224,224)  (1,1,224,224,3)  | 0xff3b0000 0xff3d4c00 0x00024c00 |          \
2   ConvRelu       output2  INT8     (1,32,112,112) (1,2,112,112,16) | 0xff404c00 0xff466c00 0x00062000 | 0x0005b400
3   ConvRelu       output4  INT8     (1,32,112,112) (1,4,112,112,16) | 0xff466c00 0xff52ac00 0x000c4000 | 0x00000000
4   ConvRelu       output6  INT8     (1,64,112,112) (1,4,112,112,16) | 0xff52ac00*0xff5eec00 0x000c4000 | 0x00000000
5   ConvRelu       output8  INT8     (1,64,56,56)   (1,4,56,56,16)   | 0xff3e0000 0xff411000 0x00031000 | 0x00031000
6   ConvRelu       output10 INT8     (1,128,56,56)  (1,8,56,56,16)   | 0xff411000 0xff473000 0x00062000 | 0x0004f000
7   ConvRelu       output12 INT8     (1,128,56,56)  (1,8,56,56,16)   | 0xff473000 0xff4d5000 0x00062000 | 0x00000000
8   ConvRelu       output14 INT8     (1,128,56,56)  (1,8,56,56,16)   | 0xff3e0000 0xff442000 0x00062000 | 0x00062000
9   ConvRelu       output16 INT8     (1,128,28,28)  (1,8,28,28,16)   | 0xff442000 0xff45a800 0x00018800 | 0x00018800
10  ConvRelu       output18 INT8     (1,256,28,28)  (1,16,28,28,16)  | 0xff3e0000 0xff411000 0x00031000 | 0x00031000
11  ConvRelu       output20 INT8     (1,256,28,28)  (1,16,28,28,16)  | 0xff411000 0xff442000 0x00031000 | 0x00031000
12  ConvRelu       output22 INT8     (1,256,28,28)  (1,16,28,28,16)  | 0xff3e0000 0xff411000 0x00031000 | 0x00031000
13  ConvRelu       output24 INT8     (1,256,14,14)  (1,16,14,14,16)  | 0xff411000 0xff41d400 0x0000c400 | 0x0000c400
14  ConvRelu       output26 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3e0000 0xff3f8800 0x00018800 | 0x00018800
15  ConvRelu       output28 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3f8800 0xff411000 0x00018800 | 0x00018800
16  ConvRelu       output30 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3e0000 0xff3f8800 0x00018800 | 0x00018800
17  ConvRelu       output32 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3f8800 0xff411000 0x00018800 | 0x00018800
18  ConvRelu       output34 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3e0000 0xff3f8800 0x00018800 | 0x00018800
19  ConvRelu       output36 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3f8800 0xff411000 0x00018800 | 0x00018800
20  ConvRelu       output38 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3e0000 0xff3f8800 0x00018800 | 0x00018800
21  ConvRelu       output40 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3f8800 0xff411000 0x00018800 | 0x00018800
22  ConvRelu       output42 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3e0000 0xff3f8800 0x00018800 | 0x00018800
23  ConvRelu       output44 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3f8800 0xff411000 0x00018800 | 0x00018800
24  ConvRelu       output46 INT8     (1,512,14,14)  (1,32,14,14,16)  | 0xff3e0000 0xff3f8800 0x00018800 | 0x00018800
25  ConvRelu       output48 INT8     (1,512,7,7)    (1,33,7,7,16)    | 0xff3f8800 0xff3ff000 0x00006800 | 0x00006800
26  ConvRelu       output50 INT8     (1,1024,7,7)   (1,67,7,7,16)    | 0xff3e0000 0xff3ed000 0x0000d000 | 0x0000d000
27  ConvRelu       output52 INT8     (1,1024,7,7)   (1,67,7,7,16)    | 0xff3ed000 0xff3fa000 0x0000d000 | 0x0000d000
28  AveragePool    output54 INT8     (1,1024,7,7)   (1,67,7,7,16)    | 0xff3e0000 0xff3ed000 0x0000d000 | 0x0000d000
29  Conv           output55 INT8     (1,1024,1,1)   (1,64,1,1,16)    | 0xff3ed000 0xff3ed400 0x00000400 | 0x00000400
30  Softmax        output56 INT8     (1,1000,1,1)   (1,64,1,1,16)    | 0xff3e0000 0xff3e0400 0x00000400 | 0x00000400
31  OutputOperator output57 FLOAT    (1,1000,1,1)   (1,1000,1,1)     | 0xff3ae000 0xff3aefa0 0x00000fa0 |          \
---------------------------------------------------------------------+----------------------------------+-----------
----------------------------------------
Total Weight Memory Size: 4260864
Total Internal Memory Size: 2157568
Predict Internal Memory RW Amount: 11068320
Predict Weight Memory RW Amount: 4260832
Predict SRAM Hit RW Amount: 6688768
----------------------------------------
```
* SramHit in the table above is the amount of SRAM occupied by the tensor of the current layer. In general, that many bytes of read plus write bandwidth are saved.
* Predict SRAM Hit RW Amount is the predicted SRAM read/write amount for the whole network and gives an approximate estimate of the bandwidth saved per frame.
* Note: on Linux the log is redirected to the terminal; on Android it is redirected to logcat.
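
The SRAM bitmap printed by `/sys/kernel/debug/rknpu/mm` in the debugging section above can also be tallied by a script, for example to log SRAM occupancy while an application runs. The following is a minimal sketch, assuming the node's output keeps the row format shown in this document (`[NNN] [...]`, one character per 4 KB). `parse_sram_bitmap` and `PAGE_SIZE` are hypothetical names introduced for illustration; they are not part of the RKNN SDK or the RKNPU driver.

```python
import re

PAGE_SIZE = 4096  # assumption taken from the header line: one bitmap character = 4 KB


def parse_sram_bitmap(text):
    """Count used and free bytes from the text printed by the rknpu `mm` node."""
    used_pages = 0
    free_pages = 0
    for line in text.splitlines():
        # Bitmap rows look like "[003] [****....]"; header and summary lines do not match.
        match = re.match(r"\s*\[\d+\]\s+\[([*.]+)\]", line)
        if match:
            used_pages += match.group(1).count("*")
            free_pages += match.group(1).count(".")
    return {
        "total": (used_pages + free_pages) * PAGE_SIZE,
        "used": used_pages * PAGE_SIZE,
        "free": free_pages * PAGE_SIZE,
    }


# Rebuild the "512 KB allocated" example from this document:
# four fully used rows, three free rows, and a final partial row of 15 pages.
rows = ["*" * 32] * 4 + ["." * 32] * 3 + ["." * 15]
sample = "\n".join(
    ['SRAM bitmap: "*" - used, "." - free (1bit = 4KB)']
    + [f"[{index:03d}] [{row}]" for index, row in enumerate(rows)]
    + ["SRAM total size: 978944, used: 524288, free: 454656"]
)

stats = parse_sram_bitmap(sample)
print(stats)  # {'total': 978944, 'used': 524288, 'free': 454656}
```

On a board, the same function could be fed the contents of `/sys/kernel/debug/rknpu/mm`, which typically requires root access and a mounted debugfs. The totals derived from the bitmap should agree with the `SRAM total size` summary line printed by the node.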