## RKNPU2

RKNPU2 provides an advanced interface to access the Rockchip NPU. A minimal usage sketch of the runtime API appears at the end of this document.

## Supported Platforms

- RK3566/RK3568
- RK3588/RK3588S
- RV1103/RV1106
- RK3562

Note:

The RKNN model must be generated using RKNN Toolkit 2: https://github.com/rockchip-linux/rknn-toolkit2

**For RK1808/RV1109/RV1126/RK3399Pro, please use:**

https://github.com/rockchip-linux/rknn-toolkit

https://github.com/rockchip-linux/rknpu

https://github.com/airockchip/RK3399Pro_npu

## Release Log

### 1.5.0

- Support RK3562
- Support more NPU operator fusions, such as Conv-SiLU, Conv-Swish, Conv-HardSwish, Conv-Sigmoid, Conv-Gelu, etc.
- Improve support for the NHWC output layout
- RK3568/RK3588: the maximum input resolution is increased to 8192
- Improve support for Swish, DataConvert, Softmax, LSTM, LayerNorm, Gather, Transpose, Mul, MaxPool, Sigmoid, and Pad
- Improve support for CPU operators (Cast, Sin, Cos, RMSNorm, ScatterND, GRU)
- Limited support for dynamic resolution
- Provide MATMUL API
- Add an RV1103/RV1106 rknn_server application as a proxy between PC and board
- Add more examples, such as rknn_dynamic_shape_input_demo and a video demo for yolov5
- Bug fixes

### 1.4.0

- Support more NPU operators, such as Reshape, Transpose, MatMul, Max, Min, exGelu, exSoftmax13, Resize, etc.
- Add a **Weight Share** function to reduce memory usage.
- Add a **Weight Compression** function to reduce memory and bandwidth usage (RK3588/RV1103/RV1106).
- RK3588 supports storing weights or feature maps in SRAM, reducing system bandwidth consumption.
- RK3588 adds the ability to run a single model on multiple cores at the same time.
- Add a new NHWC output layout (C has alignment restrictions).
- Improve support for non-4D input.
- Add more examples, such as rknn_yolov5_android_apk_demo and rknn_internal_mem_reuse_demo.
- Bug fixes.

### 1.3.0

- Support RV1103/RV1106 (beta SDK)
- rknn_tensor_attr supports w_stride (renamed from stride) and h_stride
- Rename rknn_destroy_mem()
- Support more NPU operators, such as Where, Resize, Pad, Reshape, Transpose, etc.
- RK3588 supports multi-batch multi-core mode
- When RKNN_LOG_LEVEL=4, the per-layer MAC utilization and bandwidth occupation can be displayed
- Bug fixes

### 1.2.0

- Support RK3588
- Support more operators, such as GRU, Swish, LayerNorm, etc.
- Reduce memory usage
- Improve the zero-copy interface implementation
- Bug fixes

### 1.1.0

- Support INT8+FP16 mixed quantization to improve model accuracy
- Support specifying input and output dtypes, which can be baked into the model
- Support multiple model inputs with different channel mean/std values
- Improve the stability of multi-threaded and multi-process runtime use
- Support flushing the cache for fds that point to user-allocated internal tensor memory
- Improve dumping of internal layer results of the model
- Add the rknn_server application as a proxy between PC and board
- Support more operators, such as HardSigmoid, HardSwish, Gather, ReduceMax, Elu
- Add LSTM support (the CIFG and peephole structures are not supported; the layer normalization and clip functions are not supported)
- Bug fixes

### 1.0

- Optimize the performance of rknn_inputs_set()
- Add more functions for zero-copy (see the sketch at the end of this document)
- Add new OP support; see the OP support list document for details
- Add multi-process support
- Support per-channel quantized models
- Bug fixes

### 0.7

- Optimize the performance of rknn_inputs_set(), especially for models whose input width is 8-byte aligned
- Add new OP support; see the OP support list document for details
- Bug fixes

### 0.6

- Initial version
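
## API Usage Sketches

As context for the rknn_inputs_set() items above, here is a minimal sketch of the basic runtime flow using the C API from rknn_api.h (init, set inputs, run, get outputs). The tensor type and format values are assumptions for a typical 8-bit image model; real code should check every return value and query rknn_tensor_attr for the actual sizes and formats.

```c
#include <string.h>
#include "rknn_api.h"

/* Sketch: run one inference on a model already loaded into memory. */
int run_model(void *model_data, uint32_t model_size,
              void *input_data, uint32_t input_size) {
    rknn_context ctx;
    if (rknn_init(&ctx, model_data, model_size, 0, NULL) < 0)
        return -1;

    /* Query how many inputs and outputs the model has. */
    rknn_input_output_num io_num;
    rknn_query(ctx, RKNN_QUERY_IN_OUT_NUM, &io_num, sizeof(io_num));

    /* Set input 0; rknn_inputs_set() copies/converts into NPU memory. */
    rknn_input input;
    memset(&input, 0, sizeof(input));
    input.index = 0;
    input.buf   = input_data;
    input.size  = input_size;
    input.type  = RKNN_TENSOR_UINT8;  /* assumed: 8-bit image input */
    input.fmt   = RKNN_TENSOR_NHWC;
    rknn_inputs_set(ctx, 1, &input);

    rknn_run(ctx, NULL);

    /* Fetch all outputs, converted to float for post-processing. */
    rknn_output outputs[io_num.n_output];
    memset(outputs, 0, sizeof(outputs));
    for (uint32_t i = 0; i < io_num.n_output; i++)
        outputs[i].want_float = 1;
    rknn_outputs_get(ctx, io_num.n_output, outputs, NULL);

    /* ... post-process outputs[i].buf here ... */

    rknn_outputs_release(ctx, io_num.n_output, outputs);
    rknn_destroy(ctx);
    return 0;
}
```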
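The zero-copy items (1.0 and 1.2.0 above) avoid the copy inside rknn_inputs_set() by binding runtime-allocated buffers directly to model I/O. Below is a minimal sketch using rknn_create_mem()/rknn_set_io_mem(); the size_with_stride field and the assumed dtype/format follow recent versions of rknn_api.h and may differ in older SDK releases.

```c
#include <string.h>
#include "rknn_api.h"

/* Sketch: bind a zero-copy buffer to input 0 of an initialized context. */
int set_input_zero_copy(rknn_context ctx, const void *data, uint32_t data_size) {
    /* Query the attributes of input 0. */
    rknn_tensor_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.index = 0;
    if (rknn_query(ctx, RKNN_QUERY_INPUT_ATTR, &attr, sizeof(attr)) < 0)
        return -1;

    attr.type = RKNN_TENSOR_UINT8;  /* assumed input dtype */
    attr.fmt  = RKNN_TENSOR_NHWC;

    /* Allocate NPU-accessible memory; the runtime reads it in place,
     * so rknn_run() needs no extra input copy. */
    rknn_tensor_mem *mem = rknn_create_mem(ctx, attr.size_with_stride);
    if (mem == NULL)
        return -1;
    memcpy(mem->virt_addr, data, data_size);

    /* Bind the buffer to input 0. Free it later with
     * rknn_destroy_mem(ctx, mem) once inference is done. */
    return rknn_set_io_mem(ctx, mem, &attr);
}
```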