AMCC suggests setting the PMU bit to 0 for best performance on the
PPC440 DDR controller. The common PPC440 DDR setup files (sdram.c &
spd_sdram.c) have been changed accordingly, so all PPC440 boards using
these setup routines automatically receive this performance
increase.

The benchmarks below, run by AMCC, demonstrate the performance
change:


----------------------------------------
SDRAM0_CFG0[PMU] = 1 (U-Boot default for Bamboo, Yosemite and Yellowstone)
----------------------------------------
Stream benchmark results
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 112345 microseconds.
   (= 112345 clock ticks)
Increase the size of the arrays if this shows that you are not getting
at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the precision of your system
timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          256.7683      0.1248       0.1246       0.1250
Scale:         246.0157      0.1302       0.1301       0.1302
Add:           255.0316      0.1883       0.1882       0.1885
Triad:         253.1245      0.1897       0.1896       0.1899


TTCP Benchmark Results
ttcp-t: socket
ttcp-t: connect
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000  tcp  -> localhost
ttcp-t: 16777216 bytes in 0.28 real seconds = 454.29 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 0.14, calls/sec = 7268.57
ttcp-t: 0.0user 0.1sys 0:00real 60% 0i+0d 0maxrss 0+2pf 3+1506csw

----------------------------------------
SDRAM0_CFG0[PMU] = 0 (Suggested modification)
Setting PMU = 0 provides a noticeable performance improvement:
* 2% to 5% improvement in memory performance.
* Improves the Mbit/sec for the TTCP benchmark by almost 77%.
----------------------------------------
Stream benchmark results
-------------------------------------------------------------
This system uses 8 bytes per DOUBLE PRECISION word.
-------------------------------------------------------------
Array size = 2000000, Offset = 0
Total memory required = 45.8 MB.
Each test is run 10 times, but only
the *best* time for each is used.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 120066 microseconds.
   (= 120066 clock ticks)
Increase the size of the arrays if this shows that you are not getting
at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the precision of your system
timer.
-------------------------------------------------------------
Function      Rate (MB/s)   RMS time     Min time     Max time
Copy:          262.5167      0.1221       0.1219       0.1223
Scale:         258.4856      0.1238       0.1238       0.1240
Add:           262.5404      0.1829       0.1828       0.1831
Triad:         266.8594      0.1800       0.1799       0.1802

TTCP Benchmark Results
ttcp-t: socket
ttcp-t: connect
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=5000  tcp  -> localhost
ttcp-t: 16777216 bytes in 0.16 real seconds = 804.06 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 0.08, calls/sec = 12864.89
ttcp-t: 0.0user 0.0sys 0:00real 46% 0i+0d 0maxrss 0+2pf 120+1csw


2006-07-28, Stefan Roese <sr@denx.de>
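
The quoted improvements can be re-derived from the two benchmark runs
above. The following is only a minimal sketch for checking the
arithmetic: the numbers are copied from the Stream and TTCP results,
while the names STREAM_MBPS, TTCP_MBIT and percent_gain are
illustrative and not part of U-Boot or of AMCC's tooling.

```python
# Rates copied from the benchmark output above, as (PMU = 1, PMU = 0).
# The names used here are illustrative only.
STREAM_MBPS = {
    "Copy":  (256.7683, 262.5167),
    "Scale": (246.0157, 258.4856),
    "Add":   (255.0316, 262.5404),
    "Triad": (253.1245, 266.8594),
}
TTCP_MBIT = (454.29, 804.06)


def percent_gain(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for name, (pmu1, pmu0) in STREAM_MBPS.items():
        print(f"{name:<6} {percent_gain(pmu1, pmu0):5.1f} %")
    print(f"TTCP   {percent_gain(*TTCP_MBIT):5.1f} %")
```

With these inputs the Stream gains fall between roughly 2% and 5.5%,
and the TTCP throughput gain is close to 77%.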