The most frequent cause of problems when porting U-Boot to new
hardware, or when using a sloppy port on some board, is memory errors.
In most cases these are not caused by failing hardware, but by
incorrect initialization of the memory controller. So it is a good
idea to always test whether the memory is working correctly before
looking for any other potential causes of problems.

U-Boot implements 3 different approaches to perform memory tests:

1. The get_ram_size() function (see "common/memsize.c").

   This function is supposed to be used in each and every U-Boot port
   to determine the presence and actual size of each of the potential
   memory banks on this piece of hardware. The code is supposed to be
   very fast, so running it on each reboot does not hurt. It is a
   little known and generally underrated fact that this code will also
   catch 99% of hardware-related (i.e. reliably reproducible) memory
   errors. It is strongly recommended to always use this function, in
   each and every port of U-Boot.

2. The "mtest" command.

   This is probably the best known memory test utility in U-Boot.
   Unfortunately, it is also the most problematic, and the least
   useful one.

   There are a number of serious problems with this command:

   - It is terribly slow. Running "mtest" on the whole system RAM
     takes a _long_ time before there is any significance in the fact
     that no errors have been found so far.

   - It is dangerous to use. Any mistake in choosing the test range
     will reliably crash or hang your system. "mtest" is dumb and has
     no knowledge about memory ranges that may be in use for other
     purposes, like exception code, U-Boot code and data, stack,
     malloc arena, video buffer, log buffer, etc. If you let it, it
     will happily "test" all such areas and overwrite them in the
     process.

   - It is not easy to configure, and a large number of systems are
     seriously misconfigured. The original idea was to test basically
     the whole system RAM, exempting only the areas used by U-Boot
     itself - on most systems these are the areas used for the
     exception vectors (usually at the very lower end of system
     memory) and for U-Boot (code, data, etc. - see above; these are
     usually at the very upper end of system memory). But experience
     has shown that a very large number of ports use pretty much
     bogus settings of CONFIG_SYS_MEMTEST_START and
     CONFIG_SYS_MEMTEST_END; this results in useless tests (because
     the range is too small and/or badly located) or in critical
     failures (system crashes).

   Because of these issues, the "mtest" command is considered
   deprecated. It should not be enabled in most normal ports of
   U-Boot, especially not in production. If you really need a memory
   test, then see 1. above and 3. below.

3. The most thorough memory test facility is available as part of the
   POST (Power-On Self Test) sub-system, see "post/drivers/memory.c".

   If you really need to perform memory tests (for example, because
   it is a mandatory part of your requirement specification), then
   enable this test, which is generic and should work on all
   architectures.

WARNING:

It should be pointed out that _all_ these memory tests have one
fundamental, unfixable design flaw: they are based on the assumption
that memory errors can be found by writing to and reading from memory.
Unfortunately, this is only true for the relatively harmless, usually
static errors like shorts between data or address lines, unconnected
pins, etc. All the really nasty errors which will first turn your
hair gray, only to make you tear it out later, are dynamic errors,
which usually happen not with simple read or write cycles on the bus,
but when performing back-to-back data transfers in burst mode. Such
accesses usually happen only for certain DMA operations, or for heavy
cache use (instruction fetching, cache flushing). So far I am not
aware of any freely available code that implements a generic, and
efficient, memory test like that. The best known test case to stress
a system like that is to boot Linux with root file system mounted over
NFS, and then build some larger software package natively (say,
compile a Linux kernel on the system) - this will cause enough context
switches, network traffic (and thus DMA transfers from the network
controller), varying RAM use, etc. to trigger any weak spots in this
area.

Note: An attempt was made once to implement such a test to catch
memory problems on a specific board. The code is pretty much board
specific (for example, it includes setting specific GPIO signals to
provide triggers for an attached logic analyzer), but you can get an
idea how it works: see "examples/standalone/test_burst*".

Note 2: Ironically enough, the "test_burst" code did not catch any
RAM errors, not a single one ever. The problems this code was supposed
to catch did not happen when accessing the RAM, but when reading from
NOR flash.
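
Example: to illustrate why a size probe such as the one described in
item 1 above can also expose static hardware faults, here is a minimal
sketch of the general technique. It is not the code from
"common/memsize.c". It runs against a simulated memory bank, and all
names in it (struct sim_bank, bank_read(), bank_write(),
probe_size_words(), the stuck_high field) are hypothetical. The sketch
assumes that memory which is not physically present shows up as
address aliasing, and it models a faulty data line as bits that always
read back as 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for one memory bank. Only real_words words
 * exist physically, so higher word offsets alias onto lower ones.
 * stuck_high models data lines that always read back as 1.
 */
struct sim_bank {
	uint32_t *cells;	/* backing storage, real_words entries */
	size_t real_words;	/* must be a power of two */
	uint32_t stuck_high;	/* 0 for a healthy data bus */
};

static void bank_write(struct sim_bank *b, size_t idx, uint32_t val)
{
	b->cells[idx % b->real_words] = val;
}

static uint32_t bank_read(const struct sim_bank *b, size_t idx)
{
	return b->cells[idx % b->real_words] | b->stuck_high;
}

/*
 * Write a distinct marker at every power-of-two word offset, top
 * down, then a zero at offset 0. Reading back bottom up, the first
 * offset whose marker was clobbered by an aliased write is the size.
 * Returns 0 if even the base word does not read back correctly.
 *
 * Unlike production code, this sketch does not save and restore the
 * words it overwrites.
 */
size_t probe_size_words(struct sim_bank *b, size_t max_words)
{
	size_t off;

	/* The probe only makes sense for a power-of-two window. */
	assert(max_words != 0 && (max_words & (max_words - 1)) == 0);

	for (off = max_words >> 1; off > 0; off >>= 1)
		bank_write(b, off, ~(uint32_t)off);
	bank_write(b, 0, 0);

	if (bank_read(b, 0) != 0)
		return 0;

	for (off = 1; off < max_words; off <<= 1)
		if (bank_read(b, off) != ~(uint32_t)off)
			return off;

	return max_words;
}
```

In this model, a bank with 4 real words probed over a 16-word window
is reported as 4 words, and a bank with a stuck data bit is reported
as size 0. Either result disagrees with the expected size of the
board's memory, which is how a cheap size probe ends up flagging
static wiring and configuration faults. It says nothing about the
dynamic, burst-mode errors discussed in the WARNING above.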