
Assessing the damage

In AVR, Hardware on May 12, 2011 at 00:01

Yesterday’s post was about wearing out the EEPROM memory on an ATmega168. It took over 6.7 million writes to address zero to make it fail. As mentioned, this is a limited test, because reading out the value was done right after writing it. But hey, at least I didn’t have to write 100,000 times to each EEPROM address and wait 20 years @ 85°C to find out whether each byte was still correct…
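To make that limitation concrete, here is a minimal C sketch of a write-then-read-back loop that runs on a PC against one simulated cell. `sim_write()`, `sim_read()` and the `WEAR_LIMIT` wear model are hypothetical stand-ins for the avr-libc calls `eeprom_write_byte()` / `eeprom_read_byte()` and for real cell behaviour. This is not the code from yesterday's post.

```c
#include <stdint.h>

/* Hypothetical stand-in for one EEPROM cell so the test logic can run on a
   PC. The wear model (writes silently stop taking effect after WEAR_LIMIT
   writes) is an assumption for illustration, not how real cells degrade. */
#define WEAR_LIMIT 100000UL

static uint8_t cell;
static unsigned long write_count;

void sim_write(uint8_t value) {
    write_count++;
    if (write_count <= WEAR_LIMIT)
        cell = value;   /* a "worn" cell keeps its old contents */
}

uint8_t sim_read(void) { return cell; }

/* Write a changing value and read it back right away, stopping at the
   first mismatch. Returns the number of the write that failed. It never
   waits between write and read, so a cell that accepts a write but loses
   the data later would pass this test. */
unsigned long wear_test(void) {
    for (unsigned long n = 1; ; n++) {
        uint8_t value = (uint8_t)n;
        sim_write(value);
        if (sim_read() != value)
            return n;
    }
}
```

On the simulated cell, `wear_test()` returns `WEAR_LIMIT + 1`, the first write that no longer sticks. Data-retention loss over weeks or years is simply outside what such a loop can see.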

So we got a failure, now what?

Well, re-running the test caused it to fail after a mere 6,200 write cycles, so that EEPROM definitely isn’t up to spec anymore.

Today, I wanted to see what effect this massive rewriting of address zero had on the rest of the EEPROM. So I wrote another test that goes through the entire EEPROM and checks how well writes + read-backs work on this same ATmega168 with its heavily abused EEPROM:

[Screenshot: the new test sketch]

All failures are counted, per EEPROM address. Whenever the maximum number of failures at any single address increases, a map is printed with the counts for each of the 512 bytes. Here’s the startup situation – no errors yet:

[Screenshot: the initial failure map, all counts zero]
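The counting logic described above might look roughly like this, again as a PC-runnable C sketch rather than the post's actual sketch. The simulated EEPROM with one stuck bit at address 0, the test pattern, and reading each byte back immediately after writing it are all assumptions; on the ATmega168 the two access functions would be `eeprom_write_byte()` and `eeprom_read_byte()` from avr-libc.

```c
#include <stdint.h>

#define EEPROM_SIZE 512
#define BAD_ADDR    0     /* assumption: only address 0 is damaged */

/* Hypothetical EEPROM stand-in: address BAD_ADDR has bit 0 stuck at 1. */
static uint8_t cells[EEPROM_SIZE];

void ee_write(uint16_t addr, uint8_t value) {
    cells[addr] = (addr == BAD_ADDR) ? (uint8_t)(value | 0x01) : value;
}

uint8_t ee_read(uint16_t addr) { return cells[addr]; }

uint16_t fails[EEPROM_SIZE];   /* failure count for each address */
uint16_t max_fails;            /* highest count at any single address */

/* One full pass: write every byte, read it straight back, and tally
   mismatches per address. Returns 1 when max_fails went up, i.e. when
   a new map should be printed. */
int run_cycle(uint8_t pattern) {
    int new_max = 0;
    for (uint16_t addr = 0; addr < EEPROM_SIZE; addr++) {
        uint8_t value = (uint8_t)(pattern + addr);
        ee_write(addr, value);
        if (ee_read(addr) != value) {
            fails[addr]++;
            if (fails[addr] > max_fails) {
                max_fails = fails[addr];
                new_max = 1;
            }
        }
    }
    return new_max;
}
```

A main loop would call `run_cycle()` once per cycle with a changing pattern (so each byte sees different values over time) and print the map only when it returns 1, which keeps the serial output short while nothing new is happening.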

The map counts are encoded as single characters:

 0         =  "."
 1 to 9    =  "1" .. "9"
10 to 35   =  "a" .. "z"
36 to 61   =  "A" .. "Z"
62 and up  =  "?"
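In code, that encoding is a short chain of range checks. The C function `count_char()` below follows the table exactly; `print_map()` and its 8 × 64 layout are a hypothetical illustration, since the post shows the map only as a screenshot.

```c
#include <stdio.h>

/* Map one failure count to a single character, per the table above. */
char count_char(unsigned count) {
    if (count == 0)  return '.';
    if (count <= 9)  return (char)('0' + count);
    if (count <= 35) return (char)('a' + (count - 10));
    if (count <= 61) return (char)('A' + (count - 36));
    return '?';
}

/* Print all 512 counts as 8 rows of 64, each row prefixed by its start
   address. The layout is an assumption, not taken from the post. */
void print_map(const unsigned counts[512]) {
    for (int row = 0; row < 8; row++) {
        printf("%3d ", row * 64);
        for (int col = 0; col < 64; col++)
            putchar(count_char(counts[row * 64 + col]));
        putchar('\n');
    }
}
```

With this scheme one character covers counts up to 61, so each address stays a single column wide even after dozens of failures.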

Every 100 cycles, the cycle count is printed, just to let me know that the sketch is still running. Unfortunately, rewriting all 512 bytes in EEPROM is way slower than yesterday’s test: over 1 second per cycle. So I also added an LED to toggle on each cycle:

[Photo: the setup with the added LED]
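As a rough check on that speed, assume the typical programming time of 3.3 ms per EEPROM byte written from the CPU, as listed in the ATmega168 datasheet, and ignore the much faster reads and serial output:

$$
t_{\text{cycle}} \approx 512 \times 3.3\ \text{ms} \approx 1.69\ \text{s}
$$

$$
10^{5}\ \text{cycles} \times 1.69\ \text{s} \approx 1.69 \times 10^{5}\ \text{s} \approx 47\ \text{h}
$$

So 100,000 full passes over the EEPROM take roughly two days.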

Now the waiting begins… it’s going to take hours (maybe even days!) to force the next failure.

And one thing is clear: it’s not easy to get these chips to fail quickly!

Update – 100,000 cycles later, no new failure yet…

6 Comments

Is there any reason to believe that EEPROM locations other than 0 are broken?

Osma Suominen 12 May 2011 at 12pm
- Good Q. My hunch is that EEPROM is banked, and that single-byte writes might well do larger erase/writes underneath, so this was an experiment to test just that. If there is banking going on behind the scenes, I would expect a few bytes above address 0 to also fail occasionally.
  
  jcw 12 May 2011 at 12pm
So the memory is not defective, just unreliable. If you did a read-back, compared, and rewrote the value whenever it was incorrect, would you be able to keep using the memory until all the smoke of that one byte is released? (as in: “A chip is defective when the magical smoke is released”).

Milan 12 May 2011 at 1pm
If there were some wear-levelling scheme implemented behind the scenes, the entire memory might be worn out, after virtual address 0 had been successively remapped to every other address. Not sure how likely that kind of complexity is in this case, though.

JBeale 12 May 2011 at 11pm
Did it fail yet?

Milan 16 May 2011 at 2pm
- No, I disconnected it after two days. Should probably redo the test with a fresh ATmega to draw conclusions.
  
  jcw 19 May 2011 at 12am
