PICList Thread
'[EE] Unusual ARM problem looping in reset.'
2011\10\06@103024 by David VanHorn

I am casting wide for any assistance on this issue.  I've spent
several days at Atmel HQ with their people on this, and we still have
no resolution.
I've also posted this on a couple of ARM-specific forums.

We are using an AT91SAM7S256, setting up the WDT to fire an internal
and external reset.
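
For illustration, a minimal sketch of how a watchdog mode word for this kind of setup might be composed. The bit positions are assumed from the AT91SAM7S datasheet's description of WDT_MR and should be checked against it; the helper name wdt_mr_value is hypothetical. WDT_MR can only be written once after a processor reset, so the value has to be right the first time.

```c
#include <stdint.h>

/* Assumed WDT_MR field layout for the AT91SAM7S (verify against the datasheet). */
#define WDT_WDV_MASK   0x00000FFFu   /* bits 0-11: counter reload value        */
#define WDT_WDFIEN     (1u << 12)    /* fault interrupt enable                 */
#define WDT_WDRSTEN    (1u << 13)    /* a watchdog fault triggers a reset      */
#define WDT_WDRPROC    (1u << 14)    /* 1 = reset the processor only           */
#define WDT_WDDIS      (1u << 15)    /* 1 = watchdog disabled                  */
#define WDT_WDD_SHIFT  16            /* bits 16-27: window (delta) value       */

/* Hypothetical helper: build a mode word that enables the watchdog reset and
 * leaves WDRPROC at 0, so the reset is not limited to the processor.
 * Passing wdd >= wdv leaves the restart window fully open. */
uint32_t wdt_mr_value(uint32_t wdv, uint32_t wdd)
{
    return (wdv & WDT_WDV_MASK)
         | ((wdd & WDT_WDV_MASK) << WDT_WDD_SHIFT)
         | WDT_WDRSTEN;
}
```

On the target, the result would be written once to WDT_MR (0xFFFFFD44 on this family, if I have the memory map right). Whether NRST is also driven during a watchdog reset depends on the reset controller settings, which this sketch does not touch.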

The problem is that the chip hangs in a reset loop, resetting approx. every 20 ms.
The WDT can be set for any timeout up to the 16 second max, and we get
the same 20mS loop.
At times the reset loop will happen "forever", or it will happen
several times then boot properly.
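
To put the 20 ms figure next to the programmable range, here is a rough sketch of the timeout arithmetic. It assumes the watchdog counter is 12 bits wide and clocked at slow clock / 128, which is where the 16 second maximum comes from at a nominal 32.768 kHz. The slow clock on this part is an RC oscillator and can be well off nominal, so the numbers are approximate, and the function names are made up for the example.

```c
#include <stdint.h>

#define WDT_PRESCALE 128u   /* assumed: watchdog counts slow clock / 128 */

/* Approximate timeout in ms for a given WDV reload value.
 * Ignores any off-by-one in how the counter expires. */
uint32_t wdt_timeout_ms(uint32_t wdv, uint32_t slck_hz)
{
    return (uint32_t)(((uint64_t)wdv * WDT_PRESCALE * 1000u) / slck_hz);
}

/* Number of whole watchdog ticks that fit in a given time in ms. */
uint32_t wdt_ticks_in_ms(uint32_t ms, uint32_t slck_hz)
{
    return (uint32_t)(((uint64_t)ms * slck_hz) / (WDT_PRESCALE * 1000u));
}
```

With a 32768 Hz slow clock, WDV = 0xFFF gives about 15996 ms, while 20 ms is only about 5 watchdog ticks, far shorter than most programmed timeouts.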

We have considered, and so far ruled out, the following: a bad PCB
(it also happens on Atmel's dev kit); a PLL issue (it happens even
when the PLL is not used); a crystal startup issue (it also happens
with other types of crystals, and oscillator startup looks good); a
VDD rise-time issue (verified rise time is significantly faster than
required); and a bypassing issue (the PCB runs fine with almost all
bypass caps removed, and adding more does not affect the problem).

Interestingly, the WDT values that are listed as "working" values for
the rev A silicon seem to cause the problem to appear less often than
other values.
www.ledato.de/download/SAM7S256_128_errata_%28update-13Nov%29.pdf
We initially encountered the problem on Rev B chips, and have
verified that it also happens in rev C.

We have seen this problem with the window enabled or disabled, and we
intend to run with the window disabled. If we are having a problem
with a given board, we can cause the chip to boot up properly by
either heating or cooling by a fraction of a degree, or by applying a
slight torque/twist to the PCB. The direction of the force is
important, but is not the same for two different boards. Once a given
board has booted properly, it is extremely robust. Booted boards
survive power line disturbances at 2.5 kV with 10 ns rise time, EMI at
>190 V/m, and ESD events at 16 kV while operating.

While the mechanical sensitivity would seem to indicate a PCB problem,
we have replicated this on an Atmel evaluation kit board. The amount
of flex varies, as little as 1/16th inch over 8" of board length to
take it into or out of failure. The thermal sensitivity is also
extreme, heat from a slight touch of finger on the CPU for only a
couple seconds is enough to induce or stop the rebooting. I can't
imagine the die temperature is changing more than a fraction of a
degree.

We implemented an extremely stripped down version of the code that
only flashes some LEDs to indicate that the application is running,
and this code also exhibits the problem.

The best that we have been able to do so far is to isolate that it
seems to have to do with when (relative to the rise of /RESET) the WDT
is initialized.
For a given chip, if we change where in time the WDT is configured
using NOPs, we can create or eliminate the problem on that board. Some
chips don't seem to exhibit the problem but since temperature, timing,
and PCB flex all seem to be part of the equation we are only
comfortable saying that a given system "has not been observed to have
the problem".
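
A minimal sketch of the kind of NOP padding described above, assuming a GCC/Clang toolchain for the inline asm. The function names are hypothetical, and the register is passed in as a pointer so the routine can also be exercised off-target.

```c
#include <stdint.h>

/* Burn roughly `count` NOPs. The volatile asm keeps the compiler from
 * removing the loop; loop overhead makes the real delay longer than
 * `count` instruction times. */
void nop_delay(uint32_t count)
{
    while (count--) {
        __asm__ volatile ("nop");
    }
}

/* Hypothetical init routine: wait, then perform the single WDT_MR write. */
void wdt_init_delayed(volatile uint32_t *wdt_mr, uint32_t mode, uint32_t nops)
{
    nop_delay(nops);
    *wdt_mr = mode;
}
```

Sweeping the `nops` argument across builds is one way to move the write in time relative to the release of reset. The actual delay per NOP depends on the clock the core is running from at that point.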

There may be some relationship with the phase or timing of the slow
clock, at the time that the WDT is configured, but we have not been
able to find anything yet.

Out of a couple hundred of our boards, roughly 10% exhibit the problem
with a given code set. If boards "A", "B", and "C" fall out of a batch,
and we then change the time before the WDT is initialized and
re-program the batch, maybe "A", "D", and "W" would fail. We have
a codeset that has never been observed to fail, but given the nature
of the problem, we are extremely nervous.
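
As a rough way to quantify that nervousness, the sketch below computes the chance of seeing no failures at all in n boots when each boot fails with some small probability. It assumes boots are independent, which is doubtful here given the temperature and flex sensitivity, so treat it as an optimistic bound. The function name is made up for the example.

```c
/* Probability that n independent boots all pass when each one fails
 * with probability p_fail. */
double prob_all_pass(double p_fail, unsigned n)
{
    double prob = 1.0;
    unsigned i;

    for (i = 0; i < n; ++i) {
        prob *= (1.0 - p_fail);
    }
    return prob;
}
```

For example, prob_all_pass(0.01, 300) comes out just under 0.05: a true 1% per-boot failure rate would still produce 300 clean boots in a row about one time in twenty.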

We have been through the errata and the data sheet extensively both by
ourselves and at Atmel in San Jose, with their technical people. So
far nobody can explain why we are seeing this.

Has anyone here seen this problem? Solved it?

Thanks.
(reposted with the EE tag that I'd forgotten)

2011\10\07@064205 by Geo

David VanHorn wrote:

> The problem is that the chip hangs in a reset loop appx every 20mS.

If you were in the UK (I don't know) then I would start thinking AC mains pickup.


> While the mechanical sensitivity would seem to indicate a PCB problem,
> we have replicated this on an Atmel evaluation kit board. The amount
> of flex varies, as little as 1/16th inch over 8" of board length to
> take it into or out of failure.

With fingers or long insulated prods?

 >The thermal sensitivity is also
> extreme, heat from a slight touch of finger on the CPU for only a
> couple seconds is enough to induce or stop the rebooting.

But your finger could be introducing more AC mains pick-up rather than heat..

Sorry - I know little about PICS - do you have any pin which becomes an unconnected input at any time?

George Smith

2011\10\07@072710 by Geo

Geo wrote

> I know little about PICS

Duh! - stupid (late night) - ignore that please.

George Smith

2011\10\07@081233 by alan.b.pearce

> Geo wrote
>
>  > I know little about PICS
>
> Duh! - stupid (late night) - ignore that please.

But I suspect the same applies to other micros as well, for the same reasons.

2011\10\07@104807 by David VanHorn

> If you were in the UK (I don't know) then I would start thinking AC
> mains pickup.

Happens on DC power or AC. 50/60 Hz, with or without large noise spikes.

> With fingers or long insulated prods?

Both.

> But your finger could be introducing more AC mains pick-up rather than heat.

Heated or chilled screwdriver also affects the chip strongly.

> Sorry - I know little about PICS - do you have any pin which becomes an
> unconnected input at any time?

Nope, all I/O is set up as outputs except those which by design are
inputs, and they have pullups.

2011\10\07@140350 by David VanHorn

>> But your finger could be introducing more AC mains pick-up rather than heat.
>
> Heated or chilled screwdriver also affects the chip strongly.

Gently breathing on it will also show the thermal effect, which is
quite impressive.

2011\10\07@141845 by Josh Koffman

On Fri, Oct 7, 2011 at 2:03 PM, David VanHorn wrote:
>>> But your finger could be introducing more AC mains pick-up rather than heat.
>>
>> Heated or chilled screwdriver also affects the chip strongly.
>
> Gently breathing on it also will show the thermal effect, which is
> quite impressive.

Hi Dave,

I'm curious what Atmel think. Could this be some sort of issue with
their packaging process? You said that it only happens about 10% of
the time, correct? What if you swap the chip out on a defective board,
does it continue or stop being a problem?

Josh
-- A common mistake that people make when trying to design something
completely foolproof is to underestimate the ingenuity of complete
fools.
        -Douglas Adams

2011\10\07@145937 by Dave


It varies by the code loaded, and happens on three die revs.  A given chip will have the problem with one code rev and not the other. It doesn't seem to isolate to packages.


2011\10\07@152750 by IVP


> Gently breathing on it also will show the thermal effect, which is
> quite impressive.

I had this girlfriend once .......

If you can change the behaviour of the chip like that (and the WDT
initiation timing) it must indicate a physical problem - ie the silicon or
the bonding perhaps. I've had a project where the order of setting
various module configurations in the power-up section, all perfectly
legitimate ways according to the datasheet, could stop the ADC
working properly.

And you might have missed a post of mine a couple of months back.
How the SPI module (very suspect construction IMVHO) on a dsPIC
was used affected even the RETURN instruction

Which is just nutty. Silicon nutty. I went through both problems with
Microchip - just as you have with Atmel - they could offer no
explanations or solutions, so I had to find my own

Jo

2011\10\07@160530 by David VanHorn

> Which is just nutty. Silicon nutty. I went through both problems with
> Microchip - just as you have with Atmel - they could offer no
> explanations or solutions, so I had to find my own


That's what we're suspecting now.  We have a code version that has not
exhibited the problem, but the only difference is in the exact
ordering of a few instructions that should not affect anything.  We
aren't very happy with that as a solution.

Devices that exhibit the problem have been run back through final test
at the factory, and found to pass.

2011\10\07@161702 by Carey Fisher

On Fri, Oct 7, 2011 at 4:05 PM, David VanHorn wrote:

> > Which is just nutty. Silicon nutty. I went through both problems with
> > Microchip - just as you have with Atmel - they could offer no
> > explanations or solutions, so I had to find my own
>
>
> That's what we're suspecting now.  We have a code version that has not
> exhibited the problem, but the only difference is in the exact
> ordering of a few instructions that should not affect anything.  We
> aren't very happy with that as a solution.
>
> Devices that exhibit the problem have been run back thru final test at
> the factory, and found to pass.
>
>
Maybe it's a design problem - a marginal race condition or something in the
chip's logic.

Carey Fisher
Chief Technical Officer
New Communications Solutions, LLC
678-999-3956
EraseMEcareyfisherspam_OUTspamTakeThisOuTncsradio.co

2011\10\07@162438 by IVP

> Silicon nutty

> That's what we're suspecting now

I think sometimes you just have to roll your eyes and put it down
as a Life Lesson. I'm sure occasionally chip manufacturers can't or
won't admit or find a problem, for various reasons which we could
probably have a good guess at.

It's an arms race and mistakes will be made

Jo

2011\10\07@162817 by David VanHorn

> Maybe it's a design problem - a marginal race condition or something in the
> chip's logic.


Could be. I have one here now that failed both here and at Atmel. It's
stubbornly refusing to fail with exactly the same code loaded on it
that it had when it was failing.

I've been prodding it trying to excite the failure, so that I can
further narrow the box
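
One possible aid in narrowing the box, sketched under assumptions: record what the reset controller reports as the cause of each reset in the loop. The RSTTYP field position and encodings below are assumed from the AT91SAM7S datasheet's RSTC_SR description and should be verified; the function name is hypothetical.

```c
#include <stdint.h>

/* Assumed RSTC_SR layout: RSTTYP in bits 8-10. */
#define RSTC_RSTTYP_SHIFT 8
#define RSTC_RSTTYP_MASK  0x7u

/* Map a raw RSTC_SR value to a short description of the last reset. */
const char *reset_type_name(uint32_t rstc_sr)
{
    switch ((rstc_sr >> RSTC_RSTTYP_SHIFT) & RSTC_RSTTYP_MASK) {
    case 0:  return "power-up";
    case 2:  return "watchdog";
    case 3:  return "software";
    case 4:  return "user (NRST pin)";
    case 5:  return "brownout";
    default: return "unknown";
    }
}
```

Reading the status register first thing in startup and signalling the result (on LEDs, say) would show whether each pass through the 20 ms loop is being reported as a watchdog reset or as something else.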

2011\10\07@163032 by David VanHorn

> It's an arms race and mistakes will be made

Indeed, and I'm not laying any blame.  As I told them, "All chips got
warts".  But we need at least a solid workaround.

My hope here was that some of you guys might have seen this, and know
where it lives.
I've also posted on a couple of ARM boards; one is ARM-architecture
specific, and the other is Atmel's AT91SAM7. No responses on either of
those yet.
