PICList
Thread
'16bit power of 10'
1998\12\12@162553
by
Sean Breheny
Hi all,
Here is my first completely PIC posting in a long time!
A while back I wrote a routine to give 16 bit powers of ten. I then
optimized it as best I could and here it is. I am submitting it for the
famous PICLIST size/speed optimization challenge.
This is intended to work only if it doesn't cross a 256-word boundary. (It's
short enough that I don't consider this a problem). rC and rD are regular
PIC 8-bit registers, and together they form a single 16-bit register pair,
with rC being the MSB. The routine is also isosynchronous. I admit that
this isn't exactly the hardest routine to write <G>, but I figured if there
were any optimizations to be made, they'd be found among the gurus on this
list.
; POW10
; Given W, where 0<=W<=4
; outputs rCrD=10^W
; destroys rCrD,W,Flags
pow10 movwf rD
call $+2
goto nxt
addwf PCL,F
retlw 0x00
retlw 0x00
retlw 0x00
retlw 0x03
retlw 0x27
nxt movwf rC
movf rD,W
call $+3
movwf rD
return
addwf PCL,F
retlw 0x01
retlw 0x0A
retlw 0x64
retlw 0xE8
retlw 0x10
END
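The routine amounts to two byte-wide lookups, one per RETLW table. A minimal Python sketch of that idea follows, as a model only and not PIC code; the names HIGH_BYTES, LOW_BYTES and pow10_tables are made up for illustration.

```python
# Byte tables mirroring the two RETLW tables in the routine above.
HIGH_BYTES = [0x00, 0x00, 0x00, 0x03, 0x27]  # values that end up in rC
LOW_BYTES = [0x01, 0x0A, 0x64, 0xE8, 0x10]   # values that end up in rD


def pow10_tables(w):
    """Return (rC, rD) for 0 <= w <= 4, as the two table lookups would."""
    if not 0 <= w <= 4:
        raise ValueError("w must be in 0..4")
    return HIGH_BYTES[w], LOW_BYTES[w]


if __name__ == "__main__":
    for w in range(5):
        rc, rd = pow10_tables(w)
        # Recombine the register pair into one 16-bit value.
        print(w, hex(rc), hex(rd), (rc << 8) | rd)
```

Recombining the pair as (rC << 8) | rD should give 10^W for every W from 0 to 4.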
Thanks,
Sean
+-------------------------------+
| Sean Breheny |
| Amateur Radio Callsign: KA3YXM|
| Electrical Engineering Student|
+-------------------------------+
Save lives, please look at http://www.all.org
Personal page: http://www.people.cornell.edu/pages/shb7
Phone(USA): (607) 253-0315 ICQ #: 3329174
1998\12\13@203930
by
Regulus Berdin
Hi,
The first part can be written as:
pow10 movwf rD
addlw -3
movlw 0
skpnc
movlw 0x27
skpnz
movlw 0x3
movwf rC
Call and goto are removed thus saving some cycles and code.
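The flag logic can be checked off-target with a small Python model. This is a sketch under the assumption that addlw sets the carry flag on overflow out of bit 7 and the zero flag on an all-zero result, as on the mid-range PICs; the function name high_byte_via_flags is hypothetical.

```python
def high_byte_via_flags(w):
    """Model of addlw -3 / movlw 0 / skpnc / movlw 0x27 / skpnz / movlw 0x3."""
    total = w + 0xFD                 # addlw -3 adds the two's complement of 3
    carry = total > 0xFF             # C is set for w >= 3
    zero = (total & 0xFF) == 0       # Z is set only for w == 3
    acc = 0x00                       # movlw 0 (leaves the flags untouched)
    if carry:                        # skpnc skips the next movlw when C is clear
        acc = 0x27
    if zero:                         # skpnz skips the next movlw when Z is clear
        acc = 0x03
    return acc                       # value that movwf rC stores


if __name__ == "__main__":
    print([hex(high_byte_via_flags(w)) for w in range(5)])
```

For W = 0 to 4 the model yields the same high bytes as the first RETLW table: 0x00, 0x00, 0x00, 0x03, 0x27.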
regards,
Reggie
1998\12\13@205842
by
Sean Breheny
Hi Reggie,
Thanks! If I did the calculation right, your changes save 2 words and 1 cycle.
Anybody have further optimizations??
Thanks again,
Sean
1998\12\13@210102
by
Scott Dattalo
On Sat, 12 Dec 1998, Sean Breheny wrote:
[SNIP]
I count 20 cycles.
First, just so things are clear here's the C-code:
static const unsigned int pow10_array[5] = {1, 0x0a, 0x64, 0x3e8, 0x2710};
unsigned int POW10(unsigned int power)
{
if(power<5)
return(pow10_array[power]);
return(0xffff); /* error */
}
At the risk of being embarrassed by Dmitry, how about this:
movwf rC
incf rC,f
btfsc rC,2
goto greater_than_2
movwf rC
movlw 1 ;Assume 10^0
; at this point, the bit pattern in rC is
; either 00, 01, or 10
btfsc rC,0 ;if bit 0 is set then
movlw 0x0a ;10^1
btfsc rC,1 ;if bit 1 is set then
movlw 0x64 ;10^2
clrf rC ;for all three cases, rC is zero
movwf rD
return
greater_than_2:
; at this point, the bit pattern in rC is
; either 100 (10^3) or 101 (10^4)
movlw 0x27 ;assume 10^4
btfsc rC,0 ;bit 0 is set only for 10^4
goto ten_to_the_4th
movlw 3 ;high byte of 0x03e8
movwf rC
movlw 0xe8
movwf rD
return
ten_to_the_4th
movwf rC
movlw 0x10
movwf rD
return
14 isochronous cycles. untested
You can save an instruction if you're willing to use an addlw:
addlw -3
skpnc
goto greater_than_2
;W = 0xfd, 0xfe, or 0xff
movwf rC
movlw 0x64 ;Assume 10^2
; at this point, the bit pattern in rC is
; either 11111101, 11111110, or 11111111
btfss rC,0 ;if bit 0 is clear then
movlw 0x0a ;10^1
btfss rC,1 ;if bit 1 is clear then
movlw 1 ;10^0
clrf rC ;for all three cases, rC is zero
movwf rD
return
greater_than_2
movwf rC
; at this point, the bit pattern in rC is
; either 00 (10^3) or 01 (10^4)
movlw 0x27 ;assume 10^4
btfsc rC,0
goto ten_to_the_4th
bsf rC,1
bsf rC,0 ;rC = 3, the high byte of 0x03e8
movlw 0xe8
movwf rD
return
ten_to_the_4th
movwf rC
movlw 0x10
movwf rD
return
13 cycles isochronous. untested...
There's probably another trick or two hiding in there
Scott
1998\12\13@213810
by
Regulus Berdin
Hi Sean and Scott,
Here is a 10 cycle version (untested):
pow10 addwf PCL,f
goto p0
goto p1
goto p2
goto p3
p4 goto $+1
movlw 0x27
movwf rC
movlw 0x10
movwf rD
return
p0 clrf rC
nop
movlw 1
movwf rD
return
p1 clrf rC
nop
movlw 0x0A
movwf rD
return
p2 clrf rC
nop
movlw 0x64
movwf rD
return
p3 movlw 3
movwf rC
movlw 0xE8
movwf rD
return
This routine is fast but consumes more code space.
regards,
Reggie
1998\12\13@214430
by
Regulus Berdin
Sean Breheny wrote:
> Thanks! If I did the calculation right, your changes save 2 words and 1 cycle.
Actually 2 cycles. The first part is 10 cycles while my first routine is
8 cycles.
> Anybody have further optimizations??
See Scott's and my 2nd posting. It has only 10 cycles but consumes more
code space.
regards,
Reggie
1998\12\13@214434
by
Sean Breheny
Hi Scott,
Thanks very much! More great optimization! I just have a couple of
questions if you don't mind:
At 05:59 PM 12/13/98 -0800, you wrote:
[SNIP]
>I count 20 cycles.
I assume you are including the (unlisted) call pow10 instruction which
calls the routine? If you aren't, I count only 18 cycles to my original
routine.
>14 isochronous cycles. untested
>13 cycles isochronous. untested...
So the word is "isochronous"? I think I have seen several versions and I
have always wondered which was correct. Your version sounds more correct
than the "isoSYNchronous" that I used in my original post.
Thanks again,
Sean
1998\12\13@215920
by
Sean Breheny
Hi Reggie and Scott,
yeah, I was wrong on the cycles. I was counting ADDWF PCL,F as only one cycle.
Thanks,
Sean
1998\12\13@222219
by
Scott Dattalo
On Mon, 14 Dec 1998, Regulus Berdin wrote:
>
> Here is a 10 cycle version (untested):
which can be shortened 1 cycle
>
> pow10 addwf PCL,f
> goto p0
> goto p1
> goto p2
> goto p3
>
>; p4 goto $+1
p4 nop
> movlw 0x27
> movwf rC
> movlw 0x10
> movwf rD
> return
>
> p0 clrf rC
>; nop
> movlw 1
> movwf rD
> return
>
> p1 clrf rC
>; nop
> movlw 0x0A
> movwf rD
> return
>
> p2 clrf rC
>; nop
> movlw 0x64
> movwf rD
> return
>
>; p3 movlw 3
>p3 movwf rC
> movlw 0xE8
> movwf rD
> return
>
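The cycle count of the shortened jump-table version can be tallied per path. The Python sketch below is a bookkeeping aid, not PIC code; it assumes the usual mid-range PIC timing of one cycle per instruction and two cycles for anything that changes the program counter, and the names PATHS and path_cycles are hypothetical. The call itself is not counted.

```python
# Instructions that modify the program counter take 2 cycles; all others take 1.
TWO_CYCLE = {"addwf PCL,f", "goto", "return"}

# Instruction sequence executed for each input power, shortened version.
PATHS = {
    0: ["addwf PCL,f", "goto", "clrf", "movlw", "movwf", "return"],
    1: ["addwf PCL,f", "goto", "clrf", "movlw", "movwf", "return"],
    2: ["addwf PCL,f", "goto", "clrf", "movlw", "movwf", "return"],
    3: ["addwf PCL,f", "goto", "movwf", "movlw", "movwf", "return"],
    # For W=4 the computed jump lands directly on p4, past the four gotos.
    4: ["addwf PCL,f", "nop", "movlw", "movwf", "movlw", "movwf", "return"],
}


def path_cycles(ops):
    """Sum the cycle cost of one execution path."""
    return sum(2 if op in TWO_CYCLE else 1 for op in ops)


CYCLES = {power: path_cycles(ops) for power, ops in PATHS.items()}

if __name__ == "__main__":
    print(CYCLES)
```

Every path comes to the same total, which is what keeps the routine isochronous after the nops are dropped.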
1998\12\14@000820
by
Mike Keitz
On Sat, 12 Dec 1998 16:23:50 -0500 Sean Breheny writes:
>Hi all,
>
>Here is my first completely PIC posting in a long time!
>
>A while back I wrote a routine to give 16 bit powers of ten.
>and together they form a single 16-bit register
>pair,
>with rC being the MSB.
There is probably a better way to do what you're trying to do (overall)
than storing powers of 10 in a table. But sometimes a table of 16-bit or
larger values is necessary. Usually for multiple-precision values I
build a single table, with multiple consecutive bytes per entry. That
keeps all the data bytes associated with each value close to each other.
It is fairly simple to multiply the index by 2,3,4 etc. to get the
address of the first byte, then add 1 to get the next bytes. Here's a
16-bit table of the first 5 powers of 10 and a way to access it, starting
with a number from 0 to 4 in W.
movwf rC ;Store index for later
clrc
rlf rC,f ;rC = index * 2
movlw low(tblpow10) ;Add start of table
addwf rC,f ; to the index. rC = PCL for LSB of entry
; [Note if low(tblpow10) is known to be even, the computation above can be
; considerably simplified *]
movlw high(tblpow10)
movwf PCLATH ;Set up PCLATH to match the table.
; [Table can't cross 256-byte boundary!]
movfw rC ;Address of LSB
call gettbl ;Look up LSB
movwf rD ;Store in RAM.
incf rC,w ;Address of MSB is address of LSB + 1.
call gettbl ;Look up MSB
movwf rC ;Store MSB
tblpow10
dt 0x01,0x00 ;10^0 = 1
dt 0x0A,0x00 ;10^1 = 10
dt 0x64,0x00 ;10^2 = 100
dt 0xE8,0x03 ;10^3 = 1000
dt 0x10,0x27 ;10^4 = 10000
gettbl
movwf PCL
Note that the gettbl routine can be anywhere in a 2K block of memory, and
shared to access all tables in the 2K block. I like to place it in the
usually unused space between the reset-vector goto at address 0 and the
ISR at address 4.
* If the table is guaranteed to start at an even address, the address of
the desired element can be computed this way, replace the first 5
instructions in the routine with these 3:
addlw low(tblpow10) / 2 ;Add start address, compensating
movwf rC ; for subsequent * 2.
rlf rC,f ;Guaranteed that C=0 from the addlw.
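The interleaved layout and the even-address shortcut can be illustrated with a short Python model. This is a sketch only: TBL, lookup16 and entry_address are hypothetical names, the table base 0x40 in the usage line is an arbitrary example, and the model assumes the whole table sits inside one 256-word page.

```python
# One table, two consecutive bytes per entry: LSB first, then MSB.
TBL = bytes([0x01, 0x00,   # 10^0
             0x0A, 0x00,   # 10^1
             0x64, 0x00,   # 10^2
             0xE8, 0x03,   # 10^3
             0x10, 0x27])  # 10^4


def lookup16(index):
    """Fetch a 16-bit entry: offset = index * 2, MSB lives at offset + 1."""
    offset = index * 2
    lsb = TBL[offset]
    msb = TBL[offset + 1]
    return (msb << 8) | lsb


def entry_address(base, index):
    """Even-base shortcut: add base/2 first, then double the sum once."""
    if base % 2:
        raise ValueError("shortcut only works for an even table address")
    return ((base // 2) + index) << 1


if __name__ == "__main__":
    print([lookup16(i) for i in range(5)])
    print(hex(entry_address(0x40, 3)))  # same as 0x40 + 2*3
```

The shortcut works because doubling (base/2 + index) gives base + 2*index exactly when base is even.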
Again, it begs the question of why are you storing powers of 10 in the
first place. If it is for BCD conversions, the usual optimized ways
don't need stored powers of 10.
1998\12\14@002302
by
Sean Breheny
Hi Mike,
At 10:03 AM 12/13/98 -0500, you wrote:
>Again, it begs the question of why are you storing powers of 10 in the
>first place. If it is for BCD conversions, the usual optimized ways
>don't need stored powers of 10.
Yes, I am doing it for BCD conversion, overall, and I realize that there
are better ways of doing it (in fact, I asked the list about that very
question about a month or two ago). For my application, neither speed nor
size is critical (hence I didn't bother to use a better method, suggested
by some list members), but I just wanted to try to see how well optimized a
part of my code was, and learn about optimizing in general by posting it to
the list and seeing what people came up with.
Incidentally, while I'm sure that the better ways of doing binary to BCD
conversion are much smaller and somewhat faster, my routine is neither
unacceptably slow nor large. Even with my bulky BCD routine and 16 bit add
and subtract routines included, it is 60 words long (for 16 bit binary to
BCD).
Thanks,
Sean
1998\12\14@011533
by
Dmitry Kiryashov
Bravo Scott !
Looks maximally optimized; speeding it up any more seems impossible...
WBR Dmitry.
1998\12\14@141143
by
Adriano De Minicis
Here is another little optimization to Scott's code.
I admit it is not elegant, but Sean wanted speed/compact code... :-)
I modified the routine to return the low byte in W and not in rD.
This way I could substitute the group MOVLW x / MOVWF rD / RETURN
with RETLW x, and add a MOVWF rD outside the routine (if needed).
The space is reduced from 27 to 17 words, and the execution time
from a total of 11 to 9 cycles (including CALL).
That's 10 words and 2 cycles savings!
But you may argue "I need the result in rD"!
OK, just add a MOVWF rD in the calling place (after CALL POW10).
This wastes a word and a cycle, but it is still faster than Scott's
code (10 cycles including CALL and MOVWF).
Size: 17 words + 1 extra word for each calling point.
Adriano
; POW10
; Given W, where 0<=W<=4
; outputs rC,W = 10^W
; destroys rC,W,Flags
; 7 cycles (9 including call), 17 words, untested
;
; NOTE:
; Result is in rC,W. To have result in rC,rD just add a "movwf rD"
; in the main code after the call.
; call POW10
; movwf rD
; (Total 10 cycles, included call and movwf rD)
pow10 addwf PCL,f
goto p0
goto p1
goto p2
goto p3
p4 movlw 0x27
movwf rC
nop
retlw 0x10
p0 clrf rC
retlw 0x01
p1 clrf rC
retlw 0x0A
p2 clrf rC
retlw 0x64
p3 movwf rC ; W=3
retlw 0xE8
1998\12\14@194425
by
Regulus Berdin
Hi Adriano,
This can also be done with 2 fewer words of code space:
pow10 movwf rD
addwf rD,w ;rD*2
addwf PCL,f
p0 clrf rC
retlw 0x01
p1 clrf rC
retlw 0x0A
p2 clrf rC
retlw 0x64
p3 movwf rC ; W=3
retlw 0xE8
p4 movlw 0x27
movwf rC
nop
retlw 0x10
regards,
Reggie
1998\12\14@202639
by
Regulus Berdin
Hi,
My previous post was wrong. Should have been:
pow10 movwf rC
addwf rC,w
addwf PCL,f
p0 clrf rC
retlw 0x01
p1 clrf rC
retlw 0x0A
p2 clrf rC
retlw 0x64
p3 nop ;rC=3
retlw 0xE8
p4 movlw 0x27
movwf rC
nop
retlw 0x10
regards,
Reggie
1998\12\14@204022
by
Dmitry Kiryashov
Hello Regulus.
Looks very compact & nice ;-)
p4 branch is still not squeezing more...
10.2 cycles on average per execution.
WBR Dmitry.
> This can be done also by (2 code space less):
>
> pow10 movwf rD
> addwf rD,w ;rD*2
Is it OK to use a port as a temporary cell?
A temp cell in memory would probably be better.
> addwf PCL,f
> p0 clrf rC
> retlw 0x01
> p1 clrf rC
> retlw 0x0A
> p2 clrf rC
> retlw 0x64
> p3 movwf rC ; W=3
> retlw 0xE8
> p4 movlw 0x27
> movwf rC
;;;; nop
> retlw 0x10
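The 10.2 figure is consistent with the tally below. The assumptions are not stated in the post: all five inputs are taken as equally likely, the nop in p4 is removed, and the count includes the call and the caller's movwf rD. The Python names COST, PREFIX, BRANCH and total_cycles are hypothetical.

```python
# Cycle cost per instruction: 2 when the program counter changes, else 1.
COST = {"call": 2, "addwf PCL,f": 2, "retlw": 2,
        "movwf": 1, "addwf": 1, "clrf": 1, "movlw": 1}

# Executed on every call: call, movwf temp, addwf temp,w, addwf PCL,f.
PREFIX = ["call", "movwf", "addwf", "addwf PCL,f"]

# Per-power branch bodies, with the nop in p4 commented out.
BRANCH = {
    0: ["clrf", "retlw"],
    1: ["clrf", "retlw"],
    2: ["clrf", "retlw"],
    3: ["movwf", "retlw"],
    4: ["movlw", "movwf", "retlw"],
}

# movwf rD executed by the caller after the routine returns.
SUFFIX = ["movwf"]


def total_cycles(power):
    """Cycles for one complete call with the given power."""
    ops = PREFIX + BRANCH[power] + SUFFIX
    return sum(COST[op] for op in ops)


AVERAGE = sum(total_cycles(p) for p in range(5)) / 5

if __name__ == "__main__":
    print([total_cycles(p) for p in range(5)], AVERAGE)
```

Under these assumptions powers 0 to 3 cost 10 cycles each and power 4 costs 11.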
1998\12\14@222910
by
Scott Dattalo
On Tue, 15 Dec 1998, Regulus Berdin wrote:
[SNIP]
I saw that too, BUT isochronicity (is that a word?) is lost. Also,
there's a small error with case p3 (W is 6 not 3 when it is stored into
rC). But if you're willing to sacrifice isochronicity then you might as
well get rid of the nop:
pow10 movwf rC
addwf rC,w ;rC*2
addwf PCL,f
p0 clrf rC
retlw 0x01
p1 clrf rC
retlw 0x0A
p2 clrf rC
retlw 0x64
p3 nop ;rC is 3
retlw 0xE8
p4 movlw 0x27
movwf rC
retlw 0x10
call pow10
movwf rD
But it takes 11 cycles for the 4th power (as opposed to 10 for Adriano's
solution)
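The p3 error can be traced step by step. The sketch below is a Python model of the register contents, not PIC code; the function names first_attempt_p3 and corrected_p3 are made up for illustration.

```python
def first_attempt_p3(index=3):
    """Index saved in rD, then 'movwf rC' at p3 stores the doubled index."""
    w = index
    rd = w                    # movwf rD
    w = (w + rd) & 0xFF       # addwf rD,w  -> W = 2 * index
    rc = w                    # p3: movwf rC stores 6, not 3
    return (rc << 8) | 0xE8   # retlw 0xE8 supplies the low byte


def corrected_p3(index=3):
    """Index saved in rC itself, so p3 needs only a nop."""
    w = index
    rc = w                    # movwf rC
    w = (w + rc) & 0xFF       # addwf rC,w  -> W = 2 * index, rC unchanged
    # p3: nop                 # rC already holds 3
    return (rc << 8) | 0xE8


if __name__ == "__main__":
    print(hex(first_attempt_p3()), hex(corrected_p3()))
```

The first form produces 0x06E8 for 10^3, while keeping the index in rC gives the intended 0x03E8.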