Calculating x^2.
Archive:
Summary:
; assumes W contains a 4 bit number, and squares it.
; by John Payson
Sqr4:
addwf PC
db 0,1,4,9,16,25,36,49,64,81,100,121,144,169,196,225
(a+b)^2 = a^2 + b^2 + 2ab
Thus, any multiplication could be performed as three squaring operations, a subtraction, and a shift."
a*b = t(a+b) - ( t(a) + t(b) )
; calculate t(W), for 0 <= W <= 22
; by David Cary
get_t_w:
PAGESEL $+2
addwf PCL,f
db 0,1,3,6, H'0A', H'0F', H'15', H'1C', H'24', H'2D'
db H'37', H'42', H'4E', H'5B', H'69', H'78'
db H'88', H'99', H'AB', H'BE', H'D2', H'E7', H'FD'
Stuart Taylor suggested starting with a standard 16x16 bit multiply and trimming it down to a 10x10 bit multiply.
Keith Dowsett suggested "break it down into three simpler multiplications:
(a+b)^2 = a^2 + b^2 + 2ab
where b is the bottom 8 bits of the number, a is the top 2 bits, and use a conventional 8x8 bit multiply for b^2, and (for speed) use a customized 2x8 bit multiply for 2ab, and a 4 entry lookup table for a^2.
N = a*32 + b
a = 5 msb's
b = 5 lsb's
Now using Keith's observation:
N^2 = 32^2 * a^2 + 64*a*b + b^2
= (a^2)<<10 + (a*b)<<5 + b^2
which can be evaluated with one general-purpose 5x5 bit multiply and a lookup table for squaring 5 bit numbers.
Then Scott wonders if breaking the number into 3 nybbles would be even faster:
N = a*256 + b*16 + c
N^2 = (a*256)^2 + (b*16)^2 + c^2 +
2*(a*256*b*16 + a*256*c + b*16*c)
= (a^2)<<16 + (b^2)<<8 + c^2 +
((a*b)<<12 + (a*c)<<8 + b*c<<4))<<1
(See 4x4 bit multiply )
Andrew Warren speculates that a standard 10-bit x 10-bit multiplication will turn out to be faster than any of the complicated schemes above.
A 10x10 multiply IS faster than 16x16, {for calculating a square} mostly because the result is guaranteed to fit in three bytes rather than 4. If you don't have the code space to unroll the whole multiplication into inline code (as I showed in my "Solution #1"), looped code can still be made pretty fast by writing three loops:One for the first few bits of the multiplier, where the result of each addition of the multiplicand to the intermediate result is guaranteed to fit in the least-significant two bytes,
one for the next group of bits, where the addition may affect all three bytes of the result, and
one for the final few bits, where the addition is guaranteed to only affect the two most-significant bytes of the result.
John Payson implemented Andy David's tips, in a completely unrolled loop (sacrificing code space for speed) See the debugged version
; [warning: untested]
; by John Payson 1996-11-05
clrf DstH
clrf DstM
clrf DstL
movf SrcL,w
andlw $0F
btfss Src,4
addwf DstM
rrf DstM
rrf DstL
btfss Src,5
addwf DstM
rrf DstM
rrf DstL
btfss Src,6
addwf DstM
rrf DstM
rrf DstL
btfss Src,7
addwf DstM
call Sqr4
addwf DstL
swapf SrcL
andlw $0F
call Sqr4
addwf DstM
; At this point, 16-bit result is in DstM:DstH
; 25 words of code prior to this point (plus a
; 17-word table-lookup). Total execution time:
; 35 cycles up to this point.
btfss SrcH,0
goto NoBit8
movf SrcL,w
btfsc C
incf DstH
addwf DstM
btfsc C
incf DstH
incf DstH
; Another 9 words for bit 8; 3 or 9 cycles to exec.
NoBit8:
btfss SrcH,1
goto NoBit9
movlw 4
btfss SrcH,0
movlw 8
addwf DstH
rlf SrcL,w
btfsc C
incf DstH
btfsc C
incf DstH
addwf DstM
btfsc C
incf DstH
addwf DstM
btfsc C
incf DstH
; Another 17 words for bit 9; 3 or 17 cycles to execute
; Total worst-case time: 35+26 = 61 cycles.
NoBit9:
retlw 0 ; All done!
Sqr4:
addwf PC
db 0,1,4,9,16,25,36,49,64,81,100,121,144,169,196,225
Questions:
| file: /Techref/microchip/math/sq/index.htm, 7KB, , updated: 2010/4/14 23:14, local time: 2025/10/24 09:52,
216.73.216.53,10-2-207-162:LOG IN
|
| ©2025 These pages are served without commercial sponsorship. (No popup ads, etc...).Bandwidth abuse increases hosting cost forcing sponsorship or shutdown. This server aggressively defends against automated copying for any reason including offline viewing, duplication, etc... Please respect this requirement and DO NOT RIP THIS SITE. Questions? <A HREF="http://techref.massmind.org/techref/microchip/math/sq/index.htm"> PIC Microcontoller Basic Math Square Methods</A> |
| Did you find what you needed? |
Welcome to massmind.org! |
Welcome to techref.massmind.org! |
.