It is currently Wed Dec 13, 2017 2:16 am

All times are UTC - 7 hours





Post new topic Reply to topic  [ 31 posts ]  Go to page 1, 2, 3  Next
Author Message
PostPosted: Sat Jun 14, 2014 9:50 am 
Offline
User avatar

Joined: Tue Jun 10, 2014 8:15 pm
Posts: 35
Location: Canada
I have written a number of division routines in 6502 assembly, and I'm posting them here for other people to use. :)

These routines start with any value (0-255) in the accumulator and finish with the integer division result in the accumulator. They are all constant cycles and do not use X or Y. Most do require 1 temp register. I've listed all divisions from 2 to 32 below, including the trivial divide by powers of 2 cases (for sake of completion).

Code:
; Unsigned Integer Division Routines
; by Omegamatrix


;Divide by 2
;1 byte, 2 cycles
  lsr


;Divide by 3
;18 bytes, 30 cycles
  sta  temp
  lsr
  adc  #21
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr


;Divide by 4
;2 bytes, 4 cycles
  lsr
  lsr


;Divide by 5
;18 bytes, 30 cycles
  sta  temp
  lsr
  adc  #13
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr


;Divide by 6
;17 bytes, 30 cycles
  lsr
  sta  temp
  lsr
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr


;Divide by 7 (From December '84 Apple Assembly Line)
;15 bytes, 27 cycles
  sta  temp
  lsr
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr


;Divide by 8
;3 bytes, 6 cycles
  lsr
  lsr
  lsr


;Divide by 9
;17 bytes, 30 cycles
  sta  temp
  lsr
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  lsr


;Divide by 10
;17 bytes, 30 cycles
  lsr
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr


;Divide by 11
;20 bytes, 35 cycles
  sta  temp
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr


;Divide by 12
;17 bytes, 30 cycles
  lsr
  lsr
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr


; Divide by 13
; 21 bytes, 37 cycles
  sta  temp
  lsr
  adc  temp
  ror
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  clc
  adc  temp
  ror
  lsr
  lsr
  lsr


;Divide by 14
;1/14 = 1/7 * 1/2
;16 bytes, 29 cycles
  sta  temp
  lsr
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr


;Divide by 15
;14 bytes, 24 cycles
  sta  temp
  lsr
  adc  #4
  lsr
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr


;Divide by 16
;4 bytes, 8 cycles
  lsr
  lsr
  lsr
  lsr


;Divide by 17
;18 bytes, 30 cycles
  sta  temp
  lsr
  adc  temp
  ror
  adc  temp
  ror
  adc  temp
  ror
  adc  #0
  lsr
  lsr
  lsr
  lsr


;Divide by 18 = 1/9 * 1/2
;18 bytes, 32 cycles
  sta  temp
  lsr
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


;Divide by 19
;17 bytes, 30 cycles
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


;Divide by 20
;18 bytes, 32 cycles
  lsr
  lsr
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr


;Divide by 21
;20 bytes, 36 cycles
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


;Divide by 22
;21 bytes, 34 cycles
   lsr
   cmp  #33
   adc  #0
   sta  temp
   lsr
   adc  temp
   ror
   adc  temp
   ror
   lsr
   adc  temp
   ror
   lsr
   lsr
   lsr


;Divide by 23
;19 bytes, 34 cycles
  sta  temp
  lsr
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


;Divide by 24
;15 bytes, 27 cycles
   lsr
   lsr
   lsr
   sta   temp
   lsr
   lsr
   adc   temp
   ror
   lsr
   adc   temp
   ror
   lsr


;Divide by 25
;16 bytes, 29 cycles
  sta  temp
  lsr
  lsr
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


;Divide by 26
;21 bytes, 37 cycles
  lsr
  sta  temp
  lsr
  adc  temp
  ror
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr


;Divide by 27
;15 bytes, 27 cycles
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


;Divide by 28
;14 bytes, 24 cycles
  lsr
  lsr
  sta  temp
  lsr
  adc  #2
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr


;Divide by 29
;20 bytes, 36 cycles
  sta  temp
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


;Divide by 30
;14 bytes, 26 cycles
  sta  temp
  lsr
  lsr
  lsr
  lsr
  sec
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


;Divide by 31
;14 bytes, 26 cycles
  sta  temp
  lsr
  lsr
  lsr
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


;Divide by 32
  lsr
  lsr
  lsr
  lsr
  lsr



Last edited by Omegamatrix on Sun Jun 15, 2014 3:50 pm, edited 4 times in total.

Top
 Profile  
 
PostPosted: Sat Jun 14, 2014 10:17 am 
Offline
User avatar

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 5893
Location: Canada
Cool!


Top
 Profile  
 
PostPosted: Sat Jun 14, 2014 11:44 am 
Offline
User avatar

Joined: Wed Apr 02, 2008 2:09 pm
Posts: 1046
Didn't want to continue off topic in the other thread, but cool! The latest divide by 5 was in the atariage thread, but I do wish I had seen more of the others. Might as well say what I'm using them for. My game detects 49 slope heights. (flat, and 24 facing downward left, 24 facing downward right) I did the trig for the angles of those heights to find what percentage of the speed should be kept. Then I used whatever divide routine was the closest to what it would really be out of the ones I could find.

1/2 (0.5)
1/4(0.25)
1/8(0.125)
etc are already accounted for obviously.
Then
3/4(.75)
7/8(0.875)
Because you can do the divide, then subtract it from the original. (In fact I do this for 1/2, because I wanted a speed of 1 to not be scaled to zero.) You can do 2/5 with divide by 5 asl etc.I could probably get a little closer with the other divides, but the biggest difference is like 0.04 and I've already gotten used to the feel of the game as is.

I'm also using the divide by 5 for a bouncing ball that decays in strength. I looked up the coefficient of restitution for a basketball bouncing on concrete and it was .88ish. So I tried original-original/10, but that felt wrong to interact with in game, so now it's original-original/5.

_________________
https://kasumi.itch.io/indivisible


Top
 Profile  
 
PostPosted: Sat Jun 14, 2014 12:15 pm 
Offline

Joined: Wed Jul 30, 2008 12:03 am
Posts: 32
Omegamatrix wrote:
I have written a number of division routines in 6502 assembly, and I'm posting them here for other people to use. :)

These routines start with any value (0-255) in the accumulator and finish with the integer division result in the accumulator. They are all constant cycles and do not use X or Y. Most do require 1 temp register. I've listed all divisions from 2 to 32 below, including the trivial divide by powers of 2 cases (for sake of completion).



I always thought it was a pity that you didn't keep
(or maybe even look for) routines of less accuracy.
eg if you know your dividend will never be >40
you might get by with less code.


Top
 Profile  
 
PostPosted: Sat Jun 14, 2014 12:50 pm 
Offline
User avatar

Joined: Mon Feb 07, 2011 12:46 pm
Posts: 941
Might you add some for calculating modulo as well? As well as 16-bit routines?

_________________
.


Top
 Profile  
 
PostPosted: Sat Jun 14, 2014 1:10 pm 
Offline
User avatar

Joined: Tue Jun 10, 2014 8:15 pm
Posts: 35
Location: Canada
bogax wrote:
Omegamatrix wrote:
I have written a number of division routines in 6502 assembly, and I'm posting them here for other people to use. :)

These routines start with any value (0-255) in the accumulator and finish with the integer division result in the accumulator. They are all constant cycles and do not use X or Y. Most do require 1 temp register. I've listed all divisions from 2 to 32 below, including the trivial divide by powers of 2 cases (for sake of completion).



I always thought it was a pity that you didn't keep
(or maybe even look for) routines of less accuracy.
eg if you know your dividend will never be >40
you might get by with less code.

Funny you should mention that, as I was just staring at the routines I posted and it occurred to me that in some cases I might be able to go quicker by doing a division up front to reduce the sum, and follow with a cruder approximation second. As simple as that sound it never occurred to me when I wrote them.


So... I was immediately able to make five of the routines better. :D I am going to update the first post but in summary here are the updated routines:

Old:
Code:
;Divide by 6
;1/6 = 1/3 * 1/2
;19 bytes, 32 cycles
  sta  temp
  lsr
  adc  #21
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  lsr

;Divide by 10
;1/10 = 1/5 * 1/2
;19 bytes, 32 cycles
  sta  temp
  lsr
  adc  #13
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  lsr

;Divide by 20
;1/20 = 1/5 * 1/4
;20 bytes, 34 cycles
  sta  temp
  lsr
  adc  #13
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr

;Divide by 24
;1/24 = 1/3 * 1/8
;21 bytes, 36 cycles
  sta  temp
  lsr
  adc  #21
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr

;Divide by 26
;1/26 = 1/13 * 1/2
;22 bytes, 39 cycles
  sta  temp
  lsr
  adc  temp
  ror
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  clc
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr



And new:
Code:
;Divide by 6
;17 bytes, 30 cycles
  lsr
  sta  temp
  lsr
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr

;Divide by 10
;17 bytes, 30 cycles
  lsr
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr

;Divide by 20
;18 bytes, 32 cycles
  lsr
  lsr
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr

;Divide by 24
;15 bytes, 27 cycles
   lsr
   lsr
   lsr
   sta   temp
   lsr
   lsr
   adc   temp
   ror
   lsr
   adc   temp
   ror
   lsr

;Divide by 26
;21 bytes, 37 cycles
  lsr
  sta  temp
  lsr
  adc  temp
  ror
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr



I am very happy that a new solution for divide by 10 has been found. :)


Edit - Divide by 12 has also been superseded!
Double Edit - Divide by 28 too!

old:
Code:
;Divide by 12
;1/12 = 1/3 * 1/4
;20 bytes, 34 cycles
  sta  temp
  lsr
  adc  #21
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr

;Divide by 28
;1/28 = 1/7 * 1/4
;17 bytes, 31 cycles
  sta  temp
  lsr
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


new:
Code:
;Divide by 12
;17 bytes, 30 cycles
  lsr
  lsr
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr

;Divide by 28
;14 bytes, 24 cycles
  lsr
  lsr
  sta  temp
  lsr
  adc  #2
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr


Last edited by Omegamatrix on Sat Jun 14, 2014 2:40 pm, edited 2 times in total.

Top
 Profile  
 
PostPosted: Sat Jun 14, 2014 1:15 pm 
Offline
User avatar

Joined: Tue Jun 10, 2014 8:15 pm
Posts: 35
Location: Canada
Kasumi wrote:
Didn't want to continue off topic in the other thread, but cool! The latest divide by 5 was in the atariage thread, but I do wish I had seen more of the others. Might as well say what I'm using them for. My game detects 49 slope heights. (flat, and 24 facing downward left, 24 facing downward right) I did the trig for the angles of those heights to find what percentage of the speed should be kept. Then I used whatever divide routine was the closest to what it would really be out of the ones I could find.

1/2 (0.5)
1/4(0.25)
1/8(0.125)
etc are already accounted for obviously.
Then
3/4(.75)
7/8(0.875)
Because you can do the divide, then subtract it from the original. (In fact I do this for 1/2, because I wanted a speed of 1 to not be scaled to zero.) You can do 2/5 with divide by 5 asl etc.I could probably get a little closer with the other divides, but the biggest difference is like 0.04 and I've already gotten used to the feel of the game as is.

I'm also using the divide by 5 for a bouncing ball that decays in strength. I looked up the coefficient of restitution for a basketball bouncing on concrete and it was .88ish. So I tried original-original/10, but that felt wrong to interact with in game, so now it's original-original/5.

Kasumi, as much as I do love writing these routines, it makes me feel even more glad to hear from people making use of them! Thank you for sharing and one day hopefully we will all be able to play your game!!


Jeff


Top
 Profile  
 
PostPosted: Sat Jun 14, 2014 1:27 pm 
Offline
User avatar

Joined: Tue Jun 10, 2014 8:15 pm
Posts: 35
Location: Canada
zzo38 wrote:
Might you add some for calculating modulo as well? As well as 16-bit routines?

With modulo, most of the time it should be easy enough to run the division routine, times the result, and subtract from the original sum. Multiplication is generally pretty easy... but sometimes can gobble up a lot of cycles.


I have written a modulo six routine before. I know I posted one on Atariage a long time ago, but IIRC it wasn't correct and I made one a few months ago that was. I will look for it. Modulo six is useful for games simulating dice rolls. I can't remember how good or bad the routine was that I came up with.


For 16-bit routines, I have written one for 10 bit division. It is in my blog here:

http://atariage.com/forums/blog/563/ent ... ide-by-10/

After finding that new divide by 10 today I know that the 16 bit routine can also be improved. In general 16 bit division is neither fast in terms of cycles, and cheap in terms of bytes.


Edit - found my Modulo 6 routine. It looks like I'm just integer dividing by 6, times by 6, and subtracting from the original sum:
Code:
;Mod 6
;28 bytes, 43 cycles
  sta  temp
  lsr
  adc  #21
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  and  #$FC
  sta  temp2
  lsr
  adc  temp2
  sbc  temp
  eor  #$FF


Top
 Profile  
 
PostPosted: Sat Jun 14, 2014 5:22 pm 
Offline

Joined: Sat Jun 14, 2014 2:14 pm
Posts: 6
These are great! I needed a divide by 40 and modulo 40 for a http://en.wikipedia.org/wiki/DEC_Radix-50 decoder in a game I'm writing for the Commodore PET 2001( http://pettil.tumblr.com ) and the divide by 10 here looks like a good start. Thank you!


Top
 Profile  
 
PostPosted: Sat Jun 14, 2014 5:30 pm 
Offline

Joined: Wed May 19, 2010 6:12 pm
Posts: 2424
How does this work?


Top
 Profile  
 
PostPosted: Sat Jun 14, 2014 9:41 pm 
Offline
User avatar

Joined: Tue Jun 10, 2014 8:15 pm
Posts: 35
Location: Canada
chitselb wrote:
These are great! I needed a divide by 40 and modulo 40 for a http://en.wikipedia.org/wiki/DEC_Radix-50 decoder in a game I'm writing for the Commodore PET 2001( http://pettil.tumblr.com ) and the divide by 10 here looks like a good start. Thank you!

The good thing here is that the 6502 was used by so many systems. :)

As you know, once you have divided by 10 then you are just two shifts away from a divide by 40.

Code:
;Divide by 40
;19 bytes, 34 cycles
  lsr
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror 
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


But here is an alternate code. It costs 1 more byte, but saves two cycles... so use whichever one bests suites you.
Code:
;Divide by 40
;20 bytes, 32 cycles
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  #1
  adc  temp
  and  #$C0
  rol
  rol
  rol


And a modulo routine:
Code:
;Mod 40
  sta  temp
  lsr
  adc  #13
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  and  #$E0
  sta  temp2  ;x32
  lsr
  lsr         ;x8
  adc  temp2  ;x40
  sbc  temp
  eor  #$FF


Or both:
Code:
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  #1
  adc  temp
  and  #$C0
  rol
  rol
  rol

  sta  divideResult   ; divide 40 result... TAY, TAX, PHA could be used

  asl
  asl
  asl
  sta  temp2  ;x8
  asl
  asl         ;x32
  adc  temp2  ;x40
  sbc  temp
  eor  #$FF
                      ; mod 40 result


Top
 Profile  
 
PostPosted: Sat Jun 14, 2014 10:01 pm 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19335
Location: NE Indiana, USA (NTSC)
If you're dividing by 10 to make a binary to decimal converter, there are ways to do that other than div/mod.


Top
 Profile  
 
PostPosted: Sat Jun 14, 2014 10:09 pm 
Offline
User avatar

Joined: Tue Jun 10, 2014 8:15 pm
Posts: 35
Location: Canada
psycopathicteen wrote:
How does this work?

You are dividing and adding that result to your original sum, and then repeating the process. The key is find the correct and shortest sequence of divisions you have to do.

I had a simple idea for finding the divisions. I used a brute attack approach running every possible combination in an excel sheet. I only tested for 1/x and ignored truncation... I just looked for division combinations that were the most accurate under perfect circumstances. Those were the routines I focused on testing

From there I built a bunch of tools. I made another excel sheet to test division for all input values (0-255), some C programs to do the same thing as both excel sheets, a rom for the 2600 to test for verification and help solve correction factors for some routines. Whenever you see ADC #xx in one of the routines you are looking at a correction factor I had to introduce. Sometimes I found I could also use just CLC or SEC at certain points in the routine to do the correction.


The usage of the routines is easy. The accumulator starts with the value to be divided, and ends with the integer division result.


Top
 Profile  
 
PostPosted: Sun Jun 15, 2014 12:12 am 
Offline
User avatar

Joined: Tue Jun 10, 2014 8:15 pm
Posts: 35
Location: Canada
Final thoughts on divide by 40 before I go to bed. This is such a big number that you are approaching the limit where it is simply better to do a looped divide... if you don't mind the extra cycles and spending X or Y.


I had another idea, this time using both X and Y. This routine takes 3 more bytes then the combined Div 40 and Mod 40 routine, but it is faster (45 cycles vs 56). It does a comparison and add to get the integer divide. After that I times the result by 4, as I'm reusing the bytes in the comparison as a look up table. The values are already there and nicely spread apart by 4 bytes so why not? I had to add a dummy load at the beginning to get first look up value, that's why the extra LDA #0 is in there.


Code:
;Divide by and Mod 40 combined
;38 bytes, 45 cycles
;Y = value to be divided

InterlacedMultiplyByFortyTable:
  lda  #0        ; dummy load, #0 used in LUT
  lda  #0
  cpy  #40
  adc  #0
  cpy  #80
  adc  #0
  cpy  #120
  adc  #0
  cpy  #160
  adc  #0
  cpy  #200
  adc  #0
  cpy  #240
  adc  #0
 
  sta  divideResult      ; Integer divide 40
 
  asl
  asl
  tax
  tya
  sec
  sbc  InterlacedMultiplyByFortyTable+1,X
                         ; A = Mod 40


Right away I look at this routine and think of putting in some branching to save bytes, but that would destroy the look-up table that's built in. It's a neat if not convoluted routine on its own... I'm just not to sure if it is really useful, but I like to be creative. :lol:


Top
 Profile  
 
PostPosted: Sun Jun 15, 2014 2:48 am 
Offline
User avatar

Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7314
Location: Chexbres, VD, Switzerland
The most useful in the case of the NES is probably divide-by-15, as a nametable can show 15 2x2 metatiles vertically, so you'll need to divide by 15 before knowing where to write your metatile when scrolling vertically. (yes I know there are workaround this, but still this is one of the ways to do this).

As for divide by power of two, you don't have to remind us they're shifts, I think everyone already knowns.


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 31 posts ]  Go to page 1, 2, 3  Next

All times are UTC - 7 hours


Who is online

Users browsing this forum: Drakim, lazygecko and 7 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB® Forum Software © phpBB Group