Unsigned Integer Division Routines

Discuss technical or other issues relating to programming the Nintendo Entertainment System, Famicom, or compatible systems. See the NESdev wiki for more information.

Moderator: Moderators

User avatar
Omegamatrix
Posts: 35
Joined: Tue Jun 10, 2014 8:15 pm
Location: Canada

Unsigned Integer Division Routines

Post by Omegamatrix »

I have written a number of division routines in 6502 assembly, and I'm posting them here for other people to use. :)

These routines start with any value (0-255) in the accumulator and finish with the integer division result in the accumulator. They are all constant cycles and do not use X or Y. Most do require 1 temp register. I've listed all divisions from 2 to 32 below, including the trivial divide by powers of 2 cases (for sake of completion).

Code: Select all

; Unsigned Integer Division Routines
; by Omegamatrix


;Divide by 2
;1 byte, 2 cycles
  lsr


;Divide by 3
;18 bytes, 30 cycles
  sta  temp
  lsr
  adc  #21
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr


;Divide by 4
;2 bytes, 4 cycles
  lsr
  lsr


;Divide by 5
;18 bytes, 30 cycles
  sta  temp
  lsr
  adc  #13
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr


;Divide by 6
;17 bytes, 30 cycles
  lsr
  sta  temp
  lsr
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr


;Divide by 7 (From December '84 Apple Assembly Line)
;15 bytes, 27 cycles
  sta  temp
  lsr
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr


;Divide by 8
;3 bytes, 6 cycles
  lsr
  lsr
  lsr


;Divide by 9
;17 bytes, 30 cycles
  sta  temp
  lsr
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  lsr


;Divide by 10
;17 bytes, 30 cycles
  lsr
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr


;Divide by 11
;20 bytes, 35 cycles
  sta  temp
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr


;Divide by 12
;17 bytes, 30 cycles
  lsr
  lsr
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr


; Divide by 13
; 21 bytes, 37 cycles
  sta  temp
  lsr
  adc  temp
  ror
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  clc
  adc  temp
  ror
  lsr
  lsr
  lsr


;Divide by 14
;1/14 = 1/7 * 1/2
;16 bytes, 29 cycles
  sta  temp
  lsr
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr


;Divide by 15
;14 bytes, 24 cycles
  sta  temp
  lsr
  adc  #4
  lsr
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr


;Divide by 16
;4 bytes, 8 cycles
  lsr
  lsr
  lsr
  lsr


;Divide by 17
;18 bytes, 30 cycles
  sta  temp
  lsr
  adc  temp
  ror
  adc  temp
  ror
  adc  temp
  ror
  adc  #0
  lsr
  lsr
  lsr
  lsr


;Divide by 18 = 1/9 * 1/2
;18 bytes, 32 cycles
  sta  temp
  lsr
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


;Divide by 19
;17 bytes, 30 cycles
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


;Divide by 20
;18 bytes, 32 cycles
  lsr
  lsr
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr


;Divide by 21
;20 bytes, 36 cycles
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


;Divide by 22
;21 bytes, 34 cycles
   lsr
   cmp  #33
   adc  #0
   sta  temp
   lsr
   adc  temp
   ror
   adc  temp
   ror
   lsr
   adc  temp
   ror
   lsr
   lsr
   lsr


;Divide by 23
;19 bytes, 34 cycles
  sta  temp
  lsr
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


;Divide by 24
;15 bytes, 27 cycles
   lsr
   lsr
   lsr
   sta   temp
   lsr
   lsr
   adc   temp
   ror
   lsr
   adc   temp
   ror
   lsr


;Divide by 25
;16 bytes, 29 cycles
  sta  temp
  lsr
  lsr
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


;Divide by 26
;21 bytes, 37 cycles
  lsr
  sta  temp
  lsr
  adc  temp
  ror
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr


;Divide by 27
;15 bytes, 27 cycles
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


;Divide by 28
;14 bytes, 24 cycles
  lsr
  lsr
  sta  temp
  lsr
  adc  #2
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr


;Divide by 29
;20 bytes, 36 cycles
  sta  temp
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


;Divide by 30
;14 bytes, 26 cycles
  sta  temp
  lsr
  lsr
  lsr
  lsr
  sec
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


;Divide by 31
;14 bytes, 26 cycles
  sta  temp
  lsr
  lsr
  lsr
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr


;Divide by 32
  lsr
  lsr
  lsr
  lsr
  lsr

Last edited by Omegamatrix on Sun Jun 15, 2014 3:50 pm, edited 4 times in total.
User avatar
rainwarrior
Posts: 8732
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: Unsigned Integer Division Routines

Post by rainwarrior »

Cool!
User avatar
Kasumi
Posts: 1293
Joined: Wed Apr 02, 2008 2:09 pm

Re: Unsigned Integer Division Routines

Post by Kasumi »

Didn't want to continue off topic in the other thread, but cool! The latest divide by 5 was in the atariage thread, but I do wish I had seen more of the others. Might as well say what I'm using them for. My game detects 49 slope heights. (flat, and 24 facing downward left, 24 facing downward right) I did the trig for the angles of those heights to find what percentage of the speed should be kept. Then I used whatever divide routine was the closest to what it would really be out of the ones I could find.

1/2 (0.5)
1/4(0.25)
1/8(0.125)
etc are already accounted for obviously.
Then
3/4(.75)
7/8(0.875)
Because you can do the divide, then subtract it from the original. (In fact I do this for 1/2, because I wanted a speed of 1 to not be scaled to zero.) You can do 2/5 with divide by 5 asl etc.I could probably get a little closer with the other divides, but the biggest difference is like 0.04 and I've already gotten used to the feel of the game as is.

I'm also using the divide by 5 for a bouncing ball that decays in strength. I looked up the coefficient of restitution for a basketball bouncing on concrete and it was .88ish. So I tried original-original/10, but that felt wrong to interact with in game, so now it's original-original/5.
bogax
Posts: 34
Joined: Wed Jul 30, 2008 12:03 am

Re: Unsigned Integer Division Routines

Post by bogax »

Omegamatrix wrote:I have written a number of division routines in 6502 assembly, and I'm posting them here for other people to use. :)

These routines start with any value (0-255) in the accumulator and finish with the integer division result in the accumulator. They are all constant cycles and do not use X or Y. Most do require 1 temp register. I've listed all divisions from 2 to 32 below, including the trivial divide by powers of 2 cases (for sake of completion).
I always thought it was a pity that you didn't keep
(or maybe even look for) routines of less accuracy.
eg if you know your dividend will never be >40
you might get by with less code.
zzo38
Posts: 1096
Joined: Mon Feb 07, 2011 12:46 pm

Re: Unsigned Integer Division Routines

Post by zzo38 »

Might you add some for calculating modulo as well? As well as 16-bit routines?
(Free Hero Mesh - FOSS puzzle game engine)
User avatar
Omegamatrix
Posts: 35
Joined: Tue Jun 10, 2014 8:15 pm
Location: Canada

Re: Unsigned Integer Division Routines

Post by Omegamatrix »

bogax wrote:
Omegamatrix wrote:I have written a number of division routines in 6502 assembly, and I'm posting them here for other people to use. :)

These routines start with any value (0-255) in the accumulator and finish with the integer division result in the accumulator. They are all constant cycles and do not use X or Y. Most do require 1 temp register. I've listed all divisions from 2 to 32 below, including the trivial divide by powers of 2 cases (for sake of completion).
I always thought it was a pity that you didn't keep
(or maybe even look for) routines of less accuracy.
eg if you know your dividend will never be >40
you might get by with less code.
Funny you should mention that, as I was just staring at the routines I posted and it occurred to me that in some cases I might be able to go quicker by doing a division up front to reduce the sum, and follow with a cruder approximation second. As simple as that sound it never occurred to me when I wrote them.


So... I was immediately able to make five of the routines better. :D I am going to update the first post but in summary here are the updated routines:

Old:

Code: Select all

;Divide by 6
;1/6 = 1/3 * 1/2
;19 bytes, 32 cycles
  sta  temp
  lsr
  adc  #21
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  lsr

;Divide by 10
;1/10 = 1/5 * 1/2
;19 bytes, 32 cycles
  sta  temp
  lsr
  adc  #13
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  lsr

;Divide by 20
;1/20 = 1/5 * 1/4
;20 bytes, 34 cycles
  sta  temp
  lsr
  adc  #13
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr

;Divide by 24
;1/24 = 1/3 * 1/8
;21 bytes, 36 cycles
  sta  temp
  lsr
  adc  #21
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr

;Divide by 26
;1/26 = 1/13 * 1/2
;22 bytes, 39 cycles
  sta  temp
  lsr
  adc  temp
  ror
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  clc
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr

And new:

Code: Select all

;Divide by 6
;17 bytes, 30 cycles
  lsr
  sta  temp
  lsr
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr

;Divide by 10
;17 bytes, 30 cycles
  lsr
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr

;Divide by 20
;18 bytes, 32 cycles
  lsr
  lsr
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr

;Divide by 24
;15 bytes, 27 cycles
   lsr
   lsr
   lsr
   sta   temp
   lsr
   lsr
   adc   temp
   ror
   lsr
   adc   temp
   ror
   lsr

;Divide by 26
;21 bytes, 37 cycles
  lsr
  sta  temp
  lsr
  adc  temp
  ror
  adc  temp
  ror
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr

I am very happy that a new solution for divide by 10 has been found. :)


Edit - Divide by 12 has also been superseded!
Double Edit - Divide by 28 too!

old:

Code: Select all

;Divide by 12
;1/12 = 1/3 * 1/4
;20 bytes, 34 cycles
  sta  temp
  lsr
  adc  #21
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr

;Divide by 28
;1/28 = 1/7 * 1/4
;17 bytes, 31 cycles
  sta  temp
  lsr
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr
new:

Code: Select all

;Divide by 12
;17 bytes, 30 cycles
  lsr
  lsr
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr

;Divide by 28
;14 bytes, 24 cycles
  lsr
  lsr
  sta  temp
  lsr
  adc  #2
  lsr
  lsr
  adc  temp
  ror
  lsr
  lsr
Last edited by Omegamatrix on Sat Jun 14, 2014 2:40 pm, edited 2 times in total.
User avatar
Omegamatrix
Posts: 35
Joined: Tue Jun 10, 2014 8:15 pm
Location: Canada

Re: Unsigned Integer Division Routines

Post by Omegamatrix »

Kasumi wrote:Didn't want to continue off topic in the other thread, but cool! The latest divide by 5 was in the atariage thread, but I do wish I had seen more of the others. Might as well say what I'm using them for. My game detects 49 slope heights. (flat, and 24 facing downward left, 24 facing downward right) I did the trig for the angles of those heights to find what percentage of the speed should be kept. Then I used whatever divide routine was the closest to what it would really be out of the ones I could find.

1/2 (0.5)
1/4(0.25)
1/8(0.125)
etc are already accounted for obviously.
Then
3/4(.75)
7/8(0.875)
Because you can do the divide, then subtract it from the original. (In fact I do this for 1/2, because I wanted a speed of 1 to not be scaled to zero.) You can do 2/5 with divide by 5 asl etc.I could probably get a little closer with the other divides, but the biggest difference is like 0.04 and I've already gotten used to the feel of the game as is.

I'm also using the divide by 5 for a bouncing ball that decays in strength. I looked up the coefficient of restitution for a basketball bouncing on concrete and it was .88ish. So I tried original-original/10, but that felt wrong to interact with in game, so now it's original-original/5.
Kasumi, as much as I do love writing these routines, it makes me feel even more glad to hear from people making use of them! Thank you for sharing and one day hopefully we will all be able to play your game!!


Jeff
User avatar
Omegamatrix
Posts: 35
Joined: Tue Jun 10, 2014 8:15 pm
Location: Canada

Re: Unsigned Integer Division Routines

Post by Omegamatrix »

zzo38 wrote:Might you add some for calculating modulo as well? As well as 16-bit routines?
With modulo, most of the time it should be easy enough to run the division routine, times the result, and subtract from the original sum. Multiplication is generally pretty easy... but sometimes can gobble up a lot of cycles.


I have written a modulo six routine before. I know I posted one on Atariage a long time ago, but IIRC it wasn't correct and I made one a few months ago that was. I will look for it. Modulo six is useful for games simulating dice rolls. I can't remember how good or bad the routine was that I came up with.


For 16-bit routines, I have written one for 10 bit division. It is in my blog here:

http://atariage.com/forums/blog/563/ent ... ide-by-10/

After finding that new divide by 10 today I know that the 16 bit routine can also be improved. In general 16 bit division is neither fast in terms of cycles, and cheap in terms of bytes.


Edit - found my Modulo 6 routine. It looks like I'm just integer dividing by 6, times by 6, and subtracting from the original sum:

Code: Select all

;Mod 6
;28 bytes, 43 cycles
  sta  temp
  lsr
  adc  #21
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  lsr
  adc  temp
  ror
  and  #$FC
  sta  temp2
  lsr
  adc  temp2
  sbc  temp
  eor  #$FF
chitselb
Posts: 6
Joined: Sat Jun 14, 2014 2:14 pm

Re: Unsigned Integer Division Routines

Post by chitselb »

These are great! I needed a divide by 40 and modulo 40 for a http://en.wikipedia.org/wiki/DEC_Radix-50 decoder in a game I'm writing for the Commodore PET 2001( http://pettil.tumblr.com ) and the divide by 10 here looks like a good start. Thank you!
psycopathicteen
Posts: 3140
Joined: Wed May 19, 2010 6:12 pm

Re: Unsigned Integer Division Routines

Post by psycopathicteen »

How does this work?
User avatar
Omegamatrix
Posts: 35
Joined: Tue Jun 10, 2014 8:15 pm
Location: Canada

Re: Unsigned Integer Division Routines

Post by Omegamatrix »

chitselb wrote:These are great! I needed a divide by 40 and modulo 40 for a http://en.wikipedia.org/wiki/DEC_Radix-50 decoder in a game I'm writing for the Commodore PET 2001( http://pettil.tumblr.com ) and the divide by 10 here looks like a good start. Thank you!
The good thing here is that the 6502 was used by so many systems. :)

As you know, once you have divided by 10 then you are just two shifts away from a divide by 40.

Code: Select all

;Divide by 40
;19 bytes, 34 cycles
  lsr
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror  
  adc  temp
  ror
  lsr
  lsr
  lsr
  lsr
But here is an alternate code. It costs 1 more byte, but saves two cycles... so use whichever one bests suites you.

Code: Select all

;Divide by 40
;20 bytes, 32 cycles
  sta  temp
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  #1
  adc  temp
  and  #$C0
  rol
  rol
  rol
And a modulo routine:

Code: Select all

;Mod 40
  sta  temp
  lsr
  adc  #13
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  temp
  ror
  and  #$E0
  sta  temp2  ;x32
  lsr
  lsr         ;x8
  adc  temp2  ;x40
  sbc  temp
  eor  #$FF
Or both:

Code: Select all

  sta  temp
  lsr
  adc  temp
  ror
  lsr
  lsr
  adc  temp
  ror
  adc  #1
  adc  temp
  and  #$C0
  rol
  rol
  rol

  sta  divideResult   ; divide 40 result... TAY, TAX, PHA could be used

  asl
  asl
  asl
  sta  temp2  ;x8
  asl
  asl         ;x32
  adc  temp2  ;x40
  sbc  temp
  eor  #$FF
                      ; mod 40 result
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: Unsigned Integer Division Routines

Post by tepples »

If you're dividing by 10 to make a binary to decimal converter, there are ways to do that other than div/mod.
User avatar
Omegamatrix
Posts: 35
Joined: Tue Jun 10, 2014 8:15 pm
Location: Canada

Re: Unsigned Integer Division Routines

Post by Omegamatrix »

psycopathicteen wrote:How does this work?
You are dividing and adding that result to your original sum, and then repeating the process. The key is find the correct and shortest sequence of divisions you have to do.

I had a simple idea for finding the divisions. I used a brute attack approach running every possible combination in an excel sheet. I only tested for 1/x and ignored truncation... I just looked for division combinations that were the most accurate under perfect circumstances. Those were the routines I focused on testing

From there I built a bunch of tools. I made another excel sheet to test division for all input values (0-255), some C programs to do the same thing as both excel sheets, a rom for the 2600 to test for verification and help solve correction factors for some routines. Whenever you see ADC #xx in one of the routines you are looking at a correction factor I had to introduce. Sometimes I found I could also use just CLC or SEC at certain points in the routine to do the correction.


The usage of the routines is easy. The accumulator starts with the value to be divided, and ends with the integer division result.
User avatar
Omegamatrix
Posts: 35
Joined: Tue Jun 10, 2014 8:15 pm
Location: Canada

Re: Unsigned Integer Division Routines

Post by Omegamatrix »

Final thoughts on divide by 40 before I go to bed. This is such a big number that you are approaching the limit where it is simply better to do a looped divide... if you don't mind the extra cycles and spending X or Y.


I had another idea, this time using both X and Y. This routine takes 3 more bytes then the combined Div 40 and Mod 40 routine, but it is faster (45 cycles vs 56). It does a comparison and add to get the integer divide. After that I times the result by 4, as I'm reusing the bytes in the comparison as a look up table. The values are already there and nicely spread apart by 4 bytes so why not? I had to add a dummy load at the beginning to get first look up value, that's why the extra LDA #0 is in there.

Code: Select all

;Divide by and Mod 40 combined
;38 bytes, 45 cycles
;Y = value to be divided

InterlacedMultiplyByFortyTable:
  lda  #0        ; dummy load, #0 used in LUT
  lda  #0
  cpy  #40
  adc  #0
  cpy  #80
  adc  #0
  cpy  #120
  adc  #0
  cpy  #160
  adc  #0
  cpy  #200
  adc  #0
  cpy  #240
  adc  #0
  
  sta  divideResult      ; Integer divide 40
  
  asl
  asl
  tax
  tya
  sec
  sbc  InterlacedMultiplyByFortyTable+1,X
                         ; A = Mod 40
Right away I look at this routine and think of putting in some branching to save bytes, but that would destroy the look-up table that's built in. It's a neat if not convoluted routine on its own... I'm just not to sure if it is really useful, but I like to be creative. :lol:
User avatar
Bregalad
Posts: 8056
Joined: Fri Nov 12, 2004 2:49 pm
Location: Divonne-les-bains, France

Re: Unsigned Integer Division Routines

Post by Bregalad »

The most useful in the case of the NES is probably divide-by-15, as a nametable can show 15 2x2 metatiles vertically, so you'll need to divide by 15 before knowing where to write your metatile when scrolling vertically. (yes I know there are workaround this, but still this is one of the ways to do this).

As for divide by power of two, you don't have to remind us they're shifts, I think everyone already knowns.
Post Reply