nesdev.com
http://forums.nesdev.com/

A fast 12 bit divide by 240 (~30 cycles)
http://forums.nesdev.com/viewtopic.php?f=2&t=16911
Page 1 of 1

Author:  slembcke [ Sun Jan 07, 2018 2:47 am ]
Post subject:  A fast 12 bit divide by 240 (~30 cycles)

I'm having way more fun doing 6502 asm again rather than actually making a game. :p

This seemed useful enough to share, a fast divide by 240 with modulo. Allows you to store scroll offsets in > 8 bits, and quickly get the scroll value and starting page. 12 bits gives you lots of room to work with.

Code:
; Divides sreg (up to 12 bits) by 240.
; Returns the quotient in x, remainder in a.
.proc div240_quick
   ; Start with q = n/256
   ldx sreg + 1
   
   ; r += 16*q
   txa
   asl
   asl
   asl
   asl
   ; Overflows for > 12 bits!
   ; Assume carry cleared by asl.
   
   adc sreg
   ; Check for overflow, and adjust q and r again.
   bcc :+
      ; q += 1
      inx
      ; r += 16
      adc #15 ; +1 for carry flag
   :
   
   ; Check if r > 240 and adjust once more.
   cmp #240
   bcc :+
      inx
      sbc #240
   :
   
   rts
.endproc

Author:  tepples [ Sun Jan 07, 2018 7:29 am ]
Post subject:  Re: A fast 12 bit divide by 240 (~30 cycles)

Which classic era games use a map taller than 3840 pixels (16 screens), for which the 12-bit limit would cause a problem? I know the outdoor map in Jurassic Park for Super NES is about that tall. Metroid for NES is about twice that tall, but it's broken up into general areas that are in turn broken up into smaller sections 1 screen wide or tall. Each map in Blaster Master is 2048 by 2048 pixels tall.

The alternative technique, which I first learned about from tokumaru, is to have the scrolling engine store both the camera's Y position in world space and its Y tile position (0-29) in VRAM space.

Author:  tokumaru [ Sun Jan 07, 2018 7:43 am ]
Post subject:  Re: A fast 12 bit divide by 240 (~30 cycles)

I do indeed like to keep two versions of the Y scroll, one relative to the level map and the other relative to the name tables, wrapping around at 256 and 240, respectively.

I'm not a fan of redundant variables though, so the idea of being able to quickly convert coordinates from level space to NT space is very attractive! When I finally finish my platformer though, I expect to have a few really tall levels, so the 12 bit thing might be a problem.

Author:  slembcke [ Sun Jan 07, 2018 11:15 am ]
Post subject:  Re: A fast 12 bit divide by 240 (~30 cycles)

Yeah, storing the scroll twice did occur to me as (probably) being more efficient, but as you say, not very DRY.

After sleeping on it, I think it's possible to get a full 16 bit divide with a little more work. Since the overflow happens on a multiple of 16, I think you can just add the upper nibble to the lower one before starting... Testing now.

Author:  tokumaru [ Sun Jan 07, 2018 12:48 pm ]
Post subject:  Re: A fast 12 bit divide by 240 (~30 cycles)

Another problem I have with the limited number of bits is that I expect to have vertically wrapping levels, like is the case of some Sonic levels. In these cases, the vertical camera coordinate doesn't wrap when the level map ends, what happens is that the game starts reading/drawing mirrors of the level map, while the full coordinates are used for objects and such. The vertical coordinate only wraps after 65535, regardless of the height of the level map.

Author:  psycopathicteen [ Sun Jan 07, 2018 1:28 pm ]
Post subject:  Re: A fast 12 bit divide by 240 (~30 cycles)

This might be a little excessive, but you can have a 3rd byte for y camera coordinates that only goes from 0-14.

Author:  FrankenGraphics [ Sun Jan 07, 2018 2:04 pm ]
Post subject:  Re: A fast 12 bit divide by 240 (~30 cycles)

Quote:
Metroid for NES is about twice that tall, but it's broken up into general areas that are in turn broken up into smaller sections 1 screen wide or tall.


It's not really broken up as far as the data structure is concerned. Metroid has one unified big map where all areas are laid out, though not every "slot" in that map is used. It works by:

1) Having a room ID for each room. Each screen slot points to a room ID. The same room can be and is extensively reused multiple times. Saves a lot of space for its travel length.
2) Area switch objects - if you pass it (use the elevator), another set of room data and tiles will be used to "dress" the following rooms on the world map. This means multiple rooms with different layout across the different areas share the same room ID. While metroid doesn't expoit this, this half-intentional feature effectively means you can create position-overlapping rooms and criss-crossing corridors in layers even though the world map is a flat ractangle. Btw, you can also cross the boundary to get to the other side of the map if you wanted an effectively nonrectangular shape on your world map.
3) Palette switch objects - if you pass it the palette will switch for the next room - this puts on a facade of variation and helps the player to orient despite the monotonous levels.

As far as the game is concerned, you can fill the whole world map with one long tunnel with an endless seam; vertical or horizontal.


I think the similar engine used in kid icarus has some elevations up to 13 screens or so. between 13 and 20-21. I can't think of any other.
Megaman 2: crashman - 14 screens high, i think.

Author:  rainwarrior [ Sun Jan 07, 2018 2:57 pm ]
Post subject:  Re: A fast 12 bit divide by 240 (~30 cycles)

FrankenGraphics wrote:
Megaman 2: crashman - 14 screens high, i think.

Mega Man doesn't have vertical scrolling, though, so there's no reason it would have the "240 problem".

Author:  FrankenGraphics [ Sun Jan 07, 2018 3:01 pm ]
Post subject:  Re: A fast 12 bit divide by 240 (~30 cycles)

Haha, dang it! Well, at least kid icarus would break this limit.

Author:  psycopathicteen [ Sun Jan 07, 2018 3:10 pm ]
Post subject:  Re: A fast 12 bit divide by 240 (~30 cycles)

This is how it would work with 16-bit coordinates:

Code:
lda {bgy_hi}
ldy #$00
cmp #$f0
bcc +
sbc #$f0
iny
+;
sta {temp}
lsr #4
clc
adc {temp}
tax

asl #4
clc
adc {bgy_lo}
bcc +
inx
adc #$0f
+;
cmp #$f0
bcc +
inx
sbc #$f0
+;


Author:  slembcke [ Sun Jan 07, 2018 7:20 pm ]
Post subject:  Re: A fast 12 bit divide by 240 (~30 cycles)

psycopathicteen wrote:
This is how it would work with 16-bit coordinates:

Not sure how that is supposed to work. It just subtracts 240 from the upper byte of the dividend?

This ruby snippet works for all valid quotient/remainders, but needs a 16 bit shift and add.
Code:
def div240_16(n)
   q = n >> 8
   r = n & 0xFF
   
   r += q << 4 # 16 bit shit and add
   r_hi = r >> 8
   r_lo = r & 0xFF
   
   q += r_hi
   r = r_lo + (r_hi << 4)
   
   if r >= 240 then
      q += 1
      r -= 240
   end
   
   return q, r
end


My first attempt to rewrite that in asm failed, but I think it's mostly right, and ran in ~80 cycles. Will have to debug later though. Time for house chores and relaxing!

Author:  slembcke [ Wed Jan 17, 2018 3:23 pm ]
Post subject:  Re: A fast 12 bit divide by 240 (~30 cycles)

For posterity, I did get a full ~80 cycle divide working here:
viewtopic.php?f=2&t=16933

Not really sure if it's worth it at 80 cycles except maybe when used from C code that's slow anyway.

Page 1 of 1 All times are UTC - 7 hours
Powered by phpBB® Forum Software © phpBB Group
http://www.phpbb.com/