A fast 12 bit divide by 240 (~30 cycles)

Discuss technical or other issues relating to programming the Nintendo Entertainment System, Famicom, or compatible systems.

Moderator: Moderators

Post Reply
User avatar
slembcke
Posts: 171
Joined: Fri Nov 24, 2017 2:40 pm
Location: Minnesota

A fast 12 bit divide by 240 (~30 cycles)

Post by slembcke » Sun Jan 07, 2018 2:47 am

I'm having way more fun doing 6502 asm again rather than actually making a game. :p

This seemed useful enough to share, a fast divide by 240 with modulo. Allows you to store scroll offsets in > 8 bits, and quickly get the scroll value and starting page. 12 bits gives you lots of room to work with.

Code: Select all

; Divides sreg (up to 12 bits) by 240.
; Returns the quotient in x, remainder in a.
.proc div240_quick
	; Start with q = n/256
	ldx sreg + 1
	
	; r += 16*q
	txa
	asl
	asl
	asl
	asl
	; Overflows for > 12 bits!
	; Assume carry cleared by asl.
	
	adc sreg
	; Check for overflow, and adjust q and r again.
	bcc :+
		; q += 1
		inx
		; r += 16
		adc #15 ; +1 for carry flag
	:
	
	; Check if r > 240 and adjust once more.
	cmp #240
	bcc :+
		inx
		sbc #240
	:
	
	rts
.endproc

tepples
Posts: 21985
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: A fast 12 bit divide by 240 (~30 cycles)

Post by tepples » Sun Jan 07, 2018 7:29 am

Which classic era games use a map taller than 3840 pixels (16 screens), for which the 12-bit limit would cause a problem? I know the outdoor map in Jurassic Park for Super NES is about that tall. Metroid for NES is about twice that tall, but it's broken up into general areas that are in turn broken up into smaller sections 1 screen wide or tall. Each map in Blaster Master is 2048 by 2048 pixels tall.

The alternative technique, which I first learned about from tokumaru, is to have the scrolling engine store both the camera's Y position in world space and its Y tile position (0-29) in VRAM space.

User avatar
tokumaru
Posts: 11712
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: A fast 12 bit divide by 240 (~30 cycles)

Post by tokumaru » Sun Jan 07, 2018 7:43 am

I do indeed like to keep two versions of the Y scroll, one relative to the level map and the other relative to the name tables, wrapping around at 256 and 240, respectively.

I'm not a fan of redundant variables though, so the idea of being able to quickly convert coordinates from level space to NT space is very attractive! When I finally finish my platformer though, I expect to have a few really tall levels, so the 12 bit thing might be a problem.

User avatar
slembcke
Posts: 171
Joined: Fri Nov 24, 2017 2:40 pm
Location: Minnesota

Re: A fast 12 bit divide by 240 (~30 cycles)

Post by slembcke » Sun Jan 07, 2018 11:15 am

Yeah, storing the scroll twice did occur to me as (probably) being more efficient, but as you say, not very DRY.

After sleeping on it, I think it's possible to get a full 16 bit divide with a little more work. Since the overflow happens on a multiple of 16, I think you can just add the upper nibble to the lower one before starting... Testing now.

User avatar
tokumaru
Posts: 11712
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: A fast 12 bit divide by 240 (~30 cycles)

Post by tokumaru » Sun Jan 07, 2018 12:48 pm

Another problem I have with the limited number of bits is that I expect to have vertically wrapping levels, like is the case of some Sonic levels. In these cases, the vertical camera coordinate doesn't wrap when the level map ends, what happens is that the game starts reading/drawing mirrors of the level map, while the full coordinates are used for objects and such. The vertical coordinate only wraps after 65535, regardless of the height of the level map.

psycopathicteen
Posts: 2935
Joined: Wed May 19, 2010 6:12 pm

Re: A fast 12 bit divide by 240 (~30 cycles)

Post by psycopathicteen » Sun Jan 07, 2018 1:28 pm

This might be a little excessive, but you can have a 3rd byte for y camera coordinates that only goes from 0-14.

User avatar
FrankenGraphics
Formerly WheelInventor
Posts: 2032
Joined: Thu Apr 14, 2016 2:55 am
Location: Gothenburg, Sweden
Contact:

Re: A fast 12 bit divide by 240 (~30 cycles)

Post by FrankenGraphics » Sun Jan 07, 2018 2:04 pm

Metroid for NES is about twice that tall, but it's broken up into general areas that are in turn broken up into smaller sections 1 screen wide or tall.
It's not really broken up as far as the data structure is concerned. Metroid has one unified big map where all areas are laid out, though not every "slot" in that map is used. It works by:

1) Having a room ID for each room. Each screen slot points to a room ID. The same room can be and is extensively reused multiple times. Saves a lot of space for its travel length.
2) Area switch objects - if you pass it (use the elevator), another set of room data and tiles will be used to "dress" the following rooms on the world map. This means multiple rooms with different layout across the different areas share the same room ID. While metroid doesn't expoit this, this half-intentional feature effectively means you can create position-overlapping rooms and criss-crossing corridors in layers even though the world map is a flat ractangle. Btw, you can also cross the boundary to get to the other side of the map if you wanted an effectively nonrectangular shape on your world map.
3) Palette switch objects - if you pass it the palette will switch for the next room - this puts on a facade of variation and helps the player to orient despite the monotonous levels.

As far as the game is concerned, you can fill the whole world map with one long tunnel with an endless seam; vertical or horizontal.


I think the similar engine used in kid icarus has some elevations up to 13 screens or so. between 13 and 20-21. I can't think of any other.
Megaman 2: crashman - 14 screens high, i think.
http://www.frankengraphics.com - personal NES blog

User avatar
rainwarrior
Posts: 7818
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada
Contact:

Re: A fast 12 bit divide by 240 (~30 cycles)

Post by rainwarrior » Sun Jan 07, 2018 2:57 pm

FrankenGraphics wrote:Megaman 2: crashman - 14 screens high, i think.
Mega Man doesn't have vertical scrolling, though, so there's no reason it would have the "240 problem".

User avatar
FrankenGraphics
Formerly WheelInventor
Posts: 2032
Joined: Thu Apr 14, 2016 2:55 am
Location: Gothenburg, Sweden
Contact:

Re: A fast 12 bit divide by 240 (~30 cycles)

Post by FrankenGraphics » Sun Jan 07, 2018 3:01 pm

Haha, dang it! Well, at least kid icarus would break this limit.
http://www.frankengraphics.com - personal NES blog

psycopathicteen
Posts: 2935
Joined: Wed May 19, 2010 6:12 pm

Re: A fast 12 bit divide by 240 (~30 cycles)

Post by psycopathicteen » Sun Jan 07, 2018 3:10 pm

This is how it would work with 16-bit coordinates:

Code: Select all

lda {bgy_hi}
ldy #$00
cmp #$f0
bcc +
sbc #$f0
iny
+;
sta {temp}
lsr #4
clc
adc {temp}
tax

asl #4
clc
adc {bgy_lo}
bcc +
inx
adc #$0f
+;
cmp #$f0
bcc +
inx
sbc #$f0
+;

Last edited by psycopathicteen on Sun Jan 07, 2018 11:17 pm, edited 3 times in total.

User avatar
slembcke
Posts: 171
Joined: Fri Nov 24, 2017 2:40 pm
Location: Minnesota

Re: A fast 12 bit divide by 240 (~30 cycles)

Post by slembcke » Sun Jan 07, 2018 7:20 pm

psycopathicteen wrote:This is how it would work with 16-bit coordinates:
Not sure how that is supposed to work. It just subtracts 240 from the upper byte of the dividend?

This ruby snippet works for all valid quotient/remainders, but needs a 16 bit shift and add.

Code: Select all

def div240_16(n)
	q = n >> 8
	r = n & 0xFF
	
	r += q << 4 # 16 bit shit and add
	r_hi = r >> 8
	r_lo = r & 0xFF
	
	q += r_hi
	r = r_lo + (r_hi << 4)
	
	if r >= 240 then
		q += 1
		r -= 240
	end
	
	return q, r
end
My first attempt to rewrite that in asm failed, but I think it's mostly right, and ran in ~80 cycles. Will have to debug later though. Time for house chores and relaxing!

User avatar
slembcke
Posts: 171
Joined: Fri Nov 24, 2017 2:40 pm
Location: Minnesota

Re: A fast 12 bit divide by 240 (~30 cycles)

Post by slembcke » Wed Jan 17, 2018 3:23 pm

For posterity, I did get a full ~80 cycle divide working here:
viewtopic.php?f=2&t=16933

Not really sure if it's worth it at 80 cycles except maybe when used from C code that's slow anyway.

Post Reply