Rem Demo

Señor Ventura
Posts: 135
Joined: Sat Aug 20, 2016 3:58 am

Re: Rem Demo

Post by Señor Ventura » Wed Sep 09, 2020 3:52 am

Nikku4211 wrote:
Tue Sep 08, 2020 6:42 pm
Señor Ventura wrote:
Tue Sep 08, 2020 11:09 am
Doesn't NTSC at 50fps result in the same DMA bandwidth as PAL?
PAL also has 100 more lines than NTSC, so I don't think so.
Right, I wasn't thinking about this. My mistake.
calima wrote:
Wed Sep 09, 2020 12:09 am
Unlike PAL60, NTSC50 is not a standard and almost no TV can display it properly.
Not 50Hz, but 50fps. But as Nikku said, PAL has a lot more scanlines, at 170.5 bytes each.

creaothceann
Posts: 253
Joined: Mon Jan 23, 2006 7:47 am
Location: Germany

Re: Rem Demo

Post by creaothceann » Wed Sep 09, 2020 3:56 am

none wrote:
Wed Sep 09, 2020 2:07 am
the right-most column is blackened with sprites to hide the tiles that are scrolling in, like it's done in some NES games
Are you planning to use the window registers for something? If not you could use them to clip the screen, to eliminate the possibility of sprite flickering in the future.
My current setup:
Super Famicom ("2/1/3" SNS-CPU-GPM-02) → SCART → OSSC → StarTech USB3HDCAP → AmaRecTV 3.10

none
Posts: 38
Joined: Thu Sep 03, 2020 12:56 am

Re: Rem Demo

Post by none » Wed Sep 09, 2020 4:03 am

Yes, I wanted to save the window registers for animating some stuff.

I thought about using larger sprites for the screen border though to save sprites.

It's a pity the SNES doesn't have the PPUMASK bits like the NES does.

nocash
Posts: 1228
Joined: Fri Feb 24, 2012 12:09 pm

Re: Rem Demo

Post by nocash » Wed Sep 09, 2020 6:15 am

Using 64x32 tiles per BG for horizontal scrolling is wasting 8Kbytes of VRAM (when doing it on all four BG layers), and it's a bit uncomfortable to update tiles in two map-halves... but it might be within scope.
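For reference, the 8 Kbytes figure can be checked with a few lines of arithmetic. This is only a Python sketch, assuming the usual 2 bytes per tilemap entry; the function and variable names are made up for illustration.

```python
# Each SNES BG tilemap entry is one 16-bit word (assumed here: 2 bytes).
BYTES_PER_ENTRY = 2

def tilemap_bytes(width_tiles, height_tiles):
    """VRAM bytes occupied by one BG tilemap of the given size in tiles."""
    return width_tiles * height_tiles * BYTES_PER_ENTRY

single = tilemap_bytes(32, 32)      # one 32x32 screen
wide = tilemap_bytes(64, 32)        # two screens side by side
extra_per_bg = wide - single        # cost of the second map half
extra_four_bgs = 4 * extra_per_bg   # when done on all four BG layers

print(single, wide, extra_per_bg, extra_four_bgs)
```

The second map half costs 2 Kbytes per layer, which gives the 8 Kbytes when all four layers use a 64x32 map.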
homepage - patreon - you can think of a bit as a bottle that is either half full or half empty

none
Posts: 38
Joined: Thu Sep 03, 2020 12:56 am

Re: Rem Demo

Post by none » Wed Sep 09, 2020 7:36 am

I've tried out using HDMA for adding a wave effect to the light layer.

https://github.com/rmn0/rem/commit/1705 ... 6e65eef5c3

Image

It helps a little in hiding the blocky look in some places, at least for more vertical angles.

I've also tried animating the wave pattern, but that immediately gives that underwater / hot air feel...

Nikku4211
Posts: 102
Joined: Sun Dec 15, 2019 1:28 pm
Location: Bronx, New York

Re: Rem Demo

Post by Nikku4211 » Wed Sep 09, 2020 7:50 am

none wrote:
Wed Sep 09, 2020 2:07 am
Yeah, I didn't really plan on supporting NTSC from the start, because of the limited DMA bandwidth. Maybe an NTSC version would be possible with some trickery, or just reducing update frequency.
Oops, again I misremembered. I looked it up; I was in fact planning with a 262 scanline frame as opposed to the 312 scanlines available on PAL. So making an NTSC version should work without any big changes, just using 224 scanline mode instead of 240 to increase vblank time a bit.
There's literally no reason to use 256x239 in NTSC because the extra scanlines don't even display on an NTSC TV since it's in the overscan area.
none wrote:
Wed Sep 09, 2020 2:07 am
The lighting itself only uses a 32x32 tile layer; the right-most column is blackened with sprites to hide the tiles that are scrolling in, like it's done in some NES games (this also has other optimization reasons apart from DMA, and a 64x32 layer could probably be used instead).

That means at the moment, just under 1kb needs to be transferred for the lighting.

In total, with sprite animations and scrolling updates, it's using around 15-20 scanlines now, depending on factors such as scrolling speed and whether a teleport needs to be done (the teleport updates the full screen progressively over the course of 7 frames).
That's cool and all, but why not just use windowing to hide the right-most column? No OAM entries required.
creaothceann wrote:
Wed Sep 09, 2020 3:56 am
Are you planning to use the window registers for something? If not you could use them to clip the screen, to eliminate the possibility of sprite flickering in the future.
This. I don't want this game to flicker like an NES game.
none wrote:
Wed Sep 09, 2020 4:03 am
Yes, I wanted to save the window registers for animating some stuff.
Like what? I'm pretty sure I'd rather use sprites with a translucent palette for lights and explosions than windowing. Windowing that's more than just columns would require a channel of H/DMA being taken up, and I'd rather save the DMA channel for something better.
I have an ASD, so empathy is not natural for me. If I hurt you, I apologise.

none
Posts: 38
Joined: Thu Sep 03, 2020 12:56 am

Re: Rem Demo

Post by none » Wed Sep 09, 2020 8:23 am

One idea was to use the window register to animate large flat-shaded enemies / bosses, like this one (concept art):

Image

The idea was to animate most of it using the window registers and use sprites in a few places for spots of color or where there are too many spans in one scanline.

All the window register animation could be done through hdma and thus would be basically free, as opposed to lots of vram updates for updating the sprite animation, and the animation would be small in ROM (4 bytes per scanline -> 800 bytes per frame of animation for a 200 pixel tall monster)
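To make the 800 bytes figure concrete, here is a rough Python sketch of what such an HDMA table could look like. It assumes a direct-mode table that writes 4 bytes per scanline to the four window position registers (WH0..WH3), uses the repeat form of the line-count byte (bit 7 set, at most 127 lines per block), and ends with a $00 terminator. The function name and the sample coordinates are invented for illustration; this is not the layout used in the repository.

```python
def build_window_hdma_table(rows):
    """Build a direct-mode HDMA table from per-scanline window coordinates.

    rows: list of (w1_left, w1_right, w2_left, w2_right), one tuple per line.
    """
    table = bytearray()
    for start in range(0, len(rows), 127):
        chunk = rows[start:start + 127]
        table.append(0x80 | len(chunk))   # repeat mode: new data on every line
        for row in chunk:
            table.extend(row)             # 4 bytes -> WH0, WH1, WH2, WH3
    table.append(0x00)                    # line count 0 ends the table
    return bytes(table)

# Hypothetical 200-line monster with the same two spans on every line.
rows = [(0x40, 0x60, 0x80, 0xA0)] * 200
table = build_window_hdma_table(rows)
payload = 4 * len(rows)

print(payload, len(table))
```

The coordinate payload is the 800 bytes mentioned above; the block headers and the terminator only add a handful of bytes on top.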

Also, I still wanted to try the "hard shadow edges" shadow casting on top of the current lighting in a few places.

I think I will mostly be able to avoid sprite flicker because I don't intend to put many enemies into one spot, adding combat difficulty through more varied / intelligent enemy behaviors instead.

calima
Posts: 1186
Joined: Tue Oct 06, 2015 10:16 am

Re: Rem Demo

Post by calima » Wed Sep 09, 2020 10:52 am

I don't think you could draw that boss via the window. Isn't it limited to one continuous area? The legs and the hanging head would have two or three.

lidnariq
Posts: 9659
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: Rem Demo

Post by lidnariq » Wed Sep 09, 2020 11:00 am

The two windowing registers have left and right edges, so you can draw a five-pointed star, or this fellow's legs and folded head. I'm pretty certain I only see five regions on each scanline.
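The "five regions" observation can be illustrated with a small Python sketch. It assumes the two windows are combined with OR logic and that a window covers left <= x <= right (and is empty when left > right); the function name and the sample coordinates are hypothetical.

```python
def scanline_regions(w1, w2, width=256):
    """Split one scanline into runs of (inside, first_x, last_x).

    w1, w2: (left, right) window edges, inclusive; windows are ORed together.
    """
    def inside(x):
        return any(left <= x <= right for (left, right) in (w1, w2))

    runs = []
    for x in range(width):
        flag = inside(x)
        if runs and runs[-1][0] == flag:
            runs[-1][2] = x               # extend the current run
        else:
            runs.append([flag, x, x])     # start a new run
    return [tuple(run) for run in runs]

# Two separate spans, e.g. two legs of the monster on one scanline.
regions = scanline_regions((40, 80), (150, 200))
for region in regions:
    print(region)
```

With two non-overlapping windows the line splits into outside, inside, outside, inside, outside, so at most two filled spans per scanline are available without extra sprites.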

none
Posts: 38
Joined: Thu Sep 03, 2020 12:56 am

Re: Rem Demo

Post by none » Wed Sep 09, 2020 11:52 am

A problem would be that the horizontal scrolling and clipping would need to be "brute forced", i.e. each window coordinate needs to be scrolled and clipped individually (800 times in the above example). That would cost a bunch of frame time (not vblank time), something like roughly 30 ~ 40 scanlines.

I wonder if it can be done with 16-bit instructions (somehow scrolling two coordinates at once), but I don't think it's viable. Doing it via LUT would require a 24-bit LUT, and doing it arithmetically (i.e., with adc) would make the addition of the low bytes overflow into the high byte. Even if that could be fixed by masking out the first bit (essentially halving horizontal resolution), there is still the clipping problem, and I have no idea how to solve that.

In 8 bits, my best ideas so far were just using a 64k LUT to do the addition and clipping at once, or using addition and clamping with the carry flag (using the buffer prefill / skipping method just like with the lighting). The good thing is that only one screen border (left OR right) needs to be clipped against.
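Both 8-bit ideas compute the same thing, a saturating add. The following Python model shows the equivalence for clipping against the right border only; the LUT index layout (position in the high byte, coordinate in the low byte) is an assumption chosen for the example, and the function names are made up.

```python
def clamp_add_carry(pos, coord):
    """Model of adc followed by a carry test: clamp to $FF on overflow."""
    total = pos + coord
    if total > 0xFF:          # carry set after the 8-bit add
        return 0xFF           # clip to the right screen border
    return total

# 64K table doing the addition and the clipping in one lookup.
LUT = bytes(min(pos + coord, 0xFF) for pos in range(256) for coord in range(256))

def clamp_add_lut(pos, coord):
    """Same result via table lookup; index = pos * 256 + coord (assumed layout)."""
    return LUT[(pos << 8) | coord]

print(clamp_add_carry(200, 40), clamp_add_lut(200, 40))   # no clipping
print(clamp_add_carry(200, 80), clamp_add_lut(200, 80))   # clipped
```

The table trades 64 Kbytes of ROM for the branch; the carry version needs no table but takes a different number of cycles depending on whether the coordinate is clipped.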

Edit: In fact the bit 1 mask problem is no problem at all... For example, when clipping against the right screen border: when the left border is clipped, the right one will be clipped too, so no problem.

Edit 2: These are the best solutions I could come up with...

for 16 bits

Code:

	tya					; (2) y = monster horizontal position
	clc					; (2)
	adc 	window_right_left, x		; (5) 

	bcs	_clip_both_window_borders	; (2)

	bit 	window_right_mask, x		; (6)
	bne 	_write				; (2)

	ora 	#$00ff				; (3*)

_write:
	xba					; (2)
	sta 	hdma_table, x			; (6)

_clip_both_window_borders:

	iny					; (2)

						; total 27 / 2 = 13.5 cycles per coordinate
for 8 bits

Code:


	txa					; (2) x = monster horizontal position
	clc					; (2)
	adc 	[window], y			; (6)
	bcc 	:+				; (2)
	lda	#$ff				; (2*)
	:
	sta	reg_wmdata			; (4)
	iny					; (2)

						; total 18 cycles per coordinate

The 16-bit solution requires double the amount of ROM for the masking part, though...

none
Posts: 38
Joined: Thu Sep 03, 2020 12:56 am

Re: Rem Demo

Post by none » Wed Sep 09, 2020 3:42 pm

In my previous attempt I totally missed that I could exploit the fact that if the rightmost coordinate isn't clipped, the others won't be either. Also, I screwed up a bit (the 16-bit version is buggy and the cycle counts aren't right)...

Here are new and improved versions that update all 4 coordinates in a scanline at once. The 16-bit version now relies on the monsters not being wider than 128 pixels, so that the highest bit of each coordinate is always zero. That bit can then be used for testing, and overflow will never occur. When I know that I must clip against the right screen border, the scrolling coordinate will always be greater than 128, so I just use the low seven bits of the scrolling coordinate, and the high bit is patched in with an OR.

I'll try it out tomorrow I think.

16 bits:

Code:

	tya					; 2
	adc	window + $2, x			; 5
	bpl	clip_none			; 3
	bit	#$80
	beq	clip_one
	...

clip_none:
	ora	#$8080				; 3
	sta	hdma_table + $2, x		; 5
	lda	window + $0, x			; 5
	ora	#$8080				; 3
	sta	hdma_table + $0, x		; 5


	inx					; 2
	inx					; 2
	inx					; 2
	inx					; 2
	
						; total 39

8 bits:

Code:

	tya					; 2
	clc					; 2
	adc	window + $3, x			; 4
	bcc	clip_none			; 3
	
	tya
	clc
	adc	window + $2, x
	bcc	clip_one
	...
	
clip_none:
	lda	window + $0, x			; 4
	sta	reg_wmdata			; 3
	lda	window + $1, x			; 4
	sta	reg_wmdata			; 3
	lda	window + $2, x			; 4
	sta	reg_wmdata			; 3
	lda	window + $3, x			; 4
	sta	reg_wmdata			; 3
	
	inx					; 2
	inx					; 2
	inx					; 2
	inx					; 2


						; total 47
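The 7-bit trick in the 16-bit version is easier to follow in a small model. The Python sketch below handles one pair of coordinates packed into a 16-bit word, the way a single adc would. It assumes that the low seven bits of the scroll position are replicated into both bytes and added to every pair (the clip_none listing above loads the lower pair without showing that add), and that left <= right. The function name is hypothetical, and the clipped paths are left out just as in the post.

```python
def scroll_pair_packed(pos, left, right):
    """Add a scroll position to two packed 7-bit window coordinates.

    pos: 128..255 (the case where right-border clipping can happen).
    left, right: 0..127 with left <= right.
    Returns (left_out, right_out), or None when the clipping path is needed.
    """
    assert 128 <= pos <= 255 and 0 <= left <= right <= 127
    offset = (pos & 0x7F) * 0x0101       # low seven bits in both bytes
    word = left | (right << 8)           # pair as one little-endian word
    total = (offset + word) & 0xFFFF     # one 16-bit add, no carry between bytes
    if total & 0x8000:                   # bit 15 set: right coordinate >= 256
        return None                      # at least one coordinate must be clipped
    total |= 0x8080                      # patch the dropped high bit back in
    return (total & 0xFF, total >> 8)

print(scroll_pair_packed(130, 10, 100))  # both fit on screen
print(scroll_pair_packed(200, 10, 100))  # right edge passes the border
```

Because each byte holds at most 127 + 127 = 254, the low byte can never carry into the high byte, and bit 7 of each byte is free to act as that coordinate's own overflow flag.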

none
Posts: 38
Joined: Thu Sep 03, 2020 12:56 am

Re: Rem Demo

Post by none » Thu Sep 10, 2020 10:25 am

I've managed to put together a working POC. No vertical positioning and clipping only against the right screen border for now. Loops are completely unrolled for easier comparison.

https://github.com/rmn0/rem/commit/37f7 ... 83d29049f4

The performance depends a bit on how much clipping occurs. With a lot of clipping, the 16-bit version becomes slower and the 8-bit version faster. The test image is 160 lines tall.

The 16-bit version takes ~32..35 scanlines.
The 8-bit version takes ~25..38 scanlines.
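As a rough cross-check against the cycle totals from the earlier post (39 cycles per scanline for the unclipped 16-bit path, 47 for the 8-bit one), the conversion to scanlines can be sketched in Python. It assumes 1364 master clocks per scanline and between 6 and 8 master clocks per CPU cycle, and it ignores DRAM refresh, loop overhead and the clipped paths, so it is only an order-of-magnitude estimate and not a measurement.

```python
MASTER_CLOCKS_PER_SCANLINE = 1364   # assumed typical scanline length
LINES = 160                         # height of the test image

def scanlines(lines, cpu_cycles_per_line, master_clocks_per_cpu_cycle):
    """Convert a per-line CPU cycle cost into scanlines of frame time."""
    total_master_clocks = lines * cpu_cycles_per_line * master_clocks_per_cpu_cycle
    return total_master_clocks / MASTER_CLOCKS_PER_SCANLINE

estimates = {}
for name, cycles_per_line in (("16-bit", 39), ("8-bit", 47)):
    fast = scanlines(LINES, cycles_per_line, 6)   # every cycle at 6 master clocks
    slow = scanlines(LINES, cycles_per_line, 8)   # every cycle at 8 master clocks
    estimates[name] = (fast, slow)
    print(f"{name}: about {fast:.1f} to {slow:.1f} scanlines")
```

The estimates land in the same range as the measured numbers. The measurements also include the clipped paths, which the earlier listings elide, so they can fall on either side of these bounds.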

Maybe the 8-bit version could be better if it was looking at a window position in the center first (making it behave more like a binary search), but then again, it needs to write serially into wmdata so at least one position needs to be reloaded at some point, so idk if that is worthwhile.

So, unless something comes to mind for how the 16-bit version could be greatly improved, the 8-bit version it is, as it doesn't have the 128 pixel max width restriction.

Any thoughts on how this could be improved upon?

nocash
Posts: 1228
Joined: Fri Feb 24, 2012 12:09 pm

Re: Rem Demo

Post by nocash » Thu Sep 10, 2020 11:07 am

Using multiple loops: one that defaults to no clipping, one that defaults to partial clipping, and one for complete clipping?
That, with the conditional jumps jumping to the other loops when needed, and otherwise staying in the current loop.

Doing that with completely unrolled loops would take up a lot of memory (and exceed the conditional jump range).
But it should work with semi-unrolled loops, like processing 8 unrolled lines at once (which might also help on processing vertical clipping).
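A minimal Python model of that control flow, as one reading of the suggestion and not the code from the repository: each "loop" is a mode with its own cheap test, and the code only moves to another mode when that test fails. It assumes the coordinates of each row are sorted ascending and that clipping is against the right border only; the function name is made up, and the semi-unrolling into groups of 8 lines is left out.

```python
def scroll_rows(rows, pos):
    """Scroll and clip window coordinates using three mode-specific loops."""
    out = []
    mode = "none"
    for row in rows:
        if mode == "none" and pos + row[-1] <= 0xFF:
            # Rightmost coordinate fits, so all of them fit: plain add.
            out.append(tuple(pos + c for c in row))
            continue
        if mode == "full" and pos + row[0] > 0xFF:
            # Leftmost coordinate is clipped, so all of them are.
            out.append((0xFF,) * len(row))
            continue
        # Fallback (the "partial" loop): clamp every coordinate on its own.
        out.append(tuple(min(pos + c, 0xFF) for c in row))
        # Pick the loop that is expected to fit the next row.
        if pos + row[-1] <= 0xFF:
            mode = "none"
        elif pos + row[0] > 0xFF:
            mode = "full"
        else:
            mode = "partial"
    return out

rows = [(10, 20, 30, 40), (100, 120, 140, 160), (200, 210, 220, 230)]
print(scroll_rows(rows, 50))
```

Neighbouring scanlines of a shape usually need the same treatment, so most rows should be handled by the single test of the current mode.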
Last edited by nocash on Fri Sep 11, 2020 8:40 am, edited 1 time in total.

none
Posts: 38
Joined: Thu Sep 03, 2020 12:56 am

Re: Rem Demo

Post by none » Thu Sep 10, 2020 1:45 pm

Great idea, I've tried it out with the 8-bit version.

https://github.com/rmn0/rem/commit/7595 ... 9be8195891

It does level out the performance quite a bit so it is mostly around 28 to 30 scanlines now, but it sometimes still jumps to 40 or so, when it has to deal with lots of the tricky situations (then it is worse). Still, this makes it better overall.

Having the partial clipping variation of the inner loop doesn't seem to gain anything. Also, it doesn't seem to matter if scanline group size is 8 or 16 (what is gained in precision, is lost in setup time).

Edit: there was a bug with the partial clipping, I have fixed it:
https://github.com/rmn0/rem/commit/40e1 ... c7ad582f40

With partial clipping, the performance spikes don't occur anymore and it is always below 35 scanlines now.


I'm not sure, though, whether it would be faster to just put a bounding rectangle around everything, as there is some overhead (there's an additional pointer involved and the registers are all used up, requiring an awkward stack allocation).

About vertical clipping, clipping to the top border is not a problem, it can be done just by adjusting the starting offsets. For clipping to the bottom border, it should be fine to just let it render a few excess elements and patch in the hdma table end byte afterwards (having one large unrolled loop is just for testing it out).

none
Posts: 38
Joined: Thu Sep 03, 2020 12:56 am

Re: Rem Demo

Post by none » Fri Sep 11, 2020 9:31 am

I've cleaned everything up a bit and made a few further optimizations....

https://github.com/rmn0/rem/blob/featur ... c/window.s
