Rem Demo

Discussion of hardware and software development for Super NES and Super Famicom.

nocash
Posts: 1258
Joined: Fri Feb 24, 2012 12:09 pm

Re: Rem Demo

Post by nocash » Sun Sep 06, 2020 4:55 am

ADC also affects the N flag and the C flag, so either of those would work for overrun checking, too.
Btw, if INC is meant to be short for INC A, then CLC+ADC+INC would leave the same value in A as SEC+ADC.
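
To make the CLC+ADC+INC versus SEC+ADC point concrete, here is a minimal Python model of an 8-bit, binary-mode ADC. The function names are made up for this sketch and it ignores decimal mode and 16-bit accumulator mode; it only illustrates that both sequences leave the same value in A.

```python
def adc8(a, m, carry_in):
    """Model an 8-bit binary-mode ADC: returns (result, carry_out, negative)."""
    total = a + m + carry_in
    result = total & 0xFF
    return result, total > 0xFF, bool(result & 0x80)

def clc_adc_inc(a, m):
    """CLC ; ADC m ; INC A  (only the accumulator value is modelled)."""
    result, _, _ = adc8(a, m, 0)
    return (result + 1) & 0xFF

def sec_adc(a, m):
    """SEC ; ADC m  (only the accumulator value is modelled)."""
    return adc8(a, m, 1)[0]

# Exhaustive check over every 8-bit operand pair.
same_everywhere = all(
    clc_adc_inc(a, m) == sec_adc(a, m)
    for a in range(256)
    for m in range(256)
)
```

Note that only the accumulator is guaranteed to match: INC does not touch the carry flag, so the C flag after the three-instruction version can differ from the one SEC+ADC leaves behind.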

When using double-buffered brightness values, instead of setting "current=new", you could update them as "current=(current+new)/2" to get the effect of slowly changing towards the target level.
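
A rough Python sketch of that averaging update, assuming integer brightness levels and truncating division (which is what a plain shift right would give). The function names are hypothetical.

```python
def step_toward(current, target):
    """One update of current = (current + target) / 2 with truncating division."""
    return (current + target) // 2

def fade(current, target, frames):
    """Apply the update once per frame and record the level after each frame."""
    levels = []
    for _ in range(frames):
        current = step_toward(current, target)
        levels.append(current)
    return levels

rising = fade(0, 15, 5)    # [7, 11, 13, 14, 14]
falling = fade(15, 0, 5)   # [7, 3, 1, 0, 0]
```

With truncation, a rising fade stalls one level below the target (14 instead of 15 here), while a falling fade reaches it. Rounding up when the target is higher, or snapping to the target once the difference is 1, would avoid that.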

Instead of updating brightness only every second frame, it's probably better to update one half of it every frame (in case you haven't already planned to do it like that).
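
One possible way to split the work, sketched in Python. The top/bottom split and the function name are assumptions for illustration; the thread does not say how the halves would be chosen.

```python
def rows_to_update(frame, row_count):
    """Rows refreshed on this frame: top half on even frames, bottom half on odd ones."""
    half = row_count // 2
    if frame % 2 == 0:
        return range(0, half)
    return range(half, row_count)

# Over any two consecutive frames, every row gets refreshed exactly once.
covered = sorted(list(rows_to_update(0, 28)) + list(rows_to_update(1, 28)))
```

The per-frame cost is roughly halved compared with a full update, and the load is spread evenly instead of alternating between a busy frame and an idle one.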

PS. I like the snow effect. That's also something that you could keep in there for use in different realities. Some with snow, and some with other effects.
homepage - patreon - you can think of a bit as a bottle that is either half full or half empty

none
Posts: 51
Joined: Thu Sep 03, 2020 12:56 am

Re: Rem Demo

Post by none » Sun Sep 06, 2020 6:34 am

I've tried out the high-byte solution (still taking the pre-baked lighting into account). The code changes are here (quick and dirty, but you get the idea):

https://github.com/rmn0/rem/commit/c26c ... aa8f1e2498

Image

For comparison, I've also tried removing the pre-baked lighting and just using one column of tiles with more carefully chosen dithering patterns (still using the low byte):

https://github.com/rmn0/rem/commit/0a57 ... ac26fb6569

Image

There is a major problem with the high-byte approach as it is now: I needed to crank up the brightness of the background palettes a lot to make it look better. The reason is subtractive blending in combination with the limited color resolution (only 32 levels per color channel). Because there are fewer choices of dithering patterns with only 4 tiles, dark colors in the background fade away very quickly and abruptly. I could try using only two colors per tile and making four dithering patterns, but with four palette choices that's 5 shades in total instead of the 4 now, so not much is gained over the low-byte solution.

Still, I need to try the idea of combining both and having a 5-bit shadow value. This would give 4 choices of dithering patterns and 9 color gradients (32 shades in total).

none
Posts: 51
Joined: Thu Sep 03, 2020 12:56 am

Re: Rem Demo

Post by none » Sun Sep 06, 2020 7:36 am

The 5 bit solution works and makes for a very smooth color transition.

https://github.com/rmn0/rem/commit/b79f ... 2468a5a651

Image

Again a quick solution; it gets the tiles outside the light radius wrong, but that can be fixed.

KungFuFurby
Posts: 263
Joined: Wed Jul 09, 2008 8:46 pm

Re: Rem Demo

Post by KungFuFurby » Sun Sep 06, 2020 10:54 am

Did you also do the music for this game? You've got me quite curious.

none
Posts: 51
Joined: Thu Sep 03, 2020 12:56 am

Re: Rem Demo

Post by none » Sun Sep 06, 2020 11:58 am

I have implemented the new lighting loop that does away with the lookup table.

https://github.com/rmn0/rem/commit/e9be ... da42b258ed

It fixes the visual glitches. Also it does not need to calculate offscreen tiles anymore (the old loop had some overhead tiles).
It now uses two buffers, a source and a destination buffer. The destination buffer needs to be cleared every frame, but that is fine because it can be done with a DMA memset.

I timed it, all in all it saves about 20 scanlines. Not sure if timing was correct though.

So, all in all: the visuals have improved, there is some more free VRAM, 64k of ROM is saved, and it's faster. There is still room for optimization because both the X and Y registers are unused now. I imagine I no longer need to do a full source buffer update from ROM every frame and can just do scrolling updates instead. The downsides are that there is no pre-baked lighting anymore and that a bit more WRAM is used for the second buffer.

Thanks everyone for great thoughts on this.


...

I did the music, yes. However I did not write the music engine, I'm using SNESMOD for that. I've just made some minor modifications (fixed a small problem with sample playback and ported the whole thing so that it compiles with ca65).

creaothceann
Posts: 270
Joined: Mon Jan 23, 2006 7:47 am
Location: Germany

Re: Rem Demo

Post by creaothceann » Sun Sep 06, 2020 1:47 pm

none wrote:
Sun Sep 06, 2020 11:58 am
and a bit more WRAM used for the second buffer
Could WRAM be replaced by SRAM? Since CART regions can be set to be accessed at ~3.58 MHz...
My current setup:
Super Famicom ("2/1/3" SNS-CPU-GPM-02) → SCART → OSSC → StarTech USB3HDCAP → AmaRecTV 3.10

none
Posts: 51
Joined: Thu Sep 03, 2020 12:56 am

Re: Rem Demo

Post by none » Mon Sep 07, 2020 9:03 am

creaothceann wrote:
Could WRAM be replaced by SRAM?
I've tried it out. Currently, only the destination buffer can be in SRAM, because DMA is used to copy the buffer from ROM to WRAM. This means that in the lighting routine, a far pointer into the source buffer needs to be used, resulting in a net loss of about 2~3 scanlines.

I plan on changing the source buffer layout however, but I am undecided on how to approach that.

At the moment, the source buffer (just like the destination buffer) is always aligned to the light source, meaning that the buffer address moves in memory when the light source moves relative to screen space (i.e. the scroll position).

A few options that I thought about:
1. Have the source buffer aligned with the scroll position. This would require indirect addressing with the X register in the lighting loop. The advantage would be that the fence around the visible region does not move in memory and is never overwritten by buffer updates (because it is also aligned to the screen). So the fence could be set up once during initialization and wouldn't need to be redrawn each frame anymore. I've already tried this one out: nothing gained, nothing lost in terms of speed. The buffer can be smaller this way, though, because it doesn't require padding anymore.
2. Have the source buffer aligned with the position of the light in world space (i.e. in absolute coordinates). This would also require the X register approach. It would allow scrolling updates for the source buffer instead of full updates. However, the fence could no longer just be written over the buffer, because that would destroy the scrolled-in data. The fence would need to be EORed with the data (twice: once to apply it, once to remove it), making it 4x slower to build.
3. Expanding on (2): with this, maybe the buffer updates can be fast without DMA, although they would then slow down more when the light moves fast. This would make it possible to use SRAM for both buffers, avoiding the far pointer.
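
The "EOR twice" idea in option 2 relies on XOR being its own inverse. A small Python sketch, assuming a hypothetical fence marker bit and made-up buffer contents (the actual fence encoding is not given in the thread):

```python
FENCE = 0x80  # hypothetical marker bit, not taken from the Rem source

def toggle_fence(buffer, fence_offsets):
    """XOR the fence bit into the buffer at the given offsets (in place)."""
    for offset in fence_offsets:
        buffer[offset] ^= FENCE

source = bytearray([3, 7, 1, 4, 6, 2])   # made-up scrolled-in tile data
original = bytes(source)
edges = [0, 5]                           # made-up fence positions

toggle_fence(source, edges)              # first EOR: fence is applied
fenced = bytes(source)
toggle_fence(source, edges)              # second EOR: data is restored
restored = bytes(source)
```

After the second pass the buffer is byte-for-byte identical to the scrolled-in data, which is why the fence can be layered on top of a persistent buffer at the cost of touching each fence position twice per frame.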

Currently, the buffer is stitched together from four different rooms in ROM, resulting in 60 short DMA transfers to build it. Setting those up costs about 50 CPU cycles each; together with the data transfer itself, that's about 4000 cycles or so. With scrolling updates, this could be cut roughly in half. I'm not sure that would be worthwhile because of the associated problems. It is also quite troublesome to implement, so I'm not sure I should try it already, only to gain a very small amount of speed at best...
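
As a back-of-the-envelope check of those numbers, here is the arithmetic in Python. The three constants are the rough figures from the post; the split between setup and data transfer is derived from them and is only as accurate as the estimates.

```python
TRANSFERS = 60          # short DMA transfers per frame (from the post)
SETUP_CYCLES = 50       # approximate CPU cycles to set up each transfer (from the post)
TOTAL_ESTIMATE = 4000   # the post's rough total, setup plus data transfer

setup_total = TRANSFERS * SETUP_CYCLES          # 3000 cycles
transfer_share = TOTAL_ESTIMATE - setup_total   # about 1000 cycles left for the data itself
setup_fraction = setup_total / TOTAL_ESTIMATE   # 0.75
```

On these estimates about three quarters of the cost is per-transfer setup rather than data movement, so the saving from scrolling updates would come mostly from issuing fewer transfers.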

On a side note, I've optimized the lighting loop a bit to use faster branches for the early out where possible, this has improved speed a bit (about 3 scanlines gained).

nocash
Posts: 1258
Joined: Fri Feb 24, 2012 12:09 pm

Re: Rem Demo

Post by nocash » Mon Sep 07, 2020 4:35 pm

creaothceann wrote:
Sun Sep 06, 2020 1:47 pm
Could WRAM be replaced by SRAM? Since CART regions can be set to be accessed at ~3.58 MHz...
Theoretically, but the default SRAM mapping is...


  HiROM ---> SRAM at 30h-3Fh,B0h-BFh:6000h-7FFFh    ;small 8K SRAM bank(s)
  LoROM ---> SRAM at 70h-7Dh,F0h-FFh:0000h-7FFFh    ;big 32K SRAM bank(s)
Fast memory could be mapped to 5000h-5FFFh (only done in the Satellaview BIOS cartridge). Hmmm, and most standard LoROM carts might be capable of using the mirror in banks F0h-FFh for fast SRAM. Rem is a HiROM game, though.
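
The default mapping listed above can be written as a small lookup, sketched here in Python. The table and function names are made up for this example, and it covers only the default regions quoted in the post, not mirrors or special cartridges.

```python
# Default SRAM regions as listed in the post: bank ranges and offset range, all inclusive.
DEFAULT_SRAM = {
    "HiROM": {"banks": [(0x30, 0x3F), (0xB0, 0xBF)], "offsets": (0x6000, 0x7FFF)},
    "LoROM": {"banks": [(0x70, 0x7D), (0xF0, 0xFF)], "offsets": (0x0000, 0x7FFF)},
}

def in_default_sram(mapping, bank, offset):
    """Return True if bank:offset falls inside the default SRAM region for the mapping."""
    region = DEFAULT_SRAM[mapping]
    low, high = region["offsets"]
    in_bank = any(first <= bank <= last for first, last in region["banks"])
    return in_bank and low <= offset <= high
```

For example, `in_default_sram("HiROM", 0x30, 0x6000)` is true, while bank 7Eh (WRAM) is outside the LoROM range 70h-7Dh.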
The DMA registers at 43xxh could also be misused to store a few bytes of data (if you don't use all DMA channels and need fast access to a few often-used variables).

Nintendo didn't seem to care about Fast SRAM support. It would have been nice for Fast Stack (in bank 00h), and for carts with Slow ROM (and for executing code in RAM in general). But it might be less important for "normal" code: Fast RAM won't improve opcode fetches from ROM. And Fast RAM won't improve DMA transfer timings.

93143
Posts: 1261
Joined: Fri Jul 04, 2014 9:31 pm

Re: Rem Demo

Post by 93143 » Mon Sep 07, 2020 6:31 pm

The shadowcasting looks amazing, particularly when you're moving fast or dealing with ledges. Who would have thought the SNES could do that?

Still a bit abrupt near the edges of the light, because of the way the gradient is set up. I might have tried a slightly different approach to try to smooth that out, but it's probably more of an artistic call at this point given the resources you're juggling.

I notice the absence of light level data in the map. To me it looks more realistic this way, if a bit less atmospheric/creepy (also, the way you've arranged the gradient makes the light reach farther and seem brighter than it did in the first version). If you want a dark patch on a wall, drawing a dark patch directly on the wall is probably your best bet. If you want an area that light mysteriously doesn't illuminate, your old approach is great for that, but I think it confuses the viewer a bit and makes the shadowcasting less obvious.

Lightmap updating is obviously still jerky during slow movement, but that's because of the tile granularity, and right now I don't see a way to fix that without either using more CPU or wasting a silly amount of ROM.

...

Judging from your comments re: DMA budget, I'm guessing this is supposed to end up PAL-only. Is that correct?

none
Posts: 51
Joined: Thu Sep 03, 2020 12:56 am

Re: Rem Demo

Post by none » Tue Sep 08, 2020 2:53 am

I've made some optimizations and bugfixes:

https://github.com/rmn0/rem/commit/b5e2 ... a4693f36d8

- the short branches optimization
- uses 16 bit instructions to build the fence faster
- I didn't need to clear the whole destination buffer with dma, just the visible part
- I noticed I couldn't have pitch black anymore, I reintroduced it, this change sacrifices the last palette to achieve that
- Because the light levels are more fine grained now, for some tiles the light level needs to be incremented twice, I fixed that

Also, I figured out a way to reduce the abrupt changes when moving slowly.

https://github.com/rmn0/rem/commit/6747 ... 1bb831028c

Edit: I've included the wrong build of the ROM with that commit. Here's the correct version: https://github.com/rmn0/rem/blob/02f9bf ... 65/rem.sfc

It allows for sub-tile precision for the light source position. It works by adjusting the table that sets up the light paths. It doesn't need any additional computation / ram buffers.

However, it also introduces some flicker and a strange wavy effect that somehow gives me motion sickness. It also requires several variations of the light loop (currently 4 variations, each one being ~24k of code).

Because of the rather high amount of ROM needed for that, I've only implemented it for horizontal movement.
93143 wrote:
Still a bit abrupt near the edges of the light, because of the way the gradient is set up. I might have tried a slightly different approach to try to smooth that out, but it's probably more of an artistic call at this point given the resources you're juggling.
Now, with the higher number of light levels available, this can be improved upon by more careful level design.
93143 wrote:
Judging from your comments re: DMA budget, I'm guessing this is supposed to end up PAL-only. Is that correct?
Yeah, I didn't really plan on supporting NTSC from the start, because of the limited DMA bandwidth. Maybe an NTSC version would be possible with some trickery, or by just reducing the update frequency.
Edit: I misremembered that. In fact NTSC should be possible.
Last edited by none on Wed Sep 09, 2020 2:20 am, edited 1 time in total.

Nikku4211
Posts: 212
Joined: Sun Dec 15, 2019 1:28 pm
Location: Bronx, New York

Re: Rem Demo

Post by Nikku4211 » Tue Sep 08, 2020 9:32 am

none wrote:
Tue Sep 08, 2020 2:53 am
Judging from your comments re: DMA budget, I'm guessing this is supposed to end up PAL-only. Is that correct?
Yeah, I didn't really plan on supporting NTSC from the start, because of the limited DMA bandwidth. Maybe an NTSC version would be possible with some trickery, or just reducing update frequency.
As an American, that's a shame that it won't run on my console or on my TV, but that's your move.
I have an ASD, so empathy is not natural for me. If I hurt you, I apologise.

Señor Ventura
Posts: 157
Joined: Sat Aug 20, 2016 3:58 am

Re: Rem Demo

Post by Señor Ventura » Tue Sep 08, 2020 11:09 am

Wouldn't NTSC at 50 fps give the same DMA bandwidth as PAL?

Nikku4211
Posts: 212
Joined: Sun Dec 15, 2019 1:28 pm
Location: Bronx, New York

Re: Rem Demo

Post by Nikku4211 » Tue Sep 08, 2020 6:42 pm

Señor Ventura wrote:
Tue Sep 08, 2020 11:09 am
Wouldn't NTSC at 50 fps give the same DMA bandwidth as PAL?
PAL also has 100 more lines than NTSC, so I don't think so.

calima
Posts: 1238
Joined: Tue Oct 06, 2015 10:16 am

Re: Rem Demo

Post by calima » Wed Sep 09, 2020 12:09 am

Unlike PAL60, NTSC50 is not a standard and almost no TV can display it properly.

none
Posts: 51
Joined: Thu Sep 03, 2020 12:56 am

Re: Rem Demo

Post by none » Wed Sep 09, 2020 2:07 am

none wrote:
Yeah, I didn't really plan on supporting NTSC from the start, because of the limited DMA bandwidth. Maybe an NTSC version would be possible with some trickery, or just reducing update frequency.
Oops, I misremembered again. I looked it up: I was in fact planning for a 262-scanline frame, as opposed to the 312 scanlines available on PAL. So making an NTSC version should work without any big changes, just using 224-scanline mode instead of 240 to increase vblank time a bit.

Edit: I was wrong again... it is already using 224-scanline mode.

Sorry for the confusion.

The lighting itself only uses a 32x32 tile layer; the right-most column is blackened with sprites to hide the tiles that are scrolling in, like it's done in some NES games (this also has other optimization reasons apart from DMA, and a 64x32 layer could probably be used instead).

That means at the moment, just under 1kb needs to be transferred for the lighting.

In total, with sprite animations and scrolling updates, it's using around 15-20 scanlines now, depending on factors such as scrolling speed and whether a teleport needs to be done (the teleport updates the full screen progressively over the course of 7 frames).
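
To put those figures side by side, here is a small Python sketch of the vblank budget. The frame heights and the 224-line mode come from the posts above; the DMA throughput of roughly 165 bytes per scanline is an assumption of this sketch, not a number from the thread, and the helper names are made up.

```python
import math

ACTIVE_LINES = 224                        # 224-scanline mode, as stated above
TOTAL_LINES = {"NTSC": 262, "PAL": 312}   # frame heights quoted in the thread
DMA_BYTES_PER_LINE = 165                  # assumption: approximate DMA throughput per scanline

def vblank_lines(standard):
    """Scanlines per frame that are outside the active display."""
    return TOTAL_LINES[standard] - ACTIVE_LINES

def lines_for_transfer(byte_count):
    """Scanlines needed to DMA byte_count bytes at the assumed throughput."""
    return math.ceil(byte_count / DMA_BYTES_PER_LINE)

ntsc_budget = vblank_lines("NTSC")       # 38
pal_budget = vblank_lines("PAL")         # 88
lighting_lines = lines_for_transfer(1024)
```

Under these assumptions the lighting transfer of just under 1 KB takes about 7 scanlines, and the quoted total of 15-20 scanlines fits inside the 38 lines available on NTSC, which matches the conclusion that an NTSC version should be possible.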
