Btw, if INC is meant to be short for INC A, then CLC+ADC+INC would be the same as SEC+ADC.
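A quick way to convince yourself of that equivalence is to model the addition in Python. This is only a sketch, assuming an 8-bit accumulator in binary (non-decimal) mode; the function names are made up for illustration.

```python
def adc(a, operand, carry):
    """8-bit binary-mode ADC: returns (result, carry_out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8

def clc_adc_inc(a, operand):
    result, _ = adc(a, operand, 0)   # CLC, then ADC
    return (result + 1) & 0xFF       # INC A (does not touch the carry flag)

def sec_adc(a, operand):
    result, _ = adc(a, operand, 1)   # SEC, then ADC
    return result

# Compare the accumulator result for every possible input pair.
same = all(clc_adc_inc(a, m) == sec_adc(a, m)
           for a in range(256) for m in range(256))
```

The accumulator ends up the same in both cases. The carry flag afterwards can differ though, since INC only updates N and Z, so the shortcut is only safe if nothing after it relies on the carry.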
When using double-buffered brightness values, instead of setting "current=new", you could update them as "current=(current+new)/2" to get a slow fade toward the target level.
Instead of updating brightness only every second frame, it's probably better to update one half of it every frame (in case you haven't already planned to do it like that).
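Here is a small sketch of the "current=(current+new)/2" idea, assuming integer brightness levels in the range 0..15. The function names are hypothetical, and the rounded variant is my own addition, not something from the project.

```python
def fade_step(current, target):
    """Halve the distance to the target using plain integer division."""
    return (current + target) // 2

def fade_step_rounded(current, target):
    """Same idea, but round up when fading upward so the target is reached."""
    if target > current:
        return (current + target + 1) // 2
    return (current + target) // 2

def fade_sequence(start, target, frames, step=fade_step):
    """Return the brightness level for each frame, starting with `start`."""
    levels = [start]
    for _ in range(frames):
        levels.append(step(levels[-1], target))
    return levels
```

With plain integer division, fading down from 15 to 0 gives 15, 7, 3, 1, 0, but fading up from 0 to 15 gets stuck at 14 (0, 7, 11, 13, 14, 14). Rounding up on the way up gives 0, 8, 12, 14, 15.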
PS. I like the snow effect. That's also something that you could keep in there for use in different realities. Some with snow, and some with other effects.
https://github.com/rmn0/rem/commit/c26c ... aa8f1e2498
For comparison, I've also tried out removing the pre baked lighting and just using one column of tiles with more carefully chosen dithering patterns (still using the low byte):
https://github.com/rmn0/rem/commit/0a57 ... ac26fb6569
There is a major problem with the high-byte approach as it is now. I needed to crank up the brightness of the background palettes a lot in order to make it look better. The reason is subtractive blending in combination with the limited color resolution (only 32 levels per channel). Because there are fewer choices of dithering patterns with only 4 tiles, dark colors in the background fade away very quickly and abruptly. I can try using only two colors per tile and making four dithering patterns, but with four palette choices that's 5 shades in total instead of the 4 now, so not much is gained over the low-byte solution.
Still, I need to try the idea of combining both and having 5 bits shadow. This would give 4 choices of dithering patterns and 9 color gradients (32 shades in total).
https://github.com/rmn0/rem/commit/b79f ... 2468a5a651
Again a quick solution, it gets the tile outside the light radius wrong but that can be fixed.
https://github.com/rmn0/rem/commit/e9be ... da42b258ed
It fixes the visual glitches. Also it does not need to calculate offscreen tiles anymore (the old loop had some overhead tiles).
It is now using two buffers, a source and a destination buffer. The destination buffer needs to be cleared every frame, but that is fine because it can be done via dma memset.
I timed it, all in all it saves about 20 scanlines. Not sure if timing was correct though.
So, all in all, visuals have been improved, there is some more free VRAM, 64k of ROM is saved, and it's faster. There is still room for optimization because both the X and Y registers are unused now. I can imagine I don't need to do a full source buffer update from ROM every frame anymore, and could instead just do scrolling updates. The downsides are no prebaked lighting anymore and a bit more WRAM used for the second buffer.
Thanks everyone for great thoughts on this.
I did the music, yes. However I did not write the music engine, I'm using SNESMOD for that. I've just made some minor modifications (fixed a small problem with sample playback and ported the whole thing so that it compiles with ca65).
Could WRAM be replaced by SRAM? Since CART regions can be set to be accessed at ~3.58 MHz...
I've tried it out. Currently, only the destination buffer can be in SRAM, because DMA is used to copy the buffer from ROM to WRAM. This means that in the lighting routine, a far pointer into the source buffer needs to be used, resulting in a net loss of about 2~3 scanlines.
I plan on changing the source buffer layout however, but I am undecided on how to approach that.
Atm, the source buffer (just like the destination buffer) is always aligned to the light source, meaning the buffer address moves in memory if the light source moves, in relation to screen space (i.e. the scroll position).
A few options that I thought about:
1. Have the source buffer aligned with the scroll position. This would require indirect addressing with the X register in the lighting loop. The advantage would be that the fence around the visible region does not move in memory and is never overwritten by buffer updates (because it is also aligned to the screen). So the fence could be set up once during initialization and wouldn't need to be redrawn each frame anymore. I've already tried this one out: nothing gained, nothing lost in terms of speed. The buffer can be smaller this way though, because it doesn't require padding anymore.
2. Have the source buffer aligned with the position of the light in world space (i.e. in absolute coordinates). This would also require the X register thing. This would allow scrolling updates for the source buffer instead of full updates. However, the fence could not just be written over it anymore, because that would destroy the scrolled-in data. The fence would need to be EORed with the data (twice), making it 4x slower to build.
3. Expanding on (2): with this, maybe the buffer updates can be fast without DMA, although they will then slow down more if the light moves fast. This would make using SRAM for both buffers possible, avoiding the far pointer.
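To illustrate why option 2 needs the fence to be EORed in twice, here is a rough sketch. The fence marker value, buffer contents and cell positions are all made up for the example and are not taken from the project.

```python
def toggle_fence(buffer, fence_cells, fence_bits=0x80):
    """XOR a (hypothetical) fence marker into the given buffer cells."""
    for i in fence_cells:
        buffer[i] ^= fence_bits

# Pretend this is scrolled-in source data that must survive the frame.
source = bytearray([3, 0, 2, 1, 0, 3, 1, 2])
original = bytes(source)
fence = [0, 7]                 # cells on the edge of the visible region

toggle_fence(source, fence)    # first EOR: fence is in place for the light loop
fenced = bytes(source)
toggle_fence(source, fence)    # second EOR: fence removed, data restored
```

Overwriting the cells would lose the scrolled-in bytes, while XOR-ing the same value twice brings them back unchanged, at the cost of a read-modify-write per cell on both passes.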
Currently, the buffer is stitched together from four different rooms in ROM, resulting in 60 short DMA transfers for building it. Setting those up costs about 50 CPU cycles each; together with the data transfer itself, that's about 4000 cycles or so. With scrolling updates, this could be cut roughly by half. I'm not sure that would prove worthwhile, because of the associated problems. It is also quite troublesome to implement, so I'm not sure if I should already try it, only to gain a very small amount of speed at best.
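Spelling out the estimate from the post as arithmetic (the figures are the rough ones quoted above, not measurements, and the variable names are just for illustration):

```python
TRANSFERS = 60          # short DMA transfers per buffer build (from the post)
SETUP_CYCLES = 50       # approximate setup cost per transfer (from the post)
TOTAL_ESTIMATE = 4000   # approximate total including the data transfer

setup_total = TRANSFERS * SETUP_CYCLES          # cycles spent on setup alone
data_total = TOTAL_ESTIMATE - setup_total       # what is left for moving data
scrolling_estimate = TOTAL_ESTIMATE // 2        # "cut roughly by half"
```

So by these numbers about three quarters of the cost is setup overhead rather than data movement, and scrolling updates would save somewhere around 2000 cycles.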
On a side note, I've optimized the lighting loop a bit to use faster branches for the early out where possible, this has improved speed a bit (about 3 scanlines gained).
Theoretically, but the default SRAM mapping is...
HiROM ---> SRAM at 30h-3Fh,B0h-BFh:6000h-7FFFh ;small 8K SRAM bank(s)
LoROM ---> SRAM at 70h-7Dh,F0h-FFh:0000h-7FFFh ;big 32K SRAM bank(s)
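The mapping above can be written down as a small lookup, for example to check whether a given bank:offset falls into the default SRAM window. This is only a sketch of the two lines quoted here; the names are invented, and real carts may map SRAM differently.

```python
# Default SRAM windows as (bank ranges, offset range), taken from the listing above.
SRAM_MAP = {
    "HiROM": {"banks": [(0x30, 0x3F), (0xB0, 0xBF)], "offsets": (0x6000, 0x7FFF)},
    "LoROM": {"banks": [(0x70, 0x7D), (0xF0, 0xFF)], "offsets": (0x0000, 0x7FFF)},
}

def in_default_sram(mapping, bank, offset):
    """True if bank:offset lies inside the default SRAM window for the mapping."""
    region = SRAM_MAP[mapping]
    lo, hi = region["offsets"]
    in_bank = any(first <= bank <= last for first, last in region["banks"])
    return in_bank and lo <= offset <= hi

def window_size(mapping):
    """Size in bytes of the SRAM window visible in each bank."""
    lo, hi = SRAM_MAP[mapping]["offsets"]
    return hi - lo + 1
```

For example, `in_default_sram("HiROM", 0x30, 0x6000)` is true while `in_default_sram("LoROM", 0x7E, 0x0000)` is not, since 7Eh is WRAM. The window sizes come out as 8K and 32K per bank, matching the comments in the listing.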
The DMA registers at 43xxh could be also misused to store a few bytes of data (if you don't use all DMA channels, and need fast access to a few often used variables).
Nintendo didn't seem to care about Fast SRAM support. It would have been nice for Fast Stack (in bank 00h), and for carts with Slow ROM (and for executing code in RAM in general). But it might be less important for "normal" code: Fast RAM won't improve opcode fetches from ROM. And Fast RAM won't improve DMA transfer timings.
Still a bit abrupt near the edges of the light, because of the way the gradient is set up. I might have tried a slightly different approach to try to smooth that out, but it's probably more of an artistic call at this point given the resources you're juggling.
I notice the absence of light level data in the map. To me it looks more realistic this way, if a bit less atmospheric/creepy (also, the way you've arranged the gradient makes the light reach farther and seem brighter than it did in the first version). If you want a dark patch on a wall, drawing a dark patch directly on the wall is probably your best bet. If you want an area that light mysteriously doesn't illuminate, your old approach is great for that, but I think it confuses the viewer a bit and makes the shadowcasting less obvious.
Lightmap updating is obviously still jerky during slow movement, but that's because of the tile granularity, and right now I don't see a way to fix that without either using more CPU or wasting a silly amount of ROM.
Judging from your comments re: DMA budget, I'm guessing this is supposed to end up PAL-only. Is that correct?
https://github.com/rmn0/rem/commit/b5e2 ... a4693f36d8
- the short branches optimization
- uses 16 bit instructions to build the fence faster
- I didn't need to clear the whole destination buffer with dma, just the visible part
- I noticed I couldn't have pitch black anymore, I reintroduced it, this change sacrifices the last palette to achieve that
- Because the light levels are more fine grained now, for some tiles the light level needs to be incremented twice, I fixed that
Also, I figured out a way to reduce the problem with the abrupt changes when moving slowly.
https://github.com/rmn0/rem/commit/6747 ... 1bb831028c
Edit: I've included the wrong build of the ROM with that commit. Here's the correct version: https://github.com/rmn0/rem/blob/02f9bf ... 65/rem.sfc
It allows for sub-tile precision for the light source position. It works by adjusting the table that sets up the light paths. It doesn't need any additional computation / ram buffers.
However, it also introduces some flicker and a strange wavy effect that somehow gives me motion sickness. And it requires some variations of the light loop (currently 4 variations, each one being ~24k of code).
Because of the rather high amount of ROM needed for that, I've only implemented it for horizontal movement.
Now with the higher number of light levels available, this can be improved upon by more careful level design.
Judging from your comments re: DMA budget, I'm guessing this is supposed to end up PAL-only. Is that correct?
Edit: I misremembered that. In fact NTSC should be possible.
none wrote (Tue Sep 08, 2020 2:53 am): "Yeah, I didn't really plan on supporting NTSC from the start, because of the limited DMA bandwidth. Maybe an NTSC version would be possible with some trickery, or just reducing update frequency."
As an American, I think it's a shame that it won't run on my console or on my TV, but that's your move.
Oops, again I misremembered. I looked it up, and I was in fact planning with a 262 scanline frame as opposed to the 312 scanlines available on PAL. So making an NTSC version should work without any big changes.
Edit: I was wrong again... it is already using 224 scanline mode.
Sorry for the confusion.
The lighting itself only uses a 32x32 tile layer. The right-most column is blackened with sprites to hide the tiles that are scrolling in, like it's done in some NES games (this also has other optimization reasons apart from DMA, and a 64x32 layer could probably be used instead).
That means at the moment, just under 1 KB needs to be transferred for the lighting.
In total, with sprite animations and scrolling updates, it's using around 15-20 scanlines now, depending on factors such as scrolling speed and whether a teleport needs to be done (the teleport updates the full screen progressively over the course of 7 frames).
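To put the 15-20 scanline figure next to the frame sizes mentioned earlier (262 lines for NTSC, 312 for PAL, 224 visible), here is a rough budget sketch. It simply treats every non-visible line as available for transfers, which is a simplification, and the names are hypothetical.

```python
VISIBLE_LINES = 224                       # 224 scanline mode, as stated above
FRAME_LINES = {"NTSC": 262, "PAL": 312}   # total scanlines per frame

def blank_lines(standard):
    """Scanlines per frame that are outside the visible picture."""
    return FRAME_LINES[standard] - VISIBLE_LINES

def spare_lines(standard, used):
    """Blank scanlines left over after `used` lines of transfers."""
    return blank_lines(standard) - used
```

That gives 38 blank lines on NTSC and 88 on PAL, so even the 20 scanline worst case leaves 18 lines spare on NTSC under this simple model.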