Design guidance for 3D Engine on SuperFX

secondsun
Posts: 41
Joined: Tue Jul 31, 2018 9:37 am

Design guidance for 3D Engine on SuperFX

Post by secondsun »

I’m prototyping a game in Java to eventually port to the SNES [1]. I’m trying to make it a tactical RPG that can pan and rotate the battlefield like Final Fantasy Tactics on the PSX. Because I know I want to port this to the SNES eventually, I want my engine to reflect what it would have to do to run well on this hardware. I would like feedback on my thoughts on the graphics engine and DMA limitations. If there’s some other unknown unknown I’m missing, I’d welcome that feedback as well.

I’ve never made a whole game on the SNES, but I have gone through a few hello world exercises. I have read a lot of forum posts, books, source code, etc and I believe I can make a suitably limited graphics engine in Java to prototype and carry those lessons over to the SNES engine. The Java engine is only a prototype to learn the math; the SNES will be more or less a ground up rewrite and I will be shocked if the class names survive as structure names to say nothing about literally everything else.

Graphics

Graphically the game is an isometric projected 3d modelled playfield with character (players, monsters, etc) sprites. This will be drawn by the SuperFX chip. Limited texture mapping and flat color shading are going to be supported by the engine; lighting, shadows, etc are not. Performance wise, I’m trying to target either 8 bpp colors with a 12 - 15 fps frame rate or 4bpp colors and 20 - 25 fps.

So far I’ve made a few decisions about what restrictions I can put on myself and still get good performance. From my understanding, the FX chip pixel plotter uses a caching mechanism that is optimized for long rows of pixels. I’ve implemented a scanline rasterizer so that plotting can be done on rows of pixels. To prevent overdraw and remove hidden surfaces I’m using a variation of BSP trees [2]. To make texturing faster, I am planning on limiting camera movement and level geometry so that I can precalculate divides [3].
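
As a rough illustration of the BSP ordering idea (a sketch only, not the engine's actual code): with a parallel projection the viewer is effectively at infinity, so which side of a splitting plane is "near" depends only on the view direction. The `Node` layout, the 2D floor-plan split, and the string polygon labels below are all hypothetical simplifications.

```java
import java.util.ArrayList;
import java.util.List;

final class BspOrder {
    /** A node splits the floor plan by a line with normal (nx, nz). */
    static final class Node {
        final int nx, nz;        // splitting line normal
        final String polygon;    // stand-in for the polygon stored at this node
        final Node front, back;  // front = the side the normal points to

        Node(int nx, int nz, String polygon, Node front, Node back) {
            this.nx = nx;
            this.nz = nz;
            this.polygon = polygon;
            this.front = front;
            this.back = back;
        }
    }

    /**
     * Appends polygons nearest-first for an orthographic camera.
     * (toCamX, toCamZ) points from the scene toward the viewer.
     */
    static void frontToBack(Node n, int toCamX, int toCamZ, List<String> out) {
        if (n == null) {
            return;
        }
        // Sign of the dot product tells which side faces the viewer.
        boolean viewerInFront = n.nx * toCamX + n.nz * toCamZ >= 0;
        Node near = viewerInFront ? n.front : n.back;
        Node far = viewerInFront ? n.back : n.front;
        frontToBack(near, toCamX, toCamZ, out);
        out.add(n.polygon);
        frontToBack(far, toCamX, toCamZ, out);
    }

    public static void main(String[] args) {
        Node east = new Node(0, 1, "east", null, null);
        Node west = new Node(0, 1, "west", null, null);
        Node root = new Node(1, 0, "wall", east, west);
        List<String> order = new ArrayList<>();
        frontToBack(root, 1, 0, order);
        System.out.println(order);
    }
}
```

Nearest-first order suits a rasterizer that tracks which spans are already filled; swapping `near` and `far` gives the farthest-first order a painter's algorithm would want.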

Another restriction is that I’m using an isometric instead of perspective projection. This has the bonus of letting me skip 3 divides per polygon. As with texturing, I’m hoping that by limiting camera movement I can precalculate the trig functions. I also hope that I can keep these geometry transformations in the 512 byte code cache, but that is Future Me’s problem.

Right now my Java prototype is using floating point math, but I know I will need to eventually switch everything to fixed point math. That’s trivial by comparison to “let’s port FFT to the SNES”.
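
For anyone following along, here is a minimal sketch of what a divide-free, fixed-point isometric projection with precalculated trig could look like in the Java prototype. The 8.8 fixed-point format, the 64 yaw steps, and the 2:1 screen mapping are assumptions picked for illustration, not decisions from this post.

```java
final class IsoProjection {
    // 8.8 fixed point: 1.0 == 256. Hypothetical format chosen for this sketch.
    static final int FP_SHIFT = 8;
    static final int FP_ONE = 1 << FP_SHIFT;

    // Number of discrete camera yaw steps (an assumption, not a figure from the post).
    static final int YAW_STEPS = 64;

    static final int[] SIN = new int[YAW_STEPS];
    static final int[] COS = new int[YAW_STEPS];

    static {
        // On the SNES these tables would live in ROM; here they are built once at load.
        for (int i = 0; i < YAW_STEPS; i++) {
            double angle = 2.0 * Math.PI * i / YAW_STEPS;
            SIN[i] = (int) Math.round(Math.sin(angle) * FP_ONE);
            COS[i] = (int) Math.round(Math.cos(angle) * FP_ONE);
        }
    }

    /** Projects an integer world point to screen pixels using only multiplies and shifts. */
    static int[] project(int x, int y, int z, int yaw) {
        int s = SIN[yaw];
        int c = COS[yaw];
        // Rotate around the vertical axis; results are in 8.8 fixed point.
        int rx = x * c - z * s;
        int rz = x * s + z * c;
        // Orthographic mapping: screen x from rotated x, screen y from half the depth minus height.
        int sx = rx >> FP_SHIFT;
        int sy = (rz >> (FP_SHIFT + 1)) - y;
        return new int[] { sx, sy };
    }
}
```

Because there is no perspective term, the per-vertex work is four multiplies and a few shifts, which is the "skip the divides" benefit described above.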

I guess my question here is, are there some decisions that are immediately obvious that I might be missing? I’m up for general discussion even if it isn’t something that can be applicable to an isogrid tactical rpg.

DMA

I know to get data from the SuperFX framebuffer into VRAM I have to use DMA, and I can only write to VRAM under very limited circumstances. Napkin math says that full frame 192x256 4bpp will take 4 frames to transfer to VRAM if I only use VBlank. However, I’ve seen Molive’s NICC 2000 demo and know that can be made faster. I also know that you can switch the draw beam off during HBlank to write at full speed into VRAM, but this breaks sprites on that scanline and perhaps other things, and it is still being looked into.


I guess my question here is what are my options for hitting good frame rates and what are their tradeoffs in the context of “I have a rendered frame to get into VRAM”.

TL;DR;

I’m porting Final Fantasy Tactics to the FXPack Pro. What horrible things do I need to know now while I’m prototyping the engine in Java? Feel free to get technical. If you have a bunch of cool stuff to show off and want to go into detail, that would help too. Thanks!

Appendix Background Reading

  • SNES-NICC 2000 Demo by Molive
  • SuperFX perf discussion and demo
  • More discussions
  • Progress of my Java Engine
  • Java engine source


Footnotes
1: I’m shorthanding here. I plan to target a cart with SuperFX running at 21 MHz and take advantage of the MSU-1 chip for data storage. Ideally it won’t need any emulator hacks beyond those, but I’m willing to live with shortcuts like overclocking the SFX chip.

2: BSP trees work great here from my understanding. They work best for draw ordering when they are paired with simple geometry. I don’t think geometry gets much simpler than early 90s console tech.

3: I’ve done the napkin math and 14k of lookup table gets me ~80 camera positions and 12 “planes” I can precalculate. Precalculating divides might be irrelevant if DMA is so slow that the performance boost isn’t worthwhile or I need the rom for other things. Also I’ve lost the napkin.
dougeff
Posts: 3079
Joined: Fri May 08, 2015 7:17 pm

Re: Design guidance for 3D Engine on SuperFX

Post by dougeff »

My math says a full screen 256x192 of 4bpp is 24576 bytes.

I think you could transfer ~10500 bytes to VRAM per frame. It would take 3 frames to transfer a full screen.

If you dropped down to 176 pixels high, you would only need 22528 bytes, and could transfer ~13000 bytes per frame, so you could do it in 2 frames.

(I think DMA does 165 bytes per scanline)
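
The arithmetic above can be checked with a few lines of Java. This is only a sketch: the per-frame budgets passed in are the rough estimates from this post (~10500 and ~13000 bytes), and the class and method names are made up for illustration.

```java
final class UploadEstimate {
    /** Bytes occupied by a framebuffer of the given size and bit depth. */
    static int frameBytes(int width, int height, int bpp) {
        return width * height * bpp / 8;
    }

    /** Whole frames needed to move the given byte count at bytesPerFrame, rounded up. */
    static int framesNeeded(int bytes, int bytesPerFrame) {
        return (bytes + bytesPerFrame - 1) / bytesPerFrame;
    }

    public static void main(String[] args) {
        int full = frameBytes(256, 192, 4);
        int trimmed = frameBytes(256, 176, 4);
        System.out.println(full + " bytes, " + framesNeeded(full, 10500) + " frames");
        System.out.println(trimmed + " bytes, " + framesNeeded(trimmed, 13000) + " frames");
    }
}
```

With those inputs the full 192-line buffer needs three frames and the 176-line buffer needs two, matching the figures above.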
nesdoug.com -- blog/tutorial on programming for the NES
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Re: Design guidance for 3D Engine on SuperFX

Post by lidnariq »

secondsun wrote: Thu May 21, 2020 10:36 am Performance wise, I’m trying to target either 8 bpp colors with a 12 - 15 fps frame rate or 4bpp colors and 20 - 25 fps.
It's worth pointing out that all the textures used by PS1 FFT are 4bpp.
Another restriction is that I’m using an isometric instead of perspective projection.
It also means you might be able to use the PPU for the things it was designed for (namely, sprites and multiple layers of tiles) instead of just as a framebuffer. This might help, if the SuperFX is designed in a way such that you can actually take advantage of it...
I know to get data from the SuperFX framebuffer into VRAM I have to use DMA, and I can only write to VRAM under very limited circumstances. Napkin math says that full frame 192x256 4bpp will take 4 frames to transfer to VRAM if I only use VBlank.
You can upload 165.5 bytes per scanline that the PPU is off. The PPU consumes 128 or 256 bytes per scanline that the PPU is on. This works out to 256x144@30fps at 8bpp, or 256x184@30fps at 4bpp ... not including time for sprites, palette, or nametable updates.

(math: tepples, mine, mine again)
secondsun
Posts: 41
Joined: Tue Jul 31, 2018 9:37 am

Re: Design guidance for 3D Engine on SuperFX

Post by secondsun »

It also means you might be able to use the PPU for the things it was designed for (namely, sprites and multiple layers of tiles) instead of just as a framebuffer. This might help, if the SuperFX is designed in a way such that you can actually take advantage of it...
At a minimum the PPU will be drawing the HUD, status icons, etc the "traditional" way. I've gone back and forth with how much I want to use the PPU for drawing the character sprites and letting the SFX draw only the spinning play field. I haven't found a good way to clip the character sprites appropriately so they can be behind and/or in front of play field geometry.
Myself086
Posts: 158
Joined: Sat Nov 10, 2018 2:49 pm

Re: Design guidance for 3D Engine on SuperFX

Post by Myself086 »

Character sprites should be handled by the SuperFX. Sprites on HUD by the PPU.

If your field is 4bpp and character sprites use a different palette, you can simply let the SuperFX draw a hole for the sprite and the PPU can place the character sprite behind the 3D layer.

Super Mario RPG had to deal with this in some places where priority wasn't enough, I don't remember how they did it though.
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: Design guidance for 3D Engine on SuperFX

Post by calima »

Probably the old NES trick? Higher priority sprite goes behind bg, blocking the lower-priority sprite's pixels, even though the lower-prio sprite is above bg.
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Design guidance for 3D Engine on SuperFX

Post by Oziphantom »

Doing this with a 3D engine is not really going to work.

I think the SNES DMA limits are ~6K per VBlank assuming you have nothing else to DMA, say updating sprites for example.
You could go PAL only and that gets you ~14K + more Super FX rendering time.

Basically you should just fake this. Plot the map from the 4 directions, so you have each 'tile' with 4 sides, but the tiles are mostly identical in all 4 rotations. Then you build the map for each direction so you occlude as you need to. Then sprites for the units.

FFT is basically Ogre Tactics II and Ogre Tactics is on the SNES, so you should look how it looks. It has the same isometric grid look.
Bregalad
Posts: 8056
Joined: Fri Nov 12, 2004 2:49 pm
Location: Divonne-les-bains, France

Re: Design guidance for 3D Engine on SuperFX

Post by Bregalad »

First of all, such a game seems awesome ! Loving T-RPGs (Fire Emblem in particular) I'd be eager to play it.

If I understand well the problem is not clipping the sprites with a given, constant, angle such as what Tactics Ogre - Let's cling together does, but doing it at unknown angle.

I suppose the best approach is having sprites displaying properly at 0°, 90°, 180° and -90° angles, and displaying clunky at other angles which are only shown temporarily when switching from those angles; that's what FFT does. As such the sprites could be displayed as normal PPU sprites and would require no work from the SuperFX.

Making them would still be complicated: the masking would have to be somehow computed by the SuperFX, and could be achieved using either two BG layers (such as what TO does) or the phantom sprite technique, which sounds more flexible.

In theory there can be in the same spot an infinity of obstacles hiding an infinity of sprites, TO gets away using 2 layers because the backgrounds were pre-rendered with that in mind and not rendered on the fly.
FFT is basically Ogre Tactics II and Ogre Tactics is on the SNES
Tactics Ogre - Let's Cling together is already the sequel of Ogre Battle, and its own sequel was released on the N64. They were supposed to be part of a 7-logy, but the other 4 games were never released.
secondsun
Posts: 41
Joined: Tue Jul 31, 2018 9:37 am

Re: Design guidance for 3D Engine on SuperFX

Post by secondsun »

Myself086 wrote: Fri May 22, 2020 1:06 pm If your field is 4bpp and character sprites use a different palette, you can simply let the SuperFX draw a hole for the sprite and the PPU can place the character sprite behind the 3D layer.
The more I think about this solution the more I like it. First, I think it solves the problem perfectly. Second, instead of sending 4bpp sprites into the SuperFX memory I can send a 1bpp mask sprite and the rendering algorithm gets to stay more or less the same.
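
A rough Java sketch of the mask idea, under some loud assumptions: the prototype framebuffer stores one palette index per byte (not SNES bitplanes), colour index 0 is the transparent entry, and the 1bpp mask is packed 8 pixels per byte with the most significant bit leftmost. The name `punch` is hypothetical.

```java
final class SpriteHole {
    /**
     * Clears framebuffer pixels to colour index 0 wherever the 1bpp mask has a set bit,
     * so a PPU sprite placed behind the 3D layer shows through.
     * Pixels that fall outside the framebuffer are skipped.
     */
    static void punch(byte[] fb, int fbWidth, byte[] mask, int maskW, int maskH,
                      int destX, int destY) {
        int fbHeight = fb.length / fbWidth;
        int stride = (maskW + 7) / 8; // mask bytes per row
        for (int my = 0; my < maskH; my++) {
            int y = destY + my;
            if (y < 0 || y >= fbHeight) {
                continue;
            }
            for (int mx = 0; mx < maskW; mx++) {
                int x = destX + mx;
                if (x < 0 || x >= fbWidth) {
                    continue;
                }
                int bits = mask[my * stride + (mx >> 3)] & 0xFF;
                if ((bits & (0x80 >> (mx & 7))) != 0) {
                    fb[y * fbWidth + x] = 0; // transparent: the sprite behind shows here
                }
            }
        }
    }
}
```

If the hole is punched at the character's position in the draw order, geometry drawn afterwards in front of the character should cover part of the hole again, which is what would give the clipping.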
secondsun
Posts: 41
Joined: Tue Jul 31, 2018 9:37 am

Re: Design guidance for 3D Engine on SuperFX

Post by secondsun »

Bregalad wrote: Sat May 23, 2020 6:19 am First of all, such a game seems awesome ! Loving T-RPGs (Fire Emblem in particular) I'd be eager to play it.

If I understand well the problem is not clipping the sprites with a given, constant, angle such as what Tactics Ogre - Let's cling together does, but doing it at unknown angle.
Thanks! It is a long way from being playable right now, but I'll try to update this thread when I start porting the engine to the SuperFX.

You're exactly right about the problem. I think the sprite masks and clunky angles are the way to go.
93143
Posts: 1718
Joined: Fri Jul 04, 2014 9:31 pm

Re: Design guidance for 3D Engine on SuperFX

Post by 93143 »

Oziphantom wrote: Sat May 23, 2020 6:07 am I think the SNES DMA limits are ~6K per VBlank assuming you have nothing else to DMA, say updating sprites for example.
That's only if you don't use force blank to trim the top and/or bottom of the screen. You gain 165.5 bytes worth of extra DMA time for every line the PPU doesn't render (assuming you start right away - so, IRQ instead of NMI if you're trimming the bottom). This, combined with the fact that every 8-pixel reduction in screen height is one less row of tiles you need to transfer, means the achievable frame rate goes up surprisingly quickly when you letterbox.

OP is talking about a 256x192 framebuffer, presumably with black letterboxing above and below it. At 8bpp this is too large to fit in VRAM without tearing, even with fractional buffering. At 4bpp it's a very comfortable 20 fps with only extended-VBlank DMA, or an easy 30 with HBlank DMA as well (not recommended because force blank during HBlank kills sprites, and 4bpp without sprites is very limiting). It's entirely possible that the Super FX could end up being the limiting factor, as it usually is in the nominally 20 fps Star Fox...
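
To see how quickly letterboxing pays off, here is a small Java sketch that sweeps the visible height. It assumes NTSC (262 lines per field), a 256-pixel-wide buffer, uploads only on lines the PPU isn't rendering, and the 165.5 bytes/line figure from this thread. It ignores OAM, palette and tilemap updates as well as VRAM capacity, so treat the results as upper bounds.

```java
final class LetterboxSweep {
    static final int NTSC_LINES = 262;
    static final double DMA_BYTES_PER_LINE = 165.5; // figure quoted in this thread

    /** Frame rate at 60 Hz if the buffer is uploaded only during non-rendered lines. */
    static int fps(int visibleLines, int bpp) {
        int bytes = 256 * visibleLines * bpp / 8;
        double budget = (NTSC_LINES - visibleLines) * DMA_BYTES_PER_LINE;
        int frames = (int) Math.ceil(bytes / budget);
        return 60 / frames;
    }

    public static void main(String[] args) {
        for (int h = 224; h >= 160; h -= 16) {
            System.out.println(h + " lines, 4bpp: " + fps(h, 4) + " fps");
        }
    }
}
```

Under these assumptions a 192-line 4bpp buffer comes out at 20 fps, and trimming to 176 lines reaches 30 fps without any HBlank DMA.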
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Design guidance for 3D Engine on SuperFX

Post by Oziphantom »

well they wrote 192x256 not 256x192. Which suggests using windows or HUD to hide the horizontal pixels.

But yes, if one does 256x192 for the frame buffer and doesn't have the lower 48 pixels filled with HUD and other status info, you can Force Blank it. You can then get another 7K on NTSC, basically giving you PAL spec.

However, I feel the limit will be the rendering: Star Fox is flat shaded, while this will be textured and will probably have more polygons on screen, plus masking. That said, this is not a fast-twitch game, and sometimes the frames will remain identical for seconds. Scrolling could be done with frame buffer copies (DMA will help nicely), so you only have to render and plot the new tiles that appear. Rotating the camera will be the expensive operation, as you will potentially need to re-render the whole scene, but if it takes 8 frames to do it, then oh well. My Fire Emblem game on the Commodore 64 takes 7 frames to scroll its screen (as it is in bitmap mode) and honestly it feels fine. Sure, the 2 frames on the C128 feel a lot nicer, but most people play the C64 version and don't complain.

Also I feel BSP is utter overkill for this game and quadtree is all it will need.

Yes Tactics Ogre is the 2nd game in the Ogre series. Of which
March of the Black Queen was the first game released.
Then Let Us Cling Together
Then Of Lordly Caliber
Then Prince of Zenobia
Then Tactics Ogre: The Knight of Lodis (which actually is Tactics Ogre II, but it came out after FFT)

However, basically speaking, Tactics Ogre and FFT were made by Yasumi Matsuno, Akihiko Yoshida, Hiroshi Minagawa, and Hitoshi Sakimoto, along with others, some of whom worked on both games and some of whom didn't. The gameplay and style are very similar. So FFT is TO II in that it is a spiritual sequel in terms of gameplay and concepts, made by most of the same team. Of Lordly Caliber is the story sequel but has very different gameplay mechanics, though it is also by the same team. Tactics Ogre: The Knight of Lodis wasn't made by Matsuno.
Nikku4211
Posts: 569
Joined: Sun Dec 15, 2019 1:28 pm
Location: Florida

Re: Design guidance for 3D Engine on SuperFX

Post by Nikku4211 »

This is a cool idea, but I'm not so sure about prototyping in Java...

I mean, it's way too easy to miss an important limitation and then *poof* just like that, you have to deal with a new limitation you never knew about and are unprepared for.

Do you expect to support the original SD2SNES, the one I have?
Myself086 wrote: Fri May 22, 2020 1:06 pm If your field is 4bpp and character sprites use a different palette, you can simply let the SuperFX draw a hole for the sprite and the PPU can place the character sprite behind the 3D layer.

Super Mario RPG had to deal with this in some places where priority wasn't enough, I don't remember how they did it though.
That's a good compromise between rendering sprites w/ PPU and rendering sprites with SuperFX. You can have proper SuperFX clipping while still having the potential for more sprite colours.

I don't get what this has to do with Super Mario RPG, though.
lidnariq wrote: Thu May 21, 2020 11:32 am
secondsun wrote: Thu May 21, 2020 10:36 am Performance wise, I’m trying to target either 8 bpp colors with a 12 - 15 fps frame rate or 4bpp colors and 20 - 25 fps.
It's worth pointing out that all the textures used by PS1 FFT are 4bpp.
In one screen, do they all use the same 4BPP palette, or do they all use different 4BPP subpalettes?
lidnariq wrote: Thu May 21, 2020 11:32 am You can upload 165.5 bytes per scanline that the PPU is off. The PPU consumes 128 or 256 bytes per scanline that the PPU is on. This works out to 256x144@30fps at 8bpp, or 256x184@30fps at 4bpp ... not including time for sprites, palette, or nametable updates.
I know 2BPP would be only 4 colours for the entire screen, effectively limiting it to a Super Game Boy palette or a Flipnote Studio palette, but how much of the screen can you do at 30fps with this method and a 2BPP framebuffer? How much can you do at 60fps? And would limiting the FPS to 25 instead of 30 even on NTSC screens help anything?
Oziphantom wrote: Sun May 24, 2020 2:28 am well they wrote 192x256 not 256x192. Which suggests using windows or HUD to hide the horizontal pixels.

But yes, if one does 256x192 for the frame buffer and doesn't have the lower 48 pixels filled with HUD and other status info, you can Force Blank it. You can then get another 7K on NTSC, basically giving you PAL spec.
The SNES' vertical resolution is 224 though, so it might be a typo. But yeah, if you don't mind some pillarboxing, you could have the thing output to a 192x224, which would almost look like a square if you took the 8:7 -> 4:3 stretch into account.

Letterboxing does have the benefit of being able to zoom in to the screen on a 16:9 HDTV without losing any content whatsoever, if the content itself is further letterboxed into a 16:9 frame, but 16:9 HDTV users might be able to tolerate 192x224 pillarboxing if they can already tolerate 4:3 pillarboxing.

And yeah, I know it's weird thinking about LCD TVs when we're talking about a SNES homebrew concept, but yeah, people who play retro games on LCD TVs (when they don't have CRTs) exist and should be considered.
I have an ASD, so empathy is not natural for me. If I hurt you, I apologise.
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Re: Design guidance for 3D Engine on SuperFX

Post by lidnariq »

Nikku4211 wrote: Sun May 24, 2020 5:14 pm In one screen, do [all textures in PS1 FFT] all use the same 4BPP palette, or do they all use different 4BPP subpalettes?
A bunch of things use the same palette. If you have access to the CD, you can just snoop around. There's some palette swaps too.

I wouldn't be surprised if it turns out that FFT ends up using less than 16 total master palettes. But I haven't really checked...
I know 2BPP would be only 4 colours for the entire screen, effectively limiting it to a Super Game Boy palette or a Flipnote Studio palette, but how much of the screen can you do at 30fps with this method and a 2BPP framebuffer? How much can you do at 60fps? And would limiting the FPS to 25 instead of 30 even on NTSC screens help anything?
Non-integer divisions of the source frame rate may not look good. Perhaps 2.5 would be ok (24fps), just like conventional film telecining, but the judder may be perceptible.

In any case, the math I outlined before is still true, just with different constants:

8bpp @ 30Hz:
256 bytes consumed per scanline during redraw
x*256/2 bytes to upload for x scanlines across two redraws, because each byte is used twice
((262*2)-x)*165.5 = x*256/2
x = 295 scanlines across two redraws
x/2 = 147 scanlines each redraw

4bpp @ 30Hz:
128 bytes consumed per scanline during redraw
((262*2)-x)*165.5 = x*128/2
x = 377 scanlines across two redraws
x/2 = 188 scanlines each redraw

2bpp @ 30Hz:
64 bytes consumed per scanline during redraw
((262*2)-x)*165.5 = x*64/2
x = 439 scanlines across two redraws
x/2 = 219 scanlines each redraw

2bpp @ 60Hz:
((262*1)-x)*165.5 = x*64
x = 188 scanlines

2bpp @ 50Hz:
((312*1)-x)*165.5 = x*64
x = 224 scanlines

In general:
((scanlines per field)*(frame rate divider)-x)*165.5 = x*(bytes per scanline)÷(frame rate divider)
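
The general formula can be solved for x directly. A minimal Java sketch follows; the class and method names are made up, and 165.5 bytes per blanked scanline is the figure used in this thread.

```java
final class ScanlineBudget {
    static final double DMA_BYTES_PER_BLANK_LINE = 165.5; // figure quoted in this thread

    /**
     * Solves (linesPerField*divider - x) * 165.5 = x * bytesPerLine / divider for x,
     * rounds x down, and returns the visible scanlines per redraw (x / divider).
     */
    static int maxVisibleLines(int linesPerField, int divider, int bytesPerLine) {
        double total = linesPerField * divider * DMA_BYTES_PER_BLANK_LINE;
        double costPerLine = DMA_BYTES_PER_BLANK_LINE + (double) bytesPerLine / divider;
        int x = (int) Math.floor(total / costPerLine);
        return x / divider;
    }

    public static void main(String[] args) {
        System.out.println("8bpp @ 30Hz: " + maxVisibleLines(262, 2, 256));
        System.out.println("4bpp @ 30Hz: " + maxVisibleLines(262, 2, 128));
        System.out.println("2bpp @ 30Hz: " + maxVisibleLines(262, 2, 64));
        System.out.println("2bpp @ 60Hz: " + maxVisibleLines(262, 1, 64));
        System.out.println("2bpp @ 50Hz: " + maxVisibleLines(312, 1, 64));
    }
}
```

Rearranged, x = (lines × divider × 165.5) / (165.5 + bytes per scanline ÷ divider), which reproduces the five worked cases above.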
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Design guidance for 3D Engine on SuperFX

Post by Oziphantom »

Oh yeah, NTSC is only 224, not 240.

So dropping to 192 gets you 32 lines, which, if you are not putting other stuff in such as HUD etc., gets you ~5K extra, giving you 11K per frame, not even PAL spec.