Fast 3D blitting on Super FX

ARM9
Posts: 57
Joined: Sun Aug 11, 2013 6:07 am

Fast 3D blitting on Super FX

Post by ARM9 »

93143
Posts: 1715
Joined: Fri Jul 04, 2014 9:31 pm

Re: Fast 3D blitting on Super FX

Post by 93143 »

Ah. You have at least a rudimentary 3D engine running on the Super FX. Good stuff. I'm afraid 3D is still mostly voodoo to me; flat perspective in Mode 7 is about as far as I've gone...

How much compute time does that scene take? I can't imagine the chip is anywhere near pegged... Also, why does pressing Start cause the screen to momentarily black out?

Are you working on a game?
ARM9
Posts: 57
Joined: Sun Aug 11, 2013 6:07 am

Re: Fast 3D blitting on Super FX

Post by ARM9 »

The frame rate depends heavily on the number of polygons that actually get drawn; the rest is relatively fast.
I switched to 2bpp and it's just about breaking 60fps as long as the objects don't get too large. You can see it dip down to 30 as they get closer to the camera.
https://www.dropbox.com/s/448s5i5wi52qq ... s.sfc?dl=0

When you press start it just resets the framebuffer.

That was actually an older engine. I recently wrote a better engine for a thing I'm working on: https://www.youtube.com/watch?v=MjgdSikMXSA
The video plays back too fast because I had to record it with no$sns at 30fps, while the actual frame rate is 15fps. Just set the playback speed to 0.5 for the full Super FX experience.
This model has around 300 polygons, which isn't too bad; 20fps should be doable even with simple lighting.

One problem with lighting at 4bpp is the palette limit; 15 colors just isn't cutting it for these scenes.
Fill rate is the main bottleneck, and 8bpp doubles the time that takes. But if there's a fast way to build the palettes and tilemap each frame after determining which tiles need which colors, you could get up to 127 colors at 4bpp.
You could cut the resolution and/or start targeting PAL to offset the increased DMA.
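One possible reading of the "build the palettes and tilemap each frame" idea, sketched in C under assumptions that are not from the post: each tile's needed shades are a bitmask over a hypothetical master list of up to 64 shades, and tiles are greedily packed into the eight 4bpp BG sub-palettes (15 opaque entries each, since index 0 is transparent). `assign_palettes` and its fewest-new-colors heuristic are illustrative only; a real engine would also need a fallback when a tile fits nowhere.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SUBPALETTES 8  /* 3-bit palette field in a BG tilemap entry */
#define COLORS_PER_SUB 15  /* index 0 of a 4bpp BG sub-palette is transparent */

/* Bit i set = the tile needs shade i of a hypothetical master list (max 64 shades). */
typedef uint64_t ColorSet;

static int popcount64(ColorSet s) {
    int n = 0;
    while (s) {
        s &= s - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Greedily assigns each tile to the sub-palette that needs the fewest new entries.
   Writes pal_of_tile[t] and the resulting sub-palette contents to subpal[].
   Returns 0 on success, or -1 if some tile fits in no sub-palette. */
static int assign_palettes(const ColorSet *tile_colors, int num_tiles,
                           int *pal_of_tile, ColorSet subpal[NUM_SUBPALETTES]) {
    assert(num_tiles >= 0);
    for (int p = 0; p < NUM_SUBPALETTES; p++) subpal[p] = 0;

    for (int t = 0; t < num_tiles; t++) {
        int best = -1;
        int best_added = COLORS_PER_SUB + 1;
        for (int p = 0; p < NUM_SUBPALETTES; p++) {
            int merged = popcount64(subpal[p] | tile_colors[t]);
            if (merged > COLORS_PER_SUB) continue; /* would overflow this sub-palette */
            int added = merged - popcount64(subpal[p]);
            if (added < best_added) {
                best_added = added;
                best = p;
            }
        }
        if (best < 0) return -1; /* caller must merge or drop shades */
        subpal[best] |= tile_colors[t];
        pal_of_tile[t] = best;
    }
    return 0;
}
```

Tiles that share shades end up in the same sub-palette, which keeps the other sub-palettes free for tiles with different color needs.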
Drew Sebastino
Formerly Espozo
Posts: 3496
Joined: Mon Sep 15, 2014 4:35 pm
Location: Richmond, Virginia

Re: Fast 3D blitting on Super FX

Post by Drew Sebastino »

ARM9 wrote:The frame rate depends heavily on the amount of polygons that actually get drawn, the rest is relatively fast.
Wait, really? I would have thought it depended on their size. I understand that even the "relatively fast" stuff could still be enough to push it over the edge, but you also said the framerate drops when the objects get closer to the camera.

How do you fill flat-shaded polygons? Is it possible to look at the left side of the polygon, then the right side on the same row of pixels, loop to fill out the row, and then move down one row and check the bounds of the polygon again?
Bregalad
Posts: 8055
Joined: Fri Nov 12, 2004 2:49 pm
Location: Divonne-les-bains, France

Re: Fast 3D blitting on Super FX

Post by Bregalad »

Heh, it's pretty cool, congratulations! How can you test it on hardware?
KungFuFurby
Posts: 275
Joined: Wed Jul 09, 2008 8:46 pm

Re: Fast 3D blitting on Super FX

Post by KungFuFurby »

I wonder whether, under certain circumstances (especially when movement is small), it would be faster to determine which pixels have actually changed and modify only those. Just thinking in theory, that's all.
ARM9
Posts: 57
Joined: Sun Aug 11, 2013 6:07 am

Re: Fast 3D blitting on Super FX

Post by ARM9 »

Espozo wrote: Wait, really? I would have thought it would be the size. I understand that even the "relatively fast" stuff still has the potential to be enough to put it over the edge, but you also said the framerate drops when the objects are closer to the camera.
It'd be more efficient to render one large polygon than the same area made up of several smaller polygons.
The larger a polygon gets, the longer it takes to draw, and with fill rate being the biggest bottleneck, you want to draw as few polygons as possible. In a sufficiently complex scene you're likely to cover most of the screen, and you'll have a lot of overdraw unless you discard as many polygons as possible.

It's hard to sustain 60fps at a decent resolution. I could push a few more polygons with the new engine, but there's a hard cap on the number of pixels the hardware can draw in the allotted time.
Some numbers for 21 MHz, 224x192, 2bpp @ 60fps NTSC (assuming all loops are cached):
  • You have about 257,500 cycles to work with each frame on the Super FX.
  • Plotting 8 pixels without crossing a pixel cache (pcache) boundary (when x wraps to 0 in `x mod 8`) takes 10 cycles (16 with the LOOP;PLOT sequence used for span fill).
  • A pcache miss stalls for 10 cycles.
  • Clearing the framebuffer takes roughly 53,760 cycles.
  • Filling the entire screen with the LOOP;PLOT sequence takes about 86,000 cycles in the best case (plotting x from 0-223 on each line).
The latter two consume over 50% of the cycles we're working with. But within the same frame you also have to copy, transform, and project your polygons, do visible surface determination (an entire rabbit hole in itself; I've yet to find an optimal solution for arbitrary geometry on the Super FX), and clip.
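The listed figures are consistent with each other. A small C sketch reproduces them, using the constants from the list; the 10-cycles-per-word clear cost is an inference (53,760 cycles / 5,376 words), not something stated in the post.

```c
#include <assert.h>

/* Figures from the list above: 224x192 at 2bpp, ~257,500 usable GSU cycles per frame. */
#define SCREEN_W 224
#define SCREEN_H 192
#define BPP 2
#define FRAME_BUDGET 257500L
#define CYCLES_PER_8PX 16L        /* LOOP;PLOT, no pcache boundary crossed */
#define CYCLES_PER_CLEAR_WORD 10L /* inferred: 53,760 cycles / 5,376 words */

static long framebuffer_words(void) {
    return (long)SCREEN_W * SCREEN_H * BPP / 16; /* 16 bits per word */
}

static long clear_cycles(void) {
    return framebuffer_words() * CYCLES_PER_CLEAR_WORD;
}

/* Best case: every row is one 224-pixel span starting on a pcache boundary. */
static long full_fill_cycles(void) {
    assert(SCREEN_W % 8 == 0);
    return (long)(SCREEN_W / 8) * SCREEN_H * CYCLES_PER_8PX;
}

/* Share of the frame budget consumed by clearing plus filling, in percent. */
static long percent_used_by_clear_and_fill(void) {
    return (clear_cycles() + full_fill_cycles()) * 100 / FRAME_BUDGET;
}
```

With these constants, `clear_cycles()` gives 53,760, `full_fill_cycles()` gives 86,016, and together they use about 54% of the 257,500-cycle budget.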
Due to the nature of a 3D scene, you're unlikely to hit the best-case rate behind that 86,000-cycle figure, even when you aren't filling the entire screen. There are lots of pcache misses, and some overdraw is unavoidable without perfect VSD (a z-buffer is expensive in time and impractical in space; a coverage buffer is potentially expensive in both).
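To see why short spans hurt, here is a hypothetical helper (not from the post) that counts how many 8-pixel pcache groups a horizontal span touches, assuming groups start where `x mod 8 == 0` as described above. Each touched group needs at least one flush, so short or misaligned spans waste most of each flush.

```c
#include <assert.h>

/* Number of 8-pixel pcache groups touched by the inclusive span [x0, x1],
   assuming groups start where x mod 8 == 0. */
static int pcache_groups(int x0, int x1) {
    assert(x0 >= 0 && x0 <= x1);
    return (x1 / 8) - (x0 / 8) + 1;
}

/* Percentage of the touched pcache slots that the span actually fills. */
static int pcache_utilization_pct(int x0, int x1) {
    int pixels = x1 - x0 + 1;
    return pixels * 100 / (pcache_groups(x0, x1) * 8);
}
```

A 2-pixel span straddling x = 8 touches two groups and fills only 12% of them. A full 224-pixel row fills 100%, which is why many small polygons cost more per pixel than one large one.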
Espozo wrote: How do you program filling in flat-shaded polygons? Is it possible to look at the left side of the polygon, then the right side on the same row of pixels, do a loop to fill out the row, and then move down one and check the bounds of the polygon again?
That's the gist of the algorithm: determine the span of each row in the polygon and plot it. I use the slopes of the edges to determine the bounds.
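Here is a minimal C sketch of that span-fill approach for a triangle. It assumes a hypothetical linear 8-bit framebuffer (the real Super FX plots into planar tiles with PLOT), 16.16 fixed-point edge slopes, and vertices that have already been clipped to the screen. Spans include both edge pixels and the bottom row is skipped. That is a simple convention, not a watertight fill rule.

```c
#include <assert.h>
#include <stdint.h>

#define FB_W 32
#define FB_H 32

typedef struct { int x, y; } Vec2;
typedef int32_t fix16; /* 16.16 fixed point */

/* dx/dy of the edge a->b in 16.16; requires b.y > a.y. */
static fix16 edge_slope(Vec2 a, Vec2 b) {
    assert(b.y > a.y);
    return (fix16)((int64_t)(b.x - a.x) * 65536 / (b.y - a.y));
}

static void swap_vec(Vec2 *a, Vec2 *b) { Vec2 t = *a; *a = *b; *b = t; }

/* Plots the inclusive span [xl, xr] on row y. */
static void plot_span(uint8_t fb[FB_H][FB_W], int y, int xl, int xr, uint8_t color) {
    for (int x = xl; x <= xr; x++) fb[y][x] = color;
}

/* Walk rows top to bottom, step each edge's x by its slope, fill between them. */
static void fill_triangle(uint8_t fb[FB_H][FB_W], Vec2 v0, Vec2 v1, Vec2 v2, uint8_t color) {
    /* Sort so that v0.y <= v1.y <= v2.y. */
    if (v1.y < v0.y) swap_vec(&v0, &v1);
    if (v2.y < v0.y) swap_vec(&v0, &v2);
    if (v2.y < v1.y) swap_vec(&v1, &v2);
    assert(v0.x >= 0 && v1.x >= 0 && v2.x >= 0);
    assert(v0.x < FB_W && v1.x < FB_W && v2.x < FB_W);
    assert(v0.y >= 0 && v2.y <= FB_H);
    if (v2.y == v0.y) return; /* zero height: nothing to fill */

    fix16 x_long = (fix16)v0.x * 65536; /* edge v0->v2 covers the whole height */
    fix16 slope_long = edge_slope(v0, v2);

    for (int half = 0; half < 2; half++) {
        Vec2 a = half ? v1 : v0; /* short edge: v0->v1, then v1->v2 */
        Vec2 b = half ? v2 : v1;
        if (b.y == a.y) continue; /* flat top or flat bottom */
        fix16 x_short = (fix16)a.x * 65536;
        fix16 slope_short = edge_slope(a, b);
        for (int y = a.y; y < b.y; y++) { /* bottom row excluded */
            int xa = x_long >> 16, xb = x_short >> 16;
            if (xa > xb) { int t = xa; xa = xb; xb = t; }
            plot_span(fb, y, xa, xb, color);
            x_long += slope_long;
            x_short += slope_short;
        }
    }
}
```

Per row, the only edge work is two fixed-point adds. All remaining time goes to the span itself, which matches the point that fill rate dominates.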
Bregalad wrote:Heh, it's pretty cool, congratulations ! How can you test it on the hardware ?
Hey, thanks! Just sacrifice a copy of your favourite Super FX game and replace the ROM. The old engine only uses 32K of RAM, so no need to download more RAM.
The new engine uses 64K so far and might end up using 128K.
KungFuFurby wrote:Sometimes I wonder if under certain circumstances (especially if movement is small), it would be speedier to determine which pixels have actually changed and simply modify those instead? Just thinking of theory, that's all.
With coverage buffers you could clear just the areas that were drawn last frame. But I'm not convinced that would be faster than a `store word` loop in my case, since the planar format forces me to use the slightly slower PLOT instruction to clear spans, and scenes fill most of the screen anyway. Any ideas?
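A cheap stand-in for a full coverage buffer would be to record just the leftmost and rightmost pixel drawn on each row, then clear only that extent next frame. This is a hypothetical sketch on a linear byte buffer, not the planar PLOT-based clear discussed above. It also over-clears any gap between spans on the same row.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FB_W 224
#define FB_H 192

/* Per-row extent of everything drawn this frame; max_x < min_x means untouched. */
typedef struct {
    int16_t min_x[FB_H];
    int16_t max_x[FB_H];
} RowExtents;

static void extents_reset(RowExtents *e) {
    for (int y = 0; y < FB_H; y++) {
        e->min_x[y] = FB_W;
        e->max_x[y] = -1;
    }
}

/* Call for every span the rasterizer plots. */
static void extents_add_span(RowExtents *e, int y, int x0, int x1) {
    assert(y >= 0 && y < FB_H && x0 >= 0 && x0 <= x1 && x1 < FB_W);
    if (x0 < e->min_x[y]) e->min_x[y] = (int16_t)x0;
    if (x1 > e->max_x[y]) e->max_x[y] = (int16_t)x1;
}

/* Clears only the recorded extents; returns the number of bytes written. */
static long clear_dirty(uint8_t fb[FB_H][FB_W], const RowExtents *e, uint8_t bg) {
    long cleared = 0;
    for (int y = 0; y < FB_H; y++) {
        if (e->max_x[y] < e->min_x[y]) continue; /* nothing drawn on this row */
        int n = e->max_x[y] - e->min_x[y] + 1;
        memset(&fb[y][e->min_x[y]], bg, (size_t)n);
        cleared += n;
    }
    return cleared;
}
```

When scenes fill most of the screen, the extents approach full rows, so the savings over a plain clear shrink. That matches the doubt raised above.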
93143
Posts: 1715
Joined: Fri Jul 04, 2014 9:31 pm

Re: Fast 3D blitting on Super FX

Post by 93143 »

ARM9 wrote:That was actually an older engine. I wrote a better engine recently for a thing I'm working on
This model has around 300 polygons which isn't too bad, 20fps should be doable even with simple lighting.
Now that's what I'm talking about. What sort of differences are there between the old engine and the new one?

I don't mean to disparage the older demo, because it's still cool to see a hobbyist get a real 3D engine running on the Super FX. But the fact remains that if you ignore fill rate, 300 triangles at 15 fps (or, better yet, 20 fps with lighting) is just way more impressive than 18 triangles at 30 fps... Fill rate is kinda important, though...

Anybody have reasonably hard numbers (within an order of magnitude or so would be nice) on polygon count in Star Fox 2? Virtua Racing? The answers I get from the internet seem to have a log-random distribution, though I admit I find some numbers more plausible than others...
One problem with lighting at 4bpp is the palette limit, 15 colors just isn't cutting it for these scenes.
Yeah, that's why Star Fox limits itself to orange, blue, and gray. You can have five shades of each, plus dither.

I found this out when I was messing with the idea of a TIE Fighter port. With gray, blue, red, green, and yellow, engine glow cycling in three of those, and frame rate being very important for gameplay, the lighting in that game would have looked rough...

Stunt Race FX looks much more balanced, because it's 8bpp, which is probably what murdered the frame rate...
Just sacrifice a copy of your favourite superfx game and replace the rom.
I'm hoping an FPGA Super FX gets developed at some point. I'll probably end up ruining a GSU2 cartridge (or at least hiring somebody to do it for me) to make a devcart, so I can be sure my game is real, but making additional copies would get increasingly unjustifiable without a sustainable source of chips.
Drew Sebastino
Formerly Espozo
Posts: 3496
Joined: Mon Sep 15, 2014 4:35 pm
Location: Richmond, Virginia

Re: Fast 3D blitting on Super FX

Post by Drew Sebastino »

93143 wrote:Yeah, that's why Star Fox limits itself to orange, blue, and gray. You can have five shades of each, plus dither.
How feasible would it be to get 64 colors from a 4bpp layer plus a 2bpp layer with color math (possibly even just varying shades of gray with color subtraction)? I think that would be a good compromise between color count and performance, but I don't know whether having two buffers like that would actually hurt performance, even if it reduces the amount of data you have to transfer to VRAM.
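Here is a rough C sketch of the combining step. It assumes the 4bpp layer is on the main screen, the 2bpp gray layer is on the subscreen, and color math is set to subtract without halving. Transparency and the backdrop/fixed color are ignored by treating subscreen index 0 as black. Colors are SNES BGR555, with red in bits 0-4. The sketch only shows which 16x4 colors come out. Whether the second buffer costs more GSU time than it saves in DMA is not modeled.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t bgr555;

static bgr555 rgb(int r, int g, int b) {
    assert(r >= 0 && r < 32 && g >= 0 && g < 32 && b >= 0 && b < 32);
    return (bgr555)(r | (g << 5) | (b << 10));
}

/* Per-channel subtraction clamped at 0, like color math subtract mode without halving. */
static bgr555 color_sub(bgr555 m, bgr555 s) {
    bgr555 out = 0;
    for (int shift = 0; shift <= 10; shift += 5) {
        int c = ((m >> shift) & 31) - ((s >> shift) & 31);
        if (c < 0) c = 0;
        out |= (bgr555)(c << shift);
    }
    return out;
}

/* All 16x4 on-screen colors from a 4bpp main palette and a 2bpp subscreen palette. */
static void build_combined(const bgr555 main_pal[16], const bgr555 sub_pal[4],
                           bgr555 out[16][4]) {
    for (int i = 0; i < 16; i++)
        for (int j = 0; j < 4; j++)
            out[i][j] = color_sub(main_pal[i], sub_pal[j]);
}
```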
Stef
Posts: 263
Joined: Mon Jul 01, 2013 11:25 am

Re: Fast 3D blitting on Super FX

Post by Stef »

ARM9 wrote:
  • Some numbers, 21mhz 224x192 2bpp @ 60fps ntsc (assuming all loops are cached):
  • You have about 257,500 cycles to work with each frame on the superfx.
  • Plotting 8 pixels without crossing a pixel cache (pcache) boundary (when x wraps to 0 in `x mod 8`) takes 10 cycles (16 with LOOP;PLOT sequence used for span fill).
  • A pcache miss stalls for 10 cycles.
  • Clearing the framebuffer takes roughly 53,760 cycles.
  • Filling the entire screen using LOOP;PLOT sequence takes about 86,000 cycles in the best case (plot x from 0-223 on each line).
Cool to have these numbers; I can compare them to the numbers I have for my MD bitmap rendering code.
I don't understand why you said you only have 257,500 cycles to work with each frame. Shouldn't it be 21 MHz / 60 ≈ 350,000 cycles? Since you have double buffering, I thought you could use the SFX while the framebuffer is being transferred?

In my case I'm working with a 256x160, 4bpp software bitmap buffer (which needs bitmap-to-tile conversion).
I have slightly fewer pixels than you (~41,000 vs. ~43,000), but I use 4bpp, which I think is the minimum and can work well with a good palette (as in Star Fox). I guess for 4bpp you'd need to divide your numbers by 2 (not sure about the PLOT instruction?).

Anyway, here are the raw numbers for the MD:
  • You have about 127,800 cycles per frame at 60 FPS, but realistically, on NTSC systems, we can assume 20 FPS is the maximum possible with a software bitmap transferred to VRAM.
    --> ~383,500 cycles per frame @ 20 FPS NTSC
  • Clearing the framebuffer takes about 43,000 cycles.
  • Transferring (and converting) the framebuffer to VRAM takes about 123,000 cycles.
  • Plotting 8 pixels takes 12 cycles.
  • A minimum of 200-250 cycles per polygon scanline (filling the span edges and handling the line loop).
Add to that:
  • 3D transformation:
    --> can transform and project about 10,000 vertices/second = ~500 vertices per frame @ 20 FPS.
    Since you also need to do the actual rendering, you realistically can't reach that number, of course ;)
  • Polygon sorting and BSP handling for correct draw order
  • Clipping (can consume a lot of multiplication / division operations)
  • All the game logic and other stuff
So if you want to hold 20 FPS, you only have 383,500 - (43,000 + 123,000) ≈ 217,500 cycles per frame to work with.
These 217,500 cycles have to cover:
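That budget is simple arithmetic. The sketch below reproduces it, assuming the NTSC Mega Drive 68000 clock of 53.693175 MHz / 7 and a flat 60 fps (real NTSC runs slightly under 60 Hz):

```c
#include <assert.h>

/* NTSC Mega Drive 68000 clock: 53.693175 MHz master clock / 7. */
#define MD_68K_HZ 7670453L

static long cycles_per_frame(long cpu_hz, int fps) {
    assert(fps > 0);
    return cpu_hz / fps;
}

/* Cycles left for transform, sorting, clipping, rasterizing and game logic. */
static long cycles_left(long cpu_hz, int fps, long clear_cycles, long transfer_cycles) {
    return cycles_per_frame(cpu_hz, fps) - clear_cycles - transfer_cycles;
}
```

This gives 127,840 cycles per frame at 60 FPS and 383,522 at 20 FPS. After the clear and transfer, 217,522 cycles remain, matching the rounded figures.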
- 3D transformation
- polygon sorting
- polygon clipping
- polygon rendering
- game logic and other stuff

That's definitely not a lot to work with... Of course you can accept frame drops, but I think it starts to hurt below 10 FPS.
Last edited by Stef on Mon Mar 20, 2017 9:58 am, edited 1 time in total.
93143
Posts: 1715
Joined: Fri Jul 04, 2014 9:31 pm

Re: Fast 3D blitting on Super FX

Post by 93143 »

Stef wrote:I don't understand why you said you have only 257,500 cycles to work with on each frame ? shouldn't be 21Mhz / 60 ~350,000 cycles ? As you have double buffering i though you could use the SFX while the frame buffer is being transferred ?
No. If that's what you meant specifically, I misled you.

It takes multiple frames to transfer the data in most cases, and you can have the Super FX working on the next frame before the current one is done transferring. But that only refers to the time between the actual DMA transfers, which is most of the frame, but not all of it.

There's only one RAM pool, and while the SNES is accessing it, the Super FX can't. 224x192 at 2bpp and 60 fps is about 65 lines' worth of DMA per frame, or about a quarter of a frame, and the GSU can't touch the framebuffer during that time.
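Here is a back-of-the-envelope check of the 65-line figure. It assumes general-purpose DMA moves one byte per 8 master clocks, an NTSC line is 1,364 master clocks with about 40 lost to DRAM refresh, and a frame has 262 lines. Per-transfer setup overhead and the restriction of VRAM DMA to vblank/forced blank are ignored.

```c
#include <assert.h>

#define MASTER_CLOCKS_PER_LINE 1364 /* NTSC scanline */
#define REFRESH_CLOCKS_PER_LINE 40  /* DRAM refresh steals this much each line */
#define MASTER_CLOCKS_PER_DMA_BYTE 8
#define LINES_PER_FRAME 262         /* NTSC */

static long framebuffer_bytes(int w, int h, int bpp) {
    return (long)w * h * bpp / 8;
}

/* Scanlines of back-to-back DMA needed to move `bytes`, rounded up. */
static long dma_lines(long bytes) {
    long clocks_needed = bytes * MASTER_CLOCKS_PER_DMA_BYTE;
    long usable_per_line = MASTER_CLOCKS_PER_LINE - REFRESH_CLOCKS_PER_LINE;
    return (clocks_needed + usable_per_line - 1) / usable_per_line;
}

static long percent_of_frame(long lines) {
    assert(lines >= 0);
    return lines * 100 / LINES_PER_FRAME;
}
```

A 224x192 2bpp buffer is 10,752 bytes. That comes out to 65 lines, about 24% of a 262-line frame.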
Last edited by 93143 on Mon Mar 20, 2017 10:03 am, edited 1 time in total.
Stef
Posts: 263
Joined: Mon Jul 01, 2013 11:25 am

Re: Fast 3D blitting on Super FX

Post by Stef »

Thanks for the clarification. I thought the SFX RAM could be split into two 32 KB banks, so one bank could be on the SFX side while the other was on the S-CPU side (like the Word RAM on the Sega CD, which is really convenient).
So the 257,500 figure accounts for the cycles eaten by the DMA transfer, I guess.
Oziphantom
Posts: 1565
Joined: Tue Feb 07, 2017 2:03 am

Re: Fast 3D blitting on Super FX

Post by Oziphantom »

Why do you clear the framebuffer? If you are rendering over the whole screen anyway, there's no point wasting cycles clearing it. Or, if you are only rendering part of the screen, why not add a skybox plane to cover the rest? That gets you a bunch of clocks back.

Can you use a Super FX with an SA-1? If so, you could use the SNES CPU for AI/game logic and get the SA-1 to do clipping, draw order, and backface culling of the triangles, then pump them to the Super FX for rendering. That might give a hefty boost.
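For the backface-culling step, the usual screen-space test uses the sign of the projected triangle's doubled signed area (a 2D cross product). The sketch below assumes front faces come out with positive area in screen coordinates. That depends on the model's winding convention, so flip the comparison if yours is the opposite.

```c
#include <assert.h>

typedef struct { int x, y; } ScreenPt;

/* Twice the signed area of the projected triangle (2D cross product of two edges). */
static long signed_area2(ScreenPt a, ScreenPt b, ScreenPt c) {
    return (long)(b.x - a.x) * (c.y - a.y) - (long)(c.x - a.x) * (b.y - a.y);
}

/* Front-facing under the assumed winding; back faces and edge-on triangles are culled. */
static int is_front_facing(ScreenPt a, ScreenPt b, ScreenPt c) {
    return signed_area2(a, b, c) > 0;
}
```

The test needs two multiplies per triangle and rejects about half of a closed mesh before any pixels are drawn.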
Drew Sebastino
Formerly Espozo
Posts: 3496
Joined: Mon Sep 15, 2014 4:35 pm
Location: Richmond, Virginia

Re: Fast 3D blitting on Super FX

Post by Drew Sebastino »

Given how they're hooked up to the cartridge bus, I doubt you could do it. Plus, in real life, the power draw would probably be too much for the SNES.
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Fast 3D blitting on Super FX

Post by tepples »

If you're willing to hook up two coprocessors, why not just use an ARM SoC? There's precedent for using older ARM coprocessors in shogi games, and there are modern ones that sip well under a watt of power.

See Atmel microchips with an ARM core