SuperRT - realtime raytracing expansion chip

Discussion of hardware and software development for Super NES and Super Famicom.

ShironekoBen
Posts: 6
Joined: Mon Dec 14, 2020 5:05 am

SuperRT - realtime raytracing expansion chip

Post by ShironekoBen » Tue Dec 15, 2020 7:38 am

(this is my first post, so apologies in advance and please do just let me know if I've inadvertently broken any forum rules!)

I'm something of a fan of doing silly things with retro hardware, and the SNES holds a very special place in my heart. So whilst casting around for ideas for a project to learn FPGA design with, I had the idea of building a custom expansion chip... and a bit over a year later, I've finally got some results that feel like they might be worth sharing.

Here's my test code running on a SNES with the chip:
(apologies for the poor screenshot quality - I don't have a good way to capture from my SNES at the moment)

[Screenshot: the raytraced demo scene running on a real SNES]

Basically, it's a chip design implemented using a Cyclone V FPGA interfaced to the SNES cartridge bus. It gets sent a command list from the SNES containing a description of the scene to render, and then generates a framebuffer in PPU-compatible format that gets DMAed into VRAM (shoutouts to everyone in this thread for inspiration there!).

It supports spheres, planes and convex hulls (composed of multiple planes) as rendering primitives, with CSG supported to build more complex shapes and hierarchical culling via AABBs. The raytracing itself handles one reflection ray per pixel and one shadow-casting directional light. I've tried to at least loosely stick to the concept of "this is something that could potentially have been built in the 90s (if you didn't care too much about cartridge cost!)", so it doesn't use any external processing hardware aside from the FPGA itself.
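
To give a flavour of the maths involved, here's the simplest piece - the ray-sphere test - sketched out in C. The actual chip does all of this in fixed point inside the FPGA, so treat this purely as an illustration:

Code:
#include <math.h>
#include <stdbool.h>

/* Ray-sphere intersection, the simplest of the three primitive types.
   ro = ray origin, rd = unit ray direction; returns the nearest hit
   distance in front of the origin via *t. */
bool ray_sphere(const float ro[3], const float rd[3],
                const float c[3], float r, float *t)
{
    float oc[3] = { ro[0]-c[0], ro[1]-c[1], ro[2]-c[2] };
    float b  = oc[0]*rd[0] + oc[1]*rd[1] + oc[2]*rd[2];
    float cc = oc[0]*oc[0] + oc[1]*oc[1] + oc[2]*oc[2] - r*r;
    float disc = b*b - cc;
    if (disc < 0.0f) return false;            /* ray misses the sphere */
    float s = sqrtf(disc);
    *t = (-b - s > 0.0f) ? -b - s : -b + s;   /* nearest point in front */
    return *t > 0.0f;
}

Planes and hulls yield similar entry/exit distances along the ray, which is what makes combining them with CSG tractable.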

The demo scene runs at about 20FPS, although the system as a whole can manage 30FPS under the right conditions. (The demo scene is actually more like 25FPS normally, but I had to add a somewhat unwanted VSync wait to the SNES-side code to work around a curious bug with interrupts during the command upload, which dragged the rate down.)

If anyone is interested, I've posted some footage of it running and a bit more technical detail here:

Short trailer (Youtube)
Walkthrough and technical details (Youtube)
Technical details (text)

Please let me know if you have any questions/thoughts/comments/etc!

lidnariq
Posts: 10265
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: SuperRT - realtime raytracing expansion chip

Post by lidnariq » Tue Dec 15, 2020 1:43 pm

It feels delightfully like an SGI demo from 1992 :)

(Yeah, I know SGI was doing rasterization, not raytracing)

Nikku4211
Posts: 381
Joined: Sun Dec 15, 2019 1:28 pm
Location: Bronx, New York

Re: SuperRT - realtime raytracing expansion chip

Post by Nikku4211 » Tue Dec 15, 2020 4:01 pm

Finally, a thread in the SNESDev subforum that's actually interesting for once.

Something like this gives me plenty of questions, so sorry if I appear annoying.

Is it possible to use this chipset as a standard rasteriser? If so, how much faster would it be at rasterisation than the Super FX 2? I have heard that raytracing is much slower than rasterisation.

Is it possible to make a watered down version that's compatible with the non-Pro SD2SNES'/non-Pro FXPak's FPGA?

Is it possible to change calculations to take into account the extra stretch that the SNES does to the internal screen? It seems like each pixel on the SNES in 256x224 mode is pretty wide, with the original 8:7 internal aspect ratio being stretched to 64:49, making your spheres look less like spheres.
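
(To be clear about what I mean: correcting for it on the renderer's side would just be widening the horizontal field of view when generating rays. A hypothetical C sketch, nothing to do with how your chip actually builds its rays:)

Code:
#include <math.h>

/* Widen the horizontal FOV by the displayed aspect (64:49, if each pixel
   really is 8:7 wide) so spheres come out round on a real TV. */
#define DISPLAY_ASPECT (64.0f / 49.0f)

void pixel_to_ray_dir(int px, int py, int w, int h, float dir[3])
{
    dir[0] = ((px + 0.5f) / w * 2.0f - 1.0f) * DISPLAY_ASPECT;
    dir[1] = 1.0f - (py + 0.5f) / h * 2.0f;
    dir[2] = 1.0f;                            /* simple pinhole camera */
    float inv = 1.0f / sqrtf(dir[0]*dir[0] + dir[1]*dir[1] + dir[2]*dir[2]);
    dir[0] *= inv; dir[1] *= inv; dir[2] *= inv;
}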

If you chose 200x160 as a compromise between image size and v-blank time, why not choose a resolution that uses all 256 horizontal pixels but a smaller vertical resolution? It does give you a letterbox, but hey, at least there's no pillarbox. A smaller vertical resolution like 256x120 also gives you even more v-blank time, and since it uses the full horizontal resolution you can zoom in on a modern widescreen HDTV without missing anything, since it's letterboxed anyway. Unless you're also using HDMA and you need more h-blank time too.

What kind of dithering does your chipset do? What do you think about making it column-based? Column-based dithering would take more advantage of how some video cables, like composite, have blurry horizontal resolution, while probably not being as noticeable as checkerboard dithering.
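
(Roughly what I mean, reduced to a hypothetical 1-bit example - your chip's actual dither is surely different:)

Code:
#include <stdint.h>

/* Column dither: the threshold depends only on x, so the pattern forms
   vertical stripes that composite video smears together. */
uint8_t dither_column(int x, int y, float v /* 0..1 */)
{
    (void)y;
    float threshold = (x & 1) ? 0.75f : 0.25f;
    return v > threshold ? 1 : 0;
}

/* Checkerboard dither uses both coordinates, so the pattern alternates
   vertically as well and is more visible. */
uint8_t dither_checker(int x, int y, float v)
{
    float threshold = ((x + y) & 1) ? 0.75f : 0.25f;
    return v > threshold ? 1 : 0;
}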

This demo is running in mode 3, right? 200x160 requires 500 unique tiles, after all, so it can't be mode 7. If your chipset converted to mode 7's chunky pixel format rather than mode 3's planar format, rendered at 128x80 (which needs only 160 unique tiles), and the tilemap were scaled up to 256x160 using mode 7's innate scaling abilities, would the framerate be any better, given that there are only a quarter as many pixels to render and chunky 8BPP is generally faster to render to?
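
(For reference, here's why chunky is generally cheaper to render into, sketched in C under the standard VRAM tile layouts:)

Code:
#include <stdint.h>

/* Mode 3 (planar): an 8x8 8bpp tile is 64 bytes, with bitplane pairs
   interleaved per row - planes 0/1 in bytes 0..15, planes 2/3 in 16..31,
   and so on. Storing one row of chunky pixels takes this bit-shuffle: */
void pack_row_mode3(const uint8_t px[8], int row, uint8_t tile[64])
{
    for (int plane = 0; plane < 8; plane++) {
        uint8_t bits = 0;
        for (int x = 0; x < 8; x++)
            bits |= (uint8_t)(((px[x] >> plane) & 1) << (7 - x));
        tile[(plane >> 1) * 16 + row * 2 + (plane & 1)] = bits;
    }
}

/* Mode 7 (chunky): the pixel is just one byte (the high byte of the VRAM
   word; the low byte holds the tilemap), so a "conversion" is a plain
   store: */
void store_px_mode7(uint8_t tile[64], int x, int row, uint8_t colour)
{
    tile[row * 8 + x] = colour;
}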

Would using 256 colours internally instead of 16 million also be faster for your chipset, if that's possible?

I wonder what a demo with an external ROM would be like. Would you be able to do actual textures in it? If so, how would it compare to how the Super FX 2 does textures? Is it possible to do affine texture mapping with raytracing? If not, is it possible to combine rasterised textures with raytraced CSGs?

How fast is your chipset's SNES multiply unit compared to the SNES' own multiplication? I heard that you can only do native SNES multiplication during v-blank. Is that true only when you're using mode 7, or is that true for every video mode?

Thanks for reading this post.
I have an ASD, so empathy is not natural for me. If I hurt you, I apologise.

93143
Posts: 1314
Joined: Fri Jul 04, 2014 9:31 pm

Re: SuperRT - realtime raytracing expansion chip

Post by 93143 » Tue Dec 15, 2020 5:51 pm

EDIT: Never mind; apparently there's some misinformation on the internet regarding what FPGA the FXPak Pro uses...

Nikku4211 wrote:
Tue Dec 15, 2020 4:01 pm
I heard that you can only do native SNES multiplication during v-blank. Is that true only when you're using mode 7, or is that true for every video mode?
The PPU multiplier (16x8 signed, results ready immediately) should not be used during Mode 7 rendering. The most obvious reason is that the input registers are also Mode 7 matrix registers, but there may be other disruptive effects (restoring the registers before the next HBlank might work, but I haven't tried it and it might not. Others may know more).

The S-CPU has an 8x8 unsigned multiplier that takes 8 cycles to get a complete result*, and you can use it any time regardless of BGMODE. Same with the divider (16/8 unsigned, 16 cycles to result). This is because these slower math functions are part of the CPU and are not shared by any other system function.

* If you only ever multiply small numbers, you can read the result sooner, as demonstrated by (IIRC) Taz-Mania. Since we now know it uses the Booth algorithm, it may be possible to do this deliberately in homebrew, but it's still easier to wait.
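
In register terms the sequence looks something like this (a C-flavoured sketch using the documented addresses; the memory-mapped-pointer style is purely for illustration):

Code:
#include <stdint.h>

#define REG(a) (*(volatile uint8_t *)(uintptr_t)(a))

/* PPU multiplier: 16-bit signed x 8-bit signed, result readable right
   away. The inputs double as the Mode 7 matrix registers M7A/M7B, hence
   the caveat above. */
int32_t ppu_mul(int16_t a, int8_t b)
{
    REG(0x211B) = (uint8_t)a;            /* M7A low byte */
    REG(0x211B) = (uint8_t)(a >> 8);     /* M7A high byte */
    REG(0x211C) = (uint8_t)b;            /* M7B - latches the multiply */
    int32_t r = (int32_t)REG(0x2134)          /* MPYL */
              | ((int32_t)REG(0x2135) << 8)   /* MPYM */
              | ((int32_t)REG(0x2136) << 16); /* MPYH */
    if (r & 0x800000) r -= 0x1000000;    /* sign-extend the 24-bit result */
    return r;
}

/* CPU multiplier: 8x8 unsigned, full result after ~8 CPU cycles, usable
   in any BGMODE because it isn't shared with anything else. */
uint16_t cpu_mul(uint8_t a, uint8_t b)
{
    REG(0x4202) = a;                     /* WRMPYA */
    REG(0x4203) = b;                     /* WRMPYB - starts the multiply */
    /* real 65816 code would burn ~8 cycles here before reading */
    return (uint16_t)(REG(0x4216) | (REG(0x4217) << 8)); /* RDMPYL/RDMPYH */
}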

Nikku4211
Posts: 381
Joined: Sun Dec 15, 2019 1:28 pm
Location: Bronx, New York

Re: SuperRT - realtime raytracing expansion chip

Post by Nikku4211 » Tue Dec 15, 2020 9:30 pm

93143 wrote:
Tue Dec 15, 2020 5:51 pm
The PPU multiplier (16x8 signed, results ready immediately) should not be used during Mode 7 rendering. The most obvious reason is that the input registers are also Mode 7 matrix registers, but there may be other disruptive effects (restoring the registers before the next HBlank might work, but I haven't tried it and it might not. Others may know more).
So if you're using a video mode that's not 7, you can use the PPU multiplier at any time?
I have an ASD, so empathy is not natural for me. If I hurt you, I apologise.

93143
Posts: 1314
Joined: Fri Jul 04, 2014 9:31 pm

Re: SuperRT - realtime raytracing expansion chip

Post by 93143 » Tue Dec 15, 2020 10:18 pm

Yes. Mode 7 is the only restriction.

The S-PPU's multiplication capability exists solely because it's needed for Mode 7. It isn't used for any other rendering tasks, so a lot of the time it's sitting idle. I guess someone at Nintendo figured the idle time was free real estate, and added MMIO access for the CPU...
Last edited by 93143 on Tue Dec 15, 2020 10:38 pm, edited 1 time in total.

aa-dav
Posts: 154
Joined: Tue Apr 14, 2020 9:45 pm
Location: Russia

Re: SuperRT - realtime raytracing expansion chip

Post by aa-dav » Tue Dec 15, 2020 10:38 pm

Great job!
I want to repost one of the comments under the Ars Technica article ( https://arstechnica.com/gaming/2020/12/ ... 1&unread=1 ):
guai2k
One small curiosity: the photo caption cites Carter as a "Japanese engineer," while his own website describes him only as "based in Japan." If the guy is actually Japanese, I'd be curious to hear the story of how he ended up with the rather un-Japanese name "Ben Carter."
Looking at your nick here, I'd guess your real name is a Japanese one. So I'll join in that "one small curiosity". :)

93143
Posts: 1314
Joined: Fri Jul 04, 2014 9:31 pm

Re: SuperRT - realtime raytracing expansion chip

Post by 93143 » Tue Dec 15, 2020 10:41 pm

Did you watch the video? British accent, "SNEZZZZZ", named Ben Carter... I'd say he's probably a gaijin, and I suspect I could make a fair guess at his actual national origin. "Shironeko" isn't a Japanese name AFAIK; it just means "white cat", and Shironeko Labs appears to be the name of his consulting business.
Last edited by 93143 on Wed Dec 16, 2020 1:57 am, edited 2 times in total.

aa-dav
Posts: 154
Joined: Tue Apr 14, 2020 9:45 pm
Location: Russia

Re: SuperRT - realtime raytracing expansion chip

Post by aa-dav » Tue Dec 15, 2020 10:45 pm

93143 wrote:
Tue Dec 15, 2020 10:41 pm
Did you watch the video? British accent, "SNEZZZZZ", named Ben Carter... I'd say he's probably a gaijin. "Shironeko" isn't a Japanese name AFAIK; it just means "white cat".
My first language isn't English and I don't know Japanese at all, so I had no chance of recognising these things. :)

Gilbert
Posts: 451
Joined: Sun Dec 12, 2010 10:27 pm
Location: Hong Kong

Re: SuperRT - realtime raytracing expansion chip

Post by Gilbert » Wed Dec 16, 2020 1:27 am

93143 wrote:
Tue Dec 15, 2020 10:41 pm
I'd say he's probably a gaijin
Also, reading his article on the technical specs should be a giveaway:
The Super Nintendo (technically a Super Famicom) seen here has had the case removed to make room for the cabling, but other than that is totally unmodified. Attached to it is the PCB from a copy of an awful Pachinko game I picked up for 100 yen at a local second-hand store, with the game ROM removed and replaced with a cable breakout.
1. He's actually doing the tests on a Super Famicom but intentionally calls it a SNES. If he were Japanese, he'd just write Super Famicom or SFC.
2. He spent yen to buy a cart from a local store, so he does live in Japan at the moment.

rainwarrior
Posts: 8006
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: SuperRT - realtime raytracing expansion chip

Post by rainwarrior » Wed Dec 16, 2020 1:42 am

Interesting!

93143
Posts: 1314
Joined: Fri Jul 04, 2014 9:31 pm

Re: SuperRT - realtime raytracing expansion chip

Post by 93143 » Wed Dec 16, 2020 1:57 am

ShironekoBen wrote:
Tue Dec 15, 2020 7:38 am
Please let me know if you have any questions/thoughts/comments/etc!
I wonder if it's possible to extend this with some stuff that might make it more generally applicable.

First, would it be possible to support texture mapping? Reflectivity mapping? Normal mapping or some form of bump mapping? (Getting greedy here... but perhaps normal mapping could be a way to integrate sprites for high-detail actors without an unreasonable primitive count? It might be hard to get shadowing to work properly with a normal-mapped sprite...)

It appears to me that objects can have inherent non-black colours when shadowed. This could presumably be sufficient to create the impression of diffuse lighting, at least as far as being able to see things at all in a shadowed area. Actual diffuse lighting is probably too expensive, but what about multiple light sources? Local (inverse-square), coloured light sources? Could a light source be limited to a defined volume, to avoid the cost of running several sources for the whole scene? (This could also allow flashlights and such without having to physically obstruct the source in order to form the beam... uh, how hard would it be to add cones to the primitive list?) As I understand your scheme, right now it would break on something as simple as a tunnel in a racing game.
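
(For the volume-limited source, I'm imagining something along these lines - a sketch in C, with everything about how the chip would actually evaluate lights being my own assumption:)

Code:
#include <math.h>

/* Inverse-square point light with a hard influence radius: a hit point
   outside the radius skips the light (and its shadow ray) entirely, so
   most of the scene pays nothing for it. In practice you'd want to fade
   to zero near the boundary to hide the cutoff. */
float local_light_intensity(const float hit[3], const float lpos[3],
                            float power, float radius)
{
    float d[3] = { hit[0]-lpos[0], hit[1]-lpos[1], hit[2]-lpos[2] };
    float r2 = d[0]*d[0] + d[1]*d[1] + d[2]*d[2];
    if (r2 > radius * radius) return 0.0f;   /* outside the light's volume */
    return power / (r2 + 1e-4f);             /* inverse-square falloff */
}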

...

How about the ability to include partly transparent objects, both scattering/absorptive and additive? With variable intensity over the volume of the primitive, to produce realistic glow effects or fog/smoke?

An additive glow object seems like it would be easier than a scattering object because it wouldn't have to be illuminated by light sources itself; it would only adjust an observer ray passing through it. Unfortunately it would look fake if it didn't also function as a light source, which goes back to the multiple light sources question (raising the compute requirements quickly). But even without extra light sources, you could do stuff like lasers and engine glow on a starfighter, as long as it's sunny enough that you don't miss the extra illumination.

A fog/smoke object would be easy enough as far as the observer ray is concerned, since you could simply compute the length of the traversal and the distance from the centroid to get the opacity. Similarly, light source attenuation (soft shadowing) could be accomplished this way. Perhaps this would be easier for a sphere, so maybe limiting fog objects to spheres (or ellipsoids?) would be wise. Or a 2D primitive perpendicular to the ray could be used, with either an opacity map (to add detail) or a simple relationship between opacity and nearest approach to centroid. Lots of fog objects would still get expensive, I suppose... I imagine you'd want to cull light source rays based on solid occlusion before trying to apply fog... and what happens if the fog primitive intersects a solid primitive? That eliminates some easy shortcuts for the traversal length calculation...
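
(The traversal-length idea in concrete form for a spherical blob - a C sketch that ignores the ray starting inside the fog:)

Code:
#include <math.h>

/* Opacity of a spherical fog blob along a unit-direction ray, from the
   chord length through the sphere. A nearer solid hit would need to clamp
   the chord, which is exactly the intersection problem mentioned above. */
float fog_opacity(const float ro[3], const float rd[3],
                  const float c[3], float r, float density)
{
    float oc[3] = { c[0]-ro[0], c[1]-ro[1], c[2]-ro[2] };
    float t  = oc[0]*rd[0] + oc[1]*rd[1] + oc[2]*rd[2];       /* closest approach */
    float d2 = oc[0]*oc[0] + oc[1]*oc[1] + oc[2]*oc[2] - t*t; /* dist^2 to centre */
    if (t < 0.0f || d2 >= r*r) return 0.0f;                   /* behind us, or miss */
    float chord = 2.0f * sqrtf(r*r - d2);                     /* traversal length */
    return 1.0f - expf(-density * chord);                     /* Beer-Lambert */
}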

It would be difficult to illuminate a fog object because the depth factor would seem to require multiple rays per pixel, killing the computational efficiency. Perhaps a 2D primitive facing the light source, with the result somehow mapped to a camera-facing 2D primitive? Maybe cache the illumination as a texture (1bpp? or bake in the opacity?) at a lower resolution (global grid? camera-relative?) and integrate with interpolation, applying a distance scaling factor for lights closer than infinity. But how would you handle solids intersecting the fog? Make a new texture for every primitive encountered, apply it behind the plane of maximal extent of said primitive, and attenuate the illumination based on the observer ray's traversal length? Getting ugly... Or maybe it would look okay with only one illumination check per fog object (so much for using a single huge ellipsoid to vfog a whole area)...

Are the primitives simple enough that a set of explicit geometric shadow projections (instead of low-res rasterized textures) could be built up in a stack, in such a way as to allow a relatively easy calculation of the integrated amount of illuminated fog on an observer ray, taking into account a simple analytic density distribution? With scattered light attenuated based on how much fog it has passed through if not shadowed? And what happens when this insanely complicated blob of fog gets in front of another one? Intersects another one? Maybe the cached texture idea was better...

I've got no idea what I'm on about, have I?

Surely N64-style global distance fogging would be easy? Simply adjust the colour of the ray based on how far it's travelled (at every bounce, so reflections would be correctly double-fogged)... maybe allow an adjustment for height or angle, or use some form of fast integral or lookup table to emulate atmospheric altitude effects or fog layers... (I'm a greedy bastard, and my games would run at 3 fps...)
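
(Per ray segment that might amount to no more than this - exponential falloff assumed; a lookup table would presumably do in hardware:)

Code:
#include <math.h>

/* Distance fog: blend toward the fog colour by distance travelled,
   applied per segment so reflections come out double-fogged. */
void apply_fog(float colour[3], const float fog_colour[3],
               float distance, float density)
{
    float f = 1.0f - expf(-density * distance);   /* 0 up close, ->1 far away */
    for (int i = 0; i < 3; i++)
        colour[i] += (fog_colour[i] - colour[i]) * f;
}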

...

Having variable frame size could be useful. The Super FX allowed multiple framebuffer sizes, and there was no requirement for the SNES to actually use the whole thing, or even for the Super FX to render to the whole thing.

If my calculations are correct, it should be possible to fractional-buffer 216x176 at 20 fps and still have a bit of VRAM left over for sprites. If you used HDMA for VRAM transfers, you could get 224x168 at 30 fps by expending all 8 HDMA channels (24 bytes per line) and restricting sprites to only 16 or so of the active scanlines. (VRAM HDMA requires you to force blank during the transfer, which kills sprites on the next line.) These numbers assume fully contiguous data, so you don't have to keep stopping to adjust the source address like you do on Super FX if you're using an image height it doesn't support...

...

It's a shame this project appears to be out of reach for the FXPak Pro (Cyclone IV if I'm not mistaken)...
Last edited by 93143 on Wed Dec 16, 2020 6:16 am, edited 10 times in total.

Nikku4211
Posts: 381
Joined: Sun Dec 15, 2019 1:28 pm
Location: Bronx, New York

Re: SuperRT - realtime raytracing expansion chip

Post by Nikku4211 » Wed Dec 16, 2020 3:44 am

guai2k wrote: One small curiosity: the photo caption cites Carter as a "Japanese engineer," while his own website describes him only as "based in Japan." If the guy is actually Japanese, I'd be curious to hear the story of how he ended up with the rather un-Japanese name "Ben Carter."
Oh my gosh...

When you naturalise as a Japanese citizen, you're required to renounce your citizenship in any other country, so as far as the present and the future are concerned, he's Japanese, full stop.

I've heard that actual Japanese names like 'Joji' can be transliterated into 'George', and that 'Ami' can be transliterated into 'Amy'.
93143 wrote:
Tue Dec 15, 2020 10:41 pm
Did you watch the video? British accent, "SNEZZZZZ", named Ben Carter... I'd say he's probably a gaijin, and I suspect I could make a fair guess at his actual national origin. "Shironeko" isn't a Japanese name AFAIK; it just means "white cat", and Shironeko Labs appears to be the name of his consulting business.
I match 2 of those traits and I'm from New York.

Imagine naming your kid 'Shironeko'. Only a weeb would do that, lol. It's like naming your kid 'Reddog' or 'Bluelizard'.

And lol, I wasn't able to tell if he sounded UK or Aussie.

If he did immigrate to Japan from somewhere, which is most likely to be honest, it makes me question why, but it's probably not my business, and it's probably none of yours either.
93143 wrote:
Wed Dec 16, 2020 1:57 am
If my calculations are correct, it should be possible to fractional-buffer 216x176 at 20 fps and still have a bit of VRAM left over for sprites. If you used HDMA for VRAM transfers, you could get 224x168 at 30 fps by expending all 8 HDMA channels (24 bytes per line) and restricting sprites to only 16 or so of the active scanlines. (VRAM HDMA requires you to force blank during the transfer, which kills sprites on the next line.) These numbers assume fully contiguous data, so you don't have to keep stopping to adjust the source address like you do on Super FX if you're using an image height it doesn't support...
Ooh, sprites, what are you going to do, make a HUD?

If you're talking about actually putting sprites in the 3D environment, good luck clipping them.

Unless you mean software sprites.
I have an ASD, so empathy is not natural for me. If I hurt you, I apologise.

93143
Posts: 1314
Joined: Fri Jul 04, 2014 9:31 pm

Re: SuperRT - realtime raytracing expansion chip

Post by 93143 » Wed Dec 16, 2020 4:07 am

Nikku4211 wrote:
Wed Dec 16, 2020 3:44 am
And lol, I wasn't able to tell if he sounded UK or Aussie.
On second listen... good point. I'm bad at accents. He's not from my home town, I can tell you that...
Ooh, sprites, what are you going to do, make a HUD?
Yeah. You wouldn't need much. Lives, camera status, ammo, radio messages, that sort of thing. 16 lines isn't really enough for a radar...

I suppose you could use a second BG layer, but you'd need to set aside some space for the tilemap. You've got several kilobytes left in the 224x168 30 fps case with VRAM HDMA, so that's actually pretty feasible. Bandwidth is really tight though; only a couple hundred bytes left once the palette is updated if you leave 16 lines free for sprites, or about a kilobyte if you don't...

The 216x176 20 fps case without VRAM HDMA is much tighter VRAM-wise (a couple of KB free), but HDMA is free so you can use elaborate windowing, and sprites will work on all scanlines; also there's more DMA bandwidth margin. Also you could restrict BG2 to a limited range of scanlines and turn it off outside that range, to avoid needing a large tilemap.

...

Of course, if we're trying to introduce multiple light sources and translucent particle effects with volumetric illumination, the lower the target frame rate the better. If Star Fox is any indication (or Ocarina of Time for that matter), 20 fps is plenty...

Bgonzo
Posts: 2
Joined: Wed Dec 16, 2020 8:52 am

Re: SuperRT - realtime raytracing expansion chip

Post by Bgonzo » Wed Dec 16, 2020 9:04 am

93143 wrote:
Wed Dec 16, 2020 1:57 am
It's a shame this project appears to be out of reach for the FXPak Pro (Cyclone IV if I'm not mistaken)...

I'm pretty sure the FXPak Pro is a Cyclone V, but I don't know enough about FPGAs to say whether it could be done. I see in his technical info that he's using a DE10-Nano with a Cyclone V, and he says he doesn't use the ARM cores, only the FPGA. So I dunno.
