SNES Doom development, Randy Linden


coto
Posts: 102
Joined: Wed Mar 06, 2019 6:00 pm
Location: Chile

SNES Doom development, Randy Linden

Post by coto »

https://www.youtube.com/watch?v=P5PknJvplKg

Randy Linden interview about the SNES Doom port and the SuperFX / GSU co-processor.

In any case, it's fascinating stuff.
93143
Posts: 1717
Joined: Fri Jul 04, 2014 9:31 pm

Re: SNES Doom development, Randy Linden

Post by 93143 »

Hopefully more fascinating stuff to come: https://github.com/RandalLinden/DOOM-FX

Not much there yet but the GPL...

I kinda want to take a look at the wall rendering code. Rendering vertical lines one by one (as I believe the original Doom did) is pretty much the most pessimum way to use the Super FX, and horizontal doubling is simultaneously much better and almost as bad.
User avatar
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: SNES Doom development, Randy Linden

Post by tokumaru »

One thing that fascinates me in SNES Doom is how buggy the horizontal alignment of wall textures is. If you look straight into a wall and then turn left and right you can see its texture "sliding" sideways relative to the surface of the wall. This is very easy to observe in narrow wall sections, like the sides of a window... but it's everywhere, really.

I wonder what the game is doing to determine which texture columns to use, and why that changes depending on the angle the player is facing. Was that a mistake or a deliberate decision made for performance reasons? It would be cool if I could figure that out from looking at the source code.

I know absolutely nothing about Super FX programming though, so I probably won't be able to make any sense out of the code, much less understand the algorithms being used! :lol:
coto
Posts: 102
Joined: Wed Mar 06, 2019 6:00 pm
Location: Chile

Re: SNES Doom development, Randy Linden

Post by coto »

tokumaru wrote: Sat May 23, 2020 1:59 am I wonder what the game is doing to determine which texture columns to use, and why that changes depending on the angle the player is facing. Was that a mistake or a deliberate decision made for performance reasons? It would be cool if I could figure that out from looking at the source code.
I think that's the binary space partitioning (BSP) technique used by John Carmack (it sorts map data by physical X,Y position).

Also, the video explains that SNES Doom uses "paletted dithering", which more or less emulates lighting over a group of tiles, and that technique predates the lighting calculations seen in Quake (later absorbed into the OpenGL 1.0+ spec).

https://www.youtube.com/watch?v=P5PknJvplKg&t=27m10s
93143 wrote: Sat May 23, 2020 12:05 am I kinda want to take a look at the wall rendering code. Rendering vertical lines one by one (as I believe the original Doom did) is pretty much the most pessimum way to use the Super FX, and horizontal doubling is simultaneously much better and almost as bad.
The GSU chip has per-dot drawing opcodes. IIRC that is Argonaut's side of the implementation. Since the original Doom engine predates orthogonal matrix transformation technology, I can only guess he rendered the scenes in slices (as Randy explained somewhere in the video) and that reduced the overhead if, say, DMA was used to update the slices. So I guess that wasn't astoundingly demanding for the GSU.
User avatar
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: SNES Doom development, Randy Linden

Post by tokumaru »

coto wrote: Sat May 23, 2020 9:24 amI think that's the binary space partitioning (or BSP) feature, used by John Carmack (sorts data depending on a physical map X,Y position)
AFAIK the binary space partitioning simply helps with determining which sectors are visible from any given point in the map. Also, if this was a side effect of the BSP, this glitch would be present in other Doom versions, and I've never noticed it anywhere else. Randy Linden apparently didn't even use any of the original Doom engine, so the tricks used by John Carmack probably had little to no impact on the SNES version.
Also, in the video it is explained Snes Doom uses "palleted dithering" which more or less emulate lighting given a scope of tiles
I'm pretty sure he's talking about COLORMAPs, which are basically distance-based lookup tables of virtual color palettes sorted by brightness, built from the hardware color palette. This doesn't affect how textures are positioned, just what palettes they use.
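A toy sketch of that mechanism (in Python, with a made-up grayscale palette and distance scaling; real Doom uses 32 brightness levels of 256 entries each):

```python
# Toy sketch of Doom-style COLORMAPs, not anything from the actual port.
# Each light level remaps every palette index to the nearest palette
# color after scaling its brightness down.

PALETTE = [(r, r, r) for r in range(0, 256, 16)]   # 16 hypothetical hardware colors
NUM_LEVELS = 4                                     # real Doom uses 32

def build_colormaps(palette, levels):
    maps = []
    for level in range(levels):
        scale = (levels - level) / levels          # level 0 = full brightness
        remap = []
        for (r, g, b) in palette:
            target = (r * scale, g * scale, b * scale)
            # nearest-color search within the fixed hardware palette
            best = min(range(len(palette)),
                       key=lambda i: sum((c1 - c2) ** 2
                                         for c1, c2 in zip(palette[i], target)))
            remap.append(best)
        maps.append(remap)
    return maps

colormaps = build_colormaps(PALETTE, NUM_LEVELS)

def shade(texel_index, distance):
    """Pick a colormap by distance, then remap the texel's palette index."""
    level = min(NUM_LEVELS - 1, distance // 256)   # hypothetical distance scaling
    return colormaps[level][texel_index]
```

The key point is that the lookup only changes which palette index a texel ends up with; the texel coordinates themselves never pass through it.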

IIRC, the Game Engine Black Book: Doom says that the game renders walls by projecting the two ends of each segment and interpolating the height and the texture. The use of fixed-point math could cause errors in the interpolation process, especially if the SNES version has less fractional precision than the original, but I don't know if that could cause the huge amounts of texture shifting the SNES version has. To me it almost looks like the game is using the wrong values in trigonometric calculations (e.g. using a fisheye-corrected distance as the hypotenuse when calculating the sides of a triangle instead of the real distance, or something like that) and getting the wrong results because of that.
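To illustrate the fixed-point angle of this, here's a toy comparison (hypothetical step value and bit widths, not SNES Doom's actual numbers) showing how fewer fractional bits in the texture U step make the sampled column drift:

```python
# Illustration only: step a texture U coordinate across a wall span at
# 16 fractional bits vs. 8, and count where the sampled columns differ.

def texture_columns(u_start, u_step, count, frac_bits):
    """Advance a fixed-point U coordinate and return the texel column
    (integer part, wrapped to a 64-texel-wide texture) at each screen x."""
    scale = 1 << frac_bits
    u = int(u_start * scale)
    step = int(u_step * scale)      # precision is lost right here
    cols = []
    for _ in range(count):
        cols.append((u >> frac_bits) & 63)
        u += step
    return cols

hi = texture_columns(0.0, 0.7371, 200, 16)   # plenty of precision
lo = texture_columns(0.0, 0.7371, 200, 8)    # coarse precision

# number of screen columns where the two versions sample different texels
drift = sum(1 for a, b in zip(hi, lo) if a != b)
```

The truncation error in the step accumulates across the span, so the misalignment grows with wall width; that alone wouldn't explain textures sliding as you turn, though, which is why the wrong-input theory seems more likely to me.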
User avatar
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: SNES Doom development, Randy Linden

Post by tokumaru »

Here's what I mean about the texturing issue in SNES Doom:

doom-texture.png

Textures that are properly aligned start to shift and loop around as you turn. This isn't always noticeable, since textures in Doom are pretty repetitive, but it's weird, happens everywhere, and I'd like to know what causes it.
93143
Posts: 1717
Joined: Fri Jul 04, 2014 9:31 pm

Re: SNES Doom development, Randy Linden

Post by 93143 »

coto wrote: Sat May 23, 2020 9:24 am The GSU chip has per dot drawing opcodes.
Yes, and the way it works is each pixel plotted goes into a primary pixel cache representing a horizontal sliver of 8 pixels from a particular tile in the framebuffer, with flag bits denoting plotted pixels. Once it's full, or you plot to coordinates outside of the sliver it represents, it transfers the data to a secondary cache, which then blits the result into the framebuffer while you're drawing new pixels to the primary cache.

Blitting pixels to the framebuffer always requires accessing all 8 pixels in a sliver, because the framebuffer is in SNES CHR format, and at 21.4 MHz both the ROM and RAM buses (according to higan, anyway) run at 5 cycles per byte.

Now, I think (I hope, and I should double-check this) that if all 8 pixels in the cache are freshly plotted, the GSU doesn't bother reading the sliver from RAM, but simply writes it. Obviously if you've only plotted one pixel it can't get away with this, so it needs to read the whole sliver, blit your one pixel into it, and then write the updated sliver back.

The upshot is that continuous vertical runs of 8-bit pixels will bottleneck at 80 cycles per pixel instead of the nominal 1 cycle for the plot opcode, and twinning them (as SNES Doom does) will still bottleneck at 40. A horizontal run, by contrast, will bottleneck at 10 cycles per pixel, or (hopefully) 5 if all eight pixels in a sliver are solid.
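Putting those numbers in one place (this is a model of the assumptions above, not measured hardware: 8bpp planar CHR means one 8-pixel sliver is 8 bytes, and each RAM byte costs 5 cycles):

```python
# Back-of-envelope model of the cycle figures in this post.

BYTES_PER_SLIVER = 8      # 8 pixels x 8 bitplanes / 8 bits per byte
CYCLES_PER_BYTE = 5       # ROM/RAM bus speed at 21.4 MHz (per higan)

def cycles_per_pixel(pixels_plotted, write_only=False):
    """RAM cost of flushing one sliver, amortized over the pixels
    actually plotted into it before the flush.  A partial sliver needs
    a read-modify-write; a fully fresh one (hopefully) only a write."""
    bytes_moved = BYTES_PER_SLIVER if write_only else 2 * BYTES_PER_SLIVER
    return bytes_moved * CYCLES_PER_BYTE / pixels_plotted

vertical_run   = cycles_per_pixel(1)                   # lone pixel per sliver
doubled_run    = cycles_per_pixel(2)                   # doubled vertical pixels
horizontal_run = cycles_per_pixel(8)                   # full sliver, read-modify-write
solid_row      = cycles_per_pixel(8, write_only=True)  # full fresh sliver
```

That reproduces the 80 / 40 / 10 / 5 cycles-per-pixel figures above.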

I'm wondering whether the SNES Doom engine contains some sort of efficiency multiplier, perhaps an attempt to draw multiple columns at the same time. It would be complicated to say the least, but the improvement in fill rate could be substantial.
coto
Posts: 102
Joined: Wed Mar 06, 2019 6:00 pm
Location: Chile

Re: SNES Doom development, Randy Linden

Post by coto »

tokumaru wrote: Sat May 23, 2020 12:17 pm AFAIK the binary space partitioning simply helps with determining which sectors are visible from any given point in the map. Also, if this was a side effect of the BSP, this glitch would be present in other Doom versions, and I've never noticed it anywhere else. Randy Linden apparently didn't even use any of the original Doom engine, so the tricks used by John Carmack probably had little to no impact on the SNES version.
I'm not saying that glitch is part of the BSP set, but rather describing how the BSP format is used to draw maps, and with it, the "paletted dithering" effect. But I can see you referred to an issue with texture mapping.

Also, I'm guessing the BSP files would still need to be converted; otherwise the levels would have had to be rewritten from scratch, and I doubt that.
Maybe he used some derivative of BSP? I have no idea
93143 wrote: Sat May 23, 2020 3:52 pm The upshot is that continuous vertical runs of 8-bit pixels will bottleneck at 80 cycles per pixel instead of the nominal 1 cycle for the plot opcode, and twinning them (as SNES Doom does) will still bottleneck at 40. A horizontal run, by contrast, will bottleneck at 10 cycles per pixel, or (hopefully) 5 if all eight pixels in a sliver are solid.

I'm wondering whether the SNES Doom engine contains some sort of efficiency multiplier, perhaps an attempt to draw multiple columns at the same time. It would be complicated to say the least, but the improvement in fill rate could be substantial.
Yeah I understand.

Well GSU is blit-based drawing, the framebuffer then translates to planar format. Randy may have rewritten the SNES Doom engine, but the tools that convert the assets from the original Doom source generate maps as 64x64-pixel bitmaps rather than the native SNES PPU planar format, thus a native format for the GSU. That means PLOT opcodes just read the colors out of it. It isn't that expensive to draw rectangles, or even to texture-map pixel by pixel, given that the buffer fits the cache.

Also, I'm guessing the double-buffering he mentions considers the worst hypothetical case of 80 cycles, because 2 out of 5 slices (3 on screen), regardless if a GSU opcode forwards a pixel to the RAM or Pixel cache to RAM, abuses prefetch pixel opcodes out of ROM, while swapping the slices between the cached RAM or the RAM itself on the same pipeline. This means if you manage to create a multitasking mode having some kind of front buffer / back buffer, you can trash the caches a lot by reading an entire slice, while minimizing the bottleneck from the GSU, I can imagine at worst forcing the SRAM into a lot of non-sequential accesses. Which would not be that bad at writing multiple columns (8 pixels at a time), as opposed, if say, you were drawing a whole screen at the same time.

Code: Select all

192 / 3 = 64 vertical pixels per slice * 3 * 80 = ~15360 cycles to draw the entire screen of which 1/17 of these would go to cache trashing = ~14457 cycles per frame (in total, regardless of the raster position)
93143
Posts: 1717
Joined: Fri Jul 04, 2014 9:31 pm

Re: SNES Doom development, Randy Linden

Post by 93143 »

coto wrote: Sat May 23, 2020 5:37 pm Well GSU is blit-based drawing, the framebuffer then translates to planar format.
I was (mis?)using the term "blit" to mean a bitwise combination of the new and old graphics data, without prejudice as to format. The framebuffer doesn't "translate" anything; it's just data in RAM, and it is in planar format.

The SA-1 uses a bytemap framebuffer format, and features a special DMA mode or two that can convert it to planar on the fly during the transfer to VRAM. The Super FX doesn't do that; its output to the framebuffer is already planar because the PLOT circuitry does the conversion.

This means that if you want to change just one pixel, you have to read and then write eight pixels. Very inefficient. The use of the pixel caches allows the Super FX to speed up row drawing, since it doesn't have to do the RAM operation until all 8 pixels are full, but column drawing is still terrible.
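A quick sketch of why that is: in 8bpp planar CHR, every pixel of an 8-pixel row is smeared across all eight bitplane bytes (the byte ordering in actual VRAM is interleaved differently, so this only shows the bit spreading, not the exact layout):

```python
# Why one plotted pixel touches a whole sliver: bit n of every pixel in
# an 8-pixel row lands in the byte for bitplane n, so no single byte
# holds a single pixel.

def pixels_to_planar(pixels):
    """Pack eight 8-bit pixels into eight bitplane bytes
    (leftmost pixel in the most significant bit)."""
    assert len(pixels) == 8
    planes = []
    for plane in range(8):
        byte = 0
        for x, p in enumerate(pixels):
            byte |= ((p >> plane) & 1) << (7 - x)
        planes.append(byte)
    return planes

def planar_to_pixels(planes):
    """Inverse transform: recover the eight pixels from the bitplanes."""
    return [sum(((planes[plane] >> (7 - x)) & 1) << plane for plane in range(8))
            for x in range(8)]

row = [0, 255, 1, 2, 4, 8, 16, 128]
assert planar_to_pixels(pixels_to_planar(row)) == row   # round trip holds
```

Changing just `row[3]` changes up to all eight planar bytes, which is exactly why a lone plotted pixel forces the full read-modify-write.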
Also, I'm guessing the double-buffering he mentions considers the worst hypothetical case of 80 cycles, because 2 out of 5 slices (3 on screen), regardless if a GSU opcode forwards a pixel to the RAM or Pixel cache to RAM, abuses prefetch pixel opcodes out of ROM, while swapping the slices between the cached RAM or the RAM itself on the same pipeline. This means if you manage to create a multitasking mode having some kind of front buffer / back buffer, you can trash the caches a lot by reading an entire slice, while minimizing the bottleneck from the GSU, I can imagine at worst forcing the SRAM into a lot of non-sequential accesses. Which would not be that bad at writing multiple columns (8 pixels at a time), as opposed, if say, you were drawing a whole screen at the same time.
I'm pretty sure the 5/3 buffering he's talking about is in SNES VRAM. It saves space versus full double buffering, thereby allowing the gun and status bar graphics to also fit.

The GSU actually doesn't have the ability to multibuffer on its own (screen base register is S-CPU access only), it has no data cache, and the pixel caches are tiny (9 bytes = one 8bpp sliver plus bit pend flags) and fully automated. I don't see how what you're saying makes any sense. How familiar are you with the Super FX?

Code: Select all

192 / 5 = 38 vertical pixels per slice * 3 * 80 = ~9120 cycles to draw the entire screen of which 1/17 of these would go to cache trashing = ~8594 cycles per frame
The playfield in Doom is 144 pixels high. Dividing that into 3 vertically gives you 48 lines. Each line consists of 216 pixels, or 108 doubled pixels. If each doubled pixel takes 80 cycles to draw (assuming every single pixel in this buffer is part of a vertical wall, with no hidden pixels drawn), that's 414,720 cycles, or about 1.2 frames, or about 1.6 times the amount of frame time left after 1/3 of the buffer has been transferred to VRAM via DMA (you can't draw during this operation).

In other words, filling the whole playfield with vertical lines of doubled pixels would take more than four frames by itself. With single pixels (which apparently the game can be built to do), it would be more like eight. If the wall drawing process could be sped up, it could materially impact the frame rate of the game. And I'm sure Randy figured that out, so I'm hoping to find out if any practical solution occurred to him early enough in development to actually end up in the finished product.
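For reference, the arithmetic above spelled out (assuming a 21.4 MHz GSU clock and a ~60 Hz NTSC frame rate; this figure deliberately ignores the DMA lockout, which is what pushes the full playfield past four frames):

```python
# Checking the per-slice estimate: 48 lines x 108 doubled pixels x 80
# cycles, against one NTSC frame of GSU time.

GSU_HZ = 21.4e6
FRAME_HZ = 60.0
CYCLES_PER_FRAME = GSU_HZ / FRAME_HZ          # roughly 357k cycles

lines_per_slice = 144 // 3                    # 48
doubled_pixels_per_line = 216 // 2            # 108
cycles_per_doubled_pixel = 80                 # worst-case vertical pairs

slice_cycles = lines_per_slice * doubled_pixels_per_line * cycles_per_doubled_pixel
frames_per_slice = slice_cycles / CYCLES_PER_FRAME
playfield_frames = 3 * frames_per_slice       # pure drawing, no DMA lockout
```

That gives 414,720 cycles per slice and about 1.16 frames, matching the "about 1.2 frames" figure; three slices of pure drawing land near 3.5 frames before the VRAM transfer time is counted.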
coto
Posts: 102
Joined: Wed Mar 06, 2019 6:00 pm
Location: Chile

Re: SNES Doom development, Randy Linden

Post by coto »

Most of the GSU-related stuff I know I've read from online resources; I've also reverse-engineered a bit of it and looked at source code (be it Verilog or C). Of course I'm no expert at it, but I do understand quite a bit of it, enough to make an implementation and to point out what does what.
93143 wrote: Sat May 23, 2020 10:05 pm How familiar are you with the Super FX?
If you're such an expert on the matter, where are the GSU programs showing the information you've posted so far, then? You seem to be an expert, so I expect proof of that. I mean GSU programs you have written yourself. When that happens, I will gladly concede that my sources are incorrect.

If, on the other hand, you have no proof beyond online emulator source code, refrain from posting estimates, because Randy Linden covers most of the stuff I'm explaining here in the video.
User avatar
Dwedit
Posts: 4924
Joined: Fri Nov 19, 2004 7:35 pm
Contact:

Re: SNES Doom development, Randy Linden

Post by Dwedit »

Could the Super FX do graphics compatible with Mode 7? Maybe it would have been possible to rotate the graphics 90 degrees, so you generate horizontal lines and display them as vertical.
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
93143
Posts: 1717
Joined: Fri Jul 04, 2014 9:31 pm

Re: SNES Doom development, Randy Linden

Post by 93143 »

Not in hardware. The whole PLOT arrangement was designed exclusively for CHR bitplane graphics.

And even if it hadn't been, Mode 7 is a tiled bytemap format, so there would be no access penalty for any particular drawing pattern - just write the byte (or not) and you're done.

You could draw in Mode 7 in software. You'd have to do the address calculations yourself, and branch for transparency, but you might still come out ahead - I haven't tried to code a Mode 7 renderer for the Super FX, but it works pretty well on the S-CPU, as evidenced by Wolfenstein 3D...

...

The real problem with Mode 7 is that you can only have 256 unique tiles, and you can't change where the tile pool is in VRAM.

Doom's 216x144 playfield would actually work with double-wide pixels. But they'd have to all be double-wide, so you'd lose the dither on the floors and ceilings, and you wouldn't be able to use normal-width pixels regardless of how much faster the renderer turned out to be. (Unless you wanted to try to squeeze everything into 11 colours).

Furthermore, since with a 176-line active area it takes at least two VBlanks to fully update the playfield, you would get tearing because you can't double buffer in Mode 7 if you're using virtually the entire tileset. This may be one reason why Wolfenstein 3D used a 224x160 playfield doubled in both axes (112x80 before scaling), because with a 192-line active area you can update the whole thing in one frame.
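A rough bandwidth check on those claims (assumed figures, not measured: DMA moves one byte per 8 master cycles, 1364 master cycles per scanline, 262 NTSC scanlines, and every blanked line fully usable for DMA; real overhead would shave a little off):

```python
# How many VBlanks to push a Mode 7 bytemap playfield into VRAM,
# under the assumptions stated above.

MASTER_CYCLES_PER_LINE = 1364
CYCLES_PER_DMA_BYTE = 8
BYTES_PER_LINE = MASTER_CYCLES_PER_LINE // CYCLES_PER_DMA_BYTE   # ~170

def vblanks_needed(playfield_bytes, active_lines, total_lines=262):
    """Ceiling of playfield size over the per-frame blanking budget."""
    blank_lines = total_lines - active_lines
    budget = blank_lines * BYTES_PER_LINE
    return -(-playfield_bytes // budget)      # ceil division

doom_mode7 = 108 * 144   # 216x144 playfield, double-wide pixels = 1 byte each
wolf_mode7 = 112 * 80    # Wolfenstein's 112x80 pre-scaling buffer
```

With a 176-line active area Doom's buffer needs two VBlanks, while Wolfenstein's smaller buffer fits in the longer blank of a 192-line display, consistent with the reasoning above.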

Writing to VRAM with HDMA (assuming it works universally; current results look good) would give you enough extra bandwidth to do 108x144 in one frame if you were careful not to overwrite the data at the wrong time. But then you might have to draw the gun with the Super FX, because Mode 7 only has the one BG layer (EXTBG doesn't count), and force blanking during HBlank stomps on sprite tile fetches. How high up the screen can the gun get? There might actually be enough room, if you built the status bar out of BG layers only...

EDIT: note that since GPRAM is inaccessible for DMA while the GSU is drawing, any use of HDMA to transfer framebuffer data to VRAM would have to be preceded by a regular DMA of the extra data from GPRAM to WRAM. This would be a complicated way to get extra bandwidth, to be sure...
Last edited by 93143 on Thu May 28, 2020 2:29 am, edited 1 time in total.
93143
Posts: 1717
Joined: Fri Jul 04, 2014 9:31 pm

Re: SNES Doom development, Randy Linden

Post by 93143 »

I re-read coto's earlier post, and it gave me an idea.

If it turns out that the wall drawing code is in fact heavily bottlenecked by pixel cache stall (as I suspect), and drawing four or more wall columns at a time is too complicated to fit in the register set (as I suspect), it might actually be reasonable to use a separate area of the GSU's RAM as an intermediate bytemap buffer. Drawing to it would probably be bottlenecked by code, since RAM writes allow execution to proceed in parallel, and while reading RAM does stall the core, it's not for very long compared to what happens when you catch the secondary pixel cache flatfooted. So the wall renderer could draw a column 8 pixels wide in linear bitmap format (making sure to align the coordinates being plotted with the tile structure of the main framebuffer), and then read it back and use plot to get it into CHR format.
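Here's roughly the shape of that two-stage idea as a sketch (my reading of it, not anything from the actual source; `plot_sliver` stands in for eight PLOT ops landing in the same pixel-cache sliver):

```python
# Sketch: draw wall columns into a small linear bytemap first, then
# sweep it row by row so each 8-pixel sliver is filled in one go
# instead of one flush per plotted pixel.

TILE_W = 8

def draw_columns_via_buffer(columns, plot_sliver):
    """columns: eight equal-height pixel columns aligned to one tile
    column of the framebuffer.  plot_sliver(y, eight_pixels) models a
    full fresh sliver flush."""
    assert len(columns) == TILE_W
    height = len(columns[0])
    # stage 1: gather into a row-major bytemap (cheap linear RAM writes)
    buffer = [[col[y] for col in columns] for y in range(height)]
    # stage 2: replay horizontally, one sliver flush per 8 pixels
    for y, row in enumerate(buffer):
        plot_sliver(y, row)

flushes = []
cols = [[c] * 16 for c in range(8)]        # eight 16-pixel-tall columns
draw_columns_via_buffer(cols, lambda y, row: flushes.append(row))
# 16 full-sliver flushes instead of 128 single-pixel read-modify-writes
```

The win, if it exists, comes from stage 2 hitting the write-only fast path for every sliver; whether the stage 1 bookkeeping eats the savings is exactly the kind of thing the source should settle.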

I wonder if something like that was what he was trying to say...
93143
Posts: 1717
Joined: Fri Jul 04, 2014 9:31 pm

Re: SNES Doom development, Randy Linden

Post by 93143 »

93143 wrote: Sat May 23, 2020 10:05 pm I'm pretty sure the 5/3 buffering he's talking about is in SNES VRAM. It saves space versus full double buffering, thereby allowing the gun and status bar graphics to also fit.
Now that I think about it, the cartridge Doom used reportedly only had 64 KB of GPRAM. That being the case, game state could easily have used enough of that to prevent full double buffering there too. This meshes slightly better with how he describes it in the video as well.

I'm guessing it was both. There's too much data in VRAM to fit two whole frames worth of tiles in there, but 5/3 buffering on the GSU side also seems to make sense.

I suppose if I was really gung-ho about this I could take a look in a debugger and try to figure it out, but I'd rather wait for the source...
coto
Posts: 102
Joined: Wed Mar 06, 2019 6:00 pm
Location: Chile

Re: SNES Doom development, Randy Linden

Post by coto »

Yes, I'm also looking forward to the source code. Maybe some cool hardware trick, such as hooking the GSU to a low-level emulator?
The ability to upload payloads and/or debug registers, and the I/O map as seen by the GSU.

Undoubtedly the GSU should receive the same treatment as the Z80: an open-source C compiler, real-time reverse engineering of payloads through Ghidra, maybe some 3D modeling tools... I mean, that would be awesome.