psycopathicteen wrote: I wonder if the Super FX chip really is better than the SA-1 for this game. The SA-1 can do character conversion DMA.
It can also execute at nearly full speed from ROM, and has a real divider (MMIO, but it's much better than the S-CPU's divider, and it beats the Super FX's divide-by-two instruction and cumbersome lookup table access). It also supports 8 MB of ROM, vs. the Super FX's 2 MB (yeah, you can add CPU ROM, but that's only useful for stuff the Super FX doesn't need to know about).
It shouldn't be hard to have the S-CPU run mostly in WRAM and only touch BW-RAM (and possibly ROM) for DMA. This gives you 8 master cycles per double pixel, leaving 36 CPU cycles to prepare the data and load the opcode(s) for the write(s) before you reach the Super FX bottleneck. Mind you, this only applies to walls; the Super FX is probably unbeatable at filling floors and ceilings and even drawing backdrops, and objects/enemies can probably be done line by line too.
If the mosaic trick could somehow be got working, it would double the Super FX's speed ceiling for doubled pixel columns, whereas the SA-1 would only see a small benefit...
ap9 wrote: Possibly. But SA-1 is expensive, and few emulators support it.
What? Every emulator supports the SA-1. It's not accurate, but it's there. And as for it being expensive, every game that used it appears to have used it primarily as copy protection, or possibly as a
Are you thinking of the ST-018?
ap9 wrote: I remember a while back stepping through the Mac version of Doom in a debugger to discover a horizontal technique to render more than one pixel at a time by storing two 16 bit portions in a 32 bit register, for the low-res. mode. This method, of course, is not conceivable on a 16 bit system.
That sounds like what the Super FX does for its automatic overhead-free checkerboard dither. I've abused the dither feature to get two pixels out of a single ROM read when drawing unscaled 2D graphics. Of course, since the COLOR register is 8-bit, dither only works for 4bpp or lower...
Quote: DOOM uses a lot of 32 bit math (esp. fixed point math); a proper port of Doom to SNES would be pretty slow in software because of the 16 bit limit.
How necessary is that precision, though? If you're using a 32-bit CPU, it would seem natural to use 32-bit operations.
On the SNES, we typically find that position can be expressed as a 24-bit fixed-point number (or a 32-bit number, for data integrity reasons) while velocity is fine with 16 bits. This sort of mixed-headroom math can speed things up considerably versus simply doing everything at 32-bit. It's also basically impossible in C...
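To make the mixed-headroom idea concrete, here is a minimal Python model of the arithmetic only (not SNES code). The 16.8 position and signed 8.8 velocity layouts, and the names `step_position` and `sign_extend_16`, are assumptions for illustration; the post only says 24-bit position and 16-bit velocity.

```python
POS_BITS = 24                      # assumed 16.8 layout: 16 integer bits, 8 fractional
POS_MASK = (1 << POS_BITS) - 1


def sign_extend_16(v):
    """Interpret a 16-bit word as a signed 8.8 fixed-point value."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def step_position(pos_24, vel_16):
    """Add a signed 16-bit velocity to a 24-bit position, wrapping at 24 bits."""
    return (pos_24 + sign_extend_16(vel_16)) & POS_MASK


pos = 0x012380                     # 0x0123 + 128/256 map units
pos = step_position(pos, 0x0180)   # velocity of +1.5 units per frame
print(hex(pos))
```

On a 16-bit CPU this would presumably come down to one 16-bit add on the low word plus a carry or borrow into the top byte, which is where the saving over a full 32-bit add comes from.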
Quote: The engine Williams Entertainment used was maintained mostly in C, so there was plenty of room for performance improvements.
You sure about that? I was pretty sure no C compiler existed for the Super FX. Leaving aside the fact that it was something of a niche market, the chip is weird enough that it might actually be worse for C than the 6502.
I can see the S-CPU code being C, but the GSU code would probably have to be assembly.
ap9 wrote: Possibly. But SA-1 is expensive, and few emulators support it.
93143 wrote: What? Every emulator supports the SA-1. It's not accurate, but it's there.
At the time I was reading up on more exotic implementations for expansion chips (e.g., non-PC emulators). I should've used the word 'fewer' in regards to the general scope… yes, all popular SNES emulators support SA-1. However, 'few' would still apply to completeness and accuracy; bsnes is the first (and still relatively new) to use low-level reverse engineering for expansion chips, whereas most emulators cannot be expected to even work with lesser known operations. SuperFX is better known.
Quote: On the SNES, we typically find that position can be expressed as a 24-bit fixed-point number (or a 32-bit number, for data integrity reasons) while velocity is fine with 16 bits. This sort of mixed-headroom math can speed things up considerably versus simply doing everything at 32-bit. It's also basically impossible in C...
While precision can be reduced without breaking the game, it wouldn't nearly be an authentic port without all the fixed-point and trig operations a game like Doom uses. Of course, some operations may only need 24 bits or less, and rendering can be scaled down considering a smaller window.
The SA-1 could certainly help as a math co-processor if nothing else.
Quote: I can see the S-CPU code being C, but the GSU code would probably have to be assembly.
Yes, I was referring to the S-CPU side. One of the leading people who worked on SNES Doom said the main level engine was maintained in C, and if optimized in assembly could have increased overall performance by maybe 30%.
ap9 wrote: most emulators cannot be expected to even work with lesser known operations.
Okay, that's fair. I've got to say, though, I can't bring myself to care much, particularly since AFAIK nobody is planning to actually do anything about this. The scenario is either "how could SNES Doom have been better" or "how could a better SNES Doom hypothetically be made?".
Even if we were to actually attempt this, it would be hard to convince me that ZSNES sucking is a good reason to use an inferior chip (if the Super FX really is inferior for this application). If the SA-1 were actually too expensive in early 1996, that would be something to consider, since really the only reason to do this is to prove that it could have been done, but if a game works on real hardware it's not the game's fault if it doesn't work in an emulator.
Quote: While precision can be reduced without breaking the game, it wouldn't nearly be an authentic port without all the fixed-point and trig operations a game like Doom uses.
Sure, but are you really going to notice the difference between 1/256 and 1/65536 positional precision? Just the fact that the frame time is a multiple of 1/60 s instead of 1/70 s would make a bigger difference, I'd expect. And the trig would all be precomputed and tabulated.
How big is "one" in 16.16 on the actual map? What's the fastest any object can go?
Quote: The SA-1 could certainly help as a math co-processor if nothing else.
No, it would have to be used to do the rendering as well. I'm pretty sure it draws too much current to pair with the Super FX, even if the memory arrangement were compatible, and the S-CPU is too weak to render Doom at a decent speed and resolution.
Quote: One of the leading people who worked on SNES Doom said the main level engine was maintained in C, and if optimized in assembly could have increased overall performance by maybe 30%.
Wow. The S-CPU side was the bottleneck, then?
One of the neat things about the Super FX (that Nintendo offered but nobody ever used) was the concept of CPU ROM - you can add up to 6 MB of extra ROM to the cartridge in parallel with the GSU. The S-CPU can then access that ROM freely without getting in the GSU's way or having to jam everything into WRAM. And since it's in the top half of the memory map, you can use FastROM, which is normally impossible with a Super FX. Combine this with what you've just said, and that's possibly a ~50% speed boost without touching the Super FX code at all (unless that 30% is just how much faster it would be if the GSU were the bottleneck).
Not to mention (as I already have) the space that would be freed up in the main Super FX ROM, which seems to have been a limiting factor.
I wonder why they didn't allow circle-strafing? Doesn't seem like it'd be that hard to do...
The backdrops might be doable with the second background layer, though the colour depth would suffer. But it might not make much difference - clearing pixels on the Super FX takes nearly as long as drawing an unscaled backdrop (if the backdrop is stored correctly). And while it might be neat to use full resolution, it might contrast oddly with the low-detail 3D and have players reaching for their reading glasses...
I'll try the mosaic trick again when I get time. It doesn't seem right that it should be impossible to pull off. (Speaking of things emulators don't support properly...)
This also allows you to draw two double-wide pixels per cache flush rather than just one when drawing vertical wall strips on a GSU, which doubles your theoretical pixel throughput. It's more complicated, but it could be worth it.
The problem is that mosaic on the SNES behaves really weird; I tried to make this work one evening when I had a few minutes, and I only managed to make it work on Snes9X, not on real hardware. Furthermore, every emulator behaved differently and higan accuracy was wrong.
The other problem is that some of the floors/ceilings do use dither for shading, and this scheme would break that. The different floor levels mean you couldn't just use a second layer with HDMA - or at least, if you did it wouldn't look the same. Maybe some sort of hybrid method... But it might still be worth it if the frame rate improved enough (which might also require an overhaul of the SNES-side engine and possibly an augmented cartridge design, judging by what ap9 is saying).
(By the way, that pretty much has to be manual dither. Automatic dither only works in 2bpp and 4bpp. But there's not much in the way of savings to be gained here unless they wrote really bad code, since dumping a full 8bpp pixel buffer to GPRAM takes longer than filling it with manually-alternated colours.)
I'm also starting to wonder if it's possible to just draw two vertical strips at once (assuming the original doesn't already). I know 512 bytes isn't a lot of code, but since the RAM buffer is such a massive bottleneck I imagine it might be possible to write a sufficiently general method without slowing things down. (I'd suggest doing four strips at a time, but I get the feeling it would be too branchy and wouldn't fit in the cache; also, from experience, 15 registers isn't as many as it sounds like.) This would allow you to use a normal framebuffer, permitting dithering on the floors (if it's still necessary to have blank floors with the additional ROM space) and perhaps even drawing distant enemies in full resolution (I've been playing the game a bit and it's shockingly hard to see what's going on).
93143 wrote: dithering on the floors (if it's still necessary to have blank floors with the additional ROM space)
Lessee...
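Code:
    with R7
    add Ry      ; R7 += Ry (step one texture coordinate)
    with R8
    add Rx      ; R8 += Rx (step the other texture coordinate)
    getc        ; load the colour from the ROM buffer
    to R14
    merge       ; R14 = high byte of R7 : high byte of R8 (next ROM address)
    plot
    loop        ; decrement R12, branch to R13 while nonzero
    plot        ; delay slot: second plot of the same colour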
But that only works if the texture is 256x256, which might be a tad excessive. How about this?
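Code:
    with R7
    add Ry      ; R7 += Ry (step one texture coordinate)
    with R8
    add Rx      ; R8 += Rx (step the other texture coordinate)
    getc        ; load the colour from the ROM buffer
    to R14
    merge       ; R14 = high byte of R7 : high byte of R8
    with R14
    and Rmask   ; wrap the address for textures smaller than 256x256
    plot
    loop        ; decrement R12, branch to R13 while nonzero
    plot        ; delay slot: second plot of the same colour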
The texture format implied by this procedure is kinda fragmented, with a horizontal run every 256 bytes and free space in between. The wall drawing routine probably has more time for fancy stuff, so adding an offset to wall texture coordinates is probably fine.
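A rough Python model of the addressing may help show where the fragmentation comes from. It assumes MERGE combines the high byte of R7 with the high byte of R8, and it assumes a hypothetical 64x64 texture with a mask of 0x3F3F; the post does not say what Rmask holds, so that value and the name `texel_address` are illustrative only. The ROM bank is ignored.

```python
def merge(r7, r8):
    """Model of GSU MERGE: high byte of R7 on top, high byte of R8 below."""
    return (r7 & 0xFF00) | ((r8 >> 8) & 0x00FF)


def texel_address(r7, r8, mask=0xFFFF):
    """Address written to R14 after the optional AND with Rmask."""
    return merge(r7, r8) & mask


MASK_64 = 0x3F3F  # assumed mask for a 64x64 texture (hypothetical value)

# Walk along texture row 2 with an 8.8 coordinate in R8.
row = [texel_address(0x0200, x << 8, MASK_64) for x in range(66)]
print([hex(a) for a in row[:3]], hex(row[63]), hex(row[64]))
```

Under those assumptions each row occupies 64 consecutive bytes starting at a multiple of 256, and the other 192 bytes of every 256-byte page are free for something else, which is the run-every-256-bytes layout described above.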
If you wanted to do distance gradations in the texture colours, you might have to add an offset to the colour before loading it, adding another 20% - unless you could somehow arrange the palette so that just subpalettizing with CMODE bit 3 would do the trick. Wall textures already have lighting and distance gradation effects, but then the walls probably have a ton of extra time to fiddle with the colour indices... Overall the master palette is much more of a garbled mess than the one in the PC version. Possibly this is partly to fit stuff like the gun and hand graphics into 4bpp subpalettes, though that doesn't explain the mess they've made of the bottom half... wait. There are duplicate colours in the SNES palette. Adjacent in gradient runs, no less. Maybe the palette could indeed be optimized better... Megamanning the gun and hand graphics and using 8bpp for Doomguy's face might eliminate some of the clutter too... You know what? I bet they used CMODE bit 3 for the enemies and objects. That could explain the whole mess.
Yeah, this one's a maybe for now. I wonder how much non-GSU data is actually in that ROM?