Señor Ventura wrote: ↑
Mon Jul 27, 2020 2:56 am
How is it scaled, then, if it is using BG1 in Mode 3?
It's not scaled. Each pixel is actually drawn twice. The framebuffer is 216x144, and that's exactly what's displayed. There is also a 32-line-high status bar that isn't drawn by the Super FX, for a total of 216x176.
Each frame, including VBlank, is 262 scanlines high, so with 176 lines of active display there remain 86 lines during which DMA to VRAM can happen uninterrupted, resulting in a theoretical transfer size per frame of nearly 14 KB. Some of that will be taken up by overhead and additional tasks like sprite and palette updates. At 8bpp, the 216x144 framebuffer is a bit over 30 KB, so it takes a total of 3 VBlanks to load a frame into VRAM. Without getting fancy, that essentially results in a cap of 20 fps (but rendering the frame takes so long that this cap is never reached).
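The arithmetic behind that 20 fps cap can be sketched roughly as follows. This is only a sanity check, not anything taken from the game: it assumes an NTSC frame at a nominal 60 Hz and about 165 bytes of DMA per scanline (8 master cycles per byte out of 1364 per line, less DRAM refresh). That per-line figure and all the names below are assumptions made for illustration.

```python
import math

LINES_PER_FRAME = 262     # NTSC frame height, including VBlank
BYTES_PER_LINE = 165      # ASSUMED DMA throughput per scanline
FRAMES_PER_SECOND = 60    # nominal NTSC rate

def dma_budget(active_lines: int) -> int:
    """Bytes of VRAM DMA that fit in the lines not used for active display."""
    return (LINES_PER_FRAME - active_lines) * BYTES_PER_LINE

def fps_cap(fb_width: int, fb_height: int, active_lines: int) -> float:
    """Frame rate cap for an 8bpp framebuffer (1 byte per pixel), ignoring overhead."""
    fb_bytes = fb_width * fb_height
    vblanks = math.ceil(fb_bytes / dma_budget(active_lines))
    return FRAMES_PER_SECOND / vblanks

print(dma_budget(176))         # 14190 bytes, i.e. nearly 14 KB
print(216 * 144)               # 31104 bytes, a bit over 30 KB
print(fps_cap(216, 144, 176))  # 20.0
```

Sprite and palette updates eat into the budget, but with these numbers the frame still needs 3 VBlanks either way.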
93143 wrote: ↑
Sun Jul 26, 2020 10:58 pm
Also, the more scanlines are used for active display, the fewer are left for DMA to VRAM. Using 256x224 only barely works because of the mosaic trick and the VRAM HDMA trick, and the fact that the actual rendered area is only 192 lines high. The mosaic trick restricts it to double-wide pixels, and the VRAM HDMA trick requires the programmer to take special measures to avoid problems with sprites.
You can run a Mode 7 game at fullscreen because it only needs HDMA to operate
Blowing up a Mode 7 image to fullscreen does not require HDMA. It can be accomplished with a constant transform, which means you can set the matrix and forget about it.
Wolfenstein 3D did not use fullscreen. Its display area was 224x192, not 256x224. The framebuffer was 112x80 and was blown up to 224x160.
Doom has a higher resolution than Wolf3D. The display area is only 216x176, but the framebuffer is 216x144 with 2x1 pixels, as opposed to Wolf3D's 224x160 with 2x2 pixels. If this were attempted with Mode 7, it would require a 108x144 framebuffer stretched to 216x144, and since 108x144 is 243 tiles out of the 256 that Mode 7 allows, it would be impossible to display without tearing unless you used the VRAM HDMA trick. Mode 3 does not have this problem because it allows a pool of 1024 tiles, making it easier to double buffer large images.
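The tile counts in that comparison can be checked with a few lines. This is a minimal sketch: the function and constant names are made up for illustration, and it only counts tile indices. It does not check whether two full buffers would actually fit in VRAM alongside everything else.

```python
TILE_PIXELS = 8 * 8       # SNES background tiles are 8x8 pixels
MODE7_TILE_POOL = 256     # tiles addressable in Mode 7
MODE3_TILE_POOL = 1024    # tiles addressable in Mode 3

def tiles_needed(width: int, height: int) -> int:
    """Number of 8x8 tiles covered by a width x height framebuffer.

    108 is not a multiple of 8, so this counts by pixel area
    (13.5 x 18 tiles), which matches the 243 quoted above.
    """
    return (width * height) // TILE_PIXELS

mode7_tiles = tiles_needed(108, 144)   # half-width buffer, stretched by Mode 7
mode3_tiles = tiles_needed(216, 144)   # full-width buffer in Mode 3

print(mode7_tiles, 2 * mode7_tiles <= MODE7_TILE_POOL)   # 243 False
print(mode3_tiles, 2 * mode3_tiles <= MODE3_TILE_POOL)   # 486 True
```

One Mode 7 buffer nearly fills the pool, so there are no spare tile indices for a second buffer, whereas two Mode 3 buffers (972 tiles) still fit within the 1024-tile index range.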
Note that as far as I am aware, the VRAM HDMA trick was only demonstrated to work earlier this year, and would probably not have been used by any developer back in the '90s. It also causes the sprite layer to glitch out, so sprites have to be turned off in areas of the screen where VRAM HDMA is occurring. This is a significant problem for Mode 7, because you only get the one BG layer, so if you don't want to have to draw the gun with the Super FX, it has to be made entirely of sprites.
The mosaic trick is a method of stretching a non-Mode 7 framebuffer horizontally (kinda - the input format is a bit weird), so as to reduce by half the number of pixels that need to be drawn and transferred to VRAM. It's less edgy than the VRAM HDMA trick, but Randy didn't think of it back in the day, and I am aware of no one who did.
It turns out that if you combine the mosaic trick with the VRAM HDMA trick, you can just barely manage a 256x224 display with a 256x192 Mode 3 rendered window with 2x1 pixels, using a 128x192 framebuffer, at 20 fps (well, except that I doubt the SNES+GSU could run the game that fast - I just mean that DMA bandwidth and VRAM space are sufficient). Without VRAM HDMA, you'd be restricted to 12 fps at that resolution and display size if you wanted to be able to update sprites and change the palette, which you would. The existing Doom port uses neither of these tricks, and its display is pretty close to being as large as it could feasibly get without them.
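The 12 fps figure for the no-HDMA case can be reproduced with a rough budget calculation. This is a sketch under stated assumptions, not a measurement: it assumes about 165 bytes of DMA per scanline, and that a full sprite table (544 bytes of OAM) and a full 256-colour palette (512 bytes of CGRAM) are uploaded every VBlank. The extra bandwidth from VRAM HDMA is not modelled, and all names are illustrative.

```python
import math

LINES_PER_FRAME = 262          # NTSC frame height, including VBlank
BYTES_PER_LINE = 165           # ASSUMED DMA throughput per scanline
OAM_BYTES = 544                # full sprite table
CGRAM_BYTES = 512              # full 256-colour palette

def vblanks_per_frame(fb_bytes: int, active_lines: int, overhead_bytes: int) -> int:
    """VBlanks needed to upload one framebuffer, given per-VBlank overhead."""
    budget = (LINES_PER_FRAME - active_lines) * BYTES_PER_LINE - overhead_bytes
    if budget <= 0:
        raise ValueError("no DMA time left for the framebuffer")
    return math.ceil(fb_bytes / budget)

fb_bytes = 128 * 192           # 8bpp framebuffer with 2x1 pixels: 24576 bytes

bare = vblanks_per_frame(fb_bytes, 224, 0)
with_updates = vblanks_per_frame(fb_bytes, 224, OAM_BYTES + CGRAM_BYTES)

print(bare, 60 / bare)                   # 4 15.0
print(with_updates, 60 / with_updates)   # 5 12.0
```

With nothing but the framebuffer going through, 4 VBlanks would be enough, but there would be almost no room left for anything else. Once sprite and palette uploads are budgeted in, it takes 5 VBlanks, which is where 12 fps comes from.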
In Mode 7, the fact that you have a fixed-location pool of only 256 tiles and a fixed-location tilemap of 128x128 tiles means that in the context of a full-screen 256x224 display, the SNES can barely manage 128x96 without tearing, even using the VRAM HDMA trick. The mosaic trick is better than Mode 7 for this application.
But if you want to use those multipliers outside of Mode 7, you need to avoid using all the rendering time so that you can communicate with PPU1, right? I mean, it's not simply a question of DMA time, but of PPU1 time.
If Mode 7 is not operational, the PPU doesn't care what the CPU does with the multiplier. It can keep rendering the picture in Modes 0-6, and the CPU can use the multiplier in parallel. There's no need to force blank.
If you are using Mode 7, you don't letterbox the screen to get more time to use the multiplier. That's silly. At most, you'd have a substantial chunk of the screen that was guaranteed to not be Mode 7, so you could do your calculations during those scanlines, but you'd have to make sure that all calculations using the PPU multiplier would actually fit in the available time.
The only good reason to letterbox the screen is to get more DMA bandwidth. If rendering is turned off, you can write to VRAM. The longer the time between turning off rendering at the bottom of the picture and turning it on again at the top of the picture the next time around, the more data can be pushed through to VRAM before the PPU needs exclusive access again.
93143 wrote: ↑
Sun Jul 26, 2020 10:58 pm
That should be pretty easy. IIRC the SNES is the only source of actual timing information, so as long as the code doesn't implicitly assume something about how fast certain tasks get done, it shouldn't much matter how fast the GSU is.
So the code actually does assume the time in which everything must be processed, so the more MHz, the more speed you get, but not more frames per second. Am I wrong?
What I meant was that since the GSU doesn't have timing information available to it, the SNES can keep track of the frame pace independent of it, and if necessary tell it how much time is passing. Speeding up the GSU should increase the frame rate but maintain the correct game speed. I don't know if SNES Doom was actually written this way, but it seems like it was - emulators that run the Super FX too fast seem to work as desired, with higher frame rate but correct speed, but I haven't done a rigorous A/B comparison...
The way the original Doom engine on PC worked was that after rendering a frame, it would check how much real-world time had passed since the last game world status update, and run that much time in-game before rendering the next frame. This would result in a consistent game speed regardless of the achievable frame rate on any specific PC (and the achievable frame rate varied wildly). If SNES Doom was written that way, it shouldn't matter what the GSU clock is unless some specific CPU/GSU interface function happens to assume something about the timing. Which, as I said, is possible, but a hack or re-port could remove any such assumptions if they exist.
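A toy simulation makes the point about decoupling game speed from frame rate. This is a minimal sketch of the catch-up scheme described above, not code from any Doom port: time is counted in VBlanks as a stand-in for whatever counter the SNES side would keep, and the function name and parameters are hypothetical.

```python
def simulate(render_vblanks: int, total_vblanks: int) -> tuple[int, int]:
    """Return (frames_rendered, game_steps_run) after total_vblanks of wall time.

    render_vblanks is how long one frame takes to render; a faster GSU
    would mean a smaller value.
    """
    now = 0            # wall-clock time, counted in VBlanks
    last_update = 0    # wall-clock time of the last game state update
    frames = 0
    steps = 0
    while now + render_vblanks <= total_vblanks:
        now += render_vblanks          # rendering the frame takes this long
        frames += 1
        elapsed = now - last_update    # real time that passed since last update
        steps += elapsed               # run that much time in-game to catch up
        last_update = now
    return frames, steps

print(simulate(6, 600))   # (100, 600): slow renderer, 10 fps
print(simulate(3, 600))   # (200, 600): renderer twice as fast, 20 fps
```

Doubling the rendering speed doubles the frame count, but the amount of in-game time that has elapsed is the same in both runs.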
If any such thing does exist, it might not be a systematic thing like in Star Fox, where overclocking the GSU does actually speed up the game... it could very well just work as normal up to some limiting overclock, and then cause a crash or freakout.
Also, what do you mean "not frames per second"? If you overclock, you will get higher fps regardless of how the engine is written, as long as it's not already running as fast as the SNES can accept the data, which it isn't. The only question is whether the game speed increases together with the frame rate. In Star Fox, it does. In Doom, it probably doesn't (I'm not sure), and in a re-port it would be easy to ensure that it did not.
93143 wrote: ↑
Sun Jul 26, 2020 10:58 pm
One possibly arguable exception is the clock trick I mentioned upthread, because it doesn't actually increase the compute speed, just the memory access speed. One could argue that what is essentially a FastROM[+FastRAM?] option for the GSU should have been made available at some point anyway; it's not like anything would have had to change except how long it waits for the memory buses to respond, and the clock trick just fools the GSU into acting exactly like a FastROM GSU would have. However, I consider this trick a last resort even for my shmup port, and for Doom I'd rather not use it.
That was what I meant before, compute speed vs. memory access...
Are you sure? Because I don't think we were discussing this specific thing before. I'm talking about a particular overclock/underclock trick that could improve a particular bottleneck in how the Super FX works internally, while still running the chip at 21 MHz.
The Super FX runs at 10.7 MHz or 21.4 MHz internally based on whether high-speed mode is selected. However, memory accesses (Game Pak ROM reads and Game Pak RAM reads/writes - nothing to do with DMA to SNES VRAM) are slower and can cause bottlenecks. In low-speed mode, the chip accesses memory at 3 cycles per byte. In high-speed mode, it accesses memory at 5 cycles per byte. Both of these values are consistent with 200 ns memory response time, which is Nintendo's SlowROM spec.
My idea was to use a 42.8 MHz oscillator, but leave the Super FX in slow mode. That way, it would think it was running at 10.7 MHz, and would thus use 3 cycles for memory access, but it would actually be running at 21.4 MHz. 3 cycles at 21.4 MHz will work fine with 120 ns memory response time, which is Nintendo's FastROM spec. Just pay for more expensive 120 ns ROM and RAM, and everything should be within spec, or so it seems to me (I have no hardware design experience).
It's the sort of thing I think a clever developer could have convinced Nintendo to let them do, and it's absolutely the sort of thing that Argonaut should have built an equivalent of into the chip, because there was no technical reason not to. But I still don't really like it for Doom, because it smells a bit like cheating... and anyway, most memory accesses (other than RAM reads) are buffered, and there's an instruction cache, so you can run code at full speed while the memory is being accessed, and if your code takes longer than the memory access, there's no advantage to speeding up the latter. Rendering pixels in Doom is just complicated enough that reducing memory access time may not speed things up much (at least once the renderers are rewritten to not PLOT directly in vertical columns, which is almost certainly heavily bottlenecked by memory access because of the way the Super FX handles SNES CHR format).
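The timing argument for the clock trick can be checked with simple arithmetic. This is a naive sketch that just compares the length of the access window with the memory response time. It ignores setup/hold margins and anything else a real hardware design would have to account for, and the function name is made up for illustration.

```python
SLOWROM_NS = 200.0   # Nintendo's SlowROM response time
FASTROM_NS = 120.0   # Nintendo's FastROM response time

def access_window_ns(clock_mhz: float, cycles: int) -> float:
    """Time the chip waits for one memory access, in nanoseconds."""
    return cycles * 1000.0 / clock_mhz

slow = access_window_ns(10.7, 3)    # low-speed mode, stock oscillator
fast = access_window_ns(21.4, 5)    # high-speed mode, stock oscillator
trick = access_window_ns(21.4, 3)   # low-speed mode, doubled oscillator

print(round(slow), slow >= SLOWROM_NS)     # 280 True
print(round(fast), fast >= SLOWROM_NS)     # 234 True
print(round(trick), trick >= FASTROM_NS)   # 140 True
print(trick >= SLOWROM_NS)                 # False: 200 ns parts would be too slow
```

Both stock modes leave room for 200 ns memory, while the doubled-oscillator case only works out on paper with 120 ns parts.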
But the point is to make it work at the standard 21 MHz, I suppose (otherwise there would never be any limit on the type of CPU you could use).