SNES Doom Source Released! Now What?

Discussion of hardware and software development for Super NES and Super Famicom. See the SNESdev wiki for more information.

93143
Posts: 1717
Joined: Fri Jul 04, 2014 9:31 pm

Re: SNES Doom Source Released! Now What?

Post by 93143 »

Oziphantom wrote: Sat Aug 01, 2020 11:34 pm On a PAL SNES, using Mode 7 in colour per Tile mode works pretty well for 3D rendering. You might be able to pare the tiles down to colour combinations in something like DOOM so you could, say, cut the actual number of bytes you need to DMA in half for NTSC.
No, I'm pretty sure Doom would look horrible in any sort of 4bpp. The game squeezes every drop out of that 256-colour palette.

If my calculations are correct, with a 128x96 Mode 7 framebuffer and a 224-line active area, you can sustain 30 fps without tearing even in NTSC if you use VRAM HDMA. The way I figure it is, you've got 12 KB to transfer, and 4 KB free. Fill the 4 KB of unused space in the first VBlank. Then do 5 KB during the second VBlank and 3 KB via HDMA (a little over 150 scanlines at 20 bytes per line, or 192 lines at 16 bytes per line - you can split it over multiple frames, but you have to be careful not to overwrite part of the tilemap at the wrong time). Now you've got about a KB of spare bandwidth on the update frame, and a couple KB of free VBlank on the off frame plus anything you want to put in HDMA. Not that any game rendered this way is likely to hit 30 fps on SNES...
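The budget above is easy to check with a few lines of arithmetic. This is only a sketch of the numbers quoted in the post; the names (`lines_needed`, `mode7_area`, and so on) are made up for illustration and do not come from any real tool.

```python
KB = 1024

def lines_needed(total_bytes, bytes_per_line):
    """Scanlines of HDMA needed to move total_bytes, rounded up."""
    return -(-total_bytes // bytes_per_line)

framebuffer = 128 * 96                  # one byte per pixel: 12 KB per frame
mode7_area = 128 * 128                  # fixed 16 KB Mode 7 data area
free_space = mode7_area - framebuffer   # 4 KB that can be filled ahead of time

first_vblank = free_space               # off frame: fill the unused 4 KB
second_vblank = 5 * KB                  # update frame: ordinary VBlank DMA
hdma_total = framebuffer - first_vblank - second_vblank  # remainder via VRAM HDMA

print(hdma_total // KB, "KB left for HDMA")
print(lines_needed(hdma_total, 20), "scanlines at 20 bytes per line")
print(lines_needed(hdma_total, 16), "scanlines at 16 bytes per line")
```

At 20 bytes per line the remainder takes 154 scanlines ("a little over 150"), and at 16 bytes per line it takes exactly 192.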

...

You know what might work? Wolfenstein 3D in 2x1 pixel mode, full width. 128x128 stretched to 256x128. I figure this game isn't really known for its verticality, but more peripheral vision is always welcome. With a 32-line status bar, there's enough room to transfer the whole thing in one frame on NTSC (a feature shared by the existing port), meaning it could update without tearing. (And for those who care, it would just about fill the screen on a modern 16:9 HDTV...)

Wolf3D_128x128_filtered.png

This would be almost twice as many pixels as the existing port, and I'm not confident that speeding up pixel address calculation with the tilemap trick would be enough to make up for that. Maybe there's some other sort of optimization id missed during those three weeks of hell...

Compare with the actual port:

Wolf3D_112x80_filtered.png
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: SNES Doom Source Released! Now What?

Post by tepples »

93143 wrote: Sat Aug 01, 2020 3:21 pm Or perhaps like this:
This indeed is what I had in mind. Now that you've prototyped it, I guess it isn't as useful at 15 fields per second as it is at even 30, let alone the 60 of 480i. It'd at least need MIP mapping to limit jumpiness of wall textures.
93143 wrote: Sat Aug 01, 2020 10:10 pm Tile format in Mode 7 just means that the pixels are in rows of 8, which are then stacked in columns of 8. It means extra calculations to figure out where in WRAM to put a pixel when rendering. Linear framebuffer should be easier, because it's just a single giant stack of 128 rows of 128 pixels each. And it turns out it's pretty easy to trick the PPU into rendering the Mode 7 tilemap as if it were a bitmap (by using 256 different solid-colour tiles and zooming out).
I have some dumb ideas as to how to set up a matrix for using the tile data as a linear frame buffer. Repeat each tile 8 times horizontally in the map, then set up the matrix to scan diagonally over these groups of 8 tiles. If this is combined with other tricks, it may be possible to display 112x144 stretched to 224x144 in mode 7. Add a status bar, and 224x160 nearly fills a modern widescreen TV.

And even if not, one could stretch 112x128 to 224x192 by stretching the image vertically by 1.5, alternating between 1-2-1-2 and 2-1-2-1 each vblank, and possibly using interlace to put the 2 lines where the 1 line was last field.
93143 wrote: Sat Aug 01, 2020 3:21 pm Interleaved bitplane format, used for Modes 0-6, is WAY more complicated and cannot be considered for texture mapping on an unassisted SNES.
In which mode does Jurassic Park render interior scenes?
Señor Ventura
Posts: 233
Joined: Sat Aug 20, 2016 3:58 am

Re: SNES Doom Source Released! Now What?

Post by Señor Ventura »

93143 wrote: Sat Aug 01, 2020 10:10 pm Tile format in Mode 7 just means that the pixels are in rows of 8, which are then stacked in columns of 8. It means extra calculations to figure out where in WRAM to put a pixel when rendering. Linear framebuffer should be easier, because it's just a single giant stack of 128 rows of 128 pixels each. And it turns out it's pretty easy to trick the PPU into rendering the Mode 7 tilemap as if it were a bitmap (by using 256 different solid-colour tiles and zooming out).

Interleaved bitplane format, used for Modes 0-6, is WAY more complicated and cannot be considered for texture mapping on an unassisted SNES.
Then the thing is:
- Tile format only under Mode 7, due to its "bitmap shaping" characteristic.
- Linear framebuffer with the rest of the modes (0 to 6) implies no buffering, if I understand correctly.


What if it were assisted by an SA-1 (it's the same processor), under Mode 5 at 512x190 with 120 colours (4bpp) and 1x1 pixels, even at 8 to 10 fps?

And achieve visuals like this:
https://youtu.be/s3ZldmlWIPE?t=120
93143 wrote: Sat Aug 01, 2020 10:10 pm You can't do a framebuffer that large in Wolfenstein 3D.

It doesn't have a Super FX, so you need to use Mode 7 because it's the only one with a data format the S-CPU can render into at a decent speed.
Where exactly is the difference? What makes the CPU faster with Mode 7's graphics data format?
93143 wrote: Sat Aug 01, 2020 10:10 pm And Mode 7 only gives you 256 tiles and a 128x128-tile map, so any way you slice it you can only do ~16K pixels at most.
Only gives 256 tiles... due to bandwidth reasons, right?
93143 wrote: Sat Aug 01, 2020 10:10 pm I was thinking that since the Mega Drive version looks decent, 4bpp with doubled resolution might be a feasible approach, since Wolf3D doesn't use its palette nearly as well as Doom does. But it turns out that the Mega Drive version has to use horrible-looking column dither to approximate the palette depth of the game (and speed up rendering, I guess, since you can write two pixels in one shot), so you're not really any further ahead this way...
Yes, it seems like 8bpp wastes colour depth in Wolfenstein 3D; it can't be that lossy at 4bpp.
93143 wrote: Sun Aug 02, 2020 10:13 pm You know what might work? Wolfenstein 3D in 2x1 pixel mode, full width. 128x128 stretched to 256x128. I figure this game isn't really known for its verticality, but more peripheral vision is always welcome. With a 32-line status bar, there's enough room to transfer the whole thing in one frame on NTSC (a feature shared by the existing port), meaning it could update without tearing. (And for those who care, it would just about fill the screen on a modern 16:9 HDTV...)
How much performance would be lost with 1x1 pixels?

...and I was thinking... could 1x2 (256x96) give better visuals than 2x1?
93143
Posts: 1717
Joined: Fri Jul 04, 2014 9:31 pm

Re: SNES Doom Source Released! Now What?

Post by 93143 »

tepples wrote: Mon Aug 03, 2020 7:11 am Repeat each tile 8 times horizontally in the map, then set up the matrix to scan diagonally over these groups of 8 tiles.
That should work...

...but I can't help noticing you've limited it to a 16-line status bar, perhaps to get enough DMA bandwidth. To do a 32-line status bar with a viewport taller than 128 lines, using a roughly 16 KB framebuffer, you'd need VRAM HDMA to avoid tearing. Which, again, should work, and being able to use a linear framebuffer without being limited to 128 pixels in both directions could be useful.
And even if not, one could stretch 112x128 to 224x192 by stretching the image vertically by 1.5, alternating between 1-2-1-2 and 2-1-2-1 each vblank, and possibly using interlace to put the 2 lines where the 1 line was last field.
...sounds jumpy.

Anyway, the Mega Drive fan port of Wolfenstein 3D seems to use a 256x128 viewport with each pixel pair being essentially a column-dithered pseudo-8bpp pixel, relying on the column blending effect you get with the usual MD H40 video output. So my scheme above would be equivalent to that, but with real 8bpp. In fact one could duplicate the exact pixel arrangement of the MD version's viewport by simply using tiles arranged as 4bpp pixel pairs instead of 8bpp pixels, although I don't really see the point...
93143 wrote: Sat Aug 01, 2020 3:21 pm Interleaved bitplane format, used for Modes 0-6, is WAY more complicated and cannot be considered for texture mapping on an unassisted SNES.
In which mode does Jurassic Park render interior scenes?
7. In fact, it looks like they're using the tilemap-as-bitmap trick, with all the tiles being a single unique colour.

Señor Ventura wrote: Mon Aug 03, 2020 11:08 am - Tile format only under Mode 7, due to its "bitmap shaping" characteristic.
What is "bitmap shaping"? The point of rendering in Mode 7 is that a pixel is represented by one single byte, so it's easy to render into.

In every other mode, a pixel is represented by one bit in each of several bytes, so it's very time-consuming to plot a pixel because you have to do a bunch of shifting and masking, not to mention that you have to read, modify, and write a number of bytes equal to your pixel bit depth every time you plot a single pixel. The Super FX exists largely for this reason - it has special hardware that does this process automatically.
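To make the difference concrete, here is a rough sketch of plotting one pixel in each format. It is written in Python purely for readability (a real renderer would be 65816 assembly), and both function names are invented for this example. The planar routine assumes the usual SNES 4bpp tile layout: 32 bytes per tile, planes 0 and 1 interleaved row by row in bytes 0-15, planes 2 and 3 in bytes 16-31, with bit 7 as the leftmost pixel.

```python
def plot_chunky(buf, offset, colour):
    """Mode 7 style: one byte per pixel, so plotting is a single write."""
    buf[offset] = colour & 0xFF

def plot_planar_4bpp(tile, x, y, colour):
    """Plot pixel (x, y) into one 32-byte 4bpp tile in interleaved
    bitplane format. Every plot is a read-modify-write of four bytes."""
    mask = 0x80 >> x                      # bit 7 is the leftmost pixel
    for plane in range(4):
        # planes 0/1 live in bytes 0-15, planes 2/3 in bytes 16-31
        index = (plane // 2) * 16 + y * 2 + (plane & 1)
        if (colour >> plane) & 1:
            tile[index] |= mask           # set this pixel's bit in the plane
        else:
            tile[index] &= ~mask & 0xFF   # clear it without touching neighbours

chunky = bytearray(64)
plot_chunky(chunky, 3 * 8 + 7, 0xA5)      # one store

planar = bytearray(32)
plot_planar_4bpp(planar, 7, 3, 0b0101)    # four loads, four masks, four stores
```

An 8bpp planar mode would need eight such read-modify-write steps per pixel, which is the cost the Super FX plot hardware hides.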
- Linear framebuffer with the rest of the modes (0 to 6) implies no buffering, if I understand correctly.
You don't. Linear framebuffer only works with Mode 7.

My trick involved treating the tilemap as a bitmap, because it's 128x128 tiles in Mode 7, and a tile can be treated as a pixel because you can zoom out. The trick tepples thought up uses the fact that Mode 7 is a chunky pixel format combined with the fact that you can do an arbitrary affine transformation on the graphics.

None of these factors apply to any other mode; they can't zoom or transform in any way (meaning the tilemap-as-bitmap trick would result in 8x8 pixels on screen, which is too large for gameplay even in hi-res mode), and they use an interleaved bitmap format for the tiles that's complicated to render into.
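The tilemap-as-bitmap trick can be modelled in a few lines. This is a toy model, not PPU register setup: `screen_to_map` and `render_line` are hypothetical helpers, and `step_x`/`step_y` simply stand for how many texels the Mode 7 sampling point advances per screen pixel. It assumes tile n is filled entirely with colour n, so the colour on screen is just the tilemap byte.

```python
TILE_SIZE = 8     # Mode 7 tiles are 8x8 texels
MAP_WIDTH = 128   # the Mode 7 tilemap is 128x128 entries, one byte each

def screen_to_map(x, y, step_x=8, step_y=8):
    """Tilemap (column, row) sampled at screen pixel (x, y)."""
    return (x * step_x) // TILE_SIZE, (y * step_y) // TILE_SIZE

def render_line(tilemap, y, width, step_x=8, step_y=8):
    """Colours shown on one scanline when tile n is solid colour n."""
    line = []
    for x in range(width):
        col, row = screen_to_map(x, y, step_x, step_y)
        line.append(tilemap[row * MAP_WIDTH + col])
    return line

# Fake 128x128 framebuffer stored directly in the tilemap bytes.
tilemap = bytes((i * 7) & 0xFF for i in range(MAP_WIDTH * MAP_WIDTH))

one_to_one = render_line(tilemap, 5, 128)          # zoomed out 8x: 1 entry per pixel
two_by_one = render_line(tilemap, 5, 256, step_x=4)  # 2x1 pixels across 256 columns
```

With a step of 8 texels per pixel each map entry covers exactly one screen pixel; halving the horizontal step to 4 gives the 2x1 pixels of the 128x128-to-256x128 idea.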

Please do some reading on your own. It's very frustrating trying to explain this stuff to someone without the necessary understanding of the hardware. You seem to misunderstand just about everything I say. If you read and understand https://wiki.superfamicom.org/ and maybe http://problemkaputt.de/fullsnes.htm, and/or maybe watch the SNES Features videos by Retro Game Mechanics Explained, what I'm telling you will make more sense.
Only gives 256 tiles... due to bandwidth reasons, right?
Wrong. Mode 7 is inherently limited to 256 tiles in VRAM at any time. And you can't change where the PPU assumes they are, so you can't double buffer by switching data pools; there's only one. The tilemap is 16 KB representing a grid 128 tiles wide by 128 tiles high, using one byte per tile, so it actually couldn't specify tile indices outside the range 0-255 even if they existed. And just like the tileset, the map is in a fixed location, so you can't switch to a different map; you have to overwrite it to change it.

Other modes use smaller tilemaps with two bytes per tile, including extra bits for palette selection, tile priority and horizontal and vertical tile flipping, along with support for up to 1024 tiles. And you can write to registers to tell the PPU where the tileset and tilemap are, so you can easily switch between multiple data pools existing in VRAM, even between one scanline and the next. Mode 7 doesn't have any of that.

If you had been reading up on the SNES, you would know this stuff.

The reason you need high DMA bandwidth to use large framebuffers in Mode 7 is that the data pool is so small. If you can't fully double buffer your frame data, your single-frame DMA time needs to be sufficient to replace an amount of data equal to your frame size minus the free space available in VRAM. With a 128x128 map being used as a 128x128 framebuffer, there is ZERO free space in the Mode 7 tilemap area (VRAM as a whole may well have a ton of free space, but Mode 7 can't use it), so you have to be able to transfer the whole thing in one frame. Otherwise the screen will show a partially updated buffer with a clearly visible seam; this is called "tearing".

It turns out that with a 128-line viewport and a 32-line status bar, with the entire rest of the display being forced blank, a full 128x128 tilemap update does just barely fit in between the end of one display refresh and the beginning of the next one, even on an NTSC SNES (which has a shorter frame and thus less VBlank time than a PAL SNES). So my scheme above works.
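A back-of-envelope check of that claim, as a sketch only. The timing constants are assumptions taken from commonly cited hardware documentation (1364 master clocks per scanline, 8 master clocks per DMA byte, roughly 40 clocks of DRAM refresh per line, 262 lines per NTSC frame and 312 per PAL frame), and the model ignores DMA setup, interrupt latency and any HDMA running at the same time.

```python
MASTER_CLOCKS_PER_LINE = 1364    # assumed scanline length in master clocks
DMA_CLOCKS_PER_BYTE = 8          # assumed general DMA cost per byte
REFRESH_CLOCKS_PER_LINE = 40     # assumed DRAM refresh pause per scanline
NTSC_LINES = 262
PAL_LINES = 312

def dma_bytes_per_line():
    """Approximate DMA throughput per blanked scanline."""
    return (MASTER_CLOCKS_PER_LINE - REFRESH_CLOCKS_PER_LINE) // DMA_CLOCKS_PER_BYTE

def blank_budget(total_lines, active_lines):
    """Bytes that can be DMAed while the display is blanked."""
    return (total_lines - active_lines) * dma_bytes_per_line()

tilemap_bytes = 128 * 128                 # full 16 KB tilemap
active = 128 + 32                         # viewport plus status bar

print(blank_budget(NTSC_LINES, active), "bytes available on NTSC")
print(blank_budget(PAL_LINES, active), "bytes available on PAL")
print(blank_budget(NTSC_LINES, active) - tilemap_bytes, "bytes of NTSC margin")
```

Under these assumptions NTSC has about 16.4 KB of blanked DMA time against a 16 KB transfer, which agrees with "just barely", while PAL has room to spare.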
93143 wrote: Sat Aug 01, 2020 10:10 pm I was thinking that since the Mega Drive version looks decent, 4bpp with doubled resolution might be a feasible approach, since Wolf3D doesn't use its palette nearly as well as Doom does. But it turns out that the Mega Drive version has to use horrible-looking column dither to approximate the palette depth of the game (and speed up rendering, I guess, since you can write two pixels in one shot), so you're not really any further ahead this way...
Yes, it seems like 8bpp wastes colour depth in Wolfenstein 3D; it can't be that lossy at 4bpp.
Slow down and read what you're quoting. I said I thought initially it might be a good idea, but it turned out to be a bad one. 16 colours is really not very many, and even with dithered pseudo-8bpp the Mega Drive version doesn't look great.
93143 wrote: Sun Aug 02, 2020 10:13 pm You know what might work? Wolfenstein 3D in 2x1 pixel mode, full width. 128x128 stretched to 256x128. I figure this game isn't really known for its verticality, but more peripheral vision is always welcome. With a 32-line status bar, there's enough room to transfer the whole thing in one frame on NTSC (a feature shared by the existing port), meaning it could update without tearing. (And for those who care, it would just about fill the screen on a modern 16:9 HDTV...)
How much performance would be lost with 1x1 pixels?
You can't do that in Mode 7, unless you use a tiny window, because you only get 16384 pixels at most. And without a Super FX, no other modes can be used.

The image I posted used the maximum number of pixels that can be rendered on an unassisted SNES at a halfway acceptable frame rate. The only way to use 1x1 pixels would be to scale the image down, which shouldn't massively affect the performance because you're rendering the same number of pixels.
...and I was thinking... could 1x2 (256x96) give better visuals than 2x1?
No. We've already demonstrated that with Doom, and this game is even flatter.
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Re: SNES Doom Source Released! Now What?

Post by lidnariq »

93143 wrote: Mon Aug 03, 2020 5:15 pm
Only gives 256 tiles... due to bandwidth reasons, right?
Wrong. Mode 7 is inherently limited to 256 tiles in VRAM at any time.
He's not wrong. He's also not right.

The 256-tile limitation arises because the output of one 8-bit-wide VRAM is fed into the address bus of the other 8-bit-wide VRAM, and the VRAMs still run at the same 5.4 MHz that they do in all SNES video modes. If the PPUs changed the clock, or the RAMs were wider, it would allow for more tiles. So in a way it is ultimately a bandwidth constraint...

But it's also misleading to think of it as one: the entire design of the S-PPU is predicated on these properties, and changing these properties is comparable to using the Genesis's VDP instead.
93143
Posts: 1717
Joined: Fri Jul 04, 2014 9:31 pm

Re: SNES Doom Source Released! Now What?

Post by 93143 »

It seems Randy has had trouble getting the rights to the input data. Accordingly, he appears to have written a tool to extract the data from the ROM itself, and is testing the results in bsnes.

At least, that's how I interpret what's going on in that thread.
Nikku4211
Posts: 569
Joined: Sun Dec 15, 2019 1:28 pm
Location: Florida

Re: SNES Doom Source Released! Now What?

Post by Nikku4211 »

93143 wrote: Thu Aug 27, 2020 5:02 pm It seems Randy has had trouble getting the rights to the input data. Accordingly, he appears to have written a tool to extract the data from the ROM itself, and is testing the results in bsnes.

At least, that's how I interpret what's going on in that thread.
What about extracting and converting the data from the WAD?
I have an ASD, so empathy is not natural for me. If I hurt you, I apologise.
93143
Posts: 1717
Joined: Fri Jul 04, 2014 9:31 pm

Re: SNES Doom Source Released! Now What?

Post by 93143 »

Some of the levels are modified. Notably, e2m2 (Refinery, which was the original e2m3) has a new secret exit, because the original secret exit was in Command Center (e2m5), which isn't in this game. But in addition, all of the partly transparent walls are missing, because Randy's engine doesn't support those (perhaps because it would have made performance even worse). Sometimes the transparent walls are outright missing, as in the last room of e1m1, but sometimes they're replaced with other geometry that does the same job, as in e1m3. Also, the texture pool is smaller due to ROM constraints, so walls are routinely reassigned to use one of the available textures.

...

I've played through the whole game on Hurt Me Plenty and Ultra-Violence (and episode 1 on Nightmare), using the final version of the circle-strafe hack in no$sns. Since no$sns runs the Super FX too fast, the frame rate is quite serviceable. All it's really missing is shorter input lag and a fix for sticky wall syndrome and it's basically there. Certainly one could make a much longer wish list (in fact I have), but the really serious playability issues stem from those four flaws.

Are there supposed to be missing monsters in Ultra-Violence? For example, Computer Station is eerily dead in spots, missing the zombies on the upper floor of the computer room as well as most of the enemies that you first see through windows over nukage pits. It makes the level easier, and it's only creepy for someone who's used to the enemy placement from Hurt Me Plenty. Another example: the trap rooms in Phobos Lab have a couple Demons each but nothing else, while on Hurt Me Plenty they were packed with Imps and Sergeants. Same with the dark room near the exit - a few pinkies, but none of the zombies that accompanied them on the lower difficulty level. Ultimate Doom on PC seems to have all those missing enemies and more.
93143
Posts: 1717
Joined: Fri Jul 04, 2014 9:31 pm

Re: SNES Doom Source Released! Now What?

Post by 93143 »

I just thought of another way in which Doom is metal. Stop me if you've heard this one:

You know how it's only former humans who drop ammo and weapons? The actual demons don't. So what this means is that in order to replenish your supplies, you have to rob zombies.

...

(Here was a long what-if piece about how SNES Doom's audio could be spruced up. If anybody cares, I'll re-post it.)
_aitchFactor
Posts: 2
Joined: Tue Sep 01, 2020 4:14 pm
Location: Melbourne, Australia

Re: SNES Doom Source Released! Now What?

Post by _aitchFactor »

long what-if piece
Please post it. I'd love to see it!
93143
Posts: 1717
Joined: Fri Jul 04, 2014 9:31 pm

Re: SNES Doom Source Released! Now What?

Post by 93143 »

First post just to ask that? Okay, I guess I'll re-post it. Don't hesitate to ask any questions you may have - it's not as long as it could be, and assumes some knowledge of SNES audio.

I originally deleted it because this sort of post never seems to get any engagement, and I felt that the people on here are probably more interested in actual programming achievements than in speculation. But there are counterexamples, as with the discussion about porting Metal Slug to SNES, and this post is similar in spirit to that, even if it's arguably in the wrong thread...

...

It strikes me that in the hypothetical case where a remake is made of Doom on SNES, something should be done to make it worthwhile to play even if the player has access to the PC version. One possibility lies with the audio. The music in the SNES version is often praised, but I feel that both the music and the sound effects could be done much better with a bit of time and trickery.

First, there are several problems with the audio in the existing port. The music and sound effects seem to be heavily constrained by ROM and ARAM limits, and are plagued by poor timing and janky channel sharing. The sound effects in particular are monaural, and are often played very late or not at all, both of which harm immersion and situational awareness.

A number of things could be done to improve this situation. With 6 MB of CPU-ROM combined with high-bandwidth HDMA audio streaming, each track could use its own largely unique sample set*, and sound effects could be easily streamed if they don't fit in ARAM (potentially allowing the full set of PC sound effects with full length and sample rate, while also leaving more ARAM for the music). The music could also be enhanced by streaming, and could use techniques such as chord samples, microsequencing, loop switching and pitchmod to save channels** and maximize the sound quality within the available ARAM and streaming bandwidth. The timing issues shouldn't require any fancy handling, as they are already unusual on SNES, but any streaming scheme should make sure to leave enough room in between data bursts for the audio engine to run with a high minimum frequency.

* The consumer-grade Sound Canvas GM sample set you usually hear the PC version's music on is not especially well-adapted for metal, and should be easy to beat. Heck, some of the non-metal tracks on SNES already sound better than the PC version in some ways...

** A lot of the tracks in the existing port use all 8 channels for music. The chainsaw is sufficient proof that this should be avoided if at all possible.

...

The sound effects shouldn't be in mono. In fact, they could easily be in Dolby Surround. It would also be good to have the pan position and volume dynamically updated as the player and the sound source move relative to one another.

Sound effects could be given echo when indoors. The echo settings might be fixed by the requirements of the music, but if the music could be written to not need echo, the filter and feedback (and perhaps echo volume) could be adjusted depending on the player's location. (The echo delay length would probably have to remain fixed throughout a level to avoid glitches.)

With streaming available, indirect sounds could use alternate samples with more muffling and (where appropriate) baked-in reverb. Indirect sound could also be panned to the propagation point instead of travelling straight through walls.

In order to allow as many sound effects as possible without the engine having to drop music channels (and to provide maximum quality), perhaps an MSU1 music option should exist. The default would be S-DSP music, of course, but having the option to free up the entire DSP for sound effects could be worth the slight break in authenticity. And really the MSU1, when used just for music, isn't that different from the Voicer-kun...

There could be an alternate sound effect mode, accessible when the music is off or the MSU1 is being used, in which multiple channels per sound effect are used together with location-dependent filter, feedback and echo volume settings to produce realistic true-stereo surround reverb.

Loud sounds such as gunfire or demon roars that occur close to the player could be accompanied by a software compression/limiting effect that attenuates everything else. This can be done effectively on SNES, as shown by some of Strobe's chiptunes, and could make the loudness profile of the sound effects seem more realistic without making the actual dynamic range of the game's audio too large.

Monster voices in the PC version were supposed to vary in pitch slightly to add variety, but it seems a bug introduced by a library update resulted in these sounds being varied randomly in pan position instead. Pitch variations combined with accurate pan would be good, but what about having a characteristic average pitch for each individual monster, with small variations around that? At least for former humans, who could be expected to vary naturally from tenor to bass...

...

I recently read a YouTube comment to the effect that (IIRC) the poster had the PC version of Doom when he was young, and was jealous of his friend who had the SNES version because he thought it was better. Now, obviously this wasn't true (presumably either his PC was very old or he hadn't actually seen the SNES version), but what if it were, at least in the limited sense of having better music and more realistic sound effects?

Oh, and couch deathmatches.
_aitchFactor
Posts: 2
Joined: Tue Sep 01, 2020 4:14 pm
Location: Melbourne, Australia

Re: SNES Doom Source Released! Now What?

Post by _aitchFactor »

I've been lurking since around 2016 or so, but I'm more active around the related Discord servers. Recently I've become pretty interested in SNES audio, which is why I felt the need to ask.
High-bandwidth HDMA streaming is a new concept to me. How fast is it and are there any particular caveats to it?
lidnariq
Posts: 11432
Joined: Sun Apr 13, 2008 11:12 am

Re: SNES Doom Source Released! Now What?

Post by lidnariq »

Blargg had a proof-of-concept a decade ago, capable of sending audio at the full capabilities of the S-APU's DAC (stereo, 16-bit, 32kHz). Unfortunately it uses all the CPU time of both the S-CPU and S-SMP, so it's not widely applicable.

Since then we've had people using HDMA to stream stereo BRR at 32kHz, so there's not too much impact on the main CPU. (I think. I can't find a link as easily)
93143
Posts: 1717
Joined: Fri Jul 04, 2014 9:31 pm

Re: SNES Doom Source Released! Now What?

Post by 93143 »

lidnariq wrote: Thu Sep 03, 2020 5:03 pm Since then we've had people using HDMA to stream stereo BRR at 32kHz, so there's not too much impact on the main CPU. (I think. I can't find a link as easily)
I'm unaware of anybody actually doing it, but the method I proposed is here: viewtopic.php?f=12&t=14634&p=178343#p178343

Due to real life being rather busy, my hobbies have been on a very slow burn, so I'm just now getting around to trying to write an audio engine. This will be part of it, and hopefully I'll be able to test it in the relatively near future.

_aitchFactor wrote: Thu Sep 03, 2020 4:45 pm High-bandwidth HDMA streaming is a new concept to me. How fast is it and are there any particular caveats to it?
There's a game called N-Warp Daisakusen that you may have heard of, by d4s. I've looked through the source code, and it looks like it uses HDMA to stream audio, writing to all four I/O ports once per scanline in data bursts lasting multiple scanlines, and using a cycle-counted loop on the SPC700 to get the data from the ports while remaining in step with the DMA unit.

This method can send four bytes per scanline while the transfer is active. However, perhaps due to synchronization issues with the unreliable ceramic oscillator used by the APU, d4s seems to have prioritized speed of data reading above all else, so as to leave as much timing margin as possible in the pickup loop. This results in a method where the SPC700 simply dumps the data on the stack during the transfer. After the transfer, it then has to pick up the data from the stack and put it where it's supposed to go. This limits the average data rate, because you can't be sending more data while the previous data burst is being copied to its destination. Also, the fixed-length pickup loop raises compatibility questions when using long data bursts, because there's no way to re-synchronize, or even detect a problem, during a burst.

When I say "high-bandwidth" HDMA streaming, I'm referring to a method that writes the data from the I/O ports directly to the desired address, eliminating the secondary data move loop. Such a method should additionally be capable of supporting long data bursts of at least 30-40 scanlines, with multiple bursts per frame. The method I've proposed uses self-modifying code on the SPC700 side to write the base destination addresses into the mov instructions in the pickup loop before a burst. Since my method is more timing-sensitive, partly due to the longer pickup process, it uses a hot-swappable delay section to pad the pickup loop to a specific length based on the observed clock ratio between the two processors (an HDMA timing pulse each frame should do nicely if you set up the SPC700's timers right, although you might want to put the frame length through a lowpass filter before using it).

This sort of method should be pretty light on the S-CPU. If I understand correctly, you can use indirect-addressed HDMA to send whole blocks straight from ROM with a couple of writes to the HDMA table.

...

How fast is it? That depends on how hard you want to hit the SPC700. It should be possible to fit several 27- or 36-line data bursts into one 224-line frame, with several scanlines in between bursts for the music engine to do tasks unrelated to the streaming. This could get you 600-620 bytes per frame easily enough, which on NTSC is enough for two 32 kHz BRR streams (Mozart in stereo) or three 22 kHz streams (Ken+Ryu+announcer).

The theoretical limit should be somewhere north of 900 bytes per frame in overscan mode, or somewhat less than that in 224-line mode. However, using the whole frame restricts the music engine to VBlank and therefore 60 Hz (or 50 Hz), which could cause timing issues in certain scenarios.
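The per-frame figures follow from the BRR block size (9 bytes per 16 samples) and the 4 bytes per scanline the I/O ports can carry. A small sketch of that arithmetic follows. The function name is invented, "22 kHz" is read as 22050 Hz and NTSC as a flat 60 frames per second, both of which are simplifying assumptions.

```python
BRR_BLOCK_BYTES = 9       # one BRR block...
BRR_BLOCK_SAMPLES = 16    # ...decodes to 16 samples
BYTES_PER_LINE = 4        # one write to each of the four I/O ports per scanline

def brr_bytes_per_frame(sample_rate_hz, frames_per_second):
    """Average BRR bytes one mono stream consumes per video frame."""
    return sample_rate_hz * BRR_BLOCK_BYTES / BRR_BLOCK_SAMPLES / frames_per_second

two_32k = 2 * brr_bytes_per_frame(32000, 60)     # stereo at 32 kHz
three_22k = 3 * brr_bytes_per_frame(22050, 60)   # three 22 kHz streams

# Raw ceiling if every active scanline carried data, before any overhead
ceiling_224 = 224 * BYTES_PER_LINE
ceiling_239 = 239 * BYTES_PER_LINE   # overscan mode

print(round(two_32k), round(three_22k), ceiling_224, ceiling_239)
```

This gives about 600 and 620 bytes per frame for the two cases, against raw ceilings of 896 bytes (224 lines) and 956 bytes (239 lines) before any sync overhead or gaps between bursts are subtracted.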

In the case of something like Doom, the active frame is shorter for PPU DMA bandwidth reasons. Since you can't casually mix DMA and HDMA without breaking compatibility with early-model consoles, you can't use any of the extended VBlank for HDMA audio streaming. And because the overhead (and gap size) is largely fixed, this means the data rate shrinks somewhat faster than the screen size. I figure the equivalent of four or five 11 kHz streams should still fit, given the display size in the existing port. (I believe Doom on PC uses 11 kHz for all its sound effects, with the exception of the Super Shotgun which doesn't show up until Doom II.)

...

Caveats? Well, first there's the fact that AFAIK no one has tried this, so it might still run into a showstopper. No one has brought one up since I posted my idea, and I haven't thought of any, but you never really know until you try it for real...

There's also the fact that while it doesn't load down the 5A22 much, it ties up the SPC700 perhaps even more than a conventional high-bandwidth occasional-sync CPU-to-CPU transfer would, because it has to waste about a quarter of each scanline waiting for the next data shot. Streaming three 22 kHz BRR sound effects at the same time is about 620 bytes per frame on NTSC, or 740 on PAL, which is roughly 60% of the SPC700's compute time, and there's overhead on top of that. Sadly, there is no way to automate data pickup on the SPC700 side.
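The "roughly 60%" figure can be reproduced by assuming the SPC700 is fully occupied on every scanline in which a transfer is active. A minimal sketch under that assumption, with a hypothetical helper name and no allowance for burst setup overhead:

```python
def streaming_load(bytes_per_frame, lines_per_frame, bytes_per_line=4):
    """Fraction of a frame the SPC700 spends inside the pickup loop,
    assuming it is busy for every scanline that carries data."""
    busy_lines = bytes_per_frame / bytes_per_line
    return busy_lines / lines_per_frame

ntsc_load = streaming_load(620, 262)   # about 155 busy lines out of 262
pal_load = streaming_load(740, 312)    # about 185 busy lines out of 312

print(f"NTSC {ntsc_load:.0%}, PAL {pal_load:.0%}")
```

Both come out at about 59%, since PAL's longer frame is offset by the extra data needed per frame.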

This sort of method is limited to no more than 256 bytes in a single burst (and I think the method I've posted is only good for 252) because the index registers on the SPC700 are 8-bit. This is not a huge issue, because it's hard to reliably stay in sync much longer than that anyway.

It might be good for transfers to be a multiple of 9 bytes, at least when transferring sample data, because that's the size of a BRR block. If a single data burst isn't a multiple of 9 bytes, you end up with annoying constraints on ring buffer size in ARAM. Even as it stands, you have to make sure the buffer fits a whole number of transfers in it - add the requirement that it be a multiple of 9 bytes when the transfers aren't, and your options for small buffer sizes get pretty restricted.
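These size constraints can be enumerated directly. The sketch below assumes 4 bytes per scanline, 9-byte BRR blocks and the 252-byte burst limit mentioned above; the helper names are made up for illustration.

```python
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

def valid_burst_sizes(bytes_per_line=4, brr_block=9, max_burst=252):
    """Burst sizes that are whole scanlines and whole BRR blocks."""
    step = lcm(bytes_per_line, brr_block)
    return list(range(step, max_burst + 1, step))

def smallest_ring_buffer(burst_bytes, brr_block=9):
    """Smallest buffer holding whole bursts and whole BRR blocks."""
    return lcm(burst_bytes, brr_block)

print(valid_burst_sizes())          # sizes in bytes; divide by 4 for scanlines
print(smallest_ring_buffer(108))    # 27-line burst: one burst is already enough
print(smallest_ring_buffer(100))    # 25-line burst: buffer must be 900 bytes
```

The valid sizes are the multiples of 36 bytes (9 scanlines), which is why 27- and 36-line bursts come out even. A 25-line, 100-byte burst would force the ring buffer up to a multiple of 900 bytes.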

Also, you have to be careful with how you tell the SPC700 other things (music control, sound effects, etc.). Obviously the S-CPU can't be writing to the I/O ports while HDMA is active, because it will corrupt the transfer. So you have to either include any additional instructions in the HDMA table itself or do your general audio control communications while HDMA is not running (which could mean during VBlank, and generally you want VBlank for video DMA). Receiving data from the SPC700 is easy if it's 4 bytes or less and you know when it gets written, because AFAIK reading the I/O ports doesn't disrupt anything, but for extended or unscheduled communications the same considerations apply: either use the audio HDMA channel to request data from the SPC700 and use a second HDMA channel to receive it, or do the I/O manually outside the range of the HDMA.

As mentioned above, if you trim the screen to get more bandwidth to VRAM (as Doom does), you leave yourself less room for HDMA. This hits harder if you're trying to keep some space between bursts to allow a high-precision music engine.

If you're trying to leave space between bursts, you ABSOLUTELY NEED to ensure that the music engine loop cannot take longer than the space you've allotted. HDMA won't wait for confirmation from the SPC700. If you miss the "prepare to receive data" command, you will miss the data burst, and you will probably end up misinterpreting data as instructions and doing something stupid. It might be feasible to design a leaner "quick loop" for this purpose that only handles stuff that can't wait until VBlank.

Finally, there's the sync issue. Not only do you have to make sure your pickup loop isn't going to desync during a burst, you have to make sure that if you're playing a streamed sound, you don't get buffer overrun or underrun. There are a couple of possible ways to handle this: 1) APU-side sync, where the pitch is adjusted to match the incoming data rate, or 2) CPU-side sync, where the amount of data per frame can be adjusted to stay in step with the playback. If you're just doing a 32 kHz stereo demo, you might be able to get away with using all of ARAM as a giant ring buffer, meaning sync probably won't become an issue for at least a couple of minutes regardless, but if you're using small streaming buffers to save ARAM in a game scenario you'll want to pay attention to this.
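
One way option 2 (CPU-side sync) might look is a fractional accumulator that decides how many BRR blocks to send each frame, so that the long-run average matches what the DSP consumes. The Python below only models the arithmetic. The function name is hypothetical, and it assumes the S-CPU already has a usable figure for the playback rate relative to the frame rate, which in practice would have to come from the SPC700.

```python
def blocks_to_send(sample_rate_hz, frame_rate_hz, frames):
    """Yield how many 9-byte BRR blocks to send on each of `frames` frames."""
    # Each BRR block decodes to 16 samples, so this is usually not a whole number.
    blocks_per_frame = sample_rate_hz / 16 / frame_rate_hz
    owed = 0.0  # fractional blocks carried over from earlier frames
    for _ in range(frames):
        owed += blocks_per_frame
        whole = int(owed)   # send only whole blocks this frame
        owed -= whole       # keep the remainder for later frames
        yield whole


if __name__ == "__main__":
    # Assumed figures: one 22.05 kHz stream on an NTSC console.
    schedule = list(blocks_to_send(22050, 60.0988, 10))
    print(schedule)
```

The per-frame count then alternates between two adjacent values, and the total sent never drifts more than one block away from the ideal. If the measured rate changes, only `blocks_per_frame` needs updating.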

Also, sync pulse handling is very important if you want any of the above to work properly. The SPC700 has to have a reasonably precise idea of what the clock ratio is before you start trying to stream something, so the audio HDMA channel may have to run all the time regardless of what the game is doing, just so the sync pulse will fire on the same scanline every frame like clockwork. Just measuring the clock ratio at boot is dicey because of thermal drift, so it's probably wise to keep it a live measurement.
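
As a sketch of what a "live measurement" could mean, the estimate can be smoothed over many frames rather than taken once at boot. The Python below is only a model of the idea, not SPC700 code. It assumes the driver can count timer ticks between consecutive sync pulses, and the class name, the nominal tick count and the smoothing factor are all invented for illustration.

```python
class ClockRatioTracker:
    """Running estimate of SPC700 timer ticks per video frame."""

    def __init__(self, nominal_ticks_per_frame, smoothing=0.125):
        self.estimate = float(nominal_ticks_per_frame)
        self.smoothing = smoothing  # 1/8, a shift-friendly value on real hardware

    def on_sync_pulse(self, measured_ticks):
        """Fold one new measurement into the estimate (exponential moving average)."""
        self.estimate += self.smoothing * (measured_ticks - self.estimate)
        return self.estimate


if __name__ == "__main__":
    tracker = ClockRatioTracker(1000)
    for _ in range(100):
        tracker.on_sync_pulse(1002)  # pretend the clocks have drifted slightly
    print(tracker.estimate)
```

A smaller smoothing factor rejects jitter in individual measurements better but follows thermal drift more slowly, so the right value would have to be found by experiment.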

This is going to be fun...
none
Posts: 117
Joined: Thu Sep 03, 2020 1:09 am

Re: SNES Doom Source Released! Now What?

Post by none »

Sorry if I'm talking gibberish; my knowledge of SPC700 coding isn't too deep, especially when it comes to timing. These are just some ideas.
93143 wrote: The method I've proposed uses self-modifying code on the SPC700 side to write the base destination addresses into the mov instructions in the pickup loop before a burst.
Can't you use indirect moves instead and just move the direct page to where you want to stream the data? Also, if the loop is unrolled, could you jump indirectly into the right spot in the loop at the beginning of the burst, so that the transfer starts at the right target address?
93143 wrote: Also, sync pulse handling is very important if you want any of the above to work properly.
Maybe you can pass the current buffer positions / stride to the CPU during sync, and adjust the burst length based on that?