It is currently Sun Oct 22, 2017 7:49 pm

All times are UTC - 7 hours





Post new topic Reply to topic  [ 40 posts ]  Go to page Previous  1, 2, 3
Author Message
PostPosted: Tue Oct 03, 2017 7:19 am 
Offline

Joined: Wed May 19, 2010 6:12 pm
Posts: 2295
I wonder if the Super FX chip really is better than the SA-1 for this game. The SA-1 can do character conversion DMA.


Top
 Profile  
 
PostPosted: Tue Oct 03, 2017 3:41 pm 
Offline

Joined: Sat Jun 01, 2013 11:55 am
Posts: 23
Location: Maine, U.S.A.
Possibly. But SA-1 is expensive, and few emulators support it.


Top
 Profile  
 
PostPosted: Tue Oct 03, 2017 4:44 pm 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 788
psycopathicteen wrote:
I wonder if the Super FX chip really is better than the SA-1 for this game. The SA-1 can do character conversion DMA.

It can also execute at nearly full speed from ROM, and has a real divider (MMIO, but it's much better than the S-CPU's divider, and it beats the Super FX's divide-by-two instruction and cumbersome lookup table access). It also supports 8 MB of ROM, vs. the Super FX's 2 MB (yeah, you can add CPU ROM, but that's only useful for stuff the Super FX doesn't need to know about).

It shouldn't be hard to have the S-CPU run mostly in WRAM and only touch BW-RAM (and possibly ROM) for DMA. This gives you 8 master cycles per double pixel, leaving 36 CPU cycles to prepare the data and load the opcode(s) for the write(s) before you reach the Super FX bottleneck. Mind you, this only applies to walls; the Super FX is probably unbeatable at filling floors and ceilings and even drawing backdrops, and objects/enemies can probably be done line by line too.

If the mosaic trick could somehow be got working, it would double the Super FX's speed ceiling for doubled pixel columns, whereas the SA-1 would only see a small benefit...

ap9 wrote:
Possibly. But SA-1 is expensive, and few emulators support it.

What? Every emulator supports the SA-1. It's not accurate, but it's there. And as for it being expensive, every game that used it appears to have used it primarily as copy protection, or possibly as a laziness budget/schedule buffer. There's nothing about any of them that looks like it needed a coprocessor, never mind one as powerful as the SA-1.

Are you thinking of the ST-018?

ap9 wrote:
I remember a while back stepping through the Mac version of Doom in a debugger to discover a horizontal technique to render more than one pixel at a time by storing two 16 bit portions in a 32 bit register, for the low-res. mode. This method, of course, is not conceivable on a 16 bit system.

That sounds like what the Super FX does for its automatic overhead-free checkerboard dither. I've abused the dither feature to get two pixels out of a single ROM read when drawing unscaled 2D graphics. Of course, since the COLOR register is 8-bit, dither only works for 4bpp or lower...

Quote:
DOOM uses a lot of 32 bit math (esp. fixed point math); a proper port of Doom to SNES would be pretty slow in software because of the 16 bit limit.

How necessary is that precision, though? If you're using a 32-bit CPU, it would seem natural to use 32-bit operations.

On the SNES, we typically find that position can be expressed as a 24-bit fixed-point number (or a 32-bit number, for data integrity reasons) while velocity is fine with 16 bits. This sort of mixed-headroom math can speed things up considerably versus simply doing everything at 32-bit. It's also basically impossible in C...

Quote:
The engine Williams Entertainment used was maintained mostly in C, so there was plenty of room for performance improvements.

You sure about that? I was pretty sure no C compiler existed for the Super FX. Leaving aside the fact that it was something of a niche market, the chip is weird enough that it might actually be worse for C than the 6502.

I can see the S-CPU code being C, but the GSU code would probably have to be assembly.


Top
 Profile  
 
PostPosted: Thu Oct 05, 2017 7:38 pm 
Offline

Joined: Sat Jun 01, 2013 11:55 am
Posts: 23
Location: Maine, U.S.A.
93143 wrote:
ap9 wrote:
Possibly. But SA-1 is expensive, and few emulators support it.

What? Every emulator supports the SA-1. It's not accurate, but it's there.

At the time I was reading up on more exotic implementations for expansion chips (e.g., non-PC emulators). I should've used the word 'fewer' in regards to the general scope… yes, all popular SNES emulators support SA-1. However, 'few' would still apply to completeness and accuracy; bsnes is the first (and still relatively new) to use low-level reverse engineering for expansion chips, whereas most emulators cannot be expected to even work with lesser known operations. SuperFX is better known.

Quote:
On the SNES, we typically find that position can be expressed as a 24-bit fixed-point number (or a 32-bit number, for data integrity reasons) while velocity is fine with 16 bits. This sort of mixed-headroom math can speed things up considerably versus simply doing everything at 32-bit. It's also basically impossible in C...

While precision can be reduced without breaking the game, it wouldn't nearly be an authentic port without all the fixed-point and trig operations a game like Doom uses. Of course, some operations may only need 24 bits or less, and rendering can be scaled down considering a smaller window.

The SA-1 could certainly help as a math co-processor if nothing else.

Quote:
I can see the S-CPU code being C, but the GSU code would probably have to be assembly.

Yes, I was referring to the S-CPU side. One of the leading people who worked on SNES Doom said the main level engine was maintained in C, and if optimized in assembly could have increased overall performance by maybe 30%.


Top
 Profile  
 
PostPosted: Thu Oct 05, 2017 10:11 pm 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 788
ap9 wrote:
most emulators cannot be expected to even work with lesser known operations.

Okay, that's fair. I've got to say, though, I can't bring myself to care much, particularly since AFAIK nobody is planning to actually do anything about this. The scenario is either "how could SNES Doom have been better" or "how could a better SNES Doom hypothetically be made?".

Even if we were to actually attempt this, it would be hard to convince me that ZSNES sucking is a good reason to use an inferior chip (if the Super FX really is inferior for this application). If the SA-1 were actually too expensive in early 1996, that would be something to consider, since really the only reason to do this is to prove that it could have been done, but if a game works on real hardware it's not the game's fault if it doesn't work in an emulator.

Quote:
While precision can be reduced without breaking the game, it wouldn't nearly be an authentic port without all the fixed-point and trig operations a game like Doom uses.

Sure, but are you really going to notice the difference between 1/256 and 1/65536 positional precision? Just the fact that the frame time is a multiple of 1/60 s instead of 1/70 s would make a bigger difference, I'd expect. And the trig would all be precomputed and tabulated.

How big is "one" in 16.16 on the actual map? What's the fastest any object can go?

Quote:
The SA-1 could certainly help as a math co-processor if nothing else.

No, it would have to be used to do the rendering as well. I'm pretty sure it draws too much current to pair with the Super FX, even if the memory arrangement were compatible, and the S-CPU is too weak to render Doom at a decent speed and resolution.

Quote:
One of the leading people who worked on SNES Doom said the main level engine was maintained in C, and if optimized in assembly could have increased overall performance by maybe 30%.

Wow. The S-CPU side was the bottleneck, then?

One of the neat things about the Super FX (that Nintendo offered but nobody ever used) was the concept of CPU ROM - you can add up to 6 MB of extra ROM to the cartridge in parallel with the GSU. The S-CPU can then access that ROM freely without getting in the GSU's way or having to jam everything into WRAM. And since it's in the top half of the memory map, you can use FastROM, which is normally impossible with a Super FX. Combine this with what you've just said, and that's possibly a ~50% speed boost without touching the Super FX code at all (unless that 30% is just how much faster it would be if the GSU were the bottleneck).

Not to mention (as I already have) the space that would be freed up in the main Super FX ROM, which seems to have been a limiting factor.
...

I wonder why they didn't allow circle-strafing? Doesn't seem like it'd be that hard to do...

...

The backdrops might be doable with the second background layer, though the colour depth would suffer. But it might not make much difference - clearing pixels on the Super FX takes nearly as long as drawing an unscaled backdrop (if the backdrop is stored correctly). And while it might be neat to use full resolution, it might contrast oddly with the low-detail 3D and have players reaching for their reading glasses...

...

I'll try the mosaic trick again when I get time. It doesn't seem right that it should be impossible to pull off. (Speaking of things emulators don't support properly...)


Top
 Profile  
 
PostPosted: Fri Oct 06, 2017 6:19 pm 
Offline

Joined: Wed May 19, 2010 6:12 pm
Posts: 2295
What exactly is the mosaic trick?


Top
 Profile  
 
PostPosted: Fri Oct 06, 2017 7:03 pm 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 788
The idea is to be able to draw in double-wide pixels without having to draw and DMA each pixel twice. The pixels in the graphics data would alternate between line N and line N+1, and by manipulating mosaic and scroll with HDMA you turn this:



into this:

▄▄▄▄▄▄▄▄
▀▀▀▀▀▀▀▀

This also allows you to draw two double-wide pixels per cache flush rather than just one when drawing vertical wall strips on a GSU, which doubles your theoretical pixel throughput. It's more complicated, but it could be worth it.

The problem is that mosaic on the SNES behaves really weird; I tried to make this work one evening when I had a few minutes, and I only managed to make it work on Snes9X, not on real hardware. Furthermore, every emulator behaved differently and higan accuracy was wrong.

...

The other problem is that some of the floors/ceilings do use dither for shading, and this scheme would break that. The different floor levels mean you couldn't just use a second layer with HDMA - or at least, if you did it wouldn't look the same. Maybe some sort of hybrid method... But it might still be worth it if the frame rate improved enough (which might also require an overhaul of the SNES-side engine and possibly an augmented cartridge design, judging by what ap9 is saying).

(By the way, that pretty much has to be manual dither. Automatic dither only works in 2bpp and 4bpp. But there's not much in the way of savings to be gained here unless they wrote really bad code, since dumping a full 8bpp pixel buffer to GPRAM takes longer than filling it with manually-alternated colours.)


Last edited by 93143 on Fri Oct 06, 2017 7:48 pm, edited 1 time in total.

Top
 Profile  
 
PostPosted: Fri Oct 06, 2017 7:37 pm 
Online

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19116
Location: NE Indiana, USA (NTSC)
I'd be interested to see your mosaic test ROM to see if it differs between a 1/1/1 and a 2/1/3.


Top
 Profile  
 
PostPosted: Wed Oct 11, 2017 3:39 am 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 788
I'm sorry; I meant to get on this over the weekend, but I didn't. My current test is really primitive and doesn't prove a whole lot, so I'd rather do some more investigating before I post anything.

I'm also starting to wonder if it's possible to just draw two vertical strips at once (assuming the original doesn't already). I know 512 bytes isn't a lot of code, but since the RAM buffer is such a massive bottleneck I imagine it might be possible to write a sufficiently general method without slowing things down. (I'd suggest doing four strips at a time, but I get the feeling it would be too branchy and wouldn't fit in the cache; also, from experience, 15 registers isn't as many as it sounds like.) This would allow you to use a normal framebuffer, permitting dithering on the floors (if it's still necessary to have blank floors with the additional ROM space) and perhaps even drawing distant enemies in full resolution (I've been playing the game a bit and it's shockingly hard to see what's going on).


Top
 Profile  
 
PostPosted: Thu Oct 12, 2017 3:07 pm 
Offline

Joined: Fri Jul 04, 2014 9:31 pm
Posts: 788
93143 wrote:
dithering on the floors (if it's still necessary to have blank floors with the additional ROM space)

Lessee...

Code:
with R7
add Ry
with R8
add Rx
getc
to R14
merge
plot
loop
plot

Ten cycles. At 8bpp, that's enough for two pixels when writing a solid sliver, or one when blitting a non-solid sliver (ie: in neither case does the code execution time exceed the secondary pixel buffer flush time). Apparently in this case, texture mapping is basically free for long enough runs if you can get the initial texture coordinate and increments fast enough.

But that only works if the texture is 256x256, which might be a tad excessive. How about this?

Code:
with R7
add Ry
with R8
add Rx
getc
to R14
merge
with R14
and Rmask
plot
loop
plot

Twelve cycles. 20% slower than the pixel buffer for solid slivers. If the walls are as much of a bottleneck as I suspect, this might be acceptable. Requires the floor/ceiling textures to begin at the start of a bank, which there are only 32 of; you'd have to sacrifice another 20% to add an offset to R14. But honestly, how many floor/ceiling textures are there in Doom? It helps that bit 2 of CMODE allows you to pack two 4bpp textures into the same area of ROM, and bit 3 lets you do palettized 8bpp with 4bpp source data.

The texture format implied by this procedure is kinda fragmented, with a horizontal run every 256 bytes and free space in between. The wall drawing routine probably has more time for fancy stuff, so adding an offset to wall texture coordinates is probably fine.

If you wanted to do distance gradations in the texture colours, you might have to add an offset to the colour before loading it, adding another 20% - unless you could somehow arrange the palette so that just subpalettizing with CMODE bit 3 would do the trick. Wall textures already have lighting and distance gradation effects, but then the walls probably have a ton of extra time to fiddle with the colour indices... Overall the master palette is much more of a garbled mess than the one in the PC version. Possibly this is partly to fit stuff like the gun and hand graphics into 4bpp subpalettes, though that doesn't explain the mess they've made of the bottom half... wait. There are duplicate colours in the SNES palette. Adjacent in gradient runs, no less. Maybe the palette could indeed be optimized better... Megamanning the gun and hand graphics and using 8bpp for Doomguy's face might eliminate some of the clutter too... You know what? I bet they used CMODE bit 3 for the enemies and objects. That could explain the whole mess.

Yeah, this one's a maybe for now. I wonder how much non-GSU data is actually in that ROM?


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 40 posts ]  Go to page Previous  1, 2, 3

All times are UTC - 7 hours


Who is online

Users browsing this forum: No registered users and 3 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB® Forum Software © phpBB Group