fullsnes - new SNES hardware specs

Discussion of hardware and software development for Super NES and Super Famicom.

Moderator: Moderators

Forum rules
  • For making cartridges of your Super NES games, see Reproduction.
User avatar
Posts: 7884
Joined: Fri Nov 12, 2004 2:49 pm
Location: Chexbres, VD, Switzerland

Post by Bregalad » Fri May 18, 2012 1:58 pm

Holy shit. Our entire SPC7110 emulation is substantially wrong in so many, many ways ... this is the most poorly emulated of coprocessor of all at this point, even beating out the terrible SA-1 knowledge.
Should I conclude the Super-FX is well known ? Because this chip always fascinated me and I always wondered how it works internally.
"Mode 7" effects on sprites kicks ass.
Life is complex: it has both real and imaginary components.

Founder of higan project
Posts: 1550
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near » Fri May 18, 2012 9:10 pm

I started another thread to discuss the decompression. It's my opinion at this point that our evolution tables are okay.
I think the problem is that the chip is 'glitching' out, just like the S-DD1 did. Feed it a certain input pattern and it can't handle it and goes screwy for the rest of that decompression.
Usually it would end up repeating the same byte forever (even after 8MB of reads) until the next decompression.
My bet is that Epson provided a compressor that could detect when the input would cause a problem and force a 'renormalize' to avoid it.
For whatever reason, $4807 -appears- to greatly exacerbate this behavior, as my output logs show errors much faster this way.
I believe this error is what throws off my algorithms below, because their first mismatch appears to be the first mismatch read from an offset that would fail on $480b=0.
So ... we really need to emulate this 'edge case' that breaks the SPC7110, but I don't have a clue how to do that. I put up a $100 bounty if anyone can figure it out.

That said, here's the final word on $4807. Consider it a "length" parameter, and $480b.d0 is the enable setting.
length = $480b.d0 & 1 ? $4807 : 1; //$4807 can be zero

The decompressor has to be modified to handle this mode.
First, consider that the modes work like this:
Mode 0 = 1bpp 8x8 (8 bytes)
Mode 1 = 2bpp 8x8 (16 bytes)
Mode 2 = 4bpp 8x8 (32 bytes)
Mode 3 = invalid (doesn't start the decompression so you always get 0x00)
Modes 4-255 = mirrors of 0-3 (decompressor only looks at low two bits.)
Now every time the buffer is empty (which it obviously will be at the start of a new decompression), what you want to do is load one tile worth of -output- data from the appropriate mode. To do this, you need to generate a number of input tiles first.
So next, load max(1, length) tiles in to dcu_tiledata[256 * 32];
Yes, even if length was zero, what ends up happening is the first byte of decompressed output repeats forever, which may not be 0x00, it's up to the data.
Now that you have your tiles, call the current mode's below deinterleave function to generate one output tile from your input tile pool into dcu_output[32];
from here, $4800 can be read until it's empty (its size is based on the mode: 8, 16 or 32 bytes of data will be in here.)
The nice thing about doing it this way is you get rid of bitplanebuffer entirely. You can write to direct offsets during decompression now, so it's unnecessary.
After doing all of this, the behavior of $4807 finally becomes rather simple:

Code: Select all

void SPC7110::deinterleave_1bpp(unsigned length) {
  uint8 *target = dcu_output, *source = dcu_tiledata;
  for(unsigned row = 0, sp = 0; row < 8; row++) {
    target[row] = source[sp];
    sp += length;

void SPC7110::deinterleave_2bpp(unsigned length) {
  uint8 *target = dcu_output, *source = dcu_tiledata;
  for(unsigned row = 0, sp = 0; row < 8; row++) {
    target[row * 2 + 0] = source[sp + 0];
    target[row * 2 + 1] = source[sp + 1];
    sp += 2 * length;

void SPC7110::deinterleave_4bpp(unsigned length) {
  uint8 *target = dcu_output, *source = dcu_tiledata;
  for(unsigned row = 0, sp = 0; row < 8; row++) {
    target[row * 2 +  0] = source[sp +  0];
    target[row * 2 +  1] = source[sp +  1];
    target[row * 2 + 16] = source[sp + 16];
    target[row * 2 + 17] = source[sp + 17];
    //the purpose of this is that every time the address crosses over a 16-byte boundary, we add 16 more (so it crosses a 32-byte boundary always.)
    //this allows us to never copy the same source word to more than one target word
    sp = sp + 2 * length + 16 * ((sp + 2 * length) / 16 - sp / 16);
Now here's the thing. Deinterleave is the best name I can come up with for what this is doing.
Although the actual effect on using it with data appears to be to -interleave- it, my only theory is that this is meant to compress data that was interleaved in raw form.
My guess is that somehow, these functions will shrink the size of the compressed data size in certain cases; but I honestly can't see how.

So by all means ... if anyone can make sense of how this would -ever- be useful for -anything-, please let me know, so that we can name the function better.


Anyway, I still need to write emulation of the invalid BCD increment behavior on the RTC, and test some data port behaviors (want to verify the control settings register.)
Once that's done, I'll try and get a final doc up on how this all works. It'll be a monster, though. This chip is the epitome of edge case behavior.

> You've seen my pseudo-code decompression functions, don't you?

They're a bit tricky to follow. I've actually already done a lot of simplifications on the original algorithm (which was designed to show you how the chip works), but it's tricky code.

> Timing might be also a problem, at least when using DMAs which expects the decompressed bytes to be ready within 8 clks. But as far as I understood, you are already using non-DMA reads for that purpose? Does that make a visible difference? More distorted results with DMA, and less distorted without DMA?

Yes, I was using direct reads. When it comes to $4807=#$ff with mode 2, that would seem to require the SPC7110 to really go into hyperdrive. That's basically 255 bytes that need to be decompressed for every read to $4800.

I am betting heavily that a DMA will result in seeing the same value repeated when you read $4800 too quickly, but I haven't gotten to that yet. Kind of afraid of that, to be honest. We likely need the decompression context to be threaded, and consume cycles for each byte output.

> Should I conclude the Super-FX is well known ? Because this chip always fascinated me and I always wondered how it works internally.

Well, we are using the official mnemonics for the SuperFX opcodes, so take from that what you will. The SuperFX has less added functionality than the SA-1. The latter seemed like Nintendo was holding a contest for who could come up with the most functions to add to the chip. It's so bad that ~30-40% of the chip's functionality are never even used by ANY SA-1 games. Plus SFX doesn't allow both the CPU and chip itself to access the same memory at the same time. The SA1 does this by stalling the opcode cycles on the SA1 side when there would otherwise be a conflict. I'm sure both SFX and SA1 are missing a lot of edge case behavior, though.

After I get the RTC BCD and data port last bits in, I'll be happy enough to consider the SPC7110 better emulated than the SFX and SA1.

Posts: 1211
Joined: Fri Feb 24, 2012 12:09 pm

Post by nocash » Sat May 19, 2012 11:16 am

> when you read $4800 too quickly, but I haven't gotten to that yet.
> Kind of afraid of that, to be honest.
Yes, with DMA timings, effects may be frightening. But what I meant was, did you make sure that those effects do NOT occur when NOT using DMA?

Maybe there's always a timeout after 8 clks... or maybe a limit on the "while" loop to execute only max so-and-so-many times.

For RTCs there are two ways to emulate them:
In sync with the SNES CPU: pausing the RTC when emu is paused, and running RTC at double speed when emu is set to run at double speed. From the emulation-view that'd be most accurate.
Or, in sync with the PC's RTC: Simply ignore anything written to the RTC registers, and always copy the current PC time to the SNES time. I think, from the user's view, that would appear more accurate.
In so far, I wouldn't emulate the RTC edge-cases at all. But anyways, knowing the behaviour on things like invalid BCD values would be interesting.

Founder of higan project
Posts: 1550
Joined: Mon Mar 27, 2006 5:23 pm

Post by Near » Sat May 19, 2012 4:06 pm

> But what I meant was, did you make sure that those effects do NOT occur when NOT using DMA?

Aside from having a very long pause between each byte transfer to send to my PC over serial UART (~17KB/s decompression rate), no.

I suppose I can stall it out 4x as long to see if there is any difference in output.

> For RTCs there are two ways to emulate them:

I know of three.

1. use PC time. This is the least accurate. It will ignore your in-game features like time setting and 30-second adjust (round to nearest minute, a crude 'slight clock drift' fix.) Games will easily be able to detect this if you use save states, or slower/faster than normal emulation. And most importantly of all, FEoEZ limits you to 1995-2014. It has no events after that. Although you can push it to 2094 manually, I don't know what will happen if you do. And although there isn't a person that will care about this chip in 2094, an emulator that does this will fail then. I confess you have to be hardcore to worry about an event guaranteed to be after your own death.

2. let the program code set the RTC time, but only tick the RTC time when a second has passed on your PC. When you unload the game, save the timestamp. When you load the game, add time - timestamp seconds to the RTC clock. The nice thing is this lets you set any time you want in the game, so the time setting stuff isn't useless. The 30-second adjust also works. And importantly, this is needed to pass the SPC7110-RTC self-test without cheating. It's pretty good about keeping your time accurate across runs of the emulator. But this one is also easily detectable via save states and speed adjustments.

3. total RTC emulation. Let the program code set the RTC time, and tick the RTC every one emulated second. Save states save the time, load states load the time. You still do the save timestamp on unload, and time - timestamp adjust on load. 30-second adjust of course works, and the game is completely unable to detect any issues with the clock. The big downside here is that the clock desyncs very easy when you start to cheat or speed through dialogue or whatever.

I go with option 3. Have to put accuracy before convenience. However, I'm planning to include a time-adjust tool that lets you edit the saved RTC clock to any time you want. Or perhaps I'll just store the saved RTC time in plain text or XML. Then you can just hand-edit it.

Posts: 21985
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples » Sat May 19, 2012 5:25 pm

Meet Chester.
This was my character back when I played Animal Crossing.

You could do what Animal Crossing does: store the offset between the in-game time and the system time. In Animal Crossing, I can't adjust the actual RTC time, which is protected by the system menu's privileges, but I can adjust the in-game time. This is saved as an offset (in seconds?) between the in-game time and the system time. I've used this as a time zone offset when going to different towns.

If you implement an RTC using a time zone offset, you can save the offset whenever the player sets the RTC and then load the offset when loading a ROM. That way you get the best of #2 and #3: one RTC tick for every 21.5 million master clock cycles and none of the time-warping having an effect over power cycles.

Posts: 224
Joined: Mon Jan 23, 2006 7:47 am
Location: Germany


Post by creaothceann » Sat May 11, 2019 5:15 am

nocash wrote:
Lemmings 2 supports the super scope for shooting lemmings, but it does not appear to have software calibration.
Whew, one more rare scope-game. Hmmmm, calibration... it seems to randomly add/subtract different offsets each time when starting the game... ah, got it! Works like so: When the game starts, you need to aim the gun at the crosshair; that needs to be done quickly (within the first 1-2 seconds or so).
EDIT: With the black screen background & and thin crosshair, I am wondering if the calibration does actually work on real hardware - did anybody ever try that?
More fun facts about the SNES version of Lemmings: https://youtu.be/ybs5FR-uUNI?t=2182
My current setup:
Super Famicom ("2/1/3" SNS-CPU-GPM-02) → SCART → OSSC → StarTech USB3HDCAP → AmaRecTV 3.10

Post Reply