SPC7110 Reverse Engineering Project

Discussion of hardware and software development for Super NES and Super Famicom.

Moderator: Moderators

Forum rules
  • For making cartridges of your Super NES games, see Reproduction.
neviksti
Posts: 205
Joined: Thu Jun 22, 2006 11:39 pm

Post by neviksti » Sat Jul 12, 2008 1:59 am

I may have some bad news, but I'll save that for another post as I still haven't convinced myself what is going on yet.
caitsith2 wrote:So far, All of SPL4's mode 0 decompressions, and ALL of MDH mode 0 decompressions are BIT PERFECT. (Other than one isolated case in MDH, where the real hardware decompressor crashed, causing a repeated byte to be output for the rest of that 32KiB run. I have seen that crash before. I bet if I redumped that exact entry, it might be BIT PERFECT as well.)

I believe 100% of the mismatches in the feoez-sjns raw packs, are way past the normal requested size for each of the entries, as are 100% of the feoez mismatches that are past 25 bytes.
Alright!
Thanks for running all those tests.
byuu wrote:Hmm, I have an FEoEZ cart, and two SNES units. I'd be willing to try and perform this mod, though I'm 98% certain I'll just ruin hardware.
I'm amazed you doubt yourself so much.
Nice to hear you got everything going eventually.

As a quick test you can use the ROM I posted in the SF3 Exploit thread in this subforum. Run it in "normal" mode or whatever your copier calls it (the mode which doesn't allow the savestates,etc.), the exploit part of the code will of course do nothing, but you'll still be left with a memory viewer and modifier program running entirely from RAM.
byuu wrote:Thanks again for everything. Getting the SPC7110 emulated was one of my main goals for getting involved in SNES emulation, so you've really made my day/week/month here :)
It's really an amazing "perfect storm" of people happenning to have time to work on this all at once. I haven't checked on this board in probably over a month and happenned upon this thread almost right after it got started. Was anyone notified of this project, or did everyone here just luckily happen upon it?

Amazing.

neviksti
Posts: 205
Joined: Thu Jun 22, 2006 11:39 pm

Post by neviksti » Sat Jul 12, 2008 3:56 am

Okay, I'm still stumped. So let me just lay it on you...

Here's the symptom:
I can't calculate the probability values for each bit. Within 10 or so bits it always fails.

This is very very bad.
Calculating these probabilities depends in no way on the evolution table, the mps prediction, or almost anything except the fact that it is 8-bit arithmetic compression.


My guesses:
I'd be amazed if they didn't use 8-bit arithmetic compression for everything. So let's take that as a given for now. The only additional assumptions I'm making in the code are twofold:
- the symbols encoded correspond one to one with an output bit
- an assumption must be made about the bit ordering.

Looking over previous data again -
http://neviksti.com/SPC7110/firstbyte_1.txt
http://neviksti.com/SPC7110/firstbyte_2.txt
If I number the bits such that mode 0 encoded them in order
0123456789ABCDEF
mode 1 seems to encode them in order
08192A3B4C5D6E7F and I assume then starts over with the next word.

There is only enough data to actually fix the ordering for the first 10 output bits.


I changed the code accordingly anyway, and it fails.
Using input_7030_an1.bin, the probs.mps."what bit actaually was" are:
5A.0.lps 5A.0.lps 25.1.mps 25.1.lps 2D.0.mps 5A.0.mps 0C.1.mps 7.0.mps 3.0.lps ??.1.

So one of those assumptions must be wrong (or there is an implicit assumption I'm making that is wrong). Worse yet, there is enough data to fix the bit order to the bit which the prob calculation failed.

That means, it is this assumption that is wrong:
- the symbols encoded correspond one to one with an output bit

(This still seemed to work for the first four bits though.)


For further evidence/proof try to figure out the bit order for mode 2. There is enough data to show clearly that there is no valid bit order.

(However it seems to work for the first 2 bits.)

;-------------------------------------------------------------------------

So if the symbols we're decoding are not bits, what are they?

Some clues: we saw previously that mode 2 still saturated at roughly 1 bit in -> 128 bits out. So we're encoding the same information just in a different format (ie. we're not encoding symbols of something like "are the remaining symbols of this word all as predicted?", but it could be something like encoding "average" of three bits and then the first two of these bits).

At this point I'd guess we are seeing some kind of layering encoding (encode "averaged" lower resolution image, then use this information to help encode information for the next level resolution, etc.). Mode 0 may be "no layering", mode 1 "1 layering" (which looks slightly messed up when interpretting symbols as bits in the beginning), mode 2 "2 layerings" (which looks seriously messed up when trying to interpret symbols as bits in the beginning).

...
I'll pause now to gather my thoughts.
That document linked earlier with the q-coder table also mentioned such "layering" encodings. So maybe that is it.

Any thoughts?

EDIT: Nothing of substance added, just reworded some things for clarity.
Last edited by neviksti on Sat Jul 12, 2008 6:01 pm, edited 1 time in total.

kammedo
Posts: 57
Joined: Wed May 28, 2008 5:43 am

Post by kammedo » Sat Jul 12, 2008 6:24 am

Shots out of nothing :
"Layering" recalls alot "buffering" (something like LZ?).
Perhaps they had a specialized compression alg for each bitplane? As in
1bpp = mode 0
2bpp = mode 1
etc

jolly_codger
Posts: 14
Joined: Wed Jun 04, 2008 10:38 am

Post by jolly_codger » Sat Jul 12, 2008 7:04 am

Holy poons! Very fantastic work Neviksti (and Andreas Naive).

Doubtful I'll be of any practical help - won't work on this till much later.

IIRC:
(1) = ..? This one throws out zero bytes really fast
(2) = 2/4-bpp SNES bitmaps

The one that recalls to memory from FEOEZ:
(2) 13 C0 89 80 00 00 00 00
-->

Code: Select all

bitplanes
0  1  2  3
00 00 00 FF
00 00 00 FF
(..)
00 00 00 FF
00 00 00 FF

Stupid random thoughts:
What if it kept working on bp0 until a renorm (bit shift)? Then it switches to bp1.

I'm assuming the data isn't byte/word-aligned per plane.
Or it could work per nybble. 4 first output bits = bp0. Then switch state to bp1.


EDIT:

(Thought dump)
(0) We take only 8-bit startup0 cost for the initial 'val'.

(1) FFFFFFFF --> A8C3DD34
Which makes no sense in terms of (0) alone.

Let's say there is an additional setup layer - startup1.
Maybe assume that startup1 > 32 bits for all inputs.
Then afterwards it eventually moves to MODE 0 functioning.

Perhaps there's some mid-area of the bitstream which we can latch onto as the point MODE 0 kicks in. Then we can work backwards from that precise area.

(area when output >= 0x80xxxxxx)
0. A6F7FD8C 818F303E
1. A68E03D6 FF00AFF9
2. A60369EB 8078209C

(peculiar moments)
1. 01020408 00000000
2. 395366DB 00000000

(2) Just thinking that maybe it's MODE 1 x 2 pipelines. Then MODE 0 x 2 pipelines.

2. FFFFFFFF 91C9733F

neviksti
Posts: 205
Joined: Thu Jun 22, 2006 11:39 pm

Post by neviksti » Sun Jul 13, 2008 5:24 am

Okay, we're back on track.
I'm tired and haven't even looked at the data yet besides openning it to make sure the program spit out reasonable looking data, but the point is we now have prob data. On a glance it looks like it uses the same probability evolution table ... so you guys may have this figured out by the time I wake up.

Thinking about it some more I couldn't figure out how each symbol appeared to be a bit in the beginning and failed miserably afterwards. The manner in which it failed was interesting though. Sometimes there would be a bit you could choose which was completely "segregated" in the current TOP - BOT range of inputs. Othertimes there literally was none.

These two seemed impossible together unless:
1] the first few data bits were treated differently (like "startup" data)
2] the only "layering" was into the bit depth of each pixel... not inbetween pixels (like the document with the q-coder suggested)

I didn't have enough data to distinquish these, and the first one seemed less likely. So I changed the SNES program to search for prob assuming #2... using input_7030_an1.bin, it made it all the way through!

This is practially statistically "impossible", unless my guess was correct. So there you go, we know enough about the symbols to extract the real info ... the prob values and (sort of) the mps values.

The way to get around this bizarre depth issue was kind of cludgey in the code, so what I have saved instead is the two bit values at the top of each span. Unfortunately this isn't enough to let you figure out how all four sections are mapped to the 2 bits. I can change this if necessary, but for now figuring out the contexts and how they interleave seems more important.

Here's the data:
http://neviksti.com/SPC7110/output1_7030_an1.other.dat (the .other. to represent the 'other (unintended) path' described below in my edit)

(EDIT: Warning! I just realized there was an error in my code. The first bit of every pixel it always follows the opposite path it should to match the input data. The data is still good though, as it still represents a valid path (and all the data in the file should be correct for that path), but it means the second bit will always end up taking the lps path. I hope this doesn't restrict your ability to learn from the data too much. I'll do a new run after some sleep.)

The format is different from the .bin output files.
For each symbol I now have two bytes:
first byte is prob value
second byte
bit 7 - first bit of 2bit color at top of current input range
bit 6 - second bit of 2bit color at top of current input range
bit 4 - 1 if lps path was taken
bit 3-0 - number of renomalizations before moving onto next symbol


While that long data run went on, I looked through the "first byte" mode 2 data.

snes format has the first pixel of 4bit color stored as such
bit 0 = first bit of output byte 0
bit 1 = first bit of output byte 1
bit 2 = first bit of output byte 16
bit 3 = first bit of output byte 17

Sure enough, looking at the data (only list changes)

Code: Select all

input byte, [output bits]
00 [0000]
13 [0001]
1F [0010]
36 [0011]
4C [0100]
62 [0101]
7A [0110]
8F [0111]
A6 [1000]
B2 [1001]
BD [1010]
C8 [1011]
D3 [1100]
DE [1101]
E9 [1110]
F5 [1111]
Looking at the 00-13 range, I was even able to pick out where the next pixel steps through in the same way.


So we now know what mode 0, mode 1, mode 2 mean, and what they were intended to compress. I bet the probability evolution table is the same. All we need is to figure out precisely how the symbols are used to construct the pixel value, and how the predictions for each symbol are made.

EDIT: Oh who am I kidding, I can't sleep.
Here's the data from the fixed program:
http://neviksti.com/SPC7110/output1_7030_an1.dat

I have no idea how the contexts are chosen, for they are shared within a pixel and even between pixels. Anyone have any ideas?

zino
Posts: 4
Joined: Sun Jul 13, 2008 3:49 pm

Post by zino » Sun Jul 13, 2008 3:56 pm

neviksti wrote:It's really an amazing "perfect storm" of people happenning to have time to work on this all at once. I haven't checked on this board in probably over a month and happenned upon this thread almost right after it got started. Was anyone notified of this project, or did everyone here just luckily happen upon it?

Amazing.
Amazing indeed. I got the urge to check up on Snes9X development after years of staying out of it and for random reasons ended up on nesdev just as this discussion got started. This has really got me interested in doing some development again.

Synchronicity is everyones friend this month it seems.

--
Peter Bortas

neviksti
Posts: 205
Joined: Thu Jun 22, 2006 11:39 pm

Post by neviksti » Sun Jul 13, 2008 10:15 pm

I'm not sure what word to use, as bitplane has another meaning now, but before we discussed in mode 0 that there were 8 "bitplanes" and each could have several contexts to choose from.

Well for mode 1, there appear to only be two "bitplanes". I haven't figured out how many contexts each has yet. Here's the data to support my claim:

I ran the prob calculator with all zeros as inputs, and the .dat output is

Code: Select all

5A C0 5A 41 5A C2 25 40 25 C0 25 40 25 C1 25 40
11 C0 25 41 11 C0 11 40 11 C0 11 40 11 C0 11 41
11 C0 08 40 11 C0 08 40 11 C0 08 40 11 C0 08 40 
11 C1 08 40 08 C0 08 40 08 C0 08 40 08 C0 08 40 
08 C0 08 40 08 C0 08 40 08 C0 08 41 08 C0 03 40 
08 C0 03 40 08 C0 03 40 08 C0 03 40 08 C0 03 40 
08 C0 03 40 08 C0 03 40 08 C0 03 40 08 C0 03 40 
08 C0 03 40 08 C0 03 40 08 C1 03 40 03 C0 03 40

byuu
Posts: 1539
Joined: Mon Mar 27, 2006 5:23 pm
Contact:

Post by byuu » Sun Jul 13, 2008 11:00 pm

I don't mean to throw off the important discussion here, but for what it's worth, I have the rest of the SPC7110 (eg non-compression commands) emulated "correctly" now. When I have time, I'll be sure to refine the unknown cases by running some hardware tests.

http://byuu.cinnamonpirate.com/images/bs_275.png
http://byuu.cinnamonpirate.com/images/bs_276.png

Adding the finished compressor should only take a few minutes, and I can add any work-in-progress ones if you'd like to see the results in-game. Which brings me to a question: neviksti, I'll respect whatever license you want to use for your decompressor, but would I have your permission to use it? Ideally, I'd like to release my part of the SPC7110 class under the 2-clause BSD license and/or to the public domain. The decompressor portion can be any license you'd like, of course.

neviksti
Posts: 205
Joined: Thu Jun 22, 2006 11:39 pm

Post by neviksti » Mon Jul 14, 2008 1:15 am

byuu wrote:I don't mean to throw off the important discussion here, but for what it's worth, I have the rest of the SPC7110 (eg non-compression commands) emulated "correctly" now. When I have time, I'll be sure to refine the unknown cases by running some hardware tests.

http://byuu.cinnamonpirate.com/images/bs_275.png
http://byuu.cinnamonpirate.com/images/bs_276.png

Adding the finished compressor should only take a few minutes, and I can add any work-in-progress ones if you'd like to see the results in-game.
That's great news.
After the high of completing the first mode decompression, I must admit I'm slowing down quite a bit. We should get it all eventually though.
byuu wrote:Which brings me to a question: neviksti, I'll respect whatever license you want to use for your decompressor, but would I have your permission to use it? Ideally, I'd like to release my part of the SPC7110 class under the 2-clause BSD license and/or to the public domain. The decompressor portion can be any license you'd like, of course.
Oh I don't care about any of that liscence stuff. If someone can make sense of my code and use it, great for them. Do with it as you please.

Which reminds me, what ever happenned to the DSP-1 stuff? Maybe we should finish up that at some point too.

;-----------------------------------------

Alright, looking at output1_7030_an1.dat and output1_7030_an1.other.dat, both require 5 contexts for bitplane0 and 10 contexts for bitplane1 to explain the probability values.

Such bizarre numbers. The 10 could possibly be explained by 2 contexts for each of the 5 bitplane0 contexts. Actually, I'd venture a guess it is selected by 5*(bit value of bitplane0) + context of bitplane0, since after all this is the remaining bit of the color for that pixel... most details of context selection are probably already done once bitplane0 context is chosen.

Regardless, it is likely bitplane1 context selection will not be possible to sift through until bitplane0 context selection is understood. So let's focus on that. First, why 5? I have no clue what to make of that. As a prime number it can't be a series of bits it considers, unless it uses those to look up in a table, and then the table entries are only 0-4.

You'd want prediction of the pixel to be based on surrounding pixels, and for each pixel bit we have the following data:
- actual bit value
- encoded as mps or lps
- context # of bit
- the "invert" state of that context

Since we don't have the "answers" of which prob values are from which context, I'm not sure the best way to continue. Before, we focussed on "bitplane1" since it only had two contexts, which made it possible to figure out which contexts were used for each prob value ... but with 10, I'm not sure how we'd accomplish that.

Any ideas on the best way to continue here?

tepples
Posts: 21751
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Post by tepples » Mon Jul 14, 2008 6:49 am

neviksti wrote:
byuu wrote:neviksti, I'll respect whatever license you want to use for your decompressor, but would I have your permission to use it? Ideally, I'd like to release my part of the SPC7110 class under the 2-clause BSD license and/or to the public domain. The decompressor portion can be any license you'd like, of course.
Oh I don't care about any of that liscence stuff. If someone can make sense of my code and use it, great for them. Do with it as you please.
I'll take that as a WTFPL.
neviksti wrote:Which reminds me, what ever happenned to the DSP-1 stuff? Maybe we should finish up that at some point too.
Is the DSP-1 already documented well enough that an emulator running on the Nintendo DS could high-level emulate it effectively? A lot of people on Some Other Forum want some reasonable facsimile of Super Mario Kart, and not just the 20 percent of tracks that made it into Mario Kart DS.

Andreas Naive
Posts: 104
Joined: Mon Nov 26, 2007 2:06 am
Location: Madrid, Spain
Contact:

Post by Andreas Naive » Mon Jul 14, 2008 9:00 am

Wow, it seems this is going fast. :P
:
Which reminds me, what ever happenned to the DSP-1 stuff? Maybe we should finish up that at some point too.
Is the DSP-1 already documented well enough that an emulator running on the Nintendo DS could high-level emulate it effectively? A lot of people on Some Other Forum want some reasonable facsimile of Super Mario Kart, and not just the 20 percent of tracks that made it into Mario Kart DS.
Timing emulation is the main lacking behaviour. Some bits of information about initializations and 'glue' code maybe should be checked, but the opcodes are thought to be bit perfect. You can take a look at them in bsnes's code or here:

http://www.romhacking.net/docs/320/
So we now know what mode 0, mode 1, mode 2 mean, and what they were intended to compress.
Ummm, so (in case anyone cares) the SPC7110-S-DD1 equivalences would be

Code: Select all

SPC7110       S-DD1
Type 00       Type 11
Type 01       Type 00
Type 10       Type 10
First, why 5?
The patent use 5 "states" determined by the surrounding pixels in the following way: once you have chosen three surrounding pixels A,B,C (it used the pixels to the left, up and left-up corner, but this could change, as occured with the S-DD1 by using the top-right pixel too or whatever combination --figure 12 in the patent--) you determine a 5-fold state in the following way (figures 6, 7, 15A & 15B in the patent):

A=B=C -> 0
A=B!=C -> 1
A=C!=B -> 2
B=C!=A -> 3
A!=B!=C!=A -> 4

tepples
Posts: 21751
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Bell numbers

Post by tepples » Mon Jul 14, 2008 9:21 am

Andreas Naive wrote:
First, why 5?
The patent use 5 "states" determined by the surrounding pixels in the following way: once you have chosen three surrounding pixels A,B,C (it used the pixels to the left, up and left-up corner, but this could change, as occured with the S-DD1 by using the top-right pixel too or whatever combination --figure 12 in the patent--) you determine a 5-fold state in the following way (figures 6, 7, 15A & 15B in the patent):

A=B=C -> 0
A=B!=C -> 1
A=C!=B -> 2
B=C!=A -> 3
A!=B!=C!=A -> 4
So it appears the number of states for an n pixel context would be equal to the number of ways to partition a set of n elements. This is called the nth Bell number B[n], and B[3] is 5.

Just my two bells...

Andreas Naive
Posts: 104
Joined: Mon Nov 26, 2007 2:06 am
Location: Madrid, Spain
Contact:

Post by Andreas Naive » Mon Jul 14, 2008 10:34 am

OK. By applying a "quick sieve" method, i have come to the conclusion that the 3 pixels used in bitplane #0 are, counting backwards, #2, #8 & #9 (that is, the second pixel to the left, the top one and the top-left corner). It's strange that the first pixel to the left is unused.

I still haven't followed the Qe evolution for the contexts, so take this with a bit of salt, as i could have jumped to a conclusion too soon like the last time. I will double check it now.

EDITED: Double checked. I have followed the Qe evolution all the way down the output1_7030_an1.dat file. For debug purposes, here you have the output of my program (you can see the 5 contexts for bitplane #0).
http://andreasnaive.en.eresmas.com/SPC7 ... _2.txt.zip

neviksti
Posts: 205
Joined: Thu Jun 22, 2006 11:39 pm

Post by neviksti » Mon Jul 14, 2008 11:18 am

Andreas Naive wrote:
Which reminds me, what ever happenned to the DSP-1 stuff? Maybe we should finish up that at some point too.
Timing emulation is the main lacking behaviour. Some bits of information about initializations and 'glue' code maybe should be checked, but the opcodes are thought to be bit perfect. You can take a look at them in bsnes's code or here:

http://www.romhacking.net/docs/320/
Wow! I had no idea.
Overload didn't update the DSP page ( http://users.tpg.com.au/advlink/dsp/dsp1.html ) so I thought we still had an opcode to go.

Great job!
Andreas Naive wrote:
First, why 5?
The patent use 5 "states" determined by the surrounding pixels in the following way: once you have chosen three surrounding pixels A,B,C (it used the pixels to the left, up and left-up corner, but this could change, as occured with the S-DD1 by using the top-right pixel too or whatever combination --figure 12 in the patent--) you determine a 5-fold state in the following way (figures 6, 7, 15A & 15B in the patent)
Ah, thank you. I completely forgot about the patent already.
I'll read through that tonight.
Andreas Naive wrote:OK. By applying a "quick sieve" method, i have come to the conclusion that the 3 pixels used in bitplane #0 are, counting backwards, #2, #8 & #9 (that is, the second pixel to the left, the top one and the top-left corner). It's strange that the first pixel to the left is unused.

I still haven't followed the Qe evolution for the contexts, so take this with a bit of salt, as i could have jumped to a conclusion too soon like the last time. I will double check it now.

EDITED: Double checked. I have followed the Qe evolution all the way down the output1_7030_an1.dat file. For debug purposes, here you have the output of my program (you can see the 5 contexts for bitplane #0).
http://andreasnaive.en.eresmas.com/SPC7 ... _2.txt.zip
Wow, you'll probably be done with this mode before I even get home tonight :)

Reworking the prob calculator for mode 2 will be super cludgey. Maybe we'll learn enough from mode 1 to not even need it, but if we do I'll need to think of a better way of rewritting it. (It's probably going to run _real_ slow.)

byuu
Posts: 1539
Joined: Mon Mar 27, 2006 5:23 pm
Contact:

Post by byuu » Mon Jul 14, 2008 1:57 pm

Derailing once again, I am ... :/
Which reminds me, what ever happenned to the DSP-1 stuff?
Someone asked if I was planning on emulating it, and I responded that it wasn't "even" bit-perfect, so I wanted to avoid it. Essentially, it came off very rude (when you considered how much effort went into where we were at thus far,) which really wasn't my intention. I'm rather bad with words like that (ask Dark Force sometime.) I apologize again about that, Andreas and co. Essentially, I'm just overly pessimistic about emulation in general. I get too caught up in perfection. I criticize my own work the same way (consider it a failure, especially the PPU emulation), but it's never cool to be anything but humbly grateful for work others have done.

But Andreas was really cool, and went back and finished the last bit for me, and even made a really spiffy C++ interface class for me!! It was literally almost drop-in and compile. Awesome guy.

So, right now, only two issues remain:

1) we need timing emulation. Suzaku 8 Hours has some rather serious issues (more on zboards somewhere), and Mario Kart has lesser timing issues going on. The problem with emulating this is that most certainly, the length of time to complete each opcode is going to vary based on how complex the inputs are. The best we could do would be an approximation. And even at that, to do this, we'd have to implement the DSP as a coprocessor, rather than as a memory read/write device. This could substantially slow down emulation, for something we can't even get very accurate. Hence I've avoided trying.

2) there's a chance you can read back partially computed results before the operations finish, ala the SNES mul/div registers. I'd shudder to try and emulate that.

Realistically, the only way we're going to get the DSP-1 bit-perfect and time-perfect, would be to dump the program ROM, and then emulate the processor it uses. It would then have overhead similar to the SuperFX and SA-1 add-ons. Ouch. But if someone managed to dump the PROMs, I'd give willing to give it a shot :)

Post Reply