Page 1 of 2

Big Bird's Hide and Speak sample compression

Posted: Sat Mar 03, 2012 8:13 pm
by rainwarrior
Tepples made a comment about this game's samples being a little bit obscure and I guess I took it as a dare to figure it out.

Python decoder:
http://rainwarrior.ca/projects/nes/bigbird_samples.py

Dumped samples:
http://rainwarrior.ca/projects/nes/Sesa ... 0(USA).wav

So, the samples actually decode to 8-bit samples, which is a bit bizarre. They just get an LSR before writing $4011. The reason it's so strange is because the compressed sample format is a fixed 5-bits per sample stream which index a 32 byte lookup table for the output; at the beginning of the stream, and at every 20th sample, the lookup table is reset with a 4-bit selection code; there are 16 different lookup tables which map to different ranges of output samples. So... they're decompressing 5-bit samples into 8-bit samples then throwing away a bit, so really they're only getting a compression ratio a bit worse than 5/7. A 5-bit value of 0 halts the sample and returns from playback.

I figured out the locations of samples by hand; there's a bunch of contiguous blocks. (Not sure if I missed any.) There's a mechanism to load and play a sample by a 3-byte pointer (basically bank select + pointer + a few extra bits of data). All of these pointers are stored in bank 2 but some of them are contiguous, some are not; there's not convenient table here. Where there's contiguous ones, they tend to splice words/sentences together out of the consecutive sounds.

As Tepples mentioned, the consonant sounds are often separated (and I think are used as common to many words). Some samples are broken into parts, I think so playback can return for an instant to do animation or something else quickly. The alphabet gets really weird; they tend to be stored in strings of ~30 sample blocks, or other strange combinations.

The following code is the decoding loop. It reads and outputs 8 5-bit samples from 5 bytes of memory, selecting a new lookup table every 20 bytes. It also bankswitches if the end of a bank is reached. Note that the code is interspersed with time wasting NOPs and JSRs to keep the samplerate consistent.

Code: Select all

;-------------------------------------------------------------------------------
; entry into sample playback (MMC1 bank has been selected, ($02) is address of sample)
; sample playback also returns here to reset the lookup table every 13 bytes or so
; this first block selects the sample output lookup table
;-------------------------------------------------------------------------------
__f378:     LDY #$01           ; $f378: a0 01     
            LDA ($02),y        ; $f37a: b1 02     ; load $02+1 ---0
            LSR                ; $f37c: 4a        
            DEY                ; $f37d: 88        
            LDA ($02),y        ; $f37e: b1 02     ; load $02+0 210-
            ROL                ; $f380: 2a        
            AND #$0f           ; $f381: 29 0f     ; table = 4 bit value in A
            STY $09            ; $f383: 84 09     ; $09 = 0 (used later)
            ASL                ; $f385: 0a        
            ASL                ; $f386: 0a        
            ASL                ; $f387: 0a        
            ASL                ; $f388: 0a        
            ASL                ; $f389: 0a        ; A = table 210----- (low 3 bits as high 3 bits of A)
            ROL $09            ; $f38a: 26 09     ; store table bit 3 in $09 temporarily
            ADC #$17           ; $f38c: 69 17     ; A = A + $17 (tables all start at +$17 offset, note carry = 0 here)
            STA $06            ; $f38e: 85 06     ; store low byte of table address
            LDA $09            ; $f390: a5 09     ; A = table -------3 (high bit)
            ADC #$fb           ; $f392: 69 fb     ; A = A + $FB (FB17 is address of lowest table, 4 byte value selects from 16 tables)
            STA $07            ; $f394: 85 07     ; store high byte, ($06) now stores table
            LDX #$03           ; $f396: a2 03     ; X = 3 (return here after 3 loops)
;-------------------------------------------------------------------------------
; load the first sample before entering 5-byte / 8-sample loop
;-------------------------------------------------------------------------------
            LDA ($02),y        ; $f398: b1 02     ; load $02 76543
            LSR                ; $f39a: 4a        
            LSR                ; $f39b: 4a        
            LSR                ; $f39c: 4a        
            BEQ __f40a         ; $f39d: f0 6b     ; terminating 0
            INC $02            ; $f39f: e6 02     ; inc $02
            BNE __f3a8         ; $f3a1: d0 05     
            INC $03            ; $f3a3: e6 03     
            JMP __f3ad         ; $f3a5: 4c ad f3  
__f3a8:     NOP                ; $f3a8: ea        
            NOP                ; $f3a9: ea        
            JMP __f3ad         ; $f3aa: 4c ad f3  
__f3ad:     JSR __f520         ; $f3ad: 20 20 f5  ; write sample
            JMP __f3b3         ; $f3b0: 4c b3 f3  ; return to top of loop
;-------------------------------------------------------------------------------
; top of sample decoding loop
;-------------------------------------------------------------------------------
__f3b3:     LDA ($02),y        ; $f3b3: b1 02     ; load $02 54321
            LSR                ; $f3b5: 4a        
            AND #$1f           ; $f3b6: 29 1f     
            BEQ __f40a         ; $f3b8: f0 50     ; terminating 0
            JSR __f514         ; $f3ba: 20 14 f5  ; delay
            NOP                ; $f3bd: ea        
            NOP                ; $f3be: ea        
            NOP                ; $f3bf: ea        
            NOP                ; $f3c0: ea        
            NOP                ; $f3c1: ea        
            NOP                ; $f3c2: ea        
            JSR __f520         ; $f3c3: 20 20 f5  ; do sample
;-------------------------------------------------------------------------------
            LDA ($02),y        ; $f3c6: b1 02     ; load $02 ---76
            ASL                ; $f3c8: 0a        
            ROL                ; $f3c9: 2a        
            ROL                ; $f3ca: 2a        
            AND #$03           ; $f3cb: 29 03     
            STA $09            ; $f3cd: 85 09     
            JSR __f50a         ; $f3cf: 20 0a f5  ; inc $02
            LDA ($02),y        ; $f3d2: b1 02     ; load $02 210--
            AND #$07           ; $f3d4: 29 07     
            ASL                ; $f3d6: 0a        
            ASL                ; $f3d7: 0a        
            ORA $09            ; $f3d8: 05 09     
            BEQ __f40a         ; $f3da: f0 2e     ; terminating 0
            JSR __f51d         ; $f3dc: 20 1d f5  ; delay
            NOP                ; $f3df: ea        
            NOP                ; $f3e0: ea        
            NOP                ; $f3e1: ea        
            NOP                ; $f3e2: ea        
            NOP                ; $f3e3: ea        
            NOP                ; $f3e4: ea        
            NOP                ; $f3e5: ea        
            JMP __f3e9         ; $f3e6: 4c e9 f3  
__f3e9:     JSR __f520         ; $f3e9: 20 20 f5  ; do sample
;-------------------------------------------------------------------------------
            LDA ($02),y        ; $f3ec: b1 02     ; load $02 76543
            LSR                ; $f3ee: 4a        
            LSR                ; $f3ef: 4a        
            LSR                ; $f3f0: 4a        
            BEQ __f40a         ; $f3f1: f0 17     ; terminating 0
            JSR __f50a         ; $f3f3: 20 0a f5  ; inc $02
            JSR __f517         ; $f3f6: 20 17 f5  ; delay
            JMP __f3fc         ; $f3f9: 4c fc f3  
__f3fc:     NOP                ; $f3fc: ea        
            JSR __f520         ; $f3fd: 20 20 f5  ; do sample
;-------------------------------------------------------------------------------
; every third pass through the loop we reset the table and restart
;-------------------------------------------------------------------------------
            DEX                ; $f400: ca        ; decrement X
            BNE __f406         ; $f401: d0 03     
            JMP __f378         ; $f403: 4c 78 f3  ; reset the lookup table if X = 0
;-------------------------------------------------------------------------------
__f406:     LDA ($02),y        ; $f406: b1 02     ; load $02 43210
            AND #$1f           ; $f408: 29 1f     
__f40a:     BEQ __f483         ; $f40a: f0 77     ; terminating 0
            JSR __f514         ; $f40c: 20 14 f5  ; delay
            NOP                ; $f40f: ea        
            NOP                ; $f410: ea        
            NOP                ; $f411: ea        
            NOP                ; $f412: ea        
            NOP                ; $f413: ea        
            NOP                ; $f414: ea        
            JSR __f520         ; $f415: 20 20 f5  ; do sample
;-------------------------------------------------------------------------------
            LDA ($02),y        ; $f418: b1 02     ; load $02 --765
            ASL                ; $f41a: 0a        
            ROL                ; $f41b: 2a        
            ROL                ; $f41c: 2a        
            ROL                ; $f41d: 2a        
            AND #$07           ; $f41e: 29 07     
            STA $09            ; $f420: 85 09     
            JSR __f50a         ; $f422: 20 0a f5  ; inc $02
            LDA ($02),y        ; $f425: b1 02     ; load $02 10---
            AND #$03           ; $f427: 29 03     
            ASL                ; $f429: 0a        
            ASL                ; $f42a: 0a        
            ASL                ; $f42b: 0a        
            ORA $09            ; $f42c: 05 09     
            BEQ __f483         ; $f42e: f0 53     ; terminating 0
            NOP                ; $f430: ea        
            NOP                ; $f431: ea        
            NOP                ; $f432: ea        
            NOP                ; $f433: ea        
            NOP                ; $f434: ea        
            NOP                ; $f435: ea        
            JMP __f439         ; $f436: 4c 39 f4  
__f439:     JSR __f520         ; $f439: 20 20 f5  ; do sample
;-------------------------------------------------------------------------------
            LDA ($02),y        ; $f43c: b1 02     ; load $02 65432
            LSR                ; $f43e: 4a        
            LSR                ; $f43f: 4a        
            AND #$1f           ; $f440: 29 1f     
            BEQ __f483         ; $f442: f0 3f     ; terminating 0
            JSR __f514         ; $f444: 20 14 f5  ; delay
            NOP                ; $f447: ea        
            NOP                ; $f448: ea        
            NOP                ; $f449: ea        
            NOP                ; $f44a: ea        
            NOP                ; $f44b: ea        
            JMP __f44f         ; $f44c: 4c 4f f4  
            JSR __f520         ; $f44f: 20 20 f5  ; do sample
;-------------------------------------------------------------------------------
            LDA ($02),y        ; $f452: b1 02     ; load $02 ----7
            ASL                ; $f454: 0a        
            JSR __f50a         ; $f455: 20 0a f5  ; inc $02
            LDA ($02),y        ; $f458: b1 02     ; load $02 3210-
            ROL                ; $f45a: 2a        
            AND #$1f           ; $f45b: 29 1f     
            BEQ __f483         ; $f45d: f0 24     ; terminating 0
            JSR __f517         ; $f45f: 20 17 f5  ; delay
            JSR __f520         ; $f462: 20 20 f5  ; do sample
;-------------------------------------------------------------------------------
            LDY #$01           ; $f465: a0 01     
            LDA ($02),y        ; $f467: b1 02     ; load $02+1 0----
            LSR                ; $f469: 4a        
            DEY                ; $f46a: 88        
            LDA ($02),y        ; $f46b: b1 02     ; load $02+0 -7654
            JSR __f50a         ; $f46d: 20 0a f5  
            ROR                ; $f470: 6a        
            LSR                ; $f471: 4a        
            LSR                ; $f472: 4a        
            LSR                ; $f473: 4a        
            BEQ __f483         ; $f474: f0 0d     ; terminating 0
            JSR __f549         ; $f476: 20 49 f5  ; triggers a bankswitch if appropriate
            NOP                ; $f479: ea        
            NOP                ; $f47a: ea        
            NOP                ; $f47b: ea        
            NOP                ; $f47c: ea        
            JSR __f520         ; $f47d: 20 20 f5  ; do sample
            JMP __f3b3         ; $f480: 4c b3 f3  ; loop
;-------------------------------------------------------------------------------
; when a 0 sample is read, this handles the very last sample
;-------------------------------------------------------------------------------
__f483:     JSR __f50a         ; $f483: 20 0a f5  ; do last sample
            RTS                ; $f486: 60        ; end of loop

;...............................................................................
; purpose of $f487 - $f509 not known, omitted
;...............................................................................

;-------------------------------------------------------------------------------
; increment ($02) address (sample data pointer)
;-------------------------------------------------------------------------------
__f50a:     INC $02            ; $f50a: e6 02     
            BNE __f511         ; $f50c: d0 03     
            INC $03            ; $f50e: e6 03     
            RTS                ; $f510: 60        
__f511:     NOP                ; $f511: ea        
            NOP                ; $f512: ea        
            RTS                ; $f513: 60        

;-------------------------------------------------------------------------------
; various delay subroutines
;-------------------------------------------------------------------------------
__f514:     JSR __f51d         ; $f514: 20 1d f5  
__f517:     JSR __f51d         ; $f517: 20 1d f5  
__f51a:     JSR __f51d         ; $f51a: 20 1d f5  
__f51d:     NOP                ; $f51d: ea        
            NOP                ; $f51e: ea        
__f51f:     RTS                ; $f51f: 60        

;-------------------------------------------------------------------------------
; looks up a table value from the table at ($06) based on the value in A
; delays for a bit, then stores the value (right shifted by 1) in 4011
;-------------------------------------------------------------------------------
__f520:     TAY                ; $f520: a8        
            LDA ($06),y        ; $f521: b1 06     ; lookup sample
            JMP __f53f         ; $f523: 4c 3f f5  
            .hex 00 00 00 00   ; $f526: 00 00 00 00 ; empty space in code?
            .hex 00 00 00 00   ; $f52a: 00 00 00 00
            NOP                ; $f52e: ea        
__f52f:     NOP                ; $f52f: ea        
__f530:     NOP                ; $f530: ea        
            NOP                ; $f531: ea        
            NOP                ; $f532: ea        
            NOP                ; $f533: ea        
            NOP                ; $f534: ea        
            NOP                ; $f535: ea        
            NOP                ; $f536: ea        
            NOP                ; $f537: ea        
            NOP                ; $f538: ea        
            NOP                ; $f539: ea        
            NOP                ; $f53a: ea        
            NOP                ; $f53b: ea        
            NOP                ; $f53c: ea        
            NOP                ; $f53d: ea        
            NOP                ; $f53e: ea        
__f53f:     LDY #$15           ; $f53f: a0 15     ; end of "empty" space?
__f541:     DEY                ; $f541: 88        
            BNE __f541         ; $f542: d0 fd     ; loop 15 times to delay
            LSR                ; $f544: 4a        ; discard pesky 8th bit
            STA $4011          ; $f545: 8d 11 40  ; play sample
            RTS                ; $f548: 60        

;-------------------------------------------------------------------------------
; automatically bankswitch if end of bank is reached
;-------------------------------------------------------------------------------
__f549:     PHA                ; $f549: 48        
            LDA $03            ; $f54a: a5 03     
            CMP #$bf           ; $f54c: c9 bf     
            BNE __f572         ; $f54e: d0 22     
            LDA $02            ; $f550: a5 02     
            CMP #$c0           ; $f552: c9 c0     
            BCS __f558         ; $f554: b0 02     ; if ($02) > $bfc0
            PLA                ; $f556: 68        
            RTS                ; $f557: 60        
__f558:     INC $019c          ; $f558: ee 9c 01  ; increment bank in $019c
            LDA $019c          ; $f55b: ad 9c 01  
            JSR __f578         ; $f55e: 20 78 f5  ; bankswitch to bank in $019c
            LDA $02            ; $f561: a5 02     ; ($02) -= $3fc0
            SEC                ; $f563: 38        
            SBC #$c0           ; $f564: e9 c0     
            STA $02            ; $f566: 85 02     
            LDA $03            ; $f568: a5 03     
            SBC #$bf           ; $f56a: e9 bf     
            ORA #$80           ; $f56c: 09 80     
            STA $03            ; $f56e: 85 03     
            PLA                ; $f570: 68        
            RTS                ; $f571: 60        

;...............................................................................
; sample lookup tables are stored at $fb17
;...............................................................................
The unannotated disassembly was produced by DISASM6 v1.4, with help from FCEUX. The command line was:
disasm6 romname.nes -o 0xC000 -fs 0x3C010 -i

Posted: Sat Mar 03, 2012 9:05 pm
by Dwedit
I just played the WAV file backwards looping. It's quite strange and disturbing.
Nice job!

Posted: Sun Mar 04, 2012 10:16 am
by tepples
Thank you for figuring this out.

Of course, we homebrewers don't need to implement this codec directly in our projects. We just need to use it as a sort of "pace car" to see if our own codecs are better or worse than the state of the art was during the NES's commercial era. (The M.C. Kids post-mortems are the same way.)

I still use Ubuntu 11.10, which has Python 2.7 by default, which has different semantics for str and bytes from the Python 3.x series. I had to make a small change to load_rom() to get the program to work:

Code: Select all

def load_rom():
    from array import array
    f = open(filename_rom,'rb')
    rom = array('B', f.read())
    f.close()
    print('Loaded ROM: %s (%d bytes)' % (filename_rom, len(rom)))
    return rom
Some samples are stored in CHR ROM. How does the playback work for those? I seem to remember that a lot of older emulators used to show garbage tiles for Big Bird's sprites when some samples were playing and/or freeze when Big Bird says "Go". Perhaps they were screwing up the MMC1 bankswitching.

I investigated the format of the sixteen tables. Apart from entry 0 which appears to duplicate entry 16 (both are always 128 in both tables), they appear to just be linear PCM at 30 different scale factors:
[30, 34, 40, 46, 54, 62, 70, 82, 94, 108, 124, 144, 166, 192, 220, 254]

Here's the code I used to prove this out:

Code: Select all

def print_tables(rom):
    tables = rom[0x3FB27:0x3FB27 + 32 * 16]
    tables = [tables[i:i + 32]
              for i in range(0, 512, 32)]
    scalefactors = [(row[31] - row[1]) for row in tables]
    deltas = [[(i - 1, round((s - row[1]) * 30 / (row[31] - row[1]), 2))
                for (i, s) in list(enumerate(row))[2:]]
              for row in tables]

    print("Raw tables:")
    print("\n".join(repr(row) for row in tables))
    print("Volume scale factors:")
    print(scalefactors)
    print("Expected levels vs. relative levels:")
    print("\n".join('Row %d:\n%s' % (i, repr(row))
                    for (i, row) in enumerate(deltas)))
Even delta PCM would have let them use 4-bit samples and a simpler playback routine, like I did here, and keep roughly the same audio quality.

Posted: Sun Mar 04, 2012 12:28 pm
by rainwarrior
Yeah, this is unnecessarily complicated and far from optimal.

Mistake #1 was starting with 8-bit samples to begin with. I do suspect as you do that a modified 4-bit VOX decoder would be almost as good for this purpose, save more space, and would reduce the complexity of the code down to just two bit-block decoding stages rather than 10 (far fewer branches to try and time correctly too).

If you check the game's credit screen, the voice stuff was written by another company; perhaps it was a solution for a different 6502 platform (with 8-bit playback) that they purchased and adapted.

Though, despite this criticism, it works, and it saved I think 80kb or so vs 8-bit samples, so it did its job somewhat. The game shipped, and plays just fine for what it is. It's a very simple and robust game. I think it's a bit more playable and engaging than the other Sesame Street NES games I've seen (i.e. the same company's Sesame Street Countdown, and Rare's Sesame Street ABC).

I don't know enough about how the iNES format handles banks or how the MMC1 should be implemented to know how the hell they used samples in CHR ROM, but they seem to be there, and the relevant sounds (e.g. "Grover") are heard in game and don't seem to be duplicates. There could be a duplicate decoder hiding in the code somewhere, but I didn't find it if there is one. In FCEUX's PPU viewer I don't see any "noise" data or flickering in the CHR pages while the samples are playing, so I think it manages to bankswitch the data into $8000 somehow.

Edit: there is a duplicate decompressor, and it loads in blocks of data from CHR ROM just after NMI when playing these samples. See below for more specific information.

Posted: Sun Mar 04, 2012 1:07 pm
by Dwedit
It's just reading it out of CHR-ROM during Vblank time into NES RAM using an unrolled loop. Nothing special.
It also seems to be playing previously decoded data at the same time as well.

Posted: Sun Mar 04, 2012 1:11 pm
by Bregalad
And all this sound data fit inside a NES ROM ? This is surprising to say the least.
Even using some kind of 4-bit-per-sample ADPCM-ish compression (teeple's algorithm, not mine) I could only get several seconds of sound before getting some ridiculously large data.

Posted: Sun Mar 04, 2012 1:17 pm
by Dwedit
It's 384KB in size, that is pretty big for a NES game.

Posted: Sun Mar 04, 2012 1:31 pm
by rainwarrior
Ah, found the code that does it. It reads a block from CHR ROM to RAM, and then runs a second implementation of the decompressor that works on that RAM address.

The CHR reading code starts at around $F592 in the code bank, and it gets executed just after an NMI (reads a block of data to $0280). I guess playback for these samples halts briefly after NMI to do this and then the code at $F8E1-$F9B5 looks really similar to the 8-sample decoder cycle that I annotated above from $F3B3-$F480.

Posted: Sun Mar 04, 2012 1:34 pm
by Dwedit
Playback doesn't halt, there are 4011 writes mixed in with the reading.

Posted: Sun Mar 04, 2012 1:39 pm
by rainwarrior
Ah, you're right. There's three pre-decoded samples written to $4011 during the CHR ROM loading code.

Also, this is hardly "nothing special". ;) The detail in the timing of this code is very impressive.

Posted: Sun Mar 04, 2012 4:14 pm
by tepples
It's "nothing special" compared to Forrest Mozer's codecs for ESS. Those are ridiculously efficient for the time, producing understandable speech around 4 kbps, comparable to an LPC vocoder but algorithmically much simpler to decode.

Posted: Sun Mar 04, 2012 4:58 pm
by rainwarrior
The speech stuff for this game was written by ESS, actually. (Check the title screens.)

Posted: Sun Mar 04, 2012 5:36 pm
by tepples
That leaves two possibilities:
  1. The fidelity loss from using the MX codec was unacceptable for an early childhood edutainment game in which clear diction and Spinney's iconic voice are of paramount importance. In the sample page, hear how warbly the "gbusters scream" and "imission scream" samples are.
  2. ESS was willing to sell an LPCM solution for a much cheaper royalty than a solution based on MX, and the royalty difference outweighed the difference in replication cost between the smaller cart that used MX and the bigger cart that used LPCM.
I'll go with c. All of the above.

Posted: Mon Mar 05, 2012 1:00 pm
by Bananmos
In the sample page, hear how warbly the "gbusters scream" and "imission scream" samples are
Hmm, this makes me wonder whether the samples there are quantized to 4-bit (as they were in those C=64 games) or not. That could potentially make quite a difference. Maybe not in the "warbliness", but at least in general perceptive quality.

My first guess would be no such quantization is done by this player, and this is then how good a sample with this codex could possibly be rendered without those 4-bit quantization limitations. But I can't tell for sure from the playback.

Posted: Mon Mar 05, 2012 9:46 pm
by rainwarrior
The ghostbusters thing doesn't sound like fixed bit compression to me. If I had to guess what's going on there, I'd say it cuts the sound into short segments and replaces each segment with a single cycling waveform that is a close match.