SDD1 FGPA implementation

magno
Posts: 193
Joined: Tue Aug 15, 2006 5:23 am
Location: Spain

SDD1 FGPA implementation

Post by magno »

This project began some years ago, around 2011, when I became interested in implementing the SDD1 chip on an FPGA using Andreas Naive's documentation about the chip. I first created the core files, which decoded the 2BPP and 4BPP modes; 8BPP was buggy back then and the last mode was not implemented. I tried to contact some guys from SD2SNES to ask whether they were interested in the project, but I never got an answer, and I gave up the project due to lack of interest and some personal issues.

But finally, some months ago, I finished it all, checked that all modes worked perfectly and decided to implement the chip on a ZedBoard. My final goal is to connect the board to the SNES using a custom interface board and test it on real hardware.

I uploaded a video to YouTube in case some of you are interested:

https://youtu.be/dsewU6s4Nrs
srg320
Posts: 32
Joined: Fri Feb 16, 2018 5:52 am
Location: Ukraine

Re: SDD1 FGPA implementation

Post by srg320 »

Hi.
I have also been working on an SDD1 FPGA implementation for my SNES FPGA. The decoder works, but it does not work yet when embedded in the SNES.

I'm interested: how many master cycles does your implementation take to decode the first two bytes (one row), and how many for the rest?
magno
Posts: 193
Joined: Tue Aug 15, 2006 5:23 am
Location: Spain

Re: SDD1 FGPA implementation

Post by magno »

srg320 wrote:I'm interested: how many master cycles does your implementation take to decode the first two bytes (one row), and how many for the rest?
Well, it depends on the bit depth and the context. For 2BPP and context 3, it takes 31 master cycles from the write to $4801 until the first 2 bytes are ready in the output FIFO, plus 1 cycle to get the first byte onto the DMA data bus. That is 15 cycles to fill the pipeline, 16 master cycles to decode one row (8 pixels) on 2 bitplanes, and 1 cycle to read from the output FIFO.
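
As a quick sanity check of those figures, the arithmetic can be written out in a few lines of Python. This is only a sketch: the constant and function names are made up for illustration, and the numbers apply to the 2BPP, context 3 case quoted above.

```python
# Figures quoted in the post (2BPP, context 3). All names are hypothetical.
PIPELINE_FILL_CYCLES = 15   # master cycles to fill the decoder pipeline
DECODE_CYCLES_PER_ROW = 16  # master cycles for the first 2 bytes (2 bitplanes)
FIFO_READ_CYCLES = 1        # master cycles to move a byte from FIFO to the bus


def cycles_until_first_row_in_fifo():
    """Master cycles from the $4801 write until 2 bytes sit in the output FIFO."""
    return PIPELINE_FILL_CYCLES + DECODE_CYCLES_PER_ROW


def cycles_until_first_byte_on_bus():
    """Same as above, plus the FIFO read that puts the first byte on the bus."""
    return cycles_until_first_row_in_fifo() + FIFO_READ_CYCLES


print(cycles_until_first_row_in_fifo())   # 31
print(cycles_until_first_byte_on_bus())   # 32
```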
srg320
Posts: 32
Joined: Fri Feb 16, 2018 5:52 am
Location: Ukraine

Re: SDD1 FGPA implementation

Post by srg320 »

My decoder takes 16 master cycles for 16 bits of 2 bitplanes (plus 5 cycles for the first 2 bitplanes, to read the header and initialize the decoder). When I created the DMA module and the SDD1, I based them on this post.
I think the decoder should start after the write to $420B and the fetch of the next opcode (from line 2490 in the post). But then only a few cycles remain to initialize the decoder and decode the first 2 bitplanes. Also, we must not forget about the pause for DRAM refresh.
Maybe the SDD1 chip decodes 2 bitplanes in 8 master cycles.
magno
Posts: 193
Joined: Tue Aug 15, 2006 5:23 am
Location: Spain

Re: SDD1 FGPA implementation

Post by magno »

srg320 wrote:My decoder takes 16 master cycles for 16 bits of 2 bitplanes
Yes, any compliant decoder must decode 16 bits in 16 master cycles; otherwise, DMA would "ask" for data faster than the decoder generates the output bytes.
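
The rate constraint can be sketched as a small check. This assumes that SNES DMA moves one byte every 8 master cycles, which matches the "latched in the 8th cycle" remark further down; the function and constant names are hypothetical.

```python
DMA_CYCLES_PER_BYTE = 8  # assumption: DMA transfers one byte per 8 master cycles


def decoder_keeps_up(bits_per_block, decode_cycles_per_block):
    """Return True if a block is decoded no slower than DMA consumes it."""
    bytes_per_block = bits_per_block // 8
    dma_cycles_per_block = bytes_per_block * DMA_CYCLES_PER_BYTE
    return decode_cycles_per_block <= dma_cycles_per_block


print(decoder_keeps_up(16, 16))  # 16 bits in 16 cycles: just fast enough
print(decoder_keeps_up(16, 17))  # one cycle slower: DMA would outrun the decoder
```

In this model a decoder with no slack at all only works if the first bytes are already waiting when DMA starts, which is why the startup latency matters so much.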
srg320 wrote: (plus 5 cycles for the first 2 bitplanes, to read the header and initialize the decoder).
That sounds fast! Do you use an input FIFO for decoding? Andreas Naive wondered whether it was necessary, and it looks like it is for high decoding ratios (up to 128:1).

srg320 wrote: I think the decoder should start after the write to $420B and the fetch of the next opcode (from line 2490 in the post). But then only a few cycles remain to initialize the decoder and decode the first 2 bitplanes.
I'm pretty sure decoding starts after the write to $4801; I take that point as t = 0 to measure master cycles. If I measure time from the write to $420B, my decoder is as fast as 4 master cycles, but that's tricky, because after triggering DMA (i.e., after writing to $420B) data must be present on the data bus at most 7 cycles later so that the CPU latches it in the 8th cycle.
srg320 wrote: Also, we must not forget about the pause for DRAM refresh.
DRAM refresh doesn't affect the S-DD1 at all; in fact, that cartridge connector pin is not even routed to the chip (neither in Star Ocean nor in SFA2). When DRAM refresh occurs, the CPU is halted, so the /RD and /WR strobes are driven to '1'. The S-DD1 must obey those strobes to present decoded data to the DMA.

srg320 wrote: Maybe the SDD1 chip decodes 2 bitplanes in 8 master cycles.
I thought a lot about this. At first I also thought the chip might have some kind of parallelism, but if you check the decoding algorithm carefully, the context information is only available after decoding the current pixel, so it's impossible to decode the nth bit unless the (n-1)th bit has been decoded first to create the context.
So, if the S-DD1 is designed to be fully synchronous, 1 pixel is decoded each clock cycle. If the S-DD1 is only partially synchronous, there is a high risk of combinational loops, so latches must be instantiated.
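
The data dependency can be illustrated with a toy loop. This is not the real S-DD1 context function or probability model; `toy_context`, `decode_bits` and the lookup table are invented for illustration. It only shows that each bit needs a context built from the bits decoded before it, so the loop cannot be parallelised across bits.

```python
def toy_context(history):
    """Hypothetical context: the last two decoded bits packed into 0..3."""
    if len(history) < 2:
        return 0
    return (history[-2] << 1) | history[-1]


def decode_bits(next_bit_for_context, count):
    """Decode `count` bits one at a time; each step depends on the previous ones."""
    history = []
    for _ in range(count):
        ctx = toy_context(history)  # cannot be computed before earlier bits exist
        history.append(next_bit_for_context(ctx))
    return history


# Toy stand-in for the real decoder: the output bit is a fixed function of context.
TOY_TABLE = {0: 1, 1: 1, 2: 0, 3: 0}
print(decode_bits(TOY_TABLE.__getitem__, 6))
```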
srg320
Posts: 32
Joined: Fri Feb 16, 2018 5:52 am
Location: Ukraine

Re: SDD1 FGPA implementation

Post by srg320 »

magno wrote:
srg320 wrote: (plus 5 cycles for the first 2 bitplanes, to read the header and initialize the decoder).
That sounds fast! Do you use an input FIFO for decoding? Andreas Naive wondered whether it was necessary, and it looks like it is for high decoding ratios (up to 128:1).
I mean 5 cycles to initialize the decoder and read the header + 16 cycles for 2 bitplanes = 21.
I don't use a FIFO; I wait for the end of the DMA RD signal (every second one) and then start decoding the next 2 bitplanes.

magno wrote:I'm pretty sure decoding starts after the write to $4801
OK, what will happen if $43x2-$43x4 are written after writing $4801, or if more than one SDD1 channel is selected? And how will the SCPU fetch the next opcode after writing to $4801 if the SDD1 is using the ROM bus? Or does the SDD1 read ROM between SCPU accesses, when RD and WR are high?
magno
Posts: 193
Joined: Tue Aug 15, 2006 5:23 am
Location: Spain
Contact:

Re: SDD1 FGPA implementation

Post by magno »

srg320 wrote: I don't use a FIFO; I wait for the end of the DMA RD signal (every second one) and then start decoding the next 2 bitplanes.
How would you fetch data from ROM if you needed to decode an order-7 Golomb code for each output pixel? You'd need 1 input byte per output pixel (i.e., per master cycle); to achieve that, your ROM would need an access time of 47 ns or better. What if you must maintain this input rate because the context switches with each output pixel?
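
The 47 ns figure follows from the master clock period. A rough sketch, assuming the NTSC master clock of about 21.477 MHz (the names below are made up for illustration):

```python
MASTER_CLOCK_HZ = 21_477_272  # assumption: NTSC master clock, about 21.477 MHz


def master_cycle_ns(clock_hz=MASTER_CLOCK_HZ):
    """Length of one master cycle in nanoseconds."""
    return 1e9 / clock_hz


def max_rom_access_ns(bytes_per_master_cycle):
    """Slowest ROM access time that still sustains the given input rate."""
    return master_cycle_ns() / bytes_per_master_cycle


print(round(master_cycle_ns(), 2))       # about 46.56 ns per master cycle
print(round(max_rom_access_ns(1), 2))    # one input byte per cycle: same bound
print(round(max_rom_access_ns(1 / 8), 1))  # one input byte per 8 cycles
```

The last line shows how much the requirement relaxes if input bytes are fetched ahead of time into a FIFO instead of on demand.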

srg320 wrote:
magno wrote:I'm pretty sure decoding starts after the write to $4801
OK, what will happen if $43x2-$43x4 are written after writing $4801, or if more than one SDD1 channel is selected? And how will the SCPU fetch the next opcode after writing to $4801 if the SDD1 is using the ROM bus? Or does the SDD1 read ROM between SCPU accesses, when RD and WR are high?
These are good questions. I should check on real hardware, but I haven't had free time to mount the components on my interface board (between the ZedBoard and the SNES). My guess is:
srg320 wrote: what will happen if $43x2-$43x4 are written after writing $4801
You shouldn't do that; in fact, neither SO nor SFA2 does. But if you did, the S-DD1 would start decoding from the last source address it had sniffed from the SNES data bus, I guess.
srg320 wrote: or if more than one SDD1 channel is selected?
That's not a problem: you select which DMA channel to sniff by writing to $4800 and which channel to decode by writing to $4801. If you trigger a decompression on a different channel from the one you sniffed, nothing happens, i.e., DMA is filled with the same byte on each beat. I checked this on emulators, so it may not be accurate.
srg320 wrote: how will the SCPU fetch the next opcode after writing to $4801 if the SDD1 is using the ROM bus?
The SCPU is much slower than the master clock (one CPU cycle takes 6 or 8 master cycles), so it is easy to time-multiplex accesses from the SCPU and the S-DD1 decompression core. But you need an input FIFO for the data that feeds the decompression core.
srg320 wrote: does the SDD1 read ROM between SCPU accesses, when RD and WR are high?
That can happen only during DMA: the DMA engine stalls the SNES CPU while DMA is in progress, and the CPU resumes after all bytes are transferred. In any other case, the S-DD1 decompression core doesn't need to access ROM, since no decompression is running.
The only situation where a collision occurs is in the window between writing to $4801 and writing to $420B, because both the decompression core and the SCPU need data.
Star Ocean has some padding instructions between them (PLA - PHA) to delay the start of DMA, so the SDD1 has enough time to read the first words from ROM and begin decoding.
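
The register behaviour guessed at above can be put into a toy model. Everything here is an assumption taken from this post and from emulator behaviour, not verified on hardware: the class and method names are invented, and a decode starts only for channels whose bit is set in $4800, in $4801 and in the $420B trigger.

```python
class SDD1RegisterModel:
    """Toy model of the register sniffing described above (unverified guess)."""

    def __init__(self):
        self.source = [0] * 8   # 24-bit DMA source address per channel
        self.sniff_mask = 0     # last value written to $4800
        self.decode_mask = 0    # last value written to $4801

    def write(self, addr, value):
        """Apply one CPU write; return [(channel, source)] for decodes started."""
        value &= 0xFF
        if addr == 0x4800:
            self.sniff_mask = value
        elif addr == 0x4801:
            self.decode_mask = value
        elif 0x4300 <= addr <= 0x437F and (addr & 0xF) in (2, 3, 4):
            channel = (addr >> 4) & 7
            shift = 8 * ((addr & 0xF) - 2)  # $43x2 low, $43x3 high, $43x4 bank
            cleared = self.source[channel] & ~(0xFF << shift)
            self.source[channel] = cleared | (value << shift)
        elif addr == 0x420B:
            active = value & self.sniff_mask & self.decode_mask
            return [(ch, self.source[ch]) for ch in range(8) if (active >> ch) & 1]
        return []


chip = SDD1RegisterModel()
for addr, value in [(0x4800, 0x01), (0x4302, 0x34), (0x4303, 0x12),
                    (0x4304, 0xC0), (0x4801, 0x01)]:
    chip.write(addr, value)
print(chip.write(0x420B, 0x01))  # channel 0 starts decoding from $C01234
```

If $4801 selects a channel that $4800 did not, the list comes back empty, which corresponds to the "nothing happens" case described above.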
srg320
Posts: 32
Joined: Fri Feb 16, 2018 5:52 am
Location: Ukraine

Re: SDD1 FGPA implementation

Post by srg320 »

magno wrote:
srg320 wrote: what will happen if $43x2-$43x4 are written after writing $4801
You shouldn't do that; in fact, neither SO nor SFA2 does. But if you did, the S-DD1 would start decoding from the last source address it had sniffed from the SNES data bus, I guess.
srg320 wrote: or if more than one SDD1 channel is selected?
That's not a problem: you select which DMA channel to sniff by writing to $4800 and which channel to decode by writing to $4801. If you trigger a decompression on a different channel from the one you sniffed, nothing happens, i.e., DMA is filled with the same byte on each beat. I checked this on emulators, so it may not be accurate.
srg320 wrote: how will the SCPU fetch the next opcode after writing to $4801 if the SDD1 is using the ROM bus?
The SCPU is much slower than the master clock (one CPU cycle takes 6 or 8 master cycles), so it is easy to time-multiplex accesses from the SCPU and the S-DD1 decompression core. But you need an input FIFO for the data that feeds the decompression core.

The only situation where a collision occurs is in the window between writing to $4801 and writing to $420B, because both the decompression core and the SCPU need data.
Star Ocean has some padding instructions between them (PLA - PHA) to delay the start of DMA, so the SDD1 has enough time to read the first words from ROM and begin decoding.
Thanks, you helped me a lot.
marvelus10
Posts: 243
Joined: Fri Feb 09, 2007 5:01 pm
Location: Nanaimo, BC Canada

Re: SDD1 FGPA implementation

Post by marvelus10 »

Have you guys joined the Classic Gaming Discord? There is quite a lot of discussion on new SD2SNES projects there.
magno
Posts: 193
Joined: Tue Aug 15, 2006 5:23 am
Location: Spain

Re: SDD1 FGPA implementation

Post by magno »

marvelus10 wrote:Have you guys joined the Classic Gaming Discord? There is quite a lot of discussion on new SD2SNES projects there.
Yes, sure!
Markfrizb
Posts: 607
Joined: Sun Dec 02, 2012 8:17 am
Location: East Texas

Re: SDD1 FGPA implementation

Post by Markfrizb »

Forgive me for asking, but why not tackle the SFA2 decompression similar to what was done with Star Ocean? Then a "standard" cart could be used? Or am I missing the point?
magno
Posts: 193
Joined: Tue Aug 15, 2006 5:23 am
Location: Spain

Re: SDD1 FGPA implementation

Post by magno »

Markfrizb wrote:Forgive me for asking, but why not tackle the SFA2 decompression similar to what was done with Star Ocean? Then a "standard" cart could be used? Or am I missing the point?
Well, it's more exciting to replicate the chip than to decompress the graphics XD It could be done, of course, but the task is more tedious and less creative. Moreover, implementing the chip could be useful for the scene to create hacks that use it.
93143
Posts: 1715
Joined: Fri Jul 04, 2014 9:31 pm

Re: SDD1 FGPA implementation

Post by 93143 »

magno wrote:Moreover, implementing the chip could be useful for the scene to create hacks that use it.
At least two large potential S-DD1 projects have been talked about here, although they are just speculation at this point (and although they aren't hacks, they do have copyright issues):

- a port of Metal Slug, as close to arcade-perfect as possible with period hardware
- a re-port of Street Fighter Alpha 2, employing advanced techniques to fix the music and sound, loading pauses, and cut-down graphics and animation

Both of these projects should fit comfortably in the S-DD1's available ROM space (which I believe is 16 MB addressable plus 3.875 MB in parallel) with the graphics compressed. Neither one is especially likely to fit in an ordinary cartridge, particularly since software decompression would take too much S-CPU power to be feasible, and the SA-1 imposes an 8 MB limit. In both cases, using the MSU1 would defeat the purpose of the project.

It could be argued that neither of these projects is likely to happen, but it's nice to know that they could.
Markfrizb
Posts: 607
Joined: Sun Dec 02, 2012 8:17 am
Location: East Texas

Re: SDD1 FGPA implementation

Post by Markfrizb »

- a re-port of Street Fighter Alpha 2, employing advanced techniques to fix the music and sound, loading pauses, and cut-down graphics and animation
So the game has issues already.... humph, wasn't aware of that.
Well, it's more exciting to replicate the chip than to decompress the graphics XD It could be done, of course, but the task is more tedious and less creative. Moreover, implementing the chip could be useful for the scene to create hacks that use it.
But wouldn't that come down to one of two options? Either (A) someone would need to buy the FPGA SDD1 PCB (presumably a dev board), or (B) this would lead to more cart destruction, because potential future games/hacks would need the SDD1. Can the SD2SNES run the SDD1 games?

Star Ocean, since it's been decompressed, can play on an OEM-style cart. https://youtu.be/_c2OoGkPA4o (video has no sound)
Truthfully, I'd like to see it decompressed, but I do understand the drive to replicate the SDD1 chip also. If I understood how FPGAs worked, I'd probably do the same thing. :D
93143
Posts: 1715
Joined: Fri Jul 04, 2014 9:31 pm

Re: SDD1 FGPA implementation

Post by 93143 »

Markfrizb wrote:So the game has issues already.... humph, wasn't aware of that.
It's a port of a CPS2 game, so it was never going to be perfect.

It's just that I and others feel that it was probably possible to do better. Maybe it would have been unreasonable to expect better under reasonable time and budget constraints, or for an affordable price. Maybe it was a corporate afterthought or a contractual obligation and didn't get a reasonable schedule or budget. Maybe the RAM-limited nature of the next-gen consoles made devs wary of pushing the SNES too hard for fear of making the PlayStation look bad. Maybe the programmers were just lazy or incompetent. Or maybe it really is as good as it can get on the hardware - but I doubt it.

The graphics seem to be smaller than they need to be, the screen is letterboxed, and the animations are missing frames. Preliminary calculations suggest that it may be possible to remedy all of these things with a sufficiently advanced animation engine and more ROM.

The vocals are muddy, the music is terrible, and the game has loading pauses where everything freezes. These things are intimately related, and I think it's possible to fix all of them at once with a high-bandwidth HDMA streaming scheme. Even if I'm wrong, there are multiple examples of games that handle on-the-fly ARAM loading better than this one.

The game also has some slowdown, and I don't really see why it should.