Introducing the VeriSNES (FPGA-based SNES)
Posted: Mon Nov 21, 2016 2:08 pm
Hello all,
I'm really excited to finally share this with you. I've been working on implementing the SNES in an FPGA using Verilog HDL. In the past I've implemented an NES, SNES APU, HQ2X filter, etc. in an FPGA, so I figured a full SNES was the next logical step. I've been making lots of progress and I wanted to share the current state of the design with you all.
Hardware setup:
- The entire SNES is hooked up to my TLA715 logic analyzer for investigation purposes. I have attached some images below for you to check out. My TLA715 supports up to 256 simultaneous channels @ 250MHz sample rate with a 64M sample buffer depth.
- I've primarily been using the SD2SNES for running my own custom test ROMs in addition to original carts for the most popular games.
- I'm using a Cyclone 4 FPGA, specifically the Terasic DE2-115 board.
Logic blocks that are completed:
- S-SMP and S-DSP - I completed this some time ago. As far as I know it's the most accurate FPGA-based APU core ever created. I ran my Verilog implementation against Blargg's C-based APU core using an automated script for all 35,000+ SPCs on snesmusic.org and my Verilog implementation matched Blargg's output bit for bit.
- S-CPU 65816 - This is 100% complete and fully verified. I used SystemVerilog Assertions (SVAs) and a formal equivalence tool to verify proper behavior of every combination of internal register, opcode, and addressing mode on every microcycle. I wrote over 3,200 SVAs by hand from scratch for the formal equivalence checking. For those of you who don't know what formal equivalence checking is, you can google it, but essentially the tool creates a formal proof that no matter what combination of inputs is given to a particular logic block, that block will never violate a particular assertion. If anyone is really interested I suppose I could show you what a simple assertion looks like, but without understanding the syntax it would be pretty useless. My 65816 core is probably the most accurate hardware implementation of an 816 ever created aside from the original ASIC. The CPU is fully microcoded for all instructions so it is very resource efficient.
- S-CPU DMA and HDMA - Both of these are complete and working very well. My logic analyzer setup was extremely useful for implementing these blocks. The MDMA implementation was pretty easy, but the HDMA was an incredibly huge complicated pain in the ass: the way HDMA interacts with MDMA (stalling in-progress MDMAs or killing them), the multiple modes of operation, corner cases, etc. Suffice it to say, it sucked! There are probably a few clock cycle differences here and there in my HDMA implementation versus a real SNES but I got tired of working on it. I do intend to go back and fix them, but for now I wanted to move on to get some graphics working. It's at least 98% accurate which is good enough for me for now.
- S-CPU Dynamic Clocking - The ability for the S-CPU to dynamically switch between 1.79/2.68/3.58 MHz depending on the IO cycle, memory region, and control register settings is 100% complete.
- S-CPU Joypad/Auto-Joypad interface is 100% complete.
- S-PPU1/2 Register files and Bus-B interfaces are 100% done.
- S-PPU1/2 VRAM/OAM/CGRAM interfaces are 100% done.
- S-PPU graphics background/sprites = 1% (i.e. one) done. Haha. I've only just spent a few hours on this so far. You can see the very buggy initial results for mode 0 in the video below.
- S-WRAM - Both A-bus and B-bus interfaces are 100% complete and verified.
- Standalone FPGA-based affine transformation and perspective projection (i.e. mode 7) implementation is complete. I made this just to see if I could do it and it turned out to be super cool and I learned a lot. The code will be extremely useful and portable to the mode 7 logic in the SNES. I am 100% confident in implementing mode 7 once I get to that point. I've provided a video link to a demo of this code.
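As a rough illustration of the dynamic clocking item in the list above, here is a minimal Python sketch of how the access speed can be derived from the address and a FastROM control bit. It is not the VeriSNES Verilog. The region boundaries and dividers (6, 8, or 12 master clocks per access) follow commonly documented SNES timing and are an assumption here, as are the NTSC master clock value and the function names, which are hypothetical.

```python
# Assumed NTSC master clock (~21.477 MHz); PAL consoles differ.
MASTER_CLOCK_HZ = 21_477_272


def access_divider(addr: int, fast_rom: bool = False) -> int:
    """Return master clocks per CPU access for a 24-bit address.

    fast_rom models the FastROM control register bit (assumption:
    it only affects ROM regions in banks 80-FF).
    """
    bank = (addr >> 16) & 0xFF
    offset = addr & 0xFFFF
    rom_speed = 6 if fast_rom else 8

    if bank & 0x40 or offset & 0x8000:
        # Banks 40-7F / C0-FF, and the upper half of every other bank.
        return rom_speed if bank & 0x80 else 8
    if offset < 0x2000 or offset >= 0x6000:
        return 8   # WRAM mirror and expansion area
    if 0x4000 <= offset < 0x4200:
        return 12  # legacy joypad ports
    return 6       # PPU/APU/DMA/CPU registers


def access_mhz(addr: int, fast_rom: bool = False) -> float:
    """Effective clock rate in MHz for an access to addr."""
    return MASTER_CLOCK_HZ / access_divider(addr, fast_rom) / 1e6


if __name__ == "__main__":
    for addr in (0x004016, 0x7E0000, 0x002100):
        print(f"${addr:06X}: {access_mhz(addr):.2f} MHz")
```

The three dividers map onto the three speeds mentioned above: 12 gives about 1.79 MHz, 8 gives about 2.68 MHz, and 6 gives about 3.58 MHz.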
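For readers unfamiliar with what the mode 7 affine transformation computes, here is a minimal Python sketch. It assumes the commonly documented form (a 2x2 matrix in signed 8.8 fixed point applied around a center point, sampling a 1024x1024 map) and ignores the hardware's exact bit widths and clipping. All function and parameter names are hypothetical, and this is not the standalone FPGA implementation described above.

```python
def to_fixed(value: float) -> int:
    """Convert a float to signed 8.8 fixed point."""
    return int(round(value * 256))


def mode7_sample_coords(sx, sy, a, b, c, d, x0=0, y0=0, h=0, v=0):
    """Map a screen pixel (sx, sy) to map coordinates.

    a, b, c, d are 8.8 fixed-point matrix entries; (x0, y0) is the
    rotation/scaling center; (h, v) is the scroll offset.
    """
    dx = sx + h - x0
    dy = sy + v - y0
    mx = (a * dx + b * dy + (x0 << 8)) >> 8
    my = (c * dx + d * dy + (y0 << 8)) >> 8
    # Wrap into the 1024x1024 pixel map (assumed wrap-around mode).
    return mx & 1023, my & 1023


def perspective_scale(line: int, horizon: int = 32, focal: float = 64.0) -> int:
    """Per-scanline scale factor for a simple floor-plane effect.

    Lines closer to the horizon get a larger scale, so they sample
    the map further apart and appear further away.
    """
    if line <= horizon:
        raise ValueError("line must be below the horizon")
    return to_fixed(focal / (line - horizon))


if __name__ == "__main__":
    # Identity matrix: screen and map coordinates coincide.
    print(mode7_sample_coords(10, 20, 256, 0, 0, 256))
    # Perspective: recompute the scale for each scanline.
    for line in (40, 64, 96, 200):
        s = perspective_scale(line)
        print(line, mode7_sample_coords(128, line, s, 0, 0, s, x0=128))
```

The perspective effect comes from changing the matrix on every scanline rather than from the transform itself; on real hardware, games typically do this by rewriting the matrix registers with HDMA.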
Miscellaneous items:
- Created over 30 original test ROMs with pass/fail conditions. Approximately 8 of those test ROMs (about half are HDMA tests) will only pass on my VeriSNES and a real SNES. No other emulators can pass them, including bsnes/higan with the accuracy profile. This is primarily thanks to my logic analyzer setup where I can see everything happening on a per clock-cycle basis. This might sound cool but it has actually made development quite a bit more difficult, since I can no longer rely on bsnes/higan for the accuracy I need when comparing tracelogs (for one example). Instead I'm basically forced to use my logic analyzer for almost everything...but in the end I suppose that's probably better anyway.
- I'm actually able to load and run nearly every single game that I test out (as long as it doesn't require a special enhancement chip). Even though I have no graphics output I can still hear the attractors/demos running (since my APU core is 100% complete). I can even interact with the games using the joypad. For example, in SMW I can press the start button, select a level, jump around, collect coins, kill enemies, etc. I just can't see anything. Lol. I also played LoZ blind while running a software emulator simultaneously and matching button presses. I was able to get all the way to the dungeon where Link meets his father and picks up the sword. At that point I just stopped playing but it was super fun to do!
- I've captured 100s upon 100s of GBytes of logic analyzer traces covering nearly every aspect of the SNES hardware and its behavior including odd corner cases and other esoteric behaviors.
- I am actually implementing each of the S-PPU chips individually, just as they are implemented in the original SNES, as opposed to one combined humongous PPU1+PPU2 blob of logic. This forces me to think critically about exactly how the two PPUs communicate with each other in order to complete a particular task.
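The bit-for-bit APU comparison and the tracelog comparisons mentioned above both come down to finding the first point where two streams disagree. Here is a minimal Python sketch of that idea. The function name and the trace line format in the usage example are hypothetical; this is not the actual automation script.

```python
from itertools import zip_longest


def first_divergence(reference, candidate):
    """Return (index, ref_entry, cand_entry) for the first mismatch.

    Works on any two sequences: bytes of rendered audio, lists of
    trace lines, and so on. Returns None when they match exactly.
    A missing entry (one stream ended early) shows up as None.
    """
    pairs = zip_longest(reference, candidate)
    for index, (ref, cand) in enumerate(pairs):
        if ref != cand:
            return index, ref, cand
    return None


if __name__ == "__main__":
    # Hypothetical trace lines: "PC opcode A-register".
    golden = ["8000 LDA A=00", "8002 STA A=12", "8005 INX A=12"]
    under_test = ["8000 LDA A=00", "8002 STA A=13", "8005 INX A=13"]
    print(first_divergence(golden, under_test))
```

Stopping at the first mismatch matters for hardware debugging because everything after the first divergence is usually a consequence of it.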
Next to be implemented:
- Most common video/background modes, then sprites.
- I started with Mode 0 because it was the simplest and most straightforward, so I'll probably finish that off first and then move on to Mode 1.
YouTube videos for your viewing pleasure:
Introduction to the VeriSNES (w/ first ever graphics output!)
- https://www.youtube.com/watch?v=sRVRcXRQkGA
FPGA-based Affine Transformation and Perspective Projection (SNES Mode 7) Demo
- https://www.youtube.com/watch?v=e6UhJ3qV2aI
Timelapse of MDMA Verilog Implementation for FPGA-based SNES
- https://www.youtube.com/watch?v=Yo0t6hAjARU
Timelapse of HDMA Verilog Implementation for FPGA-based SNES
- https://www.youtube.com/watch?v=YcNFs0r1oz8
Request: If anyone has any clues or ideas whatsoever what is causing the Bad Apple demo to run so slowly I would love to hear them. I'm hoping it's as simple as just being some register/feature I haven't implemented yet which is causing the issue. Remember that this is a hardware emulator, not software, so it's not an issue with the VeriSNES slowing down under load or anything - hardware always runs at a constant rate regardless of what it's doing. Here is a link to the exact time index where I start the Bad Apple demo.
That's all for now!