It is currently Sat Nov 18, 2017 12:51 pm

All times are UTC - 7 hours





Post new topic Reply to topic  [ 14 posts ] 
Author Message
PostPosted: Sat Jan 10, 2015 6:12 pm 
Offline
Site Admin
User avatar

Joined: Mon Sep 20, 2004 6:04 am
Posts: 3482
Location: Indianapolis
Tired of the same old memory map? Recently I was thinking that in theory, it should be possible for an NES cart to handle separate pages for code and data, like the W65C816 (but only for the cart mapped area, obviously). If you're not familiar with it, this is similar to what you'd use instead of bankswitching on the SNES, when using 16-bit addressing.

It's not simple, but I think it could be done in a small $5-$6 FPGA. It would basically be emulating the 6502, but as little of it as possible, based on observations of what the CPU is doing. I'm not concerned if it's practical, it's just been a fun puzzle for me to think about and get some more Verilog HDL practice, maybe I'll post it when it's more complete but I won't be able to test it anytime soon. If nothing else, I figured I'd share the idea, for your entertainment.

You could get away with much of it based the data bus alone, until you start branching, indexing across page boundaries, and using interrupts. For interrupts, you need to watch the full address bus, no way around that. Branching I believe can be done by also watching the address bus to detect the program counter location (and to know if we branched or not). One situation is when a branch instruction simply branches to the following opcode (which isn't as useless as it sounds, it's invaluable when you need a fractional delay in a cycle-timed loop), but I believe that can be handled by a special case for the opcode fetch following a branch instruction, if it's reads the following opcode address twice, then we're in the extra cycle of a branch to nowhere. For page crossing, you have to emulate the X and Y registers.

So it's basically it involves watching the data bus to detect the opcode, then we enter a state machine based on the type of opcode (pretty much the address mode type). CPU accesses memory for each instruction cycle, and every memory access will be 1 of 4 types: Data, Program Counter, Program Counter + increment, and opcode fetch (a special case of PC+increment which also resets the state machine). So the end result is it automatically selects one page on the Data cycles, and another on the Program Counter cycles.

Anyone see a reason it wouldn't work? Or any other ideas?


Top
 Profile  
 
PostPosted: Sat Jan 10, 2015 6:19 pm 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19225
Location: NE Indiana, USA (NTSC)
A parallel 6502 to determine which cycles are opcode fetches could work, but I can't see the advantage it'd give over an MMC3-class mapper.


Top
 Profile  
 
PostPosted: Sat Jan 10, 2015 9:36 pm 
Offline
Site Admin
User avatar

Joined: Mon Sep 20, 2004 6:04 am
Posts: 3482
Location: Indianapolis
Yeah, that's what I meant when I said I'm not concerned if it's practical, it's complex but really doesn't allow anything new. It certainly could be MMC3-flavoured though, by having separate code and data banks per each region of memory.

The part I've so far been avoiding in this setup is DPCM. I think that could also work (and allow for a separate DPCM page), but it's not something I've looked into.


Top
 Profile  
 
PostPosted: Sat Jan 10, 2015 10:23 pm 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19225
Location: NE Indiana, USA (NTSC)
If you're adding complicated hardware that controls the address bus, you can feed the "separate DPCM page" from a single-byte loop at $FFF0.


Top
 Profile  
 
PostPosted: Sun Jan 11, 2015 12:31 am 
Offline
Site Admin
User avatar

Joined: Mon Sep 20, 2004 6:04 am
Posts: 3482
Location: Indianapolis
Nice, I like that idea. I remember lidnariq posted a similar concept, but narrowing down into a 1-byte looped sample is really cool. So the mapper would have it's own DPCM address/length registers, which means it could address all the memory on cart, and maybe even better, could support single-byte loop boundaries (vs the usual 16 bytes).

In this type of fully address-controlled setup too, another thing can be done is allowing multiple interrupt vectors. Convenient, if the mapper has multiple IRQ sources (maybe one reason would be to stop the DPCM channel, heheh).


Top
 Profile  
 
PostPosted: Sun Jan 11, 2015 1:29 am 
Offline

Joined: Sun Apr 13, 2008 11:12 am
Posts: 6442
Location: UK (temporarily)
At some point I persuaded myself that the "right" way to deal with DPCM streaming was to map INL's deserialized SPI ROM output idea over some fixed 4k window. You'd still have to dance between DPCM reads and normal level loading. But there's no reason to not use all the address lines you have connected to the CPU's bus, and switching it down from 4k to however much smaller you could justify would ease the sting of losing more of a fixed bank.


Top
 Profile  
 
PostPosted: Sun Jan 11, 2015 11:27 am 
Offline
User avatar

Joined: Mon Jun 02, 2014 7:47 pm
Posts: 19
Location: Cairo, Egypt
Why not just add some additional FIFO with its own DMA to deal with 4-bit DPCM for example?

If it's an attempt to evolve the existing 2A03 capabilities, then adding additional DMA hardware schemes would be favorable..


Top
 Profile  
 
PostPosted: Tue Jan 13, 2015 8:18 pm 
Offline
Site Admin
User avatar

Joined: Mon Sep 20, 2004 6:04 am
Posts: 3482
Location: Indianapolis
I think the sprite DMA might be reusable, for writing to the mapper and cart memory. Because it corrupts the OAM, it would have be done during vblank or with sprites off. That would be a nice way to update video memory though, for CHR that would be 16 tiles in 513 CPU cycles. On NTSC, if the screen was turned off just a couple lines early, there would be time for 5 DMAs, you could write 1kB of PPU data and update the OAM every vblank.

4-bit DPCM and any audio stuff can be done from the 2A03's perspective as an interrupt occurring at the sample rate, I've done a successful experiment like that using a PIC18 MCU. A while back tepples came up with a brilliant minimal audio interrupt routine, simply INC $4011 / RTI. An INC can read the audio from the mapper, write it to the NES DAC, and acknowledge the interrupt all in one instruction (it means you can't use sample value $7F though, with the 7-bit DAC it'll overflow). That's is low-overhead as it gets, good enough to be usable in a game, I think.

DMA functionality is definitely interesting to me, and something I've been planning out how to use in an ideal setup. If you look at an FPGA, even at the low end like a MachXO2 for $6, includes 8kB of embedded RAM. One could do some pretty good stuff with that.


Top
 Profile  
 
PostPosted: Tue Jan 13, 2015 8:39 pm 
Offline

Joined: Sun Apr 13, 2008 11:12 am
Posts: 6442
Location: UK (temporarily)
Memblers wrote:
it means you can't use sample value $7F though, with the 7-bit DAC it'll overflow
I think I remember it being pointed out that it just adjusts the encoded range. e.g. if the read-only register there contains values from 1 to 128, then DEC $4011 would DTRT; or -1 to 126 with INC $4011. Using LSR $4011 even automatically converts unsigned 8-bit numbers.

Quote:
the low end like a MachXO2 for $6, includes 8kB of embedded RAM.
Their iCE40 series are a little cheaper and the next-larger-than-smallest also includes 4 or 8 KiB of block RAM. ... huh, now I'm not certain exactly what the differences are between the MachXO2 and iCE40.


Top
 Profile  
 
PostPosted: Wed Jan 14, 2015 7:57 am 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19225
Location: NE Indiana, USA (NTSC)
Memblers wrote:
4-bit DPCM and any audio stuff can be done from the 2A03's perspective as an interrupt occurring at the sample rate, I've done a successful experiment like that using a PIC18 MCU. A while back tepples came up with a brilliant minimal audio interrupt routine, simply INC $4011 / RTI. An INC can read the audio from the mapper, write it to the NES DAC, and acknowledge the interrupt all in one instruction (it means you can't use sample value $7F though, with the 7-bit DAC it'll overflow).

I refined it to dec $4011, which means you can't use $01. But in the topic CPLD square wave synthesizer, we determined that the bigger problem was that an IRQ during OAM DMA won't get seen until several samples later.


Top
 Profile  
 
PostPosted: Wed Jan 14, 2015 3:01 pm 
Offline
User avatar

Joined: Mon Jun 02, 2014 7:47 pm
Posts: 19
Location: Cairo, Egypt
lidnariq wrote:
Quote:
the low end like a MachXO2 for $6, includes 8kB of embedded RAM.
Their iCE40 series are a little cheaper and the next-larger-than-smallest also includes 4 or 8 KiB of block RAM. ... huh, now I'm not certain exactly what the differences are between the MachXO2 and iCE40.


The ICE40 Ultra family has max 26 I/O pins of the parts available at DigiKey, compared to MachXO2 (up to 108).. :D


Top
 Profile  
 
PostPosted: Wed Jan 14, 2015 3:11 pm 
Offline

Joined: Sun Apr 13, 2008 11:12 am
Posts: 6442
Location: UK (temporarily)
The iCE40 Ultra series doesn't come in high pin counts, period. The iCE40 HX and LP does have 100+ pin packages, though...


Top
 Profile  
 
PostPosted: Wed Jan 14, 2015 7:23 pm 
Offline

Joined: Sun Apr 13, 2008 11:12 am
Posts: 6442
Location: UK (temporarily)
tepples wrote:
an IRQ during OAM DMA won't get seen until several samples later.
I'm just not convinced that's that big of a problem...

The attached file is a simulation of such an instance: it's a 110Hz sine (or square) wave played at 6991 Hz (1789773÷256). I assume that DMA will block two updates every vblank (6991 Hz÷60 Hz≈i.e. every 116 samples), but the hardware automatically drops samples if necessary.

There are audible clicks in both the sine wave and square wave, but whether that matters ... I'm not qualified to judge.


Attachments:
demo-dropout.zip [9.51 KiB]
Downloaded 54 times
Top
 Profile  
 
PostPosted: Wed Jan 14, 2015 7:30 pm 
Offline
Site Admin
User avatar

Joined: Mon Sep 20, 2004 6:04 am
Posts: 3482
Location: Indianapolis
tepples wrote:
Memblers wrote:
4-bit DPCM and any audio stuff can be done from the 2A03's perspective as an interrupt occurring at the sample rate, I've done a successful experiment like that using a PIC18 MCU. A while back tepples came up with a brilliant minimal audio interrupt routine, simply INC $4011 / RTI. An INC can read the audio from the mapper, write it to the NES DAC, and acknowledge the interrupt all in one instruction (it means you can't use sample value $7F though, with the 7-bit DAC it'll overflow).

I refined it to dec $4011, which means you can't use $01. But in the topic CPLD square wave synthesizer, we determined that the bigger problem was that an IRQ during OAM DMA won't get seen until several samples later.


It's a shame I left my old Squeedo project in such a sad state (I didn't start using version control until much later), because I had tested that a little bit. From what I remember, I had an audio IRQ coming in at around 30khz, and when I used OAM DMA I didn't seem to hear a difference. I kinda expected a 60hz buzz or something, but nope. My samples were all square waves, maybe that combined with the tone frequencies being too low covered up the problem (doesn't matter how high the output rate is, if you're resampling it to 8khz or whatever). This was with any missed samples simply being dropped. With OAM DMA enabled, it's effectively giving you a lower sample rate 1% of the time. I'd bet it would sound better than digitized audio did on the Genesis, heheh.

Yeah with Lattice parts I've looked extensively into the MachXO2 and XP2, and not at all into the others. Those 2 have some overlap as well, the important difference being that XP2 has multipliers and DSP blocks. The (superficial) impression I got from ICE series is that it's geared towards low current consumption, if there are any trade-offs, it's possible that it won't matter for use with NES though.


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 14 posts ] 

All times are UTC - 7 hours


Who is online

Users browsing this forum: No registered users and 5 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB® Forum Software © phpBB Group