All times are UTC - 7 hours





Posted: Sun Apr 22, 2018 9:02 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 8611
Location: Seattle
In my opinion, they're too expensive per cart.

The iCE40xx384s don't have enough I/O, even though they're otherwise cheap enough and have enough logic inside.

The iCE40xx1K parts are about $4 each, plus a few dimes for a boot ROM and two regulators. They're not 5V tolerant, but the cost of that translation is probably balanced out by cheaper 3V parallel NOR flash. It might be possible to justify one for MMC5-class stuff, but it's really the wrong part for anything less.

(Note: the $3 iCE40LP1K has way too few pins)


Posted: Sun Apr 22, 2018 9:44 pm

Joined: Mon Apr 04, 2011 11:49 am
Posts: 2097
Location: WhereverIparkIt, USA
zzo38 wrote:
Do you know how suitable iCE40 devices are?

In my original post I didn't elaborate much on higher-density devices with 1k+ logic elements. If you have a good use/need for that much logic, the iCE40 devices are comparable to Lattice MachXO2 and Altera MAX 10 devices. Price- and density-wise they're all somewhat comparable, and you can avoid BGA packages with all of them. If you're looking for MMC5-scale hardware, the iCE40-HX series would be a good fit, as the 100-144 pin TQFP packages are offered in 1k-2k LE options.


lidnariq wrote:
(Note: the $3 iCE40LP1K has way too few pins)

For a traditional design this is true. But if you want to get fancy and take advantage of the level shifters and actually make them work for you, the 39 I/O may just be enough. I've got a time-multiplexed 'dual port' design I'm prototyping right now that only requires 53 GPIO and has full PPU/CPU bus decoding and 1 MByte of ROM/RAM. While working on it I've recently realized some other tricks that might get the I/O count down to 33, which would fit within the iCE5LP1K/2K/4K QFN-48 package that provides 39 I/O.

_________________
If you're gonna play the Game Boy, you gotta learn to play it right. -Kenny Rogers


Posted: Sun Apr 22, 2018 10:32 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 8611
Location: Seattle
infiniteneslives wrote:
lidnariq wrote:
(Note: the $3 iCE40LP1K has way too few pins)
For a traditional design this is true. But if you want to get fancy and take advantage of the level shifters and actually make them work for you, the 39 I/O may just be enough.
Sorry, I elided a lot of details. What I should have said was "Be careful that the thing you think is cheap actually has enough pins; there are iCE40LP1Ks that only have 16 pins and 10 I/O."

That said, my caveat about the iCE40xx384s appears to have been wrong; they are now(?) selling the iCE40LP384-SG32 (QFN32) for quite a nice price, though it's a little I/O-limited (17 to 21). Other iCE40LP384 parts with more I/O are available for not too much more money, but require dealing with BGA (and thus possibly with 4-layer boards).

iCEstorm even recently grew support for the small 384-cell ones.


Posted: Tue May 07, 2019 2:49 pm

Joined: Thu Apr 18, 2019 9:13 am
Posts: 161
infiniteneslives wrote:
For a traditional design this is true. But if you want to get fancy and take advantage of the level shifters and actually make them work for you, the 39 I/O may just be enough. I've got a time-multiplexed 'dual port' design I'm prototyping right now that only requires 53 GPIO and has full PPU/CPU bus decoding and 1 MByte of ROM/RAM. While working on it I've recently realized some other tricks that might get the I/O count down to 33, which would fit within the iCE5LP1K/2K/4K QFN-48 package that provides 39 I/O.


I've been thinking of a design that would use a couple of small CPLDs like the Atmel ATF1502, each sitting on one of the address buses, with resistor packs isolating the cartridge buses from the console. Since the address wires from the console become valid early in the cycle, typical cycles could start with the address wires floating, latch them in response to a slightly-delayed M2 signal, and then drive the proper address on the back half of the cycle. I don't think I'd bother connecting the CPU-side CPLD to the data bus; code can use addresses for signalling just as easily as data in most cases, and sometimes even more easily, and the pins could probably be better used for other things.

A key feature I'd include, which I haven't seen in other mappers, would be a 74HC373 that would take input from the CPU-side data bus and output onto the PPU-side data bus. This would allow data to be copied into PPU-side cartridge RAM in response to CPU-side *reads*, at any time when the PPU isn't actively displaying picture content, thus doubling the speed of transferring data to PPU-side RAM. It might even be possible to include a mode that, when enabled, would cause the first half of each CPU cycle to use one address and latch data for the PPU mid-cycle, while the second half (which would be seen by the CPU) used a slightly different address, thus allowing data transfer at a rate of one byte/cycle. The late-arriving /A15 would be a nuisance, but such a design could probably make something like the wire-frame graphics in Elite practical even on NTSC consoles with short VBLANK.


Posted: Tue May 07, 2019 4:00 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 8611
Location: Seattle
supercat wrote:
a 74HC373 that would take input from the CPU-side data bus and output onto the PPU-side data bus.
The PPU drives the AD0..7 pins continuously at every single moment that it is not itself asserting /RD. See Visual2C02 transistors t16479, t16501, t16516, t16511, and t16534.


Posted: Tue May 07, 2019 4:09 pm

Joined: Thu Apr 18, 2019 9:13 am
Posts: 161
lidnariq wrote:
supercat wrote:
a 74HC373 that would take input from the CPU-side data bus and output onto the PPU-side data bus.
The PPU drives the AD0..7 pins continuously at every single moment that it is not itself asserting /RD. See Visual2C02 transistors t16479, t16501, t16516, t16511, and t16534.


I'd been planning on separating the cart-side address bus from the console via resistors; it sounds like I'd need to do likewise with the PPU data bus, but that shouldn't be a problem.


Posted: Wed May 08, 2019 7:16 am

Joined: Thu Apr 18, 2019 9:13 am
Posts: 161
lidnariq wrote:
supercat wrote:
a 74HC373 that would take input from the CPU-side data bus and output onto the PPU-side data bus.
The PPU drives the AD0..7 pins continuously at every single moment that it is not itself asserting /RD. See Visual2C02 transistors t16479, t16501, t16516, t16511, and t16534.


Continuing my last point: if a cart includes resistors (maybe 2.2K-ish) separating the address and data buses, do you see any problem with it "doing its own thing" independent of the PPU any time the PPU doesn't need to fetch meaningful data [e.g. because background rendering is disabled and there are no sprites at the current screen location]? While a true dual-port memory system would probably be awkward to implement, increasing the length of time that the main CPU would have to draw the screen when games don't need to show stuff at the top of the frame would be a big win, as would increasing the rate of data transfer. Using resistors as a crude sort of "multiplexer" is something I did in my Atari 2600 super-banking cartridge to save macrocells. While that was only 1.19 MHz, I think the same principle should be usable here. Reliable operation might require an active bus transceiver on the data pins to ensure the cartridge can feed data back to the console fast enough, but I still think the approach would open up a world of possibilities.

Also, what do you think of the idea of having a cartridge include a mode where some addresses 0x6000-0x7FFF would access a banked region of RAM while M2 is low and then a different region (perhaps set all upper address bits to ones) once M2 goes high? I think the effect would be that a mixture of 2-3 byte instructions would execute code located in the top part of memory while the corresponding region below was copied to PPU memory. Unlike OAMDMA, this could happen concurrently with digitized audio playback DMA or even--if the affected address range stopped short of $7FFA--an audio playback IRQ.


Posted: Wed May 08, 2019 11:33 am

Joined: Sun Apr 13, 2008 11:12 am
Posts: 8611
Location: Seattle
supercat wrote:
do you see any problem with it "doing its own thing" independent of the PPU any time the PPU doesn't need to fetch meaningful data [e.g. because background rendering is disabled and there are no sprites at the current screen location]?
If it got garbage during unused name and pattern table fetches? No, I can't see why that wouldn't work.

I think the only tricksy part is making sure that any design doesn't violate the setup time for the PPU. Different parts of the PPU fetch cadence go to different places in the die, so it's not always obvious. Name table and sprite pattern fetches are synchronized through a transmission gate that opens halfway through /RD being asserted, and closes when /RD is deasserted. Attribute table and background pattern fetches use more complex logic but I'm pretty certain they ultimately use the same timing.

Quote:
Using resistors as a crude sort of "multiplexer" is something I did in my Atari 2600 super-banking cartridge to save macrocells. While that was only 1.19 MHz,
I vaguely remember there being a lot of bus capacitance here. When we were doing the initial research on the VRC6's extended nametable modes, I remember asking BootGod to solder eight resistors (10k) in lieu of the ROM and NTRAM, from (VRC6 CHR ROM A10..A15 and VRC6 CHR ROM /CE and VRC6 NTRAM /CE) to (PPU D0..D7), and he only got garbage out.

Quote:
Also, what do you think of the idea of having a cartridge include a mode where some addresses 0x6000-0x7FFF would access a banked region of RAM while M2 is low and then a different region (perhaps set all upper address bits to ones) once M2 goes high?
What would be around to use the RAM while M2 is low?


Posted: Wed May 08, 2019 2:35 pm

Joined: Thu Apr 18, 2019 9:13 am
Posts: 161
lidnariq wrote:
I vaguely remember there being a lot of bus capacitance here. When we were doing the initial research on the VRC6's extended nametable modes, I remember asking BootGod to solder eight resistors (10k) in lieu of the ROM and NTRAM, from (VRC6 CHR ROM A10..A15 and VRC6 CHR ROM /CE and VRC6 NTRAM /CE) to (PPU D0..D7), and he only got garbage out.


Sounds like a bus driver would probably be required then, which could be enabled using PPU /RD. If software uses the "outside" '373 for writing data, there would be no need for the cart to care about PPU /WR or anything the PPU does with the data bus.

Quote:
Quote:
Also, what do you think of the idea of having a cartridge include a mode where some addresses 0x6000-0x7FFF would access a banked region of RAM while M2 is low and then a different region (perhaps set all upper address bits to ones) once M2 goes high?
What would be around to use the RAM while M2 is low?


The '373 (or--thinking about it--maybe a '374) feeding data to the PPU-side memory. A typical usage scenario would be something like:

Code:
    ; Assume last 256 bytes of RAM hold 254 instances of $C9, then $60 and an arbitrary value.
    lda destHi
    sta $2006
    lda destLo
    sta $2006
    ldx srcBank
    cmp $4200,x ; Assume this sets bank for use at $6000-$6FFF and enables "fast transfer" mode
    ; telling the PPU-side CPLD to latch the address and enable its own address drivers, and
    ; respond to each write pulse by writing a byte and incrementing its address.

    ; Assume springBoard and springBoard+1 hold $4C $00
    lda srcPage ; In the range $60-$6F
    sta springBoard+2
    jsr springBoard ; Copy 256 bytes of data from data banked at $6xxx while running code at $7Fxx
    cmp $4300,x ; Assume this disables "fast transfer" mode (stop driving PPU-side address)

The CPU would think it was fetching 127 "CMP #$C9" instructions followed by "RTS" and a dummy byte from addresses $6n00-$6nFF, but it would actually see byte values fetched from the last 256 bytes of RAM. Meanwhile, the 256 bytes banked at $6n00-$6nFF would be sent to the PPU-side memory.
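To make the fetch pattern concrete, here is a rough behavioral model in Python. It is only a sketch under assumptions: `build_code_tail`, `fetch_offsets`, and `fast_transfer` are names made up for illustration, the VRAM is a plain byte array, and the model treats each CPU fetch as "latch the banked byte while M2 is low, hand the CPU the code byte while M2 is high". It ignores all real bus timing.

```python
CMP_IMMEDIATE = 0xC9  # 6502 "CMP #imm": two bytes, two cycles
RTS = 0x60            # 6502 "RTS": opcode fetch, then one discarded fetch


def build_code_tail(dummy=0x00):
    """254 bytes of $C9, then $60 and an arbitrary byte, as assumed above."""
    return bytes([CMP_IMMEDIATE] * 254 + [RTS, dummy])


def fetch_offsets(code):
    """Yield the page offsets the CPU fetches while running `code`."""
    pc = 0
    while True:
        opcode = code[pc]
        if opcode not in (CMP_IMMEDIATE, RTS):
            raise ValueError(f"unexpected opcode ${opcode:02X} at offset {pc}")
        yield pc        # opcode fetch
        yield pc + 1    # operand (CMP) or discarded byte (RTS)
        if opcode == RTS:
            return
        pc += 2


def fast_transfer(source_page, code_tail, vram, dest):
    """Copy one 256-byte page into `vram`, one byte per simulated CPU cycle."""
    cpu_saw = bytearray()
    for offset in fetch_offsets(code_tail):
        # M2 low: the banked source byte is latched and written PPU-side.
        vram[dest] = source_page[offset]
        dest += 1
        # M2 high: the cart swaps in the code byte, which is all the CPU sees.
        cpu_saw.append(code_tail[offset])
    return bytes(cpu_saw)
```

In this model every offset of the page is fetched exactly once and in order, which is the property the scheme depends on: 127 two-byte instructions plus the two fetches at the start of RTS cover all 256 bytes.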


Posted: Wed May 08, 2019 2:53 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 8611
Location: Seattle
supercat wrote:
jsr springBoard ; Copy 256 bytes of data from data banked at $6xxx while running code at $7Fxx
Oh, the ZX80 trick. Read the byte in the first half, replace it with something else (a 1-cycle NOP there, a NOP-slide here) as long as it's strictly in-order and no bytes are refetched... I wonder if there is any actually useful computation that would have PC doing that.

What's the receiving side look like?


Posted: Wed May 08, 2019 4:20 pm

Joined: Thu Apr 18, 2019 9:13 am
Posts: 161
lidnariq wrote:
supercat wrote:
jsr springBoard ; Copy 256 bytes of data from data banked at $6xxx while running code at $7Fxx
Oh, the ZX80 trick. Read the byte in the first half, replace it with something else (a 1-cycle NOP there, a NOP-slide here) as long as it's strictly in-order and no bytes are refetched... I wonder if there is any actually useful computation that would have PC doing that.

What's the receiving side look like?


If there's no need for the main CPU to signal the PPU-side CPLD during rendering, one could have two control wires between the two CPLDs, called "/control" and "/wppu", the latter of which could also be tied to latch enable (active high) and output enable (active low) on the '373 connecting the buses as well as /WE on the PPU RAM. The "/control" input would act as an active-low output enable and active-high latch enable for the address pins on the CPLD(*), so while it's high the PPU would have control of the bus in normal fashion. When /control goes low, the CPLD would latch the current address into a counter and output that counter on the bus until /control goes high again. Each rising edge of /wppu would increment the counter.

(*) The latch enable could be implemented using asynchronous set/asynchronous reset product terms on a register that would otherwise respond to rising clock edges. Banking control options could be controlled when rendering is disabled by reading addresses in the 3xxx range.
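As a sanity check on the receiving side, the counter behavior can be modeled in a few lines of Python. This is a sketch of one reading of the description, not a CPLD design: the class and method names are hypothetical, the 14-bit address mask is an assumption, and the model only increments while /control is low, since the latch is transparent otherwise.

```python
class PpuSideCpld:
    """Behavioral model of the PPU-side address latch/counter."""

    ADDRESS_MASK = 0x3FFF  # assumption: 14-bit PPU address bus

    def __init__(self):
        self.control_n = 1  # /control, idle high: PPU owns the bus
        self.wppu_n = 1     # /wppu, idle high
        self.counter = 0

    def drive_ppu_address(self, address):
        """PPU puts an address on the bus; the latch follows while /control is high."""
        if self.control_n:
            self.counter = address & self.ADDRESS_MASK

    def set_control(self, level):
        """Taking /control low freezes the last tracked address in the counter."""
        self.control_n = 1 if level else 0

    def set_wppu(self, level):
        """Each rising edge of /wppu increments the counter while /control is low."""
        level = 1 if level else 0
        if level and not self.wppu_n and not self.control_n:
            self.counter = (self.counter + 1) & self.ADDRESS_MASK
        self.wppu_n = level

    def cart_address(self, ppu_address):
        """Address seen by the cart-side RAM: the counter when /control is low."""
        if self.control_n:
            return ppu_address & self.ADDRESS_MASK
        return self.counter
```

With the address latched at $2000, three /wppu pulses would write at $2000, $2001, and $2002 in this model and leave the counter at $2003; raising /control hands the bus back to the PPU.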


Posted: Wed May 08, 2019 7:04 pm
Site Admin

Joined: Mon Sep 20, 2004 6:04 am
Posts: 3737
Location: Indianapolis
It sounds interesting, if it can work. What concerns me is that I don't think you can safely rely on the address lines being stable early. From discussions I've had with kevtris (while he was designing the Hi-Def NES), the address lines and M2 are going to be problematic if you want to be compatible with all consoles. Just be aware that if something tricky like this works on the G-revision CPU, that doesn't mean it will work just as well on all the other revisions. It has been a while, but I have seen some of kevtris' logic captures for comparison, and I seem to remember M2 especially looking different on the older-revision CPUs. Also, the M2 cycles during operations like OAM DMA and DPCM playback might be really different from a "normal" cycle. It's enough to cause bus conflicts when the PRG bus is outputting fast enough. I've seen OAM DMA get corrupted when using a ROM emulator, and the earliest version of the NES PowerPak had a similar problem. Both were solved by putting (IIRC) 300 ohm resistors in series with the data bus.

Once I have a testbed for it, one thing I'd like to try is using OAM DMA to copy to multiplexed CHR-RAM. 2 cycles per byte that way, not too bad. I hadn't thought of that before seeing this thread, but maybe one could take over both read/write cycles of the DMA?

The Game Genie is also doing some multi-operation-per-cycle type stuff. Because it does that data comparison, it must be outputting data from the cartridge, doing the comparison and, if it matches, shutting off the cartridge bus and outputting its own data. Not really any relation to this, other than being unusual things on the CPU bus.


Posted: Wed May 08, 2019 7:52 pm

Joined: Thu Apr 18, 2019 9:13 am
Posts: 161
Memblers wrote:
It sounds interesting, if it can work. What concerns me is that I don't think you can safely rely on the address lines being stable early. From discussions I've had with kevtris (while he was designing the Hi-Def NES), the address lines and M2 are going to be problematic if you want to be compatible with all consoles. Just be aware that if something tricky like this works on the G-revision CPU, that doesn't mean it will work just as well on all the other revisions. It has been a while, but I have seen some of kevtris' logic captures for comparison, and I seem to remember M2 especially looking different on the older-revision CPUs. Also, the M2 cycles during operations like OAM DMA and DPCM playback might be really different from a "normal" cycle. It's enough to cause bus conflicts when the PRG bus is outputting fast enough. I've seen OAM DMA get corrupted when using a ROM emulator, and the earliest version of the NES PowerPak had a similar problem. Both were solved by putting (IIRC) 300 ohm resistors in series with the data bus.


I would expect that delaying M2 slightly would be a good idea so as to ensure that /ROMEN becomes available before the CPLD decides what to do with addresses. On the Atari 2600, the address bus gets driven early in the cycle, but I could imagine that OAMDMA could be problematic. If so, however, that may simply mean that one can't use DMA audio playback during data transfers. My idea of being able to use audio IRQs probably wouldn't work without extra logic in the CPLD because IRQ handling would start with two ignored code fetches.

A two-cycle-per-byte approach would be to add logic to the CPLD so that the first access in the zone will use a normally-banked address while latching data, and the next will use a modified address but not latch. From outside the zone, load the first byte of the zone and then jump there. The CPU would then fetch the first byte, ignore the second (allowing it to be used to supply PPU data), re-fetch the second as an opcode, ignore the third, etc. If a single-cycle-per-byte approach couldn't work, the two-cycle approach could probably be used instead, but I think the single-cycle approach would probably be simpler and better.
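The access pattern for the two-cycle variant can be sketched the same way. The Python below is only an illustration under assumptions: `two_cycle_transfer` is a made-up name, the code running in the zone is taken to be one-byte, two-cycle instructions (opcode fetch plus one discarded read of the following byte), and the CPLD is reduced to a toggle that alternates between "latch from the normally banked address" and "substitute the modified address".

```python
def two_cycle_transfer(data_zone, length):
    """Return (latched_bytes, opcode_offsets) for `length` one-byte instructions."""
    # Access order: the priming load of offset 0 from outside the zone, then,
    # for each instruction at offset k, an opcode fetch of k followed by a
    # discarded read of k + 1.
    accesses = [0]
    for k in range(length):
        accesses += [k, k + 1]

    latched, opcode_offsets = [], []
    latch_next = True  # CPLD toggle: latch, then substitute, and repeat
    for offset in accesses:
        if latch_next:
            # Normally banked address: this byte goes to the PPU side.
            latched.append(data_zone[offset])
        else:
            # Modified address: the CPU sees code here, nothing is latched.
            opcode_offsets.append(offset)
        latch_next = not latch_next
    return latched, opcode_offsets
```

Under these assumptions each data byte is latched exactly once, on the priming load or on a discarded read, while every opcode fetch lands on a substituted address, giving one byte per two cycles.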

BTW, audio IRQs with the single-cycle approach could probably be accommodated by having the CPU-side CPLD keep track of whether the last byte came from an even or odd address and only pay attention to accesses of the other kind. If DMA addresses are resolved late, they may still pose unsolvable problems, but fast graphics would be cool even if they couldn't coexist with audio playback.

Can you point me to any timing measurements or other precise specifications about the sub-cycle behavior of the CPU and PPU?


Posted: Thu May 09, 2019 5:53 am

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21634
Location: NE Indiana, USA (NTSC)
One possibility is OAM DMA twice during vblank: once with the data to send to the pseudo-dual-port video memory on the cartridge, and again with the actual data to send to OAM. This way, CPU A7-A0 could act as the counter, and the cart would snoop the first DMA to transfer one byte per two cycles.



Posted: Thu May 09, 2019 6:53 am

Joined: Thu Apr 18, 2019 9:13 am
Posts: 161
tepples wrote:
One possibility is OAM DMA twice during vblank: once with the data to send to the pseudo-dual-port video memory on the cartridge, and again with the actual data to send to OAM. This way, CPU A7-A0 could act as the counter, and the cart would snoop the first DMA to transfer one byte per two cycles.


I'm not sure I really see the advantage here unless the timing of OAMDMA is more favorable (I see no reason to expect it to be, but without seeing scope traces I can't tell). During the second cycle of each transfer the main bus is going to hold an address that's useless to the cartridge, so the cart would have to latch the LSB of the first-cycle address in order to do anything useful in the second cycle. A key aspect of my intended approach is that the console would drive the low part of the address bus so the cart wouldn't have to output it at all.

For compatibility with the widest range of consoles, I would think it would probably be best to ignore OAMDMA except for ensuring that the cart is fast enough to process those cycles "normally".

