Adding features to discrete mapper with multipurposed CIC


Memblers
Site Admin
Posts: 4044
Joined: Mon Sep 20, 2004 6:04 am
Location: Indianapolis

Re: Adding features to discrete mapper with multipurposed CI

Post by Memblers »

Oh, that is nice, I didn't know about these. I rarely peeked into Digi-Key's "multi-purpose logic" category, assuming it would be oddball stuff. But that definitely is handy. I was about to use a 1G32 OR gate in something; that seems kind of foolish now, since the 1G97 will do the same and more for the same price. The only cost is connecting one extra pin to VCC/GND, and that's nothing. Looks like there are also 1G57, 58, 96, 98 and 99 parts with other assorted functions. Cool find.
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Re: Adding features to discrete mapper with multipurposed CI

Post by infiniteneslives »

Been making some progress on my CIC implementation this week. I've got a fully functioning multi-region CIC implementation running on the STM8, fully tested with all 4 region CICs. The current implementation has the mcu core clocked by the 4MHz CIC clock, so the core is synchronous and uses cycle counting to ensure timing. Code changes became much simpler once I realized that the pipeline placement/timing of the code following my changes wouldn't be affected as long as my changes were always in multiples of 4 bytes.

Learned a few things I was asking myself earlier in this thread, so I wanted to share that info. Perhaps at some point I'll make my investigation more complete and post differences between 6113 and 3193 CIC versions on the RE thread.

Segher's disassembly of the NES CIC seems to be of a 'cartridge CIC' KEY-only 6113 version. When looking through his disassembly I was confused as to how it could all work if that code was running on the console's LOCK. In his disassembly, the LOCK sets its output high shortly after resetting the KEY. And if you read the disassembly from the KEY's perspective, that Dout pulse after reset sends the KEY off to perform the 'magic' code that has an unknown opcode. If the KEY were to perform that code, its timing would end up being off (behind) regardless of what that unknown instruction does.

As discussed in my earlier post in this thread, I wasn't seeing the LOCK take its Dout high after reset with the logic analyzer. Which means the code in segher's disassembly doesn't go off and perform that magic code, so its timing is fine and everything works. At the time I wondered why Dout wasn't getting set high after resetting the KEY as the disassembly showed. Just recently I socketed the CIC on my console and replaced the 3195 CIC (always found in consoles as the LOCK) with a 6113, which is only found in cartridges. When watching signals with the logic analyzer, lo and behold, the LOCK was setting Dout high after resetting the KEY, just as segher's disassembly shows. And the stream ID is delayed as one would expect due to the magic code being run. If I put the 3195 in the cartridge, nothing works, just as kevtris found.

Knowing all this, it now makes sense why the only 6113/3195 combo that doesn't work is 6113 as LOCK with 3195 as KEY. This also explains why the KEY's Dout can't be too early, even though segher's disassembly suggests that it can be. The 3195 panics if KEY's Dout is high more than a couple cycles before the bit is supposed to be transferred. The 6113 doesn't make this check, but the 3195 must.
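One way to read the compatibility result above is as a two-condition rule. The Python below is only a simplified model of that reasoning, not a description of the actual CIC firmware: it assumes the only relevant differences are that a 6113 acting as LOCK raises Dout right after resetting the KEY, and that a 3195 treats an early Dout from its partner as a fault. The table names are made up for illustration.

```python
# Simplified model only: which LOCK/KEY pairings boot, based on the observations above.
# Assumptions: a 6113 acting as LOCK raises Dout right after resetting the KEY,
# and a 3195 treats a partner's Dout that goes high too early as a fault.
RAISES_DOUT_AFTER_RESET_AS_LOCK = {"6113": True, "3195": False}
CHECKS_FOR_EARLY_DOUT = {"6113": False, "3195": True}


def combo_works(lock: str, key: str) -> bool:
    """Return True if this LOCK/KEY pairing is expected to boot under the model."""
    early_pulse = RAISES_DOUT_AFTER_RESET_AS_LOCK[lock]
    key_panics = CHECKS_FOR_EARLY_DOUT[key]
    return not (early_pulse and key_panics)


for lock in ("3195", "6113"):
    for key in ("3195", "6113"):
        status = "boots" if combo_works(lock, key) else "fails"
        print(f"LOCK {lock} / KEY {key}: {status}")
```

Under these two assumptions the model reproduces the observation that only the 6113-LOCK / 3195-KEY pairing fails.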

Beyond that, I tested out my idea of dual-purposing an mcu pin for the reset and KEY Din signals. Wiring them together as a wired-OR worked for the most part. Everything looked good on the logic analyzer, and my implementation was able to pick up on the reset signal embedded in the KEY Din stream. But for some reason it wasn't the most stable: pressing reset multiple times would cause problems. If I kept tapping reset it would come back and start working, but eventually fall out again as I kept tapping. I'm really not sure what the problem was. Everything looked good in the LA captures, but at times it was falling out after a few bit transfers. I probably should have hooked up the oscilloscope to get a better look at the signals, but my first guess is that there is too much contention between the ORed output drivers on the console CIC. I wondered if using a proper logic OR gate would resolve the issue, but then came up with a better idea: I ended up wire-ORing the KEY Dout and reset signals together instead. This worked great; the mcu just changes its DDR for the pin after the reset signal is latched. With this trick cutting the mcu I/O down to 3 pins, it was every bit as stable as my 4-pin version.
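A minimal sketch of the 3-pin trick's pin handling, written as a Python model rather than STM8 code. The class and method names are hypothetical, and it assumes the reset signal is active high and that "latched" means reset was seen asserted and then released; on the real part the `is_output` flag would be the pin's DDR bit.

```python
class SharedResetDoutPin:
    """Hypothetical model of one mcu pin on the wire-ORed reset/KEY Dout net.

    Assumes reset is active high: the pin starts as an input, and once reset
    has been seen asserted and then released ("latched"), the DDR is flipped
    so the same pin drives KEY Dout.
    """

    def __init__(self) -> None:
        self.is_output = False   # DDR bit: input while waiting for reset
        self.seen_reset = False
        self.out_level = 0

    def poll(self, line_level: int) -> None:
        """Sample the net while still an input; flip DDR once reset is latched."""
        if self.is_output:
            return
        if line_level:
            self.seen_reset = True          # reset asserted
        elif self.seen_reset:
            self.is_output = True           # reset released: take over as Dout

    def drive_dout(self, level: int) -> None:
        if not self.is_output:
            raise RuntimeError("pin is still an input; reset not latched yet")
        self.out_level = 1 if level else 0


pin = SharedResetDoutPin()
for level in (0, 1, 1, 0):       # idle, reset asserted, still asserted, released
    pin.poll(level)
pin.drive_dout(1)
```

The point of the model is the ordering: the pin never drives the net until reset has come and gone, so the mcu's output can't fight the console's reset.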

So the next step is to give an asynchronous implementation a try, ditching the 4MHz CIC clock and letting the mcu run off its internal 16MHz oscillator. I suppose it would be best to first ensure there aren't any issues with the whole idea of having the NES CPU read and write directly from the mcu's pins as a mapper register. Without that working, there isn't much value in an asynchronous STM8 CIC being dual-purposed as a discrete mapper 'co-processor'.

Even without the asynchronous solution, the current implementation will be of good use for my non-discrete mapper designs. Having an mcu at my disposal for boot-time tasks can be rather useful, effectively providing NVM to a CPLD which doesn't have any internal flash memory available. This will help things like my VRC board work for all variants without needing to reconfigure the CPLD. It will also allow an MMC1 board to work as multiple configurations that are normally incompatible with each other, such as SEROM, SXROM, SNROM, SKROM, etc.

I've also been eyeing the STM32F030 as a dual-purpose CIC lately, and there are quite a few reasons it's a better choice than the STM8, depending on one's goals. The STM8 still looks to be the better choice for expanding a discrete mapper. But when looking to expand the abilities of an ASIC mapper, the STM32's features start to shine.
  • First reason is that the STM32 has more 5V-tolerant I/O than the STM8 when powered by 3V. And powering the dual-purposed CIC from 3V becomes desirable when the CPLD isn't 5V tolerant itself. I'd rather not need to worry about level shifting signals between the mcu and CPLD.
  • The other nice feature of the STM32 is that its PLL is able to multiply the external clock input. The PLL can multiply the external clock from 4MHz up to the max core frequency of 48MHz; this can't be done with the STM8. So the STM32 can be fed the 4MHz CIC clock and have a synchronous timer ensuring proper CIC transaction timing. This avoids timing drift issues between the mcu and console CIC.
  • Lastly, the STM32 just has a lot more horsepower with its 48MHz 32-bit Cortex-M0 core, compared to the 16MHz 8-bit STM8. Semi-complex synths and high-speed external interfaces (SD card, USB, Bluetooth, Wi-Fi, etc.) are all more feasible with the STM32. Additionally, the STM32 has better toolchains available in terms of free C compilers and libraries.
So I'm not sure where I'll take this next, but that's where I'm currently at. I'll probably have to shelve this project for a bit while I focus on some other high-priority projects. Now that I've got a basic STM8 CIC working, I can convert all my designs away from the ATtiny13, which was my original goal.
If you're gonna play the Game Boy, you gotta learn to play it right. -Kenny Rogers
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Re: Adding features to discrete mapper with multipurposed CI

Post by infiniteneslives »

Having a hard time putting down this whole idea of a discrete mapper with CIC mapper expansion. The infinite number of possibilities that could be unlocked without increasing the BOM cost by a single cent makes it hard to keep myself from daydreaming about it. I've come up with what I think is a fairly clean way to handle mcu mapper register reads and writes. But until someone comes to me wanting to write software targeting this idea, I have a hard time motivating myself to devote the time to fully developing it. On top of that, the idea of implementing this in an emulator on anything but a highly abstracted level sounds like living hell. So for now, I'll just document my idea here in public as best I can. There's a good chance I'll use it as a reference in the future if there's outside interest and this idea does become a reality. Or perhaps someone else would like to take my idea and run with it, which I'm perfectly fine with.
infiniteneslives wrote:My proposed pin assignments would allow for 4bit nibble wide read/writes at a minimum. If one wasn't looking to utilize the UART then the entirety of PORT D could be used for 6bit wide accesses.

There is a problem though as we can't be certain the mcu is always able to listen to writes to $6000. The mcu could be currently interrupted by CIC comms which must have a higher priority. I can't think of a very clean way to get around this without adding dedicated logic.
Now that I'm more familiar with the STM8 and the CIC's requirements, I've got a better idea of how to handle R/W accesses from the NES CPU. The key is making the mcu register r/w interrupt a higher priority than the CIC comm timer. This is possible because of the relatively large 6.7usec window we have to output a CIC stream bit; let's just call it a 5usec window to be conservative. With that large a window, there is time to service a potential CIC transfer inside the mcu mapper register r/w ISR, *if* there's an explicit definition of how to r/w the mcu mapper reg.

My hardware proposal is still similar to my original idea of dedicating one of the discrete mapper flip-flop bits to interrupt the mcu. Let's say $8000.7 for discussion's sake. The NES CPU must set this bit, then r/w the mcu register, and clear $8000.7 in rapid succession. If we explicitly state how this instruction sequence is to be performed, it also provides the benefit of simplifying address decoding. We can actually create a large number of mcu registers effectively decoded by NES CPU address, while only utilizing CPU R/W and CPU D0-3 as mcu inputs. But how??

This is my NES CPU instruction sequence proposal on how to write a byte to the mcu:

Code:

;preparation for the write is not timing sensitive
;(mcu_value and bank_shadow are hypothetical RAM variables)
LDA mcu_value   ;byte we'd like to write to the mcu
TAY             ;Y holds the lower nibble in bits 3-0
LSR A           ;shift right 4 times so the upper nibble
LSR A           ;lands in bits 3-0, which is all the mcu sees
LSR A
LSR A
TAX             ;X holds the upper nibble in bits 3-0
LDA bank_shadow ;current bank of the discrete mapper register
ORA #$80        ;set bit 7 (this bit set @ $8000.7 interrupts the CIC mcu)

;now that everything's prepared, perform the mapper write:
STA $8000   ;write to discrete mapper with bit 7 set
STY $5000   ;write lower nibble to mcu
STX $5000   ;write upper nibble to mcu
AND #$7F    ;clear bit 7 so we can disable mcu's interrupt
STA $8000   ;write to discrete mapper with bit 7 clear, CIC mcu interrupt complete
So we're defining the exact sequence of what must be done whenever $8000.7 is set. We know a STY and STX instruction will immediately follow the setting of $8000.7. And the NES CPU won't waste any time clearing $8000.7 once the write is complete. This creates very specific timing constraints from the CIC mcu's perspective.

And since we know exactly which NES CPU instructions are being used, we can simplify address decoding by sniffing the CPU data bus alone. The mcu doesn't need to decode any CPU address lines with this trick, but it has visibility of whichever CPU data pins it's connected to. For this implementation, I've chosen to only connect the STM8 to the lower nibble, CPU D0-3. The TSSOP-20 STM8 doesn't have a full 8-bit-wide GPIO port pinned out; the most it gives is 6 pins with PORT D1-6.

By sniffing D0-3 during the STX/STY operand fetches, we can glean CPU A3-0 (from the address low byte) and CPU A11-8 (from the address high byte) for the upcoming write cycle. We can afford to cut out CPU A13/14 for mcu decoding purposes, as the mcu is no longer listening for $6000-7FFF as in my original idea. Here the mcu can only decode CPU A11-8 & A3-0. But that's pretty legit, as it gives us the potential for 256 mcu mapper registers to work with. To be clear, CPU A15-A12 aren't actually being decoded with this implementation. Selecting $5000-5FFF for the location of the mcu register is arbitrary; that's simply a convenient address space which doesn't conflict with anything else in the NES CPU memory map.
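To make the decoding concrete, here's a small Python sketch of what the mcu would see for a given register address. It only models the data bus during the STY/STX operand fetches (T1 carries the address low byte, T2 the high byte), and the function names are illustrative, not part of any existing code.

```python
def sniffed_nibbles(addr):
    """Return (A3-0, A11-8) as seen on CPU D3-0 during an absolute STY/STX.

    T1 fetches the operand's low byte (ADL) and T2 its high byte (ADH); the
    mcu only sees bits 3-0 of each, so A7-4 and A15-12 are never observed.
    """
    adl = addr & 0xFF
    adh = (addr >> 8) & 0xFF
    return adl & 0x0F, adh & 0x0F


def mcu_reg_index(a3_0, a11_8):
    """Combine the sniffed nibbles (high_addr:low_addr) into a register number 0-255."""
    return (a11_8 << 4) | a3_0


low, high = sniffed_nibbles(0x5A0B)
print(hex(mcu_reg_index(low, high)))
```

Because A15-12 and A7-4 never reach the mcu, $5A0B and $DA3B would select the same register; only the $5x0x window keeps the 6502 side unambiguous.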

Since the mcu can sniff D3-0 during the STX/STY opcode fetch, it can differentiate between STY and STX by the lower nibble of their opcodes, $8C and $8E respectively. So we don't have to require that both the upper and lower nibble always be written, nor in a specific order. We just have to pick a convention of X/Y being Hi/Lo nibbles and stick with it.
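The opcode trick relies on standard 6502 encodings: STY abs is $8C, STX abs is $8E, LDY abs is $AC and LDX abs is $AE, so the low nibble alone says X or Y. Note it does not separate loads from stores ($8C and $AC share a C); that is left to the R/W line going low in T3. A hypothetical helper illustrating the mapping:

```python
# Standard 6502 absolute-mode opcodes for the index-register loads/stores.
ABS_OPCODES = {0x8C: "STY", 0x8E: "STX", 0xAC: "LDY", 0xAE: "LDX"}


def register_from_opcode_nibble(nibble):
    """Map the opcode's low nibble (sniffed on D3-0 during T0) to X or Y."""
    if nibble == 0xC:
        return "Y"   # STY/LDY abs: lower nibble by the proposed convention
    if nibble == 0xE:
        return "X"   # STX/LDX abs: upper nibble by the proposed convention
    raise ValueError(f"not an absolute STX/STY/LDX/LDY opcode nibble: {nibble:#x}")


for opcode, mnemonic in ABS_OPCODES.items():
    reg = register_from_opcode_nibble(opcode & 0xF)
    print(f"${opcode:02X} {mnemonic}: low nibble {opcode & 0xF:X} -> {reg}")
```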

The requirement to clear $8000.7 ASAP comes from the fact that we can't tie up too much of the mcu's time, as the $8000.7 interrupt is set as the mcu's highest-priority interrupt. So we have to free it up so it can get back to CIC mangle calculations and such in its main thread. While there's an abundant amount of time that could allow more than 1 byte to be written at once, things get complex quickly when trying to define a larger r/w routine with explicitly defined timing for the mcu.

If one had an application where larger transfers were desired, my idea of requesting the CIC mcu to interrupt the NES CPU when there's a sufficiently large period of time in which CIC comms can be ignored is the better solution, and would be relatively easy to implement. But we need a means to transfer a single byte before we can solve the kilobyte transfer problem.

Before we get too far, I want to come up with a definition of our convention for mapper register reads. I'm expecting that this can be pulled off somehow, although details on the best way to do it didn't start to come to me until I started thinking about how the mcu ISR would work. The CIC mcu ISR for register r/w gets tricky quickly. There are a lot of things it needs to ensure, and they're all timing sensitive. One of the biggest issues becomes accounting for the 5-cycle jitter in when the ISR starts executing. Putting more burden on the ISR with tasks like determining whether the 6502 is reading or writing really starts to become a challenge. This hardware definition doesn't leave us much option to use separate ISRs for reads and writes. The only good way to have separate R/W ISRs would be to devote another discrete mapper flip-flop bit: one for reads, one for writes. I don't much like that idea though; we may not have bits to spare.

Here's my KISS solution that combines the NES CPU mcu register reads into the same routine with writes:

Code:

pseudo code as preparation for write is not timing sensitive, just trying to illustrate idea:
1) load A with byte that would like to write to mcu (can skip to step 5 if only care about reading from mcu)
2) transfer A to Y register (Y will contain the lower nibble to write to mcu reg)
3) shift A register to the right 4 times (places upper nibble of mcu reg value in bits 3-0)
4) transfer A to X register (X will contain the upper nibble that'll be written to mcu reg, but it's placed in bits 3-0 as that's all the mcu sees)
5) load A with current bank of discrete mapper register
6) set bit 7 of A (this bit being set @ $8000.7 will interrupt the CIC mcu for mapper r/w)

;now that everything's prepared, perform the mapper write:
STA $8000   ;write to discrete mapper with bit 7 set
STY $5x0x   ;write lower nibble to mcu
STX $5x0x   ;write upper nibble to mcu
;write complete, now read back the old value that was in the mcu register
LDY $5x0x   ;read old value from mcu register (lower nibble)
LDX $5x0x   ;read old value from mcu register (upper nibble)
AND #$7F    ;clear bit 7 so we can disable mcu's interrupt
STA $8000   ;write to discrete mapper with bit 7 clear, CIC mcu interrupt complete

;At this point we've effectively completed a SWAP operation between X/Y registers lower nibbles and mcu mapper register $5x0x
This may seem a little confusing: why are we writing and then reading? And what if you didn't want to overwrite the value of a register, and only wanted to read it? My thought is that the mcu register definitions would overcome this issue. We've got up to 256 registers to work with, so just define them as read-only or write-only as needed. The NES CPU code you're writing probably only cares about either the read or the write, but by using a swap operation we can kill two birds (read & write) with one stone (the mcu ISR).
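The swap semantics from the mcu's side can be modeled in a few lines of Python. This is a hypothetical sketch: the class name, the read-only/write-only flags, and the choice that a write-only register reads back as 0 are all assumptions for illustration.

```python
class McuRegisterFile:
    """Hypothetical mcu-side model of the STY-STX-LDY-LDX swap convention.

    Each of the 256 registers (indexed A11-8:A3-0) can be marked read-only
    or write-only; a write-only register reads back as 0 here (an assumption).
    """

    def __init__(self):
        self.values = [0] * 256
        self.readable = [True] * 256
        self.writable = [True] * 256

    def swap(self, index, lo_nibble, hi_nibble):
        """Latch the nibbles from STY/STX, return (lo, hi) for LDY/LDX."""
        old = self.values[index] if self.readable[index] else 0
        if self.writable[index]:
            self.values[index] = ((hi_nibble & 0xF) << 4) | (lo_nibble & 0xF)
        return old & 0xF, old >> 4


regs = McuRegisterFile()
regs.swap(0x12, 0xB, 0xA)          # write $AB; old value (0, 0) is ignored
print(regs.swap(0x12, 0x4, 0x3))   # write $34, read back the previous $AB as (lo, hi)
```

A read-only register simply ignores the STY/STX nibbles, so a 6502 routine that only wants to read can store anything in X/Y first.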

Additionally, I'm going to discard my earlier idea that the mcu will decode STY/STX by sniffing the opcode. As I get into the details of the ISR, it's clear that the more we can simplify by convention in the 6502's r/w routine, the easier life is for the STM8. So for discussion's sake we'll require the sequence STY-STX-LDY-LDX as laid out in the routine above. Additionally, we'll effectively require that routine to be copy-pasted into 6502 assembly code, with the only possible change being the mcu register address. The x's in $5x0x denote address nibbles that can be modified, but the addresses of all four loads/stores must match. The mcu ISR isn't going to have time to decode each one and adapt on the fly. If the 6502's read/write routine is running from ROM, this definition would require a separate routine for each register. That may not be an issue if only using a few registers. A more versatile way would be to execute the routine from SRAM and use self-modifying code to change the absolute address of the STY-STX-LDY-LDX instructions prior to executing the read/write routine.
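For the self-modifying route, the routine is just 21 bytes of standard 6502 machine code, and only the four absolute operands need patching. The Python below assembles the routine and patches those operands the way 6502 code would before calling it from SRAM; the function names are made up, and it assumes A already holds the mapper bank with bit 7 set from the preparation steps.

```python
# Standard 6502 opcodes used by the fixed routine.
STA_ABS, STY_ABS, STX_ABS = 0x8D, 0x8C, 0x8E
LDY_ABS, LDX_ABS, AND_IMM, RTS = 0xAC, 0xAE, 0x29, 0x60

# Byte offsets of the four absolute operands that self-modifying code patches.
OPERAND_OFFSETS = (4, 7, 10, 13)


def build_rw_routine(reg_addr):
    """Assemble STA $8000 / STY,STX,LDY,LDX reg_addr / AND #$7F / STA $8000 / RTS.

    A must already hold the mapper bank with bit 7 set, as in the prep steps.
    """
    lo, hi = reg_addr & 0xFF, (reg_addr >> 8) & 0xFF
    code = bytearray([STA_ABS, 0x00, 0x80])
    for opcode in (STY_ABS, STX_ABS, LDY_ABS, LDX_ABS):
        code += bytes([opcode, lo, hi])
    code += bytes([AND_IMM, 0x7F, STA_ABS, 0x00, 0x80, RTS])
    return code


def patch_register(code, reg_addr):
    """Mimic the stores the 6502 would make into its SRAM copy of the routine."""
    for off in OPERAND_OFFSETS:
        code[off] = reg_addr & 0xFF
        code[off + 1] = (reg_addr >> 8) & 0xFF


routine = build_rw_routine(0x5000)
patch_register(routine, 0x5A0B)
print(routine.hex(" "))
```

On the 6502 side the patch is eight STA instructions into the SRAM copy, which is cheap compared to keeping one ROM routine per register.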

Now to try and explain how all this would work from the CIC mcu's perspective... Now that we've got an explicitly defined timing of bus operations from the time the mcu receives its $8000.7 interrupt, we can use cycle counting within the mcu mapper r/w ISR to latch address and data from the NES CPU. But since this ISR is designed to have a higher priority than the dedicated CIC comm ISR, the mapper r/w ISR must also handle any necessary CIC comms should they be needed while it's running.

I've gotten into the details of how the STM8 CIC KEY would run asynchronously from the console's LOCK in previous posts in this thread. The basic idea is that there's an mcu timer used for counting down to when the next CIC transfer needs to occur. My plan is to use TIM2 for this purpose, which in reality can only count up, but a little math can turn that around. The timer's ISR will account for clock drift by polling the LOCK's Dout when it's expected to be high. That ISR will also set/clear KEY Dout as necessary, but it's a lower-priority routine than the mcu register r/w ISR I'm about to discuss.
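The "math can turn that around" part amounts to modular arithmetic on a 16-bit up-counter. A sketch in Python, assuming TIM2 is left free-running over its full 16-bit range (that configuration, and the helper names, are assumptions):

```python
MASK16 = 0xFFFF   # TIM2 is treated here as a free-running 16-bit up-counter


def ticks_until(deadline, counter):
    """Ticks remaining before the up-counter reaches deadline (wraps mod 2**16)."""
    return (deadline - counter) & MASK16


def is_due(deadline, counter):
    """True once counter has reached or passed deadline.

    Valid as long as deadline and counter are within 32768 ticks of each other.
    """
    return ((counter - deadline) & MASK16) < 0x8000


deadline = (0xFFF0 + 0x20) & MASK16   # schedule a comm 32 ticks ahead, across the wrap
print(hex(deadline), ticks_until(deadline, 0xFFF0), is_due(deadline, 0xFFF0))
```

Doing the subtraction modulo 2**16 keeps the countdown correct even when the deadline lands on the far side of the counter's wraparound.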

The CIC mcu is running at 16MHz with a 62.5nsec period, and the NES CPU is running at 1.79MHz with a period of 559nsec (assuming worst-case NTSC). So there are ~8.9 STM8 cycles per 6502 cycle. And we've got a 5usec window in which a CIC bit must be output when needed. That window equates to ~8.9 cycles on the 6502 and 80 cycles on the STM8. So it looks as though we've got plenty of time to get everything done if our ISR is smart enough.
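The cycle arithmetic above can be double-checked with a few lines of Python. It assumes the standard NTSC CPU clock (the 21.477272 MHz master clock divided by 12); PAL runs slower, which is why NTSC is the worst case here.

```python
STM8_HZ = 16_000_000
NES_NTSC_CPU_HZ = 21_477_272 / 12    # NTSC 2A03: master clock / 12, ~1.79 MHz
WINDOW_S = 5e-6                      # conservative CIC bit output window

stm8_period_ns = 1e9 / STM8_HZ
nes_period_ns = 1e9 / NES_NTSC_CPU_HZ
stm8_per_6502 = STM8_HZ / NES_NTSC_CPU_HZ
window_stm8 = WINDOW_S * STM8_HZ
window_6502 = WINDOW_S * NES_NTSC_CPU_HZ

print(f"STM8 period {stm8_period_ns:.1f} ns, 6502 period {nes_period_ns:.0f} ns")
print(f"{stm8_per_6502:.2f} STM8 cycles per 6502 cycle")
print(f"5 us window: {window_stm8:.0f} STM8 cycles, {window_6502:.1f} 6502 cycles")
```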

Here's some pseudo code and STM8 assembly to give a timeline of how I picture the ISR working; cycle numbers on the left are STM8 cycles. I'm sure there are some errors in the exact timing of everything, but this gets the idea across.

Code:

0: NES CPU sets $8000.7 to trigger ISR (6502 end of STA $8000 cycle T3)

1-6: complete instruction in execute cycle (1-6 cycles)  -Ooof!  we'll have to account for that potential jitter...

2/7-11/16: push registers to stack (9 cycles)

8-17: jump to ISR (docs not explicit on # of cycles, assuming it's 1 cycle like the JMP instruction)

9-18: start executing ISR 
       Oops!  The 6502 has executed ~1-2 cycles by this point..  
       We don't have a good way to ensure we can sniff T0 & T1 of the first STX/STY, which means we don't know if it's STX/STY
       One possible solution would be to define that a NOP is required between STA $8000 and STX/STY.
          -No one likes wasting time!  And this still doesn't solve the jitter issue.
       Another would be to just make it convention that STY is first, however ADL in T1 (our ability to sniff CPU A3-0) may have passed us by.
          -This is the reason I made the decision to nix the ability to handle different orders of STX/STY, and require all addresses to match.
       We also have to account for the 5 cycle ISR latency jitter, and get aligned with the 6502.
       We could let the ISR spin polling CPU R/W and align itself when it goes low.
          -This is half of the reason why STore is first, and LoaD is second.
          -Other half of reason is logically this is only way X/Y registers can be preserved during a SWAP.
       Perhaps it's for the best that STY T0 & T1 have passed us by as we didn't yet have a way to account for ISR jitter anyway

So at this point we know PRG R/W will go low around cycle 27, but we're somewhere between cycle 9-18 and don't know where..
Additionally every ~80 STM8 (or ~8 6502) cycles we need to check the CIC comm timer and output a bit if necessary.

;spin until R/W low for STY T3
rw_still_high:
BTJT    rw_port, #rw_bit, rw_still_high    ;2/3 cyc

STY cycle T3 starts around STM8 cycle 27
Now we've accounted for jitter and we're ~29 STM8 cycles since 6502 set $8000.7

;Delay a few cycles until CPU D3-0 should be valid for STY T3
30:   NOP, NOP...

;Latch CPU D0-3 for STY T3
33:   MOV  low_wr_data, data_port

6502 is about to go from STY T3 to STX T0 (occurs at STM8 cycle ~36) , this is a good time to handle a CIC comm if needed.

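;STM8 assembly rough idea of how to check if it's time to output a CIC comm (~6-7 STM8 cycles)
LDW    X, TIM2_CNTR        ;2cyc  read 16-bit TIM2 counter
CPW    X, #$FFF0           ;2cyc  unsigned compare against comm threshold
JRULT  no_comm_needed      ;1/2cyc  skip while counter < threshold
MOV    Dout_port, out_val  ;1cyc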
no_comm_needed:

"reset" count for CIC comm window.

We're in the middle of STX T0 currently.  Delay until we can sniff ADL from STX T1
NOP, NOP...

;Latch CPU D0-3 for STX T1 to sniff CPU A3-0
50:   MOV  low_addr, data_port

;Delay till STX T2 to sniff CPU A11-8
NOP, NOP...
59:   MOV  high_addr, data_port

;Delay till STX T3 to latch upper nibble of mcu register write
NOP, NOP...
68:   MOV  high_wr_data, data_port

69-99:
All data has been latched for mcu register write, we also know CPU A11-8 & A3-0 for upcoming register read.
We'll assume that the register address can be mapped to a fixed block of STM8 SRAM.
During this time we'll consume a few STM8 cycles to piece together latched high_addr:low_addr
and map that to an STM8 address we can set the X register to point to.
Copy, shift, and mask the lower nibble of data into data_output_port for upcoming LDY T3
Copy, shift, and mask the upper nibble of data into SRAM for quick access when it's time for LDX T3.

Perhaps 30 cycles isn't enough time to handle all that, but it should be for simple tasks.
Worst case require a NOP inserted between STX-LDY if needed.
Even better idea: move AND #$7F instruction between STores and LoaD instructions!

100:
register read lower nibble already stored in data port output register.
Set port register DDR to enable register data to drive 6502 data bus D3-0
Delay while 6502 is latching read for LDY T3
107:
disable data port output drivers with mcu DDR

108:
It's been ~72 cycles since we checked if a CIC comm was needed.  Perfect time to check again.

Copy the byte prepared in SRAM during cycles 69-99 to the data port output register

Delay till LDX T3

136:
enable data port output DDR
Delay while 6502 is latching read for LDX T3
142:
disable data port output drivers with mcu DDR

Need to wait for NES CPU to clear $8000.7
This will take 6 cycles on 6502, STM8 can't return from interrupt until complete to prevent re-entry.
Should perform some more CIC comm timer checks during this time.
STM8 IRET takes a whopping 11 cycles, worst case a CIC comm timer interrupt occurs during that IRET.
Need to ensure adequate time for CIC comm timer interrupt to handle a comm that's needed as this ISR returns.
Additionally, this routine leaves KEY Dout high if a comm was needed.
Need to ensure the CIC comm timer ISR will clean up after this routine and clear Dout when no longer needed.

;return back to main thread where CIC mangle operations can continue.
;or whatever request made by the 6502 via this routine can be performed.
IRET
Phew, there you have it! So this mcu register r/w routine could hold the highest interrupt priority on the STM8, with the CIC comm timer at second priority. Any other interrupts would have to have a lower priority than these two, and the STM8 must be set to nested interrupt management mode. That way higher-priority interrupts are able to interrupt lower-priority ones, ensuring mcu register r/w accesses are always serviced and no CIC comms are missed. Beyond all this, one just needs to ensure the mcu isn't overworked and that it has adequate time to complete CIC mangle calculations.

The biggest risk for this would be if the NES programmer were to perform multiple mcu register r/w operations back to back. Would have to do some worst case analysis on the time required for mangle calculations. This entire ISR is ~200 STM8 cycles, which is only ~13usec, that's a relatively small amount of time on the scale of the CIC timing and calculations.
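The ~200-cycle figure can be cross-checked against the timeline by laying the 6502 bus cycles out in STM8 cycles. This sketch assumes a rounded 9 STM8 cycles per 6502 cycle, which is what the timeline's numbers appear to use; with the true ~8.94 ratio, later phases arrive slightly earlier (about one STM8 cycle by LDX T3).

```python
STM8_PER_6502 = 9   # rounded from ~8.94; the timeline's numbers appear to use 9

# 6502 instructions after STA $8000 completes (t = 0 at the end of its T3).
# Absolute loads/stores: T0 opcode, T1 ADL, T2 ADH, T3 data. AND #imm is 2 cycles.
SEQUENCE = [("STY", 4), ("STX", 4), ("LDY", 4), ("LDX", 4), ("AND #$7F", 2), ("STA $8000", 4)]


def phase_start(mnemonic, t):
    """STM8 cycle at which bus cycle T<t> of the named instruction begins."""
    elapsed = 0
    for name, length in SEQUENCE:
        if name == mnemonic:
            if not 0 <= t < length:
                raise ValueError(f"{mnemonic} has no T{t}")
            return (elapsed + t) * STM8_PER_6502
        elapsed += length
    raise KeyError(mnemonic)


def sequence_end():
    """STM8 cycle at which the final STA $8000 (which clears bit 7) completes."""
    return sum(length for _, length in SEQUENCE) * STM8_PER_6502


for name, t in [("STY", 3), ("STX", 1), ("STX", 2), ("STX", 3), ("LDY", 3), ("LDX", 3)]:
    print(f"{name} T{t} starts at STM8 cycle {phase_start(name, t)}")
print("sequence ends at", sequence_end(), "plus 11 cycles of IRET")
```

The latch points in the timeline (50, 59, 68, 100-107, 136-142) each fall inside the matching 9-cycle phase, and the sequence end plus IRET lands right around the ~200 cycle estimate.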

My current mangle table routine is ~100 STM8 instructions, isn't well optimized, takes ~42usec to perform the mangle, and spins for ~30usec waiting for the console CIC to perform its calculations. So that's somewhere around ~60% CPU utilization during the most intensive calculations. During bit transfers, the CIC timer xfr ISR should only need ~5-10% CPU utilization, tops. A conservative estimate would be that 70% of CIC time is mangle calc and 30% is bit transfers. The weighted average STM8 CPU utilization comes out to ~50%, which I consider a rather conservative estimate.

A practical rule that would keep from over utilization would be to require ~20usec (~35 NES CPU cycles) between mcu register accesses.


EDIT: My original estimate was flawed in that it neglected the fact that an asynchronous STM8 CIC would be running at 16MHz, not 4MHz. Here's a better CPU utilization estimate:

Code:

Mangle calculation: 
STM8: 100 instructions = 10.5usec (the ~42usec it takes at 4MHz, scaled to 16MHz)
CIC average mangle period 80usec: 10.5 / 80 = 13% CPU utilization during mangle calculations

Bit transfers:
Estimate timer ISR to run for ~5usec average maximum (time that pulse is high plus drift trimming)
CIC period of bit transfers 79usec: 5 / 79 = 6% utilization during bit transfers

Average time in bit transfers = 8 * 79usec = 632usec
Average time in mangle calcs = 16 * 80usec = 1280usec
% time bit transfers = 632 / 1912 = 33%
% time mangle calc  = 1280 / 1912 = 67%

Weighted utilization:
bit transfers 6% * 0.33 = 1.98%
mangle calc 13% * 0.67 = 8.71%
total utilization 1.98% + 8.71% = 10.7%
So in reality the CIC operations only use ~10% of the STM8's processing time. The register read/write ISR is ~12.5usec; given the time the 6502 will have to spend processing data going in and out, it's going to have a hard time overloading the STM8 with r/w accesses alone. In practice one might set a rule to not let that 12.5usec ISR exceed 75% of the STM8's utilization. That would equate to giving the STM8 ~4usec (~7 NES CPU cycles) between mcu register accesses. That's only a couple of instructions, which isn't really enough to do anything worthwhile between accesses anyway. In practice I wouldn't expect STM8 CPU utilization to become an issue until it started being tasked with compute-intensive work such as sound synthesis, or perhaps large UART data transfers. Those tasks would be given a lower priority than register accesses and CIC comms, so at least they wouldn't risk locking up the console/CIC.
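The revised utilization numbers and the 75% rule can be reproduced in Python. The inputs are the estimates from the calculation above; the helper name `min_gap_us` is made up. Without the intermediate rounding to 6% and 13%, the weighted figure lands near 10.9% rather than 10.7%, which doesn't change the conclusion.

```python
NES_CPU_MHZ = 21.477272 / 12   # NTSC CPU clock, ~1.79 MHz

# Estimates from above (16 MHz asynchronous STM8 CIC).
mangle_us, mangle_period_us, mangle_count = 10.5, 80, 16
xfer_isr_us, xfer_period_us, xfer_count = 5, 79, 8

util_mangle = mangle_us / mangle_period_us      # ~13% during mangle calculations
util_xfer = xfer_isr_us / xfer_period_us        # ~6% during bit transfers

t_xfer = xfer_count * xfer_period_us            # 632 us
t_mangle = mangle_count * mangle_period_us      # 1280 us
total = t_xfer + t_mangle

weighted = util_xfer * t_xfer / total + util_mangle * t_mangle / total


def min_gap_us(isr_us, max_util):
    """Idle time needed after each ISR so that isr / (isr + gap) <= max_util."""
    return isr_us * (1 / max_util - 1)


gap = min_gap_us(12.5, 0.75)
print(f"weighted CIC utilization: {weighted:.1%}")
print(f"min gap between register accesses: {gap:.2f} us "
      f"= {gap * NES_CPU_MHZ:.1f} NES CPU cycles")
```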
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Re: Adding features to discrete mapper with multipurposed CI

Post by infiniteneslives »

Been sinking my teeth into USB lately, and finally got a pretty good grasp of the protocol and everything necessary to implement it. Got me thinking about another possible use for this project. I believe it would be within reason for the STM8 to act as a USB 1.1 host to simple devices like a mouse/keyboard. My idea would be implemented somewhat similarly to V-USB, bit-banging the I/O and requiring very few external components.

I couldn't easily get Google to tell me which version of USB the majority of keyboards/mice use, so I checked a more recent Dell one I have sitting around, and it reported 1.1 in the device descriptor. I'm thinking it would make sense for the majority of them to use 1.1 to be more compatible, with no real need for 2.0 speeds.

In reality though, supporting the PS/2 protocol would be *MUCH* less effort, and probably more stable/compatible. That, and USB-to-PS/2 converters are cheap if one doesn't have a PS/2 keyboard/mouse. But that wouldn't have the same cool factor of pushing one of the lowest-cost mcus on the market (the STM8) to its limits!!

The real annoyance with a cartridge supporting an external peripheral is making a connector accessible for plugging into, and all that can't really be considered 'free' anymore. At that point a $1-3 Bluetooth module starts to make more sense. Bluetooth would have the benefit of letting the CIC mcu talk to the module over SPI, which is about as simple as protocols get. But it has the drawback of compatibility with whatever device the user happens to own, which I can only assume would be a nightmare. Perhaps that's not the case though; I've never tinkered with BT as a developer and only have my user experiences to taint my impression of BT. I started to look into it a while back, and the annoyances with lack of compatibility between BT versions were enough of a deterrent.
lidnariq
Posts: 11430
Joined: Sun Apr 13, 2008 11:12 am

Re: Adding features to discrete mapper with multipurposed CI

Post by lidnariq »

USB HID devices are apparently supposed to be always at 1.5Mbit/s (USB1.0) speeds.

If there's a hub in the way, the hub might retime it to 12Mbit/s speeds.
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Re: Adding features to discrete mapper with multipurposed CI

Post by infiniteneslives »

lidnariq wrote:USB HID devices are apparently supposed to be always at 1.5Mbit/s (USB1.0) speeds.
Curious if you have a link/quote on that; I suppose I should just look at the standard, but I always second-guess what I'm reading with those things. If I've learned anything about USB so far, it's that the standard is more like a guideline, and things work differently in practice than the documents suggest.

USB HID devices are certainly not required to be 1.5Mbit; plenty of projects/products use mcus like the STM32, which only support 12Mbit. They then take advantage of the HID class to get around driver requirements. So I can't see much stopping one from making a 12Mbit USB keyboard. I'd assume keyboards with built-in hubs step up the speed as you mention. It makes sense that keyboard manufacturers would only use 1.5Mbit for compatibility/cost reasons, but IDK what keyboard/mouse manufacturers are really doing...
lidnariq

Re: Adding features to discrete mapper with multipurposed CI

Post by lidnariq »

Hm. I know I had read that somewhere, but I can't find any source corroborating it.

And certainly any of the high speed (1kHz) mice must run at at least 12Mbit/s (although they might default to a lower rate?), so...

Probably best to forget I said that.
na_th_an
Posts: 558
Joined: Mon May 27, 2013 9:40 am

Re: Adding features to discrete mapper with multipurposed CI

Post by na_th_an »

infiniteneslives wrote:I've come up with what I think is a fairly clean way to handle mcu mapper register reads and writes. But until someone comes to me and wants to write software targeting this idea, I have a hard time motivating myself to devote the time to fully developing this idea. On top of that, the idea of implementing this in an emulator on anything but a highly abstracted level sounds like living hell.
What exactly do you need? You know I can write games ;) What kind of features would it support? PRG/CHR banking? 16K/32K? CHR banking? 8K, or finer granularity? Or have I missed it completely and it has nothing to do with this?
infiniteneslives

Re: Adding features to discrete mapper with multipurposed CI

Post by infiniteneslives »

na_th_an wrote:
infiniteneslives wrote:I've come up with what I think is a fairly clean way to handle mcu mapper register reads and writes. But until someone comes to me and wants to write software targeting this idea, I have a hard time motivating myself to devote the time to fully developing this idea. On top of that, the idea of implementing this in an emulator on anything but a highly abstracted level sounds like living hell.
What exactly do you need? You know I can write games ;)
Really, I just need some interest from someone who would like to develop games/tools targeting this hardware to give me some motivation. I know I'm personally not likely to get past a demo proving the hardware functional. The other thing that's needed/helpful is the developer (or someone else) being willing to add emulator support themselves if they feel it's necessary. I can provide a flashable prototype cartridge & programmer for quick build testing on real hardware. I assume creating low-level emu support that actually emulates the limits of the STM8 core properly will be a challenge even for the most seasoned emu developer. So if one wanted to target this hardware, it would be best to test on real hardware frequently anyway.
na_th_an wrote: What kind of features would it support? PRG/CHR banking? 16K/32K? CHR banking? 8K, or finer granularity? Or have I missed it completely and it has nothing to do with this?
As for what features are possible, the specifics are up in the air at the moment. Should things progress from here, I would likely focus on implementing the features said developer(s) were most interested in. This STM8 "CICOprocessor" is only capable of certain types of features if we don't equip the board with any extra logic to help it out. The original idea here is to be able to add the features below to nearly any discrete mapper. I only need one mapper register flip-flop bit to do what I have in mind, so the developer could pick from something like BNROM, AxROM, Color Dreams, or UxROM, including any homebrew variants.

The idea of this CICOprocessor is for it to be an expansion that could be added to nearly any mapper. We don't necessarily need to limit it to discrete mappers; I only had the idea to target them because it provides these features without adding expensive hardware. The CICOprocessor isn't well suited for bank switching tasks, so memory banking would be left up to whatever mapper the developer chose. Choosing a discrete mapper still effectively limits you to 16/32KB PRG banking and 8KB CHR banking. Finer banking effectively necessitates adding a CPLD to the board, and doing that opens a can of worms: a CPLD on the board allows the MCU to run at 3 V, at which point I'd rather target the STM32 instead, and we've lost our sense of scope. So for now I'd like to focus on what features are possible as expansions of discrete mappers.
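To make the "fixed granularity" point concrete, here is a minimal Python sketch of how two of the discrete mappers named above translate a 6502 address into a PRG-ROM offset. It follows the commonly documented UxROM and BNROM layouts; the function names are made up for illustration, bus conflicts are ignored, and a power-of-two PRG size is assumed.

```python
# Hypothetical helpers illustrating fixed discrete-mapper PRG granularity.
# Layouts follow the commonly documented UxROM and BNROM boards; bus
# conflicts are ignored, and a power-of-two PRG size is assumed so that
# unused high register bits simply wrap (modulo) like unconnected lines.

UXROM_BANK = 16 * 1024  # UxROM: 16 KB switchable at $8000, last 16 KB fixed at $C000
BNROM_BANK = 32 * 1024  # BNROM: the whole 32 KB window switches at once


def _check_prg_addr(cpu_addr: int) -> None:
    if not 0x8000 <= cpu_addr <= 0xFFFF:
        raise ValueError("PRG-ROM is only mapped at $8000-$FFFF")


def uxrom_prg_offset(cpu_addr: int, bank_reg: int, prg_size: int) -> int:
    """PRG-ROM byte offset a UxROM board presents at cpu_addr."""
    _check_prg_addr(cpu_addr)
    num_banks = prg_size // UXROM_BANK
    if cpu_addr < 0xC000:
        bank = bank_reg % num_banks  # switchable window
    else:
        bank = num_banks - 1  # always the last bank
    return bank * UXROM_BANK + (cpu_addr & 0x3FFF)


def bnrom_prg_offset(cpu_addr: int, bank_reg: int, prg_size: int) -> int:
    """PRG-ROM byte offset a BNROM board presents at cpu_addr."""
    _check_prg_addr(cpu_addr)
    num_banks = prg_size // BNROM_BANK
    return (bank_reg % num_banks) * BNROM_BANK + (cpu_addr & 0x7FFF)
```

The fixed upper bank is why UxROM games keep their vectors and bankswitching code in the last 16 KB; anything finer than these windows needs more logic than a single latch.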

Potential features the CICOprocessor is capable of adding to a discrete mapper
Keep in mind, CICOprocessor registers would be limited to 4 bits in size, since it only has access to D0-3.
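Because only D0-3 reach the MCU, anything wider than a nibble has to be split across several writes. A minimal Python sketch of that split and the matching reassembly, assuming (hypothetically) that the low nibble is sent first; the real register protocol isn't specified in the thread.

```python
# Hypothetical sketch: moving values through 4-bit (D0-D3) CICOprocessor
# registers. Only the low nibble of each write reaches the MCU, so wider
# values need several writes. Low-nibble-first order is an assumption.


def split_nibbles(value: int, count: int) -> list[int]:
    """Split value into `count` 4-bit writes, least-significant nibble first."""
    if value < 0 or value >= 1 << (4 * count):
        raise ValueError("value does not fit in the requested number of nibbles")
    return [(value >> (4 * i)) & 0xF for i in range(count)]


def join_nibbles(nibbles: list[int]) -> int:
    """Reassemble nibbles (least-significant first) on the receiving side."""
    value = 0
    for i, nib in enumerate(nibbles):
        value |= (nib & 0xF) << (4 * i)
    return value
```

A byte costs two writes and a 12-bit value three, which is why the nibble interface suits small, infrequent transfers better than bulk data.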
  • 1) IRQ timer: there are no counter inputs available, so the counter would have to be clocked by the STM8's internal 16 MHz HSI (EDIT: or 128 kHz LSI) RC oscillator. Considering traditional IRQ counters don't fire at exact PPU timings due to PPU-CPU alignment, I don't see the lack of a synchronous counter being an issue. A programmable prescaler that allows the timer to be set to PAL/NTSC/Dendy timing might be possible.
    [EDIT: I take back what I said: TIM1 can be clocked externally from one of 2 different pins. So a scanline counter (potentially using PPU /RD, A13, or A12) and/or an NES 6502 cycle counter are now planned features.]

    2) Expansion sound: Paired with my solderless expansion port dongle, and the addition of a couple caps/resistors to the board for a PWM DAC, a few extra audio channels could be created. In that thread we address some limitations of tuning. The STM8 won't handle something on the level of FM synthesis, but the addition of a few square, triangle, and sine channels is within reason. Within the limitations of memory, samples could be possible as well. Top loaders would require soldered mods to work, and I assume most clones wouldn't support it. I'm curious about high-end clones like the NT and AVS; hopefully they support EXP6 audio.

    3) Game save data: While this feature is easily unlocked with discrete mappers using flash memory, it often requires extra logic gates, and programming is challenging since save routines must execute from SRAM. The STM8 has upwards of 128 bytes of EEPROM, plus some portion of its 8KB flash instruction memory, that it could provide to the 6502 for save data. The interface would be simpler than self-programming PRG-ROM flash, as it could be implemented as a simple set of mapper commands, "write nibble #x to value y" or "read nibble #x" for example.

    4) Switchable H/V mirroring: The addition of a small 1G97 logic gate would allow the STM8 to control a mux selecting which mirroring is active. Expanding any discrete mapper with CHR-RAM to four-screen mirroring is trivial and free for homebrew use, but four-screen mirroring is incompatible with cheap clones. This CICOprocessor H/V selectable mirroring would have the benefit of working with clones.

    5) Data processing: I have a hard time envisioning what types of data processing would be worthwhile to hand off to the STM8. While the STM8 does have hardware multiply/divide, IDK if the current nibble register access to it would make it a significant benefit to the 6502. Perhaps if one had some complex math tasks that weren't time sensitive, operands could be fed to the CICOprocessor, which would interrupt the 6502 when it was done crunching the numbers.

    6) Serial interfaces: This is more of a wide-open category of possible features that gets unlocked by using the CICOprocessor to give the NES a serial interface. My current nibble register interface is somewhat limiting as it's not the speediest, but it should be more than fast enough if we're not handling large chunks of data. We have hardware SPI & UART interfaces at our disposal, which opens the door to an endless number of devices. It's also reasonable for the STM8 to bitbang some other interfaces as well. Here are some possible ideas:
    -SPI flash chip: access to MBs/GBs worth of data at the low cost of serial flash chips
    -SD card slot: really just a fancy form factor of the above idea, but it comes with the added challenge of a filesystem, which will likely consume a fair amount of the STM8's available instruction ROM.
    -Internets: Could add a low cost ESP8266 Wifi module via UART interface for under $5.
    -Bluetooth: Could add a low cost HC-06 BT module via UART interface for under $5. Could use this for any number of peripherals, or maybe even some internet access.
    -Peripherals: a bitbanged PS/2 interface for keyboard/mouse connectivity is easily within our abilities here. I mentioned a bitbanged USB 1.1 host being possible, but I think that's pushing the possibilities to the max.
    -Your imagination: come up with something of your own, maybe some IR LED device, anything that a couple of spare general-purpose I/O pins allow you to dream up on the cartridge.
    -Real Time Clock: I probably shouldn't bring this up, as it's a feature the STM8 can't support by itself. But with the addition of an RTC chip and battery it would be possible. I might argue the STM32 would be a better choice as it includes an RTC, but that's a separate discussion.

    7) Boot features: This is the primary feature set I intend to take advantage of in my board designs. My USB programmer will be able to program mapper configuration info into the CIC. This allows the CIC to select H/V mirroring at boot, letting me do away with the H/V mirroring toggle switch, which requires opening the case and knowing which mirroring to select. I also plan to use this feature set to make my CPLD designs more flexible: the CIC will provide the CPLD some NVM that allows the CPLD to act as one of several predefined hardware configurations selected at boot time.
I've already converted most of my NES designs to the STM8 CIC and plan to release my boot features once I complete my rewrite of the inlretro/kazzo software release. It's about time for me to lay out a new discrete mapper board for my next PCB order. I might be able to pull off a direct swap from ATtiny13 to STM8 on my discrete design, but with all this I'm tempted to rearrange everything and start from scratch, since I'll need a new set of stencils anyway. If I start from scratch I should be able to make space for routing CPU signals to the CIC as I've proposed so far. I plan to add the H/V mirroring mux so I can axe the toggle switch. I will probably add a cap and 1-2 resistors to provide a PWM DAC allowing synth support. Beyond that, I will probably route the SPI/UART signals to the edge of the board for prototyping or direct soldering of a proposed module/wiring to the PCB. Whether or not these things will be made use of in the near future I'm not sure; I've got quite a few irons in the fire right now. This minimalist CICOprocessor idea is fun, but without external motivation it's not very high on my development task list.
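Item 1 (IRQ timer) mentions a region-selectable prescaler. A rough Python sketch of the arithmetic, assuming a free-running timer at 16 MHz and the commonly documented NES/Famicom master clocks; it ignores the HSI RC oscillator's tolerance and NTSC's one-dot-short odd frames, so treat the numbers as estimates.

```python
# Hypothetical sketch: IRQ timer periods for an STM8 timer clocked at 16 MHz,
# expressed as "N scanlines from now" for each console region. Clock figures
# are the commonly documented NES/Famicom values; HSI RC tolerance is ignored.

TIMER_HZ = 16_000_000
DOTS_PER_SCANLINE = 341  # ignores NTSC's one-dot-short odd frames

PPU_HZ = {
    "ntsc": 21_477_272 / 4,   # NTSC master clock / 4
    "pal": 26_601_712 / 5,    # PAL master clock / 5
    "dendy": 26_601_712 / 5,  # Dendy shares the PAL master clock and PPU divider
}


def timer_ticks(scanlines: int, region: str) -> float:
    """Timer ticks (at TIMER_HZ) spanning `scanlines` scanlines."""
    seconds = scanlines * DOTS_PER_SCANLINE / PPU_HZ[region]
    return seconds * TIMER_HZ


def period_counts(scanlines: int, region: str, prescaler: int = 1) -> int:
    """Counts per IRQ period after prescaling (auto-reload would be counts - 1)."""
    counts = round(timer_ticks(scanlines, region) / prescaler)
    if not 1 <= counts <= 0x10000:
        raise ValueError("interval does not fit a 16-bit counter at this prescaler")
    return counts
```

One NTSC scanline comes out near 1016 ticks and a PAL/Dendy scanline near 1025, so with prescaler 1 a 16-bit counter covers roughly 64 scanlines. The ~0.9% difference per line is what a region setting would have to absorb. If the timer were instead clocked by the 6502 (the cycle counter from the EDIT), a scanline would be 341/3 ≈ 113.67 CPU cycles on NTSC and Dendy versus 341/3.2 ≈ 106.56 on PAL.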
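The "cap and 1-2 resistors" PWM DAC is a first-order RC low-pass. A small Python sketch for sizing it; the 1 kΩ / 10 nF values and the 62.5 kHz carrier used below are placeholders for illustration, not a tested design.

```python
import math

# Hypothetical sketch: sizing the single-pole RC low-pass that turns the
# STM8's PWM output into audio. Component values are placeholders only.


def rc_cutoff_hz(r_ohms: float, c_farads: float) -> float:
    """-3 dB corner of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


def first_order_attenuation_db(freq_hz: float, cutoff_hz: float) -> float:
    """Attenuation of a single-pole low-pass at freq_hz, in dB (positive = quieter)."""
    ratio = freq_hz / cutoff_hz
    return 10.0 * math.log10(1.0 + ratio * ratio)


fc = rc_cutoff_hz(1_000, 10e-9)                 # about 15.9 kHz
carrier_db = first_order_attenuation_db(62_500, fc)  # about 12 dB at the PWM carrier
```

A single pole only knocks a 62.5 kHz carrier down by about 12 dB, so the higher the PWM frequency sits above the audio band, the less the one cap has to do.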
Last edited by infiniteneslives on Mon Apr 27, 2020 8:05 am, edited 4 times in total.
FrankenGraphics
Formerly WheelInventor
Posts: 2064
Joined: Thu Apr 14, 2016 2:55 am
Location: Gothenburg, Sweden

Re: Adding features to discrete mapper with multipurposed CI

Post by FrankenGraphics »

2) Expansion sound
Some thoughts... I suspect composing music for a wholly new "sound chip" might be daunting. You'd either need to:
1) develop a branch of FT supporting it so you can write music with quick aural feedback
2) edit, write to cart, play on hardware, rinse and repeat
3) compose in a notation program or midi editor, then convert that to the specific format, by hand or script or both, or
4) do it all in theory then punch in the data (which usually turns out rather primitive, which would go against the purpose of the feature).

Then there's need of a music playing engine capable of playing it, so there's still that to do.

Implementing one of the sound chips (or parts of one) currently supported by FT might help. Actually, it'd be attractive.

Another possibility with the expansion sound is not having sfx interrupt bgm notes. You could select from an options menu whether to use internal or external channels for sfx (depending on whether you have the dongle or not). If the ext. sound is interfaced the same way as the internal channels, the sfx part of the sound engine wouldn't need to be much different for the two options (which goes against the "supported by FT" argument as far as BGM goes, but for SFX, it would be fine).
infiniteneslives

Re: Adding features to discrete mapper with multipurposed CI

Post by infiniteneslives »

You bring up great points FrankenGraphics. Personally I'm not really motivated to write the FT support much like my lack of interest for writing emulator support. So this would be up to the developer/others to overcome, although I'm all for making design choices that aid in their effort.
FrankenGraphics wrote:Implementing one of the sound chips (or parts of one) currently supported by FT might help. Actually, it'd be attractive.
This was my initial thought. Having the cartridge mimic the likes of VRC6, MMC5/2A03, or Sunsoft 5B would be within reason. Armed only with the multitasked STM8 CIC and a PWM DAC, I'm not sure we can afford to get too caught up in the details of replicating the originals with near perfection. With minimalist goals, one would have to accept the CICOprocessor synth for what it is: flaws, quirks, personality and all.
Then there's need of a music playing engine capable of playing it.
This would be an issue even if the CICOprocessor synth were to mimic existing synths, as we only have 4-bit registers to work with. Not sure how many sound engines support 'standard' expansion audio right now anyway. That said, the audio registers could be arranged in a fashion somewhat similar to the originals if desired.
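As an illustration of arranging registers "similar to the originals", here is a Python sketch of a VRC6-style pulse pitch pushed through nibble registers. It uses the commonly documented VRC6 pulse formula f = CPU / (16 × (t + 1)) with a 12-bit period and the NTSC CPU clock; the three-register nibble layout is invented for this example.

```python
# Hypothetical sketch: a VRC6-style 12-bit pulse period pushed through
# 4-bit CICOprocessor registers. The period formula is the one commonly
# documented for VRC6 pulse channels; the nibble register layout is invented.

NTSC_CPU_HZ = 1_789_773


def vrc6_pulse_period(freq_hz: float, cpu_hz: float = NTSC_CPU_HZ) -> int:
    """12-bit period t such that cpu / (16 * (t + 1)) is closest to freq_hz."""
    t = round(cpu_hz / (16 * freq_hz) - 1)
    if not 0 <= t <= 0xFFF:
        raise ValueError("frequency out of range for a 12-bit period")
    return t


def period_to_nibble_writes(t: int) -> list[tuple[int, int]]:
    """(hypothetical register index, nibble) pairs, low nibble first."""
    return [(reg, (t >> (4 * reg)) & 0xF) for reg in range(3)]
```

For A4 (440 Hz) this gives t = 253, sent as three writes (13, 15, 0). A converter could map each original 8-bit register write onto two or three such nibble writes, keeping the data one-for-one apart from register size.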
Another possibility with the expansion sound is not having sfx interrupt bgm notes.
This would be a good option if the creator wanted a minimal difference in experience for consoles without expansion sound support. I'm no musician/composer, but I don't think it would be the worst thing in the world if only backup/background voices were placed on the expansion synth. Gimmick!, for example, plays most sfx on the synth, which everyone notices when it's missing. But there are also some voices on the synth that I didn't even realize were missing without it for the longest time. Personally, I think Gimmick!'s songs stand alone pretty well even when the synth is missing. But the trained ear, one that recognizes the fullness that's missing, will appreciate the extra channels. My point/opinion is, the music doesn't have to fall apart when a channel is missing in order to be appreciated when it is present.
lidnariq

Re: Adding features to discrete mapper with multipurposed CI

Post by lidnariq »

Arbitrarily assuming that the desired features are "some kind of IRQ" and "some kind of sound", I suppose it's worth asking just how much synth can fit?
infiniteneslives

Re: Adding features to discrete mapper with multipurposed CI

Post by infiniteneslives »

lidnariq wrote:Arbitrarily assuming that the desired features are "some kind of IRQ" and "some kind of sound", I suppose it's worth asking just how much synth can fit?
Yes, that's the bigger question. My estimates put CIC operations at ~10% utilization of the STM8 core, so there should be a fair amount of CPU time available for audio synthesis/mixing. In practice I'm sure it'll come down to a trade-off between number of channels and desired sample rate. I've yet to get my hands dirty with audio synthesis on these MCUs, but I have my sights set on doing so. I'll report back once I've got some hard data.
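A back-of-the-envelope Python sketch of that channels vs. sample rate trade-off, assuming the core runs at the full 16 MHz and the ~10% CIC load from the post; the sample rates and channel counts are arbitrary examples, and mixing/IRQ overhead is not modeled.

```python
# Rough budget sketch: STM8 cycles available per audio sample once the CIC
# has taken its share. Assumes a 16 MHz core clock and the post's ~10% CIC
# load; sample rates and channel counts are arbitrary examples.

CORE_HZ = 16_000_000
CIC_LOAD = 0.10


def cycles_per_sample(sample_rate_hz: int, cic_load: float = CIC_LOAD) -> float:
    """Core cycles left per output sample after CIC work."""
    return CORE_HZ * (1.0 - cic_load) / sample_rate_hz


def cycles_per_channel(sample_rate_hz: int, channels: int) -> float:
    """Even split of the remaining budget (ignores mixing and IRQ overhead)."""
    return cycles_per_sample(sample_rate_hz) / channels
```

At 22,050 Hz that leaves about 653 cycles per sample, or about 163 per channel for four channels; halving the sample rate doubles both.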

FWIW I have to take back my mention of the 128 kHz LSI being a possible clock source for timers. We're effectively limited to fMASTER = 16 MHz with prescalers. There are 3 timers available; here's my current plan for each:

TIM1 16-bit advanced control timer: up/down counter with auto-reload. Prescaler can be set to any integer from 1-65536. Being the only up/down counter available, with a center-aligned PWM mode, makes it the best candidate for the PWM DAC. This timer also has the best pinout to GPIO. It's the most capable timer on chip and shouldn't be needed for CIC operations, so we get to task it as desired.
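The PWM DAC trades resolution against carrier frequency. A Python sketch of that relationship, assuming the 16 MHz timer clock with prescaler 1: edge-aligned PWM repeats every ARR + 1 counts, while center-aligned mode counts up and back down, so one period spans roughly 2 × ARR counts.

```python
# Sketch of the resolution/carrier trade-off for a timer-driven PWM DAC,
# assuming a 16 MHz timer clock and prescaler 1. Edge-aligned: period is
# ARR + 1 counts. Center-aligned: counter runs up then down, ~2 * ARR counts.

TIMER_HZ = 16_000_000


def pwm_carrier_hz(bits: int, center_aligned: bool) -> float:
    """Approximate PWM carrier for a DAC with 2**bits duty-cycle steps."""
    steps = 1 << bits
    if center_aligned:
        return TIMER_HZ / (2 * steps)  # ARR = steps
    return TIMER_HZ / steps            # ARR = steps - 1
```

An 8-bit center-aligned DAC lands its carrier around 31 kHz, above hearing but close enough that the RC filter matters; dropping one bit of resolution doubles the carrier.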

TIM2 16-bit general purpose timer: up counter with auto-reload. Prescaler can be any power of 2 from 1-32768. This timer should be adequate for CIC timing management, but it sure would be nice to have a 16-bit counter for an IRQ timer if TIM1 is being put to work as a PWM DAC. TIM2's pinout isn't as great, with a few channels mapped to what's been chosen for CPU D0-3. There are two other outputs available which conflict with the SPI pins, but one of those is slave select, which could be mapped elsewhere via software.

TIM4 8-bit basic timer: up counter with auto-reload. Prescaler can be any power of 2 from 1-128. This timer doesn't have any channel outputs pinned to GPIO. Being an 8-bit counter makes it more challenging to use for CIC timing, as the theoretical max mangle time is ~2.7 ms, roughly 44k clocks at 16 MHz. That theoretical max doesn't even fit in a 16-bit counter without prescaling/rollovers. One should be able to pull off using this smaller timer for CIC operations, but it will certainly be more challenging and require more CPU time counting rollovers. One solution to fit the CIC in TIM4 would be to use a higher prescale during mangle timing and a finer prescale between bit transfers.
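The two-prescale idea can be checked with a little arithmetic. This Python sketch assumes the 16 MHz clock and the power-of-two prescalers 1-128 listed above; the helper name and return layout are hypothetical.

```python
# Hypothetical sketch: timing a long CIC delay with the 8-bit TIM4 by counting
# rollovers. Only the power-of-two prescalers 1..128 from the post are allowed.

TIM4_PRESCALERS = [1 << n for n in range(8)]  # 1, 2, 4, ..., 128


def tim4_plan(delay_ticks: int, prescaler: int) -> tuple[int, int, int]:
    """Return (full 256-count rollovers, final partial count, leftover 16 MHz ticks)."""
    if prescaler not in TIM4_PRESCALERS:
        raise ValueError("TIM4 prescaler must be a power of two from 1 to 128")
    counts = delay_ticks // prescaler         # whole prescaled counts
    rollovers, remainder = divmod(counts, 256)
    leftover = delay_ticks - counts * prescaler  # lost to prescaler granularity
    return rollovers, remainder, leftover
```

For a 44,000-tick delay, prescaler 1 needs 171 rollovers (plus a final count of 224), while prescaler 128 needs just 1 rollover and a count of 87 but leaves 96 ticks uncovered. Those 96 ticks fit in a single prescaler-1 pass, which is the coarse-then-fine split described above.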

Suppose I'll make it my goal to pull off CIC timing using TIM4, and leave TIM1 & TIM2 for sound & IRQ timer.
FrankenGraphics

Re: Adding features to discrete mapper with multipurposed CI

Post by FrankenGraphics »

infiniteneslives wrote:not sure we can afford to get too caught up in the details of replicating the originals with near perfection. With the minimalist goals one would have to accept the CICOprocessor synth for what it is flaws/quirks/personality and all.
This shouldn't be a problem, I think. If we can use the current version of FT to give us a rough but good enough idea, and then verify how it actually sounds on hardware now and then, that's much easier than needing to verify it on HW after each edit. It should be OK in most cases that FT won't be What You Hear Is What You Get; it's still usable for the sake of composing. If it's close enough, interface-wise and sound-wise, it wouldn't be as dependent on true FT support. What's left then is making a converter and the engine itself. If FT support arrived eventually, it would be a great addition, but using the synth wouldn't depend on it this way. So I'd like to propose designing the synth to make conversion as straightforward as possible. The more one-for-one it is (not taking register size into account), the better. If it sounds a bit different, that's not too much of a worry.

If you as a composer/dev plan on using these "expanded" carts, you must have a kazzo anyway, so the hardware requirement is already met.
If you're a team, though, you must convince the composer (or yourself, if you're the composer) to get a kazzo for testing at home. But that's not much of a step, I think? I mean, it's a one-time 20-30 USD + shipping depending on version.
I'm no musician/composer but I don't think it would be the worst thing in the world if only backup/background voices were placed on the expansion synth.
I agree. A composer should be able to write music that would carry the idea on its own for the internal sound, and use ext. sound as support. Overtones (playing a harmony at a low volume), drum support (like snare tone while internal tri is still playing bass), extra echoes, chorus, and the occasional extra harmony. Or go all in, and add an "only works with dongle" sticker, if you really want to.

If the tuning of one, several or all notes isn't 100% matched to that of the internal channels (which have their own temperament), you might even exploit that as a chorus effect without having to fine-pitch bend.
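How big would such a detune be? A Python sketch comparing a note on the 2A03 pulse (commonly documented as f = CPU / (16 × (t + 1)), NTSC clock) with a hypothetical synth oscillator that steps a 32-entry wavetable from a 16 MHz timer with an integer divider. The synth model is an assumption for illustration, not a measured design.

```python
import math

# Sketch: how far a quantized synth pitch lands from the 2A03's own pitch,
# in cents. Assumes the NTSC CPU clock and the common 2A03 pulse formula;
# the 16 MHz / integer-divider / 32-step synth model is hypothetical.

NTSC_CPU_HZ = 1_789_773
SYNTH_CLOCK_HZ = 16_000_000


def apu_pulse_hz(target_hz: float) -> float:
    """Nearest pitch the 2A03 pulse channel can actually produce."""
    t = round(NTSC_CPU_HZ / (16 * target_hz) - 1)
    return NTSC_CPU_HZ / (16 * (t + 1))


def synth_hz(target_hz: float, steps_per_cycle: int = 32) -> float:
    """Nearest pitch of the hypothetical timer-stepped wavetable oscillator."""
    divider = round(SYNTH_CLOCK_HZ / (steps_per_cycle * target_hz))
    return SYNTH_CLOCK_HZ / (steps_per_cycle * divider)


def cents(f_a: float, f_b: float) -> float:
    """Interval from f_b up to f_a in cents (positive = f_a is sharper)."""
    return 1200.0 * math.log2(f_a / f_b)
```

Under these assumptions A4 lands about one cent flat of the 2A03's A4, a slow beat of a fraction of a hertz. Larger, deliberate offsets would give a more obvious chorus.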
na_th_an
Posts: 558
Joined: Mon May 27, 2013 9:40 am

Re: Adding features to discrete mapper with multipurposed CI

Post by na_th_an »

As far as modifying emulators to provide support, I'm afraid all I can do is roughly simulate some features, such as the ability to switch mirroring by software. It's quite a task.

The different features you mention sound great, for example the extra sound channels or the aforementioned mirroring toggle. I could design a simple UNROM64 game that changes mirroring mid-game, for example, but I can't really go any further since, as I said, my abilities are quite limited when it comes to providing emulator support.
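On the emulator side, a software mirroring toggle is one of the easier features to model. This Python sketch maps a PPU nametable address to an offset in the console's 2 KB CIRAM: vertical mirroring feeds PPU A10 to CIRAM A10, horizontal feeds PPU A11, which is the same selection the proposed 1G97 mux would make in hardware. Which register write flips the `horizontal` flag is not defined in the thread, so that part is left out.

```python
# Hypothetical emulator-side sketch of a software-selectable H/V mirroring bit:
# map a PPU nametable address ($2000-$3EFF) to an offset in the console's
# 2 KB CIRAM. The register bit that sets `horizontal` is not modeled here.


def ciram_offset(ppu_addr: int, horizontal: bool) -> int:
    if not 0x2000 <= ppu_addr <= 0x3EFF:
        raise ValueError("not a nametable address")
    addr = ppu_addr & 0x0FFF          # fold the $3000-$3EFF mirror onto $2000
    if horizontal:
        page = (addr >> 11) & 1       # PPU A11 -> CIRAM A10
    else:
        page = (addr >> 10) & 1       # PPU A10 -> CIRAM A10
    return (page << 10) | (addr & 0x03FF)
```

With vertical mirroring $2000/$2800 share a page, as do $2400/$2C00; with horizontal mirroring $2000/$2400 share one and $2800/$2C00 the other.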