Adding features to discrete mapper with multipurposed CIC



infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Adding features to discrete mapper with multipurposed CIC

Post by infiniteneslives »

Starting a more in-depth conversation about an idea I recently shared in a discussion on parallax techniques.

So the CIC is the 'necessary evil': a ~50 cent chip that must be included on any game seeking to be published in 72pin form. The current popular choice is the attiny13, selected by both Jim's cool and Krikzz. They use the 4MHz CIC clock as the mcu's clock source and rely on instruction cycle counting to ensure proper timing of communications with the main board's CIC lock. The attiny13 is a rather easy choice to target for a CIC solution: it's effectively the lowest cost Microchip (AVR/PIC) solution that has adequate i/o and NVM to store the current region.

For several different NES/SNES projects I've been considering an alternate CIC solution that might allow the 50 cent budget of the CIC to go towards a more powerful chip, one capable of being dual tasked with CIC comms and some sort of mapper interfacing. The attiny13 doesn't really have the i/o or the time to spare for dual tasking.

Looking over Krikzz's implementation, after the first few cycles where the current stream is determined, the main loop spins for 260+ cycles (65 usec) between each Din/Dout transaction. I took some quick logic analyzer captures to confirm there is ~75usec between each transaction. So the attiny13 operating at 4MHz is only spending ~15% of its time processing CIC communications. With no other mcu timers available and nothing better to do with its time, the attiny13 has no choice other than cycle counting. Trying to make use of the 2 remaining i/o during that underutilized time would be rather challenging with exact cycle counting and no option for interrupts etc.

So I started entertaining the idea of other chips available on the market. For this purpose I discounted all other Microchip offerings, as they will be more expensive than the attiny13. With those discounted, and only requiring a 5v supply and NVM/eeprom, there are a couple of interesting options.

Cypress's CY8C4013SXI-400 is an interesting option with its 16MHz Cortex-M0, but it only has 5 i/o. I'm tempted to target it considering it's cheaper than the attiny13 in volume, but the lack of spare i/o significantly limits what's possible. The recent SROM cracking that might double the flash for free is an interesting bonus, however.

The two most interesting prospects I found were STM offerings. The STM32F030F4 with its 48MHz Cortex-M0 was quoted to me with better pricing than the attiny13. Even though it's a 3v part, it has enough 5v tolerant i/o to get the CIC job done. This is a nice option for a cartridge with 3v mapper logic, and the 48MHz M0 certainly has enough power that only a small portion of its time would be consumed by CIC communications. I'm still somewhat interested in tasking one of the STM32F0 family parts with both CIC and mapper duties, but for the purposes of this conversation it's not a viable solution due to too few 5v tolerant i/o.

That brings me to what I've deemed the 'winner': the STM8S003F4.
It's a rather basic 8-bit mcu with common features including:
  • 16MHz & 128kHz internal RC oscillators
  • 8KByte flash, 1KByte SRAM, 128Byte eeprom
  • Nested vector interrupt controller including external interrupts
  • 2x 16-bit advanced/general-purpose timers, 1x 8-bit basic timer
  • 16x GPIO
  • UART, SPI, I2C interfaces
This all comes at a significant price drop compared to the attiny13 in volume. The price alone is enough to motivate me to create my own CIC implementation using the STM8. But a considerable amount of extra hardware goes unused if it's only tasked as a CIC. In that case the easiest solution would probably be to clock the mcu externally from the 4MHz CIC clock, then implement the CIC in much the same way the attiny13 does. But that's not much fun, and I think I can do better than that.

My goal is to run at 16MHz and use one of the timers to interrupt the mcu every ~75usec to handle the next CIC transaction. Doing that makes the CIC CLK signal pretty useless, but one would have to take care to stay aligned with the CIC's clock. The interrupt would have to come early and perhaps poll the Din pin when it's expected to be high, to sense how far the mcu has drifted and correct its internal timer. I'm expecting that in the worst case 15usec out of every 75usec will be consumed by CIC transfers. That's more time than Krikzz is using with the attiny13, even though we're running 4x faster. Maybe this solution could get to 5usec or less; either way it doesn't matter too much, as it's still some portion of time during which the mcu MUST prioritize CIC transfers.
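To make the "interrupt early and poll Din" idea concrete, here is a minimal sketch of the drift correction, assuming the 16MHz core, the ~75usec spacing above, and a hypothetical 2usec early wake-up margin. It assumes the timer's interval is measured from the moment the interrupt fires; the constants and the function name are illustrative, not taken from any existing firmware.

```c
#include <stdint.h>

/* Assumed figures: 16 MHz core clock, the ~75 us transaction spacing quoted
   above, and a hypothetical 2 us early wake-up margin. */
#define F_CPU_MHZ        16u
#define PERIOD_US        75u
#define EARLY_MARGIN_US  2u

#define PERIOD_TICKS  (PERIOD_US * F_CPU_MHZ)        /* 1200 timer ticks */
#define MARGIN_TICKS  (EARLY_MARGIN_US * F_CPU_MHZ)  /* 32 timer ticks */

/* Called from the early timer interrupt once Din has been seen at its
   expected level. `ticks_waited` is how long we polled before seeing it.
   Waiting exactly MARGIN_TICKS means we are aligned; waiting longer means we
   woke too early, waiting less means we woke too late. The returned interval,
   measured from this interrupt, places the next wake-up MARGIN_TICKS ahead
   of the next expected edge. */
uint16_t next_interval_ticks(uint16_t ticks_waited)
{
    int32_t drift = (int32_t)ticks_waited - (int32_t)MARGIN_TICKS;
    return (uint16_t)((int32_t)PERIOD_TICKS + drift);
}
```

Correcting once per transaction keeps the accumulated error bounded by a single period's worth of RC oscillator drift rather than letting it grow.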

Now comes the question: if this extra hardware is going to be used by the NES somehow, the CPU needs a means to interface with the mcu. That's not a simple feat when the expectation is for it to be free. I argued to myself that all the hoops one would need to jump through might put one off to the point where you'd rather simply invest a couple of dollars in a mapper more capable than a discrete mapper.

Here's the pinout and port numbering with some preliminary assignments I've come up with:


                          _________________  _______________
NES CPU D3              -| PD4/UART_CLK    \/           PD3 |-	CPU D2
NES CPU D4/UART TX      -| PD5/UART_TX                  PD2 |-	CPU D1
NES CPU D5/UART RX      -| PD6/UART_RX              ISP/PD1 |-	CPU D0
                        -| /RST                    MISO/PC7 |-	 SPI?
MAPPER REG BIT          -| PA1/OSCIN               MOSI/PC6 |-	 SPI?
CIC Din                 -| PA2/OSCOUT               SCK/PC5 |-	 SPI?
                        -| VSS                          PC4 |-	NES /IRQ
                        -| Vcap                         PC3 |-  CIC Dout?
                        -| VDD                      SCL/PB4 |-	NES CPU R/W
SPI? CICrst?            -| PA3/SPI_NSS              SDA/PB5 |-	NES A13
                          ---------------------------------
The simplest method I could come up with would be to designate an unused mapper register bit so the NES can signal/interrupt the mcu when it wants its attention to start communicating. However, this mcu interrupt must have a lower priority than CIC comms. Assuming a '377 is being used for the mapper register we probably have an unused bit; even a BNROM utilizing a '161 has an unused bit if the PRG-ROM is <= 256KByte.

When the mapper bit is set (presumably $8000.7), the mcu would be instructed to start listening to CPU writes when CPU A13 is high. This maps the mcu's register to $6000-7FFF, but it also overlaps the PPU registers at $2000-3FFF. This was the fewest number of pins I could come up with for decoding that seems reasonable. Moving the mcu register bits to the SPI port's PC pins would free up more pins for decoding, perhaps including CPU A14. But A13 seems sufficient, as it blocks writes to RAM and the APU, which seems helpful to me. The user would have to take care not to accidentally write to the PPU and mcu at the same time, but one should already be very deliberate when writing to the PPU.
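As a sanity check of which CPU writes the proposed wires-only decode would catch, here is a small logic model in C. It is only a truth-table sketch of the decode described above (mapper bit set, R/W low, A13 high), not firmware, and the function and parameter names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the proposed decode as plain logic: the mcu only latches CPU data
   when the NES has set the spare mapper bit (presumably $8000.7), R/W is low
   (a write), and CPU A13 is high.
   Caveat: A13 is also high for $A000-$BFFF and $E000-$FFFF, so if the mapper
   register responds across all of $8000-$FFFF (as discrete mappers usually
   do), writes there would reach both the mapper register and the mcu. */
bool mcu_latches_write(uint16_t cpu_addr, bool rw_high, bool mapper_bit_set)
{
    bool a13 = (cpu_addr & 0x2000u) != 0;
    return mapper_bit_set && !rw_high && a13;
}
```

Writes to $0000-$1FFF (RAM) and $4000-$5FFF (APU/IO) have A13 low and are ignored, while $2000-$3FFF (PPU) and $6000-$7FFF both match, which is exactly the overlap the user has to manage.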

My proposed pin assignments would allow for 4-bit (nibble) wide reads/writes at a minimum. If one wasn't looking to use the UART, then the entirety of PORT D could be used for 6-bit wide accesses.
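With CPU D0..D5 on PD1..PD6 per the pinout above, moving data between the NES bus and the mcu is a shift and a mask. A sketch, assuming `idr`/`odr` stand for snapshots of the Port D input and output registers (the helper names are hypothetical):

```c
#include <stdint.h>

/* Per the pinout above: CPU D0..D5 sit on PD1..PD6. `idr` stands for a
   snapshot of the Port D input data register; `odr` for the current output
   latch value when the mcu drives data back to the NES. */
#define CPU_DATA_SHIFT 1u

uint8_t nibble_from_portd(uint8_t idr)   /* CPU D0-D3 */
{
    return (uint8_t)((idr >> CPU_DATA_SHIFT) & 0x0Fu);
}

uint8_t sixbits_from_portd(uint8_t idr)  /* CPU D0-D5, UART pins repurposed */
{
    return (uint8_t)((idr >> CPU_DATA_SHIFT) & 0x3Fu);
}

/* Place a nibble on PD1..PD4 without disturbing the other Port D bits. */
uint8_t portd_with_nibble(uint8_t odr, uint8_t nibble)
{
    return (uint8_t)((odr & (uint8_t)~(0x0Fu << CPU_DATA_SHIFT)) |
                     ((nibble & 0x0Fu) << CPU_DATA_SHIFT));
}
```

Keeping the data bits contiguous on one port is what makes each access a single register read or write, which matters when the NES write cycle is only a few hundred nanoseconds long.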

There is a problem, though, as we can't be certain the mcu is always able to listen to writes to $6000. The mcu could currently be busy with CIC comms, which must have a higher priority. I can't think of a very clean way around this without adding dedicated logic. Maybe the simplest idea is to have the NES set the mcu interrupt bit $8000.7, then the mcu waits for the upcoming CIC comm to complete. Once done, it interrupts the NES CPU, which uses its interrupt routine to complete the transfer. The NES would have the maximum time (~60usec) to complete the transfer. This is probably the preferred solution if the NES CPU is looking to make big transfers to the mcu. A big transfer could perhaps be verified by reading back a checksum.

Another idea might be to write to $6000, but require the value to be read back from the mcu before being certain it stuck. This would probably be the preferred solution for small transfers, as we typically have a >80% chance the mcu is listening.

You could maybe combine the two ideas to remove the need to use NES /IRQs for each transfer. Maybe the NES can simply read from the mcu at $6000 after setting the $8000.7 mcu interrupt bit, and the mcu provides a designated value if there is sufficient time to write a nibble or two before the next CIC transfer.
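On the mcu side, the combined idea boils down to a single decision made when the NES reads $6000: is there enough time before the next CIC transaction to accept the burst? A sketch, where the status nibble values and the per-nibble cost are placeholders rather than anything the post specifies:

```c
#include <stdint.h>

/* Hypothetical status nibbles the mcu could drive for a read of $6000 once
   the NES has set the interrupt bit ($8000.7). Values are placeholders. */
#define MCU_READY 0x0Au
#define MCU_BUSY  0x05u

/* us_until_cic:  time left before the mcu must service the next CIC transfer.
   us_per_nibble: assumed worst-case time to accept one nibble from the NES.
   nibbles:       how many nibbles the NES intends to write in this burst. */
uint8_t mcu_status_for_read(uint16_t us_until_cic, uint16_t us_per_nibble,
                            uint8_t nibbles)
{
    uint32_t needed = (uint32_t)us_per_nibble * nibbles;
    return (needed <= us_until_cic) ? MCU_READY : MCU_BUSY;
}
```

If the NES sees `MCU_BUSY` it would simply retry the read a little later, so no /IRQ is needed for short transfers.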

Anyway, that's my idea, and here's the place to toss out any other, probably better, ideas you guys might have. My primary goal for such an interface is that it's effectively free, able to be implemented with wires alone. It's not out of the question to add logic gates to implement the idea, but personally I'm not interested in doing so. Start adding a gate here or there and it's no longer free. I'm not even sure I currently have the PCB space to route the signals I've proposed. I'll probably have to re-route a large portion of my current design to make room for the CIC to be placed closer to the PRG side of the board.

As far as what could be done with something like this, it's up to the imagination. The mcu probably isn't going to be fast enough to implement any sort of CHR effects like finer bankswitching. Even selectable NT mirroring is sort of out of the question, as you'd need to add more logic.

As mentioned in my other post, unfortunately this mcu doesn't have any external pins available to clock the internal counters, so you'd have to use the internal 16MHz/128kHz oscillators for an IRQ timer. The SPI bus is open in my pinout proposal above, and things could be shifted around to make the I2C bus available instead. This could potentially be connected to a large serial flash for lots of ROM storage, but it's not going to be as fast as one might like due to the limitations on transfers. One of the pins could be routed to EXP6 to implement some basic expansion sound, perhaps. You could even get crazy and use the UART interface to connect a cheapo BT/WiFi module, but if you're interested in that, it doesn't make much sense to restrict your budget to a discrete mapper.

Anyway, my guess is this idea won't go anywhere, but it's fun to talk about. At this point I can say I'm going to do everything I can to migrate to the STM8 for my NES/SNES CIC solutions for the benefit of my other designs. So from that point on, the hardware will be sitting idle waiting to be put to good use. :)
If you're gonna play the Game Boy, you gotta learn to play it right. -Kenny Rogers
Dwedit
Posts: 4921
Joined: Fri Nov 19, 2004 7:35 pm

Re: Adding features to discrete mapper with multipurposed CI

Post by Dwedit »

Heh, once you have a 16MHz Thumb processor, why even bother making a NES game at all. You can run the game logic on the coprocessor instead.
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Adding features to discrete mapper with multipurposed CI

Post by tepples »

Except in this case, the problem with going the Hayazashi Nidan Morita Shogi 2 route of putting game logic on an ARM coprocessor is that everything has to be timed to ensure the CIC key logic executes at the appropriate time.
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: Adding features to discrete mapper with multipurposed CI

Post by calima »

Does the CIC keep asking at runtime? Wikipedia makes it sound like it only checks at startup.
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Adding features to discrete mapper with multipurposed CI

Post by tepples »

calima wrote:Does the CIC keep asking at runtime?
Put a licensed Game Pak in a stock NES-001 Control Deck, turn it on, and pull out the Game Pak. The game will freeze. If the power light also starts blinking, this means the lock chip requires the key chip to continue to communicate. I just tried it with Bionic Commando, and yes, it starts blinking. That's why you need to perform the pin 4 modification to use TapeDump on a front-loader.
calima wrote:Wikipedia makes it sound like it only checks at startup.
A charge pump-driven stunner may need to run only during the opening copyright screen, but a proper clone needs to communicate continuously.
lidnariq
Posts: 11429
Joined: Sun Apr 13, 2008 11:12 am

Re: Adding features to discrete mapper with multipurposed CI

Post by lidnariq »

The NES CIC exchanges another bit of the 2^160 period random number generator output approximately every second. The lock will immediately start the reboot loop on the NES if it fails.
rainwarrior
Posts: 8731
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Re: Adding features to discrete mapper with multipurposed CI

Post by rainwarrior »

tepples wrote:A charge pump-driven stunner may need to run only during the opening copyright screen, but a proper clone needs to communicate continuously.
Does the stunner approach actually halt/crash the internal CIC until the next reset or something?
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Re: Adding features to discrete mapper with multipurposed CI

Post by infiniteneslives »

lidnariq wrote:The NES CIC exchanges another bit of the 2^160 period random number generator output approximately every second. The lock will immediately start the reboot loop on the NES if it fails.
The lock and key exchange the next bit approximately every 75usec. If the key doesn't produce the proper value, the lock panics and stops paying attention to the key. Once it panics it sits in a loop resetting the console every other second.

The actual time between exchanged bits of the pseudorandom number depends on the precise current value of the calculated/"mangled" number. But the lock and key step through the "mangle" calculations in lockstep, so each takes exactly the same amount of time to calculate the next bit.

I've been digging through segher's disassembly to get a better understanding of exactly how the lock behaves and what it's expecting from the key. The good news is that, from what I can tell, the lock doesn't pay any attention to the key's output aside from the precise moment every ~75usec. So I'm thinking this could be abused to alleviate the problem of the CIC mcu needing to prioritize CIC comms over everything else. It appears one could simply output the next stream value early: the key could just set the Dout line to the upcoming value instead of providing a precisely timed 750nsec pulse of the current bit as the original does.

That means this implementation can give the NES CPU priority, so long as the NES isn't streaming data for longer than ~50usec.
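A toy model of why the early-output trick works: the lock only looks at the key's Dout at one instant per exchange, so Dout can be updated to the upcoming bit any time after the previous sample. The sketch below (hypothetical names, times in usec) checks whether a given update delay after each sample still delivers the right bit at every sample; the 79usec spacing in the usage check is the figure given later in the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* `bits[k]` is the bit the lock expects at `sample_times[k]` (us).
   After each sample the key schedules the next bit onto Dout, landing
   `update_delay` us later (e.g. delayed by higher-priority NES work).
   Returns true if every sample sees the expected bit. */
bool early_output_ok(const uint8_t *bits, const uint32_t *sample_times,
                     size_t n, uint32_t update_delay)
{
    uint8_t dout = bits[0];     /* first bit is driven before the first sample */
    uint32_t pending_at = 0;    /* when the scheduled update reaches Dout */
    bool pending = false;

    for (size_t k = 0; k < n; k++) {
        if (pending && pending_at <= sample_times[k]) {
            dout = bits[k];     /* update landed before this sample */
            pending = false;
        }
        if (dout != bits[k])
            return false;       /* the lock would panic here */
        if (k + 1 < n) {
            pending = true;
            pending_at = sample_times[k] + update_delay;
        }
    }
    return true;
}
```

With samples 79usec apart, any update delay up to the full spacing still passes, which is the slack that lets the NES CPU take priority.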

Additionally I confirmed what I was seeing on the scope with the code. The lock sets its output high for 8 cycles after resetting the key. So I'm thinking this could be used to sense reset instead of the CIC reset pin. With a little effort, that would get the mcu down to needing only 2 i/o for CIC comms.
Dwedit wrote:Heh, once you have a 16MHz Thumb processor, why even bother making a NES game at all. You can run the game logic on the coprocessor instead.
The biggest problem with running game logic on a low cost ARM coprocessor is providing enough ROM for the mcu, and an adequate interface between the mcu and PPU.
elseyf
Posts: 72
Joined: Sat Dec 01, 2012 4:10 am

Re: Adding features to discrete mapper with multipurposed CI

Post by elseyf »

This sounds interesting. Is the CIC Firmware going to be open source, and if so, is it supplied in a manner that it could be easily worked into a custom program running on the mcu?
I see this as a free means to do some basic DRM protection for a new homebrew game: the mcu could interface some additional external memory (namely SPI flash) which contains encrypted data, and would decode it when requested (in this case the mcu is kind of a black box). Best would be to encrypt program data and only supply the NES with the code which is currently needed, but that is left to the programmer to decide. The interface could use the NES IRQ signal to halt the NES from transferring data when the mcu is supposed to act as a CIC; this would at least allow for a seamless interface.
lidnariq
Posts: 11429
Joined: Sun Apr 13, 2008 11:12 am

Re: Adding features to discrete mapper with multipurposed CI

Post by lidnariq »

I find discussions about DRM to be somewhere between "boring" and "annoying". Regardless, the biggest problem is that because necessarily no emulator already supports whatever scheme you'd want to add, and because the NES is so limited, any DRM you add stands a very good chance of breaking any existing timing-sensitive code (and you'll be surprised at how much NES code becomes timing sensitive...) so then you need to reimplement the DRM scheme in your own in-house debugging emulator fork.


Now, the part I actually find interesting:

I spent a little time thinking about this, and I believe a modern PIC with some Configurable Logic Cells (and maybe the Peripheral Pin Select) could actually subsume all of this, adding GNROM banking, mirroring control, and an IRQ. The down sides are a serial interface to programming the mapper, and not being able to write to Flash....

A CLC would let us detect writes (/ROMSEL OR R/W); the PPS lets us forward that to the SPI module clock input (or else use an external pin). The CPU is responsible for taking the received bytes and relaying them to the various pins (≈GNROM), configuring another CLC (mirroring control), and the timer (IRQ).
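The CLC function described here is just an OR gate acting as an active-low write strobe: (/ROMSEL OR R/W) is low only while the CPU writes to $8000-$FFFF. A combinational model in C, with signals as logic levels (true = high); the function name is illustrative:

```c
#include <stdbool.h>

/* (/ROMSEL OR R/W): low only when /ROMSEL is low (CPU address $8000-$FFFF)
   and R/W is low (write), i.e. an active-low "write to ROM space" strobe
   that could be routed via PPS to the SPI module's clock input. */
bool rom_write_strobe_n(bool romsel_n, bool rw)
{
    return romsel_n || rw;
}
```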

Meanwhile, another CLC and a timer clocked by the CIC clock lets us actually use the timer hardware to copy the already-calculated CIC output stream at exactly the right moment. The CPU's involvement is minimal, leaving more than enough time for ... whatever's left.
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: Adding features to discrete mapper with multipurposed CI

Post by tepples »

lidnariq wrote:I spent a little time thinking about this, and I believe a modern PIC with some Configurable Logic Cells (and maybe the Peripheral Pin Select) could actually subsume all of this, adding GNROM banking, mirroring control, and an IRQ. The down sides are a serial interface to programming the mapper
Could the serial interface be made compatible with a subset of the MMC1?
lidnariq
Posts: 11429
Joined: Sun Apr 13, 2008 11:12 am

Re: Adding features to discrete mapper with multipurposed CI

Post by lidnariq »

The PIC's serial ports (MSSP/EUSART) really want to deserialize writes 8 (or 9) bits at a time, so I think that counts as "possible but not trivially so".

All of the PICs with three or four CLCs only have four CLC pin inputs, and to emulate an MMC1 we really need to pay attention to five pins: A14, A13, D0, /ROMSEL, & R/W. But I think we could probably actually get that by cleverly using the timer 1 gate function. Don't get IRQs if it's an MMC1 subset, though, and there's no good reason to be compatible with the MMC1 if one does add IRQs.
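For reference, this is the MMC1 serial protocol a PIC would have to mimic, which shows why an 8-bit-at-a-time MSSP/EUSART is an awkward fit: MMC1 registers are loaded one bit (D0) per write over five writes, with a D7 reset escape, and the target register is picked by A14:A13 of the fifth write. A C model of that behaviour (the struct and function names are illustrative):

```c
#include <stdint.h>

/* MMC1 serial write protocol:
   - a write with D7 set clears the shift register and ORs $0C into the
     control register;
   - otherwise D0 is shifted in, LSB first;
   - on the fifth write the 5-bit value is copied to the register selected by
     CPU A14:A13 of that write ($8000 control, $A000 CHR0, $C000 CHR1,
     $E000 PRG). */
typedef struct {
    uint8_t shift;    /* bits collected so far, LSB first */
    uint8_t count;    /* number of bits collected */
    uint8_t regs[4];  /* control, CHR bank 0, CHR bank 1, PRG bank */
} mmc1_t;

void mmc1_write(mmc1_t *m, uint16_t addr, uint8_t data)
{
    if (data & 0x80u) {
        m->shift = 0;
        m->count = 0;
        m->regs[0] |= 0x0Cu;
        return;
    }
    m->shift |= (uint8_t)((data & 1u) << m->count);
    if (++m->count == 5) {
        m->regs[(addr >> 13) & 3u] = m->shift;  /* A14:A13 pick the register */
        m->shift = 0;
        m->count = 0;
    }
}
```

Decoding this needs D7 and D0 as well as A14, A13, /ROMSEL and R/W, which is where the pin-count squeeze on the CLC inputs comes from.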

It probably mostly depends on whether the hardware can manage all of the CIC timing without the CPU involvement, and the CPU "just" needs to calculate the next bit and next time.
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Re: Adding features to discrete mapper with multipurposed CI

Post by infiniteneslives »

elseyf wrote: Is the CIC Firmware going to be open source, and if so, is it supplied in a manner that it could be easily worked into a custom program running on the mcu?
My goal would be to make it available for community use so people could use it for whatever they pleased, such as your DRM idea if they wanted. So at a minimum I would provide the source code on request. Not sure what licensing option I will choose when the time comes to release something that's fully functional, but I'd like to share it with anyone that would put it to good use.

lidnariq wrote:I spent a little time thinking about this, and I believe a modern PIC with some Configurable Logic Cells (and maybe the Peripheral Pin Select) could actually subsume all of this, adding GNROM banking, mirroring control, and an IRQ.
The configurable logic is an interesting feature that could help out quite a bit. The biggest deterrent for me personally is that it isn't 'free' in comparison to the stm8 solution. Looks like the lowest cost part with a CLC, eeprom, [EDIT: and enough spare i/o] is the attiny814 (appears they're called CCL on avr cores). The attiny814 is rather generous with 12 i/o, and comparably priced to the attiny13. It only has one logic cell though, but that's enough to bit bang mapper writes with the benefit of dedicated logic. The attiny814 does have a little speed boost with its 20MHz internal osc compared to the stm8s003's 16MHz, but if you're running the core off of the 4MHz CIC clock then that's irrelevant.

The cheapest offering with 4x CLCs looks to be the 16F15344. In 3k volume, Microchip's claiming 63 cent pricing, which isn't bad considering one could potentially integrate all the discrete mapper logic chips into the PIC. But it's going to be hard/impossible to reach that volume without leveraging all of my CIC consumption into a single solution. Either way this is even further from free in my personal quest, but it would be a pretty decent solution for someone looking to make a custom dedicated solution.
It probably mostly depends on whether the hardware can manage all of the CIC timing without the CPU involvement, and the CPU "just" needs to calculate the next bit and next time.
Assuming my analysis of segher's disassembly holds true, my idea of simply outputting the next bit anytime during the 79usec between bit exchanges removes the CPU from time sensitive involvement.
infiniteneslives wrote: Additionally I confirmed what I was seeing on the scope with the code. The lock sets its output high for 8 cycles after resetting the key. So I'm thinking this could be used to sense reset instead of the CIC reset pin. With a little effort, that would get the mcu down to needing only 2 i/o for CIC comms.
So I no longer see this as viable. I see in segher's disassembly (@ $007 below) where the lock appears to be setting PORT0.0 (Lock Dout, Key Din) shortly after reset, but before initializing the data stream. I was operating off of corrupt brain memory when I thought I had seen this in my logic analyzer captures. Lock Dout never goes high during this time, the first time it goes high after reset is during the stream ID nibble exchange.


			;; LOCK START
05f: 75      lbmi 1	; H := 1

			; forever {
06f: 55      in		;	A := P0
077: 66      ska 2	;	if A.2 = 0
07b: f3      t 073	;		last
07d: 21      lbli 1
03e: 31      ldi 1
01f: 70      ad
04f: 43      xd		;	[1:1]++
067: ef      t 06f	; }

073: 30      ldi 0
079: 20      lbli 0
03c: 4a      s		; [1:0] := 0
05e: 32      ldi 2
02f: 21      lbli 1
057: 46      out	; P1 := 2	// reset host and key
06b: 00      nop
075: 30      ldi 0
03a: 46      out	; P1 := 0	// run key
01d: 20      lbli 0	; L := 0
00e: 31      ldi 1
007: 46      out	; P0 = 1	// *** SHOULD BE SETTING LOCK Dout high (Key Din)
043: 3e      ldi e	; A := e

			; while A <> 0 {
061: 01      adi 1	;	A++
030: e1      t 061	; }
058: 00      nop
06c: ae      t 02e	; goto 02e

			;; KEY START
			; (L = 0, A = P0)
076: 65      ska 1	; if A.1 = 0	// if not test mode
03b: 8c      t 00c	;	goto 00c
05d: 00      nop

			;; INIT LOCK, OR KEY IN TEST MODE
			; (L = 0)
02e: 47      out0	; P0 := 0            //  *** Here it appears to clear PORT.0 which looks to be set @007
017: 7d 00   tml 200	; call 200	// init stream
065: 7d a0   tml 320	; call 320	// magic
019: d1      t 051	; goto 051
What I had seen in the analyzer captures was Din and Dout being held high prior to the reset signal. I had assumed that the lock was setting its Dout prior to resetting the key, but it's actually the key setting both Din and Dout prior to being reset, as it ends up in panic/die on initial boot before being reset by the lock. I removed the key from the circuit completely and confirmed the lock was holding Din & Dout low prior to resetting the key. So it appears the CIC's Din/Dout pins are bidirectional.

Does anyone know or have a good guess as to what type of drivers are on these PORT pins? I had assumed they were unidirectional push-pull. But this apparently isn't the case. The lock and key both clear their PORT0.1 "data input" pins during normal transactions. And when they set their PORT0.0 pin, the logic 1 beats out the other chip's logic 0. So my guess is outputting a logic 0 is effectively a pull-down, and outputting logic 1 is pushed sourcing current to override the other chip's logic 0 pull-down.

This makes sense when considering PORT0.2 for the lock's seed capacitor. The lock will set the seed pin whenever it sets its data out pin during transactions, and then read the seed pin if it ends up getting reset. So I guess it's aiding in getting a random seed on a warm reset by utilizing bidirectionality on the PORT0.2 pin.

So CIC reset appears to be a vital input. I thought of other ways to get around requiring the CIC reset signal, but there's no way to log data in and determine proper timing in adequate time, as the key must output its interpretation of the stream ID being even/odd as the first bit transaction after the stream ID transfer. You could figure it out in time if there were always at least two bits set in the stream ID, but that obviously won't work. [EDIT: too bad the CIC reset signal is active high, otherwise I could just route it to NRESET on the stm8. I guess one could still do this to save an i/o, but you'd need to add an inverter.]

Anyway, I've got a firm enough grasp on the mangle algorithm and the timing of everything, so I'm going to start working on my own implementation targeting the stm8s003. I'll start by taking the easy way out: clocking the mcu core with the 4MHz CIC clock and ensuring timing by cycle counting.

One thing I didn't realize until working through segher's disassembly and looking over my analyzer captures is that the CIC cpu core is actually running at 1MHz; it appears to take 4 clock cycles per machine cycle. I haven't seen this mentioned anywhere else in all my research. I should have gathered this when peeking at Krikzz's solution, but I didn't have a good enough grasp on all the timing at that time. My discussion below uses usec and cycles interchangeably.


The Skinny on CIC transactions and timing
I'll share my little breakdown of the timing and transactions for anyone curious. thefox's Tengen translation to C is the best high level reference, but it took a while looking at segher's disassembly before I could fully understand what's happening and wrap my head around the timing of everything.

Negative time: Lock determines 4bit seed value "stream ID" 0-15 prior to resetting key
Time 0 usec: Lock resets Key
Time 33 usec: Lock outputs bit3 of stream ID
Time 48 usec: Lock outputs bit0 of stream ID
Time 63 usec: Lock outputs bit1 of stream ID
Time 78 usec: Lock outputs bit2 of stream ID
-This 4bit stream ID becomes the first nibble of the key's table nibble 1.
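The stream ID bits arrive out of order (bit3, bit0, bit1, bit2), so the key has to place each sampled Din level at the right bit position. A small sketch of that reassembly, assuming `rx[i]` holds the Din level sampled for the i-th bit in arrival order (the function name is illustrative):

```c
#include <stdint.h>

/* Arrival order per the timing above: bit3 @33us, bit0 @48us,
   bit1 @63us, bit2 @78us after the lock resets the key. */
uint8_t stream_id_from_rx(const uint8_t rx[4])
{
    static const uint8_t bitpos[4] = {3, 0, 1, 2};
    uint8_t id = 0;
    for (int i = 0; i < 4; i++)
        id |= (uint8_t)((rx[i] & 1u) << bitpos[i]);
    return id;
}
```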

Time 201usec:
main loop starts: first task is to transfer 1-15 bits of current table's least significant bits, final task is to perform mangle on tables, repeat..

The first transaction always transfers all 15 bits; subsequent transactions are 1-15 bits, determined by nibble 7's value on re-entry of the main loop.
"effective main loop time 0": lock and key transfer their table's least significant bit of nibble 1.
-Check what the other chip sent and confirm it matched expected value

Every 79 cycles the next LSbit of ram is transferred and checked; if it doesn't match the expected value, die/panic.
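The transfer phase can be laid out ahead of time as a list of (bit, time) pairs. This sketch assumes the k-th exchanged bit is the LSB of nibble 1+k of a 16-nibble table (reading "LSbit of nibble 1" followed by "the next LSbit of ram"), and takes the bit count as an input rather than deriving it from nibble 7, since the exact mapping isn't spelled out here. Names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CIC_BIT_PERIOD_US 79u

/* Fill in the bits and their times (relative to the first exchange) for one
   transaction of `n` bits, clamped to 1-15. Assumption: bit k is the LSB of
   nibble 1+k of the table. Returns the number of bits planned. */
size_t plan_transaction(const uint8_t table[16], size_t n,
                        uint8_t bits[15], uint32_t times_us[15])
{
    if (n < 1) n = 1;
    if (n > 15) n = 15;
    for (size_t k = 0; k < n; k++) {
        bits[k] = table[1 + k] & 1u;
        times_us[k] = (uint32_t)k * CIC_BIT_PERIOD_US;
    }
    return n;
}
```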

Once all bits are transferred perform mangle of both the chip's own ram table, and the table it's calculating for the other chip to know what to expect.
-Number of mangles to be performed on the table is based on nibble 15 on entry of mangle calculation plus 1. So each table will get mangled 1-15 times.
-The time it takes to perform a table mangle calculation is either 78 or 84 cycles/usec depending on if the sum of nibble 2 + nibble 3 + 1 is > 16 or not on entry of mangle calculation.
-Each table also takes another 29 cycles/usec to process separate from the mangle calc loop. 2 tables = 58 cycles/usec.

The mangle calculation time varies based on ram values on entry. The theoretical minimum would be if both tables only performed one mangle calc @ 78usec + 29usec for the table = 107usec * 2 tables = 214 usec. I would estimate the average mangle calculation time to be ~80usec * 15 + 58 = ~1250usec.
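The figures above reduce to one formula: every mangle iteration costs 78 cycles plus 6 more if it takes the slow path, and each of the two tables adds a fixed 29 cycles. A sketch in C (the function name is illustrative); `mangles` is the total iteration count over both tables and `slow` how many of those took 84 cycles:

```c
#include <stdint.h>

/* Mangle-phase duration in CIC cycles (= usec at the 1MHz CIC core rate). */
uint32_t mangle_phase_cycles(uint32_t mangles, uint32_t slow)
{
    return mangles * 78u + slow * 6u + 2u * 29u;
}
```

One mangle per table with no slow iterations gives the 214 usec minimum, and 15 mangles with 5 slow ones gives 1258, matching the ~1250usec estimate.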

Now that the new value of each table is calculated the main loop restarts.


My thoughts out loud:
What follows is a bit of me thinking out loud on this whole idea of multitasking the CIC, so that at least my idea is publicly documented in case someone else wants to pursue it further and I don't end up doing so myself. In a solution where the CIC is being multitasked, and we're allowed to output the next bit early instead of a precise 3usec pulse, this is my general plan after initialization and the first transaction are complete:

1) Set a timer for mangle calculation time. We can quickly determine how many mangles will be performed by looking at nibble 15 of each table. We just don't know how long each mangle will take without performing each iteration of calculations. Each mangle will be 78 or 84 usec, so let's assume all calcs take the min 78usec and set our timer based on that, plus the 2x 29cycle table time.

2) Perform the mangle calculations on both tables tallying up each calc that was 84 cycles instead of 78. Once all mangles are done, increment the timer by 6cycles times our tally. Now we'll get an interrupt from the timer when it's the right time to start transacting data.

3) We can now work ahead a little to try and make the transactions easier/faster. Perhaps it would be helpful to transfer the upcoming data transaction into a temporary condensed location separate from the lock/key tables' LSbits. This would also allow us to perform the mangle calculations for the next iteration early. This improves upon my thought above as we could figure out the mangle time and perform all the mangle calculations one iteration in advance.

4) Perform next transaction getting interrupt every 79cycles to output the next bit in the stream. Once transaction is complete load counter with predetermined mangle calculation time which we've already calculated in advance from the step prior. Go back to step 2 and repeat forever...
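Steps 1 and 2 amount to loading the timer with an optimistic estimate and patching it afterwards. A sketch of that bookkeeping in CIC cycles, where `cic_timer_t` and both functions are hypothetical stand-ins for whatever STM8 timer compare register ends up being used (at 16MHz that would mean 16 timer ticks per CIC cycle):

```c
#include <stdint.h>

typedef struct { uint32_t compare; } cic_timer_t;

/* Step 1: assume every mangle takes the 78-cycle path, plus 2 x 29 cycles
   of per-table overhead. */
void step1_preload(cic_timer_t *t, uint32_t now,
                   uint32_t mangles_a, uint32_t mangles_b)
{
    t->compare = now + (mangles_a + mangles_b) * 78u + 2u * 29u;
}

/* Step 2: once the mangles are done, push the deadline out by 6 cycles for
   each iteration that took the 84-cycle path. */
void step2_correct(cic_timer_t *t, uint32_t slow_tally)
{
    t->compare += slow_tally * 6u;
}
```

Since the correction only ever adds time, the early estimate can never fire late, and after step 2 the compare value equals the exact schedule.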

The above is time-referenced to the moment the lock latches the key's bit. However, we want to give ourselves some slack time. Since we can output the next bit early, it's better to reference our time 'zero' to 79usec *before* the lock latches the key's bit. So we need a 79usec timer to interrupt us just *after* the lock has retrieved its data. So long as the timer automatically reloads and is set up to interrupt us again exactly 79 cycles later, we don't need this "CIC transaction timer" interrupt to have the highest priority; the timer is simply keeping time for us. The NES CPU can have higher interrupt priority than the CIC, so long as the NES CPU doesn't take more than ~77usec of our time in one burst.
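The priority budget can be stated as a simple inequality. A sketch assuming a 16MHz mcu clock and the 79usec bit spacing: the auto-reload value is 79 x 16 = 1264 ticks, and a higher-priority NES burst is safe as long as the wake-up slack after the lock's latch, the burst, and the CIC ISR's own work all fit before the next latch (strict inequality to stay conservative). The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCU_TICKS_PER_US  16u
#define BIT_PERIOD_US     79u
#define BIT_PERIOD_TICKS  (BIT_PERIOD_US * MCU_TICKS_PER_US)  /* 1264 */

/* latch_slack_us: how long after the lock's latch the timer interrupt fires.
   nes_burst_us:   time a higher-priority NES service routine holds the cpu.
   isr_us:         time the CIC ISR needs to put the next bit on Dout. */
bool cic_deadline_met(uint32_t nes_burst_us, uint32_t isr_us,
                      uint32_t latch_slack_us)
{
    return latch_slack_us + nes_burst_us + isr_us < BIT_PERIOD_US;
}
```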

Well that's enough rambling for now... Time to start writing some code!
If you're gonna play the Game Boy, you gotta learn to play it right. -Kenny Rogers
lidnariq
Posts: 11429
Joined: Sun Apr 13, 2008 11:12 am

Re: Adding features to discrete mapper with multipurposed CI

Post by lidnariq »

infiniteneslives wrote: Does anyone know or have a good guess as to what type of drivers are on these PORT pins? I had assumed they were unidirectional push-pull. But this apparently isn't the case. The lock and key both clear their PORT0.1 "data input" pins during normal transactions. And when they set their PORT0.0 pin, the logic 1 beats out the other chip's logic 0. So my guess is outputting a logic 0 is effectively a pull-down, and outputting logic 1 is pushed sourcing current to override the other chip's logic 0 pull-down.
I could have sworn that the lock/key were properly crossed over?

I just re-tested with a continuity meter and it certainly seems to be the case that pins 1/2 connect to pins 2/1 ...
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Re: Adding features to discrete mapper with multipurposed CI

Post by infiniteneslives »

lidnariq wrote:
infiniteneslives wrote: Does anyone know or have a good guess as to what type of drivers are on these PORT pins? I had assumed they were unidirectional push-pull. But this apparently isn't the case. The lock and key both clear their PORT0.1 "data input" pins during normal transactions. And when they set their PORT0.0 pin, the logic 1 beats out the other chip's logic 0. So my guess is outputting a logic 0 is effectively a pull-down, and outputting logic 1 is pushed sourcing current to override the other chip's logic 0 pull-down.
I could have sworn that the lock/key were properly crossed over?

I just re-tested with a continuity meter and it certainly seems to be the case that pins 1/2 connect to pins 2/1 ...
Yes of course, I wasn't questioning that. Sorry, I'm not sure what you thought I meant, or where I wasn't being clear.

My point was that the CIC chip can drive an output logic 1 on both PORT0.0 (data out, pin 1) and PORT0.1 (data in, pin 2). And it can read them as inputs as well. So it seems all pins (or at least the ones of concern) are bidirectional. But there's no sort of direction register like you would find on a modern MCU, such as a DDR register on an AVR. The CIC can simply always read the input and set the output. Each CIC writes a logic 0 to its data input (pin 2, PORT0.1) in the same instruction that it sets logic 0/1 on its data output (PORT0.0, pin 1).

Since the data in/out lines are crossed over, as you pointed out, it would appear that each CIC always outputs a logic 0 on its input pin, and the other CIC is free to "also drive" that same line to a logic 1 without causing a conflict. Since this doesn't create an issue, I'm left to conclude that a logic 0 doesn't actually sink current, and is instead just a pulldown. Presumably the other chip sources current when driving a logic 1, which overcomes this chip's weak pulldown. So it kind of resembles the opposite of open drain, "open source" I guess? Like I2C but inverted...?

It just doesn't seem familiar to me; it doesn't appear to be a traditional RTL, TTL, or CMOS driver. Maybe I'm missing something, though. [EDIT: unless RTL with a pulldown instead of a pull-up resistor was commonplace?]

My primary curiosity is whether some confusion on my part about the I/O drivers might explain why the lock isn't driving its data output pin high shortly after it resets the key, as the code seems to be trying to do (as I pointed out earlier). It clearly has no problem driving a logic 1 on that same pin only a few cycles later when transmitting the stream ID. But I'm also not convinced that segher's disassembly is 100% accurate; it's not like we have a legit datasheet for the Sharp MCU.