elseyf wrote:
Is the CIC Firmware going to be open source, and if so, is it supplied in a manner that it could be easily worked into a custom program running on the mcu?
My goal would be to make it available for community use so people could utilize it for whatever they pleased, such as your DRM idea if they wanted. So at a minimum I would provide the source code on request. Not sure what licensing option I will choose when the time comes to release something that's fully functional. But I'd like to share it with anyone that would put it to good use.
lidnariq wrote:
I spent a little time thinking about this, and I believe a modern PIC with some Configurable Logic Cells (and maybe the Peripheral Pin Select) could actually subsume all of this, adding GNROM banking, mirroring control, and an IRQ.
The configurable logic is an interesting feature that could help out quite a bit. The biggest deterrent personally is that it isn't 'free' in comparison to the stm8 solution. Looks like the lowest cost part with a CLC, eeprom, [EDIT: and enough spare i/o] is the attiny814 (appears they're called CCL on avr cores). The attiny814 is rather generous with 12 i/o, and comparably priced to the attiny13. It only has one logic cell though, but that's enough to bit bang mapper writes with the benefit of dedicated logic. The attiny814 does have a little speed boost with its 20MHz internal osc compared to the stm8s003's 16MHz. But if you're running the core off of the 4MHz CIC clock then that's irrelevant.
The cheapest offering with 4x CLC's looks to be the 16F15344. In 3k volume, microchip's claiming 63cent pricing which isn't bad considering one could potentially integrate all the discrete mapper logic chips into the PIC. But it's going to be hard/impossible to reach that volume without leveraging all of my CIC consumption into a single solution. Either way this is even further from free in my personal quest. But it would be a pretty decent solution for someone looking to make a custom dedicated solution.
Quote:
It probably mostly depends on whether the hardware can manage all of the CIC timing without the CPU involvement, and the CPU "just" needs to calculate the next bit and next time.
Assuming my analysis of segher's disassembly holds true, my idea of simply outputting the next bit anytime during the 79usec between bit exchanges removes the CPU from time sensitive involvement.
infiniteneslives wrote:
Additionally I confirmed what I was seeing on the scope with the code. The lock sets its output high for 8 cycles after resetting the key. So I'm thinking this could be used to sense reset instead of the CIC reset pin. So with a little effort this could be minimized down to the mcu only needing 2 i/o for CIC comms.
So I no longer see this as viable. I see in segher's disassembly (@ $007 below) where the lock appears to be setting PORT0.0 (Lock Dout, Key Din) shortly after reset, but before initializing the data stream. I was operating off of corrupt brain memory when I thought I had seen this in my logic analyzer captures. Lock Dout never goes high during this time, the first time it goes high after reset is during the stream ID nibble exchange.
Code:
;; LOCK START
05f: 75 lbmi 1 ; H := 1
; forever {
06f: 55 in ; A := P0
077: 66 ska 2 ; if A.2 = 0
07b: f3 t 073 ; last
07d: 21 lbli 1
03e: 31 ldi 1
01f: 70 ad
04f: 43 xd ; [1:1]++
067: ef t 06f ; }
073: 30 ldi 0
079: 20 lbli 0
03c: 4a s ; [1:0] := 0
05e: 32 ldi 2
02f: 21 lbli 1
057: 46 out ; P1 := 2 // reset host and key
06b: 00 nop
075: 30 ldi 0
03a: 46 out ; P1 := 0 // run key
01d: 20 lbli 0 ; L := 0
00e: 31 ldi 1
007: 46 out ; P0 = 1 // *** SHOULD BE SETTING LOCK Dout high (Key Din)
043: 3e ldi e ; A := e
; while A <> 0 {
061: 01 adi 1 ; A++
030: e1 t 061 ; }
058: 00 nop
06c: ae t 02e ; goto 02e
;; KEY START
; (L = 0, A = P0)
076: 65 ska 1 ; if A.1 = 0 // if not test mode
03b: 8c t 00c ; goto 00c
05d: 00 nop
;; INIT LOCK, OR KEY IN TEST MODE
; (L = 0)
02e: 47 out0 ; P0 := 0 // *** Here it appears to clear PORT.0 which looks to be set @007
017: 7d 00 tml 200 ; call 200 // init stream
065: 7d a0 tml 320 ; call 320 // magic
019: d1 t 051 ; goto 051
What I had seen in the analyzer captures was Din and Dout being held high prior to the reset signal. I had assumed that the Lock was setting its Dout prior to resetting the key, but it's actually the key setting both Din and Dout prior to being reset, as it ends up in panic/die on initial boot prior to being reset by the lock. I removed the key from the circuit completely and confirmed the Lock was holding Din & Dout low prior to resetting the Key. So it appears the CIC's Din/Dout pins are bidirectional.
Does anyone know or have a good guess as to what type of drivers are on these PORT pins? I had assumed they were unidirectional push-pull. But this apparently isn't the case. The lock and key both clear their PORT0.1 "data input" pins during normal transactions. And when they set their PORT0.0 pin, the logic 1 beats out the other chip's logic 0. So my guess is outputting a logic 0 is effectively a pull-down, and outputting logic 1 is pushed sourcing current to override the other chip's logic 0 pull-down.
This makes sense when considering PORT0.2 for the lock's seed capacitor. The lock will set the seed pin whenever it sets its data out pin during transactions, and then read the seed pin if it ends up getting reset. So I guess it's aiding in getting a random seed on a warm reset utilizing bidirectionality on the PORT0.2 pin.
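To make that guess concrete, here's a toy model of a shared line under the driver behaviour I'm assuming above (a 0 is only a weak pull-down, a 1 is actively driven high). `resolve_line` is a name I made up for illustration, and none of this has been confirmed against the actual silicon:

```python
def resolve_line(driver_outputs):
    """Return the level seen on a line shared by several port pins.

    Assumption (my guess, not confirmed): outputting 0 is a weak
    pull-down and outputting 1 sources current, so any 1 wins.
    """
    return 1 if any(driver_outputs) else 0


# Lock clears its "data input" pin while the key sets its output: line reads 1.
print(resolve_line([0, 1]))
# Both chips output 0: the line stays low.
print(resolve_line([0, 0]))
```

If the pins were plain push-pull instead, the [0, 1] case would be bus contention rather than a clean logic 1.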
So CIC reset appears to be a vital input. I thought about other ways to get around requiring the CIC reset signal, but there's no way to log data in and determine proper timing in adequate time, as the key must output its interpretation of the stream ID being even/odd as the first bit transaction post stream ID transfer. You could figure it out in time if there were always at least two bits set in the stream ID, but that obviously won't work. [EDIT: too bad the CIC reset signal is active high, otherwise one could just route it to NRESET on the stm8. Guess one could still do this to save an i/o, but you'd need to add an inverter.]
Anyway, I've got a firm enough grasp on the mangle algorithm and timing of everything, so I'm going to start working on my own implementation targeting the stm8s003. I'll start by taking the easy way out: clocking the mcu core with the 4MHz CIC clock and ensuring timing by cycle counting.
One thing I didn't realize until working through segher's disassembly and looking over my analyzer captures is that the CIC cpu core is actually running at 1MHz. Appears to be 4 clock cycles per machine cycle. I haven't seen this mentioned anywhere else in all my research. I should have gathered this when peeking at Krikzz's solution, but I didn't have a good enough grasp on all the timing at that time. My discussion below uses usec and cycles interchangeably.
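As a quick sanity check on those numbers, here's the conversion sketched out. The constant and function names are made up for illustration, and the 4 clocks per machine cycle figure is just my reading of the disassembly and captures:

```python
CIC_CLOCK_HZ = 4_000_000        # CIC clock supplied by the console
CLOCKS_PER_MACHINE_CYCLE = 4    # observed above, not taken from a datasheet


def machine_cycle_usec():
    """Length of one CIC machine cycle in microseconds."""
    return CLOCKS_PER_MACHINE_CYCLE * 1_000_000 / CIC_CLOCK_HZ


def mcu_clocks(cic_cycles):
    """Clocks available to an mcu core running directly off the CIC clock."""
    return cic_cycles * CLOCKS_PER_MACHINE_CYCLE


print(machine_cycle_usec())  # usec per CIC machine cycle
print(mcu_clocks(79))        # mcu clocks in one 79 cycle bit period
```

So an mcu clocked from the CIC clock gets 4 of its own clocks for every CIC machine cycle it has to match when cycle counting.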
The Skinny on CIC transactions and timing
I'll share my little breakdown of the timing and transactions for anyone curious. thefox's tengen translated to C is the best high level reference, but it took awhile looking at segher's disassembly before I could fully understand what's happening and wrap my head around the timing of everything.
Negative time: Lock determines 4bit seed value "stream ID" 0-15 prior to resetting key
Time 0 usec: Lock resets Key
Time 33 usec: Lock outputs bit3 of stream ID
Time 48 usec: Lock outputs bit0 of stream ID
Time 63 usec: Lock outputs bit1 of stream ID
Time 78 usec: Lock outputs bit2 of stream ID
-This 4bit stream ID becomes the first nibble of the key's table nibble 1.
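Note the odd bit order in that list. A small sketch of how a key might reassemble the nibble from four samples taken in arrival order. The table and function names are hypothetical, and the times are when the lock outputs each bit per my captures, not necessarily the ideal sample points:

```python
# (usec after reset, bit index within the stream ID), in arrival order
STREAM_ID_SCHEDULE = [(33, 3), (48, 0), (63, 1), (78, 2)]


def assemble_stream_id(samples):
    """Build the 4bit stream ID from line levels sampled in arrival order."""
    value = 0
    for (_, bit_index), level in zip(STREAM_ID_SCHEDULE, samples):
        value |= (level & 1) << bit_index
    return value


def stream_id_to_samples(stream_id):
    """Inverse: the levels the lock would output, in arrival order."""
    return [(stream_id >> bit_index) & 1 for _, bit_index in STREAM_ID_SCHEDULE]


print(assemble_stream_id([1, 0, 0, 0]))  # only bit3 set
print(stream_id_to_samples(0b0101))      # bits 0 and 2 set
```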
Time 201usec:
main loop starts: first task is to transfer 1-15 bits of current table's least significant bits, final task is to perform mangle on tables, repeat..
The first transaction always transfers all 15 bits; subsequent transactions are 1-15 bits, determined by nibble 7's value on re-entry of the main loop.
"effective main loop time 0": lock and key transfer their table's least significant bit of nibble 1.
-Check what the other chip sent and confirm it matched expected value
Every 79 cycles the next LSbit of ram is transferred and checked; if it doesn't match the expected value, die/panic.
Once all bits are transferred perform mangle of both the chip's own ram table, and the table it's calculating for the other chip to know what to expect.
-Number of mangles to be performed on the table is based on nibble 15 on entry of mangle calculation plus 1. So each table will get mangled 1-15 times.
-The time it takes to perform a table mangle calculation is either 78 or 84 cycles/usec depending on if the sum of nibble 2 + nibble 3 + 1 is > 16 or not on entry of mangle calculation.
-Each table also takes another 29 cycles/usec to process separate from the mangle calc loop. 2 tables = 58 cycles/usec.
The mangle calculation time varies based on ram values on entry. The theoretical minimum would be if both tables only performed one mangle calc: 78usec + 29usec per table = 107usec, * 2 tables = 214usec. I would estimate the average mangle calculation time to be ~80usec * 15 + 58 = ~1250usec.
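The same arithmetic as a sketch, using the figures from the list above. The names are mine, and the "average" line simply reproduces my rough guess of ~80usec per mangle and 15 mangles total rather than anything measured:

```python
MANGLE_FAST_USEC = 78      # mangle pass, short path
MANGLE_SLOW_USEC = 84      # mangle pass, long path
TABLE_OVERHEAD_USEC = 29   # per-table processing outside the mangle loop


def table_usec(mangles, slow=0):
    """Time to process one table.

    `slow` is how many of the `mangles` passes took 84 instead of 78.
    """
    fast = mangles - slow
    return TABLE_OVERHEAD_USEC + fast * MANGLE_FAST_USEC + slow * MANGLE_SLOW_USEC


minimum = 2 * table_usec(1)                         # one fast mangle per table
rough_average = 80 * 15 + 2 * TABLE_OVERHEAD_USEC   # rough estimate only
print(minimum, rough_average)
```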
Now that the new value of each table is calculated the main loop restarts.
My thoughts out loud:
What follows is a bit of me thinking out loud on this whole idea of multitasking the CIC. That way my idea is at least publicly documented, so someone else can pursue it further in the event I don't end up doing so myself. So in a solution where the CIC is being multitasked, and we're allowed to output the next bit early instead of as a precise 3usec pulse, this is my general plan after initialization and the first transaction are complete:
1) Set a timer for mangle calculation time. We can quickly determine how many mangles will be performed by looking at nibble 15 of each table. We just don't know how long each mangle will take without performing each iteration of calculations. Each mangle will be 78 or 84 usec, so let's assume all calcs take the min 78usec and set our timer based on that, plus the 2x 29cycle table time.
2) Perform the mangle calculations on both tables tallying up each calc that was 84 cycles instead of 78. Once all mangles are done, increment the timer by 6cycles times our tally. Now we'll get an interrupt from the timer when it's the right time to start transacting data.
3) We can now work ahead a little to try and make the transactions easier/faster. Perhaps it would be helpful to transfer the upcoming data transaction into a temporary condensed location separate from the lock/key tables' LSbits. This would also allow us to perform the mangle calculations for the next iteration early. This improves upon my thought above as we could figure out the mangle time and perform all the mangle calculations one iteration in advance.
4) Perform next transaction getting interrupt every 79cycles to output the next bit in the stream. Once transaction is complete load counter with predetermined mangle calculation time which we've already calculated in advance from the step prior. Go back to step 2 and repeat forever...
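Steps 1 and 2 sketched out: start the timer with the optimistic value, then extend it once the slow mangles have been tallied. The function names and the example mangle times are hypothetical; on real hardware this would be a timer compare/reload register rather than a plain integer:

```python
MANGLE_MIN_USEC = 78
MANGLE_EXTRA_USEC = 6      # 84 - 78
TABLE_OVERHEAD_USEC = 29


def optimistic_timer(lock_mangles, key_mangles):
    """Step 1: assume every mangle takes the minimum time."""
    return (lock_mangles + key_mangles) * MANGLE_MIN_USEC + 2 * TABLE_OVERHEAD_USEC


def corrected_timer(timer, slow_tally):
    """Step 2: push the timer out by 6 for each mangle that ran long."""
    return timer + slow_tally * MANGLE_EXTRA_USEC


# Made-up pass: lock table mangles 3 times (one slow), key table once (slow).
lock_times = [78, 84, 78]
key_times = [84]
timer = optimistic_timer(len(lock_times), len(key_times))
slow_tally = sum(t == 84 for t in lock_times + key_times)
timer = corrected_timer(timer, slow_tally)
print(timer)
```

The corrected value matches what the real CIC would have spent (all mangle times plus the two 29 cycle table overheads), without ever needing our own calculations to take the same time.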
The above is referenced to the time that the lock will latch the key's bit. However, we want to give ourselves some slack time. Since we can output the next bit early, it's better to reference our time 'zero' to 79usec *before* the lock latches the key's bit. So we need a 79usec timer to interrupt us just *after* the lock has retrieved its data. So long as the timer automatically reloads and is set up to interrupt us again exactly 79 cycles later, we don't need this "CIC transaction timer" interrupt to have the highest priority. The idea is for the timer to simply be keeping time for us. The NES CPU can have higher interrupt priority than the CIC, so long as the NES CPU doesn't take more than ~77usec of our time in one burst.
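A rough model of that slack budget. `isr_usec` (the time our own bit-output handler needs) is an assumed 2usec, picked only because it lines up with the ~77usec figure; the real number depends on the final code:

```python
BIT_PERIOD_USEC = 79


def tick_times(first_tick_usec, n_bits):
    """Auto-reloading timer ticks, one just after each lock latch."""
    return [first_tick_usec + i * BIT_PERIOD_USEC for i in range(n_bits)]


def burst_is_safe(burst_usec, isr_usec=2):
    """True if a higher-priority burst still leaves time to output the
    next bit before the lock latches it one period after the tick."""
    return burst_usec + isr_usec <= BIT_PERIOD_USEC


print(tick_times(0, 3))
print(burst_is_safe(77), burst_is_safe(78))
```

Because the timer reloads itself, a late-serviced interrupt doesn't shift the following ticks; only the slack inside the current period shrinks.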
Well that's enough rambling for now... Time to start writing some code!