So I've been digging into the code and have started implementing things with the stm8.
It probably mostly depends on whether the hardware can manage all of the CIC timing without CPU involvement, so that the CPU "just" needs to calculate the next bit and the next time.
Assuming my analysis of segher's disassembly holds true, my idea of simply outputting the next bit anytime during the 79usec between bit exchanges removes the CPU from time-sensitive involvement.
I've found this plan to output data early won't work. Even though in segher's disassembly the lock only verifies the Key's Data output once every 79usec, my console is checking it twice. I've shifted the key's output around to determine the thresholds. The Lock appears to check that the Key's data out is low 6cyc/usec prior to setting its output (Lock Data in). This agrees with the tengen disassembly, where data input is checked low 5/6cyc before setting Data out; if data in is high it dies/panics.
Segher doesn't seem to mention which of the two NTSC CICs his disassembly is for; perhaps it's a 6113, and the one in my console is probably a 3193. I haven't bothered to take a screwdriver to my console to confirm. One of the few posts that kevtris didn't delete from the RE thread reports that the only combo of 6113/3193 as lock/key that doesn't work is when the 6113 is used as the lock and the 3193 is used as the key. Perhaps this extra Data input check requiring data to be low has something to do with it.
Beyond that, since it's now fairly safe to assume that segher's disassembly is not the code running on my console, that may explain why I don't see the Lock setting Data out for ~9cyc immediately after reset as shown in segher's disassembly.
I was able to determine that there is a roughly 6.7usec window in which the key must transition from low to high when outputting a logic 1. Using the rising edge of Lock Dout as a reference:
-Key Dout must be low 4.9usec beforehand.
-Key Dout must be high 1.8usec afterwards for a logic 1 bit transfer.
-After the transfer, the key is able to leave its output high all the way up until ~5.5usec before the next bit transfer.
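The thresholds above can be written down as a small check, which is handy for sanity-testing logic analyzer captures. This is only a sketch: the function name, the use of nanoseconds, and the sign convention (time relative to the rising edge of Lock Dout, negative meaning before it) are my own assumptions, and the numbers are just the measurements from my one console.

```c
#include <stdbool.h>
#include <stdint.h>

/* Measured on a single console; other CIC revisions may differ. */
#define KEY_LOW_BEFORE_NS 4900 /* Key Dout must still be low this long before the Lock Dout rising edge */
#define KEY_HIGH_AFTER_NS 1800 /* Key Dout must already be high this long after it */

/* key_rise_ns: time of Key Dout's low-to-high transition relative to the
 * Lock Dout rising edge (negative = before the edge).
 * Returns true if the transition lands strictly inside the ~6.7usec
 * window allowed for a logic 1. The hypothetical strict comparison keeps
 * a transition that lands exactly on a sample point out of the window. */
bool key_rise_in_window(int32_t key_rise_ns)
{
    return key_rise_ns > -KEY_LOW_BEFORE_NS && key_rise_ns < KEY_HIGH_AFTER_NS;
}
```

For example, a rise 2usec before the Lock edge passes, while one 5usec before it (key high too early) or 2usec after it (key high too late) fails.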
So that's not necessarily terrible news, as a 6.7usec time allowance equates to 26 mcu cycles when running at 4MHz, and 107 cycles when running at 16MHz. So there's some hope for servicing both the CIC LOCK and the NES CPU during a single ~6usec window of time, especially when running at higher frequencies.
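For anyone wanting to redo the cycle budget for a different clock, the arithmetic is just window length times clock frequency, rounded down. A minimal sketch (the function name is made up for illustration):

```c
#include <stdint.h>

/* Whole MCU cycles that fit in a window of window_ns nanoseconds at a
 * core clock of f_hz, rounded down. The 64-bit intermediate avoids
 * overflow for realistic window lengths and clock rates. */
uint32_t cycles_in_window(uint32_t window_ns, uint32_t f_hz)
{
    return (uint32_t)(((uint64_t)window_ns * f_hz) / 1000000000ULL);
}
```

With the 6.7usec window, cycles_in_window(6700, 4000000) gives 26 and cycles_in_window(6700, 16000000) gives 107, matching the figures above.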
While working on my stm8 implementation, I realized there is a rather vital feature that is necessary from a mcu being dual tasked by the CIC and NES CPU. It's a feature that's not available on most AVRs; I'm not sure about PICs as I'm not as familiar with them. If the CIC 4MHz clock is to be used as a source for the mcu core, the mcu needs to have the ability to switch to its internal oscillator in application, because one can't count on there always being a 4MHz CIC clock on toploader and clone consoles. On most AVRs the clock is selected via fuse bits, which can't be modified by the application code (an external programmer is needed). So in that situation you'd have no choice but to run the mcu core off the internal oscillator only, to keep NES CPU services functional. Perhaps one method around this would be to use the 4MHz CIC clock to feed a counter input instead.
The stm8 is rather flexible with its clock source selection and will even switch itself back to the internal oscillator if the external clock source fails. It always starts up on the internal osc, and all clock selection modes are available to be programmed in application.
One thing I'm finding to be a bit of an annoyance with the stm8 is that cycle counting the instructions with its 3 stage pipeline isn't very straightforward. Extra cycles get added beyond the execution cycle count when the instruction prefetch buffer needs to be flushed, etc. It also seems the alignment of my instructions has an effect, presumably because the prefetch buffer is filled in 32-bit words. I've found the most practical means to ensure proper cycle counting is to verify my 'estimates' with actual measurements with the logic analyzer, then tweak as necessary and ensure that all conditional variances are verified as well before moving on to the next operation. So that's a bit annoying, but it seems to be stable thus far after lots of tweaking and verification.
So far I've captured the stream ID, and output the first 15 transfer bits successfully. Now comes the challenge of properly timing the mangle calculation. If the cycle counting gets to be too much of a headache I might just switch over to running only on the internal 16MHz oscillator and using timer counters and interrupts to handle all the timing, as I envisioned for a dual tasked version. It shouldn't be too hard to correct for drift by sampling the Lock's output once per transaction.
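A rough idea of what that drift correction could look like, assuming a free-running 16-bit timer clocked at 16MHz with an input capture on the Lock's output. Nothing here is tested on hardware; the names, the 16-bit timer width, and treating the bit spacing as exactly 79usec are all assumptions for the sake of the sketch.

```c
#include <stdint.h>

#define TIMER_HZ         16000000UL /* internal oscillator, assumed to clock the timer directly */
#define BIT_PERIOD_US    79UL       /* approximate spacing between bit exchanges */
#define BIT_PERIOD_TICKS ((uint16_t)(TIMER_HZ / 1000000UL * BIT_PERIOD_US)) /* 1264 ticks */

/* Signed error between where we predicted the Lock edge and where the
 * capture actually saw it (positive = Lock was later than predicted).
 * Truncating the difference to 16 bits keeps this correct across timer
 * wraparound, as long as the real error is under 32768 ticks. The final
 * cast relies on two's complement conversion, true on common compilers. */
int16_t drift_ticks(uint16_t predicted_edge, uint16_t observed_edge)
{
    return (int16_t)(uint16_t)(observed_edge - predicted_edge);
}

/* Re-anchor the schedule on the observed edge each transaction so the
 * oscillator mismatch never accumulates over more than one bit period. */
uint16_t next_edge_prediction(uint16_t observed_edge)
{
    return (uint16_t)(observed_edge + BIT_PERIOD_TICKS);
}
```

The point of re-anchoring on every observed edge rather than adding a fixed period forever is that the internal oscillator only has to stay accurate over a single 79usec stretch, which it should manage comfortably inside the 6.7usec window.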