Have something of an update on this project... Perhaps I'm getting a little too deep for most people's reading interests. But my previous posts like this were rather helpful for my own idea development and later reference. I'll go ahead and give the "Way Too Long; Not Going To Read" version first and if you're up for some light reading you can continue...
WTL;NGTR:
Recently got the SNES CIC implemented on the STM8, but had stability issues due to the mcu clock source. That helped motivate me to start more in-depth planning of an async STM8 CIC, which the NES CICOp project also requires. I ramble about multiplication of large numbers and my plan to keep timing calibrated. I also discovered that the targeted stm8s003f3 does indeed have GPIO available for clocking internal timer 1 "TIM1". This discovery opens up viability and/or additional features for the NES CICOp project that I previously didn't think possible, such as legit PPU scanline counting.
SNES STM8 implementation problems:
I recently got my SNES CIC implementation running on the STM8. The first board/chip I used for testing works great. I've let it run for hours, and it runs strong overnight. While attempting to prototype a new design I hacked an STM8 onto a breakout board, glued it onto the backside of an old SNES flash board I had sitting around, and used wires to connect all the pins. Unfortunately that setup was very flaky, and the CIC would drop out after ~1-30sec.
I tinkered around a bit, trying to determine the cause. I added extra capacitors to the breakout board as it was only powered from a pair of small wires, but that didn't help. I was a little skeptical of supply noise anyway, considering the core is internally regulated to 1.8V with its own external cap. I moved the CIC clock supply wire around from the back side to the front side of the board where it was more exposed, and that seemed to make the issue worse. I set my logic analyzer up to watch the CIC signals and a debug pin when it dropped out. I found that the STM8 appeared to be resetting sometimes mid-stream. Other times it was making errors during the mangle calc: too many/few mangles, etc. I got in with the debugger to read the reset cause and found that the resets appeared to be due to illegal opcode execution. So it seems the CPU was faulting by mis-reading instruction data. Depending on how it was mis-read, the result was either a valid opcode that caused an erroneous mangle calc, or an invalid opcode that caused the STM8 to reset. Bummer...
I later tried another board where the STM8 was closer to the ideal setup: well powered and as close to the connector as possible, without all the lengthy wires of the previous setup. This improved matters, but it would still fall out after a few hours of play. The setup was very similar to my first, which has never fallen out. So perhaps some chips are more sensitive than others. I've only sampled 3 so far, but with most of them having problems I definitely need a solution.
The CIC clock is relatively clean looking at the oscope shot, and the STM8 datasheet doesn't give much for external clock specifications. It calls for "about 50% duty cycle"; I measured 53%, which is pretty close. The datasheet goes so far as to say square, triangle, and sine wave signals are acceptable clocks. So while the rise/fall times of 14/21nsec are pretty slow, they're a far cry from sine/triangle rise/fall times.
I first tried buffering the clock through a single NOR gate I had sitting around (scope shot). That seemed to fix everything. I haven't run it overnight yet, but the second hack of a board with the breakout board ran for hours with no problems when it wouldn't even run for 1min previously. The NOR gate tightened up the rise/fall times to ~2.8nsec, and also introduced some ringing. The clock is inverted due to the NOR function, so the duty cycle became 56%, which makes sense considering the original clock has a slower fall time.
Curious what would happen if I slowed the clock edges, I tried adding a 20pF and, separately, a 220pF cap between the clock and ground. That only exacerbated the issue: the 3rd board, which typically lasted a few hours, only lasted about a minute with the 220pF cap.
So I'm still not 100% sure what's going on here. ST doesn't give much of a spec for the external clock, and I'm only running at 3.1MHz, which is at the low end of the 0-16MHz spec. I never had this issue when working on the NES, and I had some pretty godawful wiring setups with 5-6inch wires going from the cart to the dev board in the beginning. I still need to do some more testing, but adding a logic gate as a clock buffer seems to be the best fix at the moment.
Asynchronous CIC implementation planning:
All that brought me back around to my idea of having an asynchronous CIC implementation that doesn't have the cart's mcu CPU core run off the 3-4MHz CIC clock signal. One potential fix to the problem above is to cut the clock out of the equation completely! Certainly not an easy feat, but with the added motivation from the "NES CICOp" project, I figured I may as well give it my best shot.
Looking at the numbers, an async SNES CIC is going to be quite a bit more challenging than NES due to the ~25% slower clock (~3.08MHz vs 4MHz) and 3x as many mangle calcs. So the STM8 needs to be much more accurate with its timing to meet the same ~3usec output window, because it's counting "in the dark" for about 4 times as long compared to the NES CIC. So if it can be pulled off with the SNES, then NES shouldn't be a problem at all.
I took a closer look at how the STM8 timers work, and thankfully the prescalers are able to be changed on the fly. So targeting the simplest 8bit counter, TIM4, looks hopeful. I can set the prescaler to its max and divide by 128, which gets a max count of 2.048msec with the 16MHz HSI clock. The max theoretical time between bit transfers on the SNES is ~10.5msec, so software will only have to count 5 TIM4 rollovers at most, and at the last rollover the prescaler can be tuned down to divide by 1 for fine tuning just prior to the bit transfer. This allows long time periods to be measured with high precision (no jitter), but the timing difference between the STM8 HSI and the CIC clock must be well calibrated to get the accuracy we need along with that precision.
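As a quick sanity check on those numbers, here is a minimal sketch in C of the TIM4 rollover arithmetic. It is plain host-side math, not register code; the function names are made up for illustration, and the 16MHz HSI and 8bit counter width come from the paragraph above.

```c
#include <assert.h>
#include <stdint.h>

#define HSI_HZ      16000000ULL  /* 16MHz HSI, as stated in the post */
#define TIM4_COUNTS 256ULL       /* TIM4 is an 8bit counter */

/* Time for one full TIM4 rollover in nanoseconds for a given prescaler.
   (Hypothetical helper name; prescaler range 1..128 per the post.) */
uint32_t tim4_rollover_ns(uint32_t prescaler)
{
    assert(prescaler >= 1 && prescaler <= 128);
    /* 64bit intermediate: 256 * 128 * 1e9 does not fit in 32 bits */
    return (uint32_t)(TIM4_COUNTS * prescaler * 1000000000ULL / HSI_HZ);
}

/* Number of complete max-prescaler rollovers that fit inside a delay. */
uint32_t tim4_full_rollovers(uint32_t delay_ns)
{
    return delay_ns / tim4_rollover_ns(128);
}
```

With divide by 128 a rollover takes 2,048,000nsec, and a 10.5msec delay contains 5 complete rollovers with a remainder left over for the final partial count.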
I determined the calibration needs to allow for 0.01% tuning steps, which equates to 1usec steps for the 10.5msec max theoretical SNES mangle time. NES only has a max theoretical mangle time of 2.7msec, so 1usec steps would only require 0.037% tuning steps. In binary, 1/128K gives us 0.00076% trim steps, which should be more than adequate.
For a max tune step, the STM8 HSI is spec'd to be 1% accurate with factory tuning at 25C, and 5% across the temp range. If the largest bit of the calibration factor has a weight of binary 1/32, all the bits together give a max tune of +/- 6.2%, which should be enough. That means we need a 13bit calibration factor for +/- 6.2% range with 0.00076% step size. Could add a few more bits to round off to 15-16bits but it's probably overkill.
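To make the bit weights concrete, here is a small sketch that converts a calibration code into parts per billion. It assumes the 13bit factor is a magnitude whose top bit weighs 1/32 and whose bottom bit weighs 1/128K (2^-17); the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CAL_BITS      13  /* magnitude bits of the calibration factor */
#define CAL_LSB_SHIFT 17  /* assumed LSB weight: 2^-17 = 1/128K */

/* Convert a calibration magnitude code to parts per billion (truncated). */
uint32_t cal_code_to_ppb(uint16_t code)
{
    assert(code < (1u << CAL_BITS));
    return (uint32_t)(((uint64_t)code * 1000000000ULL) >> CAL_LSB_SHIFT);
}
```

A code of 1 comes out to 7629ppb (about 0.00076%), and the largest code, 8191, comes out to 62,492,370ppb (about 6.25%), so under these assumptions the range covers the 5% HSI spec with a little margin.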
The delay count requires 14bits to measure up to 10.5msec in 1usec step size. But having a few extra bits for fractions of 1usec will be beneficial to keep us from adding jitter between timing events. The NTSC SNES CIC machine cycle is 1.3usec after all, so that fraction becomes a pain as rounding errors add up over time. Adding 4 more bits for fractions of 1usec allows us to get down to the smallest step size of the 16Mhz counter.
So in total there are 18bits of delay count to be multiplied by a 13bit calibration factor to determine a delay offset. The STM8 thankfully has an 8bit hardware multiplier. 18b * 13b factors produce a 31bit product. With an 8bit multiplier that equates to 6 multiply operations and ~7 summations to get the final product. The result gets truncated down to a 15bit offset, which then gets signed depending on a pos/neg calibration factor. That signed offset then gets added to the desired delay for the final timer count value.
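The six partial products can be sketched in portable C as below. This is only an illustration of the byte-wise decomposition, not the actual STM8 routine: `mul8` is a stand-in for the hardware 8bit multiply, the delay is assumed to be in 1/16usec units, and the calibration magnitude is assumed to have an LSB weight of 2^-17, so the truncation is a right shift by 17.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the STM8 hardware multiply: 8bit x 8bit -> 16bit. */
static uint16_t mul8(uint8_t a, uint8_t b)
{
    return (uint16_t)(a * b);
}

/* delay: 18bit count in 1/16usec units (assumption).
   cal:   13bit magnitude, LSB weight 2^-17 (assumption).
   Returns the unsigned offset in 1/16usec units; the sign is applied
   separately by the caller. */
uint32_t cal_offset(uint32_t delay, uint16_t cal)
{
    assert(delay < (1UL << 18) && cal < (1U << 13));
    uint8_t d[3] = { (uint8_t)delay, (uint8_t)(delay >> 8), (uint8_t)(delay >> 16) };
    uint8_t c[2] = { (uint8_t)cal, (uint8_t)(cal >> 8) };
    uint32_t product = 0;  /* worst case is below 2^31, so 32 bits suffice */

    /* 3 delay bytes x 2 cal bytes = 6 multiplies, each shifted into place */
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 2; j++)
            product += (uint32_t)mul8(d[i], c[j]) << (8 * (i + j));

    return product >> 17;  /* drop the fractional bits */
}
```

For example, a 10.5msec delay is 168000 sixteenths of a usec; at the maximum calibration code of 8191 the offset comes out to 10498 sixteenths, or roughly 656usec, which matches ~6.25% of 10.5msec.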
My plan is to then use TIM4 in coarse count mode (8usec steps) until 8-16usec of the delay remain. For the final fine delay TIM4 will get switched to fine mode (62.5nsec steps). At the end of that delay the next bit will be output to the LOCK. While that 8-16usec fine count is occurring, a fixed ~8usec time delay will get pre-loaded into TIM4 for the end of bit transfer data clearing and calibration routine. At the end of that routine TIM4 will be set up to start counting down to the next bit transfer.
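One way the coarse/fine split could be computed is sketched below. The struct and function names are invented for illustration; the only inputs taken from the post are the 8usec coarse step (128 fine ticks) and the goal of leaving 8-16usec for the fine phase so the remainder always fits in the 8bit counter.

```c
#include <assert.h>
#include <stdint.h>

#define FINE_PER_COARSE 128u  /* 8usec coarse tick / 62.5nsec fine tick */

typedef struct {
    uint32_t coarse_ticks;  /* 8usec ticks; software counts the rollovers */
    uint8_t  fine_ticks;    /* 62.5nsec ticks for the final stretch */
} tim4_plan;

/* Split a delay (in 62.5nsec fine ticks) so that 128..255 fine ticks,
   i.e. 8usec up to just under 16usec, remain for the fine phase. */
tim4_plan plan_delay(uint32_t delay_fine)
{
    assert(delay_fine >= FINE_PER_COARSE);
    tim4_plan p;
    /* hold back one whole coarse tick so the fine phase never drops below 8usec */
    p.coarse_ticks = delay_fine / FINE_PER_COARSE - 1;
    p.fine_ticks = (uint8_t)(delay_fine - p.coarse_ticks * FINE_PER_COARSE);
    return p;
}
```

For the 10.5msec worst case (168000 fine ticks) this gives 1311 coarse ticks followed by 192 fine ticks, i.e. a 12usec fine phase.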
Since only the rising edge of the bit transfer is timing sensitive, the STM8 can use the falling edge of the LOCK's output bit (assuming it's expected to be a 1) as a timing adjust/cal point. TIM4 will be counting up since the expected rising edge, and an interrupt can be enabled for the falling edge of the LOCK's data. That GPIO ISR will then read the TIM4 value and compare it to the expected ~4usec pulse width. If it's beyond a tolerance, I'm thinking that simply adding/subtracting ~1bit from the calibration factor will account for drift. Everything has to be pretty close to correct timing if we're still alive, so only minor adjustments should be needed to correct for rounding errors and slow drifts in HSI/CIC frequency.
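The nudge logic described above might look something like the following sketch. Everything here is an assumption for illustration: the function name, the signed representation of the calibration factor, the tolerance parameter, and the sign convention (a longer-than-expected measured pulse means the HSI is running fast relative to the CIC clock, so the factor is bumped up).

```c
#include <assert.h>
#include <stdint.h>

#define CAL_MAX 8191  /* 13bit magnitude limit */

/* Nudge a signed calibration factor by one LSB when the measured LOCK
   pulse width (in TIM4 fine ticks) is outside the tolerance band. */
int16_t cal_adjust(int16_t cal, uint8_t measured, uint8_t expected, uint8_t tol)
{
    assert(cal >= -CAL_MAX && cal <= CAL_MAX);
    int16_t err = (int16_t)measured - (int16_t)expected;

    if (err > (int16_t)tol && cal < CAL_MAX)
        cal++;  /* pulse measured long: our ticks are short, stretch delays */
    else if (err < -(int16_t)tol && cal > -CAL_MAX)
        cal--;  /* pulse measured short: our ticks are long, shrink delays */
    return cal;
}
```

A ~4usec pulse is about 64 fine ticks, so a call such as `cal_adjust(cal, tim4_value, 64, 2)` would leave the factor alone for readings of 62 through 66 and step it by one otherwise.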
Learning more about STM8 interrupts:
Getting a little deeper into the STM8, I've realized there's a decent way to remove the 1-5 cycle jitter from when an interrupt routine starts executing: the "wait for interrupt" (WFI) opcode pre-stacks the processor status and freezes the CPU until an interrupt occurs. With that there's only 1-2 cycles of jitter due to the timing edge of the interrupt and execution of the ISR instructions. So I'm planning to make use of that.
Additionally I'm realizing an async SNES CIC is even more of a pain as the PAL CIC runs at 3.57Mhz compared to 3.08Mhz NTSC CIC. So it's 1.12usec per PAL CIC machine cycle, and 1.3usec per NTSC CIC machine cycle. So while the machine cycle count is identical between PAL/NTSC SNES CIC, the actual time differs due to operating frequency. So all the timing delays would have to be adjusted to have a multiregion SNES CIC with an asynchronous implementation.
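To illustrate how the same machine cycle count maps to different real time per region, here is a small conversion sketch. The 4 clocks per machine cycle figure is inferred from the numbers above (1.3usec at 3.08MHz) rather than stated in the post, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CLOCKS_PER_MACHINE_CYCLE 4ULL         /* inferred: 4 / 3.08MHz = ~1.3usec */
#define HSI_HZ                   16000000ULL  /* STM8 HSI */

/* Duration of one CIC machine cycle in nanoseconds (truncated). */
uint32_t machine_cycle_ns(uint32_t cic_hz)
{
    assert(cic_hz > 0);
    return (uint32_t)(CLOCKS_PER_MACHINE_CYCLE * 1000000000ULL / cic_hz);
}

/* Convert a count of CIC machine cycles into 62.5nsec HSI ticks,
   rounded to the nearest tick. */
uint32_t cycles_to_hsi_ticks(uint32_t cycles, uint32_t cic_hz)
{
    assert(cic_hz > 0);
    uint64_t num = (uint64_t)cycles * CLOCKS_PER_MACHINE_CYCLE * HSI_HZ;
    return (uint32_t)((num + cic_hz / 2) / cic_hz);
}
```

With these assumptions, 1000 machine cycles take 20779 HSI ticks at 3.08MHz (NTSC) but only 17927 ticks at 3.57MHz (PAL), which is why every delay would need a per-region value.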
QUESTION on NES CIC clocking in other regions:
I don't think that's the case for NES though. My PAL-A "Mattel" NES is running its CIC at 4MHz just like NTSC. I don't have a PAL-B, Comboy, or other Asian/Aussie NES variants. I only have a PAL-B CIC and a Comboy CIC yanked from cartridges, which I place in my CIC socketed NTSC NES for testing. Since PAL-A is 4MHz like NTSC, I'm hopeful all others are as well. If anyone has more info on that I'd appreciate it! Even just having confirmation that a PAL-B console runs its CIC at a 4MHz clock frequency would be good to know.
Discovering STM8's TIM1 has external clock pins available:
So aside from the struggles with my SNES implementation and the motivation it helped provide toward making progress on an async solution, I've become more familiar with some of the STM8's details. Namely I'm better understanding how the timers work, and the good news is I misunderstood TIM1's abilities previously. I was rather disappointed when I thought that there were no external clock sources (pins) available to clock any of the timers. My understanding was that "ETR" pins were the ones that could be used to clock counters. And with the 20pin package the TIM1_ETR pin is unfortunately not pinned out. While I was right about the ETR pin, TIM1 is able to use any of the 4 input pins as a clock source to the counter as well. TIM2 (the other 16bit counter on chip) however does not have this ability. Both TIM2 and TIM4 must be clocked from fMASTER, which we need to be running on HSI 16MHz to allow for multitasking the CICOp.
Learning this, I'm planning to have my SNES implementation use CIC CLK to allow TIM1 to count CIC cycles exactly. So TIM1 will be synchronous with the LOCK's CIC CLK, but the STM8 core itself won't be. I presume that'll be enough to get around the issues I had with STM8 core stability when using CIC CLK as an external CPU core clock source. This also resolves the annoyance of PAL & NTSC SNES CICs running at different frequencies.
What this means for the NES CICOp project:
This realization is good news for the NES CICOp project though. Worst case, the NES CICOp can also clock TIM1 with the 4MHz CIC CLK, while allowing the core to operate on 16MHz HSI. Most of my prior proposed features would still be viable with this setup. However TIM1 is the most advanced timer on chip, so it sure would be nice to have it available for PWM DAC audio synthesis, or for counting a cartridge signal with the "newly discovered" TIM1 clock inputs. In the end I still think it's possible to handle NES CIC timing with TIM4 solely, so TIM1 has the ability to add even more features I previously didn't think were possible.
So there are 4 pins (PC3, PC4, PC6, & PC7) which can be used for TIM1 clock sources that I didn't previously realize. That really opens up opportunities for more interesting PPU A12, A13, (or PPU /RD?) counting, or a more exact CPU cycle counter with M2. I can't really think of any other signals on the connector that would be worth counting, chime in if you have other interesting ideas.
Two of those Port C pins used for TIM1 inputs also map to the SPI pins, but dropping SPI bus support isn't really a big loss. It's an I/O hog anyway with its 4 pins. PC3 can also be mapped to TLI "top level interrupt", which is an NMI for the STM8 core. I'm thinking this would be the perfect use for the mapper interrupt pin. That would allow the mapper interrupt to be non-maskable, which is exactly what we're going for. While all I/Os can be used as configurable priority interrupts, there's only one interrupt vector per port (4 ports total on this device). So allowing the mapper interrupt pin to be separable from other GPIO interrupts aids in ensuring that mapper nibble writes aren't missed or delayed. That would leave 3 TIM1 pins available: 2 could be used as inputs, and the 3rd as an output (2A03 IRQ). That would allow the TIM1 clock source to be selectable between two chosen signals at run time.
The real limitation with using TIM1 as a counter for external signals is that TIM1 was also the timer planned to be tasked as a PWM DAC for sound synthesis. Reason being that TIM1 can perform center aligned PWM generation which improves PWM DAC fidelity. But if edge aligned PWM is acceptable, then the PWM DAC could get switched to TIM2 which can only be clocked by 16Mhz.
Perhaps there isn't as much interest in the CICOp synth since it's not compatible on all consoles and requires an external dongle or console modification. On top of that, having TIM1 count external signals is a pretty powerful feature addition. Arguably the TIM1 counter feature outweighs the increased fidelity gained with center aligned PWM. I've yet to get anywhere close enough to measure/compare the difference in fidelity. So with that my plan is to focus TIM1 on counting external signals and TIM2 for PWM DAC. If a specific project greatly values center aligned PWM, and is willing to give up TIM1 counting features then they can make that trade assuming I can build that flexibility into the PCB layout.
In the end I still have to prove my concept of using TIM4 for CIC timing asynchronously. If I'm unable to pull that off, TIM1 will end up getting consumed to handle CIC timing synchronously. That would leave TIM2 & TIM4 available for PWM DAC, and 2A03 timer but hey that's still something!
Phew... Well things are getting pretty complicated here, but overall good news and some progress being made on this project. Part of me wonders if it just might be worth upgrading to the LQFP-32 package to make pin assignments simpler. But have to resist that temptation and do more with less!