Adding features to discrete mapper with multipurposed CIC

infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Re: Adding features to discrete mapper with multipurposed CI

Post by infiniteneslives »

na_th_an wrote: but I can't really go any further as, as I said, my abilities are quite limited when it comes to providing emulator support.
To be clear, this statement has the caveat assumption that full emulator support is required for your development. If one were willing to test builds primarily on hardware that would be a way to get around full emulator support. I'm willing to provide development hardware kits at little to no cost. Lots of hardware testing will be necessary anyway especially early on while the mapper is still in 'beta' form. Typically emulator authors are more interested in supporting new mapper features when there is already a game that utilizes the mapper.

Another way around emulator support might be to test and develop on a similar mapper that's already supported by emus, but only utilize that mapper in a way that 'emulates' the target discrete mapper + CICOprocessor. That would simplify porting the mapper specific read/write routines over to the new mapper. FME7/Sunsoft5 might be a good choice especially if the emu supports CHR-RAM and the end target is UNROM + CICOprocessor. Even better if the emu supports >8KB CHR-RAM. FME7/Sunsoft5 can emulate UNROM banking, and has selectable mirroring, timer based IRQs, along with audio expansion.
If you're gonna play the Game Boy, you gotta learn to play it right. -Kenny Rogers
na_th_an
Posts: 558
Joined: Mon May 27, 2013 9:40 am

Post by na_th_an »

For something as simple as simulating the H/V mirroring switch in software, I can modify simple emulators such as Nester. Fceux should be easy to modify as well, as I understand the code I've studied (but I can't get it to compile no matter what I try - I'll try to address that issue later in the proper subforum, btw). It's just a behaviour simulation rather than true emulation. I would trap whatever you have to do from the game code to perform the switch, and order the emulator to act accordingly.

And I can always target, as you said, FME7 and perform the required changes to turn it into a UNROM+CICO.

I mean - I wouldn't need actual hardware to test while developing. I can always finish the software and send it to you once I have tested it in emulators, so there's no need for expensive overseas shipments :)
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Post by infiniteneslives »

na_th_an wrote:I mean - I wouldn't need actual hardware to test while developing. I can always finish the software and send it to you once I have tested it in emulators, so there's no need for expensive overseas shipments :)
I would actually prefer to put the hardware in your hands if you were taking the time to target the CICOprocessor. Would make the "build - test - report - rebuild" process much easier for both of us. The shipping costs are insignificant.

So I'll take this discussion as there being notable interest in my crazy CICOprocessor idea. I'm rather thankful I took the time to keep detailed notes in this thread about how I plan to execute everything. Being close to 2 months since I presented my nibble register interface I had pretty much forgotten all the specifics of my idea..

I'll do my best to make progress on this effort sooner vice later and make progress reports in this thread.
na_th_an
Posts: 558
Joined: Mon May 27, 2013 9:40 am

Post by na_th_an »

Just the addition of H/V mirroring switching and the IRQ counter to simple discrete logic mappers is a plus. I'm sure many programmers target a more expensive ASIC board just for one of those features.
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Post by calima »

Don't forget single screen.
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Post by infiniteneslives »

calima wrote:Don't forget single screen.
While single screen is possible, it would require the CICO to drive CIRAM A10 with one of its pins directly, which is incompatible with the tiny mux idea I plan to implement with software selectable H/V. Because of that, and the fact single screen AxROM style mirroring is a trivial addition to any discrete mapper, I don't think using the CICO for single screen is worthwhile.

It's not that one couldn't have single screen and CICO on the same board. You just can't have selectable H/V/single via software all at once without adding more logic chips.
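
As a rough illustration of why a two-input mux covers H/V but not single screen, here is a minimal Python sketch of the CIRAM A10 selection. It assumes the usual wiring convention (vertical mirroring routes PPU A10 to CIRAM A10, horizontal routes PPU A11). The function and parameter names are made up for this example and are not taken from the actual board design.

```python
def ciram_a10(ppu_addr, horizontal):
    """Model a 2:1 mux feeding CIRAM A10 (hypothetical names).

    horizontal=False -> vertical mirroring, CIRAM A10 = PPU A10
    horizontal=True  -> horizontal mirroring, CIRAM A10 = PPU A11
    """
    bit = 11 if horizontal else 10
    return (ppu_addr >> bit) & 1


# The four logical nametables start at $2000, $2400, $2800, $2C00.
NAMETABLES = (0x2000, 0x2400, 0x2800, 0x2C00)

for mode in (False, True):
    # Which of the two 1KB CIRAM pages each logical nametable lands in.
    pages = [ciram_a10(addr, mode) for addr in NAMETABLES]
    print("horizontal" if mode else "vertical", pages)
```

With one select line the mux can only choose between those two address bits. Single screen would need a constant 0 or 1 on CIRAM A10, which is a third state the mux alone cannot provide.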
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Post by infiniteneslives »

Quick little update. I finally ditched SDCC (C compiler) for the STM8. I should have never bothered with C in the first place with the STM8. I thought I would take advantage of C for simplifying the initialization code and everything. And when I realized there wasn't really an option to include asm files in a SDCC build I took the cheap way out and wrote the entire CIC operations with inline assembly. The inline assembly is pretty annoying to work with but I made it work.

I just migrated everything over to pure assembly and have been using naken_asm which has been great. I optimized everything in the process and became aware just how poor SDCC was.. My seed initialization routine ended up compiling into a horrendous mess. Hand writing that ram init routine alone cut my code by about half.

In the end I went from ~2.5KB to just over 1KB with my synchronous NES implementation by migrating init code from C to assembly. There's still room for more optimizations that would easily get me well under 1KB. When I move on to my asynchronous implementation, I expect the code to shrink by a fair amount as a decent number of timing NOP's will be removed. But some extra code will be needed to handle the timer operations too.

So in the end I'm expecting the actual CIC code to consume 1KB or less of the 8KB available on the STM8. Leaves a pretty decent program flash budget for all these potential features.

Starting to get a rough idea of how I plan to get by with an 8bit TIM4 alone to handle the CIC timing. I expect that running without a prescaler will be helpful/necessary for more precise timing. So that will require software to count rollovers, but that code can be mid-low priority so I think it'll work okay.

But for now I've got to focus on getting a synchronous SNES CIC up and running. Once that's done, I'll start chipping away at an asynchronous NES CIC and some proof of concept with the nibble registers for adding features!
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Post by infiniteneslives »

Have something of an update on this project... Perhaps I'm getting a little too deep for most people's reading interests. But my previous posts like this were rather helpful for my own idea development and later reference. I'll go ahead and give the "Way Too Long; Not Going To Read" version first and if you're up for some light reading you can continue...


WTL;NGTR:
Recently got SNES CIC implemented on STM8, but had issues with stability due to mcu clock source. That helped motivate me to start more in depth planning of an async STM8 CIC, which the NES CICOp project also requires. I ramble about multiplication of large numbers and my plan to keep timing calibrated. Discovered that the targeted stm8s003f3 does indeed have GPIO available for clocking internal timer 1 "TIM1". This discovery opens up viability and/or additional features for the NES CICOp project I previously didn't think possible, such as legit PPU scanline counting.


SNES STM8 implementation problems:
I recently got my SNES CIC implementation running with the STM8. The first board/chip I used for testing works great. I've let it run for hours and it would run strong over night. While attempting to prototype a new design I hacked a STM8 onto a breakout board and glued it onto the backside of an old SNES flash board I had sitting around and used wires to connect all the pins. Unfortunately that setup was very flakey, and the CIC would drop out after ~1-30sec.

I tinkered around a bit, trying to determine the cause. Added extra capacitors to the breakout board as it was only powered from a pair of small wires, but that didn't help. I was a little skeptical of supply noise anyway considering the core is internally regulated to 1.8v with its own external cap. I moved the CIC clock supply wire around from back side to front side of the board where it was more exposed, and that seemed to make the issue worse. I set my logic analyzer up to watch the CIC signals and debug pin when it dropped out. Found that the STM8 appeared to be resetting sometimes mid-stream. Other times it was making errors during the mangle calc, too many/few mangles, etc. Got in with the debugger to read the reset cause and found that the times it reset appeared to be due to illegal opcode execution. So it seems the CPU was faulting by misreading instruction data. Depending on how it was misread it would result in a valid opcode that caused an erroneous mangle calc, or an invalid opcode causing the STM8 to reset. Bummer...

I later tried another board where the STM8 was closer to the ideal setup with it being well powered and as close to the connector as possible without all the lengthy wires of the previous setup. This improved matters, but it would still fall out after a few hours of play. The setup was very similar to my first which has never fallen out. So perhaps some chips are more sensitive than others. I've only sampled 3 so far, but with ~50% having problems I definitely need a solution.

The CIC clock is relatively clean looking at the oscope shot, and the STM8 datasheet doesn't give much for external clock specifications. Calls for "about 50% duty cycle"; I measured 53%, pretty close.. The datasheet goes so far as to say square, triangle, and sine wave signals are acceptable clocks. So while the rise/fall times of 14/21nsec are pretty slow, they're a far cry from sine/triangle rise/fall times..

I first tried buffering the clock through a single NOR gate I had sitting around (scope shot). That seemed to fix everything. I haven't run it over night yet, but the second hack of a board with the breakout board ran for hours with no problems when it wouldn't even run for 1min previously. The NOR gate tightened up the rise/fall times to ~2.8nsec, and also introduced some ringing. The clock is inverted due to the NOR function, so the duty cycle became 56% which makes sense considering the virgin clock has a slower fall time.

Curious what would happen if I slowed the clock edges, I tried adding a 20pF and separately a 220pF cap between the clock and ground. That only exacerbated the issue: the 3rd board which typically lasted a few hours only lasted ~min with the 220pF cap.

So I'm still not 100% sure what's going on here, ST doesn't give much of a spec for the external clock and I'm only running at 3.1Mhz which is at the low end of the 0-16Mhz spec. I never had this issue when working on the NES, and I had some pretty godawful wiring setups with 5-6inch wires going from the cart to the dev board in the beginning. Still need to do some more testing, but adding a logic gate as a clock buffer seems to be the best fix at the moment.


Asynchronous CIC implementation planning:
All that brought me back around to my idea of having an asynchronous CIC implementation that doesn't have the cart's mcu CPU core run off the 3-4Mhz CIC clock signal. One potential fix to the problem above is to cut the clock out of the equation completely! Certainly not an easy feat, but with the added motivation from the "NES CICOp" I figured I may as well give it my best shot.

Looking at the numbers, an async SNES CIC is going to be quite a bit more challenging than NES due to the clock running at only ~75% of the NES speed, and 3x as many mangle calcs. So the STM8 needs to be much more accurate with its timing to meet the same ~3usec output window because it's counting "in the dark" for about 4 times as long compared to the NES CIC. So if it can be pulled off with the SNES, then NES shouldn't be a problem at all.

I took a closer look at how the STM8 timers work, and thankfully the prescalers are able to be changed on the fly. So targeting the simplest 8bit counter TIM4 looks hopeful. I can set the prescaler to its max and divide by 128, which gets a max count of 2.048msec with 16Mhz HSI clock. The max theoretical time between bit transfers on the SNES is ~10.5msec, so software will only have to count 5 TIM4 rollovers at most, and at the last rollover, the prescaler can be tuned down to divide by 1 for fine tuning just prior to bit transferring. This allows long time periods to be measured with high precision (no jitter), but the timing difference between the STM8 HSI and CIC clock must be well calibrated to get the accuracy along with the precision we need.
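
The TIM4 numbers in that paragraph can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming an exact 16Mhz HSI and a plain 8bit up-counter. The function names are invented for the example.

```python
F_HSI_HZ = 16_000_000      # internal oscillator frequency quoted in the post
TIM4_COUNTS = 256          # an 8bit counter sweeps 256 values per rollover


def tick_ns(prescaler):
    """Duration of one TIM4 count in nanoseconds for a given prescaler."""
    return prescaler * 1e9 / F_HSI_HZ


def rollover_us(prescaler):
    """Time for one full sweep of the 8bit counter, in microseconds."""
    return TIM4_COUNTS * tick_ns(prescaler) / 1000.0


def full_rollovers(delay_us, prescaler=128):
    """Complete rollovers software would have to count for a given delay."""
    return int(delay_us // rollover_us(prescaler))


print(tick_ns(128))            # coarse tick, divide by 128
print(tick_ns(1))              # fine tick, divide by 1
print(rollover_us(128))        # longest single count at divide by 128
print(full_rollovers(10_500))  # worst-case SNES gap between bit transfers
```

The remainder after the last full rollover is what gets handled by switching the prescaler down to divide by 1.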

I determined the calibration needs to allow for 0.01% tuning steps which equates to 1usec steps for the 10.5msec max theoretical SNES mangle time. NES only has a max theoretical mangle time of 2.7msec, so 1usec steps would only require 0.037% tuning steps. In binary, 1/128K gives us 0.00076% trim steps which should be more than adequate.

For a max tune step, the STM8 HSI is spec'd to be 1% accurate with factory tuning at 25C, and 5% across the temp range. If we go up to binary 1/32 step that gives a max tune of +/- 6.2% which should be enough. That means we need a 13bit calibration factor for +/- 6.2% range with 0.00076% step size. Could add a few more bits to round off to 15-16bits but it's probably overkill..
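
To sanity check the calibration word size, here is a small Python sketch. It assumes the factor is a set of binary-weighted trim bits running from 1/32 down to 1/128K with a separate sign, which is one reading of the description above rather than a confirmed register layout.

```python
COARSEST_EXP = 5    # largest trim bit is 1/32 = 2**-5
FINEST_EXP = 17     # smallest trim bit is 1/128K = 2**-17


def calibration_bits():
    """Number of magnitude bits between the coarsest and finest trim bit."""
    return FINEST_EXP - COARSEST_EXP + 1


def step_fraction():
    """Size of one calibration LSB as a fraction of the measured delay."""
    return 2.0 ** -FINEST_EXP


def max_trim_fraction():
    """Largest trim magnitude, reached with every bit set."""
    return sum(2.0 ** -e for e in range(COARSEST_EXP, FINEST_EXP + 1))


def step_at_delay_us(delay_us):
    """How far one LSB moves the end of a delay of the given length."""
    return delay_us * step_fraction()


print(calibration_bits())
print(f"{max_trim_fraction() * 100:.2f}%")
print(f"{step_at_delay_us(10_500):.3f} us")
```

Under these assumptions one LSB shifts even the longest SNES delay by well under the 1usec target.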

The delay count requires 14bits to measure up to 10.5msec in 1usec step size. But having a few extra bits for fractions of 1usec will be beneficial to keep us from adding jitter between timing events. The NTSC SNES CIC machine cycle is 1.3usec after all, so that fraction becomes a pain as rounding errors add up over time. Adding 4 more bits for fractions of 1usec allows us to get down to the smallest step size of the 16Mhz counter.
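
A quick way to see why the fractional bits matter is to hold delays in units of 1/16usec and compare accumulated rounding error. The Python below is only an illustration. The 1000-cycle run length is an arbitrary choice, and the helper names are hypothetical.

```python
FRAC_BITS = 4            # 1/16 us resolution = 62.5 ns, one 16Mhz count
SCALE = 1 << FRAC_BITS


def to_fixed(us):
    """Convert microseconds to a count of 1/16 us units (rounded)."""
    return round(us * SCALE)


max_delay = to_fixed(10_500)   # worst-case SNES gap from the post
print(max_delay, max_delay.bit_length())

# Accumulated error after many 1.3 us NTSC SNES machine cycles.
cycles = 1000                                   # arbitrary run length
exact_us = 1.3 * cycles
whole_us_total = cycles * round(1.3)            # no fractional bits
fixed_total = cycles * to_fixed(1.3) / SCALE    # 4 fractional bits
print(exact_us - whole_us_total, exact_us - fixed_total)
```

The fixed-point total still drifts, just far more slowly, which is where the periodic calibration comes in.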

So in total there's 18bits of delay count to be multiplied by a 13bit calibration factor to determine a delay offset. The STM8 thankfully has an 8bit hardware multiplier. 18b * 13b factors produce a 31bit product. With an 8bit multiplier that equates to 6 multiply operations, and ~7 summations to get the final product. The result gets truncated down to a 15bit offset which then gets signed depending on pos/neg calibration factor. That signed offset then gets added to the desired delay for the final timer count value.
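
The multiply count can be sketched in Python by splitting both operands into bytes and summing shifted partial products. `mul8` here is just a stand-in for the hardware 8x8 multiply. The truncation and sign handling described above are not modeled.

```python
def mul8(a, b):
    """Stand-in for an 8bit x 8bit -> 16bit hardware multiply."""
    assert 0 <= a <= 0xFF and 0 <= b <= 0xFF
    return a * b


def split_bytes(value, count):
    """Little-endian bytes of value."""
    return [(value >> (8 * i)) & 0xFF for i in range(count)]


def mul_18x13(delay, cal):
    """Return (product, number of 8bit multiplies used)."""
    assert 0 <= delay < (1 << 18) and 0 <= cal < (1 << 13)
    product = 0
    multiplies = 0
    for i, d in enumerate(split_bytes(delay, 3)):      # 18 bits -> 3 bytes
        for j, c in enumerate(split_bytes(cal, 2)):    # 13 bits -> 2 bytes
            # Each partial product is shifted to its byte position and summed.
            product += mul8(d, c) << (8 * (i + j))
            multiplies += 1
    return product, multiplies


product, count = mul_18x13(168_000, 0x1ABC)
print(product == 168_000 * 0x1ABC, count, product.bit_length())
```

Three delay bytes times two calibration bytes gives the six multiplies mentioned in the post.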

My plan is to then use TIM4 in coarse count mode (8usec steps) until 8-16usec of the delay remain. For the final fine delay TIM4 will get switched to fine mode (62.5nsec steps). At the end of that delay the next bit will be output to the LOCK. While that 8-16usec fine count is occurring, a fixed ~8usec time delay will get pre-loaded into TIM4 for the end of bit transfer data clearing and calibration routine. At the end of that routine TIM4 will be set up to start counting down to the next bit transfer.
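
The coarse/fine hand-off can be sketched as a simple split of the target delay. This Python version is a guess at the bookkeeping, not the actual firmware. It ignores rollover counting for long coarse phases and the time spent reprogramming the prescaler.

```python
COARSE_NS = 8000    # TIM4 tick at divide by 128
FINE_NS = 62.5      # TIM4 tick at divide by 1


def split_delay(delay_ns):
    """Split a delay into coarse ticks plus an 8-16 us fine remainder.

    Returns (coarse_ticks, fine_ticks).
    """
    if delay_ns < 2 * COARSE_NS:
        # Too short for a coarse phase, so run in fine mode only.
        return 0, round(delay_ns / FINE_NS)
    # Stop one coarse tick early so 8..16 us is left for fine mode.
    coarse_ticks = int(delay_ns // COARSE_NS) - 1
    remainder_ns = delay_ns - coarse_ticks * COARSE_NS
    return coarse_ticks, round(remainder_ns / FINE_NS)


coarse, fine = split_delay(1_234_567)
print(coarse, fine, coarse * COARSE_NS + fine * FINE_NS)
```

Leaving 8-16usec for the fine phase keeps the fine count within a single pass of the 8bit counter, since 16usec is 256 counts at 62.5nsec.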

Since only the rising edge of the bit transfer is timing sensitive, the STM8 can use the falling edge of the LOCK's output bit (assuming it's expected to be a 1) as a timing adjust/cal point. TIM4 will be counting up since the expected rising edge, an interrupt can be enabled for the falling edge of LOCK's data. That GPIO isr will then read TIM4 value and compare it to the expected ~4usec pulse width. If it's beyond a tolerance I'm thinking that simply adding/subtracting ~1bit from the calibration factor will account for drift. Everything has to be pretty close to correct timing if we're still alive, so only minor adjustments should be needed to correct for rounding errors and slow drifts in HSI/CIC frequency.
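
A minimal Python sketch of that "nudge by one bit" calibration loop follows. The tolerance value, the sign convention (a larger factor lengthens the STM8's counted delays), and the assumption that TIM4 is in fine mode while measuring the pulse are all assumptions made for illustration.

```python
EXPECTED_PULSE_TICKS = 64      # ~4 us at 62.5 ns per fine tick
TOLERANCE_TICKS = 2            # hypothetical dead band
CAL_MAX = (1 << 13) - 1        # 13bit magnitude from the post


def adjust_calibration(cal, measured_ticks):
    """Nudge a signed calibration factor by one LSB per measurement."""
    error = measured_ticks - EXPECTED_PULSE_TICKS
    if error > TOLERANCE_TICKS:
        cal += 1       # edge arrived later than expected, stretch delays
    elif error < -TOLERANCE_TICKS:
        cal -= 1       # edge arrived earlier than expected, shrink delays
    # Keep the factor inside its representable range.
    return max(-CAL_MAX, min(CAL_MAX, cal))


cal = 0
for measured in (64, 67, 68, 63, 60):
    cal = adjust_calibration(cal, measured)
print(cal)
```

A single-LSB step per bit transfer only works if the timing is already close, which matches the point that the CIC would have dropped out otherwise.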


Learning more about STM8 interrupts:
Getting a little deeper into the STM8 I've realized there's a decent way to remove the 1-5cycle jitter from when an interrupt routine starts executing by using "wait for interrupt" opcode which pre-stacks the processor status, and freezes the CPU until an interrupt occurs. With that there's only 1-2 cycle jitter due to timing edge of interrupt and execution of isr instructions. So I'm planning to make use of that.

Additionally I'm realizing an async SNES CIC is even more of a pain as the PAL CIC runs at 3.57Mhz compared to 3.08Mhz NTSC CIC. So it's 1.12usec per PAL CIC machine cycle, and 1.3usec per NTSC CIC machine cycle. So while the machine cycle count is identical between PAL/NTSC SNES CIC, the actual time differs due to operating frequency. So all the timing delays would have to be adjusted to have a multiregion SNES CIC with an asynchronous implementation.
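
The machine cycle figures follow from the clock frequency. The Python sketch below assumes 4 CIC clocks per machine cycle, which is inferred from the 1.3usec at 3.08Mhz figure rather than stated in the post. The 1000-cycle delay is an arbitrary example.

```python
CLOCKS_PER_MACHINE_CYCLE = 4   # inferred from 1.3 us at 3.08 Mhz


def machine_cycle_us(cic_clk_mhz):
    """Length of one CIC machine cycle in microseconds."""
    return CLOCKS_PER_MACHINE_CYCLE / cic_clk_mhz


def delay_us(machine_cycles, cic_clk_mhz):
    """Wall-clock time for a delay expressed in CIC machine cycles."""
    return machine_cycles * machine_cycle_us(cic_clk_mhz)


for name, mhz in (("NTSC SNES", 3.08), ("PAL SNES", 3.57)):
    print(f"{name}: {machine_cycle_us(mhz):.2f} us/cycle, "
          f"{delay_us(1000, mhz):.0f} us per 1000 cycles")
```

The same cycle count maps to different wall-clock delays, so a purely time-based implementation needs a separate delay table or scale factor per clock frequency.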


QUESTION on NES CIC clocking in other regions:
I don't think that's the case for NES though. My PAL-A "Mattel" NES is running its CIC at 4Mhz just like NTSC. I don't have a PAL-B, Comboy, nor other Asian/Aussie NES variants. I only have a PAL-B CIC and a Comboy CIC yanked from cartridges, which I place in my CIC socketed NTSC NES for testing. Since PAL-A is 4Mhz like NTSC, I'm hopeful all others are as well. If anyone has more info on that I'd appreciate it! Even just having confirmation that a PAL-B console runs its CIC at 4Mhz clock frequency would be good to know.


Discovering STM8's TIM1 has external clock pins available:
So aside from the struggles with my SNES implementation and the motivation it helped provide to making progress on an async solution, I've become more familiar with some of the STM8's details. Namely I'm better understanding how the timers work, and good news is I misunderstood TIM1's abilities previously. I was rather disappointed when I thought that there were no external clock sources (pins) available to clock any of the timers. My understanding was that "ETR" pins were the ones that could be used to clock counters. And with the 20pin package the TIM1_ETR pin is unfortunately not pinned out. While I was right about the ETR pin, TIM1 is able to use any of the 4 input pins as a clock source to the counter as well. TIM2 (the other 16bit counter on chip) however does not have this ability. Both TIM2 and TIM4 must be clocked from fMASTER which we need to be running on HSI 16Mhz to allow for multitasking the CICOp.

Learning this, I'm planning to have my SNES implementation use CIC CLK to allow TIM1 to count CIC cycles exactly. So TIM1 will be synchronous with the LOCK's CIC CLK, but the STM8 core itself won't be. I presume that'll be enough to get around issues I had with STM8 core stability when using CIC CLK as an external CPU core clock source. This also resolves the annoyance of PAL & NTSC SNES CIC's running at different frequencies.


What this means for the NES CICOp project:
This realization is good news for the NES CICOp project though. Worst case, the NES CICOp can also clock TIM1 with CIC CLK 4Mhz, while allowing the core to operate on 16Mhz HSI. Most of my prior proposed features would still be viable with this setup. However TIM1 is the most advanced timer on chip, so it sure would be nice to have it available for PWM DAC audio synthesis, or counting a cartridge signal with the "newly discovered" TIM1 clock inputs. In the end I still think it's possible to handle NES CIC timing with TIM4 solely, so TIM1 has the ability to add even more features I previously didn't think were possible.

So there are 4 pins (PC3, PC4, PC6, & PC7) which can be used for TIM1 clock sources that I didn't previously realize. That really opens up opportunities for more interesting PPU A12, A13, (or PPU /RD?) counting, or a more exact CPU cycle counter with M2. I can't really think of any other signals on the connector that would be worth counting, chime in if you have other interesting ideas.

Two of those Port C pins used for TIM1 inputs also map to the SPI pins, but dropping SPI bus support isn't really a big loss. It's an I/O hog anyway with its 4 pins. PC3 can also be mapped to TLI "top level interrupt" which is an NMI for the STM8 core. I'm thinking this would be the perfect use for the mapper interrupt pin. That would allow the mapper interrupt to be non-maskable which is exactly what we're going for. While all I/O's can be used as configurable priority interrupts, there's only one interrupt vector per port (4 ports total on this device). So allowing the mapper interrupt pin to be separable from other GPIO interrupts aids in ensuring that mapper nibble writes aren't missed or delayed. That would leave 3 TIM1 pins available, 2 could be used as input, and the 3rd as an output (2A03 IRQ). That would allow TIM1 clock source to be selectable between two chosen signals at run time.

The real limitation with using TIM1 as a counter for external signals is that TIM1 was also the timer planned to be tasked as a PWM DAC for sound synthesis. Reason being that TIM1 can perform center aligned PWM generation which improves PWM DAC fidelity. But if edge aligned PWM is acceptable, then the PWM DAC could get switched to TIM2 which can only be clocked by 16Mhz.

Perhaps there isn't as much interest in the CICOp synth since it's not compatible on all consoles and requires an external dongle or console modification. On top of that, having TIM1 count external signals is a pretty powerful feature addition. Arguably the TIM1 counter feature outweighs the increased fidelity gained with center aligned PWM. I've yet to get anywhere close enough to measure/compare the difference in fidelity. So with that my plan is to focus TIM1 on counting external signals and TIM2 for PWM DAC. If a specific project greatly values center aligned PWM, and is willing to give up TIM1 counting features then they can make that trade assuming I can build that flexibility into the PCB layout.

In the end I still have to prove my concept of using TIM4 for CIC timing asynchronously. If I'm unable to pull that off, TIM1 will end up getting consumed to handle CIC timing synchronously. That would leave TIM2 & TIM4 available for PWM DAC, and 2A03 timer but hey that's still something!

Phew... Well things are getting pretty complicated here, but overall good news and some progress being made on this project. Part of me wonders if it just might be worth upgrading to the LQFP-32 package to make pin assignments simpler. But have to resist that temptation and do more with less!
lidnariq
Posts: 11429
Joined: Sun Apr 13, 2008 11:12 am

Post by lidnariq »

infiniteneslives wrote:Additionally I'm realizing an async SNES CIC is even more of a pain as the PAL CIC runs at 3.57Mhz compared to 3.08Mhz NTSC CIC. So it's 1.12usec per PAL CIC machine cycle, and 1.3usec per NTSC CIC machine cycle.
Hold on a sec. NTSC SNES consoles come in both 4MHz (ceramic resonator, SHVC-CPU-01) and 3MHz (APU 24.576MHz÷8) versions.
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Post by infiniteneslives »

lidnariq wrote:
infiniteneslives wrote:Additionally I'm realizing an async SNES CIC is even more of a pain as the PAL CIC runs at 3.57Mhz compared to 3.08Mhz NTSC CIC. So it's 1.12usec per PAL CIC machine cycle, and 1.3usec per NTSC CIC machine cycle.
Hold on a sec. NTSC SNES consoles come in both 4MHz (ceramic resonator, SHVC-CPU-01) and 3MHz (APU 24.576MHz÷8) versions.
Oh, well that's good to know! I'm glad mine happened to be the 3.07Mhz version, otherwise I might have gleefully assumed all NES/SNES CICs were 4Mhz. Guess it might not have been an issue as I prob would have stuck with a sync solution with traditional CPU cycle counting. But I didn't realize the differences between NTSC versions. Possibly I wouldn't have seen these instability issues with a 4Mhz console and would have gotten burned when shipping to a 3.07Mhz flake like mine...? I'll have to try and hunt down a 4Mhz SHVC-CPU-01 with ceramic resonator for testing. I've got a couple SNES jr's somewhere, but I'm guessing those are 3.07Mhz APU/8 as that sounds cheaper.

All this is even more reason to keep TIM1 CIC CLK cycle counting for the STM8. A purely async solution for SNES would be quite the PITA. Thankfully/Hopefully that doesn't seem to be the case for NES as I'm still hopeful all versions are 4Mhz..? I've only got NTSC and PAL-A to test with.

EDIT: Just pulled out one of my SNES jr's and its CIC clock frequency is 3.57Mhz, similar to my PAL SNES (apparently German and IDK if it's 1-2chip). So if 1chip SNES CIC's run at 4Mhz, then there are a total of THREE different CIC clock frequencies for NTSC..?
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Post by infiniteneslives »

Slight update..

Every time I think I've got an understanding of these STM8 timers I'm proved wrong shortly afterwards... But I suppose I can't be too surprised considering the timer portions of the STM8 reference manual span a whopping 119 pages! As with many things, it's not until you finally start writing code for a piece of hardware that you actually start to learn its true behavior. Good news is the STM8 timers are nearly identical to the STM32 timers so I won't be starting from square one when I get more involved with the STM32 for other projects.

So I got the basic SNES CIC operating by use of TIM1 for time keeping purposes. Still technically a synchronous implementation, but the mcu core itself is asynchronous, just the timer is CIC LOCK synchronous. This allowed the STM8 core to be clocked by internal 16Mhz oscillator. This implementation is much more stable than my previous one where the core was clocked by 3-4Mhz CIC CLK signal. It was also significantly easier as cycle counting wasn't needed at all. Although that opinion is a little biased considering it was my 3rd NES/SNES CIC implementation so all my mangle calculation bugs had been previously worked out.

If nothing else, I would strongly recommend for anyone looking to make their own CIC implementation to utilize an on chip timer/counter to ensure proper CIC timing. Especially if it has an auto-reload feature making it easy to change the rollover value of the counter without interrupt jitter. My implementation is fairly simple, but it took a bit to realize the best way to operate the timer. It's best to have the reload value preloaded, that way the next event "count down" value is automatically loaded into the counter on its next reload. The main trouble then is that one needs to know what the upcoming delay should be prior to the event starting. This is easy for bit transfers, but mangle timing calculations aren't as straightforward as they vary based on the data. My solution was to determine the approximate min mangle time based on number of mangles needed, and assuming shortest mangle time; then loading that into the timer prior to performing mangle calculations. Subsequently during the mangle calc I kept track of how many mangle calcs were the longer 'overflow' version. Once the mangle calc is complete, simply do the math to figure out the added delay needed for exact mangle calc timing.
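
The preload-then-top-up idea can be sketched as follows in Python. The cycle counts are placeholders chosen purely for illustration and are not the real CIC mangle timings. The function names are hypothetical.

```python
def mangle_preload(num_mangles, short_cycles):
    """Minimum total time, assuming every mangle takes the short path."""
    return num_mangles * short_cycles


def mangle_top_up(overflow_count, short_cycles, long_cycles):
    """Extra delay owed once the number of long 'overflow' mangles is known."""
    return overflow_count * (long_cycles - short_cycles)


# Hypothetical cycle counts, NOT measured CIC values.
SHORT, LONG = 80, 84

preload = mangle_preload(3, SHORT)      # loaded before the mangle calc runs
extra = mangle_top_up(2, SHORT, LONG)   # known only after the calc finishes
print(preload, extra, preload + extra)
```

The timer starts running with the lower bound, so the calculation itself can take a variable amount of core time as long as the top-up is added before the preloaded count expires.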

Having the mcu core asynchronous makes coding a breeze in comparison to cycle counting, especially on a prefetched 3-stage pipeline! Wish I had gone this route from the beginning. There are benefits all around including condensed program code, and lower power consumption as the mcu can spend large amounts of time sleeping waiting for the timer interrupt. This also allows for trickery in region detection that isn't possible in a sync core. Most regions can be detected on the fly in real time as you can poll the LOCK's data prior to outputting data as the KEY. And now there's free time to adjust the seed accordingly since the mcu core is operating so much faster. Not that traditional means of region switching and saving last known good region to eeprom is that burdensome, but it's cool to have the ability to sense the region on the fly. And of course now that the mcu core is operating an order of magnitude faster than the LOCK CIC, it has free time for multitasking fancy new features so long as CIC timings are sufficiently prioritized.

Biggest limitation I realized is that NOT ALL TIM1_CHx inputs can be used for clock inputs to TIM1 on the STM8. The basic diagram doesn't make this clear and so my previous presumption is wrong. The only available TIM1 external clock pin options are ETR, TIM1_CH1, and TIM1_CH2. The ETR pin is unfortunately not pinned out on the 20pin packages, but TIM1_CH1 & TIM1_CH2 are available. TIM1_CH1 & TIM1_CH2 are mostly just as capable as each other for timer clock sources. Only difference is that CH1 can be set to clock TIM1 on all edges (both rising and falling), CH2 doesn't have that setting available. Both CH1 & CH2 can clock TIM1 on rising *or* falling edges, and there's also a nice filter ability where it must be high/low for so long before the edge is actually 'valid' and TIM1 gets clocked. In reality the edges of TIM1_CH1/CH2 aren't fed directly to TIM1 counter. Instead, the internal oscillator's clock is passed to TIM1 following adequately filtered edges of TIM1_CH1/2 pins. So with an mcu core frequency of 16Mhz, an external clock source must be high/low for more than ~63nsec to be a viable clock source. For the slow pokes that NES signals are, that's not an issue, and probably for the best especially since signals like PPU A12 are so noisy.
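
The ~63nsec limit is just one period of the 16Mhz clock, which makes it easy to check candidate signals. The Python sketch below assumes a 50% duty cycle unless told otherwise. The signal frequencies are nominal values used for illustration.

```python
F_MASTER_HZ = 16_000_000


def min_level_ns():
    """One fMASTER period: the shortest high/low time described in the post."""
    return 1e9 / F_MASTER_HZ


def is_viable_clock(freq_hz, duty=0.5):
    """True if both the high and low phases exceed one fMASTER period."""
    period_ns = 1e9 / freq_hz
    high_ns = period_ns * duty
    low_ns = period_ns - high_ns
    return min(high_ns, low_ns) > min_level_ns()


print(min_level_ns())
print(is_viable_clock(1_789_773))    # NTSC CPU M2, duty simplified to 50%
print(is_viable_clock(4_000_000))    # NES CIC CLK
print(is_viable_clock(12_000_000))   # too fast for the sampled input
```

Any signal slower than roughly 8Mhz with a reasonable duty cycle clears the limit, which covers the cartridge signals discussed here.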

Anyway, I'm mostly done tinkering with the SNES CIC now, so can get back to work on the NES CICOprocessor! Having the SNES implementation is good proof of concept that worst case the CICOp can use externally clocked TIM1 to ensure proper CIC timing. But that's worst case as it requires TIM1/CIC timing to have highest interrupt priority. With that worst case the STM8 can't be guaranteed to catch/respond to 6502 requests. I'm still holding out that I can pull off CIC timings with the meager 8bit internally clocked TIM4 alone; and still prioritize mapper requests over CIC timing with enough trickery and convention with the mapper register read/write protocol as previously laid out.

So the only real update to the CICOp abilities is that we've only got 2 pins to choose from for TIM1 clock sources, compared to the 4 previously thought. Curious if anyone has interesting ideas for useful signal choices. PPU A12 and CPU M2 are the most traditional choices to provide the choice between MMC3 style scanline counter and FME7 style CPU cycle counters. The only other signal of interest that comes to my mind is PPU A13 or PPU /RD for increased precision compared to PPU A12, and removal of MMC3 style pattern table and 8x16 sprite restrictions. With only 2 pins available what would you guys choose for TIM1 clock sources?

Crazy to think about unlocking features like this for the "low cost" of hardware development time. Considering the cost, it's possible the CICOp might even make sense for expanding the abilities of a MMC1-3 scale mapper...

I'm getting low on my discrete mapper board inventory so it's time for me to draft the first iteration of my CICOp plans into the PCB design for my next board order. I might try to include some jumpers to select TIM1 clock sources at assembly time, but the number of jumpers is already a bit higher than I'd like to see...

One other nifty thing I've managed to work out recently is the STM8 in-circuit programming SWIM (single wire interface module) protocol with my latest kazzo/inlretro build. It's pretty slick being only 1 signal, and the entire STM8 memory map is available to read/write via SWIM, along with the ability to control the STM8 core via the debug module. When the chip's read out protection is set, the debug module, flash, and EEPROM are locked, but SRAM and all peripheral registers are still available. So it can actually become a slick means of I/O expansion, since all the STM8's GPIO registers can be accessed via a single wire even on a virgin/locked STM8. This is nifty for open/short testing to find PCB assembly flaws. I also might try to abuse this for accessing an on-board CPLD's JTAG signals without needing a single byte of code executing on the STM8. That is especially helpful for Famicom carts, where there are effectively no card edge pins to spare. Now, for the low cost of an STM8, I can indirectly add a bunch of pins to the card edge!
lidnariq
Posts: 11429
Joined: Sun Apr 13, 2008 11:12 am

Re: Adding features to discrete mapper with multipurposed CI

Post by lidnariq »

infiniteneslives wrote:So the only real update to the CICOp abilities is that we've only got 2 pins to choose from for TIM1 clock sources, compared to the 4 previously thought. Curious if anyone has interesting ideas for useful signal choices. PPU A12 and CPU M2 are the most traditional choices to provide the choice between MMC3 style scanline counter and FME7 style CPU cycle counters. The only other signal of interest that comes to my mind is PPU A13 or PPU /RD for increased precision compared to PPU A12, and removal of MMC3 style pattern table and 8x16 sprite restrictions. With only 2 pins available what would you guys choose for TIM1 clock sources?
On the one hand, I suspect that the precision available from using PPU/RD would be lost in the 6502's variable IRQ latency. On the other hand, since your hardware can count both rising and falling edges, using PPU/RD lets you directly set IRQ coordinates to a specific (X,Y) location on screen.

One of the API niceties of the MMC3 IRQ is that it doesn't let you do anything obviously wrong regarding IRQ latency: it'll always trigger at the same X position. PPUA13 and M2-based IRQs can easily be used wrong and slip a few cycles in response.
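As a rough illustration of how much on-screen position can be lost to IRQ latency, here is a simplified sketch. It assumes the 6502 only recognizes the IRQ at an instruction boundary, so the handler start can slip by up to one cycle less than the longest instruction the interrupted code uses. It ignores finer details (exact polling cycle, DMA stalls), and the function name is hypothetical. The dot ratios are 3 PPU dots per CPU cycle on NTSC and 16/5 on PAL.

```python
from fractions import Fraction

# PPU dots per CPU cycle for each region.
DOTS_PER_CPU_CYCLE = {"ntsc": Fraction(3), "pal": Fraction(16, 5)}


def worst_case_jitter_dots(longest_instruction_cycles, region="ntsc"):
    """Worst-case spread (in PPU dots) of where the IRQ handler starts.

    Simplified model: the IRQ can arrive anywhere inside an instruction
    and is only serviced once that instruction finishes.
    """
    jitter_cycles = longest_instruction_cycles - 1
    return jitter_cycles * DOTS_PER_CPU_CYCLE[region]


# Arbitrary game code may include 7-cycle instructions.
print(worst_case_jitter_dots(7))          # up to 18 dots on NTSC
# A tight loop of 3-cycle instructions (e.g. JMP absolute) is far tighter.
print(worst_case_jitter_dots(3))          # up to 6 dots on NTSC
```

Under this model, dot-level counting precision only pays off if the interrupted code is kept to short, uniform instructions around the time the IRQ is expected.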

I do kinda wonder whether the VRC4's M2-based IRQ prescaler is a better choice, though. Having to rely on the screen being enabled in order to get IRQs feels like a silly constraint.
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Re: Adding features to discrete mapper with multipurposed CI

Post by tepples »

Two things that break an M2/(341/3) prescaler like that of the VRC4/6 are PAL NES and the Hi-Def NES's overclocking feature.
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA
Contact:

Re: Adding features to discrete mapper with multipurposed CI

Post by infiniteneslives »

As always, great points guys!
On the one hand, I suspect that the precision available from using PPU/RD would be lost in the 6502's variable IRQ latency. On the other hand, since your hardware can count both rising and falling edges, using PPU/RD lets you directly set IRQ coordinates to a specific (X,Y) location on screen.
It lacks grace, but perhaps one way to overcome the 6502 variable IRQ latency would be to have a "double fire" mode. Have the first IRQ be variable and get the 6502 spinning on some less variable code. Either way one is really going to have to work hard to get precise interrupts though, so perhaps PPU/RD counting isn't that valuable..
I do kinda wonder whether the VRC4's M2-based IRQ prescaler is a better choice, though. Having to rely on the screen being enabled in order to get IRQs feels like a silly constraint.
Seems like inclusion of M2 as one of the two signals is an obvious choice for the first pin. I'm leaning towards PPU A12 for the second one simply because it's more traditional and thus more likely to be adopted. If it's not too much of a mess, perhaps include a jumper to swap PPU A12 out for PPU/RD if there's a desire to get fancy. Chances are such a jumper is unlikely to get utilized on the first board rev though..
Two things that break an M2/(341/3) prescaler like that of the VRC4/6 are PAL NES
The nice thing about the CICOp is that it could actually support various prescaler modes to compensate for this. A software selectable prescaler setting could choose between no prescaler, VRC4 "NTSC traditional" 113-2/3, or VRC4 "PAL alternate" 106-9/16 (341 dots at 3.2 dots per CPU cycle), etc.
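A software-selectable prescaler like that could be modelled as a fractional divider in the style of the VRC4's: start an accumulator at a reload value, subtract a step every M2 cycle, and emit a scanline tick and add the reload value back whenever it reaches zero or below. The sketch below is only a model of that idea, not CICOp firmware. The mode names and the PAL pair (1705, 16), chosen because 341 × 5/16 = 106-9/16, are assumptions for illustration.

```python
# (reload, step): average M2 cycles per tick = reload / step.
PRESCALER_MODES = {
    "none": (1, 1),       # tick on every M2 cycle
    "ntsc": (341, 3),     # 341/3   = 113-2/3 M2 cycles per scanline
    "pal": (1705, 16),    # 1705/16 = 106-9/16 M2 cycles per scanline (assumed mode)
}


def tick_cycles(mode, m2_cycles):
    """Return the 1-based M2 cycle numbers at which the prescaler ticks."""
    reload, step = PRESCALER_MODES[mode]
    acc = reload
    ticks = []
    for cycle in range(1, m2_cycles + 1):
        acc -= step
        if acc <= 0:          # prescaler expired: tick and reload
            acc += reload
            ticks.append(cycle)
    return ticks


print(tick_cycles("ntsc", 341))        # three ticks, spaced 114, 114, 113
print(len(tick_cycles("pal", 1705)))   # 16 ticks over 1705 M2 cycles
```

Because the remainder is carried over rather than discarded, the tick spacing alternates between two adjacent integers and the long-run average matches the fractional ratio exactly.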
and the Hi-Def NES's overclocking feature.
I'm not familiar with the specifics of Hi-Def NES's overclocking. Perhaps this is the wrong attitude, but from my (admittedly biased) perspective it's their job to replicate (or not change the behavior of) the original console. I enjoy supporting clone consoles whenever reasonably possible. But trying to support them all, including ones which have yet to be created, is futile. So I try not to lose much sleep over it, nor let it have a strong influence on design choices. If there's something that I can add to improve compatibility and a means for me to test it, I'm open to the idea though.
lidnariq
Posts: 11429
Joined: Sun Apr 13, 2008 11:12 am

Re: Adding features to discrete mapper with multipurposed CI

Post by lidnariq »

infiniteneslives wrote:It lacks grace, but perhaps one way to overcome the 6502 variable IRQ latency would be to have a "double fire" mode. Have the first IRQ be variable and get the 6502 spinning on some less variable code. Either way one is really going to have to work hard to get precise interrupts though, so perhaps PPU/RD counting isn't that valuable..
My ridiculous "pipe dream" IRQ system for the NES involves requesting a specific X/Y location for the IRQ to fire, asserting the IRQ early, and an injected clockslide to get the IRQ to start with CPU cycle precision...

This is pretty clearly out of scope for the CICoprocessor.