All times are UTC - 7 hours

Posted: Thu Jul 27, 2017 12:36 pm

Joined: Mon Apr 04, 2011 11:49 am
Posts: 1874
Location: WhereverIparkIt, USA
na_th_an wrote:
but I can't really go any further as, as I said, my abilities are quite limited when it comes to providing emulator support.

To be clear, this statement assumes that full emulator support is required for your development. If one were willing to test builds primarily on hardware, that would be a way to get around needing full emulator support. I'm willing to provide development hardware kits at little to no cost. Lots of hardware testing will be necessary anyway, especially early on while the mapper is still in 'beta' form. Typically emulator authors are more interested in supporting new mapper features when there is already a game that utilizes the mapper.

Another way around emulator support might be to develop and test on a similar mapper that's already supported by emus, but only utilize that mapper in a way that 'emulates' the target discrete mapper + CICOprocessor. That would simplify porting the mapper-specific read/write routines over to the new mapper. FME7/Sunsoft5 might be a good choice, especially if the emu supports CHR-RAM and the end target is UNROM + CICOprocessor. Even better if the emu supports >8KB CHR-RAM. FME7/Sunsoft5 can emulate UNROM banking, and has selectable mirroring and timer based IRQs, along with audio expansion.

_________________
If you're gonna play the Game Boy, you gotta learn to play it right. -Kenny Rogers


Posted: Fri Jul 28, 2017 4:54 am

Joined: Mon May 27, 2013 9:40 am
Posts: 348
For something as simple as simulating the H/V mirroring switch in software, I can modify simple emulators such as Nester. Fceux should be easy to modify as well, as I understand the code I've studied (but I can't get it to compile no matter what I try - I'll try to address that issue later in the proper subforum, btw). It's just a behaviour simulation rather than true emulation. I would trap whatever you have to do from the game code to perform the switch, and order the emulator to act accordingly.

And I can always target, as you said, FME7 and perform the required changes to turn it into a UNROM+CICO.

I mean - I wouldn't need actual hardware to test while developing. I can always finish the software and send it to you once I have tested it in emulators, so there's no need for expensive overseas shipments :)

_________________
http://www.mojontwins.com


Posted: Sat Jul 29, 2017 2:10 pm

Joined: Mon Apr 04, 2011 11:49 am
Posts: 1874
Location: WhereverIparkIt, USA
na_th_an wrote:
I mean - I wouldn't need actual hardware to test while developing. I can always finish the software and send it to you once I have tested it in emulators, so there's no need for expensive overseas shipments :)


I would actually prefer to put the hardware in your hands if you were taking the time to target the CICOprocessor. Would make the "build - test - report - rebuild" process much easier for both of us. The shipping costs are insignificant.

So I'll take this discussion as a sign that there's notable interest in my crazy CICOprocessor idea. I'm rather thankful I took the time to keep detailed notes in this thread about how I plan to execute everything. With it being close to 2 months since I presented my nibble register interface, I had pretty much forgotten all the specifics of my idea.

I'll do my best to make progress on this effort sooner rather than later and post progress reports in this thread.

_________________
If you're gonna play the Game Boy, you gotta learn to play it right. -Kenny Rogers


Posted: Mon Jul 31, 2017 6:57 am

Joined: Mon May 27, 2013 9:40 am
Posts: 348
Just the addition of H/V mirroring switching and the IRQ counter to simple discrete logic mappers is a plus. I'm sure many programmers target a more expensive ASIC board just for one of those features.

_________________
http://www.mojontwins.com


Posted: Mon Jul 31, 2017 10:44 am

Joined: Tue Oct 06, 2015 10:16 am
Posts: 519
Don't forget single screen.


Posted: Mon Jul 31, 2017 11:00 am

Joined: Mon Apr 04, 2011 11:49 am
Posts: 1874
Location: WhereverIparkIt, USA
calima wrote:
Don't forget single screen.


While single screen is possible, it would require the CICO to drive CIRAM A10 directly with one of its pins, which is incompatible with the tiny mux I plan to implement for software selectable H/V. Because of that, and the fact that single screen AxROM style mirroring is a trivial addition to any discrete mapper, I don't think using the CICO for single screen is worthwhile.

It's not that one couldn't have single screen and CICO on the same board. You just can't have selectable H/V/single via software all at once without adding more logic chips.
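
To illustrate why H/V switching fits a tiny 2:1 mux while single screen does not, here is a minimal sketch of how CIRAM A10 is derived in each mode. It relies on the usual NES convention (vertical mirroring feeds PPU A10 to CIRAM A10, horizontal feeds PPU A11). The function name and mode strings are hypothetical and only model the behaviour; they are not the actual board logic.

```python
def ciram_a10(ppu_addr: int, mode: str) -> int:
    """Return the level driven onto CIRAM A10 for a PPU nametable address."""
    ppu_a10 = (ppu_addr >> 10) & 1
    ppu_a11 = (ppu_addr >> 11) & 1
    if mode == "vertical":      # 2:1 mux input 0: pass PPU A10 through
        return ppu_a10
    if mode == "horizontal":    # 2:1 mux input 1: pass PPU A11 through
        return ppu_a11
    if mode == "single0":       # needs a constant 0, not a mux input
        return 0
    if mode == "single1":       # needs a constant 1, not a mux input
        return 1
    raise ValueError(f"unknown mirroring mode: {mode}")


# $2400 and $2800 land in different CIRAM halves depending on the mode.
for addr in (0x2000, 0x2400, 0x2800, 0x2C00):
    print(hex(addr), ciram_a10(addr, "vertical"), ciram_a10(addr, "horizontal"))
```

The two mux modes only ever select between PPU A10 and PPU A11 with a single select line. The single screen modes need a level that comes from neither address line, which is why they would take an extra driven pin or more logic.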

_________________
If you're gonna play the Game Boy, you gotta learn to play it right. -Kenny Rogers


Posted: Wed Aug 16, 2017 1:57 pm

Joined: Mon Apr 04, 2011 11:49 am
Posts: 1874
Location: WhereverIparkIt, USA
Quick little update. I finally ditched SDCC (C compiler) for the STM8. I should never have bothered with C on the STM8 in the first place. I thought I would take advantage of C to simplify the initialization code and everything. When I realized there wasn't really an option for including asm files in an SDCC build, I took the cheap way out and wrote the entire CIC operations with inline assembly. The inline assembly is pretty annoying to work with, but I made it work.

I just migrated everything over to pure assembly and have been using naken_asm, which has been great. I optimized everything in the process and became aware of just how poor SDCC's output was. My seed initialization routine ended up compiling into a horrendous mess. Hand writing that RAM init routine alone cut my code by about half.

In the end I went from ~2.5KB to just over 1KB with my synchronous NES implementation by migrating init code from C to assembly. There's still room for more optimizations that would easily get me well under 1KB. When I move on to my asynchronous implementation, I expect the code to shrink by a fair amount as a decent number of timing NOP's will be removed. But some extra code will be needed to handle the timer operations too.

So in the end I'm expecting the actual CIC code to consume 1KB or less of the 8KB available on the STM8. Leaves a pretty decent program flash budget for all these potential features.

I'm starting to get a rough idea of how I plan to get by with the 8bit TIM4 alone to handle the CIC timing. I expect that running without a prescaler will be helpful/necessary for more precise timing. That will require software to count rollovers, but that code can be mid-low priority, so I think it'll work okay.

But for now I've got to focus on getting a synchronous SNES CIC up and running. Once that's done, I'll start chipping away at an asynchronous NES CIC and some proof of concept with the nibble registers for adding features!

_________________
If you're gonna play the Game Boy, you gotta learn to play it right. -Kenny Rogers


Posted: Sun Aug 27, 2017 11:35 am

Joined: Mon Apr 04, 2011 11:49 am
Posts: 1874
Location: WhereverIparkIt, USA
Have something of an update on this project... Perhaps I'm getting a little too deep for most people's reading interests. But my previous posts like this were rather helpful for my own idea development and later reference. I'll go ahead and give the "Way Too Long; Not Going To Read" version first and if you're up for some light reading you can continue...


WTL;NGTR:
Recently got the SNES CIC implemented on the STM8, but had issues with stability due to the mcu clock source. That helped motivate me to start more in-depth planning of an async STM8 CIC, which the NES CICOp project also requires. I ramble about multiplication of large numbers and my plan to keep timing calibrated. I also discovered that the targeted stm8s003f3 does indeed have GPIO available for clocking internal timer 1 "TIM1". This discovery opens up viability and/or additional features for the NES CICOp project that I previously didn't think possible, such as legit PPU scanline counting.


SNES STM8 implementation problems:
I recently got my SNES CIC implementation running on the STM8. The first board/chip I used for testing works great. I've let it run for hours and it would run strong overnight. While attempting to prototype a new design, I hacked an STM8 onto a breakout board, glued it onto the back side of an old SNES flash board I had sitting around, and used wires to connect all the pins. Unfortunately that setup was very flaky, and the CIC would drop out after ~1-30sec.

I tinkered around a bit, trying to determine the cause. I added extra capacitors to the breakout board as it was only powered from a pair of small wires, but that didn't help. I was a little skeptical of supply noise anyway, considering the core is internally regulated to 1.8v with its own external cap. I moved the CIC clock supply wire around from the back side to the front side of the board where it was more exposed, and that seemed to make the issue worse. I set my logic analyzer up to watch the CIC signals and debug pin when it dropped out, and found that the STM8 appeared to be resetting mid-stream sometimes. Other times it was making errors during the mangle calc: too many/few mangles, etc. I got in with the debugger to read the reset cause and found that the resets appeared to be due to illegal opcode execution. So it seems the CPU was faulting by mis-reading instruction data. Depending on how it was mis-read, the result was either a valid opcode that caused an erroneous mangle calc, or an invalid opcode that caused the STM8 to reset. Bummer...

I later tried another board where the STM8 was closer to the ideal setup, with it being well powered and as close to the connector as possible without all the lengthy wires of the previous setup. This improved matters, but it would still fall out after a few hours of play. The setup was very similar to my first, which has never fallen out. So perhaps some chips are more sensitive than others. I've only sampled 3 so far, but with ~50% having problems I definitely need a solution.

The CIC clock is relatively clean looking at the oscope shot, and the STM8 datasheet doesn't give much for external clock specifications. It calls for "about 50% duty cycle"; I measured 53%, which is pretty close. The datasheet goes so far as to say square, triangle, and sine wave signals are acceptable clocks. So while the rise/fall times of 14/21nsec are pretty slow, they're a far cry from sine/triangle rise/fall times.

I first tried buffering the clock through a single NOR gate I had sitting around (scope shot). That seemed to fix everything. I haven't run it overnight yet, but the second hack of a board with the breakout board ran for hours with no problems when it wouldn't even run for 1min previously. The NOR gate tightened up the rise/fall times to ~2.8nsec, and also introduced some ringing. The clock is inverted due to the NOR function, so the duty cycle became 56%, which makes sense considering the virgin clock has a slower fall time.

Curious what would happen if I slowed the clock edges, I tried adding a 20pF and separately a 220pF cap between the clock and ground. That only exacerbated the issue: the 3rd board, which typically lasted a few hours, only lasted about a minute with the 220pF cap.

So I'm still not 100% sure what's going on here, ST doesn't give much of a spec for the external clock and I'm only running at 3.1Mhz which is at the low end of the 0-16Mhz spec. I never had this issue when working on the NES, and I had some pretty godawful wiring setups with 5-6inch wires going from the cart to the dev board in the beginning. Still need to do some more testing, but adding a logic gate as a clock buffer seems to be the best fix at the moment.


Asynchronous CIC implementation planning:
All that brought me back around to my idea of an asynchronous CIC implementation that doesn't have the cart's mcu CPU core run off the 3-4Mhz CIC clock signal. One potential fix to the problem above is to cut the clock out of the equation completely! Certainly not an easy feat, but with the added motivation from the "NES CICOp" project I figured I may as well give it my best shot.

Looking at the numbers, an async SNES CIC is going to be quite a bit more challenging than NES because the clock runs at only ~75% of the speed and there are 3x as many mangle calcs. So the STM8 needs to be much more accurate with its timing to meet the same ~3usec output window, because it's counting "in the dark" for about 4 times as long compared to the NES CIC. So if it can be pulled off with the SNES, then NES shouldn't be a problem at all.

I took a closer look at how the STM8 timers work, and thankfully the prescalers can be changed on the fly. So targeting the simplest 8bit counter, TIM4, looks hopeful. I can set the prescaler to its max and divide by 128, which gets a max count of 2.048msec with the 16Mhz HSI clock. The max theoretical time between bit transfers on the SNES is ~10.5msec, so software will only have to count 5 TIM4 rollovers at most, and at the last rollover the prescaler can be turned down to divide by 1 for fine tuning just prior to the bit transfer. This allows long time periods to be measured with high precision (no jitter), but the timing difference between the STM8 HSI and the CIC clock must be well calibrated to get the accuracy we need along with the precision.
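
The numbers in that paragraph can be checked with a little integer arithmetic. This is only a sketch of the timing budget, working in 16Mhz HSI cycles to avoid rounding; the constant and variable names are made up for illustration.

```python
# Timing budget for TIM4, expressed in 16 MHz HSI cycles to avoid rounding.
F_HSI_HZ = 16_000_000      # STM8 internal oscillator (from the post)
TIM4_COUNTS = 256          # TIM4 is an 8-bit counter
MAX_PRESCALER = 128        # largest TIM4 divider (from the post)
MAX_GAP_US = 10_500        # ~10.5 msec worst-case SNES gap (from the post)

cycles_per_us = F_HSI_HZ // 1_000_000             # 16 cycles per usec
coarse_tick_us = MAX_PRESCALER // cycles_per_us   # one coarse tick, in usec
rollover_cycles = TIM4_COUNTS * MAX_PRESCALER     # cycles per full rollover
rollover_us = rollover_cycles // cycles_per_us    # 2048 usec = 2.048 msec

total_cycles = MAX_GAP_US * cycles_per_us
full_rollovers, leftover_cycles = divmod(total_cycles, rollover_cycles)
leftover_us = leftover_cycles // cycles_per_us

print(coarse_tick_us, rollover_us, full_rollovers, leftover_us)
# prints: 8 2048 5 260
```

So the worst-case gap is 5 full rollovers plus roughly 260usec, which matches the "5 rollovers at most" estimate.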

I determined the calibration needs to allow for 0.01% tuning steps, which equates to 1usec steps over the 10.5msec max theoretical SNES mangle time. NES only has a max theoretical mangle time of 2.7msec, so 1usec steps would only require 0.037% tuning steps. In binary, 1/128K gives us 0.00076% trim steps, which should be more than adequate.

For a max tune step, the STM8 HSI is spec'd to be 1% accurate with factory tuning at 25C, and 5% across the temp range. If we go up to a binary 1/32 step, that gives a max tune of +/- 6.2%, which should be enough. That means we need a 13bit calibration factor for a +/- 6.2% range with a 0.00076% step size. Could add a few more bits to round off to 15-16bits, but it's probably overkill.
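
As a sanity check on the calibration factor, the bit count, range, and step size follow directly from the 1/32 and 1/128K endpoints. This is a minimal sketch using exact fractions; the variable names are hypothetical.

```python
from fractions import Fraction

MSB = Fraction(1, 32)            # largest trim bit, 2^-5 (from the post)
LSB = Fraction(1, 128 * 1024)    # smallest trim bit, 2^-17 (from the post)

# Number of bits needed to span 2^-5 down to 2^-17 inclusive.
bits = LSB.denominator.bit_length() - MSB.denominator.bit_length() + 1

# Largest trim magnitude: every one of those bits set.
max_trim = sum(Fraction(1, 2 ** k) for k in range(5, 18))

step_percent = float(LSB * 100)        # about 0.00076 %
range_percent = float(max_trim * 100)  # just under 6.25 %

# Step size needed for 1 usec resolution over the worst-case gaps.
snes_needed = Fraction(1, 10_500)      # about 0.0095 %
nes_needed = Fraction(1, 2_700)        # about 0.037 %

print(bits, round(range_percent, 2), LSB < snes_needed < nes_needed)
```

The 1/128K step is finer than the 1usec requirement for either console by a comfortable margin, so a few of the low bits could be dropped if the multiply ever needs to be cheaper.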

The delay count requires 14bits to measure up to 10.5msec in 1usec step size. But having a few extra bits for fractions of 1usec will be beneficial to keep us from adding jitter between timing events. The NTSC SNES CIC machine cycle is 1.3usec after all, so that fraction becomes a pain as rounding errors add up over time. Adding 4 more bits for fractions of 1usec allows us to get down to the smallest step size of the 16Mhz counter.

So in total there's 18bits of delay count to be multiplied by a 13bit calibration factor to determine a delay offset. The STM8 thankfully has an 8bit hardware multiplier. 18b * 13b factors produce a 31bit product. With an 8bit multiplier that equates to 6 multiply operations and ~7 summations to get the final product. The result gets truncated down to a 15bit offset, which then gets signed depending on the pos/neg calibration factor. That signed offset then gets added to the desired delay for the final timer count value.
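
The 6-multiply count comes from splitting the 18bit delay into 3 bytes and the 13bit factor into 2 bytes. Below is a sketch of that schoolbook multiply using only 8bit x 8bit partial products. The function names, the sign handling, and the `>> 17` scaling (which assumes the factor's LSB is worth 1/128K) are illustrative assumptions, not the actual STM8 routine.

```python
def mul8(a: int, b: int) -> int:
    """Stand-in for the 8-bit x 8-bit -> 16-bit hardware multiply."""
    assert 0 <= a < 256 and 0 <= b < 256
    return a * b


def mul_18x13(delay: int, cal: int):
    """Multiply an 18-bit delay by a 13-bit factor via 8-bit partial products."""
    assert 0 <= delay < (1 << 18) and 0 <= cal < (1 << 13)
    delay_bytes = [(delay >> (8 * i)) & 0xFF for i in range(3)]
    cal_bytes = [(cal >> (8 * j)) & 0xFF for j in range(2)]
    product = 0
    multiplies = 0
    for i, d in enumerate(delay_bytes):
        for j, c in enumerate(cal_bytes):
            # Each partial product is shifted by its combined byte position.
            product += mul8(d, c) << (8 * (i + j))
            multiplies += 1
    return product, multiplies


def calibrated_delay(delay: int, cal_magnitude: int, cal_negative: bool) -> int:
    """Apply a signed trim, assuming the factor's LSB is worth 2^-17."""
    product, _ = mul_18x13(delay, cal_magnitude)
    offset = product >> 17          # truncate the fraction bits
    return delay - offset if cal_negative else delay + offset


# 10.5 msec in 1/16 usec units, trimmed by roughly +1% (1311 / 131072).
print(mul_18x13(168_000, 1311)[1], calibrated_delay(168_000, 1311, False))
# prints: 6 169680
```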

My plan is to then use TIM4 in coarse count mode (8usec steps) until 8-16usec of the delay remain. For the final fine delay, TIM4 will get switched to fine mode (62.5nsec steps). At the end of that delay the next bit will be output to the LOCK. While that 8-16usec fine count is occurring, a fixed ~8usec time delay will get pre-loaded into TIM4 for the end-of-bit-transfer data clearing and calibration routine. At the end of that routine TIM4 will be set up to start counting down to the next bit transfer.
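
One way to split a delay into the coarse and fine phases described above is sketched here. It assumes the delay is already expressed in 62.5nsec HSI cycles; `plan_delay` and its return layout are hypothetical.

```python
COARSE_CYCLES = 128     # one coarse TIM4 tick = 128 HSI cycles = 8 usec
TIM4_COUNTS = 256       # 8-bit counter


def plan_delay(total_cycles: int):
    """Split a delay into (rollovers, coarse_ticks, fine_cycles).

    The fine part is kept between 128 and 255 cycles (8-16 usec) so it
    always fits in the 8-bit counter with the prescaler at divide by 1.
    """
    if total_cycles < 2 * COARSE_CYCLES:
        # Too short for a coarse phase; only valid if it fits in 8 bits.
        return 0, 0, total_cycles
    coarse_total = total_cycles // COARSE_CYCLES - 1
    fine_cycles = total_cycles - coarse_total * COARSE_CYCLES
    rollovers, coarse_ticks = divmod(coarse_total, TIM4_COUNTS)
    return rollovers, coarse_ticks, fine_cycles


print(plan_delay(168_000))   # 10.5 msec worst case
# prints: (5, 31, 192)
```

Holding back one coarse tick means the fine phase is never shorter than 8usec, which leaves time to reprogram the prescaler before the critical edge.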

Since only the rising edge of the bit transfer is timing sensitive, the STM8 can use the falling edge of the LOCK's output bit (assuming it's expected to be a 1) as a timing adjust/cal point. TIM4 will have been counting up since the expected rising edge, and an interrupt can be enabled for the falling edge of the LOCK's data. That GPIO isr will then read the TIM4 value and compare it to the expected ~4usec pulse width. If it's beyond a tolerance, I'm thinking that simply adding/subtracting ~1bit from the calibration factor will account for drift. Everything has to be pretty close to correct timing if we're still alive, so only minor adjustments should be needed to correct for rounding errors and slow drifts in HSI/CIC frequency.
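
A rough sketch of that adjustment step follows. It assumes TIM4 is in fine mode during the pulse (so ~4usec is 64 ticks), that a larger calibration value lengthens delays, and that a dead band of 2 ticks is reasonable. All of those are placeholders rather than measured values.

```python
EXPECTED_PULSE_TICKS = 64        # ~4 usec at 62.5 nsec per tick (assumed mode)
TOLERANCE_TICKS = 2              # hypothetical dead band
CAL_MIN, CAL_MAX = -8191, 8191   # 13-bit magnitude plus sign


def adjust_calibration(cal: int, measured_ticks: int) -> int:
    """Nudge the calibration factor by one step if the pulse looks off."""
    error = measured_ticks - EXPECTED_PULSE_TICKS
    if error > TOLERANCE_TICKS:
        # More ticks than expected: HSI is fast relative to the CIC clock,
        # so delays need more ticks to cover the same real time.
        cal += 1
    elif error < -TOLERANCE_TICKS:
        cal -= 1
    # Clamp so the factor never leaves its 13-bit range.
    return max(CAL_MIN, min(CAL_MAX, cal))


print(adjust_calibration(100, 64), adjust_calibration(100, 70))
# prints: 100 101
```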


Learning more about STM8 interrupts:
Getting a little deeper into the STM8 I've realized there's a decent way to remove the 1-5cycle jitter from when an interrupt routine starts executing by using "wait for interrupt" opcode which pre-stacks the processor status, and freezes the CPU until an interrupt occurs. With that there's only 1-2 cycle jitter due to timing edge of interrupt and execution of isr instructions. So I'm planning to make use of that.

Additionally I'm realizing an async SNES CIC is even more of a pain as the PAL CIC runs at 3.57Mhz compared to 3.08Mhz NTSC CIC. So it's 1.12usec per PAL CIC machine cycle, and 1.3usec per NTSC CIC machine cycle. So while the machine cycle count is identical between PAL/NTSC SNES CIC, the actual time differs due to operating frequency. So all the timing delays would have to be adjusted to have a multiregion SNES CIC with an asynchronous implementation.
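
The per-region figures above are consistent with 4 CIC clocks per machine cycle. That ratio is inferred here from the quoted numbers rather than taken from a datasheet. A short sketch of how the same machine cycle count turns into different real-time delays:

```python
CLOCKS_PER_MACHINE_CYCLE = 4   # inferred: 4 / 3.08 MHz is about 1.3 usec


def machine_cycle_us(cic_clock_hz: float) -> float:
    """Duration of one CIC machine cycle in microseconds."""
    return CLOCKS_PER_MACHINE_CYCLE / cic_clock_hz * 1e6


def delay_us(machine_cycles: int, cic_clock_hz: float) -> float:
    """Real time taken by a fixed number of machine cycles."""
    return machine_cycles * machine_cycle_us(cic_clock_hz)


for name, hz in (("NTSC", 3_080_000), ("PAL", 3_570_000)):
    print(name, round(machine_cycle_us(hz), 2), round(delay_us(1000, hz)))
```

An async implementation that counts time would need a separate delay table (or a scale factor) per clock frequency, whereas anything that counts CIC clocks directly sees the same cycle count everywhere.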


QUESTION on NES CIC clocking in other regions:
I don't think that's the case for NES though. My PAL-A "Mattel" NES is running its CIC at 4Mhz just like NTSC. I don't have a PAL-B, Comboy, or other Asian/Aussie NES variants. I only have a PAL-B CIC and a Comboy CIC yanked from cartridges, which I place in my CIC socketed NTSC NES for testing. Since PAL-A is 4Mhz like NTSC, I'm hopeful all others are as well. If anyone has more info on that I'd appreciate it! Even just having confirmation that a PAL-B console runs its CIC at a 4Mhz clock frequency would be good to know.


Discovering STM8's TIM1 has external clock pins available:
So aside from the struggles with my SNES implementation and the motivation it helped provide for making progress on an async solution, I've become more familiar with some of the STM8's details. Namely, I better understand how the timers work, and the good news is that I previously misunderstood TIM1's abilities. I was rather disappointed when I thought there were no external clock sources (pins) available to clock any of the timers. My understanding was that the "ETR" pins were the only ones that could be used to clock counters, and with the 20pin package the TIM1_ETR pin is unfortunately not pinned out. While I was right about the ETR pin, TIM1 is able to use any of its 4 input pins as a clock source to the counter as well. TIM2 (the other 16bit counter on chip) however does not have this ability. Both TIM2 and TIM4 must be clocked from fMASTER, which we need to be running on HSI 16Mhz to allow for multitasking the CICOp.

Learning this, I'm planning to have my SNES implementation use CIC CLK to let TIM1 count CIC cycles exactly. So TIM1 will be synchronous with the LOCK's CIC CLK, but the STM8 core itself won't be. I presume that'll be enough to get around the issues I had with STM8 core stability when using CIC CLK as an external CPU core clock source. This also resolves the annoyance of PAL & NTSC SNES CIC's running at different frequencies.


What this means for the NES CICOp project:
This realization is good news for the NES CICOp project though. Worst case, the NES CICOp can also clock TIM1 with the 4Mhz CIC CLK, while allowing the core to operate on 16Mhz HSI. Most of my prior proposed features would still be viable with this setup. However TIM1 is the most advanced timer on chip, and it sure would be nice to have it available for PWM DAC audio synthesis, or for counting a cartridge signal with the "newly discovered" TIM1 clock inputs. In the end I still think it's possible to handle NES CIC timing with TIM4 solely, so TIM1 has the ability to add even more features I previously didn't think were possible.

So there are 4 pins (PC3, PC4, PC6, & PC7) which can be used for TIM1 clock sources that I didn't previously realize. That really opens up opportunities for more interesting PPU A12, A13, (or PPU /RD?) counting, or a more exact CPU cycle counter with M2. I can't really think of any other signals on the connector that would be worth counting, chime in if you have other interesting ideas.

Two of those Port C pins used for TIM1 inputs also map to the SPI pins, but dropping SPI bus support isn't really a big loss. It's an I/O hog anyway with its 4 pins. PC3 can also be mapped to TLI "top level interrupt", which is an NMI for the STM8 core. I'm thinking this would be the perfect use for the mapper interrupt pin. That would allow the mapper interrupt to be non-maskable, which is exactly what we're going for. While all I/O's can be used as configurable priority interrupts, there's only one interrupt vector per port (4 ports total on this device). So allowing the mapper interrupt pin to be separable from other GPIO interrupts aids in ensuring that mapper nibble writes aren't missed or delayed. That would leave 3 TIM1 pins available: 2 could be used as inputs, and the 3rd as an output (2A03 IRQ). That would allow the TIM1 clock source to be selectable between two chosen signals at run time.

The real limitation with using TIM1 as a counter for external signals is that TIM1 was also the timer planned to be tasked as a PWM DAC for sound synthesis. Reason being that TIM1 can perform center aligned PWM generation which improves PWM DAC fidelity. But if edge aligned PWM is acceptable, then the PWM DAC could get switched to TIM2 which can only be clocked by 16Mhz.

Perhaps there isn't as much interest in the CICOp synth since it's not compatible on all consoles and requires an external dongle or console modification. On top of that, having TIM1 count external signals is a pretty powerful feature addition. Arguably the TIM1 counter feature outweighs the increased fidelity gained with center aligned PWM. I've yet to get anywhere close enough to measure/compare the difference in fidelity. So with that my plan is to focus TIM1 on counting external signals and TIM2 for PWM DAC. If a specific project greatly values center aligned PWM, and is willing to give up TIM1 counting features then they can make that trade assuming I can build that flexibility into the PCB layout.

In the end I still have to prove my concept of using TIM4 for CIC timing asynchronously. If I'm unable to pull that off, TIM1 will end up getting consumed to handle CIC timing synchronously. That would leave TIM2 & TIM4 available for PWM DAC, and 2A03 timer but hey that's still something!

Phew... Well things are getting pretty complicated here, but overall good news and some progress being made on this project. Part of me wonders if it just might be worth upgrading to the LQFP-32 package to make pin assignments simpler. But have to resist that temptation and do more with less!

_________________
If you're gonna play the Game Boy, you gotta learn to play it right. -Kenny Rogers


Posted: Sun Aug 27, 2017 2:01 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 6152
Location: Seattle
infiniteneslives wrote:
Additionally I'm realizing an async SNES CIC is even more of a pain as the PAL CIC runs at 3.57Mhz compared to 3.08Mhz NTSC CIC. So it's 1.12usec per PAL CIC machine cycle, and 1.3usec per NTSC CIC machine cycle.
Hold on a sec. NTSC SNES consoles come in both 4MHz (ceramic resonator, SHVC-CPU-01) and 3MHz (APU 24.576MHz÷8) versions.


Posted: Sun Aug 27, 2017 2:19 pm

Joined: Mon Apr 04, 2011 11:49 am
Posts: 1874
Location: WhereverIparkIt, USA
lidnariq wrote:
infiniteneslives wrote:
Additionally I'm realizing an async SNES CIC is even more of a pain as the PAL CIC runs at 3.57Mhz compared to 3.08Mhz NTSC CIC. So it's 1.12usec per PAL CIC machine cycle, and 1.3usec per NTSC CIC machine cycle.
Hold on a sec. NTSC SNES consoles come in both 4MHz (ceramic resonator, SHVC-CPU-01) and 3MHz (APU 24.576MHz÷8) versions.


Oh, well that's good to know! I'm glad mine happened to be the 3.07Mhz version, otherwise I might have gleefully assumed all NES/SNES CICs were 4Mhz. Guess it might not have been an issue, as I prob would have stuck with a sync solution with traditional CPU cycle counting. But I didn't realize there were differences between NTSC versions. It's possible I wouldn't have seen these instability issues with a 4Mhz console and then gotten burned when shipping to a 3.07Mhz flake like mine...? I'll have to try and hunt down a 4Mhz SHVC-CPU-01 with ceramic resonator for testing. I've got a couple SNES jr's somewhere, but I'm guessing those are 3.07Mhz APU/8 as that sounds cheaper.

All this is even more reason to keep TIM1 CIC CLK cycle counting for the STM8. A purely async solution for SNES would be quite the PITA. Thankfully/Hopefully that doesn't seem to be the case for NES as I'm still hopeful all versions are 4Mhz..? I've only got NTSC and PAL-A to test with.

EDIT: Just pulled out one of my SNES jr's and its CIC clock frequency is 3.57Mhz, similar to my PAL SNES (apparently German, and IDK if it's 1-2chip). So if 1chip SNES CIC's run at 4Mhz, then there are a total of THREE different CIC clock frequencies for NTSC..?

_________________
If you're gonna play the Game Boy, you gotta learn to play it right. -Kenny Rogers

