Adding features to discrete mapper with multipurposed CIC

infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Post by infiniteneslives »

Well, it's happened again: I got around to writing some code and taking a closer look at the STM8 register settings, and found a little gem.

TIM2 is the mid grade counter on chip I've dedicated to the PWM DAC. It's only capable of up counting, and thus has no center aligned PWM mode of operation. HOWEVER there is an output mode that toggles the output pin on timer compare match. Taking advantage of this, one can effectively create a center aligned PWM if you're willing to dedicate the CPU resources to update the compare value on each timer overflow. I'm not yet sure I've got the resources to pull this off in my case, we'll see.

To make this work, all that needs to be done is invert the DAC value every other output. We can also take advantage of Tepples' tip to keep from losing a bit of resolution. It's the best of both worlds with this trick, if there are CPU resources to spare.

Here's how I'm thinking this would work:
-Assuming a 16MHz counter clock, 31kHz PWM freq, and a counter top value of 255 (0xFF), this would normally give an 8bit DAC, but we can gain a bit back for 9bits if the down count doesn't have to equal the up count.

Start with PWM output clear. Say the desired output value for this cycle is 100.5

1) First update cycle is an "odd" operation and thus going to simulate down counting. By convention let's round down on odd/down cycles, so this half's output is 100. Being an odd cycle, we subtract 100 from the top value 255. We place this difference of 155 in the compare register. When reached, the output PWM pin toggles high.

2) Next output cycle occurs; this one is "even", normal up counting. Round up this time from 100.5 to 101. The 101 value is simply placed in the compare register. When reached, the output PWM pin toggles low.

3) Calculate next DAC output value and go to step 1.
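
The steps above can be sketched as a small Python model. This is only an illustration of the arithmetic, not STM8 firmware. `TOP` and the 9-bit `level` argument (the desired value counted in half steps, so 100.5 becomes 201) are names assumed for the example.

```python
TOP = 255  # counter top value (0xFF), as assumed in the post

def compare_values(level):
    """Split a 9-bit level (in half steps, 0..510) into the two compare
    values loaded on consecutive timer overflows.

    Returns (odd_compare, even_compare):
      odd  -> simulated down count, pin toggles high at TOP - rounded-down half
      even -> normal up count, pin toggles low at rounded-up half
    """
    if not 0 <= level <= 2 * TOP:
        raise ValueError("level out of range")
    down_half = level // 2        # round down on odd/down cycles
    up_half = level - down_half   # round up on even/up cycles
    return TOP - down_half, up_half

print(compare_values(201))  # desired output of 100.5 -> (155, 101)
```

The extra bit of resolution comes from letting the two halves differ by one count.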

While this idea works, the added interrupt for step 2 will be costly for the STM8. Interrupts cost a whopping 20 cycles (9 pushes, 2 to jump to the ISR, 9 pops). This scenario requires two interrupts per 512-clock update cycle. That's 4% of CPU resources for the added interrupt entry/exit, and I'd probably need another 5-10 cycles to support the added complexity in each ISR, so that's another ~4%, for a total CPU resource cost of ~8%. That doesn't seem worthwhile at this point when we've got better things to do with that CPU time. So while this whole trick is viable, I've now sufficiently talked myself out of it...
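
For anyone wanting to check the numbers, here is the same estimate as a few lines of Python. The constants come from the paragraph above. Using 10 cycles for the extra per-ISR work (the top of the 5-10 range) is an assumption.

```python
PERIOD_CLOCKS = 512       # one full update cycle at the timer clock
IRQ_ENTRY_EXIT = 20       # 9 pushes + 2 for the jump + 9 pops
EXTRA_WORK_PER_ISR = 10   # assumed: upper end of the 5-10 cycle guess
ISRS_PER_PERIOD = 2

def percent(cycles, period=PERIOD_CLOCKS):
    """CPU share, in percent, of `cycles` spent once per period."""
    return 100.0 * cycles / period

added_entry_exit = percent(IRQ_ENTRY_EXIT)                  # the one extra interrupt
added_work = percent(EXTRA_WORK_PER_ISR * ISRS_PER_PERIOD)  # both ISRs get longer
total = added_entry_exit + added_work

print(f"{added_entry_exit:.1f}% + {added_work:.1f}% = {total:.1f}%")
```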

I sure wish all those excessive register pushes for STM8 ISRs weren't there. It's much more convenient how the 6502 only pushes PC & SR, leaving it to the ISR to decide what registers are worth preserving! The worst part is that the built-in pushes aren't any faster than manual pushing; all they're saving is code bytes.
If you're gonna play the Game Boy, you gotta learn to play it right. -Kenny Rogers
lidnariq
Posts: 11430
Joined: Sun Apr 13, 2008 11:12 am

Post by lidnariq »

One little caveat about the "just use a count-up / count-down PWM and alternate direction every reload" — it effectively halves your sample rate (and adds one to the bit depth), producing modulation noise at IRQ frequency/2.

15.5kHz is still high enough that it's likely not an issue, but it's worth keeping in mind.

Post by infiniteneslives »

lidnariq wrote:One little caveat about the "just use a count-up / count-down PWM and alternate direction every reload" — it effectively halves your sample rate (and adds one to the bit depth), producing modulation noise at IRQ frequency/2.
Yeah, my initial thought was to run the synth engine on each count "direction" to keep from changing the sample rate. But that doubles the synth computation load in order to keep the same PWM frequency. With limited timer clock speed and limited CPU resources, there's no way to get around some sort of trade-off when going from fast/edge to center/phase correct. And with your conclusion that giving up 1 bit of resolution isn't worth gaining center aligned, I question if center aligned is ever worthwhile in a system operating near its constraints.

Post by infiniteneslives »

Quick little update: I've been testing out the PWM DAC with a simple middle C square wave. The current PWM DAC is the standard 3.9k / 4.7nF (8.7kHz cutoff) low pass filter fed directly to EXP6. Feeding that through the 'standard' 47k EXP audio resistor, I was able to achieve a volume comparable to one of the 2A03's squares at full volume. Things went according to plan here: with a DAC setting of ~12%, the CICOp square was audibly similar to the 2A03 square in volume and timbre. I haven't come up with a means to objectively compare the two, but using my ears alone, the CICOp may have sounded a little "warmer", though honestly I could be mistaking that for a slight difference in volume. In any event, I'm happy with the preliminary performance of this simple setup!
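
The quoted cutoff can be checked with the usual first-order RC formula, f = 1 / (2*pi*R*C). The snippet below is just that arithmetic with the component values from the post. The function name is made up for the example.

```python
import math

def rc_cutoff_hz(r_ohms, c_farads):
    """-3dB corner frequency of a single-pole RC low pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

cutoff = rc_cutoff_hz(3900, 4.7e-9)   # 3.9k and 4.7nF
print(f"{cutoff / 1000:.1f} kHz")
```

This treats the filter as unloaded. The 47k EXP resistor hanging off the output will shift things slightly in practice.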

I took a handful of different scope measurements, still blows me away that the PWM DAC sounds as good as it does despite how it looks! [EDIT: all measurements taken with 10x probe]

I'm realizing a limitation of running at a PWM frequency of 31kHz is the min period resolution of 32usec, which I assume will prove troublesome for keeping higher pitched notes in tune. I may be able to pull off 62kHz, but I'm wondering if this could be made up for by counting fractional steps and then rounding each period, where the average of something like 4 cycles would be in tune, effectively providing 8usec period resolution. I have a feeling a hack like that has some (audible) drawback, but I don't really know..
Attachments
31Khz PWM noise alive and well at output level ~3.1vdc
Middle C, measured on main board 20k mixing resistor (between 47k exp resistor)
Middle C, output of PWM DAC
Step response from 5v->0v, then ramp to 5v in one step increments.
Last edited by infiniteneslives on Tue Oct 10, 2017 9:14 am, edited 1 time in total.

Post by lidnariq »

infiniteneslives wrote:Realizing a limitation of running at PWM frequency of 31Khz is the min period resolution of 32usec which I assume will prove troublesome for keeping higher pitched notes in tune.
It isn't necessary to have pitches be integer divisors of your sample rate.

It's true that square waves (or anything else with more higher frequency content) will start having audible aliasing artifacts if you just use nearest-neighbor=sample-and-hold resampling, but that can be fixed or worked around in a variety of ways.
infiniteneslives wrote:I'm wondering if this could be made up for by counting fractional steps and then rounding each period. Where the average of something like 4 cycles would be in tune effectively providing 8usec period resolution. Have a feeling a hack like that has some (audible) drawback but I don't really know.
That's actually literally how the Namco 163 works. The waveform position there is 8.16 fixed point (and the pitch is 2.16 fixed point). The SNES does something similar (pitch is 2.12 fixed point), but it adds an interpolator ("Gaussian") to reduce aliasing noise (and everything else high frequency, oops)
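
A minimal sketch of the fractional-step idea as a fixed-point phase accumulator, in Python for readability. The 31250Hz sample rate (16MHz / 512) and the 16-bit accumulator width are assumptions for the example, not the N163's or SNES's actual formats.

```python
SAMPLE_RATE = 31250            # Hz, assumed: 16MHz / 512
ACC_BITS = 16                  # assumed accumulator width for this sketch
MASK = (1 << ACC_BITS) - 1

def phase_increment(freq_hz):
    """Fixed-point amount added to the phase every sample."""
    return round(freq_hz * (1 << ACC_BITS) / SAMPLE_RATE)

def square_wave(freq_hz, n_samples):
    """One output bit per sample, taken from the accumulator's top bit."""
    inc = phase_increment(freq_hz)
    acc = 0
    out = []
    for _ in range(n_samples):
        acc = (acc + inc) & MASK            # wraps once per waveform period
        out.append(acc >> (ACC_BITS - 1))   # 0 for first half, 1 for second
    return out

def rising_edges(samples):
    """Count 0 -> 1 transitions, i.e. completed waveform periods."""
    count, prev = 0, 0
    for s in samples:
        if s and not prev:
            count += 1
        prev = s
    return count

# One second of middle C: individual periods are a whole number of samples,
# but they average out to the requested pitch.
print(rising_edges(square_wave(261.63, SAMPLE_RATE)))
```

Each period gets rounded to whole samples, which is where the aliasing mentioned above comes from.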
rainwarrior
Posts: 8731
Joined: Sun Jan 22, 2012 12:03 pm
Location: Canada

Post by rainwarrior »

Yeah, with accumulator based tones (e.g. N163, FDS, SID) it's not a divider of the clock frequency, and your low pitch precision ends up at low frequencies instead of high.

It's the inverse of the clock divider approach (e.g. 2A03, VRC6, AY).

I think the main difference is that the divider gets a lot more range with less bits in the register, and only has to increment instead of doing a full add. (Also at 31 kHz you'll have audible aliasing, but probably acceptable. No worse than N163 in 4 channel mode.)
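
To put rough numbers on the divider vs accumulator trade-off, here is a small Python comparison. It assumes the 31250Hz update rate discussed earlier in the thread and a hypothetical 16-bit accumulator. Real chips like the 2A03 divide a much faster clock, so this only illustrates the shape of the problem.

```python
SAMPLE_RATE = 31250   # Hz, assumed update rate (32usec steps)
ACC_BITS = 16         # assumed accumulator width

def divider_step_hz(period_samples):
    """Pitch gap between periods of N and N+1 samples (divider approach)."""
    return SAMPLE_RATE / period_samples - SAMPLE_RATE / (period_samples + 1)

def accumulator_step_hz():
    """Pitch gap between adjacent increments (accumulator approach)."""
    return SAMPLE_RATE / (1 << ACC_BITS)

for n in (15, 119):   # roughly 2.1kHz and 263Hz
    print(f"divider N={n}: {SAMPLE_RATE / n:.1f}Hz, "
          f"next step {divider_step_hz(n):.2f}Hz away")
print(f"accumulator: {accumulator_step_hz():.3f}Hz steps at every pitch")
```

The divider's steps get coarse at high pitches. The accumulator's step is constant in Hz, so its relative error is worst at low pitches.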

Post by infiniteneslives »

Wanted to take a min and post a bit of an update on where I'm at with this project so far.

Firstly, in STM8 news, ST outdid themselves by adding a soic-8 package version, the STM8S001 (datasheet). It's the same silicon as the STM8S003, but with only 5 GPIO. The pinout is rather interesting: it completely cuts out the /RESET pin, and many of the 5 GPIO pins have multiple GPIO bonded to the same package pin. So there's a decent number of peripherals still available despite the low pin count, but you have to be careful not to enable more than one driver for a given package pin.

Anyway, the soic-8 package gives up most of its pins and now beats out the stm8s003 in price, but it's a pretty minor difference. The soic-8 package allowed for easier migration of some of my designs from the attiny13, as the footprint didn't have to change. I included it in my latest discrete mapper board design and have started using it in production. None of that really has much to do with this project though. At most, with the SOIC-8 package only ~2 i/o pins are left over for added features, which isn't enough to do anything on the level of what the CICOprocessor is targeting. I'm using those 2 spare pins for mirroring control with a MUX, and for PRG-ROM /WE control to separate mapper writes from PRG-ROM writes without an EXP0 pin, which will be extra helpful for a 60pin famicom version I'm hoping to wrap up soon.

I successfully converted my NES CIC design over to the TIM1 counter method of keeping track of CIC cycles, as I had already done with the SNES. I also successfully combined the CIC RESET pin with KEY DOUT. It turned out that it works better to combine RESET with the KEY's data output instead of the LOCK's output. The reason is that wire ORing the pins results in a lower Voh from the original CIC in the console. In my experimentation, this reduction is more significant with the SNES CIC than the NES for whatever reason. The reduction in Voh can be enough to get too close to the Vih of the cartridge CIC for comfort. The mcu's CMOS driver has no problem driving close to the full 5v on two wire ORed pins of the console though. The only important inputs for the cartridge CIC are the RESET pulse and stream ID on LOCK DOUT. So those come in on separate mcu pins, but then the mcu drives its Dout on the combined RESET/KEY_DOUT pin.

So that's all good news for the CICOp's plan to only have 2 GPIO used for CIC RESET, KEY_IN, & KEY_OUT. The next step is to migrate CIC timing from using TIM1 clocked externally from CIC_CLK, to TIM4 being clocked asynchronously by the 16MHz HSI. Getting my drift calibration factor working will be the biggest challenge with this. But I think I can pull it off, leaving TIM1 available for scanline/CPU cycle counting.

The fact the stm8 core is always running off the internal HSI greatly simplifies communicating with it when it's plugged into something like my programmer. I've been able to implement some less impressive features I had planned. Basically the cartridge's circuit board is now serialized via the CIC with some reasonable security. I have the STM8's read out protection enabled so an external programmer can't access flash, eeprom, nor CPU registers. But it does still have access to RAM, which my programmer is able to read out via the SWIM interface. Currently when the STM8 boots up I have it copy any data that may be of interest to the bottom of the stack. So it doesn't really have dedicated RAM allocated to it, and it'll be visible so long as the stack hasn't been heavily used since boot. So I've got a string that's able to be read out by the programmer which includes things like the CIC build name and version, the PCB's version, and special text where I've placed things like "LizardV1" or "LizardV2" based on the homebrew game and its build version. For a little extra flair, I've been able to include a 96bit guaranteed unique ID number. I can't think of a great use for it, but if someone wanted to keep a registry of boards utilized for a limited edition to allow for later verification they could. I'm able to easily come up with a unique ID even though the STM8S001/3 doesn't have "96bit unique chip ID" advertised as a feature like the STM8S103. In the end the stm8s103 is the same silicon as stm8s001/3, so the unique chip ID (fablot, wafer number & X/Y location) info is all there, may as well do something potentially useful with it! :) Beyond that I do copy over a "copyright Infinite NES Lives LLC" string into RAM as well.

Coming up with detection algorithms for all the different boards I've made has always been a daunting task. So that feature is handy for the programmer to have a simple means of determining board & mapper info etc. The copyright message also provides a sizable string to read out and verify all is well with SWIM communications.

One other gotcha that I somehow didn't pick up until recently is that the "True open drain" GPIO pins lack a P driver completely. I had realized they didn't have pull-up resistors, but didn't realize they had no ability to drive the pin high until recently. So that limits what can be done with those pins a little, but shouldn't be too big of an issue now that I'm aware of it.

Moving forward, lidnariq has finally got me interested in the greenpak. Now that there's a tssop-20 package available which can also be reconfigured, I'm starting to think the SLG46824 could make for a nice pairing with the CICOprocessor. I didn't have much planned for the I2C pins on the CICOp, so giving it the ability to configure and possibly communicate with the mapper is an interesting thought. I still haven't fully wrapped my head around the greenpak's abilities, nor am I certain of its costs. It looks like the price may be comparable to the amount of logic needed for UNROM512, which means it may be within reach of this project's minimal cost goals. I messaged Dialog for a quote and will probably pick up a devkit to tinker around with soon.

Post by infiniteneslives »

Realized that earlier last month was the official 3 year anniversary of this thread/idea, and thought it would be a good time to give an update on my progress over the past couple of months. I never forgot about this whole idea, but without adequate motivation it ended up mostly falling off the back burner... My recent progress on a WiFi NES cartridge on my weekly twitch streams has renewed my motivation for the project though.

I've completely redone my STM8 CIC code base to better support the CICOprocessor approach. The STM8 preprocesses a buffer of CIC stream data so that CIC calculations place minimal urgent processing demand on the STM8 microcontroller. I'm currently buffering the data stream & mangle time for the next 8 CIC transfers, which average ~2msec each. That buffer could be quite a bit larger, but I figure ~1 NTSC frame should be adequate. I am still using the most capable timer counter (TIM1), which is the only one that can be clocked externally (currently by the CIC 4MHz CLK). I haven't yet migrated to the lesser TIM4, which can't be clocked externally, but I'm pretty confident I can make that happen soon now that I've got this buffered CIC stream solution. After that step, I can start tinkering with a scanline/cycle counter using TIM1 clocked by a connector pin.

This buffered CIC solution also resolves some of the challenges I saw early on in this project in trying to ensure the 6502 can't overload the STM8 to the point where it neglects CIC transfers and causes the console to reset. The STM8 can keep an eye on the buffered CIC data, and if it's getting low due to stolen CPU time, it can disable external interrupts, ignoring the 6502, and focus STM8 compute resources on refilling the CIC data buffer. From the 6502 side, the transfer operation shown below will just end up spinning for longer while retrying, or it could maintain a retry counter, time out, and come back later. IRQs would be another option to let the STM8 signal to the 6502 that it is ready for another (potentially large block) transfer.

Hardware wise, things currently look pretty similar to original idea of a single discrete mapper bit being used as an interrupt for the STM8 to directly sniff & output values on the 6502 data bus. I'm currently using bit 7 of a BNROM style mapper with 32KB of CHR-RAM, but idea of using any discrete mapper register bit is still viable.

The way I have things set up, the STM8 will disable its external interrupt from the mapper bit when it's ~15usec away from needing to set its CIC Dout pin. I've kept the STM8 isr for that mapper bit under 15usec so the STM8 will only service the 6502 when it's known to be "safe" and there are no upcoming CIC bit transfers. If the 6502 tries to communicate with the CICOprocessor when its gpio interrupt is disabled, the 6502 will get an open bus response & know it should retry. I've got a test rom running that performs 1 transfer per NTSC frame, and allows me to measure if there were any failures in the transfer and how many retries were needed. Mostly for exercise, the STM8 then dumps the data it got from the 6502 to an ESP-01 WiFi module via serial, which then gets sent to a PC 'server' on my network.
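
The gating rule described above boils down to a single comparison. Here it is as a tiny Python model rather than real STM8 code. The function and argument names are hypothetical, and the timestamps are simply microseconds on some shared clock.

```python
GUARD_US = 15  # guard window from the post, in microseconds

def mapper_irq_enabled(now_us, next_cic_output_us, guard_us=GUARD_US):
    """True while there is more than the guard time left before the next
    CIC Dout update, so servicing the 6502 can't delay that update."""
    return (next_cic_output_us - now_us) > guard_us

print(mapper_irq_enabled(0, 100))    # plenty of time: service the 6502
print(mapper_irq_enabled(90, 100))   # too close: 6502 sees open bus, retries
```

This only works as long as the mapper-bit ISR itself is guaranteed to finish inside the guard window.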

I've got things currently optimized as much as I think is possible. My test rom is reporting that 2-3% of the CICOp transfers attempted by the 6502 need a retry because open bus was sensed. Prior to implementing my buffered CIC data stream, the number of times multiple retries were needed (more than 1) was infrequent, but it still occurred a few frames per min. Sometimes as many as 16 retries were needed.. With my current buffered solution, the STM8 can give CICOp transfers much higher priority, and with ~14hrs of testing now (over 3million transfers) none of the communications required a second retry. There were also zero failed communications during that time. So things are looking pretty good with my current setup.

I've completely redone my 6502 CICOp communication routine. The old one I posted here was overly limited in its throughput due to being run from ROM. Executing from RAM has a pretty significant benefit here, because the CICOp can see the lower nibble of the 6502 data bus *every* cycle when it's listening (during its transfer interrupt routine). Aside from the mapper bit that interrupts the CICOp to start the transfer, the CICOp isn't truly memory mapped anywhere. The only real requirement is that the 6502 reads from otherwise unmapped memory space. $5000-$5FFF is perfect for this, but in reality, the CICOp has no knowledge of CPU A15-12, nor A7-4. Since the CICOp always sees the lower nibble of the 6502 data bus, it can effectively sense the lower nibble of the opcode the 6502 is running, and for absolute addressing modes, it can effectively sense CPU A11-8 and A3-0.
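
To make the "lower nibble every cycle" point concrete, here is a small Python model of what the CICOp would capture while the 6502 fetches a 3-byte absolute instruction. The function name is made up. The only hardware facts used are the 6502's opcode, LSB, MSB fetch order and the $AE opcode for LDX absolute.

```python
def sniffed_nibbles(opcode, address):
    """Lower nibbles seen on D3-0 during the three fetch cycles of an
    absolute-addressed instruction: opcode, address LSB, address MSB."""
    lsb = address & 0xFF
    msb = (address >> 8) & 0xFF
    return [opcode & 0x0F, lsb & 0x0F, msb & 0x0F]

# LDX $5C03 is encoded as AE 03 5C
print([hex(n) for n in sniffed_nibbles(0xAE, 0x5C03)])  # ['0xe', '0x3', '0xc']
```

The LSB nibble corresponds to A3-0 and the MSB nibble to A11-8, which is why the upper address bits can be anything that lands in unmapped space.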

Here's the current 6502 <-> CICOp communication (data transfer) routine:

Code:

cicop_comm:
	;trigger CICOp interrupt to listen to 6502 data bus by toggling CICOp mapper bit high then low
        ;default is bit clear so bankswitching routines can more easily avoid
        @lda_label_set:
        LDA     #$8F    		;b7-CICOp INTERRUPT(TLI) set, b5&4=0chr bank, b3-0=prg bank
        STA     @lda_label_set+1	;write to the rom value, support bus conflicts if the bank this code came from is present
        ;CICOp is now process of getting interrupted, and will be listening to CPU D3-0 for next ~dozen cycles
        @lda_label_clr:
        LDA     #$0F
        STA     @lda_label_clr+1	;clear CICOp interrupt pin, return to default state

	;3 instructions that actually transfer data between 6502 <-> CICOp
        LDX     $5C03   ;$AE LSB MSB    write 2 nibbles, read 1 nibble
        LDY     $5108   ;$AC LSB MSB    write 2 nibbles, read 1 nibble
        LDA     $550A   ;$AD LSB MSB    write 2 nibbles, read 1 nibble

	;figure out if CICOp was listening, protocol defines CICOp replies with something besides open bus for last LD* instruction
        CMP     cicop_lda_msb	;variable name for MSB of last LD* instruction above (what would get loaded if open bus)
        BEQ     cicop_comm	;retry the transfer
        RTS
        ;return: CICOp reply is in lower nibble of A, X, & Y
In this example, the 6502 sends $C & $3 with the LDX, $1 & $8 with LDY, and $5 & $A with LDA. The CICOp would have to reply with something besides $5 in the final load (value left in Accumulator) so that open bus could be sensed by the 6502 and trigger a retry.
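
The retry check can be modeled in a few lines of Python. This assumes the CICOp only drives D3-0, so the upper nibble of the loaded byte stays at its open bus value. That part is an assumption, as are the helper names.

```python
def open_bus_value(address):
    """Byte an absolute load gets back from unmapped space: the last value
    on the bus, which is the instruction's address MSB."""
    return (address >> 8) & 0xFF

def needs_retry(a_value, last_load_address):
    """True if A holds pure open bus, meaning the CICOp wasn't listening."""
    return a_value == open_bus_value(last_load_address)

def reply_nibble(a_value):
    """The CICOp's reply lives in the lower nibble."""
    return a_value & 0x0F

print(needs_retry(0x55, 0x550A))   # True: nobody drove the bus
print(needs_retry(0x53, 0x550A))   # False: CICOp replied with $3
```

This is why $5 is the one reply value the CICOp can't use for the final load at $550A.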

While that code can execute from ROM, it's best executed from RAM, so that the calling routine can load the data it would like to send to the CICOp into the lower nibbles of the LD* address MSB & LSB. The sequence of LDA/X/Y doesn't really matter, and sending in a different order could be used to communicate more info to the CICOp since it captures the lower nibble of those 3 opcodes. Technically some/all of the LD* could be replaced with ST*, allowing more data to be provided to the CICOp. Ultimately, the last instruction will likely have to be a load to allow for verifying that the CICOp was listening to the transfer and the 6502 doesn't need to retry. Changing loads to stores would require first getting the CICOp into a different mode though, so it can modify its interrupt routine to expect to read the data bus on the last cycle instead of driving data on the bus.

So as a result the minimum data the 6502 sends to the CICOp is 6nibbles per transfer, and receives 3 nibbles as a reply. Here's an example of how a calling function running from ROM could look that's modifying the code above in RAM before calling it:

Code:

cicop_transferAXY:
        ;A
        sta     cicop_AL	;don't care about upper nibble of LSB
        lsr
        lsr
        lsr
        lsr
        ora     #$50  ;need to point somewhere unmapped, $5000-$5FFF is perfect
        sta     cicop_AH

        ;X
        txa
        sta     cicop_XL
        lsr
        lsr
        lsr
        lsr
        ora     #$50 
        sta     cicop_XH

        ;Y
        tya
        sta     cicop_YL
        lsr
        lsr
        lsr
        lsr
        ora     #$50 
        sta     cicop_YH
        
        jsr	cicop_comm
        rts			;CICOp reply is still in lower nibble of A, X, & Y
That routine isn't exactly quick, but it does pack near maximum data into the single communication. With the current implementation of the CICOp only having visibility of the lower nibble of 6502 data bus, there is benefit in structuring data being sent in nibbles to minimize the swapping/shifting.
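
For reference, the operand patching done by that routine looks like this when modeled in Python. `patch_operands` and `payload_nibbles` are made-up names. The $50 base matches the `ora #$50` in the routine.

```python
def patch_operands(value):
    """Return the (LSB, MSB) operand bytes written into the RAM copy of a
    LD* instruction to carry one byte of payload."""
    lsb = value & 0xFF            # upper nibble is don't-care to the CICOp
    msb = 0x50 | (value >> 4)     # high nibble of payload, kept in $5xxx
    return lsb, msb

def payload_nibbles(value):
    """What the CICOp actually captures: (high nibble, low nibble)."""
    lsb, msb = patch_operands(value)
    return msb & 0x0F, lsb & 0x0F

lsb, msb = patch_operands(0xC3)
print(f"LDX ${msb:02X}{lsb:02X}")        # LDX $5CC3
print(payload_nibbles(0xC3))             # (12, 3)
```

The four LSRs per byte are the slow part, so data already split into nibbles avoids most of the work.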

Anyway, that's my latest report. Hoping to have more progress to share soon. For the near term I'm focusing on supporting communications between the 6502 & ESP8266 WiFi module, so the CICOp is effectively just acting as a UART peripheral for the 6502. Data rate would be pretty slow, but I'm also thinking it would be interesting to have a microSD card connected to the CICOp. I'm hoping different connectivity options like WiFi & SDcard will aid in development while emulator support is still minimal/non-existent. Implementing those items will also prove useful for determining what the maximum data throughput/speeds will be. I'm not expecting to be amazed, but for truly minimal hardware of a single mapper bit, I expect it to be respectable. It should outpace a controller port interface by a fair margin.

I still have a significant hurdle ahead in migrating CIC timing to an asynchronous timer counter, but I paved the road for this step with my recent CIC data buffering and am pretty confident I can get that to work now. Looking forward to spending more time focusing on some of the desired features like audio synthesis, scanline/cycle counters, IRQs and such. But I had to get to this point of baseline communications working first.

Post by infiniteneslives »

Well, before taking the next steps, I decided to go ahead and investigate why my current implementation wasn't working on some clones including the retronHD & AVS. A couple of different things make executing from mainboard SRAM difficult or impossible depending on the system. Just wanted to share my findings & how I decided to sidestep them in an attempt to improve compatibility on various systems.

The AVS was a relatively easy fix. The problem is that the AVS appears to enable its 6502 data bus output drivers at the beginning of the CPU cycle while M2 is still low, where original consoles leave the databus open/HiZ until M2 goes high. That causes the transition time of all values to move earlier on the AVS. I was able to correct for it by shifting the STM8's sampling of the databus earlier. But finding a timing that worked consistently well on both the AVS & original consoles was difficult. I thought about having the STM8 tune itself to the databus edges on powerup. While that would make things work, one could argue that perhaps the burden should be on the AVS to be truer to original hardware with firmware updates. In the end, this was a minor issue compared to what I found on some clones, including relatively popular ones like the retronHD.

I ended up learning that some clones like the retronHD don't output mainboard SRAM values to the cartridge connector. Guess designers figured the cart doesn't need to see that and didn't enable output drivers...

For example the following code executing from SRAM on a retronHD:

Code:

LDA #$FF  
;I didn't measure, but I presume cpu data bus @ cart connector appears openbus, presenting whatever previously drove the bus

STA $5000 
;retronHD: cpu databus does get driven to $FF on last cycle like would expect (no surprise there)
;	but first 3 cycles are open bus from previous instruction on clones like retronHD
;original console: of course has main board SRAM wired to cartridge connector
;	so first cycle see opcode on data bus, then LSB, then MSB, followed by 6502 driving contents of A

LDA $5000
;retronHD: all 4 cycles appear open bus maintaining value from last cycle the 6502 drove A contents to databus
;	at the cartridge connector, only see 4 cycles of $FF from when A was driven on the bus last cycle
;original system: the value seen @ cartridge connector changes from opcode, LSB, MSB each cycle with open bus on last cycle
I didn't think to check what actually gets loaded into the Accumulator, I presume that retronHD will actually load the LDA MSB ($50) just like originals. The open bus mechanics still working, but only on die, not externally @ the cartridge connector...
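
The difference described in those comments can be modeled in Python as a per-cycle trace of what the cartridge connector sees. This just restates the observations above (some of which were presumed rather than measured). The function and its parameters are hypothetical.

```python
def connector_trace(fetch_bytes, cpu_driven, previous, ram_visible):
    """Values seen at the cartridge connector, one per CPU cycle.

    fetch_bytes: bytes fetched from mainboard SRAM (opcode, LSB, MSB)
    cpu_driven:  byte the 6502 drives on the last cycle (a store), or
                 None for a load from unmapped space (open bus)
    previous:    last value that was actually driven onto the external bus
    ram_visible: True for original consoles, False for the clone behavior
    """
    trace, last = [], previous
    for b in fetch_bytes:
        if ram_visible:
            last = b          # SRAM read shows up at the connector
        trace.append(last)    # otherwise the bus just holds its old value
    if cpu_driven is not None:
        last = cpu_driven
    trace.append(last)
    return trace

sta = [0x8D, 0x00, 0x50]   # STA $5000 with A = $FF
lda = [0xAD, 0x00, 0x50]   # LDA $5000
print([hex(v) for v in connector_trace(sta, 0xFF, 0x00, True)])
print([hex(v) for v in connector_trace(sta, 0xFF, 0x00, False)])
print([hex(v) for v in connector_trace(lda, None, 0xFF, False)])
```

On the clone model, the CICOp never sees the opcode or address bytes at all, so there is nothing for it to decode.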

So my previous plan to execute from main board SRAM just plain won't work on some clones because they're dumb about nuances of HW like this..

Executing from cartridge memory sidesteps this issue, since of course the value of cart memory gets seen by the CICOprocessor each cycle. My idea could still work executing from SRAM, but the SRAM would have to be on the cartridge as PRG-RAM. This project assumes minimal hardware and no such SRAM currently, so executing from PRG-ROM flash is the only viable option. Deciding to move in this direction also obviates most of the differences with the AVS, because the cartridge is only driving the databus when /ROMSEL is low (M2 high). The benefit of executing from cartridge PRG-ROM flash is that it's something within our control & makes most consoles behave nearly identically.

The drawback of executing from PRG-ROM flash is that I can't push as many nibbles in each transfer. But from a 6502 programming perspective, it does simplify things a bit. If you want to transmit different opcodes/MSB/LSB to the CICOprocessor, you need separate CICOprocessor read/write functions for each combination. You would need far too many functions to cover all values of the LSBs & MSBs in the routine from my previous post. But if we think of things more traditionally, as registers where we write a single byte to a specific address, it's not too bad. In that light, the only 'cost' of adding more registers to the CICOp is the couple dozen bytes of PRG-ROM it takes to read/write them. So I'll probably adopt the term registers, and they'll look and feel like the addressable mapper registers we all know and love, but actually it's a bit of a lie in terms of how the hardware works.

I also needed to tackle the problem of sending data in both directions, since I can't rely on the LDA/X/Y address to be the arbitrary data sent to the CICOp; the data needs to come from the A/X/Y registers via STA/X/Y. The best way I could think of to do this while keeping the CICOp transfer routine as short as possible is to have 2 separate read & write functions. But the CICOp needs a way to know if a transfer is a read or a write, and the best way I could think of is to use the ordering of X/Y opcodes to indicate a read or write transfer.
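
One way the ordering trick could be decoded on the CICOp side, sketched in Python. It relies on the absolute-mode X opcodes (STX $8E, LDX $AE) ending in nibble $E and the Y opcodes (STY $8C, LDY $AC) ending in $C. Treating "X first" as a write and "Y first" as a read follows the convention used in this post. The function name is hypothetical.

```python
STX_ABS, STY_ABS = 0x8E, 0x8C   # 6502 absolute-mode store opcodes
LDX_ABS, LDY_ABS = 0xAE, 0xAC   # 6502 absolute-mode load opcodes

def transfer_kind(first_opcode):
    """Classify a transfer from the low nibble of its first opcode,
    which is all the CICOp can see of it."""
    low = first_opcode & 0x0F
    if low == 0x0E:
        return "write"    # X opcode first
    if low == 0x0C:
        return "read"     # Y opcode first
    return "unknown"

print(transfer_kind(STX_ABS))  # write
print(transfer_kind(LDY_ABS))  # read
```

Since only the low nibble is visible, the CICOp can't tell STX from LDX directly. The ordering is what carries the direction.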

Anyway, I've rewritten my STM8 code to support these new requirements. It takes quite a while to get the timing and everything exactly right, especially with a 3 stage cpu pipeline. Hopefully this method will just work well enough and I won't have to change things yet again..

Example of a CICOp "register" write routine:

Code:

;input args= byte that would like to write is split between lower nibble of X & Y registers on entry
cicop_reg_wr_byte:
        @lda_label_set:
        LDA     #$8F
        STA     @lda_label_set+1
        @lda_label_clr:
        LDA     #$0F
        STA     @lda_label_clr+1	;clear CICOp interrupt pin, return to default state

	;3 instructions that actually transfer data between 6502 <-> CICOp
	;X opcode first: indicates to CICOp that 6502 is writing to a register
        STX     $5C03   ;$8E LSB MSB    write lower nibble of X to register
        STY     $5108   ;$8C LSB MSB    write lower nibble of Y to register
        LDA     $550A   ;$AD LSB MSB    read 1 nibble status (indicates if CICOp was listening & other info)

	;figure out if CICOp was listening, protocol defines CICOp replies with something besides open bus for last LD* instruction
        CMP     cicop_lda_msb	;variable name for MSB of last LD* instruction above (what would get loaded if open bus)
        BEQ     cicop_reg_wr_byte	;retry the transfer
        RTS
;return: CICOp status is in lower nibble of A
Similarly, here's an example of a CICOp "register" read routine:

Code:

;input args= none
cicop_reg_rd_byte:
        @lda_label_set:
        LDA     #$8F
        STA     @lda_label_set+1
        @lda_label_clr:
        LDA     #$0F
        STA     @lda_label_clr+1	;clear CICOp interrupt pin, return to default state

	;3 instructions that actually transfer data between 6502 <-> CICOp
	;Y opcode first: indicates to CICOp that 6502 is reading from a register
        LDY     $5C03   ;$AC LSB MSB    half of register byte value in lower nibble of Y
        LDX     $5108   ;$AE LSB MSB     other half of register byte value in lower nibble of X
        LDA     $550A   ;$AD LSB MSB    read 1 nibble status (indicates if CICOp was listening & other info)

	;figure out if CICOp was listening: protocol defines that CICOp replies with something besides open bus for the last LD* instruction
        CMP     cicop_lda_msb	;variable holding MSB of last LD* instruction above (what would get loaded if open bus)
        BEQ     cicop_reg_rd_byte	;open bus seen, retry the transfer
        RTS
;return: CICOp register value is split between lower nibbles of X & Y, CICOp status is in lower nibble of A

The CICOp captures the lower nibble of the opcode, address LSB, address MSB, & X/Y registers. Since the code is running from PRG-ROM, the only thing that can really change is the written byte in the X & Y lower nibbles. I would have to provide a different write routine to write to a different 'register'. What number a specific register gets is a bit arbitrary, but we're effectively providing the CICOp with 6 nibbles of "register address" through the STX/Y & LDA address MSB & LSB ($C3185A in the examples above). I don't expect to be able to really make use of this "24bit addressing" when executing from PRG-ROM, because it would need 16 million different functions to address them all. I'll probably just reserve that for consoles which can execute from mainboard RAM, or a NES cartridge with PRG-RAM.

In the end, with this setup, adding more "registers" is cheap considering it only "costs" 25 bytes of PRG-ROM. Depending on how I define things, we'll be able to do quite a bit with a single "CICOp register", even though we can only read/write 1 byte per call to these 2 routines. I expect there will be a benefit to adding more registers (new copies of the read/write routines written with different addresses) to create special modes of operation, or maybe to access completely different functions of the CICOp. Perhaps one set of registers for audio synthesis, another for WiFi/UART communication, yet another for scanline counter updating, etc. So when it makes sense from the 6502 code perspective, I'll add more registers. I don't expect they'll all need to be "readable", so a single "write to audio register" routine would suffice.
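As a quick sanity check on the "register address" idea, here's a small Python sketch that packs the lower nibbles of the three instruction addresses the same way as the $C3185A example. The function name is hypothetical, and the nibble order (MSB nibble first, then LSB nibble, for each instruction in turn) is inferred from that example rather than from the actual STM8 code.

```python
def register_address(instr_addresses):
    """Pack the low nibble of each address MSB and LSB into one register number."""
    value = 0
    for addr in instr_addresses:
        msb, lsb = addr >> 8, addr & 0xFF
        # Only the lower nibble of each address byte is assumed visible to the CICOp.
        value = (value << 8) | ((msb & 0x0F) << 4) | (lsb & 0x0F)
    return value


# Addresses of the STX/STY/LDA (or LDY/LDX/LDA) instructions in the routines above.
example = register_address([0x5C03, 0x5108, 0x550A])
print(f"${example:06X}")  # $C3185A

# The upper nibbles are dropped, so the same routines placed at $6xxx look identical.
print(register_address([0x6C03, 0x6108, 0x650A]) == example)  # True
```

Three instructions times two nibbles each gives the 24 bits, and because the upper nibbles never reach the CICOp, the choice of $5000-5FFF versus some other open bus range doesn't change the register number.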

This approach is about as bulletproof as I can make it in terms of compatibility. I'm going to update my NES test rom & ESP code to log failures/errors and run a series of overnight tests on all the consoles I own to prove out the STM8 code, then I'll start on the next step.

One last aside: I did have one ultra cheapo famiclone I purchased from aliexpress a couple years back. The thing is pretty close to garbage, so I don't much care about it, but I am curious about how things behave on all the consoles I own. Turned out even this approach didn't work on it. My test rom running the read routine was always returning with $40 in all regs A/X/Y, whereas the open bus value for failed reads would be the address MSB. So it seems the console won't actually read from the cartridge for the $5000-5FFF address range. It's one of those consoles that comes with a single flash chip cartridge with 300 games included that probably runs off CHR-RAM in the console? Maybe they put some registers in that space and won't read from the cartridge? I really don't know what's going on here, but it's interesting. In the end, the CICOp can't actually see the upper nibble of the address; I'm only using $5000-5FFF because it should be open bus. I'm pretty sure this console works with games that have PRG-RAM in $6000-7FFF, so an easy fix to improve compatibility on even the junkiest of consoles would be to "move" these CICOp registers to $6000-7FFF. In the end it's arbitrary, but I was planning to reserve that space for an improved version that included PRG-RAM.

I haven't tested that yet. I do have a couple other clone consoles which aren't working, with differing responses. I haven't dug into the source of the problems there, but from what I'm seeing I don't expect it to be related to the definition of these transfer routines. I'm not sure I even really care that much about getting it working on them because they're so cheap. Most of my consoles are working with this setup, so it's good enough to move forward with more interesting things.
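For what it's worth, here's a tiny Python model of the CMP/BEQ "was the CICOp listening" check from the routines above, just to show why the $40 result slips through. The function name is made up for illustration. It only assumes what the routines already do: compare A against the MSB of the final LDA address, which is what an open bus read would return.

```python
def cicop_was_listening(a_value, lda_address):
    """Mirror of CMP cicop_lda_msb / BEQ retry: True means the routine returns."""
    open_bus_value = lda_address >> 8  # open bus read returns the address MSB
    return a_value != open_bus_value


LDA_ADDRESS = 0x550A  # final LDA in the read/write routines

print(cicop_was_listening(0x55, LDA_ADDRESS))  # False: open bus, routine retries
print(cicop_was_listening(0x40, LDA_ADDRESS))  # True: clone's $40 passes the check
```

So on that clone the routine doesn't retry forever, it just returns with $40, because whatever is answering in $5000-5FFF is neither open bus nor the CICOp.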

[EDIT: after writing this I was curious to see if the el-cheapo aliexpress clone (the one that has 500 games built into it without inserting any cartridge) works when "moving" the registers to $6000-7FFF, and it does. So that seems to confirm some super cheap clones don't allow the cartridge to map to the $5000-5FFF address space. I'm thinking I'll just use $6000-6FFF for now in the name of compatibility, since it doesn't cost anything and I don't expect to have PRG-RAM. A more capable cart that doesn't care about maximum compatibility could ditch this and move the CICOp registers back to $5000. It really doesn't change the CICOp at all since it can't see the upper nibble of the address MSB & LSB. The 6502 code just has to read/write somewhere that's open bus.]

[EDIT 2: curiosity got the best of me after figuring out the problem on that one clone, so I decided to investigate the 3 others I was having issues with. 2 of them had the exact same problem/indication: the CICOp just plain wasn't responding. It took me a bit to figure out why, as I couldn't even get debug signals out of the STM8. Turns out those 2 clones just happen to ground EXP5 (apparently not the other EXP pins though, based on DMM testing while powered off). That just happens to be the pin I chose to utilize for the STM8 reset pin (which I don't even really need), so the STM8 was held in reset the whole time. Breaking that connection fixed both those clones. The 3rd clone I was having issues with was a battery powered handheld. The CICOp was responding as expected, but the ESP8266 I've got connected for 'debug print' statements didn't seem to be powering up. I suspected it was a power issue & confirmed it. The handheld only generates 3.6v on Vcc (might be better if I wasn't using rechargeable batteries). After the 3v regulator for the ESP I was only getting 2.9v with no load, and 2.4v when trying to run the ESP. That wasn't quite enough to get the ESP to boot as it expects 3.3v; it probably has brown out detection, or is failing because I'm overclocking to 180MHz? Anyway, bypassing the regulator and simply powering the ESP from the 3.6v Vcc fixed everything. So that was more of an ESP/WiFi issue than a CICOp one, but I already knew that beforehand.

So with that, I've got every NTSC NES/famicom console I own working with the CICOprocessor. That's 4 handhelds, 3x cheapo aliexpress clones, retronHD, AVS, front & top loader NTSC NES, orig fami, twin fami, & AV fami. I haven't written a PAL timing version yet, but I'm expecting that won't be much trouble due to the slower CPU clock. I'll test it on my PAL front loader & the AVS in PAL mode when the time comes. Going to call it good after running long term tests and start working on the fun stuff...]
If you're gonna play the Game Boy, you gotta learn to play it right. -Kenny Rogers