Adding features to discrete mapper with multipurposed CIC

Re: Adding features to discrete mapper with multipurposed CI

Post by infiniteneslives »

Made a little more progress!

Mostly have pinout nailed down for my first board version. Came to realization that I really just need two STM8 footprints on the board for now. Reason being that for my standard discrete 'non-CICOp' boards the goal is to have the STM8 increase user friendliness with fewer jumpers for things like PRG-ROM and CHR-RAM/ROM size trimming. Where the CICOp demands pins like CPU D0-3, and signals useful for all the hoped features. So it looks like I've got enough room for the two separate footprints with their varying pinouts. One perk of that is that it creates a bit of a backup plan if things go south with implementation of async CIC timing using TIM4 alone. Could actually have two STM8 on board, one for acting as a CIC, and the other as a dedicated co-processor. Certainly doesn't meet the goal of minimal hardware if it actually comes down to that last resort. But considering the STM8 is one of the lowest cost mcus on the market having a second one if it's actually getting well utilized isn't that crazy and much cheaper than a CPLD.

So here's the planned pinout:
PA1 & PA2: CIC Din, Dout, and wire ORed CIC Reset

PA3: CPU R/W

PB5: PRG-ROM /WE pin to support flash writes without '139 logic gate on UxROM (jumper makes optional for alt func)
PB4 & PB5 alternate function: I2C bus available pinned out to female header. Could support RTC, etc.

PC3 TLI: this is an external NMI pin for the STM8, using this for the mapper bit so any other PORTC pins can trigger interrupts with a separate isr. Using TLI gives a little extra insurance that CICOp register read/writes have highest priority.

PC4 & PC5: TIM1 & TIM2 output channels. Came up with nifty way to arrange four SMT resistor pads in a square with the mcu pins in opposite corners. IRQ and PWM DAC signals are in other two corners. So placing the resistors horizontally will map TIM1 to 6502 /IRQ (support scanline/CPU cycle counting), and TIM2 to PWM DAC (edge aligned PWM). Mounting resistors vertically maps TIM1 to PWM DAC (center aligned), and TIM2 to /IRQ (async timer only). I was about to give up on option for center aligned PWM DAC till I realized this trick and 0ohm resistor for /IRQ kept routing simple.

PC6: TIM1_CH2 6502 m2 clock for cycle counting with TIM1
PC7: TIM1_CH1 PPU clock, jumper selectable between (default) PPU A12 and (alt) PPU /RD

PD1-4: CPU D0-3

PD5: H/V mirroring control with small MUX gate
PD6: Debug output trying to fit in an LED if I can
PD5 & PD6: can be dual purposed as they're routed to female header for UART to support low cost BT/WiFi modules to be added on.

So in the end giving up on the SPI bus in favor of scanline/CPU cycle counters gave some breathing room for the pinout. I'm not even sure if I need CPU R/W, but there's not much benefit to leaving it out as there are already pins to spare.

Current layout supports CICOp on a decent variety of discrete mappers. Really the only thing that's needed is a spare mapper bit to interrupt the STM8 for CICOp register access. Planning support for UxROM w/512KB PRG-ROM or less, BxROM with 512KB PRG-ROM or less, CNROM with 64KB CHR-ROM or less, and Colordreams with 256KB PRG-ROM or less (or 64KB CHR-ROM or less).

Haven't even started laying traces for the board yet, but current rat nest and component density looks manageable.

As with most things like this I typically realize dumb i/o assignment choices when I actually start writing firmware for the design. So I whipped up a little prototype board with the STM8 on a breakout board. Wrote a little NES test rom and the CICOp register access isr and have successfully transferred data between the 6502 and CICOprocessor!

Realizing some flaws and limitations to my original proposal:

Code:

;now that everything's prepared, perform the mapper write:
STA $8000   ;write to discrete mapper with bit 7 set
STY $5x0x   ;write lower nibble to mcu
STX $5x0x   ;write upper nibble to mcu
;write complete, now read back the old value that was in the mcu register
LDY $5x0x   ;read old value from mcu register (lower nibble)
LDX $5x0x   ;read old value from mcu register (upper nibble)
AND #$7F    ;clear bit 7 so we can disable mcu's interrupt
STA $8000   ;write to discrete mapper with bit 7 clear, CIC mcu interrupt complete

26 cycles of timing sensitive code total
Firstly, the STY $5x0x and STX $5x0x opcodes aren't the best choice because they don't make the CICOp register being accessed variable unless the routine is self modifying.

Secondly it's limiting because we're running out of registers in this routine. I'm trying to keep it as short as possible, and using A to maintain the current bank is a bit of a waste of instruction time and 6502 registers.

Lastly I had assumed the mapper wasn't subject to bus conflicts. But ensuring that may require extra hardware on board, so coming up with a bus conflict compatible routine is helpful. I got around this by simply requiring that a specific bank is always active during this routine, so these values can now be hard coded and align with a bank table.

Remember the CICOp doesn't decode CPU addresses, it's merely snooping the CPU bus during opcode fetching. So really for the 6502 to give data to the CICOp we only need an instruction that presents info on the CPU data bus at some point. Something like STA $5000, X with the register offset in X doesn't work as X is never present on the CPU data bus. Looking at other options I landed on ZP addressing with STA (ZP), Y. This works because the 6502 fetches the ZP bytes from sram so we can sniff their value to glean which CICOp register is being accessed. And while it appears annoying that Y gets consumed/zeroed, its value is actually moot. So Y can be any value as the CICOp can't even see it.
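To make the sniffing idea concrete, here is a rough Python model of the data bus during STA (ZP),Y followed by a zero page STX. The per-cycle bus contents follow the standard documented 6502 behaviour for those two instructions; the function names and the way the nibbles are recombined are only assumptions for illustration, not the actual STM8 firmware.

```python
# Hypothetical model: which values show up on the CPU data bus, cycle by
# cycle, and what a device wired only to D0-3 could recover from them.

def sta_ind_y_bus(zp_addr, zero_page, a):
    """Data bus for the 6 cycles of STA (zp),Y. None marks the dummy read."""
    ptr_lo = zero_page[zp_addr]
    ptr_hi = zero_page[(zp_addr + 1) & 0xFF]
    # opcode, operand, pointer low, pointer high, dummy read, write of A
    return [0x91, zp_addr, ptr_lo, ptr_hi, None, a]

def stx_zp_bus(zp_addr, x):
    """Data bus for the 3 cycles of STX zp: opcode, operand, write of X."""
    return [0x86, zp_addr, x]

def sniff(bus):
    """Recover (register number, data byte) from low nibbles only."""
    low = lambda value: value & 0x0F
    reg = (low(bus[3]) << 4) | low(bus[2])   # $5x0x pointer: high byte, low byte
    data = (low(bus[8]) << 4) | low(bus[5])  # X carries data H, A carries data L
    return reg, data

# Example: register $2A, so the ZP pointer holds $520A; data byte $C7.
zero_page = {0x04: 0x0A, 0x05: 0x52}
bus = sta_ind_y_bus(0x04, zero_page, a=0x07) + stx_zp_bus(0x04, x=0x0C)
print([hex(v) if v is not None else "??" for v in bus])
print(tuple(hex(v) for v in sniff(bus)))
```

Y never appears anywhere in the bus list, which is the point made above: the indexed address only shows up on the address bus, so its value can't matter to the CICOp.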

Having learned a bit more about STM8 interrupts, I also realized that the TLI interrupt is edge sensitive, not level sensitive. So the priority I had to clear $8000.7 mapper bit ASAP is of no value. We can simply clear, then set the TLI mapper bit to create the needed rising/falling edge for the interrupt.

So with all that, this is what I came up with:

Code:

       ;cicop_reg is a ZP pointer that needs to be initialized to $5x0x
       ; where the 'x' values denote the CICOp register number being accessed
       ; the '5' and '0' in $5x0x are actually "don't care" as the CICOp can't see their values.  
       ; The address in cicop_reg just needs to be an empty address space that won't conflict with anything else.
       ;Lower nibbles of A and X must contain the value (byte) that is desired to be written to the CICOp register

        ldy     #CICOP_BANK_DIS
        sty     CICOP_ADDR_DIS  ;8C 00 C0
        ldy     #CICOP_BANK_EN  ;A0 80

        ;trigger CICOp to start transfer operation:
        sty     CICOP_ADDR_EN
        ;8C 80 C0

        ;first allow CICOp to sniff reg number, and write low nibble of data
        sta     (cicop_reg), y          ;y doesn't actually matter CICOp can't see it
        ;91 04

        ;now the CICOp has register H:L from ZP sniffing, and data L from A lower nibble

        ;write upper nibble, contained in lower nibble of X
        stx     cicop_reg               ;doesn't actually matter what ZP byte is written to
        ;86 04
        ;cicop_reg gets stomped could use different ZP byte to avoid this

        ;now the CICOp has data H from sniffing store of X to ZP

        ;now time to read data from CICOp
        ;the data is returned in specific order,
        ;the CICOp doesn't know what register (X,Y,A) the 6502 is loading into

        ldx     CICOP_PORT      ;data L
        ;AE 00 50
        ldy     CICOP_PORT      ;data H
        ;AC 00 50
        lda     CICOP_PORT      ;data E (error/verification)
        ;AD 00 50

;END timing sensitive code.  The CICOp has now freed itself to go back to whatever it was doing
;don't actually have to clear mapper enable bit.  CICOp won't trigger until next rising edge.

25 cycles total of timing sensitive code.
So by getting a little tricky with choice of instructions I was able to reduce the number of timing sensitive cycles by 1 cycle, while also transferring 1 more nibble of 6502 read data! Granted that cycle count doesn't include all the preparation to get all the data loaded into ZP, A, & X, and also verify the transfer was successful. But those portions can be tailored/optimized by the user if desired. Would be reasonable to have separate read/write routines on the NES. Also possible some CICOp registers could be defined to have fewer read nibbles, and the STM8 would simply bail from the isr once write data was received depending on the register number. Or perhaps some registers are read/write but the actual value doesn't matter, in the case of something like an IRQ acknowledge/clear register for example.
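For anyone wanting to check the two totals, here is a small tally in Python using the standard 6502 base cycle counts. The choice of which instructions fall inside the timing sensitive window is an assumption: the first routine is counted from the initial STA $8000, and the second from the trigger write (sty CICOP_ADDR_EN), which is what reproduces the 26 and 25 cycle figures.

```python
# Standard 6502 cycle counts for the addressing modes used here.
CYCLES = {
    "abs": 4,          # absolute load/store, no indexing
    "zp": 3,           # zero page store
    "imm": 2,          # immediate operand
    "(zp),y store": 6, # STA (zp),Y is always 6 cycles, no page-cross variation
}

ORIGINAL = [("sta", "abs"), ("sty", "abs"), ("stx", "abs"),
            ("ldy", "abs"), ("ldx", "abs"), ("and", "imm"), ("sta", "abs")]

REVISED = [("sty", "abs"), ("sta", "(zp),y store"), ("stx", "zp"),
           ("ldx", "abs"), ("ldy", "abs"), ("lda", "abs")]

def total_cycles(routine):
    return sum(CYCLES[mode] for _, mode in routine)

def nibbles_read(routine):
    # Every load from the CICOp port returns one nibble on D0-3.
    return sum(1 for op, _ in routine if op in ("lda", "ldx", "ldy"))

for name, routine in (("original", ORIGINAL), ("revised", REVISED)):
    print(name, total_cycles(routine), "cycles,", nibbles_read(routine), "nibbles read")
```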

I feel a lot more confident about the robustness of this routine now. I'm also a bigger fan of always performing both read and write to CICOp. If the register is defined as a write only register, then the last 3 LoaD instructions can actually be used for verification on the 6502 side to ensure that the transaction was successful. The CICOp can repeat back the exact data byte that it heard from the 6502. My thought for now is the 3rd LoaD could be an xor of the two nibbles of the register number. That would help verify that it also sniffed the register number correctly. But in the end this data could be defined as whatever we'd like later on. Appear to have a pretty rock solid transfer routine between the 6502 and CICOp for both reads and writes which is good enough for now!
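As a sketch of what that 6502-side verification could look like, here is the check written out in Python. It assumes the three loads return data L, data H, and then an E nibble equal to the xor of the two register number nibbles, exactly as floated above; since only D0-3 are driven by the CICOp, the upper nibble of each byte read is presumably whatever else is on the bus and gets masked off. The function names are made up for illustration.

```python
def expected_readback(reg, data):
    """Nibbles a write-only register would echo back: (data L, data H, E)."""
    data_l = data & 0x0F
    data_h = (data >> 4) & 0x0F
    e = ((reg >> 4) ^ reg) & 0x0F   # xor of the register number's two nibbles
    return data_l, data_h, e

def transfer_ok(reg, data, bytes_read):
    """Compare the three bytes loaded from the port, ignoring upper nibbles."""
    seen = tuple(b & 0x0F for b in bytes_read)
    return seen == expected_readback(reg, data)

# Register $2A, data $C7; upper nibbles here are arbitrary filler values.
print(expected_readback(0x2A, 0xC7))
print(transfer_ok(0x2A, 0xC7, (0x57, 0x5C, 0x58)))
print(transfer_ok(0x2A, 0xC7, (0x57, 0x5C, 0x59)))
```

A single xor nibble won't catch every mis-sniffed register number (swapped nibbles give the same E), so it is a sanity check rather than a full checksum.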

Implementing things on the STM8 side ended up working pretty well. In the end I don't even actually poll CPU R/W level to try and remove STM8's 5 cycle isr latency variation. It didn't actually help as polling creates its own jitter. And in practice, the STM8 isr latency variation isn't that bad. In practice it's only 3 cycles (187nsec).

Reason for that is the 1-6 cycles advertised in STM8 programming manual for time needed to complete current instruction assumes worst case. It assumes the STM8 may be executing FAR instructions, and this core doesn't even have any FAR address space. So that cuts us down to 1-5 cycles. However there are really only 3 instructions that are 5 cycles, and we can easily avoid them. They are CALL subroutine with indirect pointer, and LoaD/STore Word with indirect pointer. These 3 instructions are pretty easily avoided as immediate addressing is typically all that's needed for those 3 instructions. So that reduces number of cycles needed to complete current STM8 instruction to 1-4 cycles. Which is only 3 cycles of jitter. In practice I verified this with dozens of logic analyzer captures, so it all checks out.
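The jitter arithmetic is quick to reproduce. This sketch assumes a 16 MHz STM8 clock, which is what the "3 cycles (187nsec)" figure implies, and uses the usual approximate NTSC CPU clock to put the jitter in terms of a 6502 cycle; the case labels are just descriptive names.

```python
F_STM8_HZ = 16_000_000       # assumed STM8 clock (3 cycles = 187.5 ns)
F_6502_NTSC_HZ = 1_789_773   # approximate NTSC NES CPU clock

def stm8_cycles_to_ns(cycles):
    return cycles * 1e9 / F_STM8_HZ

def jitter_cycles(min_cycles, max_cycles):
    """Latency variation when the current instruction takes min..max cycles."""
    return max_cycles - min_cycles

CASES = {
    "programming manual worst case": (1, 6),
    "no FAR instructions":           (1, 5),
    "5-cycle instructions avoided":  (1, 4),
}

cpu_cycle_ns = 1e9 / F_6502_NTSC_HZ
for label, (lo, hi) in CASES.items():
    ns = stm8_cycles_to_ns(jitter_cycles(lo, hi))
    print(f"{label}: {jitter_cycles(lo, hi)} cycles = {ns:.1f} ns "
          f"({ns / cpu_cycle_ns:.0%} of a 6502 cycle)")
```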

One idea I had to remove jitter would be to set an STM8 GPIO interrupt on the CPU R/W pin, and the first step of the TLI isr would be to enable the CPU R/W interrupt and then WFI wait for interrupt. This would remove most of the jitter, but I fear this would suffer from incompatibility with various console versions. The edge timing of CPU R/W in relation to M2 could easily vary on clones, AVS, etc. And this current setup bases all its timing off the mapper register bit setting which should be pretty reliable. My only real concern for timing is drift of the STM8 internal oscillator, may have to trim the oscillator if temperature variation becomes an issue.

For the 6502 being as slow as it is, 187nsec of isr jitter is manageable. With properly timed and cycle counted STM8 code I was able to reliably capture and present all necessary data for the CICOp data transfer routine I just lined out above. My current tests are a simple one-time check at boot/reset that's printed to the screen. Planning on running some automated tests to really exercise the routines. But early tests look good on original NTSC front loader and a portable clone I keep handy. A separate routine will be needed for PAL support, but there is enough free time early in the isr to branch between separate PAL/NTSC routines with their own fixed cycle counted timing. NTSC is worst case (faster), so I'm not too concerned about PAL.

One idea I had was for the STM8 to verify that CPU R/W was low when it should be for added robustness. One concern I realized when writing this is that extreme caution needs to be exercised to ensure the NES doesn't get interrupted in the middle of this routine. If it did, the STM8 will blindly output data onto the bus when it comes time for the last load instructions. If the 6502 isn't executing this code because it was interrupted that could easily cause a CPU crash. The only real way to guarantee that is for NMI's to be turned off during this routine along with disabling interrupts. That's probably the best call for early NES development using the CICOp until one is certain that an NMI won't occur mid transfer.

I still need to draft up an async NES CIC implementation using TIM4 alone for CIC timing so we can dedicate TIM1 to scanline/CPU cycle counting or center aligned PWM DAC. But there's quite a bit of dead/nop time in my current isr to allow for CIC transfers in the middle of the isr. So I've got a clearer outlook on my timing constraints that the async CIC will have to cater to. That part is still a decent challenge and will require lots of testing, but with successful 6502-CICOp read & write data transfers under my belt I'm pretty confident.

Now that I've got a good means to communicate between the 6502 and CICOp it's time for the fun stuff. Next up is trying out some audio synthesis and PWM DAC experiments, along with scanline/CPU cycle counting tests!
If you're gonna play the Game Boy, you gotta learn to play it right. -Kenny Rogers

Post by lidnariq »

infiniteneslives wrote:

Code:

   sty   CICOP_ADDR_EN   ; 8c 80 c0 80 <--data bus contents
   sta   (cicop_reg), y  ; 91 04 ll hh xx xx
   stx   cicop_reg       ; 86 04 ee
   ldx   CICOP_PORT      ; ae xx xx LL
   ldy   CICOP_PORT      ; ac xx xx HH
   lda   CICOP_PORT      ; ad xx xx EE
To make sure I understand:

The CICOp literally waits for the rising edge of the latch, then waits the exact number of ns for the "ll", "hh", and "ee" bytes to come along on the data bus, optionally (probably) waits the exact number of ns for the subsequent "LL", "HH", and "EE" bytes to come along, and drives the data bus at those times?

Seems ridiculous, clever, and fragile, but also like you've got a good handle on it.
infiniteneslives wrote:IRQ and PWM DAC signals are in other two corners. So placing the resistors horizontally will map TIM1 to 6502 /IRQ (support scanline/CPU cycle counting), and TIM2 to PWM DAC (edge aligned PWM). Mounting resistors vertically maps TIM1 to PWM DAC (center aligned), and TIM2 to /IRQ (async timer only). I was about to give up on option for center aligned PWM DAC till I realized this trick and 0ohm resistor for /IRQ kept routing simple.
Does center-aligned PWM get you better bit depth? I can't imagine there'd be an audible difference otherwise...

Post by infiniteneslives »

lidnariq wrote:The CICOp literally waits for the rising edge of the latch, then waits the exact number of ns for the "ll", "hh", and "ee" bytes to come along on the data bus, optionally (probably) waits the exact number of ns for the subsequent "LL", "HH", and "EE" bytes to come along, and drives the data bus at those times?
That's correct. Although to be clear/pedantic the CICOp receives a non-maskable interrupt when the mapper register bit ($8000.7 for example) toggles from clear to set, it's not waiting/polling for the register bit. Additionally the CICOp only has connection to 6502 CPU D0-3, so it's waiting for "lower nibbles" to come along on the data bus, not "bytes".
lidnariq wrote:Seems ridiculous, clever, and fragile, but also like you've got a good handle on it.
Haha thanks! Yeah as it turns out the STM8's timing for reading data off the 6502 bus is tighter because we have to wait until data is valid. Outputting data on the 6502 bus has significantly more slack time since it can be output early without concern. So 6502 STores have tighter timing constraints than LoaDs from the CICOp perspective. That ends up working in our benefit as the STA/STX naturally have to come before LDA/LDX/LDY. For that reason it's less sensitive to STM8/CICOp's oscillator drift adding up to timing error towards the end of the data transfer during LDA/LDX/LDY time.

The biggest fragility to this setup in my mind is ensuring that the 6502 doesn't get interrupted in the middle of this data transfer routine. There is room for a little more protection from this. Ensuring that CPU R/W is low when it should be during STA/STX helps the CICOp verify the 6502 is performing the expected routine. I also realized that it wouldn't be too hard to verify the lower nibble when the LDA/X/Y CICOP_PORT opcodes and operands are getting fetched by the 6502. Doing that would help assure that the 6502 didn't get interrupted mid routine and the CICOp is mostly safe to output data on the bus in the cycles that follow. I'm not sure if all that is necessarily worth the added complexity/bytes from the STM8 isr side or not, but it's an idea I'll have to keep in my back pocket if it becomes a problem. In the end I don't feel it's too much to ask of the NES code to ensure it doesn't get interrupted mid transfer.

Really it depends on how much access to the CICOp registers is needed. In reality most features only require the NES to write to the CICOp registers, so cutting out or reducing the LoaDs at the end of the transfer routine is an option. Once the CICOp register transfer routines are time tested and proven, perhaps it makes sense to completely remove the LoaD read back portion. They really aren't necessary for features like audio synthesis, switchable mirroring or scanline/CPU counters. The ability to read CICOp registers is really only needed for features like access to save eeprom, or external peripheral ports. Having the rule that rendering needs to be turned off for access to save eeprom shouldn't be a problem IMO. This mostly becomes a concern if looking to actually task the CICOp with digital processing for things like multiplication, division, or other math functions. I don't see data processing as the CICOp's prime feature set, but it's certainly capable of it if handled appropriately and the register interface didn't diminish returns too greatly.

Of course the real fix for fragility would be to add dedicated hardware/glue logic between 6502 and CICOp, but that detracts from the goal of minimal added hardware. Once that leap is made, I would also question if the STM8S003F3 is the best choice. Once you're spending dollars on logic, upgrading the mcu to a bigger STM8 or an STM32 becomes tempting. So for now it helps to limit myself to the bare minimum and effectively no added parts to the board. I'm enjoying pushing this little STM8 to its limit and seeing what it's capable of, tricks and all. If nothing else I'm learning a lot in the process and will better appreciate more capable hardware in future projects.
lidnariq wrote:Does center-aligned PWM get you better bit depth? I can't imagine there'd be an audible difference otherwise...
Great question, I do not yet actually have any physical experience with PWM DACs. I've only recently learned about them and considered them viable. Most of my research has been reading up on what looks to be great info on open music labs. That article goes into pretty good detail of everything while also helping to realize the practical implications of PWM DAC design choices. Note that in the article they refer to center-aligned PWM as "phase correct PWM", and edge aligned PWM as "fast PWM". Here's the best answer I have for you quoting from that article:
openmusiclabs wrote:Finally, once you have your topology and frequency selected, you can see what the bit depth trade-off is for using Fast PWM or Phase Correct PWM. Unless the signals you are generating are going to be very low frequency, it is almost always better to sacrifice a bit or two of resolution for the reduced distortion that Phase Correct PWM gives you.
Considering that we're only aspiring for "retro 8bit" audio synthesis, and not CD quality audio, I'm not sure how much the distortion will be a problem. Any distortion may simply become the CICOp's quirk and its own characteristic sound, as the goal is not to replicate some other synth's sound perfectly. My initial goal is to mimic VRC6 audio as it's one of the simpler synths, and has a 6bit linear DAC. Getting close to VRC6 capability and resolution seems fairly achievable and a good starting point.

Lots of good info in openmusiclab's sketch as well including a helpful table for PWM DAC frequency, bit resolution, etc. Conveniently their table is at the same 16Mhz the CICOp will be operating at. My current plan/goal is to run at a PWM frequency of 31.3Khz which equates to the synth engine running every 512 STM8 instruction cycles. This equates to a 9bit edge aligned, and 8bit center aligned PWM DAC. From a STM8 CPU utilization standpoint I have a hard time expecting that the multitasked STM8 can handle more than that. But I also have not yet implemented the synth to get a good sense of how many calculations will actually need to be performed for each update. I expect the compute resources will ultimately be the limit of how many channels, or what forms each channel can take. While I may be able to create multiple different shapes and voice types, there will likely be limits as to how many can be active at one time. Also expecting square channels to be less compute intensive than triangle/saw, may help to limit volume range/step size as well. I'll be posting all the synthesis code publicly when I start digging into that, expecting there will be lots to discuss..
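The relationship between update period, PWM frequency, and resolution can be worked out in a few lines. This is only a sketch: it assumes the timer is clocked at the full 16 MHz with no prescaler, that the period is a power of two, and that center aligned mode spends half of the ticks counting up and half counting down, which costs one bit.

```python
F_STM8_HZ = 16_000_000  # clock quoted above

def pwm_budget(ticks_per_update):
    """Return (PWM frequency in Hz, edge-aligned bits, center-aligned bits)."""
    freq_hz = F_STM8_HZ / ticks_per_update
    edge_bits = ticks_per_update.bit_length() - 1  # log2 for powers of two
    center_bits = edge_bits - 1                    # up-down count halves the range
    return freq_hz, edge_bits, center_bits

for ticks in (256, 512, 1024):
    freq_hz, edge_bits, center_bits = pwm_budget(ticks)
    print(f"{ticks} ticks: {freq_hz / 1000:.2f} kHz, "
          f"{edge_bits}bit edge aligned, {center_bits}bit center aligned")
```

With 512 ticks this gives 31.25 kHz, which rounds to the 31.3 kHz quoted above.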

The other limit that I may start getting close to is memory resources of the STM8. This low end version has 1KByte of SRAM, 8KByte of program flash, and 128-640Bytes of eeprom. My current CIC implementation is around 1KB, there's some room for optimization, but I'm also expecting my async solution using TIM4 to require more code to support frequency drift correction. My current CICOp register transfer TLI isr is ~600Bytes, but I haven't made any effort to reduce its size as it's pretty chock full of NOPs. Adding support for PAL could easily double that code size. I'm expecting to be able to keep the total register transfer isr well under 1KB though. So should have at least 4-5KB of flash to support CICOp features which seems like a reasonable budget. But big things like look up tables for audio synthesis could eat up large chunks of our flash budget if sights get set too high..

Post by lidnariq »

openmusiclabs wrote:Finally, once you have your topology and frequency selected, you can see what the bit depth trade-off is for using Fast PWM or Phase Correct PWM. Unless the signals you are generating are going to be very low frequency, it is almost always better to sacrifice a bit or two of resolution for the reduced distortion that Phase Correct PWM gives you.
They explain in more detail on this page, but I'm ... not convinced of their math.

Humans can't usually hear phase; we can hear amplitude, and we can hear beat frequencies due to differences in phase, and we can hear group delay... but the amount of group delay we're talking about here is less than one sample period. And most of the error component that they identify in their math seems to be the portion that's above the source sample rate, and hopefully inaudible anyway. And the more I play with the math, the more it looks to me like the differences in phase are actually due to using a higher effective sample rate (and commensurate lower bit depth) than due to the symmetry of the waveform.

If I try generating a 5-bit PWM sine wave with 32 underlying samples (1024 total record rate) using 'fast' PWM, and a 4-bit PWM sine wave with 64 underlying samples (i.e. still 1024) using centered PWM, I actually do get a graph that's similar to their figures 6 and 7: DC component is the same, fundamental is the same, more energy at the 2nd and 3rd harmonics is present in 'fast' PWM and more energy is present at every higher harmonic in 'centered' PWM. (This corresponds to "Ratio of Signal Frequency to PWM frequency" of 0.03)
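For anyone who wants to repeat this kind of comparison, here is a rough Python sketch that builds two 1024-tick bitstreams along those lines and prints the first few harmonic magnitudes. It is not the code used for the figures described above: the quantisation (rounding a sine to integer duty levels), the placement of odd-width pulses in the centered case, and all function names are assumptions.

```python
import cmath
import math

def sine_levels(n_samples, n_levels):
    """One sine cycle quantised to integer duty levels 0..n_levels-1."""
    return [round((math.sin(2 * math.pi * i / n_samples) + 1) / 2 * (n_levels - 1))
            for i in range(n_samples)]

def fast_pwm(level, period):
    """Edge aligned: high for `level` ticks, then low for the rest."""
    return [1] * level + [0] * (period - level)

def centered_pwm(level, period):
    """Center aligned: the high pulse sits in the middle of the period."""
    low_before = (period - level) // 2
    return [0] * low_before + [1] * level + [0] * (period - level - low_before)

def bitstream(levels, period, shape):
    return [bit for level in levels for bit in shape(level, period)]

def harmonic_magnitude(bits, k):
    """Magnitude of the k-th DFT bin, normalised by the stream length."""
    n = len(bits)
    return abs(sum(b * cmath.exp(-2j * math.pi * k * i / n)
                   for i, b in enumerate(bits))) / n

fast = bitstream(sine_levels(32, 32), 32, fast_pwm)          # 5-bit, 32 samples
centered = bitstream(sine_levels(64, 16), 16, centered_pwm)  # 4-bit, 64 samples

for k in range(6):
    print(k, round(harmonic_magnitude(fast, k), 4),
          round(harmonic_magnitude(centered, k), 4))
```

Bin 0 is the DC component and bin 1 the fundamental, so the printed columns can be compared harmonic by harmonic.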

So, it looks like they're right.

On the other hand, if I manually load those 1024-sample loops that I generated and listen to them? I find the harmonic distortion of the centered PWM to be more grating. Here's two I made. They're set to a sample rate of 96kHz so that 96kHz÷1024 = something audible; be careful that your hardware actually supports this or that you're using a good-enough resampler to be assured that the sound differences are due to the method rather than the resampling.
Attachment: center-and-fast-pwm-wavs.zip (f_sample = 96kHz, f_PWM = 3kHz, f_sine = 94Hz)
infiniteneslives wrote:big things like look up tables for audio synthesis could eat up large chunks of our flash budget if sights get set too high..
You might be able to do (edit: hooked on phonics worked for me) something like the OPL2/OPL3 and just use a logsin and exp table...
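The general log-sin plus exp table idea can be sketched like this. To be clear, this is only loosely modelled on that approach and is not a reproduction of the real OPL2/OPL3 tables: the table sizes, scaling (1/256 octave steps, 1024 full scale), and function names are all assumptions. The attraction is that volume scaling becomes an addition in the log domain, so no multiply is needed per sample.

```python
import math

N = 256  # quarter-wave table length (assumed)

# Attenuation of a quarter sine, in units of 1/256 of an octave.
LOGSIN = [round(-math.log2(math.sin((i + 0.5) * math.pi / (2 * N))) * 256)
          for i in range(N)]

# Linear amplitude for a fractional-octave attenuation, 1024 = full scale.
EXP = [round(1024 * 2 ** (-i / 256)) for i in range(N)]

def sample(phase, attenuation=0):
    """phase: 0..1023 over one full cycle; attenuation in 1/256 octave steps."""
    quadrant = (phase >> 8) & 3
    index = phase & 0xFF
    if quadrant & 1:                 # 2nd and 4th quadrants run the table backwards
        index = 255 - index
    total = LOGSIN[index] + attenuation
    magnitude = EXP[total & 0xFF] >> (total >> 8)  # whole octaves become shifts
    return -magnitude if quadrant & 2 else magnitude

print(sample(255), sample(767), sample(255, attenuation=256))
print("table bytes at 16 bits per entry:", 2 * 2 * N)
```

At 16 bits per entry the two tables come to 1KB, which gives a feel for how they would fit against an 8KB flash budget.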
Last edited by lidnariq on Thu Sep 28, 2017 9:46 pm, edited 1 time in total.

Post by infiniteneslives »

Thanks for taking the time to break things down to something I can better understand. I saw that page, but that type of analog analysis is certainly not my strong suit. Not a whole lot of motivation to take the time to understand the math as I'm not even sure I can make heads or tails from how everything translates to audio quality I can make sense of in the end anyways..

Interesting to hear you find the distortion of fast/edge PWM less grating to the ear. I have no clue what my PC hardware is capable of, probably safe to say it's not capable. I can hear a difference, but have a hard time deciding what's better/worse on the ears to be honest.

Any input, criticisms, or suggestions on the audio synthesis is more than welcome. I'll do my best to explain and share what I'm doing as things move along in hopes that anyone who's interested can contribute or criticize my especially noobish judgements. I'm mostly learning as I go when it comes to the audio/analog arena.

Post by tepples »

You don't have to lose a bit of precision by using symmetric PWM.

Say you have a sample period of 32 cycles, and the signal level is 13 out of 32. With single-ended PWM, the signal would be low 19 cycles and high 13. With symmetric PWM, it'd be low 10, high 13, and low 9. This gives a maximum phase variation of one-half cycle and mostly decorrelated (only with the level's least significant bit), which is an improvement over single-ended PWM's one-half sample period of variation correlated with the level.
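That split is easy to compute for any level. In this sketch the function names are made up, and putting the extra low cycle before the pulse (rather than after) is just the convention that reproduces the 10/13/9 example; the last part measures how far the pulse centre lands from the middle of the period for each scheme.

```python
def single_ended(level, period):
    """(low, high) cycle counts for single-ended PWM."""
    return period - level, level

def symmetric(level, period):
    """(low, high, low) cycle counts with the pulse as centred as possible."""
    low_total = period - level
    low_before = (low_total + 1) // 2   # odd remainder goes in front
    low_after = low_total // 2
    return low_before, level, low_after

def centre_offset_symmetric(level, period):
    low_before, high, _ = symmetric(level, period)
    return (low_before + high / 2) - period / 2

def centre_offset_single_ended(level, period):
    low, high = single_ended(level, period)
    return (low + high / 2) - period / 2

PERIOD = 32
print(single_ended(13, PERIOD), symmetric(13, PERIOD))
print(max(abs(centre_offset_symmetric(lv, PERIOD)) for lv in range(PERIOD + 1)))
print(max(abs(centre_offset_single_ended(lv, PERIOD)) for lv in range(PERIOD + 1)))
```

The symmetric offset only ever takes the values 0 or one-half cycle, depending on the parity of the level, while the single-ended offset grows to half the period as the level drops.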

Post by infiniteneslives »

tepples wrote:You don't have to lose a bit of precision by using symmetric PWM.
So I'm not sure I follow what you're saying exactly.. Practically speaking to do what you're saying it would mean that the output compare value of the PWM channel would need to be adjusted at the top and bottom of the counter, instead of just at the bottom. So your suggestion would be to adjust the output compare value to something like 100 on the way up, and then 101 on the way down, which would average to a pulse width equivalent to what you'd get if the PWM output compare had been set to 100.5, effectively gaining a bit of resolution back?

I can't think of a practical limitation of why one couldn't do that, unless the timer hardware on the selected mcu couldn't produce an update interrupt at both the top and bottom of the counter. That said, I'm pretty sure the STM8 does permit you to do something like that. The only real drawback would be the added complexity/length (however minor) for the isr being executed every counter update event. Every little bit of execution time that can be reduced for that isr running every ~1k instruction cycles the better. 10 extra cycles equates to another 1% of CPU utilization in this proposed case.

Post by lidnariq »

Depends on the hardware implementation, unfortunately.



I couldn't leave well enough alone, so I played around with both PWM implementations using PureData for a bit. After trying and failing to get anything more sophisticated than its built-in lowpass filter object (lop~) to work (no biquad~, no lop8_butt~) ... my conclusion:

(Implementation: 12.88MHz bit rate; 30720Hz PWM rate; sine waves from 100Hz to 1kHz)

Both kinds of PWM have comparable amounts of THD (-65dB(fast) vs -66dB(sym) @ f_sine/f_PWM=1/300; -57dB(fast) vs -60dB(sym) @ f_sine/f_PWM=1/100; -39.5 dB(both) @ f_sine/f_PWM=1/30) in the audible band. I'm hard pressed to support their assertion that symmetric PWM is more valuable than an extra bit of depth; the math and sim for real-world values I saw don't support that kind of unequivocal statement. Each loss of bit depth raises your noise floor by about 6dB; none of my tests have shown symmetric PWM enough better to warrant it.
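The 6dB-per-bit figure comes straight from each bit doubling the number of levels. A short calculation, using the textbook SNR formula for an ideal quantiser with a full-scale sine input (an idealisation, not a measurement of any PWM DAC in this thread):

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # one bit doubles the number of levels

def ideal_sine_snr_db(bits):
    """Textbook SNR of an ideal N-bit quantiser with a full-scale sine input."""
    return DB_PER_BIT * bits + 10 * math.log10(1.5)

print(round(DB_PER_BIT, 2))
for bits in (6, 8, 9):
    print(bits, "bits:", round(ideal_sine_snr_db(bits), 1), "dB")
```

Against that yardstick, the 1 to 3 dB THD differences measured above are smaller than what a single bit of depth is worth.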

"Symmetric" PWM does seem to put more energy at higher overtones than "fast" PWM, and which is more objectionable is a matter of personal taste and context.

Post by tepples »

infiniteneslives wrote:Practically speaking to do what you're saying it would mean that the output compare value of the PWM channel would need to be adjusted at the top and bottom of the counter, instead of just at the bottom. So your suggestion would be to adjust the output compare value to something like 100 on the way up, and then 101 on the way down, which would average to a pulse width equivalent to what you'd get if the PWM output compare had been set to 100.5, effectively gaining a bit of resolution back?
Correct. I thought symmetric PWM was always defined this way, with the actual comparison being against a triangle wave that takes on a distinct value for all steps, such as 0, 2, 4, 6, 8, ..., 124, 126, 127, 125, 123, ..., 5, 3, 1.
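Here is a small Python sketch of that interleaved triangle and why it keeps full resolution. The comparison rule (output high while the counter is below the level) and the function names are assumptions for illustration; a plain up-down count is included for contrast, since it visits every value twice.

```python
def interleaved_triangle(bits):
    """Even values on the way up, odd values on the way down: all distinct."""
    top = 1 << bits
    return list(range(0, top, 2)) + list(range(top - 1, 0, -2))

def plain_up_down(steps):
    """0, 1, ..., steps-1, steps-1, ..., 1, 0: every value appears twice."""
    up = list(range(steps))
    return up + up[::-1]

def high_ticks(counter_values, level):
    """Ticks per period the output is high, if high means counter < level."""
    return sum(1 for value in counter_values if value < level)

interleaved = interleaved_triangle(7)   # 128 ticks, values 0..127 once each
plain = plain_up_down(64)               # 128 ticks, values 0..63 twice each

print(interleaved[:5], interleaved[62:67], interleaved[-3:])
print([high_ticks(interleaved, level) for level in (12, 13, 14)])
print([high_ticks(plain, level) for level in (6, 7)])
```

With the interleaved count every level from 0 to 128 gives a different pulse width, while the plain up-down count can only produce even widths for the same 128-tick period.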

Post by lidnariq »

Again that depends on the hardware :/

Specifically in the case of the ATmega32's PWM hardware, the symmetric PWM mode always counts like the NES triangle: 0,1,2,3,...3,2,1,0,&c.

You are able to get interrupts right after each direction change, though, so it should still be possible to get higher bandwidth.
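
A toy model of that idea, updating the compare value at each direction change. This is an idealized counter (0..TOP-1 up, then TOP-1..0 down, output high while the counter is below the compare value), not the exact ATmega32 or STM8 timer behavior, and TOP = 210 is just a placeholder:

```python
def up_down_counter(top):
    """One period of an idealized up-down counter: 0..top-1, then top-1..0."""
    up = list(range(top))
    return up, up[::-1]


def high_ticks(top, compare_up, compare_down):
    """High ticks per period when the compare value is swapped at the top."""
    up, down = up_down_counter(top)
    high_up = sum(1 for value in up if value < compare_up)
    high_down = sum(1 for value in down if value < compare_down)
    return high_up + high_down


TOP = 210  # hypothetical counter limit

same = high_ticks(TOP, 100, 100)    # compare left alone for the whole period
split = high_ticks(TOP, 100, 101)   # compare bumped by one at the direction change
bumped = high_ticks(TOP, 101, 101)  # next whole compare value

print(same, split, bumped)
print("effective compare:", split / 2)
```

The split case lands halfway between the two whole-number settings, which is the "100.5" effect described in the quote, at the cost of servicing an interrupt twice per PWM period.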
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA
Contact:

Re: Adding features to discrete mapper with multipurposed CI

Post by infiniteneslives »

That, and you actually need an up-down timer that supports center-aligned PWM at your disposal. In my specific case with the STM8S003, only TIM1 is capable of up-down counting. TIM1 is also the only counter that can be clocked externally, making it the only counter available for scanline/CPU cycle counting. I was initially disappointed to give up center-aligned PWM capability when all I was going off of was openmusiclabs. But with lidnariq's analysis and input on the matter I really don't feel this way anymore.

I am still curious to experiment and compare center and edge aligned PWM DACs to try and see how they actually compare in practice. Part of the thing is depending on the synth, there isn't necessarily much one can do with more bits of resolution. If mimicking or designing something similar to the VRC6, it only has 6bits of resolution by definition. Short of adding more channels, having a 9bit vs 6bit DAC doesn't allow for any improvement. However, there would stand room for benefit by using a center aligned 6bit vs an edge aligned 6bit PWM DAC.

The benefit to be had with higher resolution DACs for sinusoidal voices or samples is there, but at this point I'm not even certain that's within the STM8's abilities... We'll see though, all the knowledge and experience that comes with this will be helpful for planning more capable projects.
If you're gonna play the Game Boy, you gotta learn to play it right. -Kenny Rogers
lidnariq
Posts: 11429
Joined: Sun Apr 13, 2008 11:12 am

Re: Adding features to discrete mapper with multipurposed CI

Post by lidnariq »

Yeah, the big possibility for using extra depth would be for running an FM synthesizer. I'm certain it'd involve the logsin and exp tables, but I haven't yet bothered to sit down and figure out how to string together subtraction and the two tables to get an OPL out of it.

edit: this post quotes a very long thread here that says:
Olli Niemitalo wrote: out = exp(logsin(phase2 + exp(logsin(phase1) + gain1)) + gain2)
[...]
Exponential table:

x = 0..255, y = round((power(2, x/256)-1)*1024)
[...]
Log-sin table:

x = 0..255, y = round(-log(sin((x+0.5)*pi/256/2))/log(2)*256)
edit2: and NukeyKT appears to have explicitly implemented a software FM synthesizer that uses the logsin/exp lookup tables for Chocolate Doom and a few other projects.
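
For reference, the two table formulas quoted above translate almost directly into code. This is just a sketch that evaluates those formulas as written; the function names are made up, and it doesn't attempt the phase/gain plumbing of the full out = exp(logsin(...)) expression:

```python
import math


def build_exp_table():
    """y = round((2^(x/256) - 1) * 1024) for x = 0..255."""
    return [round((2 ** (x / 256) - 1) * 1024) for x in range(256)]


def build_logsin_table():
    """y = round(-log2(sin((x + 0.5) * pi / 256 / 2)) * 256) for x = 0..255."""
    table = []
    for x in range(256):
        angle = (x + 0.5) * math.pi / 256 / 2   # quarter sine wave, half-step offset
        table.append(round(-math.log(math.sin(angle)) / math.log(2) * 256))
    return table


EXP_TABLE = build_exp_table()
LOGSIN_TABLE = build_logsin_table()

print(EXP_TABLE[:4], EXP_TABLE[-1])
print(LOGSIN_TABLE[:4], LOGSIN_TABLE[-1])
```

Both tables are 256 entries covering a quarter of the sine period, so on a small MCU they'd cost 512 entries of ROM between them, with the remaining three quarters recovered by mirroring and sign flipping.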
lidnariq
Posts: 11429
Joined: Sun Apr 13, 2008 11:12 am

Re: Adding features to discrete mapper with multipurposed CI

Post by lidnariq »

infiniteneslives wrote:it turns out the STM8's timing for reading data off the 6502 bus is tighter because we have to wait until data is valid. Outputting data on the 6502 bus has significantly more slack time since it can be output early without concern. So 6502 STores have tighter timing constraints than LoaDs from the CICOp perspective.
Tangenting back to this tiny bit...

The 6502 is actually driving the data bus during the entirety of the write cycles. So the "ll" and "hh" nybbles have to wait for the RAM's delayed output from taking M2 into account, but "ee" doesn't. edit: wrong, see my later post here
FrankenGraphics
Formerly WheelInventor
Posts: 2064
Joined: Thu Apr 14, 2016 2:55 am
Location: Gothenburg, Sweden
Contact:

Re: Adding features to discrete mapper with multipurposed CI

Post by FrankenGraphics »

Disclaimer: I don't know the first thing about DACs, so this whole post may be based on false assumptions about the output of the DAC. If the output is center-aligned, ignore the post. Anyway...

I wonder what happens when you mix center-aligned waves with 0-to-pos or 0-to-neg waves? I imagine it might be audible for two sine waves in ~unison or octaves, and in more cases with waves having more complex overtones. I also wonder what it means for the speaker - wouldn't one in this case force its peak to peak range to widen to 150% (with an asymmetric center position relative to the relaxed point), assuming both waves have the same amplitude? If that's true and the speaker can't handle it, we might get distortion, or even wear.
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA
Contact:

Re: Adding features to discrete mapper with multipurposed CI

Post by infiniteneslives »

lidnariq wrote:The 6502 is actually driving the data bus during the entirety of the write cycles. So the "ll" and "hh" nybbles have to wait for the RAM's delayed output from taking M2 into account, but "ee" doesn't.
That's interesting, because I expected that to be the case and initially allowed latching of data earlier when data was sourced from the 6502. At the time it didn't seem to be getting valid data unless I waited until late in the cycle. Entirely possible I had some other problem going on at the moment instead. I also never actually monitored the data bus with scope/logic analyzer, I was only watching the mapper bit, m2, CPU R/W, and my timing debug pin.

The other thing I didn't take into account is that the open bus, after the cycle has ended, should still contain valid data until m2 goes high again. But based on your point, this is only true if the 6502 is not driving the data bus on the subsequent cycle. I need to spend some more time analyzing the bus and testing edge cases to ensure the read is best placed.
FrankenGraphics wrote:I wonder what happens when you mix center-aligned waves with 0-to-pos or 0-to-neg waves? I imagine it might be audible for two sine waves in ~unison or octaves, and in more cases with waves having more complex overtones. I also wonder what it means for the speaker - wouldn't one in this case force its peak to peak range to widen to 150% (with an asymmetric center position relative to the relaxed point), assuming both waves have the same amplitude? If that's true and the speaker can't handle it, we might get distortion, or even wear.
I don't really understand what you're asking; it sounds like you're concerned about constructive and destructive interference? I don't see how that relates to center/edge aligned PWM DACs though. As for your concern about the speaker, I don't understand how you're proposing one could exceed 100% output.