Adding features to discrete mapper with multipurposed CIC


FrankenGraphics
Formerly WheelInventor
Posts: 2064
Joined: Thu Apr 14, 2016 2:55 am
Location: Gothenburg, Sweden

Re: Adding features to discrete mapper with multipurposed CI

Post by FrankenGraphics »

It can't; I got seriously confused. :oops: I somehow got the idea that alignment was in the amplitude domain (i.e. whether waves swing from +V to -V or from +V to 0 V), not the frequency (like this picture illustrates). [image]

It seems these two modes would interfere differently when in unison with the internal APU squares, but since their phase isn't synced to begin with, that difference might be lost anyway.
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

I foresee some intermodulation distortion between two sines with the "fast" PWM.

I also foresee trouble if DMC cycle stealing happens while the CPU is accessing the CICO.
lidnariq
Posts: 11429
Joined: Sun Apr 13, 2008 11:12 am

Post by lidnariq »

infiniteneslives wrote:That's interesting, because I expected that to be the case and initially allowed latching of data earlier when data was sourced from the 6502. At the time it didn't seem to be getting valid data unless I waited until late in the cycle.
Well, I should say that we know that the CPU drives the data bus on write cycles during both φ1 and φ2... but that doesn't say anything about how much time it takes for it to get to valid.

We know that Nintendo's original letterless 2A03 exported a 3/4 duty cycle and that the 2A03E/G/H use a 5/8 duty cycle; it's possible that they had issues where worst-case conditions took longer than 140ns to get to valid.
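
For reference, the 140ns figure can be reproduced with a little arithmetic. This is only a sketch under my own reading of the numbers: it assumes the duty cycle is the fraction of each CPU cycle that M2 spends high, that the 140ns is the remaining low portion, and that we're on NTSC (master clock divided by 12). The function name is made up for illustration.

```python
from fractions import Fraction

# NTSC master clock, 21.477272... MHz, written exactly as 236.25/11 MHz.
NTSC_MASTER_HZ = Fraction(236_250_000, 11)
CPU_HZ = NTSC_MASTER_HZ / 12  # the NTSC 2A03 divides the master clock by 12


def m2_low_ns(duty_high):
    """Nanoseconds per CPU cycle that M2 is low, given the high duty cycle.

    Assumption: 'duty cycle' means the fraction of the cycle M2 is high.
    """
    period_ns = Fraction(10**9) / CPU_HZ
    return float(period_ns * (1 - Fraction(duty_high)))


print(round(m2_low_ns(Fraction(3, 4)), 1))  # letterless 2A03
print(round(m2_low_ns(Fraction(5, 8)), 1))  # 2A03E/G/H
```

Under those assumptions the low phase grows from about 140ns to about 210ns, which would fit with the later revisions giving the bus more time to become valid.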
tepples wrote:I foresee some intermodulation distortion between two sines with the "fast" PWM.
I agree, there should be something. Time to throw that in the sim again.

Sim conditions: F_bit=12.88MHz, F_pwm=30720Hz, F_sine1 = 3072Hz, F_sine2 = 3000Hz. Intermodulation products are expected to show up at 3144Hz and 2928Hz... but I don't see any. At all.
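
The expected frequencies come from the usual low-order mixing products |m*f1 + n*f2|. A minimal sketch for listing them follows; the function name and the cut-off at third order are choices made here for illustration, not part of the sim.

```python
def intermod_products(f1, f2, max_order=3):
    """List mixed products |m*f1 + n*f2| with 2 <= |m| + |n| <= max_order.

    Pure harmonics (m == 0 or n == 0) are skipped: only products that need
    both tones present are counted as intermodulation.
    """
    products = set()
    for m in range(-max_order, max_order + 1):
        for n in range(-max_order, max_order + 1):
            order = abs(m) + abs(n)
            if m != 0 and n != 0 and 2 <= order <= max_order:
                products.add(abs(m * f1 + n * f2))
    return sorted(products)


print(intermod_products(3072, 3000))
print(intermod_products(2000, 5000))
```

For 3072Hz and 3000Hz the list includes the 3144Hz (2*f1 - f2) and 2928Hz (2*f2 - f1) products mentioned above.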

Post by tepples »

What does the sim show for, say, 2000 and 5000 or 2000 and 4900 Hz?

Post by lidnariq »

tepples wrote:What does the sim show for, say, 2000 and 5000 or 2000 and 4900 Hz?
2kHz & 5kHz: FastPWM shows spurious emissions at 3720, 5720, and 6720Hz, but given those offsets of 720Hz, those aren't intermodulation between just the sine waves.

2kHz & 4.9kHz: FastPWM shows spurious emissions at 4206, 5100, 6200, and 7100 Hz. Not clear on the math generating these.

In both sets, the spurious emission peaks are 40dB to 60dB below the intended signal, over a noise floor that's 10-20dB quieter; we're still only talking about roughly 1 LSB of noise.

(Like I previously said, it's not that centered PWM isn't better: it clearly is. It's just not obviously consistently enough better to warrant the loss of bit depth.)

Post by tepples »

Thanks for running them.

Now on to how DMC cycle stealing interacts with a scheme to blindly put stuff on data bus for reading at 4, 8, and 12 cycles after the final write.
infiniteneslives
Posts: 2104
Joined: Mon Apr 04, 2011 11:49 am
Location: WhereverIparkIt, USA

Post by infiniteneslives »

tepples wrote:I foresee some intermodulation distortion between two sines with the "fast" PWM.
For the CICOp and its set limitations I'll probably be limited to simpler waveforms (square, triangle, saw) anyway. If it does get a sine channel, it likely won't be more than one. I am curious to see how well a PWM DAC (perhaps even dual-PWM) can perform on a more capable mcu in comparison to a hardware DAC. In the end, the added cost of a mcu with built in DAC isn't too much. So using a PWM DAC is best placed in the most cost sensitive/limited projects.

Thanks for the PWM analysis guys! It is nice to hear that an extra bit of edge aligned resolution stands the chance to make up for the loss of center aligned PWM!
I also foresee trouble if DMC cycle stealing happens while the CPU is accessing the CICO.
Ahh yeah. I completely forgot about that DMC guy..
Now on to how DMC cycle stealing interacts with a scheme to blindly put stuff on data bus for reading at 4, 8, and 12 cycles after the final write.
I think we can cover this case, so here goes:

So assuming this to be accurate enough for our situation:
Likely internal implementation of the read

The following is speculation, and thus not necessarily 100% accurate. It does accurately predict observed behavior.

The 6502 cannot be pulled off of the bus normally. The 2A03 DMC gets around this by pulling RDY low internally. This causes the CPU to pause during the next read cycle, until RDY goes high again. The DMC unit holds RDY low for 4 cycles. The first three cycles it idles, as the CPU could have just started an interrupt cycle, and thus be writing for 3 consecutive cycles (and thus ignoring RDY). On the fourth cycle, the DMC unit drives the next sample address onto the address lines, and reads that byte from memory. It then drives RDY high again, and the CPU picks up where it left off.

This matters because on NTSC NES and Famicom, it can interfere with the expected operation of any register where reads have a side effect: the controller registers ($4016 and $4017), reads of the PPU status register ($2002), and reads of VRAM/VROM data ($2007) if they happen to occur in the same cycle that the DMC unit pulls RDY low.

For the controller registers, this can cause an extra rising clock edge to occur, and thus shift an extra bit out. For the others, the PPU will see multiple reads, which will cause extra increments of the address latches, or clear the vblank flag.

This problem has been fixed on the 2A07 and PAL NES is exempt of this bug.
For reference, here's my CICOp routine:

Code:

   sty   CICOP_ADDR_EN   ; 8c [bank num] [bank table low] [bank table high]
   sta   (cicop_reg), y  ; 91 [ZP byte num] [x:regL] [x:regH] xx [x:wrL]
   stx   cicop_reg       ; 86 [ZP byte num] [x:wrH]
   ldx   CICOP_PORT      ; ae [addr L] [addr H] [x:rdL]
   ldy   CICOP_PORT      ; ac [addr L] [addr H] [x:rdH]
   lda   CICOP_PORT      ; ad [addr L] [addr H] [x:rdE]
The 6502 gets stalled on the next read cycle following the DMC pulling RDY low. And once the DMC is done with its fetch/stealing, the 6502 re-performs the read that was stalled. What I'm uncertain of is which of the reads actually gets caught by the 6502. I would guess the initial stalled read is executed, but the 6502 doesn't catch it. It's the second post-stall read that's actually caught (otherwise it probably wouldn't be done)..?

[edit: sorry, I had some opcode name and cycle number errors here.. Think they're fixed now. I realize there are some other cases I'm not detecting and differentiating between, but I think there's room to cover them]

If the stall were to occur on any of the read cycles during STA (T0-T4) or STX (T0-T1), the CICOp could sense the stall, as CPU R/W wouldn't be low during the expected write cycle (STA T5 and STX T2) due to the DMC stall & fetch. It would also be known that the CPU stalled for 4 cycles, so the CICOp could potentially insert that delay into its routine.

If the stall were to start on the final write cycle of STA, the write would occur normally. But the 3 cycle stall could be sensed, as CPU R/W wouldn't be low at the expected time for STX. It would be known that the CPU was stalled for 3 cycles, so similarly the CICOp could delay its routine by 3 cycles.
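
The 4-cycle and 3-cycle stall lengths in the two cases above fall out of a very small model of the speculated RDY behavior quoted earlier. This is only a sketch of that speculation, not verified hardware behavior: it assumes RDY is held low for exactly 4 cycles, that write cycles ignore RDY, that halted read cycles simply repeat, and that the DMC fetch always lands on the 4th cycle. The function name and the "stall"/"DMC" labels are made up here.

```python
def simulate_dmc_stall(cpu_cycles, rdy_low_at, hold=4):
    """Return the bus timeline when the DMC pulls RDY low at cycle rdy_low_at.

    cpu_cycles: the CPU's intended cycles, 'R' or 'W', in order.
    Assumed model: RDY stays low for `hold` cycles; writes proceed anyway,
    reads are halted, and the last cycle of the window is the DMC fetch
    (the CPU never writes more than 3 cycles in a row, so it is halted by then).
    """
    timeline = []
    i = 0  # index of the next intended CPU cycle
    t = 0  # actual bus cycle number
    while i < len(cpu_cycles):
        in_window = rdy_low_at <= t < rdy_low_at + hold
        if in_window and t == rdy_low_at + hold - 1:
            timeline.append("DMC")      # DMC drives the address bus and fetches
        elif in_window and cpu_cycles[i] == "R":
            timeline.append("stall")    # halted read, repeated later
        else:
            timeline.append(cpu_cycles[i])
            i += 1
        t += 1
    return timeline


# STA (zp),Y is 5 reads then a write; STX zp is 2 reads then a write.
routine = "RRRRRW" + "RRW"
for start in (2, 5):
    result = simulate_dmc_stall(routine, start)
    print(start, len(result) - len(routine), result)
```

With RDY going low on a read cycle (start=2) the routine is stretched by 4 cycles; with RDY going low on the final write of STA (start=5) the write still happens and only 3 extra cycles are added.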

If the stall were to start on the final write cycle of STX, the write would occur normally. But the opcode and address fetch of LDX wouldn't be present on the bus. The CICOp could sense this case by checking for the LDX opcode being present on the bus @ T0, provided the written data from STX didn't also equal '0xE'. T1 would continue as open bus, and the DMC would hijack the bus on T2. So the CICOp could stop itself/delay outputting data on the bus for LDX T3. This case also stalls the CPU by 3 cycles; the CICOp could try to delay itself, but needs to differentiate this case from the ones below that have a 4 cycle stall.

The final 3 LDX/LDY/LDA all have similar behavior. The case that really needs to be covered is if the stall were to start during T0 when the opcode was being fetched. In this case the DMC will hijack the bus on the last cycle (T3) when the CICOp is also planning on driving the bus. This condition could be detected by verifying low address of CICOP_PORT being on the bus for T2. The CICOp could recover by delaying 4 cycles.

If the stall were to start during T1/T2 of LDX/LDY/LDA we're mostly safe to output data as the DMC will hijack the bus after the CICOp drives data to a stalled CPU. A stall during T2 could be sensed by CICOP_PORT high addr (T3) not being present. To support that, A0-3 of CICOP_PORT needs to differ from A8-11 (and also not equal 0xE, 0xC, 0xD to differ from LDX/LDY/LDA opcode) which is simple enough with a chosen address of something like $5A05.
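
The address constraint above is easy to check mechanically. A minimal sketch, assuming (as the nibble values in this post suggest) that the CICOp only compares the low nibble of each byte it sees on the data bus; the function and constant names are hypothetical.

```python
# Low nibbles of the absolute-mode load opcodes: LDX=$AE, LDY=$AC, LDA=$AD.
LOAD_OPCODE_NIBBLES = {0xE, 0xC, 0xD}


def port_address_ok(addr):
    """True if addr satisfies the constraint described for CICOP_PORT.

    A0-A3 (low nibble of the low address byte) must differ from A8-A11
    (low nibble of the high address byte) and from the load opcode nibbles.
    """
    low_nibble = addr & 0xF
    high_byte_nibble = (addr >> 8) & 0xF
    return (low_nibble != high_byte_nibble
            and low_nibble not in LOAD_OPCODE_NIBBLES)


for candidate in (0x5A05, 0x5A0A, 0x5A0E):
    print(hex(candidate), port_address_ok(candidate))
```

$5A05 passes, while $5A0A (A0-3 equals A8-11) and $5A0E (looks like the LDX opcode nibble) do not.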

If the stall were to occur on T3, the final LDX/Y/A cycle, there isn't a great way to sense that until T0 of the subsequent LDY/LDA. In that case the opcode/operand wouldn't be present for the next load. This can be easily caught for the first two loads (LDX/LDY), and the CICOp routine delayed accordingly. But not for the final one, assuming I'm correct that the initial stalled read isn't caught and the subsequent second read is what's caught. This isn't necessarily an issue, as this final read is intended to be the verification read. So reading back open bus would simply be a false failure. However, we could define the 6502 instruction that follows the current routine to allow for detection of this condition so the final nibble can be delayed and resent.

Phew.. Well in the end doing all this doesn't seem too unreasonable given the number of nops currently in my CICOp isr. It certainly gets complicated quickly though..

Now I'm curious how hard it would be to detect and protect if the 6502 were interrupted mid routine. It certainly wouldn't be as easy to transparently recover as there's no telling how long the 6502 interrupt will last. The above proposals will probably cover an interrupt prior to completion of STA/STX with CPU R/W prior to latching data. But I'll have to check into this more; hopefully there's a way to detect this case well enough to keep the CICOp from outputting data on the bus when it shouldn't, thus preventing a crash. If that's possible, then one could set a flag to denote that a CICOp register "transfer is in progress" prior to this routine. Then the NMI/IRQ routine could update that flag to "current transfer failed". Once the 6502 returns to the CICOp routine that got interrupted, it would make a check at the end and reperform the transfer if "current transfer failed" was true.
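
The flag handshake can be modeled in a few lines. This is a host-side sketch of the idea only, not NES code: the three flag values, the class and function names, and the retry limit are all made up here for illustration.

```python
IDLE, IN_PROGRESS, FAILED = 0, 1, 2  # hypothetical flag values


class TransferState:
    def __init__(self):
        self.flag = IDLE


def interrupt_handler(state):
    """NMI/IRQ side: mark a transfer that was under way as failed."""
    if state.flag == IN_PROGRESS:
        state.flag = FAILED


def transfer_with_retry(state, do_transfer, max_tries=4):
    """Main-code side: redo the transfer until it finishes uninterrupted.

    Returns the number of attempts used. max_tries is an assumption added
    here so the sketch cannot loop forever.
    """
    for attempt in range(1, max_tries + 1):
        state.flag = IN_PROGRESS
        do_transfer()                  # an interrupt may fire during this call
        if state.flag == IN_PROGRESS:  # the handler never marked it failed
            state.flag = IDLE
            return attempt
    state.flag = IDLE
    raise RuntimeError("transfer kept getting interrupted")


# Usage: pretend an interrupt lands during the first attempt only.
state = TransferState()
attempts_seen = []


def flaky_transfer():
    attempts_seen.append(1)
    if len(attempts_seen) == 1:
        interrupt_handler(state)


print(transfer_with_retry(state, flaky_transfer))
```

The point of the design is that the interrupt handler only touches the flag, so it stays cheap; all the retry cost lands on the code that wanted the transfer.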
If you're gonna play the Game Boy, you gotta learn to play it right. -Kenny Rogers

Post by FrankenGraphics »

infiniteneslives wrote:For the CICOp and its set limitations I'll probably be limited to simpler waveforms (square, triangle, saw) anyway. If it does get a sine channel, it likely won't be more than one.
I don't think I'd sweat it to get a sine form in - unless it was routed to modulate another waveform (hardware automated modulation, yay!) and had a range dipping into the subsonic. On its own, it adds little: maybe some extra punch, low end for bass notes, organ overtone or air as a double/follow to another richer channel, or use it for drum synthesis where it'd fill a role on its own. But the question then becomes how one would "what you hear is what you get" it when composing. If always used in unison/octaves with another track or strictly used for kicks, toms or snare support, maybe one could live without hearing it between assembly/compile tests more easily.

Sines might also have a worse 'wanted sound' vs. digital artifacts ratio if the resolution is low, which limits its use somewhat further.

Basically, it has its distinct and possibility expanding uses, but they're somewhat limited if the choice stands between one more tri, saw* or pulse channel (potentially with finer pwm steps than the internal APU) on one hand, and a sine channel on the other.

*Saws are also less flexible than pulses when they're without a filter so in that regard they're a bit limited in use just like the sine, but at least the saw function is simple.

===

EDIT: More on saws... here's one idea on how to make them more versatile.

Assume each channel is a pulsewidth variable square channel. They all have a control bit. When 0, everything is normal. When 1, the wave will be normal when low, but output the remainder of a saw function when high.

With a wide enough pulse and the control bit set to 1, that's a saw or something very close by. Then by changing the pulse width, you could control the morph between the two: pulse and saw.
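
One way to read that idea, as a sketch: the details below are my assumptions, not something specified above. It assumes a rising saw over the whole period, the low part of the pulse coming first, phase and width normalized to 0..1, and output normalized to 0..1. The function name is hypothetical.

```python
def pulse_saw(phase, width, saw_mode):
    """One sample of the pulse/saw morph.

    phase:    position in the period, 0 <= phase < 1
    width:    fraction of the period the pulse is high, 0..1
    saw_mode: the control bit; False = plain pulse, True = saw while high
    Assumption: the pulse is low first, then high for the last `width`.
    """
    if phase < 1.0 - width:
        return 0.0                     # low part: identical in both modes
    return phase if saw_mode else 1.0  # high part: flat, or the rest of the saw


# Control bit set: widening the pulse reveals more and more of the saw.
for width in (0.25, 0.5, 1.0):
    print(width, [pulse_saw(p / 8, width, True) for p in range(8)])
```

At width 1.0 with the control bit set the output is the full saw, and narrower widths chop off the start of the ramp, which is the morph between pulse and saw described above.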

Post by tepples »

Or you could express saw, triangle, and pulse in terms of rise time, high time, fall time, and low time.

Saw: rise=0%, high=0%, fall=100%, low=0%
Triangle: rise=50%, high=0%, fall=50%, low=0%
Square: rise=0%, high=50%, fall=0%, low=50%
1/8 duty pulse: rise=0%, high=12%, fall=0%, low=88%
Approximate sine: rise=33%, high=17%, fall=33%, low=17%
Filter: Interpolate a waveform toward approximate sine
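
A minimal sketch of how the four-segment description above could be evaluated, assuming the segments are fractions of one period that sum to 1 and the output is normalized to 0..1. The "filter" is read here as a plain linear interpolation of the four parameters, which is my guess at the intent; all names are hypothetical.

```python
SAW = (0.0, 0.0, 1.0, 0.0)          # rise, high, fall, low
TRIANGLE = (0.5, 0.0, 0.5, 0.0)
SQUARE = (0.0, 0.5, 0.0, 0.5)
APPROX_SINE = (0.33, 0.17, 0.33, 0.17)


def segment_wave(phase, rise, high, fall, low):
    """Amplitude (0..1) at phase (0 <= phase < 1) of a rise/high/fall/low wave.

    `low` is implied by the other three but kept so calls mirror the table.
    """
    if phase < rise:
        return phase / rise            # ramp up
    phase -= rise
    if phase < high:
        return 1.0                     # hold high
    phase -= high
    if phase < fall:
        return 1.0 - phase / fall      # ramp down
    return 0.0                         # hold low


def interpolate(shape, target, amount):
    """Move each segment length from shape toward target (amount in 0..1)."""
    return tuple(a + (b - a) * amount for a, b in zip(shape, target))


print([segment_wave(p / 8, *TRIANGLE) for p in range(8)])
print(interpolate(SAW, APPROX_SINE, 0.5))
```

Zero-length segments are skipped naturally because each branch is only entered when the phase is inside a segment of nonzero length.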

Post by lidnariq »

lidnariq wrote:2kHz & 5kHz: [...] 2kHz & 4.9kHz: FastPWM
I realized that I had set up my test wrong. I wasn't windowing my FFTs, and 30720 Hz for the PWM modulator wasn't an integer factor of the FFT size.

When I redid the tests with other choices of PWM frequency that were integer multiples of 192000Hz÷131072, then no spurious emissions showed up that were obvious results of intermodulation, at all.
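
The bin-alignment condition can be checked exactly. This sketch reads 192000Hz as the sample rate and 131072 as the FFT size, which is an interpretation of the numbers above; the function names are made up. It only covers the alignment part, not the windowing.

```python
from fractions import Fraction

SAMPLE_RATE = 192_000  # Hz, read from the post as the analysis sample rate
FFT_SIZE = 131_072     # points, read from the post as the FFT length


def bin_index(freq_hz):
    """Exact (possibly fractional) FFT bin that freq_hz falls on."""
    return Fraction(freq_hz) * FFT_SIZE / SAMPLE_RATE


def is_bin_aligned(freq_hz):
    """True if freq_hz is an integer multiple of SAMPLE_RATE / FFT_SIZE."""
    return bin_index(freq_hz).denominator == 1


def nearest_aligned(freq_hz):
    """Closest frequency that sits exactly on a bin center."""
    return float(round(bin_index(freq_hz)) * Fraction(SAMPLE_RATE, FFT_SIZE))


for f in (30720, 3072, 3000):
    print(f, float(bin_index(f)), is_bin_aligned(f))
print(nearest_aligned(30720))
```

With these numbers 30720Hz lands between two bins, so without a window its energy leaks into neighboring bins and can look like spurious tones.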

Post by infiniteneslives »

Now I'm curious how hard it would be to detect and protect if the 6502 were interrupted mid routine.
A quick glance at this, and detecting that the 6502 hit an IRQ/NMI mid transfer routine looks rather simple. The easiest detection would be on T2 (cycle 3) of any instruction: CPU R/W will be low for the first push while the 6502 is processing its interrupt. STX ZP is the only instruction I currently have which also has CPU R/W low on T2. In the end absolute addressing would be fine and would delay CPU R/W of that instruction to T3; I only chose ZP addressing because it saved a cycle. The CICOp may need a little more time if it's making all of these checks anyway..
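
The T2 check can be laid out as a table of per-cycle bus directions. A small sketch using the standard 6502 cycle patterns for the instructions in the transfer routine; the table and function names are made up here.

```python
# Bus direction per cycle, T0 first: 'R' = read, 'W' = write.
RW_PATTERNS = {
    "STY abs":    "RRRW",
    "STA (zp),Y": "RRRRRW",
    "STX zp":     "RRW",
    "STX abs":    "RRRW",
    "LDX abs":    "RRRR",
    "LDY abs":    "RRRR",
    "LDA abs":    "RRRR",
}
# IRQ/NMI entry: two reads, three pushes (PCH, PCL, P), two vector reads.
INTERRUPT_SEQUENCE = "RRWWWRR"


def writes_on_t2(instructions):
    """Instructions whose own T2 is a write, i.e. look like an interrupt push."""
    return [name for name in instructions
            if len(RW_PATTERNS[name]) > 2 and RW_PATTERNS[name][2] == "W"]


routine = ["STY abs", "STA (zp),Y", "STX zp", "LDX abs", "LDY abs", "LDA abs"]
print(writes_on_t2(routine))

# Swapping STX to absolute addressing moves its write to T3.
routine_abs = ["STX abs" if name == "STX zp" else name for name in routine]
print(writes_on_t2(routine_abs))
```

Only STX zp collides with the interrupt's first push on T2; with STX abs no instruction in the routine writes on T2, so a write seen there would have to be an interrupt.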

I'm not sure what if anything is on the data bus during the first two interrupt cycles while internal operations are being performed. I presume the data bus is open and retaining data of the last executed instruction, detecting lack of fetched STX opcode and ZP operand would allow STX to keep ZP addressing.

While the CICOp could protect against the 6502 getting interrupted mid transfer routine, it couldn't recover/delay like the DMC case. The chances of a DMC collision should be relatively low. So to keep the CICOp transfer routine simpler I'll probably opt to require any interrupted transfers to be retried by use of flags in NES code as I described in my last post. Any features/protections added in the transfer routine will bloat the isr code on the CICOp, which may need to be doubled to support PAL. Still have to implement all that, but once done I'd expect the CICOp transfer routine to be bulletproof!

Thanks for the input on what may be desirable audio channel features guys. I can't speak too intelligibly on this front as I've yet to write any supporting code. But with the synth engine running each PWM update cycle (expecting every 1024 STM8 cycles), it'll be necessary to keep execution time low. I'm thinking that will favor more of the modal type settings as FrankenGraphics proposed, compared to what I expect to be calculation heavy settings of Tepples' proposal. But we'll see!

Thinking I'll call the CICOp register transfer routine good enough for now, and circle back to it later to support DMC/NMI/IRQ collisions. So next up I'm planning some basic synth & PWM tests with a single square wave. I'm concerned about what the exact output/mixing circuitry will require. If the output of the PWM DAC is too loud compared to the NES APU I may need an extra voltage divider resistor I'm thinking..? I'm also unsure about a series output DC decoupling cap. Or perhaps it'll be too quiet and require an opamp output buffer if the PWM DAC can't source enough current. Also unsure how much I can change the PWM DAC component values to help the situation. My experience level is low with these analog areas so I never trust myself until fully prototyped..

Aside from that I would also like to try out some basic scanline & CPU cycle counting before ordering the first batch of boards.
Memblers
Site Admin
Posts: 4044
Joined: Mon Sep 20, 2004 6:04 am
Location: Indianapolis

Post by Memblers »

I was gonna say about the audio stuff, is there any reason to not have the waveforms in RAM? It doesn't need a lot, you can get pretty good capability even with just 32 bytes. You can still have predefined waveforms in ROM if you wanted, then just copy it into RAM. From there, the NES program could modify them or upload entirely new ones. That's what my synth does.

Post by infiniteneslives »

Yeah putting a wave table in RAM would certainly be an option. Something like that shouldn't require much for calculations at run time. Modeling after the FDS wave table may also help out with composition tools.

The only reason against it might be the not so speedy interface of the CICOp registers, but the wave tables wouldn't have to be frequently updated to be useful.

Post by FrankenGraphics »

If it can handle writes at 60 Hz, it would be transparent compared to FDS.

A bit crazy software implementation idea:
Using the irq and any board with an updatable wavetable synth, one *could* update the waveform being used twice per frame, leading to very smooth (comparatively speaking) wave transitions. That's somewhat new sonic territory for the NES. PWM, saw-to-sinusoid "filters" and various other morphs would be more useful than otherwise. But you'd either have to sacrifice the irq for that or make sure it happens anyway at a point relatively opposite of the general music update during a frame's full cycle. If it varies a bit up and down the scanlines, it's not really a problem. It'd be pretty useful for emulating attack portions of various instruments, above all, but also for "synthy" synth sounds.

The extra time-domain granularity has to be inserted around export-time, mostly out of convenience.
One way is to keep doubles of the instruments and their macro strings, and change the most significant digit in the instrument row to the "hi-res" versions before exporting; discarding unused instruments in the process.

Post by infiniteneslives »

FrankenGraphics wrote:If it can handle writes at 60 Hz, it would be transparent compared to FDS.
Yeah it can easily handle multiple writes per frame. The bigger question is how much time the NES wants to spend making those writes. The current CICOp register transfer routine is 21 cycles for 1 byte written and read back verified, plus some amount of overhead for preparing for the write. I will likely have to limit the number of CICOp register writes to reserve enough processing resources for CIC operations, but several writes per frame should be fine. Once I have async CIC operations implemented I'll have a better idea of what the practical limitations will be.
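
To put the 21 cycles in perspective, here is a rough budget sketch. It assumes NTSC timing (about 29780.5 CPU cycles per frame) and leaves the unspecified setup overhead as a parameter that defaults to zero; the names are hypothetical, and the 32-byte case is the wavetable size mentioned earlier in the thread.

```python
CYCLES_PER_BYTE = 21             # from the post: one byte written and verified
NTSC_CYCLES_PER_FRAME = 29780.5  # approx. 2A03 CPU cycles per NTSC frame


def frame_share(num_bytes, per_byte_overhead=0):
    """Return (cycles, fraction of one NTSC frame) spent on num_bytes transfers.

    per_byte_overhead stands in for the setup cost the post leaves unspecified.
    """
    cycles = num_bytes * (CYCLES_PER_BYTE + per_byte_overhead)
    return cycles, cycles / NTSC_CYCLES_PER_FRAME


for count in (1, 8, 32):
    cycles, share = frame_share(count)
    print(f"{count:2d} bytes: {cycles:4d} cycles, {share:.2%} of a frame")
```

Before overhead, a full 32-byte table upload costs 672 cycles, a little over 2% of an NTSC frame, so the NES-side cost looks modest; the limit is more likely the CICOp's own processing time, as noted above.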