All times are UTC - 7 hours

Posted: Thu Sep 13, 2007 1:33 pm
Joined: Thu May 03, 2007 3:07 pm
Posts: 155
How practical would it be to write a sound engine that writes to the APU registers twice per frame instead of once per frame? I think an engine like this would be interesting; it could add more depth to the APU channels, such as finer frequency changes, duty cycle changes for the square channels, or volume changes. For two updates per frame, a game could fire a mapper IRQ near the middle of the frame, or as close as it can get if there are split-screen effects going on. A game could also use a large status bar like Kirby's Adventure and do the mid-frame APU update there... Could 3 or 4 times per frame also be practical?

What if a game did its own manual sample mixing for $4011 and mixed two sound channels together once per frame? I think such sound channels would need to be split up into one-frame segments (50 or 60 Hz segments). With mapper IRQs firing every 2-3 scanlines or less often, a game can just pop a value from RAM and write it to $4011. One problem is that sample bytes would need to be skipped during split-screen effects and VBlank; alternatively, using a pseudo-extra channel timer during split-screen IRQs to keep writing to $4011 would complicate the IRQs and take time away from calculating the split-screen data. With conventional loops and indexes, mixing two 80-120 byte segments together would also be very time consuming and take up a lot of the frame, leaving less time to actually update $4011. Even completely unrolled code would need a lot of PRG space and would still be pretty time consuming, though not as bad. What are better ways to incorporate pseudo-extra channels with $4011?
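
To get a rough feel for the cost of the per-frame mix described above, here is a minimal sketch of a plain indexed loop that adds two sample segments into one output buffer. The names chan_a, chan_b, mix_buf and SEG_LEN are made up for illustration, and each input sample is assumed to stay in the range 0-63 so that the sum fits the 7-bit $4011 output level.
Code:
SEG_LEN = 100               ; hypothetical segment length in bytes (must be <= 128)

mix_frame:
    ldx #SEG_LEN-1          ; walk the buffers from the last byte down to index 0
mix_loop:
    lda chan_a,x            ; 4 cycles: sample from the first channel (0-63 assumed)
    clc                     ; 2 cycles
    adc chan_b,x            ; 4 cycles: add the second channel, sum stays in 0-126
    sta mix_buf,x           ; 5 cycles: store the mixed value for later $4011 writes
    dex                     ; 2 cycles
    bpl mix_loop            ; 3 cycles while the branch is taken
    rts

If the buffers do not cross a page boundary, this comes to about 20 cycles per byte, so roughly 2000 cycles for a 100-byte segment out of about 29780 CPU cycles in an NTSC frame. Volume scaling or resampling would cost considerably more.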

I was just wondering, but did games ever use similar methods as these - for sound effects or music? Even though it's not a good game (it's an LJN game), WWF King of the Ring seems to mix audience cheers and wrestler grunts together during gameplay, but the two sound effects seem to drown out each other.


Posted: Thu Sep 13, 2007 1:56 pm
Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7744
Location: Chexbres, VD, Switzerland
Quote:
How practical would it be to write a sound engine that wrote to the APU registers 2 times per frame instead of one time per frame?


I have already thought a bit about this. It can range from very simple to almost impossible, depending on the mapper used and the CPU usage of the rest of the game (not the sound code).
First, the problem is that to call the sound program 2 times (or more) a frame, you need a second reliable time base (the first being the VBlank NMI). I thought of using the APU frame IRQ for this (by waiting half a frame while booting and then starting the APU frame counter from there), but it would be slightly faster (or slower, I cannot remember) than VBlank and would eventually drift out of sync. There is no way of keeping it in sync, except by regularly sacrificing a frame to re-sync it (this could cause gaps in the gameplay every 16 frames or so, which could be tolerable in an RPG, maybe not in an action game). You could of course use a music engine that switches between normal mode and dual mode automatically. For example, in an RPG the music would be normal on the field but fast in battle (where the sound code is called twice a frame) to have more detailed sound effects, and the game would then skip a frame occasionally to re-sync the APU frame IRQ used to trigger the additional call of the sound code. Another problem is that on the PAL NES the frame IRQ is not just a little faster or a little slower than VBlank like on the NTSC NES, but a whole lot faster, so it would need to re-sync very often (maybe every 4-5 frames or so).

This is for a mapper that has no particular timebase IRQ. If you use the MMC3 (but do not use split-screen effects), it would be a lot easier to reliably trigger the music code two or more times per frame. The only real downside is that you will eat a lot more CPU time, as the sound program runs a couple of times per frame, so it must not be so slow that it screws up the rest of the program. Finally, it would be harder to use the IRQs for something else at the same time. Again, a game could use the IRQ for graphical effects in one place and for sound in another, depending on the game's needs. In a very standard action game with a status bar, it wouldn't be hard to use 2 IRQs per frame (at approximately 1/3 and 2/3 of the frame) and split the screen to the status bar on the second IRQ too (before calling the sound routine).
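
As an illustration of the MMC3 case, here is a minimal sketch that arms the scanline counter during VBlank and runs a second sound tick from the IRQ near the middle of the frame. It assumes rendering is enabled with the usual pattern-table layout so the counter clocks once per scanline; sound_update is a hypothetical routine, and the value 120 is only an example.
Code:
MMC3_IRQ_LATCH   = $C000
MMC3_IRQ_RELOAD  = $C001
MMC3_IRQ_DISABLE = $E000
MMC3_IRQ_ENABLE  = $E001

arm_midframe_irq:           ; call from the NMI handler, during VBlank
    lda #120                ; example: fire roughly half way down the picture
    sta MMC3_IRQ_LATCH
    sta MMC3_IRQ_RELOAD     ; value ignored: counter reloads on its next clock
    sta MMC3_IRQ_DISABLE    ; value ignored: clears any pending IRQ
    sta MMC3_IRQ_ENABLE     ; value ignored: IRQs on again
    rts

irq:
    pha
    txa
    pha
    tya
    pha
    sta MMC3_IRQ_DISABLE    ; acknowledge; stays off until re-armed next NMI
    jsr sound_update        ; hypothetical: second sound tick of this frame
    pla
    tay
    pla
    tax
    pla
    rti

The first sound tick would still be called from the NMI as usual, and the main program needs a CLI somewhere for the IRQ to be taken.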

Finally, on the MMC5 it would be even simpler than on the MMC3, as IRQs are absolute (and not relative), which would help a lot if you want to merge IRQs for graphics and for sound (you would need to write a small IRQ handler that arms the first one needed, then sets up the second when the first happens, and so on for the whole frame).

So in conclusion, this is possible with and without a special mapper, but it sure does complicate things. Unless you really want very detailed sound effects or special effects in your music, it's simpler to just do things the regular way. The hardware sweep registers can also change the pitch faster than once per frame (and combining this with changing the pitch every frame can create interesting effects).
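
For the sweep idea, a minimal sketch of a pulse 1 note whose pitch is bent by the hardware sweep unit, which is clocked twice per APU frame (about 120 Hz on NTSC in the default 4-step mode). The register values are just one example, not taken from any particular game.
Code:
    lda #$01
    sta $4015               ; enable pulse 1 (this also turns the other channels off)
    lda #%10111111          ; 50% duty, length counter halted, constant volume 15
    sta $4000
    lda #%10001100          ; sweep enabled, divider period 0, negate, shift count 4
    sta $4001
    lda #$FD                ; timer low byte: period $0FD, about 440 Hz on NTSC
    sta $4002
    lda #$00                ; timer high bits = 0, length counter load index 0
    sta $4003

With these settings the period shrinks by about 1/16 on every sweep clock, so the pitch rises in steps smaller than a frame until the sweep unit mutes the channel when the period drops below 8.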

Quote:
What if a game did its own manual sample mixing for $4011 and mixes two sound channels together once per frame? I think such sound channels would need to be split up in one frame segments (50 or 60 HZ segments)

It depends what you call "mixing". I mean even one single channel cannot be played unless the whole program is completely frozen. You *could* come up with a program that cuts all its tasks into very small timed pieces of code of a fixed CPU length, then manages to call those small tasks in a good order and write to $4011 between them, but this would be a real headache to handle (though not technically impossible).
If you want to mix 2 channels, in theory you would have to mix the equivalent of one sample in software, write to $4011 AND call a small piece of the rest of the code regularly, which is even more of a headache. If you just add the values of two samples together before feeding $4011, I don't think it changes much. However, if you want volume mixing, resampling and such things, I'd say forget about it unless you seriously overclock the NES.

With extra hardware, of course, things are different. I guess the Squeedo cart has a microcontroller that mixes the audio and then IRQs the main program, directly sending it data that the program just has to copy to $4011 regularly. This works, but eats a considerable percentage of CPU time.


Posted: Thu Sep 13, 2007 6:43 pm
Joined: Thu May 03, 2007 3:07 pm
Posts: 155
Bregalad wrote:
With extra hardware of course this make things differently

Would "extra hardware" include mappers with their own IRQs? Gauntlet II and King of the Ring used the MMC3's IRQs for driving $4011 during rendering only but they don't slow down. Gauntlet II updated $4011 every 3 scanlines, King of the Ring updated $4011 every 2 scanlines. King of the Ring seems to use pre-mixed sound samples for playing both cheers and grunts at the same time, as it just loads data from the PRG ROM.
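
For reference, a minimal sketch of what an MMC3-driven $4011 handler could look like. This is not taken from either game; sample_buf and sample_pos are hypothetical names, and the IRQ latch ($C000) is assumed to have been set to 2 beforehand so the counter fires every third scanline.
Code:
sample_irq:
    pha                     ; save A
    txa
    pha                     ; save X
    sta $E000               ; acknowledge the MMC3 IRQ (value written is ignored)
    sta $E001               ; re-enable it; the counter reloads from the latch by itself
    ldx sample_pos
    lda sample_buf,x        ; next 7-bit sample
    sta $4011               ; DMC output level
    inc sample_pos          ; 8-bit index, wraps after 256 bytes
    pla
    tax
    pla
    rti

Since the counter only clocks while the PPU is rendering, a handler like this goes quiet during VBlank, which matches the "during rendering only" behavior described above.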

Empire Strikes Back actually drives $4011 nearly every rendered scanline during gameplay without pausing or slowing down the game, but that might be because there are very few enemies. Ultimate Stuntman seems to drop sample bytes when the game engine needs the time and plays drum samples during the game's free time, as the drums' quality worsens when there are more enemies. That approach would not be good if too much is going on, but there's not much intense action in the game.

Would $4011 sound channel mixing be more practical in puzzle or text/graphic adventure games where there's usually not as much action?

EDIT: Changed "drives $4011 every scanline" to "drives $4011 nearly every rendered scanline" to be more accurate. Also added "during rendering" to "used the MMC3's IRQs for driving $4011" to reflect this.


Last edited by strangenesfreak on Fri Sep 14, 2007 9:11 am, edited 2 times in total.

Posted: Thu Sep 13, 2007 8:48 pm
Joined: Mon Sep 27, 2004 8:33 am
Posts: 3715
Location: Central Texas, USA
The DMC can be used to generate a mid-frame interrupt or generate many interrupts per frame (up to around 4 kHz, 66 per frame). By using a sample made up of $55 or $AA, you can have just a quiet square wave at high pitch.
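
A minimal sketch of that setup, assuming a single $55 byte has been placed at a 64-byte-aligned address (DMC_TICK_ADDR is a hypothetical location, and the handler name is made up too):
Code:
DMC_TICK_ADDR = $FFC0       ; hypothetical: one $55 byte stored here in PRG ROM

dmc_irq_init:
    lda #$40
    sta $4017               ; inhibit the APU frame IRQ so only the DMC raises IRQs
    lda #$8F                ; bit 7 = DMC IRQ enable, rate index $F (fastest)
    sta $4010
    lda #(DMC_TICK_ADDR - $C000) / 64
    sta $4012               ; sample address = $C000 + value * 64
    lda #$00
    sta $4013               ; sample length = value * 16 + 1 = 1 byte
    lda #$1F
    sta $4015               ; start the DMC, keep the other channels enabled
    cli
    rts

dmc_irq:
    pha
    lda #$1F
    sta $4015               ; clears the DMC IRQ flag and restarts the 1-byte sample
    ; per-tick work (e.g. a sound engine call every Nth interrupt) goes here
    pla
    rti

At rate $F one byte lasts 8 x 54 = 432 CPU cycles on NTSC, which is where the roughly 4 kHz upper limit comes from; longer samples or lower rate indexes give slower ticks.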

About the only use for sample mixing is playing more than one drum sample at once. Trying to play notes would probably result in crappy music like on the Game Boy Advance.


Posted: Thu Sep 13, 2007 8:58 pm
Joined: Wed Mar 22, 2006 8:00 am
Posts: 354
Bregalad wrote:
Another problem is that the PAL NES has the frame IRQ that is not a little faster or a little slower than the VBlank like the NTSC NES, but it is just a whole lot faster than the VBlank, so it would need to re-synch very often (maybe each 4-5 frames or so).

This isn't true. On both NTSC and PAL, frame IRQs occur at (approximately) the same rate that NMIs occur (60 Hz NTSC, 50 Hz PAL). Blargg verified this some time ago.

That said, I strongly discourage the use of frame IRQs (for any reason) because they aren't in sync with PPU timing.

_________________
"Last version was better," says Floyd. "More bugs. Bugs make game fun."


Posted: Fri Sep 14, 2007 8:40 am
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21644
Location: NE Indiana, USA (NTSC)
strangenesfreak wrote:
Empire Strikes Back actually drives $4011 every scanline during gameplay without pausing or slowing down the game, but that might be because there are very few enemies.

Which mapper is that? And how would anything drive $4011 during OAM DMA?

Quote:
Would $4011 sound channel mixing be more practical in puzzle or text/graphic adventure games where there's usually not as much action?

Possibly, but would $4011 mixing keep up with this kind of puzzle game or this kind of puzzle game?

Posted: Fri Sep 14, 2007 8:57 am
Joined: Thu May 03, 2007 3:07 pm
Posts: 155
tepples wrote:
Which mapper is that? And how would anything drive $4011 during OAM DMA?

That game uses MMC3, but it doesn't use its IRQs for $4011. Actually, it doesn't update $4011 during VBlank, sorry that I forgot to mention that. None of the games I mentioned update $4011 during VBlank, only when the screen's rendering.

Quote:
Possibly, but would $4011 mixing keep up with this kind of puzzle game or this kind of puzzle game?

I don't think $4011 mixing would work easily with complex puzzle games like Lumines, but it could work with simpler games like Tetris. A bit off topic, but speaking of that Tetris video, woah, that guy is FAST. :shock:


Posted: Fri Sep 14, 2007 10:53 am
Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7744
Location: Chexbres, VD, Switzerland
Quote:
Trying to play notes would probably result in crappy music like on the Game Boy Advance.

Except that the NES CPU is about 20 times less powerful than the GBA one. In fact I doubt it would be possible to play samples on the NES with resampling and so on without a co-processor on the cartridge, even if all action and screen updates are completely paused.

And of course it's possible to play samples AND keep action on the screen, HOWEVER the main program has to write to $4011 at mostly regular intervals, which is hard to do and would complicate the whole game engine. I guess fighting games need very little CPU time, leaving the rest of it for such things.


Posted: Fri Sep 14, 2007 11:31 am
Joined: Thu May 03, 2007 3:07 pm
Posts: 155
For $4011 mixing, assuming the gameplay can allow for it without too much difficulty, would it be plausible to mix drum beats (realistic or using sound waves) with simple wave channels (squares, triangles, saws, etc.) instead of realistic non-drum sound samples (guitar sounds, etc.)? Would simple waves sound better instead of realistic samples here?


Posted: Fri Sep 14, 2007 12:01 pm
Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7744
Location: Chexbres, VD, Switzerland
Drum beats should be doable, since it won't sound too bad if their pitch is slightly modulated. Simple wave channels or looped samples would sound bad anyway, because the pitch would be slightly modulated and sound fuzzy. Unless of course you can write to $4011 at EXACT intervals, which is almost impossible (unless you sacrifice a lot of VBlank time for this).


Posted: Fri Sep 14, 2007 5:47 pm
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21644
Location: NE Indiana, USA (NTSC)
It's probably safer just to use DPCM for drums (and possibly bass if you're feeling especially Sunsoftish) and the 2A03 tone generators for anything pitched.

Posted: Sat Sep 15, 2007 10:58 am
Joined: Thu May 03, 2007 3:07 pm
Posts: 155
If one wanted to play DPCM samples during intense split-screen effect IRQs, would it be practical to read from the same samples converted from DPCM to RAW and update $4011, then determine where to continue the DPCM sample? With this method, if it needs to be updated, DPCM would always be updated at the same time every frame. Would it be alright if DPCM was temporarily disabled and then the sample is continued?


Posted: Sat May 25, 2019 5:56 pm
Joined: Mon Jul 07, 2008 7:40 pm
Posts: 63
A very-much-necro post here because this thread is high on Google and I am currently doing more or less exactly what the OP is asking (with the original aim of a sound engine for EXCLUSIVELY music-demo use; no need to count precisely mid-frame or leave time for game code).

Tl;dr: *death-metal dial-up modem noises*

Using a fairly simple testbed that gives every register of every channel a looping queue of up to 8 values held, sub-frame, for literally a number of main-loop iterations rather than anything e.g. NMI-clocked,
- 1 "tick" is basically every 16th iteration of the main program loop, as its ongoing register index hits a given value
- anything less than about 64 "ticks" between updates on the square or triangle channels sounds like... death. (although since Triangle doesn't reset its step clock on register updates, you can achieve some really funky largely-uncontrolled modulation artifacts...)
- 64 "ticks" between register updates produces an end result acoustically close-enough to 1 update per frame at NMI-59.9Hz
-- I'm counting some 0x380 iterations of the loop per frame, so each of the 16 registers should be clocking through at about 0x38 (decimal 56) "ticks" per frame
-- indeed, the same Final Fantasy VI 6-tone block chord I'm testing with now was part of a frame-timed arpeggiation demo I did years ago, and I can definitely hear the top note of that arpeggio going by at about the same speed as the frame-clocked demo when holding for 64 ticks of the loop-clocked demo, barely-maybe going by at all when holding for 32 ticks of the loop-clocked demo, and getting utterly lost in the noise with any shorter wait times.

Results are consistent between Nestopia and a PowerPak in a vintage toploader.

Best guesses so far:
- the pAPU internally uses perhaps the Frame Counter to only "latch" values from the control registers on a 60Hz-or-so interval, so if you are literally writing mid-latch you get undefined garbage, otherwise you get whatever is stable once per frame or so.
AND/OR
- it takes CPU-clock-nontrivial time for the pAPU to adjust the line levels after noticing a register change, and interrupting this process leaves the frequency generators in very unhappy middle-ground states

Either way, I am disappoint.
I'll probably keep banging on it a bit more this weekend, but I suspect that the closest thing possible to CPU-driven mixing/multiplexing on NES is going to be on-the-fly DMC sample building, which is outside my current scope of experimentation.

Update:
Values closer to a clean factor of the de-facto number of ticks per frame do seem to work better.
I can push the Triangle and Square down to 42-tick holds (3/4 frame) with good results when I'm getting ~56 ticks/frame.
28 is dicey, 14 I thought I had working markedly better than 16, exclusively on square, but then I listened more closely in headphones and 14 is still garbage.
And of course, changing the hold time changes the number of times per frame that the loop has to do more heavy lifting, which in turn changes the average loop iterations per frame, which then throws the math off.

Using the same trick to try to mock-volume the triangle channel with a sub-frame-timed on/off duty cycle also does something passable (definitely not perceived as a drop in "volume" but at least a drop in sound intensity through the quite-obvious on/off pattern) at 42-tick holds. Much lower and you're back into the wonderful world of frequency aliasing as you keep interrupting the continuous step-waveform at odd points.

Experiments were generally intended to try to get on the NES what this guy has gotten out of hardware with only a single-channel square beep. Sadly, it seems this won't happen. https://soundcloud.com/mister_beep

_________________
Psych Software- failing to profit from retro/indie development since before it was cool
http://www.psychsoftware.org


Posted: Sun May 26, 2019 4:32 am
Joined: Sun Apr 13, 2008 11:12 am
Posts: 8626
Location: Seattle
LoneKiltedNinja wrote:
Best guesses so far:
- the pAPU internally uses perhaps the Frame Counter to only "latch" values from the control registers on a 60Hz-or-so interval, so if you are literally writing mid-latch you get undefined garbage, otherwise you get whatever is stable once per frame or so.
AND/OR
- it takes CPU-clock-nontrivial time for the pAPU to adjust the line levels after noticing a register change, and interrupting this process leaves the frequency generators in very unhappy middle-ground states
Both of those are definitely untrue. Multiplexing of audio channels will produce unpleasant FM sounds, like you seem to be hearing, unless you multiplex at ultrasonic rates. (e.g. the N163 emits a new sample every 15 CPU cycles, for a net mixing rate around 15-30kHz).

The various ZX Spectrum channel multiplexing examples I've been able to find need sample rates closer to 8 kHz, perhaps higher.


Posted: Sun May 26, 2019 9:41 am
Joined: Thu Apr 18, 2019 9:13 am
Posts: 161
lidnariq wrote:
The various ZX spectrum channel multiplexing examples I've been able to find need sample rates closer to 8kHz, perhaps higher.


The BTP2 music player I used in Stella's Stocking for the 2600 used a rate of 15.75 kHz (one sample per scan line). If I were designing a CPLD-based mapper chip, I'd be inclined to include an option to generate a scan line interrupt with master-clock cycles, with a "bump cycle" strobe that code could use to establish initial sync, and could then hit once every other frame on NTSC to maintain it. If I were trying to do a "period correct" discrete hardware version on the cheap and didn't need any raster splits or complex graphics but wanted nice audio playback, I might use a 555 timer simply wired to the IRQ.

Having an IRQ which was reliably triggered every scan line wouldn't be as nice for raster splits as one that was programmable, but a minimal-length IRQ:
Code:
    dec ctr
    beq ready
    rti

would add 20 (7 for the IRQ, plus 5+2+6) cycles overhead every 113 cycles when it wasn't doing anything. Annoying, but hardly a showstopper. Code in vblank could run with interrupts disabled if it kept track of how many lines would be skipped, or it could enable interrupts if it could deal with the extra delays.

A minimal audio-playback IRQ for audio that was buffered in the mainline twice per frame would be something like (running from ZP RAM)
Code:
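    sta irq_reload_a+1      ; save A into the operand of the final LDA
irq_load_data:
    lda buff                ; operand low byte is bumped after each sample
    sta $4011               ; write the sample to the DMC output level
    inc irq_load_data+1     ; advance to the next buffer byte
irq_reload_a:
    lda #00                 ; operand patched on entry; restores A
    rti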

From zero-page RAM, that would cost 7+3+4+4+5+2+6, i.e. 10+13+8 or 31 cycles. Not too bad, save for the necessity of putting the data into the buffer first. Adding an extra 2 cycles to the common case (and 8 to rare cases) would allow for buffers that go beyond 256 bytes.

A four-voice audio-playback IRQ which generated samples individually might be something like:
Code:
irq:
    jmp irq_handler ; Patchable JMP instruction in ZP RAM
typicalHandler:
    sta IRQ_A
    sty IRQ_Y               ; Y (not X) is the register used and restored below
    lda #<nextHandler
    sta irq+1
    ; clc -- skip if all amplitudes are even, and anti-distort table compensates
    ldy phase0
    lda (phase0l),y
    sta phase0
    lda (phase0a),y
    ldy phase1
    adc (phase1b),y
    ldy phase2
    adc (phase2c),y
    ldy phase3
    adc (phase3d),y
    tay
    lda antidistort,y
    sta $4011
    ldy IRQ_Y
    lda IRQ_A
    rti

The BTP2 music driver used 46 cycles every 76 to generate data then and there for four-voice audio; here we have more cycles/scan line, but add in IRQ support, so we end up with:
Code:
25 : 7+6+6+6 -- interrupt enter/return and register save/restore
 8 : 3+2+3 -- The jmp and then the update of the next jump
40 : Five pairs of a 3-cycle ldy (or in one case sta) with a 5-cycle (zp),y
10 : TAY (2), distortion correction lookup (4), and final store (4); add 2 more if the CLC is needed

A total of 83 cycles/scan line out of 113. Rather high loading, but still practical for a low-action game. The code would also have a significant zero-page burden of 20 pointers, five phases, and the jump vector.

It might be possible to improve efficiency by having the IRQ handler output a previously-generated sample, generate data for the next sample, store that, start generating data for the sample after it, take a break at just the right time to output the one it had just generated, finish generating the following one, and store it in the spot needed for the handler to output it next time. This would add an extra pair of store/reload operations since samples would be stored in advance of need, but would eliminate 25 cycles' worth of interrupt entry/exit, for a net win of 13 cycles every two scan lines, which is pretty huge at this level of CPU loading.

That sort of approach might even make such a driver practical using DMC interrupts alone, if one tolerates a little extra noise from the DMC going up and down at its own speed. The most practical way to do that would probably be to set the DMC to its fastest rate (one bit every 54 cycles, so one byte every 432 cycles) and handle three samples every interrupt, with time-padding code between them (an average of 144 cycles per sample). A sample rate a bit below the BTP music driver, but probably still decent. If the IRQ handler starts and ends with (instructions in ZP RAM)
Code:
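    sta irq_reload_a+1      ; save A into the operand of the final LDA
    lda #outputValue        ; operand holds the first sample to output
    sta $4011
    jmp irq_service_main    ; Get out of ZP RAM
irq_out_and_reload_a:
    sta $4011               ; last of the three sample writes
irq_reload_a:
    lda #savedVal           ; operand patched on entry; restores A
    rti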

the last store to $4011 should happen 288 cycles after the first, leaving a minimum IRQ time of 7+3+2+288+4+2+6 = 12+288+12 = 312 cycles every 432. In addition, the DMC would steal around 20 cycles within the IRQ handler and 12 afterward. The 288 cycles in the middle would need to generate data for three samples (which would be doable for three voices if not for four). My big concern would be that the DMC can't be silenced by using all-zero data if code wants to feed audio to $4011.

