Wow, the INC $4011 idea is great. If you want 44 kHz sound like the MP3s (those are 16-bit though), then you would be doing this about once per scanline, with that optimization freeing up well over 2000 CPU cycles per frame.
Today I got the MIDI polyphony working (using some wacky on-the-fly channel assignment / note-stealing stuff), using 24 channels so it's up to the General MIDI standard. 4, 8, or 16 channels is more like what I was thinking of using on the NES version; 24 channels is a bit much.
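To illustrate the on-the-fly channel assignment / note-stealing idea mentioned above, here is a minimal Python sketch of one common approach: hand out an idle voice if there is one, otherwise steal the voice that has been sounding the longest. This is only an assumption about how such an allocator could work, not the actual firmware; the names VoiceAllocator, note_on, and note_off are made up for illustration, and a real synth might pick its victim differently (quietest voice, released notes first, and so on).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Voice:
    note: Optional[int] = None  # MIDI note number, None while the voice is idle
    channel: int = 0            # MIDI channel that owns the note
    age: int = 0                # allocator clock value at note-on


class VoiceAllocator:
    """Hypothetical fixed-size voice pool with oldest-note stealing."""

    def __init__(self, num_voices: int = 24) -> None:
        self.voices: List[Voice] = [Voice() for _ in range(num_voices)]
        self.clock = 0  # incremented on every note-on to order voices by age

    def note_on(self, channel: int, note: int) -> int:
        """Assign a voice to the note and return its index."""
        self.clock += 1
        for index, voice in enumerate(self.voices):
            if voice.note is None:
                break  # found an idle voice, use it
        else:
            # No idle voice: steal the one that started sounding first.
            index = min(range(len(self.voices)),
                        key=lambda k: self.voices[k].age)
        self.voices[index] = Voice(note, channel, self.clock)
        return index

    def note_off(self, channel: int, note: int) -> Optional[int]:
        """Free the voice playing this note; None if it was already stolen."""
        for index, voice in enumerate(self.voices):
            if voice.note == note and voice.channel == channel:
                voice.note = None
                return index
        return None


if __name__ == "__main__":
    pool = VoiceAllocator(num_voices=2)
    print(pool.note_on(0, 60))   # first idle voice
    print(pool.note_on(0, 64))   # second idle voice
    print(pool.note_on(0, 67))   # pool is full, so the oldest voice is stolen
    print(pool.note_off(0, 60))  # note 60 was stolen, nothing left to free
```

With a 4-, 8-, or 16-voice pool the same logic applies; only num_voices changes, and stealing simply happens more often.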
Bregalad: For the line-out I would suggest an adapter like this one (very cheap if you buy it at the right place). I think it should also be possible to pass through and mix the regular NES channels on the cart, but of course that still needs the aforementioned audio mod to the NES (or use a Famicom with a modded NES adapter for all these cases). I was also thinking it could be fun to hook the NES audio up to the PIC's ADC, and display the waveform while playing NSFs on the actual system, heheh.
koitsu: Back in 2004 or so I had started writing a sound synth to run on the NES, but realized I was quickly running out of CPU cycles and needed an IRQ. I was also designing a game that needed more than 8 KB of WRAM (and the ability to save), so I set out to design a board that would have everything I needed. So with it I'm (ab)using an MCU like a DSP, with FlashROM on board and lots of RAM. Then, since the MCU has all these hardware peripherals (especially the serial ports), I coded a small kind of API/BIOS thing so the NES can use the RS232/USB and stuff like that, right over the data bus. There are pictures of the various boards here, but that's all old stuff from 5 years ago and I've learned a lot since then, heheh. I've had a lot of help from kevtris too, on the original and this newer one, with solving and optimizing some of the main design issues.
This new design is still in its infancy though; there's not even a schematic yet, just a parts list and a pretty good idea of how it will all fit together. I'm trying not to get too freaked out about how much it will cost to have a batch produced, heheh.
tokumaru: There is the CPU cycle draining issue (the older Squeedo had to wait for the IRQ acknowledge to be handled, so it was possibly a little worse), but another issue is that using the IRQ for something else, like a scanline timer, would tie it up even more when checking the IRQ source. On the CHR side at least, there is one thing that helps out: the CHR (and perhaps nametable) switching can be automated to happen at a specified time, so no interrupt is needed for that.