I had a design for a PPU with its own instruction set, which it executes during hblank and vblank (the program always restarts from the beginning at each vblank). The PPU is clock-interleaved with the CPU, so that the two can share memory without conflicts.
In my ideas for another computer, the "video instruction set" was as follows:
- 0: JMP: Jump to the operand's address. (If the operand is immediate, treats it as an instruction.)
- 1: WAI: Same as JMP but wait for the next hblank before continuing.
- 2: ROL: Rotate left operand value through carry, and store result in A.
- 3: ROR: Rotate right operand value through carry, and store result in A.
- 4: ADC: Add operand value with carry to A.
- 5: CMC: Complement carry flag; operand is read but ignored.
- 6: SBC: Subtract operand value with carry from A.
- 7: CMP: Subtract operand value from A (without carry); update carry flag but do not update A.
- 8: AND: Bitwise AND operand with A and store in A.
- 9: ORA: Bitwise OR operand with A and store in A.
- 10: EOR: Bitwise XOR operand with A and store in A.
- 11: SAR: Write value from A into video register specified by operand. (This includes the playfield address, palette, modes, and other stuff.)
- 12: LDA: Load operand value into A.
- 13: LDB: Load operand value into B.
- 14: STA: Store A into operand. (You can write to an immediate value.)
- 15: STB: Store B into operand.
The video instruction set also has one flag (carry) and two 8-bit accumulator registers (A and B). Each instruction also has an addressing mode (A, B, 8-bit immediate, or 16-bit absolute) and a condition code (never, if carry clear, if carry set, or always). Note also that this allows the tile height to be any number, because it is programmed in software.
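To make the instruction format a bit more concrete, here is a rough C sketch of how one instruction could be decoded. The bit layout is purely my assumption for illustration: 16 opcodes, 4 addressing modes and 4 condition codes happen to fit in one byte (4 + 2 + 2 bits), but nothing above fixes the actual encoding. The names `vdecode`, `vcond_holds`, `voperand_bytes` and `vrol` are hypothetical too.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED encoding (not specified in the post): one header byte per
   instruction, bits 7..4 = opcode, bits 3..2 = addressing mode,
   bits 1..0 = condition. The numeric values of the modes and
   conditions simply follow the order in which they are listed. */
enum vmode { MODE_A, MODE_B, MODE_IMM8, MODE_ABS16 };
enum vcond { COND_NEVER, COND_CARRY_CLEAR, COND_CARRY_SET, COND_ALWAYS };

struct vinsn {
    uint8_t opcode;   /* 0..15, numbered as in the list above */
    enum vmode mode;
    enum vcond cond;
};

/* Split a header byte into its three fields. */
struct vinsn vdecode(uint8_t byte)
{
    struct vinsn in;
    in.opcode = byte >> 4;
    in.mode = (enum vmode)((byte >> 2) & 3);
    in.cond = (enum vcond)(byte & 3);
    return in;
}

/* Should an instruction with this condition code execute? */
bool vcond_holds(enum vcond cond, bool carry)
{
    switch (cond) {
    case COND_NEVER:       return false;
    case COND_CARRY_CLEAR: return !carry;
    case COND_CARRY_SET:   return carry;
    default:               return true;   /* COND_ALWAYS */
    }
}

/* Number of operand bytes that follow the header byte. */
unsigned voperand_bytes(enum vmode mode)
{
    switch (mode) {
    case MODE_IMM8:  return 1;
    case MODE_ABS16: return 2;
    default:         return 0;   /* A and B need no extra bytes */
    }
}

/* ROL: rotate left through carry; the caller stores the result in A. */
uint8_t vrol(uint8_t value, bool *carry)
{
    uint8_t result = (uint8_t)((value << 1) | (*carry ? 1 : 0));
    *carry = (value & 0x80) != 0;
    return result;
}
```

Under this assumed layout, the byte 0xCB would decode as LDA with an 8-bit immediate operand and the "always" condition. A different packing (or a wider instruction word) would work just as well.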
Note that the CPU would have its own instruction set, not the above.
For audio, I have a few ideas. You could have different channels that behave differently, maybe some like SID, some like Game Boy, etc. One idea I had is an "IDFX" channel. It has four 4-bit parameters, called I, D, F, and X (in addition to the period, which is separate). If the phase is p and the output volume is v (0 to 15), then (as C code):
v=(((p&I)^(p<D?15:0))&F)^X
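Wrapped into a function, the same formula might look like the sketch below. I am assuming here that the phase p is also a 4-bit counter (0 to 15), which is not stated above; the names `idfx_sample` and `idfx_period` are just made up for the example.

```c
#include <stdint.h>

/* One IDFX output sample. All arguments are masked to 4 bits.
   ASSUMPTION: the phase p runs from 0 to 15 like the parameters. */
uint8_t idfx_sample(uint8_t p, uint8_t i, uint8_t d, uint8_t f, uint8_t x)
{
    p &= 15; i &= 15; d &= 15; f &= 15; x &= 15;
    return (uint8_t)((((p & i) ^ (p < d ? 15 : 0)) & f) ^ x);
}

/* Fill out[0..15] with one full period of the waveform. */
void idfx_period(uint8_t i, uint8_t d, uint8_t f, uint8_t x, uint8_t out[16])
{
    for (uint8_t p = 0; p < 16; p++)
        out[p] = idfx_sample(p, i, d, f, x);
}
```

With this, I=15, D=0, F=15, X=0 gives a rising sawtooth (v equals p), and setting X=15 instead inverts it. I=0, D=8, F=15, X=0 gives a square wave that is 15 for the first half of the period and 0 for the second, so D acts as the duty cycle. I=15, D=8, F=15, X=0 gives a V-shaped triangle (15 down to 8, then 8 up to 15).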
You could also have additional RAM in the cartridge if 4K is insufficient, as is possible on the NES/Famicom.
I also think there should be both a keyboard and game controls, and possibly even a built-in Forth that runs when no cartridge is present (you may then save the program on a tape if you want to keep it; usually you would use ROM cartridges though, and the Forth is just something extra).
You may vary these things; all of the above are just ideas, and you can do things differently if you want.