All times are UTC - 7 hours





Post new topic Reply to topic  [ 13 posts ] 
Author Message
PostPosted: Sun Sep 13, 2015 2:10 am 
Joined: Sat Jul 12, 2014 3:04 pm
Posts: 950
Memblers wrote:
The OAM data shows up on the data bus, so a cart can interact with it. I think that would be a neat way to copy data to the mapper, but that's for another topic. :)
DMA theft? Sure, why not.

pseudocode for verilog
0. wait for enable/ 0b. wait for write of destination to mapper
1. on next write of #$xx to $4014 (requires all CPU address lines!! And data lines, since we want to spy.)...
2. for yy = 0...$ff
3. wait for CPU_ADDR = $xxyy [alternately, $2004]
4. copy CPU_DATA to [destination]+yy on cart
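The steps above can be sketched as a small behavioral model. This is Python rather than Verilog, purely so it can be run and checked; the class name, method names, and the idea of feeding it one bus cycle at a time are assumptions made for illustration, not anything from an actual mapper.

```python
class DmaSnoop:
    """Behavioral sketch of the DMA-theft steps (all names are hypothetical)."""

    OAM_DMA_REG = 0x4014

    def __init__(self, cart_ram):
        self.cart_ram = cart_ram  # bytearray standing in for on-cart memory
        self.dest = None          # destination offset, None while disabled
        self.page = None          # the #$xx value written to $4014
        self.index = 0            # next low byte yy that is expected

    def arm(self, dest):
        """Steps 0/0b: enable the snoop and latch the destination."""
        self.dest = dest
        self.page = None
        self.index = 0

    def bus_cycle(self, addr, data, is_write):
        """Feed one CPU bus cycle; returns True once all 256 bytes are copied."""
        if self.dest is None:
            return False
        if self.page is None:
            # Step 1: wait for the write of #$xx to $4014.
            if is_write and addr == self.OAM_DMA_REG:
                self.page = data
            return False
        # Steps 2-4: copy the byte seen on the read of $xxyy, yy = 0..$ff.
        if not is_write and addr == ((self.page << 8) | self.index):
            self.cart_ram[self.dest + self.index] = data
            self.index += 1
            if self.index == 0x100:
                self.dest = None  # disarm after a full page
                self.page = None
                return True
        return False
```

Matching on the read of $xxyy needs the full address bus; matching on the $2004 write instead is the alternative mentioned in step 3.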
===
Meanwhile, register spying would mainly require memory space, having already included all of the CPU address and data lines:
on read/write to 16'b001x_xxxx_xxxx_xyyy
--copy to memory
(If 2005/6, twiddle address latch; on 2002 read, clear it)
on read/write to 16'b0100_0000_000y_yyyy
--copy to memory

Then, make the copies accessible (on $4018? $4019? $4009? $400D?) after writing which register you want (there are only 30, but several have multiple modes: 2005/6 have two bytes, 4016/7 have reads and writes with different meanings, plus we might want the internal scroll variables or something).
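A rough behavioral sketch of the register-spying idea, again in Python rather than Verilog. The class and field names are hypothetical, and storing the two $2005/$2006 bytes separately is just one assumed way of handling the "two bytes" case mentioned above.

```python
class RegisterSpy:
    """Shadow copies of PPU and APU/IO register traffic (hypothetical names)."""

    def __init__(self):
        self.ppu = [0] * 8         # last byte seen at $2000-$2007 (and mirrors)
        self.apu_io = [0] * 32     # last byte seen at $4000-$401F
        self.second_write = False  # models the shared $2005/$2006 write toggle
        self.two_byte = {5: [0, 0], 6: [0, 0]}  # first/second byte of 2005/6

    def bus_cycle(self, addr, data, is_write):
        if addr & 0xE000 == 0x2000:        # 16'b001x_xxxx_xxxx_xyyy
            reg = addr & 0x0007
            self.ppu[reg] = data           # copy to memory
            if is_write and reg in (5, 6):
                # Twiddle the address latch on each 2005/6 write.
                self.two_byte[reg][int(self.second_write)] = data
                self.second_write = not self.second_write
            elif not is_write and reg == 2:
                # A 2002 read clears the latch.
                self.second_write = False
        elif addr & 0xFFE0 == 0x4000:      # 16'b0100_0000_000y_yyyy
            self.apu_io[addr & 0x001F] = data  # copy to memory
```

Reading the shadow copy back through a mapper port (the $4018-style idea) would sit on top of this and is not modeled here.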


PostPosted: Sun Sep 13, 2015 7:53 am 
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19353
Location: NE Indiana, USA (NTSC)
Myask wrote:
DMA theft [...] pseudocode for verilog
0. wait for enable/ 0b. wait for write of destination to mapper
1. on next write of #$xx to $4014 (requires all CPU address lines!! And data lines, since we want to spy.)...
2. for yy = 0...$ff
3. wait for CPU_ADDR = $xxyy [alternately, $2004]
4. copy CPU_DATA to [destination]+yy on cart

You don't need all pins. You just need M2, R/W, /ROMSEL, and A14-A12 to decode a $4xxx write and a $2xxx write.
  • The host program sets up mapper ports to receive the next DMA
  • The mapper decodes a write to $4000-$4FFF
  • If a write to $2000-$2FFF is not decoded within 8 cycles, cancel the request
  • The mapper decodes each of 256 writes to $2000-$2FFF
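The reduced-pin decode in the list above can be modeled roughly like this. It is a Python sketch with hypothetical names; it assumes the pins are sampled once per CPU cycle while M2 is high, that the 8-cycle timeout counts cycles after the $4xxx write, and that a cancelled request has to be re-armed by the program.

```python
class MinimalDmaCapture:
    """Sketch of a capture state machine using only R/W, /ROMSEL, A14-A12."""

    IDLE, ARMED, WAIT_2XXX, CAPTURE = range(4)
    TIMEOUT = 8  # cycles allowed between the $4xxx write and first $2xxx write

    def __init__(self):
        self.state = self.IDLE
        self.timer = 0
        self.count = 0

    def arm(self):
        """The host program sets up the mapper to receive the next DMA."""
        self.state = self.ARMED

    def m2_cycle(self, a14_12, romsel_n, rw):
        """One CPU cycle; returns True when this cycle's data should be latched."""
        # /ROMSEL high means the address is below $8000; rw == 0 is a write.
        is_write = romsel_n == 1 and rw == 0
        wr_4xxx = is_write and a14_12 == 0b100
        wr_2xxx = is_write and a14_12 == 0b010
        if self.state == self.ARMED:
            if wr_4xxx:
                self.state, self.timer = self.WAIT_2XXX, 0
        elif self.state == self.WAIT_2XXX:
            if wr_2xxx:
                self.state, self.count = self.CAPTURE, 1
                return True
            self.timer += 1
            if self.timer >= self.TIMEOUT:
                self.state = self.IDLE  # cancel the request
        elif self.state == self.CAPTURE:
            if wr_2xxx:
                self.count += 1
                if self.count == 256:
                    self.state = self.IDLE
                return True
        return False
```

Because only A14-A12 are decoded, any write to $4000-$4FFF starts the sequence in this model, not just a write to $4014.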


PostPosted: Sun Sep 13, 2015 11:51 am 
Joined: Sun Apr 13, 2008 11:12 am
Posts: 6535
Location: Seattle
I've been playing around with putting an Ethernet IC, Microchip's ENC624J600, in a NES cart. It has a bunch of indirect addresses that seem ideal as targets for hijacked OAM DMA. One of them, at the IC-local address 0x7E84 (a post-increment indirect memory access), isn't too hard to make overlap with $2004.

My real problem is that I haven't figured out what getting data rapidly into this IC is useful for, especially since the rest of the IC's 24 KiB of RAM can be memory-mapped, so it's not like one'd want to assemble an ethernet packet in NES-local RAM and then copy it in.


PostPosted: Sun Sep 20, 2015 2:21 pm 
Joined: Mon Feb 07, 2011 12:46 pm
Posts: 941
I have thought of DMA theft and register spying too, although there are still many more possibilities. Another one I have thought of is that the DMA is simply used to access a range of memory in sequence, which the mapper might somehow use.

PostPosted: Tue Sep 29, 2015 11:58 am 
Formerly 65024U

Joined: Sat Mar 27, 2010 12:57 pm
Posts: 2257
I could think of one: tons of tiles being written to CHR-RAM. That alone would be worth it for a few people, since it would allow for lots of smooth animation. CPU-side preparation for it during gameplay would be pretty simple, too. As long as the update happens in vblank, you could take control of the chip entirely without worrying about bus collisions. Would be interesting to see.


PostPosted: Tue Sep 29, 2015 12:45 pm 
Joined: Sun Apr 13, 2008 11:12 am
Posts: 6535
Location: Seattle
DMA to CHR-RAM either requires an FPGA that can entirely interpose the PPU's address bus, or using FPGA-internal block RAM.

And because the NES CPU is so slow, DMA isn't clearly better than just making a dual-ported RAM, unless you're blitting uncompressed tiles from PRG ROM.


PostPosted: Tue Sep 29, 2015 1:51 pm 
Joined: Sun May 27, 2012 8:43 pm
Posts: 1312
lidnariq wrote:
DMA to CHR-RAM either requires an FPGA that can entirely interpose the PPU's address bus, or using FPGA-internal block RAM.

And because the NES CPU is so slow, DMA isn't clearly better than just making a dual-ported RAM, unless you're blitting uncompressed tiles from PRG ROM.


Not only that - doesn't the 6502 demand bus mastering at all times? You'd have to pull it off of both busses, making the interposer just a little bigger, just so you can DMA with the PRG bus as a source.


PostPosted: Tue Sep 29, 2015 2:01 pm 
Joined: Sun Apr 13, 2008 11:12 am
Posts: 6535
Location: Seattle
This is (I think?) still assuming hijacking OAM DMA, so the 2A03 as bus master isn't really a problem...


PostPosted: Tue Sep 29, 2015 7:29 pm 
Joined: Sat Jul 12, 2014 3:04 pm
Posts: 950
lidnariq wrote:
DMA to CHR-RAM either requires an FPGA that can entirely interpose the PPU's address bus, or using FPGA-internal block RAM.
The NES drives PPU address lines during vblank when not accessing it? Huh. I was suggesting catching the data on the read half of the CPU's DMA cycle rather than the write half.

lidnariq wrote:
And because the NES CPU is so slow, DMA isn't clearly better than just making a dual-ported RAM, unless you're blitting uncompressed tiles from PRG ROM.

Other simple uses: blit to name/attribute tables, blitting initial data to [W]RAM, saving from RAM to WRAM.
Yes, a dual-ported RAM would also serve for CHR-RAM, and have the advantage of being usable outside of VBlank...but DMA is faster than what the CPU can do otherwise; the best a program can do is 8 cycles per byte (fully unrolled LDA abs / STA abs, 4 cycles each) to copy from one place to another. [You can set up LDA imm, but that's just moving the cost elsewhere.]
DMA is 2 cycles per byte, isn't it? That's a big savings if you aren't going for dual-ported RAM.
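To put numbers on that comparison, here is a small Python calculation. It assumes the standard 6502 timings of 4 cycles each for LDA absolute and STA absolute, and the usual OAM DMA length of 513 cycles (514 when it starts on an odd CPU cycle); the function names are made up for the example.

```python
LDA_ABS_CYCLES = 4  # LDA absolute
STA_ABS_CYCLES = 4  # STA absolute

def unrolled_copy_cycles(n_bytes):
    """Fully unrolled LDA abs / STA abs copy: 8 cycles per byte."""
    return n_bytes * (LDA_ABS_CYCLES + STA_ABS_CYCLES)

def oam_dma_cycles(starts_on_odd_cycle=False):
    """One OAM DMA: 256 read/write pairs plus 1 or 2 alignment cycles."""
    return 2 * 256 + (2 if starts_on_odd_cycle else 1)

cpu_copy = unrolled_copy_cycles(256)  # 2048 cycles
dma_copy = oam_dma_cycles()           # 513 cycles
```

So for a 256-byte page the DMA path costs roughly a quarter of the unrolled copy, not counting the STA $4014 that triggers it.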
Quote:
You don't need all pins. You just need M2, R/W, /ROMSEL, and A14-A12 to decode a $4xxx write and a $2xxx write.
That seems a little iffy and/or misfire-capable. Certainly wouldn't work if you wanted to map anything into $4xxx.


PostPosted: Tue Sep 29, 2015 8:17 pm 
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19353
Location: NE Indiana, USA (NTSC)
Myask wrote:
Quote:
You don't need all pins. You just need M2, R/W, /ROMSEL, and A14-A12 to decode a $4xxx write and a $2xxx write.

That seems a little iffy and/or misfire-capable. Certainly wouldn't work if you wanted to map anything into $4xxx.

If there are things mapped into $4xxx, then there are obviously more CPU address lines going into the mapper to decode "this is a mapper port, not an APU port".

I think the idea is that the program does SEI and writes a mapper port to specify the destination for the copy. This sets up a state machine inside the mapper with the following states:
  1. CPU writes a destination to the DMA destination port on the mapper
  2. CPU writes the source page (data bits 7-0, which become source address bits 15-8) to $4xxx
  3. Mapper waits for a CPU read where bits 7-4 of that value match !(PRG /CE) and A14-A12
  4. DMA is on
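The comparison in state 3 can be written out as a small Python predicate. The function name and argument layout are hypothetical, and it assumes /ROMSEL is sampled so that low means A15 is high.

```python
def source_read_matches(page, romsel_n, a14_12, rw):
    """True if this cycle is a CPU read from the 4 KiB region holding the source.

    page     -- the byte the program wrote to $4xxx (source address bits 15-8)
    romsel_n -- PRG /CE as seen by the mapper (0 when A15 is high)
    a14_12   -- CPU address lines A14-A12 as a 3-bit value
    rw       -- 1 for a read, 0 for a write
    """
    a15 = 0 if romsel_n else 1        # !(PRG /CE)
    top_nibble = (a15 << 3) | a14_12  # reconstructed A15-A12
    return rw == 1 and top_nibble == (page >> 4)
```

This only checks the top four address bits, so it identifies the region being read rather than the exact page; that seems to be the trade made for the smaller pin count.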


PostPosted: Tue Sep 29, 2015 8:20 pm 
Joined: Sun Apr 13, 2008 11:12 am
Posts: 6535
Location: Seattle
Yes, the PPU's address bus drivers never turn off. Why would they? That would increase complexity for no reason. And anyway, the PPU's bus is completely busy during rendering: every cycle is either dealing with the PPU's multiplexed address bus or actively transferring data over it.

Myask wrote:
Yes, a dual-ported RAM would also serve for CHR-RAM, and have the advantage of being usable outside of VBlank...but DMA is faster than what the CPU can do otherwise;
Because the PPU's address bus drivers never turn off, hijacking DMA requires something functionally equivalent to dual-ported RAM anyway.

DMA is really just another way of moving time costs around. Regardless of whether one prepares a buffer in RAM and transfers it using a slow indexed loop, or unrolled LDA $x / STA $y, or more aggressively LDA #imm / STA $y, or DMA, it's still just additional time cost on top of the original data setup. Dual-ported RAM is the logical extreme: "no copy" transfers, because the data is already where you want it to be.

Which is why I said that the only use for DMA in preference to dual-ported RAM is specifically DMAing uncompressed data from ROM ... or copying data from a coprocessor like on the SNES with the S-DD1.


PostPosted: Tue Sep 29, 2015 8:57 pm 
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19353
Location: NE Indiana, USA (NTSC)
Or if DMA can be done using fewer logic resources than dual-ported RAM that can be both written and read back. Some codecs refer to previous decompressed data, and they wouldn't work quite as well on a write-only pseudo-dual-port scheme that uses a FIFO to queue mapper writes to be committed to VRAM during the 14 dummy fetches on each line. The obvious example is LZ77-family codecs. The tile codec I've used in my recent projects is mostly RLE, but it does have a few commands involving back-references:
  • Plane 0 $82 repeats the previous 16-byte packet verbatim.
  • Plane 0 $83 repeats a tile from the previous half of the circular buffer. This is used when decoding pattern tables $0000 and $1000 in parallel, and it dramatically improves compression ratio in NROM games that use the background pattern table select bit to animate some tiles (such as many Shiru games).
  • Plane 1 $82 repeats the previous 8 bytes, which produces a tile with colors 0 and 3.
  • Plane 1 $83 repeats the previous 8 bytes XOR'd with $FF, which produces a tile with colors 1 and 2.
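The two plane 1 cases follow from how a 2bpp NES tile combines its planes: each pixel's color is the plane 0 bit plus twice the plane 1 bit. A short Python check, using an arbitrary made-up plane 0 pattern and hypothetical helper names (this is not the codec itself, only the resulting tile data):

```python
def decode_tile(tile16):
    """Return 8 rows of 8 color indices (0-3) from a 16-byte 2bpp tile."""
    rows = []
    for y in range(8):
        p0, p1 = tile16[y], tile16[y + 8]  # plane 0 row, plane 1 row
        rows.append([((p0 >> (7 - x)) & 1) | (((p1 >> (7 - x)) & 1) << 1)
                     for x in range(8)])
    return rows

def colors_used(tile16):
    """Set of color indices that appear anywhere in the tile."""
    return {c for row in decode_tile(tile16) for c in row}

# Arbitrary example pattern for plane 0 (an outline of a circle).
plane0 = bytes([0x3C, 0x42, 0x81, 0x81, 0x81, 0x81, 0x42, 0x3C])

repeated = plane0 + plane0                             # like plane 1 $82
inverted = plane0 + bytes(b ^ 0xFF for b in plane0)    # like plane 1 $83

colors_repeated = colors_used(repeated)  # only colors 0 and 3
colors_inverted = colors_used(inverted)  # only colors 1 and 2
```

When both planes are equal every pixel is 00 or 11, and when plane 1 is the complement every pixel is 01 or 10, which matches the color pairs listed above.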

Or if we want to shift updates to vblank to avoid tearing. People bring up tearing when someone mentions CHR HDMA on Game Boy Color.


PostPosted: Sat Oct 03, 2015 10:32 am 
Joined: Sat Jul 12, 2014 3:04 pm
Posts: 950
lidnariq wrote:
Yes, the PPU's address bus drivers never turn off. Why would they? That would increase complexity for no reason.

For no reason (read: I didn't think about it) I was thinking that the CPU would be controlling the PPU bus for a DMA. Which is, of course, the exact opposite of the point of DMA: something ELSE is directly accessing memory.

