Porting mmc5 PPU cycle counter from mister_nes

Re: Porting mmc5 PPU cycle counter from mister_nes

Post by Ben Boldt »

lidnariq wrote: Sun Feb 21, 2021 6:19 pm Fair enough. It would be awfully weird if they'd bother to store 13 bits of address when the PPU is only going to be using 12 of them.
I don't follow what you mean -- which bit(s) doesn't the PPU use? My memory isn't that great but I think I remember testing each and every PPU address bit A0-A13 (total 14 bits) to see if each mattered for the matching address and I found that they all did matter to the MMC5.

Post by lidnariq »

Ben Boldt wrote: Sun Feb 21, 2021 8:28 pm I don't follow what you mean -- which bit(s) doesn't the PPU use? My memory isn't that great but I think I remember testing each and every PPU address bit A0-A13 (total 14 bits) to see if each mattered for the matching address and I found that they all did matter to the MMC5.
A13 must be 1 and A12 must be 0; it's not that A13 and A12 must be the same as in the previous two fetches.

Post by Ben Boldt »

Oh I see what you are saying now. Either it keeps enforcing A13=1, A12=0 and just compares A0-A11, or it only enforces it the first time and then has to compare all 14 bits (which would produce the same function). And you are saying that the former is more likely to be how it is actually implemented inside the chip. I totally agree with that. I think that latter is simpler to explain and probably more efficient in an emulator so we might want to go with that but I do not disagree with your point.

Post by lidnariq »

Ben Boldt wrote: Sun Feb 21, 2021 8:41 pm you are saying that the former is more likely to be how it is actually implemented inside the chip.
Not only more likely; that was the entire point of what krzysiobal said:
krzysiobal wrote: Tue Oct 02, 2018 3:34 pm No, you are right, only $2000-$2fff can set it. I haven't checked that before with so many details.

Post by Ben Boldt »

I'm sorry I still don't follow how we can know any difference beyond just likelihood, maybe I am not understanding what you are trying to say.

I see these 2 scenarios about the 3 consecutive reads:
  • PPU reads from a nametable (A13=1, A12=0), then reads from the same 14-bit address 2 more times
  • PPU reads from a nametable (A13=1, A12=0), then reads from the same bottom 12 bits 2 more times with (A13=1, A12=0)
That seems indistinguishable to me (??) I think I missed something. It is probably right in front of me and I don't get it...
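
To make the two readings concrete, here is a minimal Verilog sketch that evaluates both match rules side by side. The module and signal names are made up for illustration and are not taken from the MMC5 or from mister_nes.

```verilog
// Hypothetical illustration only: names are not from the MMC5 or any core.
// Compares a later fetch address against the first fetch of a candidate run.
module nt_match_compare (
    input  wire [13:0] first_addr,   // first fetch of the run
    input  wire [13:0] next_addr,    // second or third fetch of the run
    output wire        match_full,   // scenario 1: same 14-bit address
    output wire        match_masked  // scenario 2: A13=1, A12=0 and same A0-A11
);
    // Both scenarios require the first fetch to be in $2000-$2FFF.
    wire first_is_nt = (first_addr[13:12] == 2'b10);
    wire next_is_nt  = (next_addr[13:12]  == 2'b10);

    // Scenario 1: remember and compare all 14 bits.
    assign match_full = first_is_nt && (next_addr == first_addr);

    // Scenario 2: enforce the fixed upper bits every time, compare only 12.
    assign match_masked = first_is_nt && next_is_nt &&
                          (next_addr[11:0] == first_addr[11:0]);
endmodule
```

For every input pair the two outputs agree: if the first fetch has A13=1, A12=0 and all 14 bits match, the later fetch necessarily has A13=1, A12=0 too, and the reverse holds as well. So the two schemes cannot be told apart from the pins; the only difference is that the second one needs 12 latched bits instead of 14.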

Edit:
I also can't see how what krzysiobal said indicates a choice between these two. I think we're on different wavelengths on this somehow, we are using the same words but we mean something different.

Post by aquasnake »

lidnariq wrote: Sun Feb 21, 2021 8:22 pm
aquasnake wrote: Sun Feb 21, 2021 9:12 am The remaining key graphics technologies of mmc5 must be solved to emulate:
1. 8x16 spr mode
2. vsplit mode
What's hard about these? We know the MMC5 must keep track of the current sliver # for the scanline, so it can just return (name+pattern table #1) for the first N, (name+pattern table #2) for the next 34-N, and (pattern table #3) for the final 8 sliver fetches. The only obnoxious part is that the "stutter" that the MMC5 looks for is between background sliver fetches #2 and #3. But that doesn't matter if you are breaking the abstraction between the emulated PPU and the emulated MMC5.
3. ex-attr mode
The MMC5 has to be doing the fetch from the internal EXRAM from the same address and same time as the nametable fetch is happening. After that, the rest is straight-forward: it has to route the relevant bits from that extra 8 bits of nametable to the PPU's data bus during the attribute fetch, and to the CHR's address bus during the pattern fetches.
For ex-attr mode it needs to generate the nametable dynamically and switch the pattern bank dynamically (with 4K granularity per single-screen bg tile). The former needs to cheat the ppu_data lines, which also needs a hardware redesign to add 16 IOs to isolate the bidirectional data bus. That exceeds the IO count of the QFP-100 package, so it should be the lowest-priority MMC5 function to support.
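
As a rough sketch of the routing lidnariq describes in the quote above: this assumes the commonly documented ExRAM byte layout (bits 7-6 = palette, bits 5-0 = 4K CHR bank), which is not stated in this thread, and the module and signal names are made up.

```verilog
// Illustration only: names are hypothetical, and the bit layout is assumed
// from the commonly documented ExRAM format, not taken from this thread.
module exattr_route (
    input  wire [7:0]  exram_byte,   // ExRAM byte read alongside the nametable fetch
    input  wire [11:0] ppu_a_low,    // PPU A11..A0 during a pattern fetch
    output wire [7:0]  attr_byte,    // value to put on the PPU data bus for the attribute fetch
    output wire [17:0] chr_addr      // CHR address for the pattern fetches
);
    // Repeat the 2-bit palette in all four quadrants so the PPU picks the same
    // palette no matter which quadrant of the attribute byte it selects.
    assign attr_byte = {4{exram_byte[7:6]}};

    // The low 6 bits select a 4K bank; the PPU supplies the offset inside it.
    assign chr_addr = {exram_byte[5:0], ppu_a_low};
endmodule
```

Driving attr_byte onto the PPU data bus is the part that needs the extra bus isolation; the chr_addr path only touches the CHR address lines.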

Post by lidnariq »

Just to be clear: I'm not in any way disagreeing with you. I just forgot about A12 being involved in the mask. And was bad at reading you mentioning A12 in this post just a few moments prior: viewtopic.php?p=265486#p265486

After that, it was confusion about definitions of "use" and "latch". We know it "uses" all the address lines. But "use" is different from "latch". The 2A03 and 2C02 dice show they mostly remembered to remove silicon that was always in a fixed state (compare visual2a03 pcm_lc0 through pcm_lc3 to higher bits pcm_lc4 through pcm_lc11), although they didn't always (pcm_a14 exists).

Post by Ben Boldt »

OK that makes sense lidnariq.

Sorry I hijacked your thread aquasnake, that got away from me a little.

Post by lidnariq »

aquasnake wrote: Sun Feb 21, 2021 9:10 pm For ex-attr mode it needs to generate nametable dynamically and switch pattern bank dynamically (with 4K granularity of single screen bg tile). The former needs to cheat ppu_data line, which also needs to redesign the hardware, to add 16 IOs to isolate the bi-directional data bus. That has exceeded the IO usage of qfp-100 package
So it's a resource utilization problem instead of an implementation problem? Getting data out of the emulated EXRAM to the PPU when it's needed?

It looks like 8x16 sprites should work fine despite that constraint? Because there you just have to use PPUA[10..12] during the 8 sprite slivers to select the bank. Right?

Post by aquasnake »

Ben Boldt wrote: Sun Feb 21, 2021 9:21 pm OK that makes sense lidnariq.

Sorry I hijacked your thread aquasnake, that got away from me a little.
Any detail involved is welcome :)
Last edited by aquasnake on Tue Feb 23, 2021 10:54 pm, edited 1 time in total.

Post by aquasnake »

lidnariq wrote: Sun Feb 21, 2021 9:23 pm
aquasnake wrote: Sun Feb 21, 2021 9:10 pm For ex-attr mode it needs to generate nametable dynamically and switch pattern bank dynamically (with 4K granularity of single screen bg tile). The former needs to cheat ppu_data line, which also needs to redesign the hardware, to add 16 IOs to isolate the bi-directional data bus. That has exceeded the IO usage of qfp-100 package
So it's a resource utilization problem instead of an implementation problem? Getting data out of the emulated EXRAM to the PPU when it's needed?

It looks like 8x16 sprites should work fine despite that constraint? Because there you just have to use PPUA[10..12] during the 8 sprite slivers to select the bank. Right?

Code:

				-SPR Mode-  -PRG Mode-  -CHR Mode-  -Ext RAM Mode-   -Name----------------------
				8x16        4x8         8x1         ex-attr*       - Just Breed
It seems that Just Breed uses both of them at the same time, which should not be a problem. Ex-attr only affects reads of the single-screen attribute table ($23C0-$23FF); during pattern table fetches, or writes to the 3rd nametable, the ppu_data bus is passed through with no remapping.

Post by aquasnake »

Ben Boldt wrote: Sun Feb 21, 2021 8:41 pm Oh I see what you are saying now. Either it keeps enforcing A13=1, A12=0 and just compares A0-A11, or it only enforces it the first time and then has to compare all 14 bits (which would produce the same function). And you are saying that the former is more likely to be how it is actually implemented inside the chip. I totally agree with that. I think that latter is simpler to explain and probably more efficient in an emulator so we might want to go with that but I do not disagree with your point.
I think just detecting 3 consecutive read operations (including 2 virtual reads) while ppu_addr[13:12] == 2'b10 is enough to mark the start of a new scanline. It is not necessary to monitor all PPU address lines.

Post by aquasnake »

krzysiobal wrote: Sun Nov 25, 2018 4:58 pm Ok, I did further research into how MMC5 detects 8x8/8x16 sprites and on which scanline cycles it uses $5120-$5127 and on which $5128-$512B. For this, I made a special test case for KrzysioKazzo that simulates CPU/PPU cycles (after each PPU cycle there is a CPU cycle, so the MMC5 never thinks that a reset state occurred).

I set it into CHR 8x1k mode, write $FF to $5120-$5127 (sprites) and $00 to $5128-$512b (tiles) and observe CHR-A10 so that it is easy to distinguish, when it uses sprite/tiles banks.

1. MMC5 switches to 8x16 mode when
* $2000.5 is written with 1
AND
* at least one of those bits ($2001.3 or $2001.4) is written with 1
MMC5 sniffs writes to $2000/$2001 (it only checks for those two addresses, no mirrors are taken into account).

2. I will count the PPU reads in every scanline as #1, #2, #3, .. #170 (so that cycle 0->idle, cycles 1-2 -> #1, cycles 3-4 -> #2, .., 339-340 -> #170)

3. During the pre-render scanline it never uses sprite banks (logical, because it needs three consecutive reads from $2000-$2fff to detect a scanline)

4. During the scanline 0, it uses sprite banks for reads: #2, #3, #4 and #130-#161
For further scanlines - only #130-#161
Image
Image
Note that CHR-A10 changes on both edges of PPU_!RD, which might reveal how the MMC5 PPU cycle counter is implemented (if I did that in VHDL, everything would change on the falling edge)

5. The counter does not just count edges of PPU_!RD; it also looks at the addresses. So if the PPU fetches from $0000, $0001, $0002, $0003, etc., it will not work. There must be some more logic underneath that.

6. The counter won't count passing scanlines - if the frame has even 400 scanlines, the above logic will work for all of them.
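
Items 1 to 4 above can be condensed into a small Verilog sketch. This is only a model of the observations, not the MMC5's actual logic. The module and port names are invented, the clk/cpu_wr write strobe is an assumed interface, the enable bits in item 1 are taken to be $2001 bits 3 and 4 (as the $2000/$2001 sniffing suggests), and the behavior outside 8x16 mode, which the post does not cover, is simply left at 0.

```verilog
// Hypothetical model of the measurements above; names are illustrative.
module mmc5_spr_window (
    input  wire        clk,           // assumed clock that samples CPU writes
    input  wire        cpu_wr,        // assumed: high for one clk per CPU write
    input  wire [15:0] cpu_addr,
    input  wire [7:0]  cpu_data,
    input  wire [7:0]  read_num,      // PPU read number in the scanline, #1..#170
    input  wire        prerender,     // high during the pre-render scanline
    input  wire        scanline0,     // high during scanline 0
    output wire        mode_8x16,
    output wire        use_spr_banks  // 1: $5120-$5127, 0: $5128-$512B
);
    reg spr_size  = 1'b0;   // last value written to $2000 bit 5
    reg rendering = 1'b0;   // last $2001 write had bit 3 or bit 4 set (assumption)

    // Only the exact addresses $2000 and $2001 are watched, no mirrors.
    always @(posedge clk) begin
        if (cpu_wr && cpu_addr == 16'h2000) spr_size  <= cpu_data[5];
        if (cpu_wr && cpu_addr == 16'h2001) rendering <= |cpu_data[4:3];
    end

    assign mode_8x16 = spr_size & rendering;

    // Reads #130-#161 on every visible scanline, plus #2-#4 on scanline 0.
    wire in_spr_window   = (read_num >= 8'd130) && (read_num <= 8'd161);
    wire in_line0_window = scanline0 && (read_num >= 8'd2) && (read_num <= 8'd4);

    // Outside 8x16 mode this sketch just reports 0 (not covered by the post).
    assign use_spr_banks = mode_8x16 && !prerender &&
                           (in_spr_window || in_line0_window);
endmodule
```

With a zero-based read counter the same sprite window becomes 129 to 160.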

Guest wrote: Sun Oct 16, 2005 9:25 am I think the way the MMC5 detects a new scanline is by looking for three consecutive nametable fetches. This only happens once per scanline, with the third fetch coming at PPU cycle 1 of a new scanline (numbering cycles from 0-340). I'm thinking that when the MMC5 sees three straight NT fetches, it checks the in-frame flag and, if clear, sets it and clears the scanline counter. If the in-frame flag is set, the scanline counter would be incremented and the IRQ flag set if the value matches what was written in $5203. The in-frame flag remains set until at least three PPU cycles pass without a VRAM fetch, at which point the flag is cleared. That's my theory, anyway - I'm sure there are other ways to do it.

I would be particularly interested in how the MMC5 knows if 8x8 or 8x16 sprites are in use. The only way I can think of is to monitor writes to $2000. Maybe someone can try writing to $3FF0 to try to trick it?
so here is the code:

Code:

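	// Detects a new scanline by looking for three consecutive nametable fetches,
	// with the third fetch coming at PPU cycle 1 of a new scanline.
	// Only ppu_addr_in[13:12] is checked, so any read from $2000-$2FFF counts
	// (attribute fetches included); the low address bits are not compared.
	always @ (negedge ppu_rd)
	begin
		ppu_rd_counter <= ppu_rd_counter + 1;
		if (ppu_addr_in[13:12] == 2'b10)
		begin
			if (ppu_nt_rd_counter < 3)
				ppu_nt_rd_counter <= ppu_nt_rd_counter + 1;
			else
				// already saw 3 in a row: restart the per-scanline read count
				// (this later assignment overrides the increment above)
				ppu_rd_counter <= 0;
		end else
			// any read outside $2000-$2FFF breaks the run
			ppu_nt_rd_counter <= 0;
	end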
Combined with krzysiobal's research, it looks like:

Code:

wire [8:0] ppu_cycle = {ppu_rd_counter[7:0], 1'b1};
wire spr_fetch = (ppu_rd_counter >= 129) & (ppu_rd_counter <= 160);
During the scan line 0 (the first line in the upper left corner of the screen), several pixels may be lost?
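
Guest's theory quoted above (in-frame flag, scanline counter, compare against $5203) could be sketched like this. It assumes some other logic provides a one-clock line_detect pulse for "three straight NT fetches" and a ppu_idle level for "no VRAM fetch for a while". Those two inputs, irq_ack, and all names are hypothetical.

```verilog
// Sketch of Guest's theory only; all names are hypothetical.
module scanline_irq_sketch (
    input  wire       clk,
    input  wire       line_detect,  // assumed one-clk pulse: three straight NT fetches seen
    input  wire       ppu_idle,     // assumed: high when no VRAM fetch was seen for a while
    input  wire       irq_ack,      // hypothetical acknowledge; the quote does not cover clearing
    input  wire [7:0] irq_target,   // value written to $5203
    output reg        in_frame,
    output reg  [7:0] scanline,
    output reg        irq_flag
);
    initial begin
        in_frame = 1'b0;
        scanline = 8'd0;
        irq_flag = 1'b0;
    end

    always @(posedge clk) begin
        if (irq_ack)
            irq_flag <= 1'b0;

        if (ppu_idle) begin
            in_frame <= 1'b0;               // PPU stopped fetching: leave the frame
        end else if (line_detect) begin
            if (!in_frame) begin
                in_frame <= 1'b1;           // first detected line of a frame
                scanline <= 8'd0;
            end else begin
                scanline <= scanline + 8'd1;
                // Compare the incremented value; this later assignment
                // wins over the irq_ack clear in the same clock.
                if (scanline + 8'd1 == irq_target)
                    irq_flag <= 1'b1;
            end
        end
    end
endmodule
```

The compare uses the incremented value because the quote has the counter incremented first and the match checked afterwards.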
Last edited by aquasnake on Mon Feb 22, 2021 7:20 pm, edited 1 time in total.

Post by lidnariq »

aquasnake wrote: Sun Feb 21, 2021 11:18 pm // detects a new scanline by looking for three consecutive nametable fetches
// with the third fetch coming at PPU cycle 1 of a new scanline
You could get a false positive here from CPU-triggered reads; I don't know if any games will rely on this but if so you may need to add some extra logic to protect against that.

Post by tepples »

Would counting M2s between PPU /RDs work?
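
One way to read this suggestion, as a hedged sketch: during rendering the PPU reads roughly every two PPU cycles, while CPU-triggered reads are several CPU cycles apart, so a small counter of M2 edges since the last PPU /RD can flag reads that are too far apart to be part of a rendering run. The module name, port names, and the threshold of 3 are assumptions for illustration, not measured MMC5 behavior.

```verilog
// Hypothetical sketch; names and the threshold are assumptions.
module ppu_idle_detect #(
    parameter IDLE_M2_COUNT = 3      // placeholder threshold, must fit in 2 bits
) (
    input  wire m2,        // CPU clock
    input  wire ppu_rd_n,  // PPU /RD, active low
    output wire ppu_idle   // high once IDLE_M2_COUNT M2 edges pass with no PPU read
);
    reg [1:0] m2_count = 2'd0;

    // Any PPU read clears the count asynchronously; M2 rising edges advance
    // it until it saturates at the threshold.
    always @(posedge m2 or negedge ppu_rd_n) begin
        if (!ppu_rd_n)
            m2_count <= 2'd0;
        else if (m2_count != IDLE_M2_COUNT)
            m2_count <= m2_count + 2'd1;
    end

    assign ppu_idle = (m2_count == IDLE_M2_COUNT);
endmodule
```

A detector like the one posted earlier could then clear ppu_nt_rd_counter whenever ppu_idle is high, so that widely spaced CPU-triggered reads never add up to three in a row.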