Porting mmc5 PPU cycle counter from mister_nes

aquasnake
Posts: 207
Joined: Fri Sep 13, 2019 11:22 pm

Porting mmc5 PPU cycle counter from mister_nes

Post by aquasnake » Sun Feb 21, 2021 9:12 am

The NESDEV community has already discussed the implementation of scanline detection, in-frame detection and other details. My MMC5 Verilog implementation already runs some games correctly, such as Metal Slader Glory.

Some games that use 8x16 sprite mode can be made to look normal with a workaround, but only half of the sprite pattern data is reachable (the game actually uses 8K of sprite patterns, while only one 4K half can be selected by guessing). Examples are Castlevania III and Gun Sight (leaving aside the display problem in the sunken city in ex-NT mode). The following two 8x16 sprite-mode games look normal:
Gradius AC2007_op4 [h]
Gradius II AC Death20 [h]

The remaining key MMC5 graphics features that must be solved for emulation are:
1. 8x16 spr mode
2. vsplit mode
3. ex-attr mode

All three depend on an accurate PPU cycle counter. I extracted the relevant part of the open-source MISTER_NES code and tidied it up as follows:

Code: Select all

// For NTSC only, the *last* cycle of odd frames is skipped.
// In Visual 2C02, the counter starts at zero and flips at scanline 256.
always_comb begin
	case (sys_type)
		2'b00,2'b11: begin // NTSC/Vs.
			vblank_start_sl = 9'd241;
			vblank_end_sl   = 9'd260;
			skip_en         = 1'b1;
		end

		2'b01: begin       // PAL
			vblank_start_sl = 9'd241;
			vblank_end_sl   = 9'd310;
			skip_en         = 1'b0;
		end

		2'b10: begin       // Dendy
			vblank_start_sl = 9'd291;
			vblank_end_sl   = 9'd310;
			skip_en         = 1'b0;
		end
	endcase
end
wire skip_pixel = is_pre_render & ~even_frame_toggle & rendering_sr[3] & skip_en;
wire end_of_line = (cycle[8:3] == 42) & (cycle[3:0] == (skip_pixel ? 3 : 4));

// PPU cycle counter: wraps after cycle 340, or after cycle 339 when skip_pixel is set
always @(posedge clk) if (reset) begin
	cycle <= 338;
end else if (ce) begin
	// On a real AV famicom, the NMI even_odd_timing test fails with 09, this SR is to make that happen
	rendering_sr <= {rendering_sr[2:0], in_frame};
	cycle <= end_of_line ? 9'd0 : cycle + 9'd1;
end

If the timing differences between TV systems are set aside, the expression can be simplified by ignoring the variation in the last PPU cycle:

Code: Select all

// Simplified: always wrap after cycle 339, so every line is 340 cycles (0..339)
wire end_of_line = (cycle[8:3] == 42) & (cycle[3:0] == 3);

// PPU cycle counter
always @(posedge clk)
if (reset) begin
	cycle <= 338;
end else if (ce) begin
	cycle <= end_of_line ? 9'd0 : cycle + 9'd1;
end

Because only 340 PPU cycles (0..339) are counted and the last cycle is ignored, the start of each line has to be resynchronized. How can the first PPU cycle of each rendered line be found?
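As a quick cross-check of the bit-slice comparisons in the two Verilog snippets above, here is a small Python sketch. It is not part of the MISTER_NES source, and the helper name `end_of_line_cycles` is made up for illustration. It enumerates every 9-bit counter value and reports which ones satisfy `(cycle[8:3] == 42) & (cycle[3:0] == low_nibble)`.

```python
def end_of_line_cycles(low_nibble):
    """Return every 9-bit cycle value that matches
    (cycle[8:3] == 42) & (cycle[3:0] == low_nibble)."""
    return [c for c in range(512)
            if (c >> 3) == 42 and (c & 0xF) == low_nibble]

normal_wrap = end_of_line_cycles(4)  # original code, no skipped cycle
short_wrap = end_of_line_cycles(3)   # skip_pixel case, and the simplified version

# The counter wraps *after* the matching cycle, so line length = match + 1.
print("wrap at", normal_wrap, "-> line length", normal_wrap[0] + 1)
print("wrap at", short_wrap, "-> line length", short_wrap[0] + 1)
```

Under these assumptions the original expression gives 341-cycle lines (340 on the shortened pre-render line), while the simplified one gives 340 cycles on every line, so it drifts by one PPU cycle per scanline relative to a 341-cycle line unless it is resynchronized.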

I think that once this problem is solved, hardware emulation of the MMC5 will be about 90% complete, with only ex-attr mode possibly left as a problem.

Finally, here are the parameters I obtained by setting write breakpoints in some MMC5 games, sorted into a table:

Code: Select all

				-MMC5 mapper, originally much complex bankswitching
				-SPR Mode-  -PRG Mode-  -CHR Mode-  -Ext RAM Mode-   -Name----------------------
				8x16        4x8         8x1         ex-attr*       - Just Breed
				8x16        4x8         8x1         ex-nt,vsplit*  - Uchuu Keibitai SDF
				8x16        4x8         8x1         ex-nt,3s       - Gun Sight
				8x16        16+8+8      8x1         ex-nt,3s       - Castlevania III
				8x8         4x8         8x1         ex-ram         - Super Chinese 2 [T+ChS]
				8x16        4x8         8x1         ex-ram         - Hoshi no Kirby [T+ChS]
				8x16        2x16        8x1         ex-ram         - Gekikame Ninja Den [T+ChS]
				8x8         4x8         8x1         ex-ram         - Super Mario All-Stars NES [h]
				8x16        2x16        8x1         ex-ram         - Zelda - The Legend of Link [h]
				8x8         4x8         8x1         none           - Ultimate Mortal Kombat 3 [h]

				8x8         4x8         2x4         ex-ram         - Metal Slader Glory
				8x8         16+8+8      2x4         ex-ram         - Saint Seiya - Ougon Densetsu [T+ChS]
				8x8         2x16        2x4         ex-ram         - Grand Master [T+ChS]

				8x16        4x8         4x2         none           - Salamander AC Delta09a [h]

				8x16        4x8         8x1         none           - Gradius AC2007_op4 [h]
				8x16        4x8         8x1         none           - Gradius II AC Death20 [h]

Ben Boldt
Posts: 751
Joined: Tue Mar 22, 2016 8:27 pm
Location: Minnesota, USA

Re: Porting mmc5 PPU cycle counter from mister_nes

Post by Ben Boldt » Sun Feb 21, 2021 10:04 am

Don't forget SimCity!

lidnariq
Posts: 10241
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: Porting mmc5 PPU cycle counter from mister_nes

Post by lidnariq » Sun Feb 21, 2021 12:08 pm

Don't we already know that everything in the MMC5 is synchronized to finding the same address fetched from the nametables three times in a row?

Ben Boldt
Posts: 751
Joined: Tue Mar 22, 2016 8:27 pm
Location: Minnesota, USA

Re: Porting mmc5 PPU cycle counter from mister_nes

Post by Ben Boldt » Sun Feb 21, 2021 1:46 pm

As far as synchronization, lidnariq is absolutely right that the 3 identical nametable reads in a row is the single thing that sets that up.

The "in frame" flag gets set and scanline counter gets updated actually at the 4th PPU read, which can be any address. There is also more logic involved to detect when a scanline ended, and logic that detects when a whole frame ended. Since these sequences depend both on CPU and PPU clocks, it is actually broken into 2 interacting state machines with plenty of corner conditions (which may or may not reasonably happen in a real Famicom...).

We have some detailed info in the wiki here:
https://wiki.nesdev.com/w/index.php/MMC ... anline_IRQ

Ben Boldt
Posts: 751
Joined: Tue Mar 22, 2016 8:27 pm
Location: Minnesota, USA

Re: Porting mmc5 PPU cycle counter from mister_nes

Post by Ben Boldt » Sun Feb 21, 2021 2:54 pm

Ben Boldt wrote:
Sun Feb 21, 2021 1:46 pm
As far as synchronization, lidnariq is absolutely right that the 3 identical nametable reads in a row is the single thing that sets that up.

The "in frame" flag gets set and scanline counter gets updated actually at the 4th PPU read, which can be any address. There is also more logic involved to detect when a scanline ended, and logic that detects when a whole frame ended. Since these sequences depend both on CPU and PPU clocks, it is actually broken into 2 interacting state machines with plenty of corner conditions (which may or may not reasonably happen in a real Famicom...).

We have some detailed info in the wiki here:
https://wiki.nesdev.com/w/index.php/MMC ... anline_IRQ
Edit:
There is more work to be done on the MMC5 scanline counter in the wiki. The pseudocode and the diagram do not match, they are quite different. The diagram only shows when we are in frame and not in frame -- it doesn't even show anything about when scanlines get counted or reset to 0! The pseudocode isn't waiting for the 4th PPU read AFAIK. I think we have all the data, I just need to go back and read carefully and tidy things up.

Ben Boldt
Posts: 751
Joined: Tue Mar 22, 2016 8:27 pm
Location: Minnesota, USA

Re: Porting mmc5 PPU cycle counter from mister_nes

Post by Ben Boldt » Sun Feb 21, 2021 3:26 pm

I think it works like this but I am sure there are problems:

Code: Select all

onCpuCycle()  // M2 Rising Edge
{
    if( ($FFFA == cpu_address) || ($FFFB == cpu_address) )
    {
        frame_state = 0;
    }
    else
    {
        switch( frame_state )
        {
            case 0:  // In Frame: Note: frame_state 0 = in frame.
            case 1:  // In Frame
            case 2:  // In Frame
                if( 1 == PPU_RD )  // If PPU is not doing a read during this CPU cycle
                {
                    frame_state++;
                }
                break;
            case 3:  // Not in frame
                // Do nothing.
                break;
        }
    }
}
// In-frame status bit reflects frame_state.

onPpuRead()  // PPU /RD Falling Edge
{
    switch( lock_state )
    {
        case 0:  // Locked
            if( (ppu_address >= $2000) && (ppu_address <= $2FFF) )  // Nametable Read
            {
                lock_state++;
            }
            if( frame_state < 3 )
            {
                frame_state = 0;
            }
            break;
        case 1:  // Locked
        case 2:  // Locked
            if( ppu_address == prev_ppu_address )  // Matching Nametable Read
            {
                lock_state++;
            }
            else
            {
                lock_state = 0;
            }
            if( frame_state < 3 )
            {
                frame_state = 0;
            }
            break;
        case 3:  // Unlocked
            if( 3 == frame_state )
            {
                scanline_counter = 0;  // (Should this prevent incrementing scanline_counter below??)
            }
            if( (ppu_address < $2000) || (ppu_address > $2FFF) )  // Non-Nametable Read
            {
                lock_state = 0;
                if( frame_state < 3 )
                {
                    scanline_counter++;  // <-- Need to verify scanline count incrementing.
                    if( scanline_counter == [$5203] )
                    {
                        irq_pending = true;  // <-- Need to verify when IRQ goes low, including any additional cycles
                    }
                }
            }
            frame_state = 0;  // frame_state resetting does not check the PPU address range.
            break;
    }
    prev_ppu_address = ppu_address;
}
Please help tweak this if anyone has corrections or comments. I have now lost all confidence in our understanding of the MMC5 scanline counter. We have a lot of really good data, but it is not well organized. Even for basic questions, like when the scanline counter resets to 0 or whether the IRQ happens at the beginning or end of the scanline, I can't find the answers.
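For anyone who wants to poke at the pseudocode above without hardware, here is a direct Python port that can be fed synthetic read sequences. It is only a sketch: it copies the pseudocode as written, open questions included, and it is not verified against a real MMC5. The class name, the method names, the `feed_line` helper and the power-on value of `frame_state` are all assumptions made for this example.

```python
class Mmc5ScanlineSketch:
    """Port of the onCpuCycle()/onPpuRead() pseudocode, unverified."""

    def __init__(self, irq_compare):
        self.frame_state = 3          # 0..2 = in frame, 3 = not in frame (assumed at power-on)
        self.lock_state = 0           # 3 = three matching nametable reads seen
        self.prev_ppu_address = None
        self.scanline_counter = 0
        self.irq_compare = irq_compare  # value written to $5203
        self.irq_pending = False

    @property
    def in_frame(self):
        return self.frame_state < 3

    def on_cpu_cycle(self, cpu_address, ppu_reading):
        """M2 rising edge. ppu_reading is True while PPU /RD is low."""
        if cpu_address in (0xFFFA, 0xFFFB):
            self.frame_state = 0      # ported exactly as the pseudocode has it
        elif self.frame_state < 3 and not ppu_reading:
            self.frame_state += 1

    def on_ppu_read(self, ppu_address):
        """PPU /RD falling edge."""
        is_nametable = 0x2000 <= ppu_address <= 0x2FFF
        if self.lock_state == 0:
            if is_nametable:
                self.lock_state = 1
            if self.frame_state < 3:
                self.frame_state = 0
        elif self.lock_state in (1, 2):
            if ppu_address == self.prev_ppu_address:
                self.lock_state += 1
            else:
                self.lock_state = 0
            if self.frame_state < 3:
                self.frame_state = 0
        else:  # lock_state == 3: this is the "4th read"
            if self.frame_state == 3:
                self.scanline_counter = 0
            if not is_nametable:
                self.lock_state = 0
                if self.frame_state < 3:
                    self.scanline_counter += 1
                    if self.scanline_counter == self.irq_compare:
                        self.irq_pending = True
            self.frame_state = 0
        self.prev_ppu_address = ppu_address


def feed_line(mmc5, nt_address=0x2000, pattern_address=0x0000):
    """Synthetic 'scanline': three identical nametable reads, then a pattern read."""
    for address in (nt_address, nt_address, nt_address, pattern_address):
        mmc5.on_ppu_read(address)


mmc5 = Mmc5ScanlineSketch(irq_compare=2)
for _ in range(3):
    feed_line(mmc5)
print(mmc5.scanline_counter, mmc5.irq_pending)
```

With this synthetic input the first line only sets in-frame and leaves the counter at 0, so after three lines the counter is 2 and the IRQ is pending. Whether real hardware behaves the same way is exactly what still needs verifying.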
Last edited by Ben Boldt on Sun Feb 21, 2021 6:09 pm, edited 1 time in total.

tepples
Posts: 22281
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Porting mmc5 PPU cycle counter from mister_nes

Post by tepples » Sun Feb 21, 2021 3:33 pm

Does it require three identical address reads or just several consecutive reads with A13 [redacted]?

EDIT: This comment had a mistake. I was rushed to leave for grocery shopping.

Ben Boldt
Posts: 751
Joined: Tue Mar 22, 2016 8:27 pm
Location: Minnesota, USA

Re: Porting mmc5 PPU cycle counter from mister_nes

Post by Ben Boldt » Sun Feb 21, 2021 4:55 pm

tepples wrote:
Sun Feb 21, 2021 3:33 pm
Does it require three identical address reads or just several consecutive reads with A13 low?
As I remember, they have to be fully identical, all address bits.

lidnariq
Posts: 10241
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: Porting mmc5 PPU cycle counter from mister_nes

Post by lidnariq » Sun Feb 21, 2021 4:56 pm

Also, A13 high.

Ben Boldt
Posts: 751
Joined: Tue Mar 22, 2016 8:27 pm
Location: Minnesota, USA

Re: Porting mmc5 PPU cycle counter from mister_nes

Post by Ben Boldt » Sun Feb 21, 2021 5:31 pm

lidnariq wrote:
Sun Feb 21, 2021 4:56 pm
Also, A13 high.
My understanding (which is not on solid ground) is that the first read has to be in the range $2000-$2FFF. A13 high alone covers the range $2000-$3FFF, so this suggests that A12 also needs to be low. Since the next 2 reads must match, they automatically meet the same criteria.
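To make the address-bit argument concrete, here is a small Python sketch comparing the two candidate decodes over the 14-bit PPU address space. The function names are invented for this example; nothing here says which decode the MMC5 actually uses.

```python
def a13_high(addr):
    """True when PPU A13 is set: covers $2000-$3FFF."""
    return bool(addr & 0x2000)

def a13_high_a12_low(addr):
    """True when A13 = 1 and A12 = 0: covers only $2000-$2FFF."""
    return (addr & 0x3000) == 0x2000

for addr in (0x1FFF, 0x2000, 0x2FFF, 0x3000, 0x3FFF):
    print(f"${addr:04X}  A13 high: {a13_high(addr)!s:5}  A13 high & A12 low: {a13_high_a12_low(addr)}")
```

The two decodes only disagree in $3000-$3FFF, so a test that reads from that range is enough to tell them apart.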

Going back and reading some early things in the "MMC5 Hacking Adventures" thread:
krzysiobal, Oct 02, 2018 wrote:Actually, the scanline counter is quite simple - at start of each PPU read cycle it just looks at the last three PPU read addresses and whenever it sees three from the same address, it increments its value.
There may be newer posts than this with more detail, and what I am about to describe would not be a valid PPU sequence. But I don't think that if you just keep reading the same nametable address repeatedly, it increments the scanline counter on the 4th, 5th, 6th, 7th... read. I do not recall that reading the same address like that repeatedly would just march the scanline counter right up like that. This has been a couple years but that does not seem like it was the case. Also I remember some things where the IRQ happened 1 additional cycle later, stuff like that. None of that stuff is captured clearly in the wiki or diagram.

lidnariq
Posts: 10241
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: Porting mmc5 PPU cycle counter from mister_nes

Post by lidnariq » Sun Feb 21, 2021 5:37 pm

My point was that, although I can't find this right now, someone found that it was specifically three reads from the same address, and that the address was in the range $2000-$3FFF.

Just describing the MMC5, not the NES context.

Ben Boldt
Posts: 751
Joined: Tue Mar 22, 2016 8:27 pm
Location: Minnesota, USA

Re: Porting mmc5 PPU cycle counter from mister_nes

Post by Ben Boldt » Sun Feb 21, 2021 6:06 pm

I don't know if I have it in me right now to dig back into all of this... I found something along the lines of what you are talking about though:
Ben Boldt wrote:
Tue Oct 02, 2018 3:18 pm
I still see a very persistent range of PPU addresses that can set the status, which is different than yours:

$0000
-> Doesn't set
$1FFF

$2000
-> Does set after counting PPU /RD falling edges
$2FFF

$3000
-> Doesn't set for me, does set for you.
$3FFF

How can yours set in range $3000 - 3FFF but not mine? Do you write to any registers or drive any other inputs low at the beginning of your test?
krzysiobal wrote:
Tue Oct 02, 2018 3:34 pm
How can yours set in range $3000 - 3FFF but not mine? Do you write to any registers or drive any other inputs low at the beginning of your test?
No, you are right, only $2000-$2FFF can set it. I haven't checked that before in so much detail.

lidnariq
Posts: 10241
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: Porting mmc5 PPU cycle counter from mister_nes

Post by lidnariq » Sun Feb 21, 2021 6:19 pm

Fair enough. It would be awfully weird if they'd bother to store 13 bits of address when the PPU is only going to be using 12 of them.

aquasnake
Posts: 207
Joined: Fri Sep 13, 2019 11:22 pm

Re: Porting mmc5 PPU cycle counter from mister_nes

Post by aquasnake » Sun Feb 21, 2021 8:12 pm

In MiSTer, part of the PPU's internal signals are effectively exposed to the MMC5. A real cart is different: it sees only part of the slot interface (ppu_addr[13:10], /ppu_rd, m2) and has to derive its timing from those signals. MiSTer and software emulators can actively synchronize to the PPU cycle, while a real hardware cart can only monitor passively.

Dynamic switch of sprite/background pattern based on PPU cycle:

Code: Select all

`ifdef SPR_8X16_MODE
	wire spr_fetch = ppu_cycle[8] & ~ppu_cycle[6]; // fetch sprite pattern between 256..319 cycles
	wire chr_bank_set = (spr_mode && in_frame && !in_frame_clear) ? ~spr_fetch : chr_bank_last;
`endif
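As a sanity check on the `spr_fetch` expression above, this Python sketch lists the counter values for which `ppu_cycle[8] & ~ppu_cycle[6]` is true. It assumes the counter runs 0..340 on a full line, as in the MISTER_NES excerpt earlier in the thread; the names `spr_fetch` and `LINE_CYCLES` are just for this example.

```python
def spr_fetch(ppu_cycle):
    """Mirror of the Verilog: ppu_cycle[8] & ~ppu_cycle[6]."""
    bit8 = (ppu_cycle >> 8) & 1
    bit6 = (ppu_cycle >> 6) & 1
    return bool(bit8 and not bit6)

LINE_CYCLES = 341  # assumption: counter values 0..340 on a full scanline

active = [c for c in range(LINE_CYCLES) if spr_fetch(c)]
print(active[0], active[-1], len(active))
```

Within one line the expression is true only for cycles 256..319, which matches the comment in the Verilog. It would also match 384..447, but the counter never gets that far. Whether 256..319 lines up with the real sprite fetch window depends on how the counter is aligned to the PPU, which is the open question in this thread.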
Monitoring $2000 (decoded through its mirrors, to minimize the number of input address lines):

Code: Select all

`ifdef SPR_8X16_MODE
if (cpu_addr_in[14:13] == 2'b01 && cpu_addr_in[2:0] == 3'b000) // $2000
begin
	if (spr_mode ^ cpu_data_in[5]) begin
		chr_bank_last = 0; // set to 0 when changing sprite size
		spr_mode = cpu_data_in[5]; // 8x16 sprite mode
	end
end
if (cpu_addr_in[14:13] == 2'b01 && cpu_addr_in[2:0] == 3'b001) // $2001
	in_frame_clear = ~(cpu_data_in[4] & cpu_data_in[3]);
`endif
Tracking the last write to $5120-$512B, to select CHR bank set A or B when $2006-$2007 are accessed during vblank:

Code: Select all

if (cpu_addr_in[14:4] == 11'h512)
	chr_bank_last = cpu_addr_in[3] & spr_mode;
Last edited by aquasnake on Sun Feb 21, 2021 8:48 pm, edited 3 times in total.

lidnariq
Posts: 10241
Joined: Sun Apr 13, 2008 11:12 am
Location: Seattle

Re: Porting mmc5 PPU cycle counter from mister_nes

Post by lidnariq » Sun Feb 21, 2021 8:22 pm

aquasnake wrote:
Sun Feb 21, 2021 9:12 am
The remaining key MMC5 graphics features that must be solved for emulation are:
1. 8x16 spr mode
2. vsplit mode
What's hard about these? We know the MMC5 must keep track of the current sliver # for the scanline, so it can just return (name+pattern table #1) for the first N, (name+pattern table #2) for the next 34-N, and (pattern table #3) for the final 8 sliver fetches. The only obnoxious part is that the "stutter" that the MMC5 looks for is between background sliver fetches #2 and #3. But that doesn't matter if you are breaking the abstraction between the emulated PPU and the emulated MMC5.
3. ex-attr mode
The MMC5 has to be doing the fetch from the internal EXRAM from the same address and same time as the nametable fetch is happening. After that, the rest is straight-forward: it has to route the relevant bits from that extra 8 bits of nametable to the PPU's data bus during the attribute fetch, and to the CHR's address bus during the pattern fetches.
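The ex-attr routing described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a verified model: the EXRAM bit layout used here (bits 7-6 = palette, bits 5-0 = 4 KB CHR bank, with two extra upper bank bits supplied separately, as from $5130) is how I recall the NESdev wiki describing it, and all function names are made up for the example.

```python
def exram_index(nametable_addr):
    """EXRAM is 1 KB, so the fetch uses the low 10 bits of the nametable address."""
    return nametable_addr & 0x03FF

def exattr_attribute_byte(exram_byte):
    """Byte to put on the PPU data bus during the attribute fetch.
    Assumed layout: bits 7-6 of the EXRAM byte are the palette."""
    palette = (exram_byte >> 6) & 0x3
    # Copy the palette into all four 2-bit fields so that the quadrant
    # the PPU picks does not matter.
    return palette * 0x55

def exattr_chr_address(exram_byte, ppu_addr, upper_bits=0):
    """CHR address during the background pattern fetches.
    Assumed layout: bits 5-0 select a 4 KB bank; upper_bits stands in for $5130."""
    bank = ((upper_bits & 0x3) << 6) | (exram_byte & 0x3F)
    return (bank << 12) | (ppu_addr & 0x0FFF)

exram_byte = 0x85  # palette 2, bank 5 under the assumed layout
print(hex(exattr_attribute_byte(exram_byte)),
      hex(exattr_chr_address(exram_byte, 0x1234)))
```

The point of the sketch is only that one EXRAM byte, latched at nametable-fetch time, feeds both the attribute fetch and the two pattern fetches for the same tile.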