My emu vs. Nintendulator: branch cycles

Discuss emulation of the Nintendo Entertainment System and Famicom.


lemmurg
Posts: 25
Joined: Fri May 07, 2021 3:22 pm

My emu vs. Nintendulator: branch cycles

Post by lemmurg »

Hi, I'm here to ask a stupid question about NES emulation.

I'm running nestest.nes and comparing my results against Nintendulator's.

This is the first location where the results diverge:

Code:

; Nintendulator
CFFE  F0 05     BEQ $D005                       A:5A X:81 Y:69 P:27 SP:FB PPU:292, 22 CYC:2605
D005  A9 AA     LDA #$AA                        A:5A X:81 Y:69 P:27 SP:FB PPU:301, 22 CYC:2608

; my stupid emu
CFFE  F0 05     BEQ                             A:5A X:81 Y:69 P:27 SP:FB PPU:292, 22 CYC:2605     ..-..IZC
D005  A9 AA     LDA                             A:5A X:81 Y:69 P:27 SP:FB PPU:304, 22 CYC:2609     ..-..IZC
Nintendulator says this branch takes three CPU cycles; my stupid emu says it takes four: 2 + 1 (branch taken) + 1 (page boundary crossed).
I suppose I'm wrong on this, but why?
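Side note for anyone diffing logs like these: the CYC column can be cross-checked against the PPU column, because on NTSC the PPU runs 3 dots per CPU cycle. Below is a minimal Python sketch that assumes the "PPU:dot,scanline" order used in these logs and 341 dots per scanline. It ignores the skipped dot on odd frames and the wrap from the last scanline back to 0. The function name is made up for illustration.

```python
# Sketch: elapsed CPU cycles from the PPU column of two log lines.
# Assumptions: "PPU:dot,scanline" order as in these logs, NTSC timing
# (341 dots per scanline, 3 PPU dots per CPU cycle), no odd-frame skipped
# dot, and no wrap from the last scanline back to scanline 0.
DOTS_PER_SCANLINE = 341
DOTS_PER_CPU_CYCLE = 3


def cpu_cycles_between(dot1, line1, dot2, line2):
    """Return the CPU cycles elapsed between two (dot, scanline) positions."""
    dots = (line2 - line1) * DOTS_PER_SCANLINE + (dot2 - dot1)
    cycles, leftover = divmod(dots, DOTS_PER_CPU_CYCLE)
    if leftover:
        raise ValueError(f"{dots} dots is not a whole number of CPU cycles")
    return cycles


# BEQ at $CFFE: PPU:292, 22 -> 301, 22 (Nintendulator) vs 304, 22 (this emu)
print(cpu_cycles_between(292, 22, 301, 22))  # 3
print(cpu_cycles_between(292, 22, 304, 22))  # 4
```

The 9-dot versus 12-dot difference in the PPU column is the same one-cycle disagreement that shows up in CYC.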
lidnariq
Posts: 11429
Joined: Sun Apr 13, 2008 11:12 am

Re: My emu vs. Nintendulator: branch cycles

Post by lidnariq »

The page boundary is relative to the end of the branch instruction. Hence why the range is +129 to -126 bytes from the beginning of the instruction.
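To make the "+129 to -126" figure concrete, here is a minimal Python sketch of how a branch target is formed. The function name is hypothetical. The signed operand is added to the address just past the 2-byte branch, not to the opcode's own address:

```python
def branch_target(opcode_addr, operand):
    """Target of a 2-byte 6502 branch located at opcode_addr.

    The signed operand (-128..+127) is added to the address of the byte
    *after* the branch, not to the address of the opcode itself.
    """
    next_addr = (opcode_addr + 2) & 0xFFFF
    offset = operand - 0x100 if operand & 0x80 else operand
    return (next_addr + offset) & 0xFFFF


print(hex(branch_target(0xCFFE, 0x05)))      # 0xd005
print(branch_target(0x8000, 0x7F) - 0x8000)  # 129
print(branch_target(0x8000, 0x80) - 0x8000)  # -126
```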
lemmurg
Posts: 25
Joined: Fri May 07, 2021 3:22 pm

Re: My emu vs. Nintendulator: branch cycles

Post by lemmurg »

lidnariq wrote: Fri May 07, 2021 10:29 pm The page boundary is relative to the end of the branch instruction. Hence why the range is +129 to -126 bytes from the beginning of the instruction.
Things the opcode reference didn't tell me. Thanks!
lemmurg
Posts: 25
Joined: Fri May 07, 2021 3:22 pm

Re: My emu vs. Nintendulator: branch cycles

Post by lemmurg »

All right, here's another one, this time the other way around. Why four cycles for this branch?

Code:

; Nintendulator
F2FC  F0 02     BEQ $F300                       A:52 X:02 Y:E9 P:67 SP:FB PPU:339,204 CYC:23308
F300  C8        INY                             A:52 X:02 Y:E9 P:67 SP:FB PPU: 10,205 CYC:23312

; my flawless emulator
F2FC  F0 02     BEQ                             A:52 X:02 Y:E9 P:67 SP:FB PPU:339,204 CYC:23308    .V-..IZC
F300  C8        INY                             A:52 X:02 Y:E9 P:67 SP:FB PPU:  7,205 CYC:23311    .V-..IZC
Branch taken, but no page crossed should be 3 cycles?
Quietust
Posts: 1918
Joined: Sun Sep 19, 2004 10:59 pm

Re: My emu vs. Nintendulator: branch cycles

Post by Quietust »

lemmurg wrote: Sat May 08, 2021 7:11 am All right, here's another one, this time the other way around. Why four cycles for this branch?

Branch taken, but no page crossed should be 3 cycles?
But the page was crossed - the instruction ended at $F2FE and the branch landed at $F300.
Quietust, QMT Productions
P.S. If you don't get this note, let me know and I'll write you another.
lemmurg
Posts: 25
Joined: Fri May 07, 2021 3:22 pm

Re: My emu vs. Nintendulator: branch cycles

Post by lemmurg »

Quietust wrote: Sat May 08, 2021 7:32 am But the page was crossed - the instruction ended at $F2FE and the branch landed at $F300.
But F300 - F2FE = 2, and
lidnariq wrote: Fri May 07, 2021 10:29 pm The page boundary is relative to the end of the branch instruction. Hence why the range is +129 to -126 bytes from the beginning of the instruction.
I'm a little bit confused now.
Quietust
Posts: 1918
Joined: Sun Sep 19, 2004 10:59 pm

Re: My emu vs. Nintendulator: branch cycles

Post by Quietust »

lemmurg wrote: Sat May 08, 2021 7:40 am
Quietust wrote: Sat May 08, 2021 7:32 am But the page was crossed - the instruction ended at $F2FE and the branch landed at $F300.
But F300 - F2FE = 2, and
lidnariq wrote: Fri May 07, 2021 10:29 pm The page boundary is relative to the end of the branch instruction. Hence why the range is +129 to -126 bytes from the beginning of the instruction.
I'm a little bit confused now.
What lidnariq meant was that the page crossing is measured based on the "starting" address and the "ending" address, and the "starting" address is the byte immediately after the branch instruction.

In the first example with "CFFE: BEQ $D005", the starting address was $D000 and the ending address was $D005, and since those were in the same page, no page crossing happened.

In the second example with "F2FC: BEQ $F300", the starting address was $F2FE and the ending address was $F300, and since those were in different pages, a page crossing happened.

If you're still confused, a "page" is defined as an aligned 256-byte region of memory (i.e. beginning at $xx00 and ending at $xxFF), and the 6502 has 256 of them. The first page is $0000-$00FF (and is called the "zero page"), the second page is $0100-$01FF (and contains the Stack), and they continue as such to $FF00-$FFFF.
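Both examples can be checked with a short Python sketch. The names are hypothetical and the timing is NMOS 6502. The only thing that matters is which page the starting address (opcode address + 2) and the ending address fall in:

```python
def page(addr):
    """Page number of a 16-bit address: its upper byte (bits 15-8)."""
    return (addr >> 8) & 0xFF


def branch_cycles(opcode_addr, operand, taken):
    """Cycles for a 6502 branch at opcode_addr.

    2 if not taken, 3 if taken within the same page, 4 if the target is in a
    different page from the "starting" address (the byte after the branch).
    """
    if not taken:
        return 2
    start = (opcode_addr + 2) & 0xFFFF
    offset = operand - 0x100 if operand & 0x80 else operand
    end = (start + offset) & 0xFFFF
    return 4 if page(start) != page(end) else 3


print(branch_cycles(0xCFFE, 0x05, taken=True))  # 3: $D000 -> $D005, same page
print(branch_cycles(0xF2FC, 0x02, taken=True))  # 4: $F2FE -> $F300, new page
```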
lemmurg
Posts: 25
Joined: Fri May 07, 2021 3:22 pm

Re: My emu vs. Nintendulator: branch cycles

Post by lemmurg »

Quietust wrote: Sat May 08, 2021 7:47 am If you're still confused, a "page" is defined as an aligned 256-byte region of memory (i.e. beginning at $xx00 and ending at $xxFF), and the 6502 has 256 of them
Yeah, I know that, which is why the explanation confused me in the first place. I get it now: the end of the instruction determines what the current page is, not the PC. Thank you.
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: My emu vs. Nintendulator: branch cycles

Post by tepples »

One trick that helps me remember things like this is understanding how different behaviors are special cases of the same general rule. The rule for page crossing penalties on the 6502 is that a penalty happens when the upper byte (bits 15-8) of the base address differs from the upper byte of the effective address. A difference in bit 8 alone ((base XOR effective) & $0100) is sufficient to determine this, because in these modes the two addresses are never more than 255 bytes apart. Penalties apply to the following addressing modes:
  • For absolute indexed read instructions, such as lda $A3F0,X or lda $A3F0,Y, the base address is $A3F0, and the effective address is $A3F0 plus the value in the index register (X or Y). If that value is at least $10 (decimal 16), the addition will cause the effective address to become at least $A400, incurring the penalty.
  • For indirect indexed read instructions, such as lda ($02),Y, the base address is whatever is in $0002 and $0003, and the effective address is the base address plus the value in register Y. If the value in $0002 plus the value in register Y is at least $0100 (decimal 256), the penalty applies.
  • For branch instructions, such as bne loop, the base address is the address of the following instruction (sometimes called the "untaken" address), and the effective address is the address of the following instruction plus the signed offset (the "taken" address). If the branch is taken, and this changes bit 8 compared to the untaken branch, the penalty applies. When the untaken address ends in $00 or $01, corresponding to a branch instruction that began on an address ending $FE or $FF, this may be a bit counterintuitive at first. Q helped you identify the base address in this case.
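The three cases above share a single check. Here is a minimal Python sketch of that check; the function names and the 2 KiB RAM array are illustrative assumptions, not taken from any emulator:

```python
def crosses_page(base, effective):
    """True if bit 8 differs, which for these modes means the upper bytes differ."""
    return ((base ^ effective) & 0x0100) != 0


def absolute_indexed(base, index):
    """lda $hhll,X / lda $hhll,Y -> (effective address, penalty?)."""
    effective = (base + index) & 0xFFFF
    return effective, crosses_page(base, effective)


def indirect_indexed(ram, zp, y):
    """lda ($zp),Y -> (effective address, penalty?).

    The base is the 16-bit pointer stored at zp and zp+1 (the high byte's
    address wraps within zero page).
    """
    base = ram[zp] | (ram[(zp + 1) & 0xFF] << 8)
    effective = (base + y) & 0xFFFF
    return effective, crosses_page(base, effective)


def relative(untaken, operand):
    """Taken branch -> (taken address, penalty?); base is the untaken address."""
    offset = operand - 0x100 if operand & 0x80 else operand
    taken = (untaken + offset) & 0xFFFF
    return taken, crosses_page(untaken, taken)


ram = bytearray(0x800)               # hypothetical 2 KiB of internal RAM
ram[0x02], ram[0x03] = 0xF0, 0xA3    # pointer at $0002 = $A3F0

for addr, penalty in (absolute_indexed(0xA3F0, 0x0F),
                      absolute_indexed(0xA3F0, 0x10),
                      indirect_indexed(ram, 0x02, 0x10),
                      relative(0xF2FE, 0x02)):
    print(f"${addr:04X} penalty={penalty}")
# $A3FF penalty=False / $A400 penalty=True / $A400 penalty=True / $F300 penalty=True
```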
No penalty applies to indexed write (such as sta $03F0,X), indirect indexed write (such as sta ($02),Y), or indexed read-modify-write (such as inc $03F0,X). This is because indexed writes and RMWs are timed as if the penalty always applied, and thus the penalty is included in the instruction's base cycle count.
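One way an emulator might encode this is sketched below. The table layout and function name are my own invention. The cycle counts are the standard NMOS 6502 figures for these specific opcodes, and this is not a full timing table. Reads carry a conditional +1, while writes and RMWs already include it in the base count:

```python
# (base cycles, +1 on page cross?) for a few NMOS 6502 opcodes only.
TIMING = {
    ("LDA", "abs,X"): (4, True),
    ("LDA", "abs,Y"): (4, True),
    ("LDA", "(zp),Y"): (5, True),
    ("STA", "abs,X"): (5, False),   # fix-up cycle always spent, so it is in the base
    ("STA", "abs,Y"): (5, False),
    ("STA", "(zp),Y"): (6, False),
    ("INC", "abs,X"): (7, False),   # read-modify-write: always the long path
}


def instruction_cycles(mnemonic, mode, page_crossed):
    """Cycle count for one of the opcodes in TIMING."""
    base, penalized = TIMING[(mnemonic, mode)]
    return base + 1 if penalized and page_crossed else base


print(instruction_cycles("LDA", "abs,X", page_crossed=True))   # 5
print(instruction_cycles("STA", "abs,X", page_crossed=False))  # 5
```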


EDIT: Removed a zero page-related digression from the last paragraph after Q made a good point about it being confusing
Quietust
Posts: 1918
Joined: Sun Sep 19, 2004 10:59 pm

Re: My emu vs. Nintendulator: branch cycles

Post by Quietust »

tepples wrote: Sat May 08, 2021 8:27 am No penalty applies to indexed write (such as sta $03F0,X), indirect indexed write (such as sta ($02),Y), indexed read-modify-write (such as inc $03F0,X), or zero page indexed read (such as lda $A0,X). This is because indexed writes and zero page indexed accesses are timed as if the penalty always applied, and thus the penalty is included in the instruction's base cycle count.
Writes always take the penalty because they have to - you can get away with a speculative "wrong" read (unless you're hitting an I/O register with side effects), but a speculative write to the wrong address would be disastrous.

Zero-page indexed modes, on the other hand, never have page-cross penalties because they're actually incapable of crossing page boundaries - an absolute indexed read from "$00FF,X" with X=1 will read from $0100 with a penalty, but a zero-page indexed read from "$FF,X" will read from $0000.
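The $FF,X versus $00FF,X comparison can be expressed as a minimal Python sketch. Both function names are hypothetical:

```python
def zero_page_indexed(zp, index):
    """lda $zp,X: the sum wraps within page zero, so no page is ever crossed."""
    return (zp + index) & 0xFF


def absolute_indexed_read(base, index):
    """lda $hhll,X: the carry reaches the high byte, costing a cycle on reads."""
    effective = (base + index) & 0xFFFF
    return effective, (base >> 8) != (effective >> 8)


print(f"${zero_page_indexed(0xFF, 1):04X}")   # $0000
addr, penalty = absolute_indexed_read(0x00FF, 1)
print(f"${addr:04X} penalty={penalty}")       # $0100 penalty=True
```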
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: My emu vs. Nintendulator: branch cycles

Post by tepples »

You make a good point about how my wording about zero page indexed reads was confusing.
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: My emu vs. Nintendulator: branch cycles

Post by tokumaru »

lemmurg wrote: Sat May 08, 2021 8:07 amI get it now: the end of the instruction determines what the current page is, not the PC. Thank you.
But the PC *is* past the end of the instruction by the time the branch offset is applied to it. The PC doesn't sit still at the beginning of the instruction while the CPU executes it; it's incremented as each byte of the instruction is fetched, so by the time the whole instruction has been read, the PC already points at the byte immediately after it.

That's why the branch offsets and the page crossing penalties are relative to the end of the instruction: that's where the PC will *naturally* be after the instruction has been fetched for execution.
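The toy Python core below illustrates this point. The class and method names are hypothetical, and only NOP and BEQ are implemented. Because fetch() advances the PC, the branch offset and the page check both see the address after the instruction without any special-casing:

```python
class ToyCPU:
    """Hypothetical toy core: only NOP ($EA) and BEQ ($F0) are implemented,
    just enough to show the PC advancing as each byte is fetched."""

    def __init__(self, memory, pc, zero_flag=False):
        self.memory = memory
        self.pc = pc
        self.zero_flag = zero_flag
        self.cycles = 0

    def fetch(self):
        """Read the byte at PC, then move PC past it (one cycle per fetch)."""
        value = self.memory[self.pc]
        self.pc = (self.pc + 1) & 0xFFFF
        self.cycles += 1
        return value

    def step(self):
        opcode = self.fetch()
        if opcode == 0xEA:                  # NOP: 2 cycles total
            self.cycles += 1
        elif opcode == 0xF0:                # BEQ rel
            operand = self.fetch()          # PC now points past the branch
            if self.zero_flag:
                offset = operand - 0x100 if operand & 0x80 else operand
                target = (self.pc + offset) & 0xFFFF
                self.cycles += 1            # branch taken
                if (self.pc ^ target) & 0x0100:
                    self.cycles += 1        # target is in a different page from PC
                self.pc = target
        else:
            raise NotImplementedError(f"opcode ${opcode:02X}")


memory = bytearray(0x10000)
memory[0xF2FC:0xF2FE] = bytes([0xF0, 0x02])     # F2FC  F0 02  BEQ $F300
cpu = ToyCPU(memory, pc=0xF2FC, zero_flag=True)
cpu.step()
print(f"PC=${cpu.pc:04X} cycles={cpu.cycles}")  # PC=$F300 cycles=4
```

A size table that bumps the PC after execution can produce the same cycle counts, but only if every branch remembers to add the instruction length first. Incrementing on fetch gets that right by construction.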
lemmurg
Posts: 25
Joined: Fri May 07, 2021 3:22 pm

Re: My emu vs. Nintendulator: branch cycles

Post by lemmurg »

I used a table of instruction sizes and incremented the PC after each instruction executed. It made logging and debugging somewhat easier for me in the beginning, when I just wanted to get the cycle counts right, and it was less code to write and debug.

All the off-by-one bugs are fixed now, and the PC is properly incremented when fetching the instructions, as it should be.