Taken branch delays interrupt

blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA

Post by blargg » Fri Jun 18, 2010 10:03 pm

I've discovered that a taken non-page-crossing branch ignores IRQ/NMI during its last clock, so that the next instruction executes before the interrupt is serviced. Any other instruction would take the interrupt before the next instruction in this case. This doesn't occur for a non-taken branch, or for one that crosses a page. It also doesn't occur for JMP. The cpu_interrupts_v2 test on the Wiki now tests this behavior.

I encountered this while improving the new PPU synchronization scheme. I was using a HERE: BCC HERE wait loop for NMI, and was having my NMI occur later than expected. When I changed it back to JMP HERE, it worked fine. It made absolutely no sense, as I thought they were identical. I made sure there was no page crossing, that the carry flag wasn't being set, etc. and finally realized that its timing must actually be different. This behavior is probably already known in 6502 circles, maybe even here, but it was definitely news to me.

The test has an IRQ occur at each cycle within a test sequence, starting at some arbitrary point, and shows how many clocks delayed the IRQ was. T+ is how many clocks since the arbitrary starting point the IRQ was requested, and CK is how many clocks delayed it was, also relative to some arbitrary value. Only the relative values of these matter. PC is the saved PC of the next instruction that was on the stack within the IRQ handler, relative to some starting point. The example code has comments showing the offsets, so you can see where the IRQ was actually vectored.

The first three tests show nothing out of the ordinary, but the fourth does:

        nop
        ; 04
        jmp :+
        ; 07
:       nop
        ; 08
:       jmp :-

test_jmp
T+ CK PC
00 02 04 NOP
01 01 04 
02 03 07 JMP
03 02 07 
04 01 07 
05 02 08 NOP
06 01 08 
07 03 08 JMP
08 02 08 
09 01 08 

        clc
        ; 04
        bcs :+
        ; 06
        nop
        ; 07
:       lda $100
        ; 0A
:       jmp :-

test_branch_not_taken
T+ CK PC
00 02 04 CLC
01 01 04 
02 02 06 BCS
03 01 06 
04 02 07 NOP
05 01 07 
06 04 0A LDA $100
07 03 0A 
08 02 0A 
09 01 0A 

        clc
        ; 0D
        bcc :+
        ; 0F
        nop
        ; 00
:       lda $100
        ; 03
:       jmp :-

test_branch_taken_pagecross
T+ CK PC
00 02 0D CLC
01 01 0D 
02 04 00 BCC
03 03 00 
04 02 00 
05 01 00 
06 04 03 LDA $100
07 03 03 
08 02 03 
09 01 03 

        clc
        ; 04
        bcc :+
        ; 06
        nop
        ; 07
:       lda $100
        ; 0A
:       jmp :-

test_branch_taken
T+ CK PC
00 02 04 CLC
01 01 04 
02 03 07 BCC
03 02 07 
04 05 0A LDA $100 *** This is the special case
05 04 0A 
06 03 0A 
07 02 0A 
08 01 0A 
09 03 0A JMP

The timing looks similar to the NOT taken branch. Note how the IRQ being requested during the last cycle of the BCC doesn't cause an IRQ immediately after (07), but rather after the LDA (0A). So you get a 5-cycle delay for this case, even though there are no 5-cycle instructions in the test sequence.
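
For emulator authors, here is a minimal Python sketch of a rule that reproduces the CK column of the tables above. It is only a simplified model under stated assumptions, not a description of what the hardware does internally: an IRQ requested during an instruction is taken at the end of that instruction, except that a request during the last cycle of a taken non-page-crossing branch is held until the end of the following instruction. The names `Instr`, `defers_last_cycle` and `irq_delays` are made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    cycles: int
    # Assumption of this sketch: True only for a taken branch
    # that does not cross a page.
    defers_last_cycle: bool = False


def irq_delays(seq):
    """Return (t, delay, name) for an IRQ requested at each cycle of seq.

    delay counts clocks from the request to the instruction boundary
    where the IRQ is taken, matching the relative CK values in the tables.
    """
    rows = []
    t = 0
    for i, ins in enumerate(seq):
        for c in range(ins.cycles):
            delay = ins.cycles - c  # clocks left in this instruction
            last_cycle = delay == 1
            # Special case: the request is held until the next instruction
            # ends. If no next instruction is listed, the delay is left alone.
            if ins.defers_last_cycle and last_cycle and i + 1 < len(seq):
                delay += seq[i + 1].cycles
            rows.append((t, delay, ins.name))
            t += 1
    return rows


TEST_BRANCH_TAKEN = [
    Instr("CLC", 2),
    Instr("BCC", 3, defers_last_cycle=True),  # taken, same page
    Instr("LDA $100", 4),
    Instr("JMP", 3),
]

TEST_BRANCH_PAGECROSS = [
    Instr("CLC", 2),
    Instr("BCC", 4),  # taken, crosses a page: behaves normally
    Instr("LDA $100", 4),
]

for t, delay, name in irq_delays(TEST_BRANCH_TAKEN)[:10]:
    print(f"{t:02X} {delay:02X} {name}")
```

In this model the row for T+ 04 is attributed to the BCC, because that clock is the branch's last cycle; the table labels the same row LDA $100 because that is the instruction that ends up running before the IRQ.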

Post by blargg » Thu Jul 01, 2010 8:29 am

I've since realized that this effectively increases the number of cycles the next instruction takes. It behaves just as if the taken non-page-crossing branch were a two-cycle instruction and the instruction branched to were one cycle longer. This means that if the instruction branched to is ROL $1234,X, interrupts will be delayed longer than you thought possible: when calculating maximum interrupt latency, you must treat the longest instruction as 8 cycles rather than 7. This is very significant when doing critical timing, and makes me wonder whether the 6502 in general suffers from it as well, and not just the NES CPU.
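
The arithmetic can be spelled out in a few lines of Python. This is only an illustrative sketch of the "one cycle moves from the branch to its target" view described in this post; `effective_lengths` is a hypothetical helper, and the cycle counts are the usual 6502 figures (3 for a taken branch on the same page, 7 for ROL absolute,X).

```python
def effective_lengths(branch_cycles, target_cycles):
    """Lengths as seen by interrupt timing, for a taken branch
    that does not cross a page, followed by its target instruction."""
    return branch_cycles - 1, target_cycles + 1


TAKEN_BRANCH_SAME_PAGE = 3  # e.g. BCC, taken, no page cross
ROL_ABS_X = 7               # e.g. ROL $1234,X

branch_eff, target_eff = effective_lengths(TAKEN_BRANCH_SAME_PAGE, ROL_ABS_X)

# Total execution time is unchanged; only the point where an
# interrupt can be taken moves.
total = branch_eff + target_eff

print(branch_eff, target_eff, total)
```

The total stays at 10 cycles, but the longest stretch with no interrupt boundary grows from 7 to 8, which is the figure to use in a worst-case latency budget.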

Bregalad
Posts: 8008
Joined: Fri Nov 12, 2004 2:49 pm
Location: Chexbres, VD, Switzerland

Post by Bregalad » Thu Jul 01, 2010 1:11 pm

Oh, I guess this makes sense, although it's weird. The 3rd cycle (the one that adds the 2nd fetched byte to PC) is considered part of the next instruction. But does this also apply to branches which cross pages?

You should ask this question at 6502.org, I think.

Post by blargg » Thu Jul 01, 2010 2:30 pm

A page-crossing taken branch doesn't have this oddity; it acts like a normal 4-cycle instruction. See timing results in first post. Apparently it only applies to taken non-page-crossing branches.

Zepper
Formerly Fx3
Posts: 3223
Joined: Fri Nov 12, 2004 4:59 pm
Location: Brazil

Post by Zepper » Fri Jul 02, 2010 6:54 am

- What should be the correct output for test 4-nmi_and_dma? I don't know of an emulator that passes it.

Post by blargg » Sat Jul 03, 2010 8:05 am

I updated the cpu_interrupts_v2 test to include the correct output, and also renamed 4-nmi_and_dma to 4-irq_and_dma, since it wasn't NMI that it was testing. If you have further questions about this test, start a new thread, since 4-irq_and_dma isn't related to this branch timing issue (5-branch_delays_irq is the one that is).
