Top and bottom halves of 6502 clock cycle
In this post, Fx3 wondered why a CPU tester was displaying +1 cycle or -1 cycle.
Fx3 wrote: Too bad here, unfortunately. I call "CPU clock" one PPU access, rendering 3 pixels. My CPU core is simple regarding the instruction set, of how each opcode is emulated. For some obscure reason, your test ROM is giving +1 cycle error for opcodes $01 and $04 (right now). Opcode $04 is odd... if I take out 1 PPU access, it displays -1 cycle; else, +1 cycle. Go figure... Any help?
Every 6502 cycle has a top half (clock = HI) and a bottom half (clock = LO). A read or write may take effect only at one half.
Last edited by tepples on Tue Sep 19, 2006 7:02 am, edited 1 time in total.
No.
I mean using emulation terms. If an instruction takes 1 cycle to read the opcode, 1 cycle to read the argument (next byte, address), 1 cycle to read from address XXh and 1 cycle to do the operation... what am I missing here? How could I think about rising/falling edge of CPU cycles? Looks like that weird MMC3 IRQ clocking!

Zepper
RockNES developer
No++.
I need an example. I don't understand the meaning of rising/falling edge of CPU cycles.
Zepper
RockNES developer
The CPU clock signal is roughly a square wave:
Code: Select all
      high clock state       rising edge
      vvvvv                  v
      _____       _____       _____       _____
     |     |     |     |     |     |     |     |
_____|     |_____|     |_____|     |_____|     |_ ...
^^^^^                  ^
low clock state        falling edge
Each cycle consists of a rising edge, a high state, a falling edge, and a low state. Different things can be defined to happen on the rising or falling edge of the clock signal. For example, when performing a memory read, a CPU may put an address on the address bus on a rising edge and then read the data bus on the next falling edge.
Now if you have two processors running at different speeds:
Code: Select all
CPU
      _____       _____       _____       _____
     |     |     |     |     |     |     |     |
_____|     |_____|     |_____|     |_____|     |_ ...
PPU
  _   _   _   _   _   _   _   _   _   _   _   _
 | | | | | | | | | | | | | | | | | | | | | | | |
_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_| |_ ...
Then the rising and falling edges of a single CPU cycle may occur a PPU cycle or two apart, which may cause a read or write to appear to be delayed by one or two PPU cycles (a fraction of a CPU cycle).
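To put numbers on "a PPU cycle or two apart", here is a rough sketch that counts PPU clock ticks between the rising and falling edge of one CPU cycle. It uses the master / 12 and master / 4 dividers quoted later in this thread. The 50% duty cycle, the function names, and the idea of expressing the power-up alignment as an offset in master clocks are assumptions made for illustration, not a description of the real chip.

```python
MASTER_PER_CPU = 12  # CPU clock = master / 12 (figure quoted later in the thread)
MASTER_PER_PPU = 4   # PPU clock = master / 4
HALF_CYCLE = MASTER_PER_CPU // 2  # assumption: the clock is high for half the cycle


def ppu_cycles_between_edges(rise_time):
    """Count PPU ticks after a CPU rising edge at master-clock time
    `rise_time`, up to and including the falling edge half a cycle later."""
    fall_time = rise_time + HALF_CYCLE
    return fall_time // MASTER_PER_PPU - rise_time // MASTER_PER_PPU


def gaps_for_alignment(alignment, cycles=4):
    """Gap for each of `cycles` consecutive CPU cycles when the first rising
    edge comes `alignment` master clocks after a PPU tick (hypothetical knob)."""
    return [ppu_cycles_between_edges(alignment + n * MASTER_PER_CPU)
            for n in range(cycles)]


for alignment in range(MASTER_PER_PPU):
    print(alignment, gaps_for_alignment(alignment))
```

Under these assumptions the gap is always 1 or 2 PPU cycles, and it stays the same for every CPU cycle of a given alignment because 12 is a multiple of 4.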
Awesome. ^_^;;
By the way, in terms of emulating a certain instruction (like LDA $00), how would a rising/falling edge be detected?
Zepper
RockNES developer
LDA $00 has six edges, more or less like the following:
- a rise and fall for reading opcode LDA,
- a rise and fall for reading address $00,
- a rise while address $0000 is put on the address bus, and
- a fall while the value is read from the data bus.
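The list above can be written out as data, which is one way an emulator could walk an instruction edge by edge. This is only a sketch: the `Edge` type, the function name, and the wording of each action are made up for illustration, and the split of "address out on the rise, data in on the fall" follows the memory-read example given earlier in the thread rather than any datasheet. $A5 is the opcode for LDA zero page.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    cycle: int   # CPU cycle within the instruction, starting at 1
    kind: str    # "rise" or "fall"
    action: str  # informal description of the bus activity


def lda_zero_page_edges(operand: int) -> List[Edge]:
    """Six clock edges of LDA $xx (zero page), following the list above."""
    address = operand & 0xFF  # zero-page address is $00xx
    return [
        Edge(1, "rise", "start reading opcode LDA ($A5)"),
        Edge(1, "fall", "finish reading opcode LDA ($A5)"),
        Edge(2, "rise", f"start reading address byte ${operand & 0xFF:02X}"),
        Edge(2, "fall", f"finish reading address byte ${operand & 0xFF:02X}"),
        Edge(3, "rise", f"put ${address:04X} on the address bus"),
        Edge(3, "fall", "read the value from the data bus into A"),
    ]


for edge in lda_zero_page_edges(0x00):
    print(edge.cycle, edge.kind, edge.action)
```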
tepples wrote: LDA $00 has six edges, more or less like the following:
- a rise and fall for reading opcode LDA,
- a rise and fall for reading address $00,
- a rise while address $0000 is put on the address bus, and
- a fall while the value is read from the data bus.
Each rise or fall may affect other hardware connected to the CPU bus, such as the PPU registers.
Hmm... interesting. By the way, as far as I know, there are no public docs that cover this rising/falling thing, only "numeric" CPU cycles as units. Plus, it's picky to emulate something like 1 cycle being broken into 2 steps (rise/fall). Anyway, it makes sense, as I could detect unexpected errors in most of blargg's other tests.
The LDA $xx takes 1 cycle to fetch the opcode, 1 cycle to fetch the operand byte (the zero-page address) and 1 cycle to read from RAM[$xx]. By the way, each cycle takes 2 'steps'. This way, I must create a SINGLE PPU access function to correct the problem. By default, mine takes 3 PPU cycles per CPU cycle, and it seems incorrect... -_-;; That's it. Thanks for the help.
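One possible shape for that "single PPU access function" is a master-clock counter that the CPU core advances by half a cycle at a time, with the PPU catching up after each half. The sketch below only logs the order of events. The class and method names, the 6 + 6 master-clock split, and the `alignment` parameter are all assumptions for illustration, not how RockNES or the hardware actually does it.

```python
class HalfCycleClock:
    """Hypothetical scheduler: CPU cycle = 12 master clocks, PPU tick = 4."""

    def __init__(self, alignment=0):
        self.master = alignment  # master clocks since the last PPU tick at start
        self.log = []

    def _advance(self, master_clocks):
        ticks_before = self.master // 4
        self.master += master_clocks
        # Run every PPU tick that became due during this half cycle.
        for tick in range(ticks_before + 1, self.master // 4 + 1):
            self.log.append(f"ppu {tick}")

    def cpu_cycle(self, label):
        self.log.append(f"rise {label}")
        self._advance(6)  # assumed high half
        self.log.append(f"fall {label}")
        self._advance(6)  # assumed low half


def run_lda_zero_page(alignment=0):
    clock = HalfCycleClock(alignment)
    for label in ("opcode", "operand", "data"):
        clock.cpu_cycle(label)
    return clock.log


print(run_lda_zero_page(0))
```

Each CPU cycle still gets 3 PPU ticks in total, but they are split 1 + 2 or 2 + 1 around the falling edge depending on the alignment, which is the kind of difference a fixed "3 PPU cycles after every CPU cycle" loop cannot express.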
Zepper
RockNES developer
I thought the 6502 also used a two-phase clock like this, where actions occur on the rising edge of each of the phases:
Code: Select all
  ___     ___     ___
_|   |___|   |___|   |___
      ___     ___     ___
  ___|   |___|   |___|   |___
But perhaps this is just another way of describing a single clock at double the rate with actions occurring on the rising and falling edges.
I still don't see how this would help explain Fx3's problem with instruction timing.
blargg wrote: I still don't see how this would help explain Fx3's problem with instruction timing.
Just forgive me. ^_^;; Ah yes, thanks for the test ROMs, it's awesome, no joke.
Zepper
RockNES developer
blargg wrote: I still don't see how this would help explain Fx3's problem with instruction timing.
Any test ROM that measures the timing of an instruction by comparing it with the length of a PPU frame will have different behavior if the NMI comes a half-cycle early or late, especially if that triggers the bug where a PPUSTATUS read cancels NMI.
My CPU timing test has large timing margins, since it uses NMI to time thousands of executions of the instruction, not just one. It further allows an error of up to +/- 6 iterations of the loop compared to the reference values. For instructions whose execution times differ by one clock, the iteration counts differ by at least 200.
Other timing tests, which measure down to 1 PPU clock accuracy, first synchronize the CPU clock with the PPU clock such that the error is at most 3/4 PPU clock. The PPU clock is master / 4 and the CPU clock is master / 12, so there are four different possible fixed synchronizations at power-up, depending on the random state of the dividers (P = PPU, C = CPU, one character = one ~21.5 MHz master clock):
Code: Select all
P---P---P---P---P---P---
C-----------C-----------
-C-----------C----------
--C-----------C---------
---C-----------C--------
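The four rows above can be generated from the two dividers, which also shows one way to read the "at most 3/4 PPU clock" figure: the CPU tick can trail a PPU tick by 0, 1, 2, or 3 master clocks. The function names below are invented for this sketch, and treating the synchronization error as that offset is an interpretation of the post, not something it states outright.

```python
from fractions import Fraction


def alignment_rows(master_clocks=24):
    """One PPU row, then one CPU row per possible divider offset."""
    rows = [("P" + "-" * 3) * (master_clocks // 4)]
    for offset in range(4):
        cpu = "-" * offset + ("C" + "-" * 11) * (master_clocks // 12 + 1)
        rows.append(cpu[:master_clocks])  # trim to the same width as the PPU row
    return rows


def offset_in_ppu_clocks(offset):
    """How far the CPU tick trails the PPU tick, in PPU clocks."""
    return Fraction(offset % 4, 4)


for row in alignment_rows():
    print(row)
print(max(offset_in_ppu_clocks(o) for o in range(4)))
```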