All times are UTC - 7 hours





[ 70 posts ]
PostPosted: Thu Dec 17, 2015 5:32 pm 
Formerly Fx3

Joined: Fri Nov 12, 2004 4:59 pm
Posts: 3064
Location: Brazil
aLaix wrote:
@Zepper
Sorry for the misunderstanding: we pass the suppression test; what we are not passing is the 07-NMI_on_timing test.


Oh, okay. Do what I described and tell me the result.
zeroone wrote:
Zepper wrote:
I don't know the software terms for it, but put simply, are you talking about "fine-tuning" the CPU/PPU time sync?


Does RockNES advance the CPU by 1 microcode at a time or by 1 full instruction at a time?

It runs 1 instruction per "loop". For each CPU cycle, the PPU runs 3 cycles before that CPU cycle is executed.


PostPosted: Thu Dec 17, 2015 9:59 pm

Joined: Mon Dec 29, 2014 1:46 pm
Posts: 710
Location: New York, NY
Zepper wrote:
It runs 1 instruction per "loop". For each CPU cycle, the PPU runs 3 cycles before that CPU cycle is executed.


For such a loop, the PPU can be out of sync with the CPU by up to 21 PPU cycles (some instructions take up to 7 CPU cycles and, for NTSC, the ratio is 3 PPU cycles per CPU cycle). Since it's a loop, neither really runs before the other; they just alternate. Any apparent order has no effect on the size of that gap.

In the real hardware, the PPU and CPU run in parallel, meaning there is no gap at all. A zero gap is achievable with a transistor-level simulation of the 2A03, but that's not practical for emulation.

For emulation, the size of the gap can be shrunk considerably by executing microcodes in the loop instead of full instructions. Each microcode takes 1 CPU cycle, so the two are never more than 3 PPU cycles out of sync.

However, that gap is apparently still too large for some games to work properly. Luckily, that can be compensated for with a hack: in the PPU timing diagram, all the dot times can be shifted over by 3 PPU cycles (or by 4 PPU cycles to handle PAL as well).
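
To make the gap arithmetic above concrete, here is a minimal Python sketch. It assumes a loop in which the CPU executes one step and the PPU then catches up; `max_gap` and its parameters are hypothetical names made up for this example, not code from any emulator.

```python
PPU_PER_CPU_NTSC = 3  # PPU cycles per CPU cycle on NTSC


def max_gap(instruction_lengths, per_cycle, ppu_per_cpu=PPU_PER_CPU_NTSC):
    """Return the largest CPU/PPU gap, in PPU cycles, seen while alternating.

    instruction_lengths: CPU cycle count of each executed instruction.
    per_cycle: True steps one CPU cycle at a time, False steps one full
    instruction at a time.
    """
    cpu_cycles = 0
    ppu_cycles = 0
    worst = 0
    for length in instruction_lengths:
        steps = [1] * length if per_cycle else [length]
        for step in steps:
            cpu_cycles += step  # the CPU runs its step first (assumed order)
            worst = max(worst, cpu_cycles * ppu_per_cpu - ppu_cycles)
            ppu_cycles += step * ppu_per_cpu  # then the PPU catches up
    return worst


# A 2-cycle, a 7-cycle and a 4-cycle instruction.
print(max_gap([2, 7, 4], per_cycle=False))  # 21
print(max_gap([2, 7, 4], per_cycle=True))   # 3
```

Swapping which side runs first changes the sign of the gap in this model, not its size.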


PostPosted: Fri Dec 18, 2015 4:44 am
Formerly Fx3

Joined: Fri Nov 12, 2004 4:59 pm
Posts: 3064
Location: Brazil
zeroone wrote:
Since it's a loop, neither really runs before the other; they just alternate.

Yeah.
Quote:
Any apparent order has no effect on the size of that gap.

In emulation terms, yes, it makes a big difference! Just think about clocking the PPU before reading $200x versus after reading $200x.
Quote:
In the real hardware, the PPU and CPU run in parallel, meaning there is no gap at all. A zero gap is achievable with a transistor-level simulation of the 2A03, but that's not practical for emulation.

I get what you mean... but still... I'm unable to find a way of doing such a thing.


PostPosted: Fri Dec 18, 2015 8:57 am

Joined: Mon Dec 29, 2014 1:46 pm
Posts: 710
Location: New York, NY
Zepper wrote:
In emulation terms, yes, it makes a big difference! Just think about clocking the PPU before reading $200x versus after reading $200x.


Think about it from the point of view of the program being executed by the CPU. For instance, a sprite 0 hit does not interrupt the processor. Instead, the program has to poll the PPU Status Register ($2002) in a loop. If the PPU always runs ahead of the CPU, then the program loop will occasionally break out earlier in emulation than it would on the actual hardware. Some games are very sensitive to timing variations like that, such as The Simpsons: Bart vs. the Space Mutants. When the timing is slightly off, the status bar shakes. Running the PPU ahead of or behind the CPU doesn't solve the issue because the discrepancy still causes variations in program loop iteration counts.
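
A toy model of that polling loop, as a minimal sketch rather than a real emulator: it assumes the game polls with `BIT $2002` (4 CPU cycles, with the register read on the last one) followed by a taken `BVC` (3 CPU cycles), and that the sprite 0 hit flag becomes visible at a given PPU cycle. The name `iterations_until_hit` and both sync models are made up for illustration.

```python
PPU_PER_CPU = 3        # NTSC ratio
BIT_ABS_CYCLES = 4     # BIT $2002; the register is read on the 4th cycle
BVC_TAKEN_CYCLES = 3   # branch back to the BIT while the flag is clear


def iterations_until_hit(hit_at_ppu_cycle, per_cycle_sync):
    """Count loop iterations until the poll sees the sprite 0 hit flag.

    per_cycle_sync=True: the PPU is advanced before every memory access, so
    it has run all 4 cycles of the BIT by the time $2002 is read.
    per_cycle_sync=False: the PPU only catches up after the whole
    instruction, so the read sees the PPU state from the start of the BIT
    (an assumed instruction-level model).
    """
    ppu_cycles = 0
    iterations = 0
    while True:
        iterations += 1
        if per_cycle_sync:
            ppu_cycles += BIT_ABS_CYCLES * PPU_PER_CPU
            flag_seen = ppu_cycles >= hit_at_ppu_cycle
        else:
            flag_seen = ppu_cycles >= hit_at_ppu_cycle
            ppu_cycles += BIT_ABS_CYCLES * PPU_PER_CPU
        if flag_seen:
            return iterations
        ppu_cycles += BVC_TAKEN_CYCLES * PPU_PER_CPU


print(iterations_until_hit(30, per_cycle_sync=True))   # 2
print(iterations_until_hit(30, per_cycle_sync=False))  # 3
```

For the same hit position, the two models leave the loop on different iterations, which is the kind of variation a timing-sensitive game can expose.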

Zepper wrote:
I'm unable to find a way of doing such a thing.


Unfortunately, the first step is to rewrite your CPU. But, it's actually not that difficult to transform the implementation if it's already working on the instruction level.

This link contains lists of the microcodes executed by each instruction. Note that every single microcode reads from memory or writes to memory (the R/W column). Consequently, the code that emulates each microcode will have to call some common read() and write() functions. Within those functions, advance the PPU by 3 PPU cycles (for NTSC). The PPU will still be running ahead of the CPU, but only by a maximum of 3 PPU cycles, as opposed to up to 21 PPU cycles in your current model.
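
The common read() and write() functions described above could look roughly like this in Python. This is only a sketch under simplifying assumptions: `Ppu`, `Bus` and `lda_absolute` are hypothetical names, memory is a flat 64 KiB array with no mirroring or mappers, and the stand-in PPU only counts cycles.

```python
class Ppu:
    """Stand-in PPU that only counts the cycles it has been stepped."""

    def __init__(self):
        self.cycles = 0

    def step(self):
        self.cycles += 1


class Bus:
    PPU_PER_CPU = 3  # NTSC

    def __init__(self):
        self.memory = bytearray(0x10000)  # flat memory, an assumption
        self.ppu = Ppu()
        self.cpu_cycles = 0

    def _tick(self):
        # One CPU cycle: run the PPU first, then count the cycle.
        for _ in range(self.PPU_PER_CPU):
            self.ppu.step()
        self.cpu_cycles += 1

    def read(self, address):
        self._tick()
        return self.memory[address & 0xFFFF]

    def write(self, address, value):
        self._tick()
        self.memory[address & 0xFFFF] = value & 0xFF


def lda_absolute(bus, pc):
    """LDA $nnnn as four one-cycle steps; returns (accumulator, next pc)."""
    bus.read(pc)                         # cycle 1: fetch opcode
    low = bus.read(pc + 1)               # cycle 2: fetch address low byte
    high = bus.read(pc + 2)              # cycle 3: fetch address high byte
    value = bus.read((high << 8) | low)  # cycle 4: read the operand
    return value, pc + 3


bus = Bus()
bus.memory[0x8000:0x8003] = bytes([0xAD, 0x34, 0x12])  # LDA $1234
bus.memory[0x1234] = 0x5A
accumulator, next_pc = lda_absolute(bus, 0x8000)
print(accumulator, bus.cpu_cycles, bus.ppu.cycles)  # 90 4 12
```

Because every access goes through read() or write(), the instruction code never has to count cycles itself.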

This will actually solve other timing issues as well. For instance, interrupts essentially happen in between executing instructions. But DMA can occur in the middle of an executing instruction, and its behavior is modified based on the type of microcode (i.e. whether it is a read or write cycle).


PostPosted: Fri Dec 18, 2015 10:55 am

Joined: Wed Nov 19, 2014 9:00 am
Posts: 40
Location: Mexico
Zepper wrote:
aLaix wrote:
@Zepper
Sorry for the misunderstanding: we pass the suppression test; what we are not passing is the 07-NMI_on_timing test.


Oh, okay. Do what I described and tell me the result.


@Zepper

We were able to pass the test by suppressing the NMI when rendering is turned on close to the end of vblank (i.e. the first two cycles of scanline 261).

zeroone wrote:
This will actually solve other timing issues as well. For instance, interrupts essentially happen in between executing instructions. But DMA can occur in the middle of an executing instruction, and its behavior is modified based on the type of microcode (i.e. whether it is a read or write cycle).


@zeroone
- I remember reading somewhere in this forum that sprite DMA can only take place between instructions just like interrupts.
- By "microcode" you mean each cycle of an instruction, as in the http://nesdev.com/6502_cpu.txt document? If so, our CPU already executes instructions that way, doing every single memory access like that; it's just that the loop allows the current instruction to finish (i.e. we don't return from the loop in the middle of an instruction). On a write to $4014, we turn a "dmaPending" flag on, and let the DMA happen after the instruction finishes. Could that be causing sync problems between the CPU and the PPU?

Thanks for all the insight in this topic, guys.

_________________
*** O-Nes-Sama emulator team ***


PostPosted: Fri Dec 18, 2015 12:47 pm

Joined: Mon Dec 29, 2014 1:46 pm
Posts: 710
Location: New York, NY
Fumarumota wrote:
I remember reading somewhere in this forum that sprite DMA can only take place between instructions just like interrupts.


See Likely internal implementation of the read.

Per that link, the RDY pin "causes the CPU to pause during the next read cycle". My interpretation of this is that the processor can be suspended mid-instruction. And, it keeps RDY low for 4 CPU cycles because the longest contiguous sequence of write cycles for any instruction (or interrupt) is length 3. So, the processor is suspended for 1 to 4 cycles.

Fumarumota wrote:
By "microcode" you mean each cycle of an instruction, as in the http://nesdev.com/6502_cpu.txt document?


Yes. I am referring to each of those steps as a microcode instruction.

But, if you want to nitpick, the 6502 does not technically use microcode. It uses a state machine in combination with a programmable logic array. It's the poor man's version of microcode. If you know a better alternative name, I'll adopt it for this discussion.

Fumarumota wrote:
If so, our CPU already executes instructions that way, doing every single memory access like that; it's just that the loop allows the current instruction to finish (i.e. we don't return from the loop in the middle of an instruction).


That sounds perfect. There is no need to return mid-instruction.

Fumarumota wrote:
On a write to $4014, we turn a "dmaPending" flag on, and let the DMA happen after the instruction finishes. Could that be causing sync problems between the CPU and the PPU?


You do not need to return mid-instruction, but from my understanding, you need to handle DMA mid-instruction. And, you can handle this in the common read() function that I mentioned earlier in this thread. Meaning, if there is a DMA request, as soon as a read cycle is encountered, the processor will be suspended. I.e. the read() function will do 3 things: 1) handle DMA if need be, 2) update the PPU by at least 3 PPU cycles and 3) return a value from memory.
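
As a minimal sketch of those 3 steps in Python: the `Bus` class, `request_dma` and the `stall_cycles` value are hypothetical, and the stall is modeled as plain idle cycles rather than the actual DMA accesses.

```python
class Bus:
    PPU_PER_CPU = 3  # NTSC

    def __init__(self):
        self.memory = bytearray(0x10000)  # flat memory, an assumption
        self.cpu_cycles = 0
        self.ppu_cycles = 0
        self.dma_pending = False
        self.dma_stall_cycles = 0

    def _tick(self):
        self.ppu_cycles += self.PPU_PER_CPU
        self.cpu_cycles += 1

    def request_dma(self, stall_cycles):
        """Hypothetical hook: some device asks to suspend the CPU."""
        self.dma_pending = True
        self.dma_stall_cycles = stall_cycles

    def read(self, address):
        # 1) Handle DMA if need be: the CPU is suspended on a read cycle.
        if self.dma_pending:
            self.dma_pending = False
            for _ in range(self.dma_stall_cycles):
                self._tick()
        # 2) Advance the PPU for this CPU cycle.
        self._tick()
        # 3) Return the value from memory.
        return self.memory[address & 0xFFFF]

    def write(self, address, value):
        # Write cycles are not suspended; a pending request waits for a read.
        self._tick()
        self.memory[address & 0xFFFF] = value & 0xFF


bus = Bus()
bus.request_dma(stall_cycles=4)
bus.write(0x0010, 0x42)
cycles_after_write = bus.cpu_cycles
value = bus.read(0x0010)
print(cycles_after_write, value, bus.cpu_cycles, bus.ppu_cycles)  # 1 66 6 18
```

The pending request survives the write cycle and only takes effect on the following read, without the instruction loop having to return mid-instruction.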


PostPosted: Fri Dec 18, 2015 6:40 pm

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19104
Location: NE Indiana, USA (NTSC)
zeroone wrote:
But, if you want to nitpick, the 6502 does not technically use microcode. It uses a state machine in combination with a programmable logic array. It's the poor man's version of microcode. If you know a better alternative name, I'll adopt it for this discussion.

It's microcode. Visual 6502 refers to it as a decode ROM. It's just incompletely decoded to improve compression (130 words vs. 256).


PostPosted: Fri Dec 18, 2015 7:44 pm

Joined: Mon Dec 29, 2014 1:46 pm
Posts: 710
Location: New York, NY
tepples wrote:
It's microcode. Visual 6502 refers to it as a decode ROM. It's just incompletely decoded to improve compression (130 words vs. 256).


It's almost microcode. Here are some further details about what's on that PLA.

For discussions on this forum, I'm fine with the term "microcode", even though technically, maybe it's not officially microcode.


PostPosted: Fri Dec 18, 2015 9:21 pm

Joined: Wed Nov 19, 2014 9:00 am
Posts: 40
Location: Mexico
zeroone wrote:
See Likely internal implementation of the read.

Per that link, the RDY pin "causes the CPU to pause during the next read cycle". My interpretation of this is that the processor can be suspended mid-instruction. And, it keeps RDY low for 4 CPU cycles because the longest contiguous sequence of write cycles for any instruction (or interrupt) is length 3. So, the processor is suspended for 1 to 4 cycles.


Quite an interesting read that was. I will definitely take it into account when we implement DMC.

zeroone wrote:
Yes. I am referring to each of those steps as a microcode instruction.

But, if you want to nitpick, the 6502 does not technically use microcode. It uses a state machine in combination with a programmable logic array. It's the poor man's version of microcode. If you know a better alternative name, I'll adopt it for this discussion.


Didn't mean to nitpick or anything, microcode or anything else is just fine :).

zeroone wrote:
You do not need to return mid-instruction, but from my understanding, you need to handle DMA mid-instruction. And, you can handle this in the common read() function that I mentioned earlier in this thread. Meaning, if there is a DMA request, as soon as a read cycle is encountered, the processor will be suspended. I.e. the read() function will do 3 things: 1) handle DMA if need be, 2) update the PPU by at least 3 PPU cycles and 3) return a value from memory.


As I understand it, this is how an accurate DMC DMA would work, right? What about Sprite (OAM) DMA?

Here's how I do it currently:

- After $4014 write cycle:
* Run 3 PPU cycles, account for 1 idle CPU cycle.
* If the write was on an odd CPU cycle: run 3 PPU cycles, account for another CPU cycle.
* Perform OAM initialization by reading ram[valueWrittenIn4014 * 0x100] and writing it to PPU OAM_DATA via the $2004 register (these account for 512 CPU cycles, running 3 PPU cycles before each CPU access cycle).

All this happens after the instruction causing the write finishes.

Please forgive me if this is a bit off topic from the Zero Hit stuff, but we want to find out everything that could be causing a zero hit miss (hanging some games like Battletoads or shaking Bart's status bar).

_________________
*** O-Nes-Sama emulator team ***


PostPosted: Sat Dec 19, 2015 6:33 am
Formerly Fx3

Joined: Fri Nov 12, 2004 4:59 pm
Posts: 3064
Location: Brazil
@zeroone
I've spent a lot of time debugging and tracing all those test ROMs provided by blargg... trying to understand some deep mechanics, cycle by cycle, with huge logs, analyzing precision of + or - 1 PPU cycle while reading/writing. You can't say the PPU/CPU are "running up to 7 PPU cycles" out of sync because they simply are NOT. Most of these test ROMs perform specific reads that require a precision of 1 cycle (as I said before), so I really dunno about it. My emu passes all those tests, with NO hacks. Sorry. :cry: While the idea of breaking the CPU time (or cycle/clock/whatever?) into smaller pieces is somewhat interesting, I have to disagree. Otherwise, _Q would have written something.


PostPosted: Sat Dec 19, 2015 10:30 am

Joined: Mon Dec 29, 2014 1:46 pm
Posts: 710
Location: New York, NY
Zepper wrote:
@zeroone
I've spent a lot of time debugging and tracing all those test ROMs provided by blargg... trying to understand some deep mechanics, cycle by cycle, with huge logs, analyzing precision of + or - 1 PPU cycle while reading/writing. You can't say the PPU/CPU are "running up to 7 PPU cycles" out of sync because they simply are NOT. Most of these test ROMs perform specific reads that require a precision of 1 cycle (as I said before), so I really dunno about it. My emu passes all those tests, with NO hacks. Sorry. :cry: While the idea of breaking the CPU time (or cycle/clock/whatever?) into smaller pieces is somewhat interesting, I have to disagree. Otherwise, _Q would have written something.


Treat the Simpsons ROM as an additional test. Use logging and figure out why the status bar shakes. RockNES may be perfectly tuned to beat all of Blargg's tests, but its accuracy can still be increased.

Breaking down each instruction into smaller executable pieces is certainly not a hack. And, the techniques that I mentioned above are used in Nintendulator.


PostPosted: Sat Dec 19, 2015 2:31 pm

Joined: Mon Dec 29, 2014 1:46 pm
Posts: 710
Location: New York, NY
Fumarumota wrote:
What about Sprite (OAM) DMA?

Here's how I do it currently:

- After $4014 write cycle:
* Run 3 PPU cycles, account for 1 idle CPU cycle.
* If the write was on an odd CPU cycle: run 3 PPU cycles, account for another CPU cycle.
* Perform OAM initialization by reading ram[valueWrittenIn4014 * 0x100] and writing it to PPU OAM_DATA via the $2004 register (these account for 512 CPU cycles, running 3 PPU cycles before each CPU access cycle).

All this happens after the instruction causing the write finishes.


That sounds about right.

The processor will be suspended immediately after the $4014 write cycle (as opposed to waiting until the full instruction ends). Consequently, within the write() function discussed earlier, the code can do the memory transfer:

Code:
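// value = the byte written to $4014 (the source page number)
if (odd CPU cycle)
  read(PC)    // extra cycle for alignment

read(PC)      // 1 cycle before the transfer starts

for (i = 0 to 255)
  write($2004, read((value * 256) + i))    // 1 read cycle + 1 write cycle per byte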


Above, the read() and write() functions have the side effect of running 3 PPU cycles for NTSC and 3 or 4 PPU cycles for PAL. Those functions each count as a CPU cycle (i.e. they increment a CPU cycle counter required for frame timing). Since the for-loop calls both write() and read(), it takes 512 CPU cycles. The prior 1 or 2 read() calls extend the length of the transfer to 513 or 514 CPU cycles.
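
A runnable Python sketch of the same transfer, mainly to check the 513/514 cycle count. Several things here are assumptions for illustration: the `Bus` class and its field names, flat memory, the fixed `pc` value, and sampling the odd/even parity right after the $4014 write cycle has been counted.

```python
class Bus:
    PPU_PER_CPU = 3  # NTSC

    def __init__(self, cpu_cycles=0):
        self.memory = bytearray(0x10000)  # flat memory, an assumption
        self.oam = bytearray(256)
        self.oam_address = 0
        self.cpu_cycles = cpu_cycles
        self.ppu_cycles = 0
        self.pc = 0x8000  # hypothetical program counter for the dummy reads
        self.last_dma_cycles = 0

    def _tick(self):
        self.ppu_cycles += self.PPU_PER_CPU
        self.cpu_cycles += 1

    def read(self, address):
        self._tick()
        return self.memory[address & 0xFFFF]

    def write(self, address, value):
        self._tick()
        if address == 0x2004:    # OAM data port
            self.oam[self.oam_address] = value & 0xFF
            self.oam_address = (self.oam_address + 1) & 0xFF
        elif address == 0x4014:  # OAM DMA trigger
            self._oam_dma(value)
        else:
            self.memory[address & 0xFFFF] = value & 0xFF

    def _oam_dma(self, page):
        start = self.cpu_cycles
        if self.cpu_cycles % 2 == 1:  # assumed parity convention
            self.read(self.pc)
        self.read(self.pc)
        for i in range(256):
            self.write(0x2004, self.read(page * 256 + i))
        self.last_dma_cycles = self.cpu_cycles - start


def run_dma(start_cycle):
    """Copy page $02 into OAM, starting from the given CPU cycle count."""
    bus = Bus(cpu_cycles=start_cycle)
    bus.memory[0x0200:0x0300] = bytes(range(256))
    bus.write(0x4014, 0x02)
    return bus


print(run_dma(0).last_dma_cycles)  # 514
print(run_dma(1).last_dma_cycles)  # 513
```

The cycle counts fall out of the read() and write() calls themselves, so no separate DMA cycle bookkeeping is needed.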

Further details can be found here.


PostPosted: Sun Dec 20, 2015 9:40 am

Joined: Wed Nov 10, 2004 6:47 pm
Posts: 1845
Is read(PC) right? The wiki says those cycles are idle.


PostPosted: Sun Dec 20, 2015 9:41 am
Formerly Fx3

Joined: Fri Nov 12, 2004 4:59 pm
Posts: 3064
Location: Brazil
Disch wrote:
Is read(PC) right? The wiki says those cycles are idle.

That code (the for() loop) shouldn't be taken as "correct".


PostPosted: Sun Dec 20, 2015 9:42 am

Joined: Wed Nov 10, 2004 6:47 pm
Posts: 1845
Zepper wrote:
That code (the for() loop) shouldn't be taken as "correct".


Why not? It looks correct to me.

