
Posted: Fri Oct 22, 2010 9:58 am
by tokumaru
Vash wrote:When is frame rendered?
The PPU is constantly working alongside the CPU. It keeps repeating this cycle: 20 Vblank scanlines, 1 dummy scanline, 240 picture scanlines, 1 dummy scanline. It never stops. It's the responsibility of the game program to sync itself to this cycle.
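
A minimal sketch of that repeating cycle, not from the original post. It assumes the common numbering where line 0 is the first picture line, so the vblank lines come after the picture rather than before it. The names `Phase` and `phase_of` are made up for illustration.

```cpp
// NTSC frame layout: 240 picture lines, 1 dummy (post-render) line,
// 20 vblank lines, 1 dummy (pre-render) line.
// Numbering assumption: line 0 is the first picture line.
enum class Phase { Picture, PostRender, VBlank, PreRender };

constexpr int kLinesPerFrame = 240 + 1 + 20 + 1;  // 262

constexpr Phase phase_of(int scanline) {
    // Wrap around so the cycle repeats forever, as the PPU does.
    const int line = scanline % kLinesPerFrame;
    if (line < 240) return Phase::Picture;
    if (line == 240) return Phase::PostRender;
    if (line < 261) return Phase::VBlank;
    return Phase::PreRender;
}
```

The point is only that the PPU's position in the frame is a pure function of elapsed time; the game has to arrange its own work around it.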

Posted: Fri Oct 22, 2010 10:06 am
by Vash
Ok so the CPU and PPU are working in parallel, but in my emulator I wanted to do something that looks like a game loop, like this:

while(GAME_RUNNING)
{
if(timeElapsed>=tick)
Game.update();

Game.render();
}

So basically my question is when do I stop the CPU emulator to render a frame?

Posted: Fri Oct 22, 2010 10:15 am
by tepples
The most accurate method is to render three pixels, then perform one CPU cycle, and repeat. That's slow, so various catch-up schemes are used. As for when you give the GUI control, the common pattern that I've seen is to stop the emulator at the start of line 240 (the post-render line); line 241 is the start of vertical blanking.
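
A rough sketch of that lockstep loop, assuming NTSC timing (3 pixels per CPU cycle, 341 pixels per line, 262 lines). The `Cpu` and `Ppu` structs are hypothetical stubs that only count time; a real emulator would execute 6502 cycles and produce pixels in their place.

```cpp
#include <cstdint>

// Hypothetical stand-ins that only track time.
struct Cpu {
    std::uint64_t cycles = 0;
    void step_cycle() { ++cycles; }
};

struct Ppu {
    int dot = 0;       // 0..340 within the current scanline
    int scanline = 0;  // 0..261
    void step_pixel() {
        if (++dot == 341) {
            dot = 0;
            if (++scanline == 262) scanline = 0;
        }
    }
};

// Run 3 pixels, then 1 CPU cycle, until the PPU enters line 240.
// Then return so the host GUI can present the finished picture.
// Returns the number of CPU cycles executed during this call.
std::uint64_t run_until_post_render(Cpu& cpu, Ppu& ppu) {
    const std::uint64_t start = cpu.cycles;
    bool reached = false;
    while (!reached) {
        for (int i = 0; i < 3; ++i) {
            const int before = ppu.scanline;
            ppu.step_pixel();
            // Detect the transition into line 240, even mid-group.
            if (before != 240 && ppu.scanline == 240) reached = true;
        }
        cpu.step_cycle();
    }
    return cpu.cycles - start;
}
```

Because 341 × 262 is not a multiple of 3, successive calls return 29781, 29781, then 29780 cycles once the loop is in steady state, which averages out to the fractional frame length.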

Posted: Fri Oct 22, 2010 10:23 am
by Vash
Ok so the emulator main loop can look something like this:


while(EMULATOR_RUNNING)
{
   if(cycle < 262) // number of scanline
      CPU.run(&cycle);

   PPU.render();
}

Posted: Fri Oct 22, 2010 10:36 am
by Dwedit
There are 262 scanlines, but 341 PPU cycles in each scanline. More like 89342 PPU cycles total (29780.66... CPU cycles).

You need to expect the game to write to the PPU during rendering time, because even Super Mario Bros changes the scrolling location part way through draw time.

Posted: Fri Oct 22, 2010 10:46 am
by Vash
The more I read stuff, the less I understand :D. I'm completely lost.

I'm ok with the cycle : 341 PPU cycle per scanline with 262 scanlines : 89342 PPU cycles. As 1 cpu cycle = 3 ppu Cycles, we end up with 29780 cpu cycles.

What do you mean by : the game write to the PPU during rendering time?

Posted: Fri Oct 22, 2010 11:12 am
by Bregalad
tepples wrote:The most accurate method is to render three pixels, then perform one CPU cycle, and repeat.
Are you racist against people living in PAL territories ?

Posted: Fri Oct 22, 2010 11:20 am
by blargg
Let's first adopt some terms that don't all sound the same. We don't need to use "cycle" for everything.

Cycle: CPU cycle. For example, two cycles in a NOP
Pixel: time PPU spends rendering a single pixel
Clock: the 21477272.7 Hz master timebase

Therefore:
1 clock = 1/21477272.7 second
1 pixel = 4 clocks = 1/5369318 second
1 cycle = 3 pixels = 12 clocks = 1/1789772.7 second
1 scanline = 341 pixels (in most cases) = 113.67 cycles
1 frame = 262 scanlines = 29780.67 cycles = 1/60.1 second

For PAL:
1 clock = 1/26601712.5 second
1 pixel = 5 clocks = 1/5320342.5 second
1 cycle = 3.2 pixels = 16 clocks = 1/1662607 second
1 scanline = 341 pixels = 106.5625 cycles
1 frame = 312 scanlines = 33247.5 cycles = 1/50 second

Then we can talk of these things with different one-word terms, and not get confused.

Posted: Fri Oct 22, 2010 11:24 am
by tepples
Bregalad wrote:
tepples wrote:The most accurate method is to render three pixels, then perform one CPU cycle, and repeat.
Are you racist against people living in PAL territories ?
Russia is a PAL territory. The Dendy famiclone uses a /15 CPU instead of a /16 one like the official PAL NES, resulting in 3 pixels per cycle, and a PPU that makes NMI at scanline 291 instead of 241 like the official PAL NES. It appears the newbie hasn't yet appreciated the concept of two processors running in parallel. If I listed all variants of the NES architecture immediately, it would confuse the newbie even more.

For Dendy:
1 clock = 1/26601712.5 second
1 pixel = 5 clocks
1 cycle = 3 pixels = 15 clocks
1 scanline = 341 pixels = 113.67 cycles
1 frame = 312 scanlines = 35464 cycles
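
For anyone who wants to check the arithmetic, here is a small sketch that derives the per-frame figures for all three variants from the clock dividers listed in this thread. The struct and function names are illustrative only, and it ignores the "in most cases" caveat on NTSC scanline length.

```cpp
#include <cstdint>

struct Region {
    double master_hz;      // master clock frequency
    int clocks_per_pixel;  // PPU divider
    int clocks_per_cycle;  // CPU divider
    int scanlines;         // scanlines per frame
};

constexpr Region kNtsc  {21477272.7, 4, 12, 262};
constexpr Region kPal   {26601712.5, 5, 16, 312};
constexpr Region kDendy {26601712.5, 5, 15, 312};

constexpr int kPixelsPerLine = 341;

// Master clocks per frame: always a whole number, unlike cycles per frame.
constexpr std::int64_t clocks_per_frame(const Region& r) {
    return std::int64_t{kPixelsPerLine} * r.scanlines * r.clocks_per_pixel;
}

// CPU cycles per frame; fractional on NTSC and PAL.
constexpr double cycles_per_frame(const Region& r) {
    return static_cast<double>(clocks_per_frame(r)) / r.clocks_per_cycle;
}

constexpr double frames_per_second(const Region& r) {
    return r.master_hz / static_cast<double>(clocks_per_frame(r));
}
```

Keeping the emulator's timeline in master clocks (or pixels) rather than CPU cycles avoids carrying the fraction around.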

Posted: Fri Oct 22, 2010 1:02 pm
by Dwedit
Vash wrote: What do you mean by : the game write to the PPU during rendering time?
Here are the most common changes made to the PPU's state during rendering...

* Change the scrolling location for a status bar.
* Change the scrolling location many times because we want wave backgrounds or it's a racing game.
* Bankswitch the CHR so that different graphics are drawn after a certain scanline.
* Change which pattern table backgrounds and sprites use.

Then some more tricky stuff that games can do...

* Bankswitch the CHR more than once within the same scanline (Punch Out, Marble Madness, Fire Emblem, etc...)
* Disable rendering so the game can write to video ram, then re-enable rendering later within the same frame. (Wizards and Warriors 3)
* Disable rendering, then write a second sprite table, then re-enable rendering (Day Dreamin Davey, RC Pro am, Stunt Kids, some other games)

Any renderer that looks only at the PPU's state at the start of the frame (scroll position, CHR banks mapped in, which pattern tables to use, size of sprites) and draws the entire screen from that initial state won't do a very good job. Even Super Mario Bros won't scroll correctly.
You need at least scanline-level accuracy of PPU state changes. Even that isn't good enough for Punch Out, which needs pixel-level accuracy.

But you don't need to keep switching between CPU code and PPU code every instruction. You can instead use a catch-up method: wait until the emulated game makes a PPU write or the frame ends, then draw the pixels that have elapsed since the last catch-up.
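
A minimal sketch of the catch-up idea, under some loud assumptions: times are in PPU pixels since the start of an NTSC frame (CPU cycles × 3), the only PPU state is a horizontal scroll value, and "drawing" is reduced to logging which scroll value each scanline started with. `CatchUpPpu` and its members are hypothetical names, not taken from any real emulator.

```cpp
#include <cstdint>
#include <vector>

struct CatchUpPpu {
    static constexpr std::int64_t kPixelsPerLine = 341;
    static constexpr std::int64_t kPixelsPerFrame = 341 * 262;  // NTSC

    std::int64_t rendered_to = 0;  // PPU time already drawn, in pixels
    int scroll_x = 0;              // stand-in for the real PPU state
    std::vector<int> scroll_log;   // scroll seen at the start of each line

    // Draw every pixel from rendered_to up to `now` with the current state.
    void catch_up(std::int64_t now) {
        for (; rendered_to < now; ++rendered_to) {
            if (rendered_to % kPixelsPerLine == 0) scroll_log.push_back(scroll_x);
        }
    }

    // Called when the emulated game writes the scroll register at `now`.
    void write_scroll(std::int64_t now, int value) {
        catch_up(now);     // the old state applies to everything before the write
        scroll_x = value;  // the new state applies from here on
    }

    // Called when the frame ends: draw whatever is left, then rewind.
    void end_frame() {
        catch_up(kPixelsPerFrame);
        rendered_to = 0;
    }
};
```

With a status-bar style split, for example a write at the start of line 32 followed by `end_frame()`, lines 0 to 31 are logged with the old scroll and lines 32 onward with the new one, even though the PPU code ran only twice.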

Posted: Fri Oct 22, 2010 1:31 pm
by tepples
It took a while, but I found an overview of "catch-up" and "timestamp" related techniques in this article on our wiki. Let me know about anything that you don't understand in this article so that I can go fix it.

Posted: Fri Oct 22, 2010 1:34 pm
by 3gengames
Why don't you guys support emulating the core one cycle at a time, not just "x cycles=instruction y"...? Wouldn't that help the emulation a lot for REALLY close timing things?

Posted: Fri Oct 22, 2010 1:49 pm
by tepples
Running one cycle at a time is slow because it needs to keep the state of both the emulated CPU and PPU in the host CPU's L1 cache, and not all host CPUs are big enough for that. Efficient emulators use catch-up techniques to keep the host CPU's attention on only one emulated part at once yet still act as if the components run at the same time. Drop the catch-up, as you suggest, and you have an emulator like Nintendulator or bsnes, which last time I checked didn't run too well on netbooks.

Posted: Fri Oct 22, 2010 2:46 pm
by Zepper
tepples wrote:The most accurate method is to render three pixels, then perform one CPU cycle, and repeat. That's slow, so various catch-up schemes are used. As for when you give the GUI control, the common pattern that I've seen is to stop the emulator at the start of line 240 (the post-render line); line 241 is the start of vertical blanking.
- Absolutely true. :) I had to (additionally) create a queue system, right after an instruction, for things like switching to GUI or sound output updates/poll.

Posted: Fri Oct 22, 2010 2:49 pm
by tokumaru
Vash wrote:Ok so the emulator main loop can look something like this:


while(EMULATOR_RUNNING)
{
   if(cycle < 262) // number of scanline
      CPU.run(&cycle);

   PPU.render();
}
The problem with this approach is that many games (and I mean LOTS of them) make use of the fact that CPU and PPU run side by side. These games modify certain PPU parameters as the image renders in order to change the rendered image in some way. This is used for status bars, parallax scrolling, color changes, things like that. If you ignore those timed changes and only render the image based on the final state of the PPU, almost every game will look wrong, and many might even hang (the ones that rely on sprite 0 hits).

A common solution to this problem is the "catch up" method. You basically run the CPU until the program tries to make any changes to the PPU or the frame ends, at which point you make the PPU catch up to the CPU by rendering the necessary number of pixels.

Of course you still have to consider events external to the CPU that might affect the program flow, such as sprite 0 hits or IRQs. You have to predict when those will happen so that you can update the system's state accordingly at the correct times.
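
One way to picture the prediction step, as a sketch only: compute the time of the next external event and never let the CPU run past it without stopping. Everything here is an assumption for illustration: times are in PPU pixels since the start of an NTSC frame, the NMI is placed at the start of line 241, the sprite 0 hit time is handed in already predicted, and the `Cpu` stub treats every instruction as 2 cycles.

```cpp
#include <algorithm>
#include <cstdint>

constexpr std::int64_t kPixelsPerLine = 341;
constexpr std::int64_t kFrameEnd = kPixelsPerLine * 262;
// Assumption for this sketch: NMI is raised at the start of line 241.
constexpr std::int64_t kNmiTime = kPixelsPerLine * 241;

// Hypothetical CPU stub: each "instruction" takes 2 cycles (6 pixels).
struct Cpu {
    std::int64_t now = 0;
    bool nmi_taken = false;
    void step_instruction() { now += 2 * 3; }
    void take_nmi() { nmi_taken = true; }
};

// Earliest upcoming time the CPU must stop at. `sprite0_hit` is a
// predicted hit time, or a negative value if no hit is expected.
std::int64_t next_stop(std::int64_t now, std::int64_t sprite0_hit) {
    std::int64_t stop = kFrameEnd;
    if (kNmiTime > now) stop = std::min(stop, kNmiTime);
    if (sprite0_hit > now) stop = std::min(stop, sprite0_hit);
    return stop;
}

void run_frame(Cpu& cpu, std::int64_t sprite0_hit) {
    while (cpu.now < kFrameEnd) {
        const std::int64_t stop = next_stop(cpu.now, sprite0_hit);
        while (cpu.now < stop) cpu.step_instruction();
        if (stop == kNmiTime) cpu.take_nmi();
        // A sprite 0 stop would set the PPU status flag here.
    }
    cpu.now -= kFrameEnd;  // carry the overshoot into the next frame
}
```

The CPU still runs in long uninterrupted bursts; it just gets a deadline for each burst, so flags and interrupts show up at roughly the right time.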