All times are UTC - 7 hours





 Post subject: timing... (attn: disch)
PostPosted: Fri Aug 19, 2005 9:44 am 
Joined: Tue Dec 21, 2004 8:35 pm
Posts: 600
Location: Argentina
I don't know what to do about timing in my emu. I know the way I emulate the PPU is crappy.

I do the following:

- I have a "cc" counter that counts PPU cycles; it lives inside the PPU emulation loop
- when "cc" reaches roughly 340, I increment another counter, "cScanline"
- when "cScanline" reaches 262, it resets to 0
- everything else happens inside this loop, as Brad Taylor's 2C02 doc describes

In the emulation main loop I do this:
Code:
EmulateCpu();
EmulatePPU(cCurrentCycle * 3);

(I'm not taking PAL into account yet.)

Disch told me about a method where you keep emulating the CPU until something happens with the PPU that stops the CPU emulation, and then run the PPU for as many cycles as the CPU executed. I read about it in an emulation doc too, but I have problems with it. In other words, I understand it in theory, but I don't know how to put it into code :( .

Help plz!!

_________________
ANes


 Post subject:
PostPosted: Fri Aug 19, 2005 10:20 am 

Joined: Wed Nov 10, 2004 6:47 pm
Posts: 1845
I do the following:

1) Keep a CPU timestamp (obviously). This timestamp is in "master cycles" (see below)

2) Keep a PPU timestamp -- same idea as CPU timestamp. Again, in "master cycles"

3) Keep a Scanline Counter (-1 through 240).

4) Keep a scanline cycle counter (0-340)

5) Keep a 'VBlank Time' var (this will be more or less constant, but it changes between PAL/NTSC modes).


I keep the 'main' timestamps in what I call Master Cycles. These are neither CPU nor PPU cycles -- rather, they're a higher resolution so that the PAL CPU:PPU cycle ratio can be maintained.

- For every 1 NTSC CPU cycle that passes, I increment the CPU timestamp by 15
- For every 1 PAL CPU cycle that passes, I increment the CPU timestamp by 16
- For every 1 PPU cycle that passes (NTSC or PAL), I increment the PPU timestamp by 5
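
As a quick check on those numbers: 15 master cycles per NTSC CPU cycle divided by 5 per PPU cycle gives the familiar 3:1 ratio, while PAL's 16/5 gives 3.2 PPU cycles per CPU cycle, which is why a whole-number shortcut breaks down. The following is only a sketch of that bookkeeping. The names (`MASTER_PER_*`, `cpu_advance`, `ppu_cycles_behind`) are made up for illustration and do not come from any particular emulator.

```c
#include <stdbool.h>

/* Master cycles added per emulated cycle (values from the list above). */
enum {
    MASTER_PER_CPU_NTSC = 15,
    MASTER_PER_CPU_PAL  = 16,
    MASTER_PER_PPU      = 5
};

long cpu_timestamp = 0;  /* in master cycles */
long ppu_timestamp = 0;  /* in master cycles */

/* Advance the CPU timestamp by a number of CPU cycles (hypothetical helper). */
void cpu_advance(int cpu_cycles, bool pal)
{
    cpu_timestamp += (long)cpu_cycles *
                     (pal ? MASTER_PER_CPU_PAL : MASTER_PER_CPU_NTSC);
}

/* How many whole PPU cycles the PPU must run to reach or pass the CPU. */
long ppu_cycles_behind(void)
{
    if (ppu_timestamp >= cpu_timestamp)
        return 0;
    /* Round up: the PPU runs until its timestamp reaches or passes the CPU's. */
    return (cpu_timestamp - ppu_timestamp + MASTER_PER_PPU - 1) / MASTER_PER_PPU;
}
```

With both timestamps at 0, two NTSC CPU cycles put the PPU 6 cycles behind; one PAL CPU cycle puts it 4 behind, because 3.2 is rounded up.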

I'd recommend you take PAL into account as soon as possible, as relying on the 3:1 NTSC ratio will make things a pain in the ass later when you finally do decide to add PAL support.


As for implementation -- the two big functions of my program are RunCPU(int runto) and RunPPU(int runto). RunCPU will emulate CPU instructions until the CPU timestamp reaches/passes the given 'runto' timestamp (typically, RunCPU is only called once in my emu, and it is told to run the CPU for an entire frame's worth of time). RunPPU does the same thing, but runs the PPU (and renders pixels) until the given timestamp is reached (typically, RunPPU is called many times per frame).


Making these functions work together is simple. If you keep the CPU timestamp updated as you emulate 6502 instructions, you simply pass the CPU timestamp to RunPPU when you want the PPU to 'catch up' to the CPU. You should have the PPU catch up every time something on the system that affects drawing changes, and also when the status of the PPU will alter what the CPU does (as with register reads). This includes (but is not necessarily limited to) PPU register writes/reads, Nametable mode changes, and CHR swapping.

For instance when your game is swapping CHR -- updating the PPU would be as simple as something like the following:

Code:
void SwapCHR(int where,int page)
{
  RunPPU( cpu_timestamp );

  // swap CHR here
}
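
The same pattern applies to register reads. Below is a rough sketch of a $2002 read handler. `ReadPPUStatus`, `ppu_status`, and the stub `RunPPU` are hypothetical stand-ins so the snippet compiles on its own; only the catch-up call before the read is the point being illustrated.

```c
#include <stdint.h>

int cpu_timestamp = 0;      /* master cycles, advanced by the CPU core */
int ppu_timestamp = 0;      /* master cycles, advanced by RunPPU */
uint8_t ppu_status = 0;     /* $2002; bit 7 is the VBlank flag */

/* Stub standing in for the real RunPPU: it only advances the timestamp. */
void RunPPU(int runto)
{
    while (ppu_timestamp < runto)
        ppu_timestamp += 5;   /* one PPU cycle = 5 master cycles */
}

/* Hypothetical handler for a CPU read of $2002. */
uint8_t ReadPPUStatus(void)
{
    RunPPU(cpu_timestamp);          /* catch up first so the flags are current */

    uint8_t value = ppu_status;
    ppu_status &= (uint8_t)~0x80;   /* reading $2002 clears the VBlank flag */
    return value;
}
```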



The tricky part now, is making a RunPPU function which can be entered and exited on ANY given PPU cycle. This is one reason why I keep those Scanline and Scanline Cycle counters I mentioned earlier. If you keep track of the scanline and scanline cycle that the PPU is in, it makes PPU emulation easier. But you also need to keep the main timestamp to keep it synced up with the CPU.

My RunPPU function looks kind of like this:

Code:
int ppu_timestamp, vblank_cycles;  // both in master cycles
int scanline, scanline_cycle;

void RunPPU( int runto )
{
  /* vblank_cycles is the number of master cycles VBlank lasts.
     For example, on NTSC this is (20 * 341 * 5). */
  if( ppu_timestamp < vblank_cycles )
  {
     ppu_timestamp = vblank_cycles;  // do nothing in VBlank
     scanline = -1;                  // set scanline counter to the pre-render scanline
     scanline_cycle = 0;             // start at cycle 0 of that scanline
  }

  /* see if we're done -- this should be checked every time ppu_timestamp is adjusted */
  if( ppu_timestamp >= runto )  return;

  if( scanline == -1 )
  {
    // do pre-render scanline stuff
  }

  while( scanline < 240 )
  {
    while( scanline_cycle < 256 )
    {
       /* render 1 pixel, load another tile if needed, adjust PPU address where needed, etc. */

       scanline_cycle++;
       ppu_timestamp += 5;
       if( ppu_timestamp >= runto )   return;
    }

    while( scanline_cycle < 341 )  // cycles 256-340; a scanline is 341 cycles long
    {
       // similar things here

       scanline_cycle++;
       ppu_timestamp += 5;
       if( ppu_timestamp >= runto )   return;
    }

    scanline_cycle = 0;
    scanline++;
  }
}



That gives a rough idea.


Anyway -- there is room for optimization. The two big things I can think of are:

- detecting $2002 read loops and running the PPU until $2002 status changes

- having a faster version of RunPPU which renders full scanlines which can be called when the PPU is to render a full scanline.
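
For the second idea, the check for when the fast path is safe could look roughly like this. It is a sketch only: `CanRenderFullScanline` is a hypothetical helper, and it assumes 341-cycle scanlines with 5 master cycles per PPU cycle, as described earlier in this post.

```c
#include <stdbool.h>

enum {
    MASTER_PER_PPU      = 5,
    CYCLES_PER_SCANLINE = 341
};

/* True when the PPU sits at the start of a visible scanline and the target
   timestamp is at least one whole scanline away, so a per-scanline renderer
   could be used instead of the cycle-by-cycle loop. */
bool CanRenderFullScanline(int scanline, int scanline_cycle,
                           int ppu_timestamp, int runto)
{
    if (scanline < 0 || scanline >= 240)   /* only visible scanlines */
        return false;
    if (scanline_cycle != 0)               /* must be at the scanline start */
        return false;
    return runto - ppu_timestamp >= CYCLES_PER_SCANLINE * MASTER_PER_PPU;
}
```

RunPPU could test this at the top of its scanline loop and fall back to the per-cycle path whenever it returns false.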


Anyway, at the end of the frame, you'd make sure the PPU is caught up to the CPU again, then subtract the number of cycles in that frame from the CPU/PPU timestamps (do not reset the timestamps to 0! Otherwise cycles which "spilled" over into the next frame would be lost).
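
To make the end-of-frame step concrete, here is a small sketch. `EndFrame` and `frame_cycles` are hypothetical names, and `RunPPU` is a stub. `frame_cycles` is the frame length in master cycles, for example 262 * 341 * 5 if you assume 262 scanlines of 341 PPU cycles each.

```c
int cpu_timestamp = 0;   /* master cycles */
int ppu_timestamp = 0;   /* master cycles */

/* Stub standing in for the real RunPPU: it only advances the timestamp. */
void RunPPU(int runto)
{
    while (ppu_timestamp < runto)
        ppu_timestamp += 5;
}

/* Hypothetical end-of-frame step. */
void EndFrame(int frame_cycles)
{
    RunPPU(cpu_timestamp);          /* make sure the PPU has caught up */

    /* Subtract rather than zeroing, so cycles that spilled past the frame
       boundary carry over into the next frame. */
    cpu_timestamp -= frame_cycles;
    ppu_timestamp -= frame_cycles;
}
```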


 Post subject:
PostPosted: Mon Aug 22, 2005 8:06 pm 

Joined: Mon Sep 27, 2004 8:33 am
Posts: 3715
Location: Central Texas, USA
One thing I wanted to try with my NES emulator was seeing how efficient a PPU core could be if it rendered the whole screen at once. After thinking about the design Disch described, I realized that it does allow the optimization of the common case where dozens of scanlines are rendered without any relevant PPU writes between. It allows the standard approach to efficiency of first writing code that works in all cases and then optimizing the common operations.

The design simulates cooperative threading, where each thread explicitly yields to another. It would be interesting to implement it with a proper cooperative threading library. The code below shows the differences:

Code:
// no threading
void f()
{
    for ( int i = 0; i < 10; i++ )
        g();

    h();
}

// manual threading
static int i;
static int phase;

void f()
{
    switch ( phase )
    {
        case 0:
            i = 0;
            phase = 1;
            break;

        case 1:
            // i carries over from the previous call
            if ( i < 10 ) {
                g();
                i++;
            }
            else {
                phase = 2;
            }
            break;

        case 2:
            h();
            phase = 3;
            break;
    }
}

// cooperative threading
void f()
{
    for ( int i = 0; i < 10; i++ ) {
        g();
        yield();
    }

    h();
}


 Post subject:
PostPosted: Tue Aug 23, 2005 12:39 pm 

Joined: Tue Dec 21, 2004 8:35 pm
Posts: 600
Location: Argentina
Thanks, Disch. I took the "concept" and applied it to my emu; it's working better and with better performance, but I still have problems with Battletoads. Any help? Thanks.

_________________
ANes


 Post subject:
PostPosted: Tue Aug 23, 2005 1:07 pm 

Joined: Wed Nov 10, 2004 6:47 pm
Posts: 1845
Battletoads relies on some pretty exact timing crap. To get it working properly, make sure:


1) You execute 1 instruction between the start of VBlank (when $2002.7 is raised) and when an NMI is actually triggered. There appears to be some latency between the two. This doesn't apply to Battletoads, but the same latency also exists when you enable NMIs from a disabled state while $2002.7 is high (failure to handle this latency will make the Lolo games crash and burn -- failure to trigger an NMI when NMIs are enabled while $2002.7 is high will cause problems with Captain Skyhawk)

2) PPU X address is incremented no earlier than every 4th cycle on the scanline (4, 12, 20, etc)

3) PPU Y address is incremented on cycle 252

4) PPU X address is reset on cycle 256

Doing those 4 things should get Battletoads running without problems.
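
Point 1 could be sketched as a flag that the CPU loop samples before each instruction and acts on afterwards. This is a guess at one way to model the latency, not a description of any real emulator; `nmi_pending`, `StartVBlank`, `StepCPU`, and the counters are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

uint8_t ppu_status       = 0;      /* $2002 */
bool    nmi_enabled      = true;   /* NMI enable bit from $2000 */
bool    nmi_pending      = false;  /* NMI requested but not yet taken */
int     instructions_run = 0;      /* counters standing in for real work */
int     nmis_taken       = 0;

/* Called by the PPU when VBlank begins. */
void StartVBlank(void)
{
    ppu_status |= 0x80;          /* raise $2002.7 */
    if (nmi_enabled)
        nmi_pending = true;      /* request the NMI, but do not take it yet */
}

/* Runs one CPU instruction, then services an NMI only if it was already
   pending before that instruction started. */
void StepCPU(void)
{
    bool take_nmi = nmi_pending;   /* sample before the instruction runs */
    instructions_run++;            /* stand-in for executing one opcode */
    if (take_nmi) {
        nmi_pending = false;
        nmis_taken++;              /* stand-in for jumping through the NMI vector */
    }
}
```

Because the flag is sampled before the instruction, an NMI raised while an instruction is running is not taken until the following instruction has also finished.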


 Post subject:
PostPosted: Tue Aug 23, 2005 2:15 pm 

Joined: Sun Sep 19, 2004 10:59 pm
Posts: 1390
Disch wrote:
2) PPU X address is incremented no earlier than every 4th cycle on the scanline (4, 12, 20, etc)

3) PPU Y address is incremented on cycle 252

4) PPU X address is reset on cycle 256


The actual values for these are 3/11/19/etc., 251, and 257 (all zero-based), verified by doing extremely precise PPU testing using Kevin Horton's "3-in-1 tester".
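
Plugging those corrected numbers into a per-cycle check might look like the sketch below. The helper names are hypothetical, and limiting the X increment to cycles 0-255 is an assumption made for illustration; the values above only give the sequence 3, 11, 19, etc.

```c
#include <stdbool.h>

/* Zero-based scanline cycle numbers, as given above. */
enum {
    CYCLE_Y_INCREMENT = 251,
    CYCLE_X_RESET     = 257
};

/* X address increments on cycles 3, 11, 19, ... (every 8th cycle).
   Assumption: only the first 256 cycles of the scanline are considered. */
bool IsXIncrementCycle(int cycle)
{
    return cycle >= 0 && cycle < 256 && (cycle % 8) == 3;
}

bool IsYIncrementCycle(int cycle) { return cycle == CYCLE_Y_INCREMENT; }
bool IsXResetCycle(int cycle)     { return cycle == CYCLE_X_RESET; }
```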

_________________
Quietust, QMT Productions
P.S. If you don't get this note, let me know and I'll write you another.


 Post subject:
PostPosted: Tue Aug 23, 2005 2:39 pm 

Joined: Wed Nov 10, 2004 6:47 pm
Posts: 1845
whoops -- I stand corrected.


 Post subject:
PostPosted: Thu Aug 25, 2005 5:31 pm 

Joined: Mon Sep 27, 2004 8:33 am
Posts: 3715
Location: Central Texas, USA
Fx3 wrote:
(from the thread "Reading opcodes directly without read function")
Code:
void cpu_run()
{
   ppu_run(); apu_run();
   data = cpu->bank[PC>>13][PC & 0x1fff];
   //do stuff
}


Why do the PPU and APU need to be run every CPU instruction? Unless they can affect each other in some way, they can each be run separately and in any order.

What you need is a way to ask the PPU and APU for a timestamp of the earliest time they can affect the CPU, then run the CPU until this time. Along the way the CPU might write to the APU or PPU in a way that changes the timestamp of their earliest effect, in which case you might need to stop the current CPU emulation run loop.
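
A bare-bones sketch of that scheduling idea follows. Everything here is hypothetical: `ppu_next_event` and `apu_next_event` stand for whatever the real cores would report (for example the next NMI or IRQ), and `cpu_run_until` is a stub that just burns fixed-length instructions.

```c
/* Timestamps at which each chip can next affect the CPU (hypothetical values
   that the PPU/APU cores would keep up to date). */
long ppu_next_event = 0;
long apu_next_event = 0;
long cpu_time       = 0;

/* Earliest time at which anything outside the CPU can affect it. */
long earliest_event(void)
{
    return ppu_next_event < apu_next_event ? ppu_next_event : apu_next_event;
}

/* Stub CPU core: pretends every instruction takes 2 cycles and stops once
   the target time is reached or passed. */
void cpu_run_until(long end_time)
{
    while (cpu_time < end_time)
        cpu_time += 2;
}

/* One scheduling step: run the CPU up to the next point of interest. A real
   core would also break out early if a register write moved that point closer. */
void run_slice(void)
{
    cpu_run_until(earliest_event());
}
```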


 Post subject:
PostPosted: Sun Aug 28, 2005 11:20 am 
Formerly Fx3

Joined: Fri Nov 12, 2004 4:59 pm
Posts: 3064
Location: Brazil
Now you're mixing things up. Let me clear it up: the PPU/APU are executed at every single CPU cycle. In the case above, that's 1 cycle to fetch the instruction. I'm not running the PPU/APU for every instruction, but for every cycle.
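
For comparison, a per-cycle design along those lines might be structured like this sketch. The names are hypothetical, the PPU/APU bodies are stubs, and the 3 PPU steps per CPU cycle assume NTSC.

```c
#include <stdint.h>

long ppu_cycles = 0;   /* counters standing in for real PPU/APU work */
long apu_cycles = 0;
uint8_t memory[0x10000];

void ppu_step(void) { ppu_cycles++; }
void apu_step(void) { apu_cycles++; }

/* Every CPU bus cycle advances the other chips before the access happens. */
void tick(void)
{
    ppu_step(); ppu_step(); ppu_step();   /* NTSC: 3 PPU cycles per CPU cycle */
    apu_step();
}

uint8_t cpu_read(uint16_t addr)
{
    tick();
    return memory[addr];
}

/* Example: LDA absolute costs 4 cycles (opcode, 2 address bytes, data read). */
uint8_t lda_absolute(uint16_t pc)
{
    (void)cpu_read(pc);                               /* opcode fetch */
    uint16_t lo = cpu_read((uint16_t)(pc + 1));
    uint16_t hi = cpu_read((uint16_t)(pc + 2));
    return cpu_read((uint16_t)(lo | (hi << 8)));
}
```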

_________________
Zepper
RockNES developer


 Post subject:
PostPosted: Sun Aug 28, 2005 11:57 am 

Joined: Mon Sep 20, 2004 11:13 am
Posts: 134
Location: Sweden
Of course, that makes much more sense.


 Post subject:
PostPosted: Sat Sep 24, 2005 2:40 pm 

Joined: Tue Mar 15, 2005 10:34 am
Posts: 34
Disch wrote:

4) PPU X address is reset on cycle 256




By reset, do you mean that the PPU X address is reloaded from the temp address (Loopy_t)?


 Post subject:
PostPosted: Sat Sep 24, 2005 2:43 pm 

Joined: Wed Nov 10, 2004 6:47 pm
Posts: 1845
Yes

X Scroll reset logic:

Loopy_V = (Loopy_V & ~0x041F) | (Loopy_T & 0x041F);
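
Wrapped in a function with a quick check, as a sketch: the mask 0x041F covers the 5 coarse X bits (0x001F) plus the horizontal nametable select bit (0x0400). `ResetXScroll` is a hypothetical name.

```c
#include <stdint.h>

/* Copy the horizontal bits (coarse X and horizontal nametable) from T into V,
   leaving every other bit of V untouched. */
uint16_t ResetXScroll(uint16_t loopy_v, uint16_t loopy_t)
{
    return (uint16_t)((loopy_v & ~0x041F) | (loopy_t & 0x041F));
}
```

For example, `ResetXScroll(0x2345, 0x0412)` keeps 0x2340 from V and takes 0x0412 from T, giving 0x2752.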

