All times are UTC - 7 hours

Posted: Tue Oct 04, 2005 2:52 pm

Joined: Thu Sep 15, 2005 9:23 am
Posts: 1194
Location: Behind you with a knife!
Ok, I want to implement some form of accurate timing in my emulator. However, I don't want to rewrite my CPU for cycle-for-cycle timing. Can I do it this way?

Code:
case 0xA9: // LDA immediate
    // ...do the operation...
    clock_cycles += 2;
    draw_pixels( 2 * 3 ); // 2 CPU cycles * 3 pixels per CPU cycle
    break;

This would obviously make my emulator more accurate, but would it be enough for sprite 0 hit detection?

_________________
http://www.jamesturner.de/


Posted: Tue Oct 04, 2005 4:00 pm

Joined: Mon Sep 27, 2004 8:33 am
Posts: 3715
Location: Central Texas, USA
That would work, though it doesn't match the simplicity and speed of the method described in the thread "timing... (attn: disch)".

The basic idea is very simple: whenever the CPU is just about to do something that might affect PPU rendering, first run the PPU until that time, then carry out the read/write. The only requirement is that the CPU keep track of how many clocks it's executed and make this available when reading and writing I/O memory locations. With this scheme you don't constantly run the PPU every instruction, so it's quite fast.

Code:
void run_ppu( long ppu_time )
{
    // render/update PPU state from where it last stopped up to ppu_time
    ...
}

void write_ppu( long ppu_time, int addr, int data )
{
    run_ppu( ppu_time );
    switch ( addr & 0x2007 )
    {
        case 0x2000:
            ...
       
        case 0x2005:
            ...
    }
}

long cpu_time;
long cpu_end; // CPU will run until or just after this time

void write_memory( int addr, int data )
{
    if ( (addr & 0xe000) == 0x2000 ) // $2000-$3FFF: PPU registers
        write_ppu( cpu_time * 3, addr, data ); // PPU runs 3 clocks per CPU clock (NTSC)
    ...
}

void stop_cpu()
{
    cpu_end = 0; // stop CPU execution after current instruction
}

void run_cpu()
{
    while ( cpu_time < cpu_end )
    {
        int opcode = read_memory( pc++ );
        cpu_time += timing_table [opcode];
        switch ( opcode )
        {
            case 0x8D: { // STA abs
                int addr = read_memory( pc + 1 ) * 0x100 +
                        read_memory( pc );
                pc += 2;
                write_memory( addr, a );
                break;
            }
            ...
        }
    }
}
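For anyone who wants to see the catch-up bookkeeping on its own, here is a stripped-down, compilable sketch of the scheme above. It's only an illustration under assumptions: the PPU "work" is just advancing a dot counter, `fake_instruction()` stands in for a real CPU core, and `ppu_runs`, `last_addr` and `last_dot` are hypothetical bookkeeping added to make the behaviour visible. The 3:1 ratio is the NTSC PPU/CPU clock ratio already used in `write_memory()` above.

```c
/* Hypothetical minimal state; all times count from the start of the frame. */
static long cpu_time;        /* CPU cycles executed so far */
static long ppu_time;        /* PPU dot the PPU has been run up to */
static long ppu_runs;        /* how many times run_ppu actually did work */
static int  last_addr = -1;  /* last PPU register written (for illustration) */
static long last_dot  = -1;  /* PPU dot at which that write landed */

/* Advance the PPU from ppu_time up to ppu_target. */
static void run_ppu(long ppu_target)
{
    if (ppu_target > ppu_time) {
        /* ...render/fetch everything between ppu_time and ppu_target... */
        ppu_time = ppu_target;
        ppu_runs++;
    }
}

/* Catch the PPU up to "now" first, then apply the register write. */
static void write_ppu(long ppu_target, int addr, int data)
{
    run_ppu(ppu_target);
    (void)data;               /* ...switch on addr and update PPU state... */
    last_addr = addr;
    last_dot  = ppu_time;
}

static void write_memory(int addr, int data)
{
    if ((addr & 0xE000) == 0x2000)              /* $2000-$3FFF */
        write_ppu(cpu_time * 3, addr & 0x2007, data);
    /* ...RAM, APU, mapper... */
}

/* Stand-in for one CPU instruction: add its cycle count, then (optionally)
   perform its memory write, as run_cpu() above does. */
static void fake_instruction(int cycles, int write_addr, int data)
{
    cpu_time += cycles;
    if (write_addr >= 0)
        write_memory(write_addr, data);
}
```

For example, after instructions of 2, 3 and 4 cycles where only the last one writes $3FFD (a mirror of $2005), the PPU has been run exactly once, catching up all 27 dots in one go. At the end of a frame you'd call `run_ppu()` one last time to finish the remaining dots.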


Posted: Tue Oct 04, 2005 4:08 pm

Joined: Wed Nov 10, 2004 6:47 pm
Posts: 1845
What you're thinking looks a lot like a pixel-accurate renderer.. although catching up after every instruction would be slow.

One way to go would be to implement a pixel-by-pixel PPU as I describe in this thread... although that would likely require many significant changes.

A good alternative would be to predict the cycle at which sprite 0 hit will occur... and on $2002 reads, see if the CPU is before or after that cycle... if at or after, you would set the sprite0 hit flag without having to do any PPU emulation.

You could predict it by rendering sprite 0 into a temporary buffer, then rendering the BG tiles on top of it to see where they'd first collide, and getting the timestamp from that. Up to 6 BG tiles need to be drawn: an 8x16 sprite 0 spans at most 2 tiles horizontally and 3 vertically, so it can be over at most 6 tiles. However, you'd need to re-predict every time the circumstances change (CHR swapped, CHR-RAM written to, sprite/BG enable/disable change, scroll change, etc., etc.: anything that could affect when sprite 0 hit will happen).
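A rough sketch of the prediction step. Instead of a full temporary pixel buffer, it reduces sprite 0 and the background under it to 1-bit opacity masks (bit 7 = leftmost pixel, 1 = non-transparent), so the hypothetical caller is assumed to have already flipped the sprite and built `bg_rows` from the BG tiles under it. It only handles 8x8 sprites and ignores left-edge clipping. The timestamp convention (PPU dots from the start of scanline 0, pixel x output at dot x + 1, sprite shown one line below its OAM Y) is an assumption for illustration; check it against a timing test ROM before relying on it.

```c
#include <limits.h>

#define DOTS_PER_LINE 341
#define NO_HIT LONG_MAX

/* Returns the PPU dot at which sprite 0 hit would first be set, or NO_HIT.
   spr_rows/bg_rows: hypothetical 8-row opacity masks, bit 7 = leftmost. */
static long predict_sprite0_hit(int oam_y, int oam_x,
                                const unsigned char spr_rows[8],
                                const unsigned char bg_rows[8])
{
    for (int row = 0; row < 8; row++) {          /* top row is drawn first */
        int line = oam_y + 1 + row;              /* one line below OAM Y */
        if (line > 239)
            break;                               /* below the visible area */
        unsigned char both = spr_rows[row] & bg_rows[row];
        for (int col = 0; col < 8; col++) {      /* left to right */
            int x = oam_x + col;
            if (x > 254)
                break;                           /* no hit at x = 255 or beyond */
            if (both & (0x80 >> col))
                return (long)line * DOTS_PER_LINE + x + 1;
        }
    }
    return NO_HIT;
}
```

Scanning rows top to bottom and pixels left to right matches the order the PPU draws them, so the first overlap found is the earliest hit.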

Rather than re-predict every time those things change (since they change all the freaking time), you could raise a "NeedRepredict" flag when those things change... and re-predict only on $2002 reads if the NeedRepredict flag is set (of course clearing it after you predict).
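Putting the $2002 comparison and the NeedRepredict flag together might look like this. It's a sketch, not code from any existing emu: the predictor is passed in as a function pointer so the example is self-contained (in practice it would be a prediction like the one described above), `fixed_predictor` is a hypothetical stand-in that returns a precomputed dot and counts how often it's called, and only bit 6 ($40, the sprite 0 hit flag) of $2002 is modeled. Times are PPU dots within the current frame, so a real version would also reset the prediction at the start of each frame.

```c
#include <limits.h>

#define NO_HIT LONG_MAX

typedef long (*hit_predictor)(void *ctx);   /* returns PPU dot of hit, or NO_HIT */

struct sprite0_cache {
    hit_predictor predict;
    void *ctx;
    int need_repredict;   /* raised by anything that could move the hit */
    long hit_dot;         /* result of the last prediction */
};

/* Call on CHR swaps, CHR-RAM writes, enable/scroll changes, OAM changes...
   Cheap: no prediction happens here, only the flag is raised. */
static void sprite0_invalidate(struct sprite0_cache *c)
{
    c->need_repredict = 1;
}

/* Sprite 0 hit bit of a $2002 read happening at PPU dot `now`. */
static int read_2002_sprite0_bit(struct sprite0_cache *c, long now)
{
    if (c->need_repredict) {
        c->hit_dot = c->predict(c->ctx);
        c->need_repredict = 0;
    }
    return (c->hit_dot != NO_HIT && now >= c->hit_dot) ? 0x40 : 0;
}

/* Hypothetical stand-in predictor: fixed answer plus a call counter. */
struct fixed_prediction { long dot; int calls; };

static long fixed_predictor(void *ctx)
{
    struct fixed_prediction *p = ctx;
    p->calls++;
    return p->dot;
}
```

A wait-for-sprite-0 loop that polls $2002 thousands of times then costs one prediction plus one comparison per read.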

I was meaning to put something like that in my emu to speed up games that sit in wait-for-sprite-0 loops. You could do something similar for the 8-sprite (sprite overflow) flag, too.

Anyway I hope that makes sense.


edit --- oop too slow... blargg beat me to it... and he linked to that thread as well XD


Posted: Tue Oct 04, 2005 4:45 pm

Joined: Mon Sep 27, 2004 8:33 am
Posts: 3715
Location: Central Texas, USA
Disch described exactly what I've been thinking of for my emulator (though he's going at it from an already-correct implementation, while I'm trying to improve accuracy). Currently, in my $2002 read function I check whether the current time is past the earliest point sprite 0 hit could occur (based on its Y position). If so, I just scan however many lines of sprite 0 have been drawn and report a hit when I find one with any non-transparent pixels (i.e. I never look at the background). This works surprisingly well for many games (even Battletoads, except for the snake pit and tower). It also passes the sprite hit timing test ROM I posted earlier (since its tests use a sprite that's just a big block of non-transparent pixels).
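A rough sketch of that approximation, for illustration only (this is not blargg's actual code). It assumes an 8x8 sprite 0 described by a hypothetical 8-row opacity mask (nonzero row = some non-transparent pixel), measures time in PPU dots from the start of scanline 0, and counts a sprite row as "drawn" once its whole scanline has finished; the function name and the per-line granularity are assumptions.

```c
#define DOTS_PER_LINE 341

/* Approximate sprite 0 hit: report a hit once any fully drawn row of
   sprite 0 contains a non-transparent pixel. The background is ignored. */
static int approx_sprite0_hit(long now, int oam_y, const unsigned char spr_rows[8])
{
    int first_line = oam_y + 1;                  /* first line sprite 0 is on */
    long lines_done = now / DOTS_PER_LINE;       /* scanlines fully drawn */
    if (lines_done <= first_line)
        return 0;                                /* earliest possible hit not reached */
    for (int row = 0; row < 8 && first_line + row < lines_done; row++)
        if (spr_rows[row] != 0)
            return 1;
    return 0;
}
```

Because the background is never consulted, this reports a hit too early (or when there should be none) whenever sprite 0 sits over transparent background, which is consistent with it failing only in a few spots like the Battletoads cases mentioned above.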

I haven't yet come up with a way to handle sprite 0 hit without interacting with sprite rendering. I don't want to write a separate mini-renderer, because it would be so similar to the main rendering and might have subtle differences. The idea I'm working on involves saving the pixels under sprite 0, then comparing those to the pixels after it's drawn. It's a cheap trick, but it's simpler to implement and doesn't affect the low-level pixel rendering (which is done in chunks of one or more scanlines).


Posted: Tue Oct 04, 2005 6:16 pm

Joined: Mon Sep 27, 2004 8:33 am
Posts: 3715
Location: Central Texas, USA
Well, I just implemented the scheme I described above and it works well so far. It came out quite simple and I didn't have to duplicate any of the sprite drawing logic (flipping, etc.). I'm going to be improving the sprite hit timing ROM to test with pixels in the four corners, and writing a second test ROM to test many different situations of transparent and non-transparent pixels, other sprites, non-hit under left clip border and right edge, etc. Hopefully I'll post it tomorrow, if I don't run into any problems.


Posted: Wed Oct 05, 2005 2:28 pm
Formerly Fx3

Joined: Fri Nov 12, 2004 4:59 pm
Posts: 3064
Location: Brazil
I still need to run a benchmark on a Pentium III 800 MHz, but on my machine (Celeron 2.66 GHz), my emu runs at 130~140 FPS in 256x240 windowed mode. In 640x480 stretched mode, it runs at up to 85 FPS. I have no clue if this is a good or bad result, but anyway, it uses pixel-precise emulation. ^_^;;

_________________
Zepper
RockNES developer


Posted: Wed Oct 05, 2005 2:53 pm

Joined: Thu Sep 15, 2005 9:23 am
Posts: 1194
Location: Behind you with a knife!
Ok, I am going to implement my aforementioned method of rendering, i.e. execute a full CPU instruction, followed by rendering 3 pixels, etc.

Draw (Instruction Time - 1) * 3 Pixels
Execute Instruction
Draw 3 Pixels (Remaining Cycle)

With this method, am I guaranteed to have an accurate CPU/PPU/APU relationship?

I was wondering which methods other people use in their emulators. Quietust, Fx3, blargg: what do yours use?

Also, what is the importance of Loopy's scroll document? I have totally ignored the information in it (I also find it totally incomprehensible), but I haven't had any scrolling issues in my emulator.

Posted: Wed Oct 05, 2005 3:42 pm

Joined: Wed Nov 10, 2004 6:47 pm
Posts: 1845
A pixel is not rendered on every PPU cycle. There are 341 PPU cycles per scanline... but only the first 256 of those cycles render pixels. The other cycles do other things.
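To make the dot/pixel distinction concrete, here's a sketch of a position counter that reports which PPU cycles actually output a pixel. It assumes the usual NTSC layout: 262 scanlines of 341 dots, scanlines 0-239 visible, dot 0 idle, and dots 1-256 each outputting one pixel (the other dots on a line are spent fetching sprite and tile data for the next line). The skipped dot on odd frames is ignored, and the names are hypothetical.

```c
#define DOTS_PER_LINE   341
#define LINES_PER_FRAME 262

struct ppu_pos { int line, dot; };

/* Advance one PPU cycle; returns 1 if that cycle output a visible pixel. */
static int ppu_step(struct ppu_pos *p)
{
    int pixel = p->line < 240 && p->dot >= 1 && p->dot <= 256;
    /* ...if pixel, draw x = p->dot - 1 on scanline p->line;
       otherwise do whatever fetch/idle work this dot requires... */
    if (++p->dot == DOTS_PER_LINE) {
        p->dot = 0;
        if (++p->line == LINES_PER_FRAME)
            p->line = 0;
    }
    return pixel;
}
```

Stepping it 341 times from the start of scanline 0 yields 256 pixels; a whole frame yields 240 * 256 = 61440. So "3 pixels per CPU cycle" only holds during the visible part of visible scanlines.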

This method'll work... but as blargg and I have already pointed out, it'll be difficult to get going properly and will be painfully slow (it's basically the same concept as the "catch up" method described in the previously linked thread, only instead of only catching up when needed you're catching up after every instruction).

Quietust, afaik, does things one cycle at a time... as in he runs the CPU for one cycle, then the PPU, then the APU, CPU, PPU, APU, etc... which makes it easier to do things with cycle-perfect accuracy.. however it is DREADFULLY slow, which is why Nintendulator demands a much more powerful computer to run than other emus do. (Feel free to correct me on this Q, that's just my understanding of how Nintendulator works.. I could very well be wrong).

Some games may rely on $2006 and $2005 interaction for split screen effects, so understanding and applying the info in Loopy's docs might be important. The docs are pretty hard to understand at first... but it's not really as complex as it may seem.

There's a PPU address (Loopy_V) which the PPU uses not only to handle $2007 reads/writes, but also for tile fetching when rendering. There's also a temporary value (Loopy_T) which it uses to refresh Loopy_V during rendering (say, to reset the X scroll at the start of a new scanline).
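A sketch of that X-scroll reset, assuming loopy's bit layout where coarse X is bits 0-4 and the horizontal nametable select is bit 10 of Loopy_V/Loopy_T. Only those bits are copied from t to v; the vertical bits in v keep counting down the screen. Variable and function names are hypothetical; the copy is commonly placed right after the visible pixels of each rendering scanline (dot 257), but check the exact timing against the docs discussed in this thread.

```c
#include <stdint.h>

/* Loopy_V / Loopy_T: 15-bit values held in 16-bit variables. */
static uint16_t loopy_v, loopy_t;

/* Copy coarse X (bits 0-4) and the horizontal nametable bit (bit 10)
   from t to v; every other bit of v is left alone. */
static void copy_horizontal_scroll(void)
{
    const uint16_t mask = 0x041F;
    loopy_v = (uint16_t)((loopy_v & ~mask) | (loopy_t & mask));
}
```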

Loopy_V and Loopy_T are both 15 bits... and are referred to as 'v' and 't' in loopy's doc. 'd' in loopy's doc refers to the value being written to the register, and 'x' is the fine X-scroll value.

so in loopy's doc:
Code:
2000 write:
        t:0000110000000000=d:00000011


This means the low 2 bits of the value written to $2000 are written to bits 10 and 11 of Loopy_T (other bits in Loopy_T are unaffected).

Code:
2005 first write:
        t:0000000000011111=d:11111000
        x=d:00000111


means the high 5 bits of the written value get written to the low 5 bits of Loopy_T, and the low 3 bits of the written value set the fine X scroll.

And so on.
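The two register writes above translate almost literally into bit masks. A sketch with hypothetical names (`loopy_t`, `fine_x`); the $2005/$2006 first/second-write toggle and the second $2005 write are left out.

```c
#include <stdint.h>

static uint16_t loopy_t;   /* 15-bit temporary address ("t" in loopy's doc) */
static uint8_t  fine_x;    /* 3-bit fine X scroll ("x" in loopy's doc) */

/* 2000 write:  t:0000110000000000=d:00000011 */
static void write_2000_scroll_bits(uint8_t d)
{
    loopy_t = (uint16_t)((loopy_t & ~0x0C00) | ((d & 0x03) << 10));
}

/* 2005 first write:  t:0000000000011111=d:11111000, x=d:00000111 */
static void write_2005_first(uint8_t d)
{
    loopy_t = (uint16_t)((loopy_t & ~0x001F) | (d >> 3));
    fine_x  = d & 0x07;
}
```

Writing $7D to $2005 (first write), for instance, puts coarse X 15 into the low 5 bits of Loopy_T and sets fine X to 5.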


Posted: Wed Oct 05, 2005 4:23 pm

Joined: Sun Sep 19, 2004 10:59 pm
Posts: 1389
Disch wrote:
Quietust, afaik, does things one cycle at a time... as in he runs the CPU for one cycle, then the PPU, then the APU, CPU, PPU, APU, etc... which makes it easier to do things with cycle-perfect accuracy.. however it is DREADFULLY slow, which is why Nintendulator demands a much more powerful computer to run than other emus do. (Feel free to correct me on this Q, that's just my understanding of how Nintendulator works.. I could very well be wrong).


For the most part, you are correct - the only detail is that while my CPU does emulate individual instruction cycles (and emulates the PPU/APU between each one), it is not capable of stopping in the middle of an instruction. The end result is effectively the same, however.

For example, the instruction "STA $4015" would do the following:
* Read opcode (STA absolute) and update PPU+APU
* Read operand low byte ($15) and update PPU+APU
* Read operand high byte ($40) and update PPU+APU
* Write value in accumulator to $4015 and update PPU+APU

In my current code, I emulate the PPU+APU before the corresponding CPU cycle. The only downside is that this can cause some PPU updates (grayscale, colour emphasis, fine X scroll, palette change) to be up to 3 pixels off, which is negligible.
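A minimal sketch of the "update PPU+APU before each CPU cycle" structure, using the STA $4015 example from the list above. Everything here is a simplification for illustration and not Nintendulator's code: memory is a flat 64 KB array with no I/O decoding, `tick()` just counts cycles where a real PPU/APU would do work, and only opcode $8D is implemented. The point is that every bus access costs exactly one CPU cycle and runs the other chips first, so timing falls out of the access pattern rather than a timing table.

```c
#include <stdint.h>

static uint8_t  mem[0x10000];   /* flat memory: no mirroring, no I/O decoding */
static uint16_t pc;
static uint8_t  a;
static long ppu_dots, apu_cycles;

/* One CPU cycle's worth of PPU and APU work, done before the access. */
static void tick(void)
{
    ppu_dots += 3;      /* NTSC: 3 PPU dots per CPU cycle */
    apu_cycles += 1;
}

static uint8_t cpu_read(uint16_t addr)
{
    tick();
    return mem[addr];
}

static void cpu_write(uint16_t addr, uint8_t data)
{
    tick();
    mem[addr] = data;
}

/* Execute one instruction; only STA absolute ($8D) is implemented. */
static void step(void)
{
    uint8_t opcode = cpu_read(pc++);              /* cycle 1: opcode */
    if (opcode == 0x8D) {
        uint16_t lo = cpu_read(pc++);             /* cycle 2: operand low */
        uint16_t hi = cpu_read(pc++);             /* cycle 3: operand high */
        cpu_write((uint16_t)(hi << 8 | lo), a);   /* cycle 4: write A */
    }
}
```

After the 4-cycle STA, the PPU has advanced 12 dots and the APU 4 cycles. As in Quietust's description, each cycle is emulated individually, but `step()` still can't stop halfway through an instruction.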

_________________
Quietust, QMT Productions
P.S. If you don't get this note, let me know and I'll write you another.


Posted: Thu Oct 06, 2005 3:00 am

Joined: Thu Sep 15, 2005 9:23 am
Posts: 1194
Location: Behind you with a knife!
Disch wrote:
A pixel is not rendered on every PPU cycle. There are 341 PPU cycles per scanline... but only the first 256 of those cycles render pixels. The other cycles do other things.


I know about that. 256 pixels are rendered and the rest of the scanline is HBlank (85 PPU cycles, about 28.3 CPU cycles). What do the remaining PPU cycles do, then?

Posted: Thu Oct 06, 2005 4:48 am

Joined: Thu Mar 24, 2005 3:17 pm
Posts: 355
Quote:
What do the remaining PPU cycles do then?


A lot. Have you read Brad Taylor's NTSC 2C02 technical reference? If you haven't yet, now's the time. It explains what the PPU cycles 'do'.


Posted: Thu Oct 06, 2005 5:43 am

Joined: Thu Sep 15, 2005 9:23 am
Posts: 1194
Location: Behind you with a knife!
hap wrote:
A lot. Did you read Brad Taylor's NTSC 2C02 technical reference ? If you didn't yet, now's the time. It explains what the PPU cycles 'do'.


Should I use that reference, or this one? http://www.nesworld.com/dev/ntscpput.txt

Posted: Thu Oct 06, 2005 7:53 am

Joined: Wed Nov 10, 2004 6:47 pm
Posts: 1845
That doc seems to say the exact same thing... just cut down (a lot of other not-as-useful-for-emu-development information removed). I'd say either reference is fine... nothing in the two should contradict each other... at least not that I saw.

