
All times are UTC - 7 hours





PostPosted: Mon Oct 24, 2005 3:37 pm 

Joined: Mon Sep 27, 2004 8:33 am
Posts: 3715
Location: Central Texas, USA
This is a continuation of a previous thread that discussed efficient implementation of the "more than 8 sprites on a scanline encountered" flag (bit 5 of $2002 when read). Here's an elaboration of the idea I sketched out. Basically, you keep a cache of how many sprites are on each scanline and recalculate it whenever you access it after it might have become invalid. I'm going to try this in my emulator and report on the performance impact (as compared to not implementing the flag at all).

Code:
#include <string.h> // memset

int scanlines [256 + 16]; // sprites per scanline; extra lines allow elimination of range checking
bool scanlines_valid;     // false whenever the counts might be stale

void calc_scanlines()
{
    scanlines_valid = true;
    memset( scanlines, 0, sizeof scanlines ); // clear counts
    int height = (w2000 & 0x20) ? 16 : 8; // sprite height from bit 5 of $2000
    for ( int i = 0; i < 64; i++ )
    {
        int top = sprite_ram [i * 4] + 1; // first scanline the sprite covers
        for ( int line = 0; line < height; line++ )
            scanlines [top + line]++;
    }
}

void write_2000( int data )
{
    if ( (w2000 ^ data) & 0x20 )
        scanlines_valid = false; // sprite height changed
    ...
    w2000 = data;
}

void write_2001( int data )
{
    if ( (w2001 ^ data) & 0x10 )
        scanlines_valid = false; // sprite visibility changed
    ...
    w2001 = data;
}

// (same for 0x4014)
void write_2004( int data )
{
    if ( (w2003 & 3) == 0 )
        scanlines_valid = false; // might have modified vertical position
    ...
    w2003 = (w2003 + 1) & 0xFF; // sprite RAM address advances after each write
}

int read_2002( long timestamp )
{
    if ( !(r2002 & 0x20) )
    {
        // max sprites flag not yet set

        if ( !scanlines_valid )
            calc_scanlines();
       
        if ( scanlines [timestamp / 341] > 8 ) // 341 PPU clocks per scanline
            run_ppu_until( timestamp ); // may set max sprites flag
    }
   
    ...
    return r2002;
}


PostPosted: Mon Oct 24, 2005 5:46 pm 

Joined: Mon Sep 27, 2004 8:33 am
Posts: 3715
Location: Central Texas, USA
Preliminary results are fairly good. On average, this slows emulation by 3-8% (as compared to no max sprite checking). Emulation speed is halved in the worst case where a continuous loop constantly toggles sprite height in $2000 and reads $2002 each time. Absolute numbers are ~1.39 msec/frame without max sprite checking, average ~1.47 msec/frame with max sprite checking (on a 400 MHz PowerMac).

I should profile games that poll $2002 to get an idea of what pattern is best to optimize for. I doubt the worst-case is a good one to tune to. For example, it might be that most games start polling $2002 within a scanline or two of when they expect it to be set, which would call for something simpler than the scanlines array.


PostPosted: Mon Oct 24, 2005 6:25 pm 

Joined: Sun Sep 19, 2004 10:59 pm
Posts: 1389
I should point out to you that this is only good for "Normal" sprite overflow emulation. "Correct" sprite overflow emulation is much, much stranger, due to a flaw in the PPU hardware which causes it to, once exactly 8 sprites have been found, evaluate sprite TILE, FLAGS, and X-coordinate as though they were Y coordinates. More details can be found in the wiki.

_________________
Quietust, QMT Productions
P.S. If you don't get this note, let me know and I'll write you another.


PostPosted: Mon Oct 24, 2005 8:18 pm 

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19100
Location: NE Indiana, USA (NTSC)
I had that in mind when I made my suggestion. Even in bug-for-bug compatibility mode, a delayed PPU engine can still benefit from storing which scanlines have 8 or more sprites, because you need at least 8 sprites to trigger that weird diagonal search pattern.


PostPosted: Tue Oct 25, 2005 10:31 am 

Joined: Mon Sep 27, 2004 8:33 am
Posts: 3715
Location: Central Texas, USA
I profiled $2002 polling behavior of several older games that didn't have a scanline IRQ (and a few that did). None of the games modified sprite height or sprite RAM mid-frame, so they only required one pass through sprites to determine any lines where sprite max might be set. For each game, the number of CPU clocks from the beginning of the frame to the first $2002 read is listed, then the delays between further $2002 reads. NxRRR means a delay of N clocks repeated RRR times, i.e. a polling loop that reads $2002 every N CPU clocks. N1,N2xRRR means the pair of delays N1, N2 repeated RRR times.

Code:
SMB             2576 1653 9x177
SMB2            29781
SMB3            24329 4 104 17 5326
Zelda           8784 20996
Zelda 2         4328 7x214 9405 14543
Rod Land        2711 27069
RC Pro-Am       10820 8x1460 97 910 6265
Gimmick!        24190 5591
Fester's Quest  29780
Battletoads     4196 905 7x111 23896
Rygar           14650 15130
Mega Man        29781
Section-Z       3103 9x240 655 51 23802
Snake's Revenge 13371 9x1327 160 4298
Castlevania     2428 13,8x208 13 8 229 84 22735
Goonies II      3568 8,13x179 330 22124
Rambo           2424 9x387 23865
Guardian Legend 11 7x180 (sometimes 500)
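
To make the notation concrete, here is a small sketch in C that adds up the CPU clocks covered by one line of the listing. The function name trace_total_clocks is made up for illustration and is not part of any emulator or of the profiling tool used above; it only understands plain numbers, NxRRR groups, and N1,N2xRRR groups, and returns -1 for anything else.

```c
#include <stdlib.h>

/* Sum the CPU clocks covered by one trace line such as "2576 1653 9x177".
   Hypothetical helper for illustration only. Returns -1 on malformed input. */
long trace_total_clocks(const char *s)
{
    long total = 0;
    while (*s) {
        char *end;
        long group = 0;   /* sum of the comma-separated delays in this token */
        long repeat = 1;  /* RRR, or 1 when the token has no "x" part */

        while (*s == ' ')
            s++;          /* skip separators between tokens */
        if (!*s)
            break;

        for (;;) {        /* one or more delays joined by commas */
            group += strtol(s, &end, 10);
            if (end == s)
                return -1;
            s = end;
            if (*s != ',')
                break;
            s++;
        }

        if (*s == 'x') {  /* optional repeat count */
            s++;
            repeat = strtol(s, &end, 10);
            if (end == s)
                return -1;
            s = end;
        }

        total += group * repeat;
    }
    return total;
}
```

For the SMB line this gives 2576 + 1653 + 9*177 = 5822 clocks, so its polling is finished roughly a fifth of the way into a frame of about 29781 clocks.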


I also optimized and reimplemented the sprite max algorithm to properly handle the weird search pattern for the 9th sprite. On most games, it slows emulation by 0.3-0.7%. In the elevator in Snake's Revenge it slowed by almost 2% due to the large number of scanlines with 8 sprites.

And as Disch said, it doesn't require doing any PPU rendering at all. You just clear out the scanlines array, go through all sprites and increment the counts on the scanlines each falls on, then check any scanlines with 8 or more sprites using the algorithm described on the wiki page. Repeat mid-frame if sprite height or RAM is changed.
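
Here is a minimal sketch in C of the per-scanline check, following the evaluation order as I understand it from the wiki description: once 8 in-range sprites have been found, the PPU keeps stepping through the remaining sprites but also advances the byte offset within each sprite, so tile, attribute, and X bytes get tested as if they were Y coordinates. The names sprite_overflow_on_line and in_range are hypothetical, and the range test is simplified (it ignores the one-line display offset and exact timing), so treat this as an illustration rather than a drop-in implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `value`, treated as a Y coordinate, puts a sprite on `line`. */
static bool in_range(int value, int line, int height)
{
    int d = line - value;
    return d >= 0 && d < height;
}

/* Hypothetical helper: would the max sprites flag be set while
   evaluating `line`, given 256 bytes of sprite RAM? */
bool sprite_overflow_on_line(const uint8_t oam[256], int line, int height)
{
    int n = 0;      /* sprite index, 0..63 */
    int found = 0;  /* in-range sprites found so far */

    /* Normal evaluation: look only at each sprite's Y byte. */
    while (n < 64 && found < 8) {
        if (in_range(oam[n * 4], line, height))
            found++;
        n++;
    }
    if (found < 8)
        return false;

    /* After the 8th sprite: the byte offset m advances along with n
       whenever the tested byte is out of range (the "diagonal" pattern). */
    int m = 0;
    while (n < 64) {
        if (in_range(oam[n * 4 + m], line, height))
            return true;
        n++;
        m = (m + 1) & 3;
    }
    return false;
}
```

With this pattern a line holding exactly 8 sprites can still report overflow if, say, the tile byte of a later sprite happens to look like an in-range Y coordinate, which is why lines with 8 sprites have to be checked too.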

If you weren't emulating the odd search pattern for the 9th sprite, the algorithm would be almost exactly the same, except you'd stop on the first scanline with 9 or more sprites. You'd still need to find which sprite is the 9th, since the time the flag is set is based on the sprite number.
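
For the simpler case, finding the 9th sprite on a line could look roughly like this. The name ninth_sprite_index is hypothetical, and the sketch only returns the sprite number; turning that into the exact time the flag is set is left out because it depends on the emulator's timing model.

```c
#include <stdint.h>

/* Hypothetical helper: index (0..63) of the 9th sprite that falls on
   `line`, or -1 if fewer than 9 sprites are on it. */
int ninth_sprite_index(const uint8_t oam[256], int line, int height)
{
    int found = 0;
    for (int n = 0; n < 64; n++) {
        int d = line - oam[n * 4];  /* distance from the sprite's Y byte */
        if (d >= 0 && d < height && ++found == 9)
            return n;
    }
    return -1;
}
```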

Bee 52 is the only game I have found that needs this flag implemented. Can anyone point me to any other games/demos that rely on it? Or maybe write something to provide more justification for implementing this flag.


PostPosted: Tue Oct 25, 2005 12:49 pm 

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19100
Location: NE Indiana, USA (NTSC)
Did you test any games that use a split screen effect, such as Bigfoot, The Three Stooges, and Spy vs. Spy?


PostPosted: Tue Oct 25, 2005 1:14 pm 

Joined: Mon Sep 27, 2004 8:33 am
Posts: 3715
Location: Central Texas, USA
Bigfoot, The Three Stooges, and Spy vs. Spy didn't use the max sprites flag for their split-screen effects (I'm assuming they used sprite hit). I'll have to try some of the more obscure Codemasters games.

I played around a bit with the odd PPU search pattern for the 9th sprite. It seems that the flag isn't even useful as a way to find out whether any sprites were (partially) hidden due to exceeding the maximum per scanline. It doesn't even seem reliable as a secondary equivalent to sprite 0 hit for finding when a particular scanline is hit, since it might get set on any scanline with 8 sprites and particular tile/attribute/x values that the search pattern looks at. It would work for marking the end of a status display, since there you can easily ensure that there are always 7 or fewer sprites per scanline.


PostPosted: Tue Oct 25, 2005 1:26 pm 

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19100
Location: NE Indiana, USA (NTSC)
blargg wrote:
Bigfoot, The Three Stooges, and Spy vs. Spy didn't use the max sprites flag for their split-screen effects (I'm assuming they used sprite hit).

But did they rewrite the sprite memory in the middle?


PostPosted: Tue Oct 25, 2005 1:53 pm 

Joined: Mon Sep 27, 2004 8:33 am
Posts: 3715
Location: Central Texas, USA
Just tested them again and none write to sprite memory mid-frame. I also tested Stunt Kids, another one with split-screen scrolling. I'm going to verify that you can actually rewrite sprites mid-frame, then perhaps have my tool scan all 3500+ ROMs for instances of this.

What's the significance of this? The algorithm works perfectly well if sprite RAM is rewritten mid-frame; it just needs to 1) set the max sprites flag if that event already occurred, and 2) recalculate the sprites per line for the remaining scanlines.
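
Step 2 could be sketched in C as follows. The names line_counts and recount_from are hypothetical, and the sketch assumes the caller has already brought the PPU up to the current time (step 1) and passes the first scanline that has not been evaluated yet, in the range 0 to LINE_COUNT.

```c
#include <stdint.h>
#include <string.h>

enum { LINE_COUNT = 256 + 16 };     /* extra lines avoid range checks */
static int line_counts[LINE_COUNT]; /* sprites per scanline */

/* Hypothetical helper: rebuild the counts for first_line and later,
   leaving the counts for already-evaluated lines untouched. */
void recount_from(const uint8_t oam[256], int height, int first_line)
{
    memset(line_counts + first_line, 0,
           (LINE_COUNT - first_line) * sizeof line_counts[0]);

    for (int i = 0; i < 64; i++) {
        int top = oam[i * 4] + 1;   /* first scanline the sprite covers */
        for (int line = 0; line < height; line++)
            if (top + line >= first_line)
                line_counts[top + line]++;
    }
}
```

Calling it with first_line set to 0 is the same as the full recalculation done at the start of a frame.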


PostPosted: Tue Oct 25, 2005 2:23 pm 

Joined: Mon Sep 27, 2004 8:33 am
Posts: 3715
Location: Central Texas, USA
Sorry, I was being quite sloppy and relying on the max sprites algorithm behavior rather than writing a couple of lines that print when $2004 or $4014 is written to. Adding the latter showed that Stunt Kids does rewrite sprites mid-frame. I also wrote a PPU test that puts a large 8x8 block of sprites at the top of the screen, waits 15000 clocks, then DMAs another 8x8 block of them with their Y coordinates lower on screen, and it works on my NES. It's kind of cool, and the DMA only takes 4.5 scanlines.
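
The 4.5 scanline figure can be checked with a quick conversion. This sketch assumes NTSC timing of 3 PPU clocks per CPU clock and 341 PPU clocks per scanline (the same 341 used as the divisor in read_2002 in the first post), and it assumes a sprite DMA takes about 513 CPU clocks; the function name is made up for illustration.

```c
/* Hypothetical helper: convert CPU clocks to scanlines, assuming
   3 PPU clocks per CPU clock and 341 PPU clocks per scanline (NTSC). */
double cpu_clocks_to_scanlines(double cpu_clocks)
{
    const double ppu_clocks_per_cpu_clock = 3.0;
    const double ppu_clocks_per_scanline = 341.0;
    return cpu_clocks * ppu_clocks_per_cpu_clock / ppu_clocks_per_scanline;
}
```

Under those assumptions, 513 clocks comes to about 4.51 scanlines, and the 15000 clock wait puts the second DMA roughly 132 scanlines into the frame.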

