All times are UTC - 7 hours





Post new topic Reply to topic  [ 34 posts ]  Go to page Previous  1, 2, 3  Next
Author Message
 Post subject:
PostPosted: Mon Sep 17, 2007 4:58 pm 
Offline

Joined: Sun Sep 16, 2007 10:41 am
Posts: 29
OK so (if rendering is on)...

if (S.PPU_Scanline == 0) S.vAddr = S.vAddr_Latch & 0x3FFF;

And after rendering the scanline's BG, but before HBLANK:

S.vAddr = S.vAddr & 0x41F | (S.vAddr_Latch & 0x41F);

This is in addition to my previous modifications to vAddr, correct? Did they all look right?

If I'm all set with vAddr updates, I presume now I should be calculating X and Y from this. First, let me show you how I'm doing it now:

1. Set S.PPU_SCROLL_X and S.PPU_SCROLL_Y using [the wrong] methods in the registers
2. Calculate the values that the DrawScanline function REALLY uses:


S.PPU_SCROLL_Y_BYTE = (S.PPU_SCROLL_Y >> 3);
S.PPU_SCROLL_Y_BIT = (S.PPU_SCROLL_Y & 7);
S.PPU_SCROLL_X_BYTE = (S.PPU_SCROLL_X >> 3);
S.PPU_SCROLL_X_BIT = (S.PPU_SCROLL_X & 7);

So, should I throw all of this away and do it some other way? If keeping the above code makes sense, then I just need to know how and when to develop S.PPU_SCROLL_X and S.PPU_SCROLL_Y.

Otherwise, I need to understand the whole caboodle.

I see the previous posts about how to get it from the vAddr, except for Fine-X (where is that btw)... should I be using that, and if so, when specifically?


 Post subject:
PostPosted: Mon Sep 17, 2007 5:34 pm 
Offline

Joined: Sun Sep 16, 2007 10:41 am
Posts: 29
Current state of my code:

Read Operations:
Code:
    case 0x2000:  /* PPU */
        switch( wAddr )
        {
            case (0x2007): /* VRAM Read */
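                /* Mask the address once so both branches index with a valid wScratch */
                wScratch = S.vAddr & 0x3FFF;
                if (wScratch < 0x3F00) {
                    /* Buffered read: return the old buffer contents, then refill the buffer */
                    bScratch = S.PPU_R7;
                    S.PPU_R7 = W.PPUBANK[ wScratch >> 10 ][ wScratch & 0x3FF ];
                } else {
                    /* Palette range: returned directly, without the one-read delay */
                    bScratch = W.PPUBANK[ wScratch >> 10 ][ wScratch & 0x3FF ];
                }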

                S.vAddr += (S.PPU_R0 & R0_INC_ADDR) ? 0x20 : 0x01;
                S.vAddr &= 0x3FFF;

                return bScratch;
                break;

            case (0x2004): /* SPR-RAM Read */
                return S.SPRRAM[ S.PPU_R3 ];
                break;

            case (0x2002):
                S.PPU_Latch_Flag = 0;
                return S.PPU_R2;
                break;

            default: /* $2000, $2001, $2003, $2005, $2006 */
                return S.PPU_R7;
                break;
        }


Write Operations:

Code:
   case 0x2000:  /* PPU */
      switch ( wAddr )
      {
        case 0x2000: /* PPU Control Register 1 */
          S.PPU_R0 = byData;
          S.PPU_SP_Height     = (S.PPU_R0 & R0_SP_SIZE) ? 0x10 : 0x08;
          W.PPU_BG = (S.PPU_R0 & R0_BG_ADDR) ? S.ChrBuf + 0x4000 : S.ChrBuf;
          W.PPU_SP = (S.PPU_R0 & R0_SP_ADDR) ? S.ChrBuf + 0x4000 : S.ChrBuf;
          S.PPU_NameTable = NAME_TABLE0 + ( S.PPU_R0 & R0_NAME_ADDR );
          break;

        case 0x2001: /* PPU Control Register 2 */
          S.PPU_R1 = byData;
          break;

        case 0x2002: /* PPU Status - NOT WRITABLE */
          break;

        case 0x2003: /* Sprite RAM ADDR */
          S.PPU_R3 = byData;
          break;

        case 0x2004: /* Sprite RAM DATA */
          S.SPRRAM[ S.PPU_R3++ ] = byData;
          break;

        case 0x2005: /* Scroll Register */
          if (S.PPU_Latch_Flag ^= 1) {
            S.PPU_R5A = byData;
          } else {
            S.PPU_R5B = byData;
            if (S.PPU_R5B > 240)     
                S.PPU_NameTable ^= NAME_TABLE_V_MASK;
          }
          break;

        case 0x2006: /* VRAM Address Register */
          if (S.PPU_Latch_Flag ^= 1) {
             S.PPU_R6A = byData;

             S.vAddr_Latch = (S.vAddr_Latch & 0xFF)
                           | ((word) (byData & 0xFF) << 8);
          }
          else {
             S.PPU_R6B = byData;

             S.vAddr_Latch = (S.vAddr_Latch & 0xFF00)
                           | ((word) byData & 0xFF);
             S.vAddr = S.vAddr_Latch & 0x3FFF;

          }
          break;


Scroll development (called every scanline at the moment)

Code:
void NESCore_Develop_Scroll_Values() {

  S.PPU_SCROLL_Y_BYTE = (S.vAddr >> 5) & 0x1F;
  S.PPU_SCROLL_Y_BIT  = (S.vAddr >> 12) & 0x07;

  S.PPU_SCROLL_X_BYTE = (S.vAddr & 0x1F);
  S.PPU_SCROLL_X_BIT  = (S.PPU_R5A & 0x7);
}



Within DrawScanline:
Code:
 if (!( S.PPU_R1 & R1_SHOW_SCR ))
  {
    /* Clear scanline if display is off */
    memset( pPoint, 0, NES_DISP_WIDTH << 1 ); /* Assumes 16-Bit buffer! */
    Exec6502(&S.m6502_state, 85);
  }
  else
  {

    if (S.PPU_Scanline == 0)
        S.vAddr = S.vAddr_Latch & 0x3FFF;
   
    NESCore_Develop_Scroll_Values();

...


Before HBLANK:

Code:
    S.vAddr = S.vAddr & 0x41F | (S.vAddr_Latch & 0x41F);


So far, things look wonky but I hope I'm close.


 Post subject:
PostPosted: Mon Sep 17, 2007 6:20 pm 
Offline

Joined: Wed Mar 22, 2006 8:00 am
Posts: 354
I strongly recommend that you read Brad Taylor's PPU document:

http://nesdev.com/2C02%20techn ... erence.TXT

This explains in detail how the NES PPU works, including how the BG and sprite layers are rendered and when during a scanline/frame things take place.

Here's a summary of how the PPU address/scroll registers work:

Code:
V = active VRAM address
T = VRAM address latch
X = fine X scroll register

$2000 Write:
T = (T & 0x73FF) | ((data & 0x03) << 10);

$2005 First Write:
T = (T & 0x7FE0) | (data >> 3);
X = data & 0x07;

$2005 Second Write:
T = (T & 0x0C1F) | ((data & 0x07) << 12) | ((data & 0xF8) << 2);

$2006 First Write:
T = (T & 0x00FF) | ((data & 0x3F) << 8);

$2006 Second Write:
T = (T & 0x7F00) | data;
V = T;

$2007 Write:
if ((V & 0x3F00) == 0x3F00)
    palette[(V & 0x3) ? V & 0x1F : V & 0x0F] = data & 0x3F;
else
    PerformVRAMWrite(V & 0x3FFF, data);
V = (V + (($2000 & 0x04) ? 0x20 : 0x01)) & 0x7FFF;

$2007 Read:
if ((V & 0x3F00) == 0x3F00)
    data = palette[(V & 0x3) ? V & 0x1F : V & 0x0F];
else
    data = readBuffer;
readBuffer = PerformVRAMRead(V & 0x3FFF);
V = (V + (($2000 & 0x04) ? 0x20 : 0x01)) & 0x7FFF;

Vertical Reset:
V = T;

Horizontal Reset:
V = (V & 0x7BE0) | (T & 0x041F);

Vertical Increment:
V = (V + 0x1000) & 0x7FFF;
if ((V & 0x7000) == 0)
{
    V = (V & 0x7C1F) | ((V + 0x20) & 0x03E0);
    if ((V & 0x03E0) == 0x03C0)
        V = (V & 0x7C1F) ^ 0x0800;
}

Horizontal Increment:
V = (V & 0x7FE0) | ((V + 0x01) & 0x1F);
if ((V & 0x1F) == 0)
    V ^= 0x0400;
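
As a worked example of the register writes in the summary above, here is a minimal C sketch. The PpuRegs struct, the function names, and the shared first/second-write toggle (modelled on the S.PPU_Latch_Flag in the code posted earlier, which a $2002 read clears) are illustrative assumptions, not taken from any particular emulator.

```c
#include <stdint.h>

/* Hypothetical container for the registers named in the summary. */
typedef struct {
    uint16_t v;      /* V: active VRAM address (15 bits) */
    uint16_t t;      /* T: VRAM address latch (15 bits) */
    uint8_t  x;      /* X: fine X scroll (3 bits) */
    uint8_t  toggle; /* assumed: 0 = next $2005/$2006 write is the first, 1 = second */
} PpuRegs;

/* $2000 write: the name table select lands in T bits 10-11. */
static void ppu_write_2000(PpuRegs *r, uint8_t data) {
    r->t = (uint16_t)((r->t & 0x73FF) | ((data & 0x03) << 10));
}

/* $2005 write: first write carries X scroll, second write carries Y scroll. */
static void ppu_write_2005(PpuRegs *r, uint8_t data) {
    if (!r->toggle) {
        r->t = (uint16_t)((r->t & 0x7FE0) | (data >> 3));
        r->x = data & 0x07;
    } else {
        r->t = (uint16_t)((r->t & 0x0C1F) | ((data & 0x07) << 12)
                                          | ((data & 0xF8) << 2));
    }
    r->toggle ^= 1;
}

/* $2006 write: high byte first, then low byte; the second write copies T to V. */
static void ppu_write_2006(PpuRegs *r, uint8_t data) {
    if (!r->toggle) {
        r->t = (uint16_t)((r->t & 0x00FF) | ((data & 0x3F) << 8));
    } else {
        r->t = (uint16_t)((r->t & 0x7F00) | data);
        r->v = r->t;
    }
    r->toggle ^= 1;
}

/* $2002 read: resets the first/second-write toggle. */
static void ppu_read_2002(PpuRegs *r) {
    r->toggle = 0;
}
```

With these definitions, writing X scroll $7D and Y scroll $5E through $2005 leaves T = $616F and X = 5 (coarse X 15, coarse Y 11, fine Y 6), and V is untouched until a second $2006 write or a vertical reset.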


Vertical reset occurs at cycle 304 of the "dummy" scanline (in Brad Taylor's document, this is scanline 20). Horizontal reset occurs once per scanline, including the dummy scanline, at cycle 257 (right after H-Blank begins). Vertical increment occurs once per scanline as well, but at cycle 251 (shortly BEFORE H-Blank).

Horizontal increment occurs 34 times per scanline, at cycle 3 of each 8-cycle rotation (that is, at cycles 3, 11, 19, 27, etc., as well as at cycles 323 and 331 at the end of H-Blank).
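
To make the timing concrete, the following C sketch walks one rendered scanline cycle by cycle and applies the updates at the cycle numbers quoted above. The function names and the 341-cycle loop are assumptions made for illustration. A per-scanline renderer would not loop like this; it would apply the net effect once per line.

```c
#include <stdint.h>

/* Step coarse X; wrapping past tile 31 switches the horizontal name table. */
static uint16_t horizontal_increment(uint16_t v) {
    v = (uint16_t)((v & 0x7FE0) | ((v + 0x01) & 0x1F));
    if ((v & 0x1F) == 0)
        v ^= 0x0400;
    return v;
}

/* Step fine Y; on overflow step coarse Y, flipping bit 11 after row 29. */
static uint16_t vertical_increment(uint16_t v) {
    v = (uint16_t)((v + 0x1000) & 0x7FFF);
    if ((v & 0x7000) == 0) {
        v = (uint16_t)((v & 0x7C1F) | ((v + 0x20) & 0x03E0));
        if ((v & 0x03E0) == 0x03C0)
            v = (uint16_t)((v & 0x7C1F) ^ 0x0800);
    }
    return v;
}

/* Reload the horizontal bits (coarse X and bit 10) from the latch T. */
static uint16_t horizontal_reset(uint16_t v, uint16_t t) {
    return (uint16_t)((v & 0x7BE0) | (t & 0x041F));
}

/* Apply every V update of one rendered scanline, in cycle order. */
static uint16_t run_scanline(uint16_t v, uint16_t t) {
    for (int cycle = 0; cycle < 341; cycle++) {
        /* 32 increments while drawing, 2 more at cycles 323 and 331 */
        int fetching = (cycle < 256) || (cycle >= 320 && cycle < 336);
        if (fetching && (cycle & 7) == 3)
            v = horizontal_increment(v);
        if (cycle == 251)
            v = vertical_increment(v);
        if (cycle == 257)
            v = horizontal_reset(v, t);
    }
    return v;
}
```

Starting from V = T = 0, one scanline leaves V = $1002: fine Y has advanced by one, and coarse X sits two tiles in because of the two prefetch increments at the end of H-Blank.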

It may not be possible to implement this and maintain full speed on devices such as the iPhone. However, numerous NES titles rely on this behavior to work properly, so if your goal is accuracy, you'll have to implement it. (Nestopia is an emulator that has this implemented, and it actually runs slower than some Super NES emulators, to give an idea.)

_________________
"Last version was better," says Floyd. "More bugs. Bugs make game fun."


 Post subject:
PostPosted: Mon Sep 17, 2007 6:40 pm 
Offline

Joined: Sun Sep 16, 2007 10:41 am
Posts: 29
Quote:
Horizontal increment occurs 34 times per scanline, at cycle 3 of each 8-cycle rotation (that is, at cycles 3, 11, 19, 27, etc., as well as at cycles 323 and 331 at the end of H-Blank).


Is there a calculation that works on a per-pixel basis?


 Post subject:
PostPosted: Mon Sep 17, 2007 7:31 pm 
Offline

Joined: Sun Sep 16, 2007 10:41 am
Posts: 29
As far as accuracy goes, I'd just like Rad Racer, SMB3, and others to work. I don't need to run obscure Japanese games, etc.


 Post subject:
PostPosted: Tue Sep 18, 2007 4:54 am 
Offline

Joined: Sat Mar 19, 2005 11:18 am
Posts: 69
You do not need a per-pixel renderer to fix the gross graphical errors in games like Rad Racer and Crystalis. That only requires implementing rendering per scanline using the VRAM address. If your target platform is the iPhone or other handheld devices, then per-pixel rendering would be far too slow anyway.
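
A rough C sketch of what rendering per scanline from the VRAM address can look like: at the start of each line, decode V into the name table address of the first tile, its attribute byte address, and the row within the tile. The BgFetch struct and decode_vram_address are hypothetical names; the bit layout is the one used throughout this thread (bits 0-4 coarse X, 5-9 coarse Y, 10-11 name table, 12-14 fine Y).

```c
#include <stdint.h>

/* Hypothetical result of decoding V for one background tile fetch. */
typedef struct {
    uint16_t tile_addr; /* name table byte holding the tile index */
    uint16_t attr_addr; /* attribute table byte covering that tile */
    uint8_t  fine_y;    /* row 0-7 inside the tile */
} BgFetch;

static BgFetch decode_vram_address(uint16_t v) {
    BgFetch f;
    /* Name table select, coarse Y and coarse X form the low 12 bits as-is. */
    f.tile_addr = (uint16_t)(0x2000 | (v & 0x0FFF));
    /* One attribute byte covers a 4x4 tile area: coarse Y / 4 and coarse X / 4. */
    f.attr_addr = (uint16_t)(0x23C0 | (v & 0x0C00)
                                    | ((v >> 4) & 0x38)
                                    | ((v >> 2) & 0x07));
    f.fine_y = (uint8_t)((v >> 12) & 0x07);
    return f;
}
```

For example, V = $18A7 (name table 2, tile row 5, tile column 7, fine Y 1) decodes to tile address $28A7 and attribute address $2BC9. The fine X value from the first $2005 write is kept separately and only shifts the pixels within the first tile.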


 Post subject:
PostPosted: Tue Sep 18, 2007 5:18 am 
Offline

Joined: Sun Sep 16, 2007 10:41 am
Posts: 29
Right now I just want to use whatever's going to get most of the popular games to run. I don't think I'm going to have enough resources for a perfect emulator. But if you guys can help get Rad Racer and others working that would be great. This is the latest state of the code, I think it's sick...

Read Ops:
Code:
    case 0x2000:  /* PPU */
        switch( wAddr )
        {
            case (0x2007): /* VRAM Read */
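                /* Mask the address once so both branches index with a valid wScratch */
                wScratch = S.vAddr & 0x3FFF;
                if (wScratch < 0x3F00) {
                    /* Buffered read: return the old buffer contents, then refill the buffer */
                    bScratch = S.PPU_R7;
                    S.PPU_R7 = W.PPUBANK[ wScratch >> 10 ][ wScratch & 0x3FF ];
                } else {
                    /* Palette range: returned directly, without the one-read delay */
                    bScratch = W.PPUBANK[ wScratch >> 10 ][ wScratch & 0x3FF ];
                }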

                S.vAddr += (S.PPU_R0 & R0_INC_ADDR) ? 0x20 : 0x01;
                S.vAddr &= 0x3FFF;

                return bScratch;
                break;

            case (0x2004): /* SPR-RAM Read */
                return S.SPRRAM[ S.PPU_R3 ];
                break;

            case (0x2002):
                S.PPU_Latch_Flag = 0;
                ret = S.PPU_R2;
                S.PPU_R2 &= 0x7F;
                return ret;
                break;

            default: /* $2000, $2001, $2003, $2005, $2006 */
                return S.PPU_R7;
                break;
        }
        break;


Write Ops:

Code:
    case 0x2000:  /* PPU */
      switch ( wAddr )
      {
        case 0x2000: /* PPU Control Register 1 */
          S.PPU_R0 = byData;
          S.PPU_SP_Height     = (S.PPU_R0 & R0_SP_SIZE) ? 0x10 : 0x08;
          W.PPU_BG = (S.PPU_R0 & R0_BG_ADDR) ? S.ChrBuf + 0x4000 : S.ChrBuf;
          W.PPU_SP = (S.PPU_R0 & R0_SP_ADDR) ? S.ChrBuf + 0x4000 : S.ChrBuf;
          S.PPU_NameTable = NAME_TABLE0 + ( S.PPU_R0 & R0_NAME_ADDR );
          S.vAddr_Latch = (S.vAddr_Latch & 0xF3FF) | ((word)(byData & 3) << 10);

          break;

        case 0x2001: /* PPU Control Register 2 */
          S.PPU_R1 = byData;
          break;

        case 0x2002: /* PPU Status - NOT WRITABLE */
          break;

        case 0x2003: /* Sprite RAM ADDR */
          S.PPU_R3 = byData;
          break;

        case 0x2004: /* Sprite RAM DATA */
          S.SPRRAM[ S.PPU_R3++ ] = byData;  /* $2004 writes advance the SPR-RAM address */
          break;

        case 0x2005: /* Scroll Register */
          lowerBits = (byData & 7);
          upperBits = (byData >> 3);

          if (S.PPU_Latch_Flag) {
             if (byData > 240)
                S.PPU_NameTable ^= NAME_TABLE_V_MASK;
             S.vAddr_Latch = (S.vAddr_Latch & 0x8C1F) | (upperBits << 5) |
             (lowerBits << 12);
          }
          else {
             S.vAddr_Latch = (S.vAddr_Latch & 0xFFE0) | upperBits;
             S.PPU_R5A = lowerBits;
          }

          S.PPU_Latch_Flag ^= 1;
          break;
        case 0x2006: /* VRAM Address Register */
          if (S.PPU_Latch_Flag) {
             S.vAddr_Latch = (S.vAddr_Latch & 0xFF00) | byData;
             S.vAddr = S.vAddr_Latch;
          }
          else {
             S.vAddr_Latch = ((byData & 0x3F) << 8) | (S.vAddr_Latch & 0xFF);
          }
          S.PPU_Latch_Flag ^= 1;
          break;
        case 0x2007: /* VRAM Data */
        {
            wScratch = S.vAddr;
            wScratch &= 0x3FFF;

            S.vAddr += (S.PPU_R0 & R0_INC_ADDR) ? 0x20 : 0x01;
            if (S.vAddr > 0x3FFF)
                S.vAddr &= 0x3FFF;

            if (wScratch < 0x2000 && S.VRAMWriteEnable)
            {
              /* Pattern Data */
              S.ChrBufUpdate |= ( 1 << ( wScratch >> 10 ) );
              W.PPUBANK[ wScratch >> 10 ][ wScratch & 0x3FF ] = byData;
            }
            else if (wScratch < 0x3F00 )  /* 0x2000 - 0x3EFF */
            {
              /* Name Table and Mirror */
              W.PPUBANK[ (wScratch) >> 10 ][ wScratch & 0x3ff ] = byData;
              W.PPUBANK[ (wScratch ^ 0x1000) >> 10][ wScratch & 0x3FF ] = byData;
            }
            else if (!(wScratch & 0xF))  /* 0x3F00 or 0x3F10 */
            {
                /* Palette Mirror */
                S.PPURAM[ 0x3f10 ] = S.PPURAM[ 0x3f14 ] = S.PPURAM[ 0x3f18 ]
              = S.PPURAM[ 0x3f1c ] = S.PPURAM[ 0x3f00 ] = S.PPURAM[ 0x3f04 ]
              = S.PPURAM[ 0x3f08 ] = S.PPURAM[ 0x3f0c ] = byData;

                S.PalTable[ 0x00 ] = S.PalTable[ 0x04 ] = S.PalTable[ 0x08 ]
              = S.PalTable[ 0x0c ] = S.PalTable[ 0x10 ] = S.PalTable[ 0x14 ]
              = S.PalTable[ 0x18 ] = S.PalTable[ 0x1c ]
              = NesPalette[ byData ] | 0x8000;
            }
            else if (wScratch & 0x03)
            {
              /* Palette */
              S.PPURAM[ wScratch ] = byData;
              S.PalTable[ wScratch & 0x1f ] = NesPalette[ byData ];
            }
        }
        break;
    }


Scroll Registers:

Code:
void NESCore_Develop_Scroll_Values() {

  S.PPU_SCROLL_Y_BYTE = (S.vAddr >> 5) & 0x1F;
  S.PPU_SCROLL_Y_BIT  = (S.vAddr >> 12) & 0x07;

  S.PPU_SCROLL_X_BYTE = (S.vAddr & 0x1F);
  S.PPU_SCROLL_X_BIT  = S.PPU_R5A;
}


Increments:

Code:
  if (!( S.PPU_R1 & R1_SHOW_SCR ))
  {
    /* Clear scanline if display is off */
    memset( pPoint, 0, NES_DISP_WIDTH << 1 ); /* Assumes 16-Bit buffer! */
    Exec6502(&S.m6502_state, 85);
  }
  else
  {

    if (S.PPU_Scanline == 0)
        S.vAddr = S.vAddr_Latch & 0x7FFF;
...

End of HBLANK

    S.vAddr = (S.vAddr & 0xFBE0) | (S.vAddr_Latch & 0x41F);




The last update there, at the end of HBLANK, seems to break a lot of stuff as well, although more appears to be broken elsewhere.


 Post subject:
PostPosted: Tue Sep 18, 2007 7:46 am 
Offline

Joined: Sun Sep 16, 2007 10:41 am
Posts: 29
I'm getting a lot closer; implemented horizontal / vertical increments. I think the biggest problem right now is selection of the proper name table. $2000 sets the name table, and $2005 flips it if byData > 240. I also flip it locally inside my rendering routines for vertical / horizontal mirroring. Is there some other way I should be doing it? I tried reloading it after each clock, but that doesn't seem to set it right either.

EDIT: It looks like nY is never > 29 any more now that I'm doing it right, so how can I tell if I need to flip the name table?

Code:
   /* Develop Scroll Values */
    PPU_SCROLL_Y_BYTE = (S.vAddr >> 5) & 0x1F;
    PPU_SCROLL_Y_BIT  = (S.vAddr >> 12) & 0x07;
    PPU_SCROLL_X_BYTE = (S.vAddr & 0x1F);
    PPU_SCROLL_X_BIT  = S.FineX;

    nY = PPU_SCROLL_Y_BYTE;
    nX = PPU_SCROLL_X_BYTE;
 
    nYBit = PPU_SCROLL_Y_BIT;
    nYBit <<= 3;
 
    /* Name Table Selection: Flip for vertical scrolling */
    nNameTable = S.PPU_NameTable;
    if (nY > 29) {  /* <-------- NEVER TRUE */
        nY -= 30; 
        nNameTable ^= NAME_TABLE_V_MASK;
    }


 Post subject:
PostPosted: Tue Sep 18, 2007 8:24 am 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19222
Location: NE Indiana, USA (NTSC)
NerveGas wrote:
EDIT: It looks like nY is never > 29 any more now that I'm doing it right, so how can I tell if I need to flip the name table?

You don't "flip" VRAM address bit 11 ($0800) when a "high" Y scroll value is written through $2005 or $2006. The hardware renders rows 30, 31, 0, 1, 2, 3, ... of a single nametable. Bit 11 changes during rendering only when the VRAM address updates after the bottom line of row 29.
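
The row sequence described here can be sketched at tile-row granularity in C. advance_tile_row is a hypothetical helper that moves V down one whole tile row (the net effect of eight vertical increments), so a renderer can read the name table from bits 10-11 of V instead of testing for nY > 29.

```c
#include <stdint.h>

/* Move V down by one tile row, using the wrap rules described above. */
static uint16_t advance_tile_row(uint16_t v) {
    unsigned row = (v >> 5) & 0x1F;
    if (row == 29) {
        /* Bottom row of a name table: wrap to row 0 and flip bit 11. */
        row = 0;
        v ^= 0x0800;
    } else if (row == 31) {
        /* Rows 30 and 31 lie past the visible rows; row 31 wraps without a flip. */
        row = 0;
    } else {
        row++;
    }
    return (uint16_t)((v & 0x7C1F) | (row << 5));
}

/* The name table to render from is simply bits 10-11 of V. */
static unsigned name_table_index(uint16_t v) {
    return (v >> 10) & 0x03;
}
```

Starting at row 30 of name table 0, the sequence is 30, 31, 0, 1, ... all within name table 0; bit 11 only flips later, on the step from row 29 to row 0.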


 Post subject:
PostPosted: Tue Sep 18, 2007 8:38 am 
Offline

Joined: Sun Sep 16, 2007 10:41 am
Posts: 29
Ah OK, so I get it from vAddr... how often should I update that, every scanline?


 Post subject:
PostPosted: Tue Sep 18, 2007 7:02 pm 
Offline

Joined: Sun Sep 16, 2007 10:41 am
Posts: 29
Alright, things are finally starting to shape up: SMB3 is running properly, and Rad Racer looks good except for a few glitches (some gaps, and tracks in the distance are choppy). Overall though, a lot of games are rendering properly now.

One strange thing happened: Rolling Thunder and G.I. Joe no longer run at all. They didn't run to begin with, after forking the project, but with a little massaging of the PPU, they did come up for a bit. Now I just get a blank screen once again, and I can tell they haven't initialized. Any ideas what might be causing this?


 Post subject:
PostPosted: Fri Sep 21, 2007 9:20 pm 
Offline

Joined: Sat Mar 19, 2005 11:18 am
Posts: 69
I'm having trouble figuring out a good way to get the code running on Windows. Are you doing all your testing on the iPhone?

What is the current speed of execution on the iPhone? Does it require frameskipping?


 Post subject:
PostPosted: Sat Sep 22, 2007 8:46 am 
Offline

Joined: Sun Sep 16, 2007 10:41 am
Posts: 29
I think we figured out Rolling Thunder and G.I. Joe: there are a crucial 6 steps between vblank and the NMI that need to get executed. We also weren't counting the 512 steps for DMA. Fixing all that seemed to fix everything in the code.

On the iPhone, it's auto-frameskip; I'm getting around 1 or 2 frames skipped. It still runs pretty smooth.


 Post subject:
PostPosted: Sat Sep 22, 2007 2:01 pm 
Offline

Joined: Sat Mar 19, 2005 11:18 am
Posts: 69
I'm trying to think of ways that the rendering code might be optimized. It should be possible to draw 2 pixels per operation by using a short (16x4) look-up table to convert the planar bits to a 32-bit int. Do you happen to know if there's a penalty for unaligned DWORD-length writes on the iPhone's ARM CPU? That would make a difference in how such an algorithm would be implemented.
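
One possible reading of that idea, sketched in C: a 16-entry table of 32-bit values, indexed by two bits from each bit plane, where each entry holds two 16-bit pixels. The names build_pair_lut and expand_tile_row, the choice to bake one 4-colour palette into the table, and the placement of the left pixel in the low half (which suits a little-endian frame buffer) are all assumptions for illustration.

```c
#include <stdint.h>

/* Build the 16-entry table for one 4-colour palette of 16-bit pixels.
   Index layout (assumed): bits 0-1 = two plane-0 bits, bits 2-3 = two plane-1 bits. */
static void build_pair_lut(const uint16_t pal[4], uint32_t lut[16]) {
    for (unsigned i = 0; i < 16; i++) {
        unsigned p0 = i & 0x03;
        unsigned p1 = (i >> 2) & 0x03;
        unsigned left  = (((p1 >> 1) & 1) << 1) | ((p0 >> 1) & 1);
        unsigned right = ((p1 & 1) << 1) | (p0 & 1);
        /* Left pixel in the low 16 bits, right pixel in the high 16 bits. */
        lut[i] = (uint32_t)pal[left] | ((uint32_t)pal[right] << 16);
    }
}

/* Expand one 8-pixel tile row (two plane bytes) into four 2-pixel words. */
static void expand_tile_row(uint8_t plane0, uint8_t plane1,
                            const uint32_t lut[16], uint32_t out[4]) {
    for (int pair = 0; pair < 4; pair++) {
        int shift = 6 - 2 * pair; /* leftmost pixels live in the top bits */
        unsigned idx = (((unsigned)(plane1 >> shift) & 0x03) << 2)
                     | ((unsigned)(plane0 >> shift) & 0x03);
        out[pair] = lut[idx];
    }
}
```

The sketch writes into an aligned array of 32-bit words. Storing those words straight into a 16-bit frame buffer at an odd pixel offset (for example after a fine X shift) would be exactly the unaligned 32-bit write asked about above.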


 Post subject:
PostPosted: Sat Sep 22, 2007 2:59 pm 
Offline

Joined: Sun Sep 16, 2007 10:41 am
Posts: 29
I'm not sure, but you can try it. Another method might be to try using OpenGL ES instead of CoreSurfaces. I suspect, though, that OpenGL would have to go through CoreSurfaces, so that may prove counterproductive.

I'd like to find ways to optimize the mapper 5 code...it appears to run _very_ slow. Making Castlevania III and other mapper 5 games run may prove more useful than optimizing video.

My current hit list is:
- Mapper 5 too slow / not completely right
- Mapper 119 (which should be the same as mapper 4, but with RAM/ROM selection on bit 6); yet Pin-Bot and High Speed don't render properly.
- Rad Racer 2 has a completely mangled background
- When I set VBLANK START to 141 (where it should be) instead of 143, weird things happen in Mach Rider (screen flicker); trying to find the root of the problem.
- Star Wars doesn't initialize
- Cool Spot doesn't initialize
- Green background in Fantasy Zone
- Kung Fu and Rad Racer render OK, but due to (possibly) sprite hit stuff (which jodan is working on fixing), there is some distortion
- Punchout / Mike Tyson's Punchout has some rendering glitches on the title screen and in-between certain rounds (such as Von Kaiser)
- NES Scroll Test still fails vertically (might just be a lame test)
- TMNT 3 doesn't initialize
- Videomation appears to be broken (mapper problem?)
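
For the mapper 119 item in the list above, a small C sketch of the bit-6 selection as it might apply to a 1 KB CHR bank value. Everything beyond "bit 6 selects RAM or ROM" is an assumption here: that a set bit means RAM, that there are 8 KB of CHR RAM and 64 KB of CHR ROM, and the names ChrBank1K and mapper119_decode_chr.

```c
#include <stdint.h>

/* Hypothetical decoded form of one 1 KB CHR bank register value. */
typedef struct {
    int      is_ram; /* 1 = bank comes from CHR RAM, 0 = from CHR ROM */
    unsigned bank;   /* 1 KB bank number within the selected chip */
} ChrBank1K;

static ChrBank1K mapper119_decode_chr(uint8_t reg_value) {
    ChrBank1K b;
    /* Assumed polarity: bit 6 set selects CHR RAM, clear selects CHR ROM. */
    b.is_ram = (reg_value & 0x40) != 0;
    /* Assumed sizes: 8 KB RAM = 8 banks, 64 KB ROM = 64 banks. */
    b.bank = b.is_ram ? (unsigned)(reg_value & 0x07)
                      : (unsigned)(reg_value & 0x3F);
    return b;
}
```

With a decode step like this in front of the usual mapper 4 bank switching, the rendering code also has to pick the right backing buffer (RAM or ROM) per 1 KB bank. A renderer that caches decoded ROM tiles would need to invalidate the RAM-backed banks when they are written.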

