What is WRONG with my PPU???

NerveGas
Posts: 29
Joined: Sun Sep 16, 2007 10:41 am

Post by NerveGas »

OK so (if rendering is on)...

if (S.PPU_Scanline == 0) S.vAddr = S.vAddr_Latch & 0x3FFF;

And after rendering the scanline's BG, but before HBLANK:

S.vAddr = S.vAddr & 0x41F | (S.vAddr_Latch & 0x41F);

This is in addition to my previous modifications to vAddr, correct? Did they all look right?

If I'm all set with vAddr updates, I presume now I should be calculating X and Y from this. First, let me show you how I'm doing it now:

1. Set S.PPU_SCROLL_X and S.PPU_SCROLL_Y using [the wrong] methods in the registers
2. Calculate the values that the DrawScanline function REALLY uses:


S.PPU_SCROLL_Y_BYTE = (S.PPU_SCROLL_Y >> 3);
S.PPU_SCROLL_Y_BIT = (S.PPU_SCROLL_Y & 7);
S.PPU_SCROLL_X_BYTE = (S.PPU_SCROLL_X >> 3);
S.PPU_SCROLL_X_BIT = (S.PPU_SCROLL_X & 7);

So, should I throw all of this away and do it some other way? If keeping the above code makes sense, then I just need to know how and when to develop S.PPU_SCROLL_X and S.PPU_SCROLL_Y.

Otherwise, I need to understand the whole caboodle.

I see the previous posts about how to get it from the vAddr, except for Fine-X (where is that btw)... should I be using that, and if so, when specifically?
NerveGas
Posts: 29
Joined: Sun Sep 16, 2007 10:41 am

Post by NerveGas »

Current state of my code:

Read Operations:

Code:

    case 0x2000:  /* PPU */
        switch( wAddr )
        {
            case (0x2007): /* VRAM Read */
                if (S.vAddr <0x3F00) {
                    wScratch = S.vAddr;
                    wScratch &= 0x3FFF;
                    bScratch = S.PPU_R7;
                    S.PPU_R7 = W.PPUBANK[ wScratch >> 10 ][ wScratch & 0x3FF ];
                } else {
                    bScratch = W.PPUBANK[ wScratch >> 10 ] [ wScratch & 0x3FF ];
                }

                S.vAddr += (S.PPU_R0 & R0_INC_ADDR) ? 0x20 : 0x01;
                S.vAddr &= 0x3FFF;

                return bScratch;
                break;

            case (0x2004): /* SPR-RAM Read */
                return S.SPRRAM[ S.PPU_R3 ];
                break;

            case (0x2002):
                S.PPU_Latch_Flag = 0;
                return S.PPU_R2;
                break;

            default: /* $2000, $2001, $2003, $2005, $2006 */
                return S.PPU_R7;
                break;
        }

Write Operations:

Code:

   case 0x2000:  /* PPU */
      switch ( wAddr )
      {
        case 0x2000: /* PPU Control Register 1 */
          S.PPU_R0 = byData;
          S.PPU_SP_Height     = (S.PPU_R0 & R0_SP_SIZE) ? 0x10 : 0x08;
          W.PPU_BG = (S.PPU_R0 & R0_BG_ADDR) ? S.ChrBuf + 0x4000 : S.ChrBuf;
          W.PPU_SP = (S.PPU_R0 & R0_SP_ADDR) ? S.ChrBuf + 0x4000 : S.ChrBuf;
          S.PPU_NameTable = NAME_TABLE0 + ( S.PPU_R0 & R0_NAME_ADDR );
          break;

        case 0x2001: /* PPU Control Register 2 */
          S.PPU_R1 = byData;
          break;

        case 0x2002: /* PPU Status - NOT WRITABLE */
          break;

        case 0x2003: /* Sprite RAM ADDR */
          S.PPU_R3 = byData;
          break;

        case 0x2004: /* Sprite RAM DATA */
          S.SPRRAM[ S.PPU_R3++ ] = byData;
          break;

        case 0x2005: /* Scroll Register */
          if (S.PPU_Latch_Flag ^= 1) {
            S.PPU_R5A = byData;
          } else {
            S.PPU_R5B = byData;
            if (S.PPU_R5B > 240)     
                S.PPU_NameTable ^= NAME_TABLE_V_MASK;
          }
          break;

        case 0x2006: /* VRAM Address Register */
          if (S.PPU_Latch_Flag ^= 1) {
             S.PPU_R6A = byData;

             S.vAddr_Latch = (S.vAddr_Latch & 0xFF)
                           | ((word) (byData & 0xFF) << 8);
          }
          else {
             S.PPU_R6B = byData;

             S.vAddr_Latch = (S.vAddr_Latch & 0xFF00)
                           | ((word) byData & 0xFF);
             S.vAddr = S.vAddr_Latch & 0x3FFF;

          }
          break;

Scroll development (called every scanline at the moment):

Code:

void NESCore_Develop_Scroll_Values() {

  S.PPU_SCROLL_Y_BYTE = (S.vAddr >> 5) & 0x1F;
  S.PPU_SCROLL_Y_BIT  = (S.vAddr >> 12) & 0x07;

  S.PPU_SCROLL_X_BYTE = (S.vAddr & 0x1F);
  S.PPU_SCROLL_X_BIT  = (S.PPU_R5A & 0x7);
}

Within DrawScanline:

Code:

 if (!( S.PPU_R1 & R1_SHOW_SCR ))
  {
    /* Clear scanline if display is off */
    memset( pPoint, 0, NES_DISP_WIDTH << 1 ); /* Assumes 16-Bit buffer! */
    Exec6502(&S.m6502_state, 85);
  }
  else
  {

    if (S.PPU_Scanline == 0)
        S.vAddr = S.vAddr_Latch & 0x3FFF;
    
    NESCore_Develop_Scroll_Values();

...

Before HBLANK:

Code:

    S.vAddr = S.vAddr & 0x41F | (S.vAddr_Latch & 0x41F);

So far, things look wonky but I hope I'm close.
dvdmth
Posts: 354
Joined: Wed Mar 22, 2006 8:00 am

Post by dvdmth »

I strongly recommend that you read Brad Taylor's PPU document:

http://nesdev.com/2C02%20technical%20reference.TXT

This explains in detail how the NES PPU works, including how the BG and sprite layers are rendered and when during a scanline/frame things take place.

Here's a summary of how the PPU address/scroll registers work:

Code:

V = active VRAM address
T = VRAM address latch
X = fine X scroll register

$2000 Write:
T = (T & 0x73FF) | ((data & 0x03) << 10);

$2005 First Write:
T = (T & 0x7FE0) | (data >> 3);
X = data & 0x07;

$2005 Second Write:
T = (T & 0x0C1F) | ((data & 0x07) << 12) | ((data & 0xF8) << 2);

$2006 First Write:
T = (T & 0x00FF) | ((data & 0x3F) << 8);

$2006 Second Write:
T = (T & 0x7F00) | data;
V = T;

$2007 Write:
if ((V & 0x3F00) == 0x3F00)
    palette[(V & 0x3) ? V & 0x1F : V & 0x0F] = data & 0x3F;
else
    PerformVRAMWrite(V & 0x3FFF, data);
V = (V + (($2000 & 0x04) ? 0x20 : 0x01)) & 0x7FFF;

$2007 Read:
if ((V & 0x3F00) == 0x3F00)
    data = palette[(V & 0x3) ? V & 0x1F : V & 0x0F];
else
    data = readBuffer;
readBuffer = PerformVRAMRead(V & 0x3FFF);
V = (V + (($2000 & 0x04) ? 0x20 : 0x01)) & 0x7FFF;

Vertical Reset:
V = T;

Horizontal Reset:
V = (V & 0x7BE0) | (T & 0x041F);

Vertical Increment:
V = (V + 0x1000) & 0x7FFF;
if ((V & 0x7000) == 0)
{
    V = (V & 0x7C1F) | ((V + 0x20) & 0x03E0);
    if ((V & 0x03E0) == 0x03C0)
        V = (V & 0x7C1F) ^ 0x0800;
}

Horizontal Increment:
V = (V & 0x7FE0) | ((V + 0x01) & 0x1F);
if ((V & 0x1F) == 0)
    V ^= 0x0400;

Vertical reset occurs at cycle 304 of the "dummy" scanline (in Brad Taylor's document, this is scanline 20). Horizontal reset occurs once per scanline, including the dummy scanline, at cycle 257 (right after H-Blank begins). Vertical increment occurs once per scanline as well, but at cycle 251 (shortly BEFORE H-Blank).

Horizontal increment occurs 34 times per scanline, at cycle 3 of each 8-cycle rotation (that is, at cycles 3, 11, 19, 27, etc., as well as at cycles 323 and 331 at the end of H-Blank).
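
To make the timing above concrete, here is a minimal C sketch that walks the 341 cycles of one scanline and applies the address updates at the cycle numbers quoted in this post. The `PpuAddr` struct and all function names are hypothetical, the cycle numbers are taken as given from the post rather than verified against hardware, and a real emulator would interleave this with tile fetches and CPU execution.

```c
#include <stdint.h>

/* Hypothetical container for the two address registers from the summary. */
typedef struct {
    uint16_t v;  /* V: active VRAM address */
    uint16_t t;  /* T: VRAM address latch  */
} PpuAddr;

void horizontal_increment(PpuAddr *p) {
    p->v = (p->v & 0x7FE0) | ((p->v + 0x01) & 0x1F);
    if ((p->v & 0x1F) == 0)
        p->v ^= 0x0400;                  /* coarse X wrapped: switch nametable */
}

void vertical_increment(PpuAddr *p) {
    p->v = (p->v + 0x1000) & 0x7FFF;     /* bump fine Y */
    if ((p->v & 0x7000) == 0) {          /* fine Y wrapped: next tile row */
        p->v = (p->v & 0x7C1F) | ((p->v + 0x20) & 0x03E0);
        if ((p->v & 0x03E0) == 0x03C0)   /* row 30 reached */
            p->v = (p->v & 0x7C1F) ^ 0x0800;
    }
}

void horizontal_reset(PpuAddr *p) {
    p->v = (p->v & 0x7BE0) | (p->t & 0x041F);
}

/* Applies one scanline's worth of address updates and returns how many
   horizontal increments were performed. */
int run_scanline_address_updates(PpuAddr *p, int is_dummy_scanline) {
    int increments = 0;
    for (int cycle = 0; cycle < 341; cycle++) {
        /* Tile fetches: cycles 0..255, plus two prefetches at 320..335. */
        int in_fetch_window = (cycle < 256) || (cycle >= 320 && cycle < 336);
        if (in_fetch_window && (cycle & 7) == 3) {
            horizontal_increment(p);
            increments++;
        }
        if (cycle == 251) vertical_increment(p);
        if (cycle == 257) horizontal_reset(p);
        if (is_dummy_scanline && cycle == 304) p->v = p->t;  /* vertical reset */
    }
    return increments;
}
```

Starting from `V = T = 0`, one ordinary scanline performs 34 horizontal increments and leaves `V` at fine Y 1 with coarse X 2, since the last two increments prefetch the first two tiles of the next line.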

It may not be possible to implement this and maintain full speed on devices such as the iPhone. However, numerous NES titles rely on this behavior to work properly, so if your goal is accuracy, you'll have to implement it. (Nestopia is an emulator that has this implemented, and it actually runs slower than some Super NES emulators, to give an idea.)
"Last version was better," says Floyd. "More bugs. Bugs make game fun."
NerveGas
Posts: 29
Joined: Sun Sep 16, 2007 10:41 am

Post by NerveGas »

dvdmth wrote: Horizontal increment occurs 34 times per scanline, at cycle 3 of each 8-cycle rotation (that is, at cycles 3, 11, 19, 27, etc., as well as at cycles 323 and 331 at the end of H-Blank).
Is there a calculation that works on a per-pixel basis?
NerveGas
Posts: 29
Joined: Sun Sep 16, 2007 10:41 am

Post by NerveGas »

As far as accuracy goes, I'd just like Rad Racer, SMB3, and others to work. I don't need to run obscure Japanese games, etc.
Josh
Posts: 69
Joined: Sat Mar 19, 2005 11:18 am

Post by Josh »

You do not need a per-pixel renderer to fix the gross graphical errors in games like Rad Racer and Crystalis. That only requires implementing rendering per scanline using the VRAM address. If your target platform is the iPhone or other handheld devices, then per-pixel rendering would be far too slow anyway.
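
As a rough illustration of per-scanline rendering driven by the VRAM address, the sketch below draws one background line by reading the tile row, attribute bits and fine Y straight out of `V`. The `Ppu` struct, the flat 16 KB `vram` array (no mirroring or bank table) and the function name are assumptions made for the example. The output is a palette index per pixel rather than a final colour.

```c
#include <stdint.h>

/* Hypothetical flat PPU address space; a real emulator would go through
   its bank table and nametable mirroring instead. */
typedef struct {
    uint8_t  vram[0x4000];
    uint16_t v;        /* active VRAM address at the start of the line */
    uint8_t  fine_x;   /* fine X scroll, 0..7 */
    uint16_t bg_base;  /* background pattern table: 0x0000 or 0x1000 */
} Ppu;

/* Renders 256 background pixels as palette indices (0 = backdrop). */
void render_bg_scanline(Ppu *p, uint8_t out[256]) {
    uint16_t v = p->v;
    int fine_y = (v >> 12) & 0x07;
    int x = -(int)p->fine_x;          /* first tile may start off-screen */

    for (int tile = 0; tile < 33; tile++) {
        uint16_t nt_addr = 0x2000 | (v & 0x0FFF);
        uint16_t at_addr = 0x23C0 | (v & 0x0C00)
                         | ((v >> 4) & 0x38) | ((v >> 2) & 0x07);
        uint8_t name = p->vram[nt_addr];
        uint8_t attr = p->vram[at_addr];
        /* Pick the 2-bit palette for this 16x16 quadrant. */
        int shift = ((v >> 4) & 0x04) | (v & 0x02);
        uint8_t pal = (attr >> shift) & 0x03;
        uint16_t pat = (uint16_t)(p->bg_base + name * 16 + fine_y);
        uint8_t lo = p->vram[pat];
        uint8_t hi = p->vram[pat + 8];

        for (int bit = 7; bit >= 0; bit--, x++) {
            if (x < 0 || x > 255) continue;
            uint8_t pix = (uint8_t)(((lo >> bit) & 1) | (((hi >> bit) & 1) << 1));
            out[x] = pix ? (uint8_t)((pal << 2) | pix) : 0;
        }

        /* Horizontal increment: next tile, crossing nametables on wrap. */
        v = (v & 0x7FE0) | ((v + 1) & 0x1F);
        if ((v & 0x1F) == 0) v ^= 0x0400;
    }
}
```

Because the loop works on a local copy of `V`, the caller still has to apply the vertical increment and horizontal reset between lines.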
NerveGas
Posts: 29
Joined: Sun Sep 16, 2007 10:41 am

Post by NerveGas »

Right now I just want to use whatever's going to get most of the popular games to run. I don't think I'm going to have enough resources for a perfect emulator. But if you guys can help get Rad Racer and others working, that would be great. This is the latest state of the code; I think it's sick...

Read Ops:

Code:

    case 0x2000:  /* PPU */
        switch( wAddr )
        {
            case (0x2007): /* VRAM Read */
                if (S.vAddr <0x3F00) {
                    wScratch = S.vAddr;
                    wScratch &= 0x3FFF;
                    bScratch = S.PPU_R7;
                    S.PPU_R7 = W.PPUBANK[ wScratch >> 10 ][ wScratch & 0x3FF ];
                } else {
                    bScratch = W.PPUBANK[ wScratch >> 10 ] [ wScratch & 0x3FF ];
                }

                S.vAddr += (S.PPU_R0 & R0_INC_ADDR) ? 0x20 : 0x01;
                S.vAddr &= 0x3FFF;

                return bScratch;
                break;

            case (0x2004): /* SPR-RAM Read */
                return S.SPRRAM[ S.PPU_R3 ];
                break;

            case (0x2002):
                S.PPU_Latch_Flag = 0;
                ret = S.PPU_R2;
                S.PPU_R2 &= 0x7F;
                return ret;
                break;

            default: /* $2000, $2001, $2003, $2005, $2006 */
                return S.PPU_R7;
                break;
        }
        break;

Write Ops:

Code:

    case 0x2000:  /* PPU */
      switch ( wAddr )
      {
        case 0x2000: /* PPU Control Register 1 */
          S.PPU_R0 = byData;
          S.PPU_SP_Height     = (S.PPU_R0 & R0_SP_SIZE) ? 0x10 : 0x08;
          W.PPU_BG = (S.PPU_R0 & R0_BG_ADDR) ? S.ChrBuf + 0x4000 : S.ChrBuf;
          W.PPU_SP = (S.PPU_R0 & R0_SP_ADDR) ? S.ChrBuf + 0x4000 : S.ChrBuf;
          S.PPU_NameTable = NAME_TABLE0 + ( S.PPU_R0 & R0_NAME_ADDR );
          S.vAddr_Latch = (S.vAddr_Latch & 0xF3FF) | ((word)(byData & 3) << 10);

          break;

        case 0x2001: /* PPU Control Register 2 */
          S.PPU_R1 = byData;
          break;

        case 0x2002: /* PPU Status - NOT WRITABLE */
          break;

        case 0x2003: /* Sprite RAM ADDR */
          S.PPU_R3 = byData;
          break;

        case 0x2004: /* Sprite RAM DATA */
          S.SPRRAM[S.PPU_R3] = byData;
          break;

        case 0x2005: /* Scroll Register */
          lowerBits = (byData & 7);
          upperBits = (byData >> 3);

          if (S.PPU_Latch_Flag) {
             if (byData > 240)
                S.PPU_NameTable ^= NAME_TABLE_V_MASK;
             S.vAddr_Latch = (S.vAddr_Latch & 0x8C1F) | (upperBits << 5) |
             (lowerBits << 12);
          }
          else {
             S.vAddr_Latch = (S.vAddr_Latch & 0xFFE0) | upperBits;
             S.PPU_R5A = lowerBits;
          }

          S.PPU_Latch_Flag ^= 1;
          break;
        case 0x2006: /* VRAM Address Register */
          if (S.PPU_Latch_Flag) {
             S.vAddr_Latch = (S.vAddr_Latch & 0xFF00) | byData;
             S.vAddr = S.vAddr_Latch;
          }
          else {
             S.vAddr_Latch = ((byData & 0x3F) << 8) | (S.vAddr_Latch & 0xFF);
          }
          S.PPU_Latch_Flag ^= 1;
          break;
        case 0x2007: /* VRAM Data */
        {
            wScratch = S.vAddr;
            wScratch &= 0x3FFF;

            S.vAddr += (S.PPU_R0 & R0_INC_ADDR) ? 0x20 : 0x01;
            if (S.vAddr > 0x3FFF)
                S.vAddr &= 0x3FFF;

            if (wScratch < 0x2000 && S.VRAMWriteEnable)
            {
              /* Pattern Data */
              S.ChrBufUpdate |= ( 1 << ( wScratch >> 10 ) );
              W.PPUBANK[ wScratch >> 10 ][ wScratch & 0x3FF ] = byData;
            }
            else if (wScratch < 0x3F00 )  /* 0x2000 - 0x3EFF */
            {
              /* Name Table and Mirror */
              W.PPUBANK[ (wScratch) >> 10 ][ wScratch & 0x3ff ] = byData;
              W.PPUBANK[ (wScratch ^ 0x1000) >> 10][ wScratch & 0x3FF ] = byData;
            }
            else if (!(wScratch & 0xF))  /* 0x3F00 or 0x3F10 */
            {
                /* Palette Mirror */
                S.PPURAM[ 0x3f10 ] = S.PPURAM[ 0x3f14 ] = S.PPURAM[ 0x3f18 ]
              = S.PPURAM[ 0x3f1c ] = S.PPURAM[ 0x3f00 ] = S.PPURAM[ 0x3f04 ]
              = S.PPURAM[ 0x3f08 ] = S.PPURAM[ 0x3f0c ] = byData;

                S.PalTable[ 0x00 ] = S.PalTable[ 0x04 ] = S.PalTable[ 0x08 ]
              = S.PalTable[ 0x0c ] = S.PalTable[ 0x10 ] = S.PalTable[ 0x14 ]
              = S.PalTable[ 0x18 ] = S.PalTable[ 0x1c ]
              = NesPalette[ byData ] | 0x8000;
            }
            else if (wScratch & 0x03)
            {
              /* Palette */
              S.PPURAM[ wScratch ] = byData;
              S.PalTable[ wScratch & 0x1f ] = NesPalette[ byData ];
            }
        }
        break;
    }

Scroll Registers:

Code:

void NESCore_Develop_Scroll_Values() {

  S.PPU_SCROLL_Y_BYTE = (S.vAddr >> 5) & 0x1F;
  S.PPU_SCROLL_Y_BIT  = (S.vAddr >> 12) & 0x07;

  S.PPU_SCROLL_X_BYTE = (S.vAddr & 0x1F);
  S.PPU_SCROLL_X_BIT  = S.PPU_R5A;
}

Increments:

Code:

  if (!( S.PPU_R1 & R1_SHOW_SCR ))
  {
    /* Clear scanline if display is off */
    memset( pPoint, 0, NES_DISP_WIDTH << 1 ); /* Assumes 16-Bit buffer! */
    Exec6502(&S.m6502_state, 85);
  }
  else
  {

    if (S.PPU_Scanline == 0)
        S.vAddr = S.vAddr_Latch & 0x7FFF;
...

End of HBLANK

    S.vAddr = (S.vAddr & 0xFBE0) | (S.vAddr_Latch & 0x41F);


The last increment there after the hblank seems to break a lot of stuff as well, although more appears to be broken.
NerveGas
Posts: 29
Joined: Sun Sep 16, 2007 10:41 am

Post by NerveGas »

I'm getting a lot closer; implemented horizontal / vertical increments. I think the biggest problem right now is selection of the proper name table. $2000 sets the name table, and $2005 flips it if bData > 240. I also flip it locally inside my rendering routines for vertical / horizontal mirroring. Is there some other way I should be doing it? I tried reloading it after each clock, but that doesn't seem to set it right either.

EDIT: It looks like nY is never > 29 any more now that I'm doing it right, so how can I tell if I need to flip the name table?

Code:

   /* Develop Scroll Values */
    PPU_SCROLL_Y_BYTE = (S.vAddr >> 5) & 0x1F;
    PPU_SCROLL_Y_BIT  = (S.vAddr >> 12) & 0x07;
    PPU_SCROLL_X_BYTE = (S.vAddr & 0x1F);
    PPU_SCROLL_X_BIT  = S.FineX; 

    nY = PPU_SCROLL_Y_BYTE;
    nX = PPU_SCROLL_X_BYTE;
 
    nYBit = PPU_SCROLL_Y_BIT; 
    nYBit <<= 3;
 
    /* Name Table Selection: Flip for vertical scrolling */
    nNameTable = S.PPU_NameTable;
    if (nY > 29) {  /* <-- NEVER TRUE */
        nY -= 30;  
        nNameTable ^= NAME_TABLE_V_MASK;
    }
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

NerveGas wrote: EDIT: It looks like nY is never > 29 any more now that I'm doing it right, so how can I tell if I need to flip the name table?
You don't "flip" VRAM address bit 11 ($0800) when a "high" Y scroll value is written through $2005 or $2006. The hardware renders rows 30, 31, 0, 1, 2, 3, ... of a single nametable. Bit 11 changes during rendering only when the VRAM address updates after the bottom line of row 29.
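
The row behaviour described here can be checked with a small C sketch built on the "Vertical Increment" pseudocode from earlier in the thread. `make_v` is a hypothetical helper added only to build test addresses.

```c
#include <stdint.h>

/* Advance V by one scanline, following the thread's pseudocode. */
uint16_t vertical_increment(uint16_t v) {
    v = (v + 0x1000) & 0x7FFF;            /* bump fine Y (bits 12-14) */
    if ((v & 0x7000) == 0) {              /* fine Y wrapped: next tile row */
        v = (v & 0x7C1F) | ((v + 0x20) & 0x03E0);
        if ((v & 0x03E0) == 0x03C0)       /* coarse Y just reached 30 */
            v = (v & 0x7C1F) ^ 0x0800;    /* wrap to row 0, toggle bit 11 */
    }
    return v;
}

/* Hypothetical helper: pack nametable select, coarse Y and fine Y into V. */
uint16_t make_v(int nt, int coarse_y, int fine_y) {
    return (uint16_t)((fine_y << 12) | (nt << 10) | (coarse_y << 5));
}
```

Stepping past the bottom line of row 29 yields row 0 with bit 11 toggled. A scroll that starts in row 30 or 31 runs through 31 and then back to 0 without touching bit 11, which is why no manual "flip" is needed in the register writes.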
NerveGas
Posts: 29
Joined: Sun Sep 16, 2007 10:41 am

Post by NerveGas »

Ah OK, so I get it from vAddr... how often should I update that, every scanline?
NerveGas
Posts: 29
Joined: Sun Sep 16, 2007 10:41 am

Post by NerveGas »

Awright, well, things are finally starting to shape up: SMB3 is running properly, and Rad Racer looks good except for a few glitches (some gaps, and tracks in the distance are choppy). Overall, though, a lot of games are rendering properly now.

One strange thing that happened: Rolling Thunder and G.I. Joe no longer run at all. They didn't run to begin with, after forking the project, but with a little massaging of the PPU, they did come up for a bit. Now I just get a blank screen once again, and I can tell they haven't initialized. Any ideas what might be causing this?
Josh
Posts: 69
Joined: Sat Mar 19, 2005 11:18 am

Post by Josh »

I'm having trouble figuring out a good way to get the code running on Windows. Are you doing all your testing on the iPhone?

What is the current speed of execution on the iPhone? Does it require frameskipping?
NerveGas
Posts: 29
Joined: Sun Sep 16, 2007 10:41 am

Post by NerveGas »

I think we figured out Rolling Thunder and G.I. Joe: there are a crucial 6 steps between VBlank and the NMI that need to get executed. We also weren't counting the 512 steps for DMA. Fixing all that seemed to fix everything in the code.

On the iPhone, it's auto-frameskip; I'm getting around 1 or 2 frames skipped. It still runs pretty smooth.
Josh
Posts: 69
Joined: Sat Mar 19, 2005 11:18 am

Post by Josh »

I'm trying to think of ways that the rendering code might be optimized. It should be possible to draw 2 pixels per operation by using a short (16x4) look-up table to convert the planar bits to a 32-bit int. Do you happen to know if there's a penalty for unaligned DWORD-length writes on the iPhone's ARM CPU? That would make a difference in how such an algorithm would be implemented.
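
One possible reading of this idea, sketched in C: a 16-entry table, rebuilt per 4-colour palette, maps two bits from each bit plane to a 32-bit word holding two 16-bit pixels. The function names and the key layout are assumptions for illustration, not the thread's actual renderer, and the sketch leaves the alignment question open by writing into an aligned array.

```c
#include <stdint.h>

/* Build the table for one 4-colour palette.
   Key   = (two bits of the high plane << 2) | two bits of the low plane,
           where the more significant bit of each pair is the left pixel.
   Value = left pixel colour in the low 16 bits, right pixel colour in the
           high 16 bits (the order a little-endian 32-bit store needs). */
void build_pair_lut(const uint16_t pal[4], uint32_t lut[16]) {
    for (int key = 0; key < 16; key++) {
        int lo = key & 3;
        int hi = key >> 2;
        int left  = ((hi >> 1) << 1) | (lo >> 1);
        int right = ((hi & 1) << 1) | (lo & 1);
        lut[key] = (uint32_t)pal[left] | ((uint32_t)pal[right] << 16);
    }
}

/* Expand one 8-pixel tile row into four 32-bit words, two pixels each. */
void expand_tile_row(uint8_t lo, uint8_t hi,
                     const uint32_t lut[16], uint32_t out[4]) {
    for (int i = 0; i < 4; i++) {
        int shift = 6 - 2 * i;   /* pixel pairs from left to right */
        int key = (((hi >> shift) & 3) << 2) | ((lo >> shift) & 3);
        out[i] = lut[key];
    }
}
```

With fine X scrolling the destination of each pair is not always 4-byte aligned, so whether this beats per-pixel stores depends on the answer to the alignment question above.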
NerveGas
Posts: 29
Joined: Sun Sep 16, 2007 10:41 am

Post by NerveGas »

I'm not sure, but you can try it. Another method might be to try using OpenGL ES instead of CoreSurfaces. I suspect, though, that OpenGL would have to go through CoreSurfaces, so that may prove counterproductive.

I'd like to find ways to optimize the mapper 5 code...it appears to run _very_ slow. Making Castlevania III and other mapper 5 games run may prove more useful than optimizing video.

My current hit list is:
- Mapper 5 too slow / not completely right
- Mapper 119 (which should be the same as mapper 4, but with RAM/ROM selection on bit 6), yet Pin-Bot and High Speed don't render properly.
- Rad Racer 2 has a completely mangled background
- When I set VBLANK START to 141 (where it should be) instead of 143, weird things happen in Mach Rider (screen flicker); trying to find the root of the problem.
- Star Wars doesn't initialize
- Cool Spot doesn't initialize
- Green background in Fantasy Zone
- Kung Fu and Rad Racer render OK, but due to (possibly) sprite hit stuff (which jodan is working on fixing), there is some distortion
- Punch-Out!! / Mike Tyson's Punch-Out!! has some rendering glitches on the title screen and in between certain rounds (such as Von Kaiser)
- NES Scroll Test still fails vertically (might just be a lame test)
- TMNT 3 doesn't initialize
- Videomation appears to be broken (mapper problem?)
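
For the mapper 119 item in the list above, the bit 6 RAM/ROM selection could be sketched in C as follows. The helper name, the 1 KB bank granularity, and the 64 KB CHR ROM / 8 KB CHR RAM sizes are assumptions for illustration. The rest of the banking is taken to behave like mapper 4, as the list item says.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: resolve a 1 KB CHR bank value to a pointer.
   Bit 6 set selects CHR RAM, bit 6 clear selects CHR ROM. */
uint8_t *tqrom_chr_bank(uint8_t bank, uint8_t *chr_rom, uint8_t *chr_ram) {
    if (bank & 0x40)
        return chr_ram + (size_t)(bank & 0x07) * 0x400;  /* assumed 8 KB RAM  */
    return chr_rom + (size_t)(bank & 0x3F) * 0x400;      /* assumed 64 KB ROM */
}
```

If PPU writes to pattern memory are only honoured when the selected bank points into the RAM buffer, that alone may account for some of the broken rendering.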