All times are UTC - 7 hours

PostPosted: Thu Feb 04, 2016 8:46 am 
Joined: Tue Oct 06, 2015 10:16 am
Posts: 555
I have a ROM that reliably crashes. It's all written in C, so I can probably blame cc65, but I need to know why it crashes first.

The thing is, I'm on Linux. I have several NES emulators, but none of them offer debugging. And even if they did, my 6502 assembly skills are at the level of "I can read it, if I google every single instruction and before-unseen syntax" :P

When it crashes, the sound stops playing. The screen looks like this (sometimes different colors, sometimes a black screen), but it keeps scrolling:
Image

I can post the code and the ROM, but I'd also like to know how you'd approach this.


PostPosted: Thu Feb 04, 2016 8:50 am
Joined: Tue Oct 06, 2015 10:16 am
Posts: 555
This is the function that crashes, but only in special circumstances.

When run the first time, everything is perfect, and I can run the function 10 times in a loop without a crash. I can tell it to load any level, and it successfully does so.

However, when I load level 0 and "win the level", moving to level 1, it reliably crashes in this function, in the final ppu_on_all call (tracked using sound effects). It still crashes if I tell it to always load level 0. Very strange.

Code:
void loadlevel(const u8 num) {
   static u8 x, y, q, i, scrollx, scrolly, t, tx, ty, qx, qy;
   static s16 winx, winy, winxmax, winymax;
   static u16 bigx, bigy, maxx, maxy;

   ppu_off();

   LZ4_decompress_fast(complevels[num], (char *) levelbuf, 684);

   memfill(attr, 0, 64);

   lapsleft = laps[num];
   timeleft = times[num];
   leveldir = directions[num];

   bankswitch(levelbank[num]);

   vram_adr(NAMETABLE_A);
   vram_fill(0, 1024);

   // Where do we start?
   goalx = checkfinish[num * 4 + 0];
   goaly = checkfinish[num * 4 + 1];
   checkpointx = checkfinish[num * 4 + 2];
   checkpointy = checkfinish[num * 4 + 3];

   // Aim the camera so it centers on the finish line.
   camx = goalx * 64 + (256 - 128 + 32);
   camy = goaly * 64 + (256 - 120 + 32);

   scrollx = camx % 256;
   scrolly = camy % 240;
   scroll(scrollx, scrolly);

   scrollxoff = scrollx / 8;
   scrollyoff = scrolly / 8;

   // Init area. Which tiles are visible, and at which positions?
   maxy = camy + 256;
   maxx = camx + 272;

   winx = (camx - 256) / 8;
   winy = (camy - 256) / 8;
   winxmax = winx + 32;
   winymax = winy + 30;

   if (maxy > 2304)
      maxy = 2304;
   if (maxx > 2304)
      maxx = 2304;

   bigy = camy;
   if (bigy < 256)
      bigy = 256;

   for (; bigy <= maxy; bigy += 64) {
      y = (bigy - 256) / 64;

      bigx = camx;
      if (bigx < 256)
         bigx = 256;

      for (; bigx <= maxx; bigx += 64) {
         x = (bigx - 256) / 64;

         gqx = x;
         gqy = y;
         q = getquad();

         x *= 4;

         for (i = 0; i < 16; ++i) {
            const u8 tile = quads[q][i];

            qx = x + i % 4;
            qy = y * 4 + i / 4;

            qx *= 2;
            qy *= 2;

            for (t = 0; t < 4; ++t) {
               const u8 val = tiles[tile][t];

               tx = qx + t % 2;
               ty = qy + t / 2;

               if (tx < winx || tx >= winxmax)
                  continue;
               if (ty < winy || ty >= winymax)
                  continue;

               // tx and ty are hw-tile world coordinates,
               // 0-255. Where is our window?

               tx -= winx;
               ty -= winy;

               // Now in window space. Where's the hw window?

               tx = (tx + scrollxoff) % 32;
               ty = ty + scrollyoff;
               while (ty >= 30) ty -= 30;

               // Place the tile
               vram_adr(NTADR_A(tx, ty));
               vram_put(val);
            }
         }
      }
   }

   // Load palette for level
   pal_bg(levelpal);
   pal_spr(sprpal);

   pal_bright(4);
   ppu_on_all();
}


PostPosted: Thu Feb 04, 2016 8:53 am
Joined: Sun Nov 09, 2008 9:18 pm
Posts: 983
Location: Pennsylvania, USA
In my experience, most of the time when I have a crash or glitchy screens, it's one of a few things:

-An address of something is wrong when loading chr or nametable data
-My bankswitching code has a bug or I just forgot to update a bank number in a table
-My vblank code is running too long and causing glitches by writing to the ppu outside of vblank
-Maybe I have a stack error (forgot a pla where I had a matching pha)
-Maybe I have a buffer overrun error and I'm trampling on variables outside of an array (this could easily happen in C, too, no safety net there). With local stack space in C, that could even mess up return addresses...eek! *edit* I see you are not using local stack space in your function, here, so this likely does not apply---could still apply with global arrays though. *edit* To debug this last type of bug, you can experiment with moving arrays around in RAM or changing the size of buffers to see if it affects the behavior of the bug.
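
One way to try the buffer-overrun idea from the last bullet on a host build is to bracket the suspect array with guard bytes and check them after each step. This is only a minimal sketch: `guarded_buf`, `guard_init`, `guard_ok`, `GUARD`, and `BUF_SIZE` are hypothetical names, and 684 is simply the size passed to the decompressor in the function posted above.

```c
#include <assert.h>
#include <string.h>

#define BUF_SIZE 684   /* size from the loadlevel() call; adjust to the real buffer */
#define GUARD    0xA5  /* arbitrary pattern, unlikely to be written by accident */

/* Hypothetical wrapper: one struct keeps the guards adjacent to the data. */
struct guarded_buf {
    unsigned char before[4];
    unsigned char data[BUF_SIZE];
    unsigned char after[4];
};

/* Fill both guard areas with the pattern. Call once before using data[]. */
void guard_init(struct guarded_buf *b) {
    assert(b != NULL);
    memset(b->before, GUARD, sizeof b->before);
    memset(b->after, GUARD, sizeof b->after);
}

/* Returns 1 if both guard areas are intact, 0 if something wrote outside data[]. */
int guard_ok(const struct guarded_buf *b) {
    size_t i;
    for (i = 0; i < sizeof b->before; ++i)
        if (b->before[i] != GUARD) return 0;
    for (i = 0; i < sizeof b->after; ++i)
        if (b->after[i] != GUARD) return 0;
    return 1;
}
```

Calling `guard_ok()` after the decompress step and again after the tile loop would show which step, if any, writes outside the array.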

*edit* I'm unfamiliar with neslib (assuming that is what you are using). Maybe you need to set a global vram address so the nmi routine resets the vram address and scroll correctly when the ppu is turned back on?

To debug, I can often reason about my code if it is organized enough. When it is enough of a mystery, I can often narrow down the problem with git bisect (assuming I've been tracking many small changes since the beginning of the project). *edit* Note that this only helps if the code was previously working and has mysteriously broken. If it's your first revision, all you can do is reason about the code you've written and try to narrow down the problem some other way, possibly by trying simpler cases first. Maybe just load the palette and nothing else and verify that it works, then add the bankswitch and see if it still works (use the FCEUX PPU and memory viewers). Keep introducing small bits of code until you observe the crash; that helps show which code is a suspect for the source of your bug.

I'd be reluctant to blame cc65 itself...I'll let C users here comment further.


PostPosted: Thu Feb 04, 2016 10:00 am
Joined: Fri Nov 19, 2004 7:35 pm
Posts: 3943
From just the picture alone, it looks like it is repeatedly running the code to update graphics that would usually be in NMI.

_________________
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!


PostPosted: Thu Feb 04, 2016 10:06 am
Joined: Tue Oct 06, 2015 10:16 am
Posts: 555
Thanks for the suggestions. I don't think some of them apply, though: they're asm-specific, and I'm not using asm.

The bankswitch is not responsible. When I move it to the start of main, it still crashes here just the same. The PPU is off in this function, and every CPU-heavy function has been tested under valgrind on the host PC.

Quote:
*edit* I'm unfamiliar with neslib (assuming that is what you are using). Maybe you need to set a global vram address so the nmi routine resets the vram address and scroll correctly when the ppu is turned back on?


No need for that, I believe. Here's the part of its NMI routine dealing with scrolling:
Code:
        lda #0
        sta PPU_ADDR
        sta PPU_ADDR

        lda <SCROLL_X
        sta PPU_SCROLL
        lda <SCROLL_Y
        sta PPU_SCROLL

        lda <PPU_CTRL_VAR
        sta PPU_CTRL


Bisect is not useful, since this is working code; I only just added a second level. Also the Linux version of fceux does not have a memory or ppu viewer.

I've hit several valid cc65 compiler bugs, all of which I've reported and some are already fixed. It's the quality of this compiler that always leads me to suspect it first ;)


PostPosted: Thu Feb 04, 2016 10:10 am
Joined: Tue Oct 06, 2015 10:16 am
Posts: 555
Dwedit wrote:
From just the picture alone, it looks like it is repeatedly running the code to update graphics that would usually be in NMI.


That's not possible. Well, unless something corrupted the state it keeps for that in zero page, which I have no way of testing :(


PostPosted: Thu Feb 04, 2016 10:17 am
Joined: Sun Nov 09, 2008 9:18 pm
Posts: 983
Location: Pennsylvania, USA
calima wrote:
I've hit several valid cc65 compiler bugs, all of which I've reported and some are already fixed. It's the quality of this compiler that always leads me to suspect it first ;)


Fascinating...further reinforces my recent decision to abandon C permanently and simply continue to build idioms (and make good non-overengineered use of macros) in 6502. The only suggestion I made that was asm specific was a stack error (specifically one involving a manual pha .... pla)---all those other things could easily happen in C code, just in different ways.


PostPosted: Thu Feb 04, 2016 10:27 am
Joined: Tue Oct 06, 2015 10:16 am
Posts: 555
Well, I don't have any nmi/vblank code. It's all Shiru's, which is well tested, and I've used it myself in several games successfully.


PostPosted: Thu Feb 04, 2016 10:35 am
Joined: Fri May 08, 2015 7:17 pm
Posts: 1772
Location: DIGDUG
There are many ways a game can crash...
-infinite loop
-NMI within an NMI
-stack overflow
-bank switch and program calling data from the wrong bank
-corrupted C library variables
-corrupted C stack

Are you able to run an emulator with debugging features in Linux?

My first step would be to look at the RAM and see if the stack ($0100-$01FF) is going crazy. Then I would export labels from the linker, use them to locate the probable error function, set a breakpoint in an emulator, and go through it step by step (it would help if you knew 6502 ASM).
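
The "export labels, then locate the function" step can be sketched on the host side. This assumes the linker wrote a VICE-style label file (ld65's `-Ln` option) with lines of the form `al 00C123 .name`; treat that line format as an assumption to check against your own file. `parse_label` and `find_label` are made-up names for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one label line of the assumed form "al 00C123 .name".
   Returns 1 on success and fills addr/name (name must hold 64 bytes). */
int parse_label(const char *line, unsigned *addr, char *name) {
    return sscanf(line, "al %x .%63s", addr, name) == 2;
}

/* Find the label with the highest address that is still <= pc.
   Returns 1 and copies its name into best (64 bytes), or 0 if none match. */
int find_label(const char *const *lines, size_t n, unsigned pc, char *best) {
    unsigned addr, best_addr = 0;
    char name[64];
    int found = 0;
    size_t i;
    assert(lines != NULL && best != NULL);
    for (i = 0; i < n; ++i) {
        if (!parse_label(lines[i], &addr, name)) continue;  /* skip unparsable lines */
        if (addr <= pc && (!found || addr >= best_addr)) {
            best_addr = addr;
            strcpy(best, name);
            found = 1;
        }
    }
    return found;
}
```

Given a program counter or a return address read off the hardware stack, this names the nearest preceding label, which is usually the function the CPU was in.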

_________________
nesdoug.com -- blog/tutorial on programming for the NES


Last edited by dougeff on Thu Feb 04, 2016 11:22 am, edited 1 time in total.

PostPosted: Thu Feb 04, 2016 10:53 am
Joined: Tue Oct 06, 2015 10:16 am
Posts: 555
AFAIK no emulator for Linux supports any debugging, though I've only tried the most popular ones. That's making this quite hard.


PostPosted: Thu Feb 04, 2016 10:57 am
Joined: Fri May 08, 2015 7:17 pm
Posts: 1772
Location: DIGDUG
Quote:
Well, I don't have any nmi/vblank code. It's all Shiru's, which is well tested, and I've used it myself in several games successfully.


(it looks to me like...)
Shiru's code uses the NMI handler to time VRAM/palette changes (and music) while rendering is on. With rendering off (after ppu_off()), it skips that in the NMI handler and lets you make VRAM changes from the main code, but it will still run the NMI handler to do music updates.
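
Read that way, the handler's control flow would look roughly like the following. This is a host-side model of the behaviour described above, not neslib's actual code; every name here (`nmi_state`, `nmi_model`, the counters) is made up for illustration.

```c
#include <assert.h>

/* Hypothetical state standing in for what the real handler would keep in zero page. */
struct nmi_state {
    int rendering_on;       /* cleared by ppu_off(), set by ppu_on_all() */
    unsigned vram_updates;  /* how many times the NMI touched VRAM/palette */
    unsigned music_ticks;   /* how many times the music driver was stepped */
};

/* One vblank interrupt, as described in the post above. */
void nmi_model(struct nmi_state *s) {
    assert(s != 0);
    if (s->rendering_on) {
        /* Rendering on: the NMI owns the PPU and pushes queued updates. */
        s->vram_updates++;
    }
    /* Rendering off: main code may write VRAM directly; the NMI leaves it alone. */
    s->music_ticks++;  /* music is stepped either way */
}
```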



PostPosted: Thu Feb 04, 2016 10:57 am
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19097
Location: NE Indiana, USA (NTSC)
calima wrote:
Also the Linux version of fceux does not have a memory or ppu viewer.

Unless you're using Linux on a non-x86 platform, you can do what I did: sudo apt-get install wine and then use the Windows version.


PostPosted: Thu Feb 04, 2016 11:10 am
Joined: Fri May 08, 2015 7:17 pm
Posts: 1772
Location: DIGDUG
Well, it definitely looks to me like your game is repeatedly writing the same chunk of data to the PPU over and over while rendering is on. This would also give a 'scrolling' effect, as PPU changes also affect scroll position.

There's nothing in the code you posted that would indicate why this is so, because you have ppu_off(); near the top of the function.

Is there any other code that would also happen at level change, that writes to the PPU? Is it perhaps doing it while rendering is on?
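
The link between PPU writes and scroll position mentioned above comes from the PPU using one internal address register for both. Below is a simplified host-side model, following the commonly documented behaviour of the $2006 register. The struct and function names are hypothetical, and fine X and the $2005 path are left out.

```c
#include <assert.h>

/* Simplified PPU internals: v = current VRAM address, t = temporary, w = write toggle. */
struct ppu_model {
    unsigned v, t;
    int w;
};

/* Model of a CPU write to $2006 (PPUADDR). */
void write_2006(struct ppu_model *p, unsigned char d) {
    assert(p != 0);
    if (!p->w) {
        /* First write: high 6 address bits, bit 14 cleared. */
        p->t = (p->t & 0x00FFu) | ((unsigned)(d & 0x3F) << 8);
        p->w = 1;
    } else {
        /* Second write: low byte, then the address takes effect. */
        p->t = (p->t & 0x7F00u) | d;
        p->v = p->t;
        p->w = 0;
    }
}

/* While rendering, the same v is read as a scroll position. */
unsigned coarse_x(const struct ppu_model *p)  { return p->v & 0x1Fu; }
unsigned coarse_y(const struct ppu_model *p)  { return (p->v >> 5) & 0x1Fu; }
unsigned nametable(const struct ppu_model *p) { return (p->v >> 10) & 0x3u; }
```

In this model, setting the address to $2345 leaves the PPU pointing at tile column 5, row 26 of nametable 0, so an address write that lands during rendering moves the picture.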



PostPosted: Thu Feb 04, 2016 11:52 am
Joined: Sun Nov 09, 2008 9:18 pm
Posts: 983
Location: Pennsylvania, USA
Here's another guess---are you decompressing a level/screen etc. into RAM and then decoding that into VRAM writes? If so---perhaps your nes.cfg allocates too much RAM to cc65 and this decompression is actually overwriting the local/cc65 stack. I recall the default nes.cfg uses 3 pages, which is kinda huge. I asked about this in "Efficiency of development process using C versus 6502" and the general response (rainwarrior's response actually) was that it can be much smaller, perhaps as small as 64 or even 32 if you're careful. I imagine you already checked into this, just thought I'd throw that out there.


PostPosted: Thu Feb 04, 2016 2:07 pm
Joined: Tue Oct 06, 2015 10:16 am
Posts: 555
tepples wrote:
Unless you're using Linux on a non-x86 platform, you can do what I did: sudo apt-get install wine and then use the Windows version.

Pure 64-bit, so Wine is not a possibility.

dougeff wrote:
Is there any other code that would also happen at level change, that writes to the PPU? Is it perhaps doing it while rendering is on?


No, that is all. And it baffles me how I can run it several times over with no ill effects, but when it runs after a second of gameplay, that function crashes.

It's very curious how the full chain works the first time. The outer loop literally goes

Code:
while (1) {
    loadlevel(foo);
    gameplay();
}


The same path works the first time, but fails the second.

Quote:
Well, it definitely looks to me like your game is repeatedly writing the same chunk of data to the PPU over and over while rendering is on. This would also give a 'scrolling' effect, as PPU changes also affect scroll position.


Why would the music stop in that case? That would only be possible if there was an infinite loop in the NMI itself, so it never reached the music code.

GradualGames wrote:
Here's another guess---are you decompressing a level/screen etc. into RAM and then decoding that into VRAM writes? If so---perhaps your nes.cfg allocates too much RAM to cc65 and this decompression is actually overwriting the local/cc65 stack. I recall the default nes.cfg uses 3 pages, which is kinda huge. I asked about this in "Efficiency of development process using C versus 6502" and the general response (rainwarrior's response actually) was that it can be much smaller, perhaps as small as 64 or even 32 if you're careful. I imagine you already checked into this, just thought I'd throw that out there.


The decompression goes into a statically allocated BSS buffer. There's almost 200 bytes free for cc65's stack, and no function uses more than a couple of variables on the local stack. No deep call chains either. If it's stack corruption, it's not because of my code.
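
One way to measure the stack headroom rather than estimate it is stack painting: fill the stack region with a known byte at startup, then later count how many bytes were never overwritten. Here is a host-side sketch of the idea. The function names and the 0xCD pattern are arbitrary, and on the NES the region would be whatever range the linker config assigns to cc65's software stack, which is assumed here to grow downward.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAINT 0xCD  /* arbitrary fill pattern */

/* Fill the whole stack region with the pattern before anything uses it. */
void stack_paint(unsigned char *base, size_t size) {
    assert(base != NULL);
    memset(base, PAINT, size);
}

/* Count bytes still holding the pattern, starting from the low end.
   With a downward-growing stack, this is the space that was never touched. */
size_t stack_unused(const unsigned char *base, size_t size) {
    size_t n = 0;
    while (n < size && base[n] == PAINT) ++n;
    return n;
}
```

If the unused count ever reaches zero after a level change, the stack has run into whatever sits below it.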

