How to debug a nes crash?

calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

How to debug a nes crash?

Post by calima »

I have a ROM that reliably crashes. It's all written in C, so I can probably blame cc65, but I need to know why it crashes first.

The thing is, I'm on Linux. I have several NES emulators, but none of them offer debugging. And even if they did, my 6502 assembly skills are at the level of "I can read it, if I google every single instruction and before-unseen syntax" :P

Sound stops playing. It looks like this (sometimes different colors, sometimes black screen), but keeps scrolling:
Image

I can post the code and the ROM, but I'd also like to know how you'd approach this.
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: How to debug a nes crash?

Post by calima »

This is the function that crashes, but only in special circumstances.

When run the first time, everything is perfect, and I can run the function 10 times in a loop without a crash. I can tell it to load any level, and it successfully does so.

However, when I load level 0 and "win the level", moving to level 1, it reliably crashes in this function, in the final ppu_on_all call (tracked using sound effects). It still crashes if I tell it to always load level 0. Very strange.

Code:

void loadlevel(const u8 num) {
	static u8 x, y, q, i, scrollx, scrolly, t, tx, ty, qx, qy;
	static s16 winx, winy, winxmax, winymax;
	static u16 bigx, bigy, maxx, maxy;

	ppu_off();

	LZ4_decompress_fast(complevels[num], (char *) levelbuf, 684);

	memfill(attr, 0, 64);

	lapsleft = laps[num];
	timeleft = times[num];
	leveldir = directions[num];

	bankswitch(levelbank[num]);

	vram_adr(NAMETABLE_A);
	vram_fill(0, 1024);

	// Where do we start?
	goalx = checkfinish[num * 4 + 0];
	goaly = checkfinish[num * 4 + 1];
	checkpointx = checkfinish[num * 4 + 2];
	checkpointy = checkfinish[num * 4 + 3];

	// Aim the camera so it centers on the finish line.
	camx = goalx * 64 + (256 - 128 + 32);
	camy = goaly * 64 + (256 - 120 + 32);

	scrollx = camx % 256;
	scrolly = camy % 240;
	scroll(scrollx, scrolly);

	scrollxoff = scrollx / 8;
	scrollyoff = scrolly / 8;

	// Init area. Which tiles are visible, and at which positions?
	maxy = camy + 256;
	maxx = camx + 272;

	winx = (camx - 256) / 8;
	winy = (camy - 256) / 8;
	winxmax = winx + 32;
	winymax = winy + 30;

	if (maxy > 2304)
		maxy = 2304;
	if (maxx > 2304)
		maxx = 2304;

	bigy = camy;
	if (bigy < 256)
		bigy = 256;

	for (; bigy <= maxy; bigy += 64) {
		y = (bigy - 256) / 64;

		bigx = camx;
		if (bigx < 256)
			bigx = 256;

		for (; bigx <= maxx; bigx += 64) {
			x = (bigx - 256) / 64;

			gqx = x;
			gqy = y;
			q = getquad();

			x *= 4;

			for (i = 0; i < 16; ++i) {
				const u8 tile = quads[q][i];

				qx = x + i % 4;
				qy = y * 4 + i / 4;

				qx *= 2;
				qy *= 2;

				for (t = 0; t < 4; ++t) {
					const u8 val = tiles[tile][t];

					tx = qx + t % 2;
					ty = qy + t / 2;

					if (tx < winx || tx >= winxmax)
						continue;
					if (ty < winy || ty >= winymax)
						continue;

					// tx and ty are hw-tile world coordinates,
					// 0-255. Where is our window?

					tx -= winx;
					ty -= winy;

					// Now in window space. Where's the hw window?

					tx = (tx + scrollxoff) % 32;
					ty = ty + scrollyoff;
					while (ty >= 30) ty -= 30;

					// Place the tile
					vram_adr(NTADR_A(tx, ty));
					vram_put(val);
				}
			}
		}
	}

	// Load palette for level
	pal_bg(levelpal);
	pal_spr(sprpal);

	pal_bright(4);
	ppu_on_all();
}
GradualGames
Posts: 1106
Joined: Sun Nov 09, 2008 9:18 pm
Location: Pennsylvania, USA

Re: How to debug a nes crash?

Post by GradualGames »

In my experience, most of the time when I have a crash or glitchy screens, it's one of a few things:

-An address of something is wrong when loading chr or nametable data
-My bankswitching code has a bug or I just forgot to update a bank number in a table
-My vblank code is running too long and causing glitches by writing to the ppu outside of vblank
-Maybe I have a stack error (forgot a pla where I had a matching pha)
-Maybe I have a buffer overrun and I'm trampling on variables outside of an array (this could easily happen in C, too; there's no safety net there). With local stack space in C, that could even mess up return addresses...eek! *edit* I see you are not using local stack space in your function here, so this likely does not apply, though it could still apply to global arrays. *edit* To debug this last type of bug, you can experiment with moving arrays around in RAM or changing the size of buffers to see whether that changes the bug's behavior.
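
A variation on the last item is to surround a suspect buffer with guard bytes and check them at known points. The following is only a minimal sketch of that idea: the names `guarded_level`, `guards_init` and `guards_check` are hypothetical, and the 684-byte size is taken from the `LZ4_decompress_fast` call in `loadlevel()`. It assumes the level buffer can be moved into a struct.

```c
#include <assert.h>
#include <string.h>

#define GUARD_LEN  4
#define GUARD_BYTE 0xA5
#define LEVEL_LEN  684   /* size passed to LZ4_decompress_fast in loadlevel() */

/* Hypothetical layout: guard bytes directly before and after the buffer. */
struct guarded_level {
    unsigned char before[GUARD_LEN];
    unsigned char data[LEVEL_LEN];
    unsigned char after[GUARD_LEN];
};

static struct guarded_level level;

/* Fill both guard areas with the known pattern. */
void guards_init(void)
{
    /* The check only works if the three arrays are adjacent (no padding). */
    assert(sizeof(struct guarded_level) == 2 * GUARD_LEN + LEVEL_LEN);
    memset(level.before, GUARD_BYTE, GUARD_LEN);
    memset(level.after, GUARD_BYTE, GUARD_LEN);
}

/* Returns 0 if both guards are intact, bit 0 set if the guard before the
   buffer was overwritten, bit 1 set if the guard after it was overwritten. */
unsigned char guards_check(void)
{
    unsigned char i, result = 0;

    for (i = 0; i < GUARD_LEN; ++i) {
        if (level.before[i] != GUARD_BYTE) result |= 1;
        if (level.after[i] != GUARD_BYTE)  result |= 2;
    }
    return result;
}
```

Calling `guards_check()` right after decompression and again after gameplay, and signalling a non-zero result with a sound effect or palette change, would show whether anything writes past the buffer.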

*edit* I'm unfamiliar with neslib (assuming that is what you are using). Maybe you need to set a global vram address so the nmi routine resets the vram address and scroll correctly when the ppu is turned back on?

To debug, I can often reason about my code if it is organized enough. When it is enough of a mystery, I can often narrow down the problem with git bisect (assuming I've been tracking many small changes since the beginning of the project). *edit* Note that this only helps if the code was previously working and has mysteriously broken. If it's your first revision, all you can do is reason about the code you've written and try to narrow down the problem some other way, possibly by trying simpler cases first. Maybe just load the palette and nothing else and verify it works, then bankswitch and see if it works (use the FCEUX PPU and memory viewers), and keep introducing small bits of code until you observe the crash. That helps find which code is a suspect for the source of your bug.

I'd be reluctant to blame cc65 itself...I'll let C users here comment further.
Dwedit
Posts: 4924
Joined: Fri Nov 19, 2004 7:35 pm

Re: How to debug a nes crash?

Post by Dwedit »

From just the picture alone, it looks like it is repeatedly running the code to update graphics that would usually be in NMI.
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: How to debug a nes crash?

Post by calima »

Thanks for the suggestions. I don't think some of them apply, though: they're asm-specific, and I'm not using asm.

The bankswitch is not responsible. When I move it to the start of main, it still crashes here in the same way. The PPU is off in this function, and every CPU-heavy function has been tested under valgrind on the host PC.
GradualGames wrote:*edit* I'm unfamiliar with neslib (assuming that is what you are using). Maybe you need to set a global vram address so the nmi routine resets the vram address and scroll correctly when the ppu is turned back on?
No need for that, I believe. Here's the part of its NMI routine dealing with scrolling:

Code:

        lda #0
        sta PPU_ADDR
        sta PPU_ADDR

        lda <SCROLL_X
        sta PPU_SCROLL
        lda <SCROLL_Y
        sta PPU_SCROLL

        lda <PPU_CTRL_VAR
        sta PPU_CTRL
Bisect is not useful, since this is working code; I only just added a second level. Also, the Linux version of fceux does not have a memory or PPU viewer.

I've hit several valid cc65 compiler bugs, all of which I've reported and some are already fixed. It's the quality of this compiler that always leads me to suspect it first ;)
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: How to debug a nes crash?

Post by calima »

Dwedit wrote:From just the picture alone, it looks like it is repeatedly running the code to update graphics that would usually be in NMI.
That's not possible. Well, unless something corrupted the state it keeps for that in the ZP, which I have no way of testing :(
GradualGames
Posts: 1106
Joined: Sun Nov 09, 2008 9:18 pm
Location: Pennsylvania, USA
Contact:

Re: How to debug a nes crash?

Post by GradualGames »

calima wrote: I've hit several valid cc65 compiler bugs, all of which I've reported and some are already fixed. It's the quality of this compiler that always leads me to suspect it first ;)
Fascinating...further reinforces my recent decision to abandon C permanently and simply continue to build idioms (and make good non-overengineered use of macros) in 6502. The only suggestion I made that was asm specific was a stack error (specifically one involving a manual pha .... pla)---all those other things could easily happen in C code, just in different ways.
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: How to debug a nes crash?

Post by calima »

Well, I don't have any nmi/vblank code. It's all Shiru's, which is well tested, and I've used it myself in several games successfully.
dougeff
Posts: 3079
Joined: Fri May 08, 2015 7:17 pm

Re: How to debug a nes crash?

Post by dougeff »

There are many ways a game can crash...
-infinite loop
-NMI within an NMI
-stack overflow
-bank switch and program calling data from the wrong bank
-corrupted C library variables
-corrupted C stack

Are you able to run an emulator with debugging features in Linux?

My first step would be to look at the RAM and see if the stack ($0100-$01FF) is going crazy. Then I would export labels from the linker, use them to locate the probable error function, set a breakpoint in an emulator, and go through it step by step (it would help if you knew 6502 ASM).
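
Mapping a crash address back to a function through the exported labels can be sketched as a small host-side helper. This is only an illustration: the `label` struct and `nearest_label` are hypothetical names, and the table is assumed to have been filled in from whatever label or map file the linker produced.

```c
#include <assert.h>
#include <stddef.h>

/* One entry from the linker's label output (hypothetical in-memory form). */
struct label {
    unsigned addr;      /* CPU address of the label */
    const char *name;   /* symbol name */
};

/* Returns the name of the label with the highest address that is still
   <= pc, i.e. the routine the program counter most likely belongs to.
   Returns NULL if pc lies below every label. The table need not be sorted. */
const char *nearest_label(const struct label *labels, size_t count, unsigned pc)
{
    const char *best = NULL;
    unsigned best_addr = 0;
    size_t i;

    assert(labels != NULL || count == 0);
    for (i = 0; i < count; ++i) {
        if (labels[i].addr <= pc &&
            (best == NULL || labels[i].addr >= best_addr)) {
            best = labels[i].name;
            best_addr = labels[i].addr;
        }
    }
    return best;
}
```

Given a program counter or a return address read from the stack page, this narrows the search to one routine, which is usually enough to decide where to put a breakpoint.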
Last edited by dougeff on Thu Feb 04, 2016 11:22 am, edited 1 time in total.
nesdoug.com -- blog/tutorial on programming for the NES
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: How to debug a nes crash?

Post by calima »

AFAIK no emulator for Linux supports any debugging, though I've only tried the most popular ones. That's making this quite hard.
dougeff
Posts: 3079
Joined: Fri May 08, 2015 7:17 pm

Re: How to debug a nes crash?

Post by dougeff »

calima wrote:Well, I don't have any nmi/vblank code. It's all Shiru's, which is well tested, and I've used it myself in several games successfully.
(it looks to me like...)
Shiru's code uses the NMI handler to time VRAM/palette changes (and music) while rendering is on. With rendering off (after ppu_off()), it skips doing that in the NMI code and lets you make VRAM changes from the main code (but it will still enter the NMI code to do music updates).
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: How to debug a nes crash?

Post by tepples »

calima wrote:Also the Linux version of fceux does not have a memory or ppu viewer.
Unless you're using Linux on a non-x86 platform, you can do what I did: sudo apt-get install wine and then use the Windows version.
dougeff
Posts: 3079
Joined: Fri May 08, 2015 7:17 pm

Re: How to debug a nes crash?

Post by dougeff »

Well, it definitely looks to me like your game is repeatedly writing the same chunk of data to the PPU over and over while rendering is on. This would also give a 'scrolling' effect, as PPU changes also affect scroll position.

There's nothing in the code you posted that would indicate why this is so, because you have ppu_off(); near the top of the function.

Is there any other code that would also happen at level change, that writes to the PPU? Is it perhaps doing it while rendering is on?
GradualGames
Posts: 1106
Joined: Sun Nov 09, 2008 9:18 pm
Location: Pennsylvania, USA

Re: How to debug a nes crash?

Post by GradualGames »

Here's another guess---are you decompressing a level/screen etc. into RAM and then decoding that into vram writes? If so---perhaps your nes.cfg allocates too much RAM space to cc65 and this decompression is actually overwriting the local/cc65 stack. I recall the default nes.cfg uses 3 pages, which is kinda huge. I asked about this in "Efficiency of development process using C versus 6502" and the general response (rainwarrior's response actually) was that it can be much smaller, perhaps as small as 64 or even 32 bytes if you're careful. I imagine you already checked into this, just thought I'd throw that out there.
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: How to debug a nes crash?

Post by calima »

tepples wrote:Unless you're using Linux on a non-x86 platform, you can do what I did: sudo apt-get install wine and then use the Windows version.
Pure 64-bit, so Wine is not a possibility.
dougeff wrote:Is there any other code that would also happen at level change, that writes to the PPU? Is it perhaps doing it while rendering is on?
No, that is all. And it baffles me how I can run it several times over with no ill effects, but run after a second of gameplay, that function crashes.

It's very curious how the full chain works the first time. The outer loop literally goes

Code:

while (1) {
    loadlevel(foo);
    gameplay();
}
The same path works the first time, but fails the second.
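
One way to approach "works the first time, fails the second" is to compare the RAM contents just before each `loadlevel()` call, since whatever `gameplay()` left behind is the only difference between the two runs. The helper below is a hedged host-side sketch: `ram_diff` is a hypothetical name, and it assumes two equal-length RAM dumps can be obtained somehow, for example from an emulator that can save RAM or from save states.

```c
#include <assert.h>
#include <stddef.h>

/* Compare two RAM dumps of equal length.
   Returns the number of differing bytes. If at least one byte differs and
   first_diff is non-NULL, the offset of the first difference is stored
   there; otherwise *first_diff is left untouched. */
size_t ram_diff(const unsigned char *before, const unsigned char *after,
                size_t len, size_t *first_diff)
{
    size_t i, count = 0;

    assert(before != NULL && after != NULL);
    for (i = 0; i < len; ++i) {
        if (before[i] != after[i]) {
            if (count == 0 && first_diff != NULL)
                *first_diff = i;
            ++count;
        }
    }
    return count;
}
```

Differences inside the zero page or the library's own variables, rather than in the game's state, would be the interesting ones here.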
dougeff wrote:Well, it definitely looks to me like your game is repeatedly writing the same chunk of data to the PPU over and over while rendering is on. This would also give a 'scrolling' effect, as PPU changes also affect scroll position.
Why would the music stop in that case? It would only be possible if there was an infinite loop in the NMI itself, so it never reached the music code.
GradualGames wrote:Here's another guess---are you decompressing a level/screen etc. into RAM and then decoding that into vram writes? If so---perhaps your nes.cfg allocates too much RAM space to cc65 and this decompression is actually overwriting the local/cc65 stack. I recall the default nes.cfg uses 3 pages, which is kinda huge. I asked about this in "Efficiency of development process using C versus 6502" and the general response (rainwarrior's response actually) was that it can be much smaller, perhaps as small as 64 or even 32 bytes if you're careful. I imagine you already checked into this, just thought I'd throw that out there.
The decompression goes into a statically allocated BSS buffer. There are almost 200 bytes free for cc65's stack, and no function uses more than a couple of variables on the local stack. There are no deep call chains either. If it's stack corruption, it's not because of my code.
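
The stack-usage estimate could be checked rather than reasoned about by "painting" the stack area with a known byte and later measuring how much of it was overwritten. The sketch below shows the idea on a plain array. `stack_paint` and `stack_high_water` are hypothetical names, and it assumes a stack that grows downward from the end of the region; on the real target only the part of the region below the current stack pointer could safely be painted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAINT_BYTE 0xCD

/* Fill the (unused) stack region with the paint pattern. */
void stack_paint(unsigned char *region, size_t len)
{
    assert(region != NULL);
    memset(region, PAINT_BYTE, len);
}

/* For a stack growing downward from region + len, return how many bytes
   at the top of the region no longer hold the paint pattern, i.e. the
   deepest the stack has reached since painting. A stack byte that happens
   to equal PAINT_BYTE at the deepest point would make this read low. */
size_t stack_high_water(const unsigned char *region, size_t len)
{
    size_t untouched = 0;

    assert(region != NULL);
    while (untouched < len && region[untouched] == PAINT_BYTE)
        ++untouched;
    return len - untouched;
}
```

If the reported high-water mark comes close to the size of the region after a level has been played, the stack is a more likely suspect than the estimate suggests.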