How to debug a nes crash?
I have a ROM that reliably crashes. It's all written in C, so I can probably blame cc65, but I need to know why it crashes first.
The thing is, I'm on Linux. I have several NES emulators, but none of them offer debugging. And even if they did, my 6502 assembly skills are at the level of "I can read it if I google every single instruction and every bit of syntax I haven't seen before."
Sound stops playing. The screen looks like this (sometimes different colors, sometimes a black screen), but it keeps scrolling:
I can post the code and the ROM, but I'd also like to know how you'd approach this.
Re: How to debug a nes crash?
This is the function that crashes, but only in special circumstances.
When run the first time, everything is perfect, and I can run the function 10 times in a loop without a crash. I can tell it to load any level, and it successfully does so.
However, when I load level 0 and "win the level", moving to level 1, it reliably crashes in this function, in the final ppu_on_all call (tracked using sound effects). It still crashes if I tell it to always load level 0. Very strange.
Code:
void loadlevel(const u8 num) {
    static u8 x, y, q, i, scrollx, scrolly, t, tx, ty, qx, qy;
    static s16 winx, winy, winxmax, winymax;
    static u16 bigx, bigy, maxx, maxy;
    ppu_off();
    LZ4_decompress_fast(complevels[num], (char *) levelbuf, 684);
    memfill(attr, 0, 64);
    lapsleft = laps[num];
    timeleft = times[num];
    leveldir = directions[num];
    bankswitch(levelbank[num]);
    vram_adr(NAMETABLE_A);
    vram_fill(0, 1024);
    // Where do we start?
    goalx = checkfinish[num * 4 + 0];
    goaly = checkfinish[num * 4 + 1];
    checkpointx = checkfinish[num * 4 + 2];
    checkpointy = checkfinish[num * 4 + 3];
    // Aim the camera so it centers on the finish line.
    camx = goalx * 64 + (256 - 128 + 32);
    camy = goaly * 64 + (256 - 120 + 32);
    scrollx = camx % 256;
    scrolly = camy % 240;
    scroll(scrollx, scrolly);
    scrollxoff = scrollx / 8;
    scrollyoff = scrolly / 8;
    // Init area. Which tiles are visible, and at which positions?
    maxy = camy + 256;
    maxx = camx + 272;
    winx = (camx - 256) / 8;
    winy = (camy - 256) / 8;
    winxmax = winx + 32;
    winymax = winy + 30;
    if (maxy > 2304)
        maxy = 2304;
    if (maxx > 2304)
        maxx = 2304;
    bigy = camy;
    if (bigy < 256)
        bigy = 256;
    for (; bigy <= maxy; bigy += 64) {
        y = (bigy - 256) / 64;
        bigx = camx;
        if (bigx < 256)
            bigx = 256;
        for (; bigx <= maxx; bigx += 64) {
            x = (bigx - 256) / 64;
            gqx = x;
            gqy = y;
            q = getquad();
            x *= 4;
            for (i = 0; i < 16; ++i) {
                const u8 tile = quads[q][i];
                qx = x + i % 4;
                qy = y * 4 + i / 4;
                qx *= 2;
                qy *= 2;
                for (t = 0; t < 4; ++t) {
                    const u8 val = tiles[tile][t];
                    tx = qx + t % 2;
                    ty = qy + t / 2;
                    if (tx < winx || tx >= winxmax)
                        continue;
                    if (ty < winy || ty >= winymax)
                        continue;
                    // tx and ty are hw-tile world coordinates,
                    // 0-255. Where is our window?
                    tx -= winx;
                    ty -= winy;
                    // Now in window space. Where's the hw window?
                    tx = (tx + scrollxoff) % 32;
                    ty = ty + scrollyoff;
                    while (ty >= 30) ty -= 30;
                    // Place the tile
                    vram_adr(NTADR_A(tx, ty));
                    vram_put(val);
                }
            }
        }
    }
    // Load palette for level
    pal_bg(levelpal);
    pal_spr(sprpal);
    pal_bright(4);
    ppu_on_all();
}
- GradualGames
- Posts: 1106
- Joined: Sun Nov 09, 2008 9:18 pm
- Location: Pennsylvania, USA
- Contact:
Re: How to debug a nes crash?
In my experience, most of the time when I have a crash or glitchy screens, it's one of a few things:
-An address of something is wrong when loading chr or nametable data
-My bankswitching code has a bug or I just forgot to update a bank number in a table
-My vblank code is running too long and causing glitches by writing to the ppu outside of vblank
-Maybe I have a stack error (forgot a pla where I had a matching pha)
-Maybe I have a buffer overrun error and I'm trampling on variables outside of an array (this could easily happen in C, too, no safety net there). With local stack space in C, that could even mess up return addresses...eek! *edit* I see you are not using local stack space in your function, here, so this likely does not apply---could still apply with global arrays though. *edit* To debug this last type of bug, you can experiment with moving arrays around in RAM or changing the size of buffers to see if it affects the behavior of the bug.
*edit* I'm unfamiliar with neslib (assuming that is what you are using). Maybe you need to set a global vram address so the nmi routine resets the vram address and scroll correctly when the ppu is turned back on?
To debug, I can often reason about my code if it is organized enough. When it is enough of a mystery, I can often narrow the problem down with git bisect (assuming I've been tracking many small changes since the beginning of the project). *edit* Note that this only helps if the code was previously working and has mysteriously broken. If it's your first revision, all you can do is reason about the code you've written and try to narrow down the problem some other way, possibly by trying simpler cases first. Maybe just load the palette and nothing else and verify it works, then bankswitch and see if it works (use the FCEUX PPU and memory viewers), and keep introducing small bits of code until you observe the crash. That helps find which code is a suspect for the source of your bug.
I'd be reluctant to blame cc65 itself...I'll let C users here comment further.
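A rough sketch of the buffer-overrun check mentioned above, in C: put known guard bytes on both sides of a suspect global buffer and test them at a convenient point, for example once per frame. The buffer, its size, the guard value, and all names here are made up for illustration; none of them come from the project being discussed.

```c
#include <assert.h>
#include <string.h>

#define GUARD_VALUE 0xA5u  /* arbitrary pattern (assumption) */
#define GUARD_LEN   4u
#define BUF_LEN     16u    /* hypothetical buffer size */

/* Hypothetical suspect buffer with guard bytes on both sides.
   All members are unsigned char arrays, so no padding is expected. */
static struct {
    unsigned char before[GUARD_LEN];
    unsigned char data[BUF_LEN];
    unsigned char after[GUARD_LEN];
} guarded;

/* Fill both guard areas with the known pattern. Call once at startup. */
static void guard_init(void)
{
    memset(guarded.before, GUARD_VALUE, GUARD_LEN);
    memset(guarded.after, GUARD_VALUE, GUARD_LEN);
}

/* Returns 1 while both guard areas are intact, 0 once either is damaged. */
static int guard_intact(void)
{
    unsigned char i;

    for (i = 0; i < GUARD_LEN; ++i) {
        if (guarded.before[i] != GUARD_VALUE) return 0;
        if (guarded.after[i] != GUARD_VALUE) return 0;
    }
    return 1;
}

/* Debug-build convenience: stop as soon as a guard has been trampled. */
#define GUARD_CHECK() assert(guard_intact())
```

If something writes one byte past the end of data, it should land in after[0], and the next guard_intact() call returns 0. Moving the check earlier or later in the frame then helps narrow down which code did the write.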
Re: How to debug a nes crash?
From just the picture alone, it looks like it is repeatedly running the code to update graphics that would usually be in NMI.
Re: How to debug a nes crash?
Thanks for the suggestions. I don't think some of them apply, though: they're asm-specific, and I'm not using asm.
The bankswitch is not responsible. When I move it to the start of main, it still crashes here the same way. The PPU is off in this function, and every CPU-heavy function has been tested under valgrind on the host PC.
Bisect is not useful: this is working code, and I only just added a second level. Also, the Linux version of fceux does not have a memory or PPU viewer.
I've hit several valid cc65 compiler bugs, all of which I've reported, and some are already fixed. It's the quality of this compiler that always leads me to suspect it first.
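For the host-side testing mentioned above, here is a minimal sketch of what such a harness could look like for the tile placement in loadlevel(). It copies the window and wrap arithmetic from the posted inner loop into a standalone function and asserts that every placed tile lands inside the 32x30 nametable. The function names are hypothetical, and on a PC int is wider than cc65's 16-bit int, so this only checks the index arithmetic and says nothing about what the compiler generates for the NES.

```c
#include <assert.h>
#include <stdint.h>

/* Same index arithmetic as the inner loop of loadlevel(), pulled out so it
   can run on a PC. Returns 1 and stores the nametable cell if the tile is
   inside the window, 0 if the tile is skipped. */
static int map_tile(uint8_t tx, uint8_t ty, int16_t winx, int16_t winy,
                    uint8_t scrollxoff, uint8_t scrollyoff,
                    uint8_t *cellx, uint8_t *celly)
{
    if (tx < winx || tx >= winx + 32)
        return 0;
    if (ty < winy || ty >= winy + 30)
        return 0;
    tx = (uint8_t)(tx - winx);              /* now in window space */
    ty = (uint8_t)(ty - winy);
    tx = (uint8_t)((tx + scrollxoff) % 32); /* wrap horizontally */
    ty = (uint8_t)(ty + scrollyoff);
    while (ty >= 30)                        /* wrap vertically */
        ty -= 30;
    *cellx = tx;
    *celly = ty;
    return 1;
}

/* Tries every 8-bit tile coordinate for one window position and asserts
   that each placed tile is a valid nametable cell. Returns the number of
   tiles placed. */
static unsigned long check_window(int16_t winx, int16_t winy,
                                  uint8_t scrollxoff, uint8_t scrollyoff)
{
    unsigned long placed = 0;
    unsigned tx, ty;
    uint8_t cx, cy;

    for (ty = 0; ty < 256; ++ty) {
        for (tx = 0; tx < 256; ++tx) {
            if (map_tile((uint8_t)tx, (uint8_t)ty, winx, winy,
                         scrollxoff, scrollyoff, &cx, &cy)) {
                assert(cx < 32 && cy < 30);
                ++placed;
            }
        }
    }
    return placed;
}
```

Calling check_window() for each window position and scroll offset a level can produce, including negative winx and winy near the map edge, would at least rule out an out-of-range NTADR_A() argument as the cause.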
GradualGames wrote: *edit* I'm unfamiliar with neslib (assuming that is what you are using). Maybe you need to set a global vram address so the nmi routine resets the vram address and scroll correctly when the ppu is turned back on?
No need for that, I believe. Here's the part of its NMI routine dealing with scrolling:
Code:
lda #0
sta PPU_ADDR
sta PPU_ADDR
lda <SCROLL_X
sta PPU_SCROLL
lda <SCROLL_Y
sta PPU_SCROLL
lda <PPU_CTRL_VAR
sta PPU_CTRL
Re: How to debug a nes crash?
Dwedit wrote: From just the picture alone, it looks like it is repeatedly running the code to update graphics that would usually be in NMI.
That's not possible. Well unless something corrupted the state it keeps for that in the ZP. Which I have no way of testing.
- GradualGames
Re: How to debug a nes crash?
calima wrote: I've hit several valid cc65 compiler bugs, all of which I've reported and some are already fixed. It's the quality of this compiler that always leads me to suspect it first
Fascinating...further reinforces my recent decision to abandon C permanently and simply continue to build idioms (and make good non-overengineered use of macros) in 6502. The only suggestion I made that was asm specific was a stack error (specifically one involving a manual pha .... pla)---all those other things could easily happen in C code, just in different ways.
Re: How to debug a nes crash?
Well, I don't have any nmi/vblank code. It's all Shiru's, which is well tested, and I've used it myself in several games successfully.
Re: How to debug a nes crash?
There are many ways a game can crash...
-infinite loop
-NMI within an NMI
-stack overflow
-bank switch and program calling data from the wrong bank
-corrupted C library variables
-corrupted C stack
Are you able to run an emulator with debugging features in Linux?
My first step would be to look at the RAM and see if the stack ($0100-$01FF) is going crazy. Then I would export labels from the linker, use them to locate the probable error function, set a breakpoint in an emulator, and go through it step by step (it would help if you knew 6502 ASM).
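One way to make "is the stack going crazy" measurable is sketched below in C. It assumes two things that are not in this thread: the startup code fills the stack area with a known byte at reset, and you have some way to get a copy of that RAM (an emulator memory dump, say). Because the 6502 stack grows downward from $01FF, the bytes at the low end that still hold the fill pattern were never reached. The function name and the fill value are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* region[0] is the lowest address reserved for the hardware stack and
   region[len - 1] is $01FF. Returns how many bytes the stack has used at
   its deepest point, judged by the first byte from the low end that no
   longer holds the fill pattern. */
static size_t stack_high_water(const unsigned char *region, size_t len,
                               unsigned char fill)
{
    size_t untouched = 0;

    assert(region != NULL);
    while (untouched < len && region[untouched] == fill)
        ++untouched;
    return len - untouched;
}
```

A result equal to len suggests the stack ran through the whole region. The count can be slightly low if the deepest pushed byte happens to equal the fill value. If the runtime keeps other buffers in page 1, pass only the part actually reserved for the stack.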
Last edited by dougeff on Thu Feb 04, 2016 11:22 am, edited 1 time in total.
nesdoug.com -- blog/tutorial on programming for the NES
Re: How to debug a nes crash?
AFAIK no emulator for Linux supports any debugging, though I've only tried the most popular ones. That's making this quite hard.
Re: How to debug a nes crash?
calima wrote: Well, I don't have any nmi/vblank code. It's all Shiru's, which is well tested, and I've used it myself in several games successfully.
(it looks to me like...)
Shiru's code uses NMI code to time VRAM/palette changes (and music) while rendering is on. With rendering off (after ppu_off()), it skips doing that in the NMI code and lets you make VRAM changes from the main code (but it will still run the NMI code to do music updates).
Re: How to debug a nes crash?
calima wrote: Also the Linux version of fceux does not have a memory or ppu viewer.
Unless you're using Linux on a non-x86 platform, you can do what I did: sudo apt-get install wine and then use the Windows version.
Re: How to debug a nes crash?
Well, it definitely looks to me like your game is repeatedly writing the same chunk of data to the PPU over and over while rendering is on. This would also give a 'scrolling' effect, as PPU changes also affect scroll position.
There's nothing in the code you posted that would indicate why this is so, because you have ppu_off(); near the top of the function.
Is there any other code that would also happen at level change, that writes to the PPU? Is it perhaps doing it while rendering is on?
- GradualGames
Re: How to debug a nes crash?
Here's another guess---are you decompressing a level/screen etc. into RAM and then decoding that into vram writes? If so---perhaps your nes.cfg allocates too much ram space to CC65 and this decompression is actually overwriting local/cc65 stack. I recall the default nes.cfg uses 3 pages, which is kinda huge. I asked about this in Efficiency of development process using C versus 6502 and the general response (rainwarrior's response actually) was it can be much smaller, perhaps as small as 64 or even 32 if you're careful. I imagine you already checked into this, just thought I'd throw that out there.
Re: How to debug a nes crash?
tepples wrote: Unless you're using Linux on a non-x86 platform, you can do what I did: sudo apt-get install wine and then use the Windows version.
Pure 64-bit, so Wine is not a possibility.
dougeff wrote: Is there any other code that would also happen at level change, that writes to the PPU? Is it perhaps doing it while rendering is on?
No, that is all. And it baffles me how I can run it several times over with no ill effects, but when it runs after a second of gameplay, that function crashes.
It's very curious how the full chain works the first time. The outer loop literally goes
Code:
while (1) {
    loadlevel(foo);
    gameplay();
}
dougeff wrote: Well, it definitely looks to me like your game is repeatedly writing the same chunk of data to the PPU over and over while rendering is on. This would also give a 'scrolling' effect, as PPU changes also affect scroll position.
Why would the music stop in that case? It would only be possible if there was an eternal loop in the NMI itself, so it never reached the music code.
GradualGames wrote: Here's another guess---are you decompressing a level/screen etc. into RAM and then decoding that into vram writes? If so---perhaps your nes.cfg allocates too much ram space to CC65 and this decompression is actually overwriting local/cc65 stack. I recall the default nes.cfg uses 3 pages, which is kinda huge. I asked about this in Efficiency of development process using C versus 6502 and the general response (rainwarrior's response actually) was it can be much smaller, perhaps as small as 64 or even 32 if you're careful. I imagine you already checked into this, just thought I'd throw that out there.
The decompression goes into a statically allocated BSS buffer. There's almost 200 bytes free for cc65's stack, and no function uses more than a couple of variables on the local stack. No deep call chains either. If it's stack corruption, it's not because of my code.