Got any tips for Early NES Emulator Development?


Post by MottZilla » Tue Mar 11, 2008 5:01 pm

A long time ago (something like 8 years ago) I thought it would be really cool to write an emulator. Of course I didn't know anything back then, but now I actually have a decent understanding of both the NES and programming.

I had made two previous attempts at writing an NES emulator, neither of which went anywhere; both were before I'd ever written any ASM for the NES. Now I'm making another attempt. I believe I understand the concept, and I've got a CPU interpreter partially written. By partially I mean not every opcode is there yet, and I can't be 100% sure the ones that are there are all correct.

But everything I've done so far was enough that I see some life. In my little "Pong" demo I wrote ages ago for the NES, it shows the name tables being written to with some hacky graphics emulation. I also can see the menu of that NESTEST rom. But I have some confusion as to what could be causing one issue.

I started the CPU core pretty simple: I made it reset to the vector and "fetch" an opcode. It would then either execute it if I supported it or tell me what it was if I didn't. I used Donkey Kong (JU) as my test ROM. I would repeatedly load the game and let it run until it hit an opcode I hadn't supported yet, add support for that opcode, and repeat. I would "try" to verify the opcodes were working the way they should by comparing with other emulators.

I eventually got the thing to just loop forever, as it wouldn't hit any unsupported opcodes. I figured out (I think so, at least) that the game was just waiting around for NMI, or maybe for an NES register to return a value. After adding NMI I got to a bunch more opcodes. I verified that the addresses it was executing were real code and not errors by tracing in a completed emulator (FCEU). Eventually I was back to the same thing: the game would run and never hit any unsupported opcodes.
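The fetch-and-report loop described above can be sketched roughly as follows. This is only an illustration in C++, not MottZilla's actual code: the `Cpu` struct and the `read8`, `reset`, and `step` functions are hypothetical names, memory is assumed to be a flat 64 KiB array with no mirroring or mapper, and only two opcodes are handled.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical minimal CPU state; names are illustrative, not from the thread.
struct Cpu {
    uint8_t  a = 0, x = 0, y = 0, p = 0x24, sp = 0xFD;
    uint16_t pc = 0;
    uint8_t  mem[0x10000] = {};   // flat 64 KiB, no mirroring or mappers (assumption)
};

inline uint8_t read8(Cpu& c, uint16_t addr) { return c.mem[addr]; }

// Load PC from the reset vector at $FFFC/$FFFD (low byte first).
inline void reset(Cpu& c) {
    c.pc = static_cast<uint16_t>(read8(c, 0xFFFC) | (read8(c, 0xFFFD) << 8));
}

// Execute one instruction. Returns false when the opcode is not handled yet.
inline bool step(Cpu& c) {
    const uint16_t at = c.pc;
    const uint8_t op = read8(c, c.pc++);
    switch (op) {
    case 0xEA:  // NOP
        return true;
    case 0xA9:  // LDA #immediate: load A, then update N and Z only
        c.a = read8(c, c.pc++);
        c.p = static_cast<uint8_t>((c.p & ~0x82) | (c.a & 0x80) | (c.a == 0 ? 0x02 : 0));
        return true;
    default:
        std::printf("Unsupported opcode $%02X at $%04X\n", op, at);
        c.pc = at;  // leave PC pointing at the offending opcode
        return false;
    }
}
```

Calling `step` in a loop until it returns false gives the "run until the next missing opcode" workflow; each new `case` then extends how far the test ROM gets.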

I went to implement the "graphics" emulation if you want to call it that, but it's really just a function to draw what values in the NameTable 0 are on screen. I could see Donkey Kong and Donkey Kong Jr. both cleared the name table to all #$24, the "Blank" tile for the games. I also could see the nestest and my pong demo. This is the point I'm at now.

My problem is that Donkey Kong and DK Jr. never write anything else to the screen, and the program counter is looping in the same range. I'm going to investigate what it's looping on to try to figure out what to do next. Update: It wasn't looping like I had thought at one point. For all I know the game is running, though I'm not sure why I cannot see anything plotted on the nametable other than the blank tiles.



Now the point of this topic: what advice do you have for writing an NES emulator? Keep in mind I specifically mean advice for someone who is new to writing an NES emulator. For example, what games or NES programs would be good targets to get working first? I have read on here before that many make the mistake of thinking that getting Super Mario Brothers running is the easiest. I assume a game like Donkey Kong would be easier: no scrolling, no sprite 0 hit, no ROM banking, etc.

So anyone got any tips?


Post by Disch » Tue Mar 11, 2008 8:50 pm

1) write a tracer (if you haven't already). Bugs and timing issues are next to impossible to solve without one.

2) get and pass as many CPU test ROMs as you can. nestest.nes is particularly nice because it doesn't require a PPU to be emulated in order to run it.

3) This is in the same vein as #2 -- but focus on the CPU first. CPU bugs will usually be the biggest reason for emu muckups. PPU bugs will usually cause graphical glitches, APU bugs will cause sound glitches, but CPU bugs can and will cause every possible kind of glitch making them very hard to diagnose.

4) Pointers are your friend. NT mirroring, CHR/PRG swapping, and tons of other things can be handled painlessly and easily with proper use of pointers. Bulk memory copying is slow and cumbersome.

that's about all I can think of.
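To illustrate point 4, here is a minimal C++ sketch of nametable mirroring done with pointers instead of copying. It assumes the standard 2 KiB of internal nametable RAM and only the two basic mirroring modes; the `NameTables` struct and its members are hypothetical names, and this is not Disch's implementation.

```cpp
#include <array>
#include <cstdint>

enum class Mirroring { Horizontal, Vertical };

// Hypothetical helper: four logical nametables mapped onto 2 KiB of RAM.
struct NameTables {
    std::array<uint8_t, 0x800> vram{};   // two physical 1 KiB tables
    std::array<uint8_t*, 4>    nt{};     // one pointer per logical table

    void setMirroring(Mirroring m) {
        uint8_t* a = vram.data();
        uint8_t* b = vram.data() + 0x400;
        if (m == Mirroring::Vertical) nt = {a, b, a, b};   // $2000=$2800, $2400=$2C00
        else                          nt = {a, a, b, b};   // $2000=$2400, $2800=$2C00
    }

    // addr is a PPU address in $2000-$2FFF.
    uint8_t& at(uint16_t addr) {
        return nt[(addr >> 10) & 3][addr & 0x3FF];
    }
};
```

Switching mirroring is then four pointer assignments, and every read or write goes through the same indexing expression regardless of mode. CHR/PRG bank switching can follow the same pattern with pointers into the ROM image.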


Post by MottZilla » Wed Mar 12, 2008 12:58 am

Well I'll be doing what you suggest, and adding a tracer of some kind so I can better understand what is going on. I agree that it's virtually impossible to understand what is going on without something more than a few values displayed on screen.
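A tracer does not need to be elaborate: one formatted line per instruction, written before the instruction executes, is often enough to diff against another emulator's log. The C++ sketch below is one possible layout (program counter, raw instruction bytes, then registers). The `Regs` struct and `traceLine` function are hypothetical names, and the column layout is an assumption rather than a match for any particular emulator's log format.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical register snapshot taken just before an instruction runs.
struct Regs { uint16_t pc; uint8_t a, x, y, p, sp; };

// Format one trace line: PC, up to three raw instruction bytes, registers.
// `len` is the instruction length in bytes (1 to 3).
inline std::string traceLine(const Regs& r, const uint8_t* bytes, int len) {
    static const char digits[] = "0123456789ABCDEF";
    std::string raw(8, ' ');                 // fixed-width "XX XX XX" column
    for (int i = 0; i < len && i < 3; ++i) {
        raw[i * 3]     = digits[bytes[i] >> 4];
        raw[i * 3 + 1] = digits[bytes[i] & 0x0F];
    }
    char line[64];
    std::snprintf(line, sizeof line,
                  "%04X  %s  A:%02X X:%02X Y:%02X P:%02X SP:%02X",
                  r.pc, raw.c_str(), r.a, r.x, r.y, r.p, r.sp);
    return line;
}
```

Keeping the columns fixed-width means two logs can be compared with an ordinary text diff, and the first differing line points at the first instruction that misbehaved.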


Post by MottZilla » Wed Mar 12, 2008 6:55 pm

And you were right, Disch. After writing a disassembling tracer and reading the log, I found two bugs that were preventing the title screen from appearing. For one, my NMI was not executing correctly. It was going to NMI vector + 1, but the odd thing is I have no idea why. I ended up fixing it by making the NMI handler do all of its work and then return from the function immediately, rather than proceed through the rest of the CPU core.

The other bug: I forgot to update the flags on opcode $B1 (LDA (indirect),Y), which meant it never took a branch and never loaded the title. Now I can read "DONKEY KONG" in my text output of the nametable, with marked tiles where all the text and such would be. So ironing out my CPU certainly will be top priority.
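For reference, the NMI sequence itself is short: push PC high, push PC low, push the status byte with the B bit clear, set the I flag, and load PC from $FFFA/$FFFB. The C++ sketch below follows that sequence and returns straight away, as in the fix described above. One plausible cause of a "vector + 1" symptom is a fetch increment elsewhere in the core running after PC has been loaded, but that is a guess, not something the thread confirms. The `Cpu` struct, `push`, and `serviceNmi` are hypothetical names and memory is assumed flat.

```cpp
#include <cstdint>

// Hypothetical minimal state; memory is a flat array (assumption).
struct Cpu {
    uint16_t pc = 0;
    uint8_t  sp = 0xFD, p = 0x24;
    uint8_t  mem[0x10000] = {};
};

inline void push(Cpu& c, uint8_t v) {
    c.mem[0x0100 + c.sp] = v;   // the stack lives in page 1
    --c.sp;
}

// Service a pending NMI between instructions. Returns the cycles consumed.
// The caller should not fetch or increment anything else in the same step.
inline int serviceNmi(Cpu& c) {
    push(c, static_cast<uint8_t>(c.pc >> 8));             // return address, high byte
    push(c, static_cast<uint8_t>(c.pc & 0xFF));           // return address, low byte
    push(c, static_cast<uint8_t>((c.p | 0x20) & ~0x10));  // status: bit 5 set, B clear
    c.p |= 0x04;                                          // set the I flag
    c.pc = static_cast<uint16_t>(c.mem[0xFFFA] | (c.mem[0xFFFB] << 8));
    return 7;
}
```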


Post by kyuusaku » Wed Mar 12, 2008 7:49 pm

Keep everything modular, flexible, and try to emulate at the lowest level you can manage; it will pay off later on in accuracy and probably speed after optimization.


Post by Disch » Wed Mar 12, 2008 8:12 pm

Some more advice:

Don't copy/paste or have multiple copies of the same code. You shouldn't be having a problem with a single opcode not setting flags, because that opcode probably should be sharing code with other similar opcodes.

That is to say... if you have LDA immediate, and ADC absolute, LDA absolute should not require any additional code other than an additional 'case' statement to tie together your 'LDA' code and your 'absolute' code. This way if you have a problem with LDA it will become visible much sooner and be much more obvious that you have a bug (meaning you can correct it sooner -- hidden bugs that you have to coax out are less preferable than bugs that stand up and shout at you)

Plus if you find and fix a bug or make a technical correction -- if all opcodes share the same code you only have to make the change once, rather than updating a dozen opcodes. Like if you found out you were wrapping Indirect,Y incorrectly -- it's much easier to just change your Indirect,Y code than it would be to update and fix every single opcode that uses Indirect,Y.
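As an example of the kind of detail that is easy to get wrong once per opcode, here is a C++ sketch of a single shared Indirect,Y address calculation. The pointer is fetched from zero page, and the fetch of the high byte wraps within zero page (a pointer at $FF takes its high byte from $00, not $0100). The function name and the flat `mem` array are assumptions for illustration.

```cpp
#include <cstdint>

// Compute the effective address for (zp),Y. `mem` is assumed to be a flat
// 64 KiB array. `pageCrossed` reports whether adding Y crossed a page,
// which costs read instructions one extra cycle.
inline uint16_t addrIndirectY(const uint8_t* mem, uint8_t zp, uint8_t y,
                              bool& pageCrossed) {
    const uint8_t lo = mem[zp];
    const uint8_t hi = mem[static_cast<uint8_t>(zp + 1)];  // wraps $FF -> $00
    const uint16_t base = static_cast<uint16_t>(lo | (hi << 8));
    const uint16_t addr = static_cast<uint16_t>(base + y);
    pageCrossed = (base & 0xFF00) != (addr & 0xFF00);
    return addr;
}
```

With one function like this, every opcode that uses Indirect,Y (LDA, ADC, CMP, and so on) picks up a fix at the same time.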

This kind of thing can be accomplished pretty well by function inlining. Here's a snippet of my ADC absolute code to give the idea:

Code:

void NES_INLINE AdRdAb(NESCPU& cpu)				// Absolute
{
	cpu.adr = Rd(cpu.PC); ++cpu.PC;				// cycle 1
	cpu.adr |= (Rd(cpu.PC) << 8); ++cpu.PC;		// cycle 2
	cpu.val = Rd(cpu.adr);						// cycle 3
}

...

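void NES_INLINE _ADC(NESCPU& cpu)
{
	u16 tmp = cpu.A + cpu.val + (cpu.fC != 0);			// 9-bit sum including carry-in
	cpu.fC = (tmp >= 0x0100);							// carry out of bit 7
	cpu.fV = (tmp ^ cpu.A) & (tmp ^ cpu.val) & 0x80;	// result sign differs from both operands
	cpu.fN = cpu.fZ = cpu.A = (u8)tmp;					// fN/fZ hold the result byte (presumably tested later)
}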

...


		// the ops
		op = Read(mCPU->PC);
		++mCPU->PC;

		switch(op)
		{
...
			/* ADC	*/
		case 0x69:	AdRdIm(*mCPU);	_ADC(*mCPU);		break;
		case 0x65:	AdRdZp(*mCPU);	_ADC(*mCPU);		break;
		case 0x75:	AdRdZx(*mCPU);	_ADC(*mCPU);		break;
		case 0x6D:	AdRdAb(*mCPU);	_ADC(*mCPU);		break;
		case 0x7D:	AdRdAx(*mCPU);	_ADC(*mCPU);		break;
		case 0x79:	AdRdAy(*mCPU);	_ADC(*mCPU);		break;
		case 0x61:	AdRdIx(*mCPU);	_ADC(*mCPU);		break;
		case 0x71:	AdRdIy(*mCPU);	_ADC(*mCPU);		break;
...


A trick for cycle tallying:

With the exception of stack opcodes like PHA, RTI, etc., and one or two oddball instructions (JMP), the addressing mode directly dictates the number of cycles any given instruction uses. That is, Absolute always uses 4 cycles... Zero Page always uses 3... Indirect,Y always uses 5+1 cycles, etc.

Note this is true for read-only instructions like LDA, ADC, CMP, and for write instructions like STA, STX, STY. Read/modify/write instructions like INC, ASL use different numbers, but are still just as predictable: Absolute always uses 6 cycles, Zero Page always uses 5, etc.

I have three sets of addressing mode functions. One for read-only ops, one for write ops, and one for read/modify/write ops. You can then use these functions to tally your CPU cycles rather than building and using a lookup table.


Of course this method isn't really better or worse than using a lookup table. I just found that building the lookup table is time consuming and dull.. and it's very easy to make an ever-so-subtle mistake that will be really hard to find.

(In case you were wondering I don't tally cycles this way in my above code -- I actually just tally a cycle in my Rd() and Wr() functions -- and perform all the dummy reads and writes performed by instructions. This works because the CPU either reads or writes a byte for every cycle an instruction takes)
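The "tally a cycle per bus access" idea can be sketched in C++ as below. This is not Disch's code: the `Bus` struct and the two instruction functions are hypothetical, memory is a flat array, and flag updates are left out to keep the focus on cycle counting. It relies on the point made above, that every cycle of an instruction is either a read or a write.

```cpp
#include <cstdint>

// Hypothetical bus: every access costs exactly one CPU cycle.
struct Bus {
    uint8_t  mem[0x10000] = {};
    uint64_t cycles = 0;
    uint8_t rd(uint16_t a)            { ++cycles; return mem[a]; }
    void    wr(uint16_t a, uint8_t v) { ++cycles; mem[a] = v; }
};

// LDA absolute: opcode fetch + 2 address bytes + 1 data read = 4 cycles.
inline uint8_t ldaAbsolute(Bus& b, uint16_t& pc) {
    b.rd(pc++);                                        // opcode fetch
    uint16_t addr = b.rd(pc++);                        // address low
    addr |= static_cast<uint16_t>(b.rd(pc++) << 8);    // address high
    return b.rd(addr);                                 // operand read
}

// INC absolute: a read/modify/write op. The old value is written back
// once (the dummy write) before the new value, for 6 cycles in total.
inline void incAbsolute(Bus& b, uint16_t& pc) {
    b.rd(pc++);
    uint16_t addr = b.rd(pc++);
    addr |= static_cast<uint16_t>(b.rd(pc++) << 8);
    const uint8_t v = b.rd(addr);
    b.wr(addr, v);                                     // dummy write
    b.wr(addr, static_cast<uint8_t>(v + 1));           // real write
}
```

No lookup table is involved: the totals of 4 and 6 cycles fall out of the accesses each function performs.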


Post by MottZilla » Wed Mar 12, 2008 10:10 pm

I've gotten further with the CPU test ROM, though I am getting some error codes. I'm planning on redoing a lot of the CPU code as you suggested, because as I was writing it I found that I could probably make things a lot neater and reuse the same code rather than copying and pasting it a million times. :p

As far as accuracy goes, I don't need or intend for it to be perfect or close to that. I think that'd be a pretty big goal for a first try, and I'm not really looking to somehow top the great emulators others have created.

What I've been doing is not like what you've suggested, where CPU instructions are broken down into individual cycles. I've got it set up so it executes each instruction and increases the clock counter. Right now my biggest concern is a working CPU core anyway, so it doesn't have to be perfect, it just has to work. I think that will require, or at least go better with, restructuring things around opcodes and addressing modes rather than handling each opcode number individually.

I'll probably have some questions about the CPU too. The test ROM has certainly raised some strange issues. For example, it says something about overflow and carry not being affected by INX and DEX, but the 6502.txt document says those flags are unaffected anyway.


Post by Disch » Thu Mar 13, 2008 8:52 am

MottZilla wrote: As far as accuracy goes, I don't need or intend for it to be perfect or close to that. I think that'd be a pretty big goal for a first try, and I'm not really looking to somehow top the great emulators others have created.
I wouldn't recommend you try for a high level of accuracy on your first go, either. I've rewritten my emu a few times now and each time I've made changes based on what I learned from my past attempts.

There's no way you'll be able to plan for and work around every issue that comes up your first time out of the gate -- so I wouldn't worry too much about it. But that being said... I would say try to get things as accurate as you feel comfortable with. Details which might seem relatively insignificant can sometimes cause some games to go horribly wrong. But at the same time, it's not worth killing yourself over every little detail until you have a better grasp on things. You'll have to find some middle ground that you're happy with.

In short: keep doing what you're doing =P. You may get the urge to go back and rewrite later... but when you do you'll be amazed at how much easier it is.

MottZilla wrote: I've got it set up so it executes each instruction and increases the clock counter.
That's totally fine.

MottZilla wrote: I'll probably have some questions about the CPU too. The test ROM has certainly raised some strange issues. For example, it says something about overflow and carry not being affected by INX and DEX, but the 6502.txt document says those flags are unaffected anyway.
If the test ROM is yelling at you for that, it may be because you're changing C or V on INX/DEX when you shouldn't be (INX/DEX only change N and Z... other flags should not be changed from their previous state)
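A small C++ sketch of that rule: INX and DEX go through one shared helper that clears and re-derives only N and Z, so C, V, and every other bit of the status byte pass through untouched. The helper name `setNZ` and the use of a packed status byte are assumptions for illustration.

```cpp
#include <cstdint>

constexpr uint8_t FLAG_Z = 0x02;
constexpr uint8_t FLAG_N = 0x80;

// Update only N and Z from a result byte; all other status bits are kept.
inline void setNZ(uint8_t& p, uint8_t result) {
    p = static_cast<uint8_t>(p & ~(FLAG_N | FLAG_Z));
    if (result == 0)   p |= FLAG_Z;
    if (result & 0x80) p |= FLAG_N;
}

inline void inx(uint8_t& x, uint8_t& p) { ++x; setNZ(p, x); }  // wraps $FF -> $00
inline void dex(uint8_t& x, uint8_t& p) { --x; setNZ(p, x); }  // wraps $00 -> $FF
```

If a test still reports C or V changing, the culprit is likely elsewhere, for example an earlier instruction leaving those flags in the wrong state before INX or DEX runs.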

Also -- for a quick reference, I would recommend obelisk over 6502.txt:

http://www.obelisk.demon.co.uk/6502/reference.html

6502.txt does a better job at giving details of what each instruction does... but when it comes to opcode listings and other reference stuff it has a few typos which can be a real pain.

Still use 6502.txt if it's easier for you, but cross-check the info with that obelisk page to make sure you're not assigning an instruction to the wrong opcode or something like that.


Post by MottZilla » Thu Mar 13, 2008 12:15 pm

Well, the thing about INX and DEX is that, as far as I could tell, I was only changing N and Z. I didn't touch C or V. But I'm now rewriting all the instructions to use addressing mode functions which are then paired with instruction functions, e.g. LDA(AM_Immediate()). I don't think it's as fast in execution as it could be, but I'm more concerned with readability and maintainability of the code.

From what you say it sounds like I'm on the right track. I'll start using the obelisk page to cross-reference, as before I relied solely on 6502.txt, which has some typos that were probably introduced by running paper through a machine to translate it into a text file. Things like capital Ds being 0s.

So far as I've been changing over from the code in the switch() case to functions I haven't broken anything. This approach seems much better than the initial one. You definitely learn as you go what works and what doesn't or what is better.


Update: As I've gone converting my core into functions, I've fixed a number of instructions that were not working properly. As a result, Donkey Kong and Donkey Kong Jr. (my test games) now proceed past the title screen and I actually see the name table is loaded with their first levels. I'm still converting and verifying the instructions I've done, and after they are all converted I believe I have a few more to add. I'm still using the CPU test ROM and I still get errors on certain things, but I'm sure I'll iron that all out eventually.

I do have a question though. The CPU test error codes often have entries that don't say much more than which instruction failed. For example, I think the last ADC Immediate test fails for me, but it doesn't tell me why.


Post by Disch » Thu Mar 13, 2008 8:22 pm

yeah nestest doesn't really get into specifics. All I can say is double check your ADC code and make sure you're setting flags properly.

V, in particular, tends to give people the most trouble. For ADC, V is set when:

positive + positive = negative
or
negative + negative = positive

and is cleared on all other cases. Another way to think of how V works is to look at the signed number range... V does for signed numbers what C does for unsigned:

unsigned range = 0 to 255 ($00-$FF)
signed range = -128 to 127 ($80-$7F)

Just as C is set when the unsigned addition produces a number higher than 255 ($FF), V is set when the signed addition produces a number outside the signed range, that is, higher than 127 ($7F) or lower than -128 ($80).
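The rule above can be checked with a few concrete additions. The C++ sketch below computes the result, C, and V for an 8-bit add with carry-in; `AdcResult` and `adc8` are hypothetical names, and the V expression is the usual "result sign differs from both operand signs" test.

```cpp
#include <cstdint>

struct AdcResult { uint8_t value; bool carry; bool overflow; };

// 8-bit add with carry-in, returning the result plus the C and V flags.
inline AdcResult adc8(uint8_t a, uint8_t m, bool carryIn) {
    const unsigned sum = a + m + (carryIn ? 1u : 0u);
    const uint8_t r = static_cast<uint8_t>(sum);
    return { r,
             sum > 0xFF,                              // unsigned overflow
             ((a ^ r) & (m ^ r) & 0x80) != 0 };       // signed overflow
}
```

A few worked cases: $50 + $50 = $A0 gives C clear and V set (positive + positive = negative); $D0 + $90 = $60 gives C set and V set (negative + negative = positive); $50 + $10 = $60 leaves both clear; $FF + $01 = $00 sets C but not V, because as signed numbers it is -1 + 1 = 0.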


Post by MottZilla » Thu Mar 13, 2008 8:59 pm

Can you spot anything wrong with this code?

Code:

fixed
When I replaced either one of the ADCs or the SBCs with the function call instead of the garbage I had in the case, Donkey Kong no longer shows the correct name table setup for the title and the level. I'm not sure why it broke.
Last edited by MottZilla on Thu Mar 13, 2008 9:50 pm, edited 2 times in total.


Post by Dwedit » Thu Mar 13, 2008 9:14 pm

Code:

   if( (CPU_A + Value + Carry ) < CPU_A)   // Check if Carry will Result 
WTF. How is that supposed to work?
Shouldn't this be something like this?

Code:

if (CPU_A + value + carry >= 256)


Post by MottZilla » Thu Mar 13, 2008 9:19 pm

I believe the idea was that, if you have an 8-bit value and you add a number to it, and the number you end up with is less than what you started with, then you wrapped around. However, when I went back to my original code, which does it with an int and a >0xff, it is working again. I guess I was thinking that by adding a bunch of 8-bit values together it would wrap and never be greater than 0xff. I dunno. I'm tired I guess. :p Thanks for pointing that out. Anyway, I took my newer code and fixed that fuckup. Everything is fine now.


Post by hap » Fri Mar 14, 2008 6:38 am

It's ok, I made that thinking-error too once ;p. It would only work if you stored it into an 8 bit variable before the "if". Still though, it won't work that way, since if A=$FF, value=$FF, carry is set, result would be the same as A, but cause a carry anyway.
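The three variants discussed here can be put side by side in C++. The sketch assumes all operands are 8-bit unsigned values, as the later code in the thread suggests; the function names are hypothetical. In C and C++ the operands are promoted to `int` before the addition, which is why the first form never detects a carry at all.

```cpp
#include <cstdint>

// Form from the quoted line: the sum is computed as an int, so it is
// never smaller than a. This never reports a carry.
inline bool carryByIntCompare(uint8_t a, uint8_t v, uint8_t c) {
    return (a + v + c) < a;
}

// Truncate to 8 bits first, then compare. Works for most inputs but
// misses the case where the sum wraps all the way back around to a.
inline bool carryByByteCompare(uint8_t a, uint8_t v, uint8_t c) {
    const uint8_t r = static_cast<uint8_t>(a + v + c);
    return r < a;
}

// Compare the wide sum against $FF. Correct for all inputs.
inline bool carryByWideSum(uint8_t a, uint8_t v, uint8_t c) {
    return (a + v + c) > 0xFF;
}
```

For A=$FF, value=$FF with carry set, the wide sum is $1FF, so a carry occurs, yet the truncated result $FF equals A and the byte comparison reports no carry.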


Post by MottZilla » Fri Mar 14, 2008 4:00 pm

I believe I have all the opcodes implemented now. However, I'm getting error codes with nestest. Most if not all refer to the final SBC error code in their respective blocks. Does anyone know what that means? These are my ADC and SBC functions; if there's some error I've overlooked, let me know.

Code:

void ADC(unsigned char Value)
{
	unsigned char Carry=CPU_P&0x01;
	// Check for Carry
	if( (CPU_A + Value + Carry ) > 0xFF)	// Check if Carry will Result
	{
		CPU_SETC=1;
	}
	else
	{
		CPU_SETC=0;
	}

	// Check for Zero
	if( (CPU_A + Value + Carry)==0 )		// Check if Zero will Result, Set Flag accordingly.
	{
		SetZero();
	}
	else
	{
		ClearZero();
	}

	// Check for Overflow
	CPU_TEMP=CPU_A + Value + Carry;
	CPU_SETV=0;
	if(!((CPU_A ^ Value)&0x80) && !((CPU_A ^ CPU_TEMP)&0x80))
		CPU_SETV=1;
	if(CPU_SETV)
	{
		SetOverflow();
	}
	else
	{
		ClearOverflow();
	}

	// Do ADC Operation
	CPU_A = CPU_A + Value + Carry;

	if(CPU_SETC==1)
	{
		SetCarry();
	}
	else
	{
		ClearCarry();
	}

}

void SBC(unsigned char Value)
{
	ADC(Value ^ 0xFF);
}
Value is returned by the appropriate address mode function.
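Reading the posted ADC, three things look suspect: the zero test is made on the full-width sum rather than on the low 8 bits (so $FF + $01 is never seen as zero), the overflow condition negates its second test (it fires when the result sign matches A instead of when it differs), and the N flag is never updated. The C++ sketch below is one way those points could be addressed. It is not taken from the thread: `CPU_A` and `CPU_P` reuse the names from the post but are declared here as assumed 8-bit globals, `setFlag` and the `FLAG_*` constants are hypothetical, and decimal mode is ignored since the NES CPU does not implement it.

```cpp
#include <cstdint>

// Assumed globals mirroring the names used in the post.
uint8_t CPU_A = 0;
uint8_t CPU_P = 0x24;

constexpr uint8_t FLAG_C = 0x01, FLAG_Z = 0x02, FLAG_V = 0x40, FLAG_N = 0x80;

// Hypothetical helper: set or clear one bit of the status register.
inline void setFlag(uint8_t flag, bool on) {
    CPU_P = on ? static_cast<uint8_t>(CPU_P | flag)
               : static_cast<uint8_t>(CPU_P & ~flag);
}

void ADC(uint8_t value) {
    const unsigned sum = CPU_A + value + (CPU_P & FLAG_C);   // wide sum with carry-in
    const uint8_t result = static_cast<uint8_t>(sum);        // low 8 bits only
    setFlag(FLAG_C, sum > 0xFF);
    setFlag(FLAG_Z, result == 0);                            // test the 8-bit result
    setFlag(FLAG_N, (result & 0x80) != 0);
    setFlag(FLAG_V, ((CPU_A ^ result) & (value ^ result) & 0x80) != 0);
    CPU_A = result;
}

// SBC is ADC of the one's complement; carry set means "no borrow".
void SBC(uint8_t value) {
    ADC(static_cast<uint8_t>(value ^ 0xFF));
}
```

Because SBC is built on ADC, any flag mistake in ADC shows up in both, which would be consistent with the SBC tests failing across several blocks.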
