Assembly Question

laughy
Posts: 41
Joined: Wed Nov 17, 2004 12:34 pm

Post by laughy

Hey peeps, a quick assembly question (x86!!)

According to the x86 docs, a memory read or write takes 1 cycle (provided the data is in the cache).

My question is: if I have to save a register, would it be faster to use a single temporary int memory location throughout my program for this purpose instead of the stack?

a push takes 1 cycle
a pop takes like 4 cycles

but a write plus a read of a memory location takes 1 + 1 = 2

So throughout my code, would it be faster to write:

mov TEMP, ecx;
call runPerfectNesEmulator;
mov ecx, TEMP;

vs

push ecx;
call runPerfectNesEmulator;
pop ecx;

Does the math add up? :D
doynax
Posts: 162
Joined: Mon Nov 22, 2004 3:24 pm
Location: Sweden

Post by doynax

Unlike on the 6502, you just can't do precise cycle counting on newer CPUs. A guesstimate would be that most normal instructions take about half a cycle, while a complete cache miss might take 100 cycles, and a page fault can cost you millions of cycles.

It's virtually guaranteed that the stack is cached (unless you use loads of local data), while a single random memory location probably isn't. Worse, a lone variable placed away from the rest of your data might tie up a whole cache line by itself. A push/pop is slightly more complex than a simple mov, but processor manufacturers also optimize them heavily; it was on ancient hardware that they caused stalls for other instructions accessing the stack pointer.

It's probably a good idea to reserve some stack space among the local variables to hold the value. That way you can still use a mov, and you won't have to modify the stack pointer in the middle of a function.

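The real advantage of using the stack (at least to lazy programmers like me) is that you don't have to worry about conflicts between different pieces of code writing to the same temporary location, for example a called function reusing it for its own scratch value. This is what often makes it a pain to reuse zero-page locations as scratch registers on the 6502.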
Nessie
Posts: 133
Joined: Mon Sep 20, 2004 11:13 am
Location: Sweden

Post by Nessie

As doynax says, clock cycles can't be counted in the same way on modern CPUs. I would use push/pop for simplicity's sake, but I guess it's a matter of taste.
Either way, if your code really has that call between the push/pop, you shouldn't worry about a clock cycle being lost or gained when using the stack. :)