It is currently Thu Jul 19, 2018 7:00 am

All times are UTC - 7 hours





Post new topic Reply to topic  [ 101 posts ]  Go to page 1, 2, 3, 4, 5 ... 7  Next
 Post subject: 6502 ASM trick
PostPosted: Sun Nov 11, 2007 12:08 pm 

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10605
Location: Rio de Janeiro - Brazil
Has it occurred to anyone that it might be useful to have a page of ROM filled with values 0 through 255, so that you can perform operations between the accumulator and the index registers?

Instructions like ADC, SBC, AND, ORA and EOR all have "Absolute, X" and "Absolute, Y" modes, so if you point to the table with the values and use one of the index registers, the value fetched will be the same as the one in the index register, making it seem like the operation used the values of both registers as operands.
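As a quick sanity check of the idea, here is a small Python model (not 6502 code) of an "op table,X" read against a page that holds the values 0 through 255. The name `table` is only illustrative, and the ADC model covers binary mode with the carry passed in explicitly. It confirms that the byte fetched from table+X always equals X, so each operation behaves as if it combined A with X directly.

```python
# Python model of a 256-byte identity table: table[i] == i
table = bytes(range(256))

# Bitwise operations that have "Absolute,X" / "Absolute,Y" forms
OPS = {
    "AND": lambda a, m: a & m,
    "ORA": lambda a, m: a | m,
    "EOR": lambda a, m: a ^ m,
}


def adc(a, m, carry):
    """Binary-mode ADC: returns (new A, carry out)."""
    total = a + m + carry
    return total & 0xFF, total >> 8


def op_table_x(name, a, x):
    """Model 'AND/ORA/EOR table,X': the operand fetched from table+X is X itself."""
    return OPS[name](a, table[x])


# Exhaustive check: for every A and X, "op table,X" matches "A op X"
for name, op in OPS.items():
    assert all(op_table_x(name, a, x) == op(a, x)
               for a in range(256) for x in range(256))
assert all(adc(a, table[x], c) == adc(a, x, c)
           for a in range(0, 256, 17) for x in range(256) for c in (0, 1))
```

The same holds for "table,Y", since only the index register changes.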

I guess I have used this trick before, but only for a small subset of the numbers. Now that I think of it, having the full table seems very useful, especially for avoiding temporary variables.

I don't know exactly why I started this topic, but I'm sure we can discuss other useful 6502 ASM tricks. I also really like the one where you push an address (minus 1) onto the stack and then use the RTS instruction to jump to that address. This can be useful for implementing jump tables, and I'm using it a lot in my game.
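The "minus 1" comes from RTS adding 1 to the address it pulls off the stack. A hedged Python model of a jump table built on that, with made-up handler addresses (`HANDLERS`, `TABLE_LO`, `TABLE_HI` and `dispatch` are illustrative names; a Python list stands in for the real stack in page 1, which grows downward):

```python
HANDLERS = [0x8123, 0x8200, 0x9F00]   # hypothetical handler addresses

# The tables store each address minus 1, split into low and high bytes
TABLE_LO = [(h - 1) & 0xFF for h in HANDLERS]
TABLE_HI = [((h - 1) >> 8) & 0xFF for h in HANDLERS]


def rts(stack):
    """RTS pulls the low byte, then the high byte, then adds 1."""
    lo = stack.pop()
    hi = stack.pop()
    return (((hi << 8) | lo) + 1) & 0xFFFF


def dispatch(index):
    """Model: lda table_hi,x / pha / lda table_lo,x / pha / rts."""
    stack = []
    stack.append(TABLE_HI[index])   # high byte pushed first...
    stack.append(TABLE_LO[index])   # ...so the low byte is pulled first
    return rts(stack)
```

The second entry shows why the subtraction has to be done on the full 16-bit address: $8200 - 1 = $81FF changes the high byte too.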


 Post subject:
PostPosted: Sun Nov 11, 2007 1:34 pm

Joined: Mon Sep 27, 2004 8:33 am
Posts: 3715
Location: Central Texas, USA
Great idea! It makes up for the 65xx's general lack of operations that combine A and X directly. Here's an asm summary in case not everyone gets it:
Code:
; Original code
stx temp
eor temp ; A = A EOR X

; New solution
eor table,x ; A = A EOR X

table:
.byte $00,$01,$02...$0F
.byte $10,$11,$12...$1F
...
.byte $F0,$F1,$F2...$FF

On the other hand, this only saves 2 clocks and 1 byte, so it'd have to be used more than 256 times or in a time-critical area to pay off.
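The arithmetic behind "2 clocks and 1 byte" can be spelled out in a few lines of Python, assuming `temp` is a zero-page variable and the table is page-aligned (NMOS 6502 timings):

```python
# (bytes, cycles); "temp" is assumed to live in zero page
STX_ZP = (2, 3)
EOR_ZP = (2, 3)
EOR_ABS_X = (3, 4)   # no page-crossing penalty when the table is page-aligned
TABLE_SIZE = 256


def cost(*instructions):
    """Sum (bytes, cycles) over a sequence of instructions."""
    return tuple(sum(column) for column in zip(*instructions))


old = cost(STX_ZP, EOR_ZP)   # stx temp / eor temp
new = cost(EOR_ABS_X)        # eor table,x
bytes_saved = old[0] - new[0]
cycles_saved = old[1] - new[1]

# Each use saves bytes_saved bytes, so this many uses only pay back the table
break_even_uses = TABLE_SIZE // bytes_saved
```

With one byte saved per use, 256 uses just repay the 256-byte table, which is why it takes more than 256 uses (or a time-critical loop) for the table to come out ahead.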

EDIT: corrected clock count and major inefficiency in original code (tax?!? thanks tepples)


Last edited by blargg on Mon Nov 12, 2007 4:58 pm, edited 3 times in total.

 Post subject: Re: 6502 ASM trick
PostPosted: Sun Nov 11, 2007 2:11 pm

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20265
Location: NE Indiana, USA (NTSC)
tokumaru wrote:
Has it occurred to anyone that it might be useful to have a page of ROM filled with values 0 through 255, so that you can perform operations between the accumulator and the index registers?

If you have the ROM space for such a table, and it's aligned, it saves a byte and two cycles over the temporary variable way:
Code:
A:
  stx $FF  ; 2b 3c
  ora $FF  ; 2b 3c
B:
  ora table,x  ; 3b 4c

EDIT: thanks tokumaru

Quote:
I also like very much the one where you push an address (minus 1) to the stack and then use the RTS instruction to jump to that address. This can be useful for implementing jump tables, and I'm using this a lot in my game.

It saves four bytes over the temporary-variable way of implementing jump tables, but is one cycle slower:
Code:
A:
  lda hi
  pha  ; 1b 3c
  lda lo
  pha  ; 1b 3c
  rts  ; 1b 6c
B:
  lda hi
  sta $01  ; 2b 3c
  lda lo
  sta $00  ; 2b 3c
  jmp ($0000)  ; 3b 5c
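The same tally in Python, leaving out the two `lda` loads since they are identical in both versions (NMOS 6502 timings, zero-page temporaries assumed for version B):

```python
PHA = (1, 3)       # (bytes, cycles)
RTS = (1, 6)
STA_ZP = (2, 3)
JMP_IND = (3, 5)


def cost(*instructions):
    """Sum (bytes, cycles) over a sequence of instructions."""
    return tuple(sum(column) for column in zip(*instructions))


push_rts = cost(PHA, PHA, RTS)              # version A
temp_jmp = cost(STA_ZP, STA_ZP, JMP_IND)    # version B
bytes_saved = temp_jmp[0] - push_rts[0]
extra_cycles = push_rts[1] - temp_jmp[1]
```

Version A also leaves $00/$01 free, at the price of storing every table entry as the target address minus 1.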


Last edited by tepples on Sun Nov 11, 2007 6:06 pm, edited 1 time in total.

 Post subject:
PostPosted: Sun Nov 11, 2007 3:08 pm

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10605
Location: Rio de Janeiro - Brazil
You guys are right, the savings are not that incredible. But I always feel bad about using temp variables (because the code looks messy), and it's very hard not to do so on the 6502, which has very few working registers. I liked the illusion that it's possible to have these few operations between the accumulator and the index registers.

256 bytes out of a whole game ROM is not such a high price to pay for cleaner code and slightly more speed. And this can be used for other things too, such as mapper writes on boards with bus conflicts.
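On a board with bus conflicts, the ROM also drives the data bus during the mapper write, so the value written has to match the ROM byte at the target address. A hedged Python model, assuming the common description in which conflicting bits resolve as a bitwise AND: writing a value to table+value (for example with `sta table,y` while Y holds the same value as A) always latches the intended value. `ROM_TABLE` and the helper names are illustrative.

```python
# Hypothetical page-aligned identity table in ROM: ROM_TABLE[i] == i
ROM_TABLE = bytes(range(256))


def bus_conflict(cpu_value, rom_value):
    """Assumed model: where CPU and ROM drive different bits, 0 wins (AND)."""
    return cpu_value & rom_value


def write_to_fixed_address(value, rom_value):
    """Mapper write to an arbitrary ROM address that holds rom_value."""
    return bus_conflict(value, rom_value)


def write_via_table(value):
    """Mapper write to table+value: the ROM byte there is the value itself."""
    return bus_conflict(value, ROM_TABLE[value])
```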

And tepples, as far as I know, "ora table,x" takes 4 cycles to execute, not 5, as long as the table is perfectly aligned to a memory page. Or am I wrong?
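For reads such as ORA, AND, EOR, ADC, SBC or LDA in "Absolute,X" mode, an NMOS 6502 takes 4 cycles, plus 1 when base + X lands in a different 256-byte page than the base address. With a page-aligned table the low byte of the base is $00, so no X value can cross. A small Python model of that rule (the addresses are made up):

```python
def abs_indexed_read_cycles(base, index):
    """Cycles for a read like 'ora base,x' on an NMOS 6502:
    4, plus 1 if base + index is in a different page than base."""
    effective = (base + index) & 0xFFFF
    page_crossed = (base & 0xFF00) != (effective & 0xFF00)
    return 4 + int(page_crossed)


ALIGNED = 0x8000     # hypothetical page-aligned table address
UNALIGNED = 0x80F0   # hypothetical table starting 16 bytes before a page boundary

aligned_worst = max(abs_indexed_read_cycles(ALIGNED, x) for x in range(256))
unaligned_worst = max(abs_indexed_read_cycles(UNALIGNED, x) for x in range(256))
```

Stores are different: `sta table,x` always takes 5 cycles, aligned or not.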

About the jump tables, yeah, it depends if you're aiming at speed or size.


 Post subject:
PostPosted: Sun Nov 11, 2007 6:02 pm

Joined: Mon Sep 27, 2004 8:33 am
Posts: 3715
Location: Central Texas, USA
Quote:
But I always feel bad about using temp variables (because the code looks messy), and it's very hard not to do so with the 6502, that has very few work registers.

Change your idea of messy. The main problem with zero page variables is when two routines try to use one at once. The best way to avoid this is to have temp variables that aren't used across subroutine calls, and aren't used by more than one thread at once (like the main code and an interrupt handler). But the 6502 has a ton of extended registers: 256 of them. That's why X and Y can't be used directly by arithmetic, only for indexing and counting. Embrace zero page!

Try coding for the Z80/8085/GB-Z80 for a while and you'll appreciate the elegance of the 65xx. Sure, you can do lots of register to register operations, but everything has a layer of bloat on it.


 Post subject:
PostPosted: Sun Nov 11, 2007 10:17 pm

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10605
Location: Rio de Janeiro - Brazil
blargg wrote:
The best way to avoid this is to have temp variables that aren't used across subroutine calls, and aren't used by more than one thread at once (like main code and interrupt handler).

OK, but how do you do that and still keep things looking nice? Saying that a byte can only be used by one subroutine is a waste of space, as that byte will probably be unused most of the time. And reusing bytes is not easy when you have many nested subroutines. For routines that need a few bytes of work RAM, you can have a few groups of bytes, each dedicated to a different depth, but then you can't go very deep. And recursion is out of the question. What do you guys do about this?

Quote:
But the 6502 has a ton of extended registers: 256 of them.

Fair enough. I've heard the argument that the 6502 has 256 bytes worth of registers, and I guess this is mostly right.

Quote:
Try coding for the Z80/8085/GB-Z80 for a while and you'll appreciate the elegance of the 65xx. Sure, you can do lots of register to register operations, but everything has a layer of bloat on it.

I've done very little Z80 work, but enjoyed the fact that I could perform some fairly complicated work without having to touch a byte of RAM. And those shadow registers... that feature has to be useful! I know that instructions take more CPU cycles than on a 6502 though, probably even more than equivalent 6502 code using zero page RAM.

By the way, this is a very interesting topic about tricks on the 6502.


 Post subject:
PostPosted: Mon Nov 12, 2007 10:57 am

Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7443
Location: Chexbres, VD, Switzerland
I must say I absolutely love the 6502 way of doing things; for me it easily beats the PIC, 8080 and Atmel so far, and maybe some other CPUs/MCUs I haven't tried yet.

I have never thought of having such a table of constants; I guess it's only worth using if you're short of temp variables and/or if speed is very important. I think wasting 256 bytes is significant on the NES (unless you have maybe more than 256 KB of PRG-ROM). It could work if you know the number is small enough (something like 0-16) and such a table is needed anyway (on a cart with bus conflicts).

I have run into a few temporary-variable problems so far. I did a whole game engine with only 4 "Temp" variables and 4 "NMITemp" variables (the NMITemp ones are used only inside the NMI code and the Temp ones only outside it, to avoid pushing the Temp variables, or some similar time-wasting thing, in the NMI handler). I ran into trouble when I wrote a routine that, for example, uses Temp1 and Temp2 and calls a routine that uses Temp3 and Temp4, which itself calls a routine that also uses Temp2 (and expects it to be free to use). This is a real pain to debug. Pushing Temp2 before calling the second routine is the way to go (or doing it another way). In the end, it's better to give variables explicit names. The best way could be an assembler that can undefine zero page variables in order to reuse them, so that the same address can be used by two pieces of code if the programmer guarantees that they never call each other and that neither routine expects a particular value to be there when it is called.
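The "assembler that reuses zero page" idea can be sketched as static overlay allocation over a call graph: each routine's temporaries start just past those of its deepest caller, so routines that can be active at the same time never share a byte, while routines that never nest may reuse the same addresses. This is a hypothetical Python tool, not a feature of any particular assembler, and it assumes the call graph is known and contains no recursion. Code reachable from the NMI would still need its own area, as with the separate NMITemp variables.

```python
def allocate_temps(sizes, calls, base=0x00):
    """Assign a zero-page start address to each routine's temporaries.

    sizes: routine name -> number of temp bytes it needs
    calls: routine name -> list of routines it calls (no recursion allowed)
    """
    callers = {name: [] for name in sizes}
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].append(caller)

    start = {}

    def place(name, path=()):
        if name in start:
            return start[name]
        if name in path:
            raise ValueError(f"recursion involving {name}")
        # Begin after every caller's block, since callers stay active
        addr = base
        for caller in callers[name]:
            addr = max(addr, place(caller, path + (name,)) + sizes[caller])
        start[name] = addr
        return addr

    for name in sizes:
        place(name)
    return start


# The Temp1..Temp4 scenario: R1 uses two temps and calls R2 (two temps),
# which calls R3 (one temp). R4 is never active at the same time as R1-R3.
SIZES = {"R1": 2, "R2": 2, "R3": 1, "R4": 2}
CALLS = {"R1": ["R2"], "R2": ["R3"]}
layout = allocate_temps(SIZES, CALLS)
```

Here R3 is placed at 4 instead of reusing address 1 (the "Temp2" clash), while R4 shares addresses 0-1 with R1.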

I also never thought of the push-push-RTS way to do an indirect jump; I always use jmp []. The main problem is that the RTS address is not the real address, and I never know how much has to be added or subtracted for it to work. However, it becomes interesting if you use it a lot, since saving four bytes many times over may become significant. Plus the code looks messier (though on the other hand this can also add to the geek factor).


 Post subject:
PostPosted: Mon Nov 12, 2007 5:16 pm

Joined: Mon Sep 27, 2004 8:33 am
Posts: 3715
Location: Central Texas, USA
tokumaru wrote:
OK, but how do you do that and still keep things looking nice? Saying that a byte can only be used by one subroutine is a waste of space, as that byte will probably be unused most of the time. And reusing bytes is not easy when you have many nested subroutines.

Note my limitation of "that aren't used across subroutine calls". This rules out using them for loop counters, for example (if the loop makes a subroutine call). I admit setting up local variables on the stack is cumbersome and inefficient.

Quote:
I've done very little Z80 work, but enjoyed the fact that I could perform some fairly complicated work without having to touch a byte of RAM.

What's so bad about touching RAM? You're constantly reading it anyway for the opcodes.

Quote:
And those shadow registers... that feature has to be useful!

I think it's mainly to allow extremely quick interrupt response, where the handler just exchanges registers then processes the interrupt. It doesn't have to save the previous values, and it can keep values in the shadow registers across interrupt handlings. For normal code, it doesn't seem very useful because it swaps so much.

Quote:
I know that instructions take more CPU cycles than on a 6502 though, probably even more than equivalent 6502 code using zero page RAM.

That's one problem, always paying for those extras even when the 6502's lean register set would be sufficient. My main gripe is the inconsistencies that you constantly run into. I actually like the SPC-700 sound processor in the SNES a bit better than the 6502. It's like a 6502 with fewer limitations on X and Y, and many instructions to really treat direct (zero) page variables as first-class registers. Most arithmetic and move instructions can use a direct page variable just as easily as A.

Bregalad wrote:
The main problem is that the rts adress is not the real adress, and I never know how many it should be added/removed to work.

Use RTI then:
Code:
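lda #>addr  ; push the high byte of the target first
pha
lda #<addr  ; then the low byte
pha
php   ; RTI will restore status, so push it now
rti   ; pulls P, then the address; unlike RTS, no +1 is applied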
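The difference from the RTS trick is that RTI pulls the status register first, then the low and high bytes of the address, and jumps to that address exactly, with no +1. A minimal Python model of the sequence above (a list stands in for the stack, and the status byte is simply passed through; individual flags are not modeled):

```python
def push_for_rti(stack, addr, status):
    """Model: lda #>addr / pha / lda #<addr / pha / php."""
    stack.append(addr >> 8)      # high byte first
    stack.append(addr & 0xFF)    # then low byte
    stack.append(status)         # php: status ends up on top


def rti(stack):
    """RTI pulls P, then PCL, then PCH, and uses the address as-is."""
    status = stack.pop()
    lo = stack.pop()
    hi = stack.pop()
    return (hi << 8) | lo, status
```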


 Post subject:
PostPosted: Mon Nov 12, 2007 9:17 pm

Joined: Sun Jun 05, 2005 2:04 pm
Posts: 2149
Location: Minneapolis, Minnesota, United States
That's actually a really clever idea about the table thing. I never really thought about it. One thing I do, and I'm very glad I thought of it, is let my NMI routine do anything, whenever it wants:

Code:
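nmi:               ; must start at a page boundary ($xx00)
   jmp ($00)       ; slot 0: pointer in $00/$01
   jmp ($02)
   jmp ($04)
   jmp ($06)
   jmp ($08)
   jmp ($0A)
   jmp ($0C)
   jmp ($0E)
   jmp ($10)
   jmp ($12)
   jmp ($14)
   jmp ($16)
   jmp ($18)
   jmp ($1A)
   jmp ($1C)
   jmp ($1E)       ; slot 15: pointer in $1E/$1F
   lda #$00        ; all slots done: point $20/$21 back at nmi
   sta $20         ; (only the low byte changes, hence the page alignment)
   rti
Return:            ; every slot routine ends with "jmp Return"
   inc $20         ; advance $20/$21 by 3 bytes,
   inc $20         ; the size of one JMP (indirect)
   inc $20
   jmp ($20)       ; continue with the next jmp ($xx) in the list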


There may be a slight delay at the end of every routine, but I think it's worth it. There's nothing I hate more than doing a bunch of comparisons to have the NMI figure out what to do and when. Bytes $00-$21 of zero page are used up: $00-$1F start out holding the low and high bytes of the "Return" address (one pointer per slot), and $20/$21 hold the low and high bytes of the current position in the NMI routine. I personally am very, very happy with it. I almost considered it a trick, or cheating, when I first thought of it.
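To make the pointer bookkeeping concrete, here is a hedged Python model of one NMI of the routine above. It assumes, as the code requires, that `nmi:` starts on a page boundary: `Return` only increments $20, and `lda #$00 / sta $20` resets just the low byte, so the 16 JMPs (48 bytes) must sit in one page starting at $xx00. The addresses are hypothetical.

```python
NMI_BASE = 0x8000   # hypothetical address of nmi:, low byte must be $00
RETURN = 0x8035     # hypothetical address of Return (right after the rti)
JMP_SIZE = 3        # bytes in one "jmp ($xx)"
SLOTS = 16          # jmp ($00) through jmp ($1E)


def run_nmi(pointers):
    """Model one NMI. pointers[i] is the address stored at zero page 2*i.
    Returns the routines that ran, in order, and the value of $20/$21 after."""
    assert NMI_BASE & 0xFF == 0, "nmi: must start on a page boundary"
    hi, lo = NMI_BASE >> 8, NMI_BASE & 0xFF    # $21 and $20
    ran = []
    while True:
        pc = (hi << 8) | lo                    # where jmp ($20) lands
        slot = (pc - NMI_BASE) // JMP_SIZE
        if slot == SLOTS:                      # past the last jmp: lda #$00
            lo = 0x00                          # sta $20 (high byte untouched)
            return ran, (hi << 8) | lo
        target = pointers[slot]
        if target != RETURN:                   # unused slots point at Return
            ran.append(target)                 # routine runs, ends in jmp Return
        lo = (lo + JMP_SIZE) & 0xFF            # Return: three "inc $20"
```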


 Post subject:
PostPosted: Mon Nov 12, 2007 10:31 pm

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10605
Location: Rio de Janeiro - Brazil
My solution to have a bunch of different NMI routines is this:
Code:
NMI:
   jmp (NMIAddress)

That is all there is to the actual NMI routine indicated by the vector at the end of the ROM. The label "NMIAddress" points to a zero page location, and depending on where in the game we are, that location points to one of many different NMI routines:
Code:
NMITitleScreen:
   (...)
   rti

NMIPlayerSelect:
   (...)
   rti

NMIMainGame:
   (...)
   rti

NMIEndingSequence:
   (...)
   rti

I'm defining lots of "program modes" for my game, where each section (title screen, player select, title card screen, main game, bonus stage, etc) is represented by a program mode, that when initialized sets the address of the NMI routine it uses. All modes have triggers that enable the transition to other modes.

This way does not waste RAM (only 2 bytes are used to hold the address of the current NMI routine), and there is no speed penalty besides the 5 cycles taken by the JMP (indirect) instruction.
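The same structure as a hedged Python model: one attribute plays the role of the 2-byte NMIAddress pointer, each mode's initialization stores its own handler there, and the fixed NMI entry point just calls through it. The class and method names are illustrative.

```python
class Game:
    """Sketch: one pointer (NMIAddress) selects the active NMI routine."""

    def __init__(self):
        self.nmi_address = None   # models the 2-byte NMIAddress variable
        self.log = []

    # Hypothetical per-mode NMI routines
    def nmi_title_screen(self):
        self.log.append("title")

    def nmi_main_game(self):
        self.log.append("game")

    # Each program mode's init code sets the pointer
    def enter_title_screen(self):
        self.nmi_address = self.nmi_title_screen

    def enter_main_game(self):
        self.nmi_address = self.nmi_main_game

    def nmi(self):
        """The handler the ROM vector points to: just 'jmp (NMIAddress)'."""
        self.nmi_address()
```

One detail the model hides: on the 6502 the two bytes of NMIAddress are written with separate stores, so an NMI arriving between them would jump through a half-updated pointer. Switching modes with NMIs disabled (bit 7 of PPUCTRL, $2000, on the NES) avoids that.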


 Post subject:
PostPosted: Tue Nov 13, 2007 12:18 am

Joined: Mon Sep 27, 2004 8:33 am
Posts: 3715
Location: Central Texas, USA
Why not just have the main code wait in a loop until NMI fires and sets a global flag? Then you don't have to worry about taking too long before the next NMI and having it interrupt a previous invocation. Or maybe you are saying you do this, you just also have a settable NMI routine that does things that must be done every frame, even if that frame appears the same as the previous.


 Post subject:
PostPosted: Tue Nov 13, 2007 9:17 am

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10605
Location: Rio de Janeiro - Brazil
blargg wrote:
Why not just have the main code wait in a loop until NMI fires and sets a global flag?

I have other projects that require constant calculation in order to avoid severe slowdown. Waiting for the NMI would be a waste of time when you could already be preparing data for the next frame. Not in my current project though.

Quote:
Or maybe you are saying you do this, you just also have a settable NMI routine that does things that must be done every frame, even if that frame appears the same as the previous.

That is certainly true for the music routine, for example. And since I also enable rendering late in the frame, I need NMIs to always take the same amount of time. I can't ignore a single VBlank, or else the screen will jump.


 Post subject:
PostPosted: Tue Nov 13, 2007 9:38 am

Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7443
Location: Chexbres, VD, Switzerland
Blargg, the method you describe is close to the one used in Final Fantasy, where the NMI just returns, doing absolutely nothing. The game is free to wait for an NMI whenever it wants without problems. The only real problem is that it's possible to miss an NMI completely.
Quote:
That is certainly true for the music routine, for example. And since I also enable rendering late in the frame, I need NMIs to always take the same amount of time. I can't ignore a single VBlank, or else the screen will jump.
In theory, Final Fantasy's music would also lag if the game did, but the game never lags anyway. You can hear it in Final Fantasy II, however: when you change rooms, the music doesn't change (as in Final Fantasy) but it seriously lags (the game also seems to silence all channels for some reason, so the music stops and restarts a bit late on the next note). The same happens when entering or exiting the menu.
I also remember Zelda and SMB lagging, music included. This looks extremely bad.

Final Fantasy III does exactly what Tokumaru says: it has a "variable NMI vector", which is slightly better. Instead of wasting a jmp [xxx] instruction, the NMI vector points directly to RAM, where a jmp instruction is stored (this takes less time). This instruction is also occasionally changed to an rti to ignore NMIs completely.
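A hedged Python model of that arrangement: the NMI vector points at three bytes of RAM holding either JMP absolute (opcode $4C, 3 cycles, versus 5 for JMP indirect) or RTI (opcode $40). The RAM stub and helper names are hypothetical; the exact layout Final Fantasy III uses is not shown here.

```python
JMP_ABS = 0x4C   # JMP absolute: 3 bytes, 3 cycles
RTI = 0x40       # RTI: return at once, ignoring the NMI

# Hypothetical 3-byte stub in RAM that the NMI vector points to
nmi_stub = bytearray([RTI, 0x00, 0x00])


def set_nmi_handler(addr):
    """Park the stub on RTI, write the new target, then restore the JMP
    opcode last, so an NMI arriving mid-update never jumps to a
    half-written address (it is simply ignored instead)."""
    nmi_stub[0] = RTI
    nmi_stub[1] = addr & 0xFF
    nmi_stub[2] = addr >> 8
    nmi_stub[0] = JMP_ABS


def ignore_nmis():
    nmi_stub[0] = RTI


def take_nmi():
    """Return the handler address the CPU reaches, or None if it hits RTI."""
    if nmi_stub[0] == RTI:
        return None
    return nmi_stub[1] | (nmi_stub[2] << 8)
```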

Celius: I don't understand the method you described at all. It looks interesting, however. Could you try to clarify it a little, please?

By the way, I personally went with defining the NMI the standard way (in ROM) and having it do the main graphics update and the sound. That way the music never lags, and even if the game lags, the NMI will still update the screen as fast as it can. The main program can even synchronize on the graphics update flag (instead of the NMI flag), so if you want to update lots of graphics at once and it takes more than a frame, that causes no problem.
This just seems logical, and as long as the different parts of the game use the same graphics buffer format, the same NMI handler can be used for the whole game. That would be suboptimal for really big games, I think (games with lots of unrelated minigames that each use the screen independently, or an RPG where battle, field, menus, etc. could be separated because each manages the screen completely differently).
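A hedged Python model of the frame timing described above (all names are illustrative): the NMI runs the music every frame and uploads the graphics buffer only when the main code has set the update flag, and the main code does not start the next buffer until the NMI has cleared the flag. Even when preparing a buffer takes several frames, the music still advances once per frame.

```python
def simulate(frames, frames_per_buffer):
    """Model NMI-driven updates. Returns (music_ticks, uploads), where uploads
    lists (frame, buffer_number) for each graphics upload done by the NMI."""
    music_ticks = 0
    uploads = []
    update_flag = False                 # set by main code, cleared by the NMI
    work_left = frames_per_buffer
    buffer_number = 0
    for frame in range(frames):
        # Main code: keep preparing the buffer unless it is waiting on the flag
        if not update_flag:
            work_left -= 1
            if work_left == 0:
                update_flag = True      # buffer complete, ask the NMI to upload it
        # NMI at the start of VBlank, after the main code's share of this frame
        music_ticks += 1                # music is updated every frame regardless
        if update_flag:
            uploads.append((frame, buffer_number))
            buffer_number += 1
            update_flag = False         # main code may start the next buffer
            work_left = frames_per_buffer
    return music_ticks, uploads
```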


 Post subject:
PostPosted: Tue Nov 13, 2007 10:10 am

Joined: Sun Jun 05, 2005 2:04 pm
Posts: 2149
Location: Minneapolis, Minnesota, United States
Bregalad wrote:

Celius: I don't understand the method you described at all. It looks interesting, however. Could you try to clarify it a little, please?



Oh, sorry about that. Let's pretend that there was an Indirect Absolute JSR instruction:

Code:
nmi:
 jsr ($00)
 jsr ($02)
 jsr ($04)
 ...
 jsr ($1E)
 rti


That's pretty much what my routine would do. It keeps the Low/High parts of the addresses you want to go to in RAM. So if I want to run a screen-updating routine at the beginning of the NMI, somewhere in the code I'll store the Low/High parts of the address of the screen-updating routine in $00 and $01, because, as you can see, at the beginning of the NMI it jumps to whatever address is stored in $00/$01. After that routine is done, we return to the NMI routine, and it goes on to the next address, which is stored in $02/$03. So if Indirect Absolute JSR were possible, it would look like this:

Code:
nmi:
 jsr ($00)    ;The Low/High parts of the label "ScreenUpdate" are stored in $00/$01.
 jsr ($02)    ;The Low/High parts of the label "Control" are stored in $02/$03.
 ...
 jsr ($1E)    ;The Low/High parts of the label "Nothing" are stored in $1E/$1F.
 rti

ScreenUpdate:
  ....          ;We come here at the beginning of the NMI routine.
 rts

Control:
 ....           ;We come here after ScreenUpdate
 rts

Nothing:
 rts          ;This is a blank routine we come to if we have nothing to do


That's basically what my routine does, except there's no such JSR ($xx), so I have to use JMP ($xx). Instead of doing RTS at the end of every routine I go to, I jump to a label called "Return". At the Return label, I tell it to jump back into the NMI routine, but instead of jumping back to where it was before, it jumps to that address + 3 bytes, so it moves on to the next JMP ($xx) instruction. If I'm not using all 16 addresses, I make the unused ones point to the address of the Return label. It's just like what I showed above with JSR ($xx), except I manipulate jump instructions to get the same effect. If I still didn't explain that very well, I can try again if you want... I'm not very good at explaining things.


 Post subject:
PostPosted: Tue Nov 13, 2007 1:04 pm

Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7443
Location: Chexbres, VD, Switzerland
Oh, your idea looks quite good! Maybe a little TOO flexible, but it can come in useful.
I guess there are plenty of ways to improve it. You could have the NMI point to a ROM address that does jmp ($00), and then the routine pointed to by $00 would do jmp ($02) when it's done, etc. The problem is that the order cannot be nested, and I guess you don't want that limitation. You can also have a big jsr xxxx / jsr xxxx / jsr xxxx table in RAM, have the NMI point to it, and just change the addresses as you wish. You can replace a jsr with a cmp or something similar to skip that routine without wasting time, you can change the first jsr to an rti to ignore the NMI completely, or you can put an rti in place of the entry after the last jsr you need, making a variable-length NMI routine (with a maximum length, of course).
You'd still want to push the registers on the stack before the first jsr.
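A hedged Python model of the RAM table idea, with hypothetical names and capacity. Each entry is three bytes: JSR absolute ($20, 6 cycles) calls a routine; CMP absolute ($CD, 4 cycles) has the same length and only reads the address and sets flags, so it skips the entry (BIT absolute, $2C, would work the same way); and RTI ($40) ends the list early. Unused entries are pre-filled so that they start with RTI. The model leaves register saving out; in a real handler that pushes registers first, ending the list would mean jumping to code that pulls them back before the RTI.

```python
JSR_ABS = 0x20   # jsr addr: call the routine
CMP_ABS = 0xCD   # cmp addr: same length, only reads addr and sets flags
RTI = 0x40       # rti: end of the list


def make_table(routines, capacity):
    """Build a RAM table of 'jsr addr' entries; unused entries start with RTI."""
    assert len(routines) <= capacity
    table = bytearray([RTI]) * (capacity * 3 + 1)   # extra RTI as a final guard
    for i, addr in enumerate(routines):
        table[i * 3:i * 3 + 3] = bytes([JSR_ABS, addr & 0xFF, addr >> 8])
    return table


def run(table):
    """Walk the table the way the CPU would; return the routines called."""
    called, pc = [], 0
    while True:
        opcode = table[pc]
        if opcode == RTI:
            return called
        if opcode == JSR_ABS:
            called.append(table[pc + 1] | (table[pc + 2] << 8))
        elif opcode != CMP_ABS:
            raise ValueError(f"unexpected opcode ${opcode:02X}")
        pc += 3
```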

