8x16 and whatever else unreg wants to know

unregistered
Posts: 1318
Joined: Thu Apr 23, 2009 11:21 pm
Location: cypress, texas

Re: 8x16 and whatever else unreg wants to know

Post by unregistered »

Thank you Kasumi for helping me so much! :D I did what you said
Kasumi wrote: If everything moved up as I'm understanding, the data is 100% wrong (because there is no value that is in the right place.). Run trace logger, break on writes to find out how the first thing ends up wrong. Fix it.
I ran trace logger and figured out that if I added this code

Code: Select all

0C662 A5 2F                       lda visible_left
0C664 38                          sec
0C665 E9 10                       sbc #$10
0C667 85 2F                       sta visible_left
right after the sixteen-bit division of visible_left into CameraX+0 inside of my method scroll_screen... it draws the entire nametable columns 32-47 at the correct regular height!!! Now I have four new goals:

1.) Decrease graffiti... something is kind of wrong... there's a little bit more graffiti now after my subtraction of #$10 from visible_left. It is run every vblank... it's not the best fix, right?... there must be some other way... it is the fix that my brain chose 'cause visible_left was sixteen higher and that's why it started one row below where it should.
2.) Fix the colors of my columns so they are correct... that shouldn't be too hard. :)
3.) Draw columns 48-63 at the appropriate time... also shouldn't be too hard. :)
4.) Make my girl travel on the level... she needs to stop moving beyond the right edge of the screen. ...Maybe that would decrease the graffiti. :)

As the game gets bigger it blooms... this blooming is awesome! :mrgreen:

Thank you Kasumi for teaching me about new ways of debugging and trace logger is really cool!! :D :D

Kasumi
Posts: 1293
Joined: Wed Apr 02, 2008 2:09 pm

Post by Kasumi »

Edit: Eh, got rid of that other stuff. The fix makes sense to me after reading scroll_screen. More theory, though.

scroll_screen (mostly) has zero need to be in your NMI.

The only thing that needs to be there is

Code: Select all

   lda CameraX+0      ; time to MOVE THE CAMERA OBJECT!
   sta PPUSCROLL5     ; write the horizontal scroll count register

   lda #$00           ; (no vertical scrolling)
   sta PPUSCROLL5     ; set the vertical scroll
Edit2: Ah, right. You also need a similar thing to get the lowest bit of CameraX+1 into $2000.
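
A minimal sketch of that $2000 write, assuming the two nametables sit side by side so that bit 0 of CameraX+1 picks between them. PPUCTRL and CTRL_BASE are made-up names, and the CameraX address is only a placeholder:

```asm
; Hypothetical names: PPUCTRL and CTRL_BASE are not from the thread,
; and the CameraX address below is a placeholder.
PPUCTRL   = $2000
CTRL_BASE = %10000000   ; assumed base value: NMI enabled, every other bit 0
CameraX   = $20         ; placeholder address (low byte, then high byte)

  lda CameraX+1     ; high byte of the 16-bit camera position
  and #%00000001    ; keep bit 0: which of the two side-by-side nametables
  ora #CTRL_BASE    ; merge with the rest of the control bits
  sta PPUCTRL       ; bits 0-1 of $2000 select the base nametable
```

It belongs after the last $2006/$2007 write of the NMI, next to the two scroll writes, because writes to $2006 disturb the scroll position.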

A good way to order your frame is like this:
1. Start of main loop.
2. Update main character's position.
3. Update scroll position based on main character's position.
4. Use scroll position to find if new tiles need to be drawn. If so, put tiles in RAM buffer.
5. Set flag that tells the NMI it's safe to update the screen.
NMI:
1. Check flag to see if we should update the screen.
2. If yes, read from buffer and draw tiles to screen. (buffer was updated in step 4 of the main loop)
3. Write low byte of camera to $2005. (it was updated in step 3 of the main loop)

Right now, it seems like you're writing tiles in the NMI, THEN moving the camera in the NMI. This might scroll to new stuff, so the update you just did won't necessarily cover the visible screen.
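
The frame order above could be sketched like this. It is only an outline under assumptions: nmi_ready, the subroutine names, and all addresses are placeholders (only CameraX is a name from the thread), the subroutines are empty stubs, and sprite DMA is left out:

```asm
; Sketch only. Every name and address here is a placeholder except
; CameraX, which is named in the thread. Sprite DMA is omitted.
PPUCTRL    = $2000
PPUSCROLL  = $2005
CameraX    = $20      ; 16-bit camera position (placeholder address)
nmi_ready  = $22      ; nonzero = buffers are ready for the NMI

main_loop:
  jsr update_player        ; main loop step 2
  jsr update_camera        ; main loop step 3
  jsr fill_column_buffers  ; main loop step 4
  lda #$01
  sta nmi_ready            ; main loop step 5: safe to update the screen
wait_for_nmi:
  lda nmi_ready
  bne wait_for_nmi         ; the NMI clears the flag when it is done
  jmp main_loop

nmi:
  pha                      ; save A, X and Y
  txa
  pha
  tya
  pha
  lda nmi_ready            ; NMI step 1
  beq nmi_done             ; main loop not finished: touch nothing
  jsr draw_column_buffers  ; NMI step 2: all $2006/$2007 writes
  lda CameraX+1
  and #%00000001           ; nametable select bit
  ora #%10000000           ; assumed base value: NMI enabled
  sta PPUCTRL
  lda CameraX+0            ; NMI step 3
  sta PPUSCROLL
  lda #$00                 ; no vertical scrolling
  sta PPUSCROLL
  sta nmi_ready            ; A is still 0: hand control back to the main loop
nmi_done:
  pla                      ; restore Y, X and A
  tay
  pla
  tax
  pla
  rti

; Empty stubs so the outline stands on its own.
update_player:       rts
update_camera:       rts
fill_column_buffers: rts
draw_column_buffers: rts
```

Keeping the scroll writes behind the flag means the NMI never reads a CameraX that the main loop has only half updated.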

Simple question: What's the expected value of visible_left when CameraX = $0000? If visible_left should be $00 (the first tile of CameraX) in that case, your fix is not good and something else is wrong. If visible_left should be $F0, your fix is perfect. Same for CameraX = $0010, etc.: should visible_left be $10, or $00? (Note: The value it should be and the value that works might be different! If the subtract you added takes it to what you think it SHOULD be, you're golden. If you think it should be the value before the subtract, but just have the subtract because it works, keep thinking.)
unregistered wrote: 1.) Decrease graffiti... something is kind of wrong... there's a little bit more graffiti now after my subtraction of #$10 from visible_left
This might be a time to benchmark your NMI code using Nintendulator DX or VirtuaNES. If all the input is now good (i.e. you verified the RAM now has the correct 60 values, and you can't find issues with the code that reads from it) and you're getting bad output, it's possible you're trying to write to $2007 after rendering has begun. There are ~2270 cycles after the NMI before rendering begins. ~513 of those are eaten by sprite DMA. Everything from the start of the NMI to the last write to $2006/$2007 should take less than that.

Code: Select all

vblank:      sta $401E
             pha

             tya
             pha
             txa
             pha
*********SNIP*************
SkipUpdates: sta $401F

Post by unregistered »

Wow! Once my game becomes alive... the Max part in Nintendulator debug says 2872... which is no good cause you said
Kasumi wrote:If all the input is now good (i.e. You verified the RAM now has the correct 60 values, and you can't find issues with the code that reads from it) and you're getting bad output, it's possible you're trying to write to $2007 when rendering has begun. ~2270 cycles before rendering begins after the NMI. ~513 of those are eaten by sprite DMA. Anything from the start of the NMI to the last write to $2006/$2007 should happen in less.

Code: Select all

vblank:      sta $401E
             pha

             tya
             pha
             txa
             pha
*********SNIP*************
SkipUpdates: sta $401F
and 2270-513 = 1757, which is quite a bit less than 2000, let alone 2872. Thank you for your example code... it helped me realize that my vblank is overused. :shock: Need to spend some time on using it less. :?: I am going to reread your reply. :) edit: Ok I'm going to think more about your simple question. It seems like a hard question right now. Ok it's supper time! Goodnight and thank you for all of this theory! :D

Post by Kasumi »

unregistered wrote: edit: Ok I'm going to think more about your simple question. It seems like a hard question right now.
It is up to you to define every bit about how this should work. It works as defined or it works by breaking the definition.

1. Is it defined such that upon striking zero, the zeroth column is drawn? (Zero should draw zero, negative is impossible)
2. Is it defined such that upon striking one, the zeroth column is drawn? (One should draw zero, zero would be negative so that update is skipped)
Something else? If you don't know how it should work, think about what makes the most sense, and commit to it. Then if something works by doing something that doesn't seem to agree with the definition, you can find out why and fix it. (Or fix your definition, if the working way makes sense and is actually better.) Just... don't guess. Find out why it works if it's better or if something else is wrong.

A bit more theory! Requires... well, somewhat large rewrites to put into place.

Think about the fastest way your updates could possibly happen.

Something like this?

Code: Select all

loop:
lda buffer,y;4
sta $2007;4
dey;2
bpl loop;3 on all but last
(Well... unrolled or partially unrolled or stack magic would be faster. But follow me on this. :wink: )
Obviously you need to load the value and store it. That's unavoidable. Then you end your loop with just one dey. Why not set your buffer up beforehand so you can do exactly that? Or exactly whatever the fastest thing you can think of is?

Currently, you have the left and right columns interleaved. (left column tile) (right column tile) (left column tile) (right column tile) This means you have to loop through the list twice, and also use dey twice for each loop!

If you did this: (left column tile) (next left column tile) (etc.) ... (right column tile) (next right column tile) (etc.)
One dey would take you to the next tile. Do you have to decide whether a column is even or odd? In your case, both are updated when they need updating, so you can just draw them in the same order.

Code: Select all

lda evencolumnaddrhi;3
sta $2006;4
lda evencolumnaddrlo;3
sta $2006;4

ldy #29;2
loop:
lda buffereven,y;4
sta $2007;4
dey;2
bpl loop;3 on all but last

lda oddcolumnaddrhi;3
sta $2006;4
lda oddcolumnaddrlo;3
sta $2006;4

ldy #29;2
loop2:
lda bufferodd,y;4
sta $2007;4
dey;2
bpl loop2;3 on all but last
The above is pretty bare minimum, and still takes ~810 cycles assuming no page crosses (per column: 16 cycles of setup plus 30*13-1 for the loop). Those 810 cycles do not include the ~513 for sprites. They do not include attribute updates. This is why your NMI needs to make as few decisions as possible. Your goal outside of the NMI should be to make all the decisions and set the data up so that the NMI can use it in the fastest possible way.

In your case, you do even, you do odd. If the routine that updates the buffers just used the same place in RAM every time, you wouldn't need a pointer for your NMI updates. You need pointers to the metatiles in case you have different sets, but the buffer for the NMI can be static. Heck using a pointer takes an extra cycle per load, plus you have to set it up. Static all the way!
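
One decision that can move out of the NMI is the PPU address itself. Here is a rough sketch of precomputing evencolumnaddrhi/evencolumnaddrlo in the main loop. It assumes a hypothetical variable named column that holds the 8x8 column number (0-63) across two side-by-side nametables at $2000 and $2400, and that the column starts at the top row; the addresses are placeholders:

```asm
; Hypothetical: column is not a name from the thread, and all three
; addresses are placeholders.
column           = $23
evencolumnaddrhi = $24
evencolumnaddrlo = $25

  lda column
  and #%00011111        ; column within its nametable, 0-31
  sta evencolumnaddrlo  ; top row, so the low byte is just the column
  lda column
  and #%00100000        ; bit 5 set means the second nametable
  lsr a
  lsr a
  lsr a                 ; $20 becomes $04
  ora #$20              ; result is $20 or $24
  sta evencolumnaddrhi
```

Writing down a column like this also relies on the PPU's +32 address increment mode (bit 2 of $2000) being set before the $2007 writes.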

Now, I didn't mention it before because it would not have helped your issue. But you can also make draw_RAMbuffers both simpler and faster. (Using a non interleaved buffer format or not!)

In the current code, you work really hard to preserve y (it contains where you are in the pointer.). But think about this! It takes just six cycles to store and restore it, and you could REALLY use it for other stuff.

I may not fully understand draw_RAMbuffers, but I think you can do something like this:

Code: Select all

;Metatile index is Y. Location in RAM buffer is in X.
 lda MetatileTile0,y ;Assuming this is the top left tile
 sta RAMbuffereven,x ;Even buffer
 lda MetatileTile1,y ;Assuming this is the top right tile
 sta RAMbufferodd,x  ;Odd buffer
 dex                 ;Takes us to the next tile for BOTH buffers

 lda MetatileTile2,y ;Assuming this is the bottom left tile
 sta RAMbuffereven,x ;Even buffer
 lda MetatileTile3,y ;Assuming this is the bottom right tile
 sta RAMbufferodd,x  ;Odd buffer

 lda pointerposition ;used to be tya. You lose just one cycle doing this instead
 clc
 adc #$10            ;increment y by 16!!!!
 tay

 dex
 bpl loop            ;back to the top of the loop (the label is not shown in this excerpt)
This avoids storing the tiles to temp RAM only to load them again. You only need to loop 30 times, and it covers two separate columns. You only lose one cycle from where the 16 is added to y, plus 3 for storing it someplace (not above, but of course needs to be done). But, because you no longer need to store/restore x in goodlocation you actually come out ahead. (Since you needed to move y anyway which was replaced with a load, but you didn't ever need to move x.) The added benefit is you can use y for something that really needs it.
I'm not sure if you're updating two 8x8 tile columns or two 16x16 tile columns. It looks like your draw_RAMbuffers is doing two 16x16 columns, but I don't see much need for that. It'd be tough to update that much in the NMI anyway.

You can also only update one 8x8 column at a time in your NMI. Even if you set up an even and odd buffer outside of the NMI, you can have the NMI draw the relevant column (just even or just odd) when scrolled to. It's not a problem if the data you setup isn't used exactly on the frame.

Anyway, enough from me, I start these posts and never stop writing... :|

Post by unregistered »

Kasumi wrote:I'm not sure if you're updating two 8x8 tiles, or two 16x16 tiles columns. It looks like your draw_RAMbuffers is doing two 16x16 columns, but I don't see much need for that. It's be tough to update that much in the NMI anyway.

You can also only update one 8x8 column at a time in your NMI. Even if you set up an even and odd buffer outside of the NMI, you can have the NMI draw the relevant column (just even or just odd) when scrolled to. It's not a problem if the data you setup isn't used exactly on the frame.
My draw_RAMbuffers is doing two 16x16 columns at once because then it would be easier to update the attribute table colors. I think tepples has commented that Mario draws 4 columns each frame.

edit: I'm going to update my code so it's faster... the help you gave me is highly appreciated Kasumi! :D
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Post by tepples »

Super Mario Bros. only draws half of a 16-pixel-wide metatile column per frame, but I seem to remember it draws four half columns in successive frames followed by the updated attributes. Frame-stepping with a nametable viewer open will show you what a particular game does.
qbradq
Posts: 972
Joined: Wed Oct 15, 2008 11:50 am

Post by qbradq »

Yea, byte-wide attribute updates are much easier to deal with, but there's just not enough time to push that much data in one frame. You need to split it between multiple frames with an FSM (finite state machine). Basically, have a variable that keeps track of what kind of update you're going to do on the next frame, be it Tile Column 1, Tile Column 2, Attribute Stripe or Nothing.
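
A bare-bones sketch of such a state variable, assuming the four states are numbered 0-3 and advance in order. update_state, the UPDATE_ constants, and the draw routines are all hypothetical names, and the routines are empty stubs:

```asm
; Hypothetical names throughout; the draw routines are empty stubs.
UPDATE_NONE  = 0
UPDATE_COL1  = 1
UPDATE_COL2  = 2
UPDATE_ATTR  = 3
update_state = $40          ; placeholder zero-page address

; Called once per NMI. Does at most one kind of update per frame.
run_vblank_update:
  lda update_state
  beq update_done           ; UPDATE_NONE: nothing queued this frame
  cmp #UPDATE_COL1
  beq do_col1
  cmp #UPDATE_COL2
  beq do_col2
  jsr draw_attribute_stripe ; anything else is treated as UPDATE_ATTR
  lda #UPDATE_NONE          ; sequence finished
  sta update_state
update_done:
  rts
do_col1:
  jsr draw_tile_column_1
  lda #UPDATE_COL2          ; next frame: second tile column
  sta update_state
  rts
do_col2:
  jsr draw_tile_column_2
  lda #UPDATE_ATTR          ; next frame: attribute stripe
  sta update_state
  rts

draw_tile_column_1:    rts
draw_tile_column_2:    rts
draw_attribute_stripe: rts
```

The main loop would start a sequence by storing UPDATE_COL1 once its buffers are filled, and should only do so while update_state is UPDATE_NONE.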
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Post by tokumaru »

qbradq wrote:but there's just not enough time to push that much data in one frame.
There actually is, if you optimize the hell out of the code (unrolling loops and such)... I've managed to update 1 column of metatiles (60 name table bytes + 8 attribute bytes) + 1 row of metatiles (68 name table bytes + 16 attribute bytes), along with sprite DMA, all in regular VBlank time.

I'm not sure what my suggestion is for a beginner who's having trouble with this... I mean, code optimization isn't trivial, but splitting updates across different frames isn't exactly a piece of cake either.

Post by tepples »

tokumaru wrote:splitting updates across different frames isn't exactly piece of cake either.
You should see what I have to juggle for RHDE. At last count I have eight different kinds of updates that can happen in vblank during gameplay: no operation, 6x24 area of playfield, put up a blank pop-up window, clear a row of tiles to a constant color, fill a nametable, copy a 16-tile line (or two 8-tile lines) of text, copy 8 graphical tiles, and erase the part of a pop-up window that covers the playfield border. Worth it? Yes.

But in a side-scroller, you can easily fit the left and right halves of a column of metatiles plus attribute updates in one vblank.

Post by unregistered »

I have a slow question: see, now I have four 30 byte RAMbuffers... RAMbufferw0even, RAMbufferw0odd, RAMbufferw1even, and RAMbufferw1odd. Currently I have decided that creating a copy of my draw_us_a_column method drawing RAMbufferw1even and RAMbufferw1odd might be too much waste of code... but it isn't very much code at all. What would you recommend I do instead? Is there some type of static pointer? Something to help me specify the address of either RAMbufferw0even or RAMbufferw1even and only require 4* cycles? I feel a little bit more positive about my creating-a-copy idea. What would you do?

3gengames
Formerly 65024U
Posts: 2284
Joined: Sat Mar 27, 2010 12:57 pm

Post by 3gengames »

If you can stack them on top of each other, 4x 32 byte buffers all next to each other, you can do something like this:

Code: Select all

  LDA ColumnNeeded
  ASL A;Multiply 2
  ASL A;4
  ASL A;8
  ASL A;16
  ASL A;32
  TAX ;X indexing variable is A*32, making A our "select the buffer" value.
  LDY #$1F ;32 bytes
.Loop:
  LDA FourBufferBegin,X
  STA $2007
  INX ;(DEX depending on how they are arranged and such.)
  DEY ;Are we 32 tiles in yet?
  BPL .Loop ;Nope
  ;32 bytes have been written from one of the selected column buffers by here.
ETA: You say 4x 30 buffers. You can just shove 2 bytes in between them to make them even with each other, just some random engine variable. And then adjust the loop number and you should be able to see how this idea works by then. :)

Post by Kasumi »

3gengames wrote: ETA: You say 4x 30 buffers. You can just shove 2 bytes in between then to make them even with each other,
Or you can use a table which will allow you to have buffers that are 30 bytes long and will be smaller/faster anyhow. (5 asl instructions vs. 4 bytes that specify the offsets.)

Code: Select all

  ldx ColumnNeeded
  lda bufferoffsettable,x
  tax
A table can't be in the middle of this routine or it will be run as code, but elsewhere where it's safe put this:

Code: Select all

bufferoffsettable:
.db 0, 30, 60, 90
or maybe this:

Code: Select all

bufferoffsettable:
.db 29, 59, 89, 119
depending on if you're going forwards or backwards.

If you can't have 120 contiguous bytes, you have to use a pointer.
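
Put together with the backwards table, the whole copy might look roughly like this. It is a sketch that assumes the four 30-byte buffers really are contiguous; both addresses are placeholders, and FourBufferBegin is borrowed from the earlier post:

```asm
; Placeholder addresses; ColumnNeeded is assumed to be 0-3.
ColumnNeeded    = $26
FourBufferBegin = $0300     ; 4 x 30 contiguous bytes

  ldx ColumnNeeded
  lda bufferoffsettable,x   ; offset of the LAST byte of that buffer
  tax
  ldy #29                   ; 30 bytes per buffer
loop:
  lda FourBufferBegin,x
  sta $2007
  dex                       ; walk backwards through the buffer
  dey                       ; count down; sets the flags that bpl tests
  bpl loop
  rts                       ; return before the table so it is never run as code

bufferoffsettable:
.db 29, 59, 89, 119
```

The extra dex costs 2 cycles per byte compared with a loop over one fixed buffer, so this trades some vblank time for not duplicating the routine.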

Post by tepples »

I'd be tempted to allocate five 32-byte buffers at $0100, $0120, $0140, $0160, and $0180.

Post by unregistered »

Kasumi wrote:Now, I didn't mention it before because it would not have helped your issue. But you can also make draw_RAMbuffers both simpler and faster. (Using a non interleaved buffer format or not!)

In the current code, you work really hard to preserve y (it contains where you are in the pointer.). But think about this! It takes just six cycles to store and restore it, and you could REALLY use it for other stuff.

I may not fully understand draw_RAMbuffers, but I think you can do something like this:

Code: Select all

;Metatile index is Y. Location in RAM buffer is in X.
 lda MetatileTile0, y;Assuming this top left tile
 sta RAMbuffereven, x;Even buffer
 lda MetatileTile1,y;Assuming this is top right tile
 sta RAMbufferodd, x;Odd buffer
 dex;Takes us to the next tile for BOTH buffers

 lda MetatileTile2, y;Assuming this bottom left tile
 sta RAMbuffereven, x;Even buffer
 lda MetatileTile3,y;Assuming this is bottom right tile
 sta RAMbufferodd, x;Odd buffer

lda pointerposition;used to be tya. You lose just one cycle doing this instead
  clc
 adc #$10 ;increment y by 16!!!!
tay

 dex
bpl
This avoids storing the tiles to temp RAM only to load them again. You only need to loop 30 times, and it covers two separate columns. You only lose one cycle from where the 16 is added to y, plus 3 for storing it someplace (not above, but of course needs to be done). But, because you no longer need to store/restore x in goodlocation you actually come out ahead. (Since you needed to move y anyway which was replaced with a load, but you didn't ever need to move x.) The added benefit is you can use y for something that really needs it.
I highlighted part of your post. Let's start there. Ok, now your awesome code and ideas are working! :D Kasumi, thank you so much! ...I don't understand what you mean by "...you actually come out ahead." Well, I do understand what you mean, but I don't agree with you now. I'm lost, please help me. I don't agree because I just decided to store my y value in goodLocation. So that's just the same... I really don't understand what you are talking about after the highlighted part. I needed to move y anyway, but I didn't need to move x? :?

Ok, now it's suppertime!! Must go eat! :)

Post by Kasumi »

The two pieces of code accomplish the same goal. (Though mine sets up the buffers differently. The different way would be faster for your NMI to read as well, though.) You don't need to store/restore X in goodlocation because it just stays in X. (I mean... you may still have to load it before the loop, but you no longer have to do it IN the loop.) You come out ahead because the code I added takes fewer cycles than the unneeded code I removed. (storing/restoring goodlocation)

I omitted some stuff, but the full thing would be like:

Code: Select all

 ldx #29             ;Before everything. So not during the loop. This is like goodlocation
                     ;But we load it with #29 instead of #59 for other reasons.
loop:
 lda ($10),y         ;Originally omitted. Have to do that still to get the index, of course
 sty pointerposition ;This wasn't needed before, so we're 3 cycles behind
 tay                 ;This was needed before. We overwrote what was in y, which is why we stored it above

;Metatile index is Y. Location in RAM buffer is in X.
 lda MetatileTile0,y ;Assuming this is the top left tile
 sta RAMbuffereven,x ;Even buffer
 lda MetatileTile1,y ;Assuming this is the top right tile
 sta RAMbufferodd,x  ;Odd buffer
 dex                 ;Takes us to the next tile for BOTH buffers

 lda MetatileTile2,y ;Assuming this is the bottom left tile
 sta RAMbuffereven,x ;Even buffer
 lda MetatileTile3,y ;Assuming this is the bottom right tile
 sta RAMbufferodd,x  ;Odd buffer

 lda pointerposition ;used to be tya. You lose just one cycle doing this instead
                     ;But you gain that back by not having
                     ;ldx goodLocation and stx goodLocation (which would take 6 cycles)
                     ;because X doesn't change jobs in mine. It's always where you are in the buffer.

 clc
 adc #$10            ;increment y by 16!!!!
 tay

 dex
 bpl loop
After loading the metatile index, you did tax. Mine does this too (well... tay instead), in addition to storing the position to temp ram. That takes 3 extra cycles.

Later, you did tya because you can only add to A. Mine does lda tempRAM instead, which takes 1 cycle more than tya (if tempRAM is in zero page).

All together, I've made your metatile index transfer work another way. It takes 4 cycles extra.
unregistered wrote: I needed to move y anyway, but i didn't need to move x? :?
Right. You need X/Y for three tasks. 1. Loading from the pointer. (can only be done with Y) 2. Loading tiles from the metatile. 3. Storing the tiles to the buffer. This means either X or Y must change jobs, because two things can't do three jobs without changing. This is true for mine, and it was true for yours.

Because of how I preserved X instead of Y (which needed to change jobs in both because it's needed to access the pointer), I've eliminated stx goodLocation and ldx goodLocation (DURING the loop anyway) which would have taken 6 cycles. So it ends up 2 cycles faster.

But mine is also faster for other reasons related to why I did the transfers that way. I dex once for every two times you do, because I do both even and odd at once using separate buffers. I avoid storing each tile of the metatile in the very beginning of the buffer RAM, because there's no need. I have where I am in the buffer in X already when I load the metatile index in y (you load where you are in the buffer later), so they're just stored exactly where they need to be. No need for the temp stores.

It saves a lot of cycles per loop iteration. I think 42: 4 for doing dex twice instead of four times, 9*4=36 for not doing the indexed temp stores, 6 for not storing/restoring goodlocation, and -4 for things I added.

This loops 15 times, so that's 630 cycles. 630 more if you do it twice for two 16x16 columns like it seems you're planning.

All that said, I make no guarantees this will work verbatim. There may be some extra stuff you need to do before/after the loop I'm forgetting, but I can't imagine any of it not making the savings worth it.

Edit: Heck, I was being safe, but you can move the clc before the add from the loop to before the loop if the pointer is set up such that y = 0 to access the first element. Nothing in the loop changes the carry except the add, and the adds during the loop will NEVER set the carry. (You add 16 to Y 15 times, which would only make it 240. Not greater than 255, so carry would be clear throughout.) This saves another 28 cycles over the whole loop: 2*15 = 30 for not doing it in the loop, minus 2 because you still need to do it once before the loop.
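
For reference, a sketch of the loop with the clc hoisted out. It assumes Y starts at 0 and that nothing else is added inside the loop that could touch the carry. All addresses below are placeholders, including the locations of the metatile tables:

```asm
; Placeholder addresses; labels follow the loop above.
pointerposition = $27
RAMbuffereven   = $0300
RAMbufferodd    = $031E     ; right after the 30-byte even buffer
MetatileTile0   = $8000     ; placeholder table locations
MetatileTile1   = $8100
MetatileTile2   = $8200
MetatileTile3   = $8300

  ldx #29
  ldy #$00                  ; assumed: first element is at offset 0
  clc                       ; cleared once, before the loop
loop:
  lda ($10),y               ; metatile index from the level data
  sty pointerposition
  tay
  lda MetatileTile0,y
  sta RAMbuffereven,x
  lda MetatileTile1,y
  sta RAMbufferodd,x
  dex
  lda MetatileTile2,y
  sta RAMbuffereven,x
  lda MetatileTile3,y
  sta RAMbufferodd,x
  lda pointerposition
  adc #$10                  ; carry is still clear: 15 adds of 16 only reach 240
  tay
  dex
  bpl loop
```

The loads, stores, transfers, dex, and bpl leave the carry alone, which is what makes the single clc safe here.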