All times are UTC - 7 hours





Posted: Sun Apr 05, 2015 1:04 pm
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10909
Location: Rio de Janeiro - Brazil
Tsutarja wrote:
If it's not too complex, I could try to use a faster method for reading the buffer.

The most straightforward way is to use the stack to hold the data and copy it with an unrolled sequence of PLA / STA $2007 pairs, so that each byte takes 8 cycles to copy (4 for PLA, 4 for STA). That way, when processing the buffer, instead of counting down the number of bytes to copy, you'd jump somewhere into the middle of this unrolled sequence.

First there's the unrolled code. 32 bytes seems like a good limit, because it's enough to update an entire row or column of tiles. Pattern updates would have to be broken down into blocks of 2 tiles, since each tile is 16 bytes.

Code:
Update32Bytes:
  PLA
  STA $2007
Update31Bytes:
  PLA
  STA $2007
Update30Bytes:
  PLA
  STA $2007
(...)
Update2Bytes:
  PLA
  STA $2007
Update1Byte:
  PLA
  STA $2007
UpdateNothing:

Then there's the jump table, so you know where to jump depending on how many bytes you have to copy:

Code:
JumpTableLo:
  .db <UpdateNothing, <Update1Byte, (...), <Update32Bytes

JumpTableHi:
  .db >UpdateNothing, >Update1Byte, (...), >Update32Bytes

Then, you'd do something like this when processing your update list:

Code:
  LDX LastUpdateIndex
ProcessUpdate:
  LDA UpdateAddressHi, x
  STA $2006
  LDA UpdateAddressLo, x
  STA $2006
  LDY UpdateCount, x   ; byte count of this update (one entry per update)
  LDA JumpTableLo, y
  STA Pointer+0
  LDA JumpTableHi, y
  STA Pointer+1
  JMP (Pointer)
Update32Bytes:
(...)
UpdateNothing:
  DEX
  BNE ProcessUpdate

You should also account for PPU address increments when setting up each transfer, so each update can select between increments of 1 or 32 bytes.

You can probably use 192 bytes of the stack for this, and still have 64 bytes left for normal stack use.
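To get a feel for the timing, here is a rough Python sketch of the cycle cost of the update loop above. It is not part of the original posts: the names `CYCLES`, `SETUP_COST`, `update_cost` and `VBLANK_CYCLES` are made up for illustration, and it assumes NTSC timing, no page crossings, and one byte count per update.

```python
# Rough cycle model of the unrolled "PLA / STA $2007" copier.
# Assumptions: NTSC timing, standard 6502 instruction costs, no page crossings.
CYCLES = {
    "LDA abs,X": 4, "LDY abs,X": 4, "LDA abs,Y": 4,
    "STA abs": 4, "STA zp": 3, "JMP (ind)": 5,
    "PLA": 4, "DEX": 2, "BNE taken": 3,
}

# One PLA plus one STA $2007 per copied byte.
BYTE_COST = CYCLES["PLA"] + CYCLES["STA abs"]

# Fixed cost of one pass through ProcessUpdate, excluding the copied bytes.
SETUP_COST = (
    2 * (CYCLES["LDA abs,X"] + CYCLES["STA abs"])      # two writes to $2006
    + CYCLES["LDY abs,X"]                              # byte count of this update
    + 2 * CYCLES["LDA abs,Y"] + 2 * CYCLES["STA zp"]   # build the jump pointer
    + CYCLES["JMP (ind)"]
    + CYCLES["DEX"] + CYCLES["BNE taken"]
)


def update_cost(byte_counts):
    """Total CPU cycles to flush a list of updates, given one byte count each."""
    return sum(SETUP_COST + BYTE_COST * n for n in byte_counts)


# Approximate CPU cycles in NTSC VBlank: 20 scanlines of 341 PPU dots, 3 dots per cycle.
VBLANK_CYCLES = 20 * 341 // 3

if __name__ == "__main__":
    full_buffer = [32] * 6  # six 32-byte updates fill a 192-byte buffer
    print(SETUP_COST, update_cost(full_buffer), VBLANK_CYCLES)
```

Comparing `update_cost(...)` against `VBLANK_CYCLES` gives a quick estimate of how much of VBlank a given set of updates would use, before anything else that has to run there.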


Posted: Mon Apr 06, 2015 2:39 am
Joined: Sun Oct 12, 2014 11:06 am
Posts: 123
Location: Finland
If I store the data on the stack, won't that mess things up when the NMI fires, since it pushes the registers to the stack? The data for the background update isn't going to be at the top of the stack during NMI. Or do I move the register data to RAM temporarily? Or is it possible to change where the stack is being read?

_________________
UP SIDE DOWN A B A B B A B A Hidari migi
L R L R STOP & DASH & UP & TALK Ijou nashi


Posted: Mon Apr 06, 2015 6:30 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10909
Location: Rio de Janeiro - Brazil
Tsutarja wrote:
The data for the background update isn't going to be at the top of the stack during NMI. Or do I move the register data to RAM temporarily?

Yes, you have to swap between 2 different stack pointers. Say that the normal stack begins at $FF and grows down to $C0 (64 bytes), then the buffer begins at $BF and grows down to $00 (192 bytes). You'd need 2 variables to back up and restore the stack pointers as needed. Something like this:
Code:
   ;initialize the primary stack pointer
   ldx #$ff
   txs

   ;initialize the secondary stack pointer
   lda #$bf
   sta BufferSP

Then, whenever you need to write to the buffer, you switch to the secondary stack pointer:
Code:
   ;switch to the secondary stack pointer
   tsx
   stx NormalSP
   ldx BufferSP
   txs

After you're done, switch back to the normal stack pointer:
Code:
   ;switch back to the normal stack pointer
   tsx
   stx BufferSP
   ldx NormalSP
   txs

During VBlank, before executing the VRAM updates, you also switch stack pointers, and switch back when you're done.

It's not a problem if an NMI or IRQ fires when you're manipulating the buffer, because whatever gets pushed there will be taken back when the interrupt returns, no worries. Pulling values out of the wrong stack would be a problem, but pulling doesn't happen automatically, and you will not do it when the wrong stack is being used.

You do have to detect when an NMI has fired before the frame calculations have ended (lag frame), so that you don't try to use data from buffers that are only half full, but that would have to be done no matter what.
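The stack-pointer swap described above can be modeled in a few lines of Python. This is only an illustrative sketch, not the code from the posts: the `StackPage` class and the `interrupt` helper are hypothetical, and the interrupt is simplified to three pushes followed by three pulls.

```python
class StackPage:
    """Model of 6502 page 1 ($0100-$01FF) shared by two stacks via a swapped S."""

    def __init__(self):
        self.ram = [0] * 256
        self.s = 0xFF           # active stack pointer, starts on the normal stack
        self.normal_sp = 0xFF   # saved copy while the buffer stack is active
        self.buffer_sp = 0xBF   # buffer occupies $01BF down to $0100

    def pha(self, value):
        # PHA stores at S, then decrements S.
        self.ram[self.s] = value & 0xFF
        self.s = (self.s - 1) & 0xFF

    def pla(self):
        # PLA increments S, then loads from S.
        self.s = (self.s + 1) & 0xFF
        return self.ram[self.s]

    def use_buffer(self):       # TSX / STX NormalSP / LDX BufferSP / TXS
        self.normal_sp = self.s
        self.s = self.buffer_sp

    def use_normal(self):       # TSX / STX BufferSP / LDX NormalSP / TXS
        self.buffer_sp = self.s
        self.s = self.normal_sp


def interrupt(stack):
    """Stand-in for an NMI/IRQ: pushes three bytes and pulls them back on return."""
    for value in (0x80, 0x12, 0x34):
        stack.pha(value)
    for _ in range(3):
        stack.pla()


if __name__ == "__main__":
    stack = StackPage()
    stack.pha(0xAA)             # something already on the normal stack
    stack.use_buffer()
    stack.pha(1)
    interrupt(stack)            # fires while the buffer stack is active
    stack.pha(2)
    stack.use_normal()
    print(hex(stack.pla()))     # the normal stack is untouched
```

The interrupt only writes below the current stack pointer and leaves it where it found it, so the bytes already in the buffer survive.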


Posted: Mon Apr 06, 2015 7:55 am
Joined: Sun Oct 12, 2014 11:06 am
Posts: 123
Location: Finland
tokumaru wrote:
You do have to detect when an NMI has fired before the frame calculations have ended (lag frame), so that you don't try to use data from buffers that are only half full, but that would have to be done no matter what.


Maybe I could use the sleeping (aka vblank_wait) variable to see if the main loop has finished. If it hasn't, the NMI will restore the registers and exit (or should I only skip graphical updates and leave the sound engine etc. running?).

Then about the code you posted earlier:
I'm assuming that LastUpdateIndex is incremented every time an individual update is requested, and the starting addresses are stored to UpdateAddressHi and UpdateAddressLo (giving 16 or so bytes to each of them). I guess I store the UpdateCount and the PPU increment mode in a similar way. Do I still need the bg_update_flag variable to see if there are any graphical updates, or does LastUpdateIndex cover that too?

Posted: Mon Apr 06, 2015 12:40 pm
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10909
Location: Rio de Janeiro - Brazil
Tsutarja wrote:
Maybe I could use the sleeping (aka vblank_wait) variable to see if the main loop has finished. If it hasn't, the NMI will restore the registers and exit (or should I only skip graphical updates and leave the sound engine etc. running?).

Yes, that's one way to do it. If the program isn't "sleeping" when the NMI fires, you can assume the frame logic hasn't finished. Most games update the sound even in lag frames. This is actually an advantage of having the game logic separate from the VBlank handler over having everything in the NMI or everything in the main loop. Besides sound, raster effects (like for status bars) also have to be configured in lag frames, otherwise they'll jump/glitch.
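As a rough illustration of that split, here is a small Python model of an NMI handler that skips the video updates on lag frames but still runs everything else. The `Frame` class, its `sleeping` flag and the log entries are hypothetical names chosen for this sketch.

```python
class Frame:
    """Toy model of a main loop and an NMI handler sharing a 'sleeping' flag."""

    def __init__(self):
        self.sleeping = False   # set by the main loop once the frame logic is done
        self.log = []

    def main_loop_done(self):
        # The main loop finished its work and now waits for VBlank.
        self.sleeping = True

    def nmi(self):
        if self.sleeping:
            # Buffers are complete, so it is safe to flush them to VRAM.
            self.log.append("video")
            self.sleeping = False   # wake the main loop for the next frame
        # These run on every frame, lag frame or not.
        self.log.append("raster")
        self.log.append("sound")


if __name__ == "__main__":
    frame = Frame()
    frame.nmi()                 # lag frame: the logic has not finished yet
    frame.main_loop_done()
    frame.nmi()                 # normal frame
    print(frame.log)
```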

Quote:
I'm assuming that LastUpdateIndex is incremented every time an individual update is requested, and the starting addresses are stored to UpdateAddressHi and UpdateAddressLo (giving 16 or so bytes to each of them). I guess I store the UpdateCount and the PPU increment mode in a similar way. Do I still need the bg_update_flag variable to see if there are any graphical updates, or does LastUpdateIndex cover that too?

Your understanding is correct. You could take advantage of the fact that PPU addresses only go up to $3FFF and use the upper bits of the address for extra information, like the PPU increment. That will save you a little bit of RAM.
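One possible encoding, sketched in Python: bit 7 of the high address byte marks a vertical (+32) update. The flag position and the `pack` / `unpack` helpers are assumptions made for this example; any of the unused upper bits would work.

```python
VERTICAL = 0x80  # hypothetical flag: bit 7 of the high address byte


def pack(ppu_address, vertical):
    """Split a 14-bit PPU address into (hi, lo), storing the increment flag in hi."""
    hi = (ppu_address >> 8) & 0x3F
    lo = ppu_address & 0xFF
    if vertical:
        hi |= VERTICAL
    return hi, lo


def unpack(hi, lo):
    """Recover the PPU address and the address increment (1 or 32)."""
    address = ((hi & 0x3F) << 8) | lo
    increment = 32 if hi & VERTICAL else 1
    return address, increment


if __name__ == "__main__":
    hi, lo = pack(0x2020, vertical=True)
    print(hex(hi), hex(lo), unpack(hi, lo))
```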

The exact implementation is up to you, but I prefer to avoid redundancy as much as possible, so if I can deduce something from one variable I think it's pointless to have the same information in another one.


Posted: Tue Apr 07, 2015 12:24 am
Joined: Sun Oct 12, 2014 11:06 am
Posts: 123
Location: Finland
Well, the buffer is already doing something when I input data to it, though it's not doing what I want. It's supposed to draw text on screen vertically, but instead it's doing this:
Attachment: Kemono-0.png [ 1.91 KiB ]


Btw, in case you use an assembler other than NESASM: in NESASM, < selects zero page addressing (at least I've been told so), whereas many other assemblers use it for "get low byte".

Here is what I'm loading to the buffer:

Code:
 TSX
 STX normal_sp
 LDX buffer_sp
 TXS

 LDX <bg_update_requests
 LDA #$05
 STA bg_update_count, x
 LDA #$20
 STA bg_update_address_lo, x
 LDA #$A0
 STA bg_update_address_hi, x
 LDA #$18
 PHA
 LDA #$15
 PHA
 PHA
 LDA #$0E
 PHA
 LDA #$11
 PHA
 INX
 STX <bg_update_requests

 TSX
 STX buffer_sp
 LDX normal_sp
 TXS


And here is how I read it in NMI:

Code:
BgUpdate:
 LDX <bg_update_requests
 CPX #$00
 BNE ReadBgBuffer
 JMP PaletteUpdate

ReadBgBuffer:
 TSX
 STX normal_sp
 LDX buffer_sp
 TXS

 LDX <bg_update_requests
ProcessBgUpdate:
 LDA bg_update_address_hi-1, x
 PHA
 AND #%00111111
 STA PPUAddr
 LDA bg_update_address_lo-1, x
 STA PPUAddr
 PLA
 AND #%10000000
 CMP #%10000000
 BNE HorizontalUpdate
 LDA <ppu_ctrl
 AND #%00000100
 CMP #%00000100
 BEQ IncModeDone
 CLC
 ADC #%00000100
 STA <ppu_ctrl
 STA PPUCtrl
 JMP IncModeDone
HorizontalUpdate:
 LDA <ppu_ctrl
 AND #%00000100
 CMP #%00000100
 BNE IncModeDone
 SEC
 SBC #%00000100
 STA <ppu_ctrl
 STA PPUCtrl
IncModeDone:
 LDY bg_update_count-1, x
 LDA BufferJumpTableLo, y
 STA pointerLo
 LDA BufferJumpTableHi, y
 STA pointerHi
 JMP [pointerLo]
Update32Bytes:
 PLA
 STA PPUData
Update31Bytes:
 PLA
 STA PPUData

( ... )

Update01Bytes:
 PLA
 STA PPUData
Update00Bytes:
 DEX
 BEQ EndBuffer
 JMP ProcessBgUpdate

EndBuffer:
 STX <bg_update_requests
 TSX
 STX buffer_sp
 LDX normal_sp
 TXS

Posted: Tue Apr 07, 2015 7:17 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10909
Location: Rio de Janeiro - Brazil
EDIT: Oh, now I see you're accessing the tables with -1 when updating... let me check everything again. I kept the original answer below, but I'll post a new one if I catch the problem.

I see one big problem. After using the request index, you're incrementing it before saving it:
Tsutarja wrote:
Code:
 INX
 STX <bg_update_requests

...meaning that the saved value isn't pointing to the last written request, it's pointing to the NEXT (empty) slot. If you try to use that when updating you'll read junk. You probably want to change that so this variable points to the last written request. Maybe start it at 0 and increment before using, not after.

To make 0 be just a flag without sacrificing a memory position, you can access the list like this: lda bg_update_count-1, x (do this for all the properties). This is not mandatory, just a tip so you don't lose a byte of RAM at the beginning of each list.
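A small Python model of that indexing scheme, for illustration only: the counter is 0 when nothing is queued, and slot N lives at table index N-1. The `UpdateList` class and its capacity of 16 are hypothetical.

```python
class UpdateList:
    """Request list whose counter doubles as the 'anything queued?' flag."""

    def __init__(self, capacity=16):      # capacity is a hypothetical value
        self.count = [0] * capacity       # stands in for bg_update_count
        self.requests = 0                 # 0 means nothing is queued

    def queue(self, nbytes):
        self.requests += 1                        # increment before using the slot
        self.count[self.requests - 1] = nbytes    # like STA bg_update_count-1, x

    def drain(self):
        processed = []
        while self.requests != 0:                 # like testing Z after LDX / DEX
            processed.append(self.count[self.requests - 1])
            self.requests -= 1
        return processed


if __name__ == "__main__":
    updates = UpdateList()
    updates.queue(5)
    updates.queue(3)
    print(updates.drain())      # last queued request comes out first
```

No table entry is wasted on the "empty" state, and the last request is processed first, which matches the order in which a stack-based buffer hands its data back.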

Another tip, that I believe has already been pointed out:
Quote:
Code:
 LDX <bg_update_requests
 CPX #$00
 BNE ReadBgBuffer

There's no need for the CPX #$00. After every load or math operation, the Z flag is already set if the value is 0, so there's no need to explicitly compare against 0. If you feel like keeping the instruction for clarity or something, that's fine. When I want to keep something unnecessary just for clarity, I simply comment out the redundant instructions (such as a CLC before an addition when I know for sure that the carry is already clear). That way I don't sacrifice readability but I also don't lose the performance.


Posted: Tue Apr 07, 2015 8:18 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10909
Location: Rio de Janeiro - Brazil
I still think you should increment the index before using the update slot, and use -1 everywhere the tables are accessed, for consistency. It's confusing to see the tables being accessed one way in one place and another way in another (this is what had me thinking I found the bug). This is up to you though.

The part that selects the increment is unnecessarily complicated. The first optimization you can do is not use the stack for backing up the high byte of the address, for 2 reasons. First, PHA and PLA (3 + 4 = 7 cycles) are slower than simply loading the value again (4 cycles). Second, you don't even need to mask the high bits of the address before writing to $2006, because the PPU does that automatically.

Also, you don't need all the branching. The PPU increment setting is just a bit, and the increment value you placed in the high byte of the address is also just a bit. Instead of checking values and branching, you can simply copy the bit from one position to the other. After these optimizations you get this:

Code:
   LDA bg_update_address_hi-1, x
   STA PPUAddr
   LDY bg_update_address_lo-1, x
   STY PPUAddr
   AND #%10000000
   ASL
   ROL
   ROL
   ROL
   ORA <ppu_ctrl
   STA PPUCtrl

These are just improvements though, probably not related to the actual bug, which I can't find just from looking at the code. Did you try debugging in FCEUX (stepping through each instruction) to see when things start to go wrong? Is the "secondary stack" being filled correctly? Does the update code read the correct data?
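To check the bit shuffle in the snippet above, here is a small Python emulation of the ASL / ROL sequence. The helper names are made up for this sketch. It also shows one assumption the snippet relies on: the ORA can only set the increment bit, so the ppu_ctrl copy needs bit 2 clear for horizontal updates to work.

```python
def asl(a):
    """ASL A: shift left; the old bit 7 becomes the carry."""
    return (a << 1) & 0xFF, (a >> 7) & 1


def rol(a, carry):
    """ROL A: rotate left through the carry."""
    return ((a << 1) | carry) & 0xFF, (a >> 7) & 1


def increment_bits(address_hi, ppu_ctrl):
    """Mimic AND #$80 / ASL / ROL / ROL / ROL / ORA ppu_ctrl."""
    a = address_hi & 0x80
    a, carry = asl(a)            # bit 7 moves into the carry
    for _ in range(3):
        a, carry = rol(a, carry) # the carry ends up in bit 2
    return a | ppu_ctrl


if __name__ == "__main__":
    print(hex(increment_bits(0xA0, 0x88)))   # vertical update
    print(hex(increment_bits(0x20, 0x88)))   # horizontal update
    print(hex(increment_bits(0x20, 0x8C)))   # bit 2 already set in the copy
```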


Posted: Tue Apr 07, 2015 9:33 am
Joined: Sun Oct 12, 2014 11:06 am
Posts: 123
Location: Finland
As far as I have tested so far, the incorrect background color is caused by the bit that I'm using for the PPU increment mode selection. Also, I noticed that the first byte I'm pushing to the stack is not going there for some reason. The reason for the -1 is so that when bg_update_requests is zero there are no updates, but when it's not zero, the buffer gets read. In other words, its purpose is to keep the code from skipping the very first entry of each table (index #$00).

Code:
 TSX
 STX normal_sp
 LDX buffer_sp
 TXS

 LDX <bg_update_requests
 INX
 LDA #$05
 STA bg_update_count-1, x
 LDA #$62
 STA bg_update_address_lo-1, x
 LDA #$20
 STA bg_update_address_hi-1, x
 LDA #$18                          ; This is not going to the stack (?)
 PHA
 LDA #$15
 PHA
 PHA
 LDA #$0E
 PHA
 LDA #$11
 PHA
 STX <bg_update_requests

 TSX
 STX buffer_sp
 LDX normal_sp
 TXS


I made some changes to the buffer reader in the NMI, such as adding some missing -1 offsets and the optimization you suggested:

Code:
BgUpdate:
 LDX <bg_update_requests
 CPX #$00
 BNE ReadBgBuffer
 JMP PaletteUpdate

ReadBgBuffer:
 TSX
 STX normal_sp
 LDX buffer_sp
 TXS

 LDX <bg_update_requests
ProcessBgUpdate:
 LDA bg_update_address_hi-1, x
 STA PPUAddr
 LDY bg_update_address_lo-1, x
 STY PPUAddr
 AND #%10000000
 ASL A
 ROL A
 ROL A
 ROL A
 ORA <ppu_ctrl
 STA PPUCtrl
 LDY bg_update_count-1, x
 LDA BufferJumpTableLo, y
 STA pointerLo
 LDA BufferJumpTableHi, y
 STA pointerHi
 JMP [pointerLo]
Update32Bytes:
 PLA
 STA PPUData
Update31Bytes:
 PLA
 STA PPUData

( ... )

Update01Bytes:
 PLA
 STA PPUData
Update00Bytes:
 DEX
 BEQ EndBuffer
 JMP ProcessBgUpdate

EndBuffer:
 STX <bg_update_requests
 TSX
 STX buffer_sp
 LDX normal_sp
 TXS

Posted: Tue Apr 07, 2015 10:40 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10909
Location: Rio de Janeiro - Brazil
I don't see anything obviously wrong with the code. Only debugging will solve this, I guess. If you don't mind sharing a ROM I can take a look when I have the time, but if you're familiar with FCEUX's debugger you could debug this yourself, instruction by instruction. It should be easy to tell whether the values are going to the stack or not, and why.


Posted: Thu Apr 09, 2015 10:41 am
Joined: Sun Oct 12, 2014 11:06 am
Posts: 123
Location: Finland
Alright. The drawing and palette updates work correctly under the same NMI routine. Next up is the background data compression. I need to look into how to do it when I have more time.

Posted: Thu Apr 09, 2015 11:32 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10909
Location: Rio de Janeiro - Brazil
Tsutarja wrote:
Next up is the background data compression. I need to look into how to do it when I have more time.

The exact compression scheme will heavily depend on the game you're making: the dimensions of the levels, whether there's scrolling or not, how objects interact with the background... There are also technical aspects to consider, such as how much RAM you're willing to use for maps, so think carefully about all of that.


Posted: Thu Apr 09, 2015 10:58 pm
Joined: Sun Oct 12, 2014 11:06 am
Posts: 123
Location: Finland
tokumaru wrote:
The exact compression scheme will heavily depend on the game you're making: the dimensions of the levels, whether there's scrolling or not, how objects interact with the background...


The levels are always 1 screen high. Width depends on stage. The stages themselves are broken into "rooms" of various sizes and you can scroll both left and right in the room as you want but you cannot go to the previous rooms (there may be exceptions to this). The only interaction between objects and background is collision with solid objects. There may be some special collisions such as spikes (or some other tiles that damage the player) and platforms that can be jumped through from below and dropped down with Down + A. This is the plan for now, but I may group all rooms together as a single room and use Super Mario Bros.-style scrolling. That depends on whether I have trouble implementing the scrolling system I originally planned.

tokumaru wrote:
There are also technical aspects to consider, such as how much RAM you're willing to use for maps, so think carefully about all of that.


The amount of RAM depends on how it is used with maps.

Posted: Fri Apr 10, 2015 6:08 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10909
Location: Rio de Janeiro - Brazil
Tsutarja wrote:
The levels are always 1 screen high. Width depends on stage. The stages themselves are broken into "rooms" of various sizes and you can scroll both left and right in the room as you want but you cannot go to the previous rooms (there may be exceptions to this). The only interaction between objects and background is collision with solid objects. There may be some special collisions such as spikes (or some other tiles that damage the player) and platforms that can be jumped through from below and dropped down with Down + A.

Sounds straightforward enough. Not scrolling vertically avoids a lot of trouble. Let us know if you're unsure about how to implement something.


Posted: Fri Apr 10, 2015 9:53 am
Joined: Sun Oct 12, 2014 11:06 am
Posts: 123
Location: Finland
I have now made some tile graphics for metatiles. I was thinking of the compression being something like this:

Code:
 ; Metatiles are 32x32 pixels
Metatile00:                    ; e.g. Ground
 .db $80,$81,$80,$81
 .db $82,$83,$82,$83
 .db $80,$81,$80,$81
 .db $82,$83,$82,$83
 ;Attributes and collision data added here later

Metatile01:                    ; e.g. Wall
 .db $84,$85,$84,$85
 .db $86,$87,$86,$87
 .db $84,$85,$84,$85
 .db $86,$87,$86,$87
 ;Attributes and collision (none for a wall) data added here later

Screen00:                    ; Some indoors (or underground) screen with same floor and ceiling graphic
 .db Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00
 .db Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00
 .db Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01
 .db Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01
 .db Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01
 .db Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01,Metatile01
 .db Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00
 .db Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00,Metatile00
  ; Lower half of the last 32x32 metatile row is cut off, right?

Stage00:
 .db Screen00,Screen01,Screen02,Screen03 ; Let's pretend there are more screens than what I listed :P




What would be a good way of decompressing this? I probably need some pointers in RAM that keep track of things like the current stage, current screen, etc., so the game knows where to look for the metatiles. When scrolling, I will probably update 2 metatiles in every frame in which VRAM needs updating, so it takes 4 frames to update one column of 8 metatiles. The player moves at about the same speed as in Castlevania, so I guess that update rate is not too slow.

For now I can probably pretty safely say that I can allocate $0700 - $07FF for map RAM (overkill?). I'm assuming that I need to decompress the data to the map RAM and then push it to the "buffer stack".
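One way to picture the decompression step is the Python sketch below, which expands one nametable column from a screen of metatile indices. It is only an illustration under assumptions: the screen stores metatile index numbers (0 = ground, 1 = wall) rather than labels, each metatile is 4x4 tiles stored row by row as in the listing above, and `tile_column` is a made-up helper.

```python
# Metatile definitions: 16 tile numbers each, stored row by row (4x4 tiles).
METATILES = [
    [0x80, 0x81, 0x80, 0x81,
     0x82, 0x83, 0x82, 0x83,
     0x80, 0x81, 0x80, 0x81,
     0x82, 0x83, 0x82, 0x83],   # 0: ground
    [0x84, 0x85, 0x84, 0x85,
     0x86, 0x87, 0x86, 0x87,
     0x84, 0x85, 0x84, 0x85,
     0x86, 0x87, 0x86, 0x87],   # 1: wall
]

# One screen: 8 rows of 8 metatile indices (ceiling, walls, floor).
SCREEN = [[index] * 8 for index in (0, 0, 1, 1, 1, 1, 0, 0)]


def tile_column(screen, metatiles, x):
    """Return the 30 tile numbers of nametable column x (0-31), top to bottom."""
    column = []
    for y in range(30):                     # a nametable is 30 tiles tall
        index = screen[y // 4][x // 4]      # which metatile covers this tile
        column.append(metatiles[index][(y % 4) * 4 + (x % 4)])
    return column


if __name__ == "__main__":
    print([hex(tile) for tile in tile_column(SCREEN, METATILES, 0)])
```

A full column is 30 bytes, so it fits within a single 32-byte update using the vertical increment. With a stack-based buffer, the bytes would have to be pushed in reverse order, since they are pulled back last-in, first-out.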

Attachment: kemono.png [ 19.59 KiB ] (File comment: Some graphics put together)
