VRAM copy in WRAM
-
- Posts: 18
- Joined: Fri Jul 24, 2015 1:30 pm
VRAM copy in WRAM
Hello. Early on in making my game, I realized I ran out of VRAM space as it was taken up by all the main character sprite variations (standing, walking, etc.). It only made sense to have a working copy in WRAM that would be uploaded at VBlank. However, I'm fairly sure I simply ran out of time, as I've noticed that the larger the pool of data I was trying to copy, the harder my game would lag. DMA exists for OAM, however, no such thing for VRAM (I don't get it, but w/e).
After all my actual game calculations though, I'd go into a loop waiting for VBlank. In that loop, I'd attempt to send data in Mode 0 (Hblank). It seemed the game would run at a fine speed, however, I'd get my sprite to wobble, not collide properly and the screen would sometimes not scroll (which made some sense, since part of the frame on screen was adhering to calculations from the previous frame, whilst the other to the current calculations).
Thus, I got no clue on what to try next. Any tips?? Thanks
-
- Posts: 271
- Joined: Sun Mar 27, 2011 10:49 am
- Location: Victoria, BC
Re: VRAM copy in WRAM
Unfortunately there's probably not an especially good answer for you, only some hard truths: vblank is short, the GB is slow, graphics are big. If you want to do anything interesting, your vblank routine needs to be as absolutely tight as possible (almost definitely in assembly and not C), and there's a very real limit to the possible. If you post some code we might be able to make some suggestions.
That said, a couple of things about your post caught my eye:
1) why keep a copy of the sprite data in WRAM? Are you decompressing it? WRAM is no faster than cart ROM, and definitely less plentiful.
2) In general, your main game loop should update an internal state of the game and your vblank should copy that information to the screen; inefficiencies in the latter shouldn't cause logic bugs in the former. When you talk about things not colliding properly and such, it doesn't quite sound like you're doing that, which suggests to me that maybe your vblank handler and the interaction it has with the rest of your game isn't necessarily thought out well enough.
-
- Posts: 18
- Joined: Fri Jul 24, 2015 1:30 pm
Re: VRAM copy in WRAM
Code:
@@VBlank:
@WaitLoop:
ldh A, ($44) ;load in the value from the LCDC Y-coordinate
cp $91 ;mask that checks if the processor is in mode 1 (V-blank)
jr nz, @WaitLoop
push AF
;;;check if we are in the initialization routine; if we are, simply return. Otherwise, do the proper VRAM updates
;;;;check for starter coordinates in WRAM - if bits 0, 1, 2 are loaded at $FF40 (LCD) we know that this is after initialization.
ldh A, ($40)
and $07
cp $07
jr nz, @finishVBlank
ld A, $C0 ;SET A TO OAM START
call @@SpriteOAM ;Do sprite updates, the OAM table at the bottom of WRAM (don't have to call the macro, the subroutine's already in HRAM)
;load VRAM copy from WRAM...........
ld BC, $20
ld DE, MainSpr_NL_WRAM
ld HL, $8000
call @@CpyVRAMData ;again, reusing this subroutine
ld A, 1
ldh (vblank), A
pop AF
reti
@finishVBlank: ;used at initialization
pop AF
ret
Code:
@@CpyVRAMData:
@VRAMDataLdLoop:
ld A, (DE)
inc DE
ld (HL+), A
dec BC
ld A, C
or B
jr nz, @VRAMDataLdLoop
ret
PS: this is the "original" version. At the moment I've modified the first few lines to this:
Code:
ld BC, $400
ld DE, MainSpr_NL_WRAM
ld HL, $8000
@WaitLoop:
ldh A, ($41)
and ($03) ;this checks if the mode it's in is Mode 0, or HBlank. So while waiting for VBlank, I can still copy data...this works but not really
jr nz, @doneHBlankCopy
ld A, C
or B
jr z, @doneHBlankCopy
ld A, (DE)
inc DE
ld (HL+), A
dec BC
@doneHBlankCopy:
ldh A, ($44) ;load in the value from the LCDC Y-coordinate
cp $91 ;mask that checks if the processor is in mode 1 (V-blank)
jr nz, @WaitLoop
-
- Posts: 271
- Joined: Sun Mar 27, 2011 10:49 am
- Location: Victoria, BC
Re: VRAM copy in WRAM
I'd strongly advise against trying to transfer during hblank - on the GB, almost all of your processing time happens while the screen's drawing. If you're spinwaiting for hblank, you're wasting almost all of your CPU time.
Doing the math, aside from the OAM DMA transfer, your vblank is taking around...1200 cycles, roughly? Which is still only about a quarter of the actual length (4560 cycles) - you should be pretty comfortable. And now that I think about it, if you ran out of vblank time the result wouldn't be lag in your game - it would be corrupt graphics as you try to write to the screen while VRAM's locked and writes are blocked. That means that it's actually your main game loop that's taking too long. What are you doing in there?
Anyway, if you're keen, here are some nitpicks I have with your vblank handler:
First of all, why do you push/pop AF? Your wait loop is trashing them anyway.
Code:
;;;check if we are in the initialization routine; if we are, simply return. Otherwise, do the proper VRAM updates
;;;;check for starter coordinates in WRAM - if bits 0, 1, 2 are loaded at $FF40 (LCD) we know that this is after initialization.
ldh A, ($40)
and $07
cp $07
jr nz, @finishVBlank
Why are you waiting for vblank during initialization? Why do you have the screen on during initialization? If the screen's off vblank shouldn't fire...although I'm not sure if LY updates regardless. I use the method here to wait for vblank, which doesn't involve spinning on LY, and which I recommend. At any rate, if you check this BEFORE you wait for vblank you can return right away, meaning that you don't need to do this test in vblank proper, saving you 36 cycles.
Code:
ld BC, $20
...
@VRAMDataLdLoop:
ld A, (DE)
inc DE
ld (HL+), A
dec BC
ld A, C
or B
jr nz, @VRAMDataLdLoop
Why are you using all 16 bits of BC if you're only copying 32 bytes? If you do this instead:
Code:
ld B, $20
...
@VRAMDataLdLoop:
ld A, (DE)
inc DE
ld (HL+), A
dec B
jr nz, @VRAMDataLdLoop
...you save 4 + (12 * 32) = 388 cycles.
EDIT: Also, a bit more food for thought. If you make sure that your source data in VRAMDataLdLoop never crosses a 256 byte boundary, you can change that inc DE to inc E, saving you an additional 4 cycles per iteration. Again, that's probably not necessary here, but thinking like that is always good with the GB.
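For reference, a halt-based vblank wait of the kind recommended above could look roughly like the sketch below. The linked method is not reproduced here, so this is only one common pattern, written under a few assumptions: the vblank interrupt is enabled, interrupts are on, and the vblank handler sets the HRAM flag `vblank` to 1 as the handler posted earlier does. The label `WaitForVBlank` is made up for illustration.

```
WaitForVBlank:          ; hypothetical routine name
    xor A
    ldh (vblank), A     ; clear the flag that the vblank handler sets to 1
@sleep:
    halt                ; CPU sleeps until an enabled interrupt fires
    nop                 ; commonly placed after halt as a precaution
    ldh A, (vblank)
    or A                ; zero means some other interrupt woke us up
    jr z, @sleep        ; so go back to sleep
    ret
```

The main loop would call this once per frame after finishing its game logic, so no CPU time is spent polling LY.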
-
- Posts: 18
- Joined: Fri Jul 24, 2015 1:30 pm
Re: VRAM copy in WRAM
Thanks for the reply. However...
I DO transfer during HBlank, which you did condone. That brings me to my next point - I can't use HALT as H-Blank isn't treated as an interrupt. That loop looks for both V-Blank and H-Blank, so I don't see the point of using halt.
But thanks for the tip - I don't see corrupt graphics per se, however, bear in mind that's cause my main sprite tiles are occupying the first 4 tiles. Maybe it runs out of time whilst it's on the 143rd tile or something, and that's not being used?
The push and pop of AF was totally useless though LOL definitely cut those out.
Also, I was trying to transfer a new copy occupying the entire VRAM memory - that's 800 bytes. That definitely crosses the 256 byte boundary thus I use BC as a counter.
In my game loop, I first calculate any changes in the main sprite's location, just x/y coordinates. Afterwards I check for collision (where I do some weird calculations, but it's not too hefty). Then, IF my character was attacking (I have a variable if he's in the attack animation, if so, input processing is skipped entirely) which isn't too hefty. That's it. I seriously doubt it's anywhere NEAR 66000 cycles.
-
- Posts: 271
- Joined: Sun Mar 27, 2011 10:49 am
- Location: Victoria, BC
Re: VRAM copy in WRAM
Naw naw naw, I meant don't transfer during hblank. It's extremely short - only around 200 cycles - so you'll barely increase the number of bytes transferred, and you're burning through the time you should be spending on your game logic waiting for it. In fact, by my math, with all the overhead in your loop, you're only going to get one or two bytes per hblank in (there's also a race condition, in that you might detect that you're in hblank, but by the time you actually get to write your byte you're out of it and the write will break).
I'm still not exactly sure what you're trying to do - are you drawing your weapon tile data over top of the player tile data in WRAM? Why not use separate sprites for the player and the weapon? Then you could just blast tile data from ROM to VRAM directly. That's substantially less than 800 bytes to upload per vblank (aside: "a new copy occupying the entire VRAM memory - that's 800 bytes"; the entire VRAM is actually 8K (8192), and the tile data specifically is 6K) and would probably make your game loop nicer too.
Re: VRAM copy in WRAM
adam_smasher wrote: Why not use separate sprites for the player and the weapon?
Probably so as not to exceed 10 sprites per scanline, 40 sprites per scene, and 2048 bytes of sprite VRAM.
-
- Posts: 18
- Joined: Fri Jul 24, 2015 1:30 pm
Re: VRAM copy in WRAM
adam_smasher wrote: Naw naw naw, I meant don't transfer during hblank. [...]
Lol sorry I misread. Alright, I'll take out the H-Blank spin; still though, my game would engage in it only if it's done with the calculations. I'll go through it with the debugger tonight and see how many times it steps through that H/V-Blank loop, cause, right, if it's done with calculations and there's time before V-Blank, surely it will copy a few bytes?
Also, I meant $800 bytes, my bad. Which in decimal, as tepples said above, would be 2048 bytes (I wish it was 8k). $8000-$8800 (at least in my implementation) is sprite tile data. I have it set in $FF40 that my background tiles start at $8800+ so that area is irrelevant in terms of sprite manipulation.
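For a sense of scale, here is the 16-bit copy loop from earlier in the thread annotated with per-instruction costs. The counts are the usual DMG instruction timings, in the same clock-cycle units as the 4560-cycle vblank figure quoted above. Treat the totals as rough estimates: they ignore call/return overhead and the OAM DMA.

```
@VRAMDataLdLoop:
    ld A, (DE)              ; 8 cycles
    inc DE                  ; 8
    ld (HL+), A             ; 8
    dec BC                  ; 8
    ld A, C                 ; 4
    or B                    ; 4
    jr nz, @VRAMDataLdLoop  ; 12 when taken (8 on the final pass)
                            ; about 52 cycles per byte copied
```

At about 52 cycles per byte, one vblank has room for roughly 4560 / 52 = 87 bytes with this loop. Copying $800 bytes would need about 2048 * 52 = 106496 cycles, which is more than an entire frame (154 lines * 456 cycles = 70224 cycles), so a full copy of the sprite tile area cannot fit in a single vblank no matter how the loop is tuned.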