It is currently Mon Nov 20, 2017 10:29 am

All times are UTC - 7 hours





Posted: Wed Oct 18, 2017 9:16 pm
Joined: Sun Apr 13, 2008 11:12 am
Posts: 6446
Location: UK (temporarily)
VRC1, Bisqwit's MMC4 subset (doesn't use tile switching), and NINA-001 seem to all be reasonable ways to get 4+4 CHR banking with PRG banking, for less overhead than MMC1.

Other than the complication of possibly needing to handle interrupting MMC1 writes, I'm not clear the extra 4×(4+2)=24 cycles per bankswitch is particularly worth caring about, though.
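
For reference, a minimal sketch of where the 4×(4+2) figure comes from, assuming the MMC1 PRG bank register is written through $E000 and that A holds the 5-bit register value. The label mmc1_set_prg is hypothetical and does not come from anyone's code in this thread.

```
; Hypothetical MMC1 PRG bankswitch. In: A = 5-bit value for the PRG bank register.
; MMC1 registers are loaded serially: each write shifts in bit 0 of A.
mmc1_set_prg:
  sta $E000   ; 4 cycles, bit 0 (a single-write mapper would be done here)
  lsr a       ; 2 cycles
  sta $E000   ; 4 cycles, bit 1 (extra pair 1)
  lsr a
  sta $E000   ; bit 2 (extra pair 2)
  lsr a
  sta $E000   ; bit 3 (extra pair 3)
  lsr a
  sta $E000   ; bit 4 (extra pair 4); this fifth write commits the value
  rts
```

The four extra lsr a / sta pairs cost 4 × (2 + 4) = 24 cycles compared with a mapper that takes the bank number in a single write.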


Posted: Wed Oct 18, 2017 9:42 pm
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10114
Location: Rio de Janeiro - Brazil
lidnariq wrote:
I'm not clear the extra 4×(4+2)=24 cycles per bankswitch is particularly worth caring about, though.

Depends on how often you switch banks. If you do it 20 times or more per frame, it really starts to add up! 20 x 24 = 480 cycles, which is about 1.8% of the roughly 27,280 CPU cycles in the 240 scanlines of picture (NTSC). And it should actually be a little slower than that, because if you also switch banks in the NMI (for playing music, for example), the bankswitching routine used in the main thread has to do some additional steps to verify whether it was interrupted.
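
One common way to do that verification is a retry flag. The sketch below is an assumption about how such a routine could look, not code from this thread: prg_bank, mmc1_interrupted, set_prg_bank_safe and nmi_bankswitch_prologue are hypothetical names, and the zero-page addresses are placeholders.

```
; Sketch only. Hypothetical variables (addresses are placeholders):
prg_bank         = $10   ; bank the main thread wants mapped in
mmc1_interrupted = $11   ; set nonzero by the NMI handler when it uses the MMC1

; Main thread. In: A = PRG bank number
set_prg_bank_safe:
  sta prg_bank
@retry:
  lda #0
  sta mmc1_interrupted   ; clear the flag before starting the 5-write sequence
  lda prg_bank
  sta $E000
  lsr a
  sta $E000
  lsr a
  sta $E000
  lsr a
  sta $E000
  lsr a
  sta $E000
  lda mmc1_interrupted   ; did an NMI land in the middle of the sequence?
  bne @retry             ; yes: the sequence was cut short, so redo it
  rts

; Called from the NMI handler before it does its own bank writes
nmi_bankswitch_prologue:
  lda #$80
  sta $8000              ; bit 7 set: resets the MMC1 shift register
                         ; (side effect: also ORs the control register with $0C)
  sta mmc1_interrupted   ; any nonzero value tells the main thread to retry
  rts
```

With this scheme the NMI handler would also need to map prg_bank back in before it returns. The flag handling adds a few loads, stores and a branch on top of the 24 extra cycles, which matches the "a little slower" estimate.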


Posted: Wed Oct 25, 2017 2:18 pm
Joined: Thu Apr 23, 2009 11:21 pm
Posts: 803
Location: cypress, texas
unregistered, on top of page 92 wrote:
edit2 20171006: thought it would be good to say that the code I mentioned "posted above" was reduced to
Code:
.rept 256
  .db <$
.endr
because each of the first 15 banks in our game always starts at $8000
Just wanted to say that this code was reduced by one more character:
Code:
.rept 256
  .dl $
.endr
That works with the asm6 assembler. :) edit: it fills a page of ROM with the low byte of each address, without using any assembly instructions. This helps implement tokumaru's idea: it allows a "txy" using ldy $8000, x and a "tyx" using ldx $8000, y. Each takes one more byte than txa tay or tya tax, but tokumaru says his idea doesn't use the accumulator, which may save space when you would otherwise have to save and restore the accumulator. After rewriting the sections of code where this might be used, it turned out to be beneficial for me to use txa tay, so I haven't been able to use this yet. :D
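
A minimal sketch of the table and the two register copies, assuming the table sits at the very start of a bank mapped at $8000. The label identity_table is hypothetical.

```
; Assumption: the table is the first thing in a bank mapped at $8000,
; so entry N lives at $8000+N and holds the value N.
.org $8000
identity_table:          ; hypothetical label
.rept 256
  .dl $                  ; low byte of the current address: $00, $01, ... $FF
.endr

; Inline at the point of use; neither touches the accumulator:
  ldy identity_table,x   ; Y = X ("txy"): 3 bytes, 4 cycles
  ldx identity_table,y   ; X = Y ("tyx"): 3 bytes, 4 cycles
```

Because the table is page-aligned, the indexed reads never cross a page, so they stay at 4 cycles. That is the same time as txa tay, at the cost of one more byte.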

Posted: Thu Oct 26, 2017 10:33 am
Joined: Thu Apr 23, 2009 11:21 pm
Posts: 803
Location: cypress, texas
tokumaru, on page 43, wrote:
Setting the scroll should ALWAYS be the very last thing in your VBlank handler.
This is not true for our game. :) The end of my VBlank handler looked something like this:
Code:
  jsr update_vram

  ;modify the flag
  inc FrameReady

SkipUpdates:
  jsr FamiToneUpdate
  ;"Setting the scroll should ALWAYS be the very last thing in your VBlank handler." -tokumaru pg 43
  lda needscroll
  beq +end
  lda FORWARD_scroll
  beq +
  jsr scroll_screen
  jmp +end
+ jsr scroll_screen_left

+end ;return from the NMI (vblank)
  pla
  tax
  pla
  tay
  pla
  rti
Now, after being blessed with fixing a lot of the problems we found, I turned on the music and the screen started being lowered two rows (16 pixels) whenever draw_our_column was being called. That reminded me of tepples' guidance: t gets clobbered every time $2006 and $2007 are written to, so I should be sure to write $2000 and $2005 afterwards to fix t (his post is linked to at the top of page 91). After checking, it was clear that $2000 and $2005 were being written to, inside a scroll_screen routine, on every frame in which $2006 and $2007 were being written to somewhere inside update_vram, so it seemed that jsr FamiToneUpdate was taking too long and the scroll writes were landing after vblank had ended. So, after finding this earlier advice from tokumaru, also on page 43, I tried moving jsr FamiToneUpdate to a spot right after +end, because it doesn't include PPU operations. The game works, and the screen no longer lowers 16 pixels after draw_our_column is called inside update_vram while the music is playing!! :D
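
A sketch of the reordered ending, reusing the labels from the listing earlier in this post. Only the position of jsr FamiToneUpdate changes. This is reconstructed from the description and is not the actual revised code.

```
  jsr update_vram        ; the $2006/$2007 writes happen in here

  ;modify the flag
  inc FrameReady

SkipUpdates:
  lda needscroll
  beq +end
  lda FORWARD_scroll
  beq +
  jsr scroll_screen      ; writes $2000/$2005 while still in vblank
  jmp +end
+ jsr scroll_screen_left

+end
  jsr FamiToneUpdate     ; moved: no PPU access, so it can run after vblank ends
  pla
  tax
  pla
  tay
  pla
  rti
```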

edit: tokumaru, I understand now that there was a hidden reason behind your words that are quoted above... posted this to help others who read your quoted statement. :) I really respect you tokumaru; thank you so much for all of your fantastic help and ideas! :D

edit3: changed a "to" to "too" and changed text referring to page 92 to refer to page 91. Two mistakes corrected :)


Last edited by unregistered on Fri Oct 27, 2017 8:42 am, edited 2 times in total.

Posted: Thu Oct 26, 2017 2:05 pm
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10114
Location: Rio de Janeiro - Brazil
Correcting myself: setting the scroll should be the last PPU-related thing in your vblank handler. And you must keep track of how much time your vblank handler takes, because the scroll has to be set before the vertical blank ends. After setting the scroll, you're free to do things that do not affect rendering, such as playing music, reading the controllers, and whatever else you need to do at 60/50Hz.

If you know what you're doing, you can do a number of PPU operations outside of vblank, but if you're not sure, it's better to ask.
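
A minimal sketch of that ordering in a generic NMI handler. Everything named here is an assumption for illustration: the shadow variables (ppu_ctrl_shadow, scroll_x, scroll_y), the OAM buffer at $0200, and the routines flush_vram_buffer, update_music and read_controllers (shown as empty stubs) do not come from this thread.

```
ppu_ctrl_shadow = $20    ; hypothetical zero-page variables
scroll_x        = $21
scroll_y        = $22

nmi_handler:
  pha                    ; save A, X, Y
  txa
  pha
  tya
  pha

  ; 1. PPU work first, while vblank lasts
  lda #$00
  sta $2003              ; OAM address = 0
  lda #$02
  sta $4014              ; sprite DMA from $0200-$02FF (assumed buffer page)
  jsr flush_vram_buffer  ; all $2006/$2007 writes

  ; 2. Scroll is the last PPU-related step
  bit $2002              ; reset the $2005/$2006 write toggle
  lda ppu_ctrl_shadow
  sta $2000              ; includes the nametable select bits
  lda scroll_x
  sta $2005
  lda scroll_y
  sta $2005

  ; 3. Work that does not affect rendering may run past the end of vblank
  jsr update_music
  jsr read_controllers

  pla                    ; restore Y, X, A
  tay
  pla
  tax
  pla
  rti

flush_vram_buffer: rts   ; hypothetical stubs so the sketch assembles
update_music:      rts
read_controllers:  rts
```

Putting the music and controller code in step 3 means a slow music frame can no longer delay the scroll writes in step 2.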

Posted: Fri Nov 17, 2017 2:44 pm
Joined: Thu Apr 23, 2009 11:21 pm
Posts: 803
Location: cypress, texas
^Thanks tokumaru!! :D

tokumaru, your page of low bytes of the address has helped me save two cycles and a zero page variable! :D

in some code like this:
Code:
  ;sty ejectLRvalue
  ;code affected by y here
  lda altoX
  ldx FORWARD
  beq +zero
    clc
    adc $8000, y ;will add whatever value is in y :) ;was adc ejectLRvalue
    ;rest of addition to 16bit value altoX here
    jmp DrawSprite
  +zero
    sec
    sbc $8000, y ;was sbc ejectLRvalue
    ;rest of subtraction from 16bit value altoX here
DrawSprite:

adc $8000, y takes 4 cycles and is 3 bytes
adc ejectLRvalue took 3 cycles and was 2 bytes
But being able to comment out the sty ejectLRvalue saved 3 cycles and 2 bytes! :) So the code is the exact same size with the changes (the adc and the sbc each grew by one byte) and it is 2 cycles faster, because it will never run both sides of the branch! :D
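
The same tally written out next to the instructions. This is only an annotated comparison of the lines that changed, with the rest of the routine elided. It assumes Y still holds the original value when the adc or sbc runs.

```
; Before: Y is parked in a zero-page variable
  sty ejectLRvalue   ; 2 bytes, 3 cycles (always runs)
  ; ...
  adc ejectLRvalue   ; 2 bytes, 3 cycles (FORWARD != 0 path)
  ; ...
  sbc ejectLRvalue   ; 2 bytes, 3 cycles (FORWARD == 0 path)
; total: 6 bytes, 3 + 3 = 6 cycles on whichever path runs

; After: Y is read back through the identity page at $8000
  adc $8000, y       ; 3 bytes, 4 cycles (FORWARD != 0 path)
  ; ...
  sbc $8000, y       ; 3 bytes, 4 cycles (FORWARD == 0 path)
; total: 6 bytes, 4 cycles on whichever path runs
```

Since the page at $8000 is page-aligned, $8000 + Y never crosses a page boundary, so the absolute,Y reads do not pick up the extra page-crossing cycle.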

