
All times are UTC - 7 hours





Posted: Fri Jun 03, 2016 5:40 pm

Joined: Sat Jul 12, 2014 3:04 pm
Posts: 946
Code:
<Bushytail>   beam chasing refers more to the beam that makes scanlines on a CRT, and trying to make code for graphical effects fast enough to fit
<Bushytail>   it comes up more on the 2600 for obvious reasons

This got me wondering just how many HBlank writes one can stuff into the NES. I chose to do this without WRAM, because that makes it actually possible to run continuously: rewriting an unrolled loop in WRAM takes a lot longer, and picture time is greater than VBlank, so there is very much not enough time in VBlank to rewrite all the HBlank code.

So, the biggest way to do it is to have a 240-byte table for the address and for the data of each write. Initially I was thinking only of PPU registers, not mapper registers…which would require more than one byte of rewritable address.

Code:
; Maximum beam chasing...in RAM.
; HBlank is 28⅓ CPU cycles; the rest of the line is 85⅓ cycles.
; Numbers in parentheses are running cycle counts.
    ldx #16             ; end of VBlank; X runs 16..255, one pass per scanline (240 total)
looptop:                ; (8)
    lda write1addr,x    ; 256-byte tables of where we want each write
    sta wr1+1           ; (16) patch the low address byte of the store
    lda write2addr,x
    sta wr2+1           ; (24)
    lda write3addr,x
    sta wr3+1           ; (32)
    lda write4addr,x
    sta wr4+1           ; (40)
    lda write5addr,x
    sta wr5+1           ; (48)
    lda write6addr,x
    sta wr6+1           ; (56)
    lda write4,x        ; and what we want written
    sta re4+1           ; (64) patch the immediate operand
    lda write5,x
    sta re5+1           ; (72)
    lda write6,x
    sta re6+1           ; (80)

    lda write1,x        ; (84)
    ldy write2,x        ; (88)
    sty re2+1           ; (92)
    ldy write3,x        ; (96)
    stx $FF             ; (100) save the line counter
re2: ldx #$00           ; value overwritten
wr1: sta $2000          ; low address byte overwritten; write cycle is when HBlank begins (103+1)
wr2: stx $2000          ; low address byte overwritten (5)
wr3: sty $2000          ; low address byte overwritten (9)
re4: lda #$00           ; value overwritten (11)
wr4: sta $2000          ; low address byte overwritten (15)
re5: lda #$00           ; value overwritten (17)
wr5: sta $2000          ; low address byte overwritten (21)
re6: lda #$00           ; value overwritten (23)
wr6: sta $2000          ; low address byte overwritten (27)
    ; …and we're out of HBlank time. One cycle (and one-third) of leeway.
    ldx $FF             ; (3) restore the line counter; non-blank time from here
    inx                 ; (5)
    bne looptop         ; (8)
; So, 102+28 if we're perfect: we have a cycle to spare in HBlank, but none outside it.
; 92+28 if we're executing out of ZP.

Should probably unroll ×3 at least, just to make it easy to deal with the third-cycles.
Obviously, if one is executing out of WRAM one could unroll it all the way and just use ld# imm to easily fit, but that requires WRAM. I want to see if it can fit in ZP in such a way, because that makes it easy to "bankswitch" our arbitrary tables (rewrite the 12 values, which is relatively easy to fit in VBlank).
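
As a rough illustration of the third-cycle problem: an NTSC scanline is 341 PPU dots and the CPU gets one cycle per 3 dots, so one line is 113⅔ CPU cycles and three lines are exactly 341. The sketch below (Python, with a hypothetical helper name) shows one way a 3× unrolled loop could split that into whole-cycle line lengths. The 114/114/113 ordering is an assumption; any arrangement that sums to 341 would do.

```python
import math
from fractions import Fraction

DOTS_PER_LINE = 341       # PPU dots per NTSC scanline
DOTS_PER_CPU_CYCLE = 3    # NTSC: one CPU cycle per 3 PPU dots

def line_lengths(lines):
    """Whole CPU cycles to spend on each line so drift stays under one cycle."""
    per_line = Fraction(DOTS_PER_LINE, DOTS_PER_CPU_CYCLE)
    lengths, spent = [], 0
    for n in range(1, lines + 1):
        total = math.ceil(per_line * n)   # cycles elapsed by the end of line n
        lengths.append(total - spent)
        spent = total
    return lengths

print(line_lengths(3))   # [114, 114, 113]
```

The pattern repeats every three lines, which is why unrolling by a multiple of 3 removes the need for any per-line correction.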

You can get two more writes if you're doing the $2006/$2005/$2005/$2006 thing, but obviously you have to find some cycles to put them in.
Presently 91 bytes, and those 92+28 cycles (if ZP)...unrolling 3 times will drop some cycles, and make it easier to deal with the ⅔ cycle per line accruing.
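
To put the cycle totals quoted above next to the length of a scanline, here is a small check. It takes the 102+28 and 92+28 figures from the listing's closing comments at face value (they are not re-derived here) and assumes NTSC timing of 341/3 CPU cycles per line.

```python
from fractions import Fraction

SCANLINE = Fraction(341, 3)   # CPU cycles per NTSC scanline (113 2/3)

def overrun(outside_hblank, inside_hblank):
    """Cycles by which one loop pass exceeds one scanline (negative means slack)."""
    return outside_hblank + inside_hblank - SCANLINE

ram = overrun(102, 28)   # loop patching code in ordinary RAM
zp = overrun(92, 28)     # loop patching code in zero page

print(f"RAM version: {float(ram):+.2f} cycles, ZP version: {float(zp):+.2f} cycles")
```

Both versions come out over budget under these assumptions, which is what motivates giving up some of the rewritable tables.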

(pre-post edit: save a cycle by pointing the "save x" store at the operand of the final load-x and making that an ldx #imm; this also means not having to reserve a ZP slot for that.)

Of course, if we fix two of the writes to scroll registers, that will save the rewriting which-register-bytes…which is enough to drop it to fit a 3-unrolled into ZP, and also get it actually fitting under the cycle count, though sync cycles still need to be considered…
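
A rough estimate of what dropping tables buys, assuming the ZP version's quoted 120 cycles and 91 bytes, and standard 6502 costs for one patch pair (lda abs,x is 4 cycles without a page crossing and 3 bytes; sta zp is 3 cycles and 2 bytes). Tripling the single-pass size is a crude stand-in for a real 3× unroll, and the helper name is hypothetical.

```python
from fractions import Fraction

SCANLINE = Fraction(341, 3)       # CPU cycles per NTSC scanline
ZP_CYCLES, ZP_BYTES = 120, 91     # figures quoted in the post (ZP version)
PATCH_CYCLES = 4 + 3              # lda abs,x (no page cross) + sta zp
PATCH_BYTES = 3 + 2               # same pair, in bytes

def after_dropping(tables, unroll=3):
    """Return (cycles per line, slack per line, approximate unrolled size)."""
    cycles = ZP_CYCLES - tables * PATCH_CYCLES
    size = (ZP_BYTES - tables * PATCH_BYTES) * unroll
    return cycles, SCANLINE - cycles, size

for tables in (0, 2, 3):
    cycles, slack, size = after_dropping(tables)
    print(f"{tables} tables dropped: {cycles} cycles, "
          f"slack {float(slack):+.2f}, about {size} bytes unrolled x3")
```

With two tables dropped the estimate lands under both the 113⅔-cycle line and the 256-byte zero page, in line with the claim above; sync cycles are not modelled.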

edit: or fix two to "disable render enable render", which makes for THREE ditched tables (2xaddr, 1xdata for the disable-render value)...but at cost of true-arbitrary writes.

edit2: added leading explanation. It occurs to me that a CHR bankswitch might be a desired write as well, which would require making one of the writes have its hi-address rewritable. Also fixed the ldx, as there are only 240 scanlines to write. :oops:

edit3,4: In sum: "[How] Can we fit six arbitrary Hblank (PPU register/CHR bank/VRAM) writes in every scanline every frame? If not, how much freedom needs sacrificing to fit them in?"


Last edited by Myask on Fri Jun 03, 2016 7:07 pm, edited 3 times in total.

 
Posted: Fri Jun 03, 2016 5:52 pm

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10112
Location: Rio de Janeiro - Brazil
I have no idea what you're trying to do. Care to write an introduction to your post to give it some context?


 
Posted: Fri Jun 03, 2016 7:17 pm

Joined: Fri May 08, 2015 7:17 pm
Posts: 1823
Location: DIGDUG
I feel like this post was written by some kind of automated nesdev post bot. It contains words that you would find in a typical post, but not in any order that makes any sense to me.

Quote:
Can we fit six arbitrary Hblank (PPU register/CHR bank/VRAM) writes in every scanline


Edit...my rough math says Hblank is about 30 cycles...
I suppose, sta stx sty is about 12 cycles, lda sta 8 cycles, lda sta 8 cycles...5 writes, per Hblank with timed code.
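
A quick way to count how many writes fit in a given HBlank budget. This sketch assumes the values for the first three writes are already sitting in A, X and Y (a 4-cycle absolute store each) and that later writes use an immediate load plus a store (2+4 cycles), as in the listing in the first post. The function name and parameters are hypothetical.

```python
STORE_ABS = 4   # sta/stx/sty absolute
LOAD_IMM = 2    # lda #imm

def writes_that_fit(budget, preloaded=3, first_write_on_edge=True):
    """Return (number of writes, cycles used) within `budget` whole cycles."""
    used = writes = 0
    while True:
        cost = STORE_ABS if writes < preloaded else LOAD_IMM + STORE_ABS
        if writes == 0 and first_write_on_edge:
            cost = 1   # only the first store's final (write) cycle is in HBlank
        if used + cost > budget:
            return writes, used
        used += cost
        writes += 1

print(writes_that_fit(28))                             # (6, 27)
print(writes_that_fit(28, first_write_on_edge=False))  # (5, 24)
```

Starting the first store before HBlank so that only its write cycle lands inside is what moves the count from five to six under these assumptions.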

Even if you can time code for the entire screen, that gives you no rendering time for game logic, so is this for some kind of tech demo that changes a BG color every scanline?

(Edited) Disch in another post says Hblank is "28 cycles" long.

_________________
nesdoug.com -- blog/tutorial on programming for the NES


Last edited by dougeff on Fri Jun 03, 2016 7:52 pm, edited 3 times in total.

 
Posted: Fri Jun 03, 2016 7:35 pm

Joined: Sat Jul 25, 2015 1:22 pm
Posts: 501
dougeff wrote:
I feel like this post was written by some kind of automated nesdev post bot. It contains words that you would find in a typical post, but not in any order that makes any sense to me.

Quote:
Can we fit six arbitrary Hblank (PPU register/CHR bank/VRAM) writes in every scanline
LOL this is pretty funny. I want to try...


Quote:
Does MMC5 allow horizontal VRAM updates on consecutive odd cycle sprite NMIs?


I'm sorry, no offense intended and please split me if this is memeworthy but I found dougeff's post very humorous.


Last edited by darryl.revok on Fri Jun 03, 2016 7:48 pm, edited 2 times in total.

 
Posted: Fri Jun 03, 2016 7:42 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 6442
Location: UK (temporarily)
What Myask said is true; there are 341 pixels, 256 during rendering and 85 during hblank, so there are 28⅓ instruction cycles during hblank.

Except that whether we can use all 85 cycles during hblank depends on the nature of the raster effect. We might have as few as 62 pixels. (The light blue area on Ulfalizer's timing diagram; while the PPU is fetching patterns for sprites). Afterwards, we might collide with the tile fetches for the left-most two columns.

Also, the relative alignment of the CPU and PPU means that we'll rarely get all 28 (or 20) cycles; we probably only actually have 27 (or 19), even given precision to single cycles.

Cycle-perfect timing means the first write can finish on the first cycle of hblank. That leaves 26/18 cycles for any subsequent loads and stores.
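
The figures in this post can be reproduced with a few lines of arithmetic. The sketch assumes NTSC timing (3 PPU dots per CPU cycle) and simply encodes the two one-cycle deductions described above; the function name is hypothetical.

```python
from fractions import Fraction

def hblank_budget(dots):
    """Return (exact cycles, usable whole cycles, cycles left after the first write)."""
    exact = Fraction(dots, 3)        # 3 PPU dots per CPU cycle (NTSC)
    whole = dots // 3                # 28 for 85 dots, 20 for 62 dots
    usable = whole - 1               # one cycle lost to CPU/PPU alignment
    after_first_write = usable - 1   # first write finishes on the first HBlank cycle
    return exact, usable, after_first_write

print(hblank_budget(85))   # full HBlank
print(hblank_budget(62))   # sprite pattern fetch window only
```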


 
Posted: Fri Jun 03, 2016 7:42 pm

Joined: Sat Jul 25, 2015 1:22 pm
Posts: 501
Awww you edited it. I suppose I'll have to edit mine too. Anyway, yeah I see you figured out the context of the sentence.

Here's a thread from when I first started playing with them: viewtopic.php?f=2&t=13360. The discussion convinced me to limit myself to techniques that don't require disabling rendering during a scanline, for gameplay at least. There are specific situations where it could be useful, like a scroll bar, but for the most part it doesn't seem feasible in gameplay. A generic solution that pushes the technique as far as the hardware allows therefore isn't particularly useful, as it would have to be tailored to when and where something like a palette change could happen.

If you could somehow figure out a way to fit one palette color change in hBlank, that could be big, but that alone would take some magic. People keep figuring out new things all of the time though.


 
Posted: Fri Jun 03, 2016 7:45 pm

Joined: Fri May 08, 2015 7:17 pm
Posts: 1823
Location: DIGDUG
Sorry, you guys are too quick for me. I restored the original, so the replies make sense.

I have a bad habit of posting before I've fully thought through the comment, and then editing my comment after it's been posted.

Another edit...If the first write to PPU is the first half of a PPU address, then "yes" you can fit 6 in a Hblank, the first occurring just prior to it.

_________________
nesdoug.com -- blog/tutorial on programming for the NES


Last edited by dougeff on Fri Jun 03, 2016 8:02 pm, edited 1 time in total.

 
Posted: Fri Jun 03, 2016 7:59 pm

Joined: Sun Apr 13, 2008 11:12 am
Posts: 6442
Location: UK (temporarily)
Myask wrote:
"[How] Can we fit six arbitrary Hblank (PPU register/CHR bank/VRAM) writes in every scanline every frame? If not, how much freedom needs sacrificing to fit them in?"
Look at my annotated Indiana Jones title screen. 7 writes on three scanlines in a row, five clusters each vsync. (Yes, the first write takes advantage of the first write to $2006 being buffered)

... Oh, man, I don't know/keep forgetting the offset between PPU cycles and when pixel N shows up on-screen. That makes this annoying to say anything useful.

The last write (enable) is timed such that it has just a little clearance (pixel 311 at latest) before the first real nametable fetch restarts (cycle 320) ... and they've also got the "conceal leftmost 8 columns" bit on, whatever that means.


 
Posted: Fri Jun 03, 2016 8:48 pm
Site Admin

Joined: Mon Sep 20, 2004 6:04 am
Posts: 3482
Location: Indianapolis
Check out this attachment if you want to see some old and ugly code where I tried to pull something like this off. It does no fewer than 8 PPU register writes every scanline. It tries to shut the screen off early on every line, so the usable horizontal size is actually smaller than the NES screen. It was only partly successful: it worked some of the time. I didn't understand why at the time, but I think it must have been from the differing CPU/PPU alignments on power-up. IIRC, moving it back or forward by 1 cycle hosed everything up, so it was a fun little experiment that almost worked. I'm sure it could have been done better.


Attachments:
fautest.zip [19.35 KiB]
 
Posted: Mon Jun 06, 2016 6:42 am

Joined: Sat Jul 25, 2015 1:22 pm
Posts: 501
Just a random thought on this matter.

If you really want to cram as many operations as possible in a scanline, I think you're going to have to write machine code into RAM, and load absolute values.
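
A minimal sketch of what "machine code in RAM with immediate values" could look like when generated offline. It emits one `lda #value / sta addr` pair per write using the standard 6502 opcodes $A9 and $8D. The function name and the example register/value pairs are made up for illustration, and timing padding between lines is ignored.

```python
LDA_IMM = 0xA9   # lda #imm
STA_ABS = 0x8D   # sta absolute

def emit_writes(writes):
    """Assemble `lda #value / sta addr` for each (addr, value) pair."""
    code = bytearray()
    for addr, value in writes:
        code += bytes([LDA_IMM, value & 0xFF])
        code += bytes([STA_ABS, addr & 0xFF, addr >> 8])   # little-endian address
    return bytes(code)

# Hypothetical example: six writes for one scanline.
one_line = emit_writes([(0x2005, 0x10), (0x2005, 0x00), (0x2000, 0x88),
                        (0x2001, 0x1E), (0x2005, 0x20), (0x2005, 0x00)])
print(len(one_line), "bytes per line,", len(one_line) * 240, "bytes for 240 lines")
```

At 5 bytes per write, six writes on each of 240 lines comes to 7200 bytes, well past the console's 2 KiB of internal RAM, so a fully unrolled version along these lines would need cartridge WRAM.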

I'd be interested in seeing some developments in the raster arena that fit into game design. Doing fancy things in this department often requires some significant trade-off of cycles, inflexibility, or difficulty in implementation.

Perhaps a technique to safely cram updates after an already timed scroll split could be useful. Doing something like a status bar or some minimal parallax isn't often too hard. I started thinking about this last night and considered doing the same as I described, with absolute values in RAM, so that I could assign a unique palette to my top status bar.


 