It is currently Wed Nov 21, 2018 11:32 am

All times are UTC - 7 hours





PostPosted: Sat Apr 07, 2018 10:11 am

Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7580
Location: Chexbres, VD, Switzerland
NewRisingSun wrote:
I must say I find occasional sprite pop-up less annoying than constantly-black left 8 columns, especially if the graphics obviously weren't designed for it. A Boy and His Blob is the perfect example: thanks to the black bar on the left, the entire red border becomes asymmetric. Ugly.

On a real TV you won't notice any black border, especially on a CRT. However, if the graphics aren't designed around it, that's another problem. I must say 31 columns is not a very easy number to deal with.


PostPosted: Sun Apr 08, 2018 12:04 pm

Joined: Wed Apr 04, 2018 7:29 pm
Posts: 38
Location: Montreal, Canada
Hi!

First of all, thanks everyone for the advice. I'm not going to reply to everyone, but I did read the whole thing. I managed to make it all work, and the code is somewhat elegant, I think.

One thing I would recommend to anyone doing this kind of thing is to take a few hours to create a little reference implementation in a language that is more expressive/flexible than ASM. I made myself a little C# control that behaves exactly like a PPU and can show me which tiles/attributes are updated (red = tile, yellow = attribute). It allowed me to figure out what my algorithm was going to be, and then I simply translated it into ASM. When I had bugs in the ASM, I could simply compare the two and figure out where things went wrong. See the attached image. If Mesen could do this, it would be awesome.
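
A reference model like this can be quite small. The following is a minimal Python sketch (not the C# control from the post; the class `MiniPPU` and its method names are hypothetical). It assumes the only things that matter are the VRAM increment flag in $2000, the two-write $2006 address latch and $2007 data writes, and it ignores mirroring, palettes, $2002 latch resets and rendering.

```python
class MiniPPU:
    """Toy model of PPU nametable writes via $2000, $2006 and $2007.

    Hypothetical reference helper: models only the VRAM increment flag
    (bit 2 of $2000), the two-write $2006 address latch and $2007 data
    writes. Mirroring, palettes, $2002 reads and rendering are ignored.
    """

    def __init__(self):
        self.vram = bytearray(0x1000)  # $2000-$2FFF, four unmirrored nametables
        self.addr = 0x2000
        self.increment = 1
        self.high_byte_next = True     # $2006 expects the high byte first
        self.written = set()           # addresses written so far

    def write_2000(self, value):
        self.increment = 32 if value & 0x04 else 1

    def write_2006(self, value):
        if self.high_byte_next:
            self.addr = ((value & 0x3F) << 8) | (self.addr & 0x00FF)
        else:
            self.addr = (self.addr & 0xFF00) | value
        self.high_byte_next = not self.high_byte_next

    def write_2007(self, value):
        address = 0x2000 | (self.addr & 0x0FFF)  # fold into $2000-$2FFF
        self.vram[address - 0x2000] = value
        self.written.add(address)
        self.addr = (self.addr + self.increment) & 0x3FFF

    def classify(self):
        """Split written addresses into tile writes and attribute writes."""
        tiles = {a for a in self.written if (a & 0x3FF) < 0x3C0}
        return tiles, self.written - tiles


# Example: one 30-tile column in column mode.
ppu = MiniPPU()
ppu.write_2000(0x04)   # +32 increment
ppu.write_2006(0x20)
ppu.write_2006(0x05)   # column 5 of the nametable at $2000
for tile in range(30):
    ppu.write_2007(tile)
tiles, attrs = ppu.classify()
print(len(tiles), "tile writes,", len(attrs), "attribute writes")
```

Feeding the same update lists to a model like this and to the real NMI code makes it easy to diff which addresses each one touched.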

One last problem I have is that in extreme conditions, like when moving diagonally while perfectly aligned in X and Y, on a frame where a full row and a full column of tiles AND attributes load in, I exceed the NMI CPU cycle limit by about 400 cycles.

Since I am going to blank the top/bottom 16 scanlines, would it be possible to offload some of the PPU update work there? Like update the palettes there or part of the tiles/attributes? How common is this as a technique?

(I am also aware I could simply change my algorithm to, for example, process just 1 row or column per frame, but I'm too lazy to change that right now.)

-Mat


Attachments:
Scroll.png [ 19.27 KiB ]
PostPosted: Sun Apr 08, 2018 12:27 pm

Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7580
Location: Chexbres, VD, Switzerland
bleubleu wrote:
One last problem I have is that in extreme conditions, like when moving diagonally while perfectly aligned in X and Y, on a frame where a full row and a full column of tiles AND attributes load in, I exceed the NMI CPU cycle limit by about 400 cycles.

I'm fairly sure it should be possible to fit updates in VBlank, assuming you're talking about ONE row and ONE column of 8x8 tiles (not 16x16 metatiles).
You should use bit 2 of $2000 (value $04, which makes the VRAM address increment by 32 after each $2007 access) to your advantage when updating the nametable column. When updating an attribute table column this is more limited, because each +32 step skips 3 attribute rows, but you can still use 4 bulks of 2 bytes instead of 8 bulks of 1 byte.

So you should have the following:

  • Update a nametable row: done in two bulks (because of vertical mirroring, you need to write to two screens), total of 32 bytes
  • Update an attribute table row: done in one bulk of 8 bytes
  • Update a nametable column: done in one bulk, total of 30 bytes (uses column mode)
  • Update an attribute table column: the most annoying one; it has to be done in 4 bulks of 2 bytes (uses column mode)
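
The "4 bulks of 2 bytes" attribute column works because each attribute row is 8 bytes wide, so a +32 step lands 4 attribute rows further down. A small Python sketch of the resulting address plan (the function name and parameters are hypothetical, and it assumes the +32 increment is already enabled):

```python
def attribute_column_bulks(nametable_base, attr_col):
    """Plan an attribute-column update as 4 bulks of 2 bytes.

    nametable_base: $2000, $2400, $2800 or $2C00.
    attr_col: attribute-table column, 0-7.
    Returns (ppu_address, (row_a, row_b)) pairs: set $2006 to ppu_address,
    then write the attribute bytes for rows row_a and row_b. Assumes the
    +32 increment (bit 2 of $2000) is on, so the second write lands 32
    bytes later, i.e. 4 attribute rows (8 bytes each) further down.
    """
    if not 0 <= attr_col <= 7:
        raise ValueError("attribute column must be 0-7")
    attr_table = nametable_base + 0x3C0
    bulks = []
    for first_row in range(4):
        address = attr_table + first_row * 8 + attr_col
        bulks.append((address, (first_row, first_row + 4)))
    return bulks


for address, rows in attribute_column_bulks(0x2000, 3):
    print(f"${address:04X}: rows {rows}")
```

Four address pairs instead of eight is where the saving comes from; the data byte count stays at 8.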

This means, in the absolute worst case, you have to write a new address to $2006 8 times (two writes each), and write 78 bytes of data to $2007. Assuming 4 cycles for a load and 4 cycles for a register write, that's 8*(4+4+4+4) + 78*(4+4) = 752 cycles. Of course more cycles are needed for logic, etc., but this should be doable in VBlank without any further tricks.
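
The same estimate, written out in Python so the numbers can be tweaked. The per-access costs come from the post; the NTSC vblank figure (20 scanlines of 341 PPU dots, 3 dots per CPU cycle) is a rough upper bound that ignores NMI entry overhead and the OAM DMA (513 or 514 cycles).

```python
# Per-access costs assumed in the post: 4 cycles to load a value,
# 4 cycles to store it to a PPU register (e.g. LDA abs + STA $2006).
LOAD_CYCLES = 4
STORE_CYCLES = 4

# Rough NTSC vblank length: 20 scanlines of 341 PPU dots, 3 dots per CPU cycle.
NTSC_VBLANK_CYCLES = 20 * 341 // 3


def update_cycles(address_sets, data_bytes):
    """Cycles spent purely on $2006/$2007 writes (no loop or logic overhead)."""
    # Each new address is two $2006 writes: high byte, then low byte.
    address_cost = address_sets * 2 * (LOAD_CYCLES + STORE_CYCLES)
    data_cost = data_bytes * (LOAD_CYCLES + STORE_CYCLES)
    return address_cost + data_cost


# Worst case from the list: (address sets, bytes) per update type.
updates = {
    "nametable row": (2, 32),
    "attribute row": (1, 8),
    "nametable column": (1, 30),
    "attribute column": (4, 8),
}
total_sets = sum(sets for sets, _ in updates.values())
total_bytes = sum(count for _, count in updates.values())
worst_case = update_cycles(total_sets, total_bytes)
print(worst_case, "of", NTSC_VBLANK_CYCLES)  # prints: 752 of 2273
```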

Quote:
Since I am going to blank the top/bottom 16 scanlines, would it be possible to offload some of the PPU update work there? Like update the palettes there or part of the tiles/attributes? How common is this as a technique?

This technique is uncommon, but was probably made popular by the game Battletoads (and its sequel Battletoads and Double Dragon), which are very popular among NESdevers. Personally, unless I really needed the extra blanking time, I'd rather hide those scanlines either with a blank CHR-ROM bank, or by disabling only the background and placing 8 high-priority sprites at Y=0 to hide the real sprites. That avoids Battletoads-style forced blanking at the top of the screen and all the problems it creates.

Also: if you're aiming for high-quality scrolling, you should hide the top scanlines, not the bottom ones, because sprites can't be partially shown at the top of the screen, but they can at the bottom. Also, turning sprite rendering off mid-frame can cause erratic problems.


PostPosted: Sun Apr 08, 2018 12:31 pm

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10979
Location: Rio de Janeiro - Brazil
If you turn rendering off at the top of the screen, as opposed to using blank tiles like Jurassic Park does, you can indeed use that time to keep accessing VRAM, but there are a couple of catches: Firstly, the NTSC dot crawl pattern will be different, because the variable PPU cycle at the beginning of the frame doesn't happen when rendering is off; Secondly, you don't get to use the MMC3 scanline counter to time the blanking area anymore, because it doesn't work when rendering is off. Sprite 0 hits are also not an option.

If you can deal with the slightly different appearance of the image (IIRC, Battletoads is like this, for example), and you have an alternate way to time the blanking area, then yeah, you can get quite a bit of extra vblank time.


PostPosted: Sun Apr 08, 2018 12:43 pm

Joined: Wed May 19, 2010 6:12 pm
Posts: 2775
Are you using a zero page buffer?


PostPosted: Sun Apr 08, 2018 12:49 pm

Joined: Wed Apr 04, 2018 7:29 pm
Posts: 38
Location: Montreal, Canada
Quote:
You should use bit 2 of $2000 (value $04, which makes the VRAM address increment by 32 after each $2007 access) to your advantage when updating the nametable column. When updating an attribute table column this is more limited, because each +32 step skips 3 attribute rows, but you can still use 4 bulks of 2 bytes instead of 8 bulks of 1 byte.


Right now I split my updates into 3 buffers which use different strides: 1, 8 and 32. The stride-1 and stride-32 buffers use the $2000 increment mode to avoid having to set the address manually. The stride-8 buffer is for attributes and has to be handled manually.
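
A sketch of what the three buffer types cost in register writes, in Python (the helper names and the `ppuctrl` shadow-copy parameter are hypothetical). Strides 1 and 32 need one address pair per buffer; stride 8 has no matching increment mode, so it needs one address pair per byte, which is what the 4-bulks-of-2 trick from the earlier reply reduces.

```python
def flush_buffer(start_address, data, stride, ppuctrl=0x80):
    """Turn one update buffer into the (register, value) writes it needs.

    Hypothetical helper. Strides 1 and 32 map onto the $2000 increment flag
    (bit 2), so the address is set once. Stride 8 (attribute columns) has
    no matching increment mode, so every byte gets its own $2006 pair.
    `ppuctrl` is an assumed shadow copy of $2000 whose other bits are kept.
    """
    writes = []
    if stride in (1, 32):
        increment_bit = 0x04 if stride == 32 else 0x00
        writes.append((0x2000, (ppuctrl & 0xFB) | increment_bit))
        writes.append((0x2006, start_address >> 8))
        writes.append((0x2006, start_address & 0xFF))
        writes.extend((0x2007, b) for b in data)
    elif stride == 8:
        for i, b in enumerate(data):
            address = start_address + i * 8
            writes.extend([(0x2006, address >> 8), (0x2006, address & 0xFF), (0x2007, b)])
    else:
        raise ValueError("stride must be 1, 8 or 32")
    return writes


def count_writes(writes, register):
    return sum(1 for reg, _ in writes if reg == register)


column = flush_buffer(0x2005, bytes(30), 32)
attr_column = flush_buffer(0x23C3, bytes(8), 8)
print(count_writes(column, 0x2006), count_writes(attr_column, 0x2006))
```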

But you are right. I think I will try to avoid generic buffers (which need loops/logic) and instead unroll the common update scenarios (like a full column, etc.) to minimize the update cost.

Quote:
Are you using a zero page buffer?


No. Right, that should save a few cycles. I will look into that.

Quote:
If you turn rendering off at the top of the screen, as opposed to using blank tiles like Jurassic Park does, you can indeed use that time to keep accessing VRAM, but there are a couple of catches...


Thanks. I have a lot to learn...

-Mat


PostPosted: Sun Apr 08, 2018 1:35 pm

Joined: Wed Apr 02, 2008 2:09 pm
Posts: 1251
You can also use the stack instead of a zero page buffer (if you're not already). Then you don't need to do iny or inx (if you are). Just pla, sta $2007, X times.
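
A Python model of why the stack trick needs no index register: PLA increments S first and then reads $0100+S, so pointing S one byte before a buffer in page 1 lets an unrolled pla / sta $2007 run walk the buffer by itself. The helper name is hypothetical, and the sketch assumes real code saves the stack pointer (TSX/STX) before repointing it and restores it (LDX/TXS) afterwards.

```python
def pla_stream(memory, buffer_offset, count):
    """Read `count` bytes from page 1 the way an unrolled pla/sta $2007 run does.

    Hypothetical helper. Assumes the update buffer was written to
    $0100+buffer_offset and that the real stack pointer is saved before S
    is pointed at the buffer and restored afterwards.
    """
    s = (buffer_offset - 1) & 0xFF       # ldx #buffer_offset-1 / txs
    sent = []
    for _ in range(count):
        s = (s + 1) & 0xFF               # PLA pre-increments S (wraps within page 1)
        sent.append(memory[0x0100 + s])  # ...then reads $0100+S; sta $2007 follows
    return sent


memory = bytearray(0x10000)
memory[0x0140:0x0145] = bytes([0x11, 0x22, 0x33, 0x44, 0x55])
print([hex(b) for b in pla_stream(memory, 0x40, 5)])
```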

_________________
https://kasumi.itch.io/indivisible


PostPosted: Sun Apr 08, 2018 2:11 pm

Joined: Wed May 19, 2010 6:12 pm
Posts: 2775
Wouldn't you need rows of 33 tiles instead of 32?


PostPosted: Sun Apr 08, 2018 2:31 pm

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10979
Location: Rio de Janeiro - Brazil
Bregalad wrote:
I'm fairly sure it should be possible to fit updates in VBlank, assuming you're talking about ONE row and ONE column of 8x8 tiles (not 16x16 metatiles).

You can actually fit a lot in vblank, depending on how optimized your code is. My engine can do both a column and a row of metatiles (i.e. 132 tiles) plus their attributes, along with a sprite DMA. I use completely unrolled code (i.e. no index increments or branches, which saves a lot of time) to barely fit all of this in standard vblank time. Other types of updates (palettes, patterns, etc.) can only be done when the scrolling isn't taking all the time, but that's OK: no game will ever scroll diagonally at 16 pixels per frame every frame, so there are plenty of opportunities for other types of updates.

Kasumi wrote:
You can also use the stack instead of a zero page buffer.

The stack is slower, though. That being said, I do find it a bit difficult to take advantage of ZP's faster load time. If you use indexing, the speed advantage is gone (zero page indexed takes the same time as absolute indexed or PLA, which is 4 cycles), so you need unrolled code that loads from constant memory locations. But since 8-way scrolling means that rows and columns are nearly always split across 2 name tables, that's not trivial. It can be done, but you have to be clever.
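
The trade-off above in numbers, as a Python sketch using standard NMOS 6502 cycle counts (LDA zp 3; LDA zp,X, LDA abs,X and PLA 4; STA abs 4; INX/DEY 2; BNE 3 taken, 2 not taken). It assumes no page-boundary crossings and ignores setup such as loading X/Y or saving S; the strategy names are just labels.

```python
# Standard NMOS 6502 cycle counts (assuming no page-boundary crossings).
LDA_ZP = 3
LDA_ZP_X = 4
LDA_ABS_X = 4
PLA = 4
STA_ABS = 4      # sta $2007
INX = DEY = 2
BNE_TAKEN, BNE_NOT_TAKEN = 3, 2


def copy_cycles(strategy, n):
    """Cycles to send n bytes to $2007, ignoring setup (LDX/LDY, TSX/TXS...)."""
    if strategy == "unrolled lda zp":        # lda buf+0 / sta $2007 / lda buf+1 ...
        return n * (LDA_ZP + STA_ABS)
    if strategy == "unrolled lda zp,x":
        return n * (LDA_ZP_X + STA_ABS)
    if strategy == "unrolled lda abs,x":
        return n * (LDA_ABS_X + STA_ABS)
    if strategy == "unrolled pla":
        return n * (PLA + STA_ABS)
    if strategy == "loop lda abs,x":         # lda buf,x / sta $2007 / inx / dey / bne
        per_byte = LDA_ABS_X + STA_ABS + INX + DEY
        return n * per_byte + (n - 1) * BNE_TAKEN + BNE_NOT_TAKEN
    raise ValueError(f"unknown strategy: {strategy}")


for name in ("unrolled lda zp", "unrolled lda zp,x", "unrolled lda abs,x",
             "unrolled pla", "loop lda abs,x"):
    print(f"{name:20} {copy_cycles(name, 32)}")
```

For a 32-byte row, only constant-address ZP loads beat the stack, and any unrolled version is far cheaper than the loop.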


PostPosted: Sun Apr 08, 2018 2:55 pm

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20794
Location: NE Indiana, USA (NTSC)
tokumaru wrote:
My engine can do both a column and a row of metatiles (i.e. 132 tiles) plus their attributes, along with a sprite DMA. I use completely unrolled code (i.e. no index increments or branches, which saves a lot of time) to barely fit this all in standard vblank time, and other types of updates (palettes, patterns, etc.) can only be done when the scrolling isn't taking all the time, but that's OK, because no game will ever scroll diagonally at 16 pixels per frame every frame

That infamous hill in Sonic the Hedgehog 2: Chemical Plant Zone act 2 is the exception that proves the rule.


PostPosted: Sun Apr 08, 2018 3:26 pm

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10979
Location: Rio de Janeiro - Brazil
It's a good thing I'm not particularly fond of Chemical Plant Zone, so I wouldn't want to design a level like it anyway. Still, full speed on both axes is way too fast, so if at least one of the axes is slightly slower than 16 pixels per frame, maybe 14 or so, there will still be some opportunities for other types of updates.
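
A quick way to see how many "free" frames a slightly slower axis leaves, as a Python sketch under simplifying assumptions (hypothetical model: one axis, constant whole-pixel speed, 16-pixel metatiles, and a new column is needed only when the camera crosses a metatile boundary):

```python
def free_frames(speed, frames, metatile=16, start=0):
    """Count frames with no new metatile column to load on one axis.

    Hypothetical model: whole-pixel speed, constant every frame, and a new
    column is needed whenever the camera crosses a metatile boundary.
    """
    free = 0
    position = start
    for _ in range(frames):
        new_position = position + speed
        if new_position // metatile == position // metatile:
            free += 1
        position = new_position
    return free


for speed in (16, 14, 8):
    print(speed, "px/frame:", free_frames(speed, 16), "free frames out of 16")
```

At 14 pixels per frame, roughly 1 frame in 8 has no column to load on that axis, leaving room for palette or pattern updates.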

Another thing that prevents this from being a huge problem is that when the screen is scrolling that fast, the lack of other updates is much harder for the human eye to notice, and if someone does notice, they'll slow down to look at it and things will immediately go back to normal, and there'll be nothing to see! :wink:


PostPosted: Sun Apr 08, 2018 3:55 pm

Joined: Wed May 19, 2010 6:12 pm
Posts: 2775
If you're using an unrolled loop, how do you jump across name tables?


PostPosted: Sun Apr 08, 2018 4:00 pm

Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10979
Location: Rio de Janeiro - Brazil
The unrolled loop has several entry points, which you select based on the number of tiles to transfer, and by using indexed addressing the index can be adjusted so that the correct part of the buffer is read.
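
One way to picture this, as a Python simulation under an assumed layout (not tokumaru's actual engine; the constant and function names are hypothetical): 32 unrolled instances, where instance i is assembled as `lda buffer-32+i,x` / `sta $2007`. Entering at instance 32 - count with X = start + count makes the first executed instance read buffer[start] and the last one read buffer[start + count - 1], so the same code handles both halves of a row split across two nametables, and X always fits in 8 bits.

```python
UNROLLED = 32  # instances of "lda buffer-32+i,x / sta $2007" (assumed layout)


def unrolled_copy(buffer, start, count):
    """Simulate jumping into an unrolled copy at a computed entry point.

    Instance i reads buffer[(i - 32) + x]. Entering at instance 32 - count
    with x = start + count reads buffer[start] first and, at the last
    instance (i = 31), buffer[start + count - 1].
    """
    if count <= 0 or start < 0 or start + count > UNROLLED:
        raise ValueError("need 0 < count and start + count <= 32")
    entry = UNROLLED - count
    x = start + count  # 0..32, fits in the 6502's 8-bit X register
    written = []
    for i in range(entry, UNROLLED):  # execution falls through to the end
        written.append(buffer[i - UNROLLED + x])
    return written


row = list(range(100, 132))
# A row split across two nametables: 12 tiles in the first, 20 in the second.
first = unrolled_copy(row, 0, 12)    # after setting $2006 to the first nametable
second = unrolled_copy(row, 12, 20)  # after setting $2006 to the second one
print(first + second == row)
```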


PostPosted: Sun Apr 08, 2018 4:08 pm

Joined: Sun Feb 07, 2016 6:16 pm
Posts: 526
bleubleu wrote:
If Mesen could do this, it would be awesome.

Not a bad idea. It shouldn't be too hard to highlight tile/attribute modifications in the nametable viewer, I think. I'll add it to my list.


PostPosted: Sun Apr 08, 2018 4:33 pm

Joined: Wed Apr 04, 2018 7:29 pm
Posts: 38
Location: Montreal, Canada
All right guys.

Thanks to all your advice, I got my NMI running in < 1820 cycles all the time, even with crazy diagonal updates.

I unrolled all column loops, optimized the row (tile/attribute) updates, moved some stuff to ZP, and everything works. My palette update loop wasn't unrolled and wasn't on ZP... shame on me. :(

It even simplified the X scrolling algorithm a bit.

Thanks!

-Mat

