All times are UTC - 7 hours

PostPosted: Fri Jan 13, 2017 4:57 pm 
Joined: Sat Sep 07, 2013 2:59 pm
Posts: 1401
I'm playing around a bit with code for a new game. This new game will have scrolling like "The Legend of Zelda", i.e. screen by screen.

To store the layout of each screen in ROM, I intend to use a format where each meta tile is represented by two bytes:
Byte 1, low nibble: x position
Byte 1, high nibble: y position
Byte 2: Meta sprite index

The x and the y position would be aligned to 16 pixels each, so an x value of 3 would mean that the actual tiles start at pixel position 3 * 16 = 48.
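
A minimal sketch of how such an entry could be decoded, assuming "low" and "high" refer to the two 4-bit halves of byte 1. The struct and function names are made up for illustration and are not from the game's code.

```c
#include <stdint.h>

/* One screen entry: byte 1 packs x (low 4 bits) and y (high 4 bits)
   in 16-pixel units, byte 2 is the index of the definition to draw.
   ScreenEntry and decode_entry are hypothetical names. */
typedef struct {
    uint8_t x_px;   /* left edge in pixels */
    uint8_t y_px;   /* top edge in pixels */
    uint8_t index;  /* value of byte 2 */
} ScreenEntry;

static ScreenEntry decode_entry(const uint8_t *data)
{
    ScreenEntry e;
    e.x_px  = (uint8_t)((data[0] & 0x0F) * 16); /* 0..240 */
    e.y_px  = (uint8_t)((data[0] >> 4) * 16);   /* 0..240 */
    e.index = data[1];
    return e;
}
```

With byte 1 = $53, this gives x = 3 * 16 = 48 and y = 5 * 16 = 80.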

Now, when it comes to scrolling, I would update the screen in "stripes".
I.e. if you go to the screen below you, the screen would be updated from the first to the last row, one row per frame.
If you go to the screen left of you, the next screen would be updated vertically, starting from the rightmost column, continuing to the leftmost column.

But to do this, I would need to extract the screen data first.
So, I thought I would create an array with 32 * 30 items that holds all the actual graphics tile values. Then, while the scrolling is being done, the procedure could grab the corresponding values from that array and copy them to the array of PPU updates, which is then read during NMI.

1. Once per scrolling:
Compressed data in ROM --> Uncompressed array in RAM

2. Every frame during scrolling until everything is drawn:
Single "stripe" of uncompressed array in RAM + PPU update address + drawing information (horizontal/vertical) --> PPU help array

3. Every frame during scrolling in NMI until everything is drawn:
PPU help array --> PPU


But now there's the first problem:

The uncompressed array is 32 * 30 = 960 items (plus color information), which is already too much for the RAM. The RAM might have $600 bytes, but I also need bytes for the C software stack and enough room for non-zeropage variables. So I fear it might get pretty tight.

Therefore my question: Is there another way to decompress the screen data?

I cannot extract line by line because the data is saved per meta tile. And each meta tile can have a variable size.
I.e. if the X and Y position of a tree is 6 and 5, then you still don't know how wide and tall the tiles of the tree will be until you check the meta sprite definition itself. So, it's pretty hard to say: "Extract me the 18th on-screen row from the compressed data."

What can I do here? How do I extract the data, to update the screen row by row or column by column during scrolling, without wasting 32 * 30 bytes for a buffer array?

_________________
Available now: My game "City Trouble".
Website: https://megacatstudios.com/products/city-trouble
Trailer: https://youtu.be/IYXpP59qSxA
Gameplay: https://youtu.be/Eee0yurkIW4
German Retro Gamer article: http://i67.tinypic.com/345o108.jpg


PostPosted: Fri Jan 13, 2017 5:05 pm
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10063
Location: Rio de Janeiro - Brazil
You can decompress to the hidden name table and copy the data from there row by row when scrolling vertically. Actually, that fits rather well with the second technique I showed you in your other thread.

When scrolling horizontally, you don't even have to do it column by column, since you always have a full name table free right beside the one being displayed. Just decompress the new screen metatile by metatile and scroll to it when you're done.

The only drawback with this technique is that there'll be a slight delay after the player touches the edge and before the scroll actually starts, since you'll most likely need a few frames to decompress the entire new screen, but a delay like this is *very* common in games with this kind of scrolling.

EDIT: Another option would be to scan the whole list of metatiles and only pick up the ones that belong in the row/column you're drawing. Gameplay is usually paused during these scroll animations, so CPU time is not much of a concern. Sorry, I didn't realize you have to deal with objects that have width and height. You can still buffer single rows/columns in this case, but it gets more complicated. For example, if you need to look for all the blocks that go on row 7, you'd subtract each object's Y coordinate from 7, and then compare the result against the object's height. If the result is less than the height, it's the index of the row within the object that you need to draw. But at this cost I don't see any advantage over simply decompressing to the hidden name table.
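
The row test described above could look roughly like this in C. This is only a sketch: the `Object` layout (position and size in 8-pixel tile units, tiles stored row-major) and the name `build_row` are assumptions for illustration, not the actual data format from this thread.

```c
#include <stdint.h>
#include <string.h>

#define ROW_TILES 32

/* Hypothetical, already-decoded object. */
typedef struct {
    uint8_t x, y;           /* top-left corner in 8-pixel tile units */
    uint8_t width, height;  /* size in tiles */
    const uint8_t *tiles;   /* width * height tile numbers, row-major */
} Object;

/* Fill `out` with the 32 tiles of on-screen row `row`. */
static void build_row(const Object *objs, uint8_t count, uint8_t row,
                      uint8_t default_tile, uint8_t out[ROW_TILES])
{
    uint8_t i, col;

    memset(out, default_tile, ROW_TILES);
    for (i = 0; i < count; ++i) {
        const Object *o = &objs[i];
        /* Unsigned subtraction: if the object starts below `row`, the
           result wraps to a large value and fails the height test. */
        uint8_t local = (uint8_t)(row - o->y);
        if (local >= o->height)
            continue;
        for (col = 0; col < o->width && o->x + col < ROW_TILES; ++col)
            out[o->x + col] = o->tiles[local * o->width + col];
    }
}
```

The single unsigned comparison covers both "object is above the row" and "object is below the row", which is the same trick the subtraction-and-compare in assembly relies on.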


PostPosted: Fri Jan 13, 2017 5:32 pm
Joined: Sat Sep 07, 2013 2:59 pm
Posts: 1401
tokumaru wrote:
You can decompress to the hidden name table and copy the data from there row by row when scrolling vertically. Actually, that fits rather well with the second technique I showed you in your other thread.

So, you mean I should basically abuse the off-screen part of the PPU as temporary data storage?

Well, it has the advantage that, during horizontal scrolling, the screen is drawn right away and doesn't need to be updated frame by frame after the decompression.

But is it even possible to read from the PPU, so that I can copy the data back to my primary screen when doing vertical scrolling? I always thought the PPU was write-only.


Another thing that might be an issue:

If I decompressed the ROM data to a temporary array, it would be pretty straightforward.

Let's say, my current tile is: X = 3, Y = 5, meta tile = tree.

And the tree is defined as width = 2, tiles = 125, 126, 127, 128, 129, 130. (Which makes a height of 3.)

In this case, I would probably write something like this:
Code:
x = 3 * 2; // Absolute tile value of x position, i.e. aligned to eight pixels instead of 16.
y = 5 * 2; // Absolute tile value of y position.
width = Tree[0]; // = 2
xOffset = 0;

for (i = 1; i < sizeof(Tree) / sizeof(Tree[0]); ++i)
{
    TempArray[32 * y + x + xOffset] = Tree[i];

    if (++xOffset == width)
    {
        xOffset = 0;
        ++y;
    }
}


If I write analogous code that writes to the PPU instead of a simple array, wouldn't that take much longer, since the values are not set directly but always pushed through that register?

Also, the very first thing that I'd do is something like memset(TempArray, DefaultTile, 32 * 30) for walkable ground. Wouldn't this take much longer when writing it to the PPU?

PostPosted: Fri Jan 13, 2017 6:01 pm
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10063
Location: Rio de Janeiro - Brazil
DRW wrote:
So, you mean I should basically abuse the off-screen part of the PPU as temporary data storage?

Yes. You don't have the PRG-RAM to hold the data, but the VRAM is just sitting there, unused.

Quote:
Well, it has the advantage that, during horizontal scrolling, the screen is drawn right away and doesn't need to be updated frame by frame after the decompression.

Well, it doesn't have to be written column by column, but you still won't be able to update it in one go because rendering will still be enabled. You'll still have to buffer the objects in normal RAM before writing them to VRAM at arbitrary positions.

Quote:
But is it even possible to read from the PPU, so that I can copy the data back to my primary screen when doing vertical scrolling?

Yes, you can LDA $2007 just fine. Just throw away the first value you read back, because there's a 1 byte delay when reading from VRAM. There's plenty of time in vblank to copy 64 bytes from the hidden name table to RAM, and then write them back to the visible name table.
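
To illustrate the one-byte delay, here is a tiny host-side model in C of how reads from $2007 behave for addresses below $3F00. It is not NES code: `PpuModel`, `ppu_read_2007` and `read_row` are hypothetical names, and name table mirroring is deliberately left out.

```c
#include <stdint.h>

/* Simplified model of the PPU side of a $2007 read. */
typedef struct {
    uint8_t  vram[0x4000];
    uint16_t addr;    /* current VRAM address */
    uint8_t  buffer;  /* internal read buffer */
    uint8_t  step;    /* address increment after each access: 1 or 32 */
} PpuModel;

/* Each read returns the value fetched by the PREVIOUS read, then
   refills the buffer from the current address and advances it. */
static uint8_t ppu_read_2007(PpuModel *p)
{
    uint8_t result = p->buffer;
    p->buffer = p->vram[p->addr & 0x3FFF];
    p->addr = (uint16_t)(p->addr + p->step);
    return result;
}

/* Read n bytes starting at addr, discarding the stale first value. */
static void read_row(PpuModel *p, uint16_t addr, uint8_t *dst, uint8_t n)
{
    uint8_t i;

    p->addr = addr;
    (void)ppu_read_2007(p);          /* dummy read primes the buffer */
    for (i = 0; i < n; ++i)
        dst[i] = ppu_read_2007(p);
}
```

So copying one 32-tile row costs 33 reads of $2007 rather than 32.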

Quote:
Another thing that might be an issue:

You'll basically have to translate the object definitions into address+data pairs that you can blast to the PPU. For example, a tree that's 2x3 tiles could be translated into 2 vertical strips of data, each with a target address and 3 bytes of data. You'll have to somehow manage your VRAM update budget so you can stop buffering when the quota for each frame is met, and resume on the next frame. If you're reading the list of objects and detect that there's not enough time to draw an object, stop buffering for this frame and resume on the next, with an empty buffer.
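
A rough C sketch of that translation, under stated assumptions: the strip format (address high, address low, length, data), the 32-byte budget, the hidden name table at $2400, and all names are invented here for illustration. Each strip is meant to be written with the PPU's +32 address increment.

```c
#include <stdint.h>

#define NT_BASE  0x2400u  /* assumed address of the hidden name table */
#define BUF_SIZE 32       /* assumed per-frame update budget in bytes */

typedef struct {
    uint8_t data[BUF_SIZE];
    uint8_t used;
} UpdateBuffer;

/* Queue one object as `width` vertical strips.
   Returns 1 if it was queued, 0 if the remaining budget is too small,
   in which case the caller retries next frame with an empty buffer. */
static int queue_object(UpdateBuffer *b, uint8_t x, uint8_t y,
                        uint8_t width, uint8_t height,
                        const uint8_t *tiles)
{
    uint8_t col, row;
    uint16_t need = (uint16_t)(width * (3u + height));

    if (need > (uint16_t)(BUF_SIZE - b->used))
        return 0;

    for (col = 0; col < width; ++col) {
        uint16_t addr = (uint16_t)(NT_BASE + y * 32u + x + col);
        b->data[b->used++] = (uint8_t)(addr >> 8);
        b->data[b->used++] = (uint8_t)(addr & 0xFF);
        b->data[b->used++] = height;
        for (row = 0; row < height; ++row)
            b->data[b->used++] = tiles[row * width + col];
    }
    return 1;
}
```

For the 2x3 tree this produces two strips of 3 + 3 bytes each, 12 bytes in total, so two trees fit into this budget per frame and the third has to wait.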

Quote:
If I write analogous code that writes to the PPU instead of a simple array, wouldn't that take much longer, since the values are not set directly but always pushed through that register?

I don't understand what register you're talking about.

Quote:
Also, the very first thing that I'd do is something like memset(TempArray, DefaultTile, 32 * 30) for walkable ground. Wouldn't this take much longer when writing it to the PPU?

Emptying a name table to a specific value requires writing the same value 960 times. At 4 cycles per write, you'd need at least 4 frames to clear the screen, if you unroll the code at least partially. I don't think this will increase the delay by a noticeable amount.


PostPosted: Sat Jan 14, 2017 4:40 am
Joined: Sat Sep 07, 2013 2:59 pm
Posts: 1401
Alright, I'll try out what you told me.

Just one question:
tokumaru wrote:
Well, it doesn't have to be written column by column, but you still won't be able to update it in one go because rendering will still be enabled. You'll still have to buffer the objects in normal RAM before writing them to VRAM at arbitrary positions.

What was the reason again why you cannot just update graphics in the PPU outside of vblank when rendering is turned on, even if the graphics are totally off-screen? I know there was an issue when doing this, but I don't remember what it was.

PostPosted: Sat Jan 14, 2017 4:50 am
Joined: Tue Oct 06, 2015 10:16 am
Posts: 556
The PPU updates its VRAM address as it renders, so whatever you write would end up wherever rendering currently points, i.e. on the active screen.


PostPosted: Sat Jan 14, 2017 9:19 am
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 19104
Location: NE Indiana, USA (NTSC)
It's because Nintendo did not design a write buffer into the NES and Super NES PPU. The Texas Instruments TMS9918 family and the Master System VDP and Genesis VDP based on it include a separate write address and data buffer to hold pending VRAM writes, and they reserve some time throughout each scanline to commit buffered writes to VRAM. The buffer on the Genesis is four 16-bit words deep, for instance. But Nintendo presumably didn't want to include this sort of complexity in the Famicom PPU, which I've read was put together quickly as a response to the SG-1000, despite there being plenty of downtime in both the background and sprite parts of the scanline, such as redundant attribute reads. So to simplify things, Nintendo gave the rendering logic exclusive use of video memory during rendering and included only one video memory pointer, which the program and the rendering logic take turns using.


PostPosted: Sat Jan 14, 2017 11:13 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10063
Location: Rio de Janeiro - Brazil
It doesn't matter whether the NT is visible or not, it's all still part of the same RAM chip, and when the PPU is reading that memory for rendering, you just can't write to it.


PostPosted: Thu Jan 19, 2017 9:39 am
Joined: Sat Sep 07, 2013 2:59 pm
Posts: 1401
A little question since this is what I'll do next:

Writing to the PPU should only be done during vblank since it will corrupt the output otherwise.

But is reading from the PPU outside of vblank alright, or can this also only be done safely during vblank?

PostPosted: Thu Jan 19, 2017 10:30 am
Joined: Fri Feb 27, 2009 2:35 pm
Posts: 211
Location: Fort Wayne, Indiana
Reading from the PPU has to be done during vblank too. The PPU is hogging the bus the whole time rendering is going on in order to do its own reads.

If you wrote to $2006 during rendering you would also corrupt the scrolling, since the PPU reuses a register for both purposes.


PostPosted: Thu Jan 19, 2017 11:53 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10063
Location: Rio de Janeiro - Brazil
A memory access is a memory access; it doesn't matter whether it's a read or a write. Any access to VRAM during rendering will corrupt the picture, because the PPU is accessing VRAM like crazy during that time.

It may still be possible to RMW attribute data in a single vblank if you don't need to update many bytes. You can buffer only the new attributes beforehand, and during vblank you can read from the AT and combine what you read with what you have in the buffer and store the result back in the buffer, and when all bytes are done, set the address (through $2006) again and this time write the buffered data to VRAM. It's slow, but since attribute rows are just 8 bytes long, it should be possible.
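
The combine step of that read-modify-write could be sketched in C as below. The function names and the per-byte mask array are assumptions for illustration. The bit layout is the standard attribute byte layout: bits 0-1 top-left, 2-3 top-right, 4-5 bottom-left, 6-7 bottom-right.

```c
#include <stdint.h>

/* Keep the old bits where mask is 0, take the new bits where it is 1. */
static uint8_t merge_attribute(uint8_t old_value, uint8_t new_value,
                               uint8_t mask)
{
    return (uint8_t)((old_value & (uint8_t)~mask) | (new_value & mask));
}

/* Mask for one 16x16 quadrant of an attribute byte:
   0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right. */
static uint8_t quadrant_mask(uint8_t quadrant)
{
    return (uint8_t)(0x03u << (quadrant * 2));
}

/* Combine one attribute row (8 bytes) that was just read from VRAM
   with the new values waiting in `buffer`; the result stays in
   `buffer`, ready to be written back. */
static void merge_attribute_row(uint8_t buffer[8],
                                const uint8_t from_vram[8],
                                const uint8_t masks[8])
{
    uint8_t i;

    for (i = 0; i < 8; ++i)
        buffer[i] = merge_attribute(from_vram[i], buffer[i], masks[i]);
}
```

For example, replacing only the top-right quadrant of $E4 with palette 2 (new bits $08, mask $0C) yields $E8.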


PostPosted: Thu Jan 19, 2017 12:38 pm
Joined: Sat Sep 07, 2013 2:59 pm
Posts: 1401
In this case, it was not about the attributes. Attributes are mundane in this case: I always have a buffer array that stores all the 64 attribute bytes of the new screen that needs to be rendered. So, when an attribute update is due, I can simply prepare the relevant bytes from that buffer array outside of vblank and then send the data to the general PpuUpdateData buffer with all its additional information (PPU address high and low byte, data length). And this array then automatically gets written to the PPU in the UpdatePpu function that always gets called during vblank.

But in the current case, I was talking about the tile rows.

I did as you told me: While reading my compressed meta tiles that are not saved on a row by row basis, I don't extract them to a 32 x 30 byte buffer array, but I write them directly into the off-screen name table. (Then I write the attributes all at once since I prepared the attributes buffer array during extraction.)

For horizontal scrolling, all I have to do now is, well, the scrolling. But for vertical scrolling, I now have to copy the PPU data from the off-screen name table into the visible one, row by row, at predefined scrolling positions.

That's why I needed to know whether I can read from the PPU outside of vblank. In this case, I would have prepared the PpuUpdateData array before vblank.

Since I cannot do this, I have to create a function that runs in vblank and that first reads all necessary PPU data into an array and then writes the array back into the PPU immediately afterwards.

I hope that reading 32 bytes from the PPU and then writing 32 bytes to the PPU is doable during one vblank, isn't it? (In this case, my function is actually written in Assembly, not in C.)

PostPosted: Thu Jan 19, 2017 12:50 pm
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10063
Location: Rio de Janeiro - Brazil
DRW wrote:
I hope that reading 32 bytes from the PPU and then writing 32 bytes to the PPU is doable during one vblank, isn't it? (In this case, my function is actually written in Assembly, not in C.)

In assembly, sure. You can transfer a little over 200 bytes (including addresses, of course) plus a sprite DMA if you unroll the code, so 64 is perfectly doable even with all the overhead.


PostPosted: Thu Jan 19, 2017 1:24 pm
Joined: Sat Sep 07, 2013 2:59 pm
Posts: 1401
What do you mean by unrolling the code?

Let me show you my NMI code from the previous game (and it will probably look exactly the same for the new game):
Code:
Nmi:

   PHA
   TXA
   PHA
   TYA
   PHA

   LDA WaitForNmi
   BEQ @end

   LDA #false
   STA WaitForNmi

   LDA PpuMaskValue
   STA PpuMask
   BEQ @end

   LDA #<Sprites
   STA OamAddr
   LDA #>Sprites
   STA OamDma

   JSR UpdatePpu

   LDA #0
   STA PpuScroll
   STA PpuScroll

   LDA #PpuCtrlHorizontal
   STA PpuCtrl

   ; The rest of the scrolling is done outside NMI
   ; when the corresponding places
   ; (status bar end/sprite overflow flag,
   ; parallax scrolling line/sprite 0 split)
   ; are reached.

@end:

   JSR FamiToneUpdate

   PLA
   TAY
   PLA
   TAX
   PLA

   RTI


And this is my code for updating the PPU (I removed the CC65 segments and all the .export stuff for readability here):
Code:
   PpuUpdateData: .res 4 + 32

UpdatePpu:

   LDX PpuUpdateData + 0
   BEQ @end

   LDA PpuUpdateData + 1
   STA PpuCtrl

   LDA PpuStatus

   LDA PpuUpdateData + 2
   STA PpuAddr
   LDA PpuUpdateData + 3
   STA PpuAddr

   LDY #4

@loop:

   LDA PpuUpdateData, Y
   INY
   STA PpuData

   DEX
   BNE @loop

   ; The data length is set to 0,
   ; so that the same update isn't repeated
   ; when the NMI hits next time.
   STX PpuUpdateData + 0

@end:

   RTS

The UpdatePpu function for the new game will differ in that the array can hold multiple PPU updates, so that it doesn't only contain data for one row or column, but for several. That makes it a bit more efficient for updating the meta tiles.
I set the new PpuUpdateData size to 128 bytes to make sure that we definitely don't run out of vblank time.
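
One way the C side could fill such a multi-entry buffer is sketched below. This is only a guess at a possible layout, not the actual code: it assumes each entry keeps the header of the single-entry version above (data length, PpuCtrl value, address high byte, address low byte), that entries are packed back to back, and that a length byte of 0 ends the list. The names `ppu_update_data`, `ppu_update_used` and `append_update` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define PPU_UPDATE_SIZE 128

static uint8_t ppu_update_data[PPU_UPDATE_SIZE];
static uint8_t ppu_update_used;  /* bytes in use, terminator excluded */

/* Append one update. Returns 1 on success, 0 if header, data and the
   terminating 0 byte would not fit; the caller then waits a frame. */
static int append_update(uint8_t ctrl, uint16_t addr,
                         const uint8_t *data, uint8_t length)
{
    uint8_t *p;

    if (length == 0 ||
        (uint16_t)(ppu_update_used + 4u + length + 1u) > PPU_UPDATE_SIZE)
        return 0;

    p = &ppu_update_data[ppu_update_used];
    p[0] = length;
    p[1] = ctrl;                    /* selects +1 or +32 increment */
    p[2] = (uint8_t)(addr >> 8);
    p[3] = (uint8_t)(addr & 0xFF);
    memcpy(p + 4, data, length);

    ppu_update_used = (uint8_t)(ppu_update_used + 4 + length);
    ppu_update_data[ppu_update_used] = 0;  /* end-of-list marker */
    return 1;
}
```

With this layout, three full 32-tile rows (3 * 36 bytes plus the terminator) fit into 128 bytes, and a fourth is rejected until the NMI has consumed the buffer and `ppu_update_used` is reset to 0.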


So, what do you mean when you talk about unrolling?

PostPosted: Thu Jan 19, 2017 2:02 pm
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10063
Location: Rio de Janeiro - Brazil
Unrolling means copying several bytes on each iteration of the loop as opposed to a single byte, effectively reducing the overhead caused by counting and branching.

Rolled (15 cycles per byte):
Code:
Loop:
   lda buffer, y
   sta $2007
   iny
   dex
   bne Loop


Partially unrolled (9.625 cycles per byte, on average):
Code:
Loop:
   lda buffer+0, y
   sta $2007
   lda buffer+1, y
   sta $2007
   lda buffer+2, y
   sta $2007
   lda buffer+3, y
   sta $2007
   lda buffer+4, y
   sta $2007
   lda buffer+5, y
   sta $2007
   lda buffer+6, y
   sta $2007
   lda buffer+7, y
   sta $2007
   tya
   clc
   adc #$08
   tay
   dex
   bne Loop

Completely unrolled code doesn't use loops at all. It's just a bunch of LDAs and STAs for maximum speed.

The main drawback of unrolled loops is that the amount of data to transfer has to be a multiple of the number of bytes copied per iteration. Alternatively, you can jump into the middle of the loop when starting the transfer (a look-up table of entry points could help here), skipping some transfers on the first iteration so the total can be anything you need.
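
For readers more at home in C, the "jump into the middle of the loop" idea has a well-known C counterpart, Duff's device, where a `switch` picks the entry point into an 8x unrolled loop. The sketch below runs on a host machine; `FakePort` and `port_write` are hypothetical stand-ins for writes to $2007, and on the NES itself this would be done in assembly as shown above.

```c
#include <stdint.h>

/* Host-side stand-in for the PPU data port: just logs what is written. */
typedef struct {
    uint8_t  log[256];
    uint16_t count;
} FakePort;

static void port_write(FakePort *p, uint8_t value)
{
    if (p->count < 256)
        p->log[p->count++] = value;
}

/* Copy n bytes through an 8x unrolled loop. The switch jumps into the
   middle of the loop body so the first pass handles n % 8 bytes and
   every later pass handles a full 8. */
static void copy_unrolled(FakePort *port, const uint8_t *src, uint16_t n)
{
    uint16_t passes;

    if (n == 0)
        return;
    passes = (uint16_t)((n + 7u) / 8u);
    switch (n % 8u) {
    case 0: do { port_write(port, *src++);
    case 7:      port_write(port, *src++);
    case 6:      port_write(port, *src++);
    case 5:      port_write(port, *src++);
    case 4:      port_write(port, *src++);
    case 3:      port_write(port, *src++);
    case 2:      port_write(port, *src++);
    case 1:      port_write(port, *src++);
            } while (--passes > 0);
    }
}
```

The `switch` plays the role of the look-up table of entry points: for 13 bytes it enters at `case 5`, writes 5 bytes, then loops once more for the remaining 8.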

