Memory issue when extracting level data

DRW
Posts: 2225
Joined: Sat Sep 07, 2013 2:59 pm

Memory issue when extracting level data

Post by DRW »

I'm playing around a bit with code for a new game. This new game will have scrolling like "The Legend of Zelda", i.e. screen by screen.

To store the layout of each screen in ROM, I intend to use a method where each meta tile is represented by two bytes:
Byte 1, low nibble: x position
Byte 1, high nibble: y position
Byte 2: Meta tile index

The x and the y position would be aligned to 16 pixels each, so an x value of 3 would mean that the actual tiles start at pixel position 3 * 16 = 48.
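As a minimal sketch of that entry format, assuming x sits in the low nibble and y in the high nibble of byte 1 (the struct and function names here are made up for illustration and are not from my actual code):

```c
#include <stdint.h>

/* Hypothetical two-byte screen entry: byte 1 packs the position as
   yyyyxxxx, byte 2 is the meta tile index. */
typedef struct {
    uint8_t position;  /* high nibble = y, low nibble = x */
    uint8_t metatile;  /* index into the meta tile definitions */
} ScreenEntry;

/* Meta tile column (0..15). */
static uint8_t entry_x(const ScreenEntry *e) { return e->position & 0x0F; }

/* Meta tile row (0..15; only 0..14 fit on a 240 pixel high screen). */
static uint8_t entry_y(const ScreenEntry *e) { return e->position >> 4; }

/* Pixel coordinate of a meta tile position: 16 pixels per step. */
static uint16_t to_pixels(uint8_t metatile_pos)
{
    return (uint16_t)(metatile_pos * 16);
}

/* Name table coordinate of a meta tile position: two 8 pixel tiles per step. */
static uint8_t to_tiles(uint8_t metatile_pos)
{
    return (uint8_t)(metatile_pos * 2);
}
```

With a position byte of $53, this gives x = 3 and y = 5, so the tiles start at pixel column 48 and name table row 10.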

Now, when it comes to scrolling, I would update the screen in "stripes".
I.e. if you go to the screen below you, the screen would be updated from the first to the last row, one row per frame.
If you go to the screen left of you, the next screen would be updated vertically, starting from the rightmost column, continuing to the leftmost column.

But to do this, I would need to extract the screen data first.
So, I thought I would create an array with 32 * 30 items that holds all the actual graphics tile values. Then, while the scrolling is being done, the procedure could grab the corresponding values from the array and copy them to the array that holds the PPU updates, which is then read during NMI.

1. Once per scrolling:
Compressed data in ROM --> Uncompressed array in RAM

2. Every frame during scrolling until everything is drawn:
Single "stripe" of uncompressed array in RAM + PPU update address + drawing information (horizontal/vertical) --> PPU help array

3. Every frame during scrolling in NMI until everything is drawn:
PPU help array --> PPU
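Step 2 could look roughly like this. It is only a sketch of the RAM-hungry variant described above, with hypothetical names: a horizontal stripe is one row of 32 tiles, a vertical stripe is one column of 30 tiles.

```c
#include <stdint.h>

#define SCREEN_W 32
#define SCREEN_H 30

/* Copy one tile row of the uncompressed screen into a stripe buffer.
   Returns the number of tiles copied. */
static uint8_t copy_row(const uint8_t *screen, uint8_t row, uint8_t *stripe)
{
    uint8_t i;
    for (i = 0; i < SCREEN_W; ++i)
        stripe[i] = screen[row * SCREEN_W + i];
    return SCREEN_W;
}

/* Copy one tile column of the uncompressed screen into a stripe buffer.
   Returns the number of tiles copied. */
static uint8_t copy_column(const uint8_t *screen, uint8_t column, uint8_t *stripe)
{
    uint8_t i;
    for (i = 0; i < SCREEN_H; ++i)
        stripe[i] = screen[i * SCREEN_W + column];
    return SCREEN_H;
}
```

The stripe buffer itself is small (at most 32 bytes); the cost is the 960 byte `screen` array it reads from.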


But now there's the first problem:

The uncompressed array has 32 * 30 = 960 items (+ color information), which is already too much for the RAM. The RAM might have $600 bytes, but I also need bytes for the C software stack and enough room for non-zeropage variables. So, I fear that it might get pretty tight.

Therefore my question: Is there another way to decompress the screen data?

I cannot extract line by line because the data is saved per meta tile. And each meta tile can have a variable size.
I.e. if the X and Y position of a tree is 6 and 5, then you still don't know how wide and tall the tiles of the tree will be until you check the meta tile definition itself. So, it's pretty hard to say: "Extract me the 18th on-screen row from the compressed data."

What can I do here? How do I extract the data, to update the screen row by row or column by column during scrolling, without wasting 32 * 30 bytes for a buffer array?
My game "City Trouble":
Gameplay video: https://youtu.be/Eee0yurkIW4
Download (ROM, manual, artworks): http://www.denny-r-walter.de/city.html
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Memory issue when extracting level data

Post by tokumaru »

You can decompress to the hidden name table and copy the data from there row by row when scrolling vertically. Actually, that fits rather well with the second technique I showed you in your other thread.

When scrolling horizontally, you don't even have to do it column by column, since you always have a full name table free right beside the one being displayed. Just decompress the new screen metatile by metatile and scroll to it when you're done.

The only drawback with this technique is that there'll be a slight delay after the player touches the edge and before the scroll actually starts, since you'll most likely need a few frames to decompress the entire new screen, but a delay like this is *very* common in games with this kind of scrolling.

EDIT: Another option would be to scan the whole list of metatiles and only pick up the ones that belong in the row/column you're drawing. Gameplay is usually paused during these scroll animations, so CPU time is not much of a concern. Sorry, I didn't realize you have to deal with objects that have width and height. You can still buffer single rows/columns in this case, but it gets more complicated. For example, if you need to look for all the blocks that go on row 7, you'd subtract each object's Y coordinate from 7 and then compare the result, as an unsigned value, against the object's height. If the result is less than the height, the object crosses row 7 and the result is the index of the object's row that you need to draw. But at this cost I don't see any advantage over simply decompressing to the hidden name table.
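A rough C sketch of that row scan, assuming each object carries its position and size in tile units (the `Object` struct and `build_row` are invented for this example):

```c
#include <stdint.h>

/* Hypothetical object record, all values in 8 pixel tile units. */
typedef struct {
    uint8_t x, y;          /* top-left corner */
    uint8_t width, height; /* size */
    const uint8_t *tiles;  /* width * height tile values, row by row */
} Object;

/* Write the tiles of every object that crosses `row` into `row_buffer`
   (32 entries, already filled with the default tile). */
static void build_row(const Object *objects, uint8_t count,
                      uint8_t row, uint8_t *row_buffer)
{
    uint8_t i, j;
    for (i = 0; i < count; ++i) {
        const Object *o = &objects[i];
        /* Unsigned subtraction: an object that starts below `row` wraps
           around to a large value and fails the height test too. */
        uint8_t local_row = (uint8_t)(row - o->y);
        if (local_row < o->height) {
            for (j = 0; j < o->width; ++j)
                row_buffer[o->x + j] = o->tiles[local_row * o->width + j];
        }
    }
}
```

The single unsigned comparison rejects both objects that end above the row and objects that start below it, which is why one subtract and one compare are enough.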
DRW
Posts: 2225
Joined: Sat Sep 07, 2013 2:59 pm

Re: Memory issue when extracting level data

Post by DRW »

tokumaru wrote:You can decompress to the hidden name table and copy the data from there row by row when scrolling vertically. Actually, that fits rather well with the second technique I showed you in your other thread.
So, you mean I shall basically abuse the off-screen part of the PPU as a temporary data storage?

Well, it has the advantage that, during horizontal scrolling, the screen is drawn right away and doesn't need to be updated frame by frame after the decompression.

But is it even possible to read from the PPU, so that I can copy the data back to my primary screen when doing vertical scrolling? I always thought the PPU was write-only.


Another thing that might be an issue:

If I decompressed the ROM data to a temporary array, it would be pretty straightforward.

Let's say, my current tile is: X = 3, Y = 5, meta tile = tree.

And the tree is defined as width = 2, tiles = 125, 126, 127, 128, 129, 130. (Which makes a height of 3.)

In this case, I would probably write something like this:

x = 3 * 2; // Absolute tile value of x position, i.e. aligned to eight pixels instead of 16.
y = 5 * 2; // Absolute tile value of y position.
width = Tree[0]; // = 2
xOffset = 0;

for (i = 1; i < ArraySize(Tree); ++i)
{
    TempArray[32 * y + x + xOffset] = Tree[i];

    if (++xOffset == width)
    {
        xOffset = 0;
        ++y;
    }
}
If I write analogous code that writes to the PPU instead of a simple array, wouldn't that mean that it takes much longer since the values are not set directly, but always pushed into that register?

Also, the very first thing that I'd do is something like memset(TempArray, DefaultTile, 32 * 30) for walkable ground. Wouldn't this be much longer when writing it to the PPU?
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Memory issue when extracting level data

Post by tokumaru »

DRW wrote:So, you mean I shall basically abuse the off-screen part of the PPU as a temporary data storage?
Yes. You don't have the PRG-RAM to hold the data, but the VRAM is just sitting there, unused.
DRW wrote:Well, it has the advantage that, during horizontal scrolling, the screen is drawn right away and doesn't need to be updated frame by frame after the decompression.
Well, it doesn't have to be written column by column, but you still won't be able to update it in one go because rendering will still be enabled. You'll still have to buffer the objects in normal RAM before writing them to VRAM at arbitrary positions.
DRW wrote:But is it even possible to read from the PPU, so that I can copy the data back to my primary screen when doing vertical scrolling?
Yes, you can LDA $2007 just fine. Just throw away the first value you read back, because there's a 1 byte delay when reading from VRAM. There's plenty of time in vblank to copy 64 bytes from the hidden name table to RAM, and then write them back to the visible name table.
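To illustrate the 1 byte delay, here is a tiny C model of name table reads through $2007. It does not touch real hardware; `PpuModel` and both functions are invented for this sketch, and it assumes the +1 address increment mode.

```c
#include <stdint.h>

/* Toy model of PPUDATA ($2007) reads from name table memory. */
typedef struct {
    uint8_t vram[0x800];  /* two name tables */
    uint16_t address;     /* offset into vram */
    uint8_t read_buffer;  /* the PPU's internal read buffer */
} PpuModel;

/* One read: returns the old buffer contents, then refills the buffer
   from the current address and advances the address. */
static uint8_t ppu_read(PpuModel *ppu)
{
    uint8_t result = ppu->read_buffer;
    ppu->read_buffer = ppu->vram[ppu->address & 0x7FF];
    ppu->address++;
    return result;
}

/* Read `count` bytes starting at `address`, discarding the first,
   stale value. */
static void ppu_read_row(PpuModel *ppu, uint16_t address,
                         uint8_t *dest, uint8_t count)
{
    uint8_t i;
    ppu->address = address;
    (void)ppu_read(ppu);  /* dummy read, result is thrown away */
    for (i = 0; i < count; ++i)
        dest[i] = ppu_read(ppu);
}
```

In the model, the first read after setting the address returns whatever was left in the buffer from earlier, which is exactly the value that has to be discarded.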
DRW wrote:Another thing that might be an issue:
You'll basically have to translate the object definitions into address+data pairs that you can blast to the PPU. For example, a tree that's 2x3 tiles could be translated into 2 vertical strips of data, each with a target address and 3 bytes of data. You'll have to somehow manage your VRAM update budget so you can stop buffering when the quota for each frame is met, and resume on the next frame. If you're reading the list of objects and detect that there's not enough time to draw an object, stop buffering for this frame and resume on the next, with an empty buffer.
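A sketch of that translation in C, under a few assumptions: tile coordinates are name table coordinates, the target name table base is passed in (for example $2400), and the strips are later written with the +32 address increment mode so each one runs downwards. `Strip` and `object_to_strips` are hypothetical names.

```c
#include <stdint.h>

#define MAX_STRIP_HEIGHT 8

/* One vertical strip: a target PPU address plus the tiles below it. */
typedef struct {
    uint16_t address;  /* PPU address of the topmost tile */
    uint8_t length;    /* number of tiles in the strip */
    uint8_t tiles[MAX_STRIP_HEIGHT];
} Strip;

/* Split an object into one vertical strip per column.
   Returns the number of strips written, or 0 if it does not fit. */
static uint8_t object_to_strips(uint16_t nametable_base,
                                uint8_t x, uint8_t y,
                                uint8_t width, uint8_t height,
                                const uint8_t *tiles,
                                Strip *out, uint8_t max_strips)
{
    uint8_t col, row;
    if (width > max_strips || height > MAX_STRIP_HEIGHT)
        return 0;
    for (col = 0; col < width; ++col) {
        out[col].address = (uint16_t)(nametable_base + 32 * y + x + col);
        out[col].length = height;
        for (row = 0; row < height; ++row)
            out[col].tiles[row] = tiles[row * width + col];
    }
    return width;
}
```

For the 2x3 tree this yields two strips of three tiles each. A per-frame budget check would sit around this call: if the strips don't fit into what is left of the update buffer, stop and continue with the same object on the next frame.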
DRW wrote:If I write analogous code that writes to the PPU instead of a simple array, wouldn't that mean that it takes much longer since the values are not set directly, but always pushed into that register?
I don't understand what register you're talking about.
DRW wrote:Also, the very first thing that I'd do is something like memset(TempArray, DefaultTile, 32 * 30) for walkable ground. Wouldn't this be much longer when writing it to the PPU?
Emptying a name table to a specific value requires writing the same value 960 times. At 4 cycles per write, you'd need at least 4 frames to clear the screen, if you unroll the code at least partially. I don't think this will increase the delay by a noticeable amount.
DRW
Posts: 2225
Joined: Sat Sep 07, 2013 2:59 pm

Re: Memory issue when extracting level data

Post by DRW »

Alright, I'll try out what you told me.

Just one question:
tokumaru wrote:Well, it doesn't have to be written column by column, but you still won't be able to update it in one go because rendering will still be enabled. You'll still have to buffer the objects in normal RAM before writing them to VRAM at arbitrary positions.
What was the reason again why you cannot just update graphics in the PPU outside of vblank when rendering is turned on, even if the graphics are totally off-screen? I know there was an issue when doing this, but I don't remember what it was.
calima
Posts: 1745
Joined: Tue Oct 06, 2015 10:16 am

Re: Memory issue when extracting level data

Post by calima »

The PPU updates the position as it renders, so whatever you write would end up on the active screen.
tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Memory issue when extracting level data

Post by tepples »

It's because Nintendo did not design a write buffer into the NES and Super NES PPU. The Texas Instruments TMS9918 family and the Master System VDP and Genesis VDP based on it include a separate write address and data buffer to hold pending VRAM writes, and they reserve some time throughout each scanline to commit buffered writes to VRAM. The buffer on the Genesis is four 16-bit words deep, for instance. But Nintendo presumably didn't want to include this sort of complexity in the Famicom PPU, which I've read was put together quickly as a response to the SG-1000, despite there being plenty of downtime in both the background and sprite parts of the scanline, such as redundant attribute reads. So to simplify things, Nintendo gave the rendering logic exclusive use of video memory during rendering and included only one video memory pointer, which the program and the rendering logic take turns using.
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Memory issue when extracting level data

Post by tokumaru »

It doesn't matter whether the NT is visible or not, it's all still part of the same RAM chip, and when the PPU is reading that memory for rendering, you just can't write to it.
DRW
Posts: 2225
Joined: Sat Sep 07, 2013 2:59 pm

Re: Memory issue when extracting level data

Post by DRW »

A little question since this is what I'll do next:

Writing to the PPU should only be done during vblank since it will corrupt the output otherwise.

But is reading from the PPU outside of vblank alright or can this also only safely be done during vblank?
NovaSquirrel
Posts: 483
Joined: Fri Feb 27, 2009 2:35 pm
Location: Fort Wayne, Indiana

Re: Memory issue when extracting level data

Post by NovaSquirrel »

Reading from the PPU has to be done during vblank too. The PPU is hogging the bus the whole time rendering is going on in order to do its own reads.

If you wrote to $2006 during rendering you would also corrupt the scrolling, since the PPU reuses a register for both purposes.
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Memory issue when extracting level data

Post by tokumaru »

A memory access is a memory access, it doesn't matter if it's a read or a write, and all access to VRAM during rendering will corrupt the picture, because the PPU is accessing VRAM like crazy during that time.

It may still be possible to RMW attribute data in a single vblank if you don't need to update many bytes. You can buffer only the new attributes beforehand, and during vblank you can read from the AT and combine what you read with what you have in the buffer and store the result back in the buffer, and when all bytes are done, set the address (through $2006) again and this time write the buffered data to VRAM. It's slow, but since attribute rows are just 8 bytes long, it should be possible.
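The combine step could be sketched like this in C. Each attribute byte holds four 2-bit palette fields: bits 0-3 cover the top two 16x16 areas and bits 4-7 the bottom two. The function and constant names are made up, and the reads from the AT are assumed to have already been collected into `from_vram`.

```c
#include <stdint.h>

#define ATTR_TOP_HALF    0x0F  /* top-left and top-right 16x16 areas */
#define ATTR_BOTTOM_HALF 0xF0  /* bottom-left and bottom-right areas */

/* Merge freshly buffered attribute bits with the bytes read back from
   the attribute table. Bits set in `new_bits_mask` come from `buffer`,
   all other bits are kept from `from_vram`. The result replaces
   `buffer`, ready to be written back. */
static void combine_attribute_row(uint8_t *buffer, const uint8_t *from_vram,
                                  uint8_t new_bits_mask, uint8_t count)
{
    uint8_t i;
    for (i = 0; i < count; ++i) {
        buffer[i] = (uint8_t)((buffer[i] & new_bits_mask) |
                              (from_vram[i] & (uint8_t)~new_bits_mask));
    }
}
```

For a vertical scroll that replaces only the upper meta tile row of an attribute row, the call would use `ATTR_TOP_HALF` and a count of 8.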
DRW
Posts: 2225
Joined: Sat Sep 07, 2013 2:59 pm

Re: Memory issue when extracting level data

Post by DRW »

In this case, it was not about the attributes. Attributes are mundane in this case: I always have a buffer array that stores all the 64 attribute bytes of the new screen that needs to be rendered. So, when an attribute update is due, I can simply prepare the relevant bytes from that buffer array outside of vblank and then send the data to the general PpuUpdateData buffer with all its additional information (PPU address high and low byte, data length). And this array then automatically gets written to the PPU in the UpdatePpu function that always gets called during vblank.

But in the current case, I was talking about the tile rows.

I did as you told me: While reading my compressed meta tiles, which are not saved on a row-by-row basis, I don't extract them to a 32 x 30 byte buffer array, but write them directly into the off-screen name table. (Then I write the attributes all at once, since I prepared the attributes buffer array during extraction.)

For horizontal scrolling, all I have to do now is, well, the scrolling. But for vertical scrolling, I now have to copy the PPU data from the off-screen into the visible screen, row by row, at predefined scrolling positions.

That's why I needed to know whether I can read from the PPU outside of vblank. In this case, I would have prepared the PpuUpdateData array before vblank.

Since I cannot do this, I have to create a function that runs in vblank and that first reads all necessary PPU data into an array and then writes the array back into the PPU immediately afterwards.

I hope that reading 32 bytes from the PPU and then writing 32 bytes to the PPU is doable during one vblank, is it? (In this case, my function is actually written in Assembly, not in C.)
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Memory issue when extracting level data

Post by tokumaru »

DRW wrote:I hope that reading 32 bytes from the PPU and then writing 32 bytes to the PPU is doable during one vblank, is it? (In this case, my function is actually written in Assembly, not in C.)
In assembly, sure. You can transfer a little over 200 bytes (including addresses, of course) plus a sprite DMA if you unroll the code, so 64 is perfectly doable even with all the overhead.
DRW
Posts: 2225
Joined: Sat Sep 07, 2013 2:59 pm

Re: Memory issue when extracting level data

Post by DRW »

What do you mean by unrolling the code?

Let me show you my NMI code from the previous game (and it will probably look exactly the same for the new game):

Nmi:

	PHA
	TXA
	PHA
	TYA
	PHA

	LDA WaitForNmi
	BEQ @end

	LDA #false
	STA WaitForNmi

	LDA PpuMaskValue
	STA PpuMask
	BEQ @end

	LDA #<Sprites
	STA OamAddr
	LDA #>Sprites
	STA OamDma

	JSR UpdatePpu

	LDA #0
	STA PpuScroll
	STA PpuScroll

	LDA #PpuCtrlHorizontal
	STA PpuCtrl

	; The rest of the scrolling is done outside NMI
	; when the corresponding places
	; (status bar end/sprite overflow flag,
	; parallax scrolling line/sprite 0 split)
	; are reached.

@end:

	JSR FamiToneUpdate

	PLA
	TAY
	PLA
	TAX
	PLA

	RTI
And this is my code for updating the PPU (I removed the CC65 segments and all the .export stuff for readability here):

	PpuUpdateData: .res 4 + 32

UpdatePpu:

	LDX PpuUpdateData + 0
	BEQ @end

	LDA PpuUpdateData + 1
	STA PpuCtrl

	LDA PpuStatus

	LDA PpuUpdateData + 2
	STA PpuAddr
	LDA PpuUpdateData + 3
	STA PpuAddr

	LDY #4

@loop:

	LDA PpuUpdateData, Y
	INY
	STA PpuData

	DEX
	BNE @loop

	; The data length is set to 0,
	; so that the same update isn't repeated
	; when the NMI hits next time.
	STX PpuUpdateData + 0

@end:

	RTS
The UpdatePpu function for the new game will differ in that the array can have multiple PPU data, so that it doesn't only contain data for one row or column, but for multiple ones. Makes it a bit more efficient for updating the meta tiles.
I set the new PpuUpdateData size to 128 bytes to make sure that we definitely don't run out of vblank time.
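On the C side, filling such a multi-entry buffer might look like the sketch below. The per-entry layout (length, PpuCtrl value, address high byte, address low byte, data) follows the assembly above; ending the list with a zero length byte is only an assumption for this example, and the names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define PPU_UPDATE_SIZE 128

static uint8_t ppu_update_data[PPU_UPDATE_SIZE];
static uint8_t ppu_update_used;  /* bytes used, not counting the terminator */

/* Append one entry: length, PpuCtrl value, address high, address low,
   then the data bytes. A zero length byte terminates the list.
   Returns 1 on success, 0 if the entry does not fit. */
static int ppu_update_append(uint8_t ctrl, uint16_t address,
                             const uint8_t *data, uint8_t length)
{
    uint8_t *p;
    if (length == 0 || ppu_update_used + 4u + length + 1u > PPU_UPDATE_SIZE)
        return 0;
    p = &ppu_update_data[ppu_update_used];
    p[0] = length;
    p[1] = ctrl;
    p[2] = (uint8_t)(address >> 8);
    p[3] = (uint8_t)(address & 0xFF);
    memcpy(&p[4], data, length);
    p[4 + length] = 0;  /* terminator, overwritten by the next entry */
    ppu_update_used = (uint8_t)(ppu_update_used + 4 + length);
    return 1;
}
```

A failed append is the signal to stop buffering for this frame and carry on after the next vblank, once the buffer has been sent and reset.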


So, what do you mean when you talk about unrolling?
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Memory issue when extracting level data

Post by tokumaru »

Unrolling means copying several bytes on each iteration of the loop as opposed to a single byte, effectively reducing the overhead caused by counting and branching.

Rolled (15 cycles per byte):

Loop:
	lda buffer, y
	sta $2007
	iny
	dex
	bne Loop
Partially unrolled (9.625 cycles per byte, on average):

Loop:
	lda buffer+0, y
	sta $2007
	lda buffer+1, y
	sta $2007
	lda buffer+2, y
	sta $2007
	lda buffer+3, y
	sta $2007
	lda buffer+4, y
	sta $2007
	lda buffer+5, y
	sta $2007
	lda buffer+6, y
	sta $2007
	lda buffer+7, y
	sta $2007
	tya
	clc
	adc #$08
	tay
	dex
	bne Loop
Completely unrolled code doesn't use loops at all. It's just a bunch of LDAs and STAs for maximum speed.
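The cycle counts quoted for the two loops can be checked with a bit of arithmetic. This small C sketch uses the standard 6502 timings and assumes that no indexed read crosses a page boundary and that the branch is always taken:

```c
/* 6502 cycle counts for the instructions used in the two loops. */
enum {
    LDA_ABS_Y = 4, STA_ABS = 4, INY = 2, DEX = 2, BNE_TAKEN = 3,
    TYA = 2, CLC = 2, ADC_IMM = 2, TAY = 2
};

/* Rolled loop: lda, sta, iny, dex, bne for every byte. */
static double rolled_cycles_per_byte(void)
{
    return LDA_ABS_Y + STA_ABS + INY + DEX + BNE_TAKEN;
}

/* Partially unrolled loop: one lda/sta pair per byte, plus
   tya, clc, adc, tay, dex, bne once per iteration. */
static double unrolled_cycles_per_byte(int bytes_per_iteration)
{
    int copy = bytes_per_iteration * (LDA_ABS_Y + STA_ABS);
    int overhead = TYA + CLC + ADC_IMM + TAY + DEX + BNE_TAKEN;
    return (double)(copy + overhead) / bytes_per_iteration;
}
```

For 8 bytes per iteration that is (64 + 13) / 8 = 9.625 cycles per byte, compared to 15 for the rolled loop.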

The main drawback of unrolled loops is that the amount of data to transfer has to be a multiple of the number of bytes copied per iteration. Alternatively, you can jump into the middle of the loop when starting the transfer (a look-up table of entry points could help here) and skip some transfers on the first iteration, so the total can be any amount you need.
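For readers working in C, the closest analogue to jumping into the middle of an unrolled loop is the construct known as Duff's device. The sketch below is not the 6502 jump table approach itself; `ppu_write` is a stand-in that stores into a plain array instead of writing to $2007, and all names are invented.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PPUDATA: on hardware this would be a write to $2007. */
static uint8_t fake_vram[256];
static size_t fake_vram_pos;

static void ppu_write(uint8_t value)
{
    if (fake_vram_pos < sizeof fake_vram)
        fake_vram[fake_vram_pos++] = value;
}

/* Copy `count` bytes with a loop unrolled four times. The switch jumps
   into the middle of the loop body, so the first pass handles the
   remainder and every later pass copies a full group of four. */
static void copy_unrolled(const uint8_t *src, size_t count)
{
    size_t iterations = (count + 3) / 4;
    if (count == 0)
        return;
    switch (count % 4) {
    case 0: do { ppu_write(*src++);
    case 3:      ppu_write(*src++);
    case 2:      ppu_write(*src++);
    case 1:      ppu_write(*src++);
            } while (--iterations > 0);
    }
}
```

With a count of 6, the switch enters at `case 2`, copies two bytes on the first pass and four on the second.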