Need some tips for a demo

Posts: 8
Joined: Thu Feb 14, 2019 2:25 pm

Need some tips for a demo

Post by vivi168 » Thu Feb 14, 2019 2:48 pm


So I've been working on a SNES demo for a while, and it's working somewhat well right now. I can move the sprite from one border to the other, it's animated, and the background is scrolling.

Now, I don't know if anything I've done is best practice (and I think it certainly isn't). Maybe someone more experienced can give me some tips/make a code review?

My main questions are related to addressing.

Currently, what I'm doing is defining RAM addresses for global variables like this:

Code: Select all

PLAYER_SX   = $0001
BGH_SCRL    = $0002
BGH_SCRH    = $0003

It's rather painful and ugly. I saw you can do it like this instead:

Code: Select all

.segment "ZEROPAGE"
PLAYER_SX: .res 1
BGH_SCR: .res 2

But I can't get it to work. I think maybe it has to do with bank switching? And in the case of a two-byte label, how can I access the high byte?

Also, for the assets, I have for example: (I manually declare the asset size and its ROM location.)
Isn't there a way to find out the same information automatically? The start address, the bank and the size?

Code: Select all

.segment "DATA"
.incbin "assets/background.png.vra"

BG_START = $8000
BG_SIZE = $800

; loading example, I've defined a macro
.segment "CODE"
; VRAM start, bank, asset start, asset size
transfer_vram #$0000, #$02, #BG_START, #BG_SIZE

You can find the complete source code here:

Thank you for your help :)

Posts: 4218
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: Need some tips for a demo

Post by koitsu » Thu Feb 14, 2019 3:33 pm

Note for readers: individual is using ca65/ld65 (I looked at the script). Note for OP: please be sure to tell people what assembler you're using next time. :-)

IMO, there is nothing wrong with what you're doing with regards to managing memory (read: large numbers of variable equates for what goes where in direct page). I am one of the few here on this forum who doesn't have a problem with that model; it's universally understood and makes debugging a bit easier.

What I'm wondering is: why do they bother you when you say "it's rather painful and ugly". Maybe if you could explain what you mean by that, and what you think wouldn't be painful/ugly, it would help?

However, ca65/ld65 offer some "better ways" of managing your memory topology. I will let others talk about ways that work well for them, as everyone has their own model/approach.

As for your question about using .res in the ZEROPAGE segment and how with BGH_SCR .res 2 you'd access the upper byte of BGH_SCR: BGH_SCR+1 would get you what you want, e.g. sep #$20 / lda BGH_SCR gets you the lower byte of the value, while lda BGH_SCR+1 gets you the upper byte. If using a 16-bit read, rep #$20 / lda BGH_SCR will get you the full 16-bit value.
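A minimal ca65 sketch of the access pattern described above. It assumes the linker config defines ZEROPAGE and CODE segments (as in the OP's snippets) and that D=$0000; the .a8/.a16 directives only tell the assembler the current accumulator width, they do not change the CPU state.

```asm
.p816                       ; enable 65816 instructions

.segment "ZEROPAGE"
PLAYER_SX:  .res 1          ; 1 byte
BGH_SCR:    .res 2          ; 2 bytes: low at BGH_SCR, high at BGH_SCR+1

.segment "CODE"
    sep #$20                ; 8-bit accumulator
    .a8
    lda BGH_SCR             ; low byte
    lda BGH_SCR+1           ; high byte
    rep #$20                ; 16-bit accumulator
    .a16
    lda BGH_SCR             ; full 16-bit value (low and high together)
```

The assembler hands out the addresses in declaration order, so inserting a new .res line between two others never requires renumbering anything by hand.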

All this is doing is telling the assembler to add one to the calculated address of BGH_SCR. That method works the exact same way if you were using equates like so:

Code: Select all

PLAYER_SX = $0000
BGH_SCR   = $0001
OTHERVAR  = $0003
Note that $0002 was skipped -- that's because that can be used for the upper byte of BGH_SCR. The way this was done historically (and still works fine today) was to use code comments and/or additions in the equates themselves. In fact, even in other PLs like Forth, this is how you keep track of the stack or similar operations. There's nothing wrong with it. Examples:

Code: Select all

PLAYER_SX = $0000   ; 1 byte
BGH_SCR   = $0001   ; 2 bytes
OTHERVAR  = $0003   ; 1 byte
OTHERVAR2 = $0004   ; 1 byte

Code: Select all

PLAYER_SX = $0000         ; 1 byte (address $0000)
BGH_SCR   = PLAYER_SX+1   ; 2 bytes (addresses $0001 and $0002)
OTHERVAR  = BGH_SCR+2     ; 1 byte (address $0003)
OTHERVAR2 = OTHERVAR+1    ; 1 byte (address $0004)
If you find the .res method easier/better for you, go for it! Do whatever works best for you. Just keep in mind that when moving from one method to another, you may want to enable generation of code listings using ca65's -l {filename} flag (that's lowercase-ELL) and compare before vs. after, to make sure all of your assembled results are identical to what they were before.

As for determining "asset size" (length) dynamically: others can help with this. I forget how it's done in ca65.

Finally: do not forget that ca65 does not work so well with the 65816's direct page model (vs. the 6502's ZP model). As such, I suggest keeping D=$0000 at all times to try and alleviate any kind of pain or confusion. See this thread, and the referenced links/threads in the initial post, if you don't know what this means.

Drew Sebastino
Formerly Espozo
Posts: 3503
Joined: Mon Sep 15, 2014 4:35 pm
Location: Richmond, Virginia

Re: Need some tips for a demo

Post by Drew Sebastino » Thu Feb 14, 2019 7:46 pm

If I'm not mistaken, wasn't I still using WLA DX at the time that error occurred? I don't think I've ever run into any problems using direct page with ca65. It can just be hard to give up the ability to relocate it; without moving direct page, you cannot index more than one thing when going through your object code because you'll be forced to use either x or y. You also already mentioned using it for writing to MMIO registers in time-critical situations (during vblank).

Site Admin
Posts: 3899
Joined: Mon Sep 20, 2004 6:04 am
Location: Indianapolis

Re: Need some tips for a demo

Post by Memblers » Thu Feb 14, 2019 8:50 pm

vivi168 wrote: Also, for the assets, I have for example: (I manually declare the asset size and its ROM location.)
Isn't there a way to find out the same information automatically? The start address, the bank and the size?
Here's how to get that info. The size isn't given automatically, but can be calculated by subtracting the start address from the end address.

Code: Select all

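bg_start:                  ; label marking the first byte of the asset
 .incbin "background.bin"
bg_end:                    ; label marking the byte just past the asset

lda #<bg_start ; lower address byte
lda #>bg_start ; upper address byte
lda #^bg_start ; bank byte
bg_size = bg_end - bg_start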
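
As a rough sketch of how those values could feed a VRAM upload without any hand-written constants: the block below places bg_start/bg_end labels around the OP's .incbin and programs DMA channel 0 directly instead of using the OP's transfer_vram macro (whose definition is not shown in this thread). It assumes the data bank register points at a bank where $2100-$43FF are visible (e.g. DB=$00), that the asset is smaller than 64 KiB, and that the routine runs during forced blank or vblank. The routine name load_bg is made up for illustration.

```asm
.p816

.segment "DATA"
bg_start:
    .incbin "assets/background.png.vra"
bg_end:
BG_SIZE = bg_end - bg_start ; computed by the assembler/linker

.segment "CODE"
load_bg:                    ; hypothetical routine name
    rep #$10                ; 16-bit index registers
    sep #$20                ; 8-bit accumulator
    .i16
    .a8
    lda #$80
    sta $2115               ; VMAIN: +1 word after each $2119 write
    ldx #$0000
    stx $2116               ; VMADDL/H: destination VRAM word address
    lda #$01
    sta $4300               ; DMAP0: CPU -> PPU, alternate $2118/$2119
    lda #$18
    sta $4301               ; BBAD0: $2118 (VMDATAL)
    ldx #.loword(bg_start)
    stx $4302               ; A1T0L/H: source address within the bank
    lda #^bg_start
    sta $4304               ; A1B0: source bank
    ldx #BG_SIZE
    stx $4305               ; DAS0L/H: number of bytes to transfer
    lda #$01
    sta $420B               ; MDMAEN: start DMA on channel 0
    rts
```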

Posts: 8
Joined: Thu Feb 14, 2019 2:25 pm

Re: Need some tips for a demo

Post by vivi168 » Fri Feb 15, 2019 11:35 am

Thanks everyone for your answers, and sorry for not mentioning which assembler I was using.
koitsu wrote: What I'm wondering is: why do they bother you when you say "it's rather painful and ugly". Maybe if you could explain what you mean by that, and what you think wouldn't be painful/ugly, it would help?
It's mainly when adding new locations. If I want to keep a certain order in the file, I need to modify each address manually (for example, when adding a location in between two others). An automatic approach would also prevent errors (e.g. two locations overlapping each other in RAM because I forgot to bump the next address in the list).

I will try implementing all of your advice.

I have one more question, concerning background scrolling. Currently, if the player is far away from the map border, I lock its sprite in the middle of the screen and scroll the background instead. What I do is, store previous background scroll offset like so:

Code: Select all

and during the NMI interrupt, I apply the scroll offset like so:

Code: Select all

Is this the correct way to go about BG scrolling? Because I noticed a small flicker when the background is scrolling.

Edit: Maybe it's normal? I noticed the same thing in the FF6 intro cinematic when the background is scrolling.

Also, if I wanted the map (level) to be composed of multiple tilemaps, what would be a good strategy to "append" the next tilemap?

Posts: 8
Joined: Thu Feb 14, 2019 2:25 pm

Re: Need some tips for a demo

Post by vivi168 » Sat Feb 23, 2019 3:30 pm

I still can't wrap my head around background scrolling.

I've observed how some games work with bsnes debugger and memory editor.

What I've observed is that when the character moves, the BG scroll changes AND portions of the tilemap in VRAM get replaced with other portions of tilemaps from the ROM. I can't find a way to replicate this.

For example, say I have a level composed of two tilemaps that are contiguous in the ROM. What strategy can I employ to take the last 30 columns (each 8 pixels wide) of the first tilemap and the first 2 columns of the second tilemap and store them in VRAM?

(Or maybe store each tilemaps for the current level in the VRAM, and assemble from there? instead of DMAing each frame from the ROM)

Does anybody know of a simple way to achieve this?

Here is a picture to illustrate my question :

Posts: 4218
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: Need some tips for a demo

Post by koitsu » Sat Feb 23, 2019 6:34 pm

The reason you're not getting any answers is because this is one of the things every game programmer has to solve in their own way, re: the "strategy", and there is no simple way to describe how to do it. It also varies heavily on the type of game you want to do and what you actually want the experience to be. What you're asking about is one of the key parts to an actual game engine.

This is one of the things I struggled with for literally years when starting out doing SNES stuff in the very early 90s. Conceptually it's simple -- "I just need to make sure the sides of the screen (columns of tiles) get updated before I pan the screen left/right" (or for scrolling vertically, updating rows of tiles) -- except actually implementing the "how do I correlate my map with the actual tiles on screen" part is not easy at all. This is further compounded by technical complexities such as the SNES's graphics modes (mode 1 is pretty common, but it all depends on what you want, maybe you'd prefer mode 2 or 3), and then later, intermediary data formats (that make up your "virtual world") and having to convert from that to SC data, handle sprites, collision, layering, etc.

There is not enough PPU RAM (VRAM) for "an entire world" -- there is only enough for essentially 2 horizontal screens and 2 vertical screens of tilemaps (SC data) on a per-background basis (for modes 0, 1, 2, 3, and 4). 2x2 is what you get when you use a screen size of 3 (e.g. bits 1 and 0 of $2107/2108/2109/210A are both set).

The BG scroll MMIO registers essentially let you "pan around" this PPU RAM, but you yourself have to code the routine that updates the tilemap (SC data), the system does not do it itself. Thus, you yourself end up having to write a bunch of 65816 that updates the non-visible parts, so that when the BG scroll registers pan the screen around, there aren't any visual artefacts.
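
For reference, a small sketch of what "panning" with the scroll registers looks like for BG1's horizontal offset. This is not the OP's code (which is not preserved in this thread); it assumes the 16-bit scroll value lives in a direct-page variable named BGH_SCR as in the first post, that D=$0000 and DB=$00, and that it is called from the NMI handler during vblank. The routine name is hypothetical.

```asm
.p816

.segment "ZEROPAGE"
BGH_SCR: .res 2             ; 16-bit horizontal scroll position for BG1

.segment "CODE"
apply_bg1_hscroll:          ; hypothetical routine name
    sep #$20                ; 8-bit accumulator
    .a8
    lda BGH_SCR
    sta $210D               ; BG1HOFS: first write takes the low byte
    lda BGH_SCR+1
    sta $210D               ; BG1HOFS: second write takes the high bits
    rts
```

$210D is a write-twice register, so both writes always go to the same address, low byte first. Writing the scroll value only changes which part of the tilemap in VRAM is shown; it never loads new tilemap data.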

For starters: do you understand this? If not, then start with trying to understand it. I recommend referring to the developers manual, specifically pages 2-27-4 (describing $2107/2108/2109/210A), combined with page A-10 (SC data format, but also gives a visual of how it all works), and A-21. Don't bother with page A-22, as that's for modes 5 and 6, which will confuse the hell out of you.

If you do understand it: great, then your question is truly "how do I write the code that does all of this?"

A big part of the question is exactly what kind of game you're wanting. Your screenshots seem to imply you want an overhead world RPG-esque type of thing, like Zelda 3. With that game, there are "areas" that have limited size (which span several screens that pan/update smoothly). But if you pay close attention, you'll see that the world is not 100% seamless -- you reach "edges" of an area. The areas themselves are big (several screens in size), but there are still "edges". That's because there's limited system resources in general (RAM, ROM, whatever) and technical limitations; the "edges" are essentially points where new data is loaded and thus a new "area" becomes available. Even "huge" world games like Super Metroid work this way too.

Quite often there's an intermediary format of the data used, i.e. a unique data format that you yourself design that represents all the "stuff" that makes up an area -- not just raw data you can DMA to PPU RAM. Your code has to load/translate that data into SC data, probably reading it from ROM and storing the DMA-able results in RAM (banks $7E/7F somewhere), then DMAing portions as needed. You should not try to DMA entire screens of SC data every frame -- there isn't enough time to do this. You need to use $2115 to change the PPU RAM increment value into something that can work with columns (e.g. 32) when dealing with panning left/right, or with rows (e.g. 1) when panning up/down.

Essentially you need to track everything that's being done -- everything. Welcome to how complicated video games actually are.

Personal note: I think I spent months writing down code on paper trying to figure out how exactly to go about implementing such a thing. Pretty much every single game does it, so it's not impossible, it's just complicated to think about (IMO).

The overall "strategy" is used identically on the NES as well, so anyone here who has done it should be able to talk about the method/model, and the overall thought process/implementation would apply to the SNES too. (On the NES, unlike the SNES, scrolling in both directions is a bit tricky, which is why most games you'd see only pan left-to-right or top-to-bottom and not both simultaneously. This has to do with the limited RAM on the NES.) I'll post something in the NES-oriented boards asking folks to look at this thread + describe the models they use (edit: posted as promised).

I would suggest starting with something simple: don't worry about the intermediary data format and whatnot yet. Start with multiple screens of raw SC data in ROM. Try to figure out a routine that can update rows and columns of screen data, DMAing those relevant portions from ROM into PPU RAM, while letting you pan the background around up/down/left/right using the joypad.
Last edited by koitsu on Sat Feb 23, 2019 6:56 pm, edited 1 time in total.

Posts: 29
Joined: Tue Oct 24, 2017 11:07 pm

Re: Need some tips for a demo

Post by ndiddy » Sat Feb 23, 2019 6:49 pm

What I did may not be the best way or the most efficient way, but it's pretty simple. I have a pointer in zero page called "scrollScreenAddr" that gets initialized to the first "screen" (32x14 tile tilemap)'s location in ROM. I also keep track of the leftmost column onscreen (this is essentially the scroll x position shifted right 4 bits). If the current column is greater than the last column copied, I add 1 to it, AND it by #$1f (31 in decimal) and then store the result in the column value. If the column value is 0, I know I'm on a screen boundary and add the size of one screen (in my case #$380 bytes) to the pointer. The same is done if the current column is less than the last column, you just subtract 1 and subtract the size of a screen if you're on a boundary. Next, I copy the tilemap at that column into a buffer in WRAM (useful for collision detection, etc.) and then copy the same column from WRAM to VRAM during vblank (you could also write the tilemap values into a queue during the WRAM transfer and then DMA them if you're running out of vblank time).

I'm not planning 4 way scrolling for my game, so this method may not be ideal for that. Here's the source if my explanation wasn't sufficiently understandable.

Posts: 22288
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: Need some tips for a demo

Post by tepples » Sat Feb 23, 2019 8:11 pm

It looks like you're trying to figure out how to push the data at the seam, represented by paw blocks in this animation

Tilemap contents during first four screens of a level in Nova the Squirrel

Assuming you don't also scroll vertically: What you can do is make a 64-byte (32-entry) buffer in WRAM, copy tile map indices out of ROM into the buffer, and then DMA that buffer with VMAIN set to +32 words after $2119 write ($2115 = $81). Then during vblank, set the destination address ($2116-$2117) to the top of that tilemap column, and start a 64-byte DMA in ascending VRAM to alternating [$2100+x, $2101+x] I/O mode ($4300 = $01) from that buffer to $2118. This procedure doesn't change much whether your map is in ROM or WRAM, whether it's compressed with RLE, metatiles, or objects, or whatever.
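
The vblank half of that procedure could look roughly like the sketch below. The names column_buf, column_vram and upload_column are made up for illustration; it assumes a BSS segment mapped to low WRAM (reachable from bank $00), DB=$00, and that some other code has already filled the buffer and computed the VRAM word address of the column's top entry.

```asm
.p816

.segment "BSS"
column_buf:  .res 64        ; 32 tilemap entries, 2 bytes each
column_vram: .res 2         ; VRAM word address of the column's top entry

.segment "CODE"
upload_column:              ; call during vblank
    php                     ; save caller's register widths
    rep #$10                ; 16-bit index registers
    sep #$20                ; 8-bit accumulator
    .i16
    .a8
    lda #$81
    sta $2115               ; VMAIN: +32 words after each $2119 write
    ldx column_vram
    stx $2116               ; VMADDL/H: top of the tilemap column
    lda #$01
    sta $4300               ; DMAP0: CPU -> PPU, alternate $2118/$2119
    lda #$18
    sta $4301               ; BBAD0: $2118 (VMDATAL)
    ldx #.loword(column_buf)
    stx $4302               ; A1T0L/H: buffer address
    lda #^column_buf
    sta $4304               ; A1B0: buffer bank
    ldx #64
    stx $4305               ; DAS0L/H: 64 bytes
    lda #$01
    sta $420B               ; MDMAEN: start DMA on channel 0
    plp                     ; restore caller's register widths
    rts
```

Because the buffer is contiguous and only the VRAM side steps by 32 words, the DMA source can keep incrementing by 1 as usual.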

If you also scroll vertically (8 ways), you'll need to make a second 128-byte DMA buffer for vertical updates and make two 64-byte DMA copies in vblank, one for the first (left) half of the tilemap and one for the second (right) half. You'll also need to be prepared to fill both the horizontal and vertical update buffers not from the beginning but from the middle, based on the scroll position.

Posts: 1080
Joined: Tue Feb 07, 2017 2:03 am

Re: Need some tips for a demo

Post by Oziphantom » Sun Feb 24, 2019 3:20 am

As koitsu says, there is no one way, and one tends to make their own.

However, the NES ways are needlessly complex for the SNES, and one doesn't have to use such "on the edge" methods.

I suggest you get left and right working, then up and down, then 8 ways. The secret to solving a complex task is to break it down into smaller, simpler tasks.

The SNES lets you have 2x wide screens, so 64x64. You set a window within this range; however, the trick is that the window wraps (see tepples's gif for an example of this; it kind of works the same way on the SNES). So as you scroll left and right you need two indices: Left Visible and Right Visible. Whether you track visible or next visible is up to you, and you will need to adjust your maths accordingly. Sorting out this mess is a "rite of passage".

So as you move around you update your (next)visible +/- 1, and I suggest you make a simple looping 2 screen wide map to make sure this works and you understand it.

To make the map larger than 2 screens wide, you need to put new data in. Since the SNES has so much spare room, you don't need to do it just in time; you can do it, for example, +4 or +8 ahead of the "current visible". So when you move +1 you then draw a column of "new" data at, say, +4 from the right edge; when you move -1 you draw a "new" column at -4 from the left edge. It's like a train throwing down track in front of it so it always has track to move on. Remember, though, to "wrap" the values at 64, so this is
lda RightEdge
CLC ; clear carry so ADC adds exactly 4
ADC #4
AND #$3F ; wrap to 0..63
; draw on this column

How you work out what is new Data is another problem. For now you just make a Map that has the raw tile data 1x1.

Once you have Left and Right working. Do up and Down. Same idea only now you need a top visible and a bottom visible index, and you update it basically the same.

8 way is now you move the top and bottom, left and right and then draw a new row and a new column.

Now that you have that working, you can step back and make a new piece of the puzzle: feeding the "new content". You can make "blocks", which are some arrangement of 1x1 tiles; 2x2 and 4x4 are the popular sizes, however if 2x8 is what works best then so be it. The beauty of the SNES is you don't need to cache this at all, and as long as your blocks are less than 16 tiles wide/high you can just decode them directly into VRAM. So if you have 4x4 blocks, you can draw 4 columns of data into VRAM at your visible + ahead, and then not draw anything new for 4 updates. Then draw a new set. The SNES is really convenient in this way.

Posts: 12003
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: Need some tips for a demo

Post by tokumaru » Sun Feb 24, 2019 4:25 am

To master scrolling, you have to understand that there are 2 separate spatial domains you're working with: map space and screen space. Scrolling basically consists in tracking the position of the camera in both spaces, and when a certain pixel boundary (normally dictated by the size of your metatiles) is crossed, you check the direction of the movement to tell which edge of the screen needs updating, so you can calculate the source address (in the map) and the target address (in VRAM) based on the reference positions of the camera in each space.

Finally, all you need to do is read a row or column of blocks from source address and buffer it, so that during vblank you can copy that buffered data to the target address in VRAM.

Here are a few tips on how to accomplish the above steps:

1- IIRC, tilemaps on the SNES are power-of-two-sized (e.g. 64x64 tiles, as opposed to 64x60 on the NES), so you might get away with using a single set of camera coordinates, by using the highest bits only when calculating map addresses, and ignoring them when calculating VRAM addresses.

2- Detecting the exact moment when a new row/column of blocks is needed consists in detecting when the camera crosses a certain pixel boundary, usually 16 pixels, but other numbers may be more convenient depending on how your map is encoded. To do this, you save the old camera coordinates before changing them, and then compare the old values with the new ones, and if a certain bit is different, that means a boundary was crossed. For example:

Code: Select all

  lda OldCameraX
  eor CameraX
  and #16
  bne UpdateColumn
EOR combines 2 values so that bits that are different become 1s, and bits that are equal become 0s, so the result of an EOR operation is basically a mask indicating which bits don't match in the 2 values. With that information, you just isolate the bit of interest (in this case, bit 4, because we're working with 16x16-pixel blocks) with an AND operation. Do this for both the X and Y axis to decide when a new column and/or row of blocks is necessary.

3- detecting the direction of the movement should be easy if you keep the camera displacements stored in their own variables: negative values mean left/up, positive values mean right/down. Say that the camera crossed a boundary in the Y axis, and you need to calculate the coordinates of a new row of blocks. Rows always start at the left edge of the screen, so the X coordinate is just CameraX, pure and simple. The Y coordinate though, can be either CameraY, if moving up, or CameraY+Screen height if moving down. IIRC, the screen height on the SNES is 224 (it's 240 on the NES). Do the same for scrolling on the X axis, where columns of blocks always start at the top of the screen (CameraY), but the X coordinate can be either the left edge (CameraX) or the right edge (CameraX+ScreenWidth).

4- Once you have the coordinates of the new row/column, you just need to shuffle/combine the bits to calculate the source address and the target address. Since I don't know how your maps are stored in ROM and I'm fairly illiterate on SNES VRAM layout, I can't give you any exact formulas. What I can tell you is that the basic formula for converting 2D coordinates into memory offsets is Y * Width + X. If the map in ROM is not compressed in any way, that's the exact formula you'd use. As for the VRAM address, remember to clip the coordinates to the limits of the tilemap dimensions before any calculations.
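
As one concrete instance of the Y * Width + X formula on the VRAM side, here is a sketch for a single 32x32-entry tilemap (screen size 0). TILEMAP_BASE, tile_x, tile_y, vram_addr and calc_vram_addr are hypothetical names; the coordinates are in tiles, not pixels, and larger screen sizes need extra handling because each additional 32x32 screen sits in its own block of VRAM.

```asm
.p816

TILEMAP_BASE = $0000        ; assumed VRAM word address of the tilemap

.segment "ZEROPAGE"
tile_x:    .res 2           ; tile column in map space
tile_y:    .res 2           ; tile row in map space
vram_addr: .res 2           ; result: VRAM word address of the entry

.segment "CODE"
calc_vram_addr:             ; returns with a 16-bit accumulator
    rep #$20                ; 16-bit accumulator
    .a16
    lda tile_y
    and #$001F              ; clip the row to 0..31
    asl a
    asl a
    asl a
    asl a
    asl a                   ; * 32 = Y * Width
    sta vram_addr
    lda tile_x
    and #$001F              ; clip the column to 0..31
    clc
    adc vram_addr           ; + X
    clc
    adc #TILEMAP_BASE       ; + start of the tilemap in VRAM
    sta vram_addr
    rts
```

The result is a word address, which is what $2116/$2117 expect, so no extra multiply by 2 is needed on the VRAM side.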

Posts: 8
Joined: Thu Feb 14, 2019 2:25 pm

Re: Need some tips for a demo

Post by vivi168 » Sun Feb 24, 2019 9:49 am

tepples wrote: It looks like you're trying to figure out how to push the data at the seam, represented by paw blocks in this animation
Yes, that's exactly it.

I know how to write a new row of tiles where I want to, but what I'm trying to figure out is how to write a column. Technically, I think I could already implement vertical scrolling with a loading seam, but what I want to figure out is how to do the same horizontally.

Thanks to all of your replies, I now know it has to do with the value stored in the $2115 (VMAIN) register. Right now I'm experimenting with the value $81, to increment the next VRAM destination address by 32 after each write to $2119, and so write in a "column" fashion.

The problem is, I should also increment the source address by 32, to skip the current row remaining tiles. Is there a way to do this during DMA?

Posts: 8
Joined: Thu Feb 14, 2019 2:25 pm

Re: Dealing with large worlds/maps, screen panning

Post by vivi168 » Sun Feb 24, 2019 10:18 am

Thank you Koitsu for asking for help on my behalf :)
Your last two paragraphs in this post start to touch base on that, and is super helpful.
Yes, it was super helpful :) (and as for the gif, at first I wanted to link that same exact one, but decided to make my own picture)

I posted a reply to the other topic, in which I state that the main thing I'm trying to figure out is how to write a column.
In the case of a 32x32 map, setting VMAIN ($2115) to $81 helps me in that regard by incrementing the next destination address by 32 after each write.
The next thing I'm trying to figure out is how to increment the source address by 32 after each write.
I could theoretically do that in a loop, and make multiple DMA writes (each the size of the column width), but I'm sure there is a better way (maybe a DMA parameter?)

Posts: 1327
Joined: Fri Feb 24, 2012 12:09 pm

Re: Dealing with large worlds/maps, screen panning

Post by nocash » Sun Feb 24, 2019 12:25 pm

No, there is no way to read in steps of 32 with DMA. But there are many ways to do it, with or without DMA:
Rotate your map in ROM, so you can read in steps of 1.
Or copy or decompress the map from ROM to RAM via the CPU, then DMA from RAM to VRAM during vblank.
Or update horizontal rows instead of vertical columns (no visible difference if it's offscreen).
Or update only 4 tiles per 1-pixel scroll step, instead of 32 tiles after every 8th pixel step.

Using DMA is needed only if you want to update many (other) things in VRAM and run short of vblank time. If that's the case, then it makes sense to prepare data in ROM or RAM in a DMA-friendly format. If you just want to update a few map entries, then you may get away with slower CPU transfers.
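
A sketch of the CPU-transfer option for one column, where the source pointer steps by a whole row between reads. It assumes an uncompressed, row-major map that is 32 entries wide with 2 bytes per entry (so one row is 64 bytes), D=$0000, DB=$00, and that it runs during vblank or forced blank. The names src_ptr, dst_vram and write_column are hypothetical.

```asm
.p816

.segment "ZEROPAGE"
src_ptr:  .res 3            ; 24-bit pointer to the column's top entry in the map
dst_vram: .res 2            ; VRAM word address of the column's top entry

.segment "CODE"
write_column:               ; returns with 16-bit A, X and Y
    rep #$10                ; 16-bit index registers
    sep #$20                ; 8-bit accumulator
    .i16
    .a8
    lda #$81
    sta $2115               ; VMAIN: +32 words after each $2119 write
    rep #$20                ; 16-bit accumulator
    .a16
    lda dst_vram
    sta $2116               ; VMADDL/H: destination
    ldy #$0000              ; byte offset into the source map
    ldx #32                 ; 32 entries per column
@loop:
    lda [src_ptr],y         ; read one 16-bit tilemap entry
    sta $2118               ; writes $2118 then $2119; VRAM address += 32
    tya
    clc
    adc #64                 ; skip to the same column in the next row
    tay
    dex
    bne @loop
    rts
```

The loop does by hand what DMA cannot: the read side advances by 64 bytes per entry while VMAIN takes care of the 32-word step on the write side.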

Posts: 4218
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: Need some tips for a demo

Post by koitsu » Sun Feb 24, 2019 1:00 pm

nocash has the answer for that (I've asked tepples to try and move those posts into this thread so everything is kept in one place): ... 72#p234972

The answer is no, you can't control the "source increment" while DMAing to PPU RAM; it always reads in increments of 1, because that's the nature of DMA (I don't know of any DMA systems that let you control that, but I suspect the one on the PS2 probably does -- its DMA implementation is crazy).

$2115 just controls how to increment the PPU RAM address when writing to $2118/2119 (or $2139/213A if reading from PPU RAM, if you ever had some reason to do that).

You don't *have* to use DMA for these updates/writes, of course! It may make more sense to use DMA just for horizontal panning situations, and to do the $2118/2119 writes yourself natively for vertical panning situations -- or you can just make sure the data you're DMAing is in RAM/WRAM (i.e. you write it to WRAM yourself, then you do the DMA where the source address is in WRAM). There's no "universal standard" in what method/approach you can take; it doesn't take *that* much CPU time to write the data to $2118/2119 yourself in either case, because the amount of data you're transferring is not particularly large (when just doing 1 or 2 rows or columns of SC data).
