Posted: Thu Feb 14, 2019 2:48 pm
Joined: Thu Feb 14, 2019 2:25 pm
Posts: 8
Hello,

So I've been working on an SNES demo for a while, and it's working somewhat well right now. I can move the sprite from one border to the other, it's animated, and the background is scrolling.

Now, I don't know if anything I've done is best practice (and I think it certainly isn't). Maybe someone more experienced can give me some tips or do a code review?

My main questions are related to addressing.

Currently, what I'm doing is defining RAM addresses for global variables like this:
Code:
PLAYER_SX   = $0001
BGH_SCRL    = $0002
BGH_SCRH    = $0003

It's rather painful and ugly. I saw you can do it like this instead:
Code:
.segment "ZEROPAGE"
PLAYER_SX: .res 1
BGH_SCR: .res 2

But I can't get it to work. I think maybe it has to do with bank switching? And in the case of a two-byte label, how can I access the high byte?

Also, for the assets, I have for example: (I manually declare the asset size and its ROM location.)
Code:
.segment "DATA"
.incbin "assets/background.png.vra"

BG_START = $8000
BG_SIZE = $800

; loading example, I've defined a macro
.segment "CODE"
; VRAM start, bank, asset start, asset size
transfer_vram #$0000, #$02, #BG_START, #BG_SIZE

Isn't there a way to find out the same information automatically? The start address, the bank and the size?


-----

You can find the complete source code here:

https://github.com/vivi168/SNES_utils

Thank you for your help :)


Posted: Thu Feb 14, 2019 3:33 pm
Joined: Sun Sep 19, 2004 9:28 pm
Posts: 4156
Location: A world gone mad
Note for readers: the OP is using ca65/ld65 (I looked at the make.sh script). Note for OP: please be sure to tell people what assembler you're using next time. :-)

IMO, there is nothing wrong with what you're doing with regards to managing memory (read: large numbers of variable equates for what goes where in direct page). I am one of the few here on this forum who doesn't have a problem with that model; it's universally understood and makes debugging a bit easier.

What I'm wondering is: why do they bother you when you say "it's rather painful and ugly". Maybe if you could explain what you mean by that, and what you think wouldn't be painful/ugly, it would help?

However, ca65/ld65 offer some "better ways" of managing your memory topology. I will let others talk about ways that work well for them, as everyone has their own model/approach.

As for your question about using .res in the ZEROPAGE segment and how with BGH_SCR .res 2 you'd access the upper byte of BGH_SCR: BGH_SCR+1 would get you what you want, e.g. sep #$20 / lda BGH_SCR gets you the lower byte of the value, while lda BGH_SCR+1 gets you the upper byte. If using a 16-bit read, rep #$20 / lda BGH_SCR will get you the full 16-bit value.

All this is doing is telling the assembler to add one to the calculated address of BGH_SCR. That method works the exact same way if you were using equates like so:

Code:
PLAYER_SX = $0000
BGH_SCR   = $0001
OTHERVAR  = $0003

Note that $0002 was skipped -- that's because that can be used for the upper byte of BGH_SCR. The way this was done historically (and still works fine today) was to use code comments and/or additions in the equates themselves. In fact, even in other PLs like Forth, this is how you keep track of the stack or similar operations. There's nothing wrong with it. Examples:

Code:
PLAYER_SX = $0000   ; 1 byte
BGH_SCR   = $0001   ; 2 bytes
OTHERVAR  = $0003   ; 1 byte
OTHERVAR2 = $0004   ; 1 byte

Code:
PLAYER_SX = $0000         ; 1 byte (address $0000)
BGH_SCR   = PLAYER_SX+1   ; 2 bytes (addresses $0001 and $0002)
OTHERVAR  = BGH_SCR+2     ; 1 byte (address $0003)
OTHERVAR2 = OTHERVAR+1    ; 1 byte (address $0004)

If you find the .res method easier/better for you, go for it! Do whatever works best for you. Just keep in mind that when moving from one method to another, you may want to enable generation of code listings using ca65's -l {filename} flag (that's lowercase-ELL) and compare before vs. after, to make sure all of your assembled results are identical to what they were before.
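For comparison, here is a minimal ca65 sketch of the .res approach discussed above. It assumes the ld65 config actually defines a ZEROPAGE segment mapped to direct page and that D stays at $0000; the label read_scroll and the variable OTHERVAR are only there for illustration.

```asm
.p816                        ; assemble for the 65816

.segment "ZEROPAGE"          ; assumption: this segment exists in the ld65 config
PLAYER_SX: .res 1            ; 1 byte
BGH_SCR:   .res 2            ; 2 bytes: low byte at BGH_SCR, high byte at BGH_SCR+1
OTHERVAR:  .res 1            ; placed right after BGH_SCR automatically

.segment "CODE"
read_scroll:                 ; hypothetical routine name
    sep #$20                 ; 8-bit accumulator
    .a8
    lda BGH_SCR              ; low byte
    lda BGH_SCR+1            ; high byte
    rep #$20                 ; 16-bit accumulator
    .a16
    lda BGH_SCR              ; both bytes in one read
    rts
```

Inserting a new variable between two others only needs one new .res line; the assembler and linker move everything after it, so nothing can overlap by accident.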

As for determining "asset size" (length) dynamically: others can help with this. I forget how it's done in ca65.

Finally: do not forget that ca65 does not work so well with the 65816's direct page model (vs. the 6502's ZP model). As such, I suggest keeping D=$0000 at all times to try and alleviate any kind of pain or confusion. See this thread, and the referenced links/threads in the initial post, if you don't know what this means.


Posted: Thu Feb 14, 2019 7:46 pm
Formerly Espozo
Joined: Mon Sep 15, 2014 4:35 pm
Posts: 3501
Location: Richmond, Virginia
If I'm not mistaken, wasn't I still using WLA DX at the time that error occurred? I don't think I've ever run into any problems using direct page with ca65. It can just be hard to give up the ability to relocate it; without moving direct page, you cannot index more than one thing when going through your object code because you'll be forced to use either x or y. You also already mentioned using it for writing to MMIO registers in time-critical situations (during vblank).


Posted: Thu Feb 14, 2019 8:50 pm
Site Admin
Joined: Mon Sep 20, 2004 6:04 am
Posts: 3697
Location: Indianapolis
vivi168 wrote:
Also, for the assets, I have for example: (I manually declare the asset size and its ROM location.)
Isn't there a way to find out the same information automatically? The start address, the bank and the size?


Here's how to get that info. The size isn't given automatically, but can be calculated by subtracting the start address from the end address.

Code:
bg_start:
 .incbin "background.bin"
bg_end:

lda #<bg_start ; lower address byte
lda #>bg_start ; upper address byte
lda #^bg_start ; bank byte
bg_size = bg_end - bg_start
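As a rough sketch of how those values could feed a DMA setup instead of hand-written constants: the snippet below loads the source bank, source address and length of DMA channel 0 from the labels. The register equates are the standard SNES ones; the routine name setup_bg_dma is made up, and it assumes the linker config assigns bg_start an address that includes its bank, and that the data bank register allows absolute access to $43xx.

```asm
.p816

A1T0L = $4302                ; DMA channel 0 source address (low/high)
A1B0  = $4304                ; DMA channel 0 source bank
DAS0L = $4305                ; DMA channel 0 byte count (low/high)

.segment "DATA"
bg_start:
    .incbin "assets/background.png.vra"
bg_end:

BG_SIZE = bg_end - bg_start  ; size in bytes, no manual bookkeeping

.segment "CODE"
setup_bg_dma:                ; hypothetical routine name
    php
    sep #$20
    .a8
    lda #^bg_start           ; bank byte of the asset
    sta A1B0
    rep #$20
    .a16
    lda #.loword(bg_start)   ; 16-bit address within the bank
    sta A1T0L
    lda #BG_SIZE             ; number of bytes to transfer
    sta DAS0L
    plp
    rts
```

The same three expressions could probably be passed to the transfer_vram macro in place of #$02, #BG_START and #BG_SIZE, depending on how the macro is written.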


Posted: Fri Feb 15, 2019 11:35 am
Joined: Thu Feb 14, 2019 2:25 pm
Posts: 8
Thanks everyone for your answers, sorry for not mentioning which assembler I was using.

Quote:
What I'm wondering is: why do they bother you when you say "it's rather painful and ugly". Maybe if you could explain what you mean by that, and what you think wouldn't be painful/ugly, it would help?

It's mainly when adding new locations. If I want to keep a certain order in the file, I need to modify each address manually. (For example adding a location in between two others). And also it would prevent errors (eg: two locations overlap each other in the ram because I forgot to bump the next address in the list).

I will try implementing your advice.

I have one more question, concerning background scrolling. Currently, if the player is far away from the map border, I lock its sprite in the middle of the screen and scroll the background instead. What I do is, store previous background scroll offset like so:
Code:
ldx BGH_SCRL
dex
dex
stx BGH_SCRL

and during the NMI interrupt, I apply the scroll offset like so:
Code:
lda BGH_SCRL
sta BG1HOFS
lda BGH_SCRH
sta BG1HOFS

Is this the correct way to go about BG scrolling? Because I noticed a small flicker when the background is scrolling.

Edit: Maybe it's normal? I noticed the same thing in the FF6 intro cinematic, when the background is scrolling.

Also, if I wanted the map (level) to be composed of multiple tilemaps, what would be a good strategy to "append" the next tilemap?


Posted: Sat Feb 23, 2019 3:30 pm
Joined: Thu Feb 14, 2019 2:25 pm
Posts: 8
I still can't wrap my head around background scrolling.

I've observed how some games work with the bsnes debugger and memory editor.

What I've observed is that when the character moves, the BG scroll changes AND portions of the tilemap in VRAM get replaced with other portions of tilemaps from the ROM. I can't find a way to replicate this.

For example, say I have a level composed of two tilemaps that are contiguous in the ROM. What strategy can I employ to take the last 30 columns (each 8 pixels wide) of the first tilemap and the first 2 columns of the second tilemap and store them in VRAM?

(Or maybe store all the tilemaps for the current level in VRAM and assemble from there, instead of DMAing each frame from the ROM?)

Does anybody know of a simple way to achieve this?

Here is a picture to illustrate my question:
[Image]


Posted: Sat Feb 23, 2019 6:34 pm
Joined: Sun Sep 19, 2004 9:28 pm
Posts: 4156
Location: A world gone mad
The reason you're not getting any answers is because this is one of the things every game programmer has to solve in their own way, re: the "strategy", and there is no simple way to describe how to do it. It also varies heavily on the type of game you want to do and what you actually want the experience to be. What you're asking about is one of the key parts to an actual game engine.

This is one of the things I struggled with for literally years when starting out doing SNES stuff in the very early 90s. Conceptually it's simple -- "I just need to make sure the sides of the screen (columns of tiles) get updated before I pan the screen left/right" (or for scrolling vertically, updating rows of tiles) -- except actually implementing the "how do I correlate my map with the actual tiles on screen" part is not easy at all. This is further compounded by technical complexities such as the SNES's graphics modes (mode 1 is pretty common, but it all depends on what you want, maybe you'd prefer mode 2 or 3), and then later, intermediary data formats (that make up your "virtual world") and having to convert from that to SC data, handle sprites, collision, layering, etc.

There is not enough PPU RAM (VRAM) for "an entire world" -- there is only enough for essentially 2 horizontal screens and 2 vertical screens of tilemaps (SC data) on a per-background basis (for modes 0, 1, 2, 3, and 4). 2x2 is what you get when you use a screen size of 3 (i.e. bits 1 and 0 of $2107/2108/2109/210A are both set).
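To make the screen-size bits concrete, here is a small sketch that selects the 2x2 layout for BG1. The tilemap base address BG1_MAP_BASE is a made-up value for illustration (it has to be a multiple of $0400 words), and the routine name is hypothetical; it also assumes the data bank register allows absolute access to $21xx.

```asm
.p816

BG1SC = $2107                ; BG1 tilemap address and screen size

BG1_MAP_BASE  = $0400        ; hypothetical VRAM word address of BG1's tilemap
BG_SIZE_64X64 = %11          ; screen size 3: 2 screens wide, 2 screens tall

.segment "CODE"
set_bg1_tilemap:             ; hypothetical routine name
    php
    sep #$20                 ; 8-bit accumulator
    .a8
    ; bits 7-2 = tilemap word address >> 10, bits 1-0 = screen size
    lda #((BG1_MAP_BASE >> 8) & $FC) | BG_SIZE_64X64
    sta BG1SC                ; $07 for the values above
    plp
    rts
```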

The BG scroll MMIO registers essentially let you "pan around" this PPU RAM, but you yourself have to code the routine that updates the tilemap (SC data), the system does not do it itself. Thus, you yourself end up having to write a bunch of 65816 that updates the non-visible parts, so that when the BG scroll registers pan the screen around, there aren't any visual artefacts.

For starters: do you understand this? If not, then start with trying to understand it. I recommend referring to the developers manual, specifically pages 2-27-4 (describing $2107/2108/2109/210A), combined with page A-10 (SC data format, but also gives a visual of how it all works), and A-21. Don't bother with page A-22, as that's for modes 5 and 6, which will confuse the hell out of you.

If you do understand it: great, then your question is truly "how do I write the code that does all of this?"

A big part of the question is exactly what kind of game you're wanting. Your screenshots seem to imply you want an overhead world RPG-esque type of thing, like Zelda 3. With that game, there are "areas" that have limited size (which span several screens that pan/update smoothly). But if you pay close attention, you'll see that the world is not 100% seamless -- you reach "edges" of an area. The areas themselves are big (several screens in size), but there are still "edges". That's because there's limited system resources in general (RAM, ROM, whatever) and technical limitations; the "edges" are essentially points where new data is loaded and thus a new "area" becomes available. Even "huge" world games like Super Metroid work this way too.

Quite often there's an intermediary format of the data used, i.e. a unique data format that you yourself design that represents all the "stuff" that makes up an area -- not just raw data you can DMA to PPU RAM. Your code has to load/translate that data into SC data, probably reading it from ROM and storing the DMA-able results in RAM (banks $7E/7F somewhere), then DMAing portions as needed. You should not try to DMA entire screens of SC data every frame -- there isn't enough time to do this. You need to use $2115 to change the PPU RAM increment value into something that can work with columns (e.g. 32) when dealing with panning left/right, or with rows (e.g. 1) when panning up/down.

Essentially you need to track everything that's being done -- everything. Welcome to how complicated video games actually are.

Personal note: I think I spent months writing down code on paper trying to figure out how exactly to go about implementing such a thing. Pretty much every single game does it, so it's not impossible, it's just complicated to think about (IMO).

The overall "strategy" is used identically on the NES as well, so anyone here who has done it should be able to help talk about the method/model, and the overall thought process/implementation would apply to the SNES too. (On the NES, unlike the SNES, scrolling in both directions is a bit tricky because of the limited RAM, which is why most games you'd see only pan left-to-right or top-to-bottom and not both simultaneously.) I'll post something in the NES-oriented boards asking folks to look at this thread + describe the models they use (edit: posted as promised).

I would suggest start with something simple: don't worry about the intermediary data format and what not yet. Start with multiple screens of raw SC data in ROM. Try to figure out a routine that can update rows and columns of screen data, DMAing those relevant portions from ROM into PPU RAM, while letting you pan the background around up/down/left/right using the joypad.


Last edited by koitsu on Sat Feb 23, 2019 6:56 pm, edited 1 time in total.

Posted: Sat Feb 23, 2019 6:49 pm
Joined: Tue Oct 24, 2017 11:07 pm
Posts: 26
What I did may not be the best way or the most efficient way, but it's pretty simple. I have a pointer in zero page called "scrollScreenAddr" that gets initialized to the first "screen" (32x14 tile tilemap)'s location in ROM. I also keep track of the leftmost column onscreen (this is essentially the scroll x position shifted right 4 bits). If the current column is greater than the last column copied, I add 1 to it, AND it by #$1f (31 in decimal) and then store the result in the column value. If the column value is 0, I know I'm on a screen boundary and add the size of one screen (in my case #$380 bytes) to the pointer. The same is done if the current column is less than the last column, you just subtract 1 and subtract the size of a screen if you're on a boundary. Next, I copy the tilemap at that column into a buffer in WRAM (useful for collision detection, etc.) and then copy the same column from WRAM to VRAM during vblank (you could also write the tilemap values into a queue during the WRAM transfer and then DMA them if you're running out of vblank time).

I'm not planning 4 way scrolling for my game, so this method may not be ideal for that. Here's the source if my explanation wasn't sufficiently understandable.


Posted: Sat Feb 23, 2019 8:11 pm
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 21510
Location: NE Indiana, USA (NTSC)
It looks like you're trying to figure out how to push the data at the seam, represented by paw blocks in this animation

[Animation: tilemap contents during the first four screens of a level in Nova the Squirrel]


Assuming you don't also scroll vertically: What you can do is make a 64-byte (32-entry) buffer in WRAM and copy tilemap entries out of ROM into the buffer. Then during vblank, set VMAIN to add 32 words after each $2119 write ($2115 = $81), set the destination address ($2116-$2117) to the top of that tilemap column, and start a 64-byte DMA from that buffer to $2118, using an ascending source address and the alternating [$2100+x, $2101+x] I/O mode ($4300 = $01). This procedure doesn't change much whether your map is in ROM or WRAM, whether it's compressed with RLE, metatiles, or objects, or whatever.
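The column upload described above could look roughly like this in ca65. The register equates are the standard ones; column_buf, column_vram and upload_column are made-up names. It assumes the BSS segment is linked into low WRAM (so the buffer's bank byte is valid as a DMA source), that the data bank register allows absolute access to $21xx/$43xx, and that the routine is called during vblank or forced blank.

```asm
.p816

VMAIN   = $2115              ; VRAM address increment mode
VMADDL  = $2116              ; VRAM word address (low/high)
VMDATAL = $2118              ; VRAM data write (low byte; high byte at $2119)
MDMAEN  = $420B              ; start general-purpose DMA
DMAP0   = $4300              ; DMA channel 0 parameters
BBAD0   = $4301              ; DMA channel 0 B-bus destination ($21xx)
A1T0L   = $4302              ; DMA channel 0 source address (low/high)
A1B0    = $4304              ; DMA channel 0 source bank
DAS0L   = $4305              ; DMA channel 0 byte count (low/high)

.segment "BSS"
column_buf:  .res 64         ; 32 tilemap entries, 2 bytes each
column_vram: .res 2          ; VRAM word address of the column's top entry

.segment "CODE"
upload_column:
    php
    sep #$20                 ; 8-bit accumulator
    .a8
    lda #$81                 ; step 32 words after each write to $2119
    sta VMAIN
    lda #$01                 ; write $2118/$2119 alternately, source ascending
    sta DMAP0
    lda #<VMDATAL            ; B-bus target = $18
    sta BBAD0
    lda #^column_buf         ; source bank
    sta A1B0
    rep #$20                 ; 16-bit accumulator
    .a16
    lda column_vram          ; destination: top of the column
    sta VMADDL
    lda #.loword(column_buf) ; source address within the bank
    sta A1T0L
    lda #64                  ; 32 entries * 2 bytes
    sta DAS0L
    sep #$20
    .a8
    lda #$01                 ; enable channel 0
    sta MDMAEN
    plp
    rts
```

The main loop fills column_buf and column_vram whenever the camera crosses a tile boundary; the NMI handler then only has to call upload_column.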

If you also scroll vertically (8 ways), you'll need to make a second 128-byte DMA buffer for vertical updates and make two 64-byte DMA copies in vblank, one for the first (left) half of the tilemap and one for the second (right) half. You'll also need to be prepared to fill both the horizontal and vertical update buffers not from the beginning but from the middle, based on the scroll position.

Posted: Sun Feb 24, 2019 3:20 am
Joined: Tue Feb 07, 2017 2:03 am
Posts: 733
As koitsu says, there is no one way, and one tends to make their own.

However, the NES ways are needlessly complex for the SNES, and one doesn't have to use such "on the edge" methods.

I suggest you get left and right working, then up and down, then 8 ways. The secret to solving a complex task is to break it down into smaller, simpler tasks.

The SNES lets you have tilemaps 2 screens wide, so 64x64 tiles. You set a window within this range; however, the trick is that the window wraps (see tepples's gif for an example of this; it kind of works the same way on the SNES). So as you scroll left and right you need two indices: Left Visible and Right Visible. Whether you choose to track visible or next visible is up to you, and you will need to adjust your maths accordingly. Sorting out this mess is a "rite of passage".

So as you move around you update your (next) visible +/- 1, and I suggest you make a simple looping 2-screen-wide map to make sure this works and you understand it.

To make the map larger than 2 screens wide, you need to put new data in. Since the SNES has so much spare room, you don't need to do it just in time; you can do it, for example, +4 or +8 ahead of the "current visible". So when you move +1 you then draw a column of "new" data at, say, +4 from the right edge; when you move -1 you draw a "new" column at -4 from the left edge. It's like a train throwing down track in front of it so it always has track to move on. Remember, though, to "wrap" the values at 64, so this is:
Code:
lda RightEdge
clc
adc #4
and #$3F
; draw on this column

How you work out what is new data is another problem. For now, just make a map that has the raw tile data 1x1.

Once you have left and right working, do up and down. Same idea, only now you need a top visible and a bottom visible index, and you update them basically the same way.

8-way is when you move the top and bottom, left and right, and then draw a new row and a new column.

Now that you have that working, you can step back and make a new piece of the puzzle: feeding the "new content". You can make "blocks", which are some arrangement of 1x1 tiles; 2x2 and 4x4 are the popular sizes, however if 2x8 is what works best then so be it. The beauty of the SNES is you don't need to cache this at all, and as long as your blocks are <16 wide/high you can just decode them directly into VRAM. So if you have 4x4 blocks, you can draw 4 columns of data into VRAM at your visible + ahead, and then not draw anything new for 4 updates. Then draw a new set. The SNES is really convenient in this way.


Posted: Sun Feb 24, 2019 4:25 am
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 11376
Location: Rio de Janeiro - Brazil
To master scrolling, you have to understand that there are 2 separate spatial domains you're working with: map space and screen space. Scrolling basically consists in tracking the position of the camera in both spaces, and when a certain pixel boundary (normally dictated by the size of your metatiles) is crossed, you check the direction of the movement to tell which edge of the screen needs updating, so you can calculate the source address (in the map) and the target address (in VRAM) based on the reference positions of the camera in each space.

Finally, all you need to do is read a row or column of blocks from source address and buffer it, so that during vblank you can copy that buffered data to the target address in VRAM.

Here are a few tips on how to accomplish the above steps:

1- IIRC, tilemaps on the SNES are power-of-two-sized (e.g. 64x64 tiles, as opposed to 64x60 on the NES), so you might get away with using a single set of camera coordinates, by using the highest bits only when calculating map addresses, and ignoring them when calculating VRAM addresses.

2- Detecting the exact moment when a new row/column of blocks is needed consists in detecting when the camera crosses a certain pixel boundary, usually 16 pixels, but other numbers may be more convenient depending on how your map is encoded. To do this, you save the old camera coordinates before changing them, and then compare the old values with the new ones, and if a certain bit is different, that means a boundary was crossed. For example:

Code:
  lda OldCameraX
  eor CameraX
  and #16
  bne UpdateColumn

EOR combines 2 values so that bits that are different become 1s, and bits that are equal become 0s, so the result of an EOR operation is basically a mask indicating which bits don't match in the 2 values. With that information, you just isolate the bit of interest (in this case, bit 4, because we're working with 16x16-pixel blocks) with an AND operation. Do this for both the X and Y axis to decide when a new column and/or row of blocks is necessary.

3- Detecting the direction of the movement should be easy if you keep the camera displacements stored in their own variables: negative values mean left/up, positive values mean right/down. Say that the camera crossed a boundary in the Y axis, and you need to calculate the coordinates of a new row of blocks. Rows always start at the left edge of the screen, so the X coordinate is just CameraX, pure and simple. The Y coordinate, though, can be either CameraY, if moving up, or CameraY+ScreenHeight, if moving down. IIRC, the screen height on the SNES is 224 (it's 240 on the NES). Do the same for scrolling on the X axis, where columns of blocks always start at the top of the screen (CameraY), but the X coordinate can be either the left edge (CameraX) or the right edge (CameraX+ScreenWidth).

4- Once you have the coordinates of the new row/column, you just need to shuffle/combine the bits to calculate the source address and the target address. Since I don't know how your maps are stored in ROM and I'm fairly illiterate on SNES VRAM layout, I can't give you any exact formulas. What I can tell you is that the basic formula for converting 2D coordinates into memory offsets is Y * Width + X. If the map in ROM is not compressed in any way, that's the exact formula you'd use. As for the VRAM address, remember to clip the coordinates to the limits of the tilemap dimensions before any calculations.
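For the VRAM side of step 4, here is a sketch of Y * Width + X on the SNES, assuming a single 32x32 tilemap (so Width = 32 and the result is a word address). TILEMAP_BASE, tile_x, tile_y, vram_addr and calc_vram_addr are made-up names; a 64-tile-wide layout needs extra handling because the second screen is stored after the first one.

```asm
.p816

TILEMAP_BASE = $0400         ; hypothetical VRAM word address of the tilemap

.segment "BSS"
tile_x:    .res 2            ; column in map space, in tiles
tile_y:    .res 2            ; row in map space, in tiles
vram_addr: .res 2            ; result: VRAM word address of that entry

.segment "CODE"
calc_vram_addr:
    php
    rep #$20                 ; 16-bit accumulator
    .a16
    lda tile_y
    and #$001F               ; clip to the 32 rows of the tilemap
    asl a
    asl a
    asl a
    asl a
    asl a                    ; * 32 entries per row
    sta vram_addr
    lda tile_x
    and #$001F               ; clip to the 32 columns of the tilemap
    ora vram_addr            ; low 5 bits are free, so OR acts as addition
    clc
    adc #TILEMAP_BASE
    sta vram_addr
    plp
    rts
```

The AND masks are the "clip the coordinates" step: map-space coordinates can be arbitrarily large, but only their low 5 bits pick the position inside the tilemap.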


Posted: Sun Feb 24, 2019 9:49 am
Joined: Thu Feb 14, 2019 2:25 pm
Posts: 8
Quote:
It looks like you're trying to figure out how to push the data at the seam, represented by paw blocks in this animation


Yes, that's exactly it.

I know how to write a new row of tiles where I want to, but what I'm trying to figure out is how to write a column. Technically, I think I could already implement vertical scrolling with a loading seam, but what I want to figure out is how to do the same horizontally.

Thanks to all of your replies, I now know it has to do with the value stored in the $2115 (VMAIN) register. Right now I'm experimenting with the value $81, to increment the next VRAM destination address by 32 after each write to $2119, and so write in a "column" fashion.

The problem is, I should also increment the source address by 32, to skip the remaining tiles of the current row. Is there a way to do this during DMA?


Posted: Sun Feb 24, 2019 10:18 am
Joined: Thu Feb 14, 2019 2:25 pm
Posts: 8
Thank you Koitsu for asking for help on my behalf :)

Quote:
Your last two paragraphs in this post start to touch base on that, and is super helpful.


Yes, it was super helpful :) (and as for the gif, at first I wanted to link that same exact one, but decided to make my own picture)

I posted a reply to the other topic, in which I state that the main thing I'm trying to figure out is how to write a column.
In the case of a 32x32 map, setting VMAIN ($2115) to $81 helps me in that regard by incrementing the next destination address by 32 after each write.
The next thing I'm trying to figure out is how to increment the source address by 32 after each write.
I could theoretically do that in a loop and make multiple DMA writes (each the size of the column width), but I'm sure there is a better way (maybe a DMA parameter?)


Posted: Sun Feb 24, 2019 12:25 pm
Joined: Fri Feb 24, 2012 12:09 pm
Posts: 941
No, there is no way of reading in steps of 32 with DMA. But there are many ways to do it, with or without DMA...
Rotate your map in ROM, so you can read in steps of 1.
Or copy or decompress the map from ROM to RAM via the CPU, then DMA from RAM to VRAM during vblank.
Or update horizontal rows instead of vertical columns (no visible difference if it's offscreen).
Or update only 4 tiles per 1-pixel scroll step, instead of 32 tiles after each 8th pixel step.

Using DMA is needed only if you want to update many (other) things in VRAM and run short of vblank time. If that's the case, then it makes sense to prepare data in ROM or RAM in a DMA-friendly format. If you just want to update a few map entries, then you may get away with slower CPU transfers.
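The "copy from ROM to RAM via the CPU" option is where the step of 32 entries can live. A rough sketch, assuming an uncompressed row-major 32x32 map of 2-byte entries; level_map, map_column, column_buf, gather_column and the RODATA segment name are all made up for illustration, and the .res in ROM is just a placeholder for real map data.

```asm
.p816

.segment "RODATA"            ; assumption: a ROM segment with this name exists
level_map:
    .res 32*32*2             ; placeholder for real map data (e.g. an .incbin)

.segment "BSS"
map_column: .res 2           ; which column to fetch (0-31)
column_buf: .res 64          ; 32 entries, ready to DMA during vblank

.segment "CODE"
gather_column:
    php
    rep #$30                 ; 16-bit accumulator and index registers
    .a16
    .i16
    lda map_column
    and #$001F
    asl a                    ; 2 bytes per entry
    tax                      ; X = offset of the column's top entry
    ldy #0                   ; Y = offset into column_buf
@loop:
    lda f:level_map,x        ; long read, works whatever bank the map is in
    sta column_buf,y
    txa
    clc
    adc #64                  ; next row: skip 32 entries * 2 bytes
    tax
    iny
    iny
    cpy #64                  ; 32 entries copied?
    bne @loop
    plp
    rts
```

This runs outside vblank, so its cost does not eat into the time available for VRAM updates.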


Posted: Sun Feb 24, 2019 1:00 pm
Joined: Sun Sep 19, 2004 9:28 pm
Posts: 4156
Location: A world gone mad
nocash has the answer for that (I've asked tepples to try and move those posts into this thread so everything is kept in one place): viewtopic.php?p=234972#p234972

The answer is no, you can't control the "source increment" while DMAing to PPU RAM; it always reads in increments of 1, because that's the nature of DMA (I don't know of any DMA systems that let you control that, but I suspect the one on the PS2 probably does -- its DMA implementation is crazy).

$2115 just controls how to increment the PPU RAM address when writing to $2118/2119 (or $2139/213A if reading from PPU RAM, if you ever had some reason to do that).

You don't *have* to use DMA for these updates/writes, of course! It may make more sense to use DMA just for horizontal panning situations, and to do the $2118/2119 writes yourself natively for vertical panning situations -- or you can just make sure the data you're DMAing is in RAM/WRAM (i.e. you write it to WRAM yourself, then you do the DMA where the source address is in WRAM). There's no "universal standard" in what method/approach you can take; it doesn't take *that* much CPU time to write the data to $2118/2119 yourself in either case, because the amount of data you're transferring is not particularly large (when just doing 1 or 2 rows or columns of SC data).
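For the non-DMA route, a sketch of writing one column to $2118/2119 by hand. column_buf, column_vram and upload_column_cpu are made-up names; it assumes the data bank register allows absolute access to $21xx and that the routine runs during vblank or forced blank.

```asm
.p816

VMAIN   = $2115              ; VRAM address increment mode
VMADDL  = $2116              ; VRAM word address (low/high)
VMDATAL = $2118              ; VRAM data write (low byte; high byte at $2119)

.segment "BSS"
column_buf:  .res 64         ; 32 tilemap entries, 2 bytes each
column_vram: .res 2          ; VRAM word address of the column's top entry

.segment "CODE"
upload_column_cpu:
    php
    sep #$20                 ; 8-bit accumulator
    .a8
    lda #$81                 ; step 32 words after each write to $2119
    sta VMAIN
    rep #$30                 ; 16-bit accumulator and index registers
    .a16
    .i16
    lda column_vram
    sta VMADDL               ; destination: top of the column
    ldx #0
@loop:
    lda column_buf,x
    sta VMDATAL              ; 16-bit store: $2118 first, then $2119
    inx
    inx
    cpx #64                  ; 32 entries written?
    bne @loop
    plp
    rts
```

Replacing $81 with $80 in VMAIN makes the same loop write a row instead of a column.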

