16bit table indexing problem

Discussion of hardware and software development for Super NES and Super Famicom. See the SNESdev wiki for more information.

koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: 16bit table indexing problem

Post by koitsu »

nicklausw wrote: Update: Decided to download the newest sources and try out a 32-bit distribution of Cygwin to see if it could compile it. It did, and I'd like you to test these binaries out.
I can verify the binaries work on a 32-bit XP box (and I personally have no issues with cygwin1.dll being included, but I absolutely understand tepples' question/point too). Thanks a ton!

@Espozo, can you please give me a copy of your current source code (and tell me which filename I should be using for testing) so I can poke around and see if I can figure out this WLA DX bug? I want to see if .INCLUDE + listings are fixed (I've reviewed the commit history on github and I doubt it, but the commit messages are not always that great).
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: 16bit table indexing problem

Post by koitsu »

tepples wrote: Yes, a well-known synonym which I seem to remember came from WDC's datasheet. I use tad because it doesn't confuse "C" (BA) with "C" (bit 0 of P). Given the existence of xce (exchange carry with emulation), tcd looks like it could stand for "copy carry to decimal mode".
Maybe it has to do with age or era, but I was taught that on the 65816, the accumulator is referred to by three separate letters depending on context: A is the lower byte of the 16-bit value, B is the upper byte, and C is the full 16-bit value (B+A). In 16-bit mode, A can also refer to the same thing as C, mainly for compatibility with the 65c02; otherwise they would have had to extend the opcode space beyond a single byte ($00-FF) simply to have separate opcodes like lda vs. ldb vs. ldc. If you want proof of the A/B nomenclature, I point you to the xba and tcs/tsc opcodes. I believe the Eyes/Lichty/WDC book goes over this naming convention too. As I've said before, I spent more time doing 65816 than 6502/65c02, and the conventions made perfect sense to me.
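To make the A/B/C naming concrete, here is a rough Python sketch (an illustrative model, not anything taken from WDC's docs) of the accumulator as C = B:A. It shows that an 8-bit lda leaves B untouched and that xba swaps the two halves:

```python
# Sketch only: models the 65816 accumulator as C = (B << 8) | A.
# Not an emulator; flags and timing are ignored.


class Accumulator:
    def __init__(self, c: int = 0) -> None:
        self.c = c & 0xFFFF  # full 16-bit accumulator ("C")

    @property
    def a(self) -> int:
        return self.c & 0xFF  # low byte ("A")

    @property
    def b(self) -> int:
        return (self.c >> 8) & 0xFF  # high byte ("B")

    def lda_8bit(self, value: int) -> None:
        # With an 8-bit accumulator (m=1), LDA writes only A; B is preserved
        self.c = (self.c & 0xFF00) | (value & 0xFF)

    def xba(self) -> None:
        # XBA exchanges B and A, i.e. swaps the two bytes of C
        self.c = (self.a << 8) | self.b


acc = Accumulator(0x1234)
acc.lda_8bit(0x56)   # C is now $1256: B survived the 8-bit load
acc.xba()            # C is now $5612
print(f"C=${acc.c:04X} A=${acc.a:02X} B=${acc.b:02X}")  # C=$5612 A=$12 B=$56
```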

tcd to me always meant "transfer (copy) C into D" where D = direct page (and its inverse, tdc), simply because there's no purpose I can think of for a person wanting to transfer the carry bit (c of P) into the decimal mode bit (d of P). Likewise, opcodes named cl* and se* tend to refer to bits of P, not registers.

So all that said: what does tsb make you think of? Test Stack pointer against B (accumulator)? Test Snakes against b of P (only applicable to emulation mode)? Same for trb. Same for stp (which I actually use semi-often).
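For reference, tsb and trb expand to "Test and Set Bits" and "Test and Reset Bits". A small Python model of their effect, assuming only the memory result and the Z flag matter (width follows the m flag, 8 or 16 bits):

```python
# Sketch of TSB/TRB. Only the memory result and the Z flag are modeled.
from __future__ import annotations


def tsb(mem: int, a: int, width: int = 8) -> tuple[int, bool]:
    mask = (1 << width) - 1
    z = (a & mem & mask) == 0        # Z reflects A AND M, computed before the write
    return (mem | a) & mask, z       # M = M OR A


def trb(mem: int, a: int, width: int = 8) -> tuple[int, bool]:
    mask = (1 << width) - 1
    z = (a & mem & mask) == 0        # same test as TSB
    return mem & ~a & mask, z        # M = M AND NOT A


print(tsb(0xA0, 0x05))  # (165, True): bits set, none were already on
print(trb(0xA5, 0x05))  # (160, False): bits cleared, some were on
```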

I think our experiences/minds just differ here. I basically just remember what an opcode does and remember what opcode correlates with that task -- my brain is just a big lookup table/chart. That's just how my brain works. Maybe yours is different. For example, you know the clever little mnemonics used for memorisation of Perl variables? I find them ridiculous and confusing (the little mnemonics at the end of each definition). I simply remember what variable does what, and if I can't remember, I refer to documentation.

Footnote: the more I review this WDC PDF, the more massive/major typos I find. For example, the opcode definition for TAY says TAX. *sigh* Maybe that's why they pulled the PDF from their site -- they have too many typos. I really need to spend the money and just get a hard copy of their manual from them. I'd really love a 3-ring-binder version. But the actual Eyes/Lichty book doesn't have any of these mistakes, so they're purely something WDC introduced during the PDF conversion (possibly OCR mistakes -- and if so, then whoever did the proofreading should be fired. This is like the 6th or 7th mistake I've found).
Drew Sebastino
Formerly Espozo
Posts: 3496
Joined: Mon Sep 15, 2014 4:35 pm
Location: Richmond, Virginia

Re: 16bit table indexing problem

Post by Drew Sebastino »

Sure thing! Here it is:
MetaspriteDemo.rar
(1.04 MiB)
koitsu wrote: (and tell me which filename I should be using for testing)
You're talking about the main file for the demo that's not working, right? If so, it's MetaspriteTest2 (no names have changed since the last couple of times I've posted). Also, the new WLA DX that nicklausw posted (thank you!) is in there and I assembled the file with it, but it still didn't work, so I guess the bug is still present. Disgraceful! :P Also, I know I'm going to get hammered for asking this, but what is direct paging? :oops: Isn't it just 24-bit addressing? (Can't you access the entire SNES memory space with 24 bits, and a single bank with 16?)

Edit: According to Jay's ASM tutorial on superfamicom.org, it says:
The direct page register is a pointer that points to a region within the first 64k of memory. This register is used to access memory in direct addressing modes. In direct addressing mode, a 8-bit value (0-255) is added to the direct page address, which will form an effective address.
Doesn't that mean you can only access half of the largest ROM size available? (128k, I think?)
tepples
Posts: 22705
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: 16bit table indexing problem

Post by tepples »

A "bank" is 65536 bytes. "Direct page" is the ability to relocate the 65816's counterpart to the 6502's zero page anywhere in bank $00. Direct page addressing modes end up behaving more like a frame pointer (like EBP on x86), even if 6502 fans set it to $0000 for familiarity.
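To put numbers on the 24-bit question: a 65816 address is a bank byte plus a 16-bit offset, and 256 banks of 64 KiB give a 16 MiB (128 Mbit) address space. A minimal Python sketch ($7E:2000 is just an arbitrary example address):

```python
# Sketch: splitting a 65816 24-bit address into bank and 16-bit offset.
from __future__ import annotations

BANK_SIZE = 0x10000  # 65536 bytes per bank


def split(addr24: int) -> tuple[int, int]:
    return (addr24 >> 16) & 0xFF, addr24 & 0xFFFF


def join(bank: int, offset: int) -> int:
    return ((bank & 0xFF) << 16) | (offset & 0xFFFF)


total = 256 * BANK_SIZE            # 256 banks of 64 KiB = 2**24 bytes
print(total, total * 8 // 2**20)   # 16777216 128  (bytes, megabits)
bank, offset = split(0x7E2000)
print(f"${bank:02X}:{offset:04X}")  # $7E:2000
```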
93143
Posts: 1715
Joined: Fri Jul 04, 2014 9:31 pm

Re: 16bit table indexing problem

Post by 93143 »

Espozo wrote:
The direct page register is a pointer that points to a region within the first 64k of memory. This register is used to access memory in direct addressing modes. In direct addressing mode, a 8-bit value (0-255) is added to the direct page address, which will form an effective address.
Doesn't that mean you can only access half of the largest rom size available? (128k I think?)
Okay, first off, 128 kB is nowhere near the largest ROM size available (official games got as big as 6 MB, and I believe the largest maps offered by Nintendo were 8 MB, but neither of those represented a practical limit - there's a hack of Star Ocean that removes the S-DD1 dependency at the cost of doubling the game's size to 12 MB, and as the linked post says you can exceed that if you try). The CPU's work RAM is 128 kB, though... and it's true that bank $00 is what you might call a LoROM bank, in which you can typically only access half a bank worth of ROM regardless of memory mode because the bottom half is mirrored RAM and system registers and reserved areas... and it's also true that the gross size of the memory map is 128 Mbit, disregarding mirrors and such...

Second, yes, this means that you can only set up direct page to access the first bank. If you set D to a 16-bit value, then the direct page is the 256 bytes of memory starting at that absolute address in bank $00. If, for instance, D is $0000, the direct page is the first 256 bytes of WRAM (since the first 8 kB of WRAM are mirrored to the bottom of banks $00-$3F and $80-$BF). If D is $2100, the direct page corresponds to the B bus address range and can be used to access the PPU and such.
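A small Python sketch of which 256-byte window a plain direct-page operand reaches for a given D. It assumes native mode and, as a simplification, that the sum wraps within bank $00 (indexed DP modes and emulation-mode quirks are not modeled):

```python
# Sketch: where a direct-page operand lands for a given D (always bank $00).
from __future__ import annotations


def dp_address(d: int, operand: int) -> int:
    """Effective bank-$00 address of a direct-page operand."""
    return (d + (operand & 0xFF)) & 0xFFFF


def dp_window(d: int) -> tuple[int, int]:
    """First and last bank-$00 addresses reachable with an 8-bit DP operand."""
    return dp_address(d, 0x00), dp_address(d, 0xFF)


for d in (0x0000, 0x2100):
    lo, hi = dp_window(d)
    print(f"D=${d:04X}: $00:{lo:04X}-$00:{hi:04X}")
# D=$0000: $00:0000-$00:00FF
# D=$2100: $00:2100-$00:21FF
```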

Direct page allows you to use 8-bit addresses, which saves a cycle because there's one less operand byte to fetch. Unfortunately, if the pointer in D has a nonzero low byte, you lose that cycle again adding D to the operand, and it's no faster than just using absolute addresses. So if you're going to use direct page, it's best to keep D at values of $xx00 unless you're using it for reasons other than speed.
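A Python sketch of that trade-off for lda, assuming the usual WDC datasheet counts (LDA dp: 3 cycles, LDA abs: 4, each +1 with a 16-bit accumulator, and dp +1 when the low byte of D is nonzero). It counts CPU cycles only; on the SNES the length of each cycle also depends on which memory region is accessed:

```python
# Sketch of LDA cycle counts under the datasheet rules stated above.


def lda_dp_cycles(d: int, acc16: bool) -> int:
    cycles = 3
    if acc16:
        cycles += 1
    if d & 0x00FF:     # D not page-aligned: extra cycle to add its low byte
        cycles += 1
    return cycles


def lda_abs_cycles(acc16: bool) -> int:
    return 4 + (1 if acc16 else 0)


for d in (0x0000, 0x1E00, 0x1E20):
    print(f"D=${d:04X}: dp={lda_dp_cycles(d, False)} abs={lda_abs_cycles(False)}")
# D=$0000: dp=3 abs=4
# D=$1E00: dp=3 abs=4
# D=$1E20: dp=4 abs=4
```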
nicklausw
Posts: 376
Joined: Sat Jan 03, 2015 5:58 pm
Location: ...

Re: 16bit table indexing problem

Post by nicklausw »

tepples wrote: Is Cygwin needed, or would MinGW work as well? MinGW doesn't need quite as big of a C runtime DLL because it doesn't try to implement a huge swath of POSIX. Instead, it uses MSVC 6's DLL.
I suppose it could be used, but it didn't work well with the batch file given with the sources. Tomorrow I'll try out Cygwin's distribution of it.
Drew Sebastino
Formerly Espozo
Posts: 3496
Joined: Mon Sep 15, 2014 4:35 pm
Location: Richmond, Virginia

Re: 16bit table indexing problem

Post by Drew Sebastino »

93143 wrote:Okay, first off, 128 kB is nowhere near the largest ROM size available
Oops... :oops: (I was thinking megabits, not kilobits.) I thought the uncompressed Star Ocean hack was 128 megabits, not 96. (12x8=96, obviously)

About the direct page and stuff, isn't the assembler supposed to deal with this automatically? The only manual input I remember is when it told me that bra didn't work because the number was too high or something, so I switched to brl and it worked fine. (bra is 8-bit, while brl is 16-bit?)

Also, I guess it would probably be good to learn about direct paging and stuff? :P (You know, in case the assembler freaks out... :roll: )
93143
Posts: 1715
Joined: Fri Jul 04, 2014 9:31 pm

Re: 16bit table indexing problem

Post by 93143 »

Espozo wrote:About the direct page and stuff, isn't the assembler (supposed to) deal with this stuff automatically?
No, for two reasons. First, if the direct-page pointer D isn't zero, there will be a constant offset between the direct-page and absolute addressing modes; this is by design and should not be compensated for. Second, absolute addressing works within the bank selected by the data bank register, while direct-page always works in bank $00, so in the general case the data accessed won't be the same even if D is zero.

Suppose you had D set to $2100, for fast PPU access (not something the assembler can do automatically, BTW). Now suppose you told the assembler to lda $001E or something (let's assume for simplicity that you're in bank $00). In WLA, since that address fits in 8 bits, it would probably assemble into "A5 1E", and instead of loading the value from $00001E as desired, the system would try to read $00:(D+$1E) = $00211E, which is a Mode 7 matrix register and also write-only, and the result would be open bus (i.e., garbage).

Now, that case would work fine if you had D set to $0000. But suppose you had the data bank register set to $5A? Now you're trying to access a HiROM bank, so there's no RAM mirror there and the data at $001E will be different from the same address in bank $00. Even with D at zero, you're going to get $00001E instead of $5A001E, because direct-page doesn't care what the data bank is.
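Both scenarios can be checked with a short Python sketch that decodes only the two lda encodings involved (A5 = LDA dp, AD = LDA abs, native mode) and reports which 24-bit address each one reads:

```python
# Sketch: why "A5 1E" and "AD 1E 00" can read different bytes.
# Only the two LDA opcodes from the example are modeled.


def effective_address(code: bytes, d: int, dbr: int) -> int:
    opcode = code[0]
    if opcode == 0xA5:  # LDA dp: bank $00, D + 8-bit operand
        return (d + code[1]) & 0xFFFF
    if opcode == 0xAD:  # LDA abs: data bank register supplies the bank
        return ((dbr & 0xFF) << 16) | code[1] | (code[2] << 8)
    raise ValueError(f"opcode ${opcode:02X} not modeled in this sketch")


dp_form, abs_form = bytes([0xA5, 0x1E]), bytes([0xAD, 0x1E, 0x00])
for d, dbr in ((0x2100, 0x00), (0x0000, 0x5A)):
    print(f"D=${d:04X} DBR=${dbr:02X}: "
          f"dp -> ${effective_address(dp_form, d, dbr):06X}, "
          f"abs -> ${effective_address(abs_form, d, dbr):06X}")
# D=$2100 DBR=$00: dp -> $00211E, abs -> $00001E
# D=$0000 DBR=$5A: dp -> $00001E, abs -> $5A001E
```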
The only manual input I remember is when it told me that bra didn't work because the number was too high or something, so I switched to brl and it worked fine. (bra is 8 bit, while brl is 16 bit?)
That's a lot like DP in some ways, except that since the opcode names aren't the same it's impossible for the assembler to bork it up.
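On the bra/brl question: bra stores a signed 8-bit offset (about -128 to +127 bytes from the next instruction), while brl stores a 16-bit offset that wraps within the program bank. A Python sketch of the operand encoding, treating addresses as 16-bit offsets within the current bank:

```python
# Sketch: encoding BRA (8-bit signed offset) vs BRL (16-bit offset).
# Offsets are relative to the address of the *next* instruction.


def bra_operand(pc: int, target: int) -> int:
    offset = target - (pc + 2)          # BRA is 2 bytes long
    if not -0x80 <= offset <= 0x7F:
        raise ValueError(f"branch out of range ({offset}); use BRL or JMP")
    return offset & 0xFF


def brl_operand(pc: int, target: int) -> int:
    # BRL is 3 bytes long; its 16-bit offset wraps within the bank,
    # so any target in the same bank is reachable
    return (target - (pc + 3)) & 0xFFFF


print(f"{bra_operand(0x8000, 0x8010):02X}")   # 0E
print(f"{brl_operand(0x8000, 0x8100):04X}")   # 00FD
```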

A closer comparison to direct-page vs. absolute might be between bra and jmp, since one is relative to a given position within the program bank and the other is not.
Also, I guess it would probably be good to learn about direct paging and stuff?
Yes. At the very least you need to be able to keep the assembler from doing weird things. And once you understand this stuff, you can exploit it for speed and convenience...
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: 16bit table indexing problem

Post by koitsu »

Well damn, I had a huge and well-written-out post that compounded 93143's post, but then I went messing about trying to find old posts to reference and ended up losing my stuff by accidentally clicking "Edit" on someone's post. Doh. :(

One thing I did want to mention here: I've tried to figure out (per the docs) how WLA DX decides whether to use direct page or absolute addressing opcodes (more specifically: whether it tracks tcd like tepples asked, or whether there's an explicit syntax modifier for forcing one or the other (common in IIGS assemblers)), and it isn't mentioned anywhere in the docs that I can find, nor in the examples. I read the Assembler Syntax and 65816 sections and neither states anything useful (sigh). I did find this, however, describing .struct:


A WORD OF WARNING: Don't use labels b, B, w and W inside a struct as e.g.,
WLA sees enemy.b as a byte sized reference to enemy. All other labels should
be safe.

lda enemy1.b  ; load a byte from zeropage address enemy1 or from the address
              ; of enemy1.b??? i can't tell you, and WLA can't tell you...
This is a great example of WLA DX's horrible documentation. Honestly I've been sitting here for a good 5 full minutes trying to work out exactly what the author is trying to convey. It's specifically referring to 6502 here, but seriously, my brain is a circular bunch of mush:

I start thinking: "okay, he means that within a .struct block, you should not use labels named b or w or else the parser won't know if you're..... wait, no, that makes no sense: .b and .w are used to specify the "size" of an address or immediate value at expansion-time, what does that have to do with the actual label itself?"

But then I read the code comment and I think "oh wait, I see, lda enemy1.b (which should syntactically be the same as lda.b enemy1) doesn't allow the assembler to determine if you want to expand enemy1 into an 8-bit address for zero page access, or refer directly to the (presumably) 16-bit address of label enemy1.b... except what does that have to do with structs and b and w? There's no . (period/dot), so what the heck is going on? What is the parser doing?!"
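One plausible reading of the warning, sketched as a hypothetical Python parser (an assumption, not WLA DX's actual code): if a struct instance enemy1 with a field named b produces a label literally called enemy1.b, then the text enemy1.b has two valid readings, and the parser has to pick one:

```python
# Hypothetical parser illustrating the ambiguity; NOT WLA DX's actual code.
from __future__ import annotations


def readings(token: str, labels: set[str]) -> list[tuple[str, str | None]]:
    """Every (label, size suffix) interpretation a parser could choose."""
    found: list[tuple[str, str | None]] = []
    if token in labels:
        found.append((token, None))            # whole token is a label
    base, dot, suffix = token.rpartition(".")
    if dot and suffix.lower() in ("b", "w") and base in labels:
        found.append((base, suffix.lower()))   # label + size modifier
    return found


labels = {"enemy1", "enemy1.b"}   # assumed: struct field "b" became "enemy1.b"
print(readings("enemy1.b", labels))                   # [('enemy1.b', None), ('enemy1', 'b')]
print(readings("enemy1.x", {"enemy1", "enemy1.x"}))   # [('enemy1.x', None)]
```

Any field name other than b or w yields a single reading, which would match the doc's "all other labels should be safe".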

My gut feeling is that in 65816 mode, it probably uses absolute addresses all the time unless specifically told not to... except for the beautiful bug that psycopathicteen pointed out and I expanded upon, where the assembler is ridiculously choosing to refer to 8-bit addresses in direct/zero page for no reason. (I haven't had time to look into that, but I still have those couple of gut feelings...)

The only thing I could find in the WLA DX docs about that assemble-time decision is with regards to using the .b modifier, or the .8bit directive:


For example:

LSR 11       ; $46 $0B
LSR $A000  ; $4E $00 $A0

The first one could also be

LSR 11       ; $4E $0B $00

.8BIT is here to help WLA to decide to choose which one of the opcodes it
selects. When you give .8BIT (default) no 8bit address/value is expanded
to 16bits.

By default WLA uses the smallest possible size. This is true also when WLA
finds a computation it can't solve right away. WLA assumes the result will
be inside the smallest possible bounds, which depends on the type of the
mnemonic.
The last paragraph is awfully damning, but the phrase "can't solve right away" makes my brain explode. Right away? So, what, it can figure it out later? What does that even mean?

With regards to the bug psycopathicteen pointed out, it's literally like the assembler's internal logic is suddenly deciding to use direct page/zero page addressing when it most definitely shouldn't. But that last paragraph implies that some kind of "computational mistake" caused it to do this, and the code (to me) doesn't give any indication that the assembler would have any difficulty.

I even bothered to write the same code in the IIGS mini-assembler, and it does the right thing.
Sik
Posts: 1589
Joined: Thu Aug 12, 2010 3:43 am

Re: 16bit table indexing problem

Post by Sik »

koitsu wrote:The last paragraph is awfully damning, but the phrase "can't solve right away" makes my brain explode. Right away? So, what, it can figure it out later? What does that even mean?
Taking a guess, it probably means an expression that can't be solved in the first pass but can be solved in the second (obvious example: referring to a label that's defined later in the code; in the first pass the assembler won't have seen it yet, but by the second pass it will have). This can actually be quite a problem if something has to be guessed, since assemblers usually determine the opcode to use in the first pass (so they can know the addresses of all instructions by the second pass).

Why the heck it decides that the smallest size possible (i.e. the most prone to break) should be the default when guessing, I don't know. Practically every time I see an assembler stumble upon a situation like this, it either defaults to the largest size (suboptimal but bound to always work), chooses a default size and then throws an error if it doesn't work (e.g. 68000 defaulting to word opcodes), or just throws an error immediately. It doesn't let it pass if it can break.
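A toy Python model of that pass-1 guess (hypothetical policy names; the pass-2 check is one of the safe behaviors listed above, not a claim about what WLA DX actually does):

```python
# Toy model of choosing an operand size before a forward reference is known.
# pass1_value is None when the expression "can't be solved right away".
from __future__ import annotations


def operand_size(pass1_value: int | None, final_value: int, policy: str) -> int:
    if pass1_value is not None:
        size = 1 if pass1_value <= 0xFF else 2     # known now: smallest fit
    elif policy == "smallest":
        size = 1                                   # optimistic guess
    elif policy == "largest":
        size = 2                                   # pessimistic guess, always fits
    else:
        raise ValueError(f"unknown policy {policy!r}")
    # Pass 2: the value is known; check the guess instead of silently truncating
    if final_value > (0xFF if size == 1 else 0xFFFF):
        raise OverflowError(f"${final_value:04X} does not fit in {size} byte(s)")
    return size


print(operand_size(None, 0x1E00, "largest"))   # 2
try:
    operand_size(None, 0x1E00, "smallest")
except OverflowError as err:
    print("error:", err)                        # error: $1E00 does not fit in 1 byte(s)
```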
psycopathicteen
Posts: 3140
Joined: Wed May 19, 2010 6:12 pm

Re: 16bit table indexing problem

Post by psycopathicteen »

You might as well download bass.exe, since it's the most current assembler.
tomaitheous
Posts: 592
Joined: Thu Aug 28, 2008 1:17 am

Re: 16bit table indexing problem

Post by tomaitheous »

tepples wrote:A "bank" is 65536 bytes. "Direct page" is the ability to relocate the 65816's counterpart to the 6502's zero page anywhere in bank $00. Direct page addressing modes end up behaving more like a frame pointer (like EBP on x86), even if 6502 fans set it to $0000 for familiarity.
I think I remember reading that if you set the DP register to anything other than the default, there would be a +1 cycle penalty for all ZP addressing modes.
__________________________
http://pcedev.wordpress.com
KungFuFurby
Posts: 275
Joined: Wed Jul 09, 2008 8:46 pm

Re: 16bit table indexing problem

Post by KungFuFurby »

I think I recall that the penalty only applies if the direct page register uses a non-zero value for the low byte (high byte can be any value and no penalty will apply).

If you're looking for the latest version of bass (I think it's v14), I have a copy of v14 on my end (normally, it's an .xz file, but I can make a zip version of it).
koitsu
Posts: 4201
Joined: Sun Sep 19, 2004 9:28 pm
Location: A world gone mad

Re: 16bit table indexing problem

Post by koitsu »

Re: D register and cycle penalties: covered earlier by 93143 (last paragraph). And yes, the +1 cycle penalty only applies if the low byte of register D is non-zero, e.g. lda #$1e20 / tcd would induce a +1 cycle penalty for all subsequent DP access, while lda #$1e00 / tcd wouldn't.
tomaitheous
Posts: 592
Joined: Thu Aug 28, 2008 1:17 am

Re: 16bit table indexing problem

Post by tomaitheous »

Ahh, that's what I remember: if the DP register was anything other than '0', apply said penalty. The document I had must not have bothered to mention that this was strictly the case for the low byte of the DP register.