8x16 and whatever else unreg wants to know

Are you new to 6502, NES, or even programming in general? Post any of your questions here. Remember - the only dumb question is the question that remains unasked.


tepples
Posts: 22708
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)

Re: 8x16 and whatever else unreg wants to know

Post by tepples »

Since when can adc use absolute indirect?

The reason why the jmp instruction appears to have an absolute indirect mode in the first place is that regular jmp aaaa behaves like an immediate load into the program counter (PC), and jmp (aaaa) is more like an absolute load into PC.

Code:

jmp $CDEF    ; really ldpc #$CDEF
jmp ($CDEF)  ; really ldpc $CDEF
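The "ldpc" comparison above can be shown as a small Python model. It treats the 6502's 64 KB address space as a `bytearray`, and the function names are made up for illustration. The page wraparound in `jmp_indirect` is the well-known NMOS 6502 behaviour: the fetch of the pointer's high byte never carries into the next page.

```python
def jmp_absolute(operand: int) -> int:
    """jmp $CDEF: the operand itself is loaded into PC (like ldpc #$CDEF)."""
    return operand & 0xFFFF


def jmp_indirect(memory: bytearray, operand: int) -> int:
    """jmp ($CDEF): PC is loaded from the little-endian word stored at the operand
    (like ldpc $CDEF). On the NMOS 6502 the pointer's low byte wraps within its
    page, so jmp ($10FF) reads its high byte from $1000, not $1100."""
    lo = memory[operand & 0xFFFF]
    hi = memory[(operand & 0xFF00) | ((operand + 1) & 0x00FF)]
    return lo | (hi << 8)


memory = bytearray(0x10000)
memory[0xCDEF], memory[0xCDF0] = 0x34, 0x12
print(hex(jmp_absolute(0xCDEF)))          # 0xcdef
print(hex(jmp_indirect(memory, 0xCDEF)))  # 0x1234
```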
As for why, I guess that in the mid-1970s, memory sizes (and thus problem sizes) for 8-bit microprocessors were small enough that MOS Technology engineers assumed each step of processing would need only one index into a large array, or into an array with a variable starting point. The instruction set is also very much tuned toward indexing into multiple arrays with the same index value. I've made metatile engines that use four parallel pointers on zero page, one each to the tiles at the top left, top right, bottom left, and bottom right corners of a metatile, and all four are indexed with the same value in Y. The 6502 was also intended to compete directly with the Motorola 6800, which has two accumulators (A and B) but only one general-purpose data pointer (IX) and one special-purpose data pointer (SP).
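Here is a rough Python sketch of the four-parallel-pointer idea, modelling `lda (zp),y`. The zero-page slots $00/$02/$04/$06 and the table addresses $8000-$8300 are hypothetical, chosen only for the example. The point is that one Y value (the metatile number) selects the same entry from all four corner tables.

```python
def read_ptr(memory: bytearray, zp: int) -> int:
    """Read a little-endian pointer from zero page (high byte wraps within page 0)."""
    return memory[zp & 0xFF] | (memory[(zp + 1) & 0xFF] << 8)


def lda_ind_y(memory: bytearray, zp: int, y: int) -> int:
    """Model of lda (zp),y: fetch the pointer at zp, add Y, read that byte."""
    return memory[(read_ptr(memory, zp) + y) & 0xFFFF]


# Hypothetical layout: one zero-page pointer per metatile corner.
CORNER_PTRS = {"top_left": 0x00, "top_right": 0x02,
               "bottom_left": 0x04, "bottom_right": 0x06}


def metatile_corners(memory: bytearray, metatile: int) -> dict:
    """All four corner tiles of one metatile, using the same Y for every pointer."""
    return {name: lda_ind_y(memory, zp, metatile) for name, zp in CORNER_PTRS.items()}


memory = bytearray(0x10000)
for i, zp in enumerate(CORNER_PTRS.values()):
    table = 0x8000 + 0x100 * i                      # hypothetical corner table
    memory[zp], memory[zp + 1] = table & 0xFF, table >> 8
    memory[table + 5] = 0x10 + i                    # corner tile of metatile #5
print(metatile_corners(memory, 5))
# {'top_left': 16, 'top_right': 17, 'bottom_left': 18, 'bottom_right': 19}
```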
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: 8x16 and whatever else unreg wants to know

Post by tokumaru »

unregistered wrote: Wed Jan 22, 2020 10:44 am Why did MOS Technology not allow their lda to use Absolute Indirect addressing? adc and jmp can use that; it’s not fair!
The 6502 is not a particularly well-designed CPU... We can argue about why it lacks this or that all day long, but it is what it is and there's nothing we can do about it. There are times when we feel like trading a kidney for an extra register, but we have to know when to admit defeat and just use RAM instead.

Computers that normally run code from RAM (such as the Commodore 64) can take advantage of the fact that the absolute addresses inside their instructions are themselves stored in RAM. This allows you to change addresses in the code dynamically to your heart's content and have all the benefits of absolute indirect addressing at the speed of absolute addressing!

If you really need the speed and have the RAM to spare, you could do the same thing and place this critical code in RAM.
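A minimal Python sketch of that self-modifying trick, assuming a 3-byte `lda $xxxx` (opcode $AD) has already been copied into RAM. The location $0300 and the data addresses are hypothetical. Rewriting the instruction's operand bytes changes which address a plain absolute load reads from, which gives the effect of indirection at absolute-addressing speed.

```python
LDA_ABS = 0xAD  # 6502 opcode for lda absolute (3 bytes, 4 cycles)


def patch_lda_target(ram: bytearray, instr: int, target: int) -> None:
    """Rewrite the 16-bit operand of an lda $xxxx that lives in RAM at `instr`."""
    ram[instr + 1] = target & 0xFF  # operands are little-endian: low byte first
    ram[instr + 2] = target >> 8


def execute_lda_abs(ram: bytearray, instr: int) -> int:
    """Model executing that instruction: return the byte lda would load into A."""
    assert ram[instr] == LDA_ABS, "expected an lda absolute opcode"
    return ram[ram[instr + 1] | (ram[instr + 2] << 8)]


ram = bytearray(0x10000)
INSTR = 0x0300                       # hypothetical RAM copy of the code
ram[INSTR] = LDA_ABS
ram[0x0600], ram[0x0700] = 0x11, 0x22
patch_lda_target(ram, INSTR, 0x0600)
print(hex(execute_lda_abs(ram, INSTR)))  # 0x11
patch_lda_target(ram, INSTR, 0x0700)
print(hex(execute_lda_abs(ram, INSTR)))  # 0x22
```

On the NES this only works for code that has first been copied from ROM into RAM. An instruction in ROM cannot have its operand rewritten.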
unregistered
Posts: 1318
Joined: Thu Apr 23, 2009 11:21 pm
Location: cypress, texas

Re: 8x16 and whatever else unreg wants to know

Post by unregistered »

tepples, ooh, sorry: my old printout of the appendices of MOS Technology’s programming manual lists, at the beginning of Appendix C, both adc and jmp as being able to use Absolute Indirect... guess that’s a mistake they must have corrected in a later version. My early version says the Absolute Indirect addressing mode for adc is supposed to take 6 cycles.

That’s really cool 8-) using 4 zero-page pointers to access each corner of a 16x16 metatile. :)

Thank you for explaining the logic! :D


tokumaru, I love RAM! But there are already 4, I think, functions in my saveRAM doing that. Code in RAM can take twice as much space as code in ROM (it has to be stored in ROM and then copied), it requires initialization cycles spent transferring it from ROM to RAM, and I just wanted a simple lda using Absolute Indirect... but it’s really ok; I was just kind of venting my calm frustration/confusion. :)
unregistered
Posts: 1318
Joined: Thu Apr 23, 2009 11:21 pm
Location: cypress, texas

Re: 8x16 and whatever else unreg wants to know

Post by unregistered »

unregistered wrote: Wed Jun 05, 2019 2:15 pm edit: sigh, guess I caused my asm6_ to sometimes assemble corrupt binaries. :( tokumaru recommends asm6:
tokumaru, here (https://forums.nesdev.com/viewtopic.php?p=110164#p110164), wrote: NESASM and ASM6 are equally simple IMO: both can create a ROM from nothing more than a single ASM file, without configuration files or complex command lines. If you're going for simplicity, you should pick one of these 2. I prefer ASM6 because NESASM uses non-standard 6502 syntax for some things and it has been known to fail silently in the past, producing corrupt binaries without reporting any errors (some people say these bugs have been fixed).
I need to fix this sometime. :oops:
But, I forgot that God blessed me with getting asm6_ to sometimes assemble corrupt binaries; during its creation I really wanted to be able to debug the rejected binary so I could see exactly why it was failing to be created. So, I guess it’s a weird “feature” :wink: of asm6_ that I really forgot to report.

edit: I did want to add a symbol to asm6_’s command-line output that would warn when a corrupt binary was being created, but during my few attempts to learn how, I couldn’t figure out how asm6_’s code had been changed to allow corruption. :( Lacking comments is a terrible mistake... oh well.
turboxray
Posts: 348
Joined: Thu Oct 31, 2019 12:56 am

Re: 8x16 and whatever else unreg wants to know

Post by turboxray »

The source code is public right? Who's officially maintaining it?
unregistered
Posts: 1318
Joined: Thu Apr 23, 2009 11:21 pm
Location: cypress, texas

Re: 8x16 and whatever else unreg wants to know

Post by unregistered »

turboxray wrote: Mon Jan 27, 2020 2:39 pm The source code is public right? Who's officially maintaining it?
hi turboxray, asm6_ was only worked on by me... it isn’t being updated now bc it does everything I wanted it to do. It’s really useful to just me, I guess. However, if you’d like, you can try it here: https://fervid.org/asm6_.zip

edit:
It’s misleading for me to say “asm6_ was only worked on by me”. Loopy wrote asm6... I just spent a lot of time trying to learn how part of its source worked, and then made a few small edits and additions that are very helpful to me.

asm6_ intentionally works exactly the same way as asm6, with some extra listing file (.lst) creation options. And asm6_ includes my occasional assembling of invalid ROMs, as explained above your reply, turboxray.
turboxray
Posts: 348
Joined: Thu Oct 31, 2019 12:56 am

Re: 8x16 and whatever else unreg wants to know

Post by turboxray »

The reason I ask is that a quick search on GitHub shows a handful of forks/repositories (some of them pretty recent).
unregistered
Posts: 1318
Joined: Thu Apr 23, 2009 11:21 pm
Location: cypress, texas

Re: 8x16 and whatever else unreg wants to know

Post by unregistered »

turboxray wrote: Mon Jan 27, 2020 5:44 pm
The reason I ask is that a quick search on GitHub shows a handful of forks/repositories (some of them pretty recent).
Ooh, hmm... well, I did submit the changes to fasm, but he never implemented my submission... at least “never” by the last time I checked... maybe someone took my code and created their own version? Oh well, regardless, asm6_ has really been super helpful to me; glad others enjoy it too. :)

edit:
Also, lying is bad; they are just hurting themselves if they claim they created the asm6_ edit. Honestly, I don’t care to research what you’ve reported turboxray. I’m not worried in the slightest bc God is in charge of everything; He gives me peace. :)
unregistered
Posts: 1318
Joined: Thu Apr 23, 2009 11:21 pm
Location: cypress, texas

Re: 8x16 and whatever else unreg wants to know

Post by unregistered »

While writing to CHRRAM, our code overflows by 8 bytes... is there a simple way to branch past the last 8 bytes written once the $2007 writes reach a VRAM address whose low byte is $00? I’m currently thinking about setting the V flag if our function is writing to the last possible page of memory for CHR writing, and adding a bvs at the spot where the code needs to quit.

It would be sweet if there was a simple way to beq around the extra 8 byte writes. :)

edit: and a branch would have to be added to specify we’re in the last loop cycle so that the bvs just branches past the last 8 bytes :(

final-edit: maybe bit can be used to reduce the setup for those 2 branches :)
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: 8x16 and whatever else unreg wants to know

Post by tokumaru »

I think it would make more sense for you to rethink the method you're using to write to CHR-RAM and make such overflows impossible in the first place, since any additional tests you put in will have an impact on the overall performance of the data transfer.

What exactly is the condition that causes that overflow of 8 bytes?

One way to optimize VRAM transfers is to dedicate part of the stack page ($0100-$01FF) to your VRAM data buffer, and have a long unrolled loop in ROM that looks like this:

Code:

UpdateVRAM:
  pla
  sta $2007
  pla
  sta $2007
  ;(repeat according to the maximum
  ;length of a single VRAM transfer)
  pla
  sta $2007
  rts
Then, in the NMI handler, you change the stack pointer so it points to the beginning of the VRAM buffer, and jump to the precise position in that unrolled loop that will transfer the exact number of bytes you need. There is no need to update counters or test boundaries, which makes transfers significantly faster.

If you implement a solution like this, you will not have to worry about overflows or other such problems, and you'll also be making the most of the available vblank time.
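A Python sketch of the bookkeeping behind the unrolled `pla`/`sta $2007` loop. UpdateVRAM at $C000, the 64-pair maximum, and the buffer at $0120 are all hypothetical. Each `pla`/`sta $2007` pair is 4 bytes of code, so entering `(max - n) * 4` bytes past the start leaves exactly `n` pairs before the `rts`. Because `pla` increments S before reading $0100+S, S has to be set one below the first buffered byte. The sketch deliberately leaves out how the routine returns and how the original S is saved and restored, since the post does not spell that out.

```python
PAIR_BYTES = 4   # pla (1 byte) + sta $2007 (3 bytes)
PAIR_CYCLES = 8  # pla (4 cycles) + sta absolute (4 cycles)


def entry_point(update_vram: int, max_len: int, count: int) -> int:
    """Address to jump to so exactly `count` pla/sta pairs run before the final rts."""
    if not 0 <= count <= max_len:
        raise ValueError("count must be between 0 and max_len")
    return update_vram + (max_len - count) * PAIR_BYTES


def stack_pointer_for(buffer_start: int) -> int:
    """pla increments S before reading $0100+S, so S must start one below the buffer."""
    return (buffer_start - 1) & 0xFF


def simulate_pulls(stack_page: bytearray, s: int, count: int) -> list:
    """Bytes that `count` pla/sta $2007 pairs would send to $2007, starting with S = s."""
    written = []
    for _ in range(count):
        s = (s + 1) & 0xFF             # pla: increment S first...
        written.append(stack_page[s])  # ...then read $0100+S; sta $2007 forwards it
    return written


# Hypothetical example: room for 64 pairs, 4 bytes buffered at $0120.
stack_page = bytearray(256)
stack_page[0x20:0x24] = bytes([0x11, 0x22, 0x33, 0x44])
s = stack_pointer_for(0x20)
print(hex(s))                            # 0x1f
print(hex(entry_point(0xC000, 64, 4)))   # 0xc0f0
print(simulate_pulls(stack_page, s, 4))  # [17, 34, 51, 68]
print(4 * PAIR_CYCLES, "cycles")         # 32 cycles
```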
turboxray
Posts: 348
Joined: Thu Oct 31, 2019 12:56 am

Re: 8x16 and whatever else unreg wants to know

Post by turboxray »

unregistered wrote: Mon Jan 27, 2020 9:55 pm edit:
Also, lying is bad; they are just hurting themselves if they claim they created the asm6_ edit. Honestly, I don’t care to research what you’ve reported turboxray. I’m not worried in the slightest bc God is in charge of everything; He gives me peace. :)
I have no idea what any of that means haha. I was just asking if there was a central repository for it.
unregistered
Posts: 1318
Joined: Thu Apr 23, 2009 11:21 pm
Location: cypress, texas

Re: 8x16 and whatever else unreg wants to know

Post by unregistered »

tokumaru, this is my lack of experience response: But pla is 4 cycles. That’s twice as many cycles as lda #. How does a double amount of cycles make your solution more efficient?

8 extra bytes bc: 1 4kb CHR file is 4096 bytes. (4096 bytes / 256 tiles = 16 bytes per tile.) 4096 / 12 frames = 341.3333. So our function writes 342 bytes per frame... 342 bytes * 12 frames = 4104 bytes. 4104 - 4096 = 8 extra bytes.

The write2SaveRAM loop writes the hex codes for 2 byte stores per iteration bc 342/2 < 256, so hmm... maybe it would be good to decrease the loop count by 1 so 340 bytes are written, then manually write the other hex codes to SaveRAM so that 341 bytes are written per vblank, and then add even more code to write the last 4 bytes to CHRRAM (341 * 12 = 4092, and 4092 + 4 = 4096). I’ll do this :); it would have been sweet if writing to $2007 set some flag once the VRAM address reached an $x000 byte.
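The arithmetic from the last two paragraphs can be checked in a few lines of Python. The function names are illustrative only. Each tile is 16 bytes (8 rows times 2 bitplanes), so a 256-tile pattern table is 4096 bytes. Splitting that evenly over 12 frames and rounding up gives 342 bytes per frame, which overshoots by 8.

```python
TILE_BYTES = 16                  # one 8x8 tile: 8 rows x 2 bitplanes
TABLE_BYTES = 256 * TILE_BYTES   # one 4 KB pattern table = 4096 bytes


def per_frame(total: int, frames: int) -> int:
    """Bytes per frame when total is split evenly over frames, rounded up."""
    return -(-total // frames)


def overflow(total: int, frames: int) -> int:
    """How far per_frame * frames overshoots total."""
    return per_frame(total, frames) * frames - total


print(per_frame(TABLE_BYTES, 12))   # 342 (4096 / 12 is about 341.33)
print(overflow(TABLE_BYTES, 12))    # 8
print(341 * 12 + 4 == TABLE_BYTES)  # True: 341 per frame plus 4 extra bytes
```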
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: 8x16 and whatever else unreg wants to know

Post by tokumaru »

unregistered wrote: Thu Jan 30, 2020 4:49 pm tokumaru, this is my lack of experience response: But pla is 4 cycles. That’s twice as many cycles as lda #.
You're right, lda # is indeed as fast as it gets - I didn't know you were doing that. Few people use that for VRAM transfers because of how much memory it needs (5x the amount of data actually being transferred!).
8 extra bytes bc: 1 4kb CHR file is 4096 bytes. (4096 bytes / 256 tiles = 16 bytes per tile.) 4096 / 12 frames = 341.3333. So our function writes 342 bytes per frame... 342 bytes * 12 frames = 4104 bytes. 4104 - 4096 = 8 extra bytes.
Wait... so you have one unrolled function that writes 342 immediate values to VRAM? That function is 1710 bytes (plus 1 for the RTS) long! Do you have the RAM for that? Or do you have many such functions in ROM, meaning you're dealing with the x5 expansion of your CHR data?
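Python can reproduce the 5x and 1710-byte figures from the standard 6502 byte and cycle counts. The tuple names are made up for the example, and the 6 cycles of the final `rts` are not counted.

```python
# Per transferred byte: (CPU cycles, bytes of unrolled code)
LDA_IMM_STA = (2 + 4, 2 + 3)  # lda #vv (2 cycles, 2 bytes) + sta $2007 (4 cycles, 3 bytes)
PLA_STA = (4 + 4, 1 + 3)      # pla (4 cycles, 1 byte) + sta $2007 (4 cycles, 3 bytes)


def unrolled_cost(pattern: tuple, n: int) -> tuple:
    """(cycles, code bytes) for n unrolled transfers plus one rts byte."""
    cycles_per_byte, code_per_byte = pattern
    return cycles_per_byte * n, code_per_byte * n + 1


print(unrolled_cost(LDA_IMM_STA, 342))  # (2052, 1711)
print(unrolled_cost(PLA_STA, 342))      # (2736, 1369)
```

The `lda #` version carries its data inside the code, so its 1711 bytes are the whole cost. The `pla` version's code is smaller, but it also needs the data bytes themselves in page $01, which holds only 256 bytes.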

If you have this function in RAM, I don't see why you can't call it by JSR'ing to FunctionStart+(5*8) to skip the first 8 bytes on the last transfer. And if you have these functions in ROM, why not just make the last one shorter?
The write2SaveRAM loop writes hex codes for 2 byte stores per iteration bc 342/2<256, so hmm... maybe it would be good to decrease the loop by 1 so 340 bytes are written, then manually write the other hex codes to SaveRAM so that 341 bytes are written per vblank, and then add even more code to write the last 4 bytes (341 * 12 = 4092 + 4 = 4096) to CHRRAM. I’ll do this :);
Since you have a "write2SaveRAM" loop, I assume your unrolled function is in RAM. So I suggest one of the following:

1- Start populating the transfer list after the first 8 bytes, and during vblank, skip these first transfers by JSR'ing directly to the 9th transfer. You can JSR to an indirect JMP to simulate an indirect JSR, so you can pre-calculate the entry point to the function (no conditional logic during vblank).

-OR-

2- When you finish buffering the 334 bytes of the last block, overwrite the following LDA # ($A9) with RTS ($60), causing the function to exit early and skip the last 8 bytes. You have to remember to change the RTS back into an LDA after the transfer ends. This solution is better IMO because you don't need to change the NMI handler at all.
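Both options can be simulated with a tiny Python interpreter for just the three opcodes involved: `lda #` ($A9), `sta abs` ($8D) and `rts` ($60). The function names are hypothetical. Each transfer is 5 bytes, so option 1 enters at offset 5*8 and option 2 patches the opcode at offset 5*334.

```python
LDA_IMM, STA_ABS, RTS = 0xA9, 0x8D, 0x60
PAIR_SIZE = 5  # lda #vv (2 bytes) + sta $2007 (3 bytes)


def build_unrolled(values: list) -> bytearray:
    """Assemble lda #v / sta $2007 for every value, followed by a single rts."""
    code = bytearray()
    for v in values:
        code += bytes([LDA_IMM, v, STA_ABS, 0x07, 0x20])  # $2007, low byte first
    code.append(RTS)
    return code


def run(code: bytearray, entry: int = 0) -> list:
    """Interpret only these three opcodes; return the bytes written to $2007."""
    writes, a, pc = [], 0, entry
    while code[pc] != RTS:
        if code[pc] == LDA_IMM:
            a = code[pc + 1]
            pc += 2
        elif code[pc] == STA_ABS:
            addr = code[pc + 1] | (code[pc + 2] << 8)
            if addr == 0x2007:
                writes.append(a)
            pc += 3
        else:
            raise ValueError(f"unexpected opcode {code[pc]:#04x} at offset {pc}")
    return writes


values = [i & 0xFF for i in range(342)]
code = build_unrolled(values)
print(len(code))  # 1711

# Option 1: enter at the 9th transfer, skipping the first 8.
assert run(code, entry=PAIR_SIZE * 8) == values[8:]

# Option 2: turn the lda after the 334th transfer into rts, run, then restore it.
code[PAIR_SIZE * 334] = RTS
assert run(code) == values[:334]
code[PAIR_SIZE * 334] = LDA_IMM
assert run(code) == values
```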
it would have been sweet if writing to $2007 set some flag once the VRAM address reached an $x000 byte.
You have to stop wishing for these oddly specific behaviors that would be useful only in your particular implementation of things. Blocking VRAM writes in such cases would cause much more harm than good, because you could end up inadvertently triggering this when clearing VRAM or updating random pattern or name table regions, both of which are very common tasks. Like it or not, this system architecture has been set in stone with the release of the Famicom nearly 37 years ago, so there's no point in wishing for these things now. Instead of thinking how the platform could be changed to meet your needs, think of how you can improve your code to make better use of the platform as it is.
tokumaru
Posts: 12427
Joined: Sat Feb 12, 2005 9:43 pm
Location: Rio de Janeiro - Brazil

Re: 8x16 and whatever else unreg wants to know

Post by tokumaru »

BTW, if you don't mind my asking: why do you need to update an entire 4K of patterns so fast?

I assume that since you're changing an entire pattern table, that you are in the middle of a transition of some kind (room, level, etc.), meaning you could probably turn rendering off and do the whole transfer much faster and without needing all that RAM.

But if this is not a transition or if for some reason you need to display graphics while this update happens behind the curtains, does the difference between 12 and 16 frames justify the absurd use of RAM? Unless this is for an animation, those 4 frames are completely unnoticeable to a normal human being.
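The 12-vs-16-frame comparison can be checked in Python. The 256-bytes-per-frame rate for a more conventional routine is an assumption implied by that comparison, and the frame rate is rounded to about 60 per second (NTSC).

```python
def frames_needed(total_bytes: int, bytes_per_frame: int) -> int:
    """Whole frames needed to move total_bytes at a fixed rate (rounded up)."""
    return -(-total_bytes // bytes_per_frame)


fast = frames_needed(4096, 342)  # the 342-bytes-per-frame routine from this thread
slow = frames_needed(4096, 256)  # assumed rate of a more conventional routine
print(fast, slow)                # 12 16
print(f"{(slow - fast) / 60:.3f} s difference at roughly 60 fps")  # 0.067 s
```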

Don't get me wrong, I like to see the NES doing lots of CHR-RAM updates like all of its contemporary systems had to do, proving it can do cool things even without the aid of mapper hardware, but while these fast updates make a lot of sense when animating characters and backgrounds, optimizing the hell out of a full pattern table switch feels like overkill to me.
unregistered
Posts: 1318
Joined: Thu Apr 23, 2009 11:21 pm
Location: cypress, texas

Re: 8x16 and whatever else unreg wants to know

Post by unregistered »

Thank you tokumaru! :D (Going to try your #2 option/suggestion.) I didn’t want to block VRAM transfers... just wanted to add a beq to the end of the VRAM transfer function to skip the final 8 bytes written. :) I agree that it is pointless to wish for a change, but, for me, that makes the message more exciting/fun to write. :) Though, I will try to remember your wisdom.

Yes, good points/wisdom shared; this is for my sister’s animation, so the fewer frames between each picture, the better. :)