What instead of indexed addressing modes?

Post by tepples » Mon Mar 19, 2018 8:30 pm

That gets complicated when each actor ends up with dozens of different Uses flags, one for each possible line that could be in or not in a particular actor's script, that need to get copied from the actor's prototype. It gets doubly complicated when a line in the Grand Unified Script needs to turn a bunch of Uses flags on and off when an actor goes in or out of a particular state.

Today I decided to look at how commercial games solve the SOA vs. AOS dilemma. But I discovered that very few Game Boy games listed on Data Crystal actually have their RAM map substantially filled in. The first one I found was that of Wario Land, which has an 8-entry actor table at $A200, where the actors occupy $A200-$A213, $A220-$A233, $A240-$A253, ..., $A2E0-$A2F3. I guess this validates the array of 32-byte-aligned actor structures.
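
The layout described above can be checked with a little arithmetic. This is a minimal Python sketch, assuming the base address, 32-byte stride, and 20 used bytes per actor that can be read off the ranges quoted above; the constant and function names are made up for illustration.

```python
# Values read off the RAM map ranges quoted above; names are illustrative only.
ACTOR_TABLE_BASE = 0xA200
ACTOR_STRIDE = 0x20      # 32-byte alignment between actors
ACTOR_USED_BYTES = 0x14  # $A200-$A213 is 20 bytes
ACTOR_COUNT = 8


def actor_range(index):
    """Return (first, last) address of the given actor's used bytes."""
    first = ACTOR_TABLE_BASE + index * ACTOR_STRIDE
    return first, first + ACTOR_USED_BYTES - 1


ranges = [actor_range(i) for i in range(ACTOR_COUNT)]
for first, last in ranges:
    print(f"${first:04X}-${last:04X}")
```

With a power-of-two stride, going from an actor index to its address is just a shift, and because eight 32-byte slots fill exactly one 256-byte page, only the low byte of the pointer ever changes.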

Post by Oziphantom » Mon Mar 19, 2018 9:19 pm

I've recently started tumbling down the Z80 rabbit hole. Only I have a full Z80 at my disposal, and a 6502 that I can jump back to when the going gets too hot for the Z80 ;)

I'm starting to think that dispatch is the way to solve the issue.
We are used to doing

Get thing,x
and state
bne _next
lda otherData,x
adc moreData,x
sta otherData,x
and state2
bne _next2
...thing2

while for Z80 it might be better to use the "use bits" as a dispatch.
ld use
<<2
add Base Pointer
call

This way you call a function that already knows which use cases apply and can handle those bits without having to look them up, which saves indexing into multiple tables.
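
To make the dispatch idea concrete, here is a rough sketch in Python rather than Z80, since the pseudocode above is only an outline. It assumes a 2-bit use field and invented handler and field names; the point is only that the use bits pick a routine once, and each routine hard-codes which fields it touches.

```python
# Hypothetical use bits; the thread does not define any concrete ones.
USE_MOVES = 0x01
USE_GRAVITY = 0x02


def update_none(actor):
    pass


def update_moves(actor):
    actor["x"] = (actor["x"] + actor["dx"]) & 0xFF


def update_gravity(actor):
    actor["dy"] = (actor["dy"] + 1) & 0xFF


def update_moves_gravity(actor):
    update_moves(actor)
    update_gravity(actor)


# One entry per combination of use bits, like a jump table indexed by `use`.
DISPATCH = [update_none, update_moves, update_gravity, update_moves_gravity]


def update(actor):
    DISPATCH[actor["use"]](actor)


actor = {"use": USE_MOVES | USE_GRAVITY, "x": 250, "dx": 10, "dy": 0}
update(actor)
print(actor["x"], actor["dy"])
```

The table doubles in size with every extra use bit, so this probably only pays off when the number of bits stays small.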

Not done too much experimentation yet, as I'm still trying to find an assembler that isn't circa 1982. Or one that is 1987 spec but isn't trapped on a machine that is slow...

Post by tepples » Fri May 11, 2018 11:52 am

My bad luck continues. I searched Google for 8080 record field access and 8080 struct field access, hoping to stumble on some idiom that has become common practice, but most results were about some website hosted on port 8080, not the Intel 8080. I tried 8080 assembly struct field access, and Google second-guessed me with "Missing: 8080".

Post by adam_smasher » Fri May 11, 2018 12:43 pm

If you put a search term in quotes Google won't show you results without it.

You might try looking at the output of 8080 or GBZ80 C compilers when interacting with structs to see how they handle the problem.

Post by tepples » Fri May 11, 2018 5:46 pm

adam_smasher wrote: You might try looking at the output of 8080 or GBZ80 C compilers
In other words, the godbolt solution. I'd've tried that if SDCC were any good. See "To C or not to C?" by ISSOtm.
ISSOtm wrote: [GBDK is] built on an ancient build of SDCC, which is known to generate poor (bloated) and often straight up wrong code.
What C compilers targeting 8080 are any good? Any luck with, say, BDS C?

Post by adam_smasher » Fri May 11, 2018 7:14 pm

It might be worth giving SDCC a shot anyway - it's still under active development, and the aforementioned "ancient build of SDCC, which is known to generate poor (bloated) and often straight up wrong code" that GBDK is based on is 17 years old.

Post by tepples » Thu May 24, 2018 8:55 pm

In GBDev Discord, ISSOtm announced an RGBDS macro pack to define structs and is trying to figure out how to best distribute it. Alongside this came some practical idioms for struct field access.

  ; Prep: 3 mcycles each
  ld de,self
  ld bc,other_actor

  ; Random field load/store: 7 mcycles, BC preserved
  ld hl,offsetof(Actor, xsub)  ; 3
  add hl,de                    ; 2
  ld a,[hl]                    ; 2
  ; Compare 6502: 4 cycles for a load (plus 1 if indexing crosses a page), 5 for a store
  lda actor_xsub,x             ; 4

  ; Random field arithmetic: 8 mcycles, BC preserved
  ld hl,offsetof(Actor, xsub)  ; 3
  add hl,de                    ; 2
  ld l,[hl]                    ; 2
  add a,l                      ; 1
  ; Compare 6502: 4 cycles (plus 1 for crossing page)
  adc actor_xsub,x             ; 4 (the 6502 has no plain add; assumes carry is already clear)

  ; Store constant in field: 8 mcycles, ABC preserved
  ld hl,offsetof(Actor, frame)  ; 3
  add hl,de                     ; 2
  ld [hl],FRAME_JUMP            ; 3
  ; Compare 6502: 7 cycles and A is clobbered
  lda #FRAME_JUMP               ; 2
  sta actor_frame,x             ; 5

Roughly: 8080 is faster for sequential access, and 6502 is faster for random access. Try counting aaaa,X vs. (dd),Y accesses in your NES program to estimate what parts might become slower or faster respectively.

Post by tepples » Tue Jan 22, 2019 10:11 am

I've since collected a lot of this into a wiki article.

Post by tepples » Wed Nov 25, 2020 8:45 pm

In this topic, strat claimed that this 6502 code

ldx character_index
; this part in the move routine
lda lo_x_pos,X
clc
adc lo_x_vel,X
sta lo_x_pos,X
became this SM83 code

ld bc,lo_x_pos
ld h,00
ld l,(character_index)
; this part in the move routine
push hl
add hl,bc
ld b,h
ld c,l
pop hl
ld de,lo_x_vel
add hl,de
ld a,(bc)
add a,(hl)
ld (bc),a
strat seems to suggest that it is worthwhile to copy an actor's struct out to HRAM, manipulate it in HRAM, and copy it back to the actor struct.

Since the start of this thread, "dan jia" on the gbdev Discord server reported that Wisdom Tree's GB games put what amounts to C++'s this in BC and treat HL as a temporary, much like $at in MIPS or r12 in ARM.

ld hl, character_index
ld b, 0
ld c, [hl]
; this part in the move routine
ld hl, lo_x_pos
add hl, bc
ld a, [hl]
ld hl, lo_x_vel
add hl, bc
add a, [hl]
ld hl, lo_x_pos
add hl, bc
ld [hl], a
Each 3-byte, 4-cycle indexed operation becomes a 5-byte, 7-cycle sequence (ld hl, then add hl, bc, then the access itself). This starts hurting given that the GB CPU, counted in machine cycles, has only about three-fifths of the NES CPU's clock speed.
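
To put rough numbers on that, the sketch below converts both sequences to wall-clock time. It assumes the NTSC NES CPU clock of about 1.789773 MHz and counts Game Boy time in machine cycles at 1.048576 MHz (the 4.194304 MHz clock divided by 4); the variable names are mine.

```python
NES_CPU_HZ = 1_789_773        # NTSC NES CPU clock
GB_MCYCLE_HZ = 4_194_304 / 4  # SM83 machine cycles per second

clock_ratio = GB_MCYCLE_HZ / NES_CPU_HZ

# One indexed load: lda table,x on the 6502 versus
# ld hl,table / add hl,bc / ld a,[hl] on the SM83.
nes_us = 4 / NES_CPU_HZ * 1e6
gb_us = 7 / GB_MCYCLE_HZ * 1e6
slowdown = gb_us / nes_us

print(f"clock ratio: {clock_ratio:.3f}")
print(f"6502: {nes_us:.2f} us, SM83: {gb_us:.2f} us, slowdown: {slowdown:.2f}x")
```

The clock ratio comes out just under three-fifths, and a single random field access ends up roughly three times slower in wall-clock terms.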

ISSOtm on the gbdev Discord server maintains that it's possible to arrange the properties of an actor in advance for sequential access, so that most of the time is spent on inc hl and dec hl (or better yet inc l, dec l, ld a, [hl+], or ld a, [hl-]). But other things, such as school, paid projects, and rewriting an assembler that's full of technical debt, have occupied him.

Post by aa-dav » Mon Nov 30, 2020 11:36 pm

Yep, indexing is the strong side of the 6502.
I think it's really better to arrange characters as an array of structs on the i8080 and do something like:

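; suppose hl = start_of_character, and the velocity byte sits 3 bytes after the position byte
ld bc, x_offset
add hl, bc ; long index change
ld a, (hl+) ; a = position, hl -> position + 1
inc hl
inc hl ; hl -> velocity
add a, (hl) ; there is no add a, (hl-), so step back by hand
dec hl
dec hl
dec hl ; hl -> position again
ld (hl), a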
Given that the characters are all in the same page, the next trick can be used to shift the index by any amount relatively quickly:

ld a, offset ; could be negative
add a, l
ld l, a
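
Why a negative offset works with a plain 8-bit add: the addition wraps modulo 256, so adding the two's complement of the offset moves L backwards while H stays put. Here is a small Python model of that, assuming the struct does not straddle a page; the function name and addresses are invented.

```python
def add_to_l(hl, offset):
    """Model ld a,offset / add a,l / ld l,a: only the low byte changes."""
    h = hl & 0xFF00
    l = (hl + offset) & 0x00FF  # 8-bit add wraps; the carry into H is discarded
    return h | l


field = 0xC123
print(hex(add_to_l(field, 5)))   # forward by 5
print(hex(add_to_l(field, -3)))  # back by 3
print(hex(add_to_l(0xC1FE, 4)))  # wraps inside the page instead of reaching 0xC202
```

The last call shows why the same-page condition matters: if a field really lived at 0xC202, this shortcut would point at 0xC102 instead.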
And the strong side of the i8080 (and its derivatives) is the big register set and the registers' ability to stand in for one another, so you should always keep that in mind.
For example, instead of:

ld hl, lo_x_pos
add hl, bc
ld a, [hl]
ld hl, lo_x_vel
add hl, bc
add a, [hl]
ld hl, lo_x_pos
add hl, bc
ld [hl], a
You should write:

ld hl, lo_x_vel
add hl, bc
ld d, [hl] ; just keep it
ld hl, lo_x_pos
add hl, bc
ld a, [hl]
add a, d
ld [hl], a ; no need to change index twice
So you do not need to keep 'a' intact, and you can use the quick shifts mentioned above if the data is reorganized into an array of structs.
It is sometimes tricky to start thinking in another architecture. And I am on the opposite side: my first architecture was the Z80, and the 6502 seemed very limited the first times I looked at it. The main reason was the small, non-orthogonal register set and the need to access memory every f... time. Whoever wrote the code above seems to be in the 6502 league and just didn't think about saving private Ryan in a register, but implemented lda/adc/sta as he was used to doing. :)
And the first thing you think about while programming for the i8080/Z80 is keeping everything you possibly can in registers.
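
Counting machine cycles shows what the register-keeping version buys. The sketch below uses the standard SM83 timings (ld hl,d16 = 3, add hl,bc = 2, loads and stores through [hl] = 2, add a,[hl] = 2, add a,d = 1); the table and helper are only an illustration.

```python
# M-cycle cost of each SM83 instruction used in the two listings above.
MCYCLES = {
    "ld hl,d16": 3,
    "add hl,bc": 2,
    "ld a,[hl]": 2,
    "ld d,[hl]": 2,
    "ld [hl],a": 2,
    "add a,[hl]": 2,
    "add a,d": 1,
}

naive = [
    "ld hl,d16", "add hl,bc", "ld a,[hl]",
    "ld hl,d16", "add hl,bc", "add a,[hl]",
    "ld hl,d16", "add hl,bc", "ld [hl],a",
]
register_kept = [
    "ld hl,d16", "add hl,bc", "ld d,[hl]",
    "ld hl,d16", "add hl,bc", "ld a,[hl]",
    "add a,d", "ld [hl],a",
]


def total(seq):
    """Sum the M-cycles of a list of instruction mnemonics."""
    return sum(MCYCLES[op] for op in seq)


print(total(naive), total(register_kept))
```

That is 21 versus 17 M-cycles: the third address computation is what disappears, at the price of one extra register-to-register add.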

Post by tepples » Tue Dec 01, 2020 9:06 am

Having a background in both C and 6502 assembly, I was able to learn the Zilog Z80 in the Game Gear by applying my C habits to park the actor pointer (what C++ and JavaScript call this and Python calls self) in register IX. Unfortunately, that Z80 style doesn't translate to i8080/8085 or SM83. So I'll try to summarize your post to understand it:

Organize actors as an array of structs, just as on Z80 and practically every other major non-65xx ISA, with no struct crossing page boundaries. This lets you park this in BC, giving 5 cycles to seek to another field without trashing A (ld hl then add hl, bc) or 4 cycles with trashing A (adding to L). When doing calculations involving multiple fields, you have two options: either precalculate field address low bytes ahead of time and store them in D and E until needed, or prefetch field values and store them in D and E until needed. This should buy enough time to get something working before you optimize field order later in the project.

Is it worthwhile to create a set of macros that tries to find a route between the current field and the next using set B, l, res B, l, inc l, dec l, and possibly even use of ld a, [hl+] or ld a, [hl-] if the next field is known in advance?
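
As a feasibility check on the macro idea, here is a hedged Python sketch of the route selection such a macro would have to perform. It assumes both fields lie in the same page so that only L changes, uses costs of 1 M-cycle for inc l/dec l, 2 for set b,l/res b,l, 4 for the add-through-A sequence, and 5 for ld hl plus add hl,bc, and ignores the [hl+]/[hl-] case; every name is invented.

```python
def cheapest_route(cur, target):
    """Return (mcycles, description) for moving L from cur to target."""
    options = [
        (5, "ld hl,field / add hl,bc"),
        (4, "ld a,delta / add a,l / ld l,a (trashes A)"),
    ]
    delta = target - cur
    if delta == 0:
        options.append((0, "nothing"))
    elif delta > 0:
        options.append((delta, f"inc l x{delta}"))
    else:
        options.append((-delta, f"dec l x{-delta}"))
    changed = cur ^ target
    if changed and changed & (changed - 1) == 0:  # exactly one bit differs
        bit = changed.bit_length() - 1
        op = "set" if target & changed else "res"
        options.append((2, f"{op} {bit},l"))
    # Ties on cost are broken arbitrarily (by the description string).
    return min(options)


print(cheapest_route(0x20, 0x22))
print(cheapest_route(0x20, 0x28))
print(cheapest_route(0x2B, 0x34))
```

Even this toy version shows that the savings over the 4-cycle add are at most a few cycles per seek, which may be why laying fields out for sequential access looks more attractive than a clever macro.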

Post by aa-dav » Tue Dec 01, 2020 9:29 am

My experience (though I am not a VERY experienced dude) with the i8080 and its descendants can be summarized in this list:
a) Try to keep everything in registers in leaf functions or simple loops. But if that's not possible, don't be afraid to use the stack or memory as temporary storage. Don't be paranoid about this; it can be counterproductive (subrule a(2)). (In my experience, the wish to pack every subtask into the register set was unproductive too often, and it was hard for me to stop optimizing every possible byte transfer. The 6502 does not induce such a wish because memory transfers are an inevitable part of the architecture.)
b) This architecture can be seen as pointer-oriented. HL, BC and DE serve well as pointers, and pointer-oriented algorithms suit it well. That is: incrementing a pointer instead of indexing, referring by pointer rather than by index, and so on. HL and DE as from->to pointers and BC as a counter scream for a memcpy procedure. ;)
The only drawback here is that pointer arithmetic sets no flags (intentionally!): inc regPair doesn't change flags at all.
So to test BC for zero we are doomed to:
ld a, b
or c
jp z, ...
This drawback was (not really) fixed in the Z80, but it's OK to do it this way on the i8080.
But even the i8080 is ideal for strcpy, because there we test the accumulator.
Also remember that in some algorithms you can increment just the low half of an address register as an 8-bit value and touch the high half only when it overflows, so you are not required to use HL as the destination of every 16-bit arithmetic operation.

c) The accumulator is almost always dirty and overwritten. Don't try to preserve it for long; it's first in line to be exempted from rule (a). However, if you can preserve it, that's cool. But see subrule (a(2)).

These are the basics. Hope it helps somehow. I have no time right now, so I will continue with examples some time later.

P.S.

I think you are correct. But macros are too much, imho. Try to do it in a more general way first, yes.

Post by aa-dav » Tue Dec 01, 2020 6:28 pm

P.P.S.
I have never programmed the Sharp LR35902, but I wrote an article about the differences between it, the i8080 and the Z80 (not in English).
It can be summarized in these 4/5 tables (Z80 syntax):
a) Instructions of i8080 removed:

D3  out (*), a
DB  in a, (*)
E3  ex (sp), hl
EB  ex de, hl
E4  call po, **
EC  call pe, **
F4  call p, **
FC  call m, **
b) Prefixes of Z80 removed:

DD (ix prefix)
ED (ext prefix)
FD (iy prefix)
c) Instructions of i8080 changed:

Code    Old             New
------------------------------------
E0      ret po          ldh (a8), a
F0      ret p           ldh a, (a8)
E2      jp po, **       ld (c), a
F2      jp p, **        ld a, (c)
E8      ret pe          add sp, r8
F8      ret m           ld hl, sp+r8
EA      jp pe, **       ld (a16), a
FA      jp m, **        ld a, (a16)

The last two instructions above already existed in the original set under different opcodes. Those opcodes were reassigned to these operations:

Code    Old             New
------------------------------------
22      ld (**), hl     ld (hl+), a
2A      ld hl, (**)     ld a, (hl+)
32      ld (**), a      ld (hl-), a
3A      ld a, (**)      ld a, (hl-)
d) Instructions of Z80 changed:

Code    Old             New
------------------------------------
08      ex af, af'      ld (a16), sp
10      djnz *          stop 0
D9      exx             reti

Also: all Z80 instructions with the CB prefix were kept, and more than that: the undocumented opcodes CB30-CB37 were turned into the instruction SWAP r.
