It is currently Fri Apr 28, 2017 3:18 am

All times are UTC - 7 hours





Post new topic Reply to topic  [ 28 posts ]  Go to page Previous  1, 2
Author Message
 Post subject: Re: Flight Minigames
PostPosted: Wed May 21, 2014 3:05 pm 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 18202
Location: NE Indiana, USA (NTSC)
43110 wrote:
I think should start looking into tepples sound engine. I believe it's pretty much music.s sound.s and musicseq.* in RHDE.

I do like the 64 note range of famitone2

Patterns in my music engine support a TRANSPOSE command that changes the base note for the remainder of a pattern. For example, TRANSPOSE,12 shifts the rest of the pattern up by an octave, and RHDE's combat theme uses this. It just has the two-octave range for a more compact encoding of the common case.

Quote:
and my edit that allows different tempo and pitch setups.

Which "tempo and pitch" setups are you talking about?

(And is this tune happy or crappy?)


Top
 Profile  
 
 Post subject: Re: Flight Minigames
PostPosted: Wed May 21, 2014 3:28 pm 
Offline
Formerly 43110
User avatar

Joined: Wed Feb 05, 2014 7:01 am
Posts: 286
ahh, two-octave local range per pattern range makes sense. The 64 note range I'm talking about is the period table.

Tempo and pitch for adjusting between NTSC, PAL, and Dendy TV systems. I hacked that in famitone2 by using the upper two bits of a variable.

tepples wrote:
(And is this tune happy or crappy?)

umm, both? It'll be good for the flappy mode featuring a butterfly. Unless you want to exclusively use it for that shoot-em-up game.


Top
 Profile  
 
 Post subject: Re: Flight Minigames
PostPosted: Wed May 28, 2014 9:21 pm 
Offline

Joined: Wed May 28, 2014 7:57 pm
Posts: 4
43110 wrote:
Also this is probably the one piece of code that the cpu will spend a good bulk of it's time on.
Code:
;;
; multi purpose multi byte operation
; [API sniped but will be described in full source]
bignum_op:
  ldx temp+4 ; edit, don't touch temp+4
-
  lda (temp+0), y
  eor temp+5    ; a value of $ff will make this a subtraction
  adc (temp+2), y
  sta (temp+0), y
  iny
  dex ; edit, don't touch temp+4
  bne -
  rts


If you are using these for four byte values, then this lets you set the size once for a series of operations without having to reset it all the time. But count up how many times they appear ... if they appear too many times, the overhead of clear and carry and setting up the EOR byte will overwhelm the 11 bytes saved, and you are better off with:

Code:
bignm_add:
  clc
  ldx temp+4
-
  lda (temp+0), y
  adc (temp+2), y
  sta (temp+0), y
  iny
  dex
  bne -
  rts

bignm_sub: ; above subtracts 0 from 2 and stores in 0 ~ is that the right way around?
  sec
  ldx temp+4
-
  lda (temp+2), y
  sbc (temp+0), y
  sta (temp+0), y
  iny
  dex
  bne -
  rts


Its only a narrow range of number of calls of bignum_op where it actually nets saving space, when extra calling overhead is taken into account. If there are like two routines that add and two that subtract, you should be ahead, by a few bytes, but even at four routines that do each, the extra call overhead may offset the squeezing of bytes from the called routines.

Similar for clear and copy:

Code:
big_cpy:
  ldx temp+4
-
  lda (temp+2), y
  sta (temp+0), y
  iny
  dex
  bne -
  rts

big_clr:
  ldx temp+4
  lda #0
-
  sta (temp+0), y
  iny
  dex
  bne -
  rts


... allow clearing and copying blocks at a time (set temp+4=y=0), but used during four byte FP integer operations, allows temp+4 to hold the same size index and allows avoiding set-up overhead. There's a double purpose version using "AND temp+5", but at a net saving of 9 bytes, its a close run thing whether extra set-up overhead swamps the crunching of the factored operation.

In the middle case, if the program flow allows the temp+5 set-up to be stable over multiple operations, so there is not much overhead setting temp+5, but the carry flag is set or cleared more than four times total in calling big_num, you save space just by setting up distinct entry points to do that:
Code:
;;
bignm_sub:
  sec
  bcs +
bignm_add:
  clc
+
  ldx temp+4
-
  lda (temp+0), y
  eor temp+5    ; a value of $ff will make this a subtraction
  adc (temp+2), y
  sta (temp+0), y
  iny
  dex
  bne -
  rts


... though I still wonder whether its:

OP2 - OP1 =: OP1

... or ... OP1 - OP2 =: OP1 ... in which case it should be "lda (temp+2),y", etc.


Top
 Profile  
 
 Post subject: Re: Flight Minigames
PostPosted: Thu May 29, 2014 9:28 am 
Offline
Formerly 43110
User avatar

Joined: Wed Feb 05, 2014 7:01 am
Posts: 286
I've updated the first posted the work I have done this month (v0.2). There's nothing apparently new in the compiled ROM, but the source has changed a bit. Additionally I'm now planing for 6 games instead of 8.

About what BruceMcF posted:
My motivation for even merging the two multi byte operations was to rid of sequences of jump subroutines, and the major optimization was actually having to do with loading all the prams from three byte tables (see line 1058 in the source of v0.2).

I was trying to preserve X through the whole chain of subroutine calls so that at the top I can do a loop like
Quote:
ldx #0
-
jsr do_math_step
inx
cpx #32
bcc -

but looking again at what I have now, I really should change that to a counter in zeropage, since I had to make an unnecessary stack push in the multiplication subroutine.

Also if you want to see another thing I did with a hard to compute the trade off, I took an idea of having list of addresses in stack and made my game state selector push such a list instead of having a sequence of jsr in a subroutine.


Top
 Profile  
 
 Post subject: Re: Flight Minigames
PostPosted: Fri May 30, 2014 10:29 am 
Offline

Joined: Wed May 28, 2014 7:57 pm
Posts: 4
43110 wrote:
About what BruceMcF posted:

I was trying to preserve X through the whole chain of subroutine calls so that at the top I can do a loop like
Quote:
ldx #0
-
jsr do_math_step
inx
cpx #32
bcc -

but looking again at what I have now, I really should change that to a counter in zeropage, since I had to make an unnecessary stack push in the multiplication subroutine.

Yes, its frequently better to use the X register in inner loops and an index register for outer loops.

Quote:
Also if you want to see another thing I did with a hard to compute the trade off, I took an idea of having list of addresses in stack and made my game state selector push such a list instead of having a sequence of jsr in a subroutine.


Also, if you have a pool of precomputed "RTS addresses" in a known page, you can set up outer factor routines as jump targets:

Code:
;;
routine1: ; routine cannot span a page, cannot start on 0 location in a page
   lda #>rtn1+beg-1
   ldx #>rtn1_end+1 ; rtn1_end generated from source so points to low byte
vec_call:
   sta temp+5
-
   lda vec_pool,x
   pha
   dex
   cpx temp+5
   bne -
   rts ; call set of vectors

routine2: ; see warnings above
   lda #>rtn2+beg-1
   ldx #>rtn2_end+1
   bne vec_call
...


And further crunching in storage of the vector sequences is possible from vectors being allowed to overlap:

Code:
;;
vec_pool:
rtn1_beg:
   doop1-1 ; first operation of routine1
   doop2-1
   doop3-1
rtn2_beg:
   doop4-1 ; first operation of routine2
   doop5-1
rtn1_end:
   doop6-1 ; last operation of routine1
rtn2_end:
   doop7-1 ; last operation of routine2
   ; ...


The fixed space overhead of that particular stack stuffing is 12 bytes (there are of course post-indexed versions that allow the vectors to be anywhere in visible memory). The outer routines break even compared to identical subroutine factor code at six unique operations calls per outer routine, with one byte saved per inner operation call after that point. And of course any overlapping operations in the pool are basically free space to all but one of the calling routines.

So that rewards programming that relies on lots of little factors that are heavily re-used, lego-block style, and factor design that favors the use of overlapping "phrases" of the operation inner factors.


Top
 Profile  
 
 Post subject: Re: Flight Minigames
PostPosted: Fri May 30, 2014 4:33 pm 
Offline
Formerly 43110
User avatar

Joined: Wed Feb 05, 2014 7:01 am
Posts: 286
I had to read that several times before I got it. While I am aiming for subroutine reuse, I don't think things will logically interleave like that. But you did mention putting subroutines in the same page, and I then saw that the address really only have 10 bits for a 4KiB ROM. That means I'm now wasting 6 bits for every address in the list, and that's not too far from having a whole implied byte automatically inserted by the list loader.

Edit: Nevermind, I miscounted. 4KiB is 12 bits, causing only 4 bits of waste per address.

but before I start arranging assembly code like a falling block puzzle game, I need to program the pieces first.

On a related note, I have about 640 unused bits from the note_period table. What can I put there that uses only the upper 5 bits of 128 bytes?


Top
 Profile  
 
 Post subject: Re: Flight Minigames
PostPosted: Fri May 30, 2014 7:47 pm 
Offline

Joined: Wed May 28, 2014 7:57 pm
Posts: 4
43110 wrote:
I had to read that several times before I got it. While I am aiming for subroutine reuse, I don't think I'll have things logically knitted like that. But you did mention putting subroutines in the same page, and I then saw that the address really only have 10 bits for a 4KiB ROM. That means I'm now wasting 6 bits for every address in the list, and that's not too far from having a whole implied byte automatically inserted by the list loader.

But you also want the list loader to be concise.

Without the "falling block puzzle game" side of it, you can have counted vectors anywhere in the ROM without a count byte by using the high bit to flag the beginning of the list, but after working that through, its a bit more intricate and brittle than I like.

Vectors with an index to their own end in front don't let you play the "sliding block puzzle" games, but they are easier to follow what is going on.

And there really is not any binding constraint with the vector pool approach in terms of how many vectors it supports. If you have so many lists of subroutine calls that you've used up one vector pool, you've saved more than enough space to cover the overhead for a second vector pool page.

Code:
;;
;    NB. A "vector pool" is a page with a set of subroutine address vectors in it.
;    The leading byte is the position in the page of the last byte in the vector.

routine1: ; the first entry in vector pool 1.
   ldx #<vector1

go_pool1:
   stx temp+5
   lda vpool1,x
   tax
-
   lda vpool1,x
   pha
   dex
   cpx temp+5
   bne -
   rts ; call subroutine vectors in turn from stack page

routine2:
   ldx #<vector2
   jmp go_pool1

routine3: ; NB. routine3 is only four ops, so there is no space saving from the vector
   jsr op3_1
   jsr op3_2
   jsr op3_3
   jmp op3_4

routine4:
   ldx #<vector4
   jmp go_pool1

; ... and so on, until & unless the vpool1 page is full up, then (supposing it was 30 routines that filled it up):

routine31: ; the first entry in vector pool 2.
   ldx #<vector31

go_pool2:
   lda vpool2,x
   tax
-
routine1: ; the first entry in vector pool 1.
   ldx #<vector1

go_pool1:
   stx temp+5
   lda vpool2,x
   tax
-
   lda vpool2,x
   pha
   dex
   cpx temp+5
   bne -
   rts ; call subroutine vectors in turn from stack page

routine32:
   ldx #<vector2
   jmp go_pool2
...


... so the limit of distinct vectors to a single binary page is not an overall limit on the number of vectors altogether.

And if (as supposed above), 30 vectors filled up a vector page, then using the vector page saved 30 bytes, since each vector pool call is one byte shorter than passing a full address, which

But if you want to really crunch ... yeah the addresses can be packed. EDIT ~ yeah, 4K is 12 bits, so not just can't be bothered packing more than two, CAN'T pack more then two ~ 2 bits to 4 bits fixed put in place w/out more edit markets ~ the original was to crunch routines into the 4K "Golden RAM" in the C64 above the ROM in the memory map and below the I/O, but it was reconstructed from memory, and I got some of the details off ~ plus multiplied when I should have been dividing (oops). /EDIT

If there are an odd number of routines, its the address at the head of the list that is not unpacked, so in the vector pool, an odd list would like like (dummy, so op_# is the place of the op in the sequence, not the actual name, psuedo-assembler syntax):
.byte
.word op1
.byte <op_2, <op_3, ((>op_2 .AND. $0F)*16)+(>op_3 .AND $0F)
.byte <op_4, <op_5, ((>op_4 .AND. $0F)*16)+(>op_5 .AND $0F)
.byte <op_6, <op_7, ((>op_6 .AND. $0F)*16)+(>op_7 .AND $0F)
...

... with the packed address business best constructed as a macro in your favorite 6502 macro assembler, to get it right once and avoid lurking typos.

Note that the high bit of the address is stripped by the packing process and replaced by the routine.

Code:
;;
;    NB. The leading byte is the size of the packed string of vectors in bytes.

routine1: ; the first entry in vector pool 1.
   ldx #<vector1

go_pool1:
   lda vpool1,x
   sta temp+5
   tax
-
   lda vpool1,x
   tay
   and #$0F
   ora #$80
   pha

   lda vpool1,x
   pha
   dex
   cmp temp+5
   beq +

   lda vpool1,x
   tya
   lsr a
   lsr a
   lsr a
   sec
   ror a
   pha

   lda vpool1,x
   pha
   dex
   cmp temp+5
   bne -
+
   rts ; call subroutine vectors in turn from stack page


44 vs 16 bytes, so 28 bytes extra space required by the processing. 56 pairs of subroutine vectors crunched together and that breaks even.

So that is worth considering if the vector pool page is more than half full.

And since each vector pool has its own engine, if you have own full vector pool page and one only quarter full, you can put packed lists in the full one, and unpacked lists in the only partly full one.

Personally, if I was using any, I'd use the simpler one, because its easier to just define the routines using ".word" assembler instructions with the label for the routine.

This is all a kind of building a Forth-like inner interpreter without a Forth assembler/compiler system running in the NES.


Top
 Profile  
 
 Post subject: Re: Flight Minigames
PostPosted: Mon Jun 02, 2014 8:05 am 
Offline
Formerly 43110
User avatar

Joined: Wed Feb 05, 2014 7:01 am
Posts: 286
Well I tried, but Summer is on me and I have no time until the start of the next school year in August. I look forward to the multi-cart production.

Before I go, here's the list of games I had planned in this:
  • Game A: like the Helicopter game
  • Game B: An upside-down version of Balloon Fight physics. What should the character be?
  • Game C: A Flying Saucer that travels in a sine wave unless you press A, (I can place the sine table in the high byte of the note period table)
  • Game D: A paper airplane that always travels ±45°.
  • Game E: An electric spark that travels on one of two lanes. The A button switch between the two.
  • Game F: flappy butterfly A fluttering butterfly.


Top
 Profile  
 
 Post subject: Re: Flight Minigames
PostPosted: Mon Jun 02, 2014 1:37 pm 
Offline

Joined: Wed May 28, 2014 7:57 pm
Posts: 4
BruceMcF wrote:
Code:
;;
;    NB. A "vector pool" is a page with a set of subroutine address vectors in it.
;    The leading byte is the position in the page of the last byte in the vector.

routine1: ; the first entry in vector pool 1.
   ldx #<vector1

go_pool1:
   stx temp+5
   lda vpool1,x
   tax
-
   lda vpool1,x
   pha
   dex
   cpx temp+5
   bne -
   rts ; call subroutine vectors in turn from stack page
... {etc.}


Note that a second way to use the redundant 4 bits per address is as data.

List: .byte <op1, >op1 .AND $0F, <op2, >op2 .AND $0F, ...

Call:
A-reg: index to first address in list
X-reg: index of first address of following routine

Code:
routine1:
   lda #<lst_rtn1
   ldx #<lst_rtn2
   jmp call_vec
...

call_vec:
   sta temp+5
-
   dex
   lda vec_pool,x ; high byte address, data in high nybble
   and #$0F
   ora #$80
   pha
   dex
   lda vec_pool,x ; low byte address
   pha
   cpx temp+5
   bne -
   rts


This packs operation scripts of up to 128 routines total into a page, since the length information is entirely embedded by the call. Having an all-address pool makes it reasonably straightforward to use the high nybble of the even address as 128 nybbles of data.

I'm thinking of butterfly as something that would fly with a bit of unpredictable imprecision, so it's something where a wrap-around sequence of +/- trim values could be laid out to make it "fluttery", but various processes that could be given character with a cycle of signed 4 bit integers (-8 through +7) could work as well.

nnyb2int: ; signed high nybble to signed integer
php ; remember sign
lsr a
lsr a
lsr a
lsr a
plp
bpl +
ora #$F0
+ rts

; flutvflo = $07 for 16, 8-step cycles; $0F for 8, 16 step cycles, $1F for 4, 32 step cycles
flutvflo:
.byte $0F

inc_flutter: ; increment wrap around based on mask in flutvflo:
inc flutter
lda flutter
bit flutvflo
bne +
clc
sbc flutvlfo ; this is a trick ~ "clc / sbc N" is the same as "sec / sbc {N+1}", and the mask+1 is the wrap-around
sta flutter
+ lsr a
tax
inx
lda vec+pool,x ; data is in high nybble of high byte of address
rts

Have a good summer!


Top
 Profile  
 
 Post subject: Re: Flight Minigames
PostPosted: Mon Jul 07, 2014 7:17 pm 
Offline
Formerly 43110
User avatar

Joined: Wed Feb 05, 2014 7:01 am
Posts: 286
I been studying a self made disassembly of a old Nintendo game, and it's been an interesting experience, so I guess I'll bump this thread for the question I want to ask instead of making another thread. I hope I can get this game started again.

Suppose I do all the logic of the entire game by lots of small subroutines. Some subroutines might be used only once and inlined or be called as the last thing in another subroutine and jmp'ed to. A conditional jump might be optimized into a direct branch by rearranging code, or a conditional return might branch to a rts above the routine if the one below is to far away.

Is there some tool that can generate a call graph and automatically make these optimizations? If not, what parser can I work off of to quickly make my own optimizer? (I'm ok with python stuff)


Top
 Profile  
 
 Post subject: Re: Flight Minigames
PostPosted: Mon Jul 07, 2014 8:22 pm 
Offline

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 18202
Location: NE Indiana, USA (NTSC)
Years ago, I wrote a program in Python to make a call graph of a ca65 program. This is what the graph looked like. Should I try posting the program?


Top
 Profile  
 
 Post subject: Re: Flight Minigames
PostPosted: Tue Jul 08, 2014 7:45 am 
Offline
Formerly 43110
User avatar

Joined: Wed Feb 05, 2014 7:01 am
Posts: 286
Sure, why not.


Top
 Profile  
 
PostPosted: Wed Aug 06, 2014 1:41 pm 
Offline
Formerly 43110
User avatar

Joined: Wed Feb 05, 2014 7:01 am
Posts: 286
I've given up on this project so that I can clear my mind for the planning of another nes game, and to give time to gear myself up for the next competition.


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 28 posts ]  Go to page Previous  1, 2

All times are UTC - 7 hours


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB® Forum Software © phpBB Group