All times are UTC - 7 hours





 Post subject: Re: Koei bytecode
PostPosted: Mon Jun 12, 2017 5:40 pm 
Offline

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
Oziphantom wrote:
Why not look at and extract the 16bit KOEI titles to see if they already have the above in 16bit?


Apparently their early SNES titles (e.g. Romance of the Three Kingdoms II) do use the same bytecode. Because the virtual machine has no support for an address space larger than 16 bits, they have to constantly copy bytecode and data overlays into RAM, simulating a bankswitched 64K machine.

I had a very brief look at a couple of later SNES Koei games (Uncharted Waters 2 and Genghis Khan 2) and they both look like compiler-generated 65816 code (tons of stack-relative addressing, etc.). So it looks like Koei switched to a different compiler at some point, one that produced native 65816 rather than bytecode. It's a bit interesting that the Famicom version of Genghis Khan 2 (Aoki Ookami to Shiroki Mejika: Genchou Hishi) is interpreted while the SNES version is native 65816.


 Post subject: Re: Koei bytecode
PostPosted: Fri Jun 16, 2017 6:29 am 
Offline

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
I'm now investigating the assembly language initialization/NMI/syscall code, or "BIOS" as Koei apparently called it. In the MMC5 games there appear to be three major versions of the BIOS: one used in Nobunaga's Ambition 2, Bandit Kings, Ishin no Arashi and Rot3K2 (with minor changes per game); one used in Uncharted Waters and L'Empereur; and a final version used in Gemfire, Genghis Khan 2 and Nobunaga's Ambition 3 (aka Lord of Darkness on the SNES and Genesis).

The US versions of Nobunaga's Ambition 2, Bandit Kings and Rot3K2 are actually kind of hybrids between BIOS version 1 and 2: the RAM addresses they use are version 1 (a ton of variables got shuffled in and out of zero page between the versions--I guess someone did some profiling :)) but a number of code changes from version 2 were backported in. The backported changes appear to be mainly sound related--version 1 uses the MMC5 channels exclusively for melodic sound effects, but of course those channels aren't usable on a stock NES, so they had to backport the version 2 sound engine which can share channels between BGM and effects in order to move those sound effects to the built-in APU channels.


 Post subject: Re: Koei bytecode
PostPosted: Wed Aug 02, 2017 10:46 am 
Offline

Joined: Thu Oct 13, 2005 10:39 am
Posts: 66
oh, lol... by coincidence I've been doing the same RE work for a couple of weeks already, but hadn't seen this topic until now...
I was going to write some kind of documentation too, but it seems that's not needed anymore. the info here is already complete, better than I could write myself. I even got some fixes for my custom disasm scripts from it...

as a side note, I'd like to mention that KOEI's C compiler (strong evidence that this is actually C code is here: https://tcrf.net/Aerobiz_Supersonic_(SNES)) has been able to produce native 6502 code as well as bytecode since at least 1988. it's not only the SNES version of Uncharted Waters 2 that has this autogenerated native code, but almost every NES title. most of the native code you can see in KOEI's NES games is autogenerated, except the interpreter itself, the BIOS and some helpers. the rest is for sure autogenerated, maybe with some later manual polishing... it looks like they wanted to make some portions of the code run faster, or even did some profiling to find the most frequently used functions. it seems they just couldn't compile all the code to native 6502 since it takes a lot more space. the game with the most autogenerated native code is L'Empereur, where almost half of the scripts are compiled as native code.

if anyone is interested, I can provide some practical tools to decompile these scripts (IDA loader + scripts). I doubt I will recreate KOEI's compiler, but at least I can provide a way to reassemble the existing code with all links and functions. using some asm macros, you could rewrite all the bytecode as nice-looking, readable and editable sources. maybe useful for those who want to translate some Japanese-only games (secrets and unused debug stuff already found by me lol, the main reason I started doing all this myself)...


 Post subject: Re: Koei bytecode
PostPosted: Wed Aug 02, 2017 7:19 pm 
Offline

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
CaH4e3 wrote:
if anyone is interested, I can provide some practical tools to decompile these scripts (IDA loader + scripts). I doubt I will recreate KOEI's compiler, but at least I can provide a way to reassemble the existing code with all links and functions. using some asm macros, you could rewrite all the bytecode as nice-looking, readable and editable sources. maybe useful for those who want to translate some Japanese-only games (secrets and unused debug stuff already found by me lol, the main reason I started doing all this myself)...


I don't own or use IDA (I've written my own extensible disassembler instead) but I'm interested in seeing your decompiler script anyway.

Did you discover if the apparently redundant bytes in the stack frame are used for anything?


 Post subject: Re: Koei bytecode
PostPosted: Thu Aug 03, 2017 11:12 am 
Offline

Joined: Thu Oct 13, 2005 10:39 am
Posts: 66
so, the scripts are mostly useless for you without IDA. they depend heavily on my other custom disasm scripts. but I may try to make a demo of what they do and what I can do with them...

and no, I haven't dug deep enough into the stack to check for any oddities ;) I'm sure any oddities are just genuine "bugs" or redundancy in the compiler. the bioscall always copies 22 bytes of stack into the arguments buffer, even if there are only one or two args (so maybe that's why they made some fast versions of the bios call that use only the first arg, to switch banks or something..)


 Post subject: Re: Koei bytecode
PostPosted: Thu Aug 03, 2017 4:21 pm 
Offline

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
CaH4e3 wrote:
so, the scripts are mostly useless for you without IDA. they depend heavily on my other custom disasm scripts. but I may try to make a demo of what they do and what I can do with them...

and no, I haven't dug deep enough into the stack to check for any oddities ;) I'm sure any oddities are just genuine "bugs" or redundancy in the compiler. the bioscall always copies 22 bytes of stack into the arguments buffer, even if there are only one or two args (so maybe that's why they made some fast versions of the bios call that use only the first arg, to switch banks or something..)


Even if I can't run your scripts without IDA, I can port them to my own disassembler :)

Well, I've found one use for the "hole" at frameptr+8. Your tip that there was also compiler-generated native code was just what I needed. Take a look at the routine at 1F/E24B in Ishin no Arashi, 1F/E37A in Rot3K2 or 1F/E2E3 in L'Empereur (the same routine exists in BKoAC and probably most/all of the games, but it's not always in the fixed bank. I guess the games that have it in the fixed bank are the ones with the most compiler-generated native code)

The purpose of this routine seems to be to create a bytecode-compatible stack frame for a native-code function, and to clean up that stack frame when that function exits (it does some 6502 stack manipulation to wedge its cleanup code after the RTS of the "wrapped" function). Anyway, this wrapper routine uses the byte at frameptr+8 to store the number of bytes it has copied from $0080 to the bytecode stack, in order to restore that number of bytes back to $0080 when the wrapped native-code function exits.

It looks like compiler-generated native code uses $0080~ as local variable storage, and each native-code function uses this wrapper to preserve its caller's local variables (if any) and to allocate stack space for the arguments it needs to pass to any functions that it calls.
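
The save/restore behaviour described above can be modelled roughly as follows. This is only an illustrative Python sketch, not a transcription of the routine: the `save_area` address, the flat `ram` array and the function names are assumptions made for the example; only the $0080 base and the count byte at frameptr+8 come from the description.

```python
# Illustrative model of the native-code call wrapper described above.
# save_area, ram layout and all names are assumptions, not disassembly results.

FAST_LOCALS = 0x80        # zero-page base used by compiler-generated native code
SAVED_COUNT_OFFSET = 8    # the "hole" at frameptr+8

def call_wrapped(ram, frameptr, save_area, n, native_func):
    """Save n bytes from $0080 to the stack, run the function, restore them."""
    ram[frameptr + SAVED_COUNT_OFFSET] = n       # remember how many bytes were saved
    for i in range(n):                           # entry: $0080.. -> bytecode stack
        ram[save_area + i] = ram[FAST_LOCALS + i]
    native_func(ram)                             # the callee is free to clobber $0080..
    count = ram[frameptr + SAVED_COUNT_OFFSET]
    for i in range(count):                       # exit: bytecode stack -> $0080..
        ram[FAST_LOCALS + i] = ram[save_area + i]

def demo():
    ram = bytearray(0x800)
    ram[0x80:0x84] = b"\x11\x22\x33\x44"         # caller's fast locals
    def callee(mem):
        mem[0x80:0x84] = b"\xAA\xBB\xCC\xDD"     # callee overwrites them
    call_wrapped(ram, frameptr=0x300, save_area=0x320, n=4, native_func=callee)
    return bytes(ram[0x80:0x84])
```

In this model `demo()` ends with the caller's four bytes back at $0080, which is the point of storing the count in the frame: the cleanup code needs no other record of how much was saved.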


 Post subject: Re: Koei bytecode
PostPosted: Fri Aug 04, 2017 10:07 am 
Offline

Joined: Thu Oct 13, 2005 10:39 am
Posts: 66
AWJ wrote:
Even if I can't run your scripts without IDA, I can port them to my own disassembler :)


enjoy https://pastebin.com/sMW8wLXS
I commented some portions of the code; the rest is my own or system library functions...

all NES KOEI games have a full list of bytecode procedures and can be disassembled automatically using my scripts and IDA ;). but the offsets are for my own loader; they have no correlation with the real offsets except for the lower bits...

some opcodes I didn't reverse deeply and only nailed down briefly; those I took from your description (noted in the comments). bitfield opcodes are never used in the NES bytecode (as far as I can see), nor are the relative branches... so I just haven't bothered with them for now..

AWJ wrote:
Well, I've found one use for the "hole" at frameptr+8. Your tip that there was also compiler-generated native code was just what I needed. Take a look at the routine at 1F/E24B in Ishin no Arashi, 1F/E37A in Rot3K2 or 1F/E2E3 in L'Empereur (the same routine exists in BKoAC and probably most/all of the games, but it's not always in the fixed bank. I guess the games that have it in the fixed bank are the ones with the most compiler-generated native code)

The purpose of this routine seems to be to create a bytecode-compatible stack frame for a native-code function, and to clean up that stack frame when that function exits (it does some 6502 stack manipulation to wedge its cleanup code after the RTS of the "wrapped" function). Anyway, this wrapper routine uses the byte at frameptr+8 to store the number of bytes it has copied from $0080 to the bytecode stack, in order to restore that number of bytes back to $0080 when the wrapped native-code function exits.

It looks like compiler-generated native code uses $0080~ as local variable storage, and each native-code function uses this wrapper to preserve its caller's local variables (if any) and to allocate stack space for the arguments it needs to pass to any functions that it calls.


this is actually a native version of the bytecode exec procedure. it is used at the start of every native procedure, the same way as for bytecode procedures, but instead this function executes real 6502 asm and then returns to the caller. it does the same thing as for bytecode procedures: it prepares the stack, then executes native code instead of bytecode. it takes the same 16-bit signed value to allocate the procedure's local variable buffer, but has an extra parameter byte, which gives the number of VM stack bytes to back up to the $80 buffer. that byte is what gets stored at your stack offset 8. but I haven't seen any function use a value other than 0 here. L'Empereur and Uncharted Waters never use this byte for sure; it's always 0, so the copy to the $80 buffer is never used there. I doubt it's used anywhere else... so this is surely a redundant leftover of some planned but never used compiler feature for saving some stack parameters..


 Post subject: Re: Koei bytecode
PostPosted: Fri Aug 04, 2017 8:13 pm 
Offline

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
Thanks to your script I see what the return address at frameptr+9 is used for, too. It isn't used by the bytecode interpreter but it is used by the native-code function call wrapper (the routine that's approximately equivalent to bytecode opcode $E9). I guess they wanted to make the stack frames identical between bytecode and native-code functions to make debugging easier.

A funny thing about that routine is that the "stack adjust amount" parameter that it takes includes the return address! So even function calls with no arguments need to use a value of 2.
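
As a small worked example of that rule (a sketch only; the function name and the idea of passing argument sizes as a list are made up for illustration):

```python
# Sketch of the "stack adjust amount" rule: the 2-byte return address is
# always counted, so the value is never less than 2.
RETURN_ADDRESS_SIZE = 2

def stack_adjust(arg_sizes):
    """Bytes to adjust the stack by for a call with the given argument sizes."""
    return RETURN_ADDRESS_SIZE + sum(arg_sizes)
```

With this, a call with no arguments gives `stack_adjust([])` equal to 2, and a call with two word-sized arguments gives 6.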

Quote:
this is actually a native version of the bytecode exec procedure. it is used at the start of every native procedure, the same way as for bytecode procedures, but instead this function executes real 6502 asm and then returns to the caller. it does the same thing as for bytecode procedures: it prepares the stack, then executes native code instead of bytecode. it takes the same 16-bit signed value to allocate the procedure's local variable buffer, but has an extra parameter byte, which gives the number of VM stack bytes to back up to the $80 buffer. that byte is what gets stored at your stack offset 8. but I haven't seen any function use a value other than 0 here. L'Empereur and Uncharted Waters never use this byte for sure; it's always 0, so the copy to the $80 buffer is never used there. I doubt it's used anywhere else... so this is surely a redundant leftover of some planned but never used compiler feature for saving some stack parameters..


It's actually the other way around: on function entry it copies n bytes from $80 to the stack, and on function exit it copies them from the stack back to $80.

I thought that feature might be used for local variables that are pointers (and thus have to be in zero page) but I guess the compiler-generated native code just copies pointer variables from the stack to scratch space (e.g. $08-$14) upon use, rather than keeping them in zero page for the entirety of a function?

I've reverse engineered the entire MMC5 BIOS, all three versions (except a couple of version-1-only syscalls that I can't figure out and might actually be incomplete/nonfunctional). Are you interested in a description or have you already done it yourself? I've also reverse engineered the sound program if anyone is interested (the one used in Mahjong Taikai, Famicom Top Management and all the MMC5 games--the first three MMC1 games have a totally different sound program which I've barely looked at).


 Post subject: Re: Koei bytecode
PostPosted: Sun Aug 06, 2017 1:10 pm 
Offline

Joined: Thu Oct 13, 2005 10:39 am
Posts: 66
AWJ wrote:
It's actually the other way around: on function entry it copies n bytes from $80 to the stack, and on function exit it copies them from the stack back to $80.

I thought that feature might be used for local variables that are pointers (and thus have to be in zero page) but I guess the compiler-generated native code just copies pointer variables from the stack to scratch space (e.g. $08-$14) upon use, rather than keeping them in zero page for the entirety of a function?

yeah, you're right. but it seems more like an attempt to have a separate set of bytecode registers for native procedures, which overwrite those from the procedural stack. if used, they could run two separate native procedures with the same set of registers... maybe for cases when some other in-between procedure changes them somehow... but anyway, it's never used

AWJ wrote:
I've reverse engineered the entire MMC5 BIOS, all three versions (except a couple of version-1-only syscalls that I can't figure out and might actually be incomplete/nonfunctional) Are you interested in a description or have you already done it yourself? I've also reverse engineered the sound program if anyone is interested (the one used in Mahjong Taikai, Famicom Top Management and all the MMC5 games--the first three MMC1 games have a totally different sound program which I've barely looked at)


I nailed some functions for some games of my personal interest but maybe someone will like to see it too ;)


 Post subject: Re: Koei bytecode
PostPosted: Sun Aug 06, 2017 10:06 pm 
Offline

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
After disassembling some native-code routines in Ishin no Arashi, I understand some things I didn't understand before. In the compiler-generated native code, zero page address $06-$07 (which in the interpreter is the instruction pointer) is used as a second frame pointer instead, and contains the value frameptr - fastlocals - 256. The "- 256" is because the 6502 doesn't support indexed addressing with a negative index. Local variables on the stack are addressed using ($06),Y where Y = #$FE for the first (word-sized) local, #$FC for the second local, etc. I was wondering why the stack frame setup did DEC $07 before jumping to the wrapped code and INC $07 after returning from it, but now I see why.
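
The address arithmetic works out as in this Python sketch. The function names and the sample values (frameptr = $0400, 4 bytes of fast locals) are assumptions chosen for illustration; the formula and the Y values are the ones given above.

```python
# Sketch of the "second frame pointer" arithmetic for native code.

def native_frame_base(frameptr, fastlocals):
    """Value kept in $06-$07: frameptr - fastlocals - 256 (16-bit wrap)."""
    return (frameptr - fastlocals - 256) & 0xFFFF

def local_address(base, y):
    """Effective address of ($06),Y: the base plus an unsigned 8-bit index."""
    return (base + (y & 0xFF)) & 0xFFFF

def nth_local_y(n):
    """Y index of the n-th word-sized local (n = 0 gives $FE, n = 1 gives $FC)."""
    return 0x100 - 2 * (n + 1)

base = native_frame_base(0x0400, 4)           # $02FC
first = local_address(base, nth_local_y(0))   # $03FA = frameptr - fastlocals - 2
second = local_address(base, nth_local_y(1))  # $03F8 = frameptr - fastlocals - 4
```

Biasing the base by 256 lets a positive Y in the range $FE, $FC, ... reach addresses just below frameptr - fastlocals, which is what a negative index would have done.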

ETA: Bandit Kings of Ancient China does use the "fast local variables at $80" feature; here's an example of a function that uses it (output from my disassembler):

Code:
14/AAE0: 20 99 B1      jsr CreateStackFrame
14/AAE3: 04 26 FF      frame 4,-218
14/AAE6: A0 0B         ldy #$0B
14/AAE8: B1 04         lda (frameptr),y
14/AAEA: 85 80         sta $80
14/AAEC: C8            iny
14/AAED: B1 04         lda (frameptr),y
14/AAEF: 85 81         sta $81
14/AAF1: C8            iny
14/AAF2: B1 04         lda (frameptr),y
14/AAF4: 85 82         sta $82
14/AAF6: C8            iny
14/AAF7: B1 04         lda (frameptr),y
14/AAF9: 85 83         sta $83
14/AAFB: 8A            txa
14/AAFC: 48            pha
14/AAFD: 48            pha
14/AAFE: 48            pha
(snip...)
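
For reference, the three inline bytes after the `jsr` in this listing (04 26 FF) decode to the `frame 4,-218` operands like this. This is a Python sketch; reading them as one unsigned count byte followed by a little-endian signed word is inferred from the listing and from the earlier description of the wrapper's parameters, and the function name is made up.

```python
import struct

def decode_frame_operand(raw):
    """Split the 3 inline bytes into (fast-local byte count, signed 16-bit size)."""
    fast_local_bytes, local_size = struct.unpack("<Bh", raw)
    return fast_local_bytes, local_size

operands = decode_frame_operand(bytes([0x04, 0x26, 0xFF]))
```

The count of 4 lines up with the four stores to $80-$83 that follow in the listing.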


 Post subject: Re: Koei bytecode
PostPosted: Mon Aug 07, 2017 11:19 am 
Offline

Joined: Thu Oct 13, 2005 10:39 am
Posts: 66
AWJ wrote:
ETA: Bandit Kings of Ancient China does use the "fast local variables at $80" feature; here's an example of a function that uses it (output from my disassembler):


I see native code uses it as vars/regs as well

Code:
BANK14:A61B          loc_20661B:                             ; CODE XREF: BANK14:A64Bj
BANK14:A61B                                                  ; BANK14:loc_20669Cj
BANK14:A61B 18                       CLC                     ; *a
BANK14:A61C A9 07                    LDA     #7              ; *a
BANK14:A61E 65 80                    ADC     byte_80         ; *a
BANK14:A620 85 80                    STA     byte_80         ; *a
BANK14:A622 8A                       TXA                     ; *a
BANK14:A623 65 81                    ADC     byte_81         ; *a
BANK14:A625 85 81                    STA     byte_81         ; *a
BANK14:A627
BANK14:A627          loc_206627:                             ; CODE XREF: BANK14:A618j
BANK14:A627 A1 80                    LDA     (byte_80,X)     ; *a
BANK14:A629 A0 01                    LDY     #1              ; *a
BANK14:A62B 11 80                    ORA     (byte_80),Y     ; *a
BANK14:A62D F0 03                    BEQ     loc_206632      ; *a
BANK14:A62F 4C 4E A6                 JMP     loc_20664E      ; *a

it seems most of the native code can be decompiled the same way as the bytecode, except maybe for some optimizations... like they just unroll all the bytecode into native asm representing every single command, maybe with some opts
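
As a rough illustration of that idea, the listing above from A61B to A62D reads like a 16-bit "pointer += 7, then test the word it points to" sequence. A Python sketch of it follows, under the assumption (not confirmed by the listing) that X holds 0 at this point; the function name is invented.

```python
def advance_and_test(ram, x=0):
    """Model of A61B-A62D: add 7 to the pointer at $80/$81, test the word it points to.

    Assumes X = 0, so TXA supplies a zero high byte and ($80,X) reads via $80.
    """
    lo = 7 + ram[0x80]                    # CLC / LDA #7 / ADC $80
    ram[0x80] = lo & 0xFF                 # STA $80
    hi = x + ram[0x81] + (lo >> 8)        # TXA / ADC $81 (carry from the low byte)
    ram[0x81] = hi & 0xFF                 # STA $81
    ptr = ram[0x80] | (ram[0x81] << 8)
    # LDA ($80,X) / LDY #1 / ORA ($80),Y / BEQ: branch if the 16-bit word is zero
    word_is_zero = (ram[ptr] | ram[(ptr + 1) & 0xFFFF]) == 0
    return ptr, word_is_zero
```

Note that the listing also has a second entry point at A627 that skips the addition; the sketch only models the fall-through path from A61B.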


 Post subject: Re: Koei bytecode
PostPosted: Mon Aug 28, 2017 10:39 am 
Offline

Joined: Mon Nov 10, 2008 3:09 pm
Posts: 431
Heehee... can you find the bugs in these implementations of the C library functions abs() and strlen()? Hints: stackptr+2 points to the first argument on the stack, and left holds the (16-bit) function return value (meaning left+2 and so forth are effectively scratch space)

Code:
abs:
    ldy #2
    lda (stackptr),y
    sta left
    iny
    lda (stackptr),y
    sta left+1
    bpl done
    eor #$FF
    sta left+1
    lda left
    eor #$FF
    clc
    adc #1
    sta left
done:
    rts


Code:
strlen:
    ldy #3
    lda (stackptr),y
    sta left+3
    dey
    lda (stackptr),y
    sta left+2
    ldy #0
    sty left
    sty left+1
loop:
    lda (left+2),y
    beq done
    iny
    bne loop
    inc left+1
    jmp loop
done:
    sty left
    rts


I'd be surprised if the abs() bug doesn't impact game logic in at least one game, most likely in something non-obvious like AI (not all the games contain all the library functions, but the games that have abs() do call it...)


 Post subject: Re: Koei bytecode
PostPosted: Mon Aug 28, 2017 3:35 pm 
Offline

Joined: Thu Oct 13, 2005 10:39 am
Posts: 66
Abs does not carry into the high byte of the negative value after inverting it. Strlen does not increment the high byte of the string pointer the way it does for the result. But it seems they don't have many strings longer than 256, or they rarely need strlen at all...
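
Both bugs can be reproduced with a small Python model of the two routines as listed. This is a sketch: the flat 64K `ram` array, the function names and the iteration `limit` guard are assumptions added for the example, not part of the original code.

```python
def koei_abs(value):
    """Model of the listed abs(): the carry from the +1 never reaches the high byte."""
    lo, hi = value & 0xFF, (value >> 8) & 0xFF
    if hi & 0x80:                          # BPL not taken: value is negative
        hi ^= 0xFF                         # EOR #$FF on the high byte
        lo = ((lo ^ 0xFF) + 1) & 0xFF      # invert low byte, add 1, carry is dropped
    return (hi << 8) | lo

def koei_strlen(ram, ptr, limit=0x10000):
    """Model of the listed strlen(): Y wraps, but the pointer high byte never advances."""
    y, result_hi = 0, 0
    for _ in range(limit):                 # guard so the model itself cannot hang
        if ram[(ptr + y) & 0xFFFF] == 0:   # LDA (left+2),Y / BEQ done
            return (result_hi << 8) | y
        y = (y + 1) & 0xFF                 # INY
        if y == 0:
            result_hi = (result_hi + 1) & 0xFF   # INC left+1 only
    return None                            # stands in for "never returns"
```

In this model `koei_abs(0xFF00)` (that is, -256) returns 0 instead of 256, and any negative multiple of 256 comes out 256 too small. `koei_strlen` is correct for short strings, but if there is no terminator within the first 256 bytes it keeps re-reading the same 256 bytes and never finds one.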

