It is currently Fri Nov 16, 2018 12:55 am

All times are UTC - 7 hours





Post new topic Reply to topic  [ 32 posts ]  Go to page 1, 2, 3  Next
Author Message
PostPosted: Mon Aug 12, 2013 9:10 am 
Offline
User avatar

Joined: Wed Jun 26, 2013 12:35 pm
Posts: 116
Location: Baltimore
So I've been developing a bytecode interpreter for the NES to solve a few problems. I'm using it to do r&d. Here's why I did it:

1) 6502 is hard - too hard to try out ideas on or get something running quickly.

2) C is not much easier and not space-efficient at all.

3) NES has tight storage, so space-maximizing measures are advantageous

I'd been a fan of Forth for several years and became a minor expert in it. I thought it was the perfect solution to primarily the problem of difficulty of coding. Forth simplifies things a lot, at the cost of speed. The savings in size are almost a side benefit, but they are substantial.

What I settled on is a virtual machine with 8-bit instructions and an 8-bit stack. It's almost fully functional - still tweaking the instruction set. I optimized it as best I could - the estimated speed is about 600 900 instructions on average per frame. This is enough for rapid prototyping of ideas ... but also useful where speed is not critical, like decompression. I estimate that the size savings it provides over plain asm is about 30%. That ratio, as well as the speed can be improved with peephole optimization (when the compiler combines commonly used pairs of "words" into faster hybrids)

A "compiler" coded in SwiftForth (PC) will generate asm6 assembly code. It should run on the trial version.

I thought I might create a thread to let people know what I'm doing, gauge interest, maybe get help with any issues I run into or just hear ideas and answer questions.

8/15/2013
I threw the runtime source up in a Google Code project - https://code.google.com/p/eighth6502/ sorry no docs, compiler, or example yet

8/18/2013
Updated OP to be a little more general and up-to-date

_________________
sonder


Last edited by sonder on Fri Aug 23, 2013 9:11 am, edited 4 times in total.

Top
 Profile  
 
 Post subject: Re: Bytecode Interpreter
PostPosted: Mon Aug 12, 2013 5:33 pm 
Offline
User avatar

Joined: Wed Jun 26, 2013 12:35 pm
Posts: 116
Location: Baltimore
I had a eureka moment and figured out a way to speed it up by 1.5x using a tiny bit of self-modifying code, while also saving a significant amount of ROM space. Like hundreds of bytes.

An additional improvement is it can now call arbitrary asm, so asm and bytecode can be freely mixed. I am looking to add 16-bit math next.

The only thing that needs doing now is testing it. I spent a long time debugging the first version, and I'm not really keen on doing that again, so I'd love it if anyone was interested in maybe helping beta test it.

To give an idea of how optimized this is... here is the heart of it, the "inner loop"

Code:
_vmnext:
   ldx   $0000, y        ; 4
   iny                   ; 2
   stx   VMNEXT_JMP      ; 3
   jmp   opcode_page     ; 3
                         ; +3 for the jmp to this routine = 15


VMNEXT_JMP is actually the ZP address of the lower byte of the literal within the next instruction, only one page of ROM used for opcode implementations. one of the virtual registers IPBASE is actually the 2nd and 3rd bytes of the first instruction in this listing.

_________________
sonder


Last edited by sonder on Mon Aug 12, 2013 7:57 pm, edited 3 times in total.

Top
 Profile  
 
 Post subject: Re: Bytecode Interpreter
PostPosted: Mon Aug 12, 2013 6:01 pm 
Offline
User avatar

Joined: Fri Nov 19, 2004 7:35 pm
Posts: 4105
Ever read about Steve Wozniak's Sweet16 interpreter?

_________________
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!


Top
 Profile  
 
 Post subject: Re: Bytecode Interpreter
PostPosted: Mon Aug 12, 2013 6:23 pm 
Offline
User avatar

Joined: Wed Jun 26, 2013 12:35 pm
Posts: 116
Location: Baltimore
Dwedit wrote:
Ever read about Steve Wozniak's Sweet16 interpreter?


Neato - no I hadn't heard of that. It isn't practical to have a full 16-bit VM for realtime code (rather I'm going to just have some extra instructions for doing a couple 16-bit operations) but I will have to look into this later for ideas or code. Thanks.

Btw... I noticed your alias... Dan Weiss right? We've collabed before. :)

_________________
sonder


Top
 Profile  
 
 Post subject: Re: Bytecode Interpreter
PostPosted: Wed Aug 14, 2013 9:47 pm 
Offline
User avatar

Joined: Mon Dec 07, 2009 11:08 am
Posts: 95
Location: USA
Thanks for the updates sonder. Is your implementation living somewhere out on the web for perusal? I am interested in checking it out.


Top
 Profile  
 
 Post subject: Re: Bytecode Interpreter
PostPosted: Wed Aug 14, 2013 10:00 pm 
Offline
User avatar

Joined: Wed Jun 26, 2013 12:35 pm
Posts: 116
Location: Baltimore
Update:

During the past couple days I debugged it and optimized it more. Comparison tests, if/then, for loops and while/until control structures are supported and functioning perfectly. A very small set of opcodes and routines for working with 16-bit numbers were added.

On the testing side of things I was able to get significantly more game features going on in roughly same number of cycles with tons to spare (currently using around 1/4 of the frame. About half of that is just the metasprite rendering (about 16 sprites), which is asm, for perspective. One nice thing is calling arbitrary asm routines is pretty fast so they are not limited to being behemoths to justify their existence. Surprisingly, because the interpreter is so fast, a lot of the time trying to squeeze out more speed by turning something into asm doesn't actually buy much. But as the complexity of a bytecode routine grows, the benefit of turning it into asm increases by a lot. So it is still best suited to stuff that can be slow, or simple high-level logic.

Ready for some example code? Here is an excerpt, just really basic moving a guy around. Doing it by hand for now, since the interpreter was basically rehauled and requires a major update to the compiler, but it's actually not too bad.

Code:
do_player_movement:

   mcallasm andcase
   .db PAD_LEFT
mif +then
   mcallasm xfetch
   mdouble $FFF0
   .db <dplus
   mcallasm xstore
   .db <drop,<exit
+then

   mcallasm andcase
   .db PAD_RIGHT
mif +then
   mcallasm xfetch
   mdouble $0010
   .db <dplus
   mcallasm xstore
   .db <drop,<exit
+then

   mcallasm andcase
   .db PAD_UP
mif +then
   mcallasm yfetch
   mdouble $FFF0
   .db <dplus
   mcallasm ystore
   .db <drop,<exit
+then

   mcallasm andcase
   .db PAD_DOWN
mif +then
   mcallasm yfetch
   mdouble $0010
   .db <dplus
   mcallasm ystore
+then
   .db <drop,<exit

doplayer:
   .db <zero                     ; push a zero
   mcallasm padstate
   mcall do_player_movement
   mcallasm screenxy
   mdouble ms_player_idle_fwd
   mcallasm metaspr
   .db <oplus                    ; go to next object
   .db <exit


Once the compiler is done this would be how the same code will look in Forth-style syntax:

Code:

: do_player_movement  ( padstate -- padstate )
   andcase b: PAD_LEFT if
      xfetch $FFF0 d+ xstore exit
   then
   andcase b: PAD_RIGHT if
      xfetch $0010 d+ xstore exit
   then
   andcase b: PAD_UP if
      yfetch $FFF0 d+ ystore exit
   then
   andcase b: PAD_RIGHT if
      yfetch $0010 d+ ystore
   then ;

: doplayer
   0 padstate do_player_movement drop
   screenxy ms_player_idle_fwd metaspr
   o+ ;


_________________
sonder


Top
 Profile  
 
 Post subject: Re: Bytecode Interpreter
PostPosted: Wed Aug 14, 2013 10:04 pm 
Offline
User avatar

Joined: Wed Jun 26, 2013 12:35 pm
Posts: 116
Location: Baltimore
dullahan wrote:
Thanks for the updates sonder. Is your implementation living somewhere out on the web for perusal? I am interested in checking it out.


No not yet. It just needs to be taken out of the game project folder and cleaned up. I will do that tomorrow since it's in a good working state and throw it in a SVN repository.

_________________
sonder


Top
 Profile  
 
 Post subject: Re: Bytecode Interpreter
PostPosted: Thu Aug 15, 2013 8:52 am 
Offline
User avatar

Joined: Wed Jun 26, 2013 12:35 pm
Posts: 116
Location: Baltimore
Just made the code available, see OP.

_________________
sonder


Top
 Profile  
 
 Post subject: Re: Bytecode Interpreter
PostPosted: Fri Aug 16, 2013 9:36 pm 
Offline
User avatar

Joined: Wed Jun 26, 2013 12:35 pm
Posts: 116
Location: Baltimore
Well the desire for speed and more efficient nesting has led to yet another "rehaul" (though not as big as the last one) and I am thinking of moving to 16-bit "opcodes" (really 16-bit direct threaded code, classic Forth)

I discovered a way to do the inner loop for threaded code (basically lists of addresses) that is probably the fastest possible. It's 20 cycles so a bit slower, but, now there is absolutely no penalty for calling arbitrary asm where as before it was ~30 (on top of the 15 cycles for the inner loop that's 45 cycles just to execute something as simple as fetching an object property). As a side bonus nesting interpreted routines is a little faster. But fetching literals is almost twice as slow.

The size savings would be likely less than bytecode but not that bad. This is because of how often custom routines are called. Because all calls are now 2 bytes instead of some being 1 and some being 3, it nearly balances out. I probably don't care about size as much as before because I've moved from NROM to UNROM.

I originally envisioned a bytecode interpreter because I wanted to also use it for data compression but it's looking like I might end up doing what would have seemed counter-intuitive before my pursuit of speed started - two interpreters one for reasonably fast Forth scripting and one specifically tailored for size over speed. The bytecode interpreter may actually become a slower, limited subset of the DTC one, relying on a lot of the same code and variables just a different separate inner loop. The idea was to set it up so the opcodes are extensible, to for instsance specify data using basically simple inline scripts within the map data.

_________________
sonder


Top
 Profile  
 
 Post subject: Re: Bytecode Interpreter
PostPosted: Sat Aug 17, 2013 6:09 am 
Offline
User avatar

Joined: Fri Nov 19, 2004 7:35 pm
Posts: 4105
By the way, how is Forth as a programming language? I've never tried it myself. Seems like it uses RPN extensively, but how different would it be from Basic or C if it used infix notation?

_________________
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!


Top
 Profile  
 
 Post subject: Re: Bytecode Interpreter
PostPosted: Sat Aug 17, 2013 7:09 am 
Offline
User avatar

Joined: Wed Jun 26, 2013 12:35 pm
Posts: 116
Location: Baltimore
Dwedit wrote:
By the way, how is Forth as a programming language? I've never tried it myself. Seems like it uses RPN extensively, but how different would it be from Basic or C if it used infix notation?


It is slower than C (in Eighth's case probably 2x), but faster than Basic. Its big draw is ease of implementation, but it does also have the benefit of teaching you to be a better coder if you stick with it. Mostly because of how the requirement to learn to factor well to effectively use it beefs up your abstraction practices.

It's like C in that it grants you a powerful gun with which to shoot yourself, but like Basic in that it is doesn't have nearly as much punctuation and required boilerplate.

It has no types, which can be seen as a benefit. The one type is the cell - the width of the data stack. In my interpreter Eighth, that's just 8 bits. Any others are derivative and managed by you. This isn't hard if you just notate things using stack diagrams or use certain name conventions (e.g. "<name>?" returns either 0 or -1). Address and 16-bit math are accomplished with "doubles", pairs of values on the stack. You learn not to think in more than a handful of types which are fluid and morph into each other frequently.

It can be implemented in a couple days, like I did. The runtime portion anyway. Doing a complete system takes a couple weeks to months.

It lets routines return multiple values for no extra work. You don't have to name them, just push them on the stack.

It's eminently extensible. Want a custom control structures, inline data (like strings), or custom compiler keywords? Trivial. Many language features can be added, but you get the advantage of being able to tailor it specifically to your app. Many things can't be easily added in full (like C-like structs) but you learn to work around, for instance by using its equivalent of namespaces, or prefixes. Or figuring out some other way to do what you want to do in the end entirely.

Being fully concatenative it lends itself to very natural-reading code (admittedly in your hands.) It frees you to think in broad strokes when reading code, instead of poring through it line-by-line like a computer. You pay more attention to the structure and logic of the program than the nitty gritty. Oftentimes rewriting subroutines or entire modules from scratch rather than trying to shoehorn in kludgy tweaks. Because ideally they should be really short and clean. This gave it its "write-only" rep but I disagree I think it is very readable, just not always easy to make changes to a complex routine, which would be your fault anyway. Keeping your subroutines simple (you're supposed to use lots of subroutines) frees you from having to think like a compiler. (Unfortunately Eighth's subroutines are currently kind of expensive...)

Not requiring named parameters makes code really fluid. Bits and pieces can be easily lifted out and turned into subroutines far easier than any other language I've used (I've heard Lisp is similar though). That contributes to the fun factor. Local named variables/params are supported in fancier Forths.

So TL;DR, actually, Forth is really, really different than both C and Basic. In my opinion it's both easier and more powerful if your goal is writing a program fast. The fundamentals can be learned in a day, but mastering it takes time. You have to unlearn a lot of assumptions and learn all the tricks. In the meantime writing simple scripts should be a cinch for just about anyone - in the old days they'd give customers (like scientists and businesspeople) app-specific vocabularies and let them write scripts (which were really just Forth programs!) all the time.

_________________
sonder


Top
 Profile  
 
 Post subject: Re: Bytecode Interpreter
PostPosted: Sat Aug 17, 2013 7:25 am 
Offline
User avatar

Joined: Wed Jun 26, 2013 12:35 pm
Posts: 116
Location: Baltimore
sonder wrote:
Well the desire for speed and more efficient nesting has led to yet another "rehaul" (though not as big as the last one) and I am thinking of moving to 16-bit "opcodes" (really 16-bit direct threaded code, classic Forth)


Right now I'm not sure if it'll be worth the effort to implement 16-bit DTC (direct threaded code). Free arbitary asm, 1.5x faster subroutines, and much faster for/nextare the advantages. The big disadvantage is actually that literals are slow. More than 2x actually. And they are ubiquitous, so that's no bueno. It's totally unclear how it balances out right now, if there's enough overall speed gain to justify. This may be a future upgrade. Kind of tired of rehauling this thing.

_________________
sonder


Last edited by sonder on Sun Aug 18, 2013 10:35 pm, edited 1 time in total.

Top
 Profile  
 
 Post subject: Re: Bytecode Interpreter
PostPosted: Sun Aug 18, 2013 10:29 am 
Offline
User avatar

Joined: Wed Jun 26, 2013 12:35 pm
Posts: 116
Location: Baltimore
sonder wrote:
sonder wrote:
Well the desire for speed and more efficient nesting has led to yet another "rehaul" (though not as big as the last one) and I am thinking of moving to 16-bit "opcodes" (really 16-bit direct threaded code, classic Forth)


Right now I'm not sure if it'll be worth the effort to implement 16-bit DTC (direct threaded code). Free arbitary asm, 2x faster subroutines, and much faster for/next are the advantages. The big disadvantage is actually that literals are slow. More than 2x actually. And they are ubiquitous, so that's no bueno. It's totally unclear how it balances out right now, if there's enough overall speed gain to justify. This may be a future upgrade. Kind of tired of rehauling this thing.


Just did a quick analysis of the test script and literals are far less common than callasm's and call's. Moving to 16-bit DTC would eliminate 903 cycles (8 scanlines) of overhead. (originally 1759 - 16 scanlines - so around 2x faster)

That was a naive assumption. I just spent an hour creating a DTC version of the interpreter. I'll post actual test results soon.

_________________
sonder


Top
 Profile  
 
 Post subject: Re: Bytecode Interpreter
PostPosted: Sun Aug 18, 2013 10:32 pm 
Offline
User avatar

Joined: Wed Jun 26, 2013 12:35 pm
Posts: 116
Location: Baltimore
The speed increase, as I thought, wasn't as dramatic as I originally hoped. Debugging the DTC version was a BITCH due to the instruction pointer being pre-incrementing now. But the savings in the test program after everything was about 4 scanlines. I tried to get literals down as much as I could, but I could only get them down to 36 cycles, down from ... like 50. I forget. (but compare to the old VM's 25... and i considered that slow...)

It's starting to get to the point of frustration - come up with a great idea, bang it out, then reality hits and after all the work that goes into debugging it, it yields only an incremental improvement. Bleh. Well, what's done is done and now it's slightly faster. Besides, I'm sore from parkour class yesterday so I was fine with coding on the couch half the day :)

_________________
sonder


Top
 Profile  
 
 Post subject: Re: Bytecode Interpreter
PostPosted: Wed Aug 21, 2013 1:39 am 
Offline
User avatar

Joined: Wed Jun 26, 2013 12:35 pm
Posts: 116
Location: Baltimore
Banged out a brand new compiler supporting DTC and several useful bonus features. Converted the player's script, and got it working fairly quickly:

Code:

\ these aren't even used here ... just an example of adding asm routines

code{ uplus
   tsx
   clc
   adc $102,x
   sta $102,x
   vmnext_pla
}

code{ twostore
   tax
   pla
   sta 0,x
   pla
   sta 1,x
   vmnext_pla
}

asm{
enum TEMP
   NX: .db 0
   NY: .db 0
   PX: .db 0
   PY: .db 0
ende
}


\ hardwired obj size = 16x16 for now

: solid_obj_move
;


bindlit PAD_LEFT PAD_LEFT
bindlit PAD_RIGHT PAD_RIGHT
bindlit PAD_UP PAD_UP
bindlit PAD_DOWN PAD_DOWN
bindlit OVERWORLD_POS OVERWORLD_POS

bind2lit ms_player_idle_right ms_player_idle_right

bindword vx! vxstore
bindword x! xstore
bindword vy! vystore
bindword y! ystore
bindword vx@ vxfetch
bindword x@ xfetch
bindword vy@ vyfetch
bindword y@ yfetch
bindword load_screen
bindword screenxy
bindword padstate
bindword metaspr

: do_player_control
   w: 0 2dup vx! vy!
   andcase dw: PAD_LEFT if $ffec vx! exit then
   andcase dw: PAD_RIGHT if $0014 vx! exit then
   andcase dw: PAD_UP if $ffec vy! exit then
   andcase dw: PAD_DOWN if $0014 vy! exit then
;

: load_overworld_screen
   OVERWORLD_POS @ load_screen
;

: do_player_world
   x@ drop $0f > if 1 OVERWORLD_POS +! 0 dup x! load_overworld_screen exit then
   x@ drop $00 < if -1 OVERWORLD_POS +! $0f00 x! load_overworld_screen exit then
   y@ drop $0f > if 4 OVERWORLD_POS +! 0 dup y! load_overworld_screen exit then
   y@ drop $00 < if -4 OVERWORLD_POS +! $0b00 y! load_overworld_screen exit then
;

: doplayer
   0 padstate do_player_control drop
   x@ vx@ d+ x!   y@ vy@ d+ y!
        do_player_world
   screenxy ms_player_idle_right metaspr
   o+
;

_________________
sonder


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 32 posts ]  Go to page 1, 2, 3  Next

All times are UTC - 7 hours


Who is online

Users browsing this forum: Gilbert and 3 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB® Forum Software © phpBB Group