It is currently Fri May 24, 2019 7:48 am

All times are UTC - 7 hours

Post new topic Reply to topic  [ 2 posts ] 
Author Message
PostPosted: Tue Feb 06, 2018 12:36 pm 

Joined: Sun Jan 31, 2016 9:55 pm
Posts: 54
I've been working on a sort-of emulator for the 6502 as part of a compiler project. Here's the github repo. It symbolically executes 6502 code.

I was hoping to get more eyes on the emulator code. I've probably got some bugs in there. In the repo, emulator.h is the core of the 6502 emulation. The emulator takes a machine (virtual 6502 processor) and a single instruction, and applies the instruction to the machine. (Repeat as desired). There are two machine implementations -- random_machine.h is a processor with randomized actual numbers for all of its state. abstract_machine.h is a processor with symbolic values (like algebraic variables), and after each instruction the state is a formula for the resulting state in terms of the initial state. I wrote emulator so that it can take both types of machines. Note that I'm using my own representation for the instructions, with a byte for the operation (adc, plp, etc), followed by a nybble for the addressing mode (absolute, indirect y, etc), and a nybble for the operand.

Any help would be great!
I'm also open to suggestions to expand this. I want to add the unofficial opcodes for the 2a03, and maybe modes for other 6502 variants -- 65c02, etc. I also want to parameterize it by the set of live variables, especially the flags.

More info about the project:

Using the z3 theorem prover, I created a symbolic execution engine for the 6502. It can run small sequences of 6502 instructions and prove equivalences between them.

It's not able to run all programs. There are some limitations:
  1. Except the first instruction, none of the instructions can be the target of a jump. This is meant for basic blocks, not full programs.
  2. All instructions are supported except brk and jsr, because they create implicit jump targets (control flow returns after rts or rti).
  3. There are a limited set of operands:
    1. Up to three 16-bit addresses (for absolute, jump/branch targets, etc.). Called absolute0, absolute1, absolute2
    2. Up to four 8-bit addresses (for zp, indirect, etc.). Called zp0, zp1, zp2, zp3.
    3. Two 8-bit literals, c0 and c1. In addition, these operands are valid for immediate mode: #0, #1, #c0, #c1, #(c0+1), #(c1+1), #(c0+c1)

Think of the operands as unknown variables. c0 could be any 8-bit value, absolute0 could be any 16-bit value.

My program can find somewhat equivalences like this:

lsr; bmi Absolute0 <-> lsr
(This is true because after lsr, the high bit of a is guaranteed to be 0). (Yes, I use a 16-bit value as the operand for branch instructions. Just think of it as the result of adding the signed 8-bit offset)

I have code that basically tries all instruction sequences up to a certain length and compares them to build up a database of optimizations. There are a bunch of tricks to speed this up (see this paper if that interests you). That code (enumerator.cpp) is really messy and not quite working yet -- I don't need a code review for it now.

edit: fixed paper url

PostPosted: Tue Feb 06, 2018 6:01 pm 

Joined: Thu Aug 20, 2015 3:09 am
Posts: 451
Sweet! This is exactly the sort of thing I wanted for my compiler. I even read the same paper!

Don't have time to look it over now (and won't for a good long while) but I'll be back to help if I get half a chance.

Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 2 posts ] 

All times are UTC - 7 hours

Who is online

Users browsing this forum: No registered users and 10 guests

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB® Forum Software © phpBB Group