Source code shuffling

Discuss technical or other issues relating to programming the Nintendo Entertainment System, Famicom, or compatible systems.

Moderator: Moderators

Post Reply
tepples
Posts: 22149
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Source code shuffling

Post by tepples » Sat Mar 20, 2010 1:30 pm

If you write after the end of a buffer, you affect whatever is stored after the buffer. Unmanaged languages like assembly language and C make it easier to end up with a buffer overflow in a program than in managed languages like C# or Java, which automatically check each array access against the array's lower and upper bounds. In fact, assembly language is worse than C in this respect because it makes no type distinction between a scalar variable and the first element of an array or struct.

Ordinarly, you can detect a buffer overflow from the behavior of a program when it uses what is stored after the overflowed buffer. However, if you write after a buffer into memory that you're not using at the moment, you may not see the effect of the buffer overflow until it's too late. So one technique to detect buffer overflows is to randomize the order of things in memory, so that each buffer that can be overflowed is more likely to eventually end up before something where the effect of the overflow is visible.

So I'm proposing an extension to 6502 assembly language that introduces a new control command called .shuffle. An assembler should permute the lines between .shuffle and .endshuffle when assembling the program, so that variables end up in a different order each time. For example:

Code: Select all

.shuffle
foo: .res 32
bar: .res 4
baz: .res 4
cnut: .res 32
.endshuffle
might become

Code: Select all

cnut: .res 32
baz: .res 4
foo: .res 32
bar: .res 4
It's also useful for finding overflows that fall off the end of a read-only data segment. For these, you'll want to permute chunks longer than one line, which is why the .shuffle keyword takes a delimiter argument.

Code: Select all

.shuffle THE_GAME
title_screen:
  .byt ...
  .byt ...
THE_GAME
character_menu:
  .byt ...
  .byt ...
THE_GAME
stage_menu:
  .byt ...
  .byt ...
.endshuffle
A similar mechanism at the program loader level in operating systems with virtual memory has been called ASLR.

Another potential application even after you have found and fixed buffer overflows is binary fingerprinting. If you are distributing copies of a program under nondisclosure agreement, and you want to covertly mark each copy to make it traceable, you can do so by permuting the subroutines and variables in each copy.

If you like the idea, I plan to implement it as a preprocessor in Python. This would work for NESASM, CA65, ASM6, and even C compilers.

User avatar
Dwedit
Posts: 4365
Joined: Fri Nov 19, 2004 7:35 pm
Contact:

Post by Dwedit » Sat Mar 20, 2010 2:00 pm

What about canaries?
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!

tepples
Posts: 22149
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Post by tepples » Sat Mar 20, 2010 2:07 pm

What's so "tweet" about shuffling is that it uses a randomly selected other variable as your canary so that it doesn't take up any more RAM at run time.

User avatar
blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA
Contact:

Post by blargg » Sat Mar 20, 2010 2:13 pm

Canaries would be useful, though don't necessarily have space, especially in zero-page. The assembler could generate a list of their addresses as a table that your code then iterates over to initially fill them, and later verify their values. Maybe you could limit them to larger objects, avoiding them for byte and word-sized objects, the most common and unlikely to be accessed in an indexed manner.

What about a completely different approach, using an instrumenting emulator? You notate which data structures are independent, such that you never rely on their relative addresses. The assembler generates a list of each structure that the emulator has access to. Then it tags pointers and ensures you never access outside a structure. For example, lda table,x would bounds-check x to be within the table. It could tag upper and lower bytes of addresses too, later detecting when they're used as a word, for when you copy them to a zero-page pointer and then use indirect indexed addressing.

As for randomizing (the actual topic!), even just reversing the order of objects in memory would have a useful effect. I wouldn't be surprised if some assemblers offered reverse allocation as an option already.

tepples
Posts: 22149
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Post by tepples » Sun Mar 21, 2010 7:47 pm

tepples wrote:assembly language is worse than C in this respect because it makes no type distinction between a scalar variable and the first element of an array or struct.
blargg wrote:Maybe you could limit them to larger objects, avoiding them for byte and word-sized objects, the most common and unlikely to be accessed in an indexed manner.
Sometimes one might forget that a variable is not indexed. For example, one might forget that a variable is one byte per game instance, not one byte per player.
blargg wrote:What about a completely different approach, using an instrumenting emulator?
In a way, that's what C# and Java do. But in CLR and JVM, an array iterator isn't just a raw pointer to an element. Instead, it's a pointer to the start of the array and an index. This works well as long as the CPU's word size can hold the index, which is true of the 32-bit machines that run C# and Java but not necessarily of the 8-bit NES with an array larger than 256 bytes.
blargg wrote:You notate which data structures are independent, such that you never rely on their relative addresses.
In fact, such markup would look a lot like .shuffle commands.
blargg wrote:It could tag upper and lower bytes of addresses too, later detecting when they're used as a word, for when you copy them to a zero-page pointer and then use indirect indexed addressing.
That'd be a research project because it'd have to detect when the program is changing the upper byte of the address to hit more than 256 bytes, or when changes the address to point to one row of the array from which y indexes a column. Of course, there would have to be an exception for init code when it zeroes BSS and copies DATA from ROM to RAM. A further complication involves storing an address 1 byte before the entry point of a subroutine in order to set up an RTS dispatch.
blargg wrote:As for randomizing (the actual topic!), even just reversing the order of objects in memory would have a useful effect. I wouldn't be surprised if some assemblers offered reverse allocation as an option already.
Good idea. I've added command-line options to shuffle four ways: A. forward, B. reverse, C. with a seed derived from a string specified on the command line, or D. with a seed derived from urandom.

It's fairly easy to integrate this into a makefile. One way reshuffles the file every time you change the source code:

Code: Select all

AS65 = ca65

$(objdir)/%.o: $(objdir)/%.s

	$(AS65) $(CFLAGS65) $< -o $@

$(objdir)/%.shuffle.s: $(srcdir)/%.s
	tools/shuffle.py $< -o $@
Or changing the seed to be based on the file name would let you freeze the shuffle order while you investigate a given defect:

Code: Select all

$(objdir)/%.shuffle.s: $(srcdir)/%.s
	tools/shuffle.py $< --seed 345$< -o $@
Concentration Room 0.02 will be shuffle-enabled.

User avatar
blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA
Contact:

Post by blargg » Sun Mar 21, 2010 10:10 pm

I was going to mention my desire for a similar way to mark routines, so that the linker could dead-strip all unused routines from the final result. You'd simply put some directive between each routine, allowing the linker to treat them as independent units that it could reorder or remove entirely.

I just realized that coupled with your marking of the boundaries between independent objects, the linker could ALSO remove unused objects. That is something I'm very interested in, because it allows much cleaner utility libraries to be used, without worrying about them bloating your program if you only use a few routines.

So what if we introduced a single new directive, something like .separator (not a good name, but I can't think of anything), which could be placed in any segment, and would allow the linker to treat what's before it as independent from what's after it. This means that code couldn't assume anything about the relative addresses, thus the linker could remove something with no direct references to it.

This would even pave the way for another directive I've wanted for the 6502, that tells the linker to avoid placing the current block at an address that causes any marked branches to cross a page. This would allow it to choose the best place for such routines, without wasting a lot of space as you currently have to do by adding a .align before it.

Then your shuffling would be a trivial addition to this very useful feature. I'll have to take a crack at adding this to ca65 over the next few days. If it's successful, maybe I can get my "suspicious use/absence of #" warning accepted into the main sources as well.

Code: Select all

.bss
tables:
        .res $F0
.separator
        ; Tells linker to avoid placing foo at an address that would cause
        ; foo through the next label or the next .separator directive
        ; to cross a page.
.nopagecross
foo:
        .res $36
.separator
bar:
        .res $200
.separator

.code
reset:
        jsr liba
        ...
.separator
liba:   lda foo,x ; will never cross page
loop:   ... lots of code
        ; Tells linker to prevent this branch from crossing a page
        .nopagecross
        bne loop
        rts
.separator
libb:   lda bar,x
       rts
.separator
Then if linked, libb and bar would never even make it into the executable, foo would probably be placed before tables, and the bne in liba would be guaranteed not to cross a page.
Concentration Room 0.02 will be shuffle-enabled.
That'll improve its card shuffling, right?

tepples
Posts: 22149
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Post by tepples » Mon Mar 22, 2010 7:21 am

blargg wrote:I just realized that coupled with your marking of the boundaries between independent objects, the linker could ALSO remove unused objects.
So something like ca65's .ifref?

While developing shuffle.py, I've added a detailed manual, describing the rationale, various uses, and caveats, as a docstring at the top. Then I realized it takes up half the program's source code.

I imagine that .nopagecross could be a macro in terms of <* and >* unless the assembler complains about something being "not constant". (I'm away from my dev PC and can't test right now.)
Concentration Room 0.02 will be shuffle-enabled.
That'll improve its card shuffling, right?
The card shuffling is already fine. But I did find a couple unnecessary instructions in the "deal" phase and stripped them out while adding .shuffle blocks around instructions that have no mutual dependency, such as 'txa' vs. 'clc' before an add, or 'ldx' vs. 'ldy'.

User avatar
blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA
Contact:

Post by blargg » Mon Mar 22, 2010 8:00 am

tepples wrote:
blargg wrote:the linker could ALSO remove unused objects.
So something like ca65's .ifref?
Yeah, except it actually works for dead-stripping. I tried .ifref a while back and couldn't get it to work for this purpose. Here we go, my test showed that it only works if referenced before the .ifref directive in the source file; if referenced afterwards, .ifref's condition is false.

Fundamentally a .nopagecross directive would mark a region of the current "block" as needing to be within a page. A simpler, general form might mark that region as the current address to the current address plus a signed offset parameter. Then you might be able to build other forms via macros.

User avatar
blargg
Posts: 3715
Joined: Mon Sep 27, 2004 8:33 am
Location: Central Texas, USA
Contact:

Post by blargg » Tue Mar 23, 2010 7:39 pm

Well, I've examined ld65 and it doesn't seem it would be an easy project to convert it to what I described. It apparently does currently support a very primitive dead-stripping, at the module level. A proper algorithm would break up each module into separate sections as well. Your script will be the best way to achieve your aims.

tepples
Posts: 22149
Joined: Sun Sep 19, 2004 11:12 pm
Location: NE Indiana, USA (NTSC)
Contact:

Post by tepples » Tue Mar 23, 2010 11:48 pm

blargg wrote:A proper algorithm would break up each module into separate sections as well.
Various things in the ca65 and ld65 manuals imply as much.

I lost shuffle.py when my laptop just died, but it only took me a couple hours to rewrite it from the notes I had left here. Yay Python.

(It's in the Concentration Room source distribution.)

User avatar
thefox
Posts: 3141
Joined: Mon Jan 03, 2005 10:36 am
Location: Tampere, Finland
Contact:

Post by thefox » Sun Nov 13, 2011 7:39 am

blargg wrote:This would even pave the way for another directive I've wanted for the 6502, that tells the linker to avoid placing the current block at an address that causes any marked branches to cross a page. This would allow it to choose the best place for such routines, without wasting a lot of space as you currently have to do by adding a .align before it.
Not exactly an optimal solution for this problem, but I've been using a macro (for ca65) like this to get diagnostics when certain branches cross a page:

Code: Select all

.macro branch_check opc, dest
    opc dest
    .assert >* = >dest, warning, "branch_check: failed, crosses page"
.endmacro
It can be used like this:

Code: Select all

branch_check bne, somewhere
This is handy when the code is still undergoing modifications, to make sure that timing changes in timing sensitive routines don't go unnoticed. Code can then be manually shuffled around or .aligns added to get rid of the warnings.

Post Reply