All times are UTC - 7 hours





Posted: Tue Apr 17, 2018 5:20 pm
Joined: Wed Apr 04, 2018 7:29 pm
Posts: 37
Location: Montreal, Canada
Hi.

The NES is obviously a very limited system, but the one resource I pay the least attention to is code size. I think I will reach 6K of code soon, and I basically just have a guy running around in a room.

Do you guys care at all about code size, or do you just add extra pages of ROM whenever you run out?

What do you allow yourself to use macros for (16-bit math, etc.)? Do you often unroll loops? Any best practices I should be aware of?

Thanks.

-Mat


Posted: Tue Apr 17, 2018 5:43 pm
Joined: Fri Nov 19, 2004 7:35 pm
Posts: 4093
When I did homebrew for the TI-83, code size was paramount, especially since you had only 25K of user RAM for all your programs.

_________________
Here come the fortune cookies! Here come the fortune cookies! They're wearing paper hats!


Posted: Tue Apr 17, 2018 5:45 pm
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20656
Location: NE Indiana, USA (NTSC)
When to add pages

With very few exceptions, sizes of PRG ROM and CHR ROM on NES are powers of two. When you add more pages of ROM, you always double it, and if you double it too many times, you could exceed how much you had planned to pay per cartridge for replication. A few of these thresholds are especially painful.

For PRG ROM size:
  • 32K to more than 32K, as you now have to include PRG bank switching hardware on the PCB and decide what goes in the fixed bank and what goes elsewhere.
  • 64K to more than 64K, as you're now ineligible for the main category of the NESdev Compo (and thus exposure on the Action 53 anthologies to build an audience for your future solo cartridge releases).
  • 256K to more than 256K, as many popular discrete mappers top out at that much, as does the MMC1 if CHR ROM is used.
  • 512K to more than 512K, as you hit the limit of the PowerPak and many ASIC mappers.
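
The doubling is easy to see by rounding a size up to the next power of two. This is only a rough Python sketch: the function name is made up for illustration, and the 16K floor is an assumption (it matches NROM-128, the smallest common PRG size).

```python
def prg_rom_size(bytes_needed, smallest=16 * 1024):
    """Round a PRG requirement up to the next power-of-two ROM size.

    `smallest` is an assumed floor of 16K (NROM-128); it is not a hard rule.
    """
    size = smallest
    while size < bytes_needed:
        size *= 2  # adding "pages" in practice means doubling the ROM
    return size


if __name__ == "__main__":
    for kib in (6, 32, 33, 65, 257):
        rom_kib = prg_rom_size(kib * 1024) // 1024
        print(f"{kib}K of code and data -> {rom_kib}K PRG ROM")
```

Going one byte past a threshold costs a whole doubling, which is why the thresholds listed above hurt.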

For sizes of other things:
  • 8K CHR ROM to more than 8K, as you now have to include CHR bank switching hardware on the PCB and figure out what will be displayed alongside what.
  • Roughly 6K of DPCM samples to bigger than 6K, as you're now putting serious pressure on your fixed bank.
  • 16K of enemy movement code to more than 16K, as you now have to split that across several banks and give each enemy type not only a movement routine entry point but also a movement routine bank.
  • 16K of music code and data to more than 16K, as you now have to move a lot of music code into the fixed bank so that it can access sequence data in multiple banks.
  • 2K of RAM to more than 2K, as you need to include WRAM and decoding hardware on the PCB.
  • 32 bits of state preserved from one play session of your campaign to the next to more than 32, as you need to switch from an 8-character password to battery-backed RAM or self-flashability in order to save players' sanity.
  • 8K of WRAM to more than 8K, as only a few well-known mappers are known to support that: MMC1 with CHR RAM, MMC5, and FME-7.

Super Mario Bros. fits in a 32K PRG ROM with very little room to spare. As ShaneM discovered, a few parts of the program use tricky code-golf optimizations, while other parts were left unoptimized. Nintendo's engineers optimized the code for size just enough for it to come in under 32K.

Time-space tradeoffs

Moderate unrolling (factors of 4 to 16 or so) is also beneficial in time-critical code, such as video memory update routines. If you're trying to fit into 16K PRG ROM for NROM-128, you probably don't have that much stuff to push to video memory each frame, so you can get away with a less unrolled update loop than something that pushes the limits of video memory bandwidth the way Battletoads does.
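
To get a feel for the tradeoff, here is a toy Python model of an unrolled copy loop. The default numbers are assumptions for illustration: a PLA / STA $2007 body (4 + 4 cycles, 1 + 3 bytes) and a DEX / BNE loop tail (2 + 3 cycles, 1 + 2 bytes). It ignores setup code and the one cycle saved when the final BNE falls through.

```python
def copy_loop_cost(items, unroll, body_cycles=8, tail_cycles=5,
                   body_bytes=4, tail_bytes=3):
    """Return (cycles, code_bytes) for copying `items` values.

    The loop body is repeated `unroll` times per iteration, followed by
    one loop tail (counter decrement plus branch). All defaults are
    illustrative assumptions, not measurements of any real game.
    """
    if items % unroll != 0:
        raise ValueError("items must be a multiple of the unroll factor")
    iterations = items // unroll
    cycles = items * body_cycles + iterations * tail_cycles
    code_bytes = unroll * body_bytes + tail_bytes
    return cycles, code_bytes


if __name__ == "__main__":
    for factor in (1, 4, 8, 16):
        cycles, size = copy_loop_cost(64, factor)
        print(f"unroll x{factor}: {cycles} cycles, {size} bytes of code")
```

Most of the time saving arrives in the first few doublings of the unroll factor, while code size keeps growing linearly, which is one way to read "moderate".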

Subroutine call overhead on 6502 is 12 cycles: 6 for JSR and 6 for RTS. If a routine is called only a few times per 29780-cycle frame, this overhead may not amount to much.
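
As a back-of-the-envelope check, the share of a frame spent purely on call overhead can be computed like this. It is a small Python sketch using only the two figures quoted above; the function name is made up.

```python
CYCLES_PER_FRAME = 29780   # NTSC frame length quoted above
CALL_OVERHEAD = 6 + 6      # JSR + RTS, in cycles


def call_overhead_share(calls_per_frame):
    """Fraction of one frame spent on JSR/RTS overhead alone."""
    return calls_per_frame * CALL_OVERHEAD / CYCLES_PER_FRAME


if __name__ == "__main__":
    for calls in (10, 100, 1000):
        print(f"{calls} calls per frame: {call_overhead_share(calls):.1%} of the frame")
```

A handful of calls per frame is noise; a call inside a loop that runs hundreds of times per frame is not.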


Posted: Tue Apr 17, 2018 7:41 pm
Joined: Wed Apr 02, 2008 2:09 pm
Posts: 1250
It's also the resource I pay the least attention to. Making my code smaller affects my players' perception of the game less than making my code faster (dropped frames vs... an extra second of download on 56K?). (For the pedantic: I deal in ROMs, so cartridge costs aren't a factor.)

I do turn a lot of jumps into branches to save bytes (speed-wise, this can only tie or make the code slower). There's almost always a flag in a known state that you can branch on instead of using a jmp. I do end up caring a little bit about data size, but I just find that sort of compression fun.

I use macros to unroll loops, and in place of subroutines that are used often (to avoid the jsr/rts speed hit). I don't use 'em too much for 16-bit math, since that usually makes optimizations harder to see. Say you have a 16-bit add macro with a clc baked in: in some places where you use it, you may know the carry is already clear, so the clc is wasted; or you may know it is set, and could just adjust the constant instead. Or whatever.

If you posted some code, it'd be easier to give tips based on what you're actually doing. My most general tip is that the carry flag is super useful for all kinds of optimizations. Relatively few instructions change it, so you can rely on it keeping its value (and branch on that value) for a while. You can also set things up so that a branch on carry set leads to a subtraction (so no need to sec), while the carry-clear fall-through leads to an addition (so no need to clc). It's two cycles to set or clear, and again, it rarely changes, so you can use it as a return value for a subroutine and do a lot of other stuff before you actually use the value.

You can also check out this thread: http://atariage.com/forums/topic/71120- ... ler-hacks/
And this one: http://www.atariage.com/forums/topic/11 ... -by-seven/
And this wiki article: https://wiki.nesdev.com/w/index.php/Syn ... structions (Reverse subtract is nice.)
As well as here: http://codebase64.org/doku.php?id=base:6502_6510_maths
For some cute examples of code to get your brain turning.

tl;dr: Worry the most about getting the game done, honestly. Use whatever resources are at your disposal for that, because none of the rest matters if no one will ever play it.

_________________
https://kasumi.itch.io/indivisible


Posted: Tue Apr 17, 2018 9:38 pm
Joined: Mon May 25, 2009 2:20 pm
Posts: 64
Personally, I always try to use the least amount of code possible for anything in Megaman Odyssey. I like to say that I'm the master of optimizing routines, because I've been doing it for so many years.
The game rarely ever has any lag frames during gameplay; I've tested certain areas and situations countless times with FCEUX movie recordings and the lag counter.
The only exception is the Pyro Man level, because the SNES/Genesis-style water IRQ is called like 30 times a frame.

But I'm always so obsessed with finding or creating as many little shortcuts as possible, even though I know it won't ever matter to a person in the world.

My game's size is currently 512 KB of graphics and 512 KB of code. I have more than 50% free space on graphics, and probably about 3 or 4 "2000 byte banks" still available for code, before I must upgrade to 1 MB.

I use the MMC5 mapper, so the limit is 1 MB on both. I've never had to go past 512 KB yet.


Posted: Tue Apr 17, 2018 10:16 pm
Joined: Sat Feb 12, 2005 9:43 pm
Posts: 10892
Location: Rio de Janeiro - Brazil
I normally consider PRG-ROM to be a cheap resource, so I'll often resort to using unrolled loops and look-up tables if that results in performance improvements. However, that doesn't mean I don't care about code size, because even though 512KB of PRG-ROM (a "common" limit for programs that are not meant to be part of compilations) is a lot to fill, the NES can still only see 32KB at a time, and I really don't want to be switching banks back and forth during critical parts of my game engine.

Like other coders here, I do a lot of small optimizations, turning jumps into branches, removing redundant SECs and CLCs, and so on. I always try to draw attention to these optimizations using comments, so that I know I must be careful when editing code around them. Another thing I do is try to use as few variables as possible in any given logic block, to avoid excessive loading and storing. I also spend some time optimizing loops, avoiding special handling of the first or last iterations (which could require repeated code) and finding the optimal ending conditions (e.g. counting down instead of up to avoid a comparison), even if that means tweaking the inner logic a bit.


Posted: Tue Apr 17, 2018 11:23 pm
Joined: Fri Feb 27, 2009 2:35 pm
Posts: 294
Location: Fort Wayne, Indiana
tokumaru wrote:
I really don't want to be switching banks back and forth during critical parts of my game engine.

This is really the main driver behind trying to keep code size down for me. If I can keep all of the code and data that relate to each other inside the same bank, everything is a lot simpler. MMC1 also makes you want to avoid bank switching when you can because of how many cycles it ends up using.

On top of doing small optimizations, I tend to use subroutines quite heavily. It doesn't matter if the code is very specific: if there's a block of code I use multiple times and cycles aren't very important, I make it a subroutine.

Another thing I do a lot with subroutines is to give them a "default" input that can be overridden. For example, I have a "Display Enemy" routine that just takes a starting tile number, which is the most common case. "Display Enemy" just prepares the input for "Display Enemy Custom", which takes four arbitrary tile numbers, and then falls into it, so I only have to store one copy of the pretty lengthy subroutine. Super Mario Bros. does something similar, giving some routines multiple entry points that feed the shared code different data depending on which entry point you used.
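
The shape of that idea, sketched in Python rather than 6502 (where the short entry point would simply fall through into the long routine). The function names, the use of four consecutive tile numbers, and the 2x2 layout with 8-pixel offsets are all assumptions made up for this illustration.

```python
def display_enemy_custom(tiles):
    """The one long routine: takes four arbitrary tile numbers.

    Returns (x_offset, y_offset, tile) entries for an assumed 2x2 layout.
    """
    if len(tiles) != 4:
        raise ValueError("expected exactly four tile numbers")
    offsets = [(0, 0), (8, 0), (0, 8), (8, 8)]
    return [(dx, dy, tile) for (dx, dy), tile in zip(offsets, tiles)]


def display_enemy(start_tile):
    """The common case: only prepares the input, then reuses the long routine."""
    return display_enemy_custom([start_tile + i for i in range(4)])


if __name__ == "__main__":
    print(display_enemy(0x40))
    print(display_enemy_custom([0x40, 0x52, 0x41, 0x53]))
```

Only one copy of the lengthy drawing logic exists; the convenience entry point costs a few bytes of setup.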


Posted: Wed Apr 18, 2018 12:03 am
Joined: Fri Nov 12, 2004 2:49 pm
Posts: 7546
Location: Chexbres, VD, Switzerland
Personally I am very careful about code size, and I know I disagreed with tokumaru on that quite a few times :)
Optimizing for speed only makes sense if you come close to the limit where you use 100% of the CPU, while optimizing for size makes sense whenever you are close to a 2^n limit of PRG-ROM and don't want to move up to the next power-of-two size.

Limiting code size also makes the most sense if you want to stay within 32KB and avoid bankswitching altogether. (Yes, there are ways to get more without bankswitching, but they're non-canonical.) As soon as you implement any bankswitching, I don't think it makes much difference whether you use, say, 64KB or 128KB, but each mapper has its canonical limits.


Posted: Wed Apr 18, 2018 1:10 am
Joined: Wed Nov 30, 2016 4:45 pm
Posts: 116
Location: Southern California
bleubleu wrote:
What do you allow yourself to use macros for (16bit math, etc.) ?

In most cases, macros should assemble exactly the same thing you would have written out by hand, adding no extra length of executable code nor execution time, only hiding the ugly internal details so you don't have to look at them every time you use them. Since they can make your code more concise and readable, you might even find optimization possibilities that weren't otherwise obvious. I also get fewer bugs when I make heavy use of macros, and the ones I do get tend to be easier to find and fix. Call me the macro junkie. Maybe that should have been my forum name.

Kasumi has a valid point about a few situations like whether or not a CLC is needed before a 16-bit add in a macro; but you can also have for example two different macros that do the same thing except that one has the leading CLC and the other does not, and give them the same names except that the one less used might have a trailing _ or something like that. That's not very common though, and of course just because you have a macro there doesn't mean you can't still do it the non-macro way if you want to.

I do 6502 nestable program flow control structures in macros too. See http://wilsonminesco.com/StructureMacros/ . One of the simplest examples might be
Code:
        CMP  #14
        IF_EQ            ; clear enough that it really needs no comments
           <actions>
           <actions>
           <actions>
        END_IF

and the IF_EQ assembles a BNE down to the END_IF, exactly as you would write by hand, but it doesn't need the label. The END_IF is only used by the assembler, and it does not lay down any code. Again, it can be nested too, meaning a second IF_EQ...END_IF pair could be inside the first one, and another one inside of that one, etc., and the assembler will make each branch go to the right place. There are lots of different forms of these, even in the IFs group, like IF_BIT ACIA_STAT_REG, 3, IS_SET, and of course other structures besides IFs, like BEGIN...WHILE...REPEAT, FOR...NEXT (including a 16-bit one), CASE, etc.

_________________
http://WilsonMinesCo.com/ lots of 6502 resources


Posted: Wed Apr 18, 2018 5:48 am
Joined: Tue Feb 07, 2017 2:03 am
Posts: 608
Code size is something to be careful about; it does sneak up on you.
At the start I just write code, and I don't care how big or slow it is. I just get it to do "the thing". Once I see the thing and play with it, I can get it to the point where I'm convinced that it will "stay". Then I make it "sensible": anything that can be looped, tabled, etc. gets looped and tabled. I then save size/speed optimisations for when they become critical and I can see how the code has to work in the mostly complete program.

As for "this clc is not needed", "this jump can be a bne", etc., I leave it to the tass optimizer to find those. It will find all of them in a second; it does love to show off.

To avoid 3 weeks of optimisation at the end of the project, which is risky because you don't have time to test everything, I use BDD6502, and I take a week or so here and there to do an optimise-and-clean-up pass, to break it up a bit.

Macros are nice, but they are mermaids.. they sing a sweet song and send you to your doom if you are not careful. If you make them "safe" it's mostly OK, but you have to really plan them properly and understand how they work. I did have a lot of macros, but I found they tend to make the code less readable and maintainable after a while: ADCB_W, ADCBX_W, IFBLT, IFBLTE, BAGTE, etc. So I've evolved it into a syntax sugar + optimizer system, whose output then goes through the tass optimizer as well. After using it every day for the last 4 months, it's to the point where I can't even be bothered to write a small test case the old way. It still has a lot of things it doesn't do that it needs to do, though. Sigh...


Posted: Wed Apr 18, 2018 7:09 am
Joined: Mon May 25, 2009 2:20 pm
Posts: 64
Just wanted to show this, if I may: sometimes I'll have insanely long sections of nothing but JSRs, to do many different things in terms of enemy/boss movement patterns, like this :P

Image

Without all these JSRs, it would have been like 20x longer.
fun discussion topic :)


Posted: Wed Apr 18, 2018 8:50 am
Joined: Wed May 19, 2010 6:12 pm
Posts: 2757
How much bank granularity is there with existing mappers? If I start with a single 32kB bank, things will go smoothly, but if I pass that limit and have to rely on bank switching to get around it, trying to fit existing code into separate 16kB banks would be really frustrating.


Posted: Wed Apr 18, 2018 9:34 am
Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20656
Location: NE Indiana, USA (NTSC)
The 3 most common layouts:

32K (e.g. GNROM, AOROM, BNROM, Color Dreams, MMC1)
$8000-$FFFF: 32K switchable window

16K (e.g. UNROM, MMC1)
$8000-$BFFF: 16K switchable window
$C000-$FFFF: fixed to last 16K

2x8K (e.g. MMC3)
$8000-$9FFF: 8K switchable window
$A000-$BFFF: 8K switchable window
$C000-$FFFF: fixed to last 16K

And specialized layouts:

2x8K reconfigured for DPCM switching (e.g. MMC3, VRC4)
$8000-$9FFF: fixed to second-to-last 8K
$A000-$BFFF: 8K switchable window
$C000-$DFFF: 8K switchable window
$E000-$FFFF: fixed to last 8K

16K+8K (e.g. VRC6)
$8000-$BFFF: 16K switchable window
$C000-$DFFF: 8K switchable window
$E000-$FFFF: fixed to last 8K

3x8K (e.g. FME-7, RAMBO-1)
$8000-$9FFF: 8K switchable window
$A000-$BFFF: 8K switchable window
$C000-$DFFF: 8K switchable window
$E000-$FFFF: fixed to last 8K

One reason for having a fixed bank is that on NES, the program, data, and DPCM sample bank are all $00 (in 65816 terms). So the mapper needs to subdivide the space so that a subroutine in one bank can access data in another bank.
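
The three common layouts can be written down as a small lookup table. This is an illustrative Python sketch; the dictionary keys, the "switchable"/"fixed" labels, and the helper names are my own, and only the address ranges come from the lists above.

```python
# (start, end, kind) windows for the three most common layouts listed above.
LAYOUTS = {
    "32K": [(0x8000, 0xFFFF, "switchable")],
    "16K": [(0x8000, 0xBFFF, "switchable"),
            (0xC000, 0xFFFF, "fixed")],
    "2x8K": [(0x8000, 0x9FFF, "switchable"),
             (0xA000, 0xBFFF, "switchable"),
             (0xC000, 0xFFFF, "fixed")],
}


def window_for(layout, address):
    """Return the (start, end, kind) window that contains a CPU address."""
    for start, end, kind in LAYOUTS[layout]:
        if start <= address <= end:
            return (start, end, kind)
    raise ValueError(f"${address:04X} is not in PRG ROM space")


def always_visible(layout, address):
    """True if code or data at this address stays mapped across bank switches."""
    return window_for(layout, address)[2] == "fixed"


if __name__ == "__main__":
    for name in LAYOUTS:
        # $FFFA is where the NMI vector lives.
        print(name, "vectors always visible:", always_visible(name, 0xFFFA))
```

With a 32K window nothing is fixed, so anything that must survive a bank switch (vectors, the switching code itself) typically has to be duplicated in every bank.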


Posted: Wed Apr 18, 2018 12:25 pm
Joined: Wed Nov 30, 2016 4:45 pm
Posts: 116
Location: Southern California
Oziphantom wrote:
Macros are nice, but they are mermaids.. they sing a sweet song and send you to your doom if you are not careful. If you make them "safe" it's mostly OK, but you have to really plan them properly and understand how they work. I did have a lot of macros, but I found they tend to make the code less readable and maintainable after a while: ADCB_W, ADCBX_W, IFBLT, IFBLTE, BAGTE, etc.

Take a different approach. Instead of using cryptic names, make it really clear what they're doing, and use the parameters to make like a sentence. If your ADCB_W means "Do a double-precision (16-bit) add-with-carry of B and W," you could change the macro name to something like _16bit_ADC, and make the line say for example,
Code:
        _16bit_ADC   B, _and, W    ; B=B+W

(Unfortunately the assembler requires separating parameters with a comma, which is why there's a comma after the _and.) The "_and" (with the underscore or other character to keep the assembler from confusing it with the mnemonic) is an equate that does not actually get used by the macro. It's only there to make things more readable to humans. The comment clarifies where the answer goes. So this would assemble the same as
Code:
        CLC             ; start with the carry clear
        LDA   B         ; low byte: B = B + W
        ADC   W
        STA   B
        LDA   B+1       ; high byte, plus the carry out of the low byte
        ADC   W+1
        STA   B+1

The same macro can be used to add different variables which you specify in the parameters, rather than being confined to B and W. Conditional assembly in the macro definition can do optimizations if necessary. Some assemblers let you say in essence, "If there's a fourth parameter, do the following;" so you could use the same macro to add more than just two numbers, and you could invoke it something like this:
Code:
        _16bit_ADC   B, W, _and, offset3    ; B=B+W+offset3
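
For readers who have not written assembler macros, the expansion can be mimicked with a small text generator. This is a Python stand-in, not real macro syntax for any assembler: the function name and the `leading_clc` switch are hypothetical, and the decorative `_and` parameter is simply left out.

```python
def expand_16bit_add(dest, *sources, leading_clc=True):
    """Return the instruction lines for dest = dest + each source (16-bit).

    `leading_clc=False` models the variant without the first CLC, for
    places where the carry is already known to be clear. Every later
    addend always gets its own CLC, since the carry out of the previous
    high-byte add is not known in general.
    """
    lines = []
    for index, src in enumerate(sources):
        if index > 0 or leading_clc:
            lines.append("CLC")
        lines += [
            f"LDA {dest}", f"ADC {src}", f"STA {dest}",              # low byte
            f"LDA {dest}+1", f"ADC {src}+1", f"STA {dest}+1",        # high byte
        ]
    return lines


if __name__ == "__main__":
    print("\n".join(expand_16bit_add("B", "W")))
    print()
    print("\n".join(expand_16bit_add("B", "W", "offset3")))
```

The point is the same as with the real macro: the caller writes one line, and exactly the instructions you would have typed by hand come out.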


If your IFBLT means "if: branch if less than," and only assembles a BMI, it's not really clarifying or shortening anything. How about something like this instead, where a portion is skipped if the N flag is set:
Code:
        IF_POSITIVE    ; Negative result above causes it to skip the following lines.
           <do_stuff>
           <do_stuff>
           <do_stuff>
        END_IF

or to branch back to the beginning of a loop as long as the result is negative:
Code:
        BEGIN           ; (Or name it "DO" if you like)
           <do_stuff>
           <do_stuff>
           <do_stuff>
        UNTIL_POSITIVE

Then you don't even need a label (although you can still use one if you want to).



Posted: Wed Apr 18, 2018 7:05 pm
Joined: Wed Apr 04, 2018 7:29 pm
Posts: 37
Location: Montreal, Canada
Thanks for all the answers. The IF_XXX macros are very interesting. I also find that beq/bmi/bpl are often hard to follow.

I should have realized that such an open-ended question would send the discussion in many directions. I'll try to be more specific.

Where do you draw the line between inlining something (macros) and making it a subroutine? For example, take a 16-bit addition (which is ~20 cycles or so). Do you accept paying 12 cycles (jsr/rts) for it just for the sake of reducing code size, or do you use a macro? Where is the cutoff?
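
To put rough numbers on the question, here is a small Python sketch. It assumes a zero-page 16-bit add with fixed operands (13 bytes, 20 cycles: a CLC plus six 2-byte, 3-cycle instructions), so the subroutine version needs no parameter passing; with arbitrary operands the subroutine would cost more than this model shows. The function names are made up.

```python
JSR_BYTES = 3
RTS_BYTES = 1
CALL_CYCLES = 6 + 6  # JSR + RTS


def size_inline(body_bytes, uses):
    """Total bytes when the body is pasted at every use (macro)."""
    return body_bytes * uses


def size_subroutine(body_bytes, uses):
    """Total bytes for one shared copy plus a JSR at every use."""
    return body_bytes + RTS_BYTES + JSR_BYTES * uses


def relative_slowdown(body_cycles):
    """Extra time per call, as a fraction of the body's own time."""
    return CALL_CYCLES / body_cycles


if __name__ == "__main__":
    body_bytes, body_cycles = 13, 20  # assumed zero-page 16-bit add
    for uses in (1, 2, 4, 10):
        print(uses, "uses:", size_inline(body_bytes, uses), "bytes inline vs",
              size_subroutine(body_bytes, uses), "bytes as a subroutine")
    print(f"slowdown per call: {relative_slowdown(body_cycles):.0%}")
```

Under these assumptions the subroutine starts saving bytes from the second use, but each call runs about 60% slower than the inline version.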

-Mat

