
All times are UTC - 7 hours





Post new topic Reply to topic  [ 100 posts ]  Go to page Previous  1 ... 3, 4, 5, 6, 7
PostPosted: Wed Nov 09, 2016 5:07 pm 

Joined: Sat Sep 07, 2013 2:59 pm
Posts: 1402
mikejmoffitt wrote:
General tips for best-practice C absolutely still apply, including those about program organization. There may be additional architecture-specific notes, like what Shiru wrote. What exactly is there to be misunderstood here?

I don't know what's to be misunderstood here. The things you say are what I have said the whole time.

_________________
Available now: My game "City Trouble".
Website: https://megacatstudios.com/products/city-trouble
Trailer: https://youtu.be/IYXpP59qSxA
Gameplay: https://youtu.be/Eee0yurkIW4
German Retro Gamer article: http://i67.tinypic.com/345o108.jpg


PostPosted: Wed Nov 09, 2016 9:06 pm 

Joined: Wed May 19, 2010 6:12 pm
Posts: 2294
This might be a strange opinion, but I'd rather use a fake-assembly-language-to-6502 converter than a C compiler. Something like a RISC CPU with 256 registers mapped to the direct page.

Obviously it would need peephole optimizations.


PostPosted: Wed Nov 09, 2016 9:47 pm 
Formerly ~J-@D!~

Joined: Sun Mar 12, 2006 12:36 am
Posts: 445
Location: Rive nord de Montréal
There is a lot of action in this (interesting!) thread to keep up with, so even though it happened this morning, it's already two pages back. Hell, how do you guys find the time to repeatedly write lengthy, time-consuming posts? Anyway, this grabbed my attention:
rainwarrior wrote:
Whole program A.K.A. link time optimization was mentioned, but it's also sometimes practical to just build the entire program in a single translation unit. (This doesn't just apply to small games, I worked on a PS3 project using Unreal 3 where we did this.)

Wait, what?!?

A PS3 game, in one translation unit?

Like what, there's one main file, and every other file is included in it, with tens of #includes at the top that each include tens of other files that include tens of other files... That must take a heck of a long time to compile, even for a tiny change, doesn't it?

Also, in rainwarrior's small C++ example, I'm much more concerned with the throw statement, which causes big memory bloat on embedded systems, as I've experienced many times on different targets. I carefully "dummy out" some functions that occupy (directly or indirectly) many kB of code (in particular, the function that prints the exception object when an unhandled exception is thrown...), but you're still left with all those typeinfo structures in rodata, and exception handling needs RTTI, so you can't really get rid of them.

Those member accesses, this->index, are much more benign in comparison. While indirect access may hinder some optimizations, by itself it only costs on average one more CPU cycle, and one less byte, than an indexed access. `this` can already be in zero page, if parameters are passed via zero page. There are situations where these extra cycles aren't that critical, but now you can reuse more of your code and save space. Remember, folks, that Shiru always said that memory, not speed, is the main concern when coding NES games in C, and I expect this would be the case for C++ as well.


PostPosted: Wed Nov 09, 2016 9:54 pm 

Joined: Mon Jan 03, 2005 10:36 am
Posts: 2963
Location: Tampere, Finland
Jarhmander wrote:
Like what, there's one main file, and every other file is included in it, with tens of #includes at the top that each include tens of other files that include tens of other files... That must take a heck of a long time to compile, even for a tiny change, doesn't it?

I've heard this being called a "unity build" (not to be confused with the game engine of the same name) if you want to google for more information. I've heard it can actually speed up the builds in some cases due to various factors (e.g. less disk I/O), although I have never tried it myself.

_________________
Download STREEMERZ for NES from fauxgame.com! — Some other stuff I've done: kkfos.aspekt.fi


PostPosted: Wed Nov 09, 2016 10:23 pm 

Joined: Sun Sep 19, 2004 9:28 pm
Posts: 3192
Location: Mountain View, CA, USA
Jarhmander wrote:
Like what, there's one main file, and every other file is included in it, with tens of #includes at the top that each include tens of other files that include tens of other files... That must take a heck of a long time to compile, even for a tiny change, doesn't it?

Sounds like a job for precompiled headers. If implemented correctly (the big complication stems from correct Makefile creation, i.e. making sure a modified header gets re-precompiled (heh) if changed), and depending on project size, these can *massively* speed up compilation times.


PostPosted: Wed Nov 09, 2016 10:39 pm 

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 5728
Location: Canada
Yes, "unity build" was the technique, and it was to keep compile times lower, which it did. Unreal 3 had a big problem with header dependencies, and on the GCC PS3 build (PowerPC target, circa 2008) it helped.
koitsu wrote:
Sounds like a job for precompiled headers. If implemented correctly (the big complication stems from correct Makefile creation, i.e. making sure a modified header gets re-precompiled (heh) if changed), and depending on project size, these can *massively* speed up compilation times.

Precompiled headers were thoroughly investigated, and unable to do an adequate job by themselves in this particular case. A lot of time (and money-- for hardware and distributed build software) was invested addressing the build time problem on that project. (Also, we only did this for the PS3/GCC build. On the 360/PC/MSVC platforms, the unity build idea was tried too but ultimately a more conventional build style worked better. In all cases the build times were horrible, the unity build was a desperate attempt to make it less so.)

Anyhow, I only brought it up to point out that alternative build solutions are not uncommon, and often very appropriate on specific platforms.

Jarhmander wrote:
Also, in rainwarrior's small C++ example, I'm much more concerned with the throw statement

I used "throw" there just to indicate conceptually that you have to deal with that problem in some way. I wasn't suggesting that you'd actually want to use a throw, or even to use that particular method for assigning an index. (In the OP's video, the guy just shrugged off a question about exceptions with "it didn't come up".)

Jarhmander wrote:
Those member accesses, this->index, are much more benign in comparison. While indirect access may hinder some optimizations, by itself it only costs on average one more CPU cycle, and one less byte, than an indexed access. `this` can already be in zero page, if parameters are passed via zero page.

It's not just one CPU cycle. You'd need to set Y differently for every different member (which also takes back that saved byte). The index suggestion lets you use striped arrays, which would share the same index value for every member. (Pretty similar to how I'd probably organize and use the data in assembly.)

It can also use X or Y, so it frees up Y to be used with a different pointer at the same time. I don't know if x86-to-6502 conversion would be up to this task, but it might be really interesting to see a real 6502 C++ optimizer try to automatically reorder accesses to be contiguous so it could mostly use INY / DEY to adjust Y.

Anyhow, the point wasn't to suggest this was universally better, just that there are ways you can manage indirect accesses, and use 6502-friendly data structures, even with dynamically used C++ classes. (It was also an argument against DRW's incessantly repeated premise that indirection is a problem. It's not necessarily a problem.)

Jarhmander wrote:
There are situations where these extra cycles aren't that critical, but now you can reuse more of your code and save space. Remember, folks, that Shiru always said that memory, not speed, is the main concern when coding NES games in C, and I expect this would be the case for C++ as well.

I'd say that's generally true for NES games once they get large enough, not just in C, but in assembly too. Performance is usually easier to budget than ROM or RAM if you're keeping an eye on it. There are still lots of cases in my large project where I've felt the need to trade space for speed. Size of everything is an issue, and code's not even the biggest fish there. Data space has been the much more important constraint as my game got large.

Depending on the needs of your project, the bottleneck really doesn't have to be ROM space. Part of the reason it's a problem with CC65 is that it can't very easily put C code in more than one bank, mostly due to its heavy dependency on CRT functions for generated code (and inability to have multiple copies of it at link time). If you can solve this problem, code space could be a much smaller issue. All of Shiru's games are NROM, aren't they? Even using banks for just data would open a lot of space for code in the main bank.

Since we're talking about GCC, the -Os (optimize for size) flag seems more appropriate than the -O3 (optimize for speed) used by the author in that video.


PostPosted: Thu Nov 10, 2016 6:11 am 
Formerly ~J-@D!~

Joined: Sun Mar 12, 2006 12:36 am
Posts: 445
Location: Rive nord de Montréal
Thanks for the answers regarding so-called "unity builds".

After a quick read on the net, it seems it can speed up builds by a large factor, but I also noticed that the blogs praising "unity builds" are quite old, over 5 years. In 2008, quad-core (or higher) CPUs weren't common in PCs yet, right? I don't think the "unity build" is faster anymore, because with 4 or more cores you can build translation units in parallel effectively; doing a "unity build" OTOH prevents parallelization.

Every time I hear people complaining that build time (for an iteration, not a full rebuild) is too long, I invariably think either they have way too much coupling between "modules" (at least, at the file level) or they fucked up their build systems real hard.

EDIT: I thought of a compiler extension that could be implemented in a C compiler to support split arrays of structures. I really do not miss the near and far keywords from back in the day (though they are still used in some C compilers targeting 8-bit microcontrollers), but they could be useful to convey an optimization to the compiler nevertheless.

A near pointer is 8 bits in size, so it can fit in an index register; an array of near objects has its members split, and this kind of array cannot hold more than 256 objects.

Code:
// file: object.h

// (include guard and #include's)

#define NUM_OBJECTS 8

typedef struct
{
    uint8_t x;
    uint8_t y;
    uint8_t type;
    uint8_t state;
} Object;

extern near Object objects[NUM_OBJECTS];

// functions
void Object_move(near Object *object, uint8_t delta_x, uint8_t delta_y);
...

// ------
// file : object.c

near Object objects[NUM_OBJECTS];

// functions
void Object_move(near Object *object, uint8_t delta_x, uint8_t delta_y)
{
    object->x += delta_x;
    object->y += delta_y;
}

...


A near pointer is absolutely impossible to convert back to a normal pointer, because the memory layout of the near structure is different.

The fun part is not the syntax, but the implementation. One way to do it is to have member access resolved by the linker; the offset cannot be known by the compiler because it depends on how many objects of this type exist in the final executable. Each near object would have to be output in special sections, one encoding its type for near objects and the other the layout of this type. The linker would ensure that for each section representing a type, offsets do not exceed 256 bytes. The limitation is that you could not create, e.g., 2 distinct arrays of near Object, each one having 256 objects, because of how the algorithm I described works, even though we all know we can do it easily in assembly.


PostPosted: Thu Nov 10, 2016 11:26 am 

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 5728
Location: Canada
Yeah I haven't used near and far since learning C++ on my 286 in the early 90s. I don't think I'd want to reuse the keyword to mean something different like that, but it would be nice if there was some sort of type modifier keyword that could be used to stripe a structure for arrays.

In C++ you can do a lot of stuff by passing by reference and overloading operators, like have a native-looking array class that can store its members internally in striped 8 bit arrays.

I dunno about viability of unity builds for compile performance today. I haven't tried to use one for that purpose since then. Though link time optimization is a lot stronger now, I think a single translation unit approach can still result in a runtime performance gain, and it might very well be applicable to an NES target.


PostPosted: Fri Nov 11, 2016 6:52 pm 
Formerly ~J-@D!~

Joined: Sun Mar 12, 2006 12:36 am
Posts: 445
Location: Rive nord de Montréal
rainwarrior wrote:
Yeah I haven't used near and far since learning C++ on my 286 in the early 90s. I don't think I'd want to reuse the keyword to mean something different like that, but it would be nice if there was some sort of type modifier keyword that could be used to stripe a structure for arrays.

Yeah, I do not really miss these keywords, buuuuut they could be useful, and they are not random keywords: (some) programmers are familiar with them and know what they mean. And the meaning is technically not different: it is merely an optimization to reduce the size of the pointers, as well as potentially reduce the size of the code dealing with them. I understand it might be more "orthodox" to define a near pointer as a pointer into zero page, but that's not nearly as useful.

I'm just not sure yet whether doing this would break the type system or be non-conformant in some way (save for the fact that it's a non-standard keyword and thus has no standard-defined semantics).

rainwarrior wrote:
In C++ you can do a lot of stuff by passing by reference and overloading operators, like have a native-looking array class that can store its members internally in striped 8 bit arrays.


I can't see how it could be done:
  • operator. cannot be overloaded;
  • using operator-> won't work, because that operator must return an actual pointer to a struct/class or an object with operator-> overloaded. This doesn't give you a chance to trick the address generated for the member;
  • even if you did manage to generate an address to get that member, you'll be in trouble with multi-byte members, because C++ will naturally assume that all bytes of the accessed member are contiguous (as with any other object in any storage), so you'll be forced to generate a fancy proxy object that converts into the underlying type and implements operator= to write the rhs back into the split array. I can't figure out how the compiler would generate efficient code for this.

But I would like to hear what you had in mind; I may have missed other options that could make it possible, and I can't realistically prove that it cannot be done.


PostPosted: Fri Nov 11, 2016 7:19 pm 

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 5728
Location: Canada
Hmm, I meant you can make a class that "feels" like a 16 or 32 bit integer array but internally stores its data to 8 bit stripes via overloaded operators (operator () and = but also ++, +=, or any other modifying ones too). I've definitely done this before, but honestly it's not a solution I can really rattle off the top of my head (a lot of tricky little ramifications to remember and deal with). I'd have to do some digging to find a practical working implementation.

I wasn't really thinking about doing it for all members of a structure, but maybe you could make a structure out of such classes? I'm not sure.

