It is currently Sat Nov 17, 2018 4:42 am

All times are UTC - 7 hours





Posted: Sat Sep 08, 2018 5:36 pm

Joined: Thu May 19, 2005 11:30 am
Posts: 694
In C (and C++), you can define structures with arbitrary data types inside, and you can even define structures with packed bitfields. For example, you could do something like this:
Code:
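union {
   struct {
      unsigned grayscale: 1;              // intended bit 0 (0x01)
      unsigned leftBorderBackground: 1;   // intended bit 1 (0x02)
      unsigned leftBorderSprites: 1;      // intended bit 2 (0x04)
      unsigned showBackground: 1;         // intended bit 3 (0x08)
      unsigned showSprites: 1;            // intended bit 4 (0x10)
      unsigned emphasizeRed: 1;           // intended bit 5 (0x20)
      unsigned emphasizeGreen: 1;         // intended bit 6 (0x40)
      unsigned emphasizeBlue: 1;          // intended bit 7 (0x80)
   };
   uint8_t data;
} PPUMask;

void PPU::writeHandler (uint16_t address, uint8_t value) {
   switch (address & 0x0007) {
      case 1: PPUMask.data = value; break;   // $2001: PPUMASK
   }
}

void PPU::draw (void) {
   // ...
   if (PPUMask.showBackground) {
      // ...
   }
}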
By naming bitfields while also being able to access the entire PPUMask structure at once, one can write very flexible and readable code. But every C person will tell you that this is bad, because the C standard lets compilers pad structures and fill up bitfields however they want, and for the sake of portability you should not assume anything about bit ordering, byte order and so on. So why not add keywords to the C standard that let me specify what kind of bit or byte ordering I expect, if I expect anything, and whether I can accept structure padding in any given case, and let the compiler do the additional work? Something like this stylized example:
Code:
struct IFFHeader {
   char      ID[4];
   be_uint32_t   length;   // big endian chunk size
} binary; // don't pad anything

union {
   struct {
      unsigned grayscale: 1;
      unsigned leftBorderBackground: 1;
      unsigned leftBorderSprites: 1;
      unsigned showBackground: 1;
      unsigned showSprites: 1;
      unsigned emphasizeRed: 1;
      unsigned emphasizeGreen: 1;
      unsigned emphasizeBlue: 1;
   } binary lsbfirst; // bits are packed and specified from 0x01s to 0x80s
   uint8_t data;
} PPUMask;
But no. If you look at stackoverflow.com and similar sites, as well as most "portable" code that processes binary data in any form, you are expected to do something horrible like this in the name of portability:
Code:
#define PPUMASK_SHOW_BG 0x08

if (PPUMask & PPUMASK_SHOW_BG) {
   // ...
}
// read big-endian chunk size (cast first so the shifts happen in 32-bit unsigned arithmetic)
uint32_t iffLength = ((uint32_t)chunkHeader[4] << 24) | ((uint32_t)chunkHeader[5] << 16) | ((uint32_t)chunkHeader[6] << 8) | chunkHeader[7];
Basically, when processing binary data, you're supposed to eschew high-level structures and do everything by hand in the name of portability: masking and shifting bits from packed bytes, retrieving and putting together multibyte fields from binary structures, and so on, with all the potential for additional error that this hard-to-read code brings. I would rather tell the compiler what I want and let it do the work, including any optimizations, as needed on the particular platform. And most C/C++ programmers, and certainly the people doing the standard seem to be perfectly fine with it.
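The hand-written approach described above does not have to be repeated at every call site. A minimal sketch, assuming hypothetical helper names (`readBE32`, `testFlag`) that do not come from any post in this thread, wraps the shifts and masks once:

```cpp
#include <cstdint>

// Hypothetical constant for illustration; 0x08 is the mask used in the post above.
constexpr std::uint8_t PPUMASK_SHOW_BG = 0x08;

// Assemble a big-endian 32-bit value from four bytes, independent of host byte order.
inline std::uint32_t readBE32(const std::uint8_t* p) {
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}

// Test a single flag in a packed register byte.
inline bool testFlag(std::uint8_t reg, std::uint8_t mask) {
    return (reg & mask) != 0;
}
```

With these, the chunk length read becomes `readBE32(chunkHeader + 4)`. The bit positions still have to be spelled out by hand, which is exactly the complaint, but only in one place.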


Posted: Sat Sep 08, 2018 6:13 pm

Joined: Sun Sep 19, 2004 9:28 pm
Posts: 3686
Location: Mountain View, CA
Myria might have a field day with this one. ;-) (reference)


Posted: Sat Sep 08, 2018 6:16 pm

Joined: Fri May 08, 2015 7:17 pm
Posts: 2333
Location: DIGDUG
Interesting side note.

The wikipedia page on bit fields has NES specific example code.

https://en.m.wikipedia.org/wiki/Bit_field

_________________
nesdoug.com -- blog/tutorial on programming for the NES


Posted: Sat Sep 08, 2018 6:57 pm

Joined: Sat May 04, 2013 6:44 am
Posts: 33
I use bitfields all over the place in my emulator, in a way very similar to yours. I use them in the core CPU and PPU emulation, I use them in mapper registers, I use them in iNES and FDS header parsers.

That advice about non-portable or compiler-specific behavior? Yeah, the standard might say that, but in practice, which compilers are you going to use? My emulator works on Windows, macOS, and Linux, using MSVC, clang/llvm, and gcc respectively, and the bitfields behave exactly the same way on those three compilers. What more do you want?

Bitfields aren't perfect, however. You have to pay close attention to bit alignment within bytes, byte alignment within words, endianness, and more. My bitfields look like this:
Code:
   enum NametableSource : uint8_t {
      CIRAM,
      CHRROM
   };

   union VRC6PpuBankingStyle {
      VRC6PpuBankingStyle(uint8_t val) : value(val) {}
      struct {
#if __BYTE_ORDER == __LITTLE_ENDIAN
         uint8_t ppuBankingMode : 2;
         uint8_t mirroring : 2;
         NametableSource nametableSource : 1;
         uint8_t chrA10Rule : 1;
         uint8_t unused : 1;
         uint8_t prgRamEnable : 1;
#else // __BIG_ENDIAN
         uint8_t prgRamEnable : 1;
         uint8_t unused : 1;
         uint8_t chrA10Rule : 1;
         NametableSource nametableSource : 1;
         uint8_t mirroring : 2;
         uint8_t ppuBankingMode : 2;
#endif
      };
      uint8_t value;
   };

All of my bitfields are endian-safe by repeating themselves in reverse order for the other endianness mode. Boost.Endian provides the compile-time endianness detection. You can see I use a typesafe enum inside of this bitfield, and I declare the underlying integer data type with my enums, which might be necessary to use them the way I do.

I often deal with 16-bit fields like this:

Code:
#pragma pack(push, 1)
struct DiskInfo
{
   uint8_t BlockCode;
   char DiskVerification[14];
   // ... snip ...
private:
   uint8_t diskWriterSerialNumberLo;
   uint8_t diskWriterSerialNumberHi;
   // ... snip ...
public:
   uint16_t GetDiskWriterSerialNumber() const { return diskWriterSerialNumberLo | (uint16_t(diskWriterSerialNumberHi) << 8); }
};
#pragma pack(pop)

Doing it this way means I don't even have to think about byte alignment / word alignment within the structure - I can just pluck a pointer out of the middle of an arbitrary buffer, cast it to the type I want, and get 16-bit integers out, regardless of the architecture or endianness. It's not like this is performance sensitive code.
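One way to guard the packed-struct approach is to let the compiler confirm the layout. The sketch below uses a hypothetical header (`PackedHeader`, not the real FDS disk info block) and assumes a compiler that supports `#pragma pack`, as MSVC, GCC and Clang do:

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
struct PackedHeader {            // hypothetical layout for illustration
    std::uint8_t blockCode;
    char         signature[14];
    std::uint8_t serialLo;       // 16-bit value stored low byte first
    std::uint8_t serialHi;

    // Combine the two bytes explicitly, so host endianness never matters.
    std::uint16_t serial() const {
        return std::uint16_t(serialLo | (std::uint16_t(serialHi) << 8));
    }
};
#pragma pack(pop)

// Compilation fails if the compiler inserted padding anywhere.
static_assert(sizeof(PackedHeader) == 17, "unexpected padding in PackedHeader");
static_assert(offsetof(PackedHeader, serialLo) == 15, "unexpected field offset");
```

Because every member is a single byte or a byte array, the struct has no alignment requirement of its own, which is what makes overlaying it on an arbitrary buffer offset workable.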


Posted: Sat Sep 08, 2018 7:06 pm

Joined: Thu Mar 31, 2016 11:15 am
Posts: 419
Many years ago I wrote some code which would handle serialization automatically, but you had to write structs inside a macro. It looked like this:
Code:
struct header_t
{
    SERIALIZED_POD
    (
        ((std::uint16_t, op))
        ((std::uint32_t, length))
    )
};

So it is possible to solve the problem generically, it just requires a library.

@LightStruk
Why do you have to swap the bits with different endians? Shouldn't you only be swapping the bytes?


Posted: Sat Sep 08, 2018 7:15 pm

Joined: Tue Jun 24, 2008 8:38 pm
Posts: 2047
Location: Fukuoka, Japan
So to allow portability you just destroy readability... :? That doesn't seem like a good compromise. I feel your pain. I would react the same way.


Posted: Sun Sep 09, 2018 10:37 am

Joined: Sun Sep 19, 2004 11:12 pm
Posts: 20775
Location: NE Indiana, USA (NTSC)
dougeff wrote:
The wikipedia page on bit fields has NES specific example code.

The diff that added it is mine. The previous bit order didn't match any real-world bit order, and I thought reflecting real-world use would make the example more meaningful to readers.


Posted: Sun Sep 09, 2018 1:28 pm

Joined: Sat May 04, 2013 6:44 am
Posts: 33
pubby wrote:
Why do you have to swap the bits with different endians? Shouldn't you only be swapping the bytes?
I think the compiler developers could have decided to keep bit-order within bytes the same between little and big endian code, but they decided instead to make the bit-order behave like the byte-order. In other words, for little endian, bytes and bits are declared from low to high within a given integer type, even though high-to-low bit order is a lot more legible. Since byte order works like this:
Code:
union shortword {
  struct {
#if __BYTE_ORDER == __LITTLE_ENDIAN
    uint8_t byteLow;
    uint8_t byteHigh;
#else // __BIG_ENDIAN
    uint8_t byteHigh;
    uint8_t byteLow;
#endif
  };
  uint16_t value;
};
bit order has to work like this:
Code:
union nibbles {
  struct {
#if __BYTE_ORDER == __LITTLE_ENDIAN
    uint8_t nibbleLow:4;
    uint8_t nibbleHigh:4;
#else // __BIG_ENDIAN
    uint8_t nibbleHigh:4;
    uint8_t nibbleLow:4;
#endif
  };
  uint8_t value;
};


Posted: Sun Sep 09, 2018 2:37 pm

Joined: Thu May 19, 2005 11:30 am
Posts: 694
koitsu wrote:
Myria might have a field day with this one. ;-) (reference)
Oh, I have absolutely no doubt that the sadists on that C standard committee are always keeping a "gotcha" whip ready for everybody about something or other.

LightStruk wrote:
That advice about non-portable or compiler-specific behavior? Yeah, the standard might say that, but in practice, which compilers are you going to use? My emulator works on Windows, macOS, and Linux, using MSVC, clang/llvm, and gcc respectively, and the bitfields behave exactly the same way on those three compilers. What more do you want?
Supposedly, it will fail on a Solaris machine or some other big-endian crap platform, because apparently, GCC packs bitfields backwards for these target platforms.

But before I torture myself with code from hell like this:
Code:
int PRGMask = ((0x3F | (Reg[1] &0x40) | ((Reg[1] &0x20) <<2)) ^ ((Reg[0] &0x40) >>2)) ^ ((Reg[1] &0x80) >>2);
... I have decided to become a portability refusenik and to embrace bit fields whole-heartedly, devil-may-care. In the unlikely event that some poor soul has to adapt my code, reversing the lines of a bit field definition will be easier than trying to understand bit shifts from hell such as the above.
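For what it's worth, an expression like the one above can be split into named steps without changing its result. A sketch, assuming a hypothetical function name `prgMask` and made-up step names, since the thread does not say what the individual register bits mean:

```cpp
#include <cstdint>

// Same arithmetic as the one-liner above, split into named steps.
// The names are placeholders for illustration only.
inline int prgMask(const std::uint8_t Reg[2]) {
    int base      = 0x3F | (Reg[1] & 0x40) | ((Reg[1] & 0x20) << 2);
    int flipFrom0 = (Reg[0] & 0x40) >> 2;   // Reg[0] bit 6 moved down to bit 4
    int flipFrom1 = (Reg[1] & 0x80) >> 2;   // Reg[1] bit 7 moved down to bit 5
    return (base ^ flipFrom0) ^ flipFrom1;
}
```

This does not remove the shifts, but it gives each one a line and a comment, which may be a tolerable middle ground where bit fields are not an option.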

Also, there apparently is Boost.Endian, which allows specifying a "be_uint32_t/le_uint32_t" instead of just an "uint32_t" with automatic implicit conversion where necessary. Of course, installing boost can be a major pain. "The plan is to submit Boost.Endian to the C++ standards committee for possible inclusion in a Technical Specification or the C++ standard itself." Wow! 44 years (that quote is from 2016) after the introduction of the C language in 1972, and 27 years after its standardization in 1989, somebody has the bright idea that C itself might benefit from the ability to specify endianness for a variable! What other wonders will they think of next?
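To make the idea concrete without pulling in Boost, here is a minimal hand-rolled sketch of such a type. `BigEndianU32` and `IFFChunkHeader` are hypothetical names, and this is not Boost.Endian's actual implementation, only an illustration of storing bytes in a fixed order and converting implicitly:

```cpp
#include <cstdint>

// Hypothetical stand-in for an endian-aware integer type.
class BigEndianU32 {
    std::uint8_t bytes_[4];   // always stored most significant byte first
public:
    BigEndianU32(std::uint32_t v = 0) {
        bytes_[0] = std::uint8_t(v >> 24);
        bytes_[1] = std::uint8_t(v >> 16);
        bytes_[2] = std::uint8_t(v >> 8);
        bytes_[3] = std::uint8_t(v);
    }
    // Implicit conversion back to a native integer.
    operator std::uint32_t() const {
        return (std::uint32_t(bytes_[0]) << 24) | (std::uint32_t(bytes_[1]) << 16) |
               (std::uint32_t(bytes_[2]) << 8)  |  std::uint32_t(bytes_[3]);
    }
};

struct IFFChunkHeader {
    char         id[4];
    BigEndianU32 length;   // big-endian in memory and on disk, native when read
};

static_assert(sizeof(IFFChunkHeader) == 8, "no padding expected: all members are byte arrays");
```

The storage order is stated once, in the type, and every later `header.length` access reads like an ordinary integer.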


Posted: Sun Sep 09, 2018 2:56 pm

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 6955
Location: Canada
Banshaku wrote:
So to allow portability you just destroy readability... :? That doesn't seem like a good compromise. I feel your pain. I would react the same way.

There are plenty of "readable" ways to make bitfields.

Most commonly I've seen macros used for it. Some people have general opposition to macros. YMMV.

You can also make bitfields with template classes. std::bitset is one such implementation.
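As one possible shape for the template approach, the sketch below defines a hypothetical `BitField<Pos, Width>` helper over a single byte. The names and the chosen bit positions are assumptions for illustration, not taken from any library:

```cpp
#include <cstdint>

// Hypothetical helper: a view of `Width` bits starting at bit `Pos` of a byte.
template <unsigned Pos, unsigned Width>
struct BitField {
    static_assert(Pos + Width <= 8, "field must fit in one byte");
    static constexpr std::uint8_t mask = std::uint8_t(((1u << Width) - 1u) << Pos);

    // Extract the field value from a register byte.
    static std::uint8_t get(std::uint8_t reg) {
        return std::uint8_t((reg & mask) >> Pos);
    }
    // Return a copy of the register byte with the field replaced by v.
    static std::uint8_t set(std::uint8_t reg, std::uint8_t v) {
        return std::uint8_t((reg & ~mask) | ((v << Pos) & mask));
    }
};

// Positions are stated once, here, and mean the same on every compiler.
using ShowBackground = BitField<3, 1>;
using EmphasisBits   = BitField<5, 3>;
```

Usage looks like `ShowBackground::get(ppuMask)`, so the packing is explicit and does not depend on how a compiler allocates bit fields.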

The bitfield part of the language spec is also usable for lots of purposes. Like you've hit on before, C compilers have always been allowed to pad structures arbitrarily. Most compilers have a #pragma pack extension, but I don't think C bitfields have really gotten that kind of treatment, partly because they aren't very widely used.

And yes, Banshaku, you can write explicit shift / and / or / complement / etc. everywhere too. It has a disadvantage of verbosity and maintenance, but I would also say that approach has an advantage of being very explicit about what is happening.


Every approach has compromises. The reason the C language feature isn't widely used is just that in most cases where you really want a bitfield, you care specifically how those bits are packed. Its primary use case is more or less as an optional compiler optimization of space. There's just no way to specify even "I want these 8 bits to fit into a byte", which kinda defeats the usual point of bitfields.

I don't think portability is really the biggest problem here, it's just that most implementations will not create a bitfield structure that looks the way you want it to be packed in a lot of cases.

Same deal with std::bitset: since the actual implementation is hidden by the library, it's more or less a compiler-optional space optimization only. (...but std::bitset at least works pretty well in practice when e.g. you want to store a million flag variables.)


So... these are optional memory usage optimizations. In a lot of cases memory isn't so constrained. In cases where it is very constrained, you may very well need to "roll your own" instead of hoping the compiler will do what you want with those bits... and that's the reason most people do.


Posted: Sun Sep 09, 2018 9:05 pm

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 6955
Location: Canada
NewRisingSun wrote:
Wow! 44 years (that quote is from 2016) after the introduction of the C language in 1972, and 27 years after its standardization in 1989, somebody has the bright idea that C itself might benefit from the ability to specify endianness for a variable! What other wonders will they think of next?

For portability's sake, the one thing that's really been missing is a compile-time way to know the native endianness of the target system, and yeah, that's still not in. It's coming in C++20 as std::endian.

The solution to that one is generally just to add a line or two to some header whenever a new kind of target platform is added. There's some compiler defines (e.g. GCC's __BYTE_ORDER__) that can cover a whole compiler family at once. When your product is a library that's supposed to build in any compiler out there, like Boost, this solution gets a little bit cumbersome, but aside from universal library projects I think most applications can get all their targets covered in a handful of #define lines? (Or already depend on a library that has done it for them.) It's a problem, but not a very hard one.
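The "line or two in some header" can look roughly like the sketch below. The macro name `HOST_LITTLE_ENDIAN` and the function name are made up for this example, and the MSVC branch rests on the assumption that every current MSVC target is little-endian:

```cpp
#include <cstdint>
#include <cstring>

#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
  // GCC and Clang predefine these macros.
  #define HOST_LITTLE_ENDIAN (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#elif defined(_MSC_VER)
  // Assumption: all current MSVC targets are little-endian.
  #define HOST_LITTLE_ENDIAN 1
#else
  #error "Add a line for this compiler/platform"
#endif

// Runtime cross-check, usable in a unit test or at startup.
inline bool hostIsLittleEndianAtRuntime() {
    const std::uint16_t probe = 1;
    std::uint8_t first = 0;
    std::memcpy(&first, &probe, 1);   // look at the lowest-addressed byte
    return first == 1;
}
```

The `#error` branch is the part that has to grow whenever a new kind of target is added.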

Though, ironically, finally adding it to C++20 won't help cases like Boost which will still have a mandate to support every compiler under the sun. ;P

As far as Boost's endian types, I dunno. It's one way to solve a particular problem, but there are other very trivial solutions for the same thing. I could take it or leave it, don't really think it needs to be part of the spec, but wouldn't kick it out of bed I suppose. This doesn't seem a glaring omission to me at all.


Posted: Mon Sep 10, 2018 12:44 am

Joined: Thu May 19, 2005 11:30 am
Posts: 694
Everybody seems to come up with insular solutions to cope with the problem of compilers doing whatever they want, because the standard allows it. Some of these solutions may well be elegant and workable, and of course, they have been used for decades. But they are still workarounds, hacks, kludges, coping strategies.

My point is, one should not have to. One should not have to use workarounds to cope with a compiler doing whatever it wants. One should have the ability to tell the compiler to do what it should do, and when necessary how it should do it, and in a way that every standards-conforming compiler understands. And that's a fundamental, almost philosophical, premise when writing or updating a standard, one that seems to be missing, and whose lack will not be remedied by submitting any particular workaround for standardization.


Posted: Mon Sep 10, 2018 1:55 am

Joined: Sun Jan 22, 2012 12:03 pm
Posts: 6955
Location: Canada
NewRisingSun wrote:
And most C/C++ programmers, and certainly the people doing the standard seem to be perfectly fine with it.

I mean, I can agree that it would be nice if there was a way to specify a well defined data structure in C or C++.

The reality of it is that you can't though. If you're interested in ways to practically do it, there are many I could suggest, but by your response it sounds like you don't want to discuss that.

Am I "perfectly fine" with it? I guess. I try to accept the things I can't change. Joining an ISO national committee and working on the standard is an option, I suppose, but that's not a casual engagement. I'd probably rather spend my time writing code than arguing with people about how it could have been written. (Even less so with people with no capacity to change it.)

I wouldn't assume that "the people doing the standard" are perfectly fine with it either. I'm sure this particular issue has been debated many times, with probably a great many people unhappy that it hasn't reached a solution.


Also, just to point out another way the existing C/C++ bitfield feature is unuseful, try making that union in your first post and taking the size of it. (This is kind of what I meant by portability not being the biggest problem. It's not that some compilers might do a different thing, it's that you probably won't find any compiler for a given platform that does what you want here.) They're pretty good at saving some data space if used carefully, but that's about it.
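The size experiment can be tried with two small unions. `WideMask` and `NarrowMask` are hypothetical names, the structs are named (`bits`) to avoid relying on anonymous structs, and the field list is shortened for brevity:

```cpp
#include <cstdint>

// Same shape as the union in the first post: bit-fields declared as `unsigned`.
union WideMask {
    struct {
        unsigned grayscale : 1;
        unsigned showBackground : 1;
    } bits;
    std::uint8_t data;
};

// Variant with bit-fields declared as uint8_t
// (standard in C++, implementation-defined in C).
union NarrowMask {
    struct {
        std::uint8_t grayscale : 1;
        std::uint8_t showBackground : 1;
    } bits;
    std::uint8_t data;
};
```

On the mainstream desktop compilers, `sizeof(WideMask)` typically comes out as the size of `unsigned` (4 bytes), while `sizeof(NarrowMask)` is 1. Neither value is guaranteed by the standard.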

As another thought, though, is there a language that implements bitfields in a better way? Not meant as an argument that C/C++ should not improve, I'm just curious if any do.


Posted: Mon Sep 10, 2018 2:25 am

Joined: Thu May 19, 2005 11:30 am
Posts: 694
Well, this is more of a rant thread than a practical solution thread. :)

I do think however that the problem of well-defined data structures has in fact not been adequately considered by the standards bodies, rather than having been considered and reached a decision to keep things as they are. And I think that is exactly because people are too willing to help themselves with practical yet insular solutions, so it is not seen as something in need of being addressed. That is what I meant by people liking it that way. We shall see what becomes of that Boost.Endian proposal, if there is one.

The union size is of course that of an unsigned int (typically 32 bits, even on 64-bit platforms), but that does not make it useless. I use constructs like the quoted one solely for the purpose of accessing individual hardware register bits or bit groups in an extremely readable manner, without having to clutter up my code by explicitly writing out ANDs and ORs, or CheckBit/ClearBit/SetBit macros/helper functions, all the time. It is quite useful for that, and works well enough when I restrict myself to compilers targeting little-endian platforms.

Edit: I've thought some more about why exactly I so dislike bit-setting macros, std::bitset templates, let alone manually ANDing and ORing all the time: it basically requires me to explicitly specify the storage details every time I access a bitfield member, instead of just once when I declare the bitfield. The equivalent with normal variables would be never being able to declare variables as floats, ints of various sizes or strings, but instead having to declare everything as void or uint8_t, and then having to cast every single time I access the variable. Nobody would want to do that for normal variables, yet that is exactly what all "portable" solutions of accessing well-defined data structures amount to, regardless of the amount of syntactic sugar that they use. C++ bit fields, and Boost.Endian arithmetic, are the only solutions that do not suffer from that drawback.


Posted: Tue Sep 11, 2018 10:32 am

Joined: Mon Jan 03, 2005 10:36 am
Posts: 3137
Location: Tampere, Finland
Just say "fuck it", and say that your code is only supported on compilers/platforms where bitfields are laid out in the order you expect. You can add compile-time or runtime asserts to make sure that your expectations hold true.

It's perfectly reasonable to not support every compiler (and platform) in existence.
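A sketch of what such asserts might look like, using a hypothetical `MaskBits` struct. The size can be checked at compile time; the bit order is checked here at runtime by inspecting the object representation with `memcpy`:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct MaskBits {                 // hypothetical register layout, low bit first
    std::uint8_t grayscale      : 1;
    std::uint8_t leftBackground : 1;
    std::uint8_t leftSprites    : 1;
    std::uint8_t showBackground : 1;
    std::uint8_t showSprites    : 1;
    std::uint8_t emphasis       : 3;
};

static_assert(sizeof(MaskBits) == 1, "bit-fields were not packed into one byte");

// Returns true if the compiler placed showBackground at mask 0x08.
inline bool maskLayoutIsAsExpected() {
    MaskBits bits{};
    bits.showBackground = 1;
    std::uint8_t raw = 0;
    std::memcpy(&raw, &bits, 1);
    return raw == 0x08;
}

// Call once at startup; aborts in builds with assertions enabled if the layout differs.
inline void checkBitfieldLayout() {
    assert(maskLayoutIsAsExpected());
}
```

A failing check turns a silent misread of every register into an immediate, obvious error on the unsupported platform.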

_________________
Download STREEMERZ for NES from fauxgame.com! — Some other stuff I've done: fo.aspekt.fi

