C++ Bitmasks – The Mess

Those who have gotten into extended conversations with me about coding have undoubtedly heard me quote Larry Wall’s Three Virtues of a great programmer.  I find the three rules both hilarious, and littered with truth.  I’m not quite sure where frustration with a task not being simple and clear enough falls into the three, probably somewhere between laziness and impatience.  Either way, I was reminded of these virtues when trying to find the perfect solution to the bitmask problem.  I want to talk about the different methods that I have seen for handling named bitmasks in C++ and if I was able to find that perfect solution.

First I want to mentions some of the goals of bitmasks, when we are working with them.  Some of these are required (*), and others are simply things that would be nice and make our lives easier.  (The numbering is not for priority, but so that I can reference them later in the post)

  1. *Store a series of boolean values as flags in a defined type
  2. *Values are a limited set of possible values that is known at compile time.
    • There are bitmasks that allow new flag values to be added to the set of possible flags at runtime, that is not part of this post.
  3. *Very small memory footprint to store
  4. *Very Fast to set/clear/check any or all flags
  5. Be able to reference individual flags by name (preferably type-checked or with some assurance that the name is valid before runtime)
  6. Simple, and clear interface (Do we know FOR SURE that this is a mask?  Can we guarantee how it is being used?)
  7. Minimal work to define a mask type (Are we required to do any special things to make this method work?)
  8. Able to be scoped (within a namespace/class/struct)

A Couple Immediate Bad Ideas

I often find it helpful to think about the worst ways to solve a problem, or at least talk about them when trying to find solutions.  So here’s a few methods that might be some people’s first thoughts on achieving the goals above, and why they don’t really meet our requirements.  These might be blatantly obvious to most of you, but I enjoy listing them anyways.

std::map or std::unordered_map

These solutions are (per the name of the section) an immediate a bad idea, because they violate #3.  The smallest value in the key-value-pair that could be stored to achieve our goal is a boolean type.  Boolean is 1 byte on most platforms, meaning that we are leaving 7/8 bits unused for each boolean, thus greatly increasing our memory footprint (let alone the overhead of the maps).

string type keys

Talking about maps brings up the fact that whatever method we choose will undoubtedly involve mapping some key (our flag name) to some value (is the flag set or not).  We would like to reference these keys by name (#5), so a string might be the first thing that some people choose.  This method is a poor choice when thinking about #4.  c-string or std::string comparisons are expensive operations, often being O(n) where n is the length of the strings.

We can optimize this comparison time out by using some form of hashed string.  However that does not hide the fact that strings are also violating another rule, #2.  Strings, regardless of how they are stored can (theoretically) hold an infinite number of possible values, and we only want a small set of possible values for our bitmasks, so that is off the table.

#define BitMasks

Alright, now let’s get to real solutions.  The first possible solution is using #defines to store the possible values and a typedef to store the mask itself:

#include <cstdint>

typedef uint8_t ImportantMask;
#define IMPORTANTMASK_ONE 1u << 0
#define IMPORTANTMASK_TWO 1u << 1
#define IMPORTANTMASK_THREE 1u << 2

void TestImportantMask()
{
    //initialize to the one and three flags
    ImportantMask mask = IMPORTANTMASK_ONE | IMPORTANTMASK_THREE;

    //set the TWO flag
    mask |= IMPORTANTMASK_TWO;

    //clear the ONE flag
    mask = mask & ~IMPORTANTMASK_ONE;

    //check the THREE flag
    bool threeIsSet = mask & IMPORTANTMASK_THREE;
}

This solution is actually pretty good.

Positives:

  • 2 We #define only the values we consider to be valid.
  • 3 We store our mask in a single byte (or larger if need be, simply change the typedef).
  • 4 Bit operators to modify flags are fast.
  • 5 We also have direct references to the set of valid values by name.
  • 6 We can clearly see when someone is using this as a bitmask, as they should be using our typedef.
  • 7 a bit unclear for the first definition, but straightforward to create after that

Negatives:

  • 8 #defines cannot be scoped
  • No type checking for mask keys.  Any integer could be dropped into this type and screw it up.
  • #defines are not immediately apparent what possible keys are stored or how to get to them without good documentation.

All that said, I don’t like this method, mostly for the last point listed.  If someone were to pickup my API and try to use this mask, they would be FORCED to read either documentation or source code to understand what possible values could be put in.  The scoping is also a serious issue, as polluting the global namespace is always very ugly.

SIDE NOTE: I could write an entire other section here about const member keys or const namespace member keys, however they have very similar issues to the #define, with the exception of the scoping issues.  However I don’t like them, because it is really unclear if values created in the manner are supposed to be used together, or for what purpose.  They also do not define a type, as our next candidate does, which also puts them on a lower bracket.

Enum Bitmasks

C (and thus naturally C++) includes enumerations as a built in type.  Under the hood, these are treated as unsigned integers, but they have the FANTASTIC quality of being type checked, scopable, and being able to reference the valid values by name in the code.  This makes them great for our purposes, as they essentially guarantee 2, 5, and 8 right out of the gate.  Unfortunately, they are not without issues.  The most noteworthy being the fact that any integer type can be cast to the enum, and placed inside, regardless of value (even if not within the set of possible elements).  However their naming clarity makes them fantastic for our desires for desires for convenient naming of the set of valid elements.

Enums as Bitmasks themselves

Since enums are stored as unsigned integers under-the-hood, we can simply treat them as a bitmask, as long as the values for each possible entry are unique bits.  We accomplish this by shifting each entry to a different bit.  I also want to preface this section with the fact that many do not think this method is appropriate for C++, as it is not always safe across compilers.

enum Test
{
    One = 1 << 0,
    Two = 1 << 1,
    Three = 1 << 2,
};

void TestClassicMask()
{
    //initialize to the one and three flags
    Test mask = static_cast<Test>(Test::One | Test::Three);

    //set the TWO flag
    mask = static_cast<Test>(mask | Test::Two);

    //clear the ONE flag
    mask = static_cast<Test>(mask & ~Test::One);

    //check the THREE flag
    bool threeIsSet = mask & Test::Three;
}

You will notice that we have to constantly do conversions here, because the bitwise operators (|, &, ^) return integer types, not our enum type (even though we know that our enum is technically an integer).  Luckily, if you are on windows, there is a define called DEFINE_ENUM_FLAG_OPERATORS that defines all these operators for a given enum type.  If you aren’t on Windows, I found a forum post with a version of the define in full or you can look it up directly in the Windows.h header file.

#include <Windows.h>

enum Test
{
    One = 1 << 0,
    Two = 1 << 1,
    Three = 1 << 2,
};
DEFINE_ENUM_FLAG_OPERATORS(Test);

void TestClassicMask()
{
    //initialize to the one and three flags
    Test mask = Test::One | Test::Three;

    //set the TWO flag
    mask |= Test::Two;

    //clear the ONE flag
    mask &= ~Test::One;

    //check the THREE flag
    bool threeIsSet = mask & Test::Three;
}

Positives:

  • 2 Enums have a limited set of possible values
  • 3 Enums are stored as unsigned integers
  • 4 Bit operators are fast
  • 5 enum entries are named, and type checked
  • 6 operations are very clear what is happening
  • 7 Each entry must be a unique bit, and the #define must be called to generate operators (both pretty manageable)
  • 8 yay scoping!

Negatives:

  • 6 There is no way to tell if we are storing a single value or a mask of values, unless the name of the variable says so.
  • Not guaranteed to be stored as the same integer type under-the-hood across compilers
  • Offsetting each element into a new bit is awkward (but not complicated)

Some would consider my first negative actually a positive, as the type is used solely for storing the potential uses of the enum, either a single value or an enum.  However, it would be nice to know when this is a mask, and when it is a single enum value.  That is just good type-safety, so it is a negative in this instance.

We can actually work around the first 2 negatives pretty easily by using a typedef for our mask type.  Doing this would also REMOVE the need for DEFINE_ENUM_FLAG_OPERATORS, as we would no longer be doing the operations on the enum values themselves, but instead on their underlying integer values.  This does return the previous type-checking issue, however.

#include <Windows.h>
#include <cstdint>

enum Test
{
    One = 1 << 0,
    Two = 1 << 1,
    Three = 1 << 2, 
}; 

typedef uint8_t TestMask; 

void TestClassicMaskWithTypedef() 
{
    //initialize to the one and three flags 
    TestMask mask = Test::One | Test::Three; 

    //set the TWO flag 
    mask |= Test::Two; 

    //clear the ONE flag 
    mask &= ~Test::One; 

    //check the THREE flag 
    bool threeIsSet = (mask & Test::Three) > 0;
}

New Negatives:

  • 7 Offsetting each element into a new bit is awkward (but not complicated)
  • 7 typedef of mask must have enough bits to contain all possible elements of the enum (very easy to change when altering the elements).
  • No type-checking of values put into mask.

Overall, these methods are some of the best and most commonly used.  They have negatives, but are simple enough to work around, with a minimal understanding of how they are defined.

std::bitset with enums

The STL includes a class called std::bitset, that is used to create bitmasks of defined sizes at compile time.  I am already a big fan of this class, so lets see how well it works with enums:

enum Test
{
    One,    //0
    Two,    //1
    Three,  //2
};

void TestBitSet()
{
    //initialize to the one and three flags
    std::bitset<3> mask;
    mask.set(Test::One);
    mask.set(Test::Three);

    //set the TWO flag
    mask.set(Test::Two);

    //clear the ONE flag
    mask.reset(Test::One);

    //check the THREE flag
    bool threeIsSet = mask.test(Test::Three);
}

Positives:

  • 2 Enums contain valid possible values
  • 3 bitsets are just integers as bitmasks underneath
  • 4 operations are bitwise operators underneath
  • 5 Enums are the names of values
  • 6 literally functions to modify state of mask, rather than bitwise operators, making the interface VERY CLEAR.
  • 8 No scoping issues
  • Works on any standard enum without defined values (because they always start enumerations at 0)

Negatives:

  • 5 Any size_t is accepted to modify the state of the bitset (set, reset, flip), meaning the reference to the enum is NOT type-checked, allowing invalid values.
  • 6 No way to know if the bitset is a mask (minus variable naming conventions) of the enum or another mask of that size.
  • 7 Must be standard enum with NO defined values internally.  Must ALWAYS know the size of the enum when creating the mask.
  • All masks will need their definitions updated if the number of items in the enum ever changes.
  • Unlike all other methods, writing a large bitset to binary can be a pain

Well, we managed to get a really nice and straightforward interface with this method, however there are some obvious large tradeoffs.  There is a workaround for hardcoding the size of the bitset, defining the count of elements within the enum itself.  Unfortunately, it introduces other issues too:

enum Test
{
    One,    //0
    Two,    //1
    Three,  //2
    Count,  //3 - ALWAYS after the last valid value
};

std::bitset<Test::Count> mask;

This eliminates the hardcoding of the mask size, however it also creates a new element in the enum.  This means that Count is considered a valid value of Test, meaning we can pass it to anything that accepts a Test, which is less than ideal.  However depending on the usage, this might be okay or even desirable.  For example, with this, we can now iterate over all values in the enum with a for loop without knowing the number of items.

for (size_t t = Test::One; t < Test::Count; t++)
    std::cout << static_cast<Test>(t);

I also realized that if we KNOW this count exists, we can create a template wrapper that can be used to hide away several of the other negatives here (also, I learned about SFINAE making this, which kind of blew my mind).  My version hasn’t been thoroughly tested yet, though it is mostly a straight wrapper of std::bitset, so it should work fine.  The new usage is now:

enum Test
{
    One,    //0
    Two,    //1
    Three,  //2
    Count,  //3
};

void TestTemplateMask()
{
    //initialize to the one and three flags
    EnumMask<Test> mask;
    mask.set(Test::One);
    mask.set(Test::Three);

    //set the TWO flag
    mask.set(Test::Two);

    //clear the ONE flag
    mask.reset(Test::One);

    //check the THREE flag
    bool threeIsSet = mask.test(Test::Three);
}

Positives:

  • 2 Enums contain valid possible values (values are type-checked on input to all functions)
  • 3 bitsets are just integers as bitmasks underneath
  • 4 operations are bitwise operators underneath
  • 5 Enums are the names of values
  • 6 literally functions to modify state of mask, rather than bitwise operators, making the interface VERY CLEAR.
  • 6 Template also makes it obvious what this is a mask of.
  • 7 Any standard enum can now be masked with minimal work (Add the Count element)
  • 8 No scoping issues

Negatives:

  • Enum MUST define a Count element as the last element in the enumeration.  (Very awkward if not needed)
  • Enum MUST either not use defined values, or ensure that all values [0, Count) are used.  (if defined element values skip certain numbers, those skipped numbers still add toward the Count element, as enumerating continues from the last element by the compiler)
  • Templates mean that invalid enums are not recognized until compile time. (No intellisense help here)
  • Unlike all other methods, writing a large bitset to binary can be a pain Edit: writing this to binary is actually no more difficult than writing any of the other masks (because the underlying data is already in a format we want to to be for storage).

I am generally happy with the conclusion of this method.  I am sure that we could work around this a bit more and make it a bit more dynamic, but this is about the end of the road in direct simplicity for it.

Conclusion

So, I guess I have to come to some sort of consensus on what method I prefer here.  Honestly, I think that any of (the types I talked about the most here…) the classic bitmasked enum methods, or a standard enum with a Count and the templated mask would be valid solutions.  They all accomplish nearly all our goals, have simple interfaces, and are quite optimized (both speed and memory).  They all have a bit of awkwardness with their definition, but nothing too unwieldy.  So, I guess we didn’t find a perfect solution, but we found several pretty good ones.  At least, good enough to satiet a bit of my laziness and impatience with this problem.

Leave a Reply