Ill-Advised C++ Rant, Part 2 (2016) (codersnotes.com)
61 points by signa11 on Oct 25, 2017 | 73 comments


> Calculating the size of static array

In C++17 you can do

    #include <array>

    std::array myarray{ 4, 5, 6, 7, 8, 9 };
    int my_size = myarray.size(); // returns 6
which even deduces the type `std::array<int, 6>` for you.


You can also use the std::size() free function to get the size of a C array, an std::array or any other standard container.

http://en.cppreference.com/w/cpp/iterator/size
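
A small sketch of what that looks like in practice, assuming a C++17 compiler. The names `raw`, `fixed` and `vector_count` are made up for illustration; only `std::size` itself comes from the standard library.

```cpp
#include <array>
#include <cstddef>
#include <iterator>  // std::size (C++17)
#include <vector>

// Illustrative data only; these names are not from the article.
constexpr int raw[] = {4, 5, 6, 7, 8, 9};
constexpr std::array<int, 3> fixed{1, 2, 3};

// For C arrays and std::array the count is available at compile time.
static_assert(std::size(raw) == 6);
static_assert(std::size(fixed) == 3);

// For a std::vector the same call simply forwards to .size() at run time.
inline std::size_t vector_count() {
    std::vector<int> v{1, 2};
    return std::size(v);
}
```

Passing a plain pointer to `std::size` does not compile, which is the main advantage over the `sizeof` division trick.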


A free function for this looks even better IMO.


Or just use std::extent, which type-safely gets the size of arrays (though most of my new C++ will probably be using std::array).


> Calculating the size of a static array.

  #include <type_traits>
  int myarray[] = { 4, 5, 6, 7, 8, 9 };
  int x = std::extent<decltype(myarray)>::value;  // yields 6
> Converting a string to an enum

Generally a bad idea, as it couples a protocol (whatever is feeding you the strings) to symbols in your language. Often, one of those has constraints that the other cannot meet.

> Binary file includes

  objcopy -I binary -O elf64-x86-64 -B i386:x86-64:intel in.bin out.o
Not the prettiest, but, why should a language do the linker's job?

> FourCC codes

How often does this come up, that you can't write a 10-line class to do what you want? Not to mention the obvious issue of endianness in the example given. Last time I had to do this I was writing a PNG parser, which was not something I should have been doing.

> Types as first-class primitives

I've been coding for 25 years and never needed nor wanted this. I know not everyone agrees, but personally I find reliance on RTTI to be a code smell. (Granted, pre-C++17 it's kind of forced due to lack of a proper variant type.)

(I realize the above is opinionated, but… so is the OP! :)


> Not the prettiest, but, why should a language do the linker's job?

Because it's not cross platform. I have no idea how to do the equivalent in Visual Studio.


Point taken, though I would think MinGW's objcopy should be able to do something similar. (Obviously not part of Visual Studio.)


Strictly speaking I think the term for what's requested would be reflection - but I don't think these terms are especially precise ones.

The sorts of things I've used reflection for:

- logging - getting the name of things' types, turning enums into strings, etc.

- generating debugging UI - for every bool field in your object, you get a check box; for every enum field, a combo box; for every string field, a text box; and so on

- serialization/deserialization - it has limitations, but you can get surprisingly far before you run into them, and for many situations it doesn't matter

- scripting language integration - for every type of interest in your program, find its fields and methods, and link those up to the scripting system

- dependency injection - given a set of known services, when creating an object (perhaps by name, provided at runtime, which you match to a type by scanning the list of types available) provide it with the services it needs (determined by, say, examining the types of the parameters of its constructor). Typically you'd cross-reference types of parameter with types of each service to match them up, but there are various options

Especially good with a C#-like metadata system that lets you assign user-defined properties to fields that your calling code can query to get further info. For example, you might not want to serialize certain fields, or you might want to hide some fields from the UI, and so on.

(If this doesn't convince you, fine; for my money, though, this is the #1 missing feature from C++. But, like the post's author, my background is working on video games... maybe this is just stuff that only video games people would want?)


enum→string I agree 100% is useful for debugging. string→enum, and enum→string for production purposes I find questionable, since they couple your implementation internals to an externally-visible protocol, of which a change in one should not necessitate a change in the other. Of course relying on an implicit mapping speeds initial development… but I'm the one who has to maintain those things ;)
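
For what it's worth, here is a rough sketch of the explicit-mapping approach argued for above, assuming C++17. The enum `Color`, the table `kColorWireNames` and the wire strings are all hypothetical; the point is only that the external names live in one table and need not match the C++ symbols.

```cpp
#include <array>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical enum for illustration.
enum class Color { Red, Green, Blue };

// Explicit protocol table: renaming a wire string touches only this table,
// and the wire names are free to differ from the C++ symbols.
constexpr std::array<std::pair<Color, std::string_view>, 3> kColorWireNames{{
    {Color::Red, "red"},
    {Color::Green, "dark-green"},
    {Color::Blue, "blue"},
}};

// enum -> string: linear scan over the table.
constexpr std::string_view to_wire(Color c) {
    for (const auto& entry : kColorWireNames)
        if (entry.first == c) return entry.second;
    return "unknown";
}

// string -> enum: returns an empty optional for names not in the table.
constexpr std::optional<Color> from_wire(std::string_view s) {
    for (const auto& entry : kColorWireNames)
        if (entry.second == s) return entry.first;
    return std::nullopt;
}

static_assert(to_wire(Color::Green) == "dark-green");
```

The cost is that someone has to keep the table in sync with the enum by hand, which is exactly the boilerplate the article complains about.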


> objcopy -I binary -O elf64-x86-64 -B i386:x86-64:intel in.bin out.o

Neat! In the past I've used perl scripts to munge into C source.

FYI, it generates symbols named after the filename:

  tlb@london:~/ur$ objcopy -I binary -O elf64-x86-64 -B i386:x86-64:intel /etc/services services.o
  tlb@london:~/ur$ objdump -t services.o

  SYMBOL TABLE:
  0000000000000000 l    d  .data	0000000000000000 .data
  0000000000000000 g       .data	0000000000000000 _binary__etc_services_start
  0000000000004c95 g       .data	0000000000000000 _binary__etc_services_end
  0000000000004c95 g       *ABS*	0000000000000000 _binary__etc_services_size
And in C++ you can get the contents with:

  #include <string>

  extern const char _binary__etc_services_start;
  extern const char _binary__etc_services_end;
  std::string services(&_binary__etc_services_start, &_binary__etc_services_end);


> Generally a bad idea, as it couples a protocol (whatever is feeding you the strings) to symbols in your language.

Is it intrinsically bad, though? For instance F# is able to couple actual data files like CSV to symbols and types in the language, and this allows for very efficient development: http://fsharp.github.io/FSharp.Data/library/CsvProvider.html

> I know not everyone agrees, but personally I find reliance on RTTI to be a code smell.

Again, why ? There are plenty of cases where this has proven quite useful. Besides, "types as first-class primitives" does not necessarily imply "run-time": in the C++ spirit it would most certainly be a compile-time only feature.


> For instance F# is able to couple actual data files like CSV to symbols and types in the language

Of course that breaks once your CSV header names don't match allowable symbol names. Notice the example linked doesn't show accessing the "Adj Close" field. (Presumably there is a way to get at this in F#? There's no way to "escape" characters in symbols in C++ though.)

Sure, it speeds development, but first time a requirement comes down that says "foo_bar field must be named foo-bar", what should be a one-character change suddenly becomes something much more.

> why ? There are plenty of cases where this has proven quite useful.

"Useful" and "bad idea" are not mutually exclusive. A lot of things which are "useful" – i.e., which speed development or save keystrokes – have costs which hamper maintenance. (Think macros.)

RTTI generally implies that your program has logic to do something based on the type of a value, rather than on the meaning of a value. Say I have an object, BasketOfApples, with a field size_of_basket, which can be either an integer or a string. If it's an integer, it refers to the number of apples the basket can hold. If it's a string, it's either "small", "medium", or "large". OK so far.

But now I want my BasketOfApples to also be able to have a size defined as the weight in pounds it can hold. I can't use an integer again, that already has meaning! But I want a numeric type. So I'm forced to either choose a distinct numeric type (say float), or use an enum to distinguish the cases. The former forces me to make a suboptimal design decision. The latter is probably what I should have done in the first place.

So now my BasketOfApples has an enum field, type_of_basket_size, which determines the meaning of the basket_size value. Not only does type_of_basket_size do what RTTI failed to, it obviates the need for it, and is likely a bit more efficient to boot.

(BTW, you see this all the time with C++ constructors. You can't have two constructors which mean different things but accept the same types without using a dummy "tag" argument.)
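
The "tag" argument trick mentioned above might look roughly like this, assuming C++17 for the `inline` variables. The tag types `by_count_t` / `by_weight_t` and the member names are invented for this sketch; the basket example itself is the one from the comment.

```cpp
// Hypothetical tag types: empty structs that exist only to select a constructor.
struct by_count_t { explicit by_count_t() = default; };
struct by_weight_t { explicit by_weight_t() = default; };
inline constexpr by_count_t by_count{};
inline constexpr by_weight_t by_weight{};

class BasketOfApples {
public:
    // The explicit enum records what the number means, instead of its type.
    enum class SizeKind { AppleCount, WeightInPounds };

    // Both constructors take an int; the tag argument tells them apart.
    BasketOfApples(by_count_t, int apples)
        : kind_(SizeKind::AppleCount), size_(apples) {}
    BasketOfApples(by_weight_t, int pounds)
        : kind_(SizeKind::WeightInPounds), size_(pounds) {}

    SizeKind kind() const { return kind_; }
    int size() const { return size_; }

private:
    SizeKind kind_;
    int size_;
};
```

Usage would be something like `BasketOfApples a(by_count, 12);` versus `BasketOfApples b(by_weight, 12);`.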

Languages with tagged variant types (e.g. OCaml, and I presume by extension, F#) don't have this problem and don't need (or even supply) RTTI information, and they get along just fine. To me, using a feature that isn't available or missed in other "serious" languages, which incompletely accomplishes the task to which it's set, is a code smell.

BTW, I recognize that not everyone agrees with me on this point. But I've been coding like RTTI doesn't exist for decades in various languages and not once have I missed it.

> Besides, "types as first-class primitives" does not necessarily imply "run-time": in the C++ spirit it would most certainly be a compile-time only feature.

The OP's example (copying an object without knowing its type at compile time) specifically requires RTTI.


> Of course that breaks once your CSV header names don't match allowable symbol names. Notice the example linked doesn't show accessing the "Adj Close" field.

In F# you can escape symbol names with double backticks. In the example linked, accessing the "Adj Close" field would be done like:

    firstrow.``Adj Close``


> Languages with tagged variant types (e.g. OCaml, and I presume by extension, F#) don't have this problem and don't need (or even supply) RTTI information,

I'm not sure about F#, but I'm almost 100% certain that OCaml does have a runtime "tag" to distinguish cases (and indeed does so at runtime when e.g. pattern matching). That would qualify as a type of RTTI in my book.

(There really isn't a way around this in general, except for certain trivially optimizable cases like one-constructor types, etc. But even in those cases runtimes may choose to not do the optimization just to have a uniform runtime representation.)


OCaml has explicitly tagged variant types (like Java enums). There is no untagged variant type, and all types must be known at compile time, therefore there is no means to match based on "type" of a value.

OCaml does have hidden RTTI, but it is not what you think it is. It exists only so the garbage collector can distinguish the various built-in data types, and is completely opaque to user code (save the Obj module). e.g. all structures have the same tag:

  # type foo = { a: int; b: string };;
  type foo = { a : int; b : string; }
  # Obj.tag (Obj.repr { a = 5; b = "foo" });;
  - : int = 0
  # type bar = { x: int };;
  type bar = { x : int; }
  # Obj.tag (Obj.repr { x = 1 });;
  - : int = 0
  # Obj.tag (Obj.repr 123);;
  - : int = 1000
  # Obj.tag (Obj.repr "hello");;
  - : int = 252
In particular, this tag is certainly not used for pattern matching of anything but explicit variants, where it is used to distinguish values, not types, and hence is not RTTI.

[1] https://realworldocaml.org/v1/en/html/memory-representation-...

[2] https://caml.inria.fr/pub/docs/oreilly-book/html/book-ora115...


OK, so I'm probably just confused here, but: It's not particularly surprising that structures have the same tag, but sum types (aka variants) do. (As you note.)

That's certainly runtime information about the types of values, but I suppose our disagreement may be about the exact meaning of RTTI. After looking at the Wikipedia entry, I see that my intuition doesn't align with the C++ 'definition' of RTTI either, so thanks for correcting that misconception. It actually looks more like Haskell's Typeable in that it (can apply)/(applies) to any type, even primitive types. Of course Typeable is opt-in and stored separately from the values, so...

I'm still not sure I understand your objection, but maybe I just haven't read your OP closely enough and/or thought about it long enough.


The information in a variant has nothing to do with the type. I can declare:

  type foo = Bar of int | Baz of int
And the two variant cases remain distinguishable, despite both having fields of the same type, and both being members of the same type. No information, runtime or otherwise, about the type, is involved. Therefore it is not by any definition RTTI. Pattern-matching is performed solely based on the explicit tag names.

Consider that OCaml variants are in effect no different from the standard C pattern of enum + union. And I think we can agree that C has no RTTI whatsoever.

Now if I could write:

  type foo = int | string
Then pattern-matching that would require RTTI. But OCaml doesn't support such things.
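
To make the enum + union comparison concrete, here is a sketch of how `type foo = Bar of int | Baz of int` could be written by hand in C++ (C++17 for `std::string_view`). The names `FooTag`, `Foo`, `make_bar`, `make_baz` and `describe` are illustrative, not anything OCaml or a library defines.

```cpp
#include <string_view>

// Explicit tag: distinguishes the two cases even though both carry an int.
enum class FooTag { Bar, Baz };

struct Foo {
    FooTag tag;
    union {
        int bar;  // meaningful when tag == FooTag::Bar
        int baz;  // meaningful when tag == FooTag::Baz
    };
};

inline Foo make_bar(int v) { Foo f; f.tag = FooTag::Bar; f.bar = v; return f; }
inline Foo make_baz(int v) { Foo f; f.tag = FooTag::Baz; f.baz = v; return f; }

// The "pattern match" switches on the tag value; no type information is consulted.
inline std::string_view describe(const Foo& f) {
    switch (f.tag) {
        case FooTag::Bar: return "Bar";
        case FooTag::Baz: return "Baz";
    }
    return "?";
}
```

Nothing here asks "what type is this value?"; the switch only inspects an ordinary enum field.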


You're just reiterating what we agree on. (I think I stated as much, but maybe not.)

EDIT: Just in case: We do agree that certain programs in dependently typed languages would require such reification of type information, yes? That is, we're talking about a run-time decision about choosing a type (or another type), yes?


I'm talking about run-time introspection into the type of a value. That is, the ability to ask the question, "is this arbitrary value an integer or a string?" That is what the OP needs for his example (clone an object of unknown type). That is something you need RTTI for, and something that I claim is better served by explicit variant tags.

I don't know what you mean by "run-time decision about choosing a type (or another type)". Can you give an example of some code that does this?

I also don't know what you mean by dependently typed languages requiring RTTI. Dependent type systems, like HM and other type systems, only constrain programs, they do not enable any behavior. Any program valid in an arbitrary dependently-typed language is still valid if the typing system is removed, rendering the language untyped (and thus devoid of any RTTI). (Specifically, DT adds to HM the ability to write expressions whose type or validity depends on other values. It does not grant ability to introspect those types; that is out of the scope of a type system.) Example: Coq is dependently typed, but has absolutely no notion of RTTI.


> Dependent type systems, like HM and other type systems, only constrain programs, they do not enable any behavior.

The question arises when you have languages that compile to native code: you want your constraints to still be valid when you load new types in your program at run-time.

eg

    obj = dlopen("libfoo.so")
    factory = /* insert some cast here */ dlsym(obj, "some_factory_function")
    a_value = factory.create_value()
    some_constrained_function(a_value)


What part of that example requires RTTI? Clearly the type of `factory` (and thus the type of values it produces, even if they vary with input values) is known at compile time, thanks to the cast. And confirming that the dynamically loaded code adheres to that type is a loader problem (which can be solved with metadata), not a language/runtime problem (which would require RTTI and the overhead of checking it).

The point of a static type system, either HM or a DT system, is to ensure that the types are either known at compile time, or derivable from some other runtime value (e.g. input to a function). This is for two reasons: safety, and the ability to elide RTTI. (Note that a type system forbids you from doing anything with a value with a dependent type until it finds its way into a conditional branch which simplifies its type to a non-dependent one.)


> thanks to the cast.

well, exactly: languages are able to do without the cast at all, just extracting type information from the runtime, which is much safer.


You wrote the example, not me. I'm confused what your argument is.

Are you saying that dynamic loaders should check the type of dynamically loaded code against the type expected by the code loading it? I agree that's a good idea, but it doesn't require language support for general RTTI, just a type-aware shared-object format and dynamic loader. (Heck, C++ symbol names already incorporate function argument type signatures, so this isn't a huge stretch.)

Are you saying that a runtime shouldn't need to trust that dynamically-loaded code actually implements the interface it claims to? If so, that's an incredibly specific requirement that's really better served by sandboxing or an RPC-like interface (especially in C/C++).


Or use the std::size free function. It's overloaded to work with C-style arrays.


Did not know about that, though unfortunately it's C++17 only.


Unfortunately yes. The implementation is straightforward though:

    template <typename Container>
    std::size_t size(Container const& c) { return c.size(); }

    template <typename T, std::size_t K>
    std::size_t size(T (&)[K]) { return K; }


OP complains about C-style arrays when criticizing C++. Just using `std::array` makes life easier.


Yeah that one is a little weird. C++11 included std::array to solve that problem (among others), but he doesn't use it and then complains about it anyway.


Agreed, though unfortunately pre-C++17 you must specify the size when defining the array.
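
One possible workaround before C++17 is a small helper that deduces the size from its arguments. The sketch below assumes C++14; `make_array` here is a hand-written illustration, not a standard function, and it does not handle an empty argument list.

```cpp
#include <array>
#include <type_traits>
#include <utility>

// Hypothetical helper: the element type is the common type of the arguments,
// and the length is the number of arguments.
template <typename... Ts>
constexpr std::array<std::common_type_t<Ts...>, sizeof...(Ts)>
make_array(Ts&&... values) {
    return {{static_cast<std::common_type_t<Ts...>>(std::forward<Ts>(values))...}};
}

// The size never has to be written out by hand.
static_assert(make_array(4, 5, 6, 7, 8, 9).size() == 6, "size is deduced");
```

So `auto myarray = make_array(4, 5, 6, 7, 8, 9);` gives a `std::array<int, 6>` without spelling out the 6.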


Totally agree with the author. I have started coding again in C++ after a 10 year detour to python.

Can we have the basics first, and then robust libraries for things as simple as, say, networking?

Batteries included, is a thing, and is something a lot of programmers like me look for when choosing between languages.


Unfortunately, I have already heard complaints about the size of the C++ standard library, and how "I can write a better vector/list/etc.. class". I can't imagine what people would think of a networking module, designed by committee and full of compromises.

I don't envy the members of the standards committee. It is a hard and very thankless job.


I've been a hardcore C programmer for decades and just got into C++11 this past year. I LIKE the standard library.† It's pretty much just basic data layout stuff that I've done 100 times manually in C, nothing more, nothing less. Leave the complicated data structures and libraries to 3rd parties. std::vector is already exactly what I want (though it could do with one fewer specialization ;). There's pretty much one way to implement it and they got it right.

On the other hand, there's 100 ways to implement, say, an HTTP client, and they're all wrong.

† Except for iostreams. Ugh.


One way to solve the do/while(0) problem is by using a “statement expression”. It's a GCC extension, and I don't know the reason it has never been accepted into the standard. Here's how it works:

    #define my_assert(X) ({ if (!(X)) error(); })
Another way is to use a lambda expression, and then call it immediately:

    #define my_assert(X) ( [=]() { if (!(X)) error(); } )()
Because the lambda captures by copy, it will catch inadvertent side effects in some cases.

You can also use the conditional operator in this case:

    #define my_assert(X) ( (X) ? error() : ((void)0) )


The last one is backwards, should be:

    #define my_assert(X) ( !(X) ? error() : ((void)0) )
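
A quick sketch showing the corrected conditional-operator form inside an unbraced if/else, which is the situation do/while(0) is normally used for. The `error()` handler and `error_count()` counter are stand-ins invented for this example.

```cpp
// Hypothetical error handler for illustration: it only counts failures.
inline int& error_count() { static int n = 0; return n; }
inline void error() { ++error_count(); }

// Both branches of the conditional have type void, so this is a single expression.
#define my_assert(X) ( !(X) ? error() : ((void)0) )

// Because the macro expands to an expression, the trailing semicolon does not
// detach the else branch the way a bare { ... } block would.
inline int check(int value) {
    if (value > 0)
        my_assert(value < 10);
    else
        my_assert(value == 0);
    return error_count();  // total failures recorded so far
}
```

This only works while the macro body can be written as an expression; anything needing a declaration or a loop still needs one of the other tricks.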


I think Rust solves a lot of these problems.

> Calculating the size of a static array

some_array.len()

> Getting the maximum value of an enum

Rust's enums are strongly-typed (I think that's the right phrase), so this is a non-issue.

> Converting an enum to a string

impl Display for MyEnum (and then call to_string() on it)

> Converting a string to an enum

This could be tricky, since Rust enum variants can contain data. But if none of the variants contain data, it would be easy to make a method that returns a Result<MyEnum, CustomParseError>

> Binary file includes

include_bytes! macro (or the include_str! macro, depending on the exact scenario)

> Switch/case is still stupid

Pretty sure Rust's match is a lot better


> This could be tricky, since Rust enum variants can contain data. But if none of the variants contain data, it would be easy to make a method that returns a Result<MyEnum, CustomParseError>

Usually when you want to convert a string to an enum, you're trying to build some kind of parser or deserialization library, in which nom or serde are the best choices. (Of course, C++ has several parser frameworks too.)


Yeah, I would definitely be using serde in that case


This person is not a very good C++ programmer. Also for some of the things he mentions, it will be covered by the Reflection TS.

1. Size of a static array: Use std::array please

2. Maximum value of an enum: See reflection TS

3. Converting enum to string: ditto

4. Converting string to enum: ditto

5. #pragma once: see modules. They are in fact trying to make something better. There are also some cases where you don't want/need it

6. C99 designated initializers. They are there!

7. Binary file includes. Ugh this is just such a dumb idea I can't even. When do you want the file to load? When the executable does? What if you want to read it in chunks? This idea is not well suited for a systems-oriented language

8. FourCC codes. Again, this is a silly idea not even worth addressing. Just define it yourself. C++ is a programming language. Not your media toolkit

9. do/while(0). What are you using macros for in 2016 (the time of writing)?

10. preprocessor gripes. Stop using it! You have <type_traits> and stuff now

11. Switch/case is stupid. No it's not. You want fall/through cases as well and if you can't see why, don't criticize. This is true in pretty much EVERY language that supports switch/case

12. Still no strong typedefs. There are! Use the explicit keyword on a constructor definition.

13. Types as Primitives. See reflection TS

Basically, this feels like any other "C++ rant": someone who didn't learn the language properly or doesn't respect what the language needs to achieve.


> This person is not a very good C++ programmer.

Am I a bad C++ programmer for agreeing with some of it?

> 5. #pragma once: see modules. They are in fact trying to make something better. There are also some cases where you don't want/need it

The fact that modules have taken so long, and still have prominent members of the community saying they need to be sent back to the drawing board, is a legitimate criticism.

> 7. Binary file includes. Ugh this is just such a dumb idea I can't even.

It's a useful feature, much used in the language I work on, allowing you to distribute a program that needs just one or two static data files as a single executable. It's also good for packaging data as libraries: for instance, Unicode libraries need a lot of random tables and it can be convenient to store these in binary form. Having to embed binary in C/C++ syntax as, say, ICU does just adds parsing overhead at compile time for no reason.

> 9. do/while(0). What are you using macros for in 2016 (the time of writing)?

There are lots of use cases of macros, too many to list here. Templates don't always suffice.

> 11. Switch/case is stupid. No it's not. You want fall/through cases as well and if you can't see why, don't criticize. This is true in pretty much EVERY language that supports switch/case

I designed much of the pattern matching system in an industry language without fall through in switch case and I have never missed it. Nor has anyone else in the community I know of.


C# lets you do "goto case XXX" to go to another case, and empty cases fall through to the next. This is a bit inconsistent, but it at least retains the useful fall-through case, gets rid of the usually-undesirable case, and still accommodates the occasional exceptions.


Yep, I also think modules are still relatively far away. I also hope that the C++ community can embrace some package management system.. at some point in time. Hopefully before I die


I think you can regard switch/case as syntax sugar for a series of if/else. So fallthrough may not be needed but more sugar is nice to have available.


Some good points. But "See reflection TS", really? That lands only in C++20. This article is from 2016.


Yeah, but look at when C++ started getting regular releases.

These are hard problems, and the things in 11, 14, and 17 are useful. Ultimately, people [on the committee and developing the language] are prioritizing what they need.

It's not a closed process; it's just a slow one.


Languages where switch-statements or switch-expressions don't fallthrough by default include Ada, C#, E, Go, Monte, Perl, Swift, and I think basically every flavor of Pascal.

"Pretty much every language," right? C++ is right in there with C, JavaScript, and PHP.


This comment is hilarious. C++ is more than 30 years old now, and you're expecting users to have mastered proposed future enhancements?


If you're going to complain about missing features in the language, you should probably be aware of what plans are being made for the language's future...


Dude, for code I'm writing today, your future solutions are worth nothing.


Most of these features are available in the core language already. The ones that aren't are implemented in the form of libraries. Like every other language...


> This person is not a very good C++ programmer.

This is the biggest thing that is wrong with C++. Everything is always stupid human's fault. :D


> 7. Binary file includes. Ugh this is just such a dumb idea I can't even. When do you want the file to load? When the executable does? What if you want to read it in chunks? This idea is not well suited for a systems-oriented language

This can be really useful in embedded devices where there is no file system but I don't really see the need for it to be part of the language, every time I've needed it the toolchain came with build tools to handle this.

> 11. Switch/case is stupid. No it's not. You want fall/through cases as well and if you can't see why, don't criticize. This is true in pretty much EVERY language that supports switch/case

a smarter option may have been to require an explicit break or fallthrough keyword but we're a good 40 years late.
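
For reference, C++17 did add a `[[fallthrough]]` attribute. It does not make `break` mandatory, but it lets you mark intent, and compilers can be asked to warn about unmarked fallthrough (GCC's `-Wimplicit-fallthrough`, for example). A minimal sketch, with a made-up function name:

```cpp
// Hypothetical example: each level enables everything the lower levels do.
inline int enabled_features(int level) {
    int count = 0;
    switch (level) {
        case 3:
            ++count;
            [[fallthrough]];  // intentional: level 3 includes level 2
        case 2:
            ++count;
            [[fallthrough]];  // intentional: level 2 includes level 1
        case 1:
            ++count;
            break;
        default:
            break;
    }
    return count;
}
```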

>12. Still no strong typedefs. There are! Use the explicit keyword on a constructor definition.

Define a new wrapper class (maybe I'm missing something)? It would be a lot nicer if you could do something like `typedef explicit int FooId;`


> Define a new wrapper class

For simple types, for which this makes most sense but from which you cannot inherit, it is a lot of work.


Just to play devil's advocate, you could make a template for defining such wrappers. But then, it might just as well be built into the language for elimination of such boilerplate.
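
Such a template could look something like the sketch below (C++11 or later). `StrongTypedef`, `FooId`, `BarId` and the tag structs are names made up here; a real wrapper would probably want more operators than just comparison.

```cpp
#include <type_traits>

// Hypothetical wrapper: Tag is an otherwise unused type that makes each
// instantiation distinct, even when the underlying T is the same.
template <typename T, typename Tag>
class StrongTypedef {
public:
    constexpr explicit StrongTypedef(T value) : value_(value) {}
    constexpr const T& get() const { return value_; }

    friend constexpr bool operator==(const StrongTypedef& a, const StrongTypedef& b) {
        return a.value_ == b.value_;
    }
    friend constexpr bool operator!=(const StrongTypedef& a, const StrongTypedef& b) {
        return !(a == b);
    }

private:
    T value_;
};

using FooId = StrongTypedef<int, struct FooIdTag>;
using BarId = StrongTypedef<int, struct BarIdTag>;

// Neither a plain int nor a BarId silently converts to a FooId.
static_assert(!std::is_convertible<int, FooId>::value, "no implicit int -> FooId");
static_assert(!std::is_convertible<BarId, FooId>::value, "no BarId -> FooId");
static_assert(!std::is_same<FooId, BarId>::value, "distinct types");
```

It is one line per new type once the template exists, though still clumsier than the `typedef explicit int FooId;` wished for above.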


What's Reflection TS?


It took me a minute with a few googles, but the TS in this case means "Technical Specification", which helped out a ton with searching.

Unfortunately, I'm still not 100% sure this is what they're referring to, but I think this is it.

https://isocpp.org/files/papers/n3996.pdf

EDIT: Here it is :D

https://www.meetingcpp.com/blog/items/reflections-on-the-ref...


It's a proposal for inclusion into the C++ standard.


This is kicking a dead horse here. IMO, almost everything about C++ is terrible in this day and age. I'm so happy Rust exists and is doing well. I'll do anything in my power to never again create another C++ project.


You can use user-defined literals for a portable way to write four-character codes:

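    #include <cstddef>
    #include <cstdint>

    // The loop in a constexpr function needs C++14 or later.
    constexpr std::uint32_t operator""_4CC(const char* str, std::size_t len) {
        std::uint32_t result = 0;
        for (std::size_t i = 0; i < len; ++i) {
            result = (result << 8) + static_cast<std::uint32_t>(static_cast<unsigned char>(str[i]));
        }
        return result;
    }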

    auto const movie = "MooV"_4CC;


>Here's an example implementation for you. If you're using Visual C++, there's a built-in version. They could literally cut-n-paste this one line into <stddef.h> today.

>#define countof(X) (sizeof(X) / sizeof((X)[0]))

Could they? What if X is empty?


Still works. Whether X has any elements is irrelevant, as sizeof doesn't evaluate the expression. Instead, it returns the size of the static type of the expression.


an empty struct has a size of 1 byte.

    struct Widget{
    };

    Widget widgets[256];
    printf("%zu %zu", sizeof(widgets), sizeof(Widget));  // %zu is the size_t format
prints out

    256 1
if there are 0 elements, there are no issues because it's 0/x = 0.


the point is that they're indexing into an array of unknown size.


indexing in that case doesn't trigger a dereference, since it's being passed to the sizeof operator.
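
A sketch contrasting the article's macro with a template version, assuming C++11. The macro is the one quoted above; `count_of` and `values` are names invented for this example.

```cpp
#include <cstddef>

// The macro from the article: sizeof never evaluates its operand, so (X)[0]
// is only inspected for its type and nothing is dereferenced.
#define countof(X) (sizeof(X) / sizeof((X)[0]))

// Template alternative: only binds to real arrays, so passing a pointer is a
// compile error instead of a silently wrong answer.
template <typename T, std::size_t N>
constexpr std::size_t count_of(T (&)[N]) noexcept {
    return N;
}

static int values[] = {4, 5, 6, 7, 8, 9};

static_assert(countof(values) == 6, "macro result");
static_assert(count_of(values) == 6, "template result");

// Given `int* p = values;`, countof(p) still compiles and yields
// sizeof(int*) / sizeof(int), whereas count_of(p) is rejected.
```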


>Converting string to/from enum

It's generally a bad idea to do this, and an enormous headache to even attempt. If you want to do this, maybe you don't want to use an enum, because that's not what they are for. Maybe you want a hash map instead?


It can be useful for debugging and (de)serialization.


Maybe you just want to make an error message from an enum? OK string->enum could get tricky, but enum->string should be straightforward. Please explain why it would be a bad idea.


Why is it a bad idea? And if it's a headache, isn't that exactly when the language itself should do it so the developer doesn't have to?


How is it a good idea? Not everyone needs this functionality.

I wonder how much effort might be required to implement a purely opt-in syntax/library for the purpose: something like Ada's Enumeration_IO.


When you need it the compiler knows you need it. If you have enum Pet { Cat, Dog } and you write

    cout<<nameof(Dog)
The compiler could just replace it with the string value at compile time.

If you have a dynamic one the compiler could just make an array (or worst case hash table lookup) into a static array created at compile time. It's fairly simple and costs nothing if you don't use it. And it's a lot simpler and more elegant to do through the compiler than through a lib, metaprogramming etc.

And how useful is it? It's at least useful enough to be part of how most other languages work these days. An enum should be seen as a list of names and values.

It shouldn't just be limited to be a glorified way of saying

    const int Cat = 0;
    const int Dog = 1;
Defined like that I'd find it completely unnatural if the symbol names were available at runtime.


It has a cost in the size of the object file or executable at least. What if I don't want my data segment being polluted with enum-string-values or, worse, your secret array/hash-table implementation, regardless of how little extra it is?

Something like this ought to be opt-in and explicit.


If you use nameof(Cat) then the string "Cat" is included. If you use nameof(some_pet) then the array ["Cat", "Dog"] is included. It's zero cost for something you don't use and it's exactly the same cost as you would have paid by manually adding this.


Ah, right, I see: the compiler only adds these things if it detects that it's necessary (i.e. there must be at least one use). Can't argue with that.


Unfortunately, in C++ even the notion of the string type itself is not part of the language...



