Some of the function signatures raise red flags suggesting the author doesn't understand C.
For example, functions that I guess are meant to operate on generic arrays but take int*. You could write volumes on how to approach generic arrays in C, but this is certainly not one of them.
Also, the use of "C standard library" is weird here. Any C programmer reading that phrase assumes a reasonably standards-conformant libc; this is a bunch of interfaces the author made up.
Sorry for being negative, but this doesn't pass a "production-quality C code" at-a-distance sniff test.
For example, the various array-printing functions (which are at the top of the file, the first thing I saw) call putchar/printf in a loop instead of formatting the entire string and printing it with one call. For a HUGE array that doesn't fit in memory, streaming is the right approach, but for the other 99.9% of the time you're printing a smallish value (otherwise you probably wouldn't be printing to stdout in the first place?), the loop is going to be way slower than building the string and printing it once, because it makes far more system calls. Most libc implementations also guard stdio access with a mutex, so you pay for a bunch of mutex acquisitions as well.
Plus, why don't these functions take a FILE*? What happens if you want to print to a file, or to stderr?
I spotted several red flags in the title alone, and I'm not even counting "an usable", which probably resulted from HN censoring "actually". The word "actually" is only a yellow flag.
"an usable" is a common category of mistake from non-native English speakers that I see frequently, and I don't judge anyone for it.
Often I find myself trying to politely explain the rule: the "n" appears when the next sound is a vowel, not necessarily the next written letter. The first letter of "usable" is a vowel in spelling, but phonetically the word starts with a consonant; its International Phonetic Alphabet transcription begins /ju/, with the consonant [j].
Similar things happen with abbreviations: you get prescribed "an SSRI" and not "a SSRI" because the "S" there is pronounced "es", which starts with a vowel sound.
Strangely enough, I find that when I over-explain this specific topic, people thank me. Maybe since I'm way off topic I'll get crap for it here.
For me it's the dueling implementations of is_alpha and is_string_alpha (and other similar pairs) that do the same thing, except one uses a 'for' loop and the other a 'while' loop.
Yes, but it's phrased as if "finally" there is a library that solves the author's problems, as if we've all been waiting for it through 50 years of C being a thing, when it's really a learning project for them. I don't fault them for putting their learning on GitHub, but it's phrased in a bit of a grandiose manner.
The library uses malloc rather liberally, but rarely (if ever) checks the result. Allocation functions failing to allocate is a recoverable state¹.
In addition, the library does not provide a built-in mechanism to override the stdlib's malloc. Freestanding targets exist! And on hosted targets, who isn't using aligned_alloc in at least some places nowadays?
[1] I'd wager most C programmers today write in C to exercise as much control (maximal determinism?) as possible without having to write in some flavor of assembly. Yes, I'm aware that C's abstract model of computation and semantic complexity make it anything but "low-level", but it can get rather close on some architectures with some compilers.
These are fine-looking general helper functions, but I find it a bit confusing to call this a "standard library"; that term already has a particular meaning in C/C++.
This is cute, and probably useful for hobby projects. If I may, it would be nice to have slightly more generic versions of some of the interfaces, and perhaps consider dropping "maximally efficient" from your tagline. Not only is that cleaner, you are also less likely to get yelled at by people who focus on code like this:
/* Function to concatenate multiple strings into a new string */
static inline char *concatenate(const char *str1, const char *str2)
{
    size_t len1 = strlen(str1);
    size_t len2 = strlen(str2);
    char *result = (char *)malloc(len1 + len2 + 1);
    strcpy(result, str1);
    strcat(result, str2);
    return result;
}
I should probably be using Rust, but I really like how C maps to the machine, and I find its semantics simple to understand compared to rich type systems.
The semantics of this library and lodash and other functional JavaScript patterns could probably be mapped to something that resembles a database query engine for general data structure traversal. Then the computer can work out how to allocate memory and what traversals are faster. And what traversals or queries are equivalent, a bit like Prolog. Every program is a compiler :-)
I haven't yet wrestled much with the boilerplate in C, such as hashmaps, async, event loops, libuv, the visitor pattern, indirection, and so on.
I might need to look deeper but how is memory management done?
It seems like there's some allocation happening in the background? For example, in split() or string_to_json(), some memory is being allocated somewhere.
I ask out of more than curiosity; I wonder if this could be used in embedded development.
Either way, seems like a fantastic toolkit for working in C!
I’m the most excited about the string functions, because idk why but I find myself doing a bunch of string manipulation quite often, and not having to jump out of C or manually implement algos sounds awesome!
It always uses malloc/free with no way to specify your own allocator; it nearly always assumes malloc succeeds, and when it does check, it treats malloc(0) returning NULL as an error; it prefers to malloc the maximum possible size and then realloc smaller; and it does not document the API's memory-ownership policy.
(For example, you have to read the code to learn that sorted_strings returns a sorted copy of the input strings, rather than the sorted string array in-place.)
A function like longest_common_prefix always mallocs space, when I would think just knowing the prefix size is more useful in an embedded program.
> Either way, seems like a fantastic toolkit for working in C!
Do take care as it appears to have a lot of small issues.
This should write new_str to the file, not file_data, which has already been freed.
Going back to memory ownership, you can also see there is no documentation that map_function must return memory that will be freed by the caller.
Digging deeper, "read_file" uses fopen(filename, "r") to open a file, then a SEEK_END/ftell() to get the size. There is a minor issue that ftell() may report -1 (on my machine it returns -1 for a fifo), causing a malloc(0) which is checked against NULL, and may falsely report a malloc failure rather than being an issue with how it tried to predict the size instead of using dynamic growth.
(The places which use read_file first checks it's a real file, but the file can be changed between the check and its use.)
I also think there's an issue with shellescape, as it does not quote backslashes. The version in Python's shlex.quote() looks more robust to me.
Then there's the lack of documentation, like, do you expect startswith() and endswith() to be case-insensitive? I don't. Why would you use str_to_double() instead of atof(), and why is there no locale-independent variant?
Quite a few of these seem like they should never be used, like "count_vowels()", which returns 0 for "sky" and 1 for "Brontë". And what is the difference between "is_string_alphanumeric" and "is_alphanumeric"?