Sp.h is the standard library that C deserves

(spader.zone)

77 points | by dboon 2 days ago

17 comments

  • nextaccountic 2 hours ago
    > Principles

    > Be extremely portable

    > sp.h is written in C99, and it compiles against any compiler and libc imaginable. It works on Linux, on Windows, on macOS. It works under a WASM host. It works in the browser. It works with MSVC, and MinGW, it works with or without libc, or with weird ones like Cosmopolitan. It works with the big compilers and it works with TCC.

    > And, best of all, it does all of that because it’s small, not because it’s big.

    vs

    > Non-goals

    > Obscure architectures and OSes

    > I write code for x86_64 and aarch64. WASM is becoming more important, but is still secondary to native targets. I don’t care to bloat the library to support a tiny fraction of use cases.

    > That being said, if you’re interested in using the library on an unsupported platform, I’m more than happy to help, and if we can make the patch reasonable, to merge it.

    Those are contradictory. Either the code is extremely portable, or it can't support "obscure" platforms, but not both.

    • dboon 31 minutes ago
      There are very few C libraries which compile, stock, against the matrix of toolchains, ABIs, and operating systems that this library does. For the subset of machines which run, I don't know, 99.9% of all instructions (i.e. x86_64 + aarch64, Linux + Darwin + Windows), the library just works. This is a definition of portability. Why would portability be a binary of supporting every possible system or being hard tied to a single one?
      • AlotOfReading 15 minutes ago
        The natural comparisons are libraries like glibc and newlib, which do support lots of architectures and more importantly make porting to new architectures or taking advantage of platform features pretty straightforward.
        • scott01 3 minutes ago
          I’m not as experienced as some people here, but in ~10 years, I’ve never needed to write code for anything other than x86 or arm. So I agree with the author on their priorities.
    • ktpsns 2 hours ago
      Exactly. This shows that "extremely portable" is actually marketing for "It supports a number of platforms. In my opinion, this number is big".
      • noosphr 1 hour ago
        We support extreme portability for sufficiently large values of two.
      • lelanthran 18 minutes ago
        > This shows that "extremely portable" is actually marketing for "It supports a number of platforms. In my opinion, this number is big".

        The number might just be zero. Did anyone check whether this compiles? I am trying to track down where the function `sp_mem_allocator_alloc_type` is defined (it is used in 3 places), but it doesn't appear in the GitHub search results.

        I'm not going to clone and build this (too dangerous).

    • kazinator 50 minutes ago
      You can be portable, without supporting obscure platforms.

      Supporting obscure platforms is what makes portability "extreme", though.

    • riedel 1 hour ago
      I could not even find a mention of which platforms it supports. There is a Linux example at the bottom. I have never seen a libc implementation that does not even mention which platforms it is meant for.
      • forrestthewoods 48 minutes ago
        > sp.h is written in C99, and it compiles against any compiler and libc imaginable. It works on Linux, on Windows, on macOS. It works under a WASM host. It works in the browser. It works with MSVC, and MinGW, it works with or without libc, or with weird ones like Cosmopolitan. It works with the big compilers and it works with TCC.
      • bsder 45 minutes ago
        You could, of course, spend 30 seconds looking at the code on GitHub, which you would have to do if you were interested in using it anyway.

          TRIPLES = \
            x86_64-linux-none x86_64-linux-gnu x86_64-linux-musl \
            aarch64-linux-none aarch64-linux-gnu aarch64-linux-musl \
            aarch64-macos \
            x86_64-windows-gnu \
            wasm32-freestanding wasm32-wasi
        
        Or you could actually try the compliance suite on an architecture and report back to us if it works?
    • forrestthewoods 48 minutes ago
      That’s a lot of text to say “well ackshually”.
  • zzo38computer 1 hour ago
    I agree with most of the criticisms they make.

    I agree that pointer and length is better than null-terminated strings (although it is difficult in C, and as they mention you will have to use a macro (or some additional functions) to work this in C).
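
    A minimal sketch of the pointer-and-length idea in C99, including the kind of macro needed to make literals usable. The names `str_t`, `STR_LIT`, `str_from_cstr`, `str_equal`, and `str_slice` are hypothetical and are not taken from sp.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer-plus-length string type; not sp.h's actual definition. */
typedef struct {
    const char *data;
    size_t len;
} str_t;

/* Build a str_t from a string literal. sizeof counts the trailing NUL, so subtract 1. */
#define STR_LIT(s) ((str_t){ (s), sizeof(s) - 1 })

/* Wrap an existing NUL-terminated string (this pays for one strlen). */
static str_t str_from_cstr(const char *s) {
    str_t out = { s, strlen(s) };
    return out;
}

/* Compare lengths first, then bytes; embedded NUL bytes are handled correctly. */
static int str_equal(str_t a, str_t b) {
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}

/* Take a sub-slice [start, end) without copying or writing a terminator. */
static str_t str_slice(str_t s, size_t start, size_t end) {
    assert(start <= end && end <= s.len);
    str_t out = { s.data + start, end - start };
    return out;
}
```

    With this shape, slicing an extension off `STR_LIT("hello.c")` is just `str_slice(s, 5, 7)`, with no allocation and no need to modify the original bytes.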

    Writing the C standard library directly against syscalls is also a good idea. In some cases an implementation might need to avoid this for some reason, but generally it is better for the standard library to program directly against syscalls.

    FILE object is sometimes useful especially if you have functions such as fopencookie and open_memstream; but it might be useful (although probably not with C) to be able to optimize parts of a program that only use a single implementation of the FILE interface (or a subset of its functions, e.g. that does not use seeking).

    • alfiedotwtf 1 hour ago
      Making every C call a system call is not a good idea at all. Think about malloc() etc.: the OS shouldn’t care about individual allocations and should only worry about providing brk() etc. Otherwise, performance will die if you’re doing a thousand system calls per second!
      • HexDecOctBin 29 minutes ago
        No modern libc uses (or should use) brk() as the heap. Allocate virtual memory using mmap, VirtualAlloc, etc., and manage your set of heaps.
    • fithisux 1 hour ago
      Null terminated strings have some merits but they should be a completely different data type like in Freebasic.
      • Sankozi 1 hour ago
        Are there other merits than availability of literals in C?

        It seems like one of the worst data structures ever: the lookup complexity of a linked list combined with the expansion complexity of an array list, with security problems added as a bonus.

        • boricj 52 minutes ago
          It's fine as a serialization/deserialization primitive for on-disk files, as long as the NUL character is invalid.

          String tables in most object file formats work like that: a concatenated series of ASCIIZ strings. The overhead is one byte per string (the NUL), addressing a string requires only an offset into the table, and strings with common suffixes can share storage. It's a very compact layout.
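
          A small illustration of that layout, as a sketch rather than any particular object format's code. The table contents and the helper name `strtab_get` are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A tiny string table: NUL-terminated strings laid end to end.
   By convention offset 0 holds the empty string.
   Layout: [0]="" [1]="unwritable" [12]="main" */
static const char strtab[] = "\0unwritable\0main\0";

/* Resolve an offset into the table to an ordinary C string. */
static const char *strtab_get(const char *tab, size_t tab_size, size_t off) {
    assert(off < tab_size);
    return tab + off;
}
```

          Suffix sharing falls out for free: offset 1 names "unwritable", offset 3 names "writable", and offset 6 names "table", all backed by the same bytes and the same terminator.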

      • tdeck 11 minutes ago
        Hearing someone mention FreeBASIC really brings me back. It was the first language I ever used pointers in.
  • Retr0id 3 hours ago
    > Program directly against syscalls

    Works nicely on Linux where the syscall interface is explicitly stable, but on many (most?) other platforms this is not the case.

    > There Is No Heap

    I don't understand what this means, when it's followed by the definition of a heap allocation interface. The paragraph after the code block conveys no useful information.

    > Null-terminated strings are the devil’s work

    Agreed! I also find the stance regarding perf optimization agreeable.

    • dboon 25 minutes ago
      Thanks for reading. "There is no heap" is meant to say that your mental model of memory shouldn't be one heap from which all memory is pulled. It should be many heaps, owned by many different allocators and providing different semantics. Hence the opinionated stance of the library; there is no allocation function that does not force you to specify the specific heap you want to allocate from. I'm sorry if I didn't explain that well.

      As far as the syscall thing, it's actually quite interesting. NT is also extremely stable. Likewise for the stock Darwin syscalls on macOS. In practice, though, Windows loads kernel32.dll automatically, so there's no drawback in using it when appropriate. I still call directly into NT sometimes (mostly to skip complex userspace path translations that aren't useful). On macOS, you are likewise forced to link to libc (libSystem.dylib), and so I usually just end up using the syscall-wrapper libc functions there.

    • Retr0id 2 hours ago
      Looks like the default allocator uses mmap(2) for every single allocation, which is horribly inefficient - you map a whole PAGE_SIZE worth of memory for every tiny string. Aside from just wasting memory this will make the TLB very unhappy.

      It looks like sp_log's string formatting is entirely unbuffered which results in lots of tiny write syscalls.

      • dboon 15 minutes ago
        The point of the library is that you do not call the low level allocation primitive to allocate a single string. Of course, in simple programs which exit immediately, there is no difference between using a page allocator and a heap allocator. In real programs, I use an appropriate allocator for the allocation rather than making arbitrary calls to malloc(). In the sp.h examples, I use the page allocator to keep freestanding Linux simple. I could swap out a single line to be backed by an arena, but it misses the forest for the trees.

        sp_log() writes directly to an IO writer. An IO writer can be buffered or unbuffered, but is unbuffered by default. This is a feature, not a bug. Have a look through the IO code!

        Cheers and thanks for reading.

      • monocasa 14 minutes ago
        Not just the TLB, but the L1 D$ will be very unhappy as well. On most microarchs, making all heap objects page aligned ends up making every object start at cache set 0, because the set is indexed off of the offset within a page so that the TLB lookup can happen in parallel with the set load.
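
        The arithmetic behind that can be sketched as follows. The cache geometry here (32 KiB, 8-way, 64-byte lines, 4 KiB pages) is an assumption chosen for illustration, and `cache_set_index` is a hypothetical helper, not something from the library.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed L1 data cache geometry: 32 KiB, 8-way, 64-byte lines.
   That gives 32768 / (8 * 64) = 64 sets, so the set index is address
   bits 6..11, all of which lie inside the 12-bit offset of a 4 KiB page. */
enum { LINE_SIZE = 64, NUM_SETS = 64, PAGE_SIZE_BYTES = 4096 };

/* Which set an address maps to under the assumed geometry. */
static unsigned cache_set_index(uintptr_t addr) {
    return (unsigned)((addr / LINE_SIZE) % NUM_SETS);
}
```

        Under these assumptions every page-aligned address has zero in bits 6..11 and therefore lands in set 0, so the first line of every page-aligned object competes for the same 8 ways.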
      • AlotOfReading 2 hours ago
        That seems to be a pretty consistent quality level for the entire library. Look at the implementations in sp_math, yikes.
        • 12_throw_away 1 hour ago
          Oh man. Oof. I'm sure there must be some repository out there that has an AGENTS.md but isn't pure slopcode, but I haven't seen it yet. The number of people who can be trusted to vibe code "responsibly" is probably about the same as the number of people who can be trusted to write memory safe C.
        • locknitpicker 2 hours ago
          > That seems to be a pretty consistent quality level for the entire library. Look at the implementations in sp_math, yikes.

          That does spin the meaning of "Sp.h is the standard library that C deserves"

        • jcranmer 2 hours ago
          "How bad can it be, I mean I know that numerics are not many people's strong suit, but..."

          ... ... ... oh wow, the math functions are really bad implementations. The range reduction in the sin/cos functions is yikes-level. Like, the wrong input gives you an infinite loop level of yikes.
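
          For contrast, here is a rough sketch of a range reduction that cannot loop, written without calling into libm (only the `NAN` macro comes from `math.h`). The function `sketch_sin` and its cutoff of 1e9 are assumptions for illustration; this is not sp_math's code and it is nowhere near a production-quality sine.

```c
#include <assert.h>
#include <math.h>   /* only for the NAN macro; no libm functions are called */

#define TWO_PI  6.283185307179586
#define PI      3.141592653589793
#define HALF_PI 1.5707963267948966

static double sketch_sin(double x) {
    /* Reject NaN, infinities, and magnitudes where this naive reduction
       would lose its accuracy. The comparison is false for NaN. */
    if (!(x > -1e9 && x < 1e9)) return NAN;

    /* Step 1: subtract the nearest multiple of 2*pi with one division,
       rather than a subtract-until-small loop. r is roughly in [-pi, pi]. */
    double q = x / TWO_PI;
    long long k = (long long)(q < 0 ? q - 0.5 : q + 0.5);
    double r = x - (double)k * TWO_PI;

    /* Step 2: fold into [-pi/2, pi/2] using sin(pi - r) == sin(r). */
    if (r > HALF_PI) r = PI - r;
    else if (r < -HALF_PI) r = -PI - r;

    /* Step 3: odd Taylor polynomial through r^13, in Horner form. */
    double r2 = r * r;
    return r * (1.0 + r2 * (-1.0 / 6 + r2 * (1.0 / 120 + r2 * (-1.0 / 5040
             + r2 * (1.0 / 362880 + r2 * (-1.0 / 39916800
             + r2 * (1.0 / 6227020800.0)))))));
}
```

          Every path through the function does a fixed amount of work, so no input can make it spin. Real implementations go further, using extended-precision reduction for large arguments and minimax polynomials instead of Taylor terms.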

      • gabriela_c 44 minutes ago
        Jesus! Claude could've told this guy all these things. People underestimate how much the average malloc implementation does and how many considerations it makes. Or how much IO sucks.
        • lelanthran 14 minutes ago
          > Jesus! Claude could've told this guy all these things.

          Claude probably wrote it.

    • zamadatix 2 hours ago
      > Works nicely on Linux where the syscall interface is explicitly stable, but on many (most?) other platforms this is not the case.

      There is a footnote on this saying as much:

      > 3. Where “syscall” means “the lowest level primitive available”. On Linux, it’s always actual syscalls. On Windows, that’s usually NT. On macOS, it’s usually the syscall-wrapper subset of libc because you’re forced to link libc and it’s not quite as open as Linux (although there is a rich “undocumented” set of APIs and syscalls that are very interesting).

      • DeathArrow 1 hour ago
        What about BSDs?
        • dboon 24 minutes ago
          I don't support non-macOS BSDs explicitly yet. Not for any reason of design, just hasn't been a priority.
        • whateverboat 1 hour ago
          syscalls
          • yjftsjthsd-h 16 minutes ago
            That might work on FreeBSD but is pretty well guaranteed to break on OpenBSD. (Dunno about Net and Dragonfly) (I'd caution that treating the BSDs as a monolith is likely to end in errors; they're quite diverse.)
    • quuxplusone 2 hours ago
      The "definition of a heap allocation interface" indicates that there is no standard heap. Instead, there's a standard interface for the user to define their own heaps. Any standard library function that needs to allocate will take a sp_allocator_t parameter, and use that to allocate. As opposed to e.g. strdup, which hard-codes a call to malloc internally. Sp.h's strdup-alike would take an sp_allocator_t as input and call into that to get the memory it needs.
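
      A rough sketch of that shape, with invented names: `allocator_t`, `arena_t`, `arena_init`, and `dup_cstr` are hypothetical stand-ins, and sp.h's real sp_allocator_t and its strdup-alike may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allocator interface: a function pointer plus whatever state follows it. */
typedef struct allocator {
    void *(*alloc)(struct allocator *self, size_t size);
} allocator_t;

/* One possible heap: a bump arena over a caller-supplied buffer. */
typedef struct {
    allocator_t base;    /* first member, so an arena_t* can be viewed as an allocator_t* */
    unsigned char *buf;
    size_t cap;
    size_t used;
} arena_t;

static void *arena_alloc(allocator_t *self, size_t size) {
    arena_t *a = (arena_t *)self;
    /* Round up so offsets stay multiples of 16; actual alignment also depends on buf. */
    size_t aligned = (size + 15u) & ~(size_t)15u;
    if (aligned < size || aligned > a->cap - a->used) return NULL;
    void *p = a->buf + a->used;
    a->used += aligned;
    return p;
}

static void arena_init(arena_t *a, void *buf, size_t cap) {
    a->base.alloc = arena_alloc;
    a->buf = buf;
    a->cap = cap;
    a->used = 0;
}

/* strdup-alike: the caller picks the heap instead of malloc being hard-coded. */
static char *dup_cstr(allocator_t *al, const char *s) {
    size_t n = strlen(s) + 1;
    char *copy = al->alloc(al, n);
    if (copy != NULL) memcpy(copy, s, n);
    return copy;
}
```

      The point of the design is visible at the call site: `dup_cstr(&arena.base, "hello")` cannot be written without naming a heap, whereas `strdup("hello")` silently picks one.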

      A C++ programmer might describe this as "PMR, but not default-constructible. And std::stable_sort takes a PMR allocator parameter. And PMR is the default, and there's no implementation of std::allocator (or new or delete)."

  • Panzerschrek 1 hour ago
    This doesn't look good:

      c8 buf [SP_PATH_MAX] = sp_zero;
      sp_cstr_copy_to_n(path, len, buf, SP_PATH_MAX);
    
    since

      #define SP_PATH_MAX 4096
    
    There should be a fallback for very long paths.
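
    One way such a fallback could look, as a sketch only. `path_to_cstr` is a hypothetical helper, and `malloc` stands in for whichever allocator the caller would actually pass in a library built around explicit allocators.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* NUL-terminate a (pointer, length) path. Uses the caller's fixed buffer when
   the path fits and falls back to the heap for longer paths.
   Returns NULL on failure; *heap tells the caller whether to free() the result. */
static char *path_to_cstr(const char *path, size_t len,
                          char *fixed_buf, size_t fixed_cap, int *heap) {
    char *dst = fixed_buf;
    *heap = 0;
    if (len >= fixed_cap) {               /* no room for the bytes plus the terminator */
        if (len == (size_t)-1) return NULL;   /* len + 1 would overflow */
        dst = malloc(len + 1);
        if (dst == NULL) return NULL;
        *heap = 1;
    }
    memcpy(dst, path, len);
    dst[len] = '\0';
    return dst;
}
```

    The common case still touches only the fixed buffer; only paths at or beyond its capacity pay for an allocation, and nothing is silently truncated.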
    • dboon 12 minutes ago
      Can you show me a realistic case with a longer path?
  • pjmlp 1 hour ago
    We should have left C in the 90's already, but then FOSS happened,

    "Using a language other than C is like using a non-standard feature: it will cause trouble for users. Even if GCC supports the other language, users may find it inconvenient to have to install the compiler for that other language in order to build your program. So please write in C."

    The GNU Coding Standard in 1994, http://web.mit.edu/gnu/doc/html/standards_7.html#SEC12

    • yjftsjthsd-h 15 minutes ago
      That sounds like GNU reacted to the problem rather than causing it.
  • skybrian 3 hours ago
    My impression of the sample programs is that they're unreadably noisy, but maybe this would be a good compiler target if you're writing your own language?
    • dboon 9 minutes ago
      How would you write https://github.com/tspader/sp/blob/main/example/ls.c in your statically typed language of choice? To be fair, this is definitely the kindest example to my library, but one reason I felt this project was worth pursuing was that that example reads basically like a slightly worse TypeScript to me. In other words, quite nice for how low level the code really is.
  • Panzerschrek 1 hour ago
    It's a disadvantage that it's header-only. It needs to include <windows.h> and a bunch of other stuff, which slows down compilation. Splitting it into a couple of files (a header and an implementation) would be much better.
  • Panzerschrek 1 hour ago
    How does this library work in programs with parts still requiring libc?

    How does it deal with code executing before main? Libc does a bunch of necessary stuff, like calling initializers for global variables.

  • 504118318 1 hour ago
    Just taking a quick look at the atomics section:

    First, (on unix) it's wrapping pthread mutex. That's part of libc! (Technically it might not be libc.so, but it's still the standard library.)

    Also, none of the atomics talk about the memory model. You don't _have_ to use the C11 memory model (Linux, for example, doesn't). But if you're not using the C11 memory model and letting the compiler insert fences for you, you definitely need to have fence instructions, yourself.
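
    For reference, this is roughly what stating the ordering explicitly looks like with C11 atomics. It is a sketch that needs a C11 compiler (so it is outside the library's C99 baseline), and `publish` and `try_consume` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

static int payload;          /* plain data, published via the flag below */
static atomic_int ready;     /* zero-initialized: nothing published yet */

/* Writer: the release store orders the payload write before the flag write. */
static void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Reader: the acquire load orders the flag read before the payload read.
   Returns 1 and fills *out if a value has been published, else 0. */
static int try_consume(int *out) {
    if (atomic_load_explicit(&ready, memory_order_acquire) == 0) return 0;
    *out = payload;
    return 1;
}
```

    The compiler emits whatever fences the target needs for the release/acquire pair. An API that hides the ordering has to either pick the strongest one everywhere or document which one it picked.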

    While C11 atomics do rely on libgcc, so do the __sync* functions that this library uses (see https://godbolt.org/z/bW1f7xGas for an example).

    Oops... apparently this is vibecoded. Welp, I just wasted ten minutes of my life reviewing slop that I'm not going to get back.

  • charcircuit 26 minutes ago
    I do not want to include and compile a standard library for every file that includes it.

    Why do standard library headers always have to be insane?

    • dboon 2 minutes ago
      Have you considered compiling it into a binary of your choice? It works perfectly well as a traditional library. The only cost you pay is re-parsing the header part once per TU. Because C is so simple, this is virtually free. In any case, calling it insane makes me feel disrespected and I would prefer if you didn't do that.
  • JSR_FDED 2 hours ago
    I love how hyper-opinionated this is.
    • dboon 9 minutes ago
      Thank you!
    • smitty1e 1 hour ago
      Family saying: "It ain't bragging if you can do it."

      When one is competent to work at this level, strong opinions are in order.

      Their correctness is something I cannot gauge. I'm barely competent to follow the conversation.

      • rmunn 1 hour ago
        Considering the first thing I saw in the thread was https://news.ycombinator.com/item?id=48244891, where the values returned from sp's sine function were compared to the correct values, I'm going to take any such opinions with a few grains of salt. The correct sine for the number they tested (31337 radians) is 0.3772 (0.3771522646 according to my calculator), but sp's implementation returned 0.4385. That's not even close to right.
        • JSR_FDED 17 minutes ago
          It’s still alpha
  • KnuthIsGod 1 hour ago
    "The library’s stance, to put it simply, is that the juice ain’t worth the squeeze when it comes to low level, compute-bound performance.

    Designing software and data structures for performance against unknown use cases on unknown hardware is extremely difficult and the resulting code is much more complicated. Even then, it’s often better to use code written against your actual use case and hardware when performance is that critical.

    Things that are off the table might be:

    SIMD
    A highly optimized hash table rewrite
    Figuring out where inlining or LIKELY causes the compiler to produce better code."

    LOL...

    Classic vibe coder.

  • nektro 2 hours ago
    not one mention of Zig on the whole page?
    • dboon 38 minutes ago
      I had half of a manifesto about how C programmers should be embarrassed on account of Zig but I ended up paring it down to be more focused on what the library is plainly.

      Zig is obviously incredible and this library would not exist without it being the standard bearer for systems programming in many ways

    • pyrolistical 26 minutes ago
      We should port the zig std lib as a c lib
  • Kab1r 3 hours ago
    Best library name.
  • TZubiri 3 hours ago
    > Every language that depends on third party libraries, like js and python, is getting massively infected with supply chain worms

    > Only couple of languages not affected are those that don't have a culture of downloading third party code, like C and C++

    > Ex js and python developer publishes a 'library'

    > Library is vibe coded

    > Published on github amidst GitHub being hit by supply chain attacks, had their source code leaked.

    The timing is terrible for starters, and I don't trust the vibe coded code at all. Imagine a pandemic and the cities are on fire, and you arrive to a rural town asking to kiss people.

    • redlewel 1 hour ago
      Thanks for this comment. I was about to bookmark the repo for later; you saved me the time.
  • KnuthIsGod 1 hour ago
    Wonderful !

    Yet another slop coded library.

    What could possibly go wrong...