Rust GCC back end: Why and how

(blog.guillaume-gomez.fr)

68 points | by ahlCVA 3 hours ago

6 comments

  • mastax 1 hour ago
    > On that note: GCC doesn't provide a nice library to give access to its internals (unlike LLVM). So we have to use libgccjit which, unlike the "jit" ("just in time", meaning compiling sub-parts of the code on the fly, only when needed for performance reasons and often used in script languages like Javascript) part in its name implies, can be used as "aot" ("ahead of time", meaning you compile everything at once, allowing you to spend more time on optimization).

    Is libgccjit not “a nice library to give access to its internals?”

    • saghm 1 hour ago
      I could be wrong, but my surface-level understanding is that it's more of a library version of the external API of GCC than one that gives access to the internals.
  • keyle 2 hours ago
    If the author reads this...

    I'd be very interested if the author could provide a post with a more in-depth view of the passes, as suggested!

    • petcat 1 hour ago
      > Little side-note: If enough people are interested by this topic, I can write a (much) longer explanation of these passes.

      Yes, please!

  • grokx 38 minutes ago
    When I studied compiler theory, a large part of the compilation involved a lexical analyser (e.g. `flex`) and a syntax analyser (e.g. `bison`), that would produce an internal representation of the input code (the AST), used to generate the compiled files.

    It seems that the terminology has evolved, as we speak more broadly of frontends and backends.

    So, I'm wondering if Bison and Flex (or equivalent tools) are still in use by modern compilers? Or are they built directly into GCC, LLVM, ...?

    • pklausler 7 minutes ago
      Table-driven parsers with custom per-statement tokenizers are still common in surviving Fortran compilers, with the exception of flang-new in LLVM. I used a custom parser combinator library there, inspired by a prototype in Haskell's Parsec, to implement a recursive descent algorithm with backtracking on failure. I'm still happy with the results, especially with the fact that it's all very strongly typed and coupled with the parse tree definition.
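
For readers unfamiliar with the parser-combinator style mentioned above, here is a minimal sketch in Python. It is not flang's code (which is C++) and not Parsec's API; `literal`, `seq`, and `alt` are hypothetical names chosen for illustration. A parser is a function from (text, position) to either a (value, new position) pair or `None`, and `alt` backtracks simply by retrying the next alternative from the original position.

```python
from typing import Any, Callable, Optional, Tuple

Result = Optional[Tuple[Any, int]]
Parser = Callable[[str, int], Result]

def literal(s: str) -> Parser:
    """Match the exact string s at the current position."""
    def parse(text: str, pos: int) -> Result:
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse

def seq(*parsers: Parser) -> Parser:
    """Run the parsers one after another; fail if any of them fails."""
    def parse(text: str, pos: int) -> Result:
        values = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            value, pos = r
            values.append(value)
        return values, pos
    return parse

def alt(*parsers: Parser) -> Parser:
    """Try each alternative from the same start position (backtracking)."""
    def parse(text: str, pos: int) -> Result:
        for p in parsers:
            r = p(text, pos)
            if r is not None:
                return r
        return None
    return parse

# Two statements that share a prefix, so the first alternative can fail late.
end_stmt = alt(
    seq(literal("END"), literal(" "), literal("IF")),
    seq(literal("END"), literal(" "), literal("DO")),
)
```

Calling `end_stmt("END DO", 0)` first tries the `END IF` alternative, which fails at the last token, and then restarts the `END DO` alternative from position 0. Because each combinator is an ordinary function, the grammar and the shape of the returned values stay tied together in one place.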
    • brooke2k 29 minutes ago
      Not sure about GCC, but in general there has been a big move away from using parser generators like flex/bison/ANTLR/etc, and towards using handwritten recursive descent parsers. Clang (which is the C/C++ frontend for LLVM) does this, and so does rustc.
      • gpderetta 26 minutes ago
        I believe that GCC also moved to a handwritten parser, at least for C++, a couple of decades ago.
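
For anyone who has only seen generated parsers, a handwritten recursive descent parser is roughly one function per grammar rule. The sketch below is a toy arithmetic evaluator, not code from Clang, rustc, or GCC; the grammar and the names `tokenize`, `Parser`, and `evaluate` are assumptions made for illustration.

```python
import re
from typing import List, Optional

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(src: str) -> List[str]:
    """Split the source into number and operator tokens."""
    tokens, pos = [], 0
    src = src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character near offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> Optional[str]:
        tok = self.peek()
        self.pos += 1
        return tok

    # expr := term (('+' | '-') term)*
    def expr(self) -> int:
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    # term := factor (('*' | '/') factor)*   ('/' is integer division here)
    def term(self) -> int:
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value // rhs
        return value

    # factor := NUMBER | '(' expr ')'
    def factor(self) -> int:
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdigit():
            return int(tok)
        if tok == "(":
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {tok!r}")

def evaluate(src: str) -> int:
    parser = Parser(tokenize(src))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value
```

Operator precedence falls out of which function calls which: `expr` calls `term`, which calls `factor`, so `*` binds tighter than `+` without any precedence table. Each `raise` site knows exactly which rule it was in, which is where the better error messages come from.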
    • quamserena 25 minutes ago
      Not really. Here’s a comparison of different languages: https://notes.eatonphil.com/parser-generators-vs-handwritten...

      Most roll their own for three reasons: performance, context, and error handling. Bison/Menhir et al. make it easy to write a grammar and get started, but in exchange you get less flexibility overall. It becomes difficult to handle context-sensitive parts, do error recovery, and give the user meaningful errors that describe exactly what’s wrong. Usually if there’s a small syntax error we want to try to tell the user how to fix it instead of just producing “Syntax error”, and that requires being able to fix the input and keep parsing.

      Menhir has a new mode where the parser is driven externally; this allows your code to drive the entire thing, which requires a lot more machinery than fire-and-forget but also affords you more flexibility.
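
To make "fix the input and keep parsing" concrete, here is a small sketch of two common recovery strategies in a handwritten parser. Everything in it is hypothetical: the toy `let NAME = NUMBER ;` grammar, the function name `parse_lets`, and the message wording. It does not reflect Menhir's or Bison's APIs.

```python
from typing import List, Optional, Tuple

def parse_lets(tokens: List[str]) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Parse statements of the form `let NAME = NUMBER ;`, collecting all errors."""
    bindings: List[Tuple[str, int]] = []
    errors: List[str] = []
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def sync() -> None:
        """Panic mode: skip past the next ';' so parsing can resume there."""
        nonlocal pos
        while pos < len(tokens) and tokens[pos] != ";":
            pos += 1
        pos += 1

    while pos < len(tokens):
        if peek() != "let":
            errors.append(f"token {pos}: expected 'let', found {peek()!r}")
            sync()
            continue
        pos += 1
        name = peek()
        if name is None or not name.isidentifier():
            errors.append(f"token {pos}: expected a name after 'let'")
            sync()
            continue
        pos += 1
        if peek() != "=":
            errors.append(f"token {pos}: expected '=' after {name!r}")
            sync()
            continue
        pos += 1
        value = peek()
        if value is None or not value.isdecimal():
            errors.append(f"token {pos}: expected a number after '='")
            sync()
            continue
        pos += 1
        if peek() == ";":
            pos += 1
        else:
            # Repair: report the missing ';', then carry on as if it were there.
            errors.append(
                f"token {pos}: missing ';' after 'let {name} = {value}' (insert ';' here)"
            )
        bindings.append((name, int(value)))
    return bindings, errors
```

For the input `let x = 1 let y = ; let z = 3 ;` (split on whitespace), this reports both problems in one run and still recovers the bindings for `x` and `z`. The missing `;` is repaired in place, while the missing number triggers a skip to the next statement.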

  • 1718627440 1 hour ago
    I don't necessarily like the focus on Rust, but if it happens, then we need to have support in the free compiler!
    • lionkor 57 minutes ago
      Why not? Like what about the technology or ecosystem do you disagree with?
    • umanwizard 39 minutes ago
      Rustc (+ LLVM) already is a free compiler.
    • ladyanita22 37 minutes ago
      LLVM is also free
  • MangoToupe 2 hours ago
    I find it shocking that 20 years after LLVM was created, gcc still hasn't moved towards modularization of codegen.
    • 1718627440 1 hour ago
      It is a political, not a technical, decision. Essentially the same as the Linux kernel not encouraging the use of out-of-tree kernel modules. https://gcc.gnu.org/legacy-ml/gcc/2000-01/msg00572.html
      • surajrmal 56 minutes ago
        And it shows how silly the idea is. gcc still sees plenty of forks from vendors who don't upstream, and llvm sees a lot more commercial participation. Unfortunately the Linux kernel equivalent doesn't exist.
        • hedgehog 46 minutes ago
          There are several open BSDs.
          • umanwizard 35 minutes ago
            AFAIK there's no evidence to suggest that permissive vs. copyleft license is the reason for the relative lack of success of the BSDs vs. Linux.
    • pjmlp 1 hour ago
      LLVM wasn't the first modularization of codegen, see Amsterdam Compiler Kit for prior art, among others.

      GCC's approach is on purpose; plus, even if they wanted to change, who would take on the effort of making the existing C, C++, Objective-C, Objective-C++, Fortran, Modula-2, Algol 68, Ada, D, and Go frontends adopt the new architecture?

      Even clang, with all the LLVM modularization, is going to take a couple of years to move from plain LLVM IR to an MLIR dialect for C-based languages, https://github.com/llvm/clangir

    • ayende 2 hours ago
      Isn't that very much intentional on the part of GCC?
      • wahern 1 hour ago
        Not anymore. Modularization is somewhat tangential, but for a while Stallman did actively oppose rearchitecting GCC to better support non-free plugins and front-ends. But Stallman lost that battle years ago. AFAIU, the current state of GCC is the result of intentional technical choices (certain kinds of decoupling not as beneficial as people might think--Rust has often been stymied by lack of features in LLVM, i.e. de facto (semantic?) coupling), works in progress (decoupling ongoing), or lack of time or wherewithal to commit to certain major changes (decoupling too onerous).
        • torginus 1 hour ago
          Personally, I think when you are making bad technical decisions in service of legal goals (making it harder to circumvent the GPL), that's a sure sign that you made a wrong turn somewhere.
          • 1718627440 1 hour ago
            Why? When your goal is to have free software, having non-free software with better architecture won't suit you.
            • bigstrat2003 34 minutes ago
              I would describe this more as "trying to prevent others from having non-free software if they wish to", which is a lot more questionable imo.
      • colejohnson66 2 hours ago
        Somewhat. Stallman claims to have tried to make it modular,[0] but also that he wants to avoid "misuse of [the] front ends".[1]

        The idea is that you should link the front and back ends, to prevent out-of-process GPL runarounds. But because of that, the mingling of the front and back ends ended up winning out over attempts to stay modular.

        [0]: https://lists.gnu.org/archive/html/emacs-devel/2015-02/msg00...

        [1]: https://lists.gnu.org/archive/html/emacs-devel/2015-01/msg00...

        • phkahler 2 hours ago
          >> The idea is that you should link the front and back ends, to prevent out-of-process GPL runarounds.

          Valid points, but also the reason people wanting to create a more modular compiler created LLVM under a different license - the ultimate GPL runaround. OTOH now we have two big and useful compilers!

        • Croak 1 hour ago
          When gcc was built most compilers were proprietary. Stallman wanted a free compiler and to keep it free. The GPL license is more restrictive, but its philosophy is clear. At the end of the day the code's writer can choose if and how people are allowed to use it. You don't have to use it, you can use something else or build your own. And maybe, just maybe Linux is thriving while Windows is dying because in the Linux ecosystem everybody works together and shares, while in Windows everybody helps together paying for Satya Nadella's next yacht.
        • giancarlostoro 1 hour ago
          That sounds like Stallman wants proprietary OSS ;)

          If you're going to make it hard for anyone anywhere to integrate with your open source tooling for fear of commercial projects abusing them and not ever sharing their changes, why even use the GPL license?

          • dhosek 0 minutes ago
            This is a big part of why I’ve always eschewed GPL.
        • colechristensen 2 hours ago
          Good lord Stallman is such a zealot and hypocrite. It's not open vs. closed it's mine vs. yours and he's openly declaring that he's nerfing software in order to prevent people from using it in a way he doesn't like. And refusing to talk about it in public because normal people hate that shit "misunderstanding" him.

          --- From the post:

          I let this drop back in March -- please forgive me.

            > Maybe that's the issue for GCC, but for Emacs the issue is to get detailed
            > info out of GCC, which is a different problem.  My understanding is that
            > you're opposed to GCC providing this useful info because that info would
            > need to be complete enough to be usable as input to a proprietary
            > compiler backend.
          
          My hope is that we can work out a kind of "detailed output" that is enough for what Emacs wants, but not enough for misuse of GCC front ends.

          I don't want to discuss the details on the list, because I think that would mean 50 messages of misunderstanding and tangents for each message that makes progress. Instead, is there anyone here who would like to work on this in detail?

          • bigfishrunning 1 hour ago
            He should just re-license GCC to close whatever perceived loophole, instead of actively making GCC more difficult to work with (for everyone!). RMS has done so much good, but he's so far from an ideal figure.
      • demurgos 2 hours ago
        It is intentional, to prevent non-free projects from building on top of gcc components.

        I am not familiar enough with gcc to know how it impacts out-of-tree free projects or internal development.

        The decision was taken a long time ago; it may be worth revisiting.
