How LLMs work

(0xkato.xyz)

75 points | by 0xkato 2 days ago

5 comments

  • andai 1 hour ago
    I couldn't load the article directly due to an SSL issue, so here's the archive link:

    https://archive.ph/aWtFG

  • 10GBps 50 minutes ago
    I learned TCP/IP by watching and reading raw packets over packet radio at 1200 baud.

    I've noticed the same thing is possible if you watch the output of a slow LLM. Eventually you start to see the machinery. Input tokens become output tokens; it's math. I can't exactly predict the tokens generated, but I can see how they are formed. It's a lot like chess: you can't see every possible move, but the mechanism is understandable.

  • lhd1 24 minutes ago
    I find it difficult to engage with AI-generated text. What am I getting here that I couldn't get from a chatbot?

  • singpolyma3 1 hour ago
    Next do "why LLMs work"
    • sheeshkebab 1 hour ago
      Considering they work with any architecture/configuration given enough compute, just more or less efficiently, maybe it's fundamental, in the same sense as why electricity works...
    • soupspaces 1 hour ago
      Universal approximation theorem, embeddings, self-attention, gradient descent. And empirically, scaling laws.
    • skydhash 40 minutes ago
      Why does linear regression work? Why do computers work? Because it's about math and encoding information. If we can encode words as numbers, then why can't we encode their order as a relation? It's just that neural networks are very apt at finding that relation even if it's noisy.
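
      To make the "encode words as numbers, then encode their order as a relation" idea concrete, here is a minimal sketch using bigram counts. It is a toy stand-in for what a neural network learns, not how an LLM is actually implemented; the function names and the tiny corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Assign each distinct word an integer id and count which id follows which."""
    words = text.lower().split()
    # dict.fromkeys keeps first-seen order, so ids are stable and deterministic.
    vocab = {w: i for i, w in enumerate(dict.fromkeys(words))}
    ids = [vocab[w] for w in words]
    follows = defaultdict(Counter)
    for prev_id, next_id in zip(ids, ids[1:]):
        follows[prev_id][next_id] += 1  # the "order as a relation" part
    return vocab, follows


def predict_next(word, vocab, follows):
    """Return the word most often seen after `word`, or None if none was seen."""
    id_to_word = {i: w for w, i in vocab.items()}
    counts = follows.get(vocab[word])
    if not counts:
        return None
    best_id, _ = counts.most_common(1)[0]
    return id_to_word[best_id]


# Hypothetical toy corpus, chosen only for illustration.
corpus = "the cat sat on the mat and the cat slept"
vocab, follows = train_bigram(corpus)
print(predict_next("the", vocab, follows))  # prints "cat"
```

      A count table like this falls apart on noisy or unseen contexts, which is roughly where the comment's point about neural networks finding the relation "even if it's noisy" comes in.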