CS336: Language Modeling from Scratch

(cs336.stanford.edu)

189 points | by kristianpaul 4 hours ago

9 comments

chainsaw10 14 minutes ago
I’m intrigued by this course. However I’m also curious about its prerequisite:
> Machine Learning (e.g. CS221, CS229, CS230, CS124, CS224N) You should be comfortable with the basics of machine learning and deep learning.
Anyone have a good implementation-heavy self-study resource for those topics, or experience with the recorded lectures for those Stanford courses?
- alec_heif 9 minutes ago
  I found the 2024 Spring CS224N course sufficient for this pre-requisite, coupled with the textbook (chapters 1-13). Like CS336, this one also has videos and assignments available, and it being from 2024 is not a problem since the basics are mostly the same as recent years. Notably this is not true for 336, which spends much more time discussing cutting edge techniques, so the 2026 version there is essential.
  Course: https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1246...
  Lecture videos: https://www.youtube.com/playlist?list=PLoROMvodv4rOaMFbaqxPD...
  Textbook: https://web.stanford.edu/~jurafsky/slp3/
meken 2 hours ago
I have fond memories of cs224d [1] taught by richardsocher. It's a bit dated at this point since it was created in the pre-transformer era, but it was a very cool introduction to applying deep learning to NLP at the time.
[1] https://cs224d.stanford.edu
- egl2020 1 hour ago
  Similar thoughts here. That was when I realized the potential of the Internet: I didn't have to be a grad student at a tier 1 research university to learn about the frontier.
skerit 2 hours ago
> GPU compute for self-study
The suggestions they make for a B200 start at $4.99 an hour.
Is that really required for starting out? I've been tinkering with my own from-scratch LLM, but in the early phases I don't need anything more than a 4090 on Vast.ai.
- marcelroed 40 minutes ago
  TA here. Definitely not! In fact, we explicitly added sections in the first assignment to allow for scaling down to even local compute (M-series GPUs). For assignment 2 there are a few parts that require Triton support for your GPU, but everything can be adapted for much cheaper GPUs.
  We were lucky enough to get Blackwell GPUs for Stanford students this year, which is why the writeups are written mostly around them.
- _0ffh 29 minutes ago
  You're right to be sceptical. I have trained reasonably good SLMs on the TinyStories dataset on my 4060Ti (16GB) with no problems. You'll only run into trouble if you want to test whether your ideas scale up to models any bigger than "arguably tiny".
- grahameb 57 minutes ago
  Isn't it strange that the required resources aren't provided by the educational institution?
  - marcelroed 28 minutes ago
    We do provide resources for enrolled students. The online suggestions are for external students or Stanford students who we weren't able to admit.
  - ReptileMan 44 minutes ago
    Two schools of thought. First: people are paying 100K per year, so we should provide everything. Second: they are paying 100K per year, so do you think they will care about a couple of hundred more?
- root-parent 1 hour ago
  You don't even need a GPU to train your own LLM.
- flakiness 1 hour ago
  I believe these are affordable enough for the intended audience (Stanford undergrad and master's students).
  - mrcrm9494 25 minutes ago
    For them, Modal is sponsoring the compute, as stated on the website. The prices are for remote followers.
sonabinu 1 hour ago
I brought a group together to do this class using the YouTube videos and course materials available online. It is challenging but rewarding. We tackled one lecture video per week. We started with over 30 learners, and by the last session we were down to 8.
airstrike 1 hour ago
I wonder whether people prefer to learn this on their own, or whether building a community around open learning is something others are interested in.
- danbrooks 1 hour ago
  I'd be interested in joining a discord server.
  Would be great to have a community to discuss the material - even if folks can't commit to the full course.
ChrisArchitect 38 minutes ago
Related:
AI Agent Guidelines for CS336 at Stanford https://github.com/stanford-cs336/assignment1-basics/blob/ma... (https://news.ycombinator.com/item?id=48359232)
storus 2 hours ago
Thanks for releasing this again! What has changed this year compared to prior offerings?
- marcelroed 29 minutes ago
  TA here. Biggest changes are in the second assignment (distributed) where we added a bunch of memory, profiling and distributed tasks, as well as in the fifth assignment (alignment), where most of the RL tasks are fresh this year. Assignment 3 (scaling laws) was also completely updated, but in a way that might be difficult to run without substantial resources. I'm working on a way for external students to be able to run simulated experiments for free!
  Assignment 1 (basics) has the most hours of preparation invested in it, and only minor modernization/bug fixes were necessary this year.
tmule 2 hours ago
Are video lectures available online?
- Bilal_io 2 hours ago
  YouTube playlist link from the page: https://www.youtube.com/watch?v=JuoVZkPBiKk&list=PLoROMvodv4...
- aerohit 2 hours ago
  https://www.youtube.com/watch?v=JuoVZkPBiKk&list=PLoROMvodv4...
- mindcrime 2 hours ago
  https://www.youtube.com/playlist?list=PLoROMvodv4rMqXOcazWaT...
dominotw 1 hour ago
I recently started reading "Build a Reasoning Model (From Scratch)", then I realized that I'm not really interested in the building part and just want to understand the theory and practice behind it.
I want a casual, LessWrong-style, ground-up explanation.
- ianand 19 minutes ago
  In that case I humbly suggest my talk from AI Engineer World's Fair https://www.youtube.com/watch?v=ZuiJjkbX0Og
  It gives you the basics of LLM internals in about 90 minutes and includes an already-built model in JavaScript that you can step through in browser devtools in as much detail as you want.