this post was submitted on 11 Sep 2025
811 points (96.3% liked)

Technology

This is the technology worth trillions of dollars huh

JustTesting@lemmy.hogru.ch 2 points 7 hours ago (1 children)

For byte pair encoding (how those tokens get created), I think https://bpemb.h-its.org/ does a good job of giving an overview. After that, I'd say self-attention, from the 2017 paper "Attention Is All You Need", is the seminal work that all of this is based on, and the most crucial part to understand. https://jtlicardo.com/blog/self-attention-mechanism does a good job of explaining it. And https://jalammar.github.io/illustrated-transformer/ is probably the best explanation of the transformer architecture (what LLMs are built on) out there. Transformers are made up of many self-attention layers.
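
To make the self-attention part a bit more concrete, here is a rough sketch of a single attention head in plain Python. It is not taken from any of the linked posts: the helper names (`matmul`, `softmax`, `self_attention`), the toy token vectors, and the identity weight matrices are all made up for illustration, and real implementations use tensor libraries, multiple heads, and learned weights.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Turn a list of scores into positive weights that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x holds one row per token; w_q, w_k, w_v project tokens to
    queries, keys, and values.
    """
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]  # transpose the keys
    # Every token's query is scored against every token's key.
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted mix of all value vectors.
    return matmul(weights, v), weights

# Three toy "tokens", each a 2-dimensional vector (made-up numbers).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hypothetical weights: identity matrices, so q, k, and v all equal x.
identity = [[1.0, 0.0], [0.0, 1.0]]
output, weights = self_attention(x, identity, identity, identity)

for token_weights in weights:
    print([round(w, 3) for w in token_weights])
```

Each printed row shows how much one token "looks at" every token in the sequence, itself included. In a trained model the three projection matrices are learned, which is where the matrix multiplications and backpropagation come in.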

It does help if you know how matrix multiplication works and how the backpropagation algorithm is used to train these things. I don't know of a good, easy explanation off the top of my head, but https://xnought.github.io/backprop-explainer/ looks quite good.
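
If it helps, backpropagation on a tiny example looks roughly like the sketch below. It assumes a made-up two-weight "network", `y_hat = w2 * tanh(w1 * x)`, with a squared-error loss and a single training example. The function names, starting weights, and learning rate are all hypothetical, and real frameworks compute these gradients automatically.

```python
import math

def forward(w1, w2, x):
    """Tiny two-layer model: hidden activation h, then output w2 * h."""
    h = math.tanh(w1 * x)
    return h, w2 * h

def loss_and_grads(w1, w2, x, y):
    """Squared-error loss plus its gradients, via the chain rule."""
    h, y_hat = forward(w1, w2, x)
    loss = (y_hat - y) ** 2
    # Backward pass: start at the loss and walk back towards the inputs.
    d_yhat = 2.0 * (y_hat - y)    # dLoss/dy_hat
    d_w2 = d_yhat * h             # y_hat = w2 * h
    d_h = d_yhat * w2
    d_z = d_h * (1.0 - h * h)     # derivative of tanh(z) is 1 - tanh(z)^2
    d_w1 = d_z * x                # z = w1 * x
    return loss, d_w1, d_w2

# Made-up starting weights, one training example, and a learning rate.
w1, w2 = 0.5, 0.3
x, y = 1.5, 0.8
lr = 0.1

losses = []
for step in range(200):
    loss, d_w1, d_w2 = loss_and_grads(w1, w2, x, y)
    losses.append(loss)
    # Gradient descent: nudge each weight against its gradient.
    w1 -= lr * d_w1
    w2 -= lr * d_w2

print(losses[0], losses[-1])
```

Training an LLM is the same loop in spirit, just with billions of weights, matrices instead of single numbers, and the loss measured on next-token predictions.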

And that's kinda it: you just make the transformers bigger, with more weights, bolt a lot of engineering onto them (like being able to run code, and making them run more efficiently), exploit thousands of poor workers to fine-tune them with human feedback, and repeat that every 6-12 months, forever, so they stay up to date.

fading_person@lemmy.zip 1 points 4 minutes ago

Thank you very much