Understanding how LLMs actually work, where text is split into tokens (sometimes as small as single characters) and the model predicts the token with the highest calculated probability of coming next, this output makes me think the training data heavily included social media or pop culture, specifically around "teen angst".
I wonder if in-context learning would help mask the "edgelord" training data sets.
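To sketch what I mean (a toy example, not how a real LLM is implemented; the lookup table, tokens, and probabilities are all made up): the model picks the highest-probability next token given the context, and prepending in-context instructions changes the conditional distribution, which can "mask" an undesirable continuation.

```python
# Toy illustration (not a real LLM): the "model" is a hypothetical lookup
# table mapping a context window to next-token probabilities.
toy_model = {
    # Base distribution: the "edgelord" continuation wins.
    ("life", "is"): {"pain": 0.45, "good": 0.35, "short": 0.20},
    # Same context, but with an in-context instruction prepended.
    # The extra tokens shift the conditional distribution.
    ("be", "cheerful.", "life", "is"): {"good": 0.60, "short": 0.25, "pain": 0.15},
}

def next_token(context):
    """Greedy decoding: pick the highest-probability next token."""
    probs = toy_model[tuple(context)]
    return max(probs, key=probs.get)

print(next_token(["life", "is"]))                     # -> pain
print(next_token(["be", "cheerful.", "life", "is"]))  # -> good
```

A real model computes these probabilities with a neural network over the whole context rather than a table, but the mechanism, conditioning the next-token distribution on what came before, is the same idea behind in-context prompting.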