@violet reading some papers about quantizing the parameters down to 8 bits without much accuracy loss, and it's pretty cool
@RumPartov what is your opinion on powerful language models that can generate plausible text extremely quickly?
@halcy well, my understanding is GPT-2 actually has quite a bit of long-scale context at its disposal, so lack of context isn't that big of a problem?
@SuricrasiaOnline better example: train a model to predict a (fair) coin toss result
say it learns it's heads 51% of the time and tails 49% of the time
if you sample using T=1, you'll get something like THHTTHTTHHHTHTH
if you sample using T=0 (ie argmax/greedy, always take the most likely token), you'll get HHHHHHHHHHHHHHHH
which looks more natural?
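here's a toy sketch of that coin example in python (the `sample_coin` helper is just for illustration, not from any paper): temperature rescales the logits before sampling, and T=0 is treated as greedy/argmax

```python
import math
import random

def sample_coin(p_heads=0.51, temperature=1.0):
    """Sample H/T from the 51/49 coin model at a given temperature.

    temperature=0 is treated as argmax (greedy decoding).
    """
    if temperature == 0:
        return "H" if p_heads >= 0.5 else "T"
    # divide the logits by the temperature, then renormalize via softmax
    logits = [math.log(p_heads), math.log(1 - p_heads)]
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    z = sum(exps)
    p_heads_scaled = exps[0] / z
    return "H" if random.random() < p_heads_scaled else "T"

print("".join(sample_coin(temperature=1.0) for _ in range(15)))  # roughly even mix of H and T
print("".join(sample_coin(temperature=0.0) for _ in range(15)))  # HHHHHHHHHHHHHHH
```

the T=1 output looks like a real sequence of fair coin tosses; the T=0 output is the "most likely" sample but looks nothing like actual coin flips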
however I think most of the difficulty in predicting the next word is when someone is communicating an unguessable statement. like "I couldn't find the ____" isn't guessable with high confidence. of course the next word isn't going to be "is", but the number of likely next words is incredibly large, and that's the source of the perplexity
@halcy yeah, 1 is true for certain words in a sentence. for example common phrases where the next word is highly likely (the paper gives "I ate the pizza while it was still ____", gpt-2 gives HOT or WARM, with HOT being about 80% likely.)
my idea is perhaps human text is less likely than repeated sentences because human text is communication. meaning, it is an encoding of actual, incompressible semantic data, so the most likely sample is a sample that contains no semantic information. would it then make sense to optimize for a sample that has a specific (non-maximal) likelihood?
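one toy way to sketch that idea with the 51/49 coin model: draw candidates and keep the one whose per-token negative log-likelihood is closest to a chosen target (e.g. the model's entropy, ~0.693 nats here). the helpers `avg_nll` and `sample_near_target` are hypothetical, just to illustrate the "aim for a specific likelihood" idea

```python
import math
import random

def avg_nll(seq, p_heads=0.51):
    """Average negative log-likelihood per token under the coin model."""
    return -sum(math.log(p_heads if t == "H" else 1 - p_heads) for t in seq) / len(seq)

def sample_near_target(target_nll, n_candidates=200, length=20, p_heads=0.51):
    """Draw candidates at T=1 and keep the one whose per-token NLL
    is closest to the requested (non-maximal-likelihood) target."""
    best, best_gap = None, float("inf")
    for _ in range(n_candidates):
        seq = ["H" if random.random() < p_heads else "T" for _ in range(length)]
        gap = abs(avg_nll(seq, p_heads) - target_nll)
        if gap < best_gap:
            best, best_gap = seq, gap
    return "".join(best)

# target the distribution's entropy: -0.51*log(0.51) - 0.49*log(0.49) ≈ 0.693 nats/token
print(sample_near_target(target_nll=0.693))
```

targeting the entropy picks out "typical" samples (a roughly even H/T mix) rather than the all-H argmax-style ones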
they talk about how text generated to be "maximally likely" just repeats itself forever, however human text does not do that. This is perplexing given that these models are designed to give high probability to human-written text. So why is the most likely sample from a language model actually the most unlikely thing for a person to write?
https://arxiv.org/pdf/1904.09751.pdf here's a cool paper about sampling text from a language model like GPT-2
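that paper proposes nucleus (top-p) sampling: keep the smallest set of tokens whose cumulative probability exceeds p, renormalize, and sample from that set. a minimal sketch over a toy next-word distribution (the example probabilities are made up, loosely echoing the pizza HOT/WARM case):

```python
import random

def nucleus_sample(probs, p=0.9):
    """Top-p / nucleus sampling over a dict of token -> probability.

    Keeps the smallest set of highest-probability tokens whose cumulative
    mass reaches p, then samples from that set (renormalized).
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, total = [], 0.0
    for tok, pr in ranked:
        nucleus.append((tok, pr))
        total += pr
        if total >= p:
            break
    # sample proportionally within the nucleus
    r = random.uniform(0, total)
    acc = 0.0
    for tok, pr in nucleus:
        acc += pr
        if r <= acc:
            return tok
    return nucleus[-1][0]

probs = {"hot": 0.80, "warm": 0.15, "cold": 0.03, "is": 0.02}
print(nucleus_sample(probs, p=0.9))
```

with p=0.9 only "hot" and "warm" survive (0.80 + 0.15 = 0.95 ≥ 0.9), so the long tail of unlikely words can never be sampled, but the output still varies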