would make it easier to play when you're not feeling particularly creative
that game would also benefit a lot from some kind of cycle-removal mechanism. like, if it detects a repeat of text from earlier, it jumps back to where the repeat started with those token probabilities blacked out, or something. maybe like this:
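(a minimal sketch of what I mean, in python. `next_token_probs(tokens)` is a made-up stand-in for whatever the real model exposes, the n-gram length is arbitrary, and it's untested)

```python
import numpy as np

NGRAM = 4  # how long a repeat has to be before we call it a cycle

def sample_without_cycles(next_token_probs, prompt, length, seed=0):
    rng = np.random.default_rng(seed)
    tokens = list(prompt)
    banned = {}  # position -> tokens blacked out after a backtrack
    while len(tokens) < len(prompt) + length:
        probs = next_token_probs(tokens).astype(float).copy()
        for t in banned.get(len(tokens), ()):
            probs[t] = 0.0  # black out tokens that previously led into a cycle
        probs /= probs.sum()
        tokens.append(int(rng.choice(len(probs), p=probs)))
        # cycle check: does the last NGRAM-token window appear earlier?
        if len(tokens) >= 2 * NGRAM:
            tail = tuple(tokens[-NGRAM:])
            windows = (tuple(tokens[j:j + NGRAM])
                       for j in range(len(tokens) - NGRAM))
            if tail in windows:
                # jump back to where the repeat started and ban the token
                # that kicked it off (no guard against cutting into the
                # prompt or banning every token -- it's a sketch)
                pos = len(tokens) - NGRAM
                banned.setdefault(pos, set()).add(tokens[pos])
                tokens = tokens[:pos]
    return tokens
```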
thinking again about how, when we communicate with others, we're actually transmitting information, and language is just a coding scheme. so trying to minimize perplexity will eventually run into a wall: a neural network can't ever predict everything everyone will ever say, because what people say genuinely carries information
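(to put numbers on the wall: a model Q's perplexity is the exponential of its cross-entropy against the true distribution P of what people say, and the KL term is all it can ever shrink; H(P) itself is the information being transmitted. my own notation, not from any particular paper)

```latex
\mathrm{PPL}(Q) = e^{H(P,Q)}, \qquad
H(P,Q) = -\sum_x P(x)\log Q(x) = H(P) + D_{\mathrm{KL}}(P \,\|\, Q) \ge H(P)
```

so perplexity is bounded below by e^{H(P)}, which is greater than 1 as long as people are actually saying anything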
with this in mind I think it's clear why beam search, which maximizes the likelihood of the generated sentence (and therefore produces a sentence with minimal perplexity/entropy), yields text completely devoid of meaning. it's basically asking the computer to produce a sentence that contains no information
to produce good results, I think a better algorithm would generate a sentence that hits a target perplexity instead of a minimal one. the nucleus sampling described in the paper would probably still fail at this if the language model gets trapped in a cycle of assigning high probabilities to the same sequence of tokens, especially if every token in the sequence has a probability greater than the nucleus parameter p (then the nucleus collapses to a single token at each step, sampling turns deterministic, and the cycle repeats forever). here's a toy version of the target-perplexity idea:
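(one possible reading of "hit a target perplexity": greedily pick, at each step, the token whose surprisal keeps the running average log-loss closest to log(target). `next_token_probs` is the same hypothetical model hook as above; a sketch of the idea, not a claim about how such a sampler should really work)

```python
import numpy as np

def sample_to_target(next_token_probs, prompt, length, target_perplexity):
    target = np.log(target_perplexity)  # target average surprisal, in nats
    tokens = list(prompt)
    total_surprisal = 0.0
    for step in range(1, length + 1):
        probs = next_token_probs(tokens)
        surprisals = -np.log(probs + 1e-12)
        # pick the token that keeps (total + s) / step closest to the target
        tok = int(np.argmin(np.abs((total_surprisal + surprisals) / step - target)))
        total_surprisal += surprisals[tok]
        tokens.append(tok)
    return tokens
```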
I should try and fix my tensorflow installation and actually try this with one of the gpt2 models
@SuricrasiaOnline I still want to try and train a model to do my job. Train it on bland marketing copy, of which there is plenty