someone should make an AI dungeon game but instead of letting you fill in the prompt it just gives you three options of possible continuations

would make it easier to play when you're not feeling particularly creative

that game would also benefit a lot from some kind of cycle removal. like, if it detects a repeat of text from earlier, it jumps back to where the repeat started and resamples with those tokens' probabilities masked out, or something
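
a rough sketch of the masking half of that idea, without the jumping back: before sampling the next token, ban any token that would complete an n-gram already present in the text, then renormalize what's left. the function names, the `n=3` default, and the dict-of-probabilities format are all made up for illustration, not taken from any real library

```python
def banned_next_tokens(tokens, n=3):
    """Return the tokens that would complete an n-gram already seen in `tokens`."""
    if len(tokens) < n:
        return set()
    # The last n-1 tokens are the prefix of the n-gram we are about to finish.
    prefix = tuple(tokens[-(n - 1):]) if n > 1 else ()
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


def mask_probs(probs, banned):
    """Drop banned tokens and renormalize; `probs` maps token -> probability."""
    kept = {t: p for t, p in probs.items() if t not in banned}
    total = sum(kept.values())
    if total == 0:
        # Every candidate was banned; fall back to the unmasked distribution.
        return dict(probs)
    return {t: p / total for t, p in kept.items()}


history = ["you", "open", "the", "door", "you", "open", "the"]
next_probs = {"door": 0.8, "window": 0.15, "chest": 0.05}  # hypothetical model output
masked = mask_probs(next_probs, banned_next_tokens(history))
```

here "door" gets masked because "open the door" already happened, so the model is pushed toward "window" or "chest". a full version would also need to rewind to where the cycle started, which this doesn't attempt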

I understand that cycles are a general problem with language models. this is a pretty good paper about the problem: arxiv.org/pdf/1904.09751.pdf

thinking again about how when we communicate with others, we actually are transmitting information and language is just a coding scheme. so trying to minimize perplexity will eventually run into a wall because a neural network couldn't ever predict everything everyone will ever say

with this in mind I think it's clear why beam search, which approximately maximizes the likelihood of the generated sentence (and therefore produces a sentence with minimal perplexity/entropy), tends to produce text that's devoid of meaning. it's basically asking the computer to produce a sentence that carries as little information as possible
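
to make the link between likelihood, perplexity and information concrete, here's a tiny sketch. perplexity is the exponential of the average negative log-probability per token, and the information in a sentence (in bits) is the sum of -log2 of each token's probability. the two probability lists are invented numbers, not output from any model

```python
import math


def perplexity(token_probs):
    """exp of the mean negative log-probability of the tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


def information_bits(token_probs):
    """Total surprisal of the sequence in bits."""
    return -sum(math.log2(p) for p in token_probs)


# Hypothetical per-token probabilities a model assigned to two sentences.
predictable = [0.9, 0.95, 0.9, 0.92]  # the kind of sentence likelihood maximization favors
surprising = [0.2, 0.05, 0.3, 0.1]    # a sentence that tells you something new

for name, probs in [("predictable", predictable), ("surprising", surprising)]:
    print(name, round(perplexity(probs), 2), round(information_bits(probs), 2))
```

the sentence the model finds most likely is by definition the one with the fewest bits in it, which is the whole problem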

to produce good results I think a better algorithm would be to generate a sentence that hits a target perplexity, instead of a minimal one. the nucleus sampling described in the paper would probably fail to do this if the language model is trapped in a cycle of giving high probabilities to the same sequence of tokens, especially if every token in the sequence has a probability greater than the nucleus parameter p, because then the nucleus contains only that one token and sampling becomes deterministic
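
a minimal sketch of the nucleus (top-p) filter to show that failure case. it keeps the smallest set of top tokens whose cumulative probability reaches p and renormalizes. the function name and the example distributions are made up, and this is a plain-Python illustration rather than the paper's code

```python
def nucleus(probs, p=0.9):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        total += prob
        if total >= p:
            break
    # Renormalize so the kept probabilities sum to 1.
    return {t: q / total for t, q in kept}


# Hypothetical next-token distributions.
stuck = {"door": 0.95, "window": 0.03, "chest": 0.02}  # model is locked into a cycle
open_ended = {"door": 0.5, "window": 0.3, "chest": 0.2}

stuck_nucleus = nucleus(stuck, p=0.9)      # only "door" survives
open_nucleus = nucleus(open_ended, p=0.7)  # "door" and "window" survive
```

when the top token alone already exceeds p, the nucleus is that single token, so sampling can't break out of the cycle no matter how many times you retry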

I should try and fix my tensorflow installation and actually try this with one of the gpt2 models

@SuricrasiaOnline I still want to try and train a model to do my job. Train it on bland marketing copy, of which there is plenty
