is a user on cybre.space. You can follow them or interact with them if you have an account anywhere in the fediverse.
@iliana
uhhhh hmm
let’s say I want to store the ASCII integer representations of values 0..15 inclusive in memory
if I allow the memory to overlap I can represent it as
151413121109876
thus 15 and 5 overlap, 11 and 10 overlap
this *might* be the shortest possible representation
is there already a known algorithm for this? is this problem called something?
@iliana your example looks ambiguous, you won't be able to reliably decode that
@iliana Huh. This is really clever, and has all sorts of implications on things like gzip dictionaries
@iliana Lemme clarify since there are already doubters: gzip lets you say "a symbol with length x comes from location y in the dictionary buffer", so the ambiguous locations aren't a problem. The more bytes your symbols are, and the more times they're repeated, the more worthwhile this packing is (just like compression in general).
Never heard it specifically named, but if I was backed into a corner I'd say order-independant entropy coding.
@iliana this feels very close to group theory, but I'm not a maths
@iliana if you do it bit by bit, you could overlap leading and trailing zeroes, though processors are not good at arbitrary bit offsets.
@iliana
are you requiring things to be 8bit aligned? If not you might be able to shorten it. (though a 15-short lookup table is still not very big and it's probably faster to use)
@iliana reminds me of a prefix tree (or suffix tree in this case) with a flat representation (I forget what they call that kind of representation)
@iliana but I'm tired so maybe that's nonsense
@iliana Yes! This is known as the shortest superstring problem. Lots more info here https://www.geeksforgeeks.org/shortest-superstring-problem/
tl;dr it's NP-hard
@bilbono oh interesting! I wonder if it's still NP-hard if you assume the input is a range of integers (i.e. 0..15 per the example).
@iliana Hmm, the more I think about it the more I feel like it shouldn't be NP-hard in the case of a range of integers. There's too much structure/regularity to exploit. Nothing more than a hunch though.
@iliana There's a similar thing called De Bruijn sequences, though that's for listing all n-length strings of a size-k alphabet. This example is too irregular to reliably reproduce.
I’ve decided I’m going to brute force it since I specifically need 0..255 inclusive lol
@iliana
so like
i gotta know
why?
@iliana Sounds like an interesting problem to try to solve in Forth, especially since I haven't done any coding lately. I'm not sure if this has a name for it, but I'm reminded of a trick the programmers who worked on a pinball game called Banzai Run did: they ran out of sound board memory, and needed an "and" so they made it the "an" out of "welcome race fans" and called it a day.
@iliana De Bruijn sequences have been suggested, but I'm wondering if you can do it by constructing a De Bruijn graph[1] and deleting some vertices. You can safely ignore all numbers below 10^n if your count goes up to 2*10^n as you indicated (for 255), so this should be simple enough.
@iliana just a guess but it's probably come up in compilers/languages that do string interning?