Pinned ping
Pinned ping

current thoughts: based on the user time, the mod-after-summing method seems almost on par with the hardware implementation. this implies it might be faster if implemented in hardware.

Show thread

these results are a little weird tbh, I don't know why the software mod-while-summing part has such less system time(I did multiple tests and got the same results). There's no difference besides the modulo operation. Need to do more testing and time it differently.

Show thread

after initial testing, it appears that(with precomputation) this is indeed about 10% faster than the modulo instruction on my laptop with unsigned 64-bit numbers, modulo'ing by a small prime(541).

test was performed by reading 30000000 64 bit integers from /dev/urandom with dd, modding them, and then piping it to /dev/null.

Arch Linux 5.4.13.a-1-hardened #1 SMP PREEMPT

CPU: Intel(R) Core(TM) i3 CPU M 370 @ 2.40GHz
Voltage: 1.2 V
External Clock: 133 MHz
Max Speed: 3200 MHz

I can provide source for the test if anyone's interested.

hardware modulo:
30000000+0 records in
30000000+0 records out
240000000 bytes (240 MB, 229 MiB) copied, 109.561 s, 2.2 MB/s

real 1m49.610s
user 0m51.031s
sys 2m17.145s

software modulo, mod-while-summing:
240000000 bytes (240 MB, 229 MiB) copied, 75.6668 s, 3.2 MB/s

real 1m15.696s
user 1m6.183s
sys 1m18.104s

software modulo, mod-after-summing:
240000000 bytes (240 MB, 229 MiB) copied, 101.806 s, 2.4 MB/s

real 1m41.840s
user 0m52.104s
sys 2m5.672

Show thread
zardoz relayed

you can also modulo-reduce the final sum as you go along, in just the same way as you modulo-reduce 2^i as you go along, for further speedup

Show thread

(most modulo implementations use repeated subtraction which is O(n) for reference. This will be *much* faster than that when n >> k)

Show thread
zardoz relayed

forgot the name of an acquaintance, opened a duckduckgo tab, realised what i was doing, and sat in silence contemplating my existence for a few minutes

zardoz relayed

duckduckgo what's my friends name

quora can you tell me what the people i know are called

Show thread

i thought of a pretty cool way to speed up the modulo operation

it's based on the "add up the digits" trick, but base 2 so it's faster on computers. To compute n mod k, you first precompute 2^i mod k = l_i for each i = 0..log_2(n)+1, and then you sum up n_i * l_i and modulo the result by k.

expected speedup: the precomputation takes O(log_2(n)+1), since we just have to repeatedly double a number, check if it's greater than k, and if so subtract k from it (since we know that the original number was less than k). Most of these steps are contained in the modulo instruction but I thought it was just important to note that the modulo part will be constant-time in this case. Adding up the digits is then O(log_2(n)+1) and the final modulo will also be O(log_2(n)+1) because it's mod k of at most log_2(n)+1 numbers which are all less than k.

So the whole algorithm runs in O(log_2(n)).

usefulness: low unless n is very very large and you can frequently re-use the precomputed values. Maybe a very large hash map?

zardoz relayed

Staggering like a drunk from one work catastrophe to the next, knocking over chairs and shitting in the elevator

I leap from my bed. suddenly I have one burning question how do they implement modulo on most processors

zardoz relayed
zardoz relayed
zardoz relayed
zardoz relayed

humans: i need to work weekends or my life will collapse
bears: time to nap for five to seven months

Show thread
zardoz relayed

bears cannot be trademarked, they refuse to participate in capitalism

Show thread
zardoz relayed

subtoot 

zardoz relayed

SAURON: I shall create three rings for the elves, seven for the dwarf lords and nine for mortal men

HOBBITS: wow ok none for us cool

SAURON: and thus I shall have dominion over all the civilized races

HOBBITS: WOW

Show more
Cybrespace

Cybrespace is an instance of Mastodon, a social network based on open web protocols and free, open-source software. It is decentralized like e-mail.