it appears that the SEO optimum is "repeat common search terms but in natural language", so now every google result for things like "fix coil whine" or "toilet not filling" is a FAQ-style post from a domain you've never heard of, like:

How to fix coil whine?
...
What causes coil whine?
...
Can coil whine cause damage?
...
etc

the search situation is dire. sure, "toilet not filling" doesn't tend to attract misinformation, but if any schmuck can register a 4 dollar domain, get gpt2 to blast out some misinfo that the google algo loves, and make bank on advertising (also sold by google), then how am I supposed to trust google when I ask it about things that tend to have common misconceptions or falsehoods, like pest remedies or medical symptoms?

here's a hint. if you're on a page you found through google, ctrl+f for the phrase "Amazon Services LLC Associates Program"
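The ctrl+f check above can be scripted in a couple of lines. This is only a sketch: it assumes you already have the page's visible text as a string, and the function names are made up for illustration.

```python
# The disclosure phrase quoted in the post above.
PHRASE = "Amazon Services LLC Associates Program"


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase, so line wraps don't hide the phrase."""
    return " ".join(text.split()).lower()


def has_affiliate_disclosure(page_text: str) -> bool:
    """Return True if the page text contains the affiliate disclosure phrase."""
    return normalize(PHRASE) in normalize(page_text)
```

A hit doesn't prove a page is junk, it only tells you the site earns money from affiliate links.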

I wish archive.org got a full-book search engine like hathitrust

if we assume from now on that google can't source good info, it may be time to move to a "have a collection of reputable sites and use their internal search, or use site:domain" approach

before you suggest duckduckgo, I don't think their search quality is that much better. I think this is an issue affecting all search engines

maybe the right path is to use a search engine that lets you set a domain whitelist

a search engine that only searches wikipedia, hathitrust, archive.org, github.com, and maybe stack exchange. that would be hype. any other domains?
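Until something like that exists, a rough approximation is to bolt a group of `site:` filters onto an ordinary query. The sketch below assumes the engine accepts `site:` terms joined with `OR`, which is common but not guaranteed everywhere. The domain list is taken from the post above, with `stackexchange.com` standing in for "stack exchange", and the function names are invented for this example.

```python
from urllib.parse import urlencode

# Domains from the post above; adjust to taste.
ALLOWED_DOMAINS = [
    "wikipedia.org",
    "hathitrust.org",
    "archive.org",
    "github.com",
    "stackexchange.com",
]


def allowlist_query(terms: str, domains: list[str]) -> str:
    """Append an OR-joined group of site: filters to the search terms."""
    if not domains:
        return terms
    site_filter = " OR ".join(f"site:{d}" for d in domains)
    return f"{terms} ({site_filter})"


def search_url(terms: str, domains: list[str],
               base: str = "https://www.google.com/search") -> str:
    """Build a search URL whose q parameter carries the filtered query."""
    return f"{base}?{urlencode({'q': allowlist_query(terms, domains)})}"
```

For example, `allowlist_query("coil whine", ["wikipedia.org", "archive.org"])` produces `coil whine (site:wikipedia.org OR site:archive.org)`. Engines may cap query length, so a long allow-list might need to be split across several searches.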

this is quite bad for net neutrality, unfortunately

@SuricrasiaOnline remembering the time when google let you remove certain domains from results for your account. was amazing. it's gone now, of course

@iliana @SuricrasiaOnline Google Custom Search lets you do this. So does an adblocker.

Kagi and Neeva let you boost/demote/ban domains, but since the metasearch engines do not get detailed ranking info through search APIs this only happens on a per-page basis (i.e. a result can't move from page 2 to page 1 very easily). Both do have their own indexes too so that helps build custom ranking.
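To make the "per-page basis" limitation concrete, here is a sketch of boost/demote/ban applied to a single page of results that has already been fetched. It is not how Kagi or Neeva implement it. The rule table, the example domains, and the function names are all hypothetical.

```python
from urllib.parse import urlparse

BAN = "ban"

# Hypothetical per-user rules: positive boosts, negative demotes, BAN removes.
RULES = {
    "wikipedia.org": 2,
    "listicles.example": -2,
    "seo-spam.example": BAN,
}


def domain_weight(url: str, rules: dict) -> object:
    """Return the rule for the URL's domain or any parent domain, else 0."""
    host = (urlparse(url).hostname or "").lower()
    for domain, weight in rules.items():
        if host == domain or host.endswith("." + domain):
            return weight
    return 0


def rerank_page(urls: list[str], rules: dict) -> list[str]:
    """Reorder one result page; ties keep the engine's original order."""
    scored = []
    for position, url in enumerate(urls):
        weight = domain_weight(url, rules)
        if weight == BAN:
            continue  # banned domains are dropped from the page
        scored.append((-weight, position, url))
    scored.sort()
    return [url for _, _, url in scored]
```

Because the function only ever sees one page, a boosted result sitting on page 2 can never climb onto page 1, which is the limitation described above.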

@Mojeek at one point planned on doing this the "right" way by actually using custom ranking algos (instead of modifying SERPs after they are generated with simple promote/demote rules) but that's Really Hard To Do so they shelved that for now.

@SuricrasiaOnline Yeah, for whatever reason search has gotten disastrous in the last five years and ... well, I have my suspicions why, but

@SuricrasiaOnline I feel like it doesn’t have to be, if it lets the user choose sources they want to use…?
Maybe it could allow rating quality of results for sources to help highlight good/new ones? Idk

@SuricrasiaOnline i know kagi.com is trying to be normal but i haven't used it enough to know

@catalina I like how I clicked "random websites" and got the main page for a tilde server populated by people I know

@catalina @SuricrasiaOnline ha, I searched "how to fix a toilet" and got an art project I'd never heard of by a friend of mine about how people use search engines

@SuricrasiaOnline archwiki, tvtropes, etymonline, dict.org, something with all the man pages, memory alpha, emojipedia

@brennen @SuricrasiaOnline all pages created before <date when SEO sites became a thing>

@SuricrasiaOnline here u go: easrng.github.io/no-shit-sherl
it's a google custom search engine but i added some js before loading it that makes it so the ad script doesn't load even without adblock
it searches wikipedia, wiktionary, wikimedia commons, wikidata, all the SE sites, github, gitlab, sourcehut, codeberg, archive.org, hathitrust, project gutenberg, unsplash, and pexels for now

@SuricrasiaOnline ok i added a few more sites (MDN, css-tricks, old.reddit, emojipedia, dict.org, etymonline, arch wiki, debian.org)

@easrng @SuricrasiaOnline probably better to use a custom centrality algo (like custom pagerank) centered around a few certain sites you deem to be the “Ideal Archetype” of the content you want and then use heuristics to adjust the weight of a site. Heuristics could include:

How much Readability strips
How much uBlock Origin filters (Teclis.com does this)
Manual actions
JavaScript and tracker weight (Marginalia does this)
Structured data (FairSearch, Google, Bing, Yandex, and maybe Petal do this)
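One way to read "custom pagerank centered around a few sites" is personalized PageRank, where the random jump always lands on the trusted seed sites instead of anywhere on the web. The sketch below is a minimal pure-Python version under that assumption. The link graph, site names, and the `penalties` multipliers standing in for the heuristics listed above are hypothetical.

```python
def personalized_pagerank(links, seeds, damping=0.85, iterations=50):
    """Power iteration where teleportation only lands on the seed sites.

    links maps each site to the list of sites it links to.
    seeds is a non-empty collection of trusted sites.
    """
    seeds = set(seeds)
    nodes = set(links) | {t for targets in links.values() for t in targets} | seeds
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        new = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            targets = links.get(n, [])
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling site: hand its score back to the seeds.
                for s in seeds:
                    new[s] += damping * rank[n] * teleport[s]
        rank = new
    return rank


def apply_heuristics(rank, penalties):
    """Scale each site's score by a heuristic multiplier (1.0 if none given)."""
    return {site: score * penalties.get(site, 1.0) for site, score in rank.items()}
```

With seeds as the only source of score, a cluster of sites that nothing trusted links to ends up at zero no matter how heavily its members link to each other. A multiplier below 1.0 in `penalties` could then stand for, say, a page where uBlock Origin filters a lot.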

@SuricrasiaOnline DDG just uses Bing for search data, but the ! searches make it actually useful despite that.

@SuricrasiaOnline @mdhughes ddg calls it "bang syntax" and it just lets you query another search engine, often site-specific, through ddg. all it does is redirect you.
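The "all it does is redirect you" part is simple enough to sketch. The bang table below is a tiny illustrative stand-in, not DuckDuckGo's actual list, and `resolve` is a made-up name.

```python
from urllib.parse import quote_plus

# Illustrative shortcuts only; the real service has its own, much larger table.
BANGS = {
    "w": "https://en.wikipedia.org/w/index.php?search={}",
    "g": "https://www.google.com/search?q={}",
    "gh": "https://github.com/search?q={}",
}
DEFAULT = "https://duckduckgo.com/?q={}"


def resolve(query: str) -> str:
    """Return the URL to redirect to: the bang's site if one is present, else the default."""
    words = query.split()
    for i, word in enumerate(words):
        key = word[1:].lower()
        if word.startswith("!") and key in BANGS:
            rest = " ".join(words[:i] + words[i + 1:])
            return BANGS[key].format(quote_plus(rest))
    return DEFAULT.format(quote_plus(query))
```

So `resolve("!w coil whine")` hands the query to Wikipedia's own search, and anything without a known bang falls through to the default engine.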

@SuricrasiaOnline @mdhughes it reminds me of the query shortcuts from krunner in kde3.5 except it's curated and in your browser instead of your desktop

@SuricrasiaOnline if anything, ddg is *worse* at seospam than google, i just can't stand giving google another bit of data so it's worth the slightly shittier results

@_ @SuricrasiaOnline the duck is pretty good at tech and other "clinical" questions for me at least, but more organic and open-ended questions really mess it up

@mdhughes @SuricrasiaOnline at this point "!g to get different results" is so ingrained in my muscle memory I sometimes do it in google

@SuricrasiaOnline I kinda wonder if I should blame javascript for search being Bad now or if I'm being crazy. Many websites these days don't load much of a document at first and just give some script tags to load the rest through a JSON API. Web crawlers can't parse that, and even worse I've seen some sites have stopped using useful tags for links and instead implement everything through onclick events. More and more information is on closed apps like Facebook or Twitter, so it's mainly the SEO hackers hosting their own sites.

@SuricrasiaOnline just realized the formatting on my instance got rid of the "<a> tag" part lol
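The `<a>` tag point can be shown with a toy link extractor. This sketch assumes a crawler that only reads the initial HTML and never runs scripts. Crawlers that render JavaScript would see more, so treat it as the worst case. The two sample pages and the `go(...)` handler are invented.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags, the way a script-free crawler would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html: str) -> list[str]:
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# A page with a real link, and one that builds everything from script.
STATIC_PAGE = '<p><a href="/fix-coil-whine">fix coil whine</a></p>'
SCRIPTED_PAGE = (
    '<div id="app"></div>'
    '<span onclick="go(\'/fix-coil-whine\')">fix coil whine</span>'
    '<script src="/bundle.js"></script>'
)
```

`extract_links(STATIC_PAGE)` finds the link, while `extract_links(SCRIPTED_PAGE)` comes back empty because the only "link" is hidden inside an onclick handler.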

@SuricrasiaOnline yeah i wasn't gonna because _i_ use ddg and i get those weirdo results

and now i'm probably going to reply to the next post and say "just use !bangs at that point"

@SuricrasiaOnline i don't think allow-list only for search is a good idea, but allowing users to block certain sites from showing up, or at least rate results as helpful / not helpful, would be pretty useful

@trwnh @SuricrasiaOnline you should try YaCy, it allows you to do both (but the search is still eh)

@SuricrasiaOnline isn't the duck literally just bing through some anonymizing?

@SuricrasiaOnline that's already become kind of a meme of people searching specific sites for good info

@vulp @SuricrasiaOnline it’s like that image ai that got better when you added ‘unreal engine’

@SuricrasiaOnline I think wikipedia works quite well as a base for this.

I think there is a need for a different kind of search engine. Google-style is fine for finding memes or info about something you already know at least a bit about, but ultimately it relies on unverified data from the internet.

@lanodan @SuricrasiaOnline Runnaroo actually built its index with Wikipedia as one of the seed sources.
Another popular option is Common Crawl, which Alexandria used.