CamouflagedKiwi 6 days ago

> of course a dictionary program will include code to talk to dictionary-providing web sites.

I wouldn't say that is just a given. If I've apt-get installed a dictionary, I might expect the whole thing to be on my machine. It's not like we haven't had dictionaries in physical books for centuries... It seems like stardict is very much an online thing, which I suppose could be legit, but the whole thing does seem like a trap.

kazinator 6 days ago

It's a generational thing. I would guess that someone who expects applications to phone home, even when they could otherwise run entirely locally, is likely someone pretty young who hasn't lived in a world of locally installed software that doesn't talk to anything.

If we search for the author's bio, that seems to check out. They are a well-credentialed CS person; obviously they know that dictionary programs such as translation pop-ups can have offline dictionaries, and they mention that. But they are a person of their time, with a corresponding set of "of courses".

Today, an application that is locally installed and works with offline data is like a statement of quaint chivalry, promulgated by a few remaining Don Quixotes of computing. (It saddens me to say, so much so that this analogy brings me insufficient amusement.)

yorwba 6 days ago

For many languages, there simply isn't a comprehensive dictionary file that could be redistributed legally as part of a free-software offline dictionary application. You either settle for a few thousand words put together by a handful of volunteers, or you redistribute a commercial dictionary illegally, or you have to connect to an online service to provide sufficient coverage legally.

zamadatix 5 days ago

I could buy the idea of the plugin system itself being desired (e.g. maybe I even want English definitions from Merriam-Webster or something because I like their style more than the open source database) but I think that's separate from what an app does by default. Especially on something like Debian, one should expect a FOSS-first approach whenever reasonable, and for >99% of users the reasonable default is a local dictionary.

piperswe 5 days ago

Wiktionary is massive, with 1.4M English entries [1] (3x the size of Merriam-Webster's Unabridged dictionary [2], though with a lower average quality), and it is CC BY-SA licensed [3].

[1]: https://en.wiktionary.org/wiki/Wiktionary:Statistics
[2]: https://www.merriam-webster.com/help/faq-how-many-english-wo...
[3]: https://en.wiktionary.org/wiki/Wiktionary:Copyrights

yorwba 5 days ago

Yes, the abundance of English data is why I felt the need to point out that this isn't the case for other languages. (Also, the raw count from Wiktionary can be misleading if you don't take into account that there are many low-effort entries for different forms of the same word.)

pxc 5 days ago

Dictionaries are small! It's insane to think that a dictionary requires network access. If it did, why would I install it locally??

> Today, an application that is locally installed and works with offline data is like a statement of quaint chivalry, promulgated by a few remaining Don Quixotes of computing.

But a dictionary package has no valid reason to be online.

ryandrake 5 days ago

Wouldn't someone's expectation instead depend on the nature of the application, and what data it needs? My expectation is that an application does not access the network unless it requires a resource only available from the network. I would totally expect a "Yelp" application to make network requests as part of its core functionality. Yelp is an online service, and in order to use it, you have to talk to the network, and you're generally requesting data that might often change, so you need fresh copies. Same for an Internet browser, or ftp or git (for remotes) or things like that. I would not expect a spell checker to need to access a network because it can all be done locally and the spelling of words doesn't change often enough to need a fresh dictionary from the network over and over. And I certainly would not expect the software to send data to the network. I would also not expect a calculator application to request math functions from the network or send my equations to a network service so that the network service could provide a result.

jcelerier 5 days ago

> It's a generational thing

... Is it? Dictionary apps have been working like this for more than twenty years. Babylon Pro, of which stardict is pretty much a clone, was already doing this with millions of users in 2000! Kindles work like that!

hdjrudni 6 days ago

Even if it's "legit", it shouldn't be using unencrypted HTTP.

sam_lowry_ 6 days ago

Why? Should it use the dict protocol, then?

mattmanser 6 days ago

Because without HTTPS it's trivial to MITM that clipboard content if they're always sending it via HTTP.

People in your coffee shop on the same WiFi could read it.

I get that some people don't realize that's how TCP/IP works, and the Firesheep stuff all happened 15 years ago. But it's a bit worrying to see a frequent HN contributor challenging that.

That's why we now push for HTTPS everywhere.

charcircuit 5 days ago

>People in your coffee shop on the same WiFi could read it.

WEP has been deprecated for over 2 decades.

kstrauser 5 days ago

That has no effect on the owner of a malicious access point. HTTP over WPA2 is plaintext again the moment the AP decrypts it.

ants_everywhere 5 days ago

you may be surprised at the number of unsecured WiFi networks there are.

I see them in 2025 in captive portals, public libraries, and when traveling abroad.

zamadatix 5 days ago

Not all guest Wi-Fi uses a PSK. In general, assuming all networks will already be encrypted along each hop to the server is a losing assumption for users.

__MatrixMan__ 5 days ago

HTTPS everywhere is a good start; it keeps the other plebs at the coffee shop out of your business. But it's still open to anyone with enough power to coerce a CA, which is the more concerning sort of adversary anyhow. So yes, HTTPS everywhere, but let's not stop there.

dannyw 5 days ago

Yes, but we have widely deployed efforts like certificate transparency, and cert pinning.

The first makes such attacks widely known events: browsers report by default, and it's provable. It's very rare.

The second allows apps to only trust specific certs or CAs, ignoring system root of trust.
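A minimal Python sketch of the second idea, pinning by certificate fingerprint with only the standard library. The function names and pin values are hypothetical, and this is a simplification: real apps often pin a hash of the public key (SPKI) rather than the whole leaf certificate. Without `ca_certs`, `ssl.get_server_certificate` performs no chain validation, so the pin is the only trust decision here, deliberately bypassing the system root of trust.

```python
import hashlib
import hmac
import ssl


def fingerprint_sha256(der_cert: bytes) -> str:
    """Hex SHA-256 of a DER-encoded certificate."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert: bytes, pinned_fingerprints) -> bool:
    """True if the certificate's fingerprint equals any pinned value."""
    fp = fingerprint_sha256(der_cert)
    # Constant-time comparison against each pinned (lowercase hex) value.
    return any(hmac.compare_digest(fp, pin) for pin in pinned_fingerprints)


def fetch_and_check(host: str, pins, port: int = 443) -> bool:
    # Hypothetical helper. No ca_certs is passed, so no chain validation
    # happens; only the pin decides whether the certificate is trusted.
    pem = ssl.get_server_certificate((host, port))
    der = ssl.PEM_cert_to_DER_cert(pem)
    return matches_pin(der, pins)
```

A certificate issued by a coerced CA for the same host would have a different fingerprint and fail `matches_pin`. As a standalone pre-check this is only illustrative; a real client would verify the pin on the same connection it then uses.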

I just want to clarify HTTPS in practice is quite secure.

__MatrixMan__ 5 days ago

I'll not let go of my distaste for roots of trust in any form, but you likely have a point. I'll have to learn more about this transparency thing.

rootnod3 6 days ago

How about HTTPS?

account42 6 days ago

That stood out to me as well. It's a sad world when people expect even simple functionality to be a live service.

pantalaimon 6 days ago

The venerable ding does well with a local dictionary - and it's packaged in Debian too

https://www-user.tu-chemnitz.de/~fri/ding/

mkesper 5 days ago

But only English-German, sadly

mayama 6 days ago

At some point I started running GUI apps without network access, first with firejail and then with bubblewrap. This was before Flatpak became a thing. I still use a collection of Bash scripts, built up over time, to run applications in a sandbox.

waterhouse 6 days ago

  ~> wc -cl /usr/share/dict/words
  235976 2493885 /usr/share/dict/words

One might even expect a program to use a common Unix preinstalled dictionary.

dkiebd 6 days ago

"words" is nothing but a list of words. It does not contain definitions for those words, which is what one expects from a dictionary.

delfinom 5 days ago

I wonder where one files a bug report about "words" misusing the "dict" directory name.

waterhouse 6 days ago

Hmm, you are correct.

yjftsjthsd-h 6 days ago

Dumb question... Could you use a per-word Bloom filter to do online spell checking without actually disclosing the words you're checking?

markasoftware 6 days ago

a Bloom filter lookup is by hash, and given the relatively small set of words in English, it would be pretty easy for the server to reverse the hash sent to it. Thus a Bloom filter wouldn't be very private.

Additionally, a typical spell checker feature is to provide alternative, correct, spellings, rather than just telling you whether a word is correctly spelled.

I bet there's some cool way to do this with zero-knowledge or homomorphic cryptography though!
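To make the reversal point concrete: because the candidate set is small, a server that receives a plain hash of a word can recover it by hashing every word it knows. A minimal Python sketch, assuming unsalted SHA-256 and a tiny hypothetical vocabulary standing in for a full word list such as /usr/share/dict/words:

```python
import hashlib


def sha256_hex(word: str) -> str:
    """Unsalted SHA-256 of a word, as a naive client might send it."""
    return hashlib.sha256(word.encode("utf-8")).hexdigest()


def build_reverse_table(vocabulary):
    """Server side: precompute hash -> word for every word it knows."""
    return {sha256_hex(w): w for w in vocabulary}


# Hypothetical vocabulary; an attacker would hash an entire word list.
VOCAB = ["apple", "banana", "cherry", "definitely", "separate"]
reverse_table = build_reverse_table(VOCAB)

# The supposedly private query the client sent:
observed = sha256_hex("separate")
print(reverse_table.get(observed))  # separate

# A string outside the vocabulary stays opaque.
print(reverse_table.get(sha256_hex("password5")))  # None
```

Hashing only hides strings the server has not thought to hash, which is why it helps for something like a password but not for ordinary dictionary words.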

notpushkin 6 days ago

There's also a much simpler way: send a hash prefix to the server and get a list of matches. Google Safe Browsing does this with URLs, for example.
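A minimal Python sketch of the hash-prefix idea: the server buckets full hashes by prefix, the client sends only the prefix, and the final match happens locally. The Pwned Passwords range API uses the first 5 hex characters of a SHA-1 hash; the class and function names here are hypothetical, and the 2-character prefix is an assumption chosen for a word list rather than a password corpus.

```python
import hashlib

# Assumption: 2 hex chars = 256 buckets. With ~236k words (the line count
# of /usr/share/dict/words above) that is roughly 900 words per bucket.
# A 5-char prefix gives ~1M buckets, so most words would sit alone.
PREFIX_LEN = 2


def sha1_hex(word: str) -> str:
    return hashlib.sha1(word.encode("utf-8")).hexdigest()


class PrefixServer:
    """Hypothetical server that only ever sees hash prefixes."""

    def __init__(self, words):
        self.buckets = {}
        for w in words:
            h = sha1_hex(w)
            self.buckets.setdefault(h[:PREFIX_LEN], []).append(h)

    def lookup(self, prefix: str):
        # Return every known full hash sharing the prefix.
        return list(self.buckets.get(prefix, []))


def client_knows(server: PrefixServer, word: str) -> bool:
    h = sha1_hex(word)
    candidates = server.lookup(h[:PREFIX_LEN])  # only the prefix is sent
    return h in candidates  # the final comparison is local


server = PrefixServer(["apple", "banana", "cherry"])
print(client_knows(server, "apple"))  # True
print(client_knows(server, "aple"))   # False
```

The privacy comes entirely from bucket size, which is harder to get with a vocabulary of a few hundred thousand words than with billions of passwords or URLs, and the scheme does nothing for suggesting corrections.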

chuckadams 5 days ago

haveibeenpwned.com does this for passwords too. I doubt you could make it work for all the smaller words though, let alone offer corrections.

shakna 6 days ago

You should be able to do a k-means type thing, where your query is an entire group and you grab the field from the chunk locally.

But you might still be able to use some frequency sampling to predict the words used, unless those chunks are very very carefully constructed.

Sesse__ 5 days ago

> a Bloom filter lookup is by hash, and given the relatively small set of words in English, it would be pretty easy for the server to reverse the hash sent to it. Thus a Bloom filter wouldn't be very private.

The typical use of a Bloom filter is to have it locally as a prefilter, not to send hashes to the server.
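A minimal Python sketch of that local-prefilter use: the whole filter ships with the program and lookups never leave the machine. A miss means the word is definitely not in the set; a hit means "probably", so only hits would need a slower check. The sizes are arbitrary assumptions, and deriving the k bit positions by salting SHA-256 is one simple choice among many.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by hashing the item with k different salts.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely absent. True: present, or a false positive.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


bf = BloomFilter(num_bits=8192, num_hashes=3)
for word in ["apple", "banana", "cherry"]:
    bf.add(word)

print(bf.might_contain("banana"))  # True (no false negatives)
```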

account42 6 days ago

> I bet there's some cool way to do this with zero-knowledge or homomorphic cryptography though!

The code for which would almost certainly be larger than a fully local dictionary for any human language.

bmacho 6 days ago

> a typical spell checker feature is to provide alternative, correct, spellings, rather than just telling you whether a word is correctly spelled.

I personally don't use that one, for me the red underline is enough.

yk 6 days ago

There are two scenarios, I believe: first, accidentally sending a (decent) password; and second, keeping the server from learning what you actually look up.

For the first case, sending a hash would prevent the server from learning a password that is not in the dictionary: something like password5 would hash to gibberish.

For the second, the server needs to know what to actually send back. I believe Google's malicious website check works (or used to) by truncating a hash and then just sending the answer for some 128 or so websites, and having the browser figure out which of them the user wanted to visit. That creates some deniability over which website you actually visited, and it should also be usable to prevent the server from learning what you actually looked up.

So yes, I think you could design a more secure protocol. Though, general security disclaimer: the people trying to read your letters probably spend more time attacking than I spent writing this post.

CGamesPlay 6 days ago

Just want to mention that the feature in question here is for translation, not spell checking.

phkahler 5 days ago

>> of course a dictionary program will include code to talk to dictionary-providing web sites.

Maybe to download a dictionary, but not to provide the same services that the dictionary program provides locally.

wat10000 6 days ago

This sort of crap makes me sure I’ll be employable forever.

I may not be on top of the latest trends, but at least I understand how computers work and what they can actually do.