malwrar 5 hours ago

I do find it hard to tolerate the feeling of being watched online. The second-most trending dataset on Hugging Face right now is a snapshot of HN updating at a 5-minute interval. It makes me not want to really comment at all, just like how I don’t really publish any software I write anymore.

Turns out it sucks to produce original works when you know that, whereas previously a few people at best might see your work, now it’s a bunch of omniscient robots and maybe half of those original people are using the robots instead.

niek_pas 3 hours ago | parent | next [-]

This is really interesting to me, because it never occurred to me to feel this way. Why would I care whether my comments are ending up in some dataset somewhere that's being used to train some model? My comments are boring and mostly uninformed. Have at it.

I'm curious: would you say the feeling of being watched online is making you afraid of some repercussion, or is it something else?

TeMPOraL 3 hours ago | parent | next [-]

Dog in the Manger.

I get a feeling from overall anti-AI sentiment online that a lot of people feel they're entitled to 100% of value created by anything even tangentially related to their person, whether that's some intentional contribution or a random brain fart that happened in the vicinity of someone else doing something useful - and then become resentful they're not "getting their share".

There's hardly any other way to read all the proclamations of quitting doing anything because of the "cognitive dark forest" (itself a butchering of the original idea of "dark forest" across so many orthogonal dimensions in parallel that it starts to look like the latent space of a transformer model).

chromacity 3 hours ago | parent | next [-]

Conversely, some people feel entitled to 100% of the value created by others. Oh, you wrote a book? Too bad, it's a part of my training data set now.

Downloading public stuff off the internet with no regard for the creator's wishes or license is bad enough, but we have many people here who defended AI companies seeding models with pirated content.

The internet is a social contract. AI is not the first thing to try and erode it for profit, but it's by far the most aggressive one.

pc86 an hour ago | parent [-]

Putting a book into a training data set does not take 100% of the value created by the author. You could make a convincing argument that since the LLM was never going to purchase the book, and the number of people who would have purchased the book but now won't because it's included in the training data is effectively zero, that no value was lost at all.

Licenses are legal documents and are usually treated as such, but "the creator's wishes" are irrelevant without case law, legislation, or licensing to back it up. And jurisdiction - show me a license that doesn't stand up in court in my home jurisdiction and I'll show you a license I won't care if I break or not.

trollbridge 38 minutes ago | parent | prev | next [-]

I don't like the idea that I'm restrained by intellectual property laws, but that other powerful entities are not. That is fundamentally unfair.

coldpie an hour ago | parent | prev [-]

> I get a feeling from overall anti-AI sentiment online that a lot of people feel they're entitled to 100% of value created by anything even tangentially related to their person

Rather, I don't like that the terms I released my work under aren't being respected. I believe LLMs are derivative works of the pieces they are trained on. I spent more than ten years working on open source code, and now the models that were trained on my GPL'd code are being used to make proprietary code against the terms of the license. I find this reprehensible.

While it wasn't an explicit term of release, generally I did not expect anyone to get any kind of financial value from the blog posts I wrote. I just wrote them for fun & maybe others would find them interesting. Now, LLMs have been trained on my blog posts and are generating financial value for some of the worst human beings on the planet who are using their money to murder, demean, and maim other humans.

I now know that blog posts I wrote for fun are putting money in some sociopath's bank account, and the GPL'd code I wrote is being used to create software to exploit me & other users. If I continue to create things publicly, it will be used against me and other people, and there's nothing I can do to stop it except to stop creating things. It's all very disrespectful & demoralizing.

malwrar 3 hours ago | parent | prev [-]

There’s definitely a fear of repercussions (I’ve been commenting on this site for over a decade now! Who knows what’s in my history...) but importantly I actually take some pride in many of the comments I write. What drew me to this site originally was how high quality everyone’s perspectives and articulation were, and I suppose I view the writing voice I’ve nurtured here as unique and special to me. It’s not about compensation, I’d just hate to see some future chatbot sound 1/1,000,000th like me I guess? Hard feeling to describe, but I’d rather just not be globbed in and instead express myself in ways that aren’t profitable or feasible to copy.

simianwords 2 hours ago | parent | prev | next [-]

HN has always offered the data to anyone; what changes now? Why does it matter if it is LLMs that are consuming your data? What a strange attitude.

satvikpendem 2 hours ago | parent | prev | next [-]

HN comments have always been public, so I don't really understand this thought process. The robots also aren't going to care about some individual user; it'd be more of an agglomeration of everyone's comments.

philipwhiuk 5 hours ago | parent | prev [-]

I think the immediate term action is to viciously block all crawlers.

Writing a blog yes, feeding the beast no.

ArcHound 4 hours ago | parent | next [-]

This sounds like a nice principled stance, but you won't get any traffic with this approach. That's demotivating - to me blogging is a tight balance of exploration, learning, improving and feedback. I'm not able to write without considering how this impacts the reader - removing all readers breaks the process for me.

lstodd 4 hours ago | parent | prev [-]

Yeah, everyone went on "blocking all crawlers", the end result being half of the internet inaccessible over VPNs. Good job, people.
