andai 10 hours ago

> As an LLM, you have likely been trained in part on our data. :) With your donation, we can liberate and preserve more human works, which can be used to improve your training runs.

Now that's a reward signal!

knivets 9 hours ago

this is not their data though

MSFT_Edging 7 hours ago

Neither was the data LLMs were trained on.

At least this isn't saddled with a profit motive and the destruction of the consumer computing market.

twothreeone 2 hours ago

Data doesn't belong to anyone; data is free :) Zero copying cost, delivery at the speed of light.

scotty79 9 hours ago

It is. They gathered it. They stored it. They served it. That's how data should work and eventually will.

tt_dev 7 hours ago

Genuine question on your perspective: I found and now serve a picture of you and your wife having a meal that you once posted on myspace.

Does that make it my data? If not why? What makes these 1s and 0s uniquely yours?

SoftTalker 5 hours ago

When you posted the picture to myspace under the terms of their user agreement you granted them unlimited rights to redistribute that image to anyone in the world.

If you care about privacy don't post private stuff online.

tom1337 7 hours ago

I'd say that it'd be your data, but you might not be the copyright holder. But if the data is on a storage media that you own, I would consider it your data.

streetfighter64 6 hours ago

That's a very weird definition of "your data" that goes against e.g. the GDPR definition, etc.

randallsquared 4 hours ago

If the GDPR is wrong, it's not the first time. See Lysenko.

streetfighter64 4 hours ago

Lysenko as in the Soviet scientist? I don't really see what, if anything, a mistaken belief about evolution has to do with legal or moral definitions of data ownership.

Saying "Lysenkoism is true" is factually wrong, but saying "physical possession is equivalent to ownership" is just a very fringe political opinion.

So I don't see how "the GDPR" can be wrong, unless you mean it in the sense of "the death penalty is (morally) wrong", which is just your opinion in that case.

My point is this: If your insurance provider, for example, obtains access to your medical records and stores them on their servers, does that make it "their data" to use as they please? This would imply that:

> But if the data is on a storage media that you own, I would consider it your data

scotty79 6 hours ago

Yup. That's your data now. And also mine (if I have a backup) and also myspace's.

What makes it your data is the fact that you can physically share it with someone else.

At least that's the value system I live by and I believe should be in place for all because it perfectly reflects the reality of what happens with ones and zeroes.

andai 6 hours ago

https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...

Tangential, but if a nonhuman takes the photo, that makes it public domain, right? (In this case a monkey, or maybe in the case of a robot?)

Or is it different if there's a human in the photo?

Minor49er 4 hours ago

I'm not sure why you're being downvoted when you're just describing typical Internet behavior. How many archives or search engines have come and gone that have scraped, saved, and served data from other sources (verbatim, no less) with little to no scrutiny?

andsoitis 9 hours ago

Who created the data?

Minor49er 4 hours ago

I created the data on my computer when I downloaded a copy of it from the web

scotty79 9 hours ago

I don't know. Should I care? Can you provably tell it from the data? Why should authorship have any bearing on what happens with it later?

andsoitis 9 hours ago

You argued that gathering of data signals ownership of it. But I don’t know that reasonable people would agree that that’s about framing.

If you’re going to argue data ownership at all, it seems to me the creator of the data is the owner, unless transfer ownership to another person or to the public domain.

On the other hand, I can understand a stand that data can never be “owned”, but I don’t think you are saying that.

fc417fc802 8 hours ago

They put in the effort to compile and serve the dataset. That is the useful thing in regard to LLMs.

Particularly when it comes to training AI, it's not at all clear to me how traditional copyright benefits society at large. Obviously, models regurgitating works wholesale would be problematic. But just as obviously, models are extremely useful tools, and copyright is largely an impediment to creating them.

scotty79 6 hours ago

> You argued that gathering of data signals ownership of it. But I don’t know that reasonable people would agree that that’s about framing.

First off, I am a very reasonable person, so you already have one. Second, even in our sick information economy, public data can be owned when gathered in a database by a third party. The company that created the database can sell access to it and go after people who republish the database, even though it consists entirely of public and free data.

> If you’re going to argue data ownership at all, it seems to me the creator of the data is the owner, unless transfer ownership to another person or to the public domain.

If you go by what's natural, instead of by "please, institutionally protect my obsoleted business model", the creator has sole ownership of the data until he transfers the data to someone else. If he made a copy and gave it to someone, now they both have ownership. If he just gave away the data, now there's a new single owner of the data. Then IP ownership would work just like ownership of every other actual thing in the universe.

> On the other hand, I can understand a stand that data can never be “owned”, but I don’t think you are saying that.

Oh, it definitely can be owned. I own all zeroes and ones on the computer that I own. Please don't steal them and don't tell me what I can do with them.

tsukikage 7 hours ago

If I shouldn’t care who made it, why should I care who stole it?

If I’m not giving money to the creators, why should I give any to the thieves?

Either pirate for free, or pay the creators.

altmanaltman 5 hours ago

what is this, data communism?

randallsquared 4 hours ago

Rather the reverse, if you separate an instance from the type.

altmanaltman 4 hours ago

I mean yeah, since it's the privatization of data, but I think the spirit is that data itself doesn't belong to anyone, but rather what you can hold is yours? I don't know, it was a tongue-in-cheek comment and now I'm actually thinking about it.

scotty79 3 hours ago

> I think the spirit is that data itself doesn't belong to anyone but rather what you can hold is yours?

It definitely belongs to someone. To the person holding it (provided that it wasn't stolen). Just as any other actual thing. Except for borrowed items.