protocolture 5 days ago
> One of the many pressing issues with Large Language Models (LLMs) is they are trained on content that isn’t theirs to consume.

One of the many pressing issues is that people believe that ownership of content should be absolute, that hammer makers should be able to dictate what is made with hammers they sell. This is absolutely poison as a concept. Content belongs to everyone. Creators of content have a limited-term, limited right to exploit that content. They should be protected from perfect reconstruction and sale of that content, and nothing else. Every IP law counter to that is toxic to culture and society.

kstrauser 5 days ago
Tencent scrapers are hitting my little Forgejo site 4 times a second, 24/7. I pay for that bandwidth. Platitudes sound great, but this isn’t a lofty “drinking from the public well”. This is bastard operators taking a drink and pooping in it. My thoughts will have more room for nuance when they stop abusing the hell out of my resources they’re “borrowing”.

psychoslave 5 days ago
Why are they even doing this? It doesn’t seem like something that could bring any value downstream, even to their own selfish pipelines. Or am I missing something?

kstrauser 5 days ago
No! They’re constantly hitting the same stupid URL (“show me this file in this commit in this repo with these 47 query params”) from a few thousand IPs in China and Brazil, with user agents showing an iPod or a Linux desktop running Opera 3. I wrote a little script where I throw in an IP and it generates a Caddy IP-matcher block with an “abort” rule for every netblock in that IP’s ASN. I’m sure there are more elegant ways to share my work with the world while blocking the scoundrels, but this is kind of satisfying for the moment.

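kstrauser's script isn't shown, so this is only a minimal Python sketch of the same idea, not their code. It assumes you already have the list of prefixes announced by the offending IP's ASN (the ASN lookup itself is left out), and the function name `caddy_abort_block` is hypothetical. It emits a Caddyfile named matcher using Caddy's `remote_ip` matcher plus an `abort` directive, intended to be pasted inside a site block.

```python
import ipaddress
from typing import Iterable


def caddy_abort_block(asn: int, netblocks: Iterable[str]) -> str:
    """Render a Caddyfile named matcher and an abort rule for the given CIDRs.

    Assumption: `netblocks` is the prefix list already looked up for `asn`;
    how that list is obtained is out of scope for this sketch.
    """
    nets = [ipaddress.ip_network(block, strict=False) for block in netblocks]
    # collapse_addresses() refuses to mix IPv4 and IPv6, so collapse each family separately.
    v4 = ipaddress.collapse_addresses(n for n in nets if n.version == 4)
    v6 = ipaddress.collapse_addresses(n for n in nets if n.version == 6)
    cidrs = [str(n) for n in [*v4, *v6]]
    if not cidrs:
        raise ValueError("no netblocks given")
    name = f"block_as{asn}"
    # remote_ip matches the connecting peer's address; abort closes the
    # connection without sending an HTTP response.
    return f"@{name} remote_ip {' '.join(cidrs)}\nabort @{name}\n"


if __name__ == "__main__":
    # Documentation-only ASN and prefixes, not real scrapers.
    print(caddy_abort_block(64500, ["192.0.2.0/25", "192.0.2.128/25", "2001:db8::/32"]))
```

Collapsing adjacent prefixes first keeps the generated matcher short when an ASN announces many contiguous blocks; here the two /25s merge into a single 192.0.2.0/24.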
danaris 5 days ago
Best I can figure, they've decided that it's easier to set up their scrapers to simply scrape absolutely everything, all the time, forever, than to select more carefully what's worth getting. Various LLM-training scrapers were absolutely crippling my tiny (~125 weekly unique users) browser game until I put its wiki behind a login wall. There is no possible way they could see any meaningful return from doing so.

HankStallone 5 days ago
I get the impression that they're just too lazy or incompetent, or in too big a hurry, to put some sensible logic in their scrapers. Maybe they have an LLM write the scraper and don't bother to ask for anything more than "Make a web scraper that gets all the files it can as fast as it can." The last one I blocked was hitting my site 24 times a second, and many of those requests were for the same CSS file over and over.

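As an illustration of the "sensible logic" missing here, this is a minimal Python sketch of a crawl gate that refuses to re-fetch a URL it has already seen and enforces a minimum delay between requests to the same host. The class name `PoliteScheduler`, the one-second default, and the injectable clock are all assumptions made for the example, not anything the scrapers above are known to use.

```python
import time
from typing import Callable
from urllib.parse import urldefrag, urlsplit


class PoliteScheduler:
    """Hypothetical crawl gate: skip repeat URLs and space out hits per host."""

    def __init__(
        self,
        min_interval: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval  # seconds between requests to one host
        self.clock = clock
        self.sleep = sleep
        self.seen: set[str] = set()
        self.last_hit: dict[str, float] = {}

    def should_fetch(self, url: str) -> bool:
        """Return False for an already-fetched URL; otherwise wait out the per-host delay."""
        key = urldefrag(url).url  # a "#fragment" never changes what the server returns
        if key in self.seen:
            return False  # e.g. the same CSS file requested again
        self.seen.add(key)

        host = urlsplit(key).netloc.lower()
        now = self.clock()
        last = self.last_hit.get(host)
        if last is not None and now - last < self.min_interval:
            # Too soon for this host: block until min_interval has elapsed.
            self.sleep(self.min_interval - (now - last))
            now = self.clock()
        self.last_hit[host] = now
        return True
```

Injecting `clock` and `sleep` makes the throttling testable without real waiting. Note that `seen` grows without bound, so a long-running crawler would want a persistent or size-limited store instead.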
protocolture 2 days ago
Being a dick while scraping isn't really the same question as the use of that data. Anyway, the answer is: block 'em.

latexr 5 days ago
> One of the many pressing issues is that people believe that ownership of content should be absolute, that hammer makers should be able to dictate what is made with hammers they sell.

You’re conflating and confusing two different concepts. “Content” is not a tool. Content is like a meal: it’s a finished product meant to be consumed. A tool, like a hammer, is used to create something else, the content which will then be consumed. You’re comparing a JPEG to Photoshop. You can remix content, but to do that you use a tool, and the result is related but different content.

> Content belongs to everyone.

Even if we conceded that point, that still wouldn’t excuse the way in which these companies are going about getting the content, hammering every page of every website with badly behaved scrapers. They are bringing websites down and increasing costs for their maintainers, meaning other people have limited or no access to it. If “content belongs to everyone”, then they don’t have the right to prevent everyone else from accessing it.

I agree current copyright law is toxic and harmful to culture and society, but that doesn’t make what these companies are doing acceptable. The way to counter a bad system is not to shit on it from a different angle.

malfist 5 days ago
To extend your metaphor: we don't get annoyed at a neighbor knocking on our door to ask a question, but we absolutely do not want some random stranger who's trying to get rich by knocking on our door and asking questions over and over and over, at all hours of the day and night.

protocolture 2 days ago
> Even if we conceded that point, that still wouldn’t excuse the way in which these companies are going about getting the content

That's an unrelated point that I wouldn't even defend. Block 'em. It's cool with me.

> You’re conflating and confusing two different concepts. “Content” is not a tool. Content is like a meal, it’s a finished product meant to be consumed; a tool, like a hammer, is used to create something else, the content which will then be consumed. You’re comparing a JPEG to Photoshop.

Eh, I see what you're trying to say, but a hammer is also a finished good that's sold as a finished good. I can modify the hammer if I like, and after modifying it I can sell it. JPEGs can also be inputs to things like collages.

blagie 5 days ago
I'd be totally down with "content belongs to everyone." The problem is when you steal my content, repackage it, and resell it. At that point, my content doesn't belong to everyone, or even to me, but to you.

* I'd have no problem with OpenAI, the non-profit developing open source AI models and governance models, scraping everyone's web pages and using them for the public good.
* I have every problem with OpenAI, the sketchy for-profit, stealing content from my web page so their LLMs can regenerate my content for proprietary products, cutting me out of the loop.

protocolture 4 days ago
Yes, but I'm sure the content you want to exploit was made in a complete vacuum.

account42 5 days ago
So I get to freely copy Windows and Office and use them in products that I sell to others without Microsoft's consent now? Or is this only true when it benefits big corporations?

protocolture 5 days ago
Yeah, go for it. I did say this:

> They should be protected from perfect reconstruction and sale

But I don't even really believe in that much, so go nuts.

jmye 5 days ago
I want to get this straight, given that you

> don't even really believe in that much

If I write a book, let’s say a really good book, and self-publish it, you’re saying you think it’s totally kosher for Amazon to take that book, make a copy, and then make it a best seller (because they have vastly better marketing and sales tools), while putting their own name in as author? That seems, to you, like a totally fine and desirable thing? That literally all content should only ever be monetized by the biggest corporations, who can throw their weight around and shut everyone else out? Or is this maybe a completely half-baked load of nonsense that sounded better around the metaphorical bong circle? Come on, now.

protocolture 2 days ago
Actually, the more common outcome is that some enterprising random person makes a compilation of public domain content and markets it for something like 25 cents, competing to make it as available to me as possible. Have a look at REH (Robert E. Howard) short stories on Google Books. This is super common.

Do I want someone to do that to your book, to make it as available and as cheap as possible for me to read on the platform of my choice? Yes. It's just data, and culturally speaking, it already belongs to me. I own your book. People can compete to deliver it to me for the cheapest price. I welcome that.

I don't begrudge you going on tour and selling author-signed copies for whatever price you want. But likewise, don't expect me to support a set of property norms that would deprive me of elements of the culture I live in. Come on, now.

jmye 2 days ago
> that some enterprising random person makes a compilation of public domain content

My hypothetical book is not, at all, public domain. This is always a non-starter.

> But likewise, don't expect me to support a set of property norms that would deprive me of elements of the culture I live in.

I could simply choose not to publish my book, and carry it around and let people read it in front of me. Apparently this is an insufferable “property norm”, as you would be unable to consume my work at all, let alone for free and in the manner of your own choosing. What an absurd thing to believe in.

Do you similarly think you're entitled to sleep on my couch, or eat my dinner, or do you only think you’re entitled to take what you want when it’s words rather than, say, oranges? Or do you just have a weirdly tenuous grasp of what culture is?

protocolture a day ago
> My hypothetical book is not, at all, public domain. This is always a non-starter.

Right, but in your weird strawman argument, which assumes big scary Amazon can reproduce it for free, it is effectively public domain.

> I could simply choose not to publish my book, and carry it around and let people read it in front of me. Apparently this is an insufferable “property norm”, as you would be unable to consume my work at all, let alone for free and in the manner of your own choosing. What an absurd thing to believe in.

"I only want to contribute to human society if I can profit by it." As long as you can live knowing you are a sellout, I can live without reading your book, or using it to prop up my table.

> Do you similarly think you're entitled to sleep on my couch, or eat my dinner, or do you only think you’re entitled to take what you want when it’s words rather than, say, oranges? Or do you just have a weirdly tenuous grasp of what culture is?

"Do you think you are entitled to <scarce, physical thing> because you believe everyone is entitled to <non-scarce, non-physical thing intrinsic to human culture, able to be spread around the world to millions of people instantly>?" No, lmao.

Lerc 5 days ago
Do you think if it were allowable for Amazon to do that, it would actually be profitable for them to do so? As soon as any work became popular, anyone could undercut Amazon. If you really think that Amazon is in a position where they can charge significant money for something others can provide for much less, then you are talking about an anticompetitive monopoly. If that's the case, the problem is not with copyright; it's a lack of competition.

The situation we have now is just one where copyright means they can't publish just anything, but Amazon can always acquire the rights to something and apply those same resources to make it a best seller. They don't care if the book is great or not. They just want to be able to sell it. Being the only producer of the thing incentivizes making the thing they own popular, not the thing that is good. Having the option to pick what succeeds puts them in a dominant negotiating position, so they can acquire rights cheaply.

I guess if that were the case, though, it would be easy to spot things that were popular even though they seemingly lack merit or any real reason other than a strong marketing department. It would really suck in that world. Not only would there be talented people making good works and earning little money, but most people would not even get to see what they had created. For many creatives, that would be the worst part of it.

fwip 5 days ago
Yes, Amazon would do that. Why would another person be able to meaningfully "undercut" Amazon here? Amazon would profit from selling e-books even if it's only for 10 cents, and integration with their Kindles and convenience of discovery would make it difficult for anyone to compete meaningfully on price. For printed books, economies of scale work in their favor as well: if it costs them $1.20 to manufacture, store, and ship a paperback, and me $1.50, how am I supposed to undercut them?