Suing Writers Seethe at OpenAI's Excuses in Court

floofloof@lemmy.ca · 2 years ago

archomrade [he/him]@midwest.social · 2 years ago

Copyright is already just a band-aid for what is really an issue of resource allocation.

If writers and artists weren’t at risk of losing their means of living, we wouldn’t need to concern ourselves with the threat of an advanced tool supplanting them. Never mind how the tool is created, it is clearly very valuable (otherwise it would not represent such a large threat to writers) and should be made as broadly available (and jointly owned and controlled) as possible. By expanding copyright like this, all we’re doing is gatekeeping the creation of AI models to the largest of tech companies, and making them prohibitively expensive to train for smaller applications.

If LLMs are truly the start of a “fourth industrial revolution” as some have claimed, then we need to consider the possibility that our economic arrangement is ill-suited for the kind of productivity it is said AI will bring. Private ownership (over creative works, and over AI models, and over data) is getting in the way of what could be a beautiful technological advancement that benefits everyone.

Instead, we’re left squabbling over who gets to own what and how.

Franzia@lemmy.blahaj.zone · 2 years ago

“fourth industrial revolution” as some have claimed

The people claiming this are often the shareholders themselves.

prohibitively expensive to train for smaller applications.

There is so much work out there for free, with no copyright. The biggest cost in training is most likely the hardware, and I see no added value in having AI train on Stephen King ☠️

Copyright is already just a band-aid for what is really an issue of resource allocation.

God damn right but I want our government to put a band aid on capitalists just stealing whatever the fuck they want “move fast and break things”. It’s yet another test for my confidence in the state. Every issue, a litmus test for how our society deals with the problems that arise.

archomrade [he/him]@midwest.social · 2 years ago

There is so much work out there for free, with no copyright

There’s actually a lot less than you’d think (since copyright lasts for so long), and even less now that online and digitized sources are being locked down and charged for by the domain owners. But even if it were abundant, it would likely not satisfy the true concern here. If there were enough data to produce an LLM of similar quality without using copyrighted data, it would still threaten the security of those writers. What is to say a user couldn’t provide a sample of Stephen King’s writing to the LLM and have it still produce derivative work without having trained it on copyrighted data? If the user had paid for that work, are they allowed to use the LLM in the same way? If they aren’t, who is really at fault, the user or the owner of the LLM?

The law can’t address the complaints of these writers because interpreting the law to that standard is simply too restrictive and sets an impossible standard. The best way to address the complaint is to simply reform copyright law (or regulate LLMs through some other mechanism). Frankly, I do not buy that the LLMs are a competing product to the copyrighted works.

The biggest cost in training is most likely the hardware

That’s right for large models like the ones owned by OpenAI and Google, but with the amount of data needed to effectively train and fine-tune these models, if that data suddenly became scarce and expensive it could easily overtake hardware cost. To say nothing of small consumer models that are run on consumer hardware.

capitalists just stealing whatever the fuck they want “move fast and break things”

I understand this sentiment, but keep in mind that copyright ownership is just another form of capital.

Franzia@lemmy.blahaj.zone · edit-2 2 years ago

Thanks for this reply. You’ve shown this issue has depth that I’ve ignored because I like very few of the advocates for the AI we’ve got.

So one thing that trips me up is I thought copyright is about use. As a consumer rather than a creator this makes complete sense - you can read it, if you own it or borrowed it, and do not distribute it in any way. But there are also gentleman’s agreements built in to how we use books and digital prints.

Unintuitively, copying is also very important. Artists copy to learn, for example. Musicians have the right to cover anyone’s music. Engineers will deconstruct and reverse engineer another’s solution. And businesses cheat off of one another all the time. Even when it has been proven to be wrong, the incentive is high.

So is taking the text of the book, no matter how you got it, and using it as part of a new technology okay?

Clearly the distribution isn’t wrong. You’re not distributing the book, you’ve made a derivative.

The ownership isn’t there, I mean the works were pirated. We’ve been taught that simply having something that was gotten through online copying is not only against the ‘rightholder’ but “piracy” and “stealing”. I have a really simplistic view of this - I just want creators paid for their work, and have autonomy (rights) over what is done with their work. This is rarely the case, we live in a world with publishers.

So it’s that first action. Is that use of the text in another work legal?

My basic understanding of fair use is that fair use is when you add to a work. You critique or reuse that work. Your work is about the other work, but also something new that stands on its own like an essay or a collage, rather than a collection.

I am so confused. Text based AI is run by capitalists. And we only have it FOSS because Meta can afford to lose money in order to remove OpenAI from the competition. Image based AI is almost certainly wrong, it copied and plugged in all of this other work and now tons of people are suing. Getty Images is leveraging their rights management to make an AI that follows the rules we are living with. My gut reaction is a lot of people deserve royalties.

But on the other hand it sounds like AI did not work until they gave it the entire internet’s worth of data to train on. Training on smaller, legal sets was a failure? Or maybe it was because they took the tech approach of training the AI on every Google image of dogs, or cats, etc., without any real variation. Because they’re engineers, not artists. And not even good engineers, if their best work is just scraping other people’s work and giving it to this weird computer program.

This is all just stealing, right? But stealing is a lot more legal than I thought, especially when it comes to digitally published works of art, or physically published art that’s popular enough to be shared online.

Moobythegoldensock@lemm.ee · 2 years ago

At the crux of the author’s lawsuit is the argument that OpenAI is ruthlessly mining their material to create “derivative works” that will “replace the very writings it copied.”

The authors shoot down OpenAI’s excuse that “substantial similarity is a mandatory feature of all copyright-infringement claims,” calling it “flat wrong.”

Goodbye Star Wars, Avatar, Tarantino’s entire filmography, every slasher film since 1974…

anachronist@midwest.social · 2 years ago

OpenAI is trying to argue that the whole work has to be similar to infringe, but that’s never been true. You can write a novel and infringe on page 302 and that’s a copyright infringement. OpenAI is trying to change the meaning of copyright; otherwise, the output of their model is oozing with various infringements.

Echo Dot@feddit.uk · edit-2 2 years ago

I can quote work that’s already been published, that’s allowable and I don’t have to get the author’s consent to do that. I don’t have to get consent to do that because I’m not passing the work off as my own, I am quoting it with reference.

So if I ask the AI to produce something in the style of Stephen King no copyright is violated because it’s all original work.

If I ask the AI to quote Stephen King (and it actually does it) then it’s a quote and it’s not claiming the work is its own.

Under the current interpretation of copyright law (and current law is broken beyond belief, but that’s a completely different issue) a copyright breach has not occurred in either scenario.

The only argument I can see working is that if the AI actually can quote Stephen King that will prove that it has the works of Stephen King in its data set, but that doesn’t really prove anything other than the works of Stephen King are in its data set. It doesn’t definitively prove OpenAI didn’t pay for the works.

d-RLY?@lemmy.ml · 2 years ago

It doesn’t definitively prove OpenAI didn’t pay for the works.

But they are a business/org that has all of those works and is using them for profit. So it kind of would be provable whether OpenAI did or didn’t pay for the correct licenses, as they and/or the publisher/Stephen King (if he directly were to handle those agreements) would have a receipt/license document of some kind to show it. I don’t agree with how copyrights are done and agree that things should be public domain much sooner. But a for-profit thing like OpenAI shouldn’t just be allowed to have all these exceptions that avoid needing any level of permission, or paying for the works whose owners ask for payment. At least not while us regular people that aren’t using these sources for profits/business also aren’t allowed to just use whatever we want.

The only way that I (at least) see such an open use of everything at the level of all the data/information being fine is in a socialist/communist system of some kind. The main reason for generally keeping stuff like entertainment/information/art/etc at a creator level is to have money to live in modern society, where basic and crucial needs (food/housing/healthcare/etc) cost money. So for the average author/writer/artist/inventor, a for-profit company just being able to take their shit much more directly impacts their ability to live.

It is a highly predatory level of capitalism and should not have exceptions. It is just setting up a different version of the shit that needs to also be stopped in the entertainment/technology industries. Where the actual creators/performers/etc are fucked by the studios/labs/corps by not being paid anywhere near the value being brought in and may not have control over it. So all of the companies and the capitalist system are why a private entity/business/org shouldn’t just be allowed to pull this shit.

anachronist@midwest.social · edit-2 2 years ago

You can quote a work under fair use, and whether it’s legal depends on your intent. You have to be quoting it for such uses as “commentary, criticism, news reporting, and scholarly reports.”

There is no cheat code here. There is no loophole that LLMs can slide on through. The output of LLMs is illegal. The training of LLMs without consent is probably illegal.

The industry knows that its activity is illegal and its strategy is not to win but rather to make litigation expensive, complex and slow through such tactics as:

1. Diffusion of responsibility: (note the companies compiling the list of training works, gathering those works, training on those works and prompting the generation of output are all intentionally different entities). The strategy is that each entity can claim “I was only doing X, the actual infringement is when that guy over there did Y”.
2. Diffusion of infringement: so many works are being infringed that it becomes difficult, especially on the output side, to say who has been infringed and who has standing. What’s more, even in clear cut cases like, for instance, when I give an LLM a prompt and it regurgitates some nontrivial recognizable copyrighted work, the LLM trainer will say you caused the infringement with your prompt! (see point 1)
3. Pretending to be academic in nature so they could wrap themselves in the thick blanket of affirmative defense that fair use doctrine affords the academy, and then after the training portion of the infringement has occurred (insisting that was fair use because it was being used in an academic context) “whoopseeing” it into a commercial product.
4. Just being super cagey about the details of the training sets that were actually used and how they were used. This kind of stuff is discoverable but you have to get to discovery first.
5. And finally, magic brain box arguments. These are typically some variation of “all artists have influences.” It is a rhetorical argument that would be blown right past in court, but it muddies the public discussion and is useful to them in that way.

Their purpose is not to win. It’s to slow everything down, and limit the number of people who are being infringed who have the resources to pursue them. The goal is that if they can get LLMs to “take over” quickly then they can become, you know, too big and too powerful to be shut down even after the inevitable adverse rulings. It’s classic “ask for forgiveness, not permission” silicon valley strategy.

Sam Altman’s goal in creeping around Washington is to try to get laws changed to carve out exceptions for exactly the types of stuff he is already doing. And it is just the same thing SBF was doing when he was creeping around Washington trying to get a law that would declare his securitized ponzi tokens to be commodities.

Echo Dot@feddit.uk · 2 years ago

There is no cheat code here.

No one said there was one. This isn’t about looking for a way to break the law and get away with it, this is about the people who want the law to work a particular way not understanding that it doesn’t actually work that way.

The output of LLMs is illegal.

No, it’s not. There is no way in which the output of an AI can be illegal. All that can be proven is that the various providers did not pay for the various licences, but that’s not the same as saying the output is automatically a crime. If it was, then we wouldn’t even need the case. The law is incredibly vague in this area.

Sam Altman’s goal in creeping around Washington is to try to get laws changed to carve out exceptions for exactly the types of stuff he is already doing.

Yes and that’s a good thing. Think about it for 15 seconds. If it weren’t for people like him AI would be limited to the mega corporations who can afford the licenses. We don’t want that, we want AI technology to be available to anyone, we want AI technology to be open source. None of that can happen if the law does not change.

You seem to be under the impression there is some evil sadistic overlord here trying to force artificial intelligence on the world when it is not wanted, but nothing could be further from the truth. If anything, artificial intelligence is being developed in a way that is surprisingly egalitarian considering the corporations that are investing in it, and vague, unclear, unhelpful, broken copyright law is getting in the way of that.

erogenouswarzone@lemmy.ml · edit-2 2 years ago

Speaking of slasher films, does anybody know of any movies that have terrible everything except a really good plot?

taanegl@beehaw.org · 2 years ago

Uh, yeah, a massive corporation sucking up all intellectual property to milk it is not the own you think it is.

Umbrias@beehaw.org · edit-2 2 years ago

But this is literally people trying to strengthen copyright and its scope. The corporation is, out of pure convenience, using copyright as it exists currently with the current freedoms applied to artists.

The fix to the issues with AI displacing a market for artists isn’t yet stronger copyright…

taanegl@beehaw.org · edit-2 2 years ago

Listen, it’s pretty simple. Copyright was made to protect creators on initial introduction to market. In modern times it’s good if an artist has one lifetime, i.e. their lifetime, of royalties, so that they can at least make a little something - because for the small artist that little something means food on their plate.

But a company, sitting on a Smaug’s hill worth of intellectual property, “forever less a day”? Now that’s bonkers.

But you, scraping my artwork to resell for pennies on the dollar via some stock material portal? Can I maybe crawl up your colon with sharp objects and kindling to set up a fire? Pretty please? Oh pretty please!

Also, if your AI copies my writing style, I will personally find you, rip open your skull AND EAT YOUR BRAINS WITH A SPOON!!! Got it, devboy?

Won’t be Mr Hotshot with pointy objects and a fire up your ass, as well as less than half a brain… even though I just took a couple of bites.

Chew on that one.

EDIT: the creative writer is doomed, I tells ya! DOOOOOOMED!

Umbrias@beehaw.org · 2 years ago

This is remarkably aggressive and assumptive. It also addresses none of my beliefs substantively so not much to really chew on there.

You let me know if you ever want to chat about the issue, but right now it looks like you just want to vent. Feel free to do that but I’m not going to just be an object of your anger.

winky88@startrek.website · edit-2 2 years ago

Bleeding hearts rarely do their cause justice (referring to the person you replied to)

taanegl@beehaw.org · edit-2 2 years ago

Here’s maybe something a bit more your speed.

https://beehaw.org/comment/1261900

taanegl@beehaw.org · 2 years ago

You also didn’t get the gist =) Because you didn’t want to. Ah well. Better luck next time.

taanegl@beehaw.org · edit-2 2 years ago

Actually, I hid all that in the goop - but it went past ya because you didn’t want to read with your mind’s eye. You also weren’t showing any sympathy, but false sympathy, because you just wanted to dismiss the person and not their concern. This is called an “ad hominem”. You argue the person rather than the points, and they did substantially tell you that it’s a matter of being able to be paid for your labours.

A little hint: copyright is there to protect creators from over reach, which should be fairly obvious. Both mass consolidation of intellectual property and fuzzing of copyright through AI is also abuse of the very founding principles of copyright.

But sometimes people just want to dismiss, it becomes easier if someone is upset, cus then you can take the high road about stuff and people will be happy with it…

Ok, byyyyye~ ^_

makeasnek@lemmy.ml · edit-2 2 years ago

Amazing how every new generation of technology has a generation of users of the previous technology who do whatever they can to stop its advancement. This technology takes human creativity and output to a whole new level, it will advance medicine and science in ways that are difficult to even imagine, it will provide personalized educational tutoring to every student regardless of income, and these people are worried about the technicality of what the AI is trained on and often don’t even understand enough about AI to make an argument about it. If people like this win, whatever country’s legal system they win in will not see the benefits that AI can bring. That society is shooting itself in the foot.

Your favorite musician listened to music that inspired them when they made their songs. Listening to other people’s music taught them how to make music. They paid for the music (or somebody did via licensing fees or it was freely available for some other reason) when they listened to it in the first place. When they sold records, they didn’t have to pay the artist of every song they ever listened to. That would be ludicrous. An AI shouldn’t have to pay you because it read your book and millions like it to learn how to read and write.

Tosti@feddit.nl · edit-2 2 years ago

deleted by creator

makeasnek@lemmy.ml · edit-2 2 years ago

No that’s not how it works. It stores learned information like “word x is more likely to follow word y than word a” or “people from country x are more likely to consume food a than b”. That is what is distributed when the AI model is shared. To learn that, it just reads books zillions of times and updates its table of likelihoods. Just like an artist might listen to a Lil Wayne album hundreds of times and each time they learn a little bit more about his rhyme style or how beats work or whatever. It’s more complicated than that, but that’s a layperson’s explanation of how it works. The book isn’t stored in there somewhere. The book’s contents aren’t transferred to other parties.
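
For anyone who wants to see the “table of likelihoods” idea concretely, here is a toy sketch in Python. It is a plain bigram counter, which is an assumption made purely for illustration: real LLMs encode these statistics in neural network weights rather than an explicit table, and the function names and sample sentence below are made up.

```python
from collections import Counter, defaultdict


def train_bigram_table(text):
    """Count how often each word follows each other word (toy example)."""
    words = text.lower().split()
    table = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1
    return table


def next_word_probability(table, prev, nxt):
    """Estimate P(nxt | prev) from the counts; 0.0 if prev was never seen."""
    total = sum(table[prev].values())
    return table[prev][nxt] / total if total else 0.0


# Hypothetical training text, standing in for "reading books".
corpus = "the cat sat on the mat and the cat slept"
table = train_bigram_table(corpus)

# "the" is followed by "cat" twice and "mat" once in the sample text.
print(next_word_probability(table, "the", "cat"))
```

What the function returns is a set of word-pair counts, and each pass over more text only updates those counts.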

Madison_rogue@kbin.social · 2 years ago

The learning model is artificial, vs a human that is sentient. If a human learns from a piece of work, that’s fine if they emulate styles in their own work. However, sample that work, and the original artist is due compensation. This was a huge deal in the late 80s with electronic music sampling earlier musical works, and there are several cases of copyright that back original owners’ claim of royalties due to them.

The lawsuits allege that the models used copyrighted work to learn. If that is so, writers are due compensation for their copyrighted work.

This isn’t litigation against the technology. It’s litigation around what a machine can freely use in its learning model. Had ChatGPT, Meta, etc., used works in the public domain this wouldn’t be an issue. Yet it looks as if they did not.

EDIT

And before someone mentions that the books may have been bought and then used in the model, it may not matter. The Birthday Song is a perfect example of copyright that caused several restaurant chains to use other tunes up until the copyright was overturned in 2016. Every time the AI uses the copied work in its output it may be subject to copyright.

Heratiki@lemmy.ml · 2 years ago

The creator of ChatGPT is sentient. Why couldn’t it be said that this is their expression of the learned works?

Madison_rogue@kbin.social · 2 years ago

https://crsreports.congress.gov/product/pdf/LSB/LSB10922

Heratiki@lemmy.ml · 2 years ago

I’ve glanced at these a few times now and there are a lot of ifs, ands, and buts in there.

I’m not understanding how an AI itself infringes on the copyright, as it has to be directed in its creation at this point (GPT specifically). How is that any different than me using a program that will find a specific piece of text and copy it for use in my own document? In that case the document would be presented by me and thus I would be infringing, not the software. AI (for the time being) is simply software and incapable of infringement. And a company that makes the AI simply using data to train its software is not infringement, as the works are not copied verbatim from their original source unless specifically requested by the user. That would put the infringement on the user.

Phanatik@kbin.social · 2 years ago

There’s a bit more nuance to your example. The company is liable for building a tool that allows plagiarism to happen. That’s not down to how people are using it, that’s just what the tool does.

FaceDeer@kbin.social · 2 years ago

Did anyone expect them to go “oh, okay, that makes sense after all”?

JokeDeity@lemm.ee · 2 years ago

Wah. Waaaah. Cry more rich people.

Franzia@lemmy.blahaj.zone · 2 years ago

Writers are rich because they’ve made artwork and sold it. I personally hold that to a higher value than CEOs.

floofloof@lemmy.ca · 2 years ago

And while these ones may not be badly off, most writers are far from rich.