Artificial intelligence
This couple is launching an organization to protect artists in the AI era
Mat Dryhurst and Holly Herndon want fellow creatives to be able to opt into or out of having their work used as training data for DALL-E and the like.
Want a surefire way to make AI image generators like DALL-E create captivating images? Prompt the system to mimic an artist.
The internet latched onto the technique quickly, realizing it gives the best possible returns. It makes it possible to time travel and see how, say, Donald Trump would be portrayed by Picasso or Van Gogh or even a prehistoric cave painter.
Such “X in the style of Y” prompts work so well in large part because the reams of data used to train AI like DALL-E, Midjourney, and WOMBO are pulled from the internet, which is often populated by copyrighted imagery. The legality of using that training data is contested in the European Union; in the U.S., it’s widely believed that a lot of it is permissible under fair-use doctrine.
What’s legal and what’s moral are different questions — which is where artists Mat Dryhurst, an academic, and Holly Herndon, a renowned musician, come in. The married couple, based in Berlin, are AI veterans: Around 2016, they began training neural networks, later building such networks themselves. One project, Holly+, launched last year, allows anyone to upload a polyphonic track that can then be “sung” by a deepfaked version of Herndon’s voice, which is created using AI generative tools.
The pair are very careful about what training data they use. “We ended up making the decision we would train our machine-learning systems on data that only came from us, or people who consented,” says Dryhurst.
In keeping with this mindset, Dryhurst and Herndon are developing a standard they’re calling Source+, which is designed to let artists opt into, or out of, having their work used as training data for AI. (The standard will cover not just visual artists, but musicians and writers, too.) They hope that AI generator developers will recognize and respect the wishes of artists whose work could be used to train such generative tools.
Source+ (now in beta) is a product of the organization Spawning, a partnership between the couple and Jordan Meyer and Patrick Hoepner, founders of the WolfBear software studio. Spawning, which officially launches today, also developed Have I Been Trained, a site that lets artists see if their work is among the 5.8 billion images in the LAION-5B dataset, which is used to train the Stable Diffusion and Midjourney AI generators. The team plans to make more training datasets searchable in the future.
The entire project is designed to empower artists, at no cost to them. “The benefit of working with us is that we can serve or retract your data to all services on request, as opposed to chasing individual organizations down,” says Dryhurst.
Dryhurst says Spawning has been in touch with the developers of some of the most popular AI tools and received sympathetic responses. “I’m very optimistic that if we can establish a verified database of opt-in, opt-out wishes from artists that we can honor those wishes,” he says. “That’s the basic foundation on which a lot of good can come from these tools.”
Project goals
Dryhurst and Herndon have picked now to launch their project — and to start the discussion about ownership of data that trains AI — because of the huge public reaction to generative tools in recent months. “Incredible image-spawning systems like DALL-E have been helpful to bring this conversation mainstream, so now is a great time to intervene,” says Herndon via email.
Herndon adds, “It is also important to present a different narrative of how spawning is different to 20th century practices like sampling, in order to not derail a productive discussion about how we can deal with this new terrain fairly and with excitement.”
Spawning and sampling, the artists argue, are two different things entirely. “Spawning is a more reproductive process,” says Dryhurst. Rather than taking an element and remixing it to create new art, spawning is creating new artworks from a training data corpus.
The project isn’t aimed at stopping people from putting, say, “A McDonald’s restaurant in the style of Rembrandt” into DALL-E and gazing on the wonder produced. “Rembrandt is dead,” Dryhurst says, “and Rembrandt, you could argue, is so canonized that his work has surpassed the threshold of extreme consequence in generating in their image.” He’s more concerned about AI image generators impinging on the rights of living, mid-career artists who have developed a distinctive style of their own.
What Dryhurst doesn’t want to do is create a third-party rights police force. “We’re not looking to build tools for DMCA takedowns and copyright hell,” he says. “That’s not what we’re going for, and I don’t even think that would work.”
He also believes — contrary to what AI companies may fear — that artists will be more willing to accede to their work being used than you may think. “I believe more will ultimately opt in than out, but first we have to establish a common respect,” says Dryhurst. “A lot of good will come from getting everyone on that same page.”