Your old photos could be training AI as Shutterstock licenses to Meta, Google, Amazon, and Apple

By Matic Broz

Takeaways

  • Early internet image hosts are licensing their vast content libraries to train AI models used by major tech companies.
  • This creates a new revenue stream for stock photo sites, with Shutterstock securing major deals with tech giants.
  • Using old online content for AI training raises concerns about exposing people’s past data without their consent.

Remember Photobucket, the once-popular image hosting site? Or Freepik, the stock photo repository? Both could be looking at a significant cash inflow in the age of generative AI. According to a new Reuters report, the two companies are part of a growing wave of content providers cashing in as tech giants scramble to license massive amounts of data to train their AI models.

For now, the biggest winner in this new AI data gold rush seems to be Shutterstock. The stock provider has already partnered with Nvidia on 3D models and with OpenAI on its AI video model, Sora. Now, according to a person familiar with the deals, Shutterstock has struck agreements worth tens of millions of dollars each with Meta, Google, Amazon, Apple, and others to license its library of hundreds of millions of images, videos, and audio files for training generative AI.

The agreements initially ranged from $25 million to $50 million per tech giant, said Shutterstock CFO Jarrod Yahes, though most were later expanded amid a “flurry of activity” over the past two months as smaller players piled in.

Shutterstock’s competitor Freepik has inked similar deals at 2 to 4 cents per image, CEO Joaquin Cuenca Abela told Reuters, with five more in the pipeline. But this concerns me because I recently found out that more than half of Freepik images are AI-generated—and training AI on AI data is seldom a good idea.

These deals are part of a frenzy gripping Big Tech as companies rush to develop cutting-edge AI that can generate text, imagery, audio, and code in response to simple prompts, a potential paradigm shift for human-machine interaction.

Because early generative AI models like ChatGPT were initially trained on freely scraped online data, tech firms now face copyright lawsuits. They are hedging that risk by turning to licensed sources.

So far, they’ve struck deals worth tens of millions of dollars each with providers like news wires and a burgeoning industry of “ethical” AI data brokers that acquire the rights to real-world content like social media posts, podcasts, videos, and books.

One broker, Defined.ai, sells datasets marked as “ethically sourced” after getting consent to use the content and stripping personal identifiers. It licenses images for $1–$7 each, short videos for $2–$4, and longer films for $100–$300 per hour, according to CEO Daniela Braga.

But the underground data gold rush is surfacing privacy concerns too. Some experts warn that resurrecting old online archives, like the 13 billion photos and videos held by Shutterstock rival Photobucket, could expose people’s private data from decades past in generative AI outputs without their consent.

Photobucket CEO Ted Leonard dismisses such worries, citing new terms allowing it to sell user data to train AI. As generative AI booms, the battle over who owns the fuel to drive it is heating up.

Meet your guide

Matic Broz

Matic Broz is a stock media licensing expert and a photographer. He promotes proper and responsible licensing of stock photography, footage, and audio, and his writing has reached millions of creatives.