Vana plans to let users rent out their Reddit data to train AI | TechCrunch

In the generative AI boom, data is the new oil. So why shouldn't you be able to sell yours?

From big tech firms to startups, AI makers are licensing ebooks, images, videos, audio and more from data brokers, all in pursuit of training more capable (and more legally defensible) AI-powered products. Shutterstock has deals with Meta, Google, Amazon and Apple to provide millions of images for model training, while OpenAI has signed contracts with many news organizations to train its models on news archives.

In many cases, the individual creators and owners of this data have never seen a dime of the cash changing hands. A startup called Vana wants to change that.

Anna Kazlauskas and Art Abal, who met in a class at the MIT Media Lab focused on building technology for emerging markets, co-founded Vana in 2021. Kazlauskas previously launched an automation startup, Iambiq, out of Y Combinator. Abal, a corporate lawyer by training and education, was an associate at the Cadmus Group, a Boston-based consulting firm, before heading impact sourcing at data annotation company Appen.

With Vana, Kazlauskas and Abal set out to build a platform that lets users “pool” their data (including chats, speech recordings and photos) into datasets that can then be used to train generative AI models. They also want to create more personalized experiences, such as daily motivational voicemails based on your wellness goals or an art-creating app that understands your style preferences, by fine-tuning public models on that data.

“Vana's infrastructure essentially creates a treasure trove of user-owned data,” Kazlauskas told TechCrunch. “It does this by allowing users to collect their personal data in a non-custodial way … Vana allows users to own AI models and use their data in AI applications.”

Here is how Vana pitches its platform and API to developers:

The Vana API combines a user's cross-platform personal data … to allow you to personalize your application. Your app gets instant access to the user's personalized AI model or underlying data, simplifying onboarding and eliminating compute cost concerns … We think users should be able to bring their personal data from walled gardens, e.g. Instagram, Facebook and Google, to your application, so you can create an amazingly personalized experience the first time a user interacts with your AI application.

Creating an account with Vana is quite easy. After verifying your email, you can associate data with a digital avatar (such as selfies, a self-description and voice recordings) and explore apps built using Vana's platform and datasets. The app selection ranges from ChatGPT-style chatbots and interactive storybooks to a Hinge profile generator.

Image credit: Vana

Now why, you might ask, in this age of data privacy awareness and rising ransomware attacks, would anyone volunteer their personal information to an anonymous startup, much less a venture-backed one? (Vana has raised $20 million to date from Paradigm, Polychain Capital and other backers.) Can any for-profit company really be trusted not to abuse or mishandle the monetizable data it gets its hands on?

Vana Reddit DAO

Image credit: Vana

In response to this question, Kazlauskas stressed that the whole point of Vana is for users to “take back control of their data,” noting that Vana users have the option to self-host their data rather than store it on Vana's servers, and to control how that data is shared with apps and developers. She also argued that, since Vana makes money by charging users a monthly subscription (starting at $3.99) and charging devs “data transaction” fees (e.g. for transferring datasets for AI model training), the company has little incentive to exploit users and the wealth of personal data they bring with them.

“We want to build models that are owned and governed by users who all contribute their own data, and allow users to bring their data and models with them into any application,” Kazlauskas said.

Now, while Vana isn't selling user data to companies to train generative AI models (or so it claims), it wants to allow users to do so themselves if they choose, starting with their Reddit posts.

This month, Vana launched what it calls the Reddit Data DAO (Digital Autonomous Organization), a program that aggregates multiple users' Reddit data (including their karma and post history) and lets them decide together how that shared data is used. After joining with a Reddit account, submitting a request to Reddit for their data and uploading that data to the DAO, users get the right to vote, alongside other DAO members, on decisions like licensing the shared data to generative AI companies for a shared profit.

This is an answer of sorts to Reddit's recent moves to commercialize data on its platform.

Reddit previously did not grant access to posts and communities for AI training purposes. But it changed course late last year ahead of its IPO. Since the policy change, Reddit has raised more than $203 million in licensing fees from companies including Google.

“The broad idea (with the DAO) is to liberate user data from the big platforms that want to store and monetize it,” Kazlauskas said. “This is a first and is part of our effort to help people aggregate their data into user-owned datasets to train AI models.”

Unsurprisingly, Reddit, which isn't working with Vana in any official capacity, isn't happy about the DAO.

Reddit banned Vana's subreddit dedicated to discussion of the DAO. And a Reddit spokesperson accused Vana of “exploiting” its data export system, which is designed to comply with data privacy regulations like the GDPR and the California Consumer Privacy Act.

“Our data arrangements allow us to police such entities, even on public information,” the spokesperson told TechCrunch. “Reddit does not share non-public, personal data with commercial entities, and when Redditors request us to export their data, they receive non-public personal data back from us in accordance with applicable laws. Direct partnerships between Reddit and vetted organizations, with clear terms and accountability, matter, and these partnerships and agreements prevent misuse and abuse of people's data.”

But does Reddit have any real reason to be concerned?

Kazlauskas envisions the DAO growing to the point where it affects how much Reddit can charge customers for its data. That's a long way off, assuming it ever happens. The DAO has only 141,000 members, a tiny fraction of Reddit's 73-million-strong user base. And some of these members may be bots or duplicate accounts.

Then there is the matter of how to fairly distribute the payments that the DAO can receive from data buyers.

Currently, the DAO awards users “tokens” (cryptocurrency) in proportion to their Reddit karma. But karma may not be the best measure of quality contributions to a data set, especially in smaller Reddit communities where there are fewer opportunities to earn it.

Kazlauskas floats the idea that DAO members could choose to share their cross-platform and demographic data, potentially making the DAO more valuable and incentivizing signups. But that would require users to place even more trust in Vana to handle their sensitive data responsibly.

Personally, I don't see Vana's DAO taking off in a big way. There are too many obstacles in the way. However, I think this won't be the last grassroots effort to gain control over the data that is increasingly being used to train generative AI models.

Startups like Spawning are working on ways to let creators set rules that guide how their data is used for training, while vendors such as Getty Images, Shutterstock and Adobe experiment with compensation schemes. But no one has cracked the code yet. Can it even be cracked? Given the cutthroat nature of the generative AI industry, this is certainly a tall order. But maybe someone will find a way, or policymakers will force one.
