
By Eli Tan

Reporting from San Francisco

Last July, Google made an eight-word change to its privacy policy that represented a significant step in its race to build the next generation of artificial intelligence.

Buried thousands of words deep in the document, Google tweaked the phrasing describing how it used data for its products, adding that publicly available information could be used to train its A.I. chatbot and other services.

Before: We use publicly available information to help train Google’s language models and build products and features like Google Translate.

After: We use publicly available information to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.

The subtle change was not unique to Google. As companies look to train their A.I. models on data that is protected by privacy laws, they’re carefully rewriting their terms and conditions to include words like “artificial intelligence,” “machine learning” and “generative A.I.”

Some changes to terms of service are as small as a few words. Others include the addition of entire sections to explain how generative A.I. models work, and the types of access they have to user data. Snap, for instance, warned its users not to share confidential information with its A.I. chatbot because it would be used in its training, and Meta alerted users in Europe that public posts on Facebook and Instagram would soon be used to train its large language model.

Those terms and conditions — which many people have long ignored — are now being contested by some users who are writers, illustrators and visual artists and worry that their work is being used to train the products that threaten to replace them.

“We’re being destroyed already left, right and center by inferior content that is basically trained on our stuff, and now we’re being discarded,” said Sasha Yanshin, a YouTube personality and co-founder of a travel recommendation site.

This month, Mr. Yanshin canceled his Adobe subscription over a change to its privacy policy. “The hardware store that sells you a paintbrush doesn’t get to own the painting that you make with it, right?” he said.

To train generative A.I., tech companies can draw from two pools of data — public and private. Public data is available on the web for anyone to see, while private data includes things like text messages, emails and social media posts made from private accounts.

Public data is a finite resource, and a number of companies are only a few years away from using all of it for their A.I. systems. But tech giants like Meta and Google are sitting on a trove of private data that could be 10 times the size of its public counterpart, said Tamay Besiroglu, an associate director at Epoch, an A.I. research institute.

That data could amount to “a substantial advantage” in the A.I. race, Mr. Besiroglu said. The problem is gaining access to it. Private data is mostly protected by a patchwork of federal and state privacy laws that give users a form of license over the content they create online, and companies can’t use it for their own products without consent.

In February, the Federal Trade Commission warned tech companies that changing privacy policies to retroactively scrape old data could be “unfair or deceptive.”

A.I. training could eventually use the most personal kinds of data, like messages to friends and family. A Google spokesperson said a small test group of users, with permission, had allowed Google to train its A.I. on some aspects of their personal emails.

Google added in a statement that the change to its privacy policy “simply clarified that newer services like Bard (now Gemini) are also included. We did not start training models on additional types of data based on this language change.”

Some companies have struggled to balance their hunger for new data with users’ privacy concerns. In June, Adobe faced backlash on social media after it changed its privacy policy to include a phrase about automation that many of its customers interpreted as having to do with A.I. scraping.

Before: Our Access to Your Content: We may access, view, or listen to your Content through both automated and manual methods, but only in limited ways, and only as permitted by law.

After: Our Access to Your Content: We will only access, view, or listen to your Content in limited ways, and only as permitted by law.

The company explained the changes with a pair of blog posts, saying customers had misunderstood them. On June 18, Adobe added explanations to the top of some sections of its terms and conditions.

“We’ve never trained generative A.I. on customer content, taken ownership of a customer’s work or allowed access to customer content beyond legal requirements,” Dana Rao, Adobe’s general counsel and its chief trust officer, said in a statement.

This year, Snap updated its privacy policy about data collected by My AI, its A.I. chatbot that users can have conversations with.

Before: My AI is an experimental chatbot. It’s a fun way to get information, but it remains an evolving feature so you should always independently check answers provided by My AI before relying on any advice, and you should not share any confidential or sensitive information.

After: My AI is a chatbot built on generative AI technology designed with safety in mind. Generative AI is a developing technology that may provide responses that are biased, incorrect, harmful or misleading. So, you should not rely on its advice. You should also not share any confidential or sensitive information — if you do, it will be used by My AI.

Before: When you interact with My AI, we use the information we collect to improve Snap’s products and personalize your experience.

After: When you interact with My AI, we use the content you share and your location (if you have enabled location sharing with Snapchat) to improve Snap’s products, including enhancing My AI’s safety and security, and to personalize your experience, including ads.

A Snap spokesperson said the company gave “upfront notices” about how it used data to train its A.I. with the opt-in of its users.

In September, X added a single sentence to its privacy policy about machine learning and A.I. The company did not respond to a request for comment.

We use the information we collect to provide and operate X products and services. We also use the information we collect to improve and personalize our products and services so that you have a better experience on X, including by showing you more relevant content and ads, suggesting people and topics to follow, enabling and helping you discover affiliates, third-party apps, and services. We may use the information we collect and publicly available information to help train our machine learning or artificial intelligence models for the purposes outlined in this policy.

Last month, Meta alerted its Facebook and Instagram users in Europe that it would use publicly available posts to train its A.I. starting June 26, prompting a backlash. It later paused the plan after the European Center for Digital Rights brought complaints against the company in 11 European countries.

In the United States, where privacy laws are less strict, Meta has been able to use public social media posts to train its A.I. without such an alert. The company announced in September that the new version of its large language model was trained on user data that its previous iteration had not been trained on.

Meta has said its A.I. did not read messages sent between friends and family on apps like Messenger and WhatsApp unless a user tagged its A.I. chatbot in a message.

“Using publicly available information to train A.I. models is an industrywide practice and not unique to our services,” a Meta spokesperson said in a statement.

Many companies are also adding language to their terms of use that protects their content from being scraped to train competing A.I.

Adobe added this language in 2022:

No Modifications, Reverse Engineering, Artificial Intelligence/Machine Learning (AI/ML)

Except as expressly permitted in the Terms, you must not (and must not allow third parties to)… use the Services or Software, or any content, data, output, or other information received or derived from the Services or Software, to directly or indirectly create, train, test, or otherwise improve any machine learning algorithms or artificial intelligence system, including but not limited to any architectures, models, or weights.

Mr. Yanshin said traffic to his travel website had fallen 95 percent since it began competing with A.I. aggregators. He said he hoped regulators would act fast to create protections for small businesses like his against A.I. companies.

“People are going to sit around debating the pros and cons of stealing data because it makes a nice chatbot,” he said. “In three, four, five years’ time, there might not be entire segments of this creative industry because we’ll just be decimated.”
