Reuters, New York City – Seven companies that license images, videos, music, and other datasets for training AI systems have formed the industry's first trade association, they announced on Wednesday.
According to a statement from the firms, the Dataset Providers Alliance (DPA) will promote "ethical data sourcing" for AI training, including protecting the intellectual property rights of content owners and the rights of people depicted in datasets.
The founding members include U.S. music dataset company Rightsify, German data marketplace Datarade, Japanese stock photo supplier Pixta, and image licensing platform Visual.
Recent years have seen the rise of generative AI technologies that can imitate human creativity. This has prompted a backlash from content creators and a series of copyright lawsuits against tech companies, including ChatGPT maker OpenAI, which is backed by Microsoft (NASDAQ: MSFT), as well as Google (NASDAQ: GOOGL) and Meta (NASDAQ: META).
Developers have trained their models on vast amounts of content, much of it scraped from the internet for free without the owners' permission or knowledge.
Tech companies, which maintain that this use is lawful, have also quietly begun paying for access to private content collections to meet specific data needs and to shield themselves from legal and regulatory risks.
The expectation that demand for licensed data will grow if copyright owners win their legal battles has given rise to a new sector of businesses that package content and sell access to it for AI training.
Organisations have also emerged to set ethical standards for that industry. One is Fairly Trained, a nonprofit founded last year that certifies models that have not used copyrighted work without a licence.
The DPA will focus on the substance of those deals, requiring its members to pledge, for example, not to sell text data gathered by web crawling, or audio containing people's voices, without those people's express consent.
A major area of focus will be the NO FAKES Act, a U.S. bill introduced last year that would impose penalties for producing unauthorised digital replicas of people's voices or likenesses, according to Alex Bestall, CEO of Rightsify and its licensing subsidiary GCX, who led the group's founding.
“Advocacy will be a big part of it because everyone’s taken their positions on AI and copyright, but a lot of these battles are yet to be solved and it’s going to take a while for them to be,” Bestall said.
He said the DPA will also lobby for stronger training-data transparency rules, such as those in the European Union's AI Act and in the Generative AI Copyright Disclosure Act, a bill introduced in the U.S. Congress in April.