Artificial intelligence opens its mouth wide, voraciously swallowing data scraped from web pages across the internet, and out the other side comes something that passes itself off as fact. But what happens if the data it chewed on contained right-wing extremism, conspiracy theories, and Russian propaganda?
Artificial intelligence is trained on huge amounts of data taken from the internet. It is this data that is the primary source of the AI’s “knowledge and world view”.
Several tech companies have kept secret exactly what they have fed their AI. OpenAI, for example, has not disclosed which datasets its popular tool ChatGPT was trained on.
The Allen Institute for AI, a research institute, has now worked with the newspaper The Washington Post to analyze Google’s C4 dataset and see which web pages the data was taken from.
5G, anti-vaccine, and China
When SVT Nyheter looks at the list of websites, we find several Swedish sites.
Many news sites are included, such as svt.se and sverigesradio.se. But so is Nordfront, the website of the Nazi Nordic Resistance Movement.
The website of the left-wing extremist network Antifascistisk Aktion is included, as are the Radiation Protection Foundation, which spreads claims about the alleged health dangers of wireless networks and 5G, and vaken.se, a site with large amounts of anti-vaccine content.
A selection of web pages whose data is included in Google’s C4 dataset. Photo: Facsimile
Google’s patent listing is the single site from which the most data has been retrieved, with Wikipedia in second place. But high on the list are also websites with links to the Kremlin and the Chinese Communist Party, such as the state-controlled Russian media outlet RT, which has been blocked in the EU since the start of the Ukraine war, and China Daily, which is owned by the Chinese Communist Party’s Publicity Department.
How do these types of sites affect the AI? Its views on the Holocaust, on vaccines, on the situation in Xinjiang – or the Ukraine war?
Source: svt