Repost • @scientific_american As AI-generated content fills the Internet, it's corrupting the training data for models to come. What happens when AI eats itself?
🔗 Link in bio to learn more
🎤 @sophiebushwick
✍️ Emily Harwitz (@em_witz), Rahul Rao
🎞️ Chris Schodt (@cschodt), Kelso Harper (@kelsodune)
#AI #artificialintelligence #technology #chatgpt #gpt4 #machinelearning #deeplearning #languagemodels
mytreehousevision
2024-04-17 21:13:26
What happens when AI eats itself? Generative AI needs to be trained on a ton of data, and a lot of the time developers get that data from the internet. The problem is, as you may have noticed, the internet is currently filling up with AI-generated text and music and images and videos. When AI trains on AI-generated data, it can introduce errors that build up with each iteration. In a recent study, researchers started with a language model trained on human-produced content and then fed it AI-generated text over and over again. By the tenth iteration, when they asked it a question about English historical architecture, it spewed out nonsense about jackrabbits. This phenomenon is called model collapse. The study used a pretty small model, but researchers think that even large models like GPT-4 or Stable Diffusion could suffer from collapse when trained on just a small amount of AI-generated data. AI models struggle the most with data that's less common, and when models collapse they're more likely to lose this rare data that's further from the norm, so researchers fear this could make the problem of AI bias against marginalized groups even worse. One way to avoid model collapse could be to use only human-curated datasets, but in a world increasingly flooded with generated content, AI could end up being the snake that swallows its own tail.
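For the curious, here is a minimal toy sketch of that feedback loop. It does not use the study's actual setup (a full language model repeatedly fine-tuned on generated text); instead it assumes a simple word-level Markov chain as a stand-in model, retrained each generation only on its own output, so you can watch vocabulary diversity (a proxy for rare, far-from-the-norm data) shrink.

```python
# Toy illustration of model collapse: a word-level Markov chain is
# repeatedly retrained on its own generated text. The corpus and the
# Markov model are stand-ins, not the study's method.
import random
from collections import defaultdict

def train(words):
    """Build a first-order Markov model: word -> list of observed next words."""
    model = defaultdict(list)
    for a, b in zip(words, words[1:]):
        model[a].append(b)
    return model

def generate(model, length, seed_word):
    """Sample a synthetic corpus of `length` words from the model."""
    word = seed_word
    out = [word]
    for _ in range(length - 1):
        followers = model.get(word)
        word = random.choice(followers) if followers else random.choice(list(model))
        out.append(word)
    return out

random.seed(0)
human_corpus = ("the revival of english historical architecture drew on "
                "gothic parish churches and later perpendicular styles "
                "while rural builders favoured timber and local stone").split()

corpus = human_corpus
for generation in range(10):
    model = train(corpus)
    # Each generation is trained only on the previous generation's output.
    corpus = generate(model, length=len(human_corpus), seed_word=corpus[0])
    print(f"generation {generation + 1}: unique words = {len(set(corpus))}")
```

Because each new corpus can only contain words the previous model already produced, the count of unique words never grows and typically drops generation after generation, which is the same dynamic behind rare data disappearing first when real models collapse.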