Synthetic AI Training Data Works Until It Spectacularly Doesn't
Synthetically real so it should synthetically work... right?

AI companies are using synthetic data created by computers because they’re running out of real data.
Where the underlying data is accurate and predictable, that’s great. For anything else, it’s totally not.
The "synthetic universe" built from that synthetic data is only ever going to be a small faux subset of the real universe, so I’m not sure if or how it’s going to help with any "corner cases" from the actual universe. Imagine you have data from simulated driving where the simulation doesn’t include the unexpected (and often DANGEROUS) corner cases. If whoever built the simulation already knew about those cases, the model wouldn’t even need additional training data, because the regular data would already contain all the unexpected stuff! But of course it doesn’t either.
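To make the "small faux subset" point concrete, here is a toy sketch, not a real driving simulator. The scenario names, their probabilities, and the helper functions sample_real and build_simulator are all made up for illustration. The only assumption that matters is that the simulator can only replay the kinds of scenarios it was built from.

```python
import random

# Hypothetical scenario names and probabilities, invented for illustration.
REAL_WORLD = {
    "clear highway": 0.70,
    "city traffic": 0.25,
    "heavy rain": 0.049,
    "mattress falls off truck": 0.0009,
    "deer jumps into lane": 0.0001,
}

def sample_real(n, rng):
    """Draw n scenarios from the (toy) real world, rare cases included."""
    names = list(REAL_WORLD)
    weights = list(REAL_WORLD.values())
    return rng.choices(names, weights=weights, k=n)

def build_simulator(observed):
    """Toy 'simulator' that can only replay scenario types it has seen."""
    known = sorted(set(observed))

    def simulate(n, rng):
        return rng.choices(known, k=n)

    return simulate

rng = random.Random(0)
observed = sample_real(200, rng)         # the limited real data we collected
simulate = build_simulator(observed)
synthetic = simulate(100_000, rng)       # as much synthetic data as we like

missing = set(REAL_WORLD) - set(synthetic)
print("scenario types the simulator knows:", sorted(set(observed)))
print("real scenario types never generated:", sorted(missing))
```

However large the synthetic set gets, it never contains a scenario type that wasn’t in the observed sample. With only 200 real observations, the rarest cases will usually be among the missing ones.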
Now move the topic to all the “unexpected questions” some user might type into any prompt. Ditto for sounds, images, and whatever the heck else is out there in the real world. Sure, use synthetic data, but it’s only good for very limited things.