“By 2024, 60% of the data used for the development of AI and analytics projects will be synthetically generated.” This is a Gartner prediction that you will find in almost every article, presentation, or press release related to synthetic data.
We repeat this quote here despite its ubiquity because it says a lot about the total addressable market for synthetic data.
Let’s unpack it. First, describing synthetic data as “synthetically generated” may seem tautological, but it is also quite clear: we’re talking about data that is artificial, or fake, created rather than collected in the real world.
Then there is the core of the prediction: that synthetic data will be used in the development of most AI and analytics projects. Since such projects are on the rise, the corollary is that the synthetic data market will grow too.
Last but not least is the time horizon. In startup land, 2024 is practically tomorrow, and Gartner already has a longer-term prediction: some of its analysts published research titled “Forget Your Real Data: Synthetic Data Is the Future of AI”.
“The future of AI” is the kind of promise investors like to hear, so it’s no surprise that the checks have been flowing to synthetic data startups.
In 2022 alone, MOSTLY AI raised a $25 million Series B round led by Molten Ventures; Datagen landed a $50 million Series B led by Scale Venture Partners; and Synthesis AI pocketed a $17 million Series A.
Synthetic data startups that have raised significant amounts of funding already serve a wide range of sectors, from banking and healthcare to transportation and retail. But they expect their use cases to keep expanding, both into new sectors and within those where synthetic data is already commonplace.
To understand what’s happening, but also what will happen if synthetic data becomes more widely adopted, we spoke with several CEOs and venture capitalists in recent months. We learned about the two main categories of synthetic data companies, what industries they target, how to size the market, and more.
The tip of the iceberg
Quiet Capital founding partner Astasia Myers is one of the investors bullish on synthetic data and its applications. She declined to say whether she has invested in this space, but said “there’s a lot to be excited about in the world of synthetic data.”
Why the excitement? “Because it gives teams faster access to data securely at a lower cost,” she told TechCrunch.
“We can simply say that the synthetic data TAM and the data TAM will converge.” Ofir Zuk (Chakon)
Access to large amounts of data has become critical for machine learning teams, and real data is often not up to the task, for a variety of reasons. This is the gap that synthetic data startups hope to fill.
These startups focus on two main categories: structured data and unstructured data. The former refers to the kind of datasets found in tables and spreadsheets, while the latter covers what we might call media files, such as audio, text, and visual data.
“It makes sense to distinguish between structured and unstructured synthetic data companies,” Myers said, “because the type of synthetic data applies to different use cases and therefore different buyers.”