November 7, 2023
By Bob O'Donnell
As someone who’s researched, written about, and closely tracked the evolution of generative AI (GenAI) and how it’s being deployed in real-world business environments, it never ceases to amaze me how quickly the landscape around us is changing. Ideas and concepts that seemed years away just a few months ago—such as the ability to run foundation models directly on client devices—are already here. At the same time, some of our early expectations around how the technology might evolve and be deployed are shifting as well—and the implications could be big.
In the case of basic technological development and deployment of GenAI, for example, there’s been a growing recognition that the two-step process of model training and model inferencing isn’t happening in the way we were led to believe. In particular, it turns out that only a handful of companies are building their own foundation models and training them from scratch. Instead, the vast majority of work being done is the customization of existing models.
While some might argue that the difference between the training and customization of things like large language models (LLMs) is one of semantics, in truth the distinction carries much bigger implications. For one, this trend highlights the fact that only the largest companies have the resources and money not only to build these models from scratch but also to maintain and evolve them. It is companies like Microsoft, Google, Amazon, Meta, IBM, Salesforce—along with the companies they’re choosing to invest in and partner with, such as OpenAI, Anthropic, etc.—that are doing the majority of the model creation work. Sure, there are plenty of startups and other smaller companies toiling away at creating their own foundation models, but there are increasing questions about how viable those types of business models are in the long run. In other words, the market is increasingly looking like yet another case of big tech companies getting bigger.
The reasons for this go beyond the typical factors of skill set availability, experience with the technology, and trust in big brand names. Indeed, because of the extensive reach and influence that GenAI tools are already starting to have (and which are predicted to expand even further), there are increasing concerns about legal issues and related factors. To put it simply, if large organizations are going to start depending on a tool that will likely have a profound impact on their business, they need to know that there’s a big company behind that tool that they can place the blame on in case something goes wrong. This is very different from many other new technology products that were often brought into organizations via startups and other small companies. The reach that GenAI is expected to have is simply too deep into an organization to be entrusted to anyone but a large, well-established tech company.
And yet, despite this concern, one of the other surprising developments in the world of GenAI has been the rapid adoption and usage of open-source models from places like Hugging Face. Both tech suppliers and businesses are partnering with Hugging Face at an incredibly rapid pace because of the speed at which new innovations are being introduced into the open models that they house.
So, how does one reconcile these seemingly incongruous, incompatible developments? It turns out that many of the models on Hugging Face are not entirely new ones but instead are customizations of existing models. So, for example, you can find things that leverage something like Meta’s open-source and increasingly popular Llama 2 model as a baseline, but are then adapted to a particular use case. As a result, businesses can feel comfortable using something that stems from a large tech company but offers the unique value that open-source developers have added. It’s one of many examples of the new opportunities and benefits enabled by separating the “engine” from the application, which is exactly what GenAI is now allowing developers to do.
From a market perspective, this means that the largest tech organizations will likely battle it out to produce the best “engines” for GenAI, but other companies and open-source developers can then leverage those engines for their own work. The implications of this, in turn, are likely to be large when it comes to things like pricing, packaging, licensing, business models, and the money-making side of GenAI. At this early stage, it’s unclear exactly what those implications will be. One likely development, however, is the separation of these core foundation model engines and the applications or model customizations that sit on top of them when it comes to creating products—certainly something worth watching.
Interestingly, this separation of models from applications might also impact how foundation models run directly on devices. One of the challenges of this exercise is that foundation models require a great deal of memory to function efficiently. Also, many people believe that client devices are going to need to run multiple foundation models simultaneously in order to perform all the various tasks that GenAI is expected to enable. The problem is, while PC and smartphone memory specs have certainly been on the rise over the last few years, it’s still going to be challenging to load multiple foundation models into memory at the same time on a client device. One possible solution is to select a single foundation model that ends up powering multiple independent applications. If this proves to be the case, it raises interesting questions about partnerships between device makers and foundation model suppliers and the ability to differentiate amongst them.
In addition to shifts in model training, there have been some intriguing developments in the world of inference. In particular, rapidly growing technologies like RAG (Retrieval-Augmented Generation) provide a powerful way to customize models leveraging an organization’s own data. Basically, the way RAG works is that it provides a mechanism to perform a typical query to an LLM, but the answer is generated from an organization’s own cache of original content. Putting it another way, RAG leverages the skills a fully trained model has learned to determine which content to retrieve. It then builds its response by combining the model’s own logic and basic language understanding with the unique material of the organization running the tool.
The beauty of this approach is twofold. First, it offers a significantly easier and less resource-intensive way of customizing a model. Second, it simultaneously reduces the potential for hallucinations and other content problems by generating its response from the custom data set only, and not the much wider set of content used to first build and train the model. As a result, the RAG approach is being quickly adopted by many organizations and looks to be a key enabler for future developments. What’s also interesting is that it changes the nature of how inferencing is done and shifts the focus of where the computing resources are required from the cloud to the data center and/or client devices.
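To make the RAG flow described above concrete, here is a minimal sketch of the retrieve-then-prompt pattern. It is purely illustrative: production systems use embedding models and vector databases for retrieval, while this toy version uses simple word overlap so the example is self-contained, and the document snippets are invented for the demo.

```python
# Toy sketch of Retrieval-Augmented Generation (RAG):
# 1) retrieve the most relevant passages from an organization's own documents,
# 2) build a prompt that instructs the LLM to answer only from that context.
# Real systems replace the word-overlap scorer with embeddings + a vector store.

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive word overlap with the query (stand-in for
    semantic search) and return the top_k matches."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Combine the retrieved passages and the user's question into a
    grounded prompt for the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return (
        "Answer using ONLY the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

# Hypothetical internal documents standing in for an organization's data.
docs = [
    "Our Q3 revenue grew 12% year over year.",
    "The employee handbook covers remote work policy.",
    "Server maintenance is scheduled for Friday night.",
]

prompt = build_rag_prompt("When is server maintenance scheduled?", docs)
# The prompt is then sent to the LLM, which composes its answer from the
# retrieved passages rather than from its full training corpus.
```

Because the model is constrained to the retrieved context, this is also where the reduced-hallucination benefit comes from: the response is anchored to the organization's own material rather than the model's broader training data.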
Of course, given the rapid evolution of the GenAI world, it’s certainly possible that much of what I’ve argued here may be irrelevant or a moot point by the middle of next year. Still, it seems clear that important shifts are already occurring, and it’s going to be important for industry players to start shifting their messaging around those changes. Switching from the focus on training and inferencing of models to one that highlights model customization, for example, seems overdue based on the realities of today’s marketplace. Along similar lines, providing more information around technologies like RAG and their potential influence on the inferencing process also seems critical to help educate the market.
There’s no longer much doubt about the impact that GenAI is expected to make on businesses of all sizes. The path to reach that level of impact and the pace at which it will be achieved, however, are still very undefined. In that light, any efforts that the tech industry can make to better educate people about how GenAI is evolving—including through better, more refined messaging—are going to be extremely important. The process won’t be easy, but let’s hope more companies are willing to take on the challenge.
Here's a link to the original column: https://www.techspot.com/news/100755-opinion-rapidly-evolving-state-generative-ai.html Bob O’Donnell is the president and chief analyst of TECHnalysis Research, LLC, a market research firm that provides strategic consulting and market research services to the technology industry and professional financial community. You can follow him on LinkedIn at Bob O’Donnell or on Twitter @bobodtech.