During testing, a recently released large language model (LLM) appeared to recognize that it was being evaluated and commented on the relevance of the information it was processing. This led to speculation that the response could be an example of metacognition: an understanding of one's own thought processes. The episode sparked discussion about the potential for self-awareness in AI, but the real story is the sheer power of the model, an example of the new capabilities that will arise as LLMs continue to grow.
Along with these new capabilities, costs have risen and are now reaching astronomical figures. Just as the semiconductor industry has consolidated around a handful of companies that can afford multi-billion-dollar state-of-the-art chip fabrication plants, the AI field may soon do the same: only the biggest tech giants and their partners can cover the development costs of modern foundation LLMs such as GPT-4 and Claude 3.
The cost of training these modern models, whose capabilities match and in some cases exceed human-level performance, is rising fast. Training costs for the latest models are approaching $200 million, threatening to change the landscape of the industry forever.
![](https://venturebeat.com/wp-content/uploads/2024/05/image3.jpg?resize=1430%2C1010&strip=all)
If this rapid performance increase continues, not only will AI capabilities advance rapidly, but costs will too. Anthropic is one of the leaders in building language models and chatbots and, at least as far as benchmark test results show, its flagship Claude 3 is arguably the current performance leader. Like GPT-4, it is considered a foundation model, pre-trained on diverse and extensive data to develop a broad understanding of language, concepts and patterns.
![](https://venturebeat.com/wp-content/uploads/2024/05/image2.jpg?resize=759%2C672&strip=all)
Dario Amodei, the company's co-founder and CEO, recently disclosed that the cost to train Claude 3 was approximately $100 million. He added that a model currently in training, expected to be introduced in late 2024 or early 2025, will cost "close to $1 billion."
![](https://venturebeat.com/wp-content/uploads/2024/05/image4.jpg?resize=1270%2C847&strip=all)
To understand the reasons behind these cost increases, note that the models are becoming increasingly complex. Each new generation adds parameters that enable more sophisticated understanding and task execution, requires more training data, and consumes greater amounts of computing resources. Amodei expects training the largest models to cost $5 billion to $10 billion by 2025 or 2026, which will prevent all but the largest corporations and their partners from building these foundation LLMs.
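To see why parameter counts and training data drive costs, a rough sketch helps. The snippet below uses the common heuristic that training a dense transformer takes roughly 6 × N × D floating-point operations (N = parameters, D = training tokens) and a hypothetical blended price per FLOP of rented accelerator time. The model sizes and the price constant are illustrative assumptions, not official figures for any named model; real costs depend heavily on hardware utilization, parallelism and restart overhead.

```python
def training_cost_usd(params: float, tokens: float,
                      flops_per_dollar: float = 2e17) -> float:
    """Estimate training cost from the 6 * N * D compute heuristic.

    flops_per_dollar is an assumed blended rate for rented accelerator
    time; adjust it for your own hardware and utilization figures.
    """
    total_flops = 6 * params * tokens
    return total_flops / flops_per_dollar

# Hypothetical model scales (assumptions for illustration only):
small = training_cost_usd(params=4e9, tokens=3e12)    # ~4B-param SLM
large = training_cost_usd(params=1e12, tokens=10e12)  # ~1T-param LLM

print(f"small model: ${small / 1e6:,.1f}M")
print(f"large model: ${large / 1e6:,.1f}M")
```

Under these assumptions the small model lands in the hundreds of thousands of dollars while the trillion-parameter run lands in the hundreds of millions, which is why each generational jump in parameters and data multiplies the bill rather than adding to it.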
AI follows the semiconductor industry
In this way, the AI industry is following the same path as the semiconductor industry. In the late 20th century, most semiconductor companies designed and manufactured their own chips. As the industry followed Moore's Law, which describes the exponential rate of improvement in chip performance, the cost of the equipment and fabrication plants needed to produce each new generation of semiconductors rose proportionately.
Because of this, many companies ultimately chose to outsource manufacturing. AMD is a good example: the company long manufactured its core semiconductors in-house, but in 2008 it decided to spin off its manufacturing plants, also called fabs, to reduce costs.
Due to the capital costs required, only three semiconductor companies currently build state-of-the-art fabs using the latest process node technologies: TSMC, Intel and Samsung. TSMC recently said it will cost about $20 billion to build a new fab to produce cutting-edge semiconductors. Many companies, including Apple, Nvidia, Qualcomm and AMD, outsource their manufacturing to these fabs.
Impact on AI – LLM and SLM
The impact of these cost increases will vary across the AI landscape, since not all applications require the latest, most powerful LLM. The same is true of semiconductors. For example, a computer's central processing unit (CPU) is often made with the latest high-end semiconductor technology, but it is surrounded by other chips, for memory and networking, that do not need to be built with the fastest or most powerful process.
The AI parallel here is that a number of smaller alternatives to the largest LLMs have emerged, such as Mistral and Llama 3, with far fewer parameters than the more than one trillion believed to be in GPT-4. Microsoft recently released its own small language model (SLM), Phi-3, which, according to a report in The Verge, contains 3.8 billion parameters and was trained on a dataset that is small compared with those of LLMs like GPT-4.
Their smaller size and training datasets help keep costs down, even if they do not offer the same level of performance as the largest models. In this way, SLMs are much like the supporting chips that surround a CPU.
Nevertheless, smaller models may be well suited to certain applications, especially those that do not require broad knowledge across multiple data domains. For example, an SLM can be fine-tuned on company-specific data and terminology to respond accurately and individually to customer inquiries. Alternatively, it can be trained on data from a specific industry or market segment and used to generate customized research reports and answers to queries.
As Rowan Curran, senior AI analyst at Forrester Research, said recently about the different language model options: "You don't always need a sports car. Sometimes you need a minivan or a pickup truck. There's not going to be one broad class of models that everyone uses for all use cases."
Fewer players means more risk
Just as rising costs have historically limited the number of companies that can manufacture high-end semiconductors, similar economic pressures are currently shaping the landscape for large-scale language model development. These rising costs threaten to limit AI innovation to a few powerful companies, inhibiting a broader range of creative solutions and reducing diversity in the field. High barriers to entry can prevent startups and small businesses from contributing to AI development, reducing the scope of ideas and applications.
To counter this trend, the industry needs to support smaller, specialized language models that provide important, efficient functionality for a variety of niche applications, as well as serving as essential components of broader systems. Promoting open-source projects and collaboration is essential to democratizing AI development and allowing a wider range of participants to influence this evolving technology. By fostering an inclusive environment now, we can ensure that the future of AI is characterized by broad access and equitable opportunities to innovate, maximizing benefits across the global community.
Gary Grossman is vice president of technology practice at Edelman and global leader of the Edelman AI Center of Excellence.
DataDecisionMakers
Welcome to the VentureBeat community!
DataDecisionMakers is a place where experts, including technologists who work with data, can share data-related insights and innovations.
If you want to read about cutting-edge ideas, updates, best practices, and the future of data and data technology, join DataDecisionMakers.
You might even consider contributing an article of your own!