The industry is moving toward deploying smaller, more specialized, and therefore more efficient AI models. This mirrors a transformation we have already witnessed in the world of hardware, where general-purpose processors gave way to graphics processing units (GPUs), tensor processing units (TPUs), and other hardware accelerators built for more efficient computing.
There is a simple explanation in both cases, and it comes down to physics.
CPU tradeoffs
The CPU was built as a general-purpose computing engine, designed to perform arbitrary processing tasks: sorting data, performing calculations, controlling external devices. CPUs handle a wide range of memory access patterns, computational operations, and control flow.
However, this generality comes at a price. CPU hardware must support that wide range of tasks and constantly decide what the processor should do at any given moment. This requires more silicon for the circuitry, more energy to power it, and, of course, more time to perform those operations.
This trade-off, while providing versatility, inherently reduces efficiency.
This directly explains why specialized computing has increasingly become the norm over the past 10 to 15 years.
GPUs, TPUs, NPUs, oh my
Today it is hard to find a conversation about AI that does not mention GPUs, TPUs, NPUs, or the various other forms of AI hardware engines.
These specialized engines are, notably, far less generalized: they perform fewer tasks than a CPU, but because they are less versatile, they are much more efficient. They dedicate more of their transistors and energy to actual computation and data access, and less to supporting generality (and the many decisions about what to compute or access at any given time).
Because they are much simpler and more economical, a system can run more of these computing engines in parallel and perform more operations per unit of time and unit of energy.
A parallel shift in large language models
A parallel evolution is unfolding in the world of large language models (LLMs).
Like CPUs, popular models such as GPT-4 excel because of their generality, performing surprisingly complex tasks. But that generality always comes at a cost: an enormous number of parameters (rumored to be in the trillions across the full ensemble of models), along with the corresponding compute and memory-access costs of evaluating all the operations required for inference.
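A rough back-of-envelope calculation makes the cost gap concrete. This sketch uses the common rule of thumb that a dense decoder-style model spends roughly 2 × (number of parameters) floating-point operations per generated token; both that rule and the parameter counts below are illustrative assumptions, not official figures for any particular model.

```python
# Back-of-envelope inference cost comparison.
# Rule of thumb (assumption): a dense transformer spends roughly
# 2 * num_parameters FLOPs to generate each token.

def flops_per_token(num_parameters: float) -> float:
    """Approximate FLOPs to generate one token with a dense model."""
    return 2.0 * num_parameters

SMALL_MODEL_PARAMS = 7e9   # e.g., a 7B-parameter specialized model
LARGE_MODEL_PARAMS = 1e12  # a hypothetical trillion-parameter general model

ratio = flops_per_token(LARGE_MODEL_PARAMS) / flops_per_token(SMALL_MODEL_PARAMS)
print(f"Small model: {flops_per_token(SMALL_MODEL_PARAMS):.1e} FLOPs/token")
print(f"Large model: {flops_per_token(LARGE_MODEL_PARAMS):.1e} FLOPs/token")
print(f"Large model costs roughly {ratio:.0f}x more per generated token")
```

Under these assumptions, the trillion-parameter model is on the order of 140x more expensive per token than the 7B model, which is the whole economic argument for specialization.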
This has led to specialized models such as CodeLlama, which can perform coding tasks with comparable (and potentially even higher) accuracy at a much lower cost. As another example, Llama-2-7B performs typical language-manipulation tasks such as entity extraction well, again at a much lower cost. Mistral and Zephyr are other capable small models.
This trend mirrors the shift from sole reliance on CPUs to a hybrid approach that incorporates specialized computational engines such as GPUs into modern systems. GPUs excel at tasks that require parallel processing of simple operations, such as AI, simulation, and graphics rendering, which form the bulk of the computing requirements in those domains.
Simpler operations require fewer electrons
In the LLM world, the future lies in deploying many simpler models for most AI tasks, reserving larger, more resource-intensive models for the tasks that truly require their capabilities. The good news is that many enterprise applications, such as processing unstructured data, text classification, and summarization, can all run on smaller, specialized models.
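One way this hybrid approach plays out in practice is a lightweight router that sends routine tasks to a small specialized model and escalates everything else to a large general model. A minimal sketch follows; the task categories and model names are illustrative assumptions, not real deployments.

```python
# Minimal sketch of routing work between small specialized models and a
# large general model. Task categories and model names are hypothetical.

# Tasks the article identifies as well suited to small models:
# classification, summarization, and extraction over unstructured data.
SMALL_MODEL_TASKS = {"entity_extraction", "classification", "summarization"}

def route(task_type: str) -> str:
    """Pick the cheapest model believed capable of handling the task."""
    if task_type in SMALL_MODEL_TASKS:
        return "small-7b-specialized"  # hypothetical small model
    return "large-general"             # fall back to the big general model

print(route("summarization"))          # routine task -> small model
print(route("open_ended_reasoning"))   # harder task -> large model
```

In a real system the routing decision might itself be learned, or based on confidence scores from the small model, but even a static table like this captures the cost-saving structure the article describes.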
The underlying principle is simple: simpler operations require fewer electrons and are therefore more energy efficient. This is not just a technical choice; it is a mandate dictated by fundamental physics. The future of AI, then, is not about building ever-larger general models, but about embracing the power of specialization for sustainable, scalable, and efficient AI solutions.
Luis Ceze is CEO of OctoML.
DataDecisionMakers
Welcome to the VentureBeat community!
DataDecisionMakers is a place where experts, including technologists who work with data, can share data-related insights and innovations.
If you want to read about cutting-edge ideas, updates, best practices, and the future of data and data technology, join DataDecisionMakers.
You might even consider contributing an article of your own!