Mentors of Digital Innovation

CIO Two Cents Blog

The ‘CIO Two Cents’ blog features insights from Yvette Kanouff, partner at JC2 Ventures. Learn what’s on the mind of CIOs at this moment in time.


Intelligent Inference

VOLUME 1 - ISSUE 16 ~ February 20, 2025

 

In this edition of the “CIO Two Cents” newsletter, I explore how smaller language models, efficient tools, and inference chips are revolutionizing AI with cost savings, enhanced performance, and future-ready innovations.
— Yvette Kanouff, Partner at JC2 Ventures

The JC2 Ventures team:
(John J. Chambers, Shannon Pina, John T. Chambers, me, and Pankaj Patel)

 

Ready for the highlights? Dive into the key takeaways below!

(1)

Smaller language models (SLMs) can be just as powerful as larger ones, offering significant cost savings and enhanced efficiency.

(2)

Inference chips are revolutionizing AI, with the market projected to reach $90.6 billion by 2030.

(3)

AI-optimized clouds and open standards are crucial for leveraging future innovations while maintaining accuracy, security, and compliance.

 

Artificial Intelligence costs are consuming many of my conversations these days. As companies have adopted AI strategies, the training and run-time costs can be brutal. I’m curious what strategies and products will evolve to help with these costs, so I thought I’d mention some of my current favorites.

SLMs – Bigger isn’t always better, and that holds true for AI language models. More parameters in a model mean more compute, longer processing, and higher cost, yet this doesn’t always translate to improved accuracy. The key is to choose the most applicable parameters and train with the right data. Not every AI use case needs a large language model, especially when SLMs or custom language models trained on very specific data can save compute. Smaller models are showing great applicability and accuracy.

Model Optimization Tools – I am excited about some of the model optimizations that provide better efficiency and outcomes. Pruning unused or under-used weights and workflows, quantization to reduce the numeric precision of weights, and model compression through knowledge distillation (transferring select knowledge to smaller, more efficient models) are all effective strategies for making inference more cost-effective.
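To make the quantization idea concrete, here is a toy sketch of post-training weight quantization. The helper names are hypothetical and the routine is deliberately minimal; production toolchains (PyTorch, ONNX Runtime, and others) implement far more sophisticated versions of this.

```python
def quantize_int8(weights):
    """Map float weights to int8 values using symmetric scaling."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values."""
    return [q * scale for q in quantized]

weights = [0.82, -0.41, 0.05, -0.99]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Each int8 weight needs 1 byte instead of 4 (float32), cutting
# memory and bandwidth at the cost of a small rounding error.
```

The cost savings come from storage and memory bandwidth: each weight shrinks from four bytes to one, which is often what makes a model fit on cheaper inference hardware.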

Efficient Models, Model Merging – Creating efficiency by combining several models into one optimized model is a great trend. It gives developers the option to take the best of many models while saving on training time and resources.
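One simple form of model merging is averaging the weights of models that share an architecture. The sketch below uses plain dicts to stand in for parameter tensors; the function name and weighting scheme are illustrative assumptions, not a specific vendor's tool.

```python
def merge_models(models, coeffs=None):
    """Merge parameter dicts by (optionally weighted) averaging.

    All models must share the same parameter names/shapes.
    """
    if coeffs is None:
        coeffs = [1.0 / len(models)] * len(models)  # uniform average
    merged = {}
    for name in models[0]:
        merged[name] = sum(c * m[name] for c, m in zip(coeffs, models))
    return merged

model_a = {"w1": 0.8, "w2": -0.2}
model_b = {"w1": 0.4, "w2": 0.6}
merged = merge_models([model_a, model_b])
# merged holds the element-wise average of the two models' weights
```

The appeal is that the merged model costs nothing extra to train and serves as one deployment artifact instead of several, which is exactly the training-time and resource saving described above.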

Inference Chips – I have mentioned in the past that I am a fan of inference chips, which are separate from math-optimized training chips and provide huge energy savings and speed optimizations. AWS CEO Matt Garman estimates that inference constitutes about half of the work done on AI computing servers today. There is no lack of competition for Nvidia in the inference market: Groq, Cerebras, Positron AI, and Amazon are all touting groundbreaking results with inference chips. The AI inference chip market is projected to reach $90.6 billion by 2030.

AI-Optimized Clouds – Many cloud solutions optimize hardware and caching mechanisms for your models, with parallel-processing GPUs and optimized inference servers readily available.

Tool and Cloud Neutrality – Just as we saw when many companies tied their tooling to a single cloud provider, the same applies to AI. By sticking with open standards and vendor-neutral development, companies are better positioned to leverage future efficiency innovations across vendors.

I find it fascinating how conversations have shifted from LLMs to tooling and optimization. Getting optimal results with AI is a critical topic. With so many model choices, how can we achieve cost-effective results with AI while maintaining accuracy, security, and compliance? That said, let’s not lose focus on what’s most important – the outcomes that help our businesses, services, or products.

What are some of your favorite tools and optimization mechanisms?

 

Image of the Moment

 

Photo: iStock

 

Your Thoughts on Intelligent Inference

 
 

John Chambers | JC2 Ventures