
VAST Data, CoreWeave, Broadcom, and the Hidden AI Infrastructure Stack

When people talk about the AI revolution, they talk about models. GPT-5, Claude 4, Gemini Ultra, Llama 4 — the frontier models get the headlines, the Twitter discourse, and the venture capital. But here's a truth that infrastructure engineers have known for years and investors are only now fully grasping: models are only as good as the infrastructure they run on.

Behind every ChatGPT response, every Midjourney image, every Claude analysis, there's a vast and mostly invisible infrastructure stack that makes it all possible. GPU clusters worth hundreds of millions of dollars. Storage architectures designed for workloads that didn't exist five years ago. Networking fabrics that move data at speeds that would have seemed absurd a decade ago. Power systems that consume as much electricity as small cities. Cooling systems that prevent billions of dollars in hardware from melting.

This is the picks-and-shovels layer of the AI gold rush. And while the model companies capture the public imagination, the infrastructure companies are capturing an increasing share of the value. Many of them are obscure. Some are private. A few are publicly traded and quietly delivering returns that rival or exceed the model companies. Understanding this layer is essential for anyone who wants to understand where AI is actually heading — because the infrastructure bottlenecks will define AI's next phase more than any model breakthrough.

VAST Data: Why AI Needs a New Storage Architecture

VAST Data is a company that most people outside the data infrastructure world have never heard of. Within that world, it's one of the most consequential companies of the decade. Founded in 2016 and valued at over $9 billion in its latest funding round, VAST has built a unified storage platform that is rapidly becoming the standard for AI data infrastructure.

The Storage Problem That AI Created

Traditional enterprise storage was built for two primary workloads: transactional databases (structured data, fast reads/writes, consistent performance) and file storage (documents, images, videos, large sequential reads). These workloads were well-understood, and storage architectures were optimized for them over decades.

AI workloads broke these assumptions. Training a large language model requires:

  • Massive sequential reads — ingesting terabytes to petabytes of training data, often stored across multiple formats (text files, databases, image archives, video repositories)
  • Random access at scale — during training, models need to randomly sample from the dataset, requiring storage that can handle millions of small random reads per second
  • High-throughput writes — checkpointing (saving model state during training) requires writing hundreds of gigabytes to storage quickly and reliably, because a training run that fails without a recent checkpoint can cost millions in wasted compute
  • Multi-format data access — the same data needs to be accessible as files (for data scientists), objects (for data pipelines), and database records (for metadata queries), often simultaneously
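To make the checkpointing requirement concrete, here is a back-of-envelope sketch. The model size, precision, and write bandwidth below are illustrative assumptions, not vendor figures, and real checkpoints also include optimizer state, which can multiply the payload several times over.

```python
# Back-of-envelope: how long one checkpoint write takes at a given
# sustained storage write bandwidth. All inputs are assumptions.

def checkpoint_seconds(params_billions: float,
                       bytes_per_param: int = 4,
                       write_gb_per_s: float = 100.0) -> float:
    """Seconds to write one weights-only checkpoint."""
    checkpoint_gb = params_billions * bytes_per_param  # 1e9 params * bytes = GB
    return checkpoint_gb / write_gb_per_s

# A hypothetical 70B-parameter model in fp32 at 100 GB/s sustained writes:
print(f"{checkpoint_seconds(70):.1f} s")  # 280 GB payload -> 2.8 s
```

The shorter this window, the more often you can checkpoint, and the less compute is lost when a run fails.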

Traditional storage architectures handled these requirements by using different systems for each workload — a file system for data access, an object store for data lakes, a database for metadata. This created data silos, duplication, and operational complexity that scaled linearly with the size of AI deployments.

VAST's Unified Architecture

VAST Data's insight was that AI workloads need a single storage system that handles all of these patterns simultaneously. Their platform — the VAST Data Platform — unifies file, object, and database access into a single namespace with consistent performance characteristics across all access patterns.

The technical architecture is built on several innovations:

  • Disaggregated Shared Everything (DASE). VAST separates storage media from storage processing, allowing both to scale independently. You can add more storage capacity without adding more processing power, and vice versa.
  • All-flash at scale. Unlike traditional tiered storage (fast SSD for hot data, slow HDD for cold data), VAST stores everything on flash memory. Their data reduction algorithms make this economically viable by achieving 5-10x data reduction ratios.
  • Global namespace. A single logical storage system that can span multiple data centers, multiple regions, and multiple cloud providers. Data appears in one place regardless of where it's physically stored.
  • Built-in data management. Data cataloging, versioning, access control, and lifecycle management are built into the storage layer rather than bolted on top.
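The all-flash economics rest on simple arithmetic. In the sketch below, the raw flash price is an assumed placeholder rather than a market quote, and the 5x ratio is taken from the low end of the range cited above.

```python
# Sketch: effective cost per logical TB once data reduction is applied.
# Raw flash price is an assumed placeholder, not a market quote.

def effective_cost_per_tb(raw_cost_per_tb: float, reduction_ratio: float) -> float:
    """Cost per logical terabyte stored after dedup/compression."""
    return raw_cost_per_tb / reduction_ratio

print(effective_cost_per_tb(80.0, 5.0))  # $80/TB raw flash at 5x -> 16.0
```

At a 5-10x reduction, flash's effective cost per logical terabyte can approach tiered-HDD territory while keeping flash access latency for every byte.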

Why VAST Matters for AI's Future

As AI models get larger and training datasets grow from terabytes to petabytes to exabytes, storage becomes a binding constraint. You can buy more GPUs. You can rent more cloud compute. But if your storage system can't feed data to those GPUs fast enough, they sit idle — burning electricity and money while producing nothing.

VAST's customer list reads like a who's who of AI: Meta, Microsoft, NVIDIA, and dozens of other companies building and deploying large AI systems. The company's growth rate — reportedly over 100% year-over-year revenue growth through 2025 — reflects the urgency of the storage problem in AI infrastructure.

CoreWeave: From Crypto Mining to AI's GPU Cloud

The story of CoreWeave is one of the most remarkable pivots in recent tech history. Founded in 2017 as a cryptocurrency mining operation, CoreWeave recognized that the GPU infrastructure they'd built for mining could be repurposed for a far larger market: AI compute. That pivot has turned a crypto mining company into a cloud infrastructure provider valued at approximately $35 billion.

The GPU Cloud Thesis

CoreWeave's founding insight was that the major cloud providers — AWS, Azure, and GCP — were not optimized for GPU-intensive workloads. These hyperscalers built their infrastructure for general-purpose computing: web applications, databases, API services. GPUs were available but not prioritized. Availability was inconsistent. Pricing was opaque. The software stack wasn't optimized for the specific needs of ML training and inference.

CoreWeave built a cloud infrastructure specifically designed for GPU workloads:

  • GPU-first architecture. Every design decision — from data center layout to networking to cooling — is optimized for dense GPU deployments. CoreWeave's data centers pack more GPU compute per square foot than any hyperscaler.
  • InfiniBand networking. AI training workloads require ultra-low-latency, ultra-high-bandwidth networking between GPUs. CoreWeave deploys InfiniBand interconnects throughout their infrastructure, providing the networking performance that multi-node training requires.
  • Kubernetes-native. CoreWeave's platform is built on Kubernetes, making it familiar to modern engineering teams and enabling sophisticated workload orchestration without proprietary lock-in.
  • Transparent pricing. Unlike hyperscalers that use complex pricing schemes with reserved instances, spot pricing, and committed use discounts, CoreWeave offers straightforward per-GPU-hour pricing.
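Flat per-GPU-hour pricing makes training budgets easy to reason about. The rate and run shape below are hypothetical, chosen purely to show the arithmetic, not actual CoreWeave prices.

```python
# Hypothetical training-run budget under flat per-GPU-hour pricing.

def run_cost_usd(num_gpus: int, hours: float, usd_per_gpu_hour: float) -> float:
    """Total cost of reserving num_gpus for the full run."""
    return num_gpus * hours * usd_per_gpu_hour

# An assumed 1,024-GPU, 30-day run at an assumed $2.50/GPU-hour:
print(f"${run_cost_usd(1024, 30 * 24, 2.50):,.0f}")  # $1,843,200
```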

The $35 Billion Valuation Question

CoreWeave's valuation has attracted both enthusiasm and skepticism. The bull case: AI compute demand is growing exponentially, CoreWeave has locked in long-term contracts with major AI companies (reportedly including Microsoft), and their GPU-optimized infrastructure gives them a genuine performance and efficiency advantage over hyperscalers for AI workloads.

The bear case: CoreWeave is a capital-intensive business that requires massive upfront investment in hardware that depreciates rapidly. Their primary supplier (NVIDIA) has enormous pricing power. And the hyperscalers are rapidly improving their own GPU offerings — AWS, Azure, and GCP have all made major investments in GPU-optimized infrastructure in 2025-2026.

The truth is probably somewhere in between. CoreWeave has carved out a real niche serving AI workloads that the hyperscalers handle poorly. Whether that niche remains large enough to justify a $35 billion valuation depends on how quickly the hyperscalers close the gap — and how quickly AI compute demand grows to create room for multiple winners.

Broadcom: The Networking and Custom Silicon Giant

Broadcom is the company that connects everything to everything. Their networking chips, custom silicon, and infrastructure software touch virtually every data center on earth. And the AI revolution has turned Broadcom from a steady infrastructure play into one of the best-performing semiconductor stocks of the past two years.

Networking: The AI Bottleneck No One Talks About

Here's a fact that doesn't get enough attention: at scale, AI training is often network-bound, not compute-bound. Modern AI training distributes computation across hundreds or thousands of GPUs. These GPUs need to communicate constantly — sharing intermediate results, synchronizing gradients, coordinating data loading. The speed of this communication directly determines the efficiency of the training run.

Broadcom's Tomahawk and Jericho switching ASICs are the backbone of data center networking. Their chips power the switches that connect GPU clusters, storage systems, and the broader data center network. As AI clusters grow larger — from hundreds to thousands to tens of thousands of GPUs — the networking requirements grow quadratically. More GPUs means more inter-GPU communication, which means more switching capacity, which means more Broadcom chips.
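The quadratic claim can be illustrated by counting distinct GPU pairs, the worst-case number of communication paths a fabric must serve. In practice, collectives such as ring or tree all-reduce generate far less traffic than all-to-all, so treat this as an upper-bound sketch rather than a traffic model.

```python
# Upper-bound sketch of fabric scaling: distinct GPU pairs grow as n*(n-1)/2.

def gpu_pairs(n: int) -> int:
    """Number of distinct GPU-to-GPU pairs in an n-GPU cluster."""
    return n * (n - 1) // 2

for n in (100, 1_000, 10_000):
    print(f"{n:>6} GPUs -> {gpu_pairs(n):,} pairs")
# 100 -> 4,950; 1,000 -> 499,500; 10,000 -> 49,995,000
```

Growing the cluster 100x multiplies the potential paths by roughly 10,000x, which is why switching capacity must scale far faster than GPU count.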

Custom Silicon: The ASIC Opportunity

Broadcom is also a major player in custom AI silicon. While NVIDIA dominates the GPU market for general-purpose AI compute, several of the largest AI companies are developing custom chips optimized for their specific workloads. Google has its TPUs (designed with Broadcom). Meta is developing its MTIA chips. Other major tech companies have their own custom silicon programs.

Broadcom's custom silicon division designs and manufactures these chips, leveraging their deep expertise in semiconductor design, packaging, and manufacturing relationships with TSMC. This business is growing rapidly as more companies recognize that custom silicon optimized for their specific AI workloads can deliver significant cost and performance advantages over general-purpose GPUs.

The VMware Acquisition Impact

Broadcom's $69 billion acquisition of VMware in 2023 added enterprise software to Broadcom's portfolio, making the company a bridge between hardware infrastructure and software infrastructure. VMware's virtualization technology is the foundation of most enterprise data centers, and Broadcom is integrating AI management capabilities into the VMware platform — enabling enterprises to deploy and manage AI workloads alongside traditional workloads in their existing infrastructure.

The acquisition has been controversial — Broadcom raised VMware prices significantly, driving some customers to competitors. But it has also positioned Broadcom as a one-stop shop for AI infrastructure: networking chips, custom silicon, and the software that manages it all.

The Supporting Cast: Companies That Complete the Stack

VAST Data, CoreWeave, and Broadcom are the headliners, but the AI infrastructure stack includes several other critical companies that deserve attention.

Arista Networks: Data Center Switching at Scale

Arista Networks builds the high-performance Ethernet switches that connect AI infrastructure. While InfiniBand (from NVIDIA/Mellanox) dominates within GPU clusters, Arista's switches handle the broader data center network — connecting GPU clusters to storage, to the internet, and to each other. As AI data centers grow, Arista's addressable market grows with them. The company has reported that AI-related orders now represent over 25% of their revenue.

Vertiv: Cooling and Power Infrastructure

Vertiv may be the most unsexy company in the AI stack — and one of the most essential. They build the cooling systems, power distribution, and thermal management infrastructure that keeps AI data centers running. A single NVIDIA H100 GPU generates roughly 700 watts of heat. A cluster of 10,000 H100s generates 7 megawatts of heat — enough to warm a small town. Without Vertiv's cooling systems, these clusters would overheat in minutes.
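The heat arithmetic above, spelled out. The 700-watt figure comes from the text; everything else is straightforward unit conversion.

```python
# GPU heat load for a cluster, using the article's 700 W per H100 figure.
# Excludes CPUs, networking, and cooling overhead, so real loads run higher.

def cluster_heat_mw(num_gpus: int, watts_per_gpu: float = 700.0) -> float:
    """Aggregate GPU heat output in megawatts."""
    return num_gpus * watts_per_gpu / 1_000_000

print(cluster_heat_mw(10_000))  # 7.0 MW, matching the figure in the text
```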

The shift toward liquid cooling for AI data centers has been a major growth driver for Vertiv. Traditional air cooling can't handle the heat density of modern GPU clusters, and Vertiv has become a leading provider of direct-to-chip liquid cooling solutions that are now standard in new AI data center builds.

Celestica: Server Assembly and Integration

Celestica is a contract electronics manufacturer that assembles and integrates the servers, networking equipment, and storage hardware that goes into AI data centers. While companies like NVIDIA design the chips and Dell or HPE sell the servers, Celestica does much of the actual manufacturing and assembly work. Their revenue has grown significantly as AI infrastructure buildout has accelerated. Think of them as the Foxconn of AI infrastructure — not a household name, but essential to the supply chain.

Why Infrastructure Companies Matter More as Models Scale

Here's the counter-intuitive insight that many investors and observers miss: as AI models get bigger and more capable, the value shifts from models to infrastructure.

In the early days of AI, models were the scarce resource. Only a handful of companies had the expertise and data to train frontier models. Infrastructure was abundant — you could rent GPUs from AWS, use off-the-shelf storage, and scale with standard networking.

Today, the situation is reversed. Model architectures are increasingly well-understood, and the gap between frontier and open-source models is shrinking. The scarce resources are now compute, storage, power, and networking. The companies that control these resources have pricing power, high barriers to entry, and growing demand. The model companies compete with each other on capabilities; the infrastructure companies sell to all of them.

This is the classic picks-and-shovels thesis, and it's playing out in AI exactly as it has in previous technology cycles. During the cloud computing boom, AWS (infrastructure) captured more value than the vast majority of applications built on top of it. During the mobile revolution, Apple and Qualcomm (hardware) captured more value than most app developers. The pattern repeats because infrastructure has natural monopoly characteristics — high capital requirements, economies of scale, and network effects — that create durable competitive advantages.

Understanding the AI infrastructure stack isn't just an academic exercise — it's essential for anyone making investment, career, or strategic decisions in tech. Whether you're wearing your TBPN hoodie while researching semiconductor stocks or following the TBPN daily show for infrastructure analysis, this layer of the stack deserves your attention.

The Infrastructure Bottlenecks That Will Define AI's Next Phase

Looking ahead, several infrastructure constraints will shape what AI can and can't do over the next 2-3 years.

Power and Energy

The most binding constraint on AI growth is electrical power. New AI data centers require hundreds of megawatts of reliable power — and the lead time to build new power generation and transmission capacity is measured in years, not months. We'll explore this in depth in our companion piece on AI and energy.

GPU Supply

Despite NVIDIA's aggressive capacity expansion, GPU supply remains constrained for frontier AI training. TSMC's advanced packaging capacity (CoWoS) is the bottleneck — they can't package chips as fast as the market demands them. This constraint is expected to ease gradually through 2027 but will continue to limit the pace of AI infrastructure buildout.

Networking Bandwidth

As GPU clusters scale from thousands to tens of thousands of chips, networking bandwidth becomes the limiting factor for training efficiency. The industry is moving from 400 Gbps to 800 Gbps and eventually 1.6 Tbps interconnects, but each generation requires new switches, new cables, and new optical transceivers — adding cost and complexity to every data center build.
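To see why each interconnect generation matters, consider the time to move a fixed payload across a single link. The 10 GB payload is a hypothetical number; real training overlaps communication with compute and spreads traffic across many links, so this only illustrates the per-link scaling.

```python
# Transfer time for a fixed payload at each interconnect generation.
# Note the units: payload in gigabytes, link speed in gigabits per second.

def transfer_ms(payload_gb: float, link_gbps: float) -> float:
    """Milliseconds to move payload_gb over one link of link_gbps."""
    return payload_gb * 8.0 / link_gbps * 1000.0

for gbps in (400, 800, 1600):
    print(f"{gbps:>5} Gbps: {transfer_ms(10.0, gbps):.0f} ms")
# Each generation halves the time on the wire for the same payload.
```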

Skilled Labor

Building and operating AI infrastructure requires specialized skills that are in extremely short supply. Data center engineers, network architects, cooling specialists, and AI infrastructure operators are among the most sought-after professionals in tech. The talent constraint is often overlooked but is very real — you can buy the hardware, but you need humans who know how to deploy and manage it at scale.

Physical Space

AI data centers require large, purpose-built facilities with adequate power, cooling, and network connectivity. Suitable locations are limited — you need proximity to power generation, access to cooling water (for liquid cooling), fiber connectivity, and favorable zoning. The competition for data center sites has driven up real estate prices in key markets and created planning backlogs in popular jurisdictions.

For anyone tracking the AI infrastructure stack — whether you're an investor, an engineer, or just someone who wants to understand the technology underneath the headlines — the picks-and-shovels layer is where the real action is. The TBPN team covers these companies regularly during their daily show, often digging into the infrastructure stories that mainstream tech media overlooks. Stay sharp and stay informed with a TBPN mug by your side during those deep-dive research sessions.

Frequently Asked Questions

What is the "picks and shovels" thesis in AI investing?

The picks-and-shovels thesis draws an analogy to the California Gold Rush, where the most reliable profits went not to gold miners but to the companies selling mining equipment. Applied to AI, it suggests that the companies providing AI infrastructure — GPUs, storage, networking, power, and cooling — may capture more durable value than the AI model companies themselves. Infrastructure companies sell to all AI builders, face less winner-take-all competition, benefit from high barriers to entry, and have recurring demand regardless of which specific AI models or applications succeed.

Why is VAST Data significant for AI infrastructure?

VAST Data has built a unified storage platform that addresses the unique data requirements of AI workloads. Traditional storage architectures separate file storage, object storage, and databases into different systems, creating data silos and operational complexity. VAST unifies these into a single platform that can handle the massive sequential reads, random access patterns, and high-throughput writes that AI training demands. Their customer list includes major AI companies, and their growth rate reflects the urgency of the storage challenge in AI infrastructure.

How did CoreWeave go from crypto mining to a $35 billion AI cloud company?

CoreWeave was founded in 2017 as a cryptocurrency mining operation, accumulating significant GPU infrastructure for mining. As cryptocurrency mining margins declined and AI compute demand surged, CoreWeave pivoted to providing GPU cloud services optimized for AI workloads. Their infrastructure — designed for dense GPU deployments with InfiniBand networking and Kubernetes-native orchestration — proved well-suited for AI training and inference. Long-term contracts with major AI companies (reportedly including Microsoft) and the explosive growth in AI compute demand drove their valuation to approximately $35 billion.

What role does Broadcom play in AI infrastructure?

Broadcom plays two critical roles. First, their networking ASICs (Tomahawk and Jericho product lines) power the switches that connect GPU clusters, storage systems, and broader data center networks. As AI clusters grow, networking requirements grow quadratically, driving demand for Broadcom's switching chips. Second, Broadcom designs custom AI silicon for major tech companies that want chips optimized for their specific workloads — an alternative to NVIDIA's general-purpose GPUs. Their $69 billion VMware acquisition adds enterprise infrastructure software to the mix, positioning Broadcom as a comprehensive AI infrastructure provider.