Together AI adds enterprise-grade autoscaling, RBAC, observability dashboards, and self-healing node repair to GPU Clusters as company pursues $1B funding round.

Together AI Upgrades GPU Clusters With Autoscaling and Self-Healing Features

2026/03/11 01:34

Lawrence Jengar Mar 10, 2026 17:34

Together AI has rolled out a significant infrastructure upgrade to its GPU Clusters platform, adding autoscaling, role-based access control, full-stack observability, and self-healing node repair capabilities. The enhancements arrive as the AI cloud company pursues $1 billion in fresh funding, according to reports from earlier this month.

The timing isn't coincidental. Enterprise customers running distributed training workloads across hundreds of GPUs need more than raw compute—they need infrastructure that doesn't require babysitting.

Autoscaling Targets GPU Waste

The new autoscaling feature, powered by the Kubernetes Cluster Autoscaler, monitors for GPU-constrained workloads and automatically provisions or decommissions nodes based on real-time demand. For teams running variable inference workloads or bursty training jobs, this means no more paying for idle hardware during quiet periods.

Static GPU provisioning has been a persistent pain point. Organizations either overprovision (expensive) or underprovision (performance bottlenecks during demand spikes). Together's approach lets clusters expand during peak load and contract when demand subsides.
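
The expand-and-contract behavior comes down to simple arithmetic, which the Python sketch below illustrates. It is not the Kubernetes Cluster Autoscaler's actual algorithm, which reacts to unschedulable pods and underused nodes rather than a single GPU total. The function names, the 8-GPU node size, and the node bounds are all assumptions made for illustration.

```python
import math


def desired_node_count(requested_gpus: int, gpus_per_node: int,
                       min_nodes: int, max_nodes: int) -> int:
    """Return the node count needed to fit the requested GPUs, within bounds."""
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    needed = math.ceil(requested_gpus / gpus_per_node)
    return max(min_nodes, min(max_nodes, needed))


def scaling_action(current_nodes: int, target_nodes: int) -> str:
    """Describe the change needed to move from current to target size."""
    if target_nodes > current_nodes:
        return f"provision {target_nodes - current_nodes} node(s)"
    if target_nodes < current_nodes:
        return f"decommission {current_nodes - target_nodes} node(s)"
    return "no change"


# Hypothetical numbers: 8 GPUs per node, cluster bounded to 1..8 nodes.
peak = desired_node_count(requested_gpus=40, gpus_per_node=8, min_nodes=1, max_nodes=8)
quiet = desired_node_count(requested_gpus=4, gpus_per_node=8, min_nodes=1, max_nodes=8)
print(scaling_action(current_nodes=2, target_nodes=peak))      # demand spike
print(scaling_action(current_nodes=peak, target_nodes=quiet))  # quiet period
```

The minimum and maximum bounds are what keep an autoscaler from shrinking to zero or growing without limit. Where those limits sit is a cost decision for the team running the cluster.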

Self-Healing Addresses Hardware Reality

GPU hardware fails. In large fleets, it's not a question of if but when. For distributed training, a single unstable node can invalidate hours of compute time.

Together's solution: self-serve health checks that users can trigger before launching major training jobs. Tests range from basic DCGM diagnostics to multi-node NCCL and InfiniBand bandwidth tests. When a node does fail, a three-click self-repair process automatically cordons, drains, and recreates the node—bringing clusters back to healthy status within minutes rather than hours.

Acceptance tests now run automatically during provisioning. Clusters won't be marked ready until they pass.
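
The cordon, drain, and recreate sequence can be modeled in a few lines of Python. This is a toy sketch, not Together's implementation: the `Node` class and `repair` function are hypothetical, and in a real Kubernetes cluster the scheduler, not a round-robin loop, decides where evicted workloads land.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    name: str
    schedulable: bool = True
    workloads: list[str] = field(default_factory=list)
    healthy: bool = True


def repair(node: Node, fleet: list[Node]) -> Node:
    """Cordon, drain, and recreate a failed node; return its replacement."""
    # 1. Cordon: stop new workloads from landing on the node.
    node.schedulable = False
    # 2. Drain: move existing workloads to healthy, schedulable peers.
    peers = [n for n in fleet if n is not node and n.healthy and n.schedulable]
    if node.workloads and not peers:
        raise RuntimeError("no healthy node available to receive workloads")
    for i, workload in enumerate(node.workloads):
        peers[i % len(peers)].workloads.append(workload)
    node.workloads.clear()
    # 3. Recreate: swap the failed node for a fresh one with the same name.
    replacement = Node(name=node.name)
    fleet[fleet.index(node)] = replacement
    return replacement


# Hypothetical two-node fleet with one failed node.
failed = Node("gpu-node-0", workloads=["trainer-0"], healthy=False)
fleet = [failed, Node("gpu-node-1")]
replacement = repair(failed, fleet)
print(fleet[1].workloads)  # workloads moved off the failed node
```

The order matters: cordoning first ensures nothing new is scheduled onto the node while it is being emptied.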

Enterprise Access Controls

The RBAC implementation introduces "Projects" as isolation boundaries for teams. Two default roles split responsibilities cleanly: Admins get full control plane access for cluster creation and deletion, while Members can access GPU worker nodes and run workloads without touching infrastructure provisioning.

This matters for organizations where platform engineers need to lock down infrastructure while giving ML researchers freedom to experiment.
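
A minimal Python sketch of that split is shown below. Only the Admin and Member roles and the project boundary come from the announcement. The action names, the project names, and the assumption that Admins can also run workloads are illustrative.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    MEMBER = "member"


# Hypothetical action names; only the Admin/Member split comes from the article.
PERMISSIONS = {
    Role.ADMIN: {"cluster:create", "cluster:delete", "node:access", "workload:run"},
    Role.MEMBER: {"node:access", "workload:run"},
}


def is_allowed(memberships: dict[str, Role], project: str, action: str) -> bool:
    """Check whether a user may perform an action inside a project."""
    role = memberships.get(project)  # no membership means no access at all
    return role is not None and action in PERMISSIONS[role]


# A researcher who is a Member of one project only.
researcher = {"vision-team": Role.MEMBER}
print(is_allowed(researcher, "vision-team", "workload:run"))    # own project, allowed action
print(is_allowed(researcher, "vision-team", "cluster:delete"))  # own project, admin-only action
print(is_allowed(researcher, "speech-team", "workload:run"))    # different project
```

Keying the lookup on the project first is what makes the project an isolation boundary: a role in one project grants nothing in another.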

Observability Gets Native

Every GPU Cluster project now includes a dedicated Grafana instance with pre-built dashboards. Telemetry covers GPU utilization via DCGM metrics, InfiniBand and NIC-level networking data, storage I/O performance, and Kubernetes orchestration health. The feature is currently in private preview.
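
To give a sense of the raw data behind such dashboards, the Python sketch below parses GPU utilization readings in the Prometheus text format and flags underused GPUs. `DCGM_FI_DEV_GPU_UTIL` is the utilization metric exposed by NVIDIA's DCGM exporter. The sample readings, host names, and 10% idle threshold are made up, and nothing here reflects how Together's Grafana dashboards are actually built.

```python
import re

# Hypothetical scrape output in Prometheus text exposition format.
SAMPLE = '''\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-a"} 97
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="node-a"} 3
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="node-b"} 88
'''

LINE = re.compile(r'^DCGM_FI_DEV_GPU_UTIL\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def gpu_utilization(text: str) -> dict[tuple[str, str], float]:
    """Map (hostname, gpu index) to utilization percent."""
    readings = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip comments and unrelated metrics
        labels = dict(LABEL.findall(match.group("labels")))
        key = (labels.get("Hostname", "?"), labels.get("gpu", "?"))
        readings[key] = float(match.group("value"))
    return readings


def idle_gpus(readings: dict[tuple[str, str], float],
              threshold: float = 10.0) -> list[tuple[str, str]]:
    """Return GPUs whose utilization is below the threshold."""
    return sorted(key for key, util in readings.items() if util < threshold)


readings = gpu_utilization(SAMPLE)
print(idle_gpus(readings))  # GPUs that look underused in this sample
```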

Market Context

Together AI has been building momentum in the GPU-as-a-service space. The company introduced Instant GPU Clusters at NVIDIA GTC in March 2025 and launched self-service GPU infrastructure in September 2025. The platform supports NVIDIA Hopper (H100) and Blackwell (B200) GPUs, with Instant Clusters scaling up to 64 GPUs and Dedicated Clusters reaching 1,000 GPUs.

With a reported $7.5 billion valuation and a potential billion-dollar funding round in progress, Together is positioning itself as a serious alternative to hyperscaler GPU offerings, targeting teams that want bare-metal performance without the operational overhead of managing their own hardware.

The new features are available immediately to existing Together GPU Clusters customers.
