


冬日暖陽2024 · posted 2024-06-22 from Inner Mongolia



Precision cooling at the chip level

January 30, 2024 

By Sebastian Moss

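Translator's note

In the AIDC (AI data center) era, the energy appetite of computing infrastructure will only grow more pressing, and chip cooling will gradually shift from air to liquid. The arrival of the GB200 NVL72 will undoubtedly accelerate the maturing of the industry's liquid cooling ecosystem.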




JetCool's CEO on how thousands of tiny jets could be coming to a data center near you


As chip temperatures and rack densities rise, a plethora of companies have come forward to pitch their vision of the future.


The cooling demands of artificial intelligence and other high-density workloads have outstripped the capabilities of air systems, making some form of liquid cooling necessary.

“When you think about the landscape of liquid cooling, we see three different technical categories,” JetCool CEO Bernie Malouin explained.


“There’s single phase immersion, dipping in the oil. And that's interesting, but there are some limitations on chip power - for a long time, they've been stuck at 400W. There are some that are trying to get that a little bit better, but not as much as is needed.”



The second category is two-phase dielectrics: “We see those handling the higher [thermal design power (TDP)] processors, so those can get to 900-1,000W. Those are fit technologically for the future of compute, but they’re held back by the chemicals.”


Many two-phase solutions use per- and polyfluoroalkyl substances (PFAS), otherwise known as forever chemicals, which have been linked to human health risks and face restrictions in the US and Europe. Companies like ZutaCore have pledged to move away from PFAS by 2026, but the transition has proved slow.

“It’s a concern for a lot of our customers, they're coming to us instead because they're worried about the safety of those fluids,” Malouin said. “They're concerned about the continued availability of those fluids.”

And then there’s the third category: Direct Liquid Cooling (DLC) cold plates. “We’re one of them,” Malouin said. “There are others.”


DLC cold plates are one of the oldest forms of IT liquid cooling - simply shuttling cold liquid to metal plates mounted directly on the hottest components. They have long been used by the high-performance computing community, but JetCool believes that the concept is due for a refresh.
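At its simplest, a cold plate works by an energy balance: the coolant must carry away the chip's heat, so P = ṁ·c_p·ΔT, where ṁ is the mass flow rate, c_p the coolant's specific heat, and ΔT its temperature rise across the plate. The sketch below assumes plain water (c_p ≈ 4186 J/kg·K, density ≈ 1 kg/L); real loops often use water-glycol mixes with a lower specific heat, and the 10 K rise in the example is a hypothetical design choice. The function name is illustrative only.

```python
# Energy balance for a single-phase cold plate: P = m_dot * c_p * dT.
# Assumes plain water; water-glycol mixes have a lower specific heat.
WATER_CP_J_PER_KG_K = 4186.0   # specific heat of water near room temperature
WATER_DENSITY_KG_PER_L = 1.0   # approximation (about 0.98 kg/L at 60 C)


def required_flow_l_per_min(power_w: float, delta_t_k: float) -> float:
    """Volumetric coolant flow (L/min) needed to absorb power_w
    while the coolant warms by delta_t_k across the plate."""
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    mass_flow_kg_s = power_w / (WATER_CP_J_PER_KG_K * delta_t_k)
    return mass_flow_kg_s / WATER_DENSITY_KG_PER_L * 60.0  # L/s -> L/min


if __name__ == "__main__":
    # 900 W is the GPU load cited later in the article; the 10 K rise is hypothetical.
    print(f"{required_flow_l_per_min(900.0, 10.0):.2f} L/min")
```

For a 900 W device this works out to roughly 1.3 L/min; halving the allowed temperature rise doubles the required flow.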

Instead of passing fluid over a surface, JetCool's cooling jets aim fluid directly at the surface of the chip. “We have these arrays of a thousand tiny fluid jets, and we work directly with the major chipmakers - the Intels, AMDs, Nvidias - and we intelligently landscape these jets to align to where the heat sources are on a given processor.”

Rather than treating the entire chip as a whole with a singular cooling requirement, the microconvective cooling approach “tries to balance the disparate heat loads, disparate thermal requirements of certain parts of that chip stack,” Malouin said.

“When you start thinking about really integrated packages, the cores themselves might be able to run a little higher temperature, but then you might have high bandwidth memory (HBM) sections that aren't as power hungry, but have a lower temperature limit.”


Rather than forcing one design to serve both the high-power cores and the temperature-sensitive HBM, the jet layout lets each section be cooled at a slightly different rate. “This allows you to decouple those things and allows you to have precision cooling where you need,” Malouin said.
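To make zone-specific cooling concrete, the sketch below applies the convective heat transfer relation q = h·A·(T_max − T_fluid) to each region of a package and solves for the minimum heat transfer coefficient h each region needs. The zone names, powers, areas, and temperature limits are hypothetical illustration values, not JetCool or chipmaker data, and the model ignores conduction and spreading through the package.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    name: str
    power_w: float   # heat dissipated by this region of the package (W)
    area_m2: float   # area the jets cool (m^2)
    t_max_c: float   # maximum allowed surface temperature (C)


def required_htc(zone: Zone, t_fluid_c: float) -> float:
    """Minimum convective heat transfer coefficient h (W/m^2/K) such that
    q = h * A * (T_max - T_fluid) removes all of the zone's heat."""
    delta_t = zone.t_max_c - t_fluid_c
    if delta_t <= 0:
        raise ValueError(f"{zone.name}: coolant is not below the zone's limit")
    return zone.power_w / (zone.area_m2 * delta_t)


if __name__ == "__main__":
    # Hypothetical package: a hot compute die and a cooler but more sensitive HBM stack.
    zones = [
        Zone("compute cores", power_w=660.0, area_m2=6e-4, t_max_c=95.0),
        Zone("HBM stack", power_w=36.0, area_m2=1e-4, t_max_c=85.0),
    ]
    for z in zones:
        print(f"{z.name}: h >= {required_htc(z, t_fluid_c=40.0):,.0f} W/m^2/K")
```

With these placeholder numbers the cores need about 20,000 W/m²·K and the HBM about 8,000, so a jet pattern sized uniformly for the cores would spend flow (and pumping power) where it is not needed; varying jet density or flow per zone is one way to match each requirement.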


While Malouin believes that facility-level liquid cooling is the future of data centers, the company also has a self-contained system for those looking to dip their toe in cooler waters, with a Dell partnership focused on dual-socket deployments.


In the Smart Plate system, two small pumping modules circulate the fluid, and an air heat exchanger at the other end of the loop rejects the heat.

'When we add these pumps, you add some electrical draw, but you don't need the fans to be running nearly as hard, so it makes it 15-20 decibels quieter - and in net, we pull out about 100W per server after we've taken the penalty off of the pumps,' Malouin claimed.
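The net saving rests on the fan affinity laws: for a given fan, power scales roughly with the cube of speed, so even a moderate slowdown frees far more power than small pumps consume. The sketch below works through that arithmetic with hypothetical fan and pump wattages (not Dell or JetCool figures), and converts the quoted 15-20 dB reduction into a sound power ratio using the standard definition of the decibel. All function names are illustrative.

```python
def fan_power_w(rated_power_w: float, speed_fraction: float) -> float:
    """Fan affinity law: fan power scales with the cube of rotational speed."""
    return rated_power_w * speed_fraction ** 3


def net_saving_w(fan_rated_w: float, speed_before: float,
                 speed_after: float, pump_power_w: float) -> float:
    """Fan power saved by slowing the fans, minus the power the pumps add."""
    fan_saving = fan_power_w(fan_rated_w, speed_before) - fan_power_w(fan_rated_w, speed_after)
    return fan_saving - pump_power_w


def sound_power_ratio(db_reduction: float) -> float:
    """A drop of N dB in sound power level divides sound power by 10**(N / 10)."""
    return 10 ** (db_reduction / 10)


if __name__ == "__main__":
    # Hypothetical server: 200 W of fans at full speed slowed to half speed, 25 W of pumps.
    print(f"net saving: {net_saving_w(200.0, 1.0, 0.5, 25.0):.0f} W")
    for db in (15, 20):
        print(f"{db} dB quieter = {sound_power_ratio(db):.0f}x less sound power")
```

Because of the cubic term, halving fan speed cuts fan power to one eighth. In practice the result depends on how far the fans can actually slow down while still cooling the components that remain air-cooled.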


When you go to 10 racks or more, going to the facility level makes more sense, he said. Asked about the preferred inlet temperature, Malouin said the system was flexible but added, 'we actually really like the warm fluids.'

He said: 'We have facilities today that are feeding us inlet cooling temperatures that are 60°C (140°F) and over. And we're still cooling those devices under full load.' That's not common just yet, but Malouin believes that warmer waters will grow in popularity in places like Europe due to the heat reuse potential.
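The difficulty of warm inlet water shows up in a first-order thermal budget: T_die ≈ T_inlet + P·R_th, where R_th lumps together the die-to-coolant resistances (thermal interface, plate, and convection). The sketch below solves for the largest R_th that still meets a temperature limit. The 900 W and 1,500 W loads come from the article; the 95°C limit is a hypothetical placeholder, since real limits vary by part, and the model ignores the coolant's own temperature rise.

```python
def max_thermal_resistance(power_w: float, t_inlet_c: float, t_limit_c: float) -> float:
    """Largest die-to-coolant thermal resistance (K/W) that keeps
    T_die = T_inlet + P * R_th at or below t_limit_c."""
    headroom_k = t_limit_c - t_inlet_c
    if headroom_k <= 0:
        raise ValueError("inlet temperature leaves no thermal headroom")
    return headroom_k / power_w


if __name__ == "__main__":
    T_LIMIT_C = 95.0  # hypothetical die temperature limit
    for power in (900.0, 1500.0):
        for inlet in (30.0, 60.0):
            r = max_thermal_resistance(power, inlet, T_LIMIT_C)
            print(f"{power:.0f} W at {inlet:.0f} C inlet: R_th <= {r:.4f} K/W")
```

Under these assumptions, raising the inlet from 30°C to 60°C cuts the allowable resistance at 900 W from about 0.072 to 0.039 K/W, so the cold plate and interface must be nearly twice as good to hold full load with the warmer water.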

Back in the US, the company is part of the Department of Energy's COOLERCHIPS project, aimed at dramatically advancing data center cooling systems.

The focus of JetCool's $1m+ award is not just on the cooling potential, but a tantalizing secondary benefit: 'We have instances where we've made the silicon intrinsically between eight and 10 percent more electrically efficient,' Malouin claimed.


'That has nothing to do with the cooling system power usage, but with leakage.'

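Malouin doesn't mean leakage from the cooling system, but rather semiconductor leakage current: current that flows through transistors even when they are switched off (partly through quantum tunneling), which rises with temperature and can significantly erode a chip's power efficiency.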


In recent years, data center cooling practice has tended to assume that letting temperatures rise saves energy, because less energy is spent on cooling.

However, recent results, including research by Jon Summers at the Swedish research institute RISE, indicate that leakage currents in the silicon limit the benefits of running hotter.
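The trade-off can be illustrated with a toy model in which cooling overhead falls as the operating temperature rises, while leakage power grows roughly exponentially with temperature, so total power reaches a minimum instead of falling indefinitely. Every coefficient and function name below is a hypothetical placeholder chosen only to show the shape of the curve; none is fitted to RISE, JetCool, or COOLERCHIPS data. A simple grid search finds the lowest-overhead temperature.

```python
import math

T_REF_C = 40.0  # hypothetical reference operating temperature (C)


def cooling_power_w(t_c: float) -> float:
    """Hypothetical cooling overhead: shrinks as warmer operation needs less chilling."""
    return 60.0 * math.exp(-(t_c - T_REF_C) / 15.0)


def leakage_power_w(t_c: float) -> float:
    """Hypothetical leakage: doubles for every 20 K rise in chip temperature."""
    return 20.0 * 2.0 ** ((t_c - T_REF_C) / 20.0)


def total_overhead_w(t_c: float) -> float:
    """Cooling plus leakage; dynamic (switching) power is treated as constant and omitted."""
    return cooling_power_w(t_c) + leakage_power_w(t_c)


def best_temperature(lo_c: float = 40.0, hi_c: float = 100.0, step_c: float = 1.0) -> float:
    """Grid search for the temperature with the lowest combined overhead."""
    steps = int(round((hi_c - lo_c) / step_c))
    temps = [lo_c + i * step_c for i in range(steps + 1)]
    return min(temps, key=total_overhead_w)


if __name__ == "__main__":
    t_best = best_temperature()
    for t in (40.0, t_best, 100.0):
        print(f"{t:.0f} C: {total_overhead_w(t):.1f} W overhead")
```

With these placeholder curves the minimum falls in the high 50s °C. The only point is that the optimum lies between the extremes; where it actually lands depends entirely on measured cooling and leakage data.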


'A big part of our COOLERCHIPS endeavor is to substantiate that through more rigorous scientific evidence and extrapolate it to different environments to see where it holds or where it doesn't go,' Malouin said.


Looking even further ahead, Malouin sees an opportunity to get deeper into the silicon. “In some cases, it might actually be integrated as an embedded layer within the silicon, and then coupling that to a system that's outside that's doing some heat reuse. When we think about that holistically, we think that there's a real opportunity for a step change in data center efficiency.”

For now, the company says that it is able to support the 900W loads of the biggest Nvidia GPUs and is currently cooling undisclosed 'bespoke' chips that use 1,500W.

“Ultimately, you're really going to have to look at liquid cooling if you want to run not just the future of generative AI, but if you want to run the now of generative AI.”

DeepKnowledge (深知社)

Carlson Chen

DKV (DeepKnowledge Volunteer) program member

DKV (DeepKnowledge Volunteer) elite member

