China Focus: Data center engineers working silently behind booming AI services

Source: Xinhua

Editor: huaxia

2026-05-04 19:02:45

HOHHOT, May 4 (Xinhua) -- As artificial intelligence (AI) increasingly integrates into people's daily lives, many have grown accustomed to its smooth services: large models answering questions in seconds, map navigation apps precisely planning routes, and AI assistants quickly ordering takeout...

Behind these services, taken for granted by many, lies the silent work of data center engineers. At the Horinger data center cluster on the southern outskirts of Hohhot in north China's Inner Mongolia Autonomous Region, for example, they stand watch day and night, acting like stewards to ensure that computing power remains as stable and accessible as tap water and electricity.

The Horinger data center cluster is one of China's top ten data center clusters, bringing together roughly 50 large-scale data centers from Huawei, China Mobile, big state-owned banks, and others. Its total computing power capacity has exceeded 125,000 petaflops, with intelligent computing power for AI accounting for 96 percent.

Inside the China Mobile Hohhot Data Center, the machine rooms still hum deep into the night. Here also sits a large liquid-cooled intelligent computing center, where training and inference for various cutting-edge large models take place. The servers housed in the racks employ cold-plate liquid cooling technology, in which cold plates filled with a special coolant sit directly against the chips to remove heat, consuming less energy than traditional air cooling.

Engineer Hou Xiaowen walks past rows of server racks, her eyes scanning the indicator lights on the power supply cabinets and her ears tuned to the circulation sounds of the liquid cooling system. As an infrastructure network operation engineer, she is responsible for coordinating the maintenance of power supply, cooling, and liquid cooling facilities to ensure the safe operation of IT services.

"Servers running at high speeds generate enormous heat, and cooling is the bottom line for ensuring stable operations," Hou said.

During each inspection round, her step count easily exceeds 10,000. But in her view, the most exhausting part isn't the walking, but handling sudden failures.

Once, during a holiday, the data center experienced a power outage. Hou and her team immediately activated emergency protocols, ensuring stable output from generators and UPS power supplies while doing everything possible to maintain continuous cooling. In the end, the data center's operations were not disrupted.

"We ensure the safety of data center infrastructure. Once power and cooling issues arise, they affect the entire IT operation," Hou explained. In her work, 24/7 monitoring and duty shifts are the norm, and her phone remains on year-round.

If Hou safeguards the "heart and lungs" of the data center, then computing network operation engineer Zhao Yifan manages its "brain", or the computing servers that carry large model training and AI inference.

"Large model training relies entirely on these servers. My job is to prevent anything from going wrong," Zhao said.

An intelligent computing cluster is massive in scale, complex in connectivity, and exposed to numerous potential failure points, which makes root cause analysis far more difficult. Large model training demands extreme continuity, and a single shutdown can cause tremendous losses.

Zhao said that he and his team prioritize preventive maintenance, identifying hidden risks in advance and conducting repairs during service gaps to minimize losses. Thanks to their work, the China Mobile Hohhot Data Center can reliably train trillion-parameter large models and has set a record of 22 consecutive days of uninterrupted training.

In his 14 years with the company, Zhao has witnessed the leapfrog development of the computing industry. From single machine rooms to large-scale clusters, from traditional air cooling to liquid cooling, from computing-electricity coordination to green energy storage, data centers have become the core foundation of the digital economy.

"Our work constantly faces new challenges, but I don't resist them. You learn as you go, and once you've solved a complex failure, you've learned something new. It's very rewarding," Zhao added.

The computing network composed of multiple data centers also requires engineers' round-the-clock vigilance. At a computing resource monitoring and scheduling platform in Horinger, electronic screens in the hall display real-time data on computing loads, resource allocation, and cross-regional scheduling.

Lan Xiaoting leads the team of engineers working in this hall. The scheduling platform has been integrated with similar platforms in Beijing, Wuhu, Guizhou, Chongqing, and other locations across the country. Computing service providers can sell computing power over the network, while buyers can select computing services as easily as they shop online. After settlement, the intelligent scheduling system automatically matches the order with the most suitable computing supplier for delivery.
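As a rough illustration of what such matching could look like, the Python sketch below picks a supplier for an order. It is not the Horinger platform's actual logic, which the article does not describe: the `Offer` and `Order` fields, the supplier names, and the rule "cheapest offer that meets the capacity and latency requirements" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Offer:
    """A hypothetical listing from a computing supplier."""
    supplier: str
    available_pflops: float
    latency_ms: float
    price_per_pflop_hour: float


@dataclass(frozen=True)
class Order:
    """A hypothetical buyer request."""
    required_pflops: float
    max_latency_ms: float


def match_offer(order: Order, offers: Sequence[Offer]) -> Optional[Offer]:
    """Return the cheapest offer that satisfies the order, or None.

    Assumed rule: an offer is feasible if it has enough capacity and
    meets the latency ceiling; ties on price go to the lower latency.
    """
    feasible = [
        o for o in offers
        if o.available_pflops >= order.required_pflops
        and o.latency_ms <= order.max_latency_ms
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda o: (o.price_per_pflop_hour, o.latency_ms))


# Made-up sample data, for illustration only.
offers = [
    Offer("site-a", available_pflops=500, latency_ms=4.0, price_per_pflop_hour=1.2),
    Offer("site-b", available_pflops=800, latency_ms=4.5, price_per_pflop_hour=0.9),
    Offer("site-c", available_pflops=900, latency_ms=18.0, price_per_pflop_hour=0.5),
]
best = match_offer(Order(required_pflops=400, max_latency_ms=5.0), offers)
print(best.supplier if best else "no feasible supplier")
```

A real scheduler would likely weigh many more factors, such as energy cost, hardware type, and current load; the point here is only that matching can be framed as filtering by hard constraints and then ranking what remains.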

The precision of computing network scheduling exceeds imagination. The latency from Horinger to the Beijing-Tianjin-Hebei region must remain stable within 5 milliseconds. By comparison, a human blink takes 100 to 400 milliseconds.

To achieve this, Horinger has built 400-Gbps all-optical network links to Beijing, Hefei and other locations, with data reaching Beijing in 5 milliseconds and covering major cities nationwide in 20 milliseconds.
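To put the 5-millisecond and 20-millisecond figures in physical terms, the short Python sketch below estimates how far light can travel through optical fiber within a given latency budget. The refractive index of about 1.47 is a typical textbook value for silica fiber, not a number from the article, and the article does not say whether its latency figures are one-way or round-trip, so both cases are computed. Switching and routing delays are ignored, so the results are upper bounds.

```python
# Speed of light in vacuum (km/s).
C_VACUUM_KM_PER_S = 299_792.458
# Assumed typical refractive index of silica fiber (not from the article).
FIBER_REFRACTIVE_INDEX = 1.47


def fiber_reach_km(latency_ms: float, round_trip: bool = False) -> float:
    """Upper bound on fiber path length reachable within a latency budget.

    If round_trip is True, the budget covers the trip there and back,
    so only half of it is available for each direction.
    """
    speed_km_per_s = C_VACUUM_KM_PER_S / FIBER_REFRACTIVE_INDEX
    one_way_s = (latency_ms / 1000.0) / (2 if round_trip else 1)
    return speed_km_per_s * one_way_s


for budget_ms in (5, 20):
    one_way = fiber_reach_km(budget_ms)
    both_ways = fiber_reach_km(budget_ms, round_trip=True)
    print(f"{budget_ms} ms: up to {one_way:,.0f} km one-way, "
          f"{both_ways:,.0f} km if the budget is round-trip")
```

Under these assumptions, a 5-millisecond one-way budget corresponds to roughly 1,000 kilometers of fiber, which is why such targets depend on short, direct optical paths as well as fast equipment.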

"Unified computing supply and sales make computing power as convenient and accessible as water and electricity," Lan said.

Most people don't know these engineers exist, which is itself a sign that AI services are functioning normally. As Zhao puts it, as long as no one thinks of them, the system is stable and computing power is flowing smoothly. That is the value and sense of achievement they hold dear.