So Far Away and Yet So Close: High Availability Meets Remote Management

Over the past week or so, Microsoft and some of its allies have been
pushing hard to promote server enhancements geared to Windows crash
prevention. In a pitch to network managers, NEC, for example, is
positioning its emerging lineup of fault tolerant (FT) servers as an
alternative to clustering in any situation where high system uptime is
deemed “mission-critical.”

About a year from now, NEC will introduce a three-CPU FT server to the
US market, according to Mike Mitsch, NEC’s director of enterprise
computing. Available since last fall in one- and two-processor
configurations, the NEC Express5800/ft320La server comes with remote
management features that are “unique in the industry,” Mitsch
contended during a presentation to Windows systems managers last
Thursday.

NEC is already selling a three-processor FT server in Japan. The US
edition will be a blade server. “We want to use the three-processor
architecture to differentiate our blade server from other (future)
blade servers,” he said.

NEC was also on hand earlier last week at Microsoft’s WinHEC show,
along with the two other players now populating the small universe of
Intel-based FT hardware. Microsoft’s announcement of intentions to
produce Bluetooth-enabled peripherals drew the lion’s share of
industry notice at WinHEC, a trade show targeting hardware developers.

During conference sessions at WinHEC, though, Microsoft officials
talked about some of the issues that have long plagued network
managers, including blue screens and reboots.

Mario Garzia of Microsoft released results of a survey reportedly
conducted by Microsoft across 4,000 servers operated by 20
customers. Only 65 percent of all Windows NT reboots recorded in the
survey were planned, leaving 35 percent unplanned. According to the
results, however, the proportion of unplanned reboots shrank to
3 percent on Windows XP servers.

“The perception is that software is the cause of most failures of
Windows systems. This is changing as the operating system is becoming
more reliable. Hardware is a more significant problem,” according
to Microsoft Program Manager Sandy Arthur, another speaker at WinHEC.

The Microsoft execs also acknowledged issues with the core OS, as well
as with application failures, third-party filter and device drivers,
hardware reconfigurations, antivirus software, and “operator
errors,” for instance.

In his talk to network managers on Thursday, Mitsch also pointed to
problems in some Intel-based hardware, naming “three-month”
design cycles as the big culprit. “There can be progressive
hardware degradation,” he said.

These days, many network managers think applications like e-mail, Web
access, and databases need 24/7 availability, he suggested.

Mitsch added that one of NEC’s customers used to try to keep Windows
NT from crashing by doing “preventive reboots” every night.

Also according to Mitsch, though, NEC’s servers manage to provide
high availability by combining elements such as a redundant
“lockstep” CPU architecture, “hardened” device drivers
from its partner Stratus, and “NEC’s own software” for local and
remote system management.

“Lockstepped” CPUs are synchronized to a single clock, and their
instruction streams are supposed to be the same. The 5800 also
features redundant I/O ports, memory, hard disk drives, and power
supply components, in pedestal and rackmount form factors.

NEC released the one-CPU version of the 5800 in the US last September,
and the two-CPU edition in November. Both these machines were also
rolled out in the Japanese market first.

“NEC and Stratus co-developed the fault tolerant server. Stratus
produced the overall design. NEC manufactures the CPUs,” said
Mitsch. The 5800 uses 800 MHz Pentium III processors.

Marathon produces a competing FT Intel-based server, the Endurance
6200L, which uses two CPUs in a four-server crossbar configuration.

For its part, Stratus sells the ftServer 3200, a machine almost
identical to the NEC Express5800. So far, Stratus has concentrated
mainly on existing FT markets such as finance, government, and
telecommunications. Outside of its long-time line-up of Unix FT
servers, Stratus also offers the ftServer 5200, a larger Intel-based
FT server which uses Pentium III Xeon processors.

NEC, though, is eyeing a number of industries that are new to the FT
concept, including retail and small business, for example. The chief
difference between the Express5800 and Stratus’ ftServer 3200 is on
the service side.

Stratus is bundling its own remote monitoring and management services
with ftServer 3200. The 3200 monitors its own operations, reporting
any exceptions to a customer assistance center. Service
representatives there use the company’s Stratus Service Network (SSN)
for remote trouble-shooting and management.

With its FT server, on the other hand, NEC is rolling in system
management software designed for use by other management service
out-sourcers, as well as by corporate Windows administrators.

For one thing, system management and maintenance can be performed over
one or both of the 5800’s dual redundant links, according to
Mitsch. From remote locations, administrators can upgrade firmware or
switch between CPUs, for example. SNMP-compliant network management
software is included, too.

NEC, Stratus and Marathon are all claiming “five nines” (99.999
percent) uptime for their products, in contrast to the 99.9 percent
uptime often attributed to clusters. Mitsch maintained that the
Express5800 copies memory between processors while the system is
running, for virtually uninterrupted service. According to an IDC
white paper, server downtime of more than 5 minutes per year (the
level associated with 99.999 percent availability) is unacceptable
to 80 percent of IT sites currently using clustering.
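
The gap between these availability figures is easier to see as
downtime per year. The short Python sketch below is an illustration
only, not something supplied by NEC or IDC; the function name and the
365-day year are assumptions made for the example.

```python
# Minutes in a 365-day year (leap years ignored for simplicity).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime implied by an availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return MINUTES_PER_YEAR * (1.0 - availability_percent / 100.0)


# Compare the figure often attributed to clusters with "five nines".
for label, pct in [("three nines", 99.9), ("five nines", 99.999)]:
    minutes = downtime_minutes_per_year(pct)
    print(f"{label} ({pct}%): {minutes:.2f} minutes per year")
```

By this arithmetic, 99.9 percent uptime allows roughly 8.76 hours of
downtime a year, while 99.999 percent allows a little over 5 minutes.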

Two NEC customers, though, are actually using the 5800 in conjunction
with clustering, Mitsch said.

Meanwhile, many software programs still aren’t enabled for clustering,
according to Mitsch. “Microsoft SharePoint won’t be ready for
clustering for quite some time,” he told the network managers.

NEC’s current FT product runs Microsoft Windows 2000 Advanced
Server. “We are not sure yet where we’ll stand with the .NET
servers, because Microsoft plans to offer five different versions of
the OS,” he pointed out.

According to Mitsch, the forthcoming three-CPU model will use a
“voted” model, in which the least capable CPU in the trio will
be voted out. CPUs with lower error rates will be more likely to
withstand the vote, as will “older” CPUs. The third CPU will
remain available to step in, though, in case one of the other CPUs
goes down.
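
Mitsch did not spell out how the vote works. As a rough illustration
only, the Python sketch below shows generic majority voting across
redundant outputs, the textbook idea behind triple-redundant designs.
It is not NEC’s algorithm: it ignores the error-rate and age weighting
he mentioned, and the function name and return shape are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def majority_vote(
    outputs: Sequence[Hashable],
) -> Tuple[Optional[Hashable], List[int]]:
    """Compare redundant outputs and report which units disagree.

    Returns (agreed_value, dissenter_indices). agreed_value is None when
    no value has a strict majority, in which case every unit is suspect.
    """
    if not outputs:
        raise ValueError("at least one output is required")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        # No strict majority: the vote cannot single out a faulty unit.
        return None, list(range(len(outputs)))
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


# Hypothetical example: the third CPU returns a different result.
agreed, faulty = majority_vote([42, 42, 41])
print(agreed, faulty)
```

With three units, a single faulty result is outvoted by the other two,
and the remaining pair can carry on while the dissenter is taken out
of service.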

With current pricing that starts at about $17,000, the 5800 is much
less costly than FT systems from Unix competitors. Still, some users
remain doubtful that any Windows system can provide the uptime they
need, based on their own prior experiences with NT.

“Is there any way I can just send up all my crash reports to
Microsoft by default?” asked one network manager.

“What’s the difference between now and the days of Windows NT?
Microsoft kept promising higher system availability all the time back
then, too,” another administrator observed.




See All Articles by Columnist
Jacqueline Emigh
