Here is a decent diagram from the non-login side of Sunsolve that shows an exploded view of the E25k, with parts labeled.
Note that the IO Board is labeled “hsPCI Assembly.” The original diagram can be found on this page.
I am going to attempt to explain a little about system administration on Sun Microsystems' current high-end server offerings, the SunFire 15000 and the SunFire E25k. They are mostly the same hardware. The primary exception is the presence of UltraSPARC IV dual-core processors in the E25k, while the 15k uses the older, single-core UltraSPARC III chips.
You may have noticed the difference in nomenclature: 15000 versus E25k. This is primarily a hold-over from Sun’s previous generation high-end system, the Enterprise 10000. This system was typically called the “E10k,” though the badges on the front read “Enterprise 10000.” When the SunFire moniker came around, Sun dropped the “Enterprise” nomenclature completely. Still, most people couldn’t get out of the habit of calling the SunFire 15000 “E15k.” Going forward with the E25k, Sun as much as admitted defeat, and simply called the system “E25k.” It is much easier to say anyway.
I’m writing this article mostly from memory, though I may paste in a few sanitized examples from running hardware. As far as I know, I’m not revealing any information that you couldn’t otherwise get by going to http://docs.sun.com and looking up the appropriate documents.
First and foremost, I’d like to say that these machines are impressive. They are large. They are loud. They have a lot of blinking lights. They use a lot of power.
The E25k is newer and more capable, so I will list its specs rather than the 15k's. However, the 15k specs are not much different.
| Specification | E25k |
| --- | --- |
| Maximum Number of CPUs | 72 dual-core UltraSPARC IV+ |
| CPU Speed | 1.5 GHz |
| L2 Cache Per CPU | 2 MB |
| L3 Cache Per CPU | 32 MB |
| Maximum RAM | 1152 GB |
| Power Supplies | 12 ea 30 A Single-Phase 200 – 240 VAC |
| Input Power | Approx 26,000 W |
| Heat Output/Cooling Required | Approx 90,000 BTU/h |
| Weight | 2513.7 lb. |
… and that is just the system cabinet. That configuration contains no disks to speak of…
There are actually 4 small disks, but they are not directly usable by the system. I’ll explain later…
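The power and heat figures in the table are consistent with each other, which is easy to check. The sketch below assumes that essentially all electrical input ends up as heat and uses the standard conversion of about 3.412 BTU/h per watt; the function name is made up for illustration.

```python
# Standard conversion factor: 1 watt is about 3.412 BTU per hour.
WATTS_TO_BTU_PER_HOUR = 3.412142


def heat_output_btu_per_hour(input_watts: float) -> float:
    """Estimate the cooling load, assuming all input power becomes heat."""
    return input_watts * WATTS_TO_BTU_PER_HOUR


if __name__ == "__main__":
    # 26,000 W is the approximate input power from the spec table.
    print(round(heat_output_btu_per_hour(26_000)))
```

The result lands a little under 89,000 BTU/h, which agrees with the "approx 90,000 BTU/h" cooling figure.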
The proper term for the system cabinet is “platform,” though we often use the word “frame.” Sun uses “platform” in their documentation, so I will do the same.
The platform has 18 slots for “expander” boards. Nine in the front, and nine in the back. The brain of the box is the cross-bar switched centerplane. The centerplane is properly named the “Sun Fireplane Interconnect,” and is pretty amazing on its own. The peak bandwidth of the centerplane is 43 Gigabytes per second (gigabytes NOT gigabits).
The centerplane board was one of the biggest weaknesses of the older E10k design. E10k “system” boards, containing CPU, memory, and I/O, were plugged directly into the E10k centerplane. We had to change the centerplane in one E10k twice because of bent pins on the centerplane. Both times it was a major operation, and it cost us the one thing these machines are designed to prevent: downtime.
The E25k expander boards (EX) solve this problem by acting as a sort of middle board. The expander plugs into the centerplane. The system boards (SB), containing CPU and RAM, and the I/O boards (IO), containing 4 individual PCI cassettes each, plug into the expander boards.
Expander boards in the platform are numbered EX0 – EX17. System boards in the platform are numbered SB0 – SB17. I/O boards in the platform are numbered IO0 – IO17.
The way it works is that IO0 and SB0 both plug into EX0, which plugs into the centerplane. EX17 is directly behind SB0. The slots are numbered from right to left.
    FRONT                                      BACK
    <---------------------------------------->
    SB0                                    SB17
         ---> EX0 ---> CP <--- EX17 <---
    IO0                                    IO17
    <---------------------------------------->
Each SB holds 4 x dual-core CPUs and up to 64 GB of RAM.
Each IO board contains 2 independent I/O controllers. Each controller provides two PCI busses. So, there are 4 PCI slots on each IO board. Each PCI slot is in the form of a cassette that you can eject (individually) while the system is running (with some limitations).
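The per-board numbers above multiply out to the platform maximums in the spec table. Here is a minimal sketch of that arithmetic; the constant and function names are invented for illustration, and the per-board figures are the ones quoted in this article.

```python
# Per-board figures as quoted in the article.
EXPANDER_SLOTS = 18            # EX0 - EX17
CPUS_PER_SB = 4                # dual-core CPUs per system board
CORES_PER_CPU = 2
MAX_RAM_GB_PER_SB = 64
CONTROLLERS_PER_IO = 2         # I/O controllers per IO board
PCI_BUSES_PER_CONTROLLER = 2   # one PCI slot (cassette) per bus


def platform_totals(populated_slots: int = EXPANDER_SLOTS) -> dict:
    """Return the maximum resources for a given number of populated expander slots."""
    pci_per_io = CONTROLLERS_PER_IO * PCI_BUSES_PER_CONTROLLER
    cpus = populated_slots * CPUS_PER_SB
    return {
        "cpus": cpus,
        "cores": cpus * CORES_PER_CPU,
        "ram_gb": populated_slots * MAX_RAM_GB_PER_SB,
        "pci_slots": populated_slots * pci_per_io,
    }


if __name__ == "__main__":
    print(platform_totals())
```

With all 18 slots populated this gives 72 CPUs, 144 cores, 1152 GB of RAM, and 72 PCI slots.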
A Dynamic System Domain is a physically distinct Solaris system carved from inside the platform.
It is common to just call them “domains.” Each domain requires:
A domain can be as small as one system board, or as large as 18. This also means that you can have a minimum of one domain in the platform and a maximum of 18. If you wanted all 72 CPUs (144 cores) in one big system, you would make a single big domain.
They’re called dynamic because (with some limitations) you can add boards to, or remove boards from, a running domain without shutting it down. When you need more CPUs, you simply attach another board to the domain. This is achieved through a process called Dynamic Reconfiguration.
This brings us to one limitation of the E25k: domain granularity. You have to keep all of the memory and CPUs on a single SB in the same domain. You cannot split the system board. The same goes for IO boards. All 4 PCI slots MUST go into the same domain. You cannot allocate 2 PCI slots to DOMAIN A and 2 PCI slots to DOMAIN B.
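The granularity rule boils down to "a board is the smallest unit, and it belongs to at most one domain." The following is only a sketch of that rule as a planning check, not anything the System Controller software actually runs; the function name and the domain names "A" and "B" are hypothetical.

```python
import re

# Board names follow the SB0 - SB17 / IO0 - IO17 numbering used above.
BOARD_RE = re.compile(r"^(SB|IO)(\d+)$")
MAX_SLOT = 17


def validate_domains(domains: dict) -> list:
    """Check a proposed layout: {domain name: [board names]}.

    Returns a list of human-readable problems (empty if the layout is fine).
    """
    problems = []
    owner = {}  # board name -> domain that already claimed it
    for name, boards in domains.items():
        has_system_board = False
        for board in boards:
            match = BOARD_RE.match(board)
            if not match or int(match.group(2)) > MAX_SLOT:
                problems.append(f"{name}: {board} is not a valid board name")
                continue
            if match.group(1) == "SB":
                has_system_board = True
            if board in owner:
                # A whole board cannot be shared between two domains.
                problems.append(
                    f"{board} is assigned to both {owner[board]} and {name}"
                )
            else:
                owner[board] = name
        if not has_system_board:
            problems.append(f"{name}: needs at least one system board")
    return problems


if __name__ == "__main__":
    layout = {"A": ["SB0", "IO0"], "B": ["SB1", "IO0"]}
    for problem in validate_domains(layout):
        print(problem)
```

In the example layout, IO0 is claimed by both domains, so the check reports it. There is deliberately no way to express "half of IO0" in this model, which mirrors the hardware restriction.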
Each platform has two built-in System Controllers (SC0 and SC1). They are configured to be redundant. One is called MAIN and the other is SPARE. You can force fail-over between them if you need to work on one for any reason. The SCs are basically Sun Ultra10 workstations, built into the platform. There are special slots in the centerplane for the SCs, one in front, one in back. They handle things like:
I’ve given a brief overview of some of the E25k basics, but we’ve only scratched the surface. In Part II, I’ll go into more detail on how to use the System Controllers to perform several platform operations. You can find more information at Sun’s web site http://www.sun.com or at Sunsolve. Sunsolve may require a service contract to see the Sun System Handbook these days, but it might not. It would be worth a try anyway because the Sun System Handbook is full of useful information.
When you’re working on an Enterprise-class Solaris system, you should check that you’re really booting from the disk you think you’re booting from.
I ran into this problem while patching an old E10k domain. During the patch process, the first thing we do is take a snapshot of many system configuration files (such as /etc/vfstab) and capture the output of several commands (such as vxprint -htA).
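A pre-patch snapshot like that can be scripted. What follows is only a sketch of the idea, not the procedure we actually used: the function name and the destination directory are made up, and it assumes a Python 3 interpreter is available on the box, which would not have been a given on a system of this vintage.

```python
import shutil
import subprocess
from pathlib import Path


def snapshot(dest, files, commands):
    """Copy config files and save command output under dest.

    files    -- paths to copy, e.g. ["/etc/vfstab", "/etc/system"]
    commands -- {label: argv list}, e.g. {"vxprint": ["vxprint", "-htA"]}
    Returns the list of files written.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    written = []
    for name in files:
        # Flatten the path so /etc/vfstab is saved as etc_vfstab.
        target = dest / str(name).strip("/").replace("/", "_")
        shutil.copy2(name, target)
        written.append(target)
    for label, argv in commands.items():
        try:
            result = subprocess.run(argv, capture_output=True, text=True)
            output = result.stdout + result.stderr
        except OSError as exc:
            # Record the failure instead of aborting the whole snapshot.
            output = f"could not run {argv}: {exc}\n"
        target = dest / f"{label}.out"
        target.write_text(output)
        written.append(target)
    return written
```

A call might look like `snapshot("/var/tmp/prepatch", ["/etc/vfstab", "/etc/system"], {"vxprint": ["vxprint", "-htA"]})`, where `/var/tmp/prepatch` is just a placeholder location.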
The second thing we do is break the root mirror by detaching the plex called “rootvol-02.” This un-patched copy of root gives us something to fall back to if things go terribly wrong. Granted, before using such a copy, we would have to boot from the network in order to manually un-encapsulate that copy… Perhaps we should be manually un-encapsulating it before we patch, but I digress…
I patched the machine as normal, and rebooted. Then, “panic” set in:
    WARNING: Error writing ufs log state
    WARNING: ufs log for / changed state to Error
    WARNING: Please umount(1M) / and run fsck(1M)
    WARNING: Error writing master during ufs log roll
    WARNING: ufs log for / changed state to Error
    WARNING: Please umount(1M) / and run fsck(1M)
    Cannot mount root on /pseudo/vxio@0:0 fstype ufs

    panic[cpu24]/thread=140a000: vfs_mountroot: cannot mount root

    0000000001409970 genunix:vfs_mountroot+70 (0, 0, 0, 200, 145ba30, 0)
      %l0-3: 000000000144f400 000000000144f400 0000000000002000 0000000001496690
      %l4-7: 000000000149c400 0000000001412e80 000000000144fc00 0000000001452c00
    0000000001409a20 genunix:main+90 (1409ba0, f105bd68, 1409ec0, 38f84d, 2000, 350)
      %l0-3: 0000000000000001 000000000140a000 0000000001414028 0000000000000000
      %l4-7: 0000000078002000 0000000000392000 00000000014a41a0 00000000010688c8

    skipping system dump - no dump device configured
    rebooting...
    Resetting...
This is what you normally see if you try to boot from a stale VxVM mirror copy, caused by the following:
1. The plex has been dis-associated from the volume “rootvol”
2. /etc/vfstab still references /dev/vx/dsk/rootvol as the root device
3. /etc/system still has all sorts of references to root living on a VxVM device.
And this struck me as odd, since the system should have booted from the patched root plex. You know, the one that was still valid. Before I went into panic mode, I decided to poke around a little, and discovered that we just had some wires crossed:
    SUNW,Ultra-Enterprise-10000, using Network Console
    OpenBoot 3.2.181, 4096 MB memory installed, Serial #10921789.
    Ethernet address 0:0:be:a6:a7:3d, Host ID: 80a6a73d.

    ok devalias
    vx-disk02       /sbus@58,0/QLGC,isp@0,10000/sd@4,0:a
    vx-disk01       /sbus@58,0/QLGC,isp@0,10000/sd@0,0:a
    disk            /sbus@5d,0/SUNW,socal@1,0/sf@1,0/ssd@0,0:a
    net             /sbus@5d,0/SUNW,qfe@0,8c10000
    ttya            /ssp-serial
    ssa_b_example   /sbus@40,0/SUNW,soc@0,0/SUNW,pln@b0000000,XXXXXX/SUNW,ssd@0,0:a
    ssa_a_example   /sbus@40,0/SUNW,soc@0,0/SUNW,pln@a0000000,XXXXXX/SUNW,ssd@0,0:a
    isp_example     /sbus@40,0/QLGC,isp@0,10000/sd@0,0
    net_example     /sbus@40,0/qec@0,20000/qe@0,0
    name            aliases

    ok printenv boot-device
    boot-device = vx-disk01 net
A cursory review of the pre-mirror-split vxprint -ht output showed this:
    dm disk01       c0t0d0s2     sliced   2888     71124291 -
    dm disk02       c0t4d0s2     sliced   2888     71124291 -

    v  rootvol      -            ENABLED  ACTIVE   62737524 ROUND     -        root
    pl rootvol-01   rootvol      ENABLED  ACTIVE   62737524 CONCAT    -        RW
    sd disk02-01    rootvol-01   disk02   0        62737524 0         c0t4d0   ENA
    pl rootvol-02   rootvol      ENABLED  ACTIVE   62737524 CONCAT    -        RW
    sd disk01-01    rootvol-02   disk01   0        62737524 0         c0t0d0   ENA
So, our boot disk is vx-disk01 == c0t0d0s2 == rootvol-02. Which is the stale copy we split off!
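The chain of reasoning here (boot-device alias, to VxVM disk, to plex) is mechanical enough to script as a pre-reboot check. The sketch below works on saved text from printenv and vxprint -ht. It assumes the alias is simply "vx-" followed by the VxVM disk media name, as it is in the output above, and that subdisk ("sd") lines list the plex in the third column and the disk in the fourth; the function names are invented for illustration.

```python
def plexes_by_disk(vxprint_text: str) -> dict:
    """Map each VxVM disk media name to the plexes that have a subdisk on it."""
    result = {}
    for line in vxprint_text.splitlines():
        fields = line.split()
        # Subdisk records look like: sd NAME PLEX DISK DISKOFFS LENGTH ...
        if len(fields) >= 4 and fields[0] == "sd":
            result.setdefault(fields[3], []).append(fields[2])
    return result


def boot_plexes(boot_device: str, vxprint_text: str, prefix: str = "vx-") -> list:
    """Return the plexes living on the first device listed in boot-device."""
    first = boot_device.split()[0]
    if not first.startswith(prefix):
        return []  # not a VxVM-style alias; nothing to infer
    disk = first[len(prefix):]
    return plexes_by_disk(vxprint_text).get(disk, [])


if __name__ == "__main__":
    VXPRINT = """\
pl rootvol-01 rootvol ENABLED ACTIVE 62737524 CONCAT - RW
sd disk02-01 rootvol-01 disk02 0 62737524 0 c0t4d0 ENA
pl rootvol-02 rootvol ENABLED ACTIVE 62737524 CONCAT - RW
sd disk01-01 rootvol-02 disk01 0 62737524 0 c0t0d0 ENA
"""
    detached = "rootvol-02"
    plexes = boot_plexes("vx-disk01 net", VXPRINT)
    print(plexes)
    if detached in plexes:
        print("boot-device points at the plex about to be detached")
```

Run against the values from this system, it reports that vx-disk01 carries rootvol-02, the plex we were about to split off. Had I done this check before the reboot, there would have been no panic to puzzle over.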
The fix was simple. All I had to do was this:
    ok boot vx-disk02 -s
    Boot device: /sbus@58,0/QLGC,isp@0,10000/sd@4,0:a  File and args: -s
    sorry, variable 'scsi_option' is not defined in the 'kernel'
    SunOS Release 5.9 Version Generic_118558-21 64-bit
    Copyright 1983-2003 Sun Microsystems, Inc.  All rights reserved.
    Use is subject to license terms.
I really didn’t need the “-s” argument on the end, but it gave me the opportunity to stop the boot half-way up if I needed to.
The moral of the story is that part of being a good systems administrator is preparation. It is very easy to start working on a system, thinking “Oh, I don’t need to make backups of this or that setting/config file, because I’m just making minor changes.” Seeing this message could easily have led me to believe that the root filesystem was corrupted to the point of needing to restore the machine from tape. It could have led to hours of downtime and an irate customer. But since I’d prepared for things like that to go wrong, I had the system up and running again at the cost of only one extra reboot.