Intro
When we last left off talking about the E25k, I’d gone over the questions of what the thing is, what goes into it, how much it weighs, how fast it is, etc. Hopefully, this time around, I’ll be able to explain in a bit more detail how you run the thing. I’m not going to tell you how to do the initial platform setup. Normally, Sun charges customers quite a bit of money for this service. I’ve done it three times without them, but I like their hardware, and I want to see them continue to operate as a business. So, if you want to know how to get away without paying Sun for the install, good luck to you, but I won’t help.
The Sun Documentation
Sometimes, Sun likes to hide things from me. The dynamic nature of the World Wide Web is its most cussed feature. Everything can change in the blink of an eye, and nothing is ever where you saw it last. The normal Sun document repository only has documentation for SMS versions through 1.2 in the obvious place. The current version is 1.6. No mention is even made of the E25k.
Lest you fall into maddening despair, the documents in question are located here for now, and are available in many languages which you don’t speak.
Domains
One of the things about domains that I forgot to mention last time is what they aren’t. Domains are not virtual machines like the ones VMware provides. They are not software partitions like Sun’s Solaris 10 Zones. Domains are more like hardware partitions. Each domain requires that you dedicate hardware resources (like SBs, IO boards, network cards, disk drives, etc.) to that domain, and only that domain. You cannot share a 4 CPU System Board between two domains.
Also, the 15k – 25k systems handle domains a little differently from the E10k. On the E10k, you could create up to 16 domains. Each domain had some associated house-keeping data on the SSP that it needed, including a firmware image. This firmware image had to be generated at the time of domain creation, and sometimes you had to call Sun to get this generation to work correctly.
The 15k – 25k systems are capable of 18 domains, all of which are configured at the factory. You never have to “create” a domain. It already exists. This is achieved by the SMS software creating firmware images and the other associated house-keeping stubs for domains labeled A – R. Again, these domains always exist, even if there are no boards assigned to them. Obviously, you can’t boot a domain that doesn’t have the necessary hardware (System Board, IO Board, Network Card, SCSI Interface connected to at least one hard disk).
I1 and I2 networks
There are two built-in networks that are internal to the platform. They are called the I1 Management Network and the I2 Management Network. I1 is used for the System Controllers to communicate house-keeping data with the individual domains. Each SC has an IP address in the I1 range, and each domain has an IP address in the I1 range. The I2 network is reserved for house-keeping data that passes from System Controller to System Controller. Each of the two SCs has an IP address in the I2 range. It is sufficient to use RFC 1918 Private Addresses for both of these ranges.
The Sun Fire E25K/E20K Systems Site Planning Guide contains a nice worksheet for you to plan your network and domain layout.
On to the actual commands!
Platform control is performed by logging onto the System Controller via Secure Shell (SSH), and issuing the appropriate commands. This shouldn’t come as any surprise to those of you who are already UNIX systems people, but the E25k is a UNIX system. You don’t get a GUI because you really don’t need a GUI to get your work done. The hostview GUI that was available on the E10k is gone. It never worked well to begin with.
showplatform
As I have said before, domains are collections of System Boards and IO Boards. We will use two main commands to view platform status, showplatform and showboards.
The output from showplatform is quite verbose, so I will trim some of it:
    $ showplatform

    PLATFORM:
    =========
    Platform Type: Sun Fire E25K

    CSN:
    ====
    Chassis Serial Number: xxxxxxxxxx

    COD:
    ====
    Chassis HostID: xxxxxxxxxxxxx
    Proc RTUs installed: 0
    PROC Headroom Quantity: 0
    Proc RTUs reserved for domain A: 0
    Proc RTUs reserved for domain B: 0
    Proc RTUs reserved for domain C: 0
    ...

    Available Component List for Domains:
    =====================================
    Available Component List for domain spiderman:
        No System boards
        No IO boards

    Available Component List for domain batman:
        No System boards
        No IO boards
    ...

    Domain Ethernet Addresses:
    ==========================
    Domain ID  Domain Tag   Ethernet Address
    A          spiderman    0:0:be:ff:ff:58
    B          batman       0:0:be:ff:ff:59
    C          superman     0:0:be:ff:ff:5a
    D          hulk         0:0:be:ff:ff:5b
    E          zaphod       0:0:be:ff:ff:5c
    F          tardis       0:0:be:ff:ff:5d
    G          montmorency  0:0:be:ff:ff:5e
    H          yoda         0:0:be:ff:ff:5f
    I          tick         0:0:be:ff:ff:60
    J          spoon        0:0:be:ff:ff:61
    K          wallace      0:0:be:ff:ff:62
    L          gromit       0:0:be:ff:ff:63
    M          crabtree     0:0:be:ff:ff:64
    N          zelda        0:0:be:ff:ff:65
    O          link         0:0:be:ff:ff:66
    P          mario        0:0:be:ff:ff:67
    Q          peach        0:0:be:ff:ff:68
    R          -            0:0:be:ff:ff:69

    Domain configurations:
    ======================
    Domain ID  Domain Tag   Solaris Nodename  Domain Status
    A          spiderman    spiderman         Running Solaris
    B          batman       batman            Running Solaris
    C          superman     superman          Running Solaris
    D          hulk         hulk              Running Solaris
    E          zaphod       -                 Keyswitch Standby
    F          tardis       tardis            Running Solaris
    G          montmorency  montmorency       Running Solaris
    H          yoda         -                 Keyswitch Standby
    I          tick         tick              Running Solaris
    J          spoon        -                 Keyswitch Standby
    K          wallace      wallace           Running Solaris
    L          gromit       gromit            Running Solaris
    M          crabtree     -                 Powered Off
    N          zelda        zelda             Running Solaris
    O          link         -                 Keyswitch Standby
    P          mario        mario             Running Solaris
    Q          peach        peach             Running Solaris
    R          -            -                 Powered Off
The most interesting parts of this are the second section and the last two sections. The second section lists the chassis serial number. This is very useful when you have to call Sun about a problem with your E25k. The second-to-last section shows that there are Ethernet MAC addresses assigned to all domains A – R, even though domain R hasn’t really been configured.
The last section shows the status of each domain, its domain “Tag,” and its Solaris hostname. The domain tag is an alias to the domain letter name. It’s not always easy to refer to the domains by their letter name, so we can name them something more convenient with the addtag command. There is no requirement that the domain tag be the same as the Solaris nodename. We could, for instance, change the domain tag of domain “A” to “production” and the Solaris nodename column would still show “spiderman.”
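With eighteen domains, picking out the ones that are not up takes some squinting. Here is a minimal sketch of a filter for that, assuming the “Domain configurations” layout shown above (ID, tag, nodename, then a status that may contain spaces) and a POSIX awk (nawk on Solaris). The function name idle_domains is my own invention, not an SMS command.

```shell
#!/bin/sh
# idle_domains (hypothetical helper): read showplatform-style output on
# stdin and print the ID, tag, and status of every domain that is not
# running Solaris. Rows from the Ethernet address section have only three
# fields, so the NF >= 4 test skips them.
idle_domains() {
    awk '$1 ~ /^[A-R]$/ && NF >= 4 {
             status = $4
             for (i = 5; i <= NF; i++) status = status " " $i
             if (status != "Running Solaris") print $1, $2, status
         }'
}

# Sample rows copied from the listing above. On a real SC you would run
# "showplatform | idle_domains" instead.
idle_domains <<'EOF'
A  spiderman  spiderman  Running Solaris
E  zaphod     -          Keyswitch Standby
M  crabtree   -          Powered Off
R  -          -          Powered Off
EOF
```

For the sample rows this prints the E, M, and R lines and drops domain A.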
showboards
Often, it is helpful to find out which system boards are assigned to which domain. We have the showboards command for that:
    $ showboards
    Retrieving board information. Please wait.
    Location  Pwr  Type of Board  Board Status  Test Status  Domain
    --------  ---  -------------  ------------  -----------  ------
    SB0       On   CPU            Active        Passed       tick
    SB1       On   CPU            Active        Passed       mario
    SB2       On   CPU            Active        Passed       peach
    SB3       On   CPU            Active        Passed       zelda
    SB4       Off  CPU            Assigned      Unknown      spoon
    SB5       On   CPU            Active        Passed       gromit
    SB6       On   CPU            Active        Passed       wallace
    SB7       Off  CPU            Assigned      Unknown      spoon
    SB8       On   CPU            Active        Passed       montmorency
    SB9       On   CPU            Active        Passed       tick
    SB10      On   CPU            Active        Passed       tardis
    SB11      On   CPU            Active        Passed       montmorency
    SB12      On   CPU            Active        Passed       tardis
    SB13      On   CPU            Active        Passed       montmorency
    SB14      On   CPU            Active        Passed       hulk
    SB15      On   CPU            Active        Passed       superman
    SB16      On   CPU            Active        Passed       batman
    SB17      On   CPU            Active        Passed       spiderman
    IO0       On   HPCI+          Active        Passed       tick
    IO1       On   HPCI+          Active        Passed       mario
    IO2       On   HPCI+          Active        Passed       peach
    IO3       On   HPCI+          Active        Passed       zelda
    IO4       Off  HPCI+          Assigned      Unknown      crabtree
    IO5       On   HPCI+          Active        Passed       gromit
    IO6       On   HPCI+          Active        Passed       wallace
    IO7       On   HPCI+          Assigned      Unknown      spoon
    IO8       On   HPCI+          Assigned      Unknown      superman
    IO9       On   HPCI+          Active        Passed       tick
    IO10      Off  HPCI+          Assigned      Unknown      yoda
    IO11      On   HPCI+          Active        Passed       montmorency
    IO12      On   HPCI+          Active        Passed       tardis
    IO13      On   HPCI+          Assigned      Unknown      zaphod
    IO14      On   HPCI+          Active        Passed       hulk
    IO15      On   HPCI+          Active        Passed       superman
    IO16      On   HPCI+          Active        Passed       batman
    IO17      On   HPCI+          Active        Passed       spiderman
Sometimes, it is more helpful to have this table sorted by domain name, so with a little bit of finesse, we get the following:
    $ showboards | grep "^SB" | awk '{print $NF, $1}' | sort
    batman SB16
    gromit SB5
    hulk SB14
    mario SB1
    montmorency SB11
    montmorency SB13
    montmorency SB8
    peach SB2
    spiderman SB17
    spoon SB4
    spoon SB7
    superman SB15
    tardis SB10
    tardis SB12
    tick SB0
    tick SB9
    wallace SB6
    zelda SB3
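If you would rather see one line per domain, the same pipeline idea can be pushed a little further. This is only a sketch, assuming the six-column showboards layout above and a POSIX awk (nawk on Solaris); boards_by_domain is a made-up helper name, not part of SMS.

```shell
#!/bin/sh
# boards_by_domain (hypothetical helper): read showboards-style output on
# stdin and print one line per domain listing its system boards, in the
# order they appear in the input.
boards_by_domain() {
    awk '/^SB/ { list[$NF] = list[$NF] " " $1 }
         END   { for (d in list) print d ":" list[d] }' | sort
}

# A few rows copied from the showboards listing above. On a real SC you
# would run "showboards | boards_by_domain" instead.
boards_by_domain <<'EOF'
SB8   On   CPU    Active  Passed  montmorency
SB9   On   CPU    Active  Passed  tick
SB11  On   CPU    Active  Passed  montmorency
SB13  On   CPU    Active  Passed  montmorency
IO9   On   HPCI+  Active  Passed  tick
EOF
```

For the sample rows this prints "montmorency: SB8 SB11 SB13" and "tick: SB9". IO boards are skipped by the ^SB match, just as in the one-liner above.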
Dynamic Reconfiguration
We can see from this output that there are several domains with multiple SBs assigned. This is one of the strengths of the platform. Using Dynamic Reconfiguration (DR), we can do things like add CPUs and RAM to a system that is bogged down, while the system is running. By adding IO boards, we can add multiple paths to disks, or extra network interface cards, etc.
These operations are accomplished through three commands: addboard, deleteboard, and moveboard. Here is the output of a moveboard command, which combines the functionality of deleteboard and addboard. In this case, we will remove board SB11 from domain G (montmorency) and add it to domain A (spiderman) while both domains are running. Since we know that montmorency has three system boards currently assigned to it, removing one won’t (usually) interrupt that domain.
    $ moveboard -c configure -d spiderman SB11
    request delete capacity (4 cpus)
    request delete capacity (2097152 pages)
    request delete capacity SB11 done
    request offline SUNW_cpu/cpu352
    request offline SUNW_cpu/cpu353
    request offline SUNW_cpu/cpu354
    request offline SUNW_cpu/cpu355
    request offline SUNW_cpu/cpu352 done
    request offline SUNW_cpu/cpu353 done
    request offline SUNW_cpu/cpu354 done
    request offline SUNW_cpu/cpu355 done
    unconfigure SB11
    unconfigure SB11 done
    notify remove SUNW_cpu/cpu352
    notify remove SUNW_cpu/cpu353
    notify remove SUNW_cpu/cpu354
    notify remove SUNW_cpu/cpu355
    notify remove SUNW_cpu/cpu352 done
    notify remove SUNW_cpu/cpu353 done
    notify remove SUNW_cpu/cpu354 done
    notify remove SUNW_cpu/cpu355 done
    notify capacity change (4 cpus)
    notify capacity change (2097152 pages)
    notify capacity change SB11 done
    disconnect SB11
    disconnect SB11 done
    poweroff SB11
    poweroff SB11 done
    SB11 disconnected from domain: G
    SB11 unassigned from domain: G
    SB11 assigned to domain: A
    assign SB11
    assign SB11 done
    poweron SB11
    poweron SB11 done
    test SB11
    test SB11 done
    connect SB11
    connect SB11 done
    configure SB11
    configure SB11 done
    notify online SUNW_cpu/cpu352
    notify online SUNW_cpu/cpu353
    notify online SUNW_cpu/cpu354
    notify online SUNW_cpu/cpu355
    notify add capacity (4 cpus)
    notify add capacity (2097152 pages)
    notify add capacity SB11 done
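Before pulling a board out of a running domain, it is worth confirming that the domain really does have more than one active system board. The following is a rough sketch of that check, assuming the showboards column layout shown earlier (board status in column 4, domain in the last column) and a POSIX awk. The helper names active_sb_count and can_spare_sb are hypothetical, and the check says nothing about whether Solaris will actually agree to release the board.

```shell
#!/bin/sh
# active_sb_count DOMAIN (hypothetical helper): count the Active system
# boards that belong to DOMAIN in showboards-style output read from stdin.
active_sb_count() {
    awk -v dom="$1" '/^SB/ && $4 == "Active" && $NF == dom { n++ }
                     END { print n + 0 }'
}

# can_spare_sb DOMAIN (hypothetical helper): succeed only if DOMAIN would
# still have at least one Active system board after giving one up.
can_spare_sb() {
    [ "$(active_sb_count "$1")" -gt 1 ]
}

# Rows copied from the showboards listing; on a real SC, pipe showboards in.
sample='SB8   On  CPU  Active  Passed  montmorency
SB11  On  CPU  Active  Passed  montmorency
SB13  On  CPU  Active  Passed  montmorency
SB17  On  CPU  Active  Passed  spiderman'

for dom in montmorency spiderman; do
    if printf '%s\n' "$sample" | can_spare_sb "$dom"; then
        echo "$dom: can give up a board"
    else
        echo "$dom: only one active SB, leave it alone"
    fi
done
```

With the sample rows, montmorency passes the check and spiderman does not.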
DR isn’t perfect. Far from it. It really works, but there are a few things to look out for. I’ve never seen an addboard operation fail. You can always add to a hot domain. However, I’ve often seen a deleteboard operation fail. Sometimes, Solaris has memory allocated that it doesn’t want to turn over. Sometimes, it can turn the memory over, but only after you quiesce the domain (basically, it freezes the domain for 5 minutes or so while it moves the locked memory to another SB). While a quiescent domain is technically up, it isn’t really running. If your domain is a database server, the application servers that depend on it may give up by then, which is the same thing as “downtime,” but Sun likes to pretend it isn’t. If the DR operation will require you to quiesce a domain, moveboard or deleteboard will warn you ahead of time.
IO Boards are particularly difficult to remove. Sometimes Veritas Volume Manager grabs hold of a disk drive that you don’t want it to, and will not let it go. Sometimes, you have plumbed up an Ethernet interface and forgotten about it.
The most foolproof way to perform a deleteboard is to first shut down the domain from which you wish to remove the board, then issue the deleteboard command. In order to accomplish this, the domain’s virtual keyswitch must be set to either “Off” or “Standby.”
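As a rough illustration of that order of operations, the sketch below only prints the SC-side commands rather than running them. The helper name cold_delete_plan is made up, and the command spellings (setkeyswitch -d DOMAIN POSITION, deleteboard BOARD) are written from memory of the SMS man pages, so treat them as assumptions and check them against the reference manual for your SMS version. It also assumes Solaris in the domain has already been shut down from inside the domain.

```shell
#!/bin/sh
# cold_delete_plan DOMAIN BOARD (hypothetical helper): print, but do not
# run, the SC-side steps for removing BOARD from DOMAIN once Solaris in
# that domain is down. The command syntax is an assumption; verify it
# against your SMS documentation before typing any of it.
cold_delete_plan() {
    domain=$1
    board=$2
    echo "setkeyswitch -d $domain standby"   # boards powered, domain functionally off
    echo "deleteboard $board"                # remove the board from the domain
    echo "setkeyswitch -d $domain on"        # POST, load OBP, ready to boot
}

cold_delete_plan montmorency SB11
```

Printing the plan first gives you a chance to read it over before you commit to taking a domain down.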
setkeyswitch
Each domain is equipped with a virtual keyswitch. The keyswitch has three settings:
| Keyswitch Setting | Function |
| --- | --- |
| off | SBs and IO boards are powered off. |
| standby | SBs and IO boards are powered on, but system is still functionally “off.” |
| on | System runs Power On Self Test, then OBP is loaded. Once OBP is loaded, system can be booted. setkeyswitch on is functionally equivalent to bringup on the E10k. |
Console Access
Traditional UNIX servers typically use their serial port as the console device. This is not normally the case with UNIX workstations, which usually have a keyboard, mouse, and monitor attached. But there are no serial ports on 25k SBs or IO boards. How, then, do we connect to the consoles of our domains?
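The answer is the console command. It works just like a normal serial console. Using the ~~# sequence is usually enough to dump the domain back to the ok> prompt, and ~~. disconnects you from the console session. The tilde is doubled because you are logged in to the SC over SSH, which treats ~ as its own escape character; typing ~~ passes a single ~ through to console.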
Conclusion
That is all I have time for now. Part 3 will be here soon.