It is allergy season in North Alabama.

And when allergy season arrives, I start taking Claritin and Sudafed to help convince my body that it does, in fact, need to keep breathing, no matter how many unpleasant things are floating in the air. This is generally very effective, in that I do manage to keep breathing. Unfortunately, it carries the side effect of making me constantly sleepy. I can never seem to get enough sleep during allergy season.

Now, I’m a tinkerer by nature. I like to take things apart, and with the exception of the sandwich maker, I typically like to put them back together again. Some things I modify to suit a better purpose, and some things get reprogrammed.

But what I really, really wanted to do at 06:45 this morning was to reprogram my alarm clock by means of two .45-caliber, 230-grain, Jacketed Hollow Point slugs.

Published on September 19, 2006 at 3:58 pm

Simplicity is the hallmark of truth.

Those were the words of introduction from my first Computer Science instructor, Dr. Doran, on the first day of the first computer class I ever took.

Lately, I’ve been reading up on the Ruby programming language. I’ve done a few web projects with Ruby on Rails, but those have all been more like works of journalism (and sometimes, outright plagiarism) than technical prowess.

One of the things that Ruby and Rails promote is the idea that they will both do a lot of work for you for free if you simply follow their conventions. To clarify, here is a small snippet from the Pragmatic Programmers book Programming Ruby:

Ruby uses a convention to help it distinguish the usage of a name: the first characters of a name indicate how the name is used. Local variables, method parameters, and method names should all start with a lowercase letter or with an underscore. Global variables are prefixed with a dollar sign ($), while instance variables begin with an “at” sign (@). Class variables start with two “at” signs (@@). Finally, class names, module names, and constants should start with an uppercase letter.

Just a few pages before this paragraph, we find the following sentence:

Ruby syntax is clean.

This, it says, is because you don’t have to end each statement with a semicolon.

I am now convinced that Ruby is a Japanese plot to make me crazy. The trouble with all of these little “conventions” is how do you remember them all? I guess part of my problem is that I don’t write enough code to burn them into my brain. That is probably why I’m most effective writing Perl. There aren’t a whole lot of rules, and the ones that are there can usually be ignored, forgotten about, or broken with wild abandon. Not knowing enough about Perl to write it well or efficiently is usually not an impediment to getting something done with it.

Published on September 15, 2006 at 2:12 pm

Sun High-End Server Administration Part II

Intro

When we last left off talking about the E25k, I’d gone over the questions of what the thing is, what goes into it, how much it weighs, how fast it is, etc. Hopefully, this time around, I’ll be able to explain in a bit more detail how you run the thing. I’m not going to tell you how to do the initial platform setup. Normally, Sun charges customers quite a bit of money for this service. I’ve done it three times without them, but I like their hardware, and I want to see them continue to operate as a business. So, if you want to know how to get away without paying Sun for the install, good luck to you, but I won’t help.

The Sun Documentation

Sometimes, Sun likes to hide things from me. The dynamic nature of the World Wide Web is its most cussed feature. Everything can change in the blink of an eye, and nothing is ever where you saw it last. The normal Sun document repository only has documentation for SMS versions through 1.2 in the obvious place. The current version is 1.6. No mention is even made of the E25k.

Lest you fall into maddening despair, the documents in question are located here for now, and are available in many languages which you don’t speak.

Domains

One of the things about domains that I forgot to mention last time is what they aren’t. Domains are not virtual machines like VMware. They are not software partitions like Sun’s Solaris 10 Zones. Domains are more like hardware partitions. Each domain requires that you dedicate hardware resources (like SBs, IO boards, network cards, disk drives, etc.) to that domain, and only that domain. You cannot share a 4 CPU System Board between two domains.

Also, the 15k – 25k systems handle domains a little differently from the E10k. On the E10k, you could create up to 16 domains. Each domain had some associated house-keeping data on the SSP that it needed, including a firmware image. This firmware image had to be generated at the time of domain creation, and sometimes you had to call Sun to get this generation to work correctly.

The 15k – 25k systems are capable of 18 domains, all of which are configured at the factory. You never have to “create” a domain. It already exists. This is achieved by the SMS software creating firmware images and the other associated house-keeping stubs for domains labeled A – R. Again, these domains always exist, even if there are no boards assigned to them. Obviously, you can’t boot a domain that doesn’t have the necessary hardware (System Board, IO Board, Network Card, SCSI Interface connected to at least one hard disk).

I1 and I2 networks

There are two built-in networks that are internal to the platform. They are called the I1 Management Network and I2 Management network. I1 is used for the System Controllers to communicate house-keeping data with the individual domains. Each SC has an IP address in the I1 range, and each domain has an IP address in the I1 range. The I2 network is reserved for house-keeping data that passes from System Controller to System Controller. Each of the two SCs has an IP address in the I2 range. It is sufficient to use RFC 1918 Private Addresses for both of these ranges.

The Sun Fire E25K/E20K Systems Site Planning Guide contains a nice worksheet for you to plan your network and domain layout.

On to the actual commands!

Platform control is performed by logging onto the System Controller via Secure Shell (SSH), and issuing the appropriate commands. This shouldn’t come as any surprise to those of you who are already UNIX systems people, but the E25k is a UNIX system. You don’t get a GUI because you really don’t need a GUI to get your work done. The hostview GUI that was available on the E10k is gone. It never worked well to begin with.

showplatform

As I have said before, domains are collections of System Boards and IO Boards. We will use two main commands to view platform status, showplatform and showboards.

The output from showplatform is quite verbose, so I will trim some of it:

$ showplatform
PLATFORM:
=========
Platform Type: Sun Fire E25K

CSN:
====
Chassis Serial Number: xxxxxxxxxx

COD:
====
Chassis HostID: xxxxxxxxxxxxx
Proc RTUs installed: 0
PROC Headroom Quantity: 0
Proc RTUs reserved for domain A: 0
Proc RTUs reserved for domain B: 0
Proc RTUs reserved for domain C: 0
...

Available Component List for Domains:
=====================================
Available Component List for domain spiderman:
No System boards
No IO boards

Available Component List for domain batman:
No System boards
No IO boards
...

Domain Ethernet Addresses:
==========================
Domain ID   Domain Tag        Ethernet Address
A           spiderman         0:0:be:ff:ff:58
B           batman            0:0:be:ff:ff:59
C           superman          0:0:be:ff:ff:5a
D           hulk              0:0:be:ff:ff:5b
E           zaphod            0:0:be:ff:ff:5c
F           tardis            0:0:be:ff:ff:5d
G           montmorency       0:0:be:ff:ff:5e
H           yoda              0:0:be:ff:ff:5f
I           tick              0:0:be:ff:ff:60
J           spoon             0:0:be:ff:ff:61
K           wallace           0:0:be:ff:ff:62
L           gromit            0:0:be:ff:ff:63
M           crabtree          0:0:be:ff:ff:64
N           zelda             0:0:be:ff:ff:65
O           link              0:0:be:ff:ff:66
P           mario             0:0:be:ff:ff:67
Q           peach             0:0:be:ff:ff:68
R           -                 0:0:be:ff:ff:69

Domain configurations:
======================
Domain ID   Domain Tag        Solaris Nodename       Domain Status
A           spiderman         spiderman              Running Solaris
B           batman            batman                 Running Solaris
C           superman          superman               Running Solaris
D           hulk              hulk                   Running Solaris
E           zaphod            -                      Keyswitch Standby
F           tardis            tardis                 Running Solaris
G           montmorency       montmorency            Running Solaris
H           yoda              -                      Keyswitch Standby
I           tick              tick                   Running Solaris
J           spoon             -                      Keyswitch Standby
K           wallace           wallace                Running Solaris
L           gromit            gromit                 Running Solaris
M           crabtree          -                      Powered Off
N           zelda             zelda                  Running Solaris
O           link              -                      Keyswitch Standby
P           mario             mario                  Running Solaris
Q           peach             peach                  Running Solaris
R           -                 -                      Powered Off

The most interesting parts of this are the second section and the last two sections. The second section lists the chassis serial number. This is very useful when you have to call Sun about a problem with your E25k. The second-to-last section shows that there are Ethernet MAC addresses assigned to all domains A – R, even though domain R hasn’t really been configured.

The last section shows the status of each domain, its domain “Tag,” and its Solaris hostname. The domain tag is an alias for the domain letter name. It’s not always easy to refer to the domains by their letter name, so we can name them something more convenient with the addtag command. There is no requirement that the domain tag be the same as the Solaris nodename. We could, for instance, change the domain tag of domain “A” to “production” and the Solaris nodename column would still show “spiderman.”
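To pick out the domains that are not up, the status column can be filtered with awk. What follows is only a sketch: it assumes the “Domain configurations” layout shown above (status in the fourth column onward), and the function names idle_domains and sample_platform are made up for illustration. On a real SC you would pipe showplatform into idle_domains instead of the sample text.

```shell
#!/bin/sh
# Sketch only: assumes the "Domain configurations" layout from the
# showplatform output above. sample_platform and idle_domains are
# hypothetical helper names, not part of SMS.

# A few rows copied from the output above, standing in for showplatform.
sample_platform() {
cat <<'EOF'
Domain Ethernet Addresses:
==========================
Domain ID   Domain Tag        Ethernet Address
A           spiderman         0:0:be:ff:ff:58

Domain configurations:
======================
Domain ID   Domain Tag        Solaris Nodename       Domain Status
A           spiderman         spiderman              Running Solaris
E           zaphod            -                      Keyswitch Standby
M           crabtree          -                      Powered Off
R           -                 -                      Powered Off
EOF
}

# Print "ID tag: status" for every domain whose status is not "Running Solaris".
# Rows before the "Domain configurations:" header are ignored.
idle_domains() {
    awk '
        /^Domain configurations:/ { in_section = 1; next }
        in_section && $1 ~ /^[A-R]$/ {
            status = $4
            for (i = 5; i <= NF; i++) status = status " " $i
            if (status != "Running Solaris") print $1, $2 ": " status
        }'
}

sample_platform | idle_domains
```

The section check matters because the Ethernet address table uses the same domain letters in its first column.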

showboards

Often, it is helpful to find out which system boards are assigned to which domain. We have the showboards command for that:

$ showboards
Retrieving board information. Please wait.
Location    Pwr    Type of Board   Board Status  Test Status   Domain
--------    ---    -------------   ------------  -----------   ------
SB0         On     CPU             Active        Passed        tick
SB1         On     CPU             Active        Passed        mario
SB2         On     CPU             Active        Passed        peach
SB3         On     CPU             Active        Passed        zelda
SB4         Off    CPU             Assigned      Unknown       spoon
SB5         On     CPU             Active        Passed        gromit
SB6         On     CPU             Active        Passed        wallace
SB7         Off    CPU             Assigned      Unknown       spoon
SB8         On     CPU             Active        Passed        montmorency
SB9         On     CPU             Active        Passed        tick
SB10        On     CPU             Active        Passed        tardis
SB11        On     CPU             Active        Passed        montmorency
SB12        On     CPU             Active        Passed        tardis
SB13        On     CPU             Active        Passed        montmorency
SB14        On     CPU             Active        Passed        hulk
SB15        On     CPU             Active        Passed        superman
SB16        On     CPU             Active        Passed        batman
SB17        On     CPU             Active        Passed        spiderman
IO0         On     HPCI+           Active        Passed        tick
IO1         On     HPCI+           Active        Passed        mario
IO2         On     HPCI+           Active        Passed        peach
IO3         On     HPCI+           Active        Passed        zelda
IO4         Off    HPCI+           Assigned      Unknown       crabtree
IO5         On     HPCI+           Active        Passed        gromit
IO6         On     HPCI+           Active        Passed        wallace
IO7         On     HPCI+           Assigned      Unknown       spoon
IO8         On     HPCI+           Assigned      Unknown       superman
IO9         On     HPCI+           Active        Passed        tick
IO10        Off    HPCI+           Assigned      Unknown       yoda
IO11        On     HPCI+           Active        Passed        montmorency
IO12        On     HPCI+           Active        Passed        tardis
IO13        On     HPCI+           Assigned      Unknown       zaphod
IO14        On     HPCI+           Active        Passed        hulk
IO15        On     HPCI+           Active        Passed        superman
IO16        On     HPCI+           Active        Passed        batman
IO17        On     HPCI+           Active        Passed        spiderman

Sometimes, it is more helpful to have this table sorted by domain name, so with a little bit of finesse, we get the following:

$ showboards | grep "^SB" | awk '{print $NF, $1}' | sort
batman SB16
gromit SB5
hulk SB14
mario SB1
montmorency SB11
montmorency SB13
montmorency SB8
peach SB2
spiderman SB17
spoon SB4
spoon SB7
superman SB15
tardis SB10
tardis SB12
tick SB0
tick SB9
wallace SB6
zelda SB3
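If you would rather see one line per domain, with IO boards included, awk can do the grouping as well. This is a rough sketch that assumes the showboards column layout shown earlier (board location first, domain last); boards_by_domain and sample_boards are names I made up, and the sample rows are a small subset of the real table.

```shell
#!/bin/sh
# Sketch only: assumes the showboards layout shown above, with the board
# location in the first column and the domain in the last.
# boards_by_domain and sample_boards are hypothetical helper names.

# A handful of rows copied from the table above, standing in for showboards.
sample_boards() {
cat <<'EOF'
Location    Pwr    Type of Board   Board Status  Test Status   Domain
--------    ---    -------------   ------------  -----------   ------
SB0         On     CPU             Active        Passed        tick
SB4         Off    CPU             Assigned      Unknown       spoon
SB9         On     CPU             Active        Passed        tick
IO0         On     HPCI+           Active        Passed        tick
IO4         Off    HPCI+           Assigned      Unknown       crabtree
EOF
}

# Collect every SBn/IOn row under its domain, then sort by domain name.
boards_by_domain() {
    awk '$1 ~ /^(SB|IO)[0-9]+$/ { boards[$NF] = boards[$NF] " " $1 }
         END { for (d in boards) print d ":" boards[d] }' | sort
}

sample_boards | boards_by_domain
```

Boards appear in the order showboards lists them, so a domain’s SBs come before its IO boards.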

Dynamic Reconfiguration

We can see from this output that there are several domains with multiple SBs assigned. This is one of the strengths of the platform. Using Dynamic Reconfiguration (DR), we can do things like add CPUs and RAM to a system that is bogged down, while the system is running. By adding IO boards, we can add multiple paths to disks, or extra network interface cards, etc.

These operations are accomplished through three commands: addboard, deleteboard, and moveboard. Here is the output of a moveboard command, which combines the functionality of deleteboard and addboard. In this case, we will remove the board from domain G (montmorency), and add it to domain A (spiderman) while both domains are running. Since we know that montmorency has three system boards currently assigned to it, we won’t (usually) interrupt that domain’s functionality when we remove the board.

$ moveboard -c configure -d spiderman SB11
request delete capacity (4 cpus)
request delete capacity (2097152 pages)
request delete capacity SB11 done
request offline SUNW_cpu/cpu352
request offline SUNW_cpu/cpu353
request offline SUNW_cpu/cpu354
request offline SUNW_cpu/cpu355
request offline SUNW_cpu/cpu352 done
request offline SUNW_cpu/cpu353 done
request offline SUNW_cpu/cpu354 done
request offline SUNW_cpu/cpu355 done
unconfigure SB11
unconfigure SB11 done
notify remove SUNW_cpu/cpu352
notify remove SUNW_cpu/cpu353
notify remove SUNW_cpu/cpu354
notify remove SUNW_cpu/cpu355
notify remove SUNW_cpu/cpu352 done
notify remove SUNW_cpu/cpu353 done
notify remove SUNW_cpu/cpu354 done
notify remove SUNW_cpu/cpu355 done
notify capacity change (4 cpus)
notify capacity change (2097152 pages)
notify capacity change SB11 done
disconnect SB11
disconnect SB11 done
poweroff SB11
poweroff SB11 done
SB11 disconnected from domain: G
SB11 unassigned from domain: G
SB11 assigned to domain: A
assign SB11
assign SB11 done
poweron SB11
poweron SB11 done
test SB11
test SB11 done
connect SB11
connect SB11 done
configure SB11
configure SB11 done
notify online SUNW_cpu/cpu352
notify online SUNW_cpu/cpu353
notify online SUNW_cpu/cpu354
notify online SUNW_cpu/cpu355
notify add capacity (4 cpus)
notify add capacity (2097152 pages)
notify add capacity SB11 done
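Before pulling a board out of a running domain, it is worth confirming that the domain has more than one SB to begin with. Here is a minimal sketch of that check, assuming the showboards layout shown earlier; sb_count, can_spare_sb, and sample_boards are hypothetical names, and nothing here calls moveboard itself.

```shell
#!/bin/sh
# Sketch only: a pre-check to run before a moveboard or deleteboard.
# Assumes the showboards layout shown earlier (location first, domain last).
# sb_count, can_spare_sb and sample_boards are hypothetical helper names.

# Rows copied from the showboards table, standing in for the real command.
sample_boards() {
cat <<'EOF'
SB8         On     CPU             Active        Passed        montmorency
SB11        On     CPU             Active        Passed        montmorency
SB13        On     CPU             Active        Passed        montmorency
SB17        On     CPU             Active        Passed        spiderman
IO11        On     HPCI+           Active        Passed        montmorency
EOF
}

# Usage: showboards | sb_count DOMAIN
# Prints how many System Boards are assigned to DOMAIN (0 if none).
sb_count() {
    awk -v dom="$1" '$1 ~ /^SB[0-9]+$/ && $NF == dom { n++ } END { print n + 0 }'
}

# Usage: showboards | can_spare_sb DOMAIN
# Succeeds only if DOMAIN would still have at least one SB after losing one.
can_spare_sb() {
    count=$(sb_count "$1")
    [ "$count" -gt 1 ]
}

if sample_boards | can_spare_sb montmorency; then
    echo "montmorency can give up a board"
fi
```

This only counts boards. It says nothing about whether Solaris will actually let go of the memory on the one you pick.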

DR isn’t perfect. Far from it. It really works, but there are a few things to look out for. On the good side, I’ve never seen an addboard operation fail. You can always add to a hot domain. However, I’ve often seen a deleteboard operation fail. Sometimes, Solaris has memory allocated that it doesn’t want to turn over. Sometimes, it can turn the memory over, but only after you quiesce the domain (basically, it freezes the domain for 5 minutes or so while it moves the locked memory to another SB). While a quiescent domain is technically up, it isn’t really running. If your domain is a database server, your application servers that depend on it for operation may give up by then, which is the same thing as “downtime,” but Sun likes to pretend it isn’t. If the DR operation will require you to quiesce a domain, moveboard or deleteboard will warn you ahead of time.

IO Boards are particularly difficult to remove. Sometimes Veritas Volume Manager grabs hold of a disk drive that you don’t want it to, and will not let it go. Sometimes, you have plumbed up an Ethernet interface and forgotten about it.

The most fool-proof way to perform a deleteboard is to shut down the domain from which you wish to remove the board first, then issue the deleteboard command. In order to accomplish this, the domain’s virtual keyswitch must be set to either “Off” or “Standby.”
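A script can make the same check by reading the domain status out of showplatform before it goes anywhere near deleteboard. The sketch below assumes that a keyswitch set to off or standby shows up as “Powered Off” or “Keyswitch Standby” in the Domain Status column, as it does in the output earlier in this post. The names domain_status, is_cold, and sample_platform are mine, not part of SMS.

```shell
#!/bin/sh
# Sketch only: decide whether a domain is "cold" (off or standby) from the
# "Domain configurations" section of showplatform output.
# Assumption: off/standby appear as "Powered Off" / "Keyswitch Standby".
# domain_status, is_cold and sample_platform are hypothetical helper names.

# Rows copied from the showplatform output, standing in for the real command.
sample_platform() {
cat <<'EOF'
Domain configurations:
======================
Domain ID   Domain Tag        Solaris Nodename       Domain Status
A           spiderman         spiderman              Running Solaris
E           zaphod            -                      Keyswitch Standby
M           crabtree          -                      Powered Off
EOF
}

# Usage: showplatform | domain_status TAG
# Prints the status text for the first domain whose tag matches TAG.
domain_status() {
    awk -v tag="$1" '
        /^Domain configurations:/ { in_section = 1; next }
        in_section && $2 == tag {
            status = $4
            for (i = 5; i <= NF; i++) status = status " " $i
            print status
            exit
        }'
}

# Usage: showplatform | is_cold TAG
# Succeeds if the domain is powered off or in standby. Unknown tags fail.
is_cold() {
    case "$(domain_status "$1")" in
        "Powered Off"|"Keyswitch Standby") return 0 ;;
        *) return 1 ;;
    esac
}

for tag in spiderman zaphod crabtree; do
    if sample_platform | is_cold "$tag"; then
        echo "$tag: cold"
    else
        echo "$tag: not cold"
    fi
done
```

An unknown tag is treated as not cold, which errs on the side of leaving the board alone.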

setkeyswitch

Each domain is equipped with a virtual keyswitch. The keyswitch has three settings:

Keyswitch Setting    Function
off                  SBs and IO boards are powered off.
standby              SBs and IO boards are powered on, but the system is still functionally “off.”
on                   The system runs Power On Self Test, then OBP is loaded. Once OBP is loaded, the system can be booted. Running setkeyswitch on is functionally equivalent to bringup on the E10k.

Console Access

Traditional UNIX servers typically use their serial port as the console device. This is not normally the case with UNIX workstations, which usually have a keyboard, mouse, and monitor attached. But there are no serial ports on 25k SBs or IO boards. How, then, do we connect to the consoles of our domains?

The answer is the console command. It works just like a normal serial console. Using the ~~# sequence is usually enough to dump the domain back to the ok> prompt, and ~~. disconnects you from the console session.

Conclusion

That is all I have time for now. Part 3 will be here soon.

Published on September 14, 2006 at 6:20 pm