The Upgrade Follies: Communications Manager 9.x to 11.x

If you’ve been in the Cisco voice game for more than a second, you’ve probably done a Call Manager upgrade or two. In my case, I lost count somewhere around the 4.1-to-4.3 days. My record for the longest upgrade I was a part of is 72 hours…straight! It was painful, but such was life back then.

With the advent of virtualized Cisco voice and all of its associated parts, upgrades have definitely improved, but that does not mean the gotchas aren’t still lurking.

For one of my current customer projects I am upgrading a virtualized 9.x environment to 11.x. Unity Connection was upgraded first using the CLI. CUC luckily doesn’t have a whole lot of gotchas, assuming the engineer who originally built it used an OVA template and followed the correct steps. All you need to do is apply the new “keys” COP file (ciscocm.version3-keys.cop.sgn) and you are good to go. Maybe it isn’t quite that easy, but you get the picture. Communications Manager (Call Manager) is a far different animal. For the sake of automation, I’m doing the CM upgrade using Prime Collaboration Deployment (PCD). PCD is a separate application server that comes with your upgrade order on Cisco’s Product Upgrade Tool (PUT). PCD allows you to take a CM cluster (including IM&Presence) and script the upgrade so that you can basically click once and wait. If only it were that easy…
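
For anyone who hasn’t done a CLI upgrade on CUC, the flow looks roughly like this (just a sketch; the prompts are paraphrased, and the install files are assumed to be staged on an SFTP server you control):

admin: utils system upgrade initiate
   (choose Remote Filesystem and point it at the SFTP directory holding ciscocm.version3-keys.cop.sgn; let the COP file install)
admin: utils system upgrade initiate
   (run it again, this time selecting the 11.x ISO)
admin: utils system switch-version
   (flip to the new version when you’re ready for the reboot)
admin: show version active
   (confirm you’re running 11.x)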

There are a couple of gotchas that I’ve learned the hard way. I’ll list them below; hopefully they can help you out during your next upgrade.

  1. Licensing. If you are going to do an upgrade, do your homework. I could write an entire post on licensing, but for a 9.x to 11.x upgrade it really just involves doing cleanup. If you are getting that ugly little licensing warning from Cisco when you log into your system, clean it up before attempting to upgrade; you’ll save yourself a lot of pain later.
  2. Disk Space. I get it, hard drive space is cheap, but OVAs are not future-proof. The original mid-size build OVA for CM 9.x specified an 80 gig virtual drive. The 80 gig drive model is not supported by fresh 11.x installs. What bites engineers in the ass is a pesky little storage location known as the common partition. When the 11.x upgrade script first verifies that an upgrade is possible, it checks the amount of space available in that common partition; if less than 25 gigs is available, the upgrade will fail. You can check where you stand ahead of time, as shown below. There is an excellent Cisco Support Forums post about this failure here. So what do you do if the above scenario is true?
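
From the platform CLI, the check looks like this (same commands on CUCM and CUC; exact output varies by version):

admin: show status
   (the Disk/logging line reflects the common partition)
admin: show diskusage common
   (shows what is actually eating the common partition)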

There are 3 options…

  1. You can go into your server’s TFTP directory and manually remove old crap that you don’t need (see the cleanup sketch below). If you happen to remove crap that you do need, bad things WILL happen, so keep that in mind….
  2. You can apply the following COP file: ciscocm.free_common_space_v1.3.k3.cop.sgn. This little beauty will remove any software tied to the inactive partition of the system, which may include files sitting in the common partition, thus giving you your required space. This is a really cool idea/theory, but I’ve had mixed results.
  3. This is the scariest-sounding one, but it’s actually not that bad. There is a second COP file that you can run: ciscocm.vmware-disk-size-reallocation-1.0.cop.sgn. This file will allow you to resize your virtual disk in VMware ESXi and make CM OK with it (I tend to expand 80 gig disks to 160 as long as the physical systems allow it). **Changing the size of the disk without the COP file basically guarantees you a rebuild from scratch and possibly a résumé-generating event.** As I said, it’s not that bad, but there is a catch. In a couple of different cases, your results may vary.

1. If your CM system is running on a snapshot within ESXi, your virtual disk size adjustment option will be grayed out. This will inevitably cause you to panic and wonder if a PCD migration is your only option; it’s not. You can delete the snapshot, and once you do that you should be able to change your disk size (a quick ESXi-shell check for snapshots is sketched below). Once you do this, restart your VM and let it go through its process. It will reboot twice during start-up as it expands the disk, first in the “BIOS” and then in the initial CM boot process. If you’ve done it correctly you’ll see your partitions aligned and your new disk size in both the CLI and web console.

2. In some cases the allocation of your virtual disk may already be as large as the blocks within the disk controller will allow it to be. If this is the case, you have two options. Option 1: see option 1 above (manual cleanup). Option 2: migrate instead of upgrading using PCD. It will take more time, but your and/or your customer’s data should be safe and back where it belongs.
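
For option 1, the TFTP cleanup happens from the CLI. List first, delete carefully; the load file below is only an example of the kind of ancient firmware you might find:

admin: file list tftp * detail
   (look for old phone loads and device packs nothing uses anymore)
admin: file delete tftp SCCP45.8-5-3S.loads
   (repeat for each file you’re certain is safe to remove)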
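
And for the snapshot situation in case 1, if you’d rather check from the ESXi shell than click around vSphere, the standard vim-cmd calls will do it (the VM ID of 42 is a placeholder; get the real one from the first command):

vim-cmd vmsvc/getallvms
   (note the Vmid of your CM node)
vim-cmd vmsvc/snapshot.get 42
   (lists any snapshots hanging off the VM)
vim-cmd vmsvc/snapshot.removeall 42
   (deletes and consolidates them; the disk size field should un-gray itself afterward)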

Whew… that was a long post. I hope it helps someone. For those of you new to upgrades, you’re lucky; they keep getting easier. If you have questions, leave a comment and we can have a discussion. Thanks for reading!

 

-Justin


Cisco Prime Collaboration Provisioning – Canceling Failed Orders & Wait State Orders

Cisco’s Prime Collaboration Provisioning (formerly Cisco Unified Provisioning Manager) tends to evoke one of two distinct responses from Collaboration folks: seething hate or mild disdain. While I have several historical reasons to be in one or both of these camps, I actually like PCP, and when it is installed correctly and used for what it was designed for, I find it to be an effective and handy application.

With all of that said, if you’ve used PCP for more than a minute, you know that it can be fickle when it comes to pushing through bulk orders and completing tasks. You’ve probably had a bulk order that just sat there and sat there and never completed. Maybe you looked in the job log; I hope you did, but if you didn’t, you should next time. If your order has permanently seized, you’ll see a “wait” statement. If you see this “wait” statement, your job will not complete. Basically, the wait statement means that PCP encountered a problem that it does not have a definition for. If it had a definition, the job would end with an error and all would be good (albeit still with an error, but complete). For example, if you have an LDAP-synchronized CUCM but for whatever reason the user ID that PCP has and the user ID that CUCM has are different, PCP will try to create a user in CUCM and CUCM will tell it “no.” “No” should be an error, but not to PCP in this case.

On a side note, if you look at a successful job in progress you’ll see a “sleep” statement. Sleep means that the prerequisites are complete and PCP is just waiting for all of the requested changes to be completed in the downstream system before it completes its job. In PCP terms, “wait” is bad and “sleep” is good.

When you run into one of these “permanently seized” errors, you can reboot the box and hope that whatever gremlins caused the order to fail are dead and gone, but shouldn’t there be a better way? There is.

**There is a timeout value on the wait statements, so if you want to wait for the order to fail naturally you can, but during a deployment, time is almost never your friend.**

I found this little bit of joy on a Cisco forum a couple of years ago, and I think it is a good tool to keep in your back pocket when working with PCP. This command does require root access, but you create a password for the root user when you install/set up PCP, so that isn’t a problem.

/opt/cupm/sep/ipt/bin/AbortOrders.sh globaladmin <password> <order_number> -forced

The command above cancels all parts of a fouled/never-ending order. To run it you must SSH to the PCP box using root credentials. You must also have access to the globaladmin credentials, but hopefully that won’t be a problem. A sample session is sketched below.
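
A typical session looks something like this (the hostname and order number are placeholders for your own values):

ssh root@pcp.example.com
cd /opt/cupm/sep/ipt/bin
./AbortOrders.sh globaladmin <password> 12345 -forced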

Once you cancel the order in question, you can go back and fix whatever the problem may have been and reattempt the job.

I hope this long-winded bit of information helps someone out there. If you have questions or comments, leave them below.

-Justin