Mar 09 2015
 

A New Dawn

Starting in January 2015, Avaya has changed its official policy with regard to Microsoft Hotfix updates to AACC servers. Prior to this policy update, Microsoft Hotfixes were approved for installation only when tested and approved specifically by Avaya. There were numerous Hotfixes that were not approved, and if those Hotfixes were installed, Avaya could (and sometimes did) decline to support the customer site. As of the January 2015 policy update, only those Hotfixes specifically listed by Avaya as not compatible are restricted from installation.

What this means for the traditional customer is that the standard IT Security policy of installing the latest Microsoft Hotfixes to ensure OS security is now part of the approved processes for Avaya Aura Contact Center Servers. As long as the Hotfix was released prior to the last published date of the bulletin, and as long as Avaya has not discovered a specific fault, the Hotfix is supported for installation on AACC systems.

As of this blog post, all Microsoft Hotfixes released by Microsoft on or before 10 Feb 2015 are approved for installation on Avaya Aura Contact Center, if the AACC is Release 6.4 SP14. Service Pack 14 was released mid-December 2014. For older systems (AACC SP13 or earlier, or any NES CC or Symposium systems), the older policy remains in force: only those Hotfixes specifically tested and approved by Avaya are allowed to be installed. For extremely old systems (NES CC or Symposium) installed on Windows 2003 Server or earlier operating systems, the Microsoft end of life is also relevant.

Avaya Aura Contact Center runs on Windows 2008 Server R2 with specific server hardware engineering requirements. For more information about server specifications, please refer to the linked documentation [Avaya credentials required] or contact your support partner for assistance in ensuring hardware compliance.

Take Away

From a partner support perspective, this makes checking compatibility a much simpler endeavor– as long as the system is on SP14 or later, if the Hotfix isn’t listed then it’s OK to install. So the business partner need only look to see if any patches were installed after the “released before” date on the bulletin and only check those (or look for a limited number of specifically restricted hotfixes.)

From a customer support perspective, this means AACC server OS security can now be kept far more current than at any point in the history of the AACC product line.

This is great news for all concerned!

Recommendations

First, consult your support partner. Take their direction over anything you read on the internet. Installation of Service Packs for AACC is (these days) virtually a full dot release upgrade instead of the simple patch window we used to have with early AACC Service Packs or NES CC Service Updates. My experience is that instead of having a 2-5 hour window, windows are now consistently 4-7 hours, and potentially much longer if the system is Highly Available. And that doesn’t even take into account the pre-upgrade engineering that is necessary to ensure you don’t upgrade and then find yourself exceeding the hardware requirements on the AACC’s Windows 2008 Server hardware.

Second, if you are on anything prior to AACC 6.4 Service Pack 14, you should update to SP14 ASAP. This addresses many of the most common and well known issues on the AACC. Similarly, if you are on anything prior to AACC 6.x you should upgrade now. Windows 2003 Server will soon reach end of life. This will make NES CC6 and NES CC7 even more obsolete than they are now (since those systems are “functionally stable” and there are no “corrective content” plans for this manufacturer-discontinued product version). There are many reasons why you should upgrade, but to keep this focused on OS Security and Microsoft Hotfix compatibility: Windows 2008 Server will continue to receive additional Hotfix content. Windows 2003 Server, and earlier, will not.

Third, in the process of upgrading to SP14, you or your support partner should carefully review the readme to determine all of the known issues and known fixes for associated systems. There are engineering considerations on the PBX, PBX patches, Callpilot versioning (if you have ACCESS ports) and other considerations that should be taken into account. Some considerations aren’t part of the standard PBX DEPLIST, and by updating the DEPLIST the PBX patch required by the AACC Readme gets removed, resulting in recurring maintenance issues.

Jun 11 2014
 

https://support.avaya.com/downloads/download-details.action?contentId=C20145311538319080_3&productId=P0599&releaseId=UNIStim%205.x

Updated my Google doc table of IP Phone firmware:

Nov 07 2013
 

Issue

When IP Phones enter a reboot loop, attempt to “upgrade”, fail, then reboot again, or
When IP Phones enter a reboot loop, attempt to “upgrade”, fail with “FW authentication failure”, then reboot again

Environment

Avaya CS1000

UNIStim 5.0 or earlier

Avaya IP Phone 1100, Avaya IP Phone 1200

Cause

UNIStim firmware is digitally signed.

Signature has an expiration date.

UNIStim versions prior to 5.0 had shorter expiration dates.

New IP Phone hardware will not load firmware with expired signatures.

Source: http://downloads.avaya.com/css/P8/documents/100152833

Solution

Use UNIStim 5.1 or later firmware.

Avaya has applied a digital signature with a 10 year expiration date to UNIStim 5.1 and later.

UNIStim 5.5.1 (C8T) released in Aug 2013.

I updated my Google drive table of UNIStim firmware releases.

 

Nov 03 2013
 

Recently worked on an AACC (Avaya Aura Contact Center) where the partitioning of the server was determined to be the cause of the problem. While Disk Management (diskmgmt.msc) is easily accessible from START>>RUN, a screenshot is not quite as portable as text. To that end (and as a recommendation for addition to the Nortel Enterprise Audit Tool, or NEAT, used to survey Contact Center servers for Avaya engineering), I put together a script to query WMI (Windows Management Instrumentation) for the necessary information.

WMI Objects:

  • Win32_DiskDrive
  • Win32_DiskDriveToDiskPartition
  • Win32_DiskPartition
  • Win32_LogicalDiskToPartition

Using WMI queries against these objects you can derive:

  • Win32_DiskDrive => Physical Device ID (\\.\PHYSICALDRIVE0)
  • Win32_DiskPartition => Partition Device ID (Disk #0, Partition #1) and a derived type (e.g., Simple Volume? Primary Partition? Extended Partition/Logical Drive?)
  • Win32_LogicalDiskToPartition => Logical Drive Device ID (D:)

For quick “automated” checks of a system to verify compliance with engineering guidelines, this is a must.

Sample output:

\\.\PHYSICALDRIVE0,Disk #0, Partition #2,Basic,True,C:,Primary Partition
\\.\PHYSICALDRIVE0,Disk #0, Partition #3,Basic,False,D:,Extended Partition/Logical Drives
\\.\PHYSICALDRIVE0,Disk #0, Partition #3,Basic,False,F:,Extended Partition/Logical Drives
\\.\PHYSICALDRIVE0,Disk #0, Partition #3,Basic,False,G:,Extended Partition/Logical Drives
\\.\PHYSICALDRIVE0,Disk #0, Partition #3,Basic,False,T:,Extended Partition/Logical Drives

and

\\.\PHYSICALDRIVE0,Disk #0, Partition #2,Dynamic,True,C:,Simple Volume?
\\.\PHYSICALDRIVE0,Disk #0, Partition #3,Dynamic,True,D:,Simple Volume?
\\.\PHYSICALDRIVE0,Disk #0, Partition #3,Dynamic,True,F:,Simple Volume?
\\.\PHYSICALDRIVE0,Disk #0, Partition #3,Dynamic,True,G:,Simple Volume?
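Once the output is plain text, it is easy to post-process. The following is a minimal Python sketch (not the WMI script itself) that parses lines shaped like the samples above. The field names are my own guesses from the sample output, and the meaning of the True/False column is not stated here, so it is kept as an unnamed flag.

```python
import re
from collections import defaultdict

# Field names are assumptions inferred from the sample output; the True/False
# column is kept as a raw "flag" because its meaning is not documented here.
LINE_RE = re.compile(
    r"^\s*(?P<drive>\\\\\.\\PHYSICALDRIVE\d+),"
    r"(?P<partition>Disk #\d+, Partition #\d+),"
    r"(?P<disk_type>Basic|Dynamic),"
    r"(?P<flag>True|False),"
    r"(?P<letter>[A-Z]:),"
    r"(?P<kind>.+?)\s*$"
)


def parse_partition_line(line):
    """Split one output line into a dict of named fields."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("unrecognized line: %r" % line)
    return match.groupdict()


def letters_by_partition(lines):
    """Group drive letters under the partition that hosts them."""
    grouped = defaultdict(list)
    for line in lines:
        if line.strip():
            row = parse_partition_line(line)
            grouped[row["partition"]].append(row["letter"])
    return dict(grouped)
```

Feeding the first sample through `letters_by_partition` shows at a glance that D:, F:, G: and T: all live on the same extended partition.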

The cool thing is that the script is applicable for all systems going back to Windows 2000 (Symposium 4 if I recall correctly) when the WMI query objects were instantiated in the OS by Microsoft.

Oct 03 2013
 

Recently, a number of customers have been experiencing a rash of heartbeat issues between CS1000 components. In this article, I’m going to walk through some of the troubleshooting I’ve done recently and match symptoms to cause.

CS1000 RUDP Heartbeat behavior


Thump, Thump: The Heartbeat

There are several heartbeats between different CS1000 components. Call Servers (CPUs) have their own heartbeat: a highly available redundant system will have a heartbeat over the high-speed pipe (HSP). Call Servers using Geographical Redundancy (GR), Callpilot, and Avaya Aura Contact Center (AML implementations) each have a heartbeat with the Active CS. CS and IPMGs (IP Media Gateways, i.e., Media Gateway Controllers aka MGCs, or Voice Gateway Media Cards aka VGMCs aka MC32s) have a heartbeat.

Ports used

The RUDP Heartbeat uses port 15000 for source/destination. The 60 byte packet has a data payload of 6 bytes. While I haven’t worked out the meaning of the contents of all 6 bytes, I have worked out that the first 4 bytes are a sequence number that increments every successful round trip and the last 2 bytes are used as a send/receive flag. (i.e., 0x02ff for the originating side, 0x0100 for the responding side.)
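To make the payload layout concrete, here is a minimal Python sketch that splits the 6 bytes into the sequence number and the send/receive flag described above. The big-endian byte order is an assumption on my part, and the function name is hypothetical; this reflects my reading of packet captures, not any published format.

```python
import struct

ORIGINATOR_FLAG = 0x02FF  # observed on the side that starts the heartbeat
RESPONDER_FLAG = 0x0100   # observed on the side that echoes it back


def decode_heartbeat_payload(payload):
    """Return (sequence_number, role) for a 6-byte heartbeat payload.

    Assumes network (big-endian) byte order, which is not confirmed by
    any documentation.
    """
    if len(payload) != 6:
        raise ValueError("expected 6 bytes, got %d" % len(payload))
    sequence, flag = struct.unpack(">IH", payload)
    roles = {ORIGINATOR_FLAG: "originator", RESPONDER_FLAG: "responder"}
    return sequence, roles.get(flag, "unknown")
```

For example, `decode_heartbeat_payload(bytes.fromhex("0000002a02ff"))` would be read as sequence 42 sent by the originating side.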

RUDP

Reliable User Datagram Protocol (RUDP) was developed to give IP communication more reliability than plain UDP without the overhead of TCP.

Heartbeat Process

The CS sends an RUDP Heartbeat from port 15000 to port 15000 on the far end device (e.g., an MGC). The far end device repeats the pattern back to the originating device. This repeats every 1000 milliseconds.

The far end device likewise sends its own RUDP Heartbeat from port 15000 to port 15000 on the CS. The CS repeats the pattern back to the originating device. This also repeats every 1000 milliseconds.

Each RUDP Heartbeat uses its own sequence number and each successful heartbeat causes the originating device to increment its sequence number by one. e.g., the CS sends its payload, the MGC replies back, the CS increments its sequence number and 1000 milliseconds later sends another RUDP Heartbeat to the MGC. Meanwhile, on a separate 1000 millisecond timer and using a different sequence number, the MGC sends its payload, the CS replies back and the MGC increments its sequence number, sending another RUDP Heartbeat 1000 milliseconds after the previous one.

This means that there should be four (60 byte) packets passed between any two devices engaged in a RUDP Heartbeat exchange: A heartbeat from each side, as well as a response from each side.
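The exchange can be modelled in a few lines of Python. This is only a sketch of the behavior described above: the starting sequence numbers are made up, both timers are assumed to fire in the same second, and the reply is assumed to echo the originator's sequence number.

```python
PACKET_BYTES = 60  # size of each heartbeat packet, per the text above


def simulate_heartbeats(seconds, cs_start=100, mgc_start=5000):
    """Return (second, src, dst, sequence, flag) tuples for a CS/MGC pair.

    cs_start and mgc_start are hypothetical starting sequence numbers.
    """
    packets = []
    sequence = {"CS": cs_start, "MGC": mgc_start}
    peer = {"CS": "MGC", "MGC": "CS"}
    for second in range(seconds):
        for origin in ("CS", "MGC"):
            far_end = peer[origin]
            # Heartbeat from the originating side...
            packets.append((second, origin, far_end, sequence[origin], 0x02FF))
            # ...echoed back by the far end.
            packets.append((second, far_end, origin, sequence[origin], 0x0100))
            # A successful round trip increments only the originator's counter.
            sequence[origin] += 1
    return packets


one_second = simulate_heartbeats(1)
bytes_per_second = len(one_second) * PACKET_BYTES
```

One simulated second yields four packets, which at 60 bytes each is 240 bytes per second between the two devices.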

NOTE: I don’t have a lab system to test what number is used to start the sequence, but I suspect that the number is either randomly generated or derived from the current unix time value. There is also probably additional handshaking that goes on during RUDP Heartbeat setup; I certainly see traffic on various ports during periods when I get SRPT0308 and ELAN009/ELAN014 messages indicating session closure/restart. Without documentation though, or a lab and lots of time to decode the process, this will remain a mystery for the foreseeable future.

Sherlocking the problem (How to Troubleshoot)

HBdebug

HBdebug can be used to enable additional diagnostic information output to the PDT rd logs (rd, rdall). While the documentation claims it will output heartbeat diagnostics, I have seen SRPT0308 and ELAN009 events without a corresponding increase in rd log data.

However, I have also seen SRPT016 and ELAN009 which provide additional diagnostic information. Based on this difference in behavior, I’ve come to an educated guess about what it means when you get an ELAN009 with HBdebug diag info and what it means when you don’t get the info. Without HBdebug diag info, the most likely cause of the error message is a Firewall, i.e., Stateful Packet Inspection closing RUDP Port 15000 heartbeat sessions for “idle session” reasons, even though the RUDP Heartbeat transmits at least 120-240 bytes per second back and forth between the two devices.

Packet Captures

Packet captures covering the event are helpful to Avaya for troubleshooting root cause. When performing a packet capture, it is best to obtain them from a mirrored port of the CS and another mirrored port of the remote device experiencing the connectivity problems. While I’ve become proficient at reading the pcap log, it will take time for any new troubleshooter to become familiar with what’s in the logs and to be able to use them to self-diagnose the problem.

I like using the Wireshark filter string ((ip.src==10.10.10.10 || ip.src==10.10.20.20) && (ip.dst==10.10.10.10 || ip.dst==10.10.20.20) && udp.port==15000) — If 10.10.10.10 is the CS and 10.10.20.20 is the remote MGC, this will find any packets coming from either of them that are also destined for either of them while also being on UDP port 15000. Adjust IP addresses as needed and the UDP port as needed (15000, 32779 or 32780).
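If you build this filter often, a small helper saves retyping. This Python sketch only assembles the same display filter string shown above; the function name is hypothetical, and the addresses are validated with the standard `ipaddress` module before being substituted in.

```python
import ipaddress


def heartbeat_display_filter(ip_a, ip_b, udp_port=15000):
    """Build the Wireshark display filter for traffic between two hosts."""
    # ip_address() raises ValueError on a malformed address.
    a = str(ipaddress.ip_address(ip_a))
    b = str(ipaddress.ip_address(ip_b))
    return (
        f"((ip.src=={a} || ip.src=={b}) && "
        f"(ip.dst=={a} || ip.dst=={b}) && "
        f"udp.port=={int(udp_port)})"
    )


cs_to_mgc = heartbeat_display_filter("10.10.10.10", "10.10.20.20")
ipmg_filter = heartbeat_display_filter("10.10.10.10", "10.10.20.20", 32779)
```

Paste the resulting string into the Wireshark display filter bar.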

Network Analysis

Get a network topology & configs. Look at intermediate devices like Firewalls and WAN links, and check everything from physical layer issues up to network prioritization (QoS).

Why it happens (Root causes)

  • ELAN heartbeat traffic is supposed to be treated as Real Time traffic. When you’re bridging the ELAN across multiple physical locations, or routing it across a WAN, it’s best to give this traffic Expedited Forwarding (DSCP 46) so that it is routed as one of the highest priority packets in your network. The Heartbeat is “network control” after all.
  • WAN (carrier) may not support QOS. Packet loss, jitter and latency can all be treated as “lost” heartbeat packets.
  • Firewalls may not support QOS. For intra-site Firewalls, the QOS tags may be dropped at the Firewall– for this the network team needs to evaluate prioritization rules on ingress from the firewall (either side) to re-tag the packets. The Firewall may not support QOS, but if the equipment on both sides is prioritizing the traffic sent to the Firewall and re-tagging it on egress, it doesn’t make as much of a difference.
  • Firewalls may block traffic (Access Control Lists)– engage the network security team to make sure all necessary packets are allowed.
  • Speed/Duplex mismatches (or Autonegotiation failures)– Physical layer trumps everything. Trace your cables and make sure everything is connected. Perform network testing to identify any potential
  • Congestion, high network utilization, broadcast storms, etc.– Make sure there is sufficient bandwidth to serve your applications. This is rarely an issue in LAN environments, but is certainly an issue for numerous WANs.
  • Outage- equipment goes down, cables are removed/cut, WANs suffer failures reducing available bandwidth.

Other survivability considerations

  • Geographically redundant systems use RUDP Heartbeat.
  • Highly Available systems use RUDP Heartbeat over HSP, and will fail over to ELAN if possible.
  • Each CS/CPU in a CS1000E system uses RUDP Heartbeat with every other device in the environment that participates in Heartbeat activity. Unlike legacy CS1000M systems, the redundant CPU is not inactive.
  • IPMGs (VGMCs & MGCs) use RUDP Heartbeat to each CS. IPMGs and CSs also participate in an IPMG heartbeat process (using UDP Port 32779 & 32780 with a different payload structure than the RUDP Heartbeat sent over Port 15000.)
  • Branch Office CS participate in RUDP Heartbeat with the Main Office CS.
  • Secondary NRS participate in ICMP Heartbeat with Primary NRS, as do different UCM and SS systems.
  • Callpilot & AACC communicate with the CS over AML.
  • Callpilot & AACC communicate with each other over an ACCESS port connection.
  • IP Phones participate in RUDP heartbeat with the SS (as noted above, this is somewhat configurable).

Symptoms to Cause

  • Alarm pattern: CS1000 SRPT0308, ELAN009 / No reboot of remote device – Firewall Stateful Packet Inspection timer closes RUDP Heartbeat between devices.
  • Alarm pattern: CS1000 ELAN009, SRPT016 IPMG DOWN / Remote device reboots – Use LastResetReason on IPMG. Most likely due to loss of network connectivity due to packet loss, latency or jitter. Use HBdebug to perform further diagnoses and/or obtain a packet capture from both ends. Network analysis may be required to resolve.
  • IP Phones reboot – use usiQueryResetReason to obtain last reset reason. If caused by RUDP Heartbeat retry exhaust, evaluate RUDP Retry settings on linuxbase (UCM) and network cause.

Useful commands & notes

  • IP Phone RUDP status display & toggling state – Mute Up Down Up Down Up * 2 – Current RUDP state appears and one softkey is available to switch state, another softkey is available to exit. (Not available on all UNIStim releases.)
  • Linux/VGMC pbxLinkShow – Show link state, including RUDP information
  • PDT/VGMC rudpShow – Show RUDP information
  • VGMC usiQueryResetReason – Show reason for last reboot of an IP Phone
  • IPMG LastResetReason – Show reason for last reboot of an MGC
  • SS/Linux usiGetPhoneRudpSettings (7.5 or later) – Show the Retry & Timeout settings for RUDP Heartbeat between IP Phones and TPS (Terminal Proxy Server, located on the SS)
  • VGMC/SS/IPMG TPS.INI contains a retry limit called rudpWindowSize

Reference

  • Troubleshooting Guide for Distributors contains a section entitled “IPMG Call Server heartbeat mechanism” which talks more fully about the heartbeat mechanism between the CS and IPMGs. It also provides several examples of outages and the alarm pattern.
  • Search for RUDP in the Documentation for other references

Sep 19 2013
 

LLDP-MED for Avaya CS1000 IP Phones

Increasing boot efficiency is one of those things I’m working on: my personal or work PC, my IP Phone, systems I manage. The less time I spend sitting around waiting for something to boot up, the more time I have for something productive. On the PC, that involves looking at your startup folder and your registry Run keys, and removing any unnecessary services from automatic startup.

For Avaya CS1000 IP Phones, that involves looking at the config and determining which features can be added or removed to achieve an optimal boot up sequence.

Although my 4st post is not live yet (when it is, it will be here), in it I cover Link Layer Discovery Protocol (LLDP) and how it applies to Avaya CS1000 IP Phone deployment. One of the biggest inefficiencies I’ve found in CS1000 IP Phone deployments is where customers leave LLDP enabled but don’t use it.

Waiting for LLDP-MED (Link Layer Discovery Protocol, Media Endpoint Discovery) can add as much as 30 seconds delay to the boot process… So disable it if you’re not using it!

With stickiness, you can configure the Phone to not use LLDP on bootup, or you can disable it manually at each phone by turning it off.

On the other hand, if you use LLDP you might increase boot efficiency by distributing the configuration of the IP Phones and reducing dependency upon DHCP. If you want to configure the Voice VLAN but don’t use LLDP, your options are to manually configure each IP Phone or use the VLAN-A option to assign a Voice VLAN ID.

Avaya CS1000 IP Phone, DHCP provisioning behavior

If you use DHCP though, you’re going to be querying the DHCP server (or multiple servers) multiple times.

It’s certainly faster than waiting for LLDP-MED to time out, but using LLDP-MED is faster than multiple DHCP queries (although we’re talking about a fraction of the delay caused by LLDP-MED being enabled and unused).

It’s also a good idea to reduce the number of retries to allow the IP Phone to failover to an alternate signaling server (i.e. Connect Server) more quickly.

Take away:

  • If you’re not using a feature, disable it. Your phones will boot faster and you’ll recover more quickly from maintenance windows or disaster.
  • Nortel-i2004-B,s1ip=10.10.10.10;p1=4100;a1=1;r1=3;s2ip=10.10.10.20;p2=4100;a2=1;r2=3;vq=y;st=y;lldp=n;vvsource=a;
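When auditing DHCP scopes it helps to break a provisioning string like the one above into its key/value pairs. The Python sketch below does only that; the function name is hypothetical and it makes no attempt to validate what each key means. I read `lldp=n` as LLDP disabled and `r1`/`r2` as the retry counts discussed above, but the string itself does not say so.

```python
def parse_provisioning_string(value):
    """Split 'Nortel-i2004-B,key=val;key=val;...' into (prefix, params)."""
    prefix, _, body = value.partition(",")
    params = {}
    for item in body.split(";"):
        if not item.strip():
            continue  # skip the empty piece after the trailing ';'
        key, _, val = item.partition("=")
        params[key.strip()] = val.strip()
    return prefix, params


EXAMPLE = (
    "Nortel-i2004-B,s1ip=10.10.10.10;p1=4100;a1=1;r1=3;"
    "s2ip=10.10.10.20;p2=4100;a2=1;r2=3;vq=y;st=y;lldp=n;vvsource=a;"
)
prefix, params = parse_provisioning_string(EXAMPLE)
lldp_disabled = params.get("lldp") == "n"
```

A quick check of `lldp_disabled` across every scope is an easy way to spot sites that still have LLDP enabled but unused.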

Sep 19 2013
 

Tagged & Unregistered Frame Diagram

Recently found myself troubleshooting Untagged and Unregistered frame filtering on an Avaya Ethernet Routing Switch. This is a quick tutorial for future discussions with other engineers about how filtering works on an Avaya Ethernet Switch.

A quick description of tagged vs registered:

  • A tagged frame is a frame which has an 802.1q VLAN ID tag on the packet.
  • A registered frame is a frame which has an 802.1q VLAN ID tag on the packet that matches the VLAN MEMBERS configuration on the port.

In the above diagram, if the packet egressing from the first data switch is tagged with VLAN 10 (PVID or Primary VLAN ID 10), then the packet is both tagged and registered when it ingresses on the second data switch. However, if the packet is tagged with VLAN 20, the packet is tagged but unregistered when it ingresses on the second data switch.
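The definitions reduce to a simple membership test. The Python sketch below is a model for discussion only, not switch firmware behavior: `vlan_tag` is `None` for an untagged frame, and the two `filter_*` arguments are hypothetical stand-ins for the filter-untagged-frame and filter-unregistered-frame port settings.

```python
def classify_frame(vlan_tag, port_vlan_members):
    """Classify an ingress frame against the port's VLAN membership."""
    if vlan_tag is None:
        return "untagged"
    if vlan_tag in port_vlan_members:
        return "tagged, registered"
    return "tagged, unregistered"


def is_filtered(vlan_tag, port_vlan_members,
                filter_untagged=False, filter_unregistered=False):
    """Return True if the frame would be dropped on ingress in this model."""
    kind = classify_frame(vlan_tag, port_vlan_members)
    if kind == "untagged":
        return filter_untagged
    if kind == "tagged, unregistered":
        return filter_unregistered
    return False


# Second data switch port is a member of VLAN 10 only.
members = {10}
vlan10 = classify_frame(10, members)
vlan20 = classify_frame(20, members)
```

With both filters enabled, only the VLAN 10 frame in this example makes it past ingress.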

Untagged or unregistered filtering is for environments where the administrator wishes to protect against mistakes in recabling network devices or VLAN configuration mistakes. One historical issue with Nortel Ethernet Routing Switches is that if an unregistered frame is received and the receiving data switch does not know what to do with it, it may divert the packet to VLAN 1 (the default VLAN ID on all data switches). This can result in accidental broadcasts of extraneous packets.

A few examples of network changes that might result in problems:

  • Accidentally connecting a server to the second data switch trunk port– the server packets are filtered by filter-untagged-frame enable.
  • Accidentally connecting a different data switch to the second data switch trunk port– if the vlan membership and PVID settings are different, the server packets can be filtered by filter-untagged-frame or filter-unregistered-frame.
  • The first data switch is factory reset or the trunk port is configured with VLAN membership that need not be extended to the second data switch– if the 802.1q VLAN ID on the packet is not one of the VLAN member IDs on the ingress port, then the packet will be filtered by filter-unregistered-frame. If the first data switch port is configured as a non-trunk port (any other tagging configuration: tagging untagpvidonly, tagging untagall, tagging tagpvidonly) and the packet is untagged and sent to the second data switch, then the filter-untagged-frame setting may cause the packet to be filtered on ingress.

Michael McNamara posted an article back in 2007 discussing an issue with Avaya IP Phones where filter-unregistered-frame enable caused problems with IP Phone registration. The relevant excerpt is as follows:

The option (vlan ports 1-46 filter-unregistered-frames disable) was added after an issue was discovered when trying to upgrade the firmware on the IP phones. The filter-unregistered-frames is enabled by default and should be disabled to avoid and issues with upgrading the firmware on the IP phones. We are attempting to investigate further with Nortel and our voice vendor…

Take away:

  • The way I read the documentation, and I haven’t had a lot of opportunity to thoroughly test this, filtering is ingress only. This means you’re consuming bandwidth with improper egress port configuration. Depending on your environment, this wasted bandwidth might be minor or major.
  • Use of VLAN ID 1 is strongly frowned upon in a multiple VLAN environment. Don’t assign PVID 1 or VLAN ID 1 to any ports (i.e., remove VLAN 1 from all ports, because VLAN ID 1 is on everything in factory default configurations.) If you’re not doing any VLANing or any Layer 3 routing (i.e., simply Layer 2 switching only), then VLAN ID 1 is fine, but as soon as you start tagging packets on that switch you should stop using VLAN ID 1.
  • Filtering is best used on trunk ports to prevent broadcast storms or extraneous packets from being broadcast into a data switch (and the impact of extraneous packets from diverted unregistered frames is minimized if you do not use VLAN ID 1.)

What to do with this information:

  • Review all ports for VLAN membership or PVID of 1– if found and if you are a multi-VLAN environment, convert VLAN ID 1 to another VLAN ID on all ports then remove VLAN ID 1 from all ports (including PVID 1).
  • Review all ports set with tagging tagAll (including multi-link trunk (MLT) ports)
    • Compare VLAN membership and PVID settings on each side of a trunk connection.
    • Make sure VLAN membership matches on both sides.
    • Make sure the PVID on each side is a VLAN member on the opposite side.
    • Make sure the PVIDs match on both sides (I can’t think of any valid design reasons why PVIDs might not match.)
  • Review all trunk ports (tagging tagAll) and set filter-untagged-frame enable
  • Review all trunk ports (tagging tagAll) and set filter-unregistered-frame enable
  • Review all non-trunk ports (access ports) and set filter-untagged-frame and filter-unregistered-frame based on the network design
    • Note: If the port will not be receiving 802.1q tagged packets on ingress, then filter-untagged-frame and filter-unregistered-frame should be disabled.
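The trunk comparison steps above lend themselves to a scripted check. This is a minimal Python sketch under the assumption that you have already extracted each trunk port's PVID and VLAN membership from the switch configs yourself; the `TrunkPort` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrunkPort:
    name: str
    pvid: int
    vlan_members: frozenset


def trunk_mismatches(a, b):
    """List the review findings for the two ends of one trunk link."""
    problems = []
    if a.vlan_members != b.vlan_members:
        only_a = sorted(a.vlan_members - b.vlan_members)
        only_b = sorted(b.vlan_members - a.vlan_members)
        problems.append(
            f"VLAN membership differs: only on {a.name}: {only_a}, "
            f"only on {b.name}: {only_b}"
        )
    if a.pvid not in b.vlan_members:
        problems.append(f"PVID {a.pvid} of {a.name} is not a member on {b.name}")
    if b.pvid not in a.vlan_members:
        problems.append(f"PVID {b.pvid} of {b.name} is not a member on {a.name}")
    if a.pvid != b.pvid:
        problems.append(f"PVIDs differ: {a.pvid} vs {b.pvid}")
    for port in (a, b):
        if port.pvid == 1 or 1 in port.vlan_members:
            problems.append(f"{port.name} still uses VLAN ID 1")
    return problems


left = TrunkPort("switch1 port 48", 10, frozenset({10, 20}))
right = TrunkPort("switch2 port 48", 10, frozenset({10, 20}))
findings = trunk_mismatches(left, right)
```

An empty list means both ends agree; anything else is a line item for the next maintenance window.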

Apr 03 2013
 

  • LD 48
  • Enabling Application Module Link (AML) traces
    • enxp <msgi/msgo> <aml_id> <exclude_priority> … <exclude_priority>
      Example:
      enxp msgo 16 1
      enxp msgi 16 1
    • enl <msgi/msgo> <aml_id>
      Example:
      enl msgi 16
      enl msgo 16
  • Disabling
    • dis <msgi/msgo> <aml_id>
    • dsxp <msgi/msgo> <aml_id>

 

Mar 07 2013
 

I’ve been dealing with a number of complicated cases recently for work that require me to review large quantities of multi-megabyte trace files (text files containing raw log information). This is because the customer I’m working with uses a CTI application (Chrysalis Cloudburst) to control call routing and perform screen-pop functionality at the agent’s desktop when the call arrives. As part of that process, Cloudburst accepts the call from the IVR application and then performs a CTI route via MLSM (Meridian Link Services Manager).

--------    Output from mlsm.exe        Tue Feb 19 09:13:43.781 2013   (13:43.781)
ITR Route
                  03 1f 00 00 00 00 1e 34 a5 8b 00 00    header (12 bytes)       
                                             95 01 05    Subtype (Route)      
                                    96 04 00 00 2f c1    Call Id       
                                          4b 02 66 26    CDN (6626)      
                                          31 02 66 a3    Terminating DN (6603)      
                                             67 01 00    Conditional (unconditional)      
=======================================================================================
--------    Input                       Tue Feb 19 09:13:43.781 2013   (13:43.796)
ITS Route
                  03 22 00 00 00 00 1e 35 a5 8b 00 00    header (12 bytes)       
                                             95 01 05    Subtype (Route)      
                                    96 04 00 00 2f c1    Call Id       
                                          4b 02 66 26    CDN (6626)      
                                          31 02 66 a3    Terminating DN (6603)      
                                             67 01 00    Conditional (unconditional)      
                                             aa 01 00    Status (Successful request)      
=======================================================================================

When you’re looking for twenty or thirty trace entries across three to twelve trace files, representing hundreds (or potentially thousands) of trace entries… it gets to be both time consuming and tedious. I decided to code a parse utility to make my life easier. Part of that process is converting the decimal call ID (in the example above, decimal 12225) to a hex call ID and properly formatting that hex value (hex 2fc1) so that it can be searched for in the trace files.

While I have no intention of releasing the full parse utility to the public (if you really want a copy, contact me via LinkedIn, or comment on this blog and we’ll discuss it), I thought it might be helpful for those “do it yourselfers” who might want a hint at how to overcome one of the challenges I had to overcome to get this utility functional.

I’m using VBScript running as a cscript:

'**
'* function convertCIDtoCallID integerCallID
'* returns stringFormattedHexCallID
'*
'* Converts a decimal call ID to a hex string call ID with spaces between each byte (for use in searching AML logs)
'*

function convertCIDtoCallID( iCID )
 ' convert decimal Call ID to hex, then convert to lowercase
 hCID = lcase(hex(iCID))
 sRetVal = ""

 ' if the number of characters in the hex Call ID is odd, then we need a leading 0 inserted
 if (len(hCID) mod 2) = 1 then
  hCID = "0" & hCID
 end if

 ' for each character from 1 to the length of the hCID string
 for i=1 to len(hCID)
  ' group hex values in groups of two characters
  if ((i mod 2) = 1) and (i>2) then
   ' insert a space between hex value groups
   sRetVal = sRetVal & " "
  end if
  ' append the current character to the return value
  sRetVal = sRetVal & mid(hCID,i,1)
 next

 ' return the formatted hex call ID string
 convertCIDtoCallID = sRetVal

end function

What this does is convert a decimal value to hex (12225 to 2fc1) and then formats it the way it might appear in an AML trace (2f c1). It also automatically adjusts for scenarios where the hex value is 3 hex digits (hex fc1 = dec 4033) to something that can be searched for (0f c1) instead of mishandling (fc 1, which would not be found in the traces.)
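For anyone not working in Windows Script Host, the same conversion is short in Python. This is a sketch of the equivalent logic plus the reverse direction (trace bytes back to a decimal call ID); the function names are hypothetical and it is not part of the parse utility.

```python
def call_id_to_trace_hex(call_id):
    """Format a decimal call ID as space-separated lowercase hex bytes."""
    digits = format(call_id, "x")
    if len(digits) % 2:
        digits = "0" + digits  # pad to a whole number of bytes
    return " ".join(digits[i:i + 2] for i in range(0, len(digits), 2))


def trace_hex_to_call_id(hex_bytes):
    """Convert hex bytes copied from a trace back to a decimal call ID."""
    return int(hex_bytes.replace(" ", ""), 16)


search_term = call_id_to_trace_hex(12225)
padded_term = call_id_to_trace_hex(4033)
```

Note that the Call Id field in the trace above is four bytes wide (00 00 2f c1), so the formatted value appears there with leading zero bytes in front of it.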

This should save me hours in reviewing log files.

Aug 09 2012
 

Zone Bandwidth Strategy

One of the things I run into regularly with deployments is an issue where the installer improperly configures the Zone Bandwidth Strategy. This information can be checked by printing the interzone and intrazone configurations in LD 117.

=> prt zone
Use the PRT INTERZONE or PRT INTRAZONE

=> prt interzone
 ______________________________________________________________________________________________________________________
 |Near end  |Far end   |State| Type  |Stra|MO/ |QoS| Bandwidth | Sliding |  Usage  |  Quota  |Peak|  Calls  |  Alarm  |
 |          |          |     |       |tegy|BMG/|Fac|Configured |   max   |         |         |    |         |         |
 |          |          |     |       |    |VTRK|tor|           |         |         |         |    |         |         |
 |----------|----------|-----|-------|----|----|---|-----------|---------|---------|---------|----|---------|---------|
 |Zone|VPNI |Zone|VPNI |     |       |    |    | % |    kbps   |   kbps  |   kbps  |   kbps  | %  |   Cph   |   Aph   |
 |----|-----|----|-----|-----|-------|----|----|---|-----------|---------|---------|---------|----|---------|---------|
 |   1|     |    |     | ENL |SHARED |  BB|  MO|   |    1000000|         |      380|        0|   0|         |         |
 |--------------------------------------------------------------------------------------------------------------------|
 |   5|     |    |     | ENL |SHARED |  BB|VTRK|   |    1000000|         |      380|        0|   0|         |         |
 |--------------------------------------------------------------------------------------------------------------------|
 |  42|     |    |     | ENL |SHARED |  BB|  MO|   |    1000000|         |        0|        0|   0|         |         |
 |--------------------------------------------------------------------------------------------------------------------|

=> prt intrazone
 _________________________________________________________________________
 |Zone|State| Type  |Strategy|MO/ | Bandwidth |  Usage  |  Quota  | Peak |
 |    |     |       |        |BMG/|    kbps   |   kbps  |   kbps  |   %  |
 |    |     |       |        |VTRK|           |         |         |      |
 |----|-----|-------|--------|----|-----------|---------|---------|------|
 |   1| ENL |SHARED |   BQ   |  MO|    1000000|      380|        0|    0 |
 |-----------------------------------------------------------------------|
 |   5| ENL |SHARED |   BQ   |VTRK|    1000000|      380|        0|    0 |
 |-----------------------------------------------------------------------|
 |  42| ENL |SHARED |   BQ   |  MO|    1000000|        0|        0|    0 |
 |-----------------------------------------------------------------------|
  Number of Zones configured = 3

=>

The Strategy column for intrazone and interzone can be configured as either:

  • BQ aka Best Quality aka G.711 codec selection
  • BB aka Best Bandwidth aka G.729 codec selection

BB should be used when a WAN interface is handling VoIP with the following limitations/restrictions:

  • Bandwidth shaping or rate limiting restrictions
  • Bandwidth availability limitations (either the pipe is too small, or other usage + VoIP traffic exceeds available bandwidth)
  • No QoS policies configured

Even if you believe your WAN can support the bandwidth demands of VoIP + other usage, if you are experiencing any quality issues with VoIP, it’s worth changing your bandwidth strategy to BB (i.e., Best Bandwidth, G.729) as part of your troubleshooting efforts. I also recommend that you configure the available bandwidth to match the actual available bandwidth.

Interzone

As the word root suggests, the Interzone Strategy is used for all VoIP calls that leave the zone wherein the caller is configured. This strategy is also used for calls that are inter-site.

Intrazone

As the word root suggests, the Intrazone Strategy is used for all VoIP calls within the zone.

Purpose of Zones

Zones provide a way of controlling the behavior of VoIP and QoS within a site (and between sites). Zones work best when used to represent physical locations.

Example

Zone 1 is set up for all phones within a site (on the LAN). BQ strategy is used for intrazone and BB is used for interzone.

Zone 2 is set up for all phones at a branch office. BQ strategy is used for intrazone and BB is used for interzone.

Zone 3 is set up for all softphones connected via VPN (remote employees). BB strategy is used for both intrazone and interzone.

If a caller in Zone 1 calls anyone in Zone 1, they use G.711 (BQ). If they call anyone outside of Zone 1 (a VPN user softphone, or the branch office) they use the G.729 (BB strategy) codec.

If a caller in Zone 2 calls anyone in Zone 2, they use G.711 (BQ), but calls out of Zone 2 use G.729 (BB). Just like Zone 1.

If a caller in Zone 3 calls anyone they use G.729 (BB).
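The example can be written down as a small lookup. The Python sketch below models only the rule as described here (the caller's zone decides, intrazone for calls within the zone, interzone otherwise) and maps BQ to G.711 and BB to G.729; it is a simplification for illustration and does not model how the Call Server actually negotiates codecs.

```python
# Zone table taken from the example above.
ZONES = {
    1: {"intrazone": "BQ", "interzone": "BB"},  # main site LAN
    2: {"intrazone": "BQ", "interzone": "BB"},  # branch office
    3: {"intrazone": "BB", "interzone": "BB"},  # VPN softphones
}
CODEC_FOR_STRATEGY = {"BQ": "G.711", "BB": "G.729"}


def call_codec(caller_zone, callee_zone, zones=ZONES):
    """Pick the codec implied by the caller's zone strategy."""
    scope = "intrazone" if caller_zone == callee_zone else "interzone"
    strategy = zones[caller_zone][scope]
    return CODEC_FOR_STRATEGY[strategy]


within_site = call_codec(1, 1)
site_to_vpn = call_codec(1, 3)
```

Adding a hypothetical Zone 4 with BB interzone to the table and calling `call_codec(1, 4)` is a quick way to see what happens when co-located equipment is placed in a zone of its own.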

Mixed VoIP/TDM environments

TDM resources are sometimes configured to be in a separate zone by installers. I’ve never understood this, because following the logic above, if all the TDM equipment (Media Gateways, MGCs, MC32s, IP Trunks, etc.) are put in Zone 4 and everyone uses BB for interzone, then all calls between Zone 1 and TDM resources in Zone 4 will use the G.729 codec (BB) (even if those Zone 4 resources are physically located in the same site as Zone 1).

From my way of thinking, all physical TDM equipment should be configured in the zone appropriate to its physical location in the network. Zone 1 VoIP users should share the zone with Zone 1 TDM equipment/users. But, if there are TDM resources at the branch office, they would go into Zone 2 (using the above example).

Virtual Trunking would be a special case, in that Virtual Trunks are not used by IP Phones, so putting them in a separate zone has no negative impact (and the fact that they show up as a zone type in the Intrazone/Interzone print out tells us it was Nortel’s Design Intent that they be put in their own zone.)