Mar 09 2015

A New Dawn

Starting in January 2015, Avaya changed its official policy with regard to Microsoft Hotfix updates on AACC servers. Before this policy update, a Microsoft Hotfix was approved for installation only after Avaya had specifically tested and approved it. Numerous Hotfixes were never approved, and if those Hotfixes were installed, Avaya could (and sometimes did) decline to support the customer site. As of the January 2015 policy update, only the Hotfixes that Avaya specifically lists as not compatible are restricted from installation.

What this means for the typical customer is that the standard IT security practice of installing the latest Microsoft Hotfixes to keep the OS secure is now part of the approved processes for Avaya Aura Contact Center servers. As long as the Hotfix was released before the last published date of the bulletin, and as long as Avaya has not discovered a specific fault with it, the Hotfix is supported for installation on AACC systems.

As of this blog post, all Hotfixes released by Microsoft on or before 10 Feb 2015 are approved for installation on Avaya Aura Contact Center, provided the AACC is Release 6.4 SP14. Service Pack 14 was released in mid-December 2014. For older systems (AACC SP13 or earlier, or any NES CC or Symposium system), the older policy remains in force: only Hotfixes specifically tested and approved by Avaya may be installed. For extremely old systems (NES CC or Symposium) installed on Windows 2003 Server or earlier operating systems, Microsoft's end of life for those operating systems also comes into play.

Avaya Aura Contact Center runs on Windows 2008 Server R2 with specific server hardware engineering requirements (Avaya credentials required). For more information about server specifications, refer to the linked documentation or contact your support partner for help ensuring hardware compliance.

Take Away

From a partner support perspective, this makes checking compatibility much simpler: as long as the system is on SP14 or later, a Hotfix that isn't listed is OK to install. The business partner only needs to check whether any patches were installed after the "released before" date on the bulletin and review just those (or look for the limited number of specifically restricted Hotfixes).
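The partner-side check described above amounts to a simple filter. Here is a minimal Python sketch, assuming you already have a list of installed hotfixes with their Microsoft release dates plus the bulletin's restricted list; the `Hotfix` type, the function name and the KB numbers in the example are hypothetical, and the SP14 gate reflects the policy as described in this post, not an Avaya tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Hotfix:
    kb: str          # hypothetical KB identifier, e.g. "KB1000001"
    released: date   # Microsoft release date of the hotfix


def hotfixes_to_review(installed, bulletin_cutoff, restricted, service_pack):
    """Return the installed hotfixes a partner still needs to look at.

    Assumes an AACC 6.4 system identified only by its service pack number.
    Before SP14 the old "Avaya-approved only" policy applies, so every
    hotfix needs review. On SP14 or later, only hotfixes released after
    the bulletin's cut-off date, or explicitly restricted, need a look.
    """
    if service_pack < 14:
        return list(installed)
    return [h for h in installed
            if h.released > bulletin_cutoff or h.kb in restricted]


if __name__ == "__main__":
    installed = [
        Hotfix("KB1000001", date(2015, 1, 13)),
        Hotfix("KB1000002", date(2015, 3, 10)),
        Hotfix("KB1000003", date(2014, 11, 11)),
    ]
    for h in hotfixes_to_review(installed, date(2015, 2, 10), {"KB1000003"}, 14):
        print(h.kb)  # KB1000002 (after cut-off), then KB1000003 (restricted)
```

Using "released after the cut-off" as the test means anything released on the bulletin date itself passes, matching the "on or before" wording above.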

From a customer support perspective, this means AACC server OS security can be kept far more current than at any earlier point in the history of the AACC product line.

This is great news for all concerned!


First, consult your support partner, and take their direction over anything you read on the internet. Installing an AACC Service Pack is (these days) virtually a full dot-release upgrade rather than the simple patch window we used to have with early AACC Service Packs or NES CC Service Updates. In my experience, instead of a 2-5 hour window, maintenance windows are now consistently 4-7 hours, and potentially much longer if the system is Highly Available. And that doesn't even account for the pre-upgrade engineering needed to make sure you don't upgrade and then find that the new release exceeds what the AACC's Windows 2008 Server hardware can support.

Second, if you are on anything prior to AACC 6.4 Service Pack 14, you should update to SP14 as soon as possible; it addresses many of the most common and well-known AACC issues. Similarly, if you are on anything prior to AACC 6.x, you should upgrade now. Windows 2003 Server will soon reach end of life, which will make NES CC6 and NES CC7 even more obsolete than they already are (those systems are "functionally stable", and there are no "corrective content" plans for these manufacturer-discontinued product versions). There are many reasons to upgrade, but to keep this focused on OS security and Microsoft Hotfix compatibility: Windows 2008 Server will continue to receive new Hotfix content. Windows 2003 Server and earlier will not.

Third, as part of upgrading to SP14, you or your support partner should carefully review the readme for all known issues and known fixes affecting associated systems. There are engineering considerations for the PBX, PBX patches, Callpilot versioning (if you have ACCESS ports) and more. Some of these considerations aren't part of the standard PBX DEPLIST, so when the DEPLIST is later updated, the PBX patch required by the AACC readme can be removed, resulting in recurring maintenance issues.

Jun 11 2014

Updated my Google doc table of IP Phone firmware:

Nov 12 2013


Symptoms

  • Ports, cards or entire shelves are disabled during the midnight routine.
  • NWS messages are generated during the midnight routine, for example:
% NWS301 8 0 : -1 -2 -3 -4 -5 -6
% NWS101 1 : 24
% NWS211 24 : 0 1


Environment

  • Avaya CS1000, all releases
  • Nortel Meridian-1, all releases
  • Digital phones only


Cause

  • Cabling issues and/or unplugged phones cause "continuity test" failures during the midnight routine.
  • After a sufficient number of port-based continuity tests fail, the card reports a failure.
  • After a sufficient number of card-based continuity tests report failures, the shelf reports a failure.


Resolution

  • If a phone has been removed from the jack, reconnect it or de-program the TN.
  • If a phone cabling issue exists, fix the cabling.
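Since the fix comes down to reconciling programmed TNs against phones that are actually connected, a quick audit can find the stragglers before the midnight routine does. This is a minimal Python sketch; it assumes you have already collected both lists yourself (for example, from LD 20 printouts and a walk of the floor), and the (loop, shelf, card, unit) tuple format and the function names are illustrative assumptions. It does not model the actual failure thresholds, which are not documented here.

```python
from collections import Counter


def stale_tns(programmed, connected):
    """TNs that are programmed but have no phone attached.

    These are candidates to de-program, or to reconnect if the TN must stay.
    Each TN is a (loop, shelf, card, unit) tuple (an assumed format).
    """
    return sorted(set(programmed) - set(connected))


def stale_per_card(stale):
    """Count stale TNs per (loop, shelf, card).

    Following the escalation described above, cards with many
    unplugged-but-programmed units are the likeliest to report a failure.
    """
    return Counter(tn[:3] for tn in stale)


if __name__ == "__main__":
    programmed = [(4, 0, 3, 0), (4, 0, 3, 1), (4, 0, 3, 2), (4, 0, 5, 0)]
    connected = [(4, 0, 3, 0), (4, 0, 5, 0)]
    stale = stale_tns(programmed, connected)
    print(stale)                  # [(4, 0, 3, 1), (4, 0, 3, 2)]
    print(stale_per_card(stale))  # Counter({(4, 0, 3): 2})
```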


Notes

  • I first saw this when I was working for HellerEhrman. The site's telecom tech would deploy phones where needed, moving phones from existing workspaces to new ones, and would record all unused terminal numbers (TNs) in a personal document for later re-use instead of de-programming them. This was "speedier" for them than removing and reprogramming TNs: rather than programming an entire TN, they just plugged in a new phone, re-enabled the port, changed the DN and were good to go.
  • However, users began reporting that phones were being disabled during the midnight routine and had to be manually re-enabled the next business morning.
  • The issue was escalated to me (Firm-wide Telecom team).
  • I'd never seen this particular issue before and did not know the root cause.
  • I performed routine troubleshooting and recommended several corrective actions, including routine maintenance (cleaning up TNs, etc.). However, I had no authority over the site tech and could not force them to do the recommended work (nor did I know that the lack of routine maintenance was the proximate cause), so I was told to escalate to Nortel via our service provider.
  • The service provider had not seen it before and escalated to Nortel.
  • Nortel recommended routine maintenance: clean up all programmed TNs that were not going to be put back into service, or reconnect a phone to any TN that needed to remain.
  • Issue resolved.

I’ve seen a couple of these tickets recently at my place of employment. So far each one appears to be the same cause/solution. I’ll post a comment later if I learn anything new.

Nov 07 2013


Symptoms

IP Phones enter a reboot loop: they attempt to "upgrade", fail, then reboot again, or
IP Phones enter a reboot loop: they attempt to "upgrade", fail with "FW authentication failure", then reboot again.


Environment

Avaya CS1000

UNIStim 5.0 or earlier

Avaya IP Phone 1100, Avaya IP Phone 1200


Cause

UNIStim firmware is digitally signed.

Signature has an expiration date.

UNIStim 5.0 and earlier releases had shorter expiration dates.

New IP Phone hardware will not load firmware with expired signatures.



Resolution

Use UNIStim 5.1 or later firmware.

Avaya has applied a digital signature with a 10-year expiration date to UNIStim 5.1 and later.

UNIStim 5.5.1 (C8T) was released in Aug 2013.
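The resolution reduces to two checks: is the release at least 5.1, and is the signature still valid on the date the phone tries to load it. A minimal Python sketch follows; the dotted version-string format, the function names and the example expiry date are assumptions for illustration, not Avaya data.

```python
from datetime import date

MINIMUM_RELEASE = (5, 1)  # UNIStim 5.1 or later, per the resolution above


def parse_release(release):
    """Turn a dotted release string such as '5.5.1' into (5, 5, 1)."""
    return tuple(int(part) for part in release.split("."))


def meets_minimum(release, minimum=MINIMUM_RELEASE):
    """Tuple comparison orders (5, 0) < (5, 1) <= (5, 5, 1)."""
    return parse_release(release) >= minimum


def signature_valid(expires, today):
    """New phone hardware will not load firmware whose signature has expired.

    Treating the expiry date itself as still valid is an assumption.
    """
    return today <= expires


if __name__ == "__main__":
    print(meets_minimum("5.0"), meets_minimum("5.5.1"))  # False True
    # Hypothetical expiry date, for illustration only.
    print(signature_valid(date(2013, 6, 30), date(2013, 11, 7)))  # False
```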

I updated my Google drive table of UNIStim firmware releases.


Nov 03 2013

Recently I worked on an AACC (Avaya Aura Contact Center) server where the partitioning was determined to be the cause of the problem. While Disk Management (diskmgmt.msc) is easily accessible from START>>RUN, a screenshot is not nearly as portable as text. To that end (and as a recommended addition to the Nortel Enterprise Audit Tool, or NEAT, used to survey Contact Center servers for Avaya engineering), I put together a script that queries WMI (Windows Management Instrumentation) for the necessary information.

WMI Objects:

  • Win32_DiskDrive
  • Win32_DiskDriveToDiskPartition
  • Win32_DiskPartition
  • Win32_LogicalDiskToPartition

Using WMI queries against these objects you can derive:

  • Win32_DiskDrive => Physical Device ID (\\.\PHYSICALDRIVE0)
  • Win32_DiskPartition => Partition Device ID (Disk #0, Partition #1) and a derived type (e.g., Simple Volume? Primary Partition? Extended Partition/Logical Drive?)
  • Win32_LogicalDiskToPartition => Logical Drive Device ID (D:)

For quick “automated” checks of a system to verify compliance with engineering guidelines, this is a must.

Sample output (basic disk):

\\.\PHYSICALDRIVE0,Disk #0, Partition #2,Basic,True,C:,Primary Partition
\\.\PHYSICALDRIVE0,Disk #0, Partition #3,Basic,False,D:,Extended Partition/Logical Drives
\\.\PHYSICALDRIVE0,Disk #0, Partition #3,Basic,False,F:,Extended Partition/Logical Drives
\\.\PHYSICALDRIVE0,Disk #0, Partition #3,Basic,False,G:,Extended Partition/Logical Drives
\\.\PHYSICALDRIVE0,Disk #0, Partition #3,Basic,False,T:,Extended Partition/Logical Drives

Sample output (dynamic disk):

\\.\PHYSICALDRIVE0,Disk #0, Partition #2,Dynamic,True,C:,Simple Volume?
\\.\PHYSICALDRIVE0,Disk #0, Partition #3,Dynamic,True,D:,Simple Volume?
\\.\PHYSICALDRIVE0,Disk #0, Partition #3,Dynamic,True,F:,Simple Volume?
\\.\PHYSICALDRIVE0,Disk #0, Partition #3,Dynamic,True,G:,Simple Volume?

The cool thing is that the script works on all systems going back to Windows 2000 (Symposium 4, if I recall correctly), when Microsoft introduced these WMI classes in the OS.
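The original script isn't reproduced here, but the join it performs can be sketched. The Python below assumes the WMI results have already been fetched into plain dictionaries (how you fetch them is up to you) and shows only two things: the WQL `ASSOCIATORS OF` query used to walk from a disk drive to its partitions, with the backslashes in the DeviceID doubled as WQL string literals require, and the join that produces lines in the sample-output format. The `Partition` fields are assumptions: in particular, the True/False column is taken to be `Win32_DiskPartition.PrimaryPartition`, and Basic/Dynamic is treated as an already-derived input.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    device_id: str   # Win32_DiskPartition.DeviceID, e.g. "Disk #0, Partition #2"
    disk_kind: str   # "Basic" or "Dynamic"; derivation not shown (assumed input)
    primary: bool    # assumed to be Win32_DiskPartition.PrimaryPartition


def associators_query(wmi_class, device_id, assoc_class):
    """Build a WQL ASSOCIATORS OF query; backslashes in DeviceID are doubled."""
    escaped = device_id.replace("\\", "\\\\")
    return (f"ASSOCIATORS OF {{{wmi_class}.DeviceID='{escaped}'}} "
            f"WHERE AssocClass = {assoc_class}")


def layout(p):
    """Derived type column, mirroring the sample output."""
    if p.disk_kind == "Dynamic":
        return "Simple Volume?"
    return "Primary Partition" if p.primary else "Extended Partition/Logical Drives"


def report(drive_to_partitions, partitions, partition_to_letters):
    """Join drive -> partition -> drive letter into one line per letter."""
    rows = []
    for drive, part_ids in drive_to_partitions.items():
        for pid in part_ids:
            p = partitions[pid]
            for letter in partition_to_letters.get(pid, []):
                rows.append(",".join([drive, p.device_id, p.disk_kind,
                                      str(p.primary), letter, layout(p)]))
    return rows


if __name__ == "__main__":
    print(associators_query("Win32_DiskDrive", r"\\.\PHYSICALDRIVE0",
                            "Win32_DiskDriveToDiskPartition"))
    parts = {
        "Disk #0, Partition #2": Partition("Disk #0, Partition #2", "Basic", True),
        "Disk #0, Partition #3": Partition("Disk #0, Partition #3", "Basic", False),
    }
    for row in report({r"\\.\PHYSICALDRIVE0": list(parts)}, parts,
                      {"Disk #0, Partition #2": ["C:"],
                       "Disk #0, Partition #3": ["D:", "F:"]}):
        print(row)
```

The same query shape, with `Win32_DiskPartition` and `Win32_LogicalDiskToPartition`, walks from each partition to its drive letters.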

Oct 18 2013

In a previous post, I talked about documenting for co-workers how to set up the CS1000E. What prompted that documentation was the excessive amount of time I spent cleaning up after a field tech who refused to install it properly, and the resulting customer complaints about the phone system going down during maintenance windows.

Having the ability to add system redundancy (or resiliency) does not necessarily mean that a customer needs it, and sometimes the lack of redundancy simply isn't a factor in the customer's thinking. Knowing that you can prevent something (if you pay the associated costs) does not always make it worth the money or time.

This is a different decision from arguing that something doesn’t work a particular way– and today I ran into this problem with a customer who did not have the necessary redundant network connections and experienced an outage as a result.

In this particular case, either the cable and/or data switch port went bad. Had the customer installed the necessary redundancy, the failure of a single port would not have been noticed and the system would have kept on trucking.

As part of the post event discussion, I walked them through how the architecture supported additional redundancy and the extent to which that redundancy can be expanded. I decided to work up a diagram to more fully explain what I was talking about.

This diagram shows a CPPM CS (Call Processor Pentium Mobile, Call Server) connected to the passthrough port on the MGC. The passthrough port lets you simulate increased CS redundancy to the ELAN network by passing through to either active MGC ELAN interface.

The downside of this connection is that if the MGC undergoes maintenance, or the cable goes bad, you still have a single-point-of-failure.

I would do this only when the environment is also going to deploy redundant TLAN/ELAN data switches for increased network resiliency. Otherwise, connecting the CS directly to the ELAN network makes more architectural sense to me. (That way, if you're installing loadware on the MGC associated with the CS, you don't cause an outage to the entire system when the MGC is rebooted. There are architectural decisions that can work around some of that as well, but we're not going to cover every possible scenario in this article. Please feel free to comment below to engage in a discussion if you have questions or want to share your observations.)

The diagram also shows the redundant connections from the MGC (faceplate & rear-chassis) connected to a redundant data network. NOTE: I do not show the data switch connectivity with the rest of the network. That’s sort of beyond the scope of the CS1000 resiliency feature. You can, I’m sure, get the gist of it from this article.
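To make the single-point-of-failure argument concrete, here is a minimal Python sketch that fails one node or one link at a time and checks whether the CS can still reach the rest of the ELAN. The two topologies are simplified, hypothetical models of the options discussed above (CS via the MGC passthrough port versus CS directly on an ELAN switch), with MGC2 standing in for any other ELAN device; the real diagram has more detail.

```python
from collections import deque


def reachable(links, start, goal, dead_node=None, dead_link=None):
    """Breadth-first search over the links, skipping a failed node or link."""
    adjacency = {}
    for a, b in links:
        if dead_node in (a, b) or frozenset((a, b)) == dead_link:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def single_points_of_failure(links, start, goal):
    """Nodes or links whose individual failure cuts start off from goal."""
    nodes = {n for link in links for n in link} - {start, goal}
    found = [f"node {n}" for n in sorted(nodes)
             if not reachable(links, start, goal, dead_node=n)]
    found += [f"link {a}-{b}" for a, b in links
              if not reachable(links, start, goal, dead_link=frozenset((a, b)))]
    return found


# Hypothetical models: MGC1 has faceplate and rear-chassis ELAN links to
# two switches; MGC2 represents any other device on the ELAN.
PASSTHROUGH = [("CS", "MGC1"), ("MGC1", "SW1"), ("MGC1", "SW2"),
               ("SW1", "MGC2"), ("SW2", "MGC2")]
DIRECT = [("CS", "SW1"), ("MGC1", "SW1"), ("MGC1", "SW2"),
          ("SW1", "MGC2"), ("SW2", "MGC2")]

if __name__ == "__main__":
    print(single_points_of_failure(PASSTHROUGH, "CS", "MGC2"))  # ['node MGC1', 'link CS-MGC1']
    print(single_points_of_failure(DIRECT, "CS", "MGC2"))       # ['node SW1', 'link CS-SW1']
```

Both layouts leave one single point of failure on the CS side; what differs is which element it is. In the passthrough model it is the MGC itself, so rebooting that MGC for maintenance cuts the CS off, which is the concern raised above.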

Oct 03 2013

Recently, a number of customers have been experiencing a rash of heartbeat issues between CS1000 components. In this article, I’m going to walk through some of the troubleshooting I’ve done recently and match symptoms to cause.

CS1000 RUDP Heartbeat behavior

Thump, Thump: The Heartbeat

There are several heartbeats between different CS1000 components. Call Servers (CPUs) have their own heartbeat; a highly available redundant system will have a heartbeat over the high-speed pipe (HSP). CSs using Geographical Redundancy (GR), Callpilot, and Avaya Aura Contact Center (AML implementations) each have a heartbeat with the Active CS. The CS and IPMGs (IP Media Gateways, i.e., Media Gateway Controllers aka MGCs, or Voice Gateway Media Cards aka VGMCs aka MC32s) also have a heartbeat.

Ports used

The RUDP Heartbeat uses port 15000 as both source and destination. The 60-byte packet carries a 6-byte data payload. While I haven't worked out the meaning of all 6 bytes, I have worked out that the first 4 bytes are a sequence number that increments with every successful round trip, and the last 2 bytes are used as a send/receive flag (i.e., 0x02ff for the originating side, 0x0100 for the responding side).
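Based on the byte layout described above, a captured payload can be split with Python's `struct` module. This is a sketch under stated assumptions: reading both fields as big-endian (network byte order) is a guess consistent with how the flag values appear in a hex dump, and any flag other than the two observed values is simply reported as unknown.

```python
import struct

ORIGINATOR_FLAG = 0x02FF  # observed on the originating side
RESPONDER_FLAG = 0x0100   # observed on the responding side


def decode_heartbeat(payload):
    """Split a 6-byte RUDP heartbeat payload into (sequence, flag, role).

    Assumes a 4-byte sequence number followed by a 2-byte flag, both
    big-endian; the byte order is an assumption, not documented behavior.
    """
    if len(payload) != 6:
        raise ValueError(f"expected 6 bytes, got {len(payload)}")
    sequence, flag = struct.unpack("!IH", payload)
    role = {ORIGINATOR_FLAG: "originator",
            RESPONDER_FLAG: "responder"}.get(flag, "unknown")
    return sequence, flag, role


if __name__ == "__main__":
    print(decode_heartbeat(bytes.fromhex("0000002a02ff")))  # (42, 767, 'originator')
```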


Reliable User Datagram Protocol (RUDP) was developed for IP communications that need more reliability than plain UDP provides, without the overhead of TCP.

Heartbeat Process

The CS sends an RUDP Heartbeat from port 15000 to port 15000 on the far-end device (e.g., an MGC). The far-end device repeats the pattern back to the originating device. This repeats every 1000 milliseconds.

The far-end device likewise sends an RUDP Heartbeat from port 15000 to port 15000 on the CS. The CS repeats the pattern back to the originating device. This also repeats every 1000 milliseconds.

Each RUDP Heartbeat uses its own sequence number, and each successful heartbeat causes the originating device to increment its sequence number by one. For example, the CS sends its payload, the MGC replies, the CS increments its sequence number, and 1000 milliseconds later it sends another RUDP Heartbeat to the MGC. Meanwhile, on a separate 1000 millisecond timer and using a different sequence number, the MGC sends its payload, the CS replies, and the MGC increments its sequence number, sending another RUDP Heartbeat 1000 milliseconds after the previous one.

This means that there should be four 60-byte packets per second passed between any two devices engaged in an RUDP Heartbeat exchange: a heartbeat from each side, plus a response from each side.
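Putting the numbers together: two heartbeats and two responses per second at 60 bytes each. The Python sketch below computes that expected volume and scans one side's outgoing heartbeats (as (time in ms, sequence) pairs pulled from a capture) for two warning signs. Treating a sequence number that fails to advance as a failed round trip is an inference from the "increments on success" behavior described above, not documented behavior, and the 250 ms tolerance is an arbitrary placeholder.

```python
PACKET_BYTES = 60    # observed packet size, per the description above
INTERVAL_MS = 1000   # heartbeat interval on each side


def expected_traffic(seconds):
    """Packets and bytes between two devices: 2 heartbeats + 2 responses per interval."""
    packets = 4 * seconds
    return packets, packets * PACKET_BYTES


def find_anomalies(heartbeats, tolerance_ms=250):
    """Scan one side's outgoing heartbeats, given as (time_ms, sequence) pairs.

    Flags a sequence number that did not advance (assumed to mean the
    previous round trip failed) and any gap longer than one interval plus
    the tolerance. The 250 ms default is an arbitrary placeholder.
    """
    anomalies = []
    for (t0, s0), (t1, s1) in zip(heartbeats, heartbeats[1:]):
        if s1 == s0:
            anomalies.append((t1, "sequence did not advance"))
        if t1 - t0 > INTERVAL_MS + tolerance_ms:
            anomalies.append((t1, f"gap of {t1 - t0} ms"))
    return anomalies


if __name__ == "__main__":
    print(expected_traffic(1))  # (4, 240)
    sample = [(0, 10), (1000, 11), (2000, 11), (4500, 12)]
    print(find_anomalies(sample))
    # [(2000, 'sequence did not advance'), (4500, 'gap of 2500 ms')]
```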

NOTE: I don't have a lab system to test what number is used to start the sequence, but I suspect that the number is either randomly generated or derived from the current Unix time value. There is probably also additional handshaking during RUDP Heartbeat setup; I certainly see traffic on various ports during periods when I get SRPT0308 and ELAN009/ELAN014 messages indicating session closure/restart. Without documentation, though, or a lab and lots of time to reverse-engineer the process, this will remain a mystery for the foreseeable future.

Sherlocking the problem (How to Troubleshoot)


HBdebug can be used to enable additional diagnostic information output to the PDT rd logs (rd, rdall). While the documentation claims it will output heartbeat diagnostics, I have seen SRPT0308 and ELAN009 events without a corresponding increase in rd log data.

However, I have also seen SRPT016 and ELAN009 events accompanied by additional HBdebug diagnostic information. Based on this difference in behavior, I've come to an educated guess about what it means when you get an ELAN009 with HBdebug diagnostic info and what it means when you don't. Without HBdebug diagnostic info, the most likely cause of the error message is a firewall, i.e., Stateful Packet Inspection closing the RUDP port 15000 heartbeat session as "idle", even though the RUDP Heartbeat exchanges roughly 120 bytes per second in each direction (240 bytes per second in total) between the two devices.

Packet Captures

Packet captures covering the event are helpful to Avaya for troubleshooting root cause. When performing a packet capture, it is best to obtain them from a mirrored port of the CS and another mirrored port of the remote device experiencing the connectivity problems. While I’ve become proficient at reading the pcap log, it will take time for any new troubleshooter to become familiar with what’s in the logs and to be able to use them to self-diagnose the problem.

I like using the Wireshark filter string ((ip.src==[CS IP] || ip.src==[MGC IP]) && (ip.dst==[CS IP] || ip.dst==[MGC IP]) && udp.port==15000), where [CS IP] is the address of the CS and [MGC IP] is the address of the remote MGC. This finds any packets coming from either device that are also destined for either device while on UDP port 15000. Adjust the IP addresses and the UDP port as needed (15000, 32779 or 32780).
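If you build these filters often, a tiny helper avoids typos. A minimal Python sketch; the function name and the example addresses (from the 192.0.2.0/24 documentation range) are illustrative only.

```python
HEARTBEAT_PORTS = (15000, 32779, 32780)


def heartbeat_filter(cs_ip, remote_ip, port=15000):
    """Wireshark display filter for traffic between two devices on one UDP port."""
    if port not in HEARTBEAT_PORTS:
        raise ValueError(f"port {port} is not one of {HEARTBEAT_PORTS}")
    sources = f"ip.src=={cs_ip} || ip.src=={remote_ip}"
    destinations = f"ip.dst=={cs_ip} || ip.dst=={remote_ip}"
    return f"({sources}) && ({destinations}) && udp.port=={port}"


if __name__ == "__main__":
    print(heartbeat_filter("192.0.2.10", "192.0.2.20"))
    # (ip.src==192.0.2.10 || ip.src==192.0.2.20) && (ip.dst==192.0.2.10 || ip.dst==192.0.2.20) && udp.port==15000
```

Wireshark's `ip.addr` field matches either the source or the destination address, so `ip.addr==A && ip.addr==B && udp.port==15000` selects the same two-way traffic in a shorter form.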

Network Analysis

Get the network topology and device configurations. Look at intermediate devices such as firewalls and WAN links, and check everything from physical-layer issues up to network prioritization (QoS).

Why it happens (Root causes)

  • ELAN heartbeat traffic is supposed to be treated as Real Time traffic. When you’re bridging the ELAN across multiple physical locations, or routing it across a WAN, it’s best to give this traffic Expedited Forwarding (DSCP 46) so that it is routed as one of the highest priority packets in your network. The Heartbeat is “network control” after all.
  • The WAN (carrier) may not support QOS. Packet loss, jitter and latency can all cause heartbeat packets to be treated as "lost".
  • Firewalls may not support QOS. For intra-site Firewalls, the QOS tags may be dropped at the Firewall– for this the network team needs to evaluate prioritization rules on ingress from the firewall (either side) to re-tag the packets. The Firewall may not support QOS, but if the equipment on both sides is prioritizing the traffic sent to the Firewall and re-tagging it on egress, it doesn’t make as much of a difference.
  • Firewalls may block traffic (Access Control Lists)– engage the network security team to make sure all necessary packets are allowed.
  • Speed/duplex mismatches (or autonegotiation failures): the physical layer trumps everything. Trace your cables and make sure everything is connected. Perform network testing to identify any potential problems.
  • Congestion, high network utilization, broadcast storms, etc.– Make sure there is sufficient bandwidth to serve your applications. This is rarely an issue in LAN environments, but is certainly an issue for numerous WANs.
  • Outages: equipment goes down, cables are removed or cut, WANs suffer failures that reduce available bandwidth.

Other survivability considerations

  • Geographically redundant systems use RUDP Heartbeat.
  • Highly Available systems use RUDP Heartbeat over HSP, and will fail over to ELAN if possible.
  • Each CS/CPU in a CS1000E system uses RUDP Heartbeat with every other device in the environment that participates in Heartbeat activity. Unlike legacy CS1000M systems, the redundant CPU is not inactive.
  • IPMGs (VGMCs & MGCs) use RUDP Heartbeat to each CS. IPMGs and CSs also participate in an IPMG heartbeat process (using UDP Port 32779 & 32780 with a different payload structure than the RUDP Heartbeat sent over Port 15000.)
  • Branch Office CS participate in RUDP Heartbeat with the Main Office CS.
  • Secondary NRS participate in ICMP Heartbeat with Primary NRS, as do different UCM and SS systems.
  • Callpilot & AACC communicate over AML IP Phones
  • Callpilot & AACC communicate with each other over an ACCESS port connection.
  • IP Phones participate in RUDP heartbeat with the SS (as noted below, this is somewhat configurable).

Symptoms to Cause

  • Alarm pattern: CS1000 SRPT0308, ELAN009 / No reboot of remote device – Firewall Stateful Packet Inspection timer closes RUDP Heartbeat between devices.
  • Alarm pattern: CS1000 ELAN009, SRPT016 IPMG DOWN / Remote device reboots – Use LastResetReason on IPMG. Most likely due to loss of network connectivity due to packet loss, latency or jitter. Use HBdebug to perform further diagnoses and/or obtain a packet capture from both ends. Network analysis may be required to resolve.
  • IP Phones reboot – use usiQueryResetReason to obtain last reset reason. If caused by RUDP Heartbeat retry exhaust, evaluate RUDP Retry settings on linuxbase (UCM) and network cause.
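The symptom list above can be encoded as a first-pass lookup. This Python sketch only restates the three patterns listed; the function name and the boolean inputs are illustrative, and anything outside those patterns falls back to gathering more data.

```python
def triage(alarms, remote_rebooted=False, ip_phone_rebooted=False):
    """Map an observed alarm pattern to the likely cause and next step listed above."""
    seen = {alarm.strip().upper() for alarm in alarms}
    findings = []
    if {"SRPT0308", "ELAN009"} <= seen and not remote_rebooted:
        findings.append("Firewall stateful inspection likely closed the RUDP heartbeat session")
    if {"ELAN009", "SRPT016"} <= seen and remote_rebooted:
        findings.append("Run LastResetReason on the IPMG; suspect packet loss, latency or "
                        "jitter, then use HBdebug and packet captures from both ends")
    if ip_phone_rebooted:
        findings.append("Run usiQueryResetReason; if RUDP retries were exhausted, review the "
                        "RUDP retry settings and the network")
    return findings or ["Pattern not in the list; gather rd logs and packet captures"]


if __name__ == "__main__":
    print(triage(["SRPT0308", "ELAN009"]))
    print(triage(["ELAN009", "SRPT016"], remote_rebooted=True))
```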

Useful commands & notes

  • IP Phone RUDP status display & toggling state – Mute Up Down Up Down Up * 2 – Current RUDP state appears and one softkey is available to switch state, another softkey is available to exit. (Not available on all UNIStim releases.)
  • Linux/VGMC pbxLinkShow – Show link state, including RUDP information
  • PDT/VGMC rudpShow – Show RUDP information
  • VGMC usiQueryResetReason – Show reason for last reboot of an IP Phone
  • IPMG LastResetReason – Show reason for last reboot of an MGC
  • SS/Linux usiGetPhoneRudpSettings (7.5 or later) – Show the Retry & Timeout settings for RUDP Heartbeat between IP Phones and TPS (Terminal Proxy Server, located on the SS)
  • VGMC/SS/IPMG TPS.INI contains a retry limit called rudpWindowSize


  • Troubleshooting Guide for Distributors contains a section entitled “IPMG Call Server heartbeat mechanism” which talks more fully about the heartbeat mechanism between the CS and IPMGs. It also provides several examples of outages and the alarm pattern.
  • Search for RUDP in the Documentation for other references

Sep 26 2013

A co-worker was recently tasked with providing cross-training on T1 troubleshooting and alarm clearing for a product I don't have much experience with. After attending that class, I decided it would be fun to put together something similar (in blog format) for CS1000 T1 alarm clearing.

Receiving an alarm

There are a variety of PRI alarms, but we’ll take one of them as an example:

DTA021 [loop]

Interpreting the alarm

Some systems have part of the alarm lookup database on-system, which can be accessed via the Overlay Loader (OVL000 and a > prompt) using the ERR command.


If the alarm library is loaded with that alarm, then you’ll get the help text. If not, you’ll get an error:

OVL441 Help text not found for error code: [code]

In the documentation, every alarm number is four digits long, following the alarm group code. DTA is the digital trunking alarm group and 021 is the specific alarm, so you can find it in the documentation by searching for DTA0021. From the documentation we get the alarm text:

Frame alignment alarm persisted for 3 seconds
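The zero-padding rule is easy to automate when searching the System Messages guide. A minimal Python sketch; it assumes codes are letters followed by up to four digits, which covers the examples in these posts (DTA021, SRPT062, ELAN009, SRPT0308).

```python
import re


def doc_search_key(code):
    """Pad the numeric part of an alarm code to four digits for doc searches."""
    match = re.fullmatch(r"([A-Z]+)(\d{1,4})", code.strip().upper())
    if match is None:
        raise ValueError(f"not an alarm code: {code!r}")
    group, number = match.groups()
    return f"{group}{int(number):04d}"


if __name__ == "__main__":
    for code in ("DTA021", "SRPT062", "ELAN009", "SRPT0308"):
        print(code, "->", doc_search_key(code))
    # DTA021 -> DTA0021, SRPT062 -> SRPT0062, ELAN009 -> ELAN0009, SRPT0308 -> SRPT0308
```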

Responding to the alarm

Let’s talk briefly about some of the different tools available for troubleshooting:

  • LD 60 / Digital Trunk Interface and Primary Rate Interface Diagnostics
  • LD 96 / D-channel Diagnostics
  • LD 20 / Print Routine 1 – Used to identify a trunk's association with a Route Datablock (RDB).
  • LD 21 / Print Routine 2 – Used to list trunk members to determine trunk group associations between PRIs/DTIs.
  • LD 22 / Print Routine 3 – Used to print common equipment (CEQU) and D-channel configuration (ADAN DCH)
  • LD 73 / Digital Trunk Interface – Used to check clocking (DDB)

Digital Trunk Interface and Primary Rate Interface Diagnostics

DTI and PRI diagnostics (LD 60) cover a variety of tasks: you can enable/disable loops, clocking and individual bearer channels (B-channels or BCH), and print/clear counters. For a full list of commands, see the Software Input/Output Reference – Maintenance (NN43001-711).


  • STAT [loop] / STAT [loop channel] – Show the status of all loops or the specified loop. Loop status includes the loop state and BCH state.
  • LCNT [loop] – List counters.
  • RCNT [loop] – Reset counters.
  • SSCK [core] / SSCK [loop shelf] – Show the system clock, including which circuit is being used for primary clocking and the clock state.
  • SWCK – Swap the clock from the current active to the current standby.
  • TRCK [source] – Set clock controller tracking to PCK (primary clock), SCK (secondary clock) or FRUN (free run, no clock).
  • ENLL [loop] – Enable loop.
  • ENCH [loop channel] – Enable B-channel.
  • DISL [loop] – Disable loop.
  • DISI [loop] – Disable loop when idle. Disables any idle B-channels, then waits until the remaining B-channels are disabled; once all B-channels are disabled, it disables the loop.
  • DSCH [loop channel] – Disable B-channel.

D-channel Diagnostics

DCH diagnostics (LD 96) cover enabling/disabling D-channels, D-channel monitors, and working with MSDL or TMDI cards. On larger legacy 1000M systems, the Multipurpose Serial Data Link (MSDL) card provides D-channel functionality. On smaller 1000M systems and newer 1000E systems, D-channel functionality is built into the TMDI (T1 Multipurpose Digital Interface) card. In this article, we will not cover D-channel troubleshooting for MSDL cards on larger 1000M systems.


  • STAT DCH [dch] – Show the status of all D-channels or the specified D-channel.
  • ENL DCH [dch] – Enable DCH.
  • DIS DCH [dch] – Disable DCH.
  • STAT TMDI [card] – Show TMDI status (CS1000M small system).
  • STAT TMDI [loop shelf card] – Show TMDI status (CS1000E).
  • DIS TMDI [card] – Disable TMDI (CS1000M small system).
  • DIS TMDI [loop shelf card] – Disable TMDI (CS1000E).
  • SLFT TMDI [card] – Self-test TMDI (CS1000M small system). Performs multiple hardware tests to verify the TMDI is functional.
  • SLFT TMDI [loop shelf card] – Self-test TMDI (CS1000E). Performs multiple hardware tests to verify the TMDI is functional.
  • ENL TMDI [card [fdl]] – Enable TMDI (CS1000M small system). Optional FDL (full download of the TMDI EPROM).
  • ENL TMDI [loop shelf card [fdl]] – Enable TMDI (CS1000E). Optional FDL (full download of the TMDI EPROM).
  • PLOG DCH [dch]

Print Routine 1

The command architecture for the CS1000 is built on the older Meridian-1 systems, which in turn is built upon the even older SL-1 systems. When the SL-1 hardware architecture was replaced or improved, Nortel introduced new commands or Overlays as needed, all while keeping the essential command structure introduced with the first SL-1 system in the mid-1970s.

Print Routine 1 covers peripheral programming, including the bearer channel (BCH) configuration for a T1. From the Terminal Number configuration of a BCH, it is possible to identify the route membership for a particular channel, and by extension the T1. (While it is technically possible to configure different channels within a T1 to belong to multiple routes, I’ve never seen this and excepting MUXed circuits I am not aware of any reason why it might be done.)

Print Routine 2

Print Routine 2 covers customer datablock configurations, including route datablock (RDB) settings. By using the List Trunk Members command, it is possible to identify all of the BCH (and by extension all the T1s) that belong to a particular route (i.e., trunk group).
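Once you've pulled the route for each B-channel out of the TN printouts, the route-to-T1 mapping can be cross-checked with a little grouping. A minimal Python sketch; the (loop, channel, route) tuples are an assumed, already-parsed form of the printout, and the example data is made up.

```python
from collections import defaultdict


def routes_by_loop(bchannels):
    """Map each loop (T1) to the set of routes its B-channels belong to."""
    result = defaultdict(set)
    for loop, _channel, route in bchannels:
        result[loop].add(route)
    return dict(result)


def loops_by_route(bchannels):
    """Map each route (trunk group) to the set of loops (T1s) carrying it."""
    result = defaultdict(set)
    for loop, _channel, route in bchannels:
        result[route].add(loop)
    return dict(result)


def mixed_loops(bchannels):
    """Loops whose channels are split across routes (rare, as noted above)."""
    return sorted(loop for loop, routes in routes_by_loop(bchannels).items()
                  if len(routes) > 1)


if __name__ == "__main__":
    # Made-up (loop, channel, route) data for illustration.
    sample = [(10, 1, 5), (10, 2, 5), (11, 1, 5), (12, 1, 7), (12, 2, 8)]
    print(loops_by_route(sample))  # {5: {10, 11}, 7: {12}, 8: {12}}
    print(mixed_loops(sample))     # [12]
```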

Print Routine 3

Print Routine 3 covers hardware and system configuration data, such as the Common Equipment (CEQU) datablock and Action Device and Number (ADAN) datablock, the latter of which is used to store information about D-channel configuration.

Digital Trunk Interface

When building a PRI/DTI in a CS1000 system for the first time, the Clock Controller and Alarm Threshold values must be set. For systems in the USA, the DDB (digital data block) configuration record contains the relevant configuration settings.

Diagnostics vs Print Routines & configuration

LD 60 and 96 are used primarily for diagnostics. Overlays 20-22 and 73 are used for configuration review to assist with diagnostics. For the purposes of this article, we will assume that a configuration issue is not at fault. Perhaps in some future article I might cover PRI configuration in more detail.


NN43001-611 Software Input/Output Reference – Administration

NN43001-711 Software Input/Output Reference – Maintenance

NN43001-712 Software Input/Output Reference – System Messages

NN43001-301 ISDN Primary Rate Interface Installation and Commissioning

Sep 19 2013

LLDP-MED for Avaya CS1000 IP Phones

Increasing boot efficiency is one of those things I'm working on: my personal and work PCs, my IP Phone, the systems I manage. The less time I spend sitting around waiting for something to boot up, the more time I have for something productive. On a PC, that means looking at your startup folder and registry Run keys, and removing any unnecessary services from automatic startup.

For Avaya CS1000 IP Phones, that involves looking at the config and determining which features can be added or removed to achieve an optimal boot up sequence.

Although my 4th post is not live yet (when it is, it will be here), in it I cover Link Layer Discovery Protocol (LLDP) and how it applies to Avaya CS1000 IP Phone deployment. One of the biggest inefficiencies I've found in CS1000 IP Phone deployments is customers leaving LLDP enabled without using it.

Waiting for LLDP-MED (Link Layer Discovery Protocol, Media Endpoint Discovery) can add as much as 30 seconds of delay to the boot process, so disable it if you're not using it!

With stickiness, you can configure the phone not to use LLDP on bootup, or you can disable LLDP manually on each phone.

On the other hand, if you use LLDP, you might increase boot efficiency by distributing the IP Phone configuration and reducing dependency on DHCP. If you want to configure the voice VLAN but don't use LLDP, your options are to configure each IP Phone manually or to use the VLAN-A option to assign a voice VLAN ID.

Avaya CS1000 IP Phone, DHCP provisioning behavior

If you use DHCP, though, you're going to be querying the DHCP server (or multiple servers) multiple times.

That's certainly faster than waiting for LLDP-MED to time out, but using LLDP-MED is faster still than multiple DHCP queries (although the difference is only a fraction of the delay caused by leaving LLDP-MED enabled and unused).

It's also a good idea to reduce the number of retries so the IP Phone fails over to an alternate signaling server (i.e., Connect Server) more quickly.

Take away:

  • If you’re not using a feature, disable it. Your phones will boot faster and you’ll recover more quickly from maintenance windows or disaster.
  • Example DHCP option string with LLDP disabled (lldp=n) and stickiness on (st=y): Nortel-i2004-B,s1ip=;p1=4100;a1=1;r1=3;s2ip=;p2=4100;a2=1;r2=3;vq=y;st=y;lldp=n;vvsource=a;
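Because the option string is just a prefix followed by semicolon-delimited key=value pairs, it's easy to sanity-check before you load it into the DHCP server. A minimal Python sketch; the function names are hypothetical, and the only rule it checks is the one from this post (lldp=n when LLDP-MED is not in use).

```python
def parse_i2004_option(option):
    """Split 'Nortel-i2004-B,key=value;key=value;...' into a dict."""
    prefix, separator, body = option.partition(",")
    if not separator or not prefix.startswith("Nortel-i2004"):
        raise ValueError("expected a Nortel-i2004 option string")
    settings = {}
    for item in body.split(";"):
        if not item:
            continue  # the trailing ';' leaves an empty element
        key, _, value = item.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def lldp_disabled(settings):
    """True only when the string explicitly turns LLDP off."""
    return settings.get("lldp", "").lower() == "n"


if __name__ == "__main__":
    example = ("Nortel-i2004-B,s1ip=;p1=4100;a1=1;r1=3;s2ip=;p2=4100;a2=1;r2=3;"
               "vq=y;st=y;lldp=n;vvsource=a;")
    settings = parse_i2004_option(example)
    print(settings["r1"], settings["lldp"], lldp_disabled(settings))  # 3 n True
```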

Sep 17 2013

Avaya CS1000 IP Phone registration process, DHCP (c) 2013 John Williams, DATARAVE

In my 3rd company blog article, I talked about DHCP and auto-configuration via DHCP. There are fifteen different feature groups (IP Deskphones Fundamentals, NN43001-368, Appendix B: Provisioning the IP Phones) and 100+ different settings configurable via DHCP (I didn't count them; I'm estimating). Some of the gotchas I've learned over the years are as follows:

  • As noted in the 3rd blog post, stickiness can cause deployment issues for refurb phones. Always add a factory reset (**RENEW[MAC]##, introduced in UNIStim 3.0) to your troubleshooting efforts; otherwise your refurb phone may not operate as expected when first deployed.
  • When you have multiple Signaling Servers, always make sure that both Sig Servers are using the same IP Phone firmware; otherwise your phones may "upgrade" to an earlier firmware release when they fail over, resulting in extended and unexpected downtime.
  • Know your current UNIStim firmware. There have been issues in the past where newer IP Phone hardware requires a minimum firmware level, which means that all IP Phones in the environment must be upgraded in order to deploy new phones.
  • Learn how to use PRT TNB commands in LD 20 and STAT IP commands in LD 117 (I'm actually planning a future blog post on this topic, since the LD 117 commands can also be used to inventory your IP phones):
    • Make sure that the TNB is programmed, otherwise you’ll get SRPT062/SRPT0062 – Request to register TN rejected. Reason = UNEQUIPPED.
    • Make sure the set isn’t deployed elsewhere, otherwise you’ll get SRPT062/SRPT0062 – Request to register TN rejected. Reason = DUPLICATE.
    • Make sure the set is the right TYPE, otherwise you’ll get SRPT062/SRPT0062 – Request to register TN rejected. Reason = WRONGTYPE.
    • Make sure the set is of the right TN_TYPE (covered in part 2 of the company series), otherwise you’ll get SRPT062/SRPT0062 – Request to register TN rejected. Reason = CANNOTCONVERT
      • 2004P1, 2004P2, 2210, 2211, 2007, 1140, 2050PC, 2050MC, 6140
      • 2002P1, 2002P2, 1120, 1220, 6120
      • 2001P2, 2033, 1110, 1210
      • 1230
      • 1150
  • Know how to troubleshoot your data network– Always make sure that your provisioning method is supported in the subnet where the IP phone is being deployed and know how to check the data network configuration, otherwise your phone might get stuck at “Starting DHCP…” or “Connecting to S1…” (or “Connecting to S2…”)
    • I can't tell you how many times I've seen a telecom tech install an IP Phone into a brand new subnet with no DHCP configuration options, or on a new data switch without the correct VLAN settings, and then request help, only to be handed a bill for not setting up the new subnet correctly.
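One way to read the model groups listed above is as sets of phones that share a TN type, so a phone should only register against a TN programmed for a model in its own group. A minimal Python sketch under that reading (an interpretation of the list, not an Avaya table); the function names are hypothetical.

```python
TN_TYPE_GROUPS = [
    {"2004P1", "2004P2", "2210", "2211", "2007", "1140", "2050PC", "2050MC", "6140"},
    {"2002P1", "2002P2", "1120", "1220", "6120"},
    {"2001P2", "2033", "1110", "1210"},
    {"1230"},
    {"1150"},
]


def group_of(model):
    """Index of the group containing model; KeyError if it is not listed."""
    for index, group in enumerate(TN_TYPE_GROUPS):
        if model in group:
            return index
    raise KeyError(f"model {model!r} is not in the list")


def same_tn_type(programmed_model, phone_model):
    """True if both models fall in the same group (assumed to share a TN type)."""
    return group_of(programmed_model) == group_of(phone_model)


if __name__ == "__main__":
    print(same_tn_type("2004P2", "1140"))  # True
    print(same_tn_type("1120", "1140"))    # False: likely CANNOTCONVERT under this reading
```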

Another thing that I find extremely helpful is learning the provisioning cycle and the status messages which display on the IP Phone during boot up:

Provisioning step – Display message
  • LLDP – Waiting for Cfg Data…
  • DHCP – Starting DHCP…
  • Manual Provisioning – Prov. (or whatever IP you have)
  • system.prv – reading… / (system.prv failed may display) / Attempting TFTP…
  • Registration (pre-connect) – Connecting to S1 / Connecting to S2
  • Registration (post-connect) – Connect Svc / Node & TN / Connect Svc

This way, if you get stuck at a particular phase, you know where you are and can use that to determine your next troubleshooting step. For example:

  • If you get stuck at “Starting DHCP…” then you should check (a) is your DHCP server still running, (b) are you in a subnet that provides the DHCP options you need, (c) can your IP phone reach the DHCP server, (d) does the IP Phone firmware support the features you’re trying to push to it?
  • If you get stuck at "Attempting TFTP…", check (a) is your TFTP server still running, (b) can your IP Phone reach your TFTP server, (c) are your PRV files correctly configured?
  • If you get stuck at “Connecting to S1” (or S2), (a) is your IP Phone in the right VLAN, (b) Can the IP Phone reach the signaling server, (c) is the signaling server still running?
  • If you get stuck at “Connect Svc Forwarding…” then there is something wrong with your signaling server– it’s responded to the initial connect request, and it’s forwarded the request to the signaling server but the signaling server isn’t responding correctly to complete the registration process.
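The checklist above maps naturally onto a lookup keyed by the message the phone is stuck on. A minimal Python sketch; the dictionary simply restates the checks listed above, and matching on the start of the display text is an assumption about how you'd record what the phone shows.

```python
CHECKS = {
    "Starting DHCP": [
        "Is the DHCP server still running?",
        "Does this subnet provide the DHCP options you need?",
        "Can the IP Phone reach the DHCP server?",
        "Does the IP Phone firmware support the features being pushed?",
    ],
    "Attempting TFTP": [
        "Is the TFTP server still running?",
        "Can the IP Phone reach the TFTP server?",
        "Are the PRV files configured correctly?",
    ],
    "Connecting to S": [  # covers "Connecting to S1" and "Connecting to S2"
        "Is the IP Phone in the right VLAN?",
        "Can the IP Phone reach the signaling server?",
        "Is the signaling server still running?",
    ],
    "Connect Svc Forwarding": [
        "Signaling server accepted the connect request but is not completing registration",
    ],
}


def checks_for(display):
    """Return the checks for the message the phone is stuck on, or []."""
    text = display.strip()
    for prefix, checks in CHECKS.items():
        if text.startswith(prefix):
            return checks
    return []


if __name__ == "__main__":
    for check in checks_for("Connecting to S2"):
        print("-", check)
```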