Avaya FSY – Filesync Alarms
Learn the ins and outs of the Avaya file synchronization
In this post, “Avaya FSY – Filesync Alarms,” you will learn five steps to identify and troubleshoot the file synchronization errors and alarms raised whenever the main server cannot synchronize its data files with its peers (ESS and LSPs).
File synchronization, or Filesync (rsync), uses the “Filesyncd” application to push files from the core system out to its peers. The more errors I found, the more I found myself going in circles and chasing my tail. To help you pinpoint the root causes of filesync-related errors and alarms, I created this resource, which should save you time and teach you a few things along the way.
Here are the troubleshooting procedures:
- 1.- Understand FSY Alarms
- 2.- Identifying the FSY-Filesync
- 3.- Document existing alarms
- 4.- Filesync Troubleshooting Commands
- 5.- Execute the plan
1.- Understand FSY Alarms
As files are shared between the server's file structure and its ESS/LSP branches, it is important that the flow of these files is not interrupted while they are being written or transferred. Filesync alarms fall into two families:
Fileset Configuration File Errors – These are associated with the files shared by the system itself. The event IDs for this type of alarm range from 1000 to 1999.
Server Communication Failure – As the name states, these errors and alarms relate to the core server's communication or connectivity with its peers. Their event IDs range from 1 to 999.
See the Resources section for more details on the error IDs.
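If you track alarms across several servers, the two ranges above are easy to encode. A minimal Python sketch, assuming the ID ranges given in this post; the function name is my own and is not an Avaya tool:

```python
def fsy_alarm_family(event_id: int) -> str:
    """Map an FSY event ID to its alarm family, using the ranges from this post."""
    if 1 <= event_id <= 999:
        return "Server Communication Failure"
    if 1000 <= event_id <= 1999:
        return "Fileset Configuration File Error"
    # Anything outside both ranges is not covered by this post.
    return "Unknown"
```

For example, the alarm used later in this post has event ID 1, so it belongs to the Server Communication Failure family.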
2.- Identifying the FSY-Filesync
The most common file synchronization errors are “Server Communication Failure” alarms. As with anything that involves networking over Ethernet LANs in a packet-switched network, settings such as link speed, duplex, and time must be negotiated and kept consistent on all nodes to keep the systems synchronized.
The following are some of the elements to look at:
Speed and Duplex – Mismatches are rare nowadays, but never assume someone else's configuration. Make sure these settings match across the board.
Local or Network Time Settings – This is a very important setting that most of us forget to check. The clocks on every node must agree, because time-sensitive data such as SSL certificates is validated against each server's local clock.
Firewalls and Routers – These should be configured to allow the necessary ports and packets to reach their destination (filesync uses TCP port 21874).
System Files – Patch levels must be the same or newer on each peer server. Other files to keep an eye on are the license and authentication files.
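To avoid assuming anyone's configuration, it helps to put the values for each node side by side. The sketch below is hypothetical: the inventory dictionary, its keys, and the patch numbers are made up, and you would fill them with what you actually collect on each server. It flags speed/duplex or time zone values that differ from the core, and peers whose patch level is older than the core's.

```python
from typing import Dict, List

# Hypothetical inventory: replace with the values you collect on each node.
NODES: Dict[str, dict] = {
    "COREServerA": {"speed_duplex": "1000/full", "timezone": "Asia/Tokyo", "patch_level": 22},
    "RemoteLSP01": {"speed_duplex": "100/half", "timezone": "Asia/Tokyo", "patch_level": 21},
}


def find_mismatches(core: str, nodes: Dict[str, dict]) -> List[str]:
    """List settings on each peer that differ from the core server."""
    ref = nodes[core]
    problems = []
    for name, cfg in nodes.items():
        if name == core:
            continue
        # Speed/duplex and time zone should match the core exactly.
        for key in ("speed_duplex", "timezone"):
            if cfg[key] != ref[key]:
                problems.append(f"{name}: {key} is {cfg[key]!r}, core has {ref[key]!r}")
        # Patches must be equal or newer on each peer.
        if cfg["patch_level"] < ref["patch_level"]:
            problems.append(
                f"{name}: patch level {cfg['patch_level']} is older than core ({ref['patch_level']})"
            )
    return problems


for problem in find_mismatches("COREServerA", NODES):
    print(problem)
```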
3.- Document existing alarms
Now that you know what type of Filesync alarms your server has, it is time to look at some log files and identify why the files are not reaching their destination, or why the far-end system is rejecting them.
These alarms are server-generated, which means you need to open the Avaya Aura Communication Manager Maintenance web interface and check the Alarms section, or connect with an SSH client such as PuTTY and run the command “almdisplay”.
ID Source EvtID Lvl Ack Date Description
3 FSY 1 MIN Y Wed Jun 07 11:22:24 JST 2017 filesync failure (server): 192.168.1.5 LSP
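ID: Alarm number (this is the third alarm on the server)
Source: The subsystem that raised the alarm, here FSY (Filesync)
EvtID: The event ID, which tells you the alarm family (1 falls in the 1–999 Server Communication Failure range) and lets you compare it with other alarms
Lvl: Alarm level (MIN = minor)
Ack and Date: Whether the alarm has been acknowledged (Y) and when it was raised
Description: In this case “filesync failure (server)”, followed by the IP address of the LSP the server could not synchronize with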
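When almdisplay returns many rows, a small parser saves some squinting. A minimal Python sketch, assuming rows look exactly like the sample above (five single-word columns, then a six-word date, then the description); other releases may format the output differently, so treat the column positions as an assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional

IP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")


@dataclass
class Alarm:
    alarm_id: int
    source: str
    event_id: int
    level: str
    acked: bool
    date: str
    description: str
    peer_ip: Optional[str]


def parse_alarm_row(row: str) -> Alarm:
    """Split one almdisplay data row into its columns."""
    tokens = row.split()
    if len(tokens) < 12:
        raise ValueError(f"unexpected almdisplay row: {row!r}")
    description = " ".join(tokens[11:])
    match = IP_RE.search(description)  # peer address, if the description has one
    return Alarm(
        alarm_id=int(tokens[0]),
        source=tokens[1],
        event_id=int(tokens[2]),
        level=tokens[3],
        acked=tokens[4] == "Y",
        date=" ".join(tokens[5:11]),  # e.g. "Wed Jun 07 11:22:24 JST 2017"
        description=description,
        peer_ip=match.group(1) if match else None,
    )


alarm = parse_alarm_row(
    "3 FSY 1 MIN Y Wed Jun 07 11:22:24 JST 2017 filesync failure (server): 192.168.1.5 LSP"
)
print(alarm.peer_ip)  # 192.168.1.5
```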
With the alarm identified, it is time to create a plan to troubleshoot the server and the network connectivity between the nodes affected by the file synchronization alarms.
4.- Filesync Troubleshooting Commands
The following is a series of commands and procedures. Back up the existing data before you start working with the file system; a mistake on a production system might corrupt the database.
statapp – Displays the active and installed server applications, including “Filesyncd”.
/etc/hosts – The hosts file contains the server names and their associated IP addresses.
/sbin/ifconfig – Displays the Ethernet interfaces and their IP addresses.
ping – Use ping to check that the remote host is reachable and how long ICMP packets take to make the round trip.
If you cannot reach the remote host, check the routing table by running “netstat -rn”. Here you are looking for your NIC and the default route the system is using.
traceroute – Run a traceroute to see which path the packets take to reach their destination.
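If you save the output of “netstat -rn”, picking out the default route can be scripted. This sketch assumes the usual Linux netstat layout (Destination, Gateway, Genmask, Flags, MSS, Window, irtt, Iface); the gateway 192.168.1.1 and interface eth0 in the sample are hypothetical.

```python
from typing import Optional, Tuple


def default_route(netstat_rn: str) -> Optional[Tuple[str, str]]:
    """Return (gateway, interface) for the 0.0.0.0 route, or None if there is none."""
    for line in netstat_rn.splitlines():
        fields = line.split()
        # Route rows have 8 columns; the title and header lines are skipped
        # because their first field is not "0.0.0.0".
        if len(fields) >= 8 and fields[0] == "0.0.0.0":
            return fields[1], fields[-1]
    return None


# Hypothetical sample; the gateway and interface name are made up.
SAMPLE = """Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0
0.0.0.0         192.168.1.1     0.0.0.0         UG        0 0          0 eth0
"""

print(default_route(SAMPLE))  # ('192.168.1.1', 'eth0')
```

A None result means the server has no default route, which would explain why a remote LSP on another subnet is unreachable.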
Now, with routing, duplex, and speed configured correctly, check the filesync port by running:
netstat -tan | grep 21874 – netstat lists the system's sockets; -t limits the output to TCP sockets, -a includes listening sockets as well as established ones, and -n shows numeric addresses and ports instead of resolving names. The grep keeps only the lines for the filesync TCP port 21874. The output should show:
tcp 0 0 0.0.0.0:21874 0.0.0.0:* LISTEN
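The same check can be scripted against saved “netstat -tan” output. A minimal sketch, assuming the standard Linux column order (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). Unlike a plain grep, it only matches 21874 in the local address column, so an outgoing connection to a peer's port 21874 is not mistaken for a local listener.

```python
FILESYNC_PORT = 21874


def filesync_listening(netstat_tan: str, port: int = FILESYNC_PORT) -> bool:
    """True if a TCP socket in LISTEN state is bound to the given local port."""
    for line in netstat_tan.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("tcp"):  # tcp and tcp6 rows
            local, state = fields[3], fields[5]
            # rsplit handles both "0.0.0.0:21874" and IPv6 ":::21874".
            if local.rsplit(":", 1)[-1] == str(port) and state == "LISTEN":
                return True
    return False
```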
As shown above, the port is listening. Your next step is to confirm that no firewall or other network rule is blocking port 21874. You can do this with telnet, followed by the destination host name or IP address and port 21874:
dadmin@COREServerA> telnet RemoteLSP01 21874
Trying 192.168.1.5...
Connected to RemoteLSP01.
Escape character is '^]'.
The output above shows a successful connection to port 21874 on the LSP. If this test fails, log in to the LSP and make sure that TCP port 21874 is listening and filesyncd is running there as well.
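If telnet is not installed, the same TCP test can be done with Python's standard socket module, assuming Python is available on the host you test from. A successful connection means the port is reachable; a refusal or timeout means something (the remote service, a firewall, or routing) is in the way.

```python
import socket


def port_open(host: str, port: int = 21874, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, like the telnet test."""
    try:
        # create_connection resolves the name and tries each address it gets back.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False
```

For example, port_open("RemoteLSP01") should return True when the LSP from the telnet example accepts connections on 21874.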
From the remote site, you can also check its primary CM IP address by running “cat /etc/ecs.conf | grep -i sray”.
Once you have confirmed that the network is configured correctly, move on to checking the default gateways and firewalls.
Time Server Troubleshooting Techniques
An NTP server keeps the servers' clocks synchronized and prevents the time drift that causes problems later on. The NTP server can be configured with an FQDN or an IP address.
In the CM shell, run the “nslookup” tool to check the FQDN and IP address of the NTP server.
date – This command displays the system date and time. Make sure each server is set to the correct time zone and country.
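To see how far a server's clock is from the NTP server, you can send a single SNTP request. The sketch below is a rough check, not a replacement for the NTP service: it resolves the name much like nslookup, reads the server's transmit timestamp from bytes 40–47 of the reply, and ignores network delay. The function names are my own.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_sntp_request() -> bytes:
    # First byte 0x1B: leap indicator 0, version 3, mode 3 (client); rest zero.
    return b"\x1b" + 47 * b"\0"


def transmit_time(packet: bytes) -> float:
    """Extract the server's transmit timestamp (bytes 40-47) as Unix time."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def clock_offset(ntp_host: str, timeout: float = 5.0) -> float:
    """Rough offset in seconds (NTP server time minus local time)."""
    addr = socket.gethostbyname(ntp_host)  # fails if the FQDN does not resolve
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_sntp_request(), (addr, 123))
        packet, _ = sock.recvfrom(48)
    return transmit_time(packet) - time.time()
```

An offset of several seconds or more on one node, but not the others, is a good reason to revisit its NTP configuration before chasing SSL errors.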
Other Places to Look
grep 'filesync -d' commandhistory* – Searches the commandhistory files for earlier “filesync -d” commands.
Check the LSP licenses by running the “statuslicense” command.
Output: CommunicaMgr License Mode: Normal WebLM server used for License: https://192.168.1.5:52233/WebLM/LicenseServer
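If you check many LSPs, the license mode and WebLM URL can be pulled out of the statuslicense output. A minimal sketch, assuming the output contains the same “License Mode:” and “WebLM server used for License:” labels as the sample above; anything other than Normal deserves a closer look.

```python
import re
from typing import Optional, Tuple

MODE_RE = re.compile(r"License Mode:\s*(\S+)")
WEBLM_RE = re.compile(r"WebLM server used for License:\s*(\S+)")


def license_status(output: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (license mode, WebLM URL); either is None if its label is missing."""
    mode = MODE_RE.search(output)
    weblm = WEBLM_RE.search(output)
    return (mode.group(1) if mode else None, weblm.group(1) if weblm else None)


def license_ok(output: str) -> bool:
    return license_status(output)[0] == "Normal"
```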
displaypwd – This command displays the authentication file.
In my case, the remote sites were missing the authentication file, which caused filesync to time out because they could not communicate with the core system.
On an LSP that is missing the file, the output is:
Unable to display authentication file data; no authentication file installed.
You might also notice SSL certificate errors. The LSP uses the system authentication files to exchange information with the core system. Head over to /etc/opt/ecs/certs/server or /etc/opt/ecs/certs/CM/ID and look for the server certificate named “Server.crt”.
Once you have located Server.crt, use the “openssl” tool to check the certificate's expiry date, for example with “openssl x509 -noout -enddate -in Server.crt”. For more details, refer to “Avaya Communication Manager File Synchronization Overview and Troubleshooting White Papers” in the Resources section.
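The command “openssl x509 -noout -enddate -in Server.crt” prints a single line such as notAfter=Jun 27 23:21:43 2017 GMT (date illustrative). A small Python sketch can turn that line into days remaining, using ssl.cert_time_to_seconds from the standard library. It compares against the local clock, so a server with the wrong time also gets the wrong answer, which is exactly why time settings matter for filesync.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(enddate_line: str, now: Optional[float] = None) -> float:
    """Days left before the certificate expires (negative if it already has)."""
    prefix = "notAfter="
    line = enddate_line.strip()
    if not line.startswith(prefix):
        raise ValueError(f"expected openssl -enddate output, got {enddate_line!r}")
    # cert_time_to_seconds parses the "Mon DD HH:MM:SS YYYY GMT" format openssl prints.
    expires = ssl.cert_time_to_seconds(line[len(prefix):])
    if now is None:
        now = time.time()  # local clock: if it is wrong, so is the result
    return (expires - now) / 86400
```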
You can push the files on demand from the main server by running:
filesync -w -a lsp 192.168.1.5 trans
Check the logs for filesync-related errors by running “logc today | grep filesync”.
Another way to track down filesync errors is to extract the filesync events from the logs under /var/log/ecs. These files are named after the date and time. Read the file with the Linux “cat” utility and filter it with “grep” to extract the relevant lines, e.g.:
cat 2017-0627-021613.log | grep -i filesync
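The cat and grep combination can also be done in Python when you want to search every log file at once. A sketch, assuming the log files sit in /var/log/ecs and end in .log as in the example above; because the names start with the date and time, sorting them keeps the results in chronological order.

```python
import os
from typing import Iterator, Tuple


def grep_filesync(log_dir: str = "/var/log/ecs", pattern: str = "filesync") -> Iterator[Tuple[str, str]]:
    """Yield (file name, line) for every .log line containing pattern, ignoring case."""
    needle = pattern.lower()
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".log"):
            continue
        path = os.path.join(log_dir, name)
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if needle in line.lower():  # same effect as grep -i
                    yield name, line.rstrip("\n")
```

For example, running `for name, line in grep_filesync(): print(name, line)` on the server lists every filesync event across all the log files.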
The output below shows an SSL failure. One of the main reasons was the system time; in other cases, you might need to adjust folder and file permissions so the account has enough rights to read and write.
20170626:232143850:102006:filesync_rcv(23498):HIGH:[ERROR: ssl_read: SSL_read failed: premature close 5]
20170626:232143850:102007:filesync_rcv(23498):HIGH:[ERROR: rcv_files: filesync API failed]
20170626:232143850:102008:filesync_rcv(23498):HIGH:[ERROR: ssl_close: shutdown call failed: Transport endpoint is not connected]
The next example shows the timeout caused by the missing authentication file on the LSP:
dadmin@RemoteLSP01> cat 2017-0629-202118.log | grep -i filesync
20170629:202545645:11862434:filesyncd(7886):HIGH:[ERROR: connect_timeout: connect timed out]
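Once you have the matching lines, a parser can split them into fields and suggest where to look next. The field layout (date, time with milliseconds, sequence number, process(PID), level, bracketed message) is inferred from the two samples above, and the hints are a heuristic that only encodes the two cases in this post: SSL failures pointing at time, certificates, or permissions, and connect timeouts pointing at the network or the authentication file.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Layout inferred from the sample log lines in this post.
LOG_RE = re.compile(
    r"^(?P<date>\d{8}):(?P<time>\d{9}):(?P<seq>\d+):"
    r"(?P<proc>[\w.]+)\((?P<pid>\d+)\):(?P<level>\w+):\[(?P<msg>.*)\]$"
)


@dataclass
class LogEntry:
    date: str
    time: str
    seq: int
    process: str
    pid: int
    level: str
    message: str


def parse_log_line(line: str) -> Optional[LogEntry]:
    m = LOG_RE.match(line.strip())
    if not m:
        return None
    return LogEntry(m["date"], m["time"], int(m["seq"]), m["proc"],
                    int(m["pid"]), m["level"], m["msg"])


def hint(entry: LogEntry) -> str:
    """Heuristic next step, based only on the cases described in this post."""
    msg = entry.message.lower()
    if "ssl" in msg:
        return "SSL failure: check system time/NTP, Server.crt expiry and file permissions"
    if "timed out" in msg or "timeout" in msg:
        return "Timeout: check port 21874 reachability and the LSP authentication file"
    return "No known pattern"
```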
5.- Execute the plan
By now you know what is causing the Filesync alarms and what steps you need to take to fix them. Explain this to the customer as clearly as you can. In this case, I uploaded the missing files and gave the customer a choice: schedule a system reboot themselves, or sign an authorization letter that allows me to reboot System Platform.
When dealing with more advanced system administrators, describe in more detail the steps you take as you troubleshoot.
Question – Which other filesync commands are you executing when troubleshooting file synchronization issues?