SBL-SSM-00005: Timeout occurred while opening SISNAPI connection

Applies to:

Siebel Communications CRM - Version 8.1.1.3 SIA[21219] and later
Information in this document applies to any platform.

Symptoms

Siebel 8.1.1.3 SIA[21219] is installed on IBM AIX on POWER Systems (64-bit).

After connecting to srvrmgr at the server level, the "list comp" command was executed, which resulted in the following errors:

SBL-SCM-00028: Key not found
SBL-SSM-00005: Timeout occurred while opening SISNAPI connection.



Changes

No changes were made, and everything was working as expected beforehand.

Cause

srvrmgr might show SBL-SSM-00005 or SBL-ADM-02049 errors when several srvrmgr sessions are run simultaneously.

Solution

Rebooting all application servers, including the gateway server, resolved the issue.






Applies to:

Siebel System Software - Version 8.0.0.9[20433] and later
Siebel CRM - Version 8.0.0.9[20433] and later
Information in this document applies to any platform.

Purpose

This note applies only to Siebel versions 8.0.0.9 and later and 8.1 and later. It describes troubleshooting steps, including a workaround, for the behavior described in the following alert:

Siebel Connection Broker May Be Blocked by a Hanging Object Manager Process (Document 473950.1).

The scenario in which SCBroker cannot connect a new session to an existing object manager process, because that process itself is hanging, can usually be identified by the following entries in the SWSE log file:
2011-05-13 16:31:54 56: [SWSE] Open Session failed (0xa600d1) after 60.1958 seconds.
ProcessPluginRequest ProcessPluginRequestError 1 000010844d5a0a7a:0 2011-05-13 16:31:54 56: [SWSE] Failed to obtain a session ID. Login failed attempting to connect to %1
ProcessPluginRequest ProcessPluginRequestError 1 000010844d5a0a7a:0 2011-05-13 16:31:54 56: [SWSE] Set Error Response (Session: Error: 10879185 Message: Login failed attempting to connect to siebel.TCPIP.None.None://hostname:2361/enterprise/SCCObjMgr_enu)
ProcessPluginRequest ProcessPluginRequestError 1 000010844d5a0a7a:0 2011-05-13 16:31:54 56: [SWSE] Login failed.
SBL-SSM-00005: Timeout occurred while opening SISNAPI connection.
SisnapiLayerLog Error 1 000010894d5a0a7a:0 2011-05-13 16:32:10 108: [SISNAPI] Async Thread: connection (0x2cdadb8), error (1180682) while reading message
GenericLog GenericError 1 0000108a4d5a0a7a:0 2011-05-13 16:33:24 (ssmsismgr.cpp (0) err=0 sys=0) SBL-GEN-00000: (ssmsismgr.cpp: 0) errorcode = 0, system error = 0, msg1 = (null), msg2 = (null), msg3 = (null), msg4 =(null)
ObjMgrSessionLog Error 1 0000108a4d5a0a7a:0 2011-05-13 16:33:24 Login failed for Login name : SADMIN
ProcessPluginState ProcessPluginStateError 1 0000108a4d5a0a7a:0 2011-05-13 16:33:24 71: [SWSE] Open Session failed (0xa600d1) after 60.0111 seconds.

Note the last line, which shows a connection timeout of 60 seconds.

This is the built-in timeout of 60 seconds, after which a request from the SWSE plugin to SCBroker is cancelled.

From that message we can conclude that SCBroker was not able to transfer a login request to a working object manager on node 'hostname'.

We can also conclude that the object manager process is still running, since it is still in SCBroker's routing table; however, it does not accept new connections.

This is usually caused by an MT process hang scenario.
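
The log check described above can be scripted. The following is a minimal Python sketch, not an Oracle-provided tool: it scans SWSE log lines for the "Open Session failed ... after N seconds" entry and extracts the host, port, enterprise, and object manager alias from the connect string. The function name scan_swse_log, the regular expressions, and the min_seconds threshold are assumptions derived only from the sample lines in this note; real log files may be formatted differently.

```python
import re

# Matches e.g. "[SWSE] Open Session failed (0xa600d1) after 60.1958 seconds."
OPEN_FAILED = re.compile(
    r"\[SWSE\] Open Session failed \((0x[0-9a-fA-F]+)\) after ([0-9.]+) seconds"
)
# Matches e.g. "siebel.TCPIP.None.None://hostname:2361/enterprise/SCCObjMgr_enu"
CONNECT = re.compile(r"siebel\.[^:\s]*://([^:/\s]+):(\d+)/([^/\s]+)/([^\s)]+)")


def scan_swse_log(lines, min_seconds=60.0):
    """Return (timeouts, targets) found in an iterable of SWSE log lines.

    timeouts: list of (error_code, seconds) for failed Open Session calls
              that lasted at least min_seconds (assumed threshold).
    targets:  sorted list of (host, port, enterprise, component) tuples
              taken from the connect strings in the log.
    """
    timeouts = []
    targets = set()
    for line in lines:
        failed = OPEN_FAILED.search(line)
        if failed and float(failed.group(2)) >= min_seconds:
            timeouts.append((failed.group(1), float(failed.group(2))))
        conn = CONNECT.search(line)
        if conn:
            targets.add(
                (conn.group(1), int(conn.group(2)), conn.group(3), conn.group(4))
            )
    return timeouts, sorted(targets)
```

Run against the excerpt above, the targets list names the node whose object manager processes should be inspected next.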

Now we are facing two challenges:

a) Which process id exactly is hanging? If, for example, 25 MT server processes are running on that node, this is not easy to determine.

and

b) Why is this process hanging?

The next section describes how to find out which pid is hanging, how to mitigate the situation, and what diagnostic data should be captured to help investigate question b).

Troubleshooting Steps

To help in this situation, a new SCBroker parameter was introduced with the Fix Packs indicated above. Its name is ConnForwardAlgorithm, or "Connection Forward algorithm for SCBroker".

This is a hidden parameter.

This parameter determines which algorithm is used to forward incoming login requests to MT server processes. There are two possible methods:

a) Least Loaded "LL", the default
and
b) Round Robin "RR"

Note that this is an advanced parameter, as can be seen in the following srvrmgr example:
srvrmgr> list advanced param ConnForwardAlgorithm for comp SCBroker show PA_ALIAS, PA_VALUE, PA_NAME
PA_ALIAS                   PA_VALUE       PA_NAME
--------------------       --------       -----------------------------------------
ConnForwardAlgorithm       LL             Connection Forward algorithm for SCBroker



Although LL is advisable in terms of performance, it causes unwanted behavior in the hang scenario described above: once SCBroker has identified the least loaded process, it keeps forwarding to that particular pid until the next session is established. If the supposedly least loaded process is hanging, SCBroker will try that process again and again, and logins may stall on all web servers.

In this situation, the algorithm should be changed to "RR".
This has two advantages:

1) The hanging process will only be contacted once. The next attempt goes to the next available pid in the routing table. As a result, only one login is affected, and subsequent logins succeed until the hanging process is visited again. In this way, all requests are distributed among the remaining, non-hanging processes; end users should get the server busy message only once, and a retry should connect them to a working process.

2) By monitoring the task distribution you can identify the process id that is not getting additional hits.
This process id is very likely the culprit.
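
The difference between the two modes can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not SCBroker's actual implementation: a login forwarded to the hung pid always fails and adds no task, every other login succeeds and adds one task, and no sessions end during the run. The function name simulate and the sample pids and task counts are hypothetical.

```python
from itertools import cycle


def simulate(algorithm, initial_tasks, hung_pid, logins):
    """Toy model of login forwarding with one hanging process.

    initial_tasks: dict {pid: running task count} at the time of the hang.
    Returns (failed_logins, final_task_counts).
    """
    tasks = dict(initial_tasks)  # copy so the caller's dict is untouched
    round_robin = cycle(list(tasks))
    failed = 0
    for _ in range(logins):
        if algorithm == "LL":
            # Least loaded: lowest task count, ties broken by dict order.
            target = min(tasks, key=tasks.get)
        elif algorithm == "RR":
            # Round robin: next pid in routing order, regardless of load.
            target = next(round_robin)
        else:
            raise ValueError("algorithm must be 'LL' or 'RR'")
        if target == hung_pid:
            failed += 1  # hung process accepts nothing, count stays flat
        else:
            tasks[target] += 1
    return failed, tasks


if __name__ == "__main__":
    start = {101: 5, 102: 4, 103: 3, 104: 6}  # pid 103 is hung and least loaded
    for mode in ("LL", "RR"):
        failed, final = simulate(mode, start, hung_pid=103, logins=12)
        print(mode, "failed logins:", failed, "final tasks:", final)
```

In this model the hung pid stays least loaded under LL, so every login is sent to it, while under RR only every fourth login hits it and its task count is the only one that does not grow.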

Now you can take a userdump or pstack, as per Document 478050.1 or Document 478027.1, for exactly this one process and then bounce that pid. This should greatly reduce the effort needed to get the right call stack of the hanging process to Technical Support for root cause analysis.


The following srvrmgr commands can be used to switch the forwarding mode.
Using srvrmgr with the /s switch, connect to the Siebel Server on the host named in the SWSE error message.

Then change the parameter:

change param ConnForwardAlgorithm=RR for comp SCBroker

Now restart the SCBroker component:

shutdown fast comp SCBroker
startup comp SCBroker

Now you can monitor which process id no longer shows an increasing number of running tasks:

list procs for comp <interactive_comp_name> show TK_PID,TK_NUM_NORMAL_TASKS
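
Comparing two runs of this command by hand is error-prone when many processes are listed. The following Python sketch compares two captured outputs and reports the pids whose task count did not increase. It assumes the output consists of a header line, a dashes line, and whitespace-separated TK_PID and TK_NUM_NORMAL_TASKS columns; the function names and the sample values are hypothetical.

```python
def parse_list_procs(text):
    """Parse 'list procs ... show TK_PID,TK_NUM_NORMAL_TASKS' output.

    Returns {pid: task_count}. Only rows made of exactly two integers are
    kept, so header, separator, and blank lines are skipped.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0].isdigit() and fields[1].isdigit():
            counts[int(fields[0])] = int(fields[1])
    return counts


def stalled_pids(before, after):
    """Return pids present in both snapshots whose count did not increase."""
    return sorted(
        pid for pid in before if pid in after and after[pid] <= before[pid]
    )


if __name__ == "__main__":
    first = """TK_PID  TK_NUM_NORMAL_TASKS
------  -------------------
4101    12
4102    9
4103    7
"""
    second = """TK_PID  TK_NUM_NORMAL_TASKS
------  -------------------
4101    14
4102    11
4103    7
"""
    print(stalled_pids(parse_list_procs(first), parse_list_procs(second)))
```

A flat count over a single interval is only a hint, since task counts also drop when sessions end; a pid that stays flat across several snapshots while the others grow is the more reliable signal.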

If you encounter hanging processes on multiple server nodes, then you need to change the parameter value to RR on all of these nodes.

Please note that the SCBroker forwarding mode is local to each Siebel Server in the enterprise, so you can have a mixed LL and RR configuration within an enterprise.

It should also be mentioned that existing, established sessions are not affected by the SCBroker component bounce.

Established sessions will continue to work even when SCBroker is temporarily unavailable.

By running SCBroker in round robin mode on the versions stated above, incoming login requests are distributed to all processes that are still able to process requests at that moment.

This new parameter is documented in:
     Siebel 8.0 Deployment Planning Guide > Siebel Architecture Overview > About Siebel Connection Broker
     Siebel 8.1 Deployment Planning Guide > Siebel Architecture Overview > About Siebel Connection Broker
