SBL-NET-01201: Internal: connect() failed: %1


Applies to:

Siebel CRM - Version: 8.1.1 [21112] to 8.1.1 [21112] - Release: V8 to V8
Information in this document applies to any platform.

Symptoms

Customer reported the following:

In our new Siebel TEST environment, 2 Application Servers are failing with a Handshake failed error.

This is a newly built Siebel environment with multiple servers:
Ntsydasu304 - gateway and app server
Ntsydasu303 - App Server
Ntsydasu1186 - App Server
Ntsydwbu117 - Web Server

The Siebel gateway is up and running and all the application servers are started, but 2 application servers, Ntsydasu304 and Ntsydasu1186, are failing with a Handshake failed error.

Cause


The issue appeared to be caused either by the port number being used by another, non-Siebel process, or by an incorrect mapping between the server hostnames and their IP addresses. This analysis was based on the errors logged when srvrmgr tried to communicate with ServerMgr:

- Handshake(siebel://ntsydasu1186:49162/es_obfstsb1/servermgr/ntsydasu1186) on conn 0x3121330 ok
- connect() to ntsydasu303:49168 failed (err=10060 | Connection timed out.)
- connect() to ntsydasu304:49168 failed (err=10060 | Connection timed out.)


Solution

For the benefit of other readers:

It was suggested to the customer to try the following:

telnet ntsydasu1186 49162
telnet ntsydasu303 49168
telnet ntsydasu304 49168

The first command is expected to succeed. If the second and third also succeed, shut down the Siebel servers and run telnet against the last two servers again, to verify whether another, non-Siebel process is listening on port 49168.

If the second and third fail or do not respond, try telnet against the IP address instead. If that succeeds, the problem is likely in the host-IP mapping, which should then be verified.
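
The telnet checks above can also be scripted. The following is a minimal Python sketch and not part of the original note: `probe` and `compare_name_and_ip` are hypothetical helper names, and the 5-second timeout is an arbitrary assumption.

```python
import socket


def probe(host, port, timeout=5.0):
    """Try to open a TCP connection to host:port and describe the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        # The hostname could not be resolved to an address.
        return "name resolution failed"
    except (socket.timeout, TimeoutError):
        # Nothing answered before the timeout expired.
        return "timed out"
    except ConnectionRefusedError:
        # The host answered, but no process is listening on the port.
        return "refused"
    except OSError as exc:
        return f"error: {exc}"


def compare_name_and_ip(hostname, ip_address, port, timeout=5.0):
    """Probe the same port by hostname and by IP address, like the two telnet runs."""
    return {
        "by_name": probe(hostname, port, timeout),
        "by_ip": probe(ip_address, port, timeout),
    }
```

Called with a server name from the log excerpt, its IP address and port 49168, a result that fails by name but is "open" by IP would point at the host-IP mapping, while "open" with the Siebel servers shut down would point at another process holding the port.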


By following the above steps, the customer was able to identify the cause of the problem. They opened all the required ports and the problem was resolved.




Applies to:

Siebel System Software - Version: 7.7.2.1 SIA [18353] and later   [Release: V7 and later ]
z*OBSOLETE: Microsoft Windows Server 2003
Product Release: V7 (Enterprise)
Version: 7.7.2.1 [18353] Hi Tech
Database: Oracle 9.2.0.6
Application Server OS: Microsoft Windows 2003 Server SP1
Database Server OS: Sun Solaris 9

This document was previously published as Siebel SR 38-2264779307.

Symptoms

SBL-SMI-00033, SBL-NET-01201
Hello,

In migrating from Siebel 6.3 to 7.7 the environment was taken from regional servers one in the US and one in Sweden to a single Siebel server in Germany. The reason that I have opened this SR is that we are having connectivity problems when users access Siebel from the non European companies that did not exist in 6.3.

We are looking for suggestions on how to trace or solve this issue. Our users use VPN software to access the network, and therefore Siebel. We have problems with the connected clients, but they are worse for Siebel Mobile Web Clients. We have no Siebel Dedicated Web Clients.


Best regards-

Cause

Configuration/ Setup

Solution

Message 1

For the benefit of other readers. The customer found that during synchronization over VPN several clients would experience disconnection.

Analysis of the log files on the client revealed the following:
SisnTcpIp    SisnSockWarning    2    0    2005-07-13 13:16:15     1380: [TCPIP-client] connect() to 163.157.2.202:40400 failed (err=10060 | Connection timed out.)
GenericLog    GenericError    1    0    2005-07-13 13:16:15    (commapi.cpp (298) err=1801201 sys=10060) SBL-NET-01201: Internal: connect() failed: Connection timed out.
GenericLog    GenericError    1    0    2005-07-13 13:16:15    (commapi.cpp (298) err=1700175 sys=2) SBL-DCK-00175: Cannot open connection to 163.157.2.202. The Synch Manager component on the server is most likely unavailable.

These errors indicate a network related behavior and not a problem or defect with the Siebel Software.
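
When many client logs have to be reviewed, the error fields can be pulled out of entries like the ones above with a short script. This is a minimal Python sketch, not a Siebel tool: `parse_error` is a hypothetical name, the regular expression is inferred only from the log lines shown here, and the lookup table covers only two Winsock error numbers.

```python
import re

# Matches the tail of a log entry such as:
# (commapi.cpp (298) err=1801201 sys=10060) SBL-NET-01201: Internal: ...
ERROR_RE = re.compile(
    r"err=(?P<err>-?\d+)\s+sys=(?P<sys>-?\d+)\)\s+"
    r"(?P<code>SBL-[A-Z]+-\d+):\s*(?P<text>.*)"
)

# Winsock error numbers on Windows; other operating systems use other values.
WINSOCK_ERRORS = {
    10060: "WSAETIMEDOUT (connection timed out)",
    10061: "WSAECONNREFUSED (connection refused)",
}


def parse_error(line):
    """Return the error fields of one log line, or None if it has none."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    sys_code = int(match.group("sys"))
    return {
        "code": match.group("code"),
        "err": int(match.group("err")),
        "sys": sys_code,
        "sys_meaning": WINSOCK_ERRORS.get(sys_code, "unknown"),
        "text": match.group("text").strip(),
    }
```

Applied to the second log line above, the sketch reports code SBL-NET-01201 with system error 10060, the Winsock connection timeout.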

After further troubleshooting, we suspected that packets were being dropped because of their size.

The customer was referred to SR # 38-743187351. Here is some additional information on this registry setting:
http://www.microsoft.com/resources/documentation/Windows/2000/server/reskit/en-us/Default.asp?url=/resources/documentation/Windows/2000/server/reskit/en-us/regentry/58752.asp


In addition the following suggestions were made and brought back to the Network group:

In order to test this theory out you can try a couple of things.

Message 2

1) Test whether this is the issue by changing the MTU setting as suggested in the SR on SW and in the MS link provided. This is a client-side fix. If the synch no longer fails after the change and a healthy connection can be established repeatedly, that is a good indication that this was the problem. A global fix would then be required, and it should be worked out by the Network group rather than by going from client PC to client PC and changing the setting.
2) Run 'netstat -a' during a synch session. A healthy connection has the value ESTABLISHED or LISTENING. A value of SYN_SENT indicates a network connection problem and most likely shows that packets are being dropped.
3) DSL connections are more susceptible to this issue, and it may be possible to alter the MTU setting on the user's router. Again, this is a client-side fix. Users may have different routers and settings, so this approach is less attractive.
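
The netstat check in suggestion 2 can be automated by saving the output during a synch session and counting the states seen for the Synch Manager endpoint. This is a minimal Python sketch under two assumptions: the output was captured with numeric addresses (for example with 'netstat -an'), and the foreign address and state are the last two columns of each TCP line. The function names are hypothetical.

```python
from collections import Counter


def tcp_states(netstat_output, remote_endpoint):
    """Count TCP states for connections whose foreign address is remote_endpoint."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: protocol first, foreign address and state last.
        if len(fields) >= 4 and fields[0].upper().startswith("TCP"):
            if fields[-2] == remote_endpoint:
                states[fields[-1]] += 1
    return states


def looks_unhealthy(states):
    """SYN_SENT means a connection attempt has not been answered yet."""
    return states.get("SYN_SENT", 0) > 0
```

For the client log shown earlier, the endpoint to look for would be 163.157.2.202:40400. A SYN_SENT entry that persists across repeated samples is the pattern described above.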

Siebel Technical Support




Applies to:

Siebel System Software - Version 7.5.3 [16157] and later
z*OBSOLETE: Microsoft Windows 2000
Product Release: V7 (Enterprise)
Version: 7.5.3 [16157]
Database: Oracle 9i
Application Server OS: Microsoft Windows 2000 Advanced Server SP 4
Database Server OS: HP-UX 11i

This document was previously published as Siebel SR 38-1719219409.
***Checked for relevance on 11-NOV-2010***


Symptoms

SBL-NET-01201, SBL-SSM-00003
Hi,

We are experiencing occasional HTTP 500 errors with our Inbound HTTP EAI interface. We have two web servers dedicated to EAI requests that are load balanced using Windows 2000 NLB. These pass requests in a load balanced Resonate environment to two application servers both of which run the EAI Object Manager.

The vast majority of transactions are successful with each app server processing 30,000+ tasks each day. However, both of our web server logs show the following occasional error.


GenericLog GenericError 1 2005-01-17 16:31:27 (smconn.cpp 5(367) err=1801201 sys=10060) SBL-NET-01201: Internal: connect() failed: Connection timed out.
GenericLog GenericError 1 2005-01-17 16:31:27 (ssmsismgr.cpp 83(256) err=5600003 sys=0) SBL-SSM-00003: Error opening SISNAPI connection
GenericLog GenericError 1 2005-01-17 16:31:27 Login failed for Login name : tibcoadmin
GenericLog GenericLog 0 2005-01-17 16:31:27 [3208] ERROR 3208: [SWSE] Open Session failed (0x6ce5) after 22.9520 seconds.
GenericLog GenericLog 0 2005-01-17 16:31:27 [3208] ERROR 3208: [SWSE] Impersonate failed. Login failed attempting to connect to %1
GenericLog GenericLog 0 2005-01-17 16:31:27 [3208] ERROR 3208: [SWSE] Set Error Response (User: tibcoadmin Session: Error: 00027877 Message: Login failed attempting to connect to siebel.TCPIP.None.None://10.97.251.155:2320/sbl01p/EAIObjMgr_enu)
GenericLog GenericLog 0 2005-01-17 16:31:27 [3208] ERROR 3208: [SWSE] Error Child Messages : <0> Login failed attempting to connect to siebel.TCPIP.None.None://10.97.251.155:2320/sbl01p/EAIObjMgr_enu<1> Login failed. SBL-SSM-00003: Error opening SISNAPI connection
GenericLog GenericLog 0 2005-01-17 16:31:27 [3208] ERROR 3208: [SWSE] HTTP Status 500 : Error The service request could not be processed. Please check that the user name and password are correct, and that the request format is correct


If the problem persists, please contact the system administrator to get more detailed information and to check the system configuration.
GenericLog GenericLog 0 2005-01-17 16:31:27 [3208] ERROR 3208: [SWSE] Login failed. SBL-SSM-00003: Error opening SISNAPI connection

This error happens approximately 30 times a day on each web server and although this is a small % of total transactions it is important we eliminate these errors.
At the times the errors occur, we have found nothing in the app server logs and nothing indicating a problem in the Resonate Message console. None of our hardware is under load (every server has an abundance of CPU and memory available), and there are no indications of network problems.

We are also curious about how long the timeout takes. The 22.9xxx seconds consistently appears in our logs, and we are wondering where this timeout is set. Is it configurable, or is it an internal value in the SWSE?



Cause

Recommended TCP/IP settings were not implemented.

Solution


The customer implemented the TCP/IP registry changes described by the EVT tool on their web servers and application servers. After these changes, the timeouts almost completely disappeared.

The HTTP_INACTIVE_CONN_TIMEOUT and SERVER_INACTIVE_CONN_TIMEOUT parameters only need to be set on each node (machine) that runs Resonate in a Siebel Enterprise, but EVT also recommended them on the web servers.
Change request # 12-SWZLYA has been logged to address this.

The TCP* parameters are suggested for the Windows platform. Information about them is available on the Microsoft site or in "What is true benefit, if any, of changing TCP registry parameters as recommended by EVT?" (Doc ID 499134.1).

These parameters are important as Siebel Server must have network access to other Siebel components, such as the Siebel Gateway Name Server, and the Siebel Database server and SWEApps.

References

NOTE:499134.1 - Benefit of Changing TCP Registry Parameters as Recommended by EVT Utility




Applies to:

Siebel eConfigurator - Version: 7.8.2.8 SIA [19237] and later   [Release: V7 and later ]
Information in this document applies to any platform.

Goal


The customer's setup, all on Solaris:

- The Production Environment has 4 Remote eConfigurator Servers (PRD_ISS1, PRD_ISS2, PRD_ISS3, PRD_ISS4)

- The following Parameters have been set on each of the 5 Production OMs (PRD_OM1, PRD_OM2, PRD_OM3, PRD_OM4, PRD_OM5):

* Product Configurator - Remote Server Name - PRD_ISS1;PRD_ISS2;PRD_ISS3;PRD_ISS4

* Product Configurator - Use Remote Service - True

Issue:

- We had a hardware failure of PRD_ISS4. The physical server was off-line.
- The OMs were still attempting to send requests to PRD_ISS4.
- The request was hanging on PRD_ISS4 for 4 minutes. This was causing 25% of Production user sessions to hang.

Questions for this:
Is this normal behavior? This is causing OMs dependent on the ISS session to hang.
Can we dynamically change this parameter to remove an ISS server from the pool?
Are there additional settings required to have this work correctly?
Let me know what needs to be rectified to prevent the 4-minute delay when a single eConfigurator server is offline.

The issue is that Siebel Callcenter AppServers keep sending requests to an ISS server that is no longer available. We need to develop a plan to get a fix or workaround for this issue. Immediate questions are:

1) What is the recommendation on timeout value change? Is the timeout value change possible?

2) What does Oracle recommend in the case of another similar failure with one of the ISS servers becoming unavailable?

3) Even with a lower timeout value, I suspect that we will still see a delay and error with Callcenter trying to connect to the failed server. How can this situation be avoided or minimized? (i.e. a patch with smarter load-balancing mechanism?)

What we are seeing, however, differs depending on whether the configurator server is down or only the eConfigurator component is disabled.
For example, with eProdCfgTimeOut set to 5 seconds on the AOM:
- If the eCfg Server is shut down, we get a 4-minute timeout (the user sees this as a hang) before the configurator session is rerouted to another server.
- If the eCfg Server is online but the Configurator component is offline, we see the 5-second timeout.

Solution


eProdCfgTimeOut is the setting, in seconds, that determines how long the application server tries to initiate a connection with the remote configurator server before returning an error to the user. Irrespective of the timeout setting, requests are still routed to the remote server; the setting only determines how long the application server tries to contact it before returning an error.

Answers :

Regarding the timeout we need to distinguish between 2 scenarios:

1) The Siebel server is down but the operating system is still up and running. In this case the parameter eProdCfgTimeOut is used to check whether a connection can be made within the defined timeframe (e.g. 5 seconds). Note that here there is still a running TCP/IP stack (the OS is up), which can receive the connection request and return an error because the port is not open (the eCfg OM is down). So this will always work.

2) This is the bad situation. The whole machine is down, removed, destroyed, unreachable, etc. In this case no TCP/IP stack is running on the other side, and a TCP/IP request simply waits for a specific time before it returns with an error. The length of this wait depends on the operating system: around a minute on Windows, about 3-4 minutes on Solaris. It is an OS parameter that can be changed, but it is not a Siebel parameter.
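
The difference between the two scenarios can be observed from any client with a timed connection attempt. This is a minimal Python sketch for illustration only. It does not show how Siebel implements eProdCfgTimeOut, and `timed_connect` and its 5-second default are assumptions.

```python
import socket
import time


def timed_connect(host, port, timeout=5.0):
    """Return (outcome, elapsed_seconds) for one TCP connection attempt."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "connected"
    except ConnectionRefusedError:
        # Scenario 1: the remote TCP/IP stack is up and rejects the closed port at once.
        outcome = "refused"
    except (socket.timeout, TimeoutError):
        # Scenario 2: nothing answers; the wait ends at `timeout` or at the
        # operating system's own connect timeout, whichever comes first.
        outcome = "timed out"
    except OSError as exc:
        outcome = f"error: {exc}"
    return outcome, time.monotonic() - start
```

Against a host that is up but has no listener on the port, the call typically returns "refused" almost immediately. Against a host that is powered off, it returns "timed out" only after the full wait, which is the hang the users see.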

That is why the customer, who is using Solaris, has this problem. There are some approaches to this issue:
a) Using a hardware load balancer. This is not documented or tested, so we cannot tell whether it will work for remote eConfigurator.

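b) Changing the Solaris OS parameter for the TCP/IP timeout setting
Parameter = tcp_time_wait_interval
Default 240000 ms (2MSL according to RFC 1122) = 4 minutes

Please discuss this setting with Sun. You could change it to 10000. A description can be found online, e.g. http://www.sean.de/Solaris/soltune.html#tcp_time_wait_interval

Note that tcp_time_wait_interval controls how long a closed connection remains in the TIME_WAIT state. The Solaris tunable that limits how long an outbound connect() waits for a host that does not answer is tcp_ip_abort_cinterval (default 180000 ms, i.e. 3 minutes), so it should be included in the discussion with Sun.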

Additional Comments :

Whenever you see the error message

SBL-NET-01201: Internal: connect() failed: Connection timed out

in the log file, we are dealing with a network issue and not a Siebel issue. In this case we always see the delay, which is a TCP/IP matter and depends on the operating system. This happens when the machine that should answer is down.

Therefore, in your case, Siebel works fine. The current issue is that the delay does not seem to respond to your OS settings.
Again, whenever we see the error message "SBL-NET-01201: Internal: connect() failed: Connection timed out", this is not a Siebel issue but a network issue (e.g. the machine is down). In this case the delay depends on the TCP/IP stack and its implementation. This is outside the control of Siebel and is no longer an eConfigurator issue.

Therefore, if our earlier suggested parameter does not show the effect it should, then kindly address this with a service request in the Core Server Technologies area and also with Sun, as it is a Solaris issue.

One of the possible solutions we discussed was changing TCP/IP parameters, but this affects the whole machine: the OS and all software running on it. It should therefore be discussed directly between Sun and the customer, as we can only speak to the Siebel part. Changing this parameter would probably also require a check of their network.
We have also raised an Enhancement Request # 10559309 to see if there is any possibility of addressing this in the long term. Enhancement Requests are reviewed, prioritized and if found viable, implemented in a future release.

Additional suggestion:

In cases when a machine is down and the administrator knows it, you may consider the following approaches:
a.) Remove the machine from the network and put in a simple PC with the same IP address. This should avoid the 4-minute timeout problem.
b.) Try Administration - Product > Cache Admin. The idea is to set up the cache without using the broken machine. This would avoid the timeouts as well.
For more information about the Configurator Caching please refer to Performance Tuning Guide 7.8 > Tuning Siebel Configurator for Performance > Administering Siebel Configurator Caching > Cache Management for Siebel Configurator (http://download.oracle.com/docs/cd/B31104_02/books/PerformTun/PerformTunConfigISS13.html#wp1063160)



Applies to:

Siebel CRM - Version: 8.1.1.5 [21229] and later   [Release: V8 and later ]
Information in this document applies to any platform.

Symptoms


On : 8.1.1.5 [21229] version, System Admin

After installing and configuring the Siebel application, the following error occurs.

ERROR
-----------------------
Server Status is "Handshake failed"

The following error appears in the srvrmgr.log file:


SessMgr ConnOpen 3 000000084f3e0040:0 2012-02-17 16:12:21 1: [SESSMGR] Open(siebel://machine1:49174/mbprod/servermgr/machine1, 60, -1)

SisnTcpIp SisnSockWarning 2 000000084f3e0040:0 2012-02-17 16:13:36 1: [TCPIP-client] connect() to machine1:49174 failed (err=78 | Connection timed out)

SessMgr ConnOpen 3 000000084f3e0040:0 2012-02-17 16:13:36 1: [SMCONN] Failed to open connection to (siebel://machine1:49174/mbprod/servermgr/machine1) in 75 sec(s)

GenericLog GenericError 1 000000084f3e0040:0 2012-02-17 16:13:36 (smconn.cpp (284) err=1180849 sys=78) SBL-NET-01201: Internal: connect() failed: Connection timed out

SisnTcpIp SisnSockDetail 4 000000084f3e0040:0 2012-02-17 16:13:36 1: [TCPIP-client] socket() closed descriptor = 5 from :0 to :55452

SessMgr ConnOpen 3 000000084f3e0040:0 2012-02-17 16:13:36 1: [SESSMGR] Error closing connection object

SessMgr MsgReceive 5 000000084f3e0040:0 2012-02-17 16:13:36 1: [SESSMGR] CB: conn 0x0, url NULL, mbuf 0x0, mlen 0, err 3670029

SessMgr SessMgrGeneric 4 000000084f3e0040:0 2012-02-17 16:13:36 1: [SESSMGR] conn 0x0: found error code (3670029), error info (NULL)

SessMgr ConnClose 5 000000084f3e0040:0 2012-02-17 16:13:36 1: [SESSMGR] conn 0x0: ctx 0x20753b48, url 0x<?INT?> cleaned up

SisnapiLayerLog Trace 3 000000084f3e0040:0 2012-02-17 16:13:36 1: [SISNAPI]: releasing connection (0x20753e20), refCount = 0

SessMgr ConnOpen 3 000000084f3e0040:0 2012-02-17 16:13:36 1: [SESSMGR] Open has taken 74.9 seconds so far, timing out

GenericLog GenericError 1 000000084f3e0040:0 2012-02-17 16:13:36 (ssmsismgr.cpp (544536816) err=0 sys=-752920388) SBL-GEN-00000: (ssmsismgr.cpp: 544536816) error code = 0, system error = -752920388, msg1 = (null), msg2 = (null), msg3 = (null), msg4 = (null)



STEPS
-----------------------
The issue can be reproduced at will with the following steps:

1. Install the Siebel application
2. Configure it
3. The server status is Handshake failed

Cause


The issue was caused by incorrect settings in the hosts file.
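
A hosts file can be checked for this kind of problem, such as one hostname listed with two different IP addresses, with a short script. This is a minimal Python sketch, not an Oracle utility: `conflicting_hosts` is a hypothetical name, and the file contents are assumed to follow the usual "IP name [aliases...]" layout with "#" comments.

```python
from collections import defaultdict


def conflicting_hosts(hosts_text):
    """Return {hostname: sorted list of IPs} for names mapped to more than one IP."""
    ips_by_name = defaultdict(set)
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Hostnames are case-insensitive, so compare them in lower case.
            ips_by_name[name.lower()].add(ip)
    return {
        name: sorted(ips)
        for name, ips in ips_by_name.items()
        if len(ips) > 1
    }
```

The text of the hosts file is read by the caller and passed in as a string. Note that a name such as localhost is often legitimately mapped to both an IPv4 and an IPv6 address and would be reported as well.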

Solution


The customer resolved the issue and updated the SR with the following information:

"The issue was resolved after rectifying the hosts file. It had two entries with two different IPs for the same hostname."

Thank you.

Oracle Product Support - Siebel CRM
 
