天道酬勤 (Heaven Rewards the Diligent)

Oracle and My Life

Archive for January, 2010

VIP Failover Takes a Long Time After Network Cable Is Pulled


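In a 10g RAC environment, I tested pulling the network cable, and the VIP failover took a very long time. Searching Oracle Support turned up the following: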

[ID 403743.1]

Applies to:

Oracle Server – Enterprise Edition – Version: 10.2.0.1 to 11.1.0.7

This problem can occur on any platform.

Symptoms

This example is based on SUN Solaris platform, with IPMP configured for the public network. In this case, VIP failover takes almost 4 minutes to complete when both network cables of the public network are pulled from one node.

crsd.log shows:

2006-12-07 13:14:05.401: [ CRSAPP][4588] CheckResource error for ora.node1.vip error code = 1

2006-12-07 13:14:05.408: [ CRSRES][4588] In stateChanged, ora.node1.vip target is ONLINE

2006-12-07 13:14:05.409: [ CRSRES][4588] ora.node1.vip on node1 went OFFLINE unexpectedly

<<< network cable failure detected and VIP goes OFFLINE immediately

2006-12-07 13:14:05.410: [ CRSRES][4588] StopResource: setting CLI values

2006-12-07 13:14:05.420: [ CRSRES][4588] Attempting to stop `ora.node1.vip` on member `node1`

2006-12-07 13:14:06.651: [ CRSRES][4588] Stop of `ora.node1.vip` on member `node1` succeeded.

2006-12-07 13:14:06.652: [ CRSRES][4588] ora.node1.vip RESTART_COUNT=0 RESTART_ATTEMPTS=0

2006-12-07 13:14:06.667: [ CRSRES][4588] ora.node1.vip failed on node1 relocating.

2006-12-07 13:14:06.758: [ CRSRES][4588] StopResource: setting CLI values

2006-12-07 13:14:06.766: [ CRSRES][4588] Attempting to stop `ora.node1.LISTENER_NODE1.lsnr` on member `node1`

2006-12-07 13:17:41.399: [ CRSRES][4588] Stop of `ora.node1.LISTENER_NODE1.lsnr` on member `node1` succeeded.

<<< takes about 3.5 minutes to stop the listener

2006-12-07 13:17:41.402: Attempting to stop `ora.node1.ASM1.asm` on member `node1`

<<< stop dependent instance and ASM

2006-12-07 13:17:55.610: [ CRSRES][4588] Stop of `ora.node1.ASM1.asm` on member `node1` succeeded.

2006-12-07 13:17:55.661: [ CRSRES][4588] Attempting to start `ora.node1.vip` on member `node2`

2006-12-07 13:18:00.260: [ CRSRES][4588] Start of `ora.node1.vip` on member `node2` succeeded.

<<< VIP failover now complete, after almost 4 minutes
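As a quick check of the annotations above, the elapsed times can be computed directly from the crsd.log timestamps. This is a minimal sketch that assumes every line starts with a "YYYY-MM-DD HH:MM:SS.mmm:" timestamp, as in this excerpt. The helper names parse_ts and elapsed_seconds are hypothetical and not part of any Oracle tool.

```python
from datetime import datetime

# Assumption: each crsd.log line begins with "YYYY-MM-DD HH:MM:SS.mmm: ".
CRSD_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_ts(line: str) -> datetime:
    """Parse the leading timestamp of a crsd.log line (hypothetical helper)."""
    # The timestamp is everything before the first ": " separator.
    return datetime.strptime(line.split(": ", 1)[0], CRSD_TS_FORMAT)


def elapsed_seconds(start_line: str, end_line: str) -> float:
    """Seconds between the timestamps of two crsd.log lines."""
    return (parse_ts(end_line) - parse_ts(start_line)).total_seconds()


vip_offline = "2006-12-07 13:14:05.401: [ CRSAPP][4588] CheckResource error for ora.node1.vip error code = 1"
lsnr_stop_begin = "2006-12-07 13:14:06.766: [ CRSRES][4588] Attempting to stop `ora.node1.LISTENER_NODE1.lsnr` on member `node1`"
lsnr_stop_done = "2006-12-07 13:17:41.399: [ CRSRES][4588] Stop of `ora.node1.LISTENER_NODE1.lsnr` on member `node1` succeeded."
vip_started = "2006-12-07 13:18:00.260: [ CRSRES][4588] Start of `ora.node1.vip` on member `node2` succeeded."

listener_stop = elapsed_seconds(lsnr_stop_begin, lsnr_stop_done)
total_failover = elapsed_seconds(vip_offline, vip_started)
print(f"listener stop: {listener_stop:.3f} s ({listener_stop / 60:.1f} min)")
print(f"VIP failover:  {total_failover:.3f} s ({total_failover / 60:.1f} min)")
```

Run as-is, this reports 214.633 s (3.6 min) for the listener stop. It reports 234.859 s (3.9 min) from the first CheckResource error to the VIP starting on node2, which matches the "3.5 minutes" and "almost 4 minutes" annotations.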

ora.node1.LISTENER_NODE1.lsnr.log shows:

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=node1vip)(PORT=1521)(IP=FIRST)))

TNS-12535: TNS:operation timed out

2006-12-07 13:17:41.329: [ RACG][1] [23916][1][ora.node1.LISTENER_NODE1.lsnr]: out

TNS-12560: TNS:protocol adapter error

TNS-00505: Operation timed out

Solaris Error: 145: Connection timed out

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.1.10.100)(PORT=1521)(IP=FIRST)))

The command completed successfully

Client connections hang during this failover period.

Changes

This may be a new setup, or a setup that was migrated from an earlier release.

Cause

This problem is caused by the first address in the listener.ora configuration being an address that uses the TCP protocol.

In this circumstance, when a network cable is pulled, "lsnrctl stop" has to wait for the TCP connection attempt to time out before it can try the next address. On the Solaris platform, the TCP connect timeout is defined by tcp_ip_abort_cinterval, whose default value is 180000 ms (3 minutes). That is why shutting down the listener took almost 3.5 minutes. (The TCP timeout on other platforms may vary.) The error message "Solaris Error: 145: Connection timed out" in ora.node1.LISTENER_NODE1.lsnr.log also indicates that it was waiting for the TCP timeout.
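The ordering effect can be illustrated with a toy model. This is a simplified sketch, not how lsnrctl is actually implemented. It assumes that addresses are tried in listener.ora order and that a TCP address on the failed network blocks for the full 180 s Solaris default timeout. It also assumes that any address which connects does so instantly. The Address type, connects and time_to_reach_listener are hypothetical. Reachability is taken from the listener log above: node1vip timed out, but 10.1.10.100 still answered.

```python
from dataclasses import dataclass

# Assumption: Solaris default tcp_ip_abort_cinterval is 180000 ms (3 minutes).
TCP_CONNECT_TIMEOUT_S = 180000 / 1000


@dataclass
class Address:
    """One ADDRESS entry from listener.ora (hypothetical model type)."""
    protocol: str            # "TCP" or "IPC"
    target: str              # HOST for TCP, KEY for IPC
    network_up: bool = True  # TCP only: does the route to this host still work?


def connects(addr: Address) -> bool:
    # IPC is a same-host connection, so a pulled cable does not affect it.
    return addr.protocol == "IPC" or addr.network_up


def time_to_reach_listener(addresses: list[Address]) -> float:
    """Seconds spent before the first successful connection, trying in order.

    Simplified model: each failed TCP attempt blocks for the full connect
    timeout; a successful attempt is treated as instantaneous.
    """
    waited = 0.0
    for addr in addresses:
        if connects(addr):
            return waited
        waited += TCP_CONNECT_TIMEOUT_S
    raise RuntimeError("no address answered")


# Order seen in the listener log: node1vip timed out, 10.1.10.100 answered.
original = [
    Address("TCP", "node1vip", network_up=False),
    Address("TCP", "10.1.10.100"),
    Address("IPC", "EXTPROC"),
]
# The same addresses with IPC moved to the front.
ipc_first = [original[2], original[0], original[1]]

print(time_to_reach_listener(original))   # 180.0
print(time_to_reach_listener(ipc_first))  # 0.0
```

Under this model, the original order spends the full 180 s timeout on node1vip, while the IPC-first order connects immediately. The rest of the observed listener stop time is outside what the model covers.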

The listener.ora in this scenario is defined as:

LISTENER_NODE1 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = node1vip)(PORT = 1521)(IP = FIRST))
      )
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = 10.1.10.100)(PORT = 1521)(IP = FIRST))
      )
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
      )
    )
  )

Solution

To prevent this, move the IPC address so that it is the first address for the listener in listener.ora, e.g.:

LISTENER_NODE1 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
      )
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = node1vip)(PORT = 1521)(IP = FIRST))
      )
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = 10.1.10.100)(PORT = 1521)(IP = FIRST))
      )
    )
  )

When lsnrctl tries to stop the listener, it now connects to the IPC address first. That address remains available while the public network is down, so lsnrctl no longer has to wait for the TCP timeout.

After this change, VIP failover takes only 48 to 50 seconds to complete, regardless of the tcp_ip_abort_cinterval setting.

Please note that in most cases, listener.ora files newly created in releases 10.2.0.3 through 11.1.0.7 should have the IPC protocol as the first address. However, the IPC protocol may not be the first address, regardless of your version, in two situations: you upgraded from a previous release, or you manually modified or copied over a listener.ora from a previous install. In that case, you must manually move the IPC address to the first position to avoid the problem described in this note.
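A listener.ora carried over from an older install can still list a TCP address first, so it may help to check existing files. The following is a rough sketch, not an Oracle-supplied tool. It assumes the simple layout shown above, where each listener is a top-level "NAME =" entry starting in column 0 and continuation lines are indented. It ignores comments and less common Oracle Net syntax. The function names first_protocols and listeners_needing_fix are hypothetical, and the sample configs are trimmed versions of the files above.

```python
import re

# Assumption: a top-level entry starts in column 0 as "NAME =" and its
# continuation lines are indented, as in the listener.ora shown above.
ENTRY_RE = re.compile(r"^([A-Za-z0-9_.]+)\s*=", re.MULTILINE)
PROTOCOL_RE = re.compile(r"\(\s*PROTOCOL\s*=\s*(\w+)\s*\)", re.IGNORECASE)


def first_protocols(listener_ora: str) -> dict[str, str]:
    """Map each top-level entry that contains addresses to its first PROTOCOL."""
    entries = list(ENTRY_RE.finditer(listener_ora))
    result = {}
    for i, entry in enumerate(entries):
        # The entry body runs up to the next top-level entry (or end of text).
        end = entries[i + 1].start() if i + 1 < len(entries) else len(listener_ora)
        match = PROTOCOL_RE.search(listener_ora, entry.end(), end)
        if match:  # entries without addresses (e.g. SID_LIST_*) are skipped
            result[entry.group(1)] = match.group(1).upper()
    return result


def listeners_needing_fix(listener_ora: str) -> list[str]:
    """Entries whose first address is not IPC, the condition this note warns about."""
    return [name for name, proto in first_protocols(listener_ora).items() if proto != "IPC"]


BEFORE = """\
LISTENER_NODE1 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = node1vip)(PORT = 1521)(IP = FIRST))
      )
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
      )
    )
  )
"""

AFTER = """\
LISTENER_NODE1 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
      )
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = node1vip)(PORT = 1521)(IP = FIRST))
      )
    )
  )
"""

print(listeners_needing_fix(BEFORE))  # ['LISTENER_NODE1']
print(listeners_needing_fix(AFTER))   # []
```

A regex scan was chosen over a full Oracle Net parser to keep the sketch short. A real file with comments or unusual formatting should still be reviewed by hand.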

BTW: the blog 梦想有多远 ("How Far Is the Dream") also has a write-up on this.

-The End-

Written by ochef

January 11th, 2010 at 10:13 am

Posted in Database


Changed My Blog Theme


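Today I switched my blog's theme to one that looks cleaner. I like things that are simple but not simplistic. When I have time, I'll gradually tweak some of the details.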

Written by ochef

January 7th, 2010 at 10:11 pm

Posted in Life


2010, I Wish...


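The days around New Year's Day were pretty busy: before it I was busy with work, and after it with eating and drinking.

Anyway, a late year-end summary is better than none, so here is some rambling to serve as encouragement for myself: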

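1. I spent most of 2009 on the Internet, and because of that I came to know of many experts online (in no particular order :lol: ): eygle, rickyzhu, oracleblog, fenng, ningoo, and too many others to list one by one. I have learned a lot from them and received a lot of help, so here I'd like to say "Thank you!" to them (even though they don't know me).

2. In 2009 I also started this blog. I counted 73 posts in total. Although two-thirds of them are just rambling, I believe that if I keep at it, the day will come when I have something worthwhile to say.

3. In June I moved to my current company and started working extensively with midrange UNIX servers and enterprise storage. I'm grateful to the company for this opportunity.

4. Passed the Oracle 9i OCP and IBM 223 exams (correction: I misremembered, the OCP was actually in 2008).

5. So many experts have written so many great books, and I've read only a few of them. This is something I really need to improve on.

6. Since I changed jobs, the company goes out for sports and exercise every Friday afternoon. People in a line of work like ours really need the exercise.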

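For 2010, I don't dare ask for too much, just a few things:

1. August 15: quietly waiting for our baby (nickname Guoguo, 果果, meaning "the fruit of the love between my wife and me") to arrive in this world full of love.

2. The Oracle OCP upgrade exam (and, if possible, the OCM exam), plus the IBM 102 exam (required by my company).

3. Carefully finish reading the experts' books I've bought, and read one or two non-technical books to broaden myself.

4. Work on my spoken English and aim to actually be able to speak it.

5. Keep in touch with friends more often.

6. Last, and most important: I hope my family and friends live every day safe, healthy, and happy.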

-The End-

Written by ochef

January 4th, 2010 at 10:27 pm

Posted in Life

