<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>天道酬勤 &#187; rac_udp</title>
	<atom:link href="http://www.ochef.net/tag/rac_udp/feed" rel="self" type="application/rss+xml" />
	<link>http://www.ochef.net</link>
	<description>Oracle and My Life</description>
	<lastBuildDate>Mon, 09 Jan 2012 05:39:00 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>RAC failure caused by AIX network buffer parameter settings</title>
		<link>http://www.ochef.net/2010/03/aix-network-buffer-parameter-is-set-to-rise-to-rac-failures.html</link>
		<comments>http://www.ochef.net/2010/03/aix-network-buffer-parameter-is-set-to-rise-to-rac-failures.html#comments</comments>
		<pubDate>Fri, 26 Mar 2010 04:49:54 +0000</pubDate>
		<dc:creator>ochef</dc:creator>
				<category><![CDATA[Troubleshooting]]></category>
		<category><![CDATA[ora-600[12333]]]></category>
		<category><![CDATA[rac_udp]]></category>

		<guid isPermaLink="false">http://www.ochef.net/?p=807</guid>
		<description><![CDATA[Incident date: March 23, 2010, 11:30 AM. Production environment: database Oracle 10.2.0.4, 2-node RAC; operating system AIX 5309. Symptoms: (1) At 11:30 AM on the 23rd, the application could not connect to RAC instance 1 (instance name: int1), while instance 2 was working normally. (2) The TOAD client tool could not connect to instance 1 either. (3) SQL*Plus run locally on the instance 1 host could connect to instance 1, and instance 1 was also reachable over TNS from the instance 2 host. (4) After instance 1 was restarted at 15:19, the system returned to normal. Failure analysis: 1. The alert log of instance 1 recorded the following errors at 11:30: Tue Mar 23 11:30:08 2010 WARNING: inbound connection timed out (ORA-3136) Tue Mar 23 11:32:05 2010 WARNING: inbound connection timed out (ORA-3136) First, some background on ORA-3136. This error means the client did not complete login authentication within the time defined by the SQLNET.INBOUND_CONNECT_TIMEOUT parameter in sqlnet.ora. The parameter defaults to 60 seconds, which according to Oracle's documentation is sufficient in the vast majority of cases. The error also involves the INBOUND_CONNECT_TIMEOUT_LISTENER parameter in listener.ora, whose default was 0 before Oracle 10.2.0.1 and is 60 seconds from 10.2.0.1 onward. Based on the other information in the alert log, these parameters can be ruled out for now as the cause of the error on instance 1. 2. The alert log also records: …… Tue Mar 23 12:15:36 2010 Errors in file [...]]]></description>
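			<content:encoded><![CDATA[<p>Incident date: March 23, 2010, 11:30 AM</p>
<p>Production environment: Database: Oracle 10.2.0.4, 2-node RAC</p>
<p>Operating system: AIX 5309</p>
<p>Symptoms:</p>
<p>Symptom 1: At 11:30 AM on the 23rd, the application could not connect to RAC instance 1 (instance name: int1), while instance 2 was working normally.</p>
<p>Symptom 2: The TOAD client tool could not connect to instance 1 either.</p>
<p>Symptom 3: SQL*Plus run locally on the instance 1 host could connect to instance 1, and instance 1 was also reachable over TNS from the instance 2 host.</p>
<p>Symptom 4: After instance 1 was restarted at 15:19, the system returned to normal.</p>
<p><strong>Failure analysis:</strong></p>
<p>1. The alert log of instance 1 recorded the following errors at 11:30:</p>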
<p>Tue Mar 23 11:30:08 2010</p>
<p>WARNING: inbound connection timed out (ORA-3136)</p>
<p>Tue Mar 23 11:32:05 2010</p>
<p>WARNING: inbound connection timed out (ORA-3136)</p>
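<p>First, some background on ORA-3136. This error means the client did not complete login authentication within the time defined by the SQLNET.INBOUND_CONNECT_TIMEOUT parameter in sqlnet.ora. The parameter defaults to 60 seconds, which according to Oracle's documentation is sufficient in the vast majority of cases. The error also involves the INBOUND_CONNECT_TIMEOUT_LISTENER parameter in listener.ora, whose default was 0 before Oracle 10.2.0.1 and is 60 seconds from 10.2.0.1 onward. Based on the other information in the alert log, these parameters can be ruled out for now as the cause of the error on instance 1.</p>
<p>2. The alert log also records:</p>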
<p>……</p>
<p>Tue Mar 23 12:15:36 2010</p>
<p>Errors in file /soft/oracle/admin/int/udump/int1_ora_2617378.trc:</p>
<p>ORA-00600: internal error code, arguments: [12333], [7], [2], [49], [], [], [], []</p>
<p>……</p>
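<p>Oracle Metalink note [ID 35928.1] describes this error as a <strong>&#8220;Fatal Two-Task Protocol Violation&#8221;</strong>.</p>
<p>ORA-600 [12333] indicates that the server received a network packet that failed validation. There are two possible causes: a multithreaded client application issued OCI calls out of order, or data in the network buffer was overwritten. Looking further at the trace files, each one begins with: PROTOCOL VIOLATION DETECTED.</p>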
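<p>To line up the two error types on one timeline, the alert log entries quoted in this post can be parsed programmatically. The following is only a minimal sketch in Python: the function name <code>parse_alert_log</code> and the tuple layout are illustrative assumptions, and the log text is limited to the lines shown above.</p>

```python
import re
from datetime import datetime

# Excerpt copied from the alert log entries quoted in this post.
ALERT_LOG = """\
Tue Mar 23 11:30:08 2010
WARNING: inbound connection timed out (ORA-3136)
Tue Mar 23 11:32:05 2010
WARNING: inbound connection timed out (ORA-3136)
Tue Mar 23 12:15:36 2010
Errors in file /soft/oracle/admin/int/udump/int1_ora_2617378.trc:
ORA-00600: internal error code, arguments: [12333], [7], [2], [49], [], [], [], []
"""

TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"
ORA_CODE = re.compile(r"ORA-(\d+)")
ORA_600_ARGS = re.compile(r"arguments: (.*)$")


def parse_alert_log(text):
    """Return a list of (timestamp, ora_code, first_argument) tuples."""
    events = []
    current_time = None
    for line in text.splitlines():
        line = line.strip()
        try:
            # A line that parses as a timestamp applies to the entries after it.
            current_time = datetime.strptime(line, TIMESTAMP_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line
        match = ORA_CODE.search(line)
        if match is None:
            continue
        code = int(match.group(1))
        first_arg = None
        args = ORA_600_ARGS.search(line)
        if code == 600 and args:
            # The first bracketed argument identifies the internal error.
            first_arg = args.group(1).split(",")[0].strip(" []")
        events.append((current_time, code, first_arg))
    return events


if __name__ == "__main__":
    for when, code, first_arg in parse_alert_log(ALERT_LOG):
        print(when, code, first_arg)
```

<p>On this excerpt the sketch yields two ORA-3136 events followed by one ORA-600 event whose first argument is 12333.</p>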
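<p>In addition, traffic data for the RAC heartbeat network (ent8), captured at the time of the failure by the bank's in-band network management software Tivoli, confirms that heartbeat traffic was indeed higher than normal. RAC uses UDP for interconnect communication between nodes; the system statistics are as follows:</p>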
<p>RACDB1# netstat -p udp -s</p>
<p>udp:</p>
<p>574337869 datagrams received</p>
<p>0 incomplete headers</p>
<p>0 bad data length fields</p>
<p>0 bad checksums</p>
<p><span style="color: #ff0000;">169617 dropped due to no socket</span></p>
<p>32335 broadcast/multicast datagrams dropped due to no socket</p>
<p><span style="color: #ff0000;">243 socket buffer overflows</span></p>
<p>574135674 delivered</p>
<p>500048775 datagrams output</p>
<p>RACDB2# netstat -p udp -s</p>
<p>udp:</p>
<p>500187207 datagrams received</p>
<p>0 incomplete headers</p>
<p>0 bad data length fields</p>
<p>0 bad checksums</p>
<p><span style="color: #ff0000;">171357 dropped due to no socket</span></p>
<p>32333 broadcast/multicast datagrams dropped due to no socket</p>
<p><span style="color: #ff0000;">2108 socket buffer overflows</span></p>
<p>499981409 delivered</p>
<p>574427147 datagrams output</p>
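<p>The counters in the two listings can be cross-checked with a few lines of Python. This is a rough sketch, not a monitoring tool: the helper names <code>parse_udp_stats</code>, <code>undelivered</code>, and <code>dropped_total</code> are made up for illustration, and the input is the text quoted above. Note that only the "socket buffer overflows" counter relates directly to the receive buffer size; "dropped due to no socket" counts datagrams sent to a port on which nothing was listening.</p>

```python
import re

# Counters copied from the two netstat listings quoted in this post.
NETSTAT_RACDB1 = """\
udp:
574337869 datagrams received
0 incomplete headers
0 bad data length fields
0 bad checksums
169617 dropped due to no socket
32335 broadcast/multicast datagrams dropped due to no socket
243 socket buffer overflows
574135674 delivered
500048775 datagrams output
"""

NETSTAT_RACDB2 = """\
udp:
500187207 datagrams received
0 incomplete headers
0 bad data length fields
0 bad checksums
171357 dropped due to no socket
32333 broadcast/multicast datagrams dropped due to no socket
2108 socket buffer overflows
499981409 delivered
574427147 datagrams output
"""

COUNTER_LINE = re.compile(r"^\s*(\d+)\s+(.+?)\s*$")

DROP_COUNTERS = (
    "dropped due to no socket",
    "broadcast/multicast datagrams dropped due to no socket",
    "socket buffer overflows",
)


def parse_udp_stats(text):
    """Map each counter description to its integer value."""
    stats = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            stats[match.group(2)] = int(match.group(1))
    return stats


def undelivered(stats):
    """Datagrams that were received but never handed to a socket."""
    return stats["datagrams received"] - stats["delivered"]


def dropped_total(stats):
    """Sum of the three drop counters reported by netstat."""
    return sum(stats[name] for name in DROP_COUNTERS)


if __name__ == "__main__":
    for node, text in (("RACDB1", NETSTAT_RACDB1), ("RACDB2", NETSTAT_RACDB2)):
        stats = parse_udp_stats(text)
        print(node, undelivered(stats), dropped_total(stats),
              stats["socket buffer overflows"])
```

<p>For both nodes the difference between received and delivered datagrams equals the sum of the three drop counters, so the listings are internally consistent.</p>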
<p>These netstat figures point to a communication problem caused by improperly set network buffer parameters. The parameters that govern network buffer sizes are:</p>
<p>#no -a |pg</p>
<p>sb_max = 1310720</p>
<p>udp_recvspace = 655360</p>
<p>udp_sendspace = 65536</p>
<p>sb_max specifies the maximum buffer size allowed for TCP and UDP sockets; its default value is 1048576 bytes. Clearly, udp_recvspace and udp_sendspace are set asymmetrically, and sb_max is set too small.</p>
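<p>To make the relationship between the three tunables explicit, the <code>no -a</code> lines above can be turned into numbers and compared. This is a small Python sketch; <code>parse_no_output</code> and <code>buffer_ratios</code> are hypothetical helper names, and the sketch only reports ratios without judging which values are correct.</p>

```python
# Values copied from the "no -a" output quoted in this post.
NO_OUTPUT = """\
sb_max = 1310720
udp_recvspace = 655360
udp_sendspace = 65536
"""


def parse_no_output(text):
    """Parse 'name = value' lines into a dict of integer tunables."""
    tunables = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        # Skip lines without '=' or with a non-numeric value.
        if sep and value.strip().isdigit():
            tunables[name.strip()] = int(value)
    return tunables


def buffer_ratios(tunables):
    """Return (udp_recvspace / udp_sendspace, sb_max / udp_recvspace)."""
    recv_to_send = tunables["udp_recvspace"] / tunables["udp_sendspace"]
    max_to_recv = tunables["sb_max"] / tunables["udp_recvspace"]
    return recv_to_send, max_to_recv


if __name__ == "__main__":
    tunables = parse_no_output(NO_OUTPUT)
    print(tunables)
    print(buffer_ratios(tunables))
```

<p>With the values shown, udp_recvspace is 10 times udp_sendspace, and sb_max is twice udp_recvspace.</p>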
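<p>3. ORA-600 [12333] can also be caused by a mismatch between the JDBC driver version and the Oracle database version, but this system has been in production at the bank for a long time, so that cause can be ruled out for now. In addition, the trace files show a large number of UNION queries during the failure window, and heavy UNION activity increases inter-node communication. The ASH report (ashrpt) also confirms that gc buffer busy waits rose as the failure progressed and dropped only after the node was finally evicted from the RAC.</p>
<p><strong>Preliminary conclusion:</strong></p>
<p>Based on the analysis above, the preliminary diagnosis is that improperly set system network buffer parameters caused a failure of the RAC interconnect. The interconnect coordinates the operation of all nodes, including global locking, enqueues, and buffer cache management. The recommended starting value for udp_sendspace is db_block_size * db_file_multiblock_read_count, with udp_recvspace set to 4 times udp_sendspace, up to a limit of 1048576. If socket buffer overflows occur (check with netstat -s | grep &#8220;socket buffer overflows&#8221;), udp_recvspace needs to be increased; the netstat -p udp -s output confirms that this is the case here.</p>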
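<p>The sizing rule in the conclusion can be written out as a short calculation. The Python sketch below rests on two assumptions: the 1048576 limit is applied to udp_recvspace, and the example inputs (an 8 KB block size with multiblock read counts of 16 and 64) are hypothetical values, not figures from this system. The function name <code>suggest_udp_buffers</code> is likewise illustrative.</p>

```python
# Upper limit quoted in the post; assumed here to apply to udp_recvspace.
UDP_RECVSPACE_CAP = 1048576


def suggest_udp_buffers(db_block_size, db_file_multiblock_read_count):
    """Return (udp_sendspace, udp_recvspace) starting values in bytes."""
    udp_sendspace = db_block_size * db_file_multiblock_read_count
    # Receive space is 4x send space, capped at the quoted limit.
    udp_recvspace = min(4 * udp_sendspace, UDP_RECVSPACE_CAP)
    return udp_sendspace, udp_recvspace


if __name__ == "__main__":
    # Hypothetical inputs: 8 KB blocks, multiblock read count of 16.
    print(suggest_udp_buffers(8192, 16))
    # Hypothetical inputs where the 4x rule would exceed the cap.
    print(suggest_udp_buffers(8192, 64))
```

<p>These are starting values only; per the rule above, udp_recvspace should be raised further if the socket buffer overflow counter keeps growing.</p>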
<p>BTW: an article by Eygle is also worth consulting:</p>
<h3><a href="http://www.eygle.com/digest/2009/07/ibm_aix_oracle_9i_rac_udp.html" target="_blank">IBM AIX Oracle 9i RAC performance factors &#8211; UDP and others</a></h3>
<p>-The End-</p>
]]></content:encoded>
			<wfw:commentRss>http://www.ochef.net/2010/03/aix-network-buffer-parameter-is-set-to-rise-to-rac-failures.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

