<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Oracle From Scratch</title>
	<atom:link href="http://sid.gd/feed/" rel="self" type="application/rss+xml" />
	<link>http://sid.gd</link>
	<description></description>
	<lastBuildDate>Wed, 11 Apr 2012 09:32:45 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>SLOB: An Interesting IO Testing Tool</title>
		<link>http://sid.gd/slob-oracle-io-benchmark/</link>
		<comments>http://sid.gd/slob-oracle-io-benchmark/#comments</comments>
		<pubDate>Tue, 10 Apr 2012 11:24:35 +0000</pubDate>
		<dc:creator>Sidney Chen</dc:creator>
				<category><![CDATA[Oracle]]></category>
		<category><![CDATA[filesystemio_options]]></category>
		<category><![CDATA[SLOB]]></category>

		<guid isPermaLink="false">http://sid.gd/?p=1661</guid>
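		<description><![CDATA[Besides ORION, if you want a more realistic test of storage IO capability, Kevin Closson&#8217;s tool SLOB (Silly Little Oracle Benchmark) may be very useful. See the introduction on his blog (reaching it from mainland China requires getting around the firewall). http://kevinclosson.wordpress.com/2012/02/06/introducing-slob-the-silly-little-oracle-benchmark/ Here is the download link. Configuration is simple; README-FIRST explains everything. http://oaktable.net/articles/slob-silly-little-oracle-benchmark With SLOB, different kinds of tests only require adjusting db_cache_size. 1. Random single-block reads: with a sufficiently small buffer cache, an index scan of a large table turns most logical reads into physical reads, which tests the storage&#8217;s random single-block read capability. 2. Logical reads: with a sufficiently large buffer cache, all logical IO hits the cache, which tests the system&#8217;s logical IO limit. 3. Sequential writes: with a sufficiently large buffer cache, running updates generates a lot of redo, which simulates LGWR sequential write throughput. 4. Sequential and random writes: with a very small buffer cache, &#8230; <a href="http://sid.gd/slob-oracle-io-benchmark/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Besides ORION, if you want a more realistic test of storage IO capability, <a href="http://kevinclosson.wordpress.com" target="_blank">Kevin Closson</a>&#8217;s tool SLOB (Silly Little Oracle Benchmark) may be very useful. See the introduction on his blog (reaching it from mainland China requires getting around the firewall).</p>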
<p><a href="http://kevinclosson.wordpress.com/2012/02/06/introducing-slob-the-silly-little-oracle-benchmark/" target="_blank">http://kevinclosson.wordpress.com/2012/02/06/introducing-slob-the-silly-little-oracle-benchmark/</a></p>
<p>Here is the download link. Configuration is simple; README-FIRST explains everything.</p>
<p><a href="http://oaktable.net/articles/slob-silly-little-oracle-benchmark" target="_blank">http://oaktable.net/articles/slob-silly-little-oracle-benchmark</a></p>
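<p>With SLOB, different kinds of tests only require adjusting db_cache_size.<br />
<span id="more-1661"></span><br />
1. Random single-block reads: with a sufficiently small buffer cache, an index scan of a large table turns most logical reads into physical reads, which tests the storage&#8217;s random single-block read capability.<br />
2. Logical reads: with a sufficiently large buffer cache, all logical IO hits the cache, which tests the system&#8217;s logical IO limit.<br />
3. Sequential writes: with a sufficiently large buffer cache, running updates generates a lot of redo, which simulates LGWR sequential write throughput.<br />
4. Sequential and random writes: with a very small buffer cache, running updates generates a lot of redo, and because the buffer cache is small, DBWR must promptly write dirty blocks back to the data files; this tests throughput when LGWR and DBWR work together.</p>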
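<p>SLOB makes some interesting tests possible. For example, how much does enabling asynchronous IO improve random single-block reads? I had just put an SSD in my MacBook Pro, and for this test I installed Oracle Linux 6.2 in VirtualBox, with database version 11.2.0.1.0.</p>
<p>Set cpu_count to 1 and db_cache_size to 40M to prepare the random single-block read environment.</p>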
<pre class="brush: plain; title: ; notranslate">
cpu_count=1
db_cache_size=40m

SQL&gt; startup pfile=./create1.ora
ORACLE instance started.

Total System Global Area  208805888 bytes
Fixed Size                2211848 bytes
Variable Size             159387640 bytes
Database Buffers          41943040 bytes
Redo Buffers              5263360 bytes
Database mounted.
Database opened.
</pre>
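<p>Below are two AWR excerpts, each from a run with a single reader (./runit.sh 0 1). Without asynchronous IO, Physical reads per second is 1,300 and the average wait for db file parallel read is 21 ms; with asynchronous IO enabled, Physical reads per second is 5,315 and the average wait for db file parallel read is 5 ms. Enabling asynchronous IO clearly improves IO performance a great deal.</p>
<p>Asynchronous IO disabled: filesystemio_options=directio</p>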
<pre class="brush: plain; title: ; notranslate">
Load Profile              Per Second    Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~         ---------------    --------------- ---------- ----------
      DB Time(s):                1.0              511.0       0.07      34.06
       DB CPU(s):                0.2               77.5       0.01       5.17
       Redo size:            1,700.2          868,944.0
   Logical reads:            2,541.8        1,299,057.0
   Block changes:                3.6            1,819.0
  Physical reads:            1,300.6          664,714.0

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
db file parallel read                22,379         476     21   93.1 User I/O
DB CPU                                               77          15.2
db file sequential read              23,498          18      1    3.6 User I/O
direct path write                       379           1      2     .2 User I/O
db file scattered read                  100           0      1     .0 User I/O
</pre>
<p>Asynchronous IO enabled: filesystemio_options=setall</p>
<pre class="brush: plain; title: ; notranslate">
Load Profile              Per Second    Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~         ---------------    --------------- ---------- ----------
      DB Time(s):                1.0              138.2       0.02      12.56
       DB CPU(s):                0.7               96.1       0.01       8.74
       Redo size:            4,992.2          696,960.0
   Logical reads:            9,288.9        1,296,834.0
   Block changes:                9.4            1,314.0
  Physical reads:            5,315.7          742,129.0

Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                           Avg
                                                          wait   % DB
Event                                 Waits     Time(s)   (ms)   time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
db file parallel read                21,968         105      5   76.2 User I/O
DB CPU                                               96          69.6
db file sequential read              25,316          20      1   14.6 User I/O
direct path write                        87           1      8     .5 User I/O
db file scattered read                  124           0      1     .1 User I/O
</pre>
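<p>As a rough cross-check of the two excerpts above, the gain in physical reads per second can be compared with the drop in the average db file parallel read wait. The Python below only does arithmetic on numbers copied from the AWR output; the helper name speedups is hypothetical, and the comparison is a sanity check, not a model of how Oracle issues the IO.</p>
<pre class="brush: plain; title: ; notranslate">
# Figures copied from the two AWR excerpts above (one reader, ./runit.sh 0 1).
sync_awr = {"physical_reads_per_sec": 1300.6, "parallel_read_avg_ms": 21}   # directio
async_awr = {"physical_reads_per_sec": 5315.7, "parallel_read_avg_ms": 5}   # setall


def speedups(before, after):
    """Return (throughput gain, wait-time reduction) between two AWR runs."""
    throughput_gain = after["physical_reads_per_sec"] / before["physical_reads_per_sec"]
    wait_reduction = before["parallel_read_avg_ms"] / after["parallel_read_avg_ms"]
    return round(throughput_gain, 2), round(wait_reduction, 2)


print(speedups(sync_awr, async_awr))  # (4.09, 4.2)
</pre>
<p>Throughput rose about 4.1 times while the average wait fell about 4.2 times, which is consistent with a single reader whose rate is bounded by how long each read takes.</p>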
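<p>The following AWR excerpt comes from 10g, again a random single-block read test. The storage is an EMC CLARiiON CX3-20F with a RAID 5 group of 20 disks and 2 GB of controller memory. With asynchronous IO enabled and 32 readers (./runit.sh 0 32), Physical reads per second only reached 3,705, whereas a single SSD with one reader reached 5,315. For random IO, hard disks really are weak.</p>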
<pre class="brush: plain; title: ; notranslate">
Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                   ---------------       ---------------
                  Redo size:              2,114.47             44,129.00
              Logical reads:              3,780.77             78,904.89
              Block changes:                 10.78                224.93
             Physical reads:              3,705.33             77,330.27
            Physical writes:                  2.90                 60.45

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
db file sequential read           4,323,894      34,130      8   99.2   User I/O
CPU time                                            244           0.7
db file scattered read                  529           9     16    0.0   User I/O
control file sequential read          2,889           6      2    0.0 System I/O
control file parallel write             389           1      1    0.0 System I/O
</pre>
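<p>Little's law (average requests in flight = throughput times average latency) is not used in the original measurement, but it gives a quick sanity check on this excerpt. The Python sketch below plugs in the Load Profile and Top 5 figures. The AWR average wait is rounded to whole milliseconds, and RAID 5 overhead and the 2 GB controller cache are ignored, so the results are only approximate.</p>
<pre class="brush: plain; title: ; notranslate">
def in_flight(throughput_per_sec, avg_latency_ms):
    """Little's law: average number of outstanding requests."""
    return throughput_per_sec * avg_latency_ms / 1000.0


physical_reads_per_sec = 3705.33  # Load Profile: Physical reads per second
avg_wait_ms = 8                   # Top 5: db file sequential read, Avg wait (ms)
disks = 20                        # RAID 5 group size given in the text

outstanding = in_flight(physical_reads_per_sec, avg_wait_ms)
per_disk = physical_reads_per_sec / disks
print(round(outstanding, 1), round(per_disk, 1))  # 29.6 185.3
</pre>
<p>About 30 reads in flight is close to the 32 readers, which suggests the readers spent nearly all of their time waiting on disk (the 99.2% of call time in db file sequential read points the same way), at roughly 185 random reads per second per disk.</p>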
<p>&#8211;EOF&#8211;</p>
<p>2010-09-23, Darchen village at the foot of Mount Kailash<a href="https://lh6.googleusercontent.com/-K0kWH4kmxlc/T4QXpCVNkHI/AAAAAAAAY1M/XHh08rc0ZHQ/s1440/DSC_2914.JPG" target="_blank"><br />
<img class="Picasa" src="https://lh6.googleusercontent.com/-K0kWH4kmxlc/T4QXpCVNkHI/AAAAAAAAY1M/XHh08rc0ZHQ/s800/DSC_2914.JPG" alt="" /><br />
</a></p>
]]></content:encoded>
			<wfw:commentRss>http://sid.gd/slob-oracle-io-benchmark/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Special Case of gc current blocks received</title>
		<link>http://sid.gd/a-special-case-abount-gc-current-blocks-received/</link>
		<comments>http://sid.gd/a-special-case-abount-gc-current-blocks-received/#comments</comments>
		<pubDate>Wed, 04 Apr 2012 14:51:06 +0000</pubDate>
		<dc:creator>Sidney Chen</dc:creator>
				<category><![CDATA[Oracle]]></category>
		<category><![CDATA[gc cr]]></category>
		<category><![CDATA[gc current]]></category>

		<guid isPermaLink="false">http://sid.gd/?p=1652</guid>
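		<description><![CDATA[This post grew out of a discussion with Kamus about a special case in which &#8216;gc current blocks received&#8217; appears. When a select statement requests a block through gc cr, why does the wait event &#8216;gc current block 2-way&#8217; appear, and why does the &#8216;gc current blocks received&#8217; statistic increase? A gc cr request usually arises because the block is cached in another instance. When a gc cr request is made, there are three possible cases: 1. the requested block is not in the global cache; 2. the requested block is in the global cache in xcur state; 3. the requested block is in the global cache in scur state. The three cases are described below. Test environment: 10.2.0.3.0, 32-bit Linux, 2-node RAC. 1. The requested block is not in the global cache: the requesting session waits on &#8216;gc cr grant 2-way&#8217;, &#8230; <a href="http://sid.gd/a-special-case-abount-gc-current-blocks-received/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>This post grew out of a discussion with <a href="http://www.dbform.com/" target="_blank">Kamus</a> about a special case in which &#8216;gc current blocks received&#8217; appears. <strong>When a select statement requests a block through gc cr, why does the wait event &#8216;gc current block 2-way&#8217; appear, and why does the &#8216;gc current blocks received&#8217; statistic increase?</strong></p>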
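<p>A gc cr request usually arises because the block is cached in another instance. When a gc cr request is made, there are three possible cases:<br />
1. The requested block is not in the global cache (gc).<br />
2. The requested block is in the global cache in xcur state.<br />
3. The requested block is in the global cache in scur state.<br />
<span id="more-1652"></span><br />
The three cases are described below. Test environment: 10.2.0.3.0, 32-bit Linux, 2-node RAC.</p>
<p>1. The requested block is not in the global cache. The requesting session waits on &#8216;gc cr grant 2-way&#8217; and, once it receives the grant, performs a physical read. Because the master can be computed from the block address, gc cr grant is always 2-way and never 3-way, no matter how many instances the RAC has. Here is a SQL trace excerpt for this case.</p>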
<pre class="brush: plain; title: ; notranslate">
WAIT #34: nam='gc cr grant 2-way' ela= 145 p1=1 p2=64631 p3=1 obj#=146039 tim=1302016329430647
WAIT #34: nam='db file sequential read' ela= 2072 file#=1 block#=64631 blocks=1 obj#=146039 tim=1302016329432741
WAIT #34: nam='gc cr grant 2-way' ela= 111 p1=1 p2=64632 p3=1 obj#=146039 tim=1302016329432928
WAIT #34: nam='db file sequential read' ela= 3026 file#=1 block#=64632 blocks=1 obj#=146039 tim=1302016329435970
</pre>
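<p>2. The requested block is in the global cache in xcur state. The wait event is &#8216;gc cr block 2-way&#8217; and the &#8216;gc cr blocks received&#8217; statistic increases. The instance holding the xcur block builds a CR version and sends it to the requesting instance. If necessary, building it applies undo records, which shows up as growth in the statistics &#8216;data blocks consistent reads &#8211; undo records applied&#8217; or &#8216;transaction tables consistent reads &#8211; undo records applied&#8217;.</p>
<p>3. The requested block is in the global cache in scur state. The &#8216;gc current blocks received&#8217; statistic increases, which answers the question at the start of this post. For a single-block request the wait event is &#8216;gc current block 2-way&#8217;; for a multi-block request it is &#8216;gc cr multi block request&#8217;. In RAC, a block read from disk for the first time because of a CR request is in scur state, and as long as it has not been modified, every instance that holds it holds it in scur state. Whether the holder of the block is also its master does not affect this wait event or statistic. The verification follows.</p>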
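<p>The three cases can be summarized as a small lookup. The Python function below only restates what the text says for this 10.2.0.3 two-node test. The name gc_cr_outcome is hypothetical, and the function ignores everything the post does not cover: PI buffers, dirty or pinged buffers, other block classes, 3-way transfers, and the multi-block event for case 2.</p>
<pre class="brush: plain; title: ; notranslate">
def gc_cr_outcome(remote_state, multiblock=False):
    """Return (wait event, statistic incremented) for a gc cr request,
    given the state of the block in the global cache (None = not cached)."""
    if remote_state is None:
        # Case 1: grant from the master, then a physical read; the post
        # names no "blocks received" statistic for this case.
        return "gc cr grant 2-way", None
    if remote_state == "xcur":
        # Case 2: the xcur holder builds a CR copy (applying undo if needed).
        return "gc cr block 2-way", "gc cr blocks received"
    if remote_state == "scur":
        # Case 3: counted as a current block received, although a CR read was requested.
        event = "gc cr multi block request" if multiblock else "gc current block 2-way"
        return event, "gc current blocks received"
    raise ValueError("state not covered by the three cases: %r" % (remote_state,))


print(gc_cr_outcome("scur"))  # ('gc current block 2-way', 'gc current blocks received')
</pre>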
<p>1. Create table T and insert 1,000 rows into 1,000 blocks (with pctfree 99, each block holds a single row).</p>
<pre class="brush: plain; title: ; notranslate">
create table t
(
id	number,
small_vc number,
padding varchar2(1000)
)
pctfree 99
pctused 1
/

insert into t
select
    rownum          	id,
    lpad(rownum,10,'0') small_vc,
    rpad('x',100)       padding
from
	dual
connect by level &lt;= 1000
;

commit;

alter table t add constraint t_pk primary key(id);

begin
dbms_stats.gather_table_stats(user,'T');
end;
/
</pre>
<p>All blocks of table T are in file 1, spread across three ranges: [63825, 63873 + 7], [64137, 64201 + 7], [64265, 65033 + 127].</p>
<pre class="brush: plain; title: ; notranslate">
sys@V10&gt; col owner for A3
sys@V10&gt; col segment_name for A2
sys@V10&gt; select
  2  	     owner,
  3  	     segment_name,
  4  	     segment_type,
  5  	     extent_id,
  6  	     file_id,
  7  	     block_id,
  8  	     blocks
  9  from
 10  	     dba_extents
 11  where
 12  	     segment_name = 'T'
 13  and owner = 'SYS'
 14  order by extent_id
 15  /

OWN SE SEGMENT_TYPE        EXTENT_ID    FILE_ID   BLOCK_ID     BLOCKS
--- -- ------------------ ---------- ---------- ---------- ----------
SYS T  TABLE                       0          1      63825          8
SYS T  TABLE                       1          1      63833          8
SYS T  TABLE                       2          1      63841          8
SYS T  TABLE                       3          1      63849          8
SYS T  TABLE                       4          1      63857          8
SYS T  TABLE                       5          1      63865          8
SYS T  TABLE                       6          1      63873          8
SYS T  TABLE                       7          1      64137          8
SYS T  TABLE                       8          1      64145          8
SYS T  TABLE                       9          1      64153          8
SYS T  TABLE                      10          1      64161          8
SYS T  TABLE                      11          1      64169          8
SYS T  TABLE                      12          1      64177          8
SYS T  TABLE                      13          1      64185          8
SYS T  TABLE                      14          1      64193          8
SYS T  TABLE                      15          1      64201          8
SYS T  TABLE                      16          1      64265        128
SYS T  TABLE                      17          1      64393        128
SYS T  TABLE                      18          1      64521        128
SYS T  TABLE                      19          1      64649        128
SYS T  TABLE                      20          1      64777        128
SYS T  TABLE                      21          1      64905        128
SYS T  TABLE                      22          1      65033        128

23 rows selected.
</pre>
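<p>The three block ranges can be derived mechanically from this output by merging extents that are physically adjacent in the file. A minimal Python sketch, with the extent list transcribed from the dba_extents output (merge_extents is a hypothetical helper, not part of any Oracle tool):</p>
<pre class="brush: plain; title: ; notranslate">
def merge_extents(extents):
    """Merge (block_id, blocks) extents of one file into inclusive
    (first_block, last_block) ranges, joining physically contiguous extents."""
    ranges = []
    for block_id, blocks in sorted(extents):
        last = block_id + blocks - 1
        if ranges and ranges[-1][1] + 1 == block_id:
            ranges[-1][1] = last  # contiguous with the previous extent: extend it
        else:
            ranges.append([block_id, last])
    return [tuple(r) for r in ranges]


# (BLOCK_ID, BLOCKS) for extents 0-22 above, all in file 1
t_extents = (
    [(63825 + 8 * i, 8) for i in range(7)]        # extents 0-6
    + [(64137 + 8 * i, 8) for i in range(9)]      # extents 7-15
    + [(64265 + 128 * i, 128) for i in range(7)]  # extents 16-22
)

print(merge_extents(t_extents))  # [(63825, 63880), (64137, 64208), (64265, 65160)]
print(sum(blocks for _, blocks in t_extents))  # 1024
</pre>
<p>It also shows that the 23 extents add up to 1,024 allocated blocks.</p>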
<p>Using the block addresses, we can query x$kjbl for the owner and master of every block of T; both are numbered from 0. The following query shows the owner and master of the blocks of T:</p>
<pre class="brush: plain; title: ; notranslate">
select
	kjblowner, kjblmaster, count(*)
from
	x$kjbl
where
	to_number(substr ( kjblname2,  instr(kjblname2,',')+1,   instr(kjblname2,',',1,2)-instr(kjblname2,',',1,1)-1)/65536) = 1
and
	(to_number(substr ( kjblname2, 1, instr(kjblname2,',')-1)) between 63825 and 63873 + 7
		or
	to_number(substr ( kjblname2, 1, instr(kjblname2,',')-1)) between 64137 and 64201 + 7
		or
	to_number(substr ( kjblname2, 1, instr(kjblname2,',')-1)) between 64265 and 65033 + 127)
group by
	kjblowner, kjblmaster
/
</pre>
<p>The following query shows the status and number of the blocks of T in the buffer cache.</p>
<pre class="brush: plain; title: ; notranslate">
select
	objd, status, count(*)
from
	v$bh
where
	objd = &amp;t_obj_id
group by
	objd, status
order by
	count(*)
;
</pre>
<p>2. On instance 2, flush the buffer cache, which empties the caches of both instances.</p>
<pre class="brush: plain; title: ; notranslate">
sys@V10&gt; alter system flush buffer_cache;

System altered.
</pre>
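<p>3. On instance 2, use oradebug lkdebug -m pkey to make instance 2 the master node of T. Next, read all the blocks on instance 1, then request them with &#8216;gc cr&#8217; from instance 2. At that point the holder of the 1,000 blocks of T is instance 1 and the master is instance 2, so the holder differs from the master. Observe how the statistics on both instances change before and after the request.</p>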
<pre class="brush: plain; title: ; notranslate">
sys@V10&gt; oradebug lkdebug -m pkey &amp;t_obj_id
Statement processed.

sys@V10&gt; select * from v$gcspfmaster_info where object_id=&amp;t_obj_id;

   FILE_ID  OBJECT_ID CURRENT_MASTER PREVIOUS_MASTER REMASTER_CNT
---------- ---------- -------------- --------------- ------------
         0     146039              1               0            0
</pre>
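<p>4. On instance 1, scan T through the index. The buffer cache of instance 1 now holds 1,000 blocks in scur state, and x$kjbl shows the 1,000 corresponding shadow resources. Because the master is instance 2, &#8216;gc remote grants&#8217; and &#8216;gcs messages sent&#8217; are both 1,000.</p>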
<pre class="brush: plain; title: ; notranslate">
sys@V10&gt; select /*+ index(t)*/
  2  	     max(small_vc)
  3  from
  4  	     t
  5  where
  6  	     id &gt; 0
  7  ;

MAX(SMALL_VC)
-------------
         1000

--query v$bh
      OBJD STATUS    COUNT(*)
---------- ------- ----------
    146039 cr               1
    146039 scur          1000
    146039 free          1820

--query x$kjbl
 KJBLOWNER KJBLMASTER   COUNT(*)
---------- ---------- ----------
         0          1       1000

--session statistics
Name               Value
----               -----
gcs messages sent  1,000
gc local grants        3
gc remote grants   1,000
</pre>
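<p>5. On instance 2, scan T through the index and observe how the statistics change on both instances.</p>
<p>a. On instance 1, &#8216;gc current blocks served&#8217; increases by 1,003, while the information about T in v$bh and x$kjbl is unchanged.</p>
<pre class="brush: plain; title: ; notranslate">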
--system statistics
Name                     Value
----                     -----
gc current blocks served 1,003

--query v$bh
      OBJD STATUS    COUNT(*)
---------- ------- ----------
    146039 cr               1
    146039 scur          1000
    146039 free          1812

--query x$kjbl
 KJBLOWNER KJBLMASTER   COUNT(*)
---------- ---------- ----------
         0          1       1000
</pre>
<p>b. On instance 2, &#8216;gc current blocks received&#8217; increases by 1,003, confirming that &#8216;gc cr&#8217; requests can also increase &#8216;gc current blocks received&#8217;.</p>
<pre class="brush: plain; title: ; notranslate">
--session statistics
Name                       Value
----                       -----
gc current blocks received 1,003
</pre>
<p>c. On instance 2, there are 1,000 buffers in scur state. Because instance 2 is the master, x$kjbl shows 1,000 resources for each of instance 1 and instance 2.</p>
<pre class="brush: plain; title: ; notranslate">
--query v$bh
      OBJD STATUS    COUNT(*)
---------- ------- ----------
    146039 scur          1000
    146039 free          1878

--query x$kjbl
 KJBLOWNER KJBLMASTER   COUNT(*)
---------- ---------- ----------
         1          1       1000
         0          1       1000
</pre>
<p>d. On instance 2, the SQL trace shows that the scan waited on &#8216;gc current block 2-way&#8217; 1,003 times.</p>
<pre class="brush: plain; title: ; notranslate">
WAIT #2: nam='gc current block 2-way' ela= 237 p1=1 p2=65122 p3=1 obj#=146039 tim=1302120511468699
WAIT #2: nam='gc current block 2-way' ela= 240 p1=1 p2=65123 p3=1 obj#=146039 tim=1302120511468992
WAIT #2: nam='gc current block 2-way' ela= 258 p1=1 p2=65124 p3=1 obj#=146039 tim=1302120511469288
WAIT #2: nam='gc current block 2-way' ela= 257 p1=1 p2=65125 p3=1 obj#=146039 tim=1302120511469584
</pre>
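<p>Step 5 relies on two checks: comparing statistics before and after the scan, and counting waits in the SQL trace. Both can be scripted once the v$sesstat/v$sysstat rows and the trace file are in hand. Below is a minimal Python sketch; the helper names and the snapshot format ({statistic name: value}) are hypothetical choices for illustration, and ela in these trace lines is taken to be in microseconds.</p>
<pre class="brush: plain; title: ; notranslate">
import re
from collections import Counter

# Matches 10046 trace lines such as:
# WAIT #2: nam='gc current block 2-way' ela= 237 p1=1 p2=65122 ...
WAIT_LINE = re.compile(r"^WAIT #\d+: nam='([^']*)' ela=\s*(\d+)")


def stat_deltas(before, after):
    """Statistics whose value changed between two {name: value} snapshots."""
    return {name: value - before.get(name, 0)
            for name, value in after.items()
            if value != before.get(name, 0)}


def summarize_waits(trace_lines):
    """Count waits and total elapsed time (microseconds) per wait event."""
    waits, elapsed = Counter(), Counter()
    for line in trace_lines:
        m = WAIT_LINE.match(line)
        if m:
            waits[m.group(1)] += 1
            elapsed[m.group(1)] += int(m.group(2))
    return waits, elapsed


# Made-up absolute values; only the delta matches step 5b.
print(stat_deltas({"gc current blocks received": 5000},
                  {"gc current blocks received": 6003}))  # {'gc current blocks received': 1003}
</pre>
<p>Fed the four trace lines shown in step d, summarize_waits would report 4 waits on &#8216;gc current block 2-way&#8217; totalling 992 microseconds; fed the whole trace file, the count would be expected to match the 1,003 waits reported above.</p>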
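<p>I think it would be hard to list every behavior of RAC cache fusion: there are 18 block classes (listed by v$waitstat below) and various buffer states (xcur/scur/pi/cr&#8230;), and for each one you have to ask whether it is handled the same way, whether the buffer in the global cache is dirty, and whether it is pinged. Once you understand the basic principles of cache fusion, though, you can design experiments to verify specific cases.</p>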
<pre class="brush: plain; title: ; notranslate">
sid@V10&gt; select * from v$waitstat
  2  /

CLASS                   COUNT       TIME
------------------ ---------- ----------
data block               1212        525
sort block                  0          0
save undo block             0          0
segment header              4          1
save undo header            0          0
free list                   0          0
extent map                  0          0
1st level bmb             142         50
2nd level bmb             235         81
3rd level bmb             236         55
bitmap block                0          0
bitmap index block          0          0
file header block           0          0
unused                      0          0
system undo header          0          0
system undo block           0          0
undo header                16          0
undo block                 13          1

18 rows selected.
</pre>
<p>2010-09-22, dusk: people walking the kora around Toling Monastery<a href="https://lh4.googleusercontent.com/-6JBDNlcNjVE/T3xfbGofwgI/AAAAAAAAYwI/XK27j8V3x-M/s1440/DSC_2879.JPG" target="_blank"><br />
<img class="Picasa" src="https://lh4.googleusercontent.com/-6JBDNlcNjVE/T3xfbGofwgI/AAAAAAAAYwI/XK27j8V3x-M/s800/DSC_2879.JPG" alt="" /><br />
</a></p>
]]></content:encoded>
			<wfw:commentRss>http://sid.gd/a-special-case-abount-gc-current-blocks-received/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ORA-04030 and UTL_FILE.Get_Line</title>
		<link>http://sid.gd/ora-04030-and-utl_file-get_line/</link>
		<comments>http://sid.gd/ora-04030-and-utl_file-get_line/#comments</comments>
		<pubDate>Fri, 30 Mar 2012 13:19:41 +0000</pubDate>
		<dc:creator>Sidney Chen</dc:creator>
				<category><![CDATA[Oracle]]></category>
		<category><![CDATA[ORA-04030]]></category>
		<category><![CDATA[UTL_FILE.Get_Line]]></category>

		<guid isPermaLink="false">http://sid.gd/?p=1643</guid>
		<description><![CDATA[Over the past two weeks, I ran into an interesting case involving the UTL_FILE.GET_LINE procedure. A data loading application failed with an ORA-04030 error after server process 13225 exhausted the free physical memory on the DB server. The application &#8230; <a href="http://sid.gd/ora-04030-and-utl_file-get_line/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Over the past two weeks, I ran into an interesting case involving the UTL_FILE.GET_LINE procedure. A data loading application failed with an ORA-04030 error after server process 13225 exhausted the free physical memory on the DB server.</p>
<pre class="brush: plain; title: ; notranslate">
Errors in file /home/oracle/diag/rdbms/V11/V11/trace/V11_ora_13225.trc  (incident=16305):
ORA-04030: out of process memory when trying to allocate 16328 bytes (koh-kghu sessi,pl/sql vc2)
ORA-29282: invalid file ID

$free -m
             total       used       free     shared    buffers     cached
Mem:         16050      15960         90          0        126       3777
-/+ buffers/cache:      12056       3994
Swap:         4095          0       4095

$top

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
13225 oracle    15   0 10.2g 4.0g  51m S  0.0 25.8  29:05.39 oracle
 1278 root      17   0 1602m 308m 9508 S  0.0  1.9  37:45.32 java
</pre>
<p><span id="more-1643"></span><br />
The application loads data from text files: it reads them with UTL_FILE, processes the data, and inserts it into the database. The ORA-04030 error appeared seemingly at random across different data files. Since the PL/SQL code had not changed for years, we suspected a data issue, and indeed the maximum line size showed an increasing trend whenever ORA-04030 occurred.</p>
<pre class="brush: plain; title: ; notranslate">
Successful load, max line size 32505.
$ wc -L test1.data
32505 test1.data

Failed load, max line size 32799.
$ wc -L test2.data
32799 test2.data

Failed load, max line size 37573.
$ wc -L test3.data
37573 test3.data
</pre>
<p>The jump in max line size from 32505 to 32799 appears to be what made the loading fail. It turned out that the application cannot handle data files containing a line longer than 32767 bytes, the most UTL_FILE.GET_LINE can read. If a line exceeds 32767, the call to utl_file.get_line raises the utl_file.read_error exception (ORA-29284: file read error), and it is this exception that triggers the application bug. The PL/SQL code catches the exception as expected and closes the data file with UTL_FILE.FCLOSE_ALL. The problem is that the program does not exit: it goes on to read the next line from the closed file handle, which leaks UGA memory. The relevant application code follows the pattern shown in the reproduction case below.</p>
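<p>Because the failures correlate with the longest line in the file, a pre-flight check that reports over-long lines would have flagged the bad files before loading. A minimal Python sketch (long_lines is a hypothetical helper). Unlike wc -L, which reports the longest line in display characters, it counts bytes; the two agree only for single-byte character sets.</p>
<pre class="brush: plain; title: ; notranslate">
UTL_FILE_LIMIT = 32767  # the line-size ceiling discussed above


def long_lines(lines, limit=UTL_FILE_LIMIT):
    """Yield (line number, length in bytes) for each line longer than limit.
    `lines` is an iterable of bytes, e.g. a file opened in "rb" mode."""
    for lineno, raw in enumerate(lines, start=1):
        length = len(raw.rstrip(b"\r\n"))  # ignore the line terminator
        if length > limit:
            yield lineno, length


sample = [b"a" * 32505 + b"\n", b"b" * 32799 + b"\n"]
print(list(long_lines(sample)))  # [(2, 32799)]
</pre>
<p>To check a real file, pass it opened in binary mode, for example open("test2.data", "rb").</p>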
<p>In fact, the line size of the data files did not change suddenly: the max line size had been creeping toward 32767 for a while, and it recently crossed that threshold and hit the application bug.</p>
<p>From the <a href="http://docs.oracle.com/cd/E11882_01/appdev.112/e25788/u_file.htm#ARPLS70930" target="_blank">online documentation:</a></p>
<pre class="brush: plain; title: ; notranslate">
GET_LINE Procedure

This procedure reads text from the open file identified by the file handle
and places the text in the output buffer parameter. Text is read up to,
but not including, the line terminator, or up to the end of the file,
or up to the end of the len parameter. It cannot exceed the max_linesize
specified in FOPEN.

UTL_FILE.GET_LINE (
   file        IN  FILE_TYPE,
   buffer      OUT VARCHAR2,
   len         IN  PLS_INTEGER DEFAULT NULL);

max_linesize ranges from 1 to 32767.
</pre>
<p><strong>Reproduce Case</strong></p>
<p>Here is the procedure to reproduce ORA-04030:<br />
1. Prepare a file test.data containing a single line of 37769 characters, more than 32767.<br />
2. Check the free memory available on the OS before testing.<br />
3. In session 36, create the directory object utl_dir.<br />
4. Take a snapshot of the UGA/PGA of session 36 before testing.<br />
5. In session 36, run the PL/SQL block that simulates the problem. Session 36 hangs, as expected.<br />
6. Query the UGA/PGA of session 36 from another session.<br />
7. Check OS CPU activity and memory usage.<br />
8. Wait for session 36 to abort with ORA-04030.</p>
<p>1. Prepare the file test.data; its single line is 37769 characters long.</p>
<pre class="brush: plain; title: ; notranslate">
$ wc -l test.data
1 test.data
$ wc -L test.data
37769 test.data
</pre>
<p>2. Check the free memory available on the OS. The server has 4 GB of memory, with 1.6 GB free and 646 MB used for page cache. Under memory pressure, the page cache shrinks and the memory is handed to processes.</p>
<pre class="brush: plain; title: ; notranslate">
oracle@dbserver:~/scripts$ free -m
             total       used       free     shared    buffers     cached
Mem:          3925       2251       1674          0         98        646
-/+ buffers/cache:       1506       2419
Swap:         1992        717       1275
</pre>
<p>3. In session 36, create the directory object utl_dir.</p>
<pre class="brush: plain; title: ; notranslate">
sid@CS11GR2&gt; CREATE DIRECTORY utl_dir AS '/home/oracle/utl';

Directory created.
</pre>
<p>4. Take a snapshot of the UGA/PGA of session 36 before testing.</p>
<pre class="brush: plain; title: ; notranslate">
sid@CS11GR2&gt; @ses 36 session%mem
       SID NAME                                VALUE
---------- ------------------------------ ----------
        36 session uga memory                 241748
        36 session uga memory max             438284
        36 session pga memory                1007660
        36 session pga memory max            1007660

sid@CS11GR2&gt; select sid from v$mystat where rownum=1;

       SID
----------
        36
</pre>
<p>5. In session 36, run the PL/SQL block that simulates the problem. As expected, session 36 hangs and burns CPU.</p>
<pre class="brush: plain; title: ; notranslate">
declare
f    varchar2(10) := 'test.data';
h    utl_file.file_type := utl_file.fopen('utl_dir', f, 'r', 32767);
buf  varchar2(32767);
BEGIN
  LOOP
    BEGIN
      BEGIN
        utl_file.get_line(h, buf);

        exception
        when no_data_found then
          exit;
        when utl_file.read_error then
          utl_file.fclose_all;
          dbms_output.put_line('read error : ' || sqlerrm);
          raise;  -- The read error exception is raised here!
        when others then
          utl_file.fclose_all;
          dbms_output.put_line('others2 : ' || sqlerrm);
          raise;
      end;
      exception
      when others then -- The re-raised read_error is caught here, and the loop continues without exiting.
        dbms_output.put_line('others2 : ' || sqlerrm);
    end;
  end loop;
  utl_file.fclose(h);
end;
/
</pre>
<p>6. Query the UGA/PGA of session 36 from another session. UGA/PGA usage keeps climbing.</p>
<pre class="brush: plain; title: ; notranslate">
--UGA usage climbs to 140M
sid@CS11GR2&gt; @ses 36 session%mem

       SID NAME                                VALUE
---------- ------------------------------ ----------
        36 session uga memory              141289084
        36 session uga memory max          141289084
        36 session pga memory              141844524
        36 session pga memory max          141844524
        36 cell num smart IO sessions in           0

--UGA usage climbs to 330M
sid@CS11GR2&gt; /

       SID NAME                                VALUE
---------- ------------------------------ ----------
        36 session uga memory              333239244
        36 session uga memory max          333239244
        36 session pga memory              333865004
        36 session pga memory max          333865004

--UGA usage climbs to 2.4G
sid@CS11GR2&gt; /

       SID NAME                                VALUE
---------- ------------------------------ ----------
        36 session uga memory             2497428164
        36 session uga memory max         2497428164
        36 session pga memory             2498846764
        36 session pga memory max         2498846764
</pre>
<p>7. Check OS CPU activity and memory usage. Free physical memory has dropped from 1.6 GB to 110 MB, and the page cache has shrunk from 646 MB to 361 MB. In the top output, process 31063 is the server process of session 36. It has already consumed more than 2 GB of memory and is looping on the getrusage() system call, burning CPU cycles.</p>
<pre class="brush: plain; title: ; notranslate">
oracle@dbserver:~/scripts$ free -m
             total       used       free     shared    buffers     cached
Mem:          3925       3815        110          0          4        361
-/+ buffers/cache:       3449        476
Swap:         1992       1009        983

oracle@dbserver:~/scripts$ top -d 5
top - 13:58:56 up 99 days, 19:24,  6 users,  load average: 1.73, 1.25, 0.73
Tasks: 270 total,   2 running, 264 sleeping,   0 stopped,   4 zombie
Cpu0  : 96.0%us,  4.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  :  0.8%us,  3.2%sy,  0.0%ni,  0.0%id, 96.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4020164k total,  3906520k used,   113644k free,     4032k buffers
Swap:  2040212k total,  1054512k used,   985700k free,   360780k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
31063 oracle    20   0 2912m 2.3g  22m R  101 59.3  11:44.25 oraclecs11gR2 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
   33 root      15  -5     0    0    0 D    2  0.0  29:04.40 [kswapd0]
</pre>
<p>8. Finally, session 36 aborts with ORA-04030, as expected.</p>
<pre class="brush: plain; title: ; notranslate">
sid@CS11GR2&gt; declare
  2  f    varchar2(10) := 'test.data';

....

 29  end;
 30  /
read error : ORA-29284: file read error
others2 : ORA-29284: file read error
others2 : ORA-29282: invalid file ID
others2 : ORA-29282: invalid file ID
others2 : ORA-29282: invalid file ID
others2 : ORA-29282: invalid file ID
others2 : ORA-29282: invalid file ID
others2 : ORA-29282: invalid file ID
others2 : ORA-29282: invalid file ID
others2 : ORA-29282: invalid file ID
others2 : ORA-29282: invalid file ID
others2 : ORA-29282: invalid file ID
others2 : ORA-29282: invalid file ID
others2 : ORA-29282: invalid file ID
others2 : ORA-29282: invalid file ID
declare
*
ERROR at line 1:
ORA-04030: out of process memory when trying to allocate 16356 bytes (koh-kghu sessi,pl/sql vc2)
ORA-29282: invalid file ID
</pre>
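<p>The hang comes from the shape of the exception handlers: the inner handler closes the file and re-raises, and the outer WHEN OTHERS handler swallows the error and lets the loop read again from a closed handle. The Python sketch below models only that control flow. The exception classes and FakeFile are hypothetical stand-ins, and a loop cap replaces the infinite loop; it does not reproduce the UGA growth, which is Oracle-specific. In the PL/SQL, a comparable fix would be an EXIT (or RAISE) in the outer handler.</p>
<pre class="brush: plain; title: ; notranslate">
class ReadError(Exception):
    """Stand-in for UTL_FILE.READ_ERROR (ORA-29284)."""


class InvalidFileId(Exception):
    """Stand-in for ORA-29282: invalid file ID."""


class FakeFile:
    """Hypothetical file whose only line is too long to read, like test.data."""

    def __init__(self):
        self.closed = False

    def get_line(self):
        if self.closed:
            raise InvalidFileId("ORA-29282: invalid file ID")
        raise ReadError("ORA-29284: file read error")


def load(f, exit_on_error, max_iterations=1000):
    """Run the reading loop and return the number of iterations executed."""
    for iteration in range(1, max_iterations + 1):
        try:
            try:
                f.get_line()
            except ReadError:
                f.closed = True  # like UTL_FILE.FCLOSE_ALL in the inner handler
                raise
        except Exception:  # like the outer WHEN OTHERS handler
            if exit_on_error:
                return iteration  # fixed version: leave the loop
            # original version: swallow the error and read again
    return max_iterations


print(load(FakeFile(), exit_on_error=False), load(FakeFile(), exit_on_error=True))  # 1000 1
</pre>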
<p>To work around the application bug, we added a preprocessing step that splits lines longer than 32767 bytes into multiple lines, so that subsequent calls to UTL_FILE.GET_LINE never encounter a line longer than 32K.</p>
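<p>A minimal Python sketch of such a preprocessing step. split_long_lines is a hypothetical name, and the sketch assumes a single-byte character set, since cutting at a byte offset could split a multibyte character. How the loader interprets or reassembles the pieces is application-specific and not covered here.</p>
<pre class="brush: plain; title: ; notranslate">
def split_long_lines(lines, limit=32767):
    """Yield each line (bytes, terminator removed) cut into pieces of at most
    `limit` bytes, so that no piece exceeds the UTL_FILE.GET_LINE limit."""
    for raw in lines:
        line = raw.rstrip(b"\r\n")
        if not line:
            yield line  # keep empty lines as they are
            continue
        for start in range(0, len(line), limit):
            yield line[start:start + limit]


pieces = list(split_long_lines([b"x" * 37769 + b"\n", b"short\n"]))
print([len(p) for p in pieces])  # [32767, 5002, 5]
</pre>
<p>To rewrite a file, each yielded piece would be written to the output file followed by a newline.</p>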
<p>For a DBA, the best way to troubleshoot tricky issues like this is to pair with the developer: read the code together and reproduce the problem. I wonder whether this kind of pairing counts as an Agile practice for DBAs. Developers design and write the application, so they know it better than the DBA does. Combining the DBA&#8217;s Oracle skills with the developer&#8217;s application domain knowledge makes troubleshooting much easier and more efficient.</p>
<p>2010-09-22, ruins of the Guge Kingdom<a href="https://lh5.googleusercontent.com/-j56OEgDH_mE/T3WyiBIHahI/AAAAAAAAYsI/OyJDY1Fu1hY/s1440/DSC_2777.JPG" target="_blank"><br />
<img class="Picasa" src="https://lh5.googleusercontent.com/-j56OEgDH_mE/T3WyiBIHahI/AAAAAAAAYsI/OyJDY1Fu1hY/s800/DSC_2777.JPG" alt="" /><br />
</a></p>
]]></content:encoded>
			<wfw:commentRss>http://sid.gd/ora-04030-and-utl_file-get_line/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Load SQL Plan Baseline</title>
		<link>http://sid.gd/load-sql-plan-baseline/</link>
		<comments>http://sid.gd/load-sql-plan-baseline/#comments</comments>
		<pubDate>Mon, 19 Mar 2012 11:49:28 +0000</pubDate>
		<dc:creator>Sidney Chen</dc:creator>
				<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Baseline]]></category>
		<category><![CDATA[Plan]]></category>

		<guid isPermaLink="false">http://sid.gd/?p=1633</guid>
		<description><![CDATA[coe_load_sql_baseline.sql is a script Oracle ships with SQLT (Note 215187.1). On 11g it can be used to pin an execution plan; the 10g counterpart is coe_load_sql_profile.sql. DESCRIPTION This script loads a plan from a modified SQL into the SQL Plan Baseline of the original SQL. If a good performing plan only reproduces with CBO Hints then you can load the plan of &#8230; <a href="http://sid.gd/load-sql-plan-baseline/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>coe_load_sql_baseline.sql is a script Oracle ships with SQLT (Note 215187.1). On 11g it can be used to pin an execution plan; the 10g counterpart is coe_load_sql_profile.sql.</p>
<blockquote><p>
DESCRIPTION<br />
This script loads a plan from a modified SQL into the SQL Plan Baseline of the original SQL.  If a good performing plan only reproduces with CBO Hints then you can load the plan of the modified version of the SQL into the SQL Plan Baseline of the original SQL.  In other words, the original SQL can use the plan that was generated out of the SQL with hints.</p>
<p>PRE-REQUISITES<br />
  1. Have in cache or AWR the text for the original SQL.<br />
  2. Have in cache the plan for the modified SQL<br />
     (usually with hints).</p>
<p>PARAMETERS<br />
  1. ORIGINAL_SQL_ID (required)<br />
  2. MODIFIED_SQL_ID (required)<br />
  3. PLAN_HASH_VALUE (required)
</p></blockquote>
<p><span id="more-1633"></span><br />
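Below are two queries, 7qduyawktys8x and 3wbc21k60rdpt. By default 7qduyawktys8x uses a full table scan, while 3wbc21k60rdpt adds an index hint and uses an index scan. We then use coe_load_sql_baseline to make 7qduyawktys8x use the plan of 3wbc21k60rdpt as well, i.e., the index scan.</p>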
<pre class="brush: plain; title: ; notranslate">
sid@CS11GR2&gt; select
  2  	     max(n1)
  3  from
  4  	     t
  5  where
  6  	     id &gt; 0;

   MAX(N1)
----------
   1000000

sid@CS11GR2&gt;
sid@CS11GR2&gt; select * from table(dbms_xplan.display_cursor
  2  (null,null,'typical'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------
SQL_ID  7qduyawktys8x, child number 0
-------------------------------------
select  max(n1) from  t where  id &gt; 0

Plan hash value: 2966233522

--------------------------------------------------------
| Id  | Operation          | Name | Rows  | Cost (%CPU)|
--------------------------------------------------------
|   0 | SELECT STATEMENT   |      |       | 31314 (100)|
|   1 |  SORT AGGREGATE    |      |     1 |            |
|*  2 |   TABLE ACCESS FULL| T    |  1000K| 31314   (1)|
--------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(&quot;ID&quot;&gt;0)

19 rows selected.

sid@CS11GR2&gt; select /*+ index(t t_pk)*/
  2  	     max(n1)
  3  from
  4  	     t
  5  where
  6  	     id &gt; 0;

   MAX(N1)
----------
   1000000

sid@CS11GR2&gt;
sid@CS11GR2&gt; select * from table(dbms_xplan.display_cursor
  2  (null,null,'typical'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------
SQL_ID  3wbc21k60rdpt, child number 0
-------------------------------------
select /*+ index(t t_pk)*/  max(n1) from  t where  id &gt; 0

Plan hash value: 4270555908

------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Cost (%CPU)|
------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |       |   145K(100)|
|   1 |  SORT AGGREGATE              |      |     1 |            |
|   2 |   TABLE ACCESS BY INDEX ROWID| T    |  1000K|   145K  (1)|
|*  3 |    INDEX RANGE SCAN          | T_PK |  1000K|  2101   (1)|
------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access(&quot;ID&quot;&gt;0)

20 rows selected.
</pre>
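<p>Call coe_load_sql_baseline with three parameters. Besides loading plan 4270555908 of 3wbc21k60rdpt into the SQL Plan Baseline of 7qduyawktys8x, coe_load_sql_baseline can also export the staging table that holds the plan baseline. This makes it easy to move the baseline to a database in another environment.</p>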
<pre class="brush: plain; title: ; notranslate">
sid@CS11GR2&gt; @coe_load_sql_baseline 7qduyawktys8x 3wbc21k60rdpt 4270555908

....

 62  ORIGINAL_SQL_ID: &quot;7qduyawktys8x&quot;
 63  MODIFIED_SQL_ID: &quot;3wbc21k60rdpt&quot;
 64  PLAN_HASH_VALUE: &quot;4270555908&quot;

SQL&gt;SELECT signature, sql_handle, plan_name, enabled, accepted, fixed--, reproduced (avail on 11.2.0.2)
  2    FROM dba_sql_plan_baselines WHERE plan_name = :plan_name;

           SIGNATURE SQL_HANDLE           PLAN_NAME                      ENA ACC FIX
-------------------- -------------------- ------------------------------ --- --- ---
 5378234025623299482 SQL_4aa35359e832259a 7QDUYAWKTYS8X_3WBC21K60RDPT    YES YES NO

****************************************************************************
* Enter SID password to export staging table STGTAB_BASELINE_7qduyawktys8x
****************************************************************************

....

If you need to implement this SQL Plan Baseline on a similar system,
import and unpack using these commands:

imp SID file=STGTAB_BASELINE_7qduyawktys8x.dmp tables=STGTAB_BASELINE_7qduyawktys8x ignore=Y

SET SERVEROUT ON;
DECLARE
plans NUMBER;
BEGIN
plans := DBMS_SPM.UNPACK_STGTAB_BASELINE('STGTAB_BASELINE_7qduyawktys8x', 'SID');
DBMS_OUTPUT.PUT_LINE(plans||' plan(s) unpackaged');
END;
/
</pre>
<p>Finally, verify that 7qduyawktys8x now uses the index-scan plan.</p>
<pre class="brush: plain; title: ; notranslate">
sid@CS11GR2&gt; select
  2          max(n1)
  3  from
  4          t
  5  where
  6          id &gt; 0;

   MAX(N1)
----------
   1000000

sid@CS11GR2&gt; select * from table(dbms_xplan.display_cursor
  2  (null,null,'typical'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------
SQL_ID  7qduyawktys8x, child number 2
-------------------------------------
select  max(n1) from  t where  id &gt; 0

Plan hash value: 4270555908

------------------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Cost (%CPU)|
------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |       |   145K(100)|
|   1 |  SORT AGGREGATE              |      |     1 |            |
|   2 |   TABLE ACCESS BY INDEX ROWID| T    |  1000K|   145K  (1)|
|*  3 |    INDEX RANGE SCAN          | T_PK |  1000K|  2101   (1)|
------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access(&quot;ID&quot;&gt;0)

Note
-----
   - SQL plan baseline 7QDUYAWKTYS8X_3WBC21K60RDPT used for this statement

24 rows selected.
</pre>
<p>2010-09-22 Early morning on the banks of the Xiangquan (Sutlej) River<a href="https://lh4.googleusercontent.com/-egDo2BJftBQ/T2cb3kLu5fI/AAAAAAAAYhY/hjTGDdCIq2k/s912/DSC_2846.JPG" target="_blank"><br />
<img class="Picasa" src="https://lh4.googleusercontent.com/-egDo2BJftBQ/T2cb3kLu5fI/AAAAAAAAYhY/hjTGDdCIq2k/s912/DSC_2846.JPG" alt="" /><br />
</a></p>
]]></content:encoded>
			<wfw:commentRss>http://sid.gd/load-sql-plan-baseline/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Type Conversion</title>
		<link>http://sid.gd/oracle-data-type-conversion/</link>
		<comments>http://sid.gd/oracle-data-type-conversion/#comments</comments>
		<pubDate>Fri, 16 Mar 2012 11:55:49 +0000</pubDate>
		<dc:creator>Sidney Chen</dc:creator>
				<category><![CDATA[Oracle]]></category>
		<category><![CDATA[internal_function]]></category>

		<guid isPermaLink="false">http://sid.gd/?p=1626</guid>
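		<description><![CDATA[Data type conversion is a common mistake in database design and development. After a type conversion, the optimizer may be unable to estimate cardinality and cost accurately, which leads to suboptimal execution plans. Three cases are discussed here. 1. Varchar2 &#8211;> Number: if numbers are stored as strings but queried with NUMBER values, the optimizer applies TO_NUMBER to the stored strings. Besides fixing the code, a function-based index on TO_NUMBER also solves this case. 2. Date &#8211;> Timestamp: the case I see most often. Some Java tools such as Toplink use TIMESTAMP as the default time type. If the data is stored as DATE, the optimizer adds an INTERNAL_FUNCTION conversion to the comparison, and INTERNAL_FUNCTION cannot be fixed by adding a function-based index. 3. Char &#8211;> Varchar2: this case is subtle. Unless you look at the bind variables in the execution plan, or capture them with a 10046 trace, you may never notice it. Comparing CHAR and VARCHAR2 values that hold the same string may lead the optimizer to a wrong estimate. The test environment is 10.2.0.3 on 32-bit Linux. First, create a table T with one million rows: &#8230; <a href="http://sid.gd/oracle-data-type-conversion/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>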
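			<content:encoded><![CDATA[<p>Data type conversion is a common mistake in database design and development. After a type conversion, the optimizer may be unable to estimate cardinality and cost accurately, which leads to suboptimal execution plans. Three cases are discussed here:</p>
<p>1. Varchar2 &#8211;> Number: if numbers are stored as strings but queried with NUMBER values, the optimizer applies TO_NUMBER to the stored strings. Besides fixing the code, a function-based index on TO_NUMBER also solves this case.</p>
<p>2. Date &#8211;> Timestamp: the case I see most often. Some Java tools such as Toplink use TIMESTAMP as the default time type. If the data is stored as DATE, the optimizer adds an INTERNAL_FUNCTION conversion to the comparison, and INTERNAL_FUNCTION cannot be fixed by adding a function-based index.</p>
<p>3. Char &#8211;> Varchar2: this case is subtle. Unless you look at the bind variables in the execution plan, or capture them with a 10046 trace, you may never notice it. Comparing CHAR and VARCHAR2 values that hold the same string may lead the optimizer to a wrong estimate.<br />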
<span id="more-1626"></span><br />
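The test environment is 10.2.0.3 on 32-bit Linux. First, create a table T with one million rows:<br />
1. Column V1 stores mod(id,1000) as VARCHAR2(10) and is compared with a NUMBER, testing Varchar2 &#8211;> Number.<br />
2. In column V2, 900,000 rows hold the string &#8216;BIG&#8217;, stored as VARCHAR2(10). Comparing it with a CHAR(3) &#8216;BIG&#8217; tests Char &#8211;> Varchar2. A histogram is gathered on V2 to check whether the optimizer computes a wrong cardinality when comparing CHAR with VARCHAR2.<br />
3. Column D1 is a DATE, with the one million rows spread over the past 1,000 days. It tests Date &#8211;> Timestamp.<br />
4. ID is the primary key; n1 and pad are just filler.</p>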
<p>Indexes are created on V1, V2 and D1 to compare the performance of full table scans with index scans.</p>
<pre class="brush: plain; title: ; notranslate">
drop table t purge;

exec dbms_random.seed(UID);

create table t
(
	id	number,
	n1	number,
	v1	varchar2(10),
	v2	varchar2(10),
	d1	date,
	pad	varchar2(1000)
);

insert /*+ append */ into t
with
generator as (
	select
		level
	from
		dual connect by level &lt;= 1000)
select
	rownum id,
	rownum n1,
	to_char(mod(rownum, 1000)) v1,
	case mod(rownum, 10)
		when 0 then lpad(mod(rownum, 1000), 3, '0')
		else 'BIG'
	end v2,
	trunc(sysdate) - dbms_random.value(0,1000) d1,
	lpad(rownum, 1000, '0') pad
from
	generator, generator;

-- a direct-path (append) insert must be committed before this session reads T again
commit;

begin
	dbms_stats.gather_table_stats(user,'t', --
 		method_opt=&gt;'for columns size 1 id, n1, v1, d1, pad, v2 size 254');
end;
/

alter table t add constraint t_pk primary key(id)
using index(create unique index t_pk on t(id) nologging);

create index t_v1 on t(v1) nologging;
create index t_v2 on t(v2) nologging;
create index t_d1 on t(d1) nologging;

select
	column_name,
	num_distinct,
	density,
	num_buckets,
	histogram
from
	user_tab_cols
where
	table_name = 'T';

COL NUM_DISTINCT    DENSITY NUM_BUCKETS HISTOGRAM
--- ------------ ---------- ----------- ---------
PAD       998088 1.0019E-06           1 NONE
D1        997595 1.0024E-06           1 NONE
V2           101 5.0630E-07         100 FREQUENCY
V1          1001 .000999001           1 NONE
N1        998088 1.0019E-06           1 NONE
ID        998088 1.0019E-06           1 NONE

6 rows selected.
</pre>
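<p>As a cross-check of the dictionary output above, the V1/V2 expressions of the insert can be replayed in Python. This sketch mirrors to_char(mod(rownum,1000)) and the CASE expression for V2 over the 1000 x 1000 rownums produced by the generator cross join. D1 is random and is not modeled. The replay shows that V2 has exactly 101 distinct values (100 zero-padded multiples of 10, plus &#8216;BIG&#8217;) and that 900,000 rows hold &#8216;BIG&#8217;.</p>

```python
from collections import Counter


def replay_v1_v2(n: int = 1000 * 1000):
    """Replay the V1/V2 expressions of the insert for rownum = 1..n."""
    v1_counts, v2_counts = Counter(), Counter()
    for rownum in range(1, n + 1):
        v1 = str(rownum % 1000)                # to_char(mod(rownum, 1000))
        if rownum % 10 == 0:                   # case mod(rownum, 10) when 0
            v2 = str(rownum % 1000).zfill(3)   # lpad(mod(rownum, 1000), 3, '0')
        else:
            v2 = "BIG"
        v1_counts[v1] += 1
        v2_counts[v2] += 1
    return v1_counts, v2_counts


v1_counts, v2_counts = replay_v1_v2()
print(len(v1_counts), len(v2_counts), v2_counts["BIG"])  # 1000 101 900000
```

<p>The exact NDV of V1 is 1000. The 1001 reported in user_tab_cols is presumably a sampling estimate, like the 998088 reported for ID, which is a unique column of one million rows.</p>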
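<p>Execution plans are printed with dbms_xplan.display_cursor using the format &#8216;iostats last peeked_binds&#8217;. Two things to watch:<br />
1. Compare E-Rows with A-Rows to see whether the optimizer estimates correctly when a type conversion is involved.<br />
2. Compare the logical reads (Buffers) to see the performance difference between the plans.</p>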
<p><strong>Varchar2 &#8211;> Number</strong></p>
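<p>Bind variable :n1 is a NUMBER, while column V1 is a VARCHAR2. When they are compared, the predicate v1 = :n1 becomes to_number(v1) = :n1. The index T_V1 on V1 can then no longer be used, and a full table scan is the only option: 163K logical reads against 1005 for the index scan. Possible fixes are to change the code to use a VARCHAR2 bind, to create a function-based index on to_number(v1), or to store V1 as a NUMBER.</p>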
<pre class="brush: plain; title: ; notranslate">
sid@V10&gt; set serveroutput off

sid@V10&gt; variable n1 number;
sid@V10&gt; variable v1 varchar2(1)
sid@V10&gt; exec :n1 := 1; :v1 := '1';

PL/SQL procedure successfully completed.

sid@V10&gt; select /*+ gather_plan_statistics*/
  2  	     max(n1)
  3  from
  4  	     t
  5  where
  6  	     v1 = :n1;

   MAX(N1)
----------
    999001

sid@V10&gt; select * from table(dbms_xplan.display_cursor
  2  (null,null,'iostats last peeked_binds'));

---------------------------------------------------------------
| Id  | Operation          | Name | E-Rows | A-Rows | Buffers |
---------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |     163K|
|*  2 |   TABLE ACCESS FULL| T    |    997 |   1000 |     163K|
---------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - (NUMBER): 1

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TO_NUMBER(&quot;V1&quot;)=:N1)

23 rows selected.

sid@V10&gt; select /*+ gather_plan_statistics*/
  2  	     max(n1)
  3  from
  4  	     t
  5  where
  6  	     v1 = :v1;

   MAX(N1)
----------
    999001

sid@V10&gt; select * from table(dbms_xplan.display_cursor
  2  (null,null,'iostats last peeked_binds'));

-------------------------------------------------------------------------
| Id  | Operation                    | Name | E-Rows | A-Rows | Buffers |
-------------------------------------------------------------------------
|   1 |  SORT AGGREGATE              |      |      1 |      1 |    1005 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T    |    997 |   1000 |    1005 |
|*  3 |    INDEX RANGE SCAN          | T_V1 |    999 |   1000 |       5 |
-------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - (VARCHAR2(30), CSID=873): '1'

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access(&quot;V1&quot;=:V1)
</pre>
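<p>Why can&#8217;t the index on V1 help once the predicate becomes to_number(v1) = :n1? A B-tree on V1 is ordered by string value, and many different strings map to the same number. A numeric equality therefore cannot be turned into a single key lookup in that index. The sketch below uses Python&#8217;s Decimal as a simplified stand-in for TO_NUMBER. This is an assumption: the real TO_NUMBER also honours NLS settings and format models.</p>

```python
from decimal import Decimal


def to_number(s: str) -> Decimal:
    # Simplified stand-in for Oracle TO_NUMBER on plain decimal strings.
    return Decimal(s.strip())


stored = ["1", "01", "1.0", " 1", "10", "2"]

# Every one of these distinct VARCHAR2 values satisfies to_number(v1) = 1.
matches = [s for s in stored if to_number(s) == 1]

# Index (string) order and numeric order also disagree.
string_order = sorted(["1", "10", "2"])
numeric_order = sorted(["1", "10", "2"], key=to_number)
print(matches, string_order, numeric_order)
```

<p>In table T every V1 value happens to be a canonical to_char() string. The optimizer cannot know that, so it has to apply TO_NUMBER to every row.</p>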
<p><strong>Date &#8211;> Timestamp</strong></p>
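<p>Query the maximum n1 for the first day of 2012, once with DATE variables and once with TIMESTAMP variables. Because the bind variable types differ, SQL 1r12rhar4jt6u gets two child cursors with different plans.</p>
<p>Child number 1 is the plan for the DATE binds. The optimizer correctly estimates cardinality = 1000 and chooses an index range scan. One execution costs 973 logical reads: 5 for scanning index T_D1 and the other 968 for TABLE ACCESS BY INDEX ROWID.</p>
<p>Child number 0 is the plan for the TIMESTAMP binds. After the conversion the cardinality estimate is 2495, which does not look far from 1000, yet the optimizer chooses a full table scan costing 163K logical reads. The Predicate Information section of the dbms_xplan.display_cursor output shows why: INTERNAL_FUNCTION converts D1 to TIMESTAMP before it is compared with :B1 and :B2. As a result, index T_D1 cannot be used and a full table scan is the only option. The Peeked Binds section shows nothing for the two TIMESTAMP variables.</p>
<p>In this case neither an index hint forcing T_D1 nor a function-based index on internal_function(D1) is possible. The only fixes are to change the code to use DATE variables, or to store the dates as TIMESTAMP.</p>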
<pre class="brush: plain; title: ; notranslate">
sid@V10&gt; declare
  2  v_d1 date      := to_date('2012-01-01','yyyy-mm-dd');
  3  v_d2 date      := to_date('2012-01-02','yyyy-mm-dd');
  4  v_t1 timestamp := to_timestamp('2012-01-01','yyyy-mm-dd');
  5  v_t2 timestamp := to_timestamp('2012-01-02','yyyy-mm-dd');
  6  n number;
  7  begin
  8
  9  select /*+ gather_plan_statistics*/
 10      max(n1) into n
 11  from
 12      t
 13  where
 14      d1 between v_t1 and v_t2;
 15
 16  select /*+ gather_plan_statistics*/
 17          max(n1) into n
 18  from
 19      t
 20  where
 21      d1 between v_d1 and v_d2;
 22
 23  end;
 24  /

PL/SQL procedure successfully completed.

sid@V10&gt; select * from table(dbms_xplan.display_cursor
   2   ('1r12rhar4jt6u',null,'iostats last peeked_binds'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------
SQL_ID  1r12rhar4jt6u, child number 0
-------------------------------------
SELECT /*+ gather_plan_statistics*/ MAX(N1) FROM T WHERE D1 BETWEEN :B2 AND :B1

Plan hash value: 1010173228

----------------------------------------------------------------
| Id  | Operation           | Name | E-Rows | A-Rows | Buffers |
----------------------------------------------------------------
|   1 |  SORT AGGREGATE     |      |      1 |      1 |     163K|
|*  2 |   FILTER            |      |        |    968 |     163K|
|*  3 |    TABLE ACCESS FULL| T    |   2495 |    968 |     163K|
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(:B2&lt;=:B1)
   3 - filter((INTERNAL_FUNCTION(&quot;D1&quot;)&gt;=:B2 AND INTERNAL_FUNCTION(&quot;D1&quot;)&lt;=:B1))

SQL_ID  1r12rhar4jt6u, child number 1
-------------------------------------
SELECT /*+ gather_plan_statistics*/ MAX(N1) FROM T WHERE D1 BETWEEN :B2 AND :B1

Plan hash value: 648301450

--------------------------------------------------------------------------
| Id  | Operation                     | Name | E-Rows | A-Rows | Buffers |
--------------------------------------------------------------------------
|   1 |  SORT AGGREGATE               |      |      1 |      1 |     973 |
|*  2 |   FILTER                      |      |        |    968 |     973 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T    |   1000 |    968 |     973 |
|*  4 |     INDEX RANGE SCAN          | T_D1 |   1002 |    968 |       5 |
--------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - (DATE): 01/01/12 00:00:00
   2 - (DATE): 01/02/12 00:00:00

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(:B2&lt;=:B1)
   4 - access(&quot;D1&quot;&gt;=:B2 AND &quot;D1&quot;&lt;=:B1)

47 rows selected.
</pre>
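<p>The two E-Rows values can be roughly reproduced by hand. This sketch rests on three assumptions:</p>
<p>1. num_rows is taken as 998088, the NUM_DISTINCT of the unique column ID, since the statistics were sampled.<br />
2. The wrapped column INTERNAL_FUNCTION(D1) falls back to a fixed 5% selectivity per bound, i.e. 5% * 5% = 0.25% for a BETWEEN.<br />
3. The plain DATE range uses the usual closed-range formula, with D1 spanning about 1000 days.</p>

```python
NUM_ROWS = 998_088    # assumption: ID is unique, so its NUM_DISTINCT ~ num_rows
D1_NDV = 997_595      # NUM_DISTINCT of D1 from user_tab_cols
D1_SPAN_DAYS = 1000   # assumption: max(d1) - min(d1) is about 1000 days


def card_wrapped_between(num_rows: int) -> int:
    # Column hidden behind INTERNAL_FUNCTION: no usable column statistics,
    # assumed default of 5% for each bound of the BETWEEN.
    return round(num_rows * 0.05 * 0.05)


def card_date_between(num_rows: int, ndv: int, range_days: float, span_days: float) -> int:
    # Closed range on a plain column: (high - low) / (max - min) + 2 / NDV
    selectivity = range_days / span_days + 2 / ndv
    return round(num_rows * selectivity)


print(card_wrapped_between(NUM_ROWS))                        # child 0 (TIMESTAMP binds)
print(card_date_between(NUM_ROWS, D1_NDV, 1, D1_SPAN_DAYS))  # child 1 (DATE binds)
```

<p>Both results (2495 and 1000) match the E-Rows in the plans. That supports these assumptions but does not prove them. In any case, the 2495 estimate is not the real problem here; the conversion is what rules out index T_D1.</p>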
<p><strong>Char &#8211;> Varchar2</strong></p>
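<p>Column V2 holds &#8216;BIG&#8217; in 900K rows. With a CHAR(3) bind, the estimated cardinality is only 1, far from the real row count. The optimizer chooses an index range scan and does 165K logical reads. With a VARCHAR2(3) bind, the optimizer computes the correct cardinality of 899K, chooses a full table scan and does 163K logical reads. The logical reads hardly differ in this example, but in practice such a costing error can easily lead to a suboptimal plan.</p>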
<pre class="brush: plain; title: ; notranslate">
sid@V10&gt; variable c2 char(3);
sid@V10&gt; variable v2 varchar2(3);
sid@V10&gt; exec :c2 := 'BIG'; :v2 := 'BIG';

PL/SQL procedure successfully completed.

sid@V10&gt; select /*+ gather_plan_statistics*/
  2  	     max(n1)
  3  from
  4  	     t
  5  where
  6  	     v2 = :c2;

   MAX(N1)
----------
    999999

sid@V10&gt; select * from table(dbms_xplan.display_cursor
  2  (null,null,'iostats last peeked_binds'));

-------------------------------------------------------------------------
| Id  | Operation                    | Name | E-Rows | A-Rows | Buffers |
-------------------------------------------------------------------------
|   1 |  SORT AGGREGATE              |      |      1 |      1 |     165K|
|   2 |   TABLE ACCESS BY INDEX ROWID| T    |      1 |    900K|     165K|
|*  3 |    INDEX RANGE SCAN          | T_V2 |      1 |    900K|    1886 |
-------------------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - (CHAR(30), CSID=873): 'BIG'

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access(&quot;V2&quot;=:C2)

24 rows selected.

sid@V10&gt; select /*+ gather_plan_statistics*/
  2  	     max(n1)
  3  from
  4  	     t
  5  where
  6  	     v2 = :v2;

   MAX(N1)
----------
    999999

sid@V10&gt; select * from table(dbms_xplan.display_cursor
  2  (null,null,'iostats last peeked_binds'));

---------------------------------------------------------------
| Id  | Operation          | Name | E-Rows | A-Rows | Buffers |
---------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |     163K|
|*  2 |   TABLE ACCESS FULL| T    |    899K|    900K|     163K|
---------------------------------------------------------------

Peeked Binds (identified by position):
--------------------------------------

   1 - (VARCHAR2(30), CSID=873): 'BIG'

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(&quot;V2&quot;=:V2)

23 rows selected.
</pre>
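<p>One reason the CHAR/VARCHAR2 distinction matters at all is that Oracle uses two different comparison semantics for character values. Comparison is blank-padded when both operands are CHAR/NCHAR or text literals, and nonpadded when at least one operand is VARCHAR2/NVARCHAR2. Below is a minimal Python model of the two rules. It illustrates the documented semantics only; it does not reproduce the optimizer&#8217;s E-Rows=1 estimate above.</p>

```python
def blank_padded_equal(a: str, b: str) -> bool:
    # Both operands CHAR: pad the shorter value with blanks, then compare.
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)


def nonpadded_equal(a: str, b: str) -> bool:
    # At least one operand VARCHAR2: trailing blanks are significant.
    return a == b


def oracle_equal(a: str, a_is_char: bool, b: str, b_is_char: bool) -> bool:
    if a_is_char and b_is_char:
        return blank_padded_equal(a, b)
    return nonpadded_equal(a, b)


# 'BIG' in a VARCHAR2 column vs a CHAR value carrying trailing blanks
print(oracle_equal("BIG", False, "BIG   ", True))  # nonpadded -> False
print(oracle_equal("BIG", True, "BIG   ", True))   # blank-padded -> True
```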
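<p>When a plan shows an obviously wrong cardinality, check two places to tell whether a type conversion is involved. First, look in the Predicate Information for conversion functions such as internal_function or to_number. Second, look in the Peeked Binds for unexpected data types.</p>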
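<p>This check is easy to automate when the dbms_xplan output is available as text. Below is a hypothetical Python helper. It picks out predicates in which a conversion function is applied directly to a column, and it lists the peeked bind types. The function names and the list of conversion functions are assumptions; extend them as needed.</p>

```python
import re

# Conversion functions that may be wrapped around a column (assumed list).
_CONVERSION = re.compile(r'\b(INTERNAL_FUNCTION|TO_NUMBER|TO_CHAR|TO_DATE|TO_TIMESTAMP)\("')
_PREDICATE = re.compile(r'^\s*\d+ - (access|filter)\(')
_PEEKED_BIND = re.compile(r'^\s*\d+ - \(([A-Z0-9_]+)')


def converted_predicates(plan_text: str) -> list[str]:
    """Predicate lines in which a conversion function wraps a quoted column."""
    return [line.strip() for line in plan_text.splitlines()
            if _PREDICATE.match(line) and _CONVERSION.search(line)]


def peeked_bind_types(plan_text: str) -> list[str]:
    """Data types listed in the Peeked Binds section, in position order."""
    types = []
    for line in plan_text.splitlines():
        m = _PEEKED_BIND.match(line)
        if m:
            types.append(m.group(1))
    return types


sample = """
Peeked Binds (identified by position):
--------------------------------------

   1 - (NUMBER): 1

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(TO_NUMBER("V1")=:N1)
"""
print(converted_predicates(sample))
print(peeked_bind_types(sample))
```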
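<p>The optimizer gets smarter with each release. In the third case, for example, comparing V2 with the CHAR(3) string &#8216;BIG&#8217; did not produce a wrong cardinality on 10.2.0.5 or 11.2.0.2. Still, to avoid the trouble that type conversions cause, choose data types carefully during development and database design.</p>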
<p>2010-09-22 Tholing Monastery, Zanda County<a href="https://lh3.googleusercontent.com/-4zFvM7z0F8k/T2MpfILDCzI/AAAAAAAAYb8/nk0Wpab-G6M/s1440/DSC_2874.JPG" target="_blank"><br />
<img class="Picasa" src="https://lh3.googleusercontent.com/-4zFvM7z0F8k/T2MpfILDCzI/AAAAAAAAYb8/nk0Wpab-G6M/s800/DSC_2874.JPG" alt="" /><br />
</a></p>
]]></content:encoded>
			<wfw:commentRss>http://sid.gd/oracle-data-type-conversion/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ORA-01555 and Delayed Block Cleanout</title>
		<link>http://sid.gd/ora-01555_deplaye_block_cleanout/</link>
		<comments>http://sid.gd/ora-01555_deplaye_block_cleanout/#comments</comments>
		<pubDate>Tue, 06 Mar 2012 12:31:06 +0000</pubDate>
		<dc:creator>Sidney Chen</dc:creator>
				<category><![CDATA[Oracle]]></category>
		<category><![CDATA[delayed_block_cleanout]]></category>
		<category><![CDATA[ora-01555]]></category>
		<category><![CDATA[redo]]></category>
		<category><![CDATA[undo]]></category>

		<guid isPermaLink="false">http://sid.gd/?p=1601</guid>
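		<description><![CDATA[The direct cause of ORA-01555 is that undo records needed for a consistent read have been overwritten. A consistent read can fail in two ways. 1. The undo block referenced by the ITL entry in the data block has been overwritten, so the consistent-read version cannot be built. 2. The transaction table in the undo segment header: during delayed block cleanout, Oracle may need to roll back the corresponding transaction table to find out when the transaction actually committed. If the undo records needed for that have been overwritten, ORA-01555 is raised as well. This post does not cover how consistent reads and delayed block cleanout are implemented; interested readers can refer to Chapter 3, Transactions and Consistency, of Oracle Core. Only ORA-01555 caused by delayed block cleanout is discussed here. Below is the plan for reproducing ORA-01555 through delayed block cleanout, using two sessions. 1. session 1: create table T1 and insert 500 rows spread over 500 blocks. Update the 500 rows of T1 and flush the 500 dirty blocks out of the buffer cache. 2. session 1: commit. &#8230; <a href="http://sid.gd/ora-01555_deplaye_block_cleanout/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>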
			<content:encoded><![CDATA[<pre class="brush: plain; title: ; notranslate">
01555, 00000, &quot;snapshot too old: rollback segment number %s
with name \&quot;%s\&quot; too small&quot;
// *Cause: rollback records needed by a reader for consistent read
//         are overwritten by other writers
// *Action: If in Automatic Undo Management mode, increase
//	undo_retention setting. Otherwise, use larger rollback segments
</pre>
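<p>The direct cause of ORA-01555 is that undo records needed for a consistent read have been overwritten. A consistent read can fail in two ways:<br />
1. The undo block referenced by the ITL entry in the data block has been overwritten, so the consistent-read version of the block cannot be built.<br />
2. The transaction table in the undo segment header: during delayed block cleanout, Oracle may need to roll back the corresponding transaction table to find out when the transaction actually committed. If the undo records needed for that rollback have been overwritten, ORA-01555 is raised as well.<br />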
<span id="more-1601"></span><br />
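This post does not cover how consistent reads and delayed block cleanout are implemented; interested readers can refer to Chapter 3, Transactions and Consistency, of <a href="http://www.eygle.com/archives/2011/11/jonathan_lewis_oracle_core.html" target="_blank">Oracle Core</a>. Only ORA-01555 caused by delayed block cleanout is discussed here.</p>
<p>Below is the plan for reproducing ORA-01555 through delayed block cleanout, using two sessions.</p>
<p>1. session 1: create table T1 and insert 500 rows spread over 500 blocks. Update all 500 rows of T1, then flush the 500 dirty blocks out of the buffer cache.</p>
<p>2. session 1: commit. Commit cleanout now fails, so Oracle leaves the job of cleaning the 500 &#8220;dirty&#8221; blocks of T1 on disk to the next session that reads them.</p>
<p>3. session 1: fix the starting point in time for subsequent queries with set transaction read only. Until this transaction ends with an explicit commit or rollback, every subsequent query is read-consistent as of this point. This simulates a long-running SQL statement in real life.</p>
<p>4. session 2: commit a large number of transactions unrelated to T1. This serves two purposes. First, every transaction table slot is overwritten at least once. When session 1 later reads the &#8220;dirty&#8221; blocks of T1 and performs delayed block cleanout, it therefore has to roll back the transaction table (transaction table consistent read rollbacks). Second, if we generate enough undo, the undo blocks that session 1 needs for that rollback get overwritten. The consistent read of the transaction table then fails, and ORA-01555 is raised.</p>
<p>In my environment, undotbs1 has 26 undo segments, and each transaction table has 34 slots (48 slots in 10g). Overwriting every slot 50 times therefore takes 44200 (=26*34*50) transactions. Under Automatic Undo Management, once the undo tablespace is full, the space of the earliest committed transactions is overwritten and reused. undotbs1 is 1G, so as long as session 2 generates more than 1G of undo, the undo blocks needed to roll back the transaction table will be overwritten. Generating 1G of undo in 44200 transactions takes about 25K of undo per transaction; to be safe, each transaction should generate about 30K.</p>
<p>5. session 1: full scan T1. Because of delayed block cleanout, we can observe the statistics transaction tables consistent read rollbacks and transaction tables consistent reads &#8211; undo records applied.</p>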
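<p>The sizing estimate in step 4 is simple arithmetic. Below is a short Python check of it using the figures of this environment: 26 undo segments, 34 slots per transaction table, 50 overwrites per slot and a 1G undotbs1.</p>

```python
UNDO_SEGMENTS = 26
SLOTS_PER_TXN_TABLE = 34     # 11g; 10g has 48
OVERWRITES_PER_SLOT = 50
UNDOTBS1_BYTES = 1024 ** 3   # 1G undo tablespace

transactions = UNDO_SEGMENTS * SLOTS_PER_TXN_TABLE * OVERWRITES_PER_SLOT
undo_per_txn = UNDOTBS1_BYTES / transactions  # bytes of undo each transaction must produce

print(transactions, round(undo_per_txn))  # 44200 24293
```

<p>That is about 24K bytes (23.7 KiB) per transaction, hence the rounded figure of 25K and the 30K safety margin.</p>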
<p>Environment: Oracle 11.2.0.2 on 32-bit Linux with Automatic Undo Management (AUM). The undo tablespace undotbs1 is 1G and contains 26 undo segments.</p>
<pre class="brush: plain; title: ; notranslate">
sys@CS11GR2&gt; @pd2 undo_management

NAME              VALUE  DESCRIPTION
----------------- ------ ---------------------------------------------------
undo_management   AUTO   instance runs in SMU mode if TRUE, else in RBU mode

sys@CS11GR2&gt; @pd2 undo_tablespace

NAME             VALUE     DESCRIPTION
---------------- --------- -----------------------------
undo_tablespace  UNDOTBS1  use/switch undo tablespace

sys@CS11GR2&gt; @df UNDOTBS1

TABLESPACE_NAME         TotalMB     UsedMB     FreeMB % Used
-------------------- ---------- ---------- ---------- ------
UNDOTBS1                   1024       1024          0   100%

sys@CS11GR2&gt; select
  2          count(*)
  3  from
  4          dba_rollback_segs
  5  where
  6          tablespace_name = 'UNDOTBS1'
  7  /

  COUNT(*)
----------
        26
</pre>
<p><strong>Observing delayed block cleanout</strong></p>
<p>1. session 1: create tables T1 and T2. After updating T1, flush its 500 blocks out of the buffer cache.</p>
<pre class="brush: plain; title: ; notranslate">
sid@CS11GR2&gt; create table t1 (
  2  	     id 		     number,
  3  	     small_no	     number(5,2),
  4  	     small_vc	     varchar2(10),
  5  	     padding	     varchar2(1000),
  6  	     constraint t1_pk primary key (id)
  7  )
  8  pctfree 90
  9  pctused 10
 10  ;

Table created.

sid@CS11GR2&gt;
sid@CS11GR2&gt; insert into t1
  2  select
  3  	     rownum,
  4  	     1+ trunc(rownum/10),
  5  	     lpad(rownum,10),
  6  	     rpad('x',1000)
  7  from
  8  	     all_objects
  9  where
 10  	     rownum &lt;= 500
 11  ;

500 rows created.

sid@CS11GR2&gt;
sid@CS11GR2&gt; create table t2 (n1 number, v1 varchar2(1000));

Table created.

sid@CS11GR2&gt; insert into t2 values (0, lpad(0,1000,'0'));

1 row created.

sid@CS11GR2&gt; commit;

Commit complete.

-- gather statistics on T1 and T2

sid@CS11GR2&gt; update
  2  	     /*+ index(t1) */
  3  	     t1
  4  set
  5  	     small_vc = small_vc + 1
  6  ;

500 rows updated.

sid@CS11GR2&gt; alter system checkpoint;

System altered.

sid@CS11GR2&gt; alter system flush buffer_cache;

System altered.
</pre>
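<p>2. session 1: commit. Although 500 blocks were modified, Oracle attempted commit cleanout 100 times, failed every time, and then gave up. The redo records for the 500 dirty blocks had already been written to the redo log before the blocks were flushed from the cache. The commit itself therefore generates just one redo record of 164 bytes.</p>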
<pre class="brush: plain; title: ; notranslate">
sid@CS11GR2&gt; commit;

Commit complete.

Name                                   Value
----                                   -----
commit cleanout failures: block lost     100
commit cleanouts                         100
redo entries                               1
redo size                                164

</pre>
<p>3. session 1: record the current SCN, then issue set transaction read only.</p>
<pre class="brush: plain; title: ; notranslate">

sid@CS11GR2&gt; select
  2  	 sys.dbms_flashback.get_system_change_number post_commit_scn
  3  from
  4  	 dual
  5  ;

POST_COMMIT_SCN
---------------
     7780465327

sid@CS11GR2&gt;
sid@CS11GR2&gt; set transaction read only;

Transaction set.
</pre>
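<p>4. session 2: run 44200 small transactions, overwriting each transaction table slot 50 times. This generates only 75M of redo, of which 52M is undo. Since undotbs1 is 1G, 52M is not enough to overwrite the earlier undo blocks, so there is no risk of ORA-01555.</p>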
<pre class="brush: plain; title: ; notranslate">
sid@CS11GR2&gt; begin
  2  	     for i in 1..44200 loop
  3  		     update t2 set n1 = i;
  4  		     commit;
  5  	     end loop;
  6  end;
  7  /

PL/SQL procedure successfully completed.

Name                                       Value
----                                       -----
user commits                              44,200
redo entries                              88,407
redo size                             75,116,240
undo change vector size               52,863,176
</pre>
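<p>5. session 1 full scans T1. It calls ktugct (Kernel Transaction Undo Get Commit Time) 500 times and performs 500 cleanouts, with 1 transaction tables consistent read rollback. That one consistent read applies 4,869 undo records.</p>
<p>I had expected about 1700 (34*50) undo records. The gap exists because only 10 of the 26 undo segments are online, so each online segment took 2.6 (26/10) times as many commits; 4869 is close to 4420 (= 1700 * 2.6). Of the 5,885 logical reads (consistent gets), 5372 (consistent gets &#8211; examination) are spent on undo. In practice, delayed block cleanout can have a large performance impact; see an earlier example of mine: <a href="http://sid.gd/transaction-tables-consistent-reads/" target="_blank">Transaction Tables Consistent Reads</a>.</p>
<p>A query can generate redo too. Here the delayed block cleanout generates 500 redo records totalling 36K, and the 500 data blocks are marked dirty in the buffer cache. After the cleanout, the ora_rowscn of T1 depends on the SCN at which the query started, i.e., the SCN at set transaction read only in step 3.</p>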
<pre class="brush: plain; title: ; notranslate">
sid@CS11GR2&gt; select
  2  	 sys.dbms_flashback.get_system_change_number after_batch_scn
  3  from
  4  	 dual
  5  ;

AFTER_BATCH_SCN
---------------
     7780564259

sid@CS11GR2&gt;
sid@CS11GR2&gt; execute snap_my_stats.start_snap

PL/SQL procedure successfully completed.

sid@CS11GR2&gt;
sid@CS11GR2&gt; select
  2  	     /*+ full(t1) */
  3  	     count(*)
  4  from
  5  	     t1
  6  ;

  COUNT(*)
----------
       500

sid@CS11GR2&gt; execute snap_my_stats.end_snap

Name                                                         Value
----                                                         -----
consistent gets                                              5,885
consistent gets - examination                                5,372
redo entries                                                   500
redo size                                                   36,044
transaction tables consistent reads - undo records applied   4,869
transaction tables consistent read rollbacks                     1
cleanouts only - consistent read gets                          500
immediate (CR) block cleanout applications                     500
commit txn count during cleanout                               500
cleanout - number of ktugct calls                              500

sid@CS11GR2&gt; select
  2  	     ora_rowscn, count(*)
  3  from
  4  	     t1
  5  group by
  6  	     ora_rowscn
  7  order by
  8  	     count(*)
  9  ;

ORA_ROWSCN   COUNT(*)
---------- ----------
7780465326        500
</pre>
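<p>The reasoning behind the 4869 figure in step 5 can be written out as follows. This is a rough model: it assumes the 44200 commits are spread evenly over the online undo segments, so each transaction table slot is overwritten in proportion to 26/10.</p>

```python
SLOTS_PER_TXN_TABLE = 34
OVERWRITES_PER_SLOT = 50          # planned: 44200 commits over 26 segments
SEGMENTS_TOTAL, SEGMENTS_ONLINE = 26, 10

# Undo records needed to roll one transaction table back, if all 26 segments were used
expected = SLOTS_PER_TXN_TABLE * OVERWRITES_PER_SLOT
# Only 10 segments were online, so each one absorbed 26/10 = 2.6 times as many commits
adjusted = expected * SEGMENTS_TOTAL / SEGMENTS_ONLINE

observed = 4869                   # transaction tables consistent reads - undo records applied
undo_share = 5372 / 5885          # consistent gets - examination / consistent gets
print(expected, adjusted, observed, round(undo_share, 3))  # 1700 4420.0 4869 0.913
```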
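<p><strong>Triggering ORA-01555</strong></p>
<p>The first 3 steps are the same as before.</p>
<p>4. session 2 again runs a loop of 44200 iterations. Each iteration executes update t2 set v1 = lower(v1) 30 times and commits after every update, for 1,326,000 commits in total. This generates roughly 3.5G of redo, of which undo change vectors account for about 1.6G. That guarantees the undo needed for the transaction table consistent read rollback gets overwritten.</p>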
<pre class="brush: plain; title: ; notranslate">
sid@CS11GR2&gt; begin
  2  	     for i in 1..44200 loop
  3  		     for i in 1..30 loop
  4  			     update t2 set v1 = lower(v1);
  5  			     commit;
  6  		     end loop;
  7  	     end loop;
  8  end;
  9  /

PL/SQL procedure successfully completed.

Name                                 Value
----                                 -----
user commits                     1,326,000
redo entries                     2,691,795
redo size                    3,578,966,396
undo change vector size      1,668,229,604
</pre>
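<p>Comparing the undo generated by session 2 in the two experiments with the size of undotbs1 shows why only the second run can trigger ORA-01555. Below is a small Python sketch using the statistics reported above (user commits and undo change vector size).</p>

```python
UNDOTBS1_BYTES = 1024 ** 3  # 1G


def undo_summary(commits: int, undo_bytes: int) -> tuple[float, float, bool]:
    """Undo per commit, multiple of undotbs1, and whether undotbs1 must be reused."""
    return undo_bytes / commits, undo_bytes / UNDOTBS1_BYTES, undo_bytes > UNDOTBS1_BYTES


first_run = undo_summary(44_200, 52_863_176)          # observing delayed block cleanout
second_run = undo_summary(1_326_000, 1_668_229_604)   # triggering ORA-01555

for name, (per_commit, ratio, wraps) in (("first", first_run), ("second", second_run)):
    print(f"{name}: {per_commit:,.0f} bytes/commit, {ratio:.2f} x undotbs1, wraps={wraps}")
```

<p>Each commit in the second run produced only about 1.26K bytes of undo. An outer-loop iteration covers 30 commits, so it produced roughly 37,700 bytes, above the 30K target set earlier.</p>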
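<p>5. session 1 full scans T1, and ORA-01555 occurs as expected. ktugct is called only once and transaction tables consistent read rollbacks does not change, which shows that cleanout already failed on the very first block read. After applying 65,910 undo records, session 1 found that the next undo block needed for the rollback had been overwritten, and ORA-01555 was raised.</p>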
<pre class="brush: plain; title: ; notranslate">
sid@CS11GR2&gt; select
  2  	     /*+ full(t1) */
  3  	     count(*)
  4  from
  5  	     t1
  6  ;
	t1
	*
ERROR at line 5:
ORA-01555: snapshot too old: rollback segment number 10
with name &quot;_SYSSMU10_3805322843$&quot; too small

Name                                                        Value
----                                                        -----
consistent gets                                            65,926
consistent gets - examination                              65,915
transaction tables consistent reads - undo records applied 65,910
cleanout - number of ktugct calls                               1
</pre>
<p>ORA-01555常见的原因有undo表空间不够和SQL长时间执行, 如果是因为延迟块清除, 可以从会话统计信息中找到线索: transaction tables consistent read rollbacks和transaction tables consistent reads &#8211; undo records applied.</p>
<p>关于延迟块清除还有一点很有趣, 如果走direct path, 同样会做清除, 但不产生redo record, 在buffer pool中的数据块不会被修改, 接下来的查询需要继续做清除. 并行查询在11g已经不一定是direct path, 所以并行查询做清除时是否产生redo record, 取决于有没有走direct path.</p>
<p><strong>更新: 2012/03/07</strong><br />
对于session 2产生的undo change不用猜测, 统计信息undo change vector size记录了undo chnage的大小.</p>
<p>PS. 因为我已经不再写<a href="http://book.sid.gd/html/general/index.html" target="_blank">2010年在云南/西藏/尼泊尔旅行的游记</a>, 我决定以后写Oracle文章的时候在最后放一张照片, 希望看到的朋友会喜欢.</p>
<p>2010-09-20 Zanda Earth Forest, Ngari<a href="https://lh3.googleusercontent.com/-C2Fxnda50VY/T1YBjrk-V2I/AAAAAAAAYXU/Iky9akEhEOU/s1440/DSC_2699.JPG" target="_blank"><br />
<img class="Picasa" src="https://lh3.googleusercontent.com/-C2Fxnda50VY/T1YBjrk-V2I/AAAAAAAAYXU/Iky9akEhEOU/s800/DSC_2699.JPG" alt="" /><br />
</a></p>
]]></content:encoded>
			<wfw:commentRss>http://sid.gd/ora-01555_deplaye_block_cleanout/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>In Memory Undo</title>
		<link>http://sid.gd/in-memory-undo/</link>
		<comments>http://sid.gd/in-memory-undo/#comments</comments>
		<pubDate>Wed, 29 Feb 2012 14:08:54 +0000</pubDate>
		<dc:creator>Sidney Chen</dc:creator>
				<category><![CDATA[Oracle]]></category>
		<category><![CDATA[IMU]]></category>
		<category><![CDATA[log buffer]]></category>
		<category><![CDATA[redo]]></category>
		<category><![CDATA[undo]]></category>

		<guid isPermaLink="false">http://sid.gd/?p=1588</guid>
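		<description><![CDATA[This morning I read Lao Xiong's blog post "Common Incorrect Uses of Hints". In the comments, a reader mentioned that after an insert into a table without the nologging attribute, the session statistic redo size was 0. Was the statistic wrong? I guessed it was caused by In Memory Undo (IMU), so in the afternoon I ran a test and took the chance to organize what I know about IMU. Oracle uses a write-ahead logging strategy: redo is written first to guarantee transaction durability. Write-ahead has two meanings. 1. Log before data: before changing data (tables and indexes) and undo, Oracle generates the corresponding change vectors, combines them into one redo record and copies it into the log buffer; only then does it apply the change vectors to the data and undo. 2. LGWR before DBWR: before data and undo blocks are written from the buffer cache to disk, LGWR must write their redo records to the online redo log file. Here we focus on the first point. Before 10g there was only one log buffer, &#8230; <a href="http://sid.gd/in-memory-undo/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>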
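			<content:encoded><![CDATA[<p>This morning I read Lao Xiong's blog post <a href="http://www.laoxiong.net/common-incorrect-using-hints.html" target="_blank">Common Incorrect Uses of Hints</a>. In the comments, a reader mentioned that after an insert into a table without the nologging attribute, the session statistic redo size was 0. Was the statistic wrong? I guessed it was caused by In Memory Undo (IMU), so in the afternoon I ran a test and took the chance to organize what I know about IMU.<br />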
<span id="more-1588"></span><br />
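Oracle uses a write-ahead logging strategy: redo is written first to guarantee transaction durability. Write-ahead has two meanings:<br />
<strong>1. </strong> Log before data: before changing data (tables and indexes) and undo, Oracle generates the corresponding change vectors, combines them into one redo record and copies it into the log buffer. Only then does it apply the change vectors to the data and undo.<br />
<strong>2. </strong> LGWR before DBWR: before data and undo blocks are written from the buffer cache to disk, LGWR must write their corresponding redo records to the online redo log file.</p>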
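<p>Here we focus on the first point. Before 10g there was only one log buffer. Every redo record generated by DML was copied into it immediately, so concurrent small transactions easily contended for the log buffer. Oracle 10g introduced IMU to reduce how often small transactions use the log buffer. IMU consists of two sets of one-to-one paired structures in the shared pool: private undo buffers and private redo buffers (also called private redo threads or redo strands). A private undo buffer holds the undo change vectors, and a private redo buffer holds the data change vectors. When the private undo buffer or the private redo buffer is used up, or when the user commits, Oracle merges the two sets of change vectors into one redo record and copies it into a public log buffer. Other events, such as a checkpoint, can also cause the change vectors to be flushed to a public log buffer. IMU does not change the number or size of the change vectors generated. It only defers copying them into the public redo buffer and does so in a batch, which reduces contention on the public log buffer.</p>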
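<p>The batching idea can be illustrated with a toy model. This is not Oracle's implementation. The class names, the byte capacities and the flush rules are simplified assumptions that mirror only the behavior described above: change vectors collect in a private undo/redo pair, then go to the public log buffer as one redo record on commit or when a private buffer fills up. After a flush the model simply keeps using the private buffers. Oracle instead switches back to the traditional path, as step 5 of the test shows.</p>

```python
from dataclasses import dataclass, field


@dataclass
class PublicLogBuffer:
    records: list = field(default_factory=list)  # each record is a list of change vectors

    def copy_in(self, vectors: list) -> None:
        self.records.append(list(vectors))


@dataclass
class ImuTransaction:
    """Toy model of one private undo buffer paired with one private redo buffer."""
    log: PublicLogBuffer
    undo_capacity: int = 64000
    redo_capacity: int = 62976
    undo: list = field(default_factory=list)  # (kind, size) undo change vectors
    redo: list = field(default_factory=list)  # (kind, size) data change vectors

    @staticmethod
    def _used(vectors: list) -> int:
        return sum(size for _, size in vectors)

    def flush(self) -> None:
        # Merge both private sets into a single redo record in the public buffer.
        if self.undo or self.redo:
            self.log.copy_in(self.redo + self.undo)
            self.undo.clear()
            self.redo.clear()

    def change(self, data_bytes: int, undo_bytes: int) -> None:
        # Flush first if either private buffer would overflow.
        if (self._used(self.redo) + data_bytes > self.redo_capacity
                or self._used(self.undo) + undo_bytes > self.undo_capacity):
            self.flush()
        self.redo.append(("data", data_bytes))
        self.undo.append(("undo", undo_bytes))

    def commit(self) -> None:
        self.flush()


log = PublicLogBuffer()
txn = ImuTransaction(log)
for _ in range(10):                    # ten small changes
    txn.change(data_bytes=200, undo_bytes=100)
print(len(log.records))                # 0: nothing copied to the public buffer yet
txn.commit()
print(len(log.records), len(log.records[0]))  # 1 20
```

<p>Ten changes produce twenty change vectors, but the public buffer is touched only once, at commit. That single copy is the point of the batching.</p>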
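<p>For an IMU transaction, session statistics such as “redo size”, “redo entries” and “IMU Flushes” are incremented only when the change vectors are copied into the public log buffer. After an insert, if the change vectors are still in the IMU buffers, these statistics do not increase. Below is my test on 10.2.0.5, Linux 32-bit.</p>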
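<p><strong>1. </strong>Confirm that IMU is enabled. The hidden parameter _in_memory_undo turns IMU on or off. x$ktifp holds information about the private undo buffers, and its column ktifprpb points to the start of the corresponding private redo buffer. x$kcrfstrand holds information about all redo buffers; the first two are the public log buffers. The shared pool has allocated 24 pairs of buffers of about 64 KB each, and 6 of them are active. Oracle adjusts the number of active buffers dynamically according to load. x$ktiff records IMU statistics.</p>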
<pre class="brush: sql; title: ; notranslate">
sys@SIDGD&gt; @pd2 _in_memory_undo

NAME            VALUE DESCRIPTION
--------------- ----- ------------------------------------------------
_in_memory_undo TRUE  Make in memory undo for top level transactions

select
  indx,
  to_number(ktifpupe,'XXXXXXXXXXXXXXXXXXXXXXX') -
  to_number(ktifpupb,'XXXXXXXXXXXXXXXXXXXXXXX')           undo_size,
  to_number(ktifpupc,'XXXXXXXXXXXXXXXXXXXXXXX') -
          to_number(ktifpupb,'XXXXXXXXXXXXXXXXXXXXXXX')   undo_usage,
  ktifprpb                                                redo_start,
  to_number(ktifprpe,'XXXXXXXXXXXXXXXXXXXXXXX') -
          to_number(ktifprpb,'XXXXXXXXXXXXXXXXXXXXXXX')   redo_size,
  to_number(ktifprpc,'XXXXXXXXXXXXXXXXXXXXXXX') -
          to_number(ktifprpb,'XXXXXXXXXXXXXXXXXXXXXXX')   redo_usage
  from
          x$ktifp
;

 -- 24 private undo buffers
 INDX  UNDO_SIZE UNDO_USAGE REDO_STA  REDO_SIZE REDO_USAGE
----- ---------- ---------- -------- ---------- ----------
    0      64000          0 00                0          0
    1      64000          0 00                0          0
    2      64000          0 00                0          0

....

   21      64000          0 00                0          0
   22      64000          0 00                0          0
   23      64000          0 00                0          0

24 rows selected.

select
        indx,
        PNEXT_BUF_KCRFA_CLN,
        PTR_KCRF_PVT_STRAND,
        FIRST_BUF_KCRFA,
        LAST_BUF_KCRFA,
        STRAND_SIZE_KCRFA       strand_size,
        SPACE_KCRF_PVT_STRAND   strand_space
from
        x$kcrfstrand
;

 INDX PNEXT_BU PTR_KCRF FIRST_BU LAST_BUF STRAND_SIZE STRAND_SPACE
----- -------- -------- -------- -------- ----------- ------------
    0 20148400 00       20138000 20296600     1435648            0    &lt;-- two public log buffers
    1 203F5000 00       20296800 203F4E00     1435648            0
    2 00       43CA50EC 43CA5098 00             66560        62976    &lt;-- six active private redo buffers
    3 00       43CB54EC 43CB5498 00             66560        62976
    4 00       43CC58EC 43CC5898 00             66560        62976
    5 00       43CD5CEC 43CD5C98 00             66560        62976
    6 00       43CE60EC 43CE6098 00             66560        62976
    7 00       43CF64EC 43CF6498 00             66560        62976
    8 00       00       43D06898 00             66560            0    &lt;-- inactive private redo buffers
    9 00       00       43D16C98 00             66560            0
   10 00       00       43D27098 00             66560            0

....

26 rows selected.

select ktiffcat, ktiffflc from x$ktiff order by 2;

KTIFFCAT                              KTIFFFLC
----------------------------------- ----------
....
Recursive txn flushes                        1
Max. chgs flushes                            3
Redo pool overflow flushes                   7
Contention flushes                          11
Undo pool overflow flushes                  15
Bitmap state change flushes                 29
Stack cv flushes                            46
Redo only CR flushes                       208
Rollback flushes                           323
Commit flushes                          531663

18 rows selected.
</pre>
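<p>The to_number(..., 'XXXX...') calls above only turn hexadecimal addresses into numbers so they can be subtracted. The same arithmetic in Python, applied to the x$kcrfstrand output, relates each public strand's size to its first and last buffer addresses. It assumes a 512-byte redo block and reads LAST_BUF_KCRFA as the start of the last block. That reading is an assumption, though it matches both public strands.</p>

```python
REDO_BLOCK = 512  # assumed redo block size on this Linux platform

# (FIRST_BUF_KCRFA, LAST_BUF_KCRFA, STRAND_SIZE_KCRFA) for the two public strands
public_strands = [
    ("20138000", "20296600", 1435648),
    ("20296800", "203F4E00", 1435648),
]

for first, last, strand_size in public_strands:
    span = int(last, 16) - int(first, 16) + REDO_BLOCK
    print(f"{first}..{last}: {span} bytes, reported {strand_size}")
    assert span == strand_size

# 2 public strands + 24 private strands = the 26 rows of x$kcrfstrand
print(2 + 24)
```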
<p><strong>2. </strong>Prepare an empty table T.</p>
<pre class="brush: sql; title: ; notranslate">
sys@SIDGD&gt; drop table t purge;

Table dropped.

sys@SIDGD&gt; create table t (n1 number, v1 varchar2(2000));

Table created.
</pre>
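<p><strong>3. </strong>Start a new session. First switch the log file, so that later we can look at the size of the first redo record and how many change vectors it contains. Each step inserts 6 rows into table T. The first private undo buffer is now in use, paired with the first private redo buffer at address 43CA50EC. Because this is an insert, the data generates more change vector bytes than the undo: the data change vectors take 21208 bytes and the undo change vectors 2560 bytes. redo entries and redo size are both 0 because the change vectors have not been flushed to the log buffer yet.</p>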
<pre class="brush: sql; title: ; notranslate">
sys@SIDGD&gt; alter system switch logfile;

System altered.

===================
session statistics
===================

sys@SIDGD&gt; select
  2  	 ses.sid,
  3  	 sn.name,
  4  	 ses.value
  5  from
  6  	 v$sesstat ses,
  7  	 v$statname sn
  8  where
  9  	 sn.statistic# = ses.statistic#
 10  and ses.sid in (select sid from v$mystat where rownum=1)
 11  and sn.name in ('redo entries', 'redo size', 'IMU Flushes', 'IMU commits')
 12  /

       SID NAME                                VALUE
---------- ------------------------------ ----------
       199 redo entries                            0
       199 redo size                               0
       199 IMU commits                             0
       199 IMU Flushes                             0

sys@SIDGD&gt;
sys@SIDGD&gt; insert into t
  2  select
  3  	     level,
  4  	     lpad(level, 2000, '0')
  5  from
  6  	     dual
  7  connect by level &lt;= 6;

6 rows created.

===================
session statistics
===================

       SID NAME                                VALUE
---------- ------------------------------ ----------
       199 redo entries                            0
       199 redo size                               0
       199 IMU commits                             0
       199 IMU Flushes                             0

====================
private undo buffers
====================
 INDX  UNDO_SIZE UNDO_USAGE REDO_STA  REDO_SIZE REDO_USAGE
----- ---------- ---------- -------- ---------- ----------
    0      64000       2560 43CA50EC      62976      21208
    1      64000          0 00                0          0
    2      64000          0 00                0          0
    3      64000          0 00                0          0
....

==========================
all the redo buffers
==========================
 INDX PNEXT_BU PTR_KCRF FIRST_BU LAST_BUF STRAND_SIZE STRAND_SPACE
----- -------- -------- -------- -------- ----------- ------------
    0 20296800 00       20138000 20296600     1435648            0
    1 203F5000 00       20296800 203F4E00     1435648            0
    2 00       43CA50EC 43CA5098 00             66560        62976
    3 00       43CB54EC 43CB5498 00             66560        62976
....
</pre>
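<p><strong>4.</strong> Insert six more rows. Private redo buffer usage grows to 42876 and undo usage to 5068. redo size and redo entries do not change. x$kcrfstrand.PNEXT_BUF_KCRFA_CLN marks the start of the free space in a log buffer. For both public log buffers this pointer is unchanged (0x20296800 and 0x203F5000), so no redo record entered the public log buffers during this period.</p>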
<pre class="brush: sql; title: ; notranslate">
sys@SIDGD&gt; insert into t
  2  select
  3  	     level,
  4  	     lpad(level, 2000, '0')
  5  from
  6  	     dual
  7  connect by level &lt;= 6;

6 rows created.

===================
session statistics
===================

       SID NAME                                VALUE
---------- ------------------------------ ----------
       199 redo entries                            0
       199 redo size                               0
       199 IMU commits                             0
       199 IMU Flushes                             0

====================
private undo buffers
====================

 INDX  UNDO_SIZE UNDO_USAGE REDO_STA  REDO_SIZE REDO_USAGE
----- ---------- ---------- -------- ---------- ----------
    0      64000       5068 43CA50EC      62976      42876

==========================
all the redo buffers
==========================

 INDX PNEXT_BU PTR_KCRF FIRST_BU LAST_BUF STRAND_SIZE STRAND_SPACE
----- -------- -------- -------- -------- ----------- ------------
    0 20296800 00       20138000 20296600     1435648            0
    1 203F5000 00       20296800 203F4E00     1435648            0
    2 00       43CA50EC 43CA5098 00             66560        62976
</pre>
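<p>The two snapshots already hint at what the next insert will do. Each batch of six 2000-byte rows has added roughly 21 KB of data change vectors, and only about 20 KB of the 62976-byte private redo buffer is left. The projection below assumes the next batch costs about as much as the larger of the two previous ones.</p>

```python
REDO_CAPACITY = 62976           # REDO_SIZE of private buffer 0
usage = [0, 21208, 42876]       # REDO_USAGE after 0, 1 and 2 inserts

per_batch = [b - a for a, b in zip(usage, usage[1:])]
estimate = max(per_batch)       # larger of the two observed batches
remaining = REDO_CAPACITY - usage[-1]

print(per_batch, remaining)
print("next insert overflows the private redo buffer:", estimate > remaining)
```

<p>This matches the next step of the test, where the third insert fills the buffer (62972/62976) and triggers an IMU flush.</p>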
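<p><strong>5. </strong>Insert six more rows. Once the private redo buffer is full (62972/62976), one IMU flush is triggered. The change vectors in IMU are flushed to the public log buffer, and the PNEXT_BUF_KCRFA_CLN of the first log buffer moves on (0x20296800 -> 0x2014212C, wrapping around from the end of the buffer to near its start). Interestingly, redo size is only 39944, far below 69712 (6740+62972). This suggests that IMU needs extra space to manage the change vectors, and the extra part is dropped when they enter the public log buffer. redo entries is 2: besides the redo record flushed from IMU, there is one more redo record. A transaction gets only one chance to use IMU. Once the private redo buffer is full, the transaction switches back to the traditional mechanism, and the redo record that did not fit is copied to the log buffer immediately.</p>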
<pre class="brush: sql; title: ; notranslate">
sys@SIDGD&gt; insert into t
  2  select
  3  	     level,
  4  	     lpad(level, 2000, '0')
  5  from
  6  	     dual
  7  connect by level &lt;= 6;

6 rows created.

===================
session statistics
===================

       SID NAME                                VALUE
---------- ------------------------------ ----------
       199 redo entries                            2
       199 redo size                           39944
       199 IMU commits                             0
       199 IMU Flushes                             1

====================
private undo buffers
====================

 INDX  UNDO_SIZE UNDO_USAGE REDO_STA  REDO_SIZE REDO_USAGE
----- ---------- ---------- -------- ---------- ----------
    0      64000       6740 43CA50EC      62976      62972

==========================
all the redo buffers
==========================

 INDX PNEXT_BU PTR_KCRF FIRST_BU LAST_BUF STRAND_SIZE STRAND_SPACE
----- -------- -------- -------- -------- ----------- ------------
    0 2014212C 00       20138000 20296600     1435648            0
    1 203F5000 00       20296800 203F4E00     1435648            0
    2 00       43CA50EC 43CA5098 00             66560        62976
</pre>
<p><strong>6.</strong> Insert six more rows. Redo is now generated the traditional way: redo entries rises to 29 and redo size to 55720. The PNEXT_BUF_KCRFA_CLN of the first log buffer keeps moving forward (0x2014212C -> 0x201460CC).</p>
<pre class="brush: sql; title: ; notranslate">
sys@SIDGD&gt; insert into t
  2  select
  3  	     level,
  4  	     lpad(level, 2000, '0')
  5  from
  6  	     dual
  7  connect by level &lt;= 6;

6 rows created.

===================
session statistics
===================

       SID NAME                                VALUE
---------- ------------------------------ ----------
       199 redo entries                           29
       199 redo size                           55720
       199 IMU commits                             0
       199 IMU Flushes                             1

====================
private undo buffers
====================

 INDX  UNDO_SIZE UNDO_USAGE REDO_STA  REDO_SIZE REDO_USAGE
----- ---------- ---------- -------- ---------- ----------
    0      64000       6740 43CA50EC      62976      62972

==========================
all the redo buffers
==========================

 INDX PNEXT_BU PTR_KCRF FIRST_BU LAST_BUF STRAND_SIZE STRAND_SPACE
----- -------- -------- -------- -------- ----------- ------------
    0 201460CC 00       20138000 20296600     1435648            0
    1 203F5000 00       20296800 203F4E00     1435648            0
    2 00       43CA50EC 43CA5098 00             66560        62976
</pre>
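<p>The movement of PNEXT_BUF_KCRFA_CLN in this step can be reproduced exactly under one assumption. The public log buffer is taken to be made of 512-byte redo blocks, each starting with a 16-byte block header, so every block carries at most 496 bytes of redo. The function below is a hypothetical model, not an Oracle structure. It advances a buffer address by a given amount of redo and skips a header whenever it enters a new block.</p>

```python
BLOCK = 512   # assumed redo block size
HEADER = 16   # assumed redo block header size


def advance(ptr: int, redo_bytes: int) -> int:
    """Return the log buffer address after writing `redo_bytes` of redo at `ptr`."""
    while redo_bytes > 0:
        offset = ptr % BLOCK
        if offset == 0:                  # entering a new block: skip its header
            ptr += HEADER
            offset = HEADER
        step = min(BLOCK - offset, redo_bytes)
        ptr += step
        redo_bytes -= step
    return ptr


start = 0x2014212C                       # pointer after step 5
redo_delta = 55720 - 39944               # growth of "redo size" in step 6
end = advance(start, redo_delta)
print(redo_delta, hex(end))              # 15776 0x201460cc
```

<p>The pointer moved 0x201460CC - 0x2014212C = 16288 bytes while redo size grew by 15776 bytes. The 512-byte difference equals 32 block headers of 16 bytes. The same model does not reproduce the wrap-around in step 5 exactly, so treat it as an approximation rather than a description of Oracle internals.</p>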
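<p><strong>7.</strong> Commit. One more redo record is generated, which updates (cleans out) the transaction table slot used by this transaction. The IMU usage information is also cleared at this point.</p>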
<pre class="brush: sql; title: ; notranslate">
sys@SIDGD&gt; commit;

Commit complete.

===================
session statistics
===================

       SID NAME                                VALUE
---------- ------------------------------ ----------
       199 redo entries                           30
       199 redo size                           55816
       199 IMU commits                             0
       199 IMU Flushes                             1

====================
private undo buffers
====================

 INDX  UNDO_SIZE UNDO_USAGE REDO_STA  REDO_SIZE REDO_USAGE
----- ---------- ---------- -------- ---------- ----------
    0      64000          0 00                0          0

==========================
all the redo buffers
==========================

 INDX PNEXT_BU PTR_KCRF FIRST_BU LAST_BUF STRAND_SIZE STRAND_SPACE
----- -------- -------- -------- -------- ----------- ------------
    0 20146200 00       20138000 20296600     1435648            0
    1 203F5000 00       20296800 203F4E00     1435648            0
    2 00       43CA50EC 43CA5098 00             66560        62976
</pre>
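<p>The commit moved PNEXT_BUF_KCRFA_CLN from 0x201460CC to 0x20146200, which is further than the 96 bytes of extra redo. One plausible reading, offered here only as an assumption, is that the commit makes LGWR write the current, partially filled redo block. The next redo then starts on a fresh 512-byte block boundary.</p>

```python
BLOCK = 512  # assumed redo block size

before = 0x201460CC            # pointer after step 6
commit_redo = 55816 - 55720    # growth of "redo size" at commit: 96 bytes

# The commit record fits in the current block (offset 0xCC + 96 stays within 512) ...
after_record = before + commit_redo
# ... and the pointer is then rounded up to the next block boundary.
rounded = -(-after_record // BLOCK) * BLOCK

print(hex(after_record), hex(rounded))  # 0x2014612c 0x20146200
```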
<p><strong>8.</strong> Dump the log file. The first redo record contains 53 change vectors and is 35692 (=0x8b6c) bytes long. Most of the following redo records contain only two change vectors.</p>
<pre class="brush: sql; title: ; notranslate">
alter system dump logfile '/home/u02/app/oracle/product/11.1.0/oradata/SIDGD/SIDGD_redo_02a.log';

REDO RECORD - Thread:1 RBA: 0x0004be.00000002.0010 LEN: 0x8b6c VLD: 0x0d
SCN: 0x0001.cf5dade9 SUBSCN:  1 02/29/2012 19:28:03
CHANGE #1 TYP:1 CLS: 1 AFN:1 DBA:0x00407d2a OBJ:63587 SCN:0x0001.cf5dade8 SEQ:  1 OP:13.5
CHANGE #2 TYP:0 CLS: 1 AFN:1 DBA:0x00407d2a OBJ:63587 SCN:0x0001.cf5dade9 SEQ:  1 OP:13.6
CHANGE #3 TYP:0 CLS: 1 AFN:1 DBA:0x00407d2a OBJ:63587 SCN:0x0001.cf5dade9 SEQ:  2 OP:13.6
....
CHANGE #50 TYP:0 CLS:24 AFN:2 DBA:0x0080e2b4 OBJ:4294967295 SCN:0x0001.cf5dade9 SEQ:  4 OP:5.1
CHANGE #51 TYP:0 CLS:24 AFN:2 DBA:0x0080e2b4 OBJ:4294967295 SCN:0x0001.cf5dade9 SEQ:  5 OP:5.1
CHANGE #52 TYP:0 CLS:24 AFN:2 DBA:0x0080e2b4 OBJ:4294967295 SCN:0x0001.cf5dade9 SEQ:  6 OP:5.1
CHANGE #53 TYP:0 CLS:24 AFN:2 DBA:0x0080e2b4 OBJ:4294967295 SCN:0x0001.cf5dade9 SEQ:  7 OP:5.1
</pre>
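<p>The header of the first redo record can be decoded by hand. The RBA has the form log sequence.block number.byte offset, all in hexadecimal, and LEN is the record length in hexadecimal. The small parser below is a sketch for this one header line only. Its regular expression is an assumption, not a documented dump grammar.</p>

```python
import re

header = ("REDO RECORD - Thread:1 RBA: 0x0004be.00000002.0010 "
          "LEN: 0x8b6c VLD: 0x0d")

pattern = r"RBA: 0x([0-9a-f]+)\.([0-9a-f]+)\.([0-9a-f]+) LEN: 0x([0-9a-f]+)"
m = re.search(pattern, header)
seq, block, offset, length = (int(g, 16) for g in m.groups())

print(f"log sequence {seq}, block {block}, offset {offset}, length {length} bytes")
# log sequence 1214, block 2, offset 16, length 35692 bytes
```

<p>The byte offset of 16 is consistent with the record starting right after a 16-byte block header. If redo size counted exactly the record lengths (an assumption), the second redo record from step 5 would be 39944 - 35692 = 4252 bytes.</p>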
]]></content:encoded>
			<wfw:commentRss>http://sid.gd/in-memory-undo/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Side Effect Of Optimizer_capture_sql_plan_baselines</title>
		<link>http://sid.gd/side-effect-of-optimizer_capture_sql_plan_baselines/</link>
		<comments>http://sid.gd/side-effect-of-optimizer_capture_sql_plan_baselines/#comments</comments>
		<pubDate>Tue, 21 Feb 2012 06:32:39 +0000</pubDate>
		<dc:creator>Sidney Chen</dc:creator>
				<category><![CDATA[Oracle]]></category>
		<category><![CDATA[Optimizer_capture_sql_plan_baselines]]></category>
		<category><![CDATA[sqlobj$data]]></category>
		<category><![CDATA[sysaux]]></category>

		<guid isPermaLink="false">http://sid.gd/?p=1481</guid>
		<description><![CDATA[I received a report that users could not log in to a test DB. The error message indicated that SYSAUX was full. The SYSAUX tablespace is 4 GB, and the largest segment, the LOB segment SYS_LOB0000164261C00005$$, consumes 1.8 GB. System-generated names for LOB segments default to &#8230; <a href="http://sid.gd/side-effect-of-optimizer_capture_sql_plan_baselines/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I received a report that users could not log in to a test DB. The error message indicated that SYSAUX was full. The SYSAUX tablespace is 4 GB, and the largest segment, the LOB segment SYS_LOB0000164261C00005$$, consumes 1.8 GB. System-generated names for LOB segments have the format:</p>
<p>SYS_LOB {10 digit object_id} C {5 digit col#} $$<br />
<span id="more-1481"></span><br />
We can extract the object_id 164261 from the segment name SYS_LOB0000164261C00005$$. A query on dba_objects shows that the LOB segment belongs to the table SYS.SQLOBJ$DATA.</p>
<pre class="brush: sql; title: ; notranslate">
testsupp@testPRD&gt; @df SYSAUX

TABLESPACE_NAME         TotalMB     UsedMB     FreeMB % Used
-------------------- ---------- ---------- ---------- ------
SYSAUX                     4072       4072          0   100%
                     ---------- ----------
                           4072       4072

testsupp@testPRD&gt; @topseg sysaux

TABLESPACE_NAME      OWNER   SEGMENT_NAME               SEGMENT_TYPE      MB
-------------------- ------- -------------------------- ------------- ------
SYSAUX               SYS     SYS_LOB0000164261C00005$$  LOBSEGMENT      1855
SYSAUX               SYS     WRH$_SQL_PLAN              TABLE            363
SYSAUX               SYS     SYS_LOB0000008981C00004$$  LOBSEGMENT       341
....

50 rows selected.

testsupp@testPRD&gt; @o2 164261

OWNER OBJECT_TYPE  OBJECT_ID DATA_OBJECT_ID OBJECT_NAME  STATUS LAST_DDL_TIME
----- ----------- ---------- -------------- ------------ ------ -------------------
SYS   TABLE           164261                SQLOBJ$DATA  VALID  2011-08-07 19:51:30

testsupp@testPRD&gt; @lob sys.SQLOBJ$DATA

TABLE_NAME   COLUMN_NAME SEGMENT_NAME              TABLESP INDEX_NAME
------------ ----------- ------------------------- ------- ------------------------
SQLOBJ$DATA  SPARE2      SYS_LOB0000164261C00007$$ SYSAUX  SYS_IL0000164261C00007$$
SQLOBJ$DATA  COMP_DATA   SYS_LOB0000164261C00005$$ SYSAUX  SYS_IL0000164261C00005$$

sys@CS11GR2&gt; @ddl sys.SQLOBJ$DATA

CREATE TABLE SYS.SQLOBJ$DATA
(    SIGNATURE NUMBER,
     CATEGORY VARCHAR2(30),
     OBJ_TYPE NUMBER,
     PLAN_ID NUMBER,
     COMP_DATA CLOB NOT NULL ENABLE,
     SPARE1 NUMBER,
     SPARE2 CLOB,
     CONSTRAINT SQLOBJ$DATA_PKEY PRIMARY KEY (SIGNATURE, CATEGORY, OBJ_TYPE, PLAN_ID) ENABLE
) ORGANIZATION INDEX ......

1 row selected.
</pre>
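<p>The manual extraction of the object_id can be sketched as a tiny parser for the SYS_LOB{10-digit object_id}C{5-digit col#}$$ format described above. The function name and the regular expression are illustrative assumptions. For this segment the parsed column number 5 matches COMP_DATA, the fifth column in the DDL above.</p>

```python
import re

LOB_NAME = re.compile(r"^SYS_LOB(\d{10})C(\d{5})\$\$$")


def parse_lob_segment(name: str) -> tuple:
    """Return (object_id, column#) from a system-generated LOB segment name."""
    m = LOB_NAME.match(name)
    if m is None:
        raise ValueError(f"not a system-generated LOB segment name: {name}")
    return int(m.group(1)), int(m.group(2))


print(parse_lob_segment("SYS_LOB0000164261C00005$$"))  # (164261, 5)
```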
<p>Searching MOS for SQLOBJ$DATA, I found bug 9910484, which suggests that the size of SQLOBJ$DATA is related to SQL plan baseline capture. A query on v$sql reveals an excessive number of executions of SQL 1vxm21mhmgy07, which merges baseline data into SQLOBJ$DATA. After enabling a 10046 trace and running a few queries, I saw that 1vxm21mhmgy07 is issued during every hard parse. This explains both the excessive executions of 1vxm21mhmgy07 and the size of the LOB segment SYS_LOB0000164261C00005$$. The production DB does not have this issue because optimizer_capture_sql_plan_baselines is set to false there.</p>
<p><a href="https://support.oracle.com/CSP/main/article?cmd=show&#038;type=NOT&#038;doctype=PROBLEM&#038;id=1304775.1" target="_blank">BUG:9910484 &#8211; UNNECESSARY UPDATES ON SQLOBJ$DATA CAUSING OBJECT AND TABLESPACE (SYSAUX) GROWTH</a></p>
<pre class="brush: sql; title: ; notranslate">
testsupp@testPRD&gt; show parameter baseline

NAME                                 TYPE     VALUE
------------------------------------ -------- ------
optimizer_capture_sql_plan_baselines boolean  TRUE
optimizer_use_sql_plan_baselines     boolean  TRUE

testsupp@testPRD&gt; @sqlt SQLOBJ$DATA

SQL_ID        CH#       PLAN       EXEC  rows/exec ela_tm(cs)/exec  gets/exec reads/exec
------------- --- ---------- ---------- ---------- --------------- ---------- ----------
7xa8wfych4mad   0 2615480013          1          1              13          0          0
1vxm21mhmgy07   0 3193071292         24          1               5         28          0
1vxm21mhmgy07   1 3193071292     146622          1               0         39          0
...

29 rows selected.

==============================================================================
The many child cursors of 1vxm21mhmgy07 are due to multiple users connecting to the DB.
==============================================================================
testsupp@testPRD&gt; @sql 1vxm21mhmgy07

SQL_ID        CH#       PLAN       EXEC  rows/exec ela_tm(cs)/exec  gets/exec reads/ex
------------- --- ---------- ---------- ---------- --------------- ---------- ----------
1vxm21mhmgy07   1 3193071292     146622          1               0         39          0
1vxm21mhmgy07   4 3193071292     130269          0               1        190          0
1vxm21mhmgy07  11 3960238002         19          1               1         48          1
1vxm21mhmgy07   9 3960238002       9082          1               1         67          1
1vxm21mhmgy07   2 3193071292         23          1               2         42          0
1vxm21mhmgy07   6 3960238002      11485          0               2         92          1
1vxm21mhmgy07  10 3960238002         23          1               2         49          1
1vxm21mhmgy07   8 3960238002       4435          0               2         79          1
1vxm21mhmgy07   7 3960238002          4          1               2         55          1
1vxm21mhmgy07   0 3193071292         24          1               5         28          0
1vxm21mhmgy07  12 3960238002          8          1               9        116          3
1vxm21mhmgy07   3 3193071292         95          1              17         23          0
1vxm21mhmgy07   5 3960238002          2          1              46        109          2

13 rows selected.

testsupp@testPRD&gt; @sqlf 1vxm21mhmgy07

SQL_FULLTEXT
----------------------------------------------------------------------------------------------------
MERGE INTO sqlobj$data USING dual ON (:1 IS NULL) WHEN MATCHED THEN UPDATE SET
  comp_data = :2
WHERE signature = :3 AND category = :4 AND obj_type = :5 AND plan_id = :6 WHEN
  NOT MATCHED THEN INSERT (signature, category, obj_type, plan_id, comp_data,
  spare1, spare2) VALUES (:7, :8, :9, :10, :11, null, null)

==========================
From the 10046 trace file
==========================

PARSING IN CURSOR #3063571832 len=901 dep=1 uid=0 oct=189 lid=0 tim=1329804200425559 hv=3778541575 ad='2f34fdc8' sqlid='1vxm21mhmgy07'
MERGE INTO sqlobj$data ...
</pre>
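<p>To put "excessive executions" into numbers, the EXEC column of the @sql output can be summed over all child cursors. This is only arithmetic on the output above. v$sql only covers cursors still in the shared pool when the query ran, so the true total may well be higher.</p>

```python
# EXEC column of "@sql 1vxm21mhmgy07" above, keyed by child number (CH#).
executions = {
    1: 146622, 4: 130269, 11: 19, 9: 9082, 2: 23, 6: 11485, 10: 23,
    8: 4435, 7: 4, 0: 24, 12: 8, 3: 95, 5: 2,
}

total = sum(executions.values())
busiest = max(executions, key=executions.get)
print(f"{len(executions)} child cursors, {total:,} executions in total")
print(f"child {busiest} alone: {executions[busiest] / total:.0%}")
```

<p>About 302 thousand MERGE executions against SQLOBJ$DATA are consistent with a merge on every hard parse.</p>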
<p><strong>Solution</strong><br />
1.	Disable the baseline capture.</p>
<pre class="brush: sql; title: ; notranslate">
alter system set optimizer_capture_sql_plan_baselines=false scope=both;
</pre>
<p>2.	Extend SYSAUX to 5 GB.</p>
]]></content:encoded>
			<wfw:commentRss>http://sid.gd/side-effect-of-optimizer_capture_sql_plan_baselines/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Small Tables</title>
		<link>http://sid.gd/small-tables/</link>
		<comments>http://sid.gd/small-tables/#comments</comments>
		<pubDate>Thu, 16 Feb 2012 14:00:11 +0000</pubDate>
		<dc:creator>Sidney Chen</dc:creator>
				<category><![CDATA[Oracle]]></category>
		<category><![CDATA[buffer_pool]]></category>
		<category><![CDATA[direct]]></category>
		<category><![CDATA[tablescan]]></category>
		<category><![CDATA[_small_table_threshhold]]></category>

		<guid isPermaLink="false">http://sid.gd/?p=1465</guid>
		<description><![CDATA[Jonathan Lewis wrote a section about Oracle&#8217;s behavior for tablescans and index fast full scans in his new book “Oracle Core”. Why does Oracle need to treat tablescans so carefully? If you scan a large object, you could &#8230; <a href="http://sid.gd/small-tables/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Jonathan Lewis wrote a section about Oracle&#8217;s behavior for tablescans and index fast full scans in his new book “Oracle Core”.</p>
<p>Why does Oracle need to treat tablescans so carefully?</p>
<blockquote><p>
If you scan a large object, you could flush a huge amount of useful data from the cache, data that you may then have to reread very promptly. Tablescans shouldn’t really happen often in OLTP systems, and when they do happen, you need to ensure that they don’t cause problems. The potential for performance problems relating to tablescans resulted in Oracle Corp. writing code to distinguish between “short” and “long” tables (and, recently, “medium” tables, although there are no statistics collected to record that particular option).</p></blockquote>
<p><span id="more-1465"></span><br />
For the short tables [1, 2]: </p>
<blockquote><p>
There is still a 2 percent limit (dictated by the hidden parameter _small_table_threshold) and the blocks are still loaded into the data cache in the normal way, but the touch count is incremented for these buffers.
</p></blockquote>
<p>For medium tables (2 to 10 percent of the buffer cache):</p>
<blockquote><p>
The second case has a 10 percent limit (though I don’t know how it’s set) where the tablescan is initially considered to be a long tablescan, so touch counts will not be incremented as the blocks are read into buffers and (most of) the buffers will immediately be moved to REPL_AUX; however, if you repeat the tablescan while the blocks are still in the data cache, the touch count on the buffers will be incremented at that point and the tablescan will be reported as a short tablescan.
</p></blockquote>
<p>For long tables (more than 10 percent of the buffer cache):</p>
<blockquote><p>
The final case has a limit of 25 percent where the touch count is never incremented and the buffers are cycled to REPL_AUX very quickly—basically you do a multiblock read pulling a few buffers from REPL_AUX, you do the next multiblock read pulling more buffers from REPL_AUX but pushing the first batch of tablescanned buffers into REPL_AUX, and keep repeating this cycle. For a very large table you will end up with the entire REPL_AUX loaded with blocks from that table, and a small number of blocks from the table in REPL_MAIN. Thus, Oracle protects a very large fraction of your data cache from overaggressive tablescans.
</p></blockquote>
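<p>Taken at face value, the three quoted cases amount to a simple decision on table size as a percentage of the buffer cache. The sketch below encodes only that summary, using the 2 and 10 percent boundaries above. The function name, the inclusive boundaries and the wording of the results are assumptions. The quoted 25 percent limit is not modeled. The real behavior also depends on whether the buffer pool is already full, as the experiment in this post shows.</p>

```python
def classify_scan(size_pct: float, repeated_while_cached: bool = False) -> str:
    """Summarize the quoted short/medium/long tablescan rules.

    size_pct: table size as a percentage of the buffer cache.
    repeated_while_cached: True if the blocks are still cached from an earlier scan.
    """
    if size_pct > 10:
        return "long: touch count never incremented, buffers cycled via REPL_AUX"
    if size_pct > 2:
        if repeated_while_cached:
            return "medium, repeated: touch counts incremented, reported as short"
        return "medium, first scan: treated as long, buffers moved to REPL_AUX"
    return "short: loaded normally, touch count incremented"


for pct in (2, 5, 25):
    print(pct, classify_scan(pct))
print(5, classify_scan(5, repeated_while_cached=True))
```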
<p>A month ago I tried to set up a test case on 10.2.0.4 to verify Jonathan's description, but the result did not match the book. After I gave <a href="http://jonathanlewis.wordpress.com/oracle-core/oc-5-caches-and-copies/#comment-43840" target="_blank">feedback</a> to Jonathan, we found the reason: I had not filled up the buffer pool in advance. In that case the threshold for short tables is 10 percent rather than 2 percent. To simulate normal behavior, the buffer pool needs to be filled up first. We cannot simply fill it with a big tablescan because, as the quote above explains, a long tablescan cycles its buffers through REPL_AUX instead of filling the whole cache. A convenient way is an index scan on a big table. Jonathan has already designed such a test case in <a href="http://jonathanlewis.wordpress.com/2011/03/24/small-tables/" target="_blank">his blog post on small tables</a>. I decided to repeat it on 11.2.0.2.0, Linux 32-bit, and I think the results are worth a blog post.</p>
<p>Here are the steps to verify it:<br />
1. Create tables of 2, 5, 10, 25, 50 and 100 percent of the buffer pool, and gather statistics on each table.<br />
2. Flush the buffer pool.<br />
3. Fill up the buffer pool with an index scan on table t[100], which reads blocks one at a time (“db file sequential read”).<br />
4. Scan table t[2] 3 times.<br />
5. Scan table t[5] 3 times.<br />
6. Scan table t[10] 3 times.<br />
7. Scan table t[25] 3 times.<br />
8. Scan table t[50] 3 times.</p>
<p>After every tablescan, sleep 4 seconds so that the touch count (TCH) can be incremented (a buffer's touch count is incremented at most once every 3 seconds). Then check how many buffers hold blocks of the table and what their TCH is. We monitor the following session statistics:<br />
physical reads<br />
physical reads cache<br />
physical reads direct<br />
table scans (long tables)<br />
table scans (direct read)<br />
table scans (short tables)</p>
<p>Here is the step by step log.</p>
<p>1. Create tables of 2, 5, 10, 25, 50 and 100 percent of the buffer pool, and gather statistics on each table. By default, _small_table_threshold is 2 percent of the buffer pool.</p>
<pre class="brush: sql; title: ; notranslate">
select
	buffers,
	buffers/50 &quot;2_percent&quot;,
	buffers/20 &quot;5_percent&quot;,
	buffers/10 &quot;10_percent&quot;,
	buffers/4 &quot;25_percent&quot;,
	buffers/2 &quot;50_percent&quot;
from
	v$buffer_pool;

   BUFFERS  2_percent  5_percent 10_percent 25_percent 50_percent
---------- ---------- ---------- ---------- ---------- ----------
     14880      297.6        744       1488       3720       7440

sys@CS11GR2&gt; @pd small_table

NAME                    VALUE DESCRIPTION
----------------------- ----- -------------------------------------------------------
_small_table_threshold  297   lower threshold level of table size for direct reads

============================
Create 6 tables
============================
create table t_297
pctfree 99
pctused 1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		rownum &lt;= 10000
)
select
	rownum			id,
	lpad(rownum,10,'0')	small_vc,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum &lt;= 297
;

create table t_744  ...
create table t_1488  ...
create table t_3720  ...
create table t_7440  ...
create table t_14880  ... 

create index t_14880_id on t_14880(id);

===================================
Gather the statistics for 6 tables
===================================

begin
	dbms_stats.gather_table_stats(
		ownname		 =&gt; user,
		tabname		 =&gt;'T_297',
		estimate_percent =&gt; 100,
		method_opt 	 =&gt; 'for all columns size 1'
	);
end;
/

.....

===================================
Query the data_object_id
===================================
select
	object_name, object_id, data_object_id
from
	user_objects
where
	object_name in  (
		'T_297',
		'T_744',
		'T_1488',
		'T_3720',
		'T_7440',
		'T_14880',
		'T_14880_ID'
	)
order by
	object_id
;

OBJECT_NAME                     OBJECT_ID DATA_OBJECT_ID
------------------------------ ---------- --------------
T_297                               87014          87014
T_744                               87015          87015
T_1488                              87016          87016
T_3720                              87017          87017
T_7440                              87018          87018
T_14880                             87019          87019
T_14880_ID                          87020          87020

7 rows selected.
</pre>
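<p>The table sizes follow directly from the BUFFERS value of v$buffer_pool. The arithmetic below reproduces them. Two readings are assumptions here. One is that pctfree 99 is meant to put one row in each block, so the row count equals the block count. The other is that the reported _small_table_threshold of 297 is simply 2 percent truncated to an integer.</p>

```python
buffers = 14880  # BUFFERS from v$buffer_pool

sizes = {pct: buffers * pct / 100 for pct in (2, 5, 10, 25, 50, 100)}
for pct, blocks in sizes.items():
    print(f"{pct:3d}% of the cache = {blocks:g} blocks")

# _small_table_threshold was reported as 297: 2 percent, apparently truncated
assert int(sizes[2]) == 297

table_names = [f"T_{int(blocks)}" for blocks in sizes.values()]
print(table_names)
```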
<p>2. Flush the buffer pool.</p>
<pre class="brush: sql; title: ; notranslate">
sid@CS11GR2&gt; alter system flush buffer_cache;

System altered.
</pre>
<p>3. Fill up the buffer pool with an index scan of table T_14880. Although the query was run 3 times, the TCH is not incremented.</p>
<pre class="brush: sql; title: ; notranslate">
sid@CS11GR2&gt; set autotrace on
sid@CS11GR2&gt; select
  2  	     /*+ index(t) */
  3  	     max(small_vc)
  4  from
  5  	     t_14880 t
  6  where
  7  	     id &gt; 0
  8  ;

MAX(SMALL_VC)
----------------------------------------
0000014880

Execution Plan
----------------------------------------------------------
Plan hash value: 2029532131

-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |     1 |    16 | 14919   (1)| 00:03:00 |
|   1 |  SORT AGGREGATE              |            |     1 |    16 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T_14880    | 14880 |   232K| 14919   (1)| 00:03:00 |
|*  3 |    INDEX RANGE SCAN          | T_14880_ID | 14880 |       |    33   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access(&quot;ID&quot;&gt;0)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      14913  consistent gets
      14915  physical reads
          0  redo size
        435  bytes sent via SQL*Net to client
        420  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

sid@CS11GR2&gt; select
  2  	     obj, tch, count(*)
  3  from    x$_bh
  4  where
  5  	     obj between 87014 and 87020
  6  group by
  7  	     obj, tch
  8  order by
  9  	     count(*)
 10  ;

       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     87020          1         32
     87019          1      14844
</pre>
<p>Each tablescan is wrapped with snap_my_stats to capture a snapshot of the session statistics. The scans are separated by 4-second sleeps, and the content of the buffer pool is queried after every 3 scans.</p>
<pre class="brush: sql; title: ; notranslate">
exec snap_my_stats.start_snap;
select
	max(small_vc)
from
	t_297 t;
exec snap_my_stats.end_snap;

exec dbms_lock.sleep(4);

exec snap_my_stats.start_snap;
select
	max(small_vc)
from
	t_297 t;
exec snap_my_stats.end_snap;

exec dbms_lock.sleep(4);

exec snap_my_stats.start_snap;
select
	max(small_vc)
from
	t_297 t;
exec snap_my_stats.end_snap;

select
	obj, tch, count(*)
from	x$_bh
where
	obj between 87014 and 87020
group by
	obj, tch
order by
	count(*);
</pre>
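<p>The 4-second sleeps matter because, as far as I understand it, Oracle increments a buffer's touch count at most once every _db_aging_touch_time seconds. This is a hidden parameter, 3 by default. The same rule would also explain why the three quick runs in step 3 left the TCH at 1. Below is a simplified Python model of that rule, assuming the default interval. It is an illustration only, not Oracle's code, and it ignores everything else that happens to a buffer.</p>

```python
# Simplified model of the touch-count (TCH) rule, for illustration only.
# ASSUMPTION: a touch is only counted if at least _db_aging_touch_time
# seconds (default 3) have passed since the last counted touch.
AGING_TOUCH_TIME = 3.0


class Buffer:
    def __init__(self, loaded_at):
        # Reading the block into the cache counts as the first touch (TCH = 1).
        self.tch = 1
        self.last_counted = loaded_at

    def touch(self, now):
        """Increment TCH only if enough time has passed since the last counted touch."""
        if now - self.last_counted >= AGING_TOUCH_TIME:
            self.tch += 1
            self.last_counted = now


def scan(times):
    """Load a buffer at times[0], touch it at each later time, return its TCH."""
    buf = Buffer(times[0])
    for t in times[1:]:
        buf.touch(t)
    return buf.tch


if __name__ == "__main__":
    print(scan([0.0, 0.5, 1.0]))  # three quick runs, as in step 3: prints 1
    print(scan([0.0, 4.5, 9.0]))  # scans separated by 4-second sleeps: prints 3
```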
<p>4. Scan table T_297 3 times (2 percent of the buffer cache). All the blocks are cached, and the TCH is incremented on every scan.</p>
<pre class="brush: sql; title: ; notranslate">
Name                              Value
----                              -----
physical reads                      311
physical reads cache                311
table scans (short tables)            1

Name                              Value
----                              -----
table scans (short tables)            1

Name                              Value
----                              -----
table scans (short tables)            1

       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     87020          1         32
     87014          3        298	&lt;-- the blocks of T_297
     87019          1      14479
</pre>
<p>5. Scan table T_744 3 times (5 percent). The first scan is a direct path read, counted as a long table scan. The following two scans are handled as short table scans. After 3 scans, the TCH of the segment header is 3 and the TCH of the data blocks is 2.</p>
<pre class="brush: sql; title: ; notranslate">
Name                              Value
----                              -----
physical reads                      745
physical reads cache                  1
physical reads direct               744
table scans (long tables)             1
table scans (direct read)             1

Name                              Value
----                              -----
physical reads                      744
physical reads cache                744
table scans (short tables)            1

Name                              Value
----                              -----
table scans (short tables)            1

       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     87015          3          1	&lt;-- the segment header of T_744
     87020          1         30
     87014          3        298
     87015          2        744	&lt;-- the data blocks of T_744
     87019          1      13736
</pre>
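<p>The scan type can be read directly from the session statistics. A scan counted as "table scans (direct read)", with "physical reads direct", bypassed the cache. A scan counted as "table scans (short tables)" went through the cache. The small Python helper below applies that reading to snapshot text in the format shown above. It is a convenience sketch that assumes exactly this "Name ... Value" layout; it is not part of snap_my_stats.</p>

```python
import re

# A statistic line is the statistic name, some spaces, then the value,
# which may contain thousands separators (for example "1,488").
STAT_LINE = re.compile(r"^(\S.*?)\s+([\d,]+)\s*$")


def parse_snapshots(text):
    """Split snap_my_stats-style output into one dict of statistics per scan."""
    snaps, current = [], None
    for line in text.splitlines():
        if line.startswith("Name"):
            # Each snapshot starts with a "Name ... Value" header line.
            current = {}
            snaps.append(current)
            continue
        match = STAT_LINE.match(line)
        if current is not None and match:
            current[match.group(1)] = int(match.group(2).replace(",", ""))
    return snaps


def scan_kind(stats):
    """Classify one scan: 'direct' bypassed the cache, 'cached' went through it."""
    if stats.get("table scans (direct read)", 0) or stats.get("physical reads direct", 0):
        return "direct"
    if stats.get("table scans (short tables)", 0):
        return "cached"
    return "unknown"


# The three snapshots of step 5 (table T_744), copied from the output above.
SAMPLE = """\
Name                              Value
----                              -----
physical reads                      745
physical reads cache                  1
physical reads direct               744
table scans (long tables)             1
table scans (direct read)             1

Name                              Value
----                              -----
physical reads                      744
physical reads cache                744
table scans (short tables)            1

Name                              Value
----                              -----
table scans (short tables)            1
"""

if __name__ == "__main__":
    print([scan_kind(s) for s in parse_snapshots(SAMPLE)])  # ['direct', 'cached', 'cached']
```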
<p>6. Scan table T_1488 3 times (10 percent). All 3 scans are direct path reads, counted as long table scans. Apart from the segment header block, no blocks are cached.</p>
<pre class="brush: sql; title: ; notranslate">
Name                              Value
----                              -----
physical reads                    1,490
physical reads cache                  2
physical reads direct             1,488
table scans (long tables)             1
table scans (direct read)             1

Name                              Value
----                              -----
physical reads                    1,488
physical reads direct             1,488
table scans (long tables)             1
table scans (direct read)             1

Name                              Value
----                              -----
physical reads                    1,488
physical reads direct             1,488
table scans (long tables)             1
table scans (direct read)             1

       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     87015          3          1
     87016          3          1	&lt;-- the segment header of T_1488
     87020          1         30
     87014          3        298
     87015          2        744
     87019          1      13734
</pre>
<p>7. From 10 percent upward the behavior is the same: the statistics for tables T_3720 and T_7440 follow the same pattern as T_1488.</p>
<p>So the result matches the description in the book. Oracle makes the decision based on the table statistics, in this case the number of blocks. If NUMBLKS for T_297 is set manually to 744 or 1488 with dbms_stats.set_table_stats, tablescans of T_297 behave the same as those of T_744 and T_1488. This is another reason why DBAs need to keep a close eye on statistics.</p>
]]></content:encoded>
			<wfw:commentRss>http://sid.gd/small-tables/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Latch Level</title>
		<link>http://sid.gd/latch-level/</link>
		<comments>http://sid.gd/latch-level/#comments</comments>
		<pubDate>Tue, 14 Feb 2012 12:50:17 +0000</pubDate>
		<dc:creator>Sidney Chen</dc:creator>
				<category><![CDATA[Oracle]]></category>
		<category><![CDATA[cache buffers chains]]></category>
		<category><![CDATA[cache buffers lru chain]]></category>
		<category><![CDATA[latch]]></category>
		<category><![CDATA[level#]]></category>

		<guid isPermaLink="false">http://sid.gd/?p=1436</guid>
		<description><![CDATA[Cuihua pointed out a mistake in the description of latch levels in Jonathan Lewis&#8217;s new book “Oracle Core”; here is his blog post. The passage is on page 116, in the section “Loading a Hash Chain” of Chapter 5. The cache buffers &#8230; <a href="http://sid.gd/latch-level/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.dbsnake.net" target="_blank">Cuihua</a> pointed out a mistake in the description of latch levels in Jonathan Lewis&#8217;s new book “Oracle Core”; <a href="http://www.dbsnake.net/jonathan-lewis-latch-level-mistake.html" target="_blank">here</a> is his blog post. The passage is on page 116, in the section “Loading a Hash Chain” of Chapter 5.</p>
<blockquote>
<pre class="brush: sql; title: ; notranslate">
SQL&gt; select     name, level#
  2  from       v$latch
  3  where      name in ('cache buffers lru chain','cache buffers chains')
  4  /

NAME                               LEVEL#
-------------------------------    ------
cache buffers lru chain                 2
cache buffers chains                    1

2 rows selected.</pre>
<p>The cache buffers chains latch has a lower level than the cache buffers lru chain latch, so we can&#8217;t request the cache buffers lru chain latch in willing-to-wait mode if we&#8217;re already holding the cache buffers chains latch. Think about what this means: we&#8217;re holding the cache buffers chains latch (which I will call the hash latch for the rest of this subsection) because we&#8217;ve just searched the hash chain for a buffer and discovered that, for whatever reason, we need to add another buffer to the chain. So we have to acquire the cache buffers lru chain latch (which I will call the lru latch for the rest of this subsection) to move a buffer from the REPL_AUX list to the midpoint of the REPL_MAIN list; but we can&#8217;t request it in willing-to-wait mode because we&#8217;re already holding a lower-level latch.</p>
<p>&#8230;<br />
But if you can&#8217;t get the lru latch with an immediate get, you have to drop the hash latch, get the lru latch, and then get the hash latch again.<br />
&#8230;
</p></blockquote>
<p><span id="more-1436"></span><br />
(In this post I&#8217;ll refer to “cache buffers lru chain” as the lru latch and to “cache buffers chains” as the hash latch.)<br />
To avoid latch deadlocks, Oracle acquires latches in ascending order of level#. While holding the hash latch (level#=1), Oracle can acquire the lru latch (level#=2) in willing-to-wait mode. A similar sequence can be observed with the “library cache” (level#=5) and “shared pool” (level#=7) latches in 10g. <a href="http://www.dbsnake.net" target="_blank">Cuihua</a> used the method below to verify that Oracle does in fact request the lru latch while holding the hash latch, contrary to what Jonathan suggests.<br />
Open 3 sessions:<br />
1. Session 1 holds all the cache buffers lru chain latches.<br />
2. Once session 1 holds all the lru latches, session 2 issues an update, which hangs. The update has to do either a physical read or a switch current to new buffer, which guarantees that session 2 needs an lru latch.<br />
3. Session 3 dumps the process state of session 2. The trace file shows which latches session 2 is holding and waiting for.</p>
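<p>The ordering rule, in the corrected form quoted from Jonathan&#8217;s errata at the end of this post, can be written as a small Python model. A willing-to-wait get is refused if the process already holds a higher-level latch. Immediate (no-wait) gets are assumed here not to be restricted by level. This is a deliberately simplified sketch, not Oracle&#8217;s implementation. In particular, the ORA-00600 [526] later in this post shows that gets on sibling child latches of the same level follow an additional ordering rule, which this model does not cover.</p>

```python
class LatchLevelError(Exception):
    """Raised when a willing-to-wait get would break the level ordering."""


class Process:
    def __init__(self):
        self.held = []  # (name, level) pairs currently held, in acquisition order

    def get(self, name, level, wait=True):
        # Willing-to-wait gets must not be made while holding a higher-level
        # latch; otherwise two processes could end up waiting on each other.
        # ASSUMPTION: immediate (no-wait) gets skip this check, because they
        # never sleep and so cannot take part in a deadlock.
        if wait:
            higher = [n for n, lvl in self.held if lvl > level]
            if higher:
                raise LatchLevelError(f"{name} (level {level}) requested while holding {higher}")
        self.held.append((name, level))

    def free(self, name):
        self.held = [(n, lvl) for n, lvl in self.held if n != name]


if __name__ == "__main__":
    p = Process()
    p.get("cache buffers chains", 1)     # the hash latch
    p.get("cache buffers lru chain", 2)  # allowed: ascending level, as in the dump
    print(p.held)

    q = Process()
    q.get("cache buffers lru chain", 2)
    try:
        q.get("cache buffers chains", 1)  # willing-to-wait get below a held level
    except LatchLevelError as exc:
        print("refused:", exc)
    q.get("cache buffers chains", 1, wait=False)  # an immediate get is allowed
```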
<p>In <a href="http://www.dbsnake.net" target="_blank">Cuihua</a>&#8217;s dump file, session 2 is waiting for the lru latch while also holding the hash latch: Oracle did not drop the hash latch before requesting the lru latch.</p>
<pre class="brush: sql; title: ; notranslate">
waiting for 3493da3c Child cache buffers lru chain level=2 child#=11
holding    (efd=23) 34848c10 Child cache buffers chains level=1 child#=2313</pre>
<p>There are two methods to hold a latch manually:<br />
1. oradebug poke &lt;latch_addr&gt; 4 1, which writes the value 1 into the first 4 bytes at the latch address;<br />
2. oradebug call kslgetl &lt;latch_addr&gt; 1, which calls the latch get function directly.</p>
<p><a href="http://www.dbsnake.net" target="_blank">Cuihua</a> use the oradebug poke method in his blog. The kslgetl call method will increase the latch statistics, just as normal latch activity. While oradebug poke is more like backdoor hacking, it changes the value in memory directly, holds the latch silently, and does not increase the latch gets. The difference prompt me to repeat <a href="http://www.dbsnake.net" target="_blank">Cuihua</a>&#8216;s test case, using the kslgetl call method, to check if there is any variation.</p>
<p>Here is a small test on the two idle lru latches to show the difference. 0x39982AF8 and 0x39982F98 are picked up for testing. Only the gets of latch 0x39982AF8 moved from 0 to 1. The testing is done on 11.2.0.2.0 on Linux 32 bit.</p>
<pre class="brush: sql; title: ; notranslate">
sys@CS11GR2&gt; select * from (select addr,to_number(addr,'XXXXXXXXXXXX') addr_dec,gets,misses,immediate_gets,immediate_misses
	from v$latch_children where name = 'cache buffers lru chain' order by addr asc) where rownum&lt;5;

ADDR       ADDR_DEC       GETS     MISSES IMMEDIATE_GETS IMMEDIATE_MISSES
-------- ---------- ---------- ---------- -------------- ----------------
39982A74  966273652         36          0              0                0
39982AF8  966273784          0          0              0                0
39982F14  966274836         36          0              0                0
39982F98  966274968          0          0              0                0

sys@CS11GR2&gt; oradebug setmypid
Statement processed.
sys@CS11GR2&gt; oradebug peek 0x39982AF8 300
[39982AF8, 39982C24) = 00000000 00000000 00000096 00000002 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ...
sys@CS11GR2&gt; oradebug call kslgetl 966273784 1
Function returned 1
sys@CS11GR2&gt; oradebug peek 0x39982AF8 300
[39982AF8, 39982C24) = 00000011 00000001 00000096 00000002 00000001 BFBA01EF 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ...
sys@CS11GR2&gt; oradebug call kslfre 966273784 1
Function returned 0
sys@CS11GR2&gt; oradebug peek 0x39982AF8 300
[39982AF8, 39982C24) = 00000000 00000001 00000096 00000002 00000001 BFBA01EF 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ...
sys@CS11GR2&gt;
sys@CS11GR2&gt; oradebug peek 0x39982F98 300
[39982F98, 399830C4) = 00000000 00000000 00000096 00000002 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ...
sys@CS11GR2&gt; oradebug poke 0x39982F98 4 0x00000001
BEFORE: [39982F98, 39982F9C) = 00000000
AFTER:  [39982F98, 39982F9C) = 00000001
sys@CS11GR2&gt; oradebug peek 0x39982F98 300
[39982F98, 399830C4) = 00000001 00000000 00000096 00000002 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ...
sys@CS11GR2&gt; oradebug poke 0x39982F98 4 0x00000000
BEFORE: [39982F98, 39982F9C) = 00000001
AFTER:  [39982F98, 39982F9C) = 00000000
sys@CS11GR2&gt; oradebug peek 0x39982F98 300
[39982F98, 399830C4) = 00000000 00000000 00000096 00000002 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ...
sys@CS11GR2&gt;
sys@CS11GR2&gt; select * from (select addr,to_number(addr,'XXXXXXXXXXXX') addr_dec,gets,misses,immediate_gets,immediate_misses
		from v$latch_children where name = 'cache buffers lru chain' order by addr asc) where rownum&lt;5;

ADDR       ADDR_DEC       GETS     MISSES IMMEDIATE_GETS IMMEDIATE_MISSES
-------- ---------- ---------- ---------- -------------- ----------------
39982A74  966273652         36          0              0                0
39982AF8  966273784          1          0              0                0
39982F14  966274836         36          0              0                0
39982F98  966274968          0          0              0                0
</pre>
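<p>The peek output before and after each call shows where the difference lies. kslgetl writes 0x11 into the first word and raises the second word from 0 to 1, matching the change in GETS in v$latch_children. poke only sets the first word to 1. Matching the words against the process dump later in this post suggests a partial layout for this 11.2.0.2 32-bit build:</p>
<ul>
<li>First word: the holder&#8217;s Oracle pid (0x11 = 17, the dump&#8217;s “holder orapid=17”).</li>
<li>Second word: the willing-to-wait gets (0x502 = 1282, “gotten 1282 times wait”).</li>
<li>Fourth word: the latch level (2).</li>
<li>Sixth word: the “Context saved from call” (0xBFBA01EF = 3216638447).</li>
<li>Seventh word: the no-wait gets (0x1725 = 5925, “gotten 5925 times nowait”).</li>
</ul>
<p>This mapping is an inference from the data in this post, not a documented structure. The Python sketch below only applies it to a peek line.</p>

```python
# Decode one line of `oradebug peek` output using a word layout INFERRED from
# this post's data (11.2.0.2, Linux 32-bit). This is not a documented structure.
FIELDS = {
    0: "holder_or_value",   # holder's Oracle pid after kslgetl; 1 after poke
    1: "wait_gets",         # willing-to-wait gets
    3: "level",
    5: "context_from_call",
    6: "nowait_gets",
}


def decode_peek(line):
    """Return the inferred latch fields from a line like '[addr, end) = w0 w1 ...'."""
    words = line.split("=", 1)[1].split()
    words = [w for w in words if w != "..."]
    return {name: int(words[i], 16) for i, name in FIELDS.items()}


if __name__ == "__main__":
    held = ("[399DC524, 399DC650) = 00000011 00000502 00000096 00000002 "
            "00000001 BFBA01EF 00001725 00000000 ...")
    print(decode_peek(held))
```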
<p>The procedure for confirming that Oracle requests the lru latch while holding the hash latch is basically the same as Cuihua&#8217;s.<br />
<strong>Preparation.</strong><br />
1. Create a table to trigger a switch current to new buffer. When the update runs via a tablescan, the session has to move a buffer from the auxiliary replacement list to the main list, so it must acquire an lru latch first.<br />
2. Query the active lru latches.<br />
3. Get the spid of session 2.<br />
4. Prepare session 3 to do a process dump.</p>
<p><strong>Verification.</strong><br />
1. In session 1, hold the active lru latches.<br />
2. In session 2, issue an update on table T.<br />
3. In session 3, dump the process state of session 2.</p>
<p>Step-by-step output.<br />
<strong>Preparation:</strong><br />
1. Create a table to trigger a switch current to new buffer.</p>
<pre class="brush: sql; title: ; notranslate">
sys@CS11GR2&gt; drop table t purge;

Table dropped.

sys@CS11GR2&gt; create table t (id number, padding varchar2(1000));

Table created.

sys@CS11GR2&gt; insert into t values(1,lpad('1',1000,'0'));

1 row created.

sys@CS11GR2&gt; commit;

Commit complete.
</pre>
<p>2. Query the active lru latches from x$kcbwds. The two active lru latches are 0x399dc524 and 0x399dc9c4.</p>
<pre class="brush: sql; title: ; notranslate">
sys@CS11GR2&gt; select
  2  	     SET_ID,
  3  	     DBWR_NUM,
  4  	     SET_LATCH,
  5  	     CNUM_REPL,
  6  	     ANUM_REPL,
  7  	     'oradebug call kslgetl ' || to_number(SET_LATCH,'xxxxxxxxxxxx') || ' 1' get_statment
  8  from
  9  	     x$kcbwds;

    SET_ID   DBWR_NUM SET_LATC  CNUM_REPL  ANUM_REPL GET_STATMENT
---------- ---------- -------- ---------- ---------- ----------------------------------------------------------------
         1          0 39982A74          0          0 oradebug call kslgetl 966273652 1
         2          0 39982F14          0          0 oradebug call kslgetl 966274836 1
         3          0 399AF7CC          0          0 oradebug call kslgetl 966457292 1
         4          0 399AFC6C          0          0 oradebug call kslgetl 966458476 1
         5          0 399DC524       7440       3322 oradebug call kslgetl 966640932 1
         6          0 399DC9C4       7440       3341 oradebug call kslgetl 966642116 1
         7          0 39A0927C          0          0 oradebug call kslgetl 966824572 1
         8          0 39A0971C          0          0 oradebug call kslgetl 966825756 1
         9          0 39A35FD4          0          0 oradebug call kslgetl 967008212 1
        10          0 39A36474          0          0 oradebug call kslgetl 967009396 1
        11          0 39A62D2C          0          0 oradebug call kslgetl 967191852 1
        12          0 39A631CC          0          0 oradebug call kslgetl 967193036 1
        13          0 39A8FA84          0          0 oradebug call kslgetl 967375492 1
        14          0 39A8FF24          0          0 oradebug call kslgetl 967376676 1
        15          0 39ABC7DC          0          0 oradebug call kslgetl 967559132 1
        16          0 39ABCC7C          0          0 oradebug call kslgetl 967560316 1

16 rows selected.
</pre>
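<p>Only working sets 5 and 6 have buffers on their replacement lists (CNUM_REPL), which is why their two latches are the active ones. The GET_STATMENT column converts each SET_LATCH address from hex to decimal, because that is the form passed to oradebug call kslgetl in this post. The Python sketch below does the same selection and conversion. Its input is a few rows copied by hand from the output above, not rows fetched from x$kcbwds.</p>

```python
# A few (SET_ID, SET_LATCH, CNUM_REPL) rows copied by hand from the x$kcbwds output.
ROWS = [
    (1, "39982A74", 0),
    (2, "39982F14", 0),
    (5, "399DC524", 7440),
    (6, "399DC9C4", 7440),
    (7, "39A0927C", 0),
]


def active_latch_commands(rows):
    """Build a kslgetl command for each working set with buffers on its lists."""
    commands = []
    for _set_id, latch_hex, cnum_repl in rows:
        if cnum_repl:  # sets with no buffers on the replacement lists are unused
            addr_dec = int(latch_hex, 16)  # same as to_number(SET_LATCH, 'xxxxxxxxxxxx')
            commands.append(f"oradebug call kslgetl {addr_dec} 1")
    return commands


if __name__ == "__main__":
    for cmd in active_latch_commands(ROWS):
        print(cmd)
```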
<p>3. Get the spid of session 2.</p>
<pre class="brush: sql; title: ; notranslate">
sys@CS11GR2&gt; select spid from v$process where addr in (select paddr from v$session where sid in (select sid from v$mystat where rownum=1));

SPID
------------------------
6741
</pre>
<p>4. Preparation in session 3.</p>
<pre class="brush: sql; title: ; notranslate">
sys@CS11GR2&gt; oradebug setospid 6741
Oracle pid: 23, Unix process pid: 6741, image: oracle@cargosmart.org (TNS V1-V3)
</pre>
<p><strong>Verification:</strong><br />
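1. In session 1, hold the two active lru latches. The second kslgetl call raises ORA-00600 [526]. Its arguments name the requested lru latch (0x399DC9C4, child# 11) and the lru latch the session already holds (0x399DC524, child# 9). The error also releases the latches held by the session. Both lru latches therefore have to be acquired again from scratch, this time child# 11 first and then child# 9, until the first 4 bytes of both latches read 0x00000011.</p>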
<pre class="brush: sql; title: ; notranslate">
sys@CS11GR2&gt; oradebug call kslgetl 966640932 1
Function returned 1
sys@CS11GR2&gt; oradebug call kslgetl 966642116 1
ORA-00600: internal error code, arguments: [526], [0x399DC9C4], [2], [cache buffers lru chain], [11], [0x399DC524], [9], [], [], [], [], []
sys@CS11GR2&gt; oradebug call kslgetl 966642116 1
Function returned 1
sys@CS11GR2&gt; oradebug peek 0x399DC524 300
[399DC524, 399DC650) = 00000000 00000501 00000096 00000002 000005DC 00000000 00001725 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ...
sys@CS11GR2&gt; oradebug peek 0x399DC9C4 300
[399DC9C4, 399DCAF0) = 00000011 00000593 00000096 00000002 00000001 BFBA01EF 0000175C 00000001 00000001 00000000 00000000 00000000 00000000 00000001 ...
sys@CS11GR2&gt; oradebug call kslgetl 966640932 1
Function returned 1
sys@CS11GR2&gt; oradebug peek 0x399DC524 300
[399DC524, 399DC650) = 00000011 00000502 00000096 00000002 00000001 BFBA01EF 00001725 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ...
sys@CS11GR2&gt; oradebug peek 0x399DC9C4 300
[399DC9C4, 399DCAF0) = 00000011 00000593 00000096 00000002 00000001 BFBA01EF 0000175C 00000001 00000001 00000000 00000000 00000000 00000000 00000001 ...
</pre>
<p>2. In session 2, issue an update on table t. The session hangs.</p>
<pre class="brush: sql; title: ; notranslate">
sys@CS11GR2&gt; update t set id = 2, padding = lpad('2',1000,'0');
</pre>
<p>3. In session 3, dump the process state of session 2.</p>
<pre class="brush: sql; title: ; notranslate">
sys@CS11GR2&gt; oradebug dump processstate 10
Statement processed.
</pre>
<p>The process dump of session 2</p>
<pre class="brush: sql; title: ; notranslate">
PROCESS STATE
-------------
Process global information:
     process: 0x3abdbf58, call: 0x3a76ac20, xact: 0x39ba4f0c, curses: 0x3a700fc0, usrses: 0x3a700fc0
     in_exception_handler: no
  ----------------------------------------
  SO: 0x3abdbf58, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
   proc=0x3abdbf58, name=process, file=ksu.h LINE:12451, pg=0
  (process) Oracle pid:23, ser:3, calls cur/top: 0x3a76ac20/0x3a76ac20
            flags : (0x0) -
            flags2: (0x0),  flags3: (0x0)
            intr error: 0, call error: 0, sess error: 0, txn error 0
            intr queue: empty
    ksudlp FALSE at location: 0
  (post info) last post received: 0 0 0
              last post received-location: No post
              last process to post me: none
              last post sent: 0 0 0
              last post sent-location: No post
              last process posted by me: none
    (latch info) wait_event=0 bits=2
        Location from where call was made: kcb2.h LINE:3795 ID:kcbzgb:
      waiting for 399dc524 Child cache buffers lru chain level=2 child#=9
        Location from where latch is held: kywm2.h LINE:185 ID:kywmcrpln: creating new WLM plan:
        Context saved from call: 3216638447
        state=busy [holder orapid=17] wlstate=free [value=0]
          waiters [orapid (seconds since: put on list, posted, alive check)]:
           23 (6, 1329203954, 6)
           waiter count=1
          gotten 1282 times wait, failed first 0 sleeps 0
          gotten 5925 times nowait, failed: 1
        possible holder pid = 17 ospid=6390
      on wait list for 399dc524
      holding    (efd=8) 39923cec Child cache buffers chains level=1 child#=586
        Location from where latch is held: kcb2.h LINE:3166 ID:kcbgcur_2:
        Context saved from call: 4255689
        state=busy(exclusive) [value=0x20000017, holder orapid=23] wlstate=free [value=0]
    Process Group: DEFAULT, pseudo proc: 0x3a47f3dc
    O/S info: user: oracle, term: UNKNOWN, ospid: 6741
    OSD pid info: Unix process pid: 6741, image: oracle@smart.org (TNS V1-V3)
</pre>
<p>Session 2 is waiting for “cache buffers lru chain” while holding the “cache buffers chains”.</p>
<pre class="brush: sql; title: ; notranslate">
      waiting for 399dc524 Child cache buffers lru chain level=2 child#=9
      holding    (efd=8) 39923cec Child cache buffers chains level=1 child#=586
</pre>
<p>The test confirms that Oracle requests the lru latch in willing-to-wait mode while holding the hash latch. Because there are multiple active lru latches, Oracle could try one lru latch in immediate mode first. If that get failed, it could then request another lru latch in willing-to-wait mode. To check whether Oracle tries immediate mode, I wrapped the update in session 2 with the snap_latch package to take a snapshot of the latch statistics. After session 2 hung on the update, I went back to session 1 and released the two lru latches, and the update in session 2 then completed as expected. There are no immediate gets or misses on the lru latch, so Oracle does not try immediate mode in this case. (The snap_latch package is provided in the book.)</p>
<pre class="brush: sql; title: ; notranslate">
sys@CS11GR2&gt; set serveroutput on size 1000000 format wrapped
sys@CS11GR2&gt; set linesize 168
sys@CS11GR2&gt; set trimspool on
sys@CS11GR2&gt; execute snap_latch.start_snap;

PL/SQL procedure successfully completed.

sys@CS11GR2&gt; update t set id = 2, padding = lpad('2',1000,'0');

1 row updated.

sys@CS11GR2&gt; execute snap_latch.end_snap;
---------------------------------
Latch waits:-   14-Feb 17:01:05
Interval:-      7 seconds
---------------------------------
Latch                              Gets      Misses     Sp_Get     Sleeps     Im_Gets   Im_Miss Holding Woken Time ms
-----                              ----      ------     ------     ------     -------   ------- ------- ----- -------
cache buffers lru chain               3           1          0          1           0         0       0     0 7,281.9

PL/SQL procedure successfully completed.
</pre>
<p>P.S. After I sent this feedback to Jonathan, he added it to the errata. I copy <a href="http://jonathanlewis.wordpress.com/oracle-core/oc-5-caches-and-copies/" target="_blank">his corrections</a> here.</p>
<blockquote><p>Section “Loading a Hash Chain” says: “As I commented in Chapter 4, a process is not allowed to request a latch in willing-to-wait mode if it is already holding a lower-level latch.” This statement is the wrong way round &#8211; you cannot request a latch in willing-to-wait mode if you are already holding a higher level latch; this error makes the subsequent comments about the complexity involved in dropping and re-acquiring cache buffers chains latches irrelevant – you don’t have to drop the latch.<br />
There is another odd error in the same sentence – I don’t say anything about the latch level and it’s use in controlling the order in which willing-to-wait gets can be made.
</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://sid.gd/latch-level/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

