Someone asked me to help identify a strange problem. The problem turned out to be a combination of two issues, ASSM and the BLOB loading software, which manifested as excessive CPU utilization. In this post we will see how different write buffer sizes can affect write efficiency.
Unfortunately, Tkprof and Statspack often fail to point in the proper direction when it comes to profiling activity performed by a Call-Less Cursor. In this case such a cursor was responsible for writing and reading BLOBs. The issue was not specific to the JDBC API; it was later reproduced with the DBMS_LOB API as well.
As an additional point on the topic mentioned above, here is an example of Tkprof and Statspack reports from the system where this issue was reproduced with the help of a single-threaded application in an otherwise idle environment.
Tkprof
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse       16      0.00       0.02          0          0          0          0
Execute     23      3.09       4.57          6      13894      18366         17
Fetch        3      0.00       0.00          0          6          0          3
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total       42      3.10       4.59          6      13900      18366         20
OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse   230212      5.39       5.14          0          0          0          0
Execute 230215    176.87     173.89         47     692034     175390     176321
Fetch   153422      4.22       3.96          5     230821          3      77048
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total   613849    186.49     183.01         52     922855     175393     253369
Statspack
Load Profile
~~~~~~~~~~~~                          Per Second       Per Transaction
                                 ---------------       ---------------
           Redo size:               1,553,489.88        668,736,513.89
        Logical reads:                630,504.58        271,415,631.42
        Block changes:                    782.47            336,833.37
       Physical reads:                    171.33             73,752.42
      Physical writes:                    181.92             78,311.00
           User calls:                     42.80             18,424.58
               Parses:                     44.40             19,114.63
          Hard parses:                      0.00                  1.00
                Sorts:                      0.39                169.89
               Logons:                      0.00                  0.84
             Executes:                     44.94             19,344.89
         Transactions:                      0.00
Top 5 Timed Events                                                    Avg %Total
~~~~~~~~~~~~~~~~~~                                                   wait   Call
Event                                            Waits    Time (s)   (ms)   Time
----------------------------------------- ------------ ----------- ------ ------
CPU time                                                      7,764         95.9
log file parallel write                         85,190          147      2   1.8
log file switch (checkpoint incomplete)            246           88    357   1.1
log file switch completion                         502           76    151    .9
control file parallel write                      7,973            9      1    .1
-------------------------------------------------------------
Instance Activity Stats
Statistic                                      Total     per Second    per Trans
--------------------------------- ------------------ -------------- ------------
...
db block gets                          5,116,321,961      625,543.7 ############
db block gets direct                       1,442,703          176.4     75,931.7
db block gets from cache               5,114,879,258      625,367.3 ############
...
physical writes                            1,487,909          181.9     78,311.0
physical writes direct                     1,444,083          176.6     76,004.4
physical writes direct (lob)               1,442,684          176.4     75,930.7
session logical reads                  5,156,896,997      630,504.6 ############
...
Segments by Logical Reads
                                           Subobject          Logical   Pct
Owner      Tablespace Object Name          Name       Type      Reads  Total
---------- ---------- -------------------- ---------- ----- --------- -----
SYS        SYSTEM     SEG$                            TABLE   434,992  23.6
SYS        SYSTEM     TSQ$                            TABLE   432,816  23.5
TEST       TEST       SYS_LOB0000056448C00            LOB     289,808  15.7
SYS        SYSTEM     I_FILE#_BLOCK#                  INDEX   288,448  15.7
SYS        SYSTEM     FILE$                           TABLE   220,416  12.0
-------------------------------------------------------------
As can be seen, the number of session logical reads in the test does not align well with the figures in the Segments by Logical Reads section, and Tkprof does not even have a clue about the 7K CPU seconds.
But let’s get back to business. What could be happening that triggered such a vast amount of reads when writing BLOBs? A call to the vendor of the software revealed that the BLOB data is written in chunks, and for each chunk the BLOB is closed and reset with an updated position. So we duplicated this behavior and got the following statistics using “runstats” and DbmsOutput.java shared by Tom Kyte and the Java API shared by R. M. Menon on AskTom.
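To illustrate, here is a minimal JDBC sketch of the write pattern described above, where the locator is re-fetched and the output stream closed after every chunk. The table t, its columns and the buffer size are hypothetical stand-ins, error handling is omitted, and auto-commit is assumed to be disabled:

// A minimal sketch (assumed schema: table t(id NUMBER, data BLOB)) of the
// vendor's pattern: for every chunk the locator is re-selected and the
// output stream is closed, i.e. the BLOB is "closed and reset" each time.
import java.io.InputStream;
import java.io.OutputStream;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class ChunkedBlobWrite {

    static void load(Connection con, InputStream src, int bufSize)
            throws Exception {
        byte[] buf = new byte[bufSize];
        long pos = 1;                            // LOB positions are 1-based
        int n;
        while ((n = src.read(buf)) > 0) {
            Statement st = con.createStatement();
            ResultSet rs = st.executeQuery(
                    "SELECT data FROM t WHERE id = 1 FOR UPDATE");
            rs.next();
            java.sql.Blob blob = rs.getBlob(1);
            OutputStream out = blob.setBinaryStream(pos);
            out.write(buf, 0, n);
            out.close();                         // closed after every chunk
            rs.close();
            st.close();
            pos += n;                            // reopened at a new position
        }
        con.commit();
    }
}

The statistics below come from repeating this pattern with various chunk and write buffer sizes.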
BLOB...chunk size                    32,768 32,768 32,768 32,768 32,768 32,768 32,768 32,768 32,768
BLOB...write buffer size              2,048  4,096  8,132  8,192 16,264 16,384 24,396 32,528 32,768
STAT...consistent gets                  971    375    119    319     51    234     27    190    222
STAT...consistent gets direct           252    159     52     79     16     32      9      4     20
STAT...db block gets                  6,146    884    423    479    209    268    109    100    208
STAT...db block gets direct             268    140     64     76     32     44     32     16     36
STAT...physical reads direct (lob)      252    124     48     60     16     28     16      0     20
STAT...physical writes direct (lob)     268    140     64     76     32     44     32     16     36
STAT...session logical reads          7,117  1,259    542    798    260    502    136    290    430

BLOB...chunk size                    16,384 16,384 16,384 16,384 16,384 16,384 16,384 16,384 16,384
BLOB...write buffer size              2,048  4,096  8,132  8,192 16,264 16,384 24,396 32,528 32,768
STAT...consistent gets                  449    222     99    213     47     74     25     35     63
STAT...consistent gets direct           126     80     24     53      8     22      3      8     19
STAT...db block gets                  3,904    733    397    813    182    295    113    143    250
STAT...db block gets direct             142     78     32     46     16     30     22     16     26
STAT...physical reads direct (lob)      126     62     16     30      0     14      6      0     10
STAT...physical writes direct (lob)     142     78     32     46     16     30     22     16     26
STAT...session logical reads          4,353    955    496  1,026    229    369    138    178    313

BLOB...chunk size                     8,192  8,192  8,192  8,192  8,192  8,192  8,192  8,192  8,192
BLOB...write buffer size              2,048  4,096  8,132  8,192 16,264 16,384 24,396 32,528 32,768
STAT...consistent gets                  347    264     95    128     71     91     23     17     39
STAT...consistent gets direct            63     78     16     35     16     27      0      0      7
STAT...db block gets                  2,657    996    349    468    273    369    103     85    161
STAT...db block gets direct              79     47     16     31     16     23     16     16     21
STAT...physical reads direct (lob)       63     31      0     15      0      7      0      0      5
STAT...physical writes direct (lob)      79     47     16     31     16     23     16     16     21
STAT...session logical reads          3,004  1,260    444    596    344    460    126    102    200
Looking at the physical reads direct (lob) statistic, we can see that for each BLOB chunk size there are certain write buffer sizes that do not produce any additional reading. For our tests with chunk sizes of 8K, 16K and 32K, these local minimum points can be derived from the BLOB chunk size and a per-chunk overhead as N*(chunk size − K), where N is any natural number and K is an overhead that depends on the chunk size: 60 bytes for 8K chunks, 120 bytes for 16K and 240 bytes for 32K. At these points the amount of physical writes direct (lob) is minimal and, interestingly, depends only on the size of the incoming data and the block size, as in BLOB size/block size. From this we can see that each BLOB chunk is written only once.
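To make the arithmetic concrete, here is a small helper that prints the first few "good" buffer sizes N*(chunk size − K); the K overheads are the empirical values from our tests, not documented figures:

// Prints buffer sizes N * (chunkSize - K) that showed no extra
// physical reads direct (lob) in the tests above. The K overheads
// (60, 120 and 240 bytes) are empirical values from those tests.
public class GoodBufferSizes {
    public static void main(String[] args) {
        int[][] chunkAndOverhead = { { 8192, 60 }, { 16384, 120 }, { 32768, 240 } };
        for (int[] c : chunkAndOverhead) {
            System.out.printf("chunk %5d:", c[0]);
            for (int n = 1; n <= 4; n++) {
                System.out.printf(" %6d", n * (c[0] - c[1]));
            }
            System.out.println();
        }
    }
}

For the 8K chunk this prints 8132, 16264, 24396 and 32528, exactly the buffer sizes that show zero physical reads direct (lob) in the 8K table above.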
Here is one more interesting thing to mention, although it is not confirmed. The number of additional operations in the other cases exactly matches the following scenario, which can be described as rewriting of BLOB chunks: each ongoing operation reads back the entire chunk left incomplete by the previous write, appends the data from the current buffer, and stores all of the chunk’s blocks back into the database. This also causes a significant degradation of the ratio of LIO to the effective amount of written data, a ratio that improves as the write buffer size increases. This behavior was one of the culprits of the original issue. Of course, not closing the BLOB stream on each write would be a better way, as shown in the sketch below. However, sometimes it isn’t possible to rewrite the code; if an application can be configured to use a buffer of a certain length, the issue can be alleviated.
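For comparison, here is a minimal sketch of the better pattern: the locator is selected once and a single stream stays open for the whole load (same hypothetical table t as above):

// Better pattern: keep one OutputStream open for the entire load, so no
// partially filled chunk ever has to be re-read and rewritten.
import java.io.InputStream;
import java.io.OutputStream;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamedBlobWrite {

    static void load(Connection con, InputStream src) throws Exception {
        Statement st = con.createStatement();
        ResultSet rs = st.executeQuery(
                "SELECT data FROM t WHERE id = 1 FOR UPDATE");
        rs.next();
        java.sql.Blob blob = rs.getBlob(1);
        OutputStream out = blob.setBinaryStream(1);   // opened once
        byte[] buf = new byte[32528];                 // one of the "good" sizes
        int n;
        while ((n = src.read(buf)) > 0) {
            out.write(buf, 0, n);                     // no close between chunks
        }
        out.close();
        rs.close();
        st.close();
        con.commit();
    }
}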
ASSM played far from the least role in this situation. It is known for causing process “spinning” behavior. In our case the session was sitting in the following stack:
#0 0x000000000159fa03 in kcbgcur ()
#1 0x0000000001007f3b in ktugusc ()
#2 0x0000000001019032 in ktugti () - KTU: Kernel Transaction Undo Get Transaction table entry Information
#3 0x0000000001047e66 in ktbIsItlEntryCommitted ()
#4 0x000000000104db76 in ktrIsItlEntryCommitted ()
#5 0x00000000027cfa2e in kdlxgs_init () - reclaim space from transaction freelist in index
#6 0x0000000000bfab81 in ktsplbfmb ()
#7 0x0000000000bfbd80 in ktsplbrecl ()
#8 0x0000000000bd36e9 in ktspgsp_cbk1 ()
#9 0x00000000027e740f in kdlgsp_init () - Space management batching
#10 0x00000000027e45cb in kdl_write1 ()
#11 0x0000000001d2ece3 in koklwrite () - KOK Lob WRITE
#12 0x00000000022f2094 in kpolob () - KPP Lob operations
#13 0x0000000000711ef8 in opiodr ()
#14 0x00000000039a973b in ttcpip ()
#15 0x000000000070df90 in opitsk ()
#16 0x0000000000710e36 in opiino ()
#17 0x0000000000711ef8 in opiodr ()
#18 0x000000000070bc23 in opidrv ()
#19 0x000000000070a0ce in sou2o ()
#20 0x00000000006d008b in opimai_real ()
#21 0x00000000006cffdc in main ()
Moving hot BLOBs to tablespaces with manual segment space management further improved the situation.
Here are a few papers on Metalink that could be helpful in resolving similar BLOB issues:
Note 162345.1 “LOBS – Storage, Read-consistency and Rollback”
Note 66431.1 “LOBS – Storage, Redo and Performance Issues”
Note 268476.1 “LOB Performance Guideline”
Bug 5253061 “LOB INSERT PERFORMANCE DIFFERS AFTER INSERT DEPENDING ON COMMIT”
Bug 6128525 “ELAPSED TIME IS TOO HIGHER THAN ‘WAITED TIME + CPU TIME’ IN ASSM LOB ACCESS”
Bug 5131464 “10.1.0.4 RDBMS 10.1.0.4 SPACE PRODID-5 PORTID-226”