Waits on buffer exterminate (Automated Memory Management Issues)
From OraWiki
Contents |
Author
| (c) 2007 by Steven of Oz | ||
|---|---|---|
| Author | ||
| Name | Gerald Roehrbein | |
| http | www.OraForecast.com | |
| Gerald.Roehrbein@OraForecast.com | ||
| Release | 0.9 | |
| Created | 2009.06.12 | |
| Last update | 2009.06.12 | |
| Changes | Started article | |
| Estimated duration of installation | none | |
| License | Freeware | |
Purpose
Description of what happened if an Oracle database is configured using "Automated Memory Management".
Installation
- None
Todo
- Nothing
Recommendations
- Do not use Oracle Automated Memory Management for databases with heavy load without proper sizing
Example(s)
Graph below show what happened when using AMM if a idle system becomes busy:
How to create such an image? That's just simple. I've generated six one hour AWR reports between 12:00 (12PM) and 18:00 (06PM) and pasted the buffer cache and shared pool size into an Excel table. Than I've inserted an Excel graph object and configured it to use the table as base data.
Image below shows the wait events between 3PM and 4PM:
Between 3PM and 4PM a simple connect took up to 15 seconds! Systems performance was at low level but CPU usage and PIO too!
Script fsora SYSTEM_EVENT_NON_IDLE shows:
EVENT TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT WP -------------------------------------------------- ----------- -------------- ----------- -------------- ---------- cursor: pin S wait on X 1291682 1291202 2436010 5,075.02083 .01% control file parallel write 1811837 0 1767751 .97567 .01% enq: TX - row lock contention 5860 3479 1074207 451.15792 .01% enq: TX - index contention 6149 3320 1035716 366.10675 .01% log file sequential read 532868 0 1092605 2.05042 .01% wait for unread message on broadcast channel 80416 34824 3700294 81.16104 .02% buffer busy waits 271812 35122 3520305 14.87306 .02% write complete waits 62237 61490 6093395 8,157.15529 .03% db file parallel write 7027790 0 6680407 .95057 .04% log file parallel write 35179515 0 10873553 .30909 .06% buffer exterminate 131618 128177 12754628 3,706.66318 .07% log file sync 27385903 29859 12294591 .44943 .07% db file scattered read 167238316 0 20828616 .12454 .12% db file sequential read 394791081 0 25699496 .06510 .14% io done 11433955 5616028 522529590 89.81371 2.93% SQL*Net message from client 136057869 0 1.7189E+10 126.33955 96.41%
This output shows a totally false configured cache because the averages for "sequential and scattered read" are faster as expected. Normally a "scattered read" should take with 8k db_block_size and DB_FILE_MULTIBLOCK_READ_COUNT=16 roundabout 0.65cs and a healthy "sequential read" should take 0,3cs (as calculated by my scripts). If they are faster than there are a lot of filesystem and storage system cache reads.
The image above shows the averages of "most of the non idle events" since instance startup. This image is useful to understand one of the benefits of AWR or ADDM or the requirement of having a lot of experiences with a special application and the Oracle statistics. The averages do not point direct to the problem. An experienced Oracle DBA would understand the problem without a Statspack, AWR or ADDM report because he have to know the normal output of V$SYSTEM_EVENT. But this is nothing someone can teach.
The cache is too small but the hit ratio is pretty good (caclulated using fsora tps using "V$SYSSTAT"):
========================================= === Total IO statistics from V$SYSSTAT ========================================= Cache hit ratio :... 99,95 Physical read/s :....707.53 Consistent get/s :.12915.221
Since my tuning goal is always to have less than 100 PIO's this system shows a lot more PIO's per second in average. Application executes a lot of "short full scans" per second and "full index scans". To increase cache size is not best because a SQL review is really required. To increase cache size will help as a workaround but it is not the solution for the problem. One part of the problem is "literal SQL". The application uses a lot of literal statements. Bad practise.
Especially for systems with a small load it's difficult to size cache and IO correct. So that's a recommendation (< 100 PIO/s) which varies from system to system. For example: If database is for a middleware which runs only a few hours at night than 60 PIO's per second as calculated using V$SYSSTAT could be too much. In such a case it makes sense to use STATSPACK and or AWR/ADDM or to have a database trace for a few days to understand workload characteristics.
My writings are not the "holy bible". Just some ideas because real performance analysis and tuning is a difficult task which requires a lot more know how as you would expect. This is just the reason for me to publish such facts because there is a lot more to know and tune to fix such issues and require to hire a real performance expert and not one of those which do this "en passant".
After disabling AMM performance increases and system behave normal:
Security related issues
- None
How to make this feature as secure as possible
- None.
Source code of this extension
- See above
Planned enhancements
- Will follow
Discussions
Images above show that a system having one week or a few days "normal load" will have huge problems if the load changes. At 3PM buffer cache usage increases so AMM decide to grow buffer cache and reduce shared pool size. When this happens a lot of "buffer exterminate" waits occur.
To solve this issue my recommendations are:
- Change programming strategies. Do not use literal SQL and use bulk operations whenever possible.
- Create proper indexes for your queries
- Size SGA correct.
- Set a usable SGA_MAX_SIZE and SGA_TARGET_SIZE
- Set a minimum for DB_CACHE_SIZE (and the other caches if used)
- Set a minimum for SHARED_POOL_SIZE
- Set a maximum for shared pool using _SHARED_POOL_MAX_SIZE (calculate correct, only used if using auto SGA/ Automated Memory Management)
- Use a frequently executed script which flushs shared pool (if you do not want to set _SHARED_POOL_MAX_SIZE)
If the traditional sizing was made correct than it is not required to use AMM. The major problem of AMM is that it will allways resize caches and this in some cases not very efficient.
Consider this: You bought a dedicated Computer for a database with an amount of memory. Why should Oracle resize caches? If resize occur than the database will destroy content of the caches and this will reduce the efficiency of caching.
In my opinion AMM is a stupid feature regardless to the systems configuration. If there are a few instances at one servers hardware than you won't get a benefit from AMM because you have to size all instances to use the required memory.
The major reason for this problem are design changes from Oracle 8.x to 10.x related to the SHARED_POOL. In Oracle 8 SHARED_POOL was allocated inside SGA with size of SHARED_POOL_SIZE. But in some cases SHARED_POOL increases and my experience was that Oracle allocated additional memory for the SHARED_POOL outside SGA. For those made an update from 9.x to 10.x the recommendation of Oracle was to calculate the correct SHARED_POOL size because 10G allocate SHARED_POOL now inside the SGA and if there is not enough space than this will give (for example) ORA-4030 errors.
Since there are only a few guys at planet earth able to make a proper sizing most DBA's decided to use AMM instead. This wastes world wide hundreds of millions of dollars because in most cases companys not having performance specialists which really understand the way Oracle works their staff try to compensate the loss of performance with faster CPU's, faster storage and so on.
Whatever. In my opinion this feature is useless and only good for those who want to blame an old DBA as a stupid because "they know some new features" and argue that traditional sizing is expensive not required anymore.
Well look in a mirror. There you (and me too) will see one or more stupids! ;-) Because using Oracle it's possible to make things allways a little bit better!
My job is to make Oracle look as perfect as possible. So we at OraForecast.com recommend only well tested features. Since Oracle have competitors like MS-SQL server, DB/2 and MySQL (and a lot of others) it is required to give only best advice because I've seen some frustrated Companys migrated to other databases because of bad consulting.
Oracle is less expensive and much more powerful as the other databases but it is complex, a DBA have to know what he's doing and data modellers and programmers too! But if a Company pay for an expert than the Company have the right to get really expert knowledge and today there work too many novices in the Oracle community.
History
- 2009.06.12 Start of documentation
Known bugs
- Reported Description of problem Fixed Release
- 2009.06.12 No issues known
Running example
- Not required



