Tuesday, January 15, 2008

APPLICATION TUNING BEST PRACTICES

  1. EXPLAIN PLAN

  1. EXPLAIN PLAN is a utility that is useful to both DBAs and application programmers.
  2. It is pre-execution: we only ask Oracle to show us the execution plan, without actually executing the statement.
  3. We create a table called PLAN_TABLE by running the script $ORACLE_HOME/rdbms/admin/utlxplan.sql.
  4. When we ask Oracle to explain the plan for a given SQL statement, Oracle dumps the execution plan into the plan table.
  5. Later we can query the plan table to get the execution plan.
  6. The only limitation is that we need to have the SQL statement.
  7. The EXPLAIN PLAN command lets us view the query execution plan that the optimizer will use to execute the SQL statement.
  8. From the explain plan of a SQL statement we can see which execution plan the optimizer has chosen and how much it costs.
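A minimal sketch of the workflow; the statement, the STATEMENT_ID value and the emp table are placeholders:

-- one-time setup: create PLAN_TABLE
@$ORACLE_HOME/rdbms/admin/utlxplan.sql

-- ask Oracle for the plan without executing the statement
explain plan set statement_id = 'demo1' for
select * from emp where deptno = 20;

-- visit the plan table to read the plan back, indented by depth
select lpad(' ', 2*(level-1)) || operation || ' ' || options || ' ' || object_name as plan
from plan_table
start with id = 0 and statement_id = 'demo1'
connect by prior id = parent_id and statement_id = 'demo1';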
  2. TKPROF

  1. This is useful whether or not we have the source code.
  2. Before submitting the SQL statement we enable SQL_TRACE=TRUE at the session level.
  3. Whenever SQL statements are then submitted to the database, Oracle generates trace files in the UDUMP directory.
  4. The raw trace file is hard to read; we run the TKPROF utility on top of the generated trace file to create an output file in readable text format.
  5. This output file gives us the complete execution plan for the submitted SQL statement.
  6. If the SQL statement is not available, for instance because we are using a shrink-wrapped software package, then we need to enable SQL_TRACE at the instance level.
  7. TKPROF is always a post-execution utility.
  8. You have to set the parameter TIMED_STATISTICS=TRUE in init.ora.
  9. A traced query with a large number of physical reads usually indicates a missing index.
  10. A traced query with only memory reads usually indicates that an index is being used.
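A minimal sketch, assuming the session's trace file comes out as ora_12345.trc in UDUMP (the file name and the scott/tiger credentials are placeholders):

-- in the session to be traced
alter session set timed_statistics = true;
alter session set sql_trace = true;
-- ...run the SQL of interest, then:
alter session set sql_trace = false;

-- at the operating system prompt: format the trace into readable text,
-- with execution plans, sorted by elapsed times
tkprof ora_12345.trc ora_12345.txt explain=scott/tiger sys=no sort=prsela,exeela,fchela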

OPTIMIZATION

A query is a non-procedural request of information from the database. To process a query the kernel has to characterize the retrieval strategy or formulate an execution plan for fetching the candidate rows. Typically, to execute a query, several execution plans may be feasible. For example, tables participating in a join could be processed in a number of different orders depending on the join conditions and the join methods implemented in the kernel. To choose between alternative plans, the kernel must use some realistic unit to measure the resources used by each plan. It can then choose between competing plans on the basis of costs and discard all except the one which consumes the least.

The optimizer determines how SQL statements are executed; there are two types of optimizer: a) RULE BASED, b) COST BASED

RULE BASED OPTIMIZER (Heuristic) Optimization

Prior to the introduction of the cost based optimizer, Oracle used the Heuristic Approach to query optimization. Execution plans were generated on the basis of certain heuristics or "Rules". For example, if the "where clause" of a query contained a predicate referencing an indexed column, and the index was the best index to use based on predetermined ranking of indexes, then the table would be accessed using that index irrespective of other considerations such as the size of the table, variability of the data for that column, selectivity of the index etc.

There was no knowledge on the statistical descriptions of the data in tables and indexes. Instance specific parameters such as the capability of multi block i/o, available sort memory etc were not considered. If data in the tables was inconsistent with the "Rules", then sub-optimal plans would be generated.

  1. OPTIMIZER_MODE=RULE; the first and oldest mode is rule.
  2. Oracle uses heuristics from the data dictionary to determine the most effective way to service a query and translates the declarative SQL command into an actual navigation plan to extract the data.
  3. Oracle will use the rule-based optimizer if there are no statistics present for the table.

A full table scan is used under rule-based optimization in the following cases:

* No Indexes exist on the table.

* No limiting conditions are placed on the rows (that is, the query asks for all the rows). For example, if the query has no where clause, all the rows will be returned.

* No limiting conditions are placed on the rows corresponding to the leading column of any index on the table. For example, if a three-column concatenated index is created on the City-State-Zip columns, then a query with a limiting condition on the State column only would not be able to use the index, since State is not the leading column of the index.

* Limiting conditions are placed on the leading column of an index, but the conditions are used inside expressions. For example, if an index exists on the City column, then a limiting condition of:

WHERE CITY = 'NEW YORK'

could use the index. But if the limiting condition was instead:

WHERE UPPER(CITY) = 'NEW YORK'

Then the index on the City column would not be used because the City column is inside the UPPER function. If you had concatenated the City column with a text string, the index would not be used.

For example, if the limiting condition was:

WHERE CITY ||'X' LIKE 'ORLANDO%'

Then the index on the City column would not be used.

* Limiting conditions are placed on the rows corresponding to the leading column of an index, but the conditions are either NULL checks or inequalities. For example, if an index exists on the City column, none of the following will be able to use the index:

WHERE CITY IS NULL

WHERE CITY IS NOT NULL

WHERE CITY != 'ORLANDO'

* Limiting conditions are placed on the rows corresponding to the leading columns of the index, but the conditions use the LIKE operator and the value starts with '%' or the value is a bind variable. For example, neither of the following will be able to use the index:

WHERE CITY LIKE '%YORK%'

WHERE CITY LIKE :CITY_BIND_VARIABLE

NOTE: The bind variable may contain trailing '%' or no '%' at all. Regardless, an index will not be used.

If cost-based optimization is used, Oracle will use full table scans for all of the above cases shown for rule-based optimization. Additionally, the cost-based optimizer may decide to use full table scans if the table has not been analyzed, if the table is small, if the indexed columns are not selective, or if the optimization goal is set to ALL_ROWS.

To avoid unplanned full table scans, make sure the query can use an index.

The Rule Based Optimizer (RBO) is quite limited in its scope for query access path manipulation. The RBO uses a ranking system which pushes it heavily towards indexes and is best suited to OLTP-type systems. See the Concepts manual for more detail on the RBO rankings. The following is a list of ways in which the RBO can be influenced:

  • Drop or add indexes on columns that are selective and are referenced in queries

  • Disable indexes by concatenating null to or adding 0 to indexed columns (see the sketch after the RULE hint example below). This forces other indexes or, if none are available, full table scans to be used.

  • Change the order of tables in the FROM clause. The RBO drives from right to left in the FROM clause if it has nothing else to go on or if two predicates have equal rank.

  • The RBO can also be invoked by using the RULE hint in queries.

For example,

select /*+ RULE */ col1, col2

from <table_name>

where <condition>;
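The index-disabling trick from the list above can be sketched as follows (emp and its columns stand in for any indexed table):

-- concatenating null suppresses an index on a character column
select ename from emp where ename || '' = 'SMITH';

-- adding 0 suppresses an index on a numeric column
select ename from emp where empno + 0 = 7839;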

The RBO may be more efficient in some cases where queries contain views based on many tables. If current statistics are not present for the tables, the RBO is used.

CHOOSE OR COST BASED OPTIMIZER

Oracle addresses this by incorporating a Cost Engine in the kernel to estimate and select execution plans on the basis of costs. Costs quantify the resource consumption of the query. Resources used by a query can be broken into three principal parts: I/O cost, CPU cost and network cost. I/O cost is the cost of transferring information from disk to main memory. Rows in tables are stored in database blocks which reside in disk files. Accessing the rows involves reading the blocks into the buffer pool in the SGA (Shared Global Area), which resides in main memory. Typically this is the most significant cost of processing a query. The CPU cost is the cost of processing the information once it is available in memory. Once the blocks are cached, the rows have to be identified and processed to perform operations such as join, sort, etc.

For queries which access information across nodes of a distributed database, there is a network cost which measures the cost of information transfer (network I/O). Queries which access remote tables or perform distributed joins have a significant element of this cost. Currently, the cost model incorporates functions to cost I/O operations and, to some extent, CPU costs. Network costs are not directly computed but taken as a certain weight of the estimated I/Os. As the cost optimizer model is enhanced, these elements will be measured and costed more accurately.

IMPORTANT:

To enable costing of execution plans, detailed statistical descriptions of the data relating to the objects in the query are required. The statistics are generated by the ANALYZE facility. There are two modes in which ANALYZE may be performed: compute or estimate. Compute scans every member of the object, whereas estimate mode looks at a sample of the total. If there are no statistics, then the cost optimizer uses hardcoded estimates, or "guesses".
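For example, against the demo table emp:

-- compute: scan the entire table
analyze table emp compute statistics;

-- estimate: look at a sample of the total
analyze table emp estimate statistics sample 20 percent;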

Optimization Process

The optimization process starts with performing transitivity analysis on the predicates of the query. Then each query block is optimized by determining and evaluating the access paths open to each table in the query. Joins involve join order enumeration and costing. To influence the optimizer, users can specify "Hints" to force the optimizer to consider a certain access strategy superior to others irrespective of cost. Each of these steps is considered in detail below.

Transitivity

Consider a query with a join involving tables T1 and T2:

SELECT ... FROM t1,t2 WHERE

t1.column1=t2.column1 AND

t1.column1=100;

This query is equivalent to the query with the additional predicate on T2:

SELECT ... FROM t1,t2 WHERE

t1.column1=T2.column1 AND

t1.column1=100 AND

t2.column1=100;

The advantage we have in optimization is that the additional predicate opens an extra access path for T2.column1 which did not exist in the original query. Transitivity analysis involves generating additional predicates based on existing predicates to open extra access paths to objects. This however does not mean that the new path will be chosen. The cost optimizer will generate transitive predicates as a first step in optimization.

Cardinality

The cost of I/O is directly proportional to the number of rows to be retrieved and manipulated by a query. A query may specify retrieval of all the rows of a table or may contain restrictions based on some criteria specified in the "Where" clause. Optionally it may demand the results to be aggregated based on "Group By" clause or sorted on some expression or column values using a "Order By" clause. The number of rows expected to be returned by the query constitutes the cardinality of the query. The Cost Engine, to be able to cost queries, must have some mechanism to compute the cardinality of the query.

Selectivity Analysis

Cardinality determination requires the use of selectivity factors. Selectivity factors are estimates of the number of rows that will satisfy a certain criterion on column or attribute values. For example, suppose we wish to retrieve all rows where the column Sex has the value Female. A selectivity of 1/2 would indicate that half of the rows possess this value. Typically, the selectivity of a predicate of the form "column = constant value" would be:

selectivity (of predicate P) = 1/NDV

(NDV = number of distinct values of the column)

(predicate = a single WHERE clause condition)

The selectivity estimate is thus a value ranging from 0 to 1, with 1 indicating the poorest selectivity. Predicates come in many other forms: other relational operators (e.g. <, >), column expressions, joins, etc. Using statistical descriptors such as high/low values and the number of distinct values, and by resolving certain complex forms to simpler types, the cost optimizer calculates the selectivity of each predicate. Note that so far we have dealt with the selectivity associated with individual predicates. A query typically has multiple predicates joined by boolean operators (AND, OR, NOT). The selectivities of individual predicates are combined in the following way to determine the overall selectivity, which in turn determines the cardinality of the query.

Logical Operator    Resultant Selectivity
----------------    ---------------------
AND                 S1 * S2
OR                  S1 + S2 - (S1 * S2)
NOT                 1 - S1

S1 = selectivity of predicate P1
S2 = selectivity of predicate P2
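As a worked example, if S1 = 0.1 and S2 = 0.2, then P1 AND P2 has selectivity 0.1 * 0.2 = 0.02, P1 OR P2 has 0.1 + 0.2 - 0.02 = 0.28, and NOT P1 has 1 - 0.1 = 0.9.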

Retrieval Methods

Once the cardinality of the query is known, the next thing to consider is the cost associated with each available retrieval method. Currently, Oracle has two possible methods of retrieving rows from a table:

  1. The table can be scanned sequentially, from the first data block of the first extent to the last block allocated. This is called a "Full Table Scan".

  2. The table can be accessed one block at a time by first retrieving the address of the row from a lookup on one or more indexes. This is called an "Index Scan".

Each method involves I/O, and no assumptions are made about the availability of blocks in the cache. The I/Os involved in each case are estimated using the statistical descriptions of the objects accessed and certain instance initialization parameters, e.g. the multi-block read factor.

A table may have several indexes on it. Given the query, it may be possible to use one or more of them. The Cost Engine considers all the possible access paths and computes the cost of each path.

Full Scan Cost

Full table scan cost is dependent on the maximum number of blocks ever used by the table and the multi-block read factor. As more rows are inserted into the table, blocks are acquired from the extents assigned to the table. Even if some rows are later deleted, leaving blocks empty, the kernel still has to read all blocks ever used by the table. The multi-block read factor is the number of adjacent database blocks that can be read in a single I/O operation.

Index Scan Cost

The cost of using an indexed access can be split into two components:

  1. The cost of traversing the index (Oracle indexes are B*tree structures made up of one or more levels of branch blocks and leaf blocks), and

  2. The cost of performing the table lookup. In certain cases this may not be needed, if the optimizer determines that the index lookup alone is sufficient to satisfy the query.

The following statistical descriptors are available for indexes:

  1. Bl - Number of levels in the index including the leaf level
  2. Lb - Number of leaf blocks
  3. Bk - the average number of index leaf blocks per index key value
  4. Dk - the average number of data blocks in the tables that are pointed to by a key value.
  5. Cf - Clustering Factor

This value indicates the ordering of the rows in the table blocks relative to the values of the index. If this value is close to the number of blocks in the table, then the table rows are ordered by the values of this index, and row addresses in an index leaf block are most likely to point to rows in the same table data block. If this value is close to the number of rows in the table, then the table rows are randomly distributed in relation to the values of this index, and row addresses in an index leaf block are most likely to point to rows in different table data blocks.

Using one or more of these descriptors, the index traversal cost is calculated. For example, consider a query which has predicates of the form "column = value" for all columns of a unique index:

Cost of Index Access = Number of Levels + 1 = Bl + 1

The cost, in other words, is the cost of accessing each block as we traverse down the index plus the cost of accessing the table block. The Cost Engine uses several functions to evaluate the costs for more complicated cases. Predicates may use non-equality operators such as "column > value" or the LIKE operator, or may specify only some columns of a concatenated index. Using selectivity estimates and table and index statistics, the Cost Engine computes the number of distinct blocks that will be touched by the predicates and thus derives the costs.
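For instance, a unique-index lookup through an index with Bl = 3 levels costs 3 + 1 = 4 block visits: three index blocks on the way down plus one table block.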

Sort Costs

Certain SQL operations require a sort of the rows to be performed. This may be explicit in the SQL statement, e.g. when an ORDER BY or GROUP BY clause is used, or implicit, e.g. when a sort-merge join is done to join two tables. No matter what triggers the sort, the kernel uses up system resources to perform it, and these need to be costed. Depending on how much data is to be sorted, the kernel may be able to complete the sort in memory. This depends on the setting of the initialization parameter SORT_AREA_SIZE. If the data to be sorted does not fit in the SORT_AREA_SIZE buffer, the kernel splits the sort into manageable segments and does smaller sorts, writing the results to disk in areas called temporary segments.

When all segments are sorted, a final merge pass is done. In estimating sort I/O costs, we first determine the size of the data based on average row lengths. We then ascertain whether this fits in SORT_AREA_SIZE; if it does, no I/Os are involved. Otherwise the I/Os are estimated based on the writes to and reads from temporary segments. Multi-block I/O capability is considered in the calculations, which also account for the work done in the final merge pass.
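SORT_AREA_SIZE can also be adjusted at the session level; a sketch (the 1 MB figure is arbitrary):

alter session set sort_area_size = 1048576;  -- 1 MB of in-memory sort space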

Join Processing

In a query involving more than one table, join processing determines the most optimal plan to perform the join. This comprises:

  1. Join cardinality
  2. Enumerating join orders
  3. Evaluating the costs associated with each join path under each join method.

Join Cardinality

Join cardinality is the determination of the number of rows that will make up the joined relation. In the worst case, there are no join predicates and the cardinality is the Cartesian product of the two or more tables involved. But typically, when two or more tables participate in a join, the join is based on the values of some columns. Using the statistical descriptions of the columns and the number of rows in each table, we calculate the cardinality of the joined relation.

Join Orders

A "Join Order" is a particular permutation of ordering the access to tables participating in the join to complete the joined relation. Join orders depend on the type of join or the join topology. Depending on the structuring of the join predicates, joins can be classified into various types viz chain, star, cycle, complete graph, etc. For example, consider the query with a three table join -

SELECT ... FROM t1,t2,t3 WHERE

t1.col1 = t2.col1 AND

t2.col1 = t3.col1 ;

Tables t1, t2 and t3 are joined in a chain order.

Tables t1, t2 and t3 can be joined in n! (n factorial) ways, where n is the number of tables. In this case that is 3! = 3*2*1 = 6.

Possible join orders:

t1->t2->t3, t2->t1->t3, t2->t3->t1, t3->t2->t1

t1->t3->t2, t3->t1->t2

Note that join orders such as t1->t3->t2 and t3->t1->t2 are evaluated using a Cartesian product, as there is no join predicate specified between t1 and t3.

Each of the join orders is possible and needs to be evaluated for resource usage. As the number of tables increases, depending on the join topology, the search space (the number of possible permutations) goes up steeply. Several techniques are used to prune the search space and reduce the work involved in identifying the best join order.

Evaluating Join Path Costs

For each join order, the optimizer evaluates the cost of performing the join by breaking the order into pairs. The first part is made up of the relations already joined, and the second part is the next table to be joined. Costs are considered using both join methods currently implemented in the kernel: the sort-merge join method and the nested-loops join method.

In the sort-merge method, the tables participating in the join are first sorted by the join columns. The sorted relations are then joined. The cost of performing join by this method includes the sorting costs and the cost of retrieving sorted records to perform the join.

In the nested-loops method, we scan the outer table one row at a time and, for each row retrieved, access the inner table based on the join columns. Where more than two tables are joined, the outer table is itself a joined relation. The cost of performing the join in this method includes the cost of accessing the outer table plus, for each row retrieved, the cost of fetching the matching group of rows from the inner table based on the join columns.

For each table in a particular join order, there is a large number of possible access paths. The table can be accessed using a ROWID or by doing a table scan. Alternatively, it may be accessed using a single index, a cluster scan, or a merge of several single-column nonunique indexes.

Several rules are used to narrow down or prune the search space to speed up access path selection. For example, if there is a lookup by ROWID, it is considered the best, and other access paths open to the table are not evaluated.

Selecting the Execution Plan

As the optimizer evaluates the cost of each plan, it compares it with the best-cost plan seen so far. It keeps doing this until it arrives at the best of the range of plans it considers. If the plan selected is not the "ideal" plan based on the user's knowledge of the data, the user can force a different plan by using "Hints".

The rule-based optimizer can be influenced in many ways. The query could be rearranged with the tables ordered differently to force a certain join order, functions could be put around indexed columns to avoid using an index, etc. These techniques were developed by users over a period of time and were valuable in achieving the desired performance.

How to Obtain Tracing of Optimizer Computations (EVENT 10053)


Occasionally, support may wish to examine the internal decisions made by the Cost Based Optimizer (CBO) and will request a trace of optimizer decisions using event 10053. Two methods of gathering this trace are illustrated below.
N.B. This event has no impact on queries optimized by the Rule Based Optimizer (RBO).
Also, make sure that MAX_DUMP_FILE_SIZE=UNLIMITED, or the trace file may be truncated.

Method 1

Ensure that a PLAN_TABLE exists in the schema of the user that will be used to trace the query. If the PLAN_TABLE does not exist, it can be created by running the utlxplan.sql script, which resides in rdbms/admin under the Oracle home ($ORACLE_HOME/rdbms/admin/utlxplan.sql on Unix systems).

Connect to Oracle using SQL*Plus as the appropriate user and issue the following series of commands:

SQL> alter session set max_dump_file_size = unlimited;

SQL> ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

Session altered.

SQL> EXPLAIN PLAN FOR --SQL STATEMENT--;

Explained.

SQL> exit


Method 2 (using binds)


Connect to Oracle using SQL*Plus as the appropriate user and issue the following series of commands. Set up as many binds as are needed by the query:

SQL> ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

Session altered.

-- set up binds.

SQL> variable a number

SQL> variable b varchar2(10)

-- assign values to binds

SQL> begin

2 :a:=20;

3 :b:='CLERK';

4 end;

5 /

SQL> select empno, ename, mgr

2 from emp

3 where deptno = :a

4 and job = :b

5 /

SQL> exit


When using method 2 you need to ensure that the statement is reparsed. Do this either by changing the formatting of the SQL (add spaces, make some characters uppercase) or by adding a comment to the SQL, e.g.:

SQL> select /* 10053 trace #1 */


Finding the trace file


In either case, a trace file will be generated in the USER_DUMP_DEST location (the UDUMP directory).

To identify the correct trace file, search for the relevant --SQL STATEMENT--.
This will be followed by a section headed "PARAMETERS USED BY THE OPTIMIZER", e.g.:

*** SESSION ID:(15.7070) 2003-01-07 12:10:11.308

QUERY

SELECT * FROM EMP

***************************************

PARAMETERS USED BY THE OPTIMIZER

********************************

OPTIMIZER_FEATURES_ENABLE = 8.1.7

OPTIMIZER_MODE/GOAL = Choose

OPTIMIZER_PERCENT_PARALLEL = 0

HASH_AREA_SIZE = 10240

HASH_JOIN_ENABLED = TRUE

HASH_MULTIBLOCK_IO_COUNT = 16

...

Explanation of the PLAN_TABLE requirement:

For a 10053 trace to be produced, the query must be using the CBO and must be reparsed with the event in place. The PLAN_TABLE is not actually required for this trace to work. It is only there to facilitate the EXPLAIN PLAN command. EXPLAIN PLAN is used because it forces a reparse of the statement. EXPLAIN PLAN will fail without the presence of the PLAN_TABLE.

Explanation of the behaviour when using binds:

Note that EXPLAIN PLAN and SQL*Plus have limitations in the way they treat certain bind types. With bind variables in general, the EXPLAIN PLAN output might not represent the real execution plan. Additionally, SQL*Plus does not support DATE bind variables. If using binds, use method 2, but bear in mind this might not give exactly the same execution plan as when the SQL is run from within your application.

  1. In a cost-based optimization strategy, multiple execution plans are generated for a given query, and an estimated cost is computed for each plan. The optimizer chooses the plan with the lowest estimated cost.
  2. The cost-based optimizer depends on statistics for the tables: if statistics are not present Oracle will use the rule-based optimizer, and the cost-based optimizer if they are.
  3. The danger with the "choose" optimizer mode is in cases where one Oracle table in a complex query has statistics and the other tables do not.
  4. You can collect the statistics of a table by issuing the command: SQL> analyze table emp compute statistics;
  5. The cost-based optimizer concentrates on CPU time and I/O time, and it will try to choose whichever is better: using an index or a full table scan.
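Statistics can also be gathered with the DBMS_STATS package (available from Oracle 8i onward), which Oracle generally recommends over ANALYZE for CBO statistics; a minimal sketch:

-- gather statistics for SCOTT.EMP; cascade => true also gathers index statistics
begin
  dbms_stats.gather_table_stats(ownname => 'SCOTT', tabname => 'EMP', cascade => true);
end;
/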

HINTS

SQL tuning with hints

  1. Although the optimizer is incredibly accurate at choosing the correct optimization path and use of indexes for thousands of queries on your system, it is not perfect.
  2. Oracle provides hints that you can specify for a given query to override the optimizer, hopefully achieving better performance for that query.
  3. There are several hints that can be directly embedded into Oracle SQL.
  4. These hints serve the purpose of changing the optimizer's path to the data.
  5. Remember, hints override all settings for optimizer mode.
  6. Multiple hints are separated with a space between each.
  7. The cost-based optimizer will not alert you if you have a syntax error in your hints.
  8. Incorrect hint syntax leads to the hint being interpreted as a comment.
  9. If an alias is used, the alias must be used in the hint or it will not work.

The top hints used are listed below (a usage sketch follows the list):

  1. Index
  2. Ordered
  3. Parallel
  4. First_rows
  5. Rule
  6. Full
  7. Leading
  8. Use_nl
  9. Use_merge
  10. Append
  11. Use_hash

Comparisons of the primary join methods and the database parameters that affect them.

This is a quick reference that describes the differences between the primary join
methods and the init.ora parameters that impact their performance.

CATEGORY: NESTED LOOPS

Optimizer hint: USE_NL(Inner_Table/Alias)

When we can use it: any join

Resource concerns: CPU, disk I/O

Init.ora parameters: DB_BLOCK_BUFFERS, OPTIMIZER_INDEX_COST_ADJ, OPTIMIZER_INDEX_CACHING

Features: Efficient when the driving table has a small number of rows and the lookup table has efficient access methods. Normally returns the first row faster than a SORT MERGE or HASH join.

Drawbacks: Very inefficient when indexes are missing or when the indexed criteria are not very selective.

CATEGORY: SORT MERGE JOIN

Optimizer hint: USE_MERGE(Inner_Table/Alias)

When we can use it: any join

Resource concerns: temporary segments

Init.ora parameters: SORT_AREA_SIZE, DB_FILE_MULTIBLOCK_READ_COUNT

Features: Better than NESTED LOOPS when an index is missing or the search criteria are not very selective. Can work with limited memory.

Drawbacks: Requires a sort of both tables; sorts are expensive. It is built for optimal throughput and does not return the first row until all rows are found.

CATEGORY: HASH JOIN

Optimizer hint: USE_HASH(Inner_Table/Alias)

When we can use it: only with the CBO; the kind of join operation (equijoin, outer join) where a hash join can be used depends on the Oracle version.

Resource concerns: memory and temporary segments

Init.ora parameters: HASH_JOIN_ENABLED, HASH_AREA_SIZE

Features: Better than NESTED LOOPS when an index is missing or the search criteria are not very selective. Usually faster than a SORT MERGE.

Drawbacks: Can require a large amount of memory for the hash table to be built. Does not necessarily return the first row quickly.
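A hedged sketch of forcing each join method with the hints above, using the usual emp/dept demo tables:

select /*+ use_nl(e) */ d.dname, e.ename from dept d, emp e where d.deptno = e.deptno;

select /*+ use_merge(e) */ d.dname, e.ename from dept d, emp e where d.deptno = e.deptno;

select /*+ use_hash(e) */ d.dname, e.ename from dept d, emp e where d.deptno = e.deptno;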

Explanation of join related initialization parameters

The performance of SORT MERGE joins and HASH joins is strongly impacted by certain initialization parameters. Join performance can be crippled if these parameters are not set properly. Nested loop operations normally do not require particular parameters other than for general tuning purposes.

NESTED LOOP join

Nested loop joins are useful when small subsets of data are being joined and if the join condition is an efficient way of accessing the second table.

o OPTIMIZER_INDEX_CACHING modifies the behavior of the Cost Based Optimizer (CBO) to favor nested loops joins and IN-list iterators.

o OPTIMIZER_INDEX_COST_ADJ adjusts the cost of index use.
In 8, 8i and 9i the default value seems to be too high, so the CBO tends to choose a full table scan instead of an index scan.

SORT MERGE join

Sort merge joins can be used to join rows from two independent sources. Hash joins generally perform better than sort merge joins. On the other hand, sort merge joins can perform better than hash joins if both of the following conditions exist:

· The row sources are sorted already.

· A sort operation does not have to be done.

However, if a sort merge join involves choosing a slower access method (an index scan as opposed to a full table scan), then the benefit of using a sort merge might be lost.
Sort merge joins are useful when the join condition between two tables is an inequality condition (but not a nonequality) like <, <=, >, or >=. Sort merge joins perform better than nested loop joins for large data sets. (You cannot use hash joins unless there is an equality condition).

· DB_FILE_MULTIBLOCK_READ_COUNT specifies how many blocks Oracle should read at a time from disk when performing a full table scan. Since SORT MERGE joins often involve full table scans, setting this parameter will reduce overhead when scanning large tables. On the other hand, high values of DB_FILE_MULTIBLOCK_READ_COUNT make index access look more expensive to the optimizer.

· SORT_AREA_SIZE specifies how much memory can be used for sorting, and this has a strong impact on the performance of all sorts. Since SORT MERGE joins require sorting of both row sources, SORT_AREA_SIZE strongly impacts SORT MERGE join performance. If an entire sort cannot be completed in the amount of memory specified by this parameter, then a temporary segment is allocated: the sort is performed in memory one part at a time, and partial results are stored on disk in the temporary segment. If SORT_AREA_SIZE is set very small, then excessive disk I/O will be necessary to perform even the smallest of sorts. If it is set too high, the OS may run out of physical memory and resort to swapping.

HASH join

Hash joins are used for joining large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.
This method is best used when the smaller table fits in available memory. The cost is then limited to a single read pass over the data for the two tables.
However, if the hash table grows too big to fit into the memory, then the optimizer breaks it up into different partitions. As the partitions exceed allocated memory, parts are written to temporary segments on disk.

· HASH_JOIN_ENABLED dictates whether the optimizer should consider using HASH joins. If you do not want HASH joins to be used, set this parameter to FALSE.

· HASH_AREA_SIZE specifies how much memory can be used to build a hash table for a HASH join, and resembles the SORT_AREA_SIZE parameter. If this parameter is set too small, then partial hash tables will need to be stored in temporary segments. If it is set too big, then physical memory may be exhausted. As with SORT_AREA_SIZE, HASH_AREA_SIZE indicates how much memory can be used per session; many concurrent sessions can consume a lot of memory.
The default value of HASH_AREA_SIZE is 2 * SORT_AREA_SIZE.

General rules for tuning SQL statements

  1. Tune SQL statements by adding indexes.
  2. One of the most common techniques for removing an unwanted full table scan is to add a new index to a table.
  3. There are two special cases of indexes that are especially useful:
  4. Function-based indexes: whenever a SQL query must apply a function to an indexed column, a function-based index can remove the full table scan.
  5. Bitmap indexes: an index on a low-cardinality column (few distinct values) can give better response.
  6. Avoid the use of "not in" or "having" in the where clause; instead use the "not exists" clause (see the sketch after this list).
  7. Never specify numeric values in character form, or character values in numeric form; this invalidates the index and causes full table scans.
  8. Avoid specifying NULL in an indexed column.
  9. Avoid using "like" if "=" will suffice; using any Oracle function on an indexed column will invalidate the index, causing a full table scan.
  10. Avoid using subqueries when a join will do the job.
  11. Avoid literal values in SQL statements; use bind variables.
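A hedged sketch of the rewrite from item 6, using the usual emp/dept demo tables (the two forms are equivalent when the join column has no NULLs):

-- original: NOT IN subquery
select d.dname from dept d
where d.deptno not in (select e.deptno from emp e);

-- rewrite: correlated NOT EXISTS
select d.dname from dept d
where not exists (select 1 from emp e where e.deptno = d.deptno);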

SQL tuning process

There are several steps that are repeated until all major SQL is tuned:

  1. Locate offensive and high impact SQL statements
  2. Extract the offensive SQL syntax
  3. Explain the SQL to get the execution plan
  4. Tune the SQL with indexes and/or hints and/or joins
  5. Make the tuning permanent by changing the SQL source

Process

  1. Get the list of SQL statements from the V$SQLAREA view (see the query sketch after this list).
  2. The "executions" column of the V$SQLAREA view can be used to locate the most frequently used SQL.
  3. Explain plan the top 10 SQL statements to see their execution plans (get the execution plan using the methods above: TKPROF or EXPLAIN PLAN).
  4. For those SQL statements that possess a non-optimal execution plan, the SQL will be tuned by one of the following methods:

a) Add SQL hints to modify execution plan (See the Hints)

b) Add B-tree indexes to remove full table scans

c) Add function based indexes if the column in the where clause is using any Oracle function

d) Adding bitmapped indexes to all low-cardinality columns that are mentioned in the where clause of the query

e) Examine the join methods used in the original query. Use appropriate hints to force the join methods that are not being used (i.e. USE_NL, USE_HASH, USE_MERGE).

f) Rewrite the SQL in PL/SQL

g) Change the SQL source to make the changes permanent

h) Remove literal variables if possible from query or use CURSOR_SHARING=SIMILAR/FORCE
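For steps 1 and 2, a starting query might look like this (sql_text, executions, disk_reads and buffer_gets are standard V$SQLAREA columns):

-- list statements by physical reads; the most expensive SQL floats to the top
select sql_text, executions, disk_reads, buffer_gets
from v$sqlarea
order by disk_reads desc;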

TIPS

  1. Query V$SQLAREA and V$SQL to find problem queries that need to be tuned.
  2. When a small number of rows are to be returned based on a condition in a query, you generally want to use an index on that condition (column), provided the rows are not skewed within the individual blocks.
  3. You will have a lot of poor queries if you are missing indexes on columns that are generally restrictive. Building indexes on restrictive columns is the first step toward better system performance.
  4. If you are forced to use literal SQL statements, then use the parameter CURSOR_SHARING=SIMILAR/FORCE.
  5. Use the special indexes wherever necessary.
  6. When a query is run multiple times in succession, it becomes faster because the data is now cached in memory. Sometimes people are tricked into believing that they have made a query faster, when they are actually accessing data stored in memory.
  7. For large tables with concatenated indexes, the index skip-scan feature can provide quick access even when the leading column of the index is not used in a limiting condition.
  8. The values of the indexed columns are stored in the index. For this reason, you can build composite indexes that satisfy a query without accessing the table at all, eliminating the need to go to the table to retrieve the data and reducing I/O.
  9. Don't use bitmap indexes in heavy OLTP environments.
  10. Consider using index-organized tables for tables that are always accessed using exact matches or range scans on the primary key.
  11. If you have a limited number of disks and large concurrent sequential loads to perform, reverse-key indexes may be a viable solution.
  12. For function-based indexes to be used by the optimizer, you must set the QUERY_REWRITE_ENABLED initialization parameter to TRUE (see the sketch after this list).
  13. Bad indexes can cause as much trouble as forgetting to use indexes on the correct columns. Although Oracle's cost-based optimizer generally suppresses poor indexes, problems can still develop when a bad index is used at the same time as a good index.
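A hedged sketch of tip 12; the index name is illustrative:

-- required for the optimizer to consider function-based indexes
alter session set query_rewrite_enabled = true;

-- index the function result so the predicate below can use an index scan
create index emp_upper_ename_idx on emp (upper(ename));

select empno, ename from emp where upper(ename) = 'SMITH';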
