Wednesday, January 16, 2008

General Hints for Oracle Performance and tuning



SERVER H/W

CPU

  1. Ensure that the servers are SMP based
  2. let the server is scalable in terms of CPU, but there is no guarantee that increasing the number of CPU’s will enhance the performance
  3. Use the O/S utility to monitor the usage of the CPU and pin out where the exact the problem is?
  4. sar , use this command with different options, if the CPU is busy 90% then increase the number of CPU’s
  5. using TOP also we can find out whether the server is slow because of the number of CPU’s
  6. on mission critical databases where speed is a primary concern adding additional processors may not be the best solution it is recommended to upgrade to the faster server architecture

MEMORY

  1. lets ensure that server is not doing high swapping, one can not really stop the swapping, but can minimize the swapping
  2. use commands like “swap –l”, “vmstat” to monitor, if continuous swap is going on through out the day then go for increasing the RAM
  3. Occasional swapping may be tolerable continuous swapping should be avoided by increasing the amount of RAM on the server

DISK I/O

  1. the largest single component of ORACLE response time is disk I/O, any thing that we can do to reduce disk i/o will have a positive benefit on the performance of the database
  2. we concentrate more on increasing the I/O performance which should result over all performance of server by implementing the following things

I. use only ultra 320 SCSI controllers

II. Preferably with some buffer on the controller which can be used for “Read Ahead” and “Write Delayed” technologies

  1. Never connect a slower derive in the faster drive’s groups as the entire SCSI based width falls down to the speed of the lowest device on the chain
  2. Match the controller speed with drive’s speed which means when we have ultra 320 SCSI controller let all the drives also be ultra 320 drives
  3. though 15k rpm disk is more expensive when compare to 10k rpm disk, from the performance point of view 15k rpm disk can deliver the data faster
  4. the latest 15k drives are also offering buffer cache which can dramatically increase the I/O performance
  5. never buy larger drives instead get multiple small drives so that more spindles are being involved in providing the data hence better I/O performance
  6. Don't over load the controller with too many drivers, usually u-5 drives per controller are recommended.
  7. When the server has more data which required more drives than plan a having multiply SCSI controller of one.
  8. Monitor the disk waits, SCSI controller channel contention and the contention between the disk adapters.
  9. If possible implement the RAID with respect to the environment to enhance the performance.
  10. Instead of internal disks go far the external disk array, are consider the RAM cache on the disk array because it changes the basic nature of disk I/O.
  11. If there is a high amount of I/O then go far RAW devices which bypasses the I/O overhead related by the JFS ( Journal File System) in unix.
  12. Collect the I/O statistics against the disk and perform load satisfying at the disk level.
  13. Use the O/S utility iostat to collect the statistics.

CLUSTER

  1. Increasing the number of CPU's and the amount of RAM is not solving the problem, and then think of clustering the server.
  2. Implement the active/active cluster at the operating system level and implement the load balancing.
  3. Use the latest SAN which delivers the maximum throughput at each level i.e. HBA, controller and the channel.
  4. Think of the cache at every level in SAN which reduces the physical disk I/O controller

SERVER OPERATING SYSTEM TUNING

  1. If any particular file system uses beyond 85% immediately the performance decreases against that file system which indicates always used to maintain at least 15% free space
  2. Check the fragmentation using the command fsck -m , if the fragmentation is beyond 80% than re-organize the file system
  3. If possible implement disk quotas so that the usage of the file system will be under control
  4. Frequently use du command against all the users and implement cleaning procedures for bigger files or out dated files
  5. If the server is carrying a database which is I/O centric than instead of depending on F.S. may be we can implement RAW devices.
  6. Any un utilized service must be turned off use ps -ef command
  7. Where there are no users or in night time may be the server can made busy by executing the long time and huge resource consuming batch job such as backup, compress or move etc.
  8. Always take compress backup files on the tape to keep backup
  9. By default the O/S may not offer all the available resource to grab which can be modify by changing the kernel tunable parameter like “shmmax”, “shmseg” etc in most of the unix platform
  10. Optimize the unix kernel parameter to suit Oracle database as recommended by the Oracle
  11. Use the following performance tunable command is unix

sar, top, swap, vmstat, iostat, fsck , df -k, bdf, du, netstat, nfsstate, cachefsstat

CLIENTS

  1. On the client side ensure that there is no fragmentation.
  2. Do not use yesterday's CPU with today's software.
  3. upgrade the clients with respect to the server ( replace out dated with currents)
  4. Use browser cases clients
  5. As the H/W is not permanent, lease the clients machines instead of buying permanently

NETWORK

1. Make sure that the server is equipped with Gigabyte NIC

2. As the server is responsible for many clients the condition could be at network, so it is advisable to go for multiple NICs.

3. Let all the clients uses 100ms NICs.

4. Instead of hub using switches can enhance the performance at N/W level.

5. Do not cascade too many switches instead limit the member of switches with more ports.

6. Use some N/W monitoring program like HP open-view or Cisco works and ensures there are no packet losses too many collisions at N/W level.

  1. Use the “netstat” utility to monitor the N/W activity at O/S level

ORACLE INSTANCE TUNING

Shared pool tuning

  1. Execution plans are stored in Library Cache if user is executing SQL statement whose explain plan is not present in library cache, then LRU statement is flushed out from LC which is stored in LRU list and I/O is done for execution plan.
  2. Library Cache maintains a list of SQL context area, if it is full it will be flushed off

3. Whatever the SQL statement user issues the Oracle reduced that statement to ascii values on top of which hash function applied, that will generate on unique 8 digit value for the statement

SQL > select * from acct where custno=7359; # ascii hash 3514

SQL> select * from acct where custno=7359 # ascii 378 hash #29813

According to us the above statements are same but Oracle gives different ascii values

4. As all the explain plans are present in Library cache the I/O can be minimized by increasing size of the Library cache

5. If the get hit ration > 0.9 than the performance is good

6. Monitor the get hit ratio by using the view v$librarycache

7. If the get hit ratio is <>

8. Configure the large pool in MTS separately, Oracle stores all the user’s session variables for global verification in large pool

9. If the large pool is not configured than Oracle is going to eat shared pool size

10. Heavily used pl/Sql procedure can be saved permanently in library cache using dbms_shared_pool.keep procedure

11. Using the bind variables in the SQL statement is very helpful in increasing the performance

Ex: - select * from emp where empno=: b

12. The execution plan for the above SQL statement is saved in Library cache and always uses the same execution plan

13. Using literal variables will decrease the performance

Ex: - select * from emp where empno=735

14. The size of the shared pool is controlled by shared_pool_size

Tuning Database Buffer cache

  1. the purpose of the Oracle data buffers is to reduce Disk I/O. Disk I/O is the single largest component to performance
  2. when a SQL statement requests a row Oracle will first check the internal memory structures to see if the data is already in a data buffer, by caching the data blocks in RAM Oracle avoids doing unnecessary I/O
  3. When there is not enough room in the data buffer for the whole database, Oracle utilizes a LRU algorithm to determine which database pages are to be flushed from memory
  4. Starting from Oracle 8, Oracle provides three separates Pools of RAM with in the database buffer cache region to hold incoming Oracle data blocks I)Keep pool II) Recycle pool III) Default pool
  5. Use multiple DBWR to flush the data fast
  6. Small objects which are heavily used on daily basis can be cached permanently in the buffer. Ex: - like table DEPT having few rows can be cached
  7. calculate the cache hit ratio using the view v$sysstat, it should be always > 0.9, if it is <>
  8. Keep pool is used to hold tables that are frequently referenced by the application. This normally includes small tables that have frequent full table scan and reference tables for application.
  9. db_keep_cache_size is the parameter used for keep pool configuration
  10. Recycle pool is reserved for large table’s full scan, because Oracle data blocks from full table scans are unlikely to be reread, the recycle pool is used so that the incoming data blocks do not flush out data blocks from more frequently used tables and indexes
  11. the tables which are not more in used should be kept in recycle pool
  12. db_keep_recycle_pool is the parameter used for recycle pool
  13. once the table is used then the blocks are going to be flushed from recycle pool
  14. Default pool is used for all tables and index access that is not appropriate for the keep or recycle pool

Redo Buffer Tuning

  1. If the redo buffer is showing any wait events than increase the size of the log buffer
  2. the wait events can be monitor through v$sysstat
  3. If there is wait for LGWR than increase redo buffer

DATABASE TUNING

Regular maintenance

1. Taking care of segments, extents. Should not reach more than 20 to 30 extents as it decreases the performance.

2. Ensure every tablespace has at least 10% free space.

3. Make sure that there is no row migration and row chaining issue.

4. As the indexes get scattered leaves because of insert, delete and updates, It is a good idea to rebuild all the indexes periodically, at least once in a month depending up on the environment.

5. Always create tables and indexes in different tablespaces.

6. Properly size the undo tablespace and undo retention which suite the size of the transactions.

7. Every 3-4 month plan to reorganize the database depending upon the environment

8. Properly size the redo log files if it is very small there will be too much switches which will reduce the performance

9. Analyze all the tables at least once in a month depending up on the environment.

10. Increase the SGA size depending upon the usages of database.

11. Implement multiple database writes by which Oracle can write different blocks belonging to different disk simultaneously.

12. Monitor the Hit ratio should not be less than 90%.

13. Allocate enough sort area size and make size to minimize the sorting in disk (temp file) which will decrease the performance.

Key init.ora parameters

The crucial init.ora parameters in Oracle which has direct impact on the performance are as follows

  1. SGA_MAX_SIZE
  2. DB_CACHE_SIZE
  3. SHARED_POOL_SIZE
  4. PGA_AGREGATE_TARGET
  5. SORT_AREA_SIZE
  6. DB_FILE_MULTIBLOCK_READ_COUNT
  7. DB_BLOCK_SIZE

By properly sizing the above parameters we can achieve the better performance.

PARTITIONING

Advantages

  1. Partitions can be independently manages.
  2. Backup and recoveries can be done on individual partitions
  3. Partitions can be moved from one tablespace to another tablespace.
  4. Can be dropped, added or truncates independently.
  5. SUID (select, update, insert, delete) can be done independently at partition level instead of full tables level.
  6. The size of the temporary segments used for sorting can be dramatically reduced in case of partitions.
  7. We can load balance the partitions evenly across the available physical disks.
  8. Export/Import can be done at partitions level.
  9. Partitioning is a way to increase efficiency by accessing smaller pieces of table or indexes instead accessing full table or index.

Why

  1. Continued data availability for partitioned tables
  2. Data management operations with infinite maintenance window.
  3. Scalable performance with the high growth in data volume.
  4. Improved performance for parallel DML statements
  5. Avoiding disk contention by using partitions.

Where

  1. Heavily access huge tables/indexes
  2. Tables those are bigger than medium size.
  3. When there is disk contention at object level

Goal

To minimize disk I/O on a single large table, break the table into multiple partitions that reside in tablespace on different physical disks.

Actions

  1. Modify the partitions (physical attributes), such as pctfree, pctused etc.
  2. Rename the partitions
  3. Move partitions (different tablespace)
  4. Add partitions ( to the high end of a partitioned table)
  5. Drop partitions
  6. Split / Merge partitions
  7. Rebuild partitions
  8. Export Partitions

Choice

  1. Partitioned tables can have partitioned or non partitioned indexes
  2. A non partitioned table can have partitioned or non partitioned indexes

Types

a) Range partitions

1. Partitions by range, by defining the partition key (min one column, max 16 columns).

2. The key word "Value less than" must be specified as literal

3. Physical attributes for storage parameter can be set independently per partitions

4. Can not have gaps in the range

5. ‘Max Value’ allow possible values given to the highest partitions

6. Range partition is best suited when we have before hand how much data will be mapped into a given range.

7. Sizes of partition may differ

8. Partitions can be eliminated if necessary

9. Multicolumn range partitioning is the same as range partitioning is the same as range partitioning except you use multiple columns to define the ranges.

b) Hash Partitioning

  1. Hash partitioning is usually used when you are unsure where to put the break points as you do in the range partitioning.
  2. It breaks up data into the number of partitions specified based on the hash of the partition key specified.
  3. To get even distribution, you should always specify a multiple of 2 as the number of hash partitions.
  4. When you don't know how to breakup a task, but you know that it needs to be partitioned and spread out, use hash partitioning

c) List partitioning

  1. List partitioning allows you to assign the individual column values associated with each of the partitions.
  2. You can specify only one partitioning key in the column list, and it cannot be al LOB column. If the partitioning key is an object type column, you can partition only one attribute of the column type
  3. each partition value in VALUES clause must be unique among all the partitions of the table
  4. you can not list partition an index-organized table
  5. The string comprising the list of values for each partition can be up to 4KB.
  6. The total number of partition values for all partitions cannot exceed 64KB-1.
  7. you cannot sub partition a list partition
  8. MAXVALUES is not specified.

I/O TUNING

  1. There are many techniques that can be performed with in Oracle to reduce the amount of I/O.
  2. There are several database instance parameter that have a direct impact on lowering physical disk I/O.

3. Large DB_CACHE_SIZE, The greater the number of database buffers the smaller the chance the Oracle will need to perform disk I/O.

4. Larger DB_BLOCK_SIZE, The block size of the database has a dramatic effect on the amount of Disk I/O, as a general rule, the larger the block size the less the Disk I/O

5. Utilize multiple block sizes, the largest supported block size in Oracle 9i for your platform should be reserved for index tablespace, and smaller block sizes help the speed of bitmap index DML.

6. Multiple DBWR process allows for more efficient writing to the data files.

7. Large SORT_AREA_SIZE the greater the sort area size in RAM, the less disk sorting will take place in the temp tablespace.

8. Large online redo logs, the larger the online redo logs, the less frequently the log switches.

9. Inside the database, settings for tables and indexes can reduce physical disk I/O.

10. With the SQL statements, there are many settings to reduce disk I/O such as using special indexes, hints.

11. Stripping the data files of a tablespace on different physical disks.

12. Implementing the raid with respect to the environment.

13. Using the RAW devices to by pass the operating system.

14. Create the separate tablespaces for heavily access tables on separate disk.

15. Create the separate tablespaces for the tables and indexes on different disks

16. Using the external disk arrays instead of internal disks.

17. Using the locally managed tablespaces instead of dictionary managed tablespaces.

  1. Use the partitioning techniques for huge tables

OBJECT TUNING

1. Use the automatic segment space management for controlling the internal free blocks by Oracle automatically.

2. Create the tablespace with segment space management option as auto.

3. If using segment space management manual than assign the suitable values for the attributes pctfree and pctused.

4. The values for "pctfree" and "pctused" are heavily dependent upon row length and desired reserved space for row expansion.

5. Failure to set pctfree properly can lead to chained rows because not enough space on the data block is reserved for row expansion during on SQL update.

6. Failure to set pctused properly can lead to poor insert performance because not enough row space exists on a re-linked data block.

  1. Poor object performance with in Oracle is experienced in several areas

a) SLOW INSERTS: Insert operations sum slowly and have excessive I/O. This happens when blocks on the free lists. Only have room for a few rows before oracle is forced to grab another free block.

b) SLOW SELECTS: Select statements have excessive I./O because of chained rows. This occurs when rows "chain" and fragment on to several data blocks. Causing additional I/O to fetch the blocks.

c) SLOW UPDATES: Update statements run very slowly with double the amount of I/O. This happens when "update
operations expand a VARCHAR or BLOB column and oracle is forced to chain the row contents on to additional data blocks.

  1. Query the view dba_tables to see the "chaining".
  2. If there is any row chaining then reorganize the objects.
  3. There are several methods for reorganizing the tables such as, creating the new tablespace and moving the tables to the new tablespace, Export and import, Using CTAS (create table as select * from table)

11. Rebuild the indexes to reorganize the index.

12. Segregate the huge tables with high I/O into a different tablespace.

13. Split the tables in to different tablespaces whose data files are on different disks.

  1. Never set PCTINCREASE to any thing other than zero in dictionary-managed tablespaces
  2. if the indexes are relatively small in comparison to the overall size of the table, than the fast full scan may provide the performance burst necessary for the application. With concatenated indexes that contain most of the columns of a table, the index may be larger than the actual table, and the fast full scan could cause degradation in performance.

APPLICATION TUNING

  1. EXPLAIN PLAN

1. It is an utility which is useful for both DBA and application programmers.

2. This is pre execution, we are only asking oracle to offer us execution plan without really executing the statement.

  1. We create a table call PLAN_TABLE by running the script $ORACLE_HOME/netwrk/admin/utlxplan.sql

4. When we ask Oracle to explain plan for a given SQL statement, Oracle dumps the execution plan in the plan table.

5. Later we can visit the plan table to get the execution plan.

6. The only limitation is we need to have the SQL statement.

  1. The explain plan command allows us to view the query execution plan that the optimizer will use to execute SQL statement.
  2. By having the "Explain Plan" from the SQL statement we can get the information about the execution of the SQL statement that the optimizer is using which execution plan and how much it cost.

  1. TKPROF

1. This is useful whether we have the source code or not

2. Before submitting the SQL statement we enable SQL_TRACE=true at session level

3. When ever the SQL statements are submitted to the database, oracle is generating trace files in UDUMP directory.

4. This trace file is in binary, we use this utility TKPROF on top of the generated trace file and create an output file which is readable(text format).

5. This output file basically gives us the complete execution plan for the submitted SQL statement.

6. If the SQL statement is not available since we are using a shrink grab software package than we need to enable SQL_TRACE at instance level.

7. TKPROF is always post execution command

8. You have to set the parameter TIMED_STATISTICS=TRUE in init.ora

9. A trace query with a large number of physical reads usually indicates a missing index

  1. A traced query output with only memory reads usually indicates that an index is being used.

OPTIMIZATION

Optimizer executes SQL statements; there are two types of optimizer a) RULE BASED, b) COST BASED

RULE BASED OPTIMIZER

  1. OPTIMIZER_MODE=RULE, the first and the oldest mode is rule

2. Oracle uses heuristics from the data dictionary in order to determine the most effective way to service an oracle query and translate the declarative SQL command in to an actual navigation plan to extract the data.

  1. Oracle will execute the RULE_BASED optimizer if there are no statistics present for the table.

CHOOSE OR COST BASED OPTIMIZER

1. In a cost based optimization strategy multiple execution plans are generated for a given query, and then an estimated cost is computed for each plan. The optimizer chooses the plan with the lowest estimated costs.

2. Cost based optimizer depends upon the statistics of the tables, if the statistics are not present Oracle will execute rule based optimizer, or execute cost based optimizer if statistics are present.

3. The danger with the "chose" or "cost" optimizer is in cases when one oracle table in a complex query has statistics and the other tables do not have statistics.

4. You can collect the statistics if the table by issuing the command - SQL> analyze table emp compute statistics;

  1. Cost based optimizer will concentrate on CPU time and I/O time and it will try to choose which is better using index or full table scan.

HINTS

SQL tuning with hints

1. Although the optimizer in incredibly accurate at choosing the correct optimization path and use of indexes for thousands of queries on your system, it is not perfect.

2. Oracle provides hints that you can specify for a given query so the optimizer is over ridden, hopefully achieving better performance for a given query

3. There are several hints that can be directly embedded into oracle SQL.

4. These hints serve the purpose of changing the optimizer path to the data.

5. Remember, hints override all settings for optimizer mode.

6. Multiple hints are separated with a space between each.

7. The cost based optimizer will not alert you if you have a syntax error in your hints.

  1. Incorrect hints syntax leads to the hints being interpreted as comments.
  2. if an aliased is used, the alias must be used in the hint or it will not work

The top Hints used are

1. Index

2. Ordered

3. Parellel

4. First_rows

5. Rule

6. Full

7. Leading

8. Use_null

9. Append

10. Use_hash

General rules for tuning SQL statements

  1. Tuning SQL statements by adding indexes
  2. One of the most common techniques for removing the unwanted full table scan is to add a new index to a table.
  3. There are two special cases of indexes that are especially useful
  4. Function based indexes, when ever a SQL query must use a function to an indexed column, a function based index can remove full table scan
  5. Bitmap indexes, the index on the column with distinct values and low cardinality can give the better response.
  6. Avoid the use of "not in" or "having" in the where clause, instead use the "not exist" clause.
  7. Never specify numeric values in character from, and character values in numeric from this in validates the index and causes full tables scams.
  8. Avoid specifying NULL in an indexed column,
  9. Avoid using "like" parameter, if "=" will suffice, using any Oracle function will invalidate the index, causing a full table scan.
  10. Avoid using sub queries when a "join" will do the job
  11. Avoid literal variable in SQL statements.

SQL tuning process

There are several steps that are repeated until all major SQL is tunes

  1. Locate offensive and high impact SQL statements
  2. Extract the offensive SQL syntax
  3. Explain the SQL to get the execution plan
  4. Tune the SQL with indexes and / or hints
  5. Make the tuning permanent by changing the SQL source

Process

1. Get the lists of SQL statements from V$SQLAREA view.

  1. The "execution" column of the V$SQLAREA view can be used to locate the most frequently used SQL.
  2. Explain Plan for the top 10 SQL to see the execution plan
  3. For those SQL statements that posses a non optimal execution plan, the SQL will be tuned by one of the following methods.

a) Add SQL hints to modify execution plan

b) Add B-tree indexes to remove full table scans

c) Add function based indexes if the column in the where clause is using any Oracle function

d) Adding bitmapped indexes to all low-cardinality column that are mentioned in where clause of query

e) Rewrite the SQL in PL/SQL

f) Change the SQL source to make the changes permanent

g) Remove literal variables if possible from query or use CURSOR_SHARING=SIMILAR/FORCE

TIPS

  1. Query V$SQLAREA and V$SQL to find problem queries that need to be tuned
  2. When a small number of rows are to be returned based on a condition in a query, you generally want to use an index on that condition (column) given that the rows are not skewed with in the individual blocks.
  3. Your will have a lot of poor queries if you are missing indexes on columns that are generally restrictive. Building indexes on restrictive columns is the first step toward better system performance.
  4. if you are forced to use the literal SQL statement then use the parameter CURSOR_SHARING=SIMILAR/FORCE
  5. Use the special indexes where ever necessary.
  6. When a query is run multiple times in succession, it becomes faster because you have now cached the data in memory. Some times people are tricked into believing that they had made a query faster, when they are actually accessing data stored in memory.
  7. For large tables with concatenated indexes, the index skip-scan feature can provide quick access even when the leading column of the index is not used in a limiting condition.
  8. The values of the indexed columns are stored in an index. For this reason, you can build composite indexes that can be used to satisfy a query without accessing the table. This eliminated the need to go the table to retrieve the data, reducing I/O
  9. Don’t use bitmap indexes in heavy OLTP environments
  10. Consider using index-organized tables for tables that are always accessed using exact matches or range scans in the primary key.
  11. If you have a limited number of disks and large concurrent sequential loads to perform, reverse key indexes may be a viable solution
  12. For function based indexes to be used by the optimizer, you must set the QUERY_REWRITE_ENABLED initialization parameter to TRUE
  13. Bad indexes can cause as much trouble as forgetting to used indexes on the correct columns. Although Oracle’s cost-based optimizer generally suppresses poor indexes, problems can still develop when a bad index is used at the same time as a good index

Kill All, Pretty cool

This statement is written by my colleage Yousef Swiess (Database Consultant), it collects all the sessions from the database which are causing enques and the out put of this statement is ready to kill the sessions at OS(UNIX) level.

SQL> select 'kill -9 '||p.spid
from gv$process p, gv$session s, gv$session_wait sw
where (p.inst_id = s.inst_id and p.addr = s.paddr)
and (s.inst_id = sw.inst_id and s.sid = sw.sid)
and sw.sid in (select sw.sid
from gv$session_wait sw, gv$session s, gv$sqlarea sa
where sw.event not in
('rdbms ipc message','smon timer','pmon timer',
'SQL*Net message from client','lock manager wait for remote message',
'ges remote message', 'gcs remote message', 'gcs for action', 'client message',
'pipe get', 'null event', 'PX Idle Wait', 'single-task message',
'PX Deq: Execution Msg', 'KXFQ: kxfqdeq - normal deqeue',
'listen endpoint status','slave wait','wakeup time manager')
and seconds_in_wait > 0
and (sw.inst_id = s.inst_id and sw.sid = s.sid)
and (s.inst_id = sa.inst_id and s.sql_address = sa.address)
and s.username in ('USERNAME','USERNAME'))
and sw.inst_id=&n
/

write the username in the statement and after running this statement you have to give the parameter value for the instance id (inst_id)

Enjoy the easy administration.

Tuesday, January 15, 2008

Oracle Recommended RAC Architecture


The major physical components of Oracle RAC Architecture are:

  • Physical Hosts (nodes)
  • Shared storage system
  • High speed clusters interconnect
  • Cluster software
  • Oracle RDBMS

Physical Hosts

Physical hosts or the nodes are one of the major components where the Oracle instance resides. This is the place where data-actual database processing takes place. RAC database system provides scalability and High availability. In order to provide the high availability and maintain load balancing it is recommended to extend the number of nodes.

Most of the cluster frameworks now days differs in terms of the maximum number of nodes they can handle, most can support minimum 4 nodes in cluster and some can support hundreds.

The nodes themselves may or may not need to be scalable to provide additional capacity in terms of CPU or memory. Unless one uses an expensive SMP server, scalability will be an issue. The ability to scale both within a machine as well as across machines is often desirable.

Oracle RAC can be implemented on a wide range of servers from a clustered group of single CPU Windows boxes to a cluster of 32-CPU SUN E10000 boxes.

Linux machines are now able scale up to 8 CPUs, but the majority of systems are 2 or 4 CPU nodes. SMP scalability in Linux beyond 4 CPU’s is not will proven, so the current Oracle recommendation is to stick with 4-CPU machines..

At the same time, Oracle RAC can also run on platforms that allow sub-setting of CPU’s, such as the SUN E10000, E15000, and the HP Superdome. In the case of CPU sub-setting, the single server is divided into multiple nodes, each running an instance of Oracle 10g/9i RAC.

Shared Storage System

Shared storage is a critical component of an Oracle RAC environment. Traditionally, storage was attached to each individual server known as DAS. Today, more flexible storage that is accessible over a storage area networks (SAN) or regular Ethernet networks (NAS) is popular. These new storage options enable multiple servers to access the same set of disks through a network (FC-Switches or Ethernet), simplifying provisioning of storage in any distributed environment. SANs represent the evolution of data storage technology to this point.

In shared storage, database files should be equally accessible to all the nodes concurrently. Generic file systems do not allow disks to be mounted in more than one system. Generic UNIX file systems (UFS) do not allow the files to be shared among the nodes because of the obvious file locking (inode locks) issues and unavailability of a coherent file system cache. One option is to use network file system (NFS), but it is unsuitable as it relies on a single host (which mounts the file systems) and for performance reasons. Since the disks in such implementation are attached to one node, all the write requests must go that particular node, thus limiting the scalability and fault tolerance.

The choice of file system is critical for RAC deployment. Traditional file systems do not support simultaneous mounting by more than one system. Therefore, you must store files in either raw volumes with out any file system or on a file system that supports concurrent access by multiple systems.

Thus as of Oracle 10g RAC three major approaches exist for providing the shared storage needed by RAC:

Raw volumes: These raw devices require storage that operate in block mode such as Fiber Channel SANs or Internet SCSI (iSCSI).

Cluster file system

High Speed Cluster Interconnects

Configuring private network as recommended by Oracle

The cluster interconnect is a high bandwidth (preferably 1gigabit or more), low latency communication facility that connects each node to other nodes in the cluster and routes messages among the nodes.

In general, the clusters interconnect is used for the following high-level functions:

  • Monitoring health, Status, and Synchronous messages
  • Transporting Distributed lock manager (DML) messages
  • Accessing remote file systems
  • Moving application-specific traffic
  • Providing cluster alias routing

It is a communication path used by the cluster for the synchronization of resources and is also used in some cases for the transfer of data (cache fusion) from one instance to another instance. Typically, interconnect is a network connection is dedicated to the server nodes of a cluster (and thus sometimes refer to as a private interconnect) and has a high bandwidth and low latency.

It is important not to use the private network for the regular (public) user traffic, keep user traffic away from the private network otherwise cache fusion and other inter-instance activity will become backlogged, reducing the overall effectiveness of the cluster.

A redundant private network, that can replace the network carrying the cache fusion and other messages, is recommended to help avoid down time in the case the primary interconnect fails.

At network level, a failure in a NIC can cause an outage to the cluster, especially if the failures occurs at the interface on which the interconnect is configured. To achieve high availability at this layer, network teaming/bonding can be used.

Bonding offers the following benefits:

  • Bandwidth scalability Adding a network card doubles the network bandwidth. It can be used to improve aggregate throughput.
  • High availability Provides redundancy or link aggregation of computer ports.
  • Load balancing HP Auto Port Aggregation (APA) supports true load balancing and failure recovery capabilities and distributes traffic evenly across the aggregated links.
  • Single MAC address Because ports aggregated with HP APA share single, logical MAC address, there is no need to assign individual addresses to aggregated ports.
  • Flexibility Ports can be aggregated to achieve higher performance whenever network congestion occurs.

Interconnect Switch

The basic requirement of interconnect is to provide reliable communication between nodes, but this cannot be achieved by a crossover cable between the nodes. However, using a cross over cable as interconnect may be appropriate for development or demonstration purposes. Substituting a normal crossover cable is not officially supported in production RAC implementations for the following reasons:

  • Crossover cables do not provide complete electrical insulation between nodes. Failure of one node because of a short circuit or because of an electrical interference will bring down the surviving node.

  • Using crossover cables instead of a high-speed switch greatly limits the scalability of the clusters as only two nodes can be clustered using a crossover cable.

  • Failure of one node brings down the entire cluster as the cluster manager can not exactly detect the failed/surviving node. Had there been a switch during split-brain resolution, the surviving node can easily detect the heartbeat and take the owner ship of the quorum device and node failures can be easily detected.

  • Crossover cables do not detect split-brain situations as effectively as communication interface through switches. Split-brain resolution is the effective part in cluster management during communication failures.

The list of various interconnects used by the implementations based on the clusterware used and network hardware.

  1. Gigabit Ethernet
  2. Hyper Fabric
  3. Memory Channel
  4. SCI Interconnect
  5. Firelink interconnect

The following must be true for each private IP address:

  • It must be separate from the public network
  • It must be accessible on the same network interface on each node
  • It must have a unique address on each node

The private interconnect is used for internodes communication by both Clusterware and RAC. The private IP address must be available in each node's /etc/hosts file.

During Clusterware installation, the information you enter as the private IP address determines which private interconnects are used by Clusterware for its own communication. They must all be available, and capable of responding to a ping command.

Oracle recommends that you use a logical Internet Protocol (IP) address that is available across all private networks, and that you take advantage of any available operating system-based failover mechanism by configuring it according to your third-party vendor's instructions for using their product to support failover.