Tuesday, January 15, 2008

Oracle Recommended RAC Architecture


The major physical components of Oracle RAC Architecture are:

  • Physical Hosts (nodes)
  • Shared storage system
  • High-speed cluster interconnect
  • Cluster software
  • Oracle RDBMS

Physical Hosts

Physical hosts, or nodes, are one of the major components; they are where the Oracle instances reside and where the actual database processing takes place. A RAC database system provides scalability and high availability. To provide high availability and maintain load balancing, it is recommended to extend the number of nodes in the cluster.

Most cluster frameworks nowadays differ in the maximum number of nodes they can handle: most can support at least 4 nodes in a cluster, and some can support hundreds.

The nodes themselves may or may not need to be scalable to provide additional capacity in terms of CPU or memory. Unless one uses an expensive SMP server, scalability within a single node will be an issue. The ability to scale both within a machine and across machines is often desirable.

Oracle RAC can be implemented on a wide range of servers, from a clustered group of single-CPU Windows boxes to a cluster of 32-CPU SUN E10000 boxes.

Linux machines are now able to scale up to 8 CPUs, but the majority of systems are 2- or 4-CPU nodes. SMP scalability in Linux beyond 4 CPUs is not well proven, so the current Oracle recommendation is to stick with 4-CPU machines.
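As a quick sanity check when sizing nodes, on Linux you can see how many logical CPUs the kernel presents (a one-liner, for illustration only):

    $ grep -c ^processor /proc/cpuinfo    # count of logical CPUs visible to the kernel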

At the same time, Oracle RAC can also run on platforms that allow sub-setting of CPUs, such as the SUN E10000, E15000, and the HP Superdome. In the case of CPU sub-setting, a single server is divided into multiple nodes, each running an instance of Oracle 10g/9i RAC.

Shared Storage System

Shared storage is a critical component of an Oracle RAC environment. Traditionally, storage was attached to each individual server (direct attached storage, DAS). Today, more flexible storage accessible over a storage area network (SAN) or a regular Ethernet network (network attached storage, NAS) is popular. These newer options enable multiple servers to access the same set of disks through a network (FC switches or Ethernet), simplifying the provisioning of storage in any distributed environment. SANs represent the current stage in the evolution of data storage technology.

In shared storage, database files must be equally accessible to all nodes concurrently. Generic file systems do not allow disks to be mounted on more than one system. Generic UNIX file systems (UFS) do not allow files to be shared among nodes because of the obvious file locking (inode lock) issues and the unavailability of a coherent file system cache. One option is the network file system (NFS), but it is unsuitable because it relies on a single host (which mounts the file systems) and for performance reasons: since the disks in such an implementation are attached to one node, all write requests must go through that particular node, limiting scalability and fault tolerance.

The choice of file system is critical for RAC deployment. Traditional file systems do not support simultaneous mounting by more than one system. Therefore, you must store files either on raw volumes without any file system or on a file system that supports concurrent access by multiple systems.

Thus, as of Oracle 10g RAC, three major approaches exist for providing the shared storage needed by RAC:

  • Raw volumes: raw devices require storage that operates in block mode, such as Fibre Channel SANs or Internet SCSI (iSCSI). (A Linux setup sketch follows this list.)
  • Cluster file system: a file system that all nodes can mount concurrently, such as Oracle Cluster File System (OCFS); like raw volumes, it requires block-mode storage.
  • Automatic Storage Management (ASM): Oracle's own integrated volume manager and file system for database files, introduced in Oracle 10g.
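For illustration, here is how the raw-volume approach is typically wired up on a RHEL-style Linux; a minimal sketch, assuming /dev/sdb1 and /dev/sdc1 are shared LUNs visible to every node (device names are hypothetical):

    # /etc/sysconfig/rawdevices -- bindings applied by the rawdevices service at boot
    /dev/raw/raw1  /dev/sdb1
    /dev/raw/raw2  /dev/sdc1

    $ service rawdevices restart    # apply the bindings
    $ raw -qa                       # query all current raw bindings

The same bindings must exist on every node so that each instance sees the same devices.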

High Speed Cluster Interconnects

Configuring the private network as recommended by Oracle

The cluster interconnect is a high-bandwidth (preferably 1 Gigabit or more), low-latency communication facility that connects each node to the other nodes in the cluster and routes messages among them.

In general, the cluster interconnect is used for the following high-level functions:

  • Monitoring health, status, and synchronous messages
  • Transporting Distributed Lock Manager (DLM) messages
  • Accessing remote file systems
  • Moving application-specific traffic
  • Providing cluster alias routing

It is a communication path used by the cluster for the synchronization of resources, and in some cases for the transfer of data (cache fusion) from one instance to another. Typically, the interconnect is a network connection dedicated to the server nodes of a cluster (and thus sometimes referred to as a private interconnect), with high bandwidth and low latency.

It is important not to use the private network for regular (public) user traffic. Keep user traffic away from the private network; otherwise, cache fusion and other inter-instance activity will become backlogged, reducing the overall effectiveness of the cluster.
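To verify which interconnect an instance is actually using, Oracle 10g exposes this through the GV$CLUSTER_INTERCONNECTS view; a minimal SQL*Plus check:

    SQL> SELECT inst_id, name, ip_address, is_public
         FROM gv$cluster_interconnects;

If IS_PUBLIC shows YES for the interface carrying cache fusion traffic, the cluster is riding the public network and should be reconfigured.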

A redundant private network, which can take over carrying cache fusion and other messages, is recommended to help avoid downtime in case the primary interconnect fails.

At the network level, a failure of a NIC can cause an outage of the cluster, especially if the failure occurs on the interface on which the interconnect is configured. To achieve high availability at this layer, network teaming/bonding can be used (a Linux configuration sketch follows the list below).

Bonding offers the following benefits:

  • Bandwidth scalability: adding a network card doubles the network bandwidth; it can be used to improve aggregate throughput.
  • High availability: provides redundancy or link aggregation of computer ports.
  • Load balancing: HP Auto Port Aggregation (APA) supports true load balancing and failure recovery capabilities, and distributes traffic evenly across the aggregated links.
  • Single MAC address: because ports aggregated with HP APA share a single, logical MAC address, there is no need to assign individual addresses to aggregated ports.
  • Flexibility: ports can be aggregated to achieve higher performance whenever network congestion occurs.
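The APA points above are HP-UX specific; on Linux the equivalent is the bonding driver. A minimal active-backup sketch for a RHEL-style system, assuming eth1 and eth2 are the two private-interconnect NICs (device names and addresses are illustrative):

    # /etc/modprobe.conf -- load the bonding driver; fail over on link loss (miimon in ms)
    alias bond0 bonding
    options bond0 mode=active-backup miimon=100

    # /etc/sysconfig/network-scripts/ifcfg-bond0 -- logical interface carrying the private IP
    DEVICE=bond0
    IPADDR=192.168.0.1
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none

    # /etc/sysconfig/network-scripts/ifcfg-eth1 -- enslave the physical NIC (ifcfg-eth2 is analogous)
    DEVICE=eth1
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none

With active-backup mode, only one NIC carries traffic at a time and the other takes over on failure, matching the redundancy goal described above.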

Interconnect Switch

The basic requirement of the interconnect is to provide reliable communication between nodes, and this cannot be achieved with a simple crossover cable between the nodes. Using a crossover cable as the interconnect may be appropriate for development or demonstration purposes, but it is not officially supported in production RAC implementations, for the following reasons:

  • Crossover cables do not provide complete electrical insulation between nodes. A short circuit or electrical interference that takes down one node can bring down the other node as well.

  • Using crossover cables instead of a high-speed switch greatly limits the scalability of the cluster, as only two nodes can be clustered using a crossover cable.

  • Failure of one node can bring down the entire cluster, because without a switch the cluster manager cannot reliably determine which node failed and which survived. Had there been a switch, during split-brain resolution the surviving node could detect the heartbeat and take ownership of the quorum device, and node failures could be detected easily.

  • Crossover cables do not detect split-brain situations as effectively as communication through switches, and split-brain resolution is an essential part of cluster management during communication failures.

The interconnect used in a given implementation depends on the clusterware and the network hardware; common choices include:

  1. Gigabit Ethernet
  2. Hyper Fabric
  3. Memory Channel
  4. SCI Interconnect
  5. Firelink interconnect

The following must be true for each private IP address:

  • It must be separate from the public network
  • It must be accessible on the same network interface on each node
  • It must have a unique address on each node

The private interconnect is used for inter-node communication by both the Clusterware and RAC. The private IP address must be available in each node's /etc/hosts file.
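For illustration, a two-node setup might carry entries like the following in /etc/hosts (host names and addresses are hypothetical):

    # /etc/hosts -- identical entries on every node
    # public network
    10.1.1.101    racnode1
    10.1.1.102    racnode2
    # private interconnect (separate, non-routed network)
    192.168.0.1   racnode1-priv
    192.168.0.2   racnode2-priv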

During Clusterware installation, the information you enter as the private IP address determines which private interconnects are used by Clusterware for its own communication. They must all be available, and capable of responding to a ping command.
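A quick pre-installation check, run from every node against every other node's private name (using the hypothetical names above):

    $ ping -c 3 racnode2-priv    # from racnode1; repeat for each private address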

Oracle recommends that you use a logical Internet Protocol (IP) address that is available across all private networks, and that you take advantage of any available operating system-based failover mechanism, configured according to your third-party vendor's instructions for their product.
