Log directory structure in Oracle RDBMS
In a 10g RAC installation, a node consists of a CRS home and an Oracle RDBMS home. The RDBMS home is where you install the Oracle software with the RAC option. You create databases from this home. The log files are related to each of the database instances running out of this Oracle home. The most important log file that every database generates is the alert.log file that is created in the directory specified by the init.ora parameter background_dump_dest. For most sites, this directory is kept under $ORACLE_BASE/admin.
Suppose, for example, that you have a database named test. All its background process trace files, along with the alert.log, would be available in the $ORACLE_BASE/admin/test/bdump directory; that's the norm. Its not a hard and fast rule that you use this directory structure, but it does help to follow conventions.
These directories are also important for the RDBMS home:
- $ORACLE_BASE/admin/udump Contains any trace file generated by a user process.
- $ORACLE_BASE/admin/cdump Contains core files that are generated due to a core dump in a user process.
Let's take a closer look at the startup of an instance in a two-node RAC cluster running version 10.1.0.4. We use SQL *Plus to issue startup on node1 and then on node2. The sequence of events follows with additional explanations added as and when required:
SQL> startup nomount; Cluster communication is configured to use the following interface(s) for this instance 192.168.0.1 Mon Aug 29 07:25:09 2005 cluster interconnect IPC version:Oracle UDP/IP IPC Vendor 1 proto 2 Version 1.0 PMON started with pid=2, OS id=31242
The correct interconnect must be used for the Cache Fusion traffic. Some may choose the public network for the interconnect traffic, but doing so will bring the database to its knees. To identify the network used for Cache Fusion or private traffic, you can do any of the following:
$ oifcfg getif SQL> select INST_ID,PUB_KSXPIA,PICKED_KSXPIA,NAME_KSXPIA,IP_KSXPIA from x$ksxpia; INST_ID P PICK NAME_KSXPIA IP_KSXPIA ---------- - ---- --------------- ---------------- 1 OCR prod1 192.168.0.1
Depending on the source of the information, the PICKED_KSXPIA is populated with the values OSD, OCR, and CI. If the cluster_interconnects parameter is set in the SPFILE, the query will return the following output:
SQL> select INST_ID,PUB_KSXPIA,PICKED_KSXPIA,NAME_KSXPIA,IP_KSXPIA from x$ksxpia; INST_ID P PICK NAME_KSXPIA IP_KSXPIA ---------- - ---- --------------- ---------------- 1 CI prod1 192.168.0.1
Another option is the conventional method of finding the interconnect information from an Interprocess Communications (IPC) dump. This was the only method available in Oracle versions prior to 10g, but this method is also possible in Oracle 10g. Starting from 10g R2, X$KSXPIA is exposed as GV$CONFIGURED_INTERCONNECTS.
Log in to the database as user SYS:
SQL> oradebug setmypid SQL> oradebug ipc
This will dump a trace file to user_dump_dest. The output will look something like this:
SSKGXPT 0x1a2932c flags SSKGXPT_READPENDING info for network 0 socket no 10 IP 192.168.0.1 UDP 43749 sflags SSKGXPT_WRITESSKGXPT_UP info for network 1 socket no 0 IP 0.0.0.0 UDP 0...
You can see that we are using IP 192.168.0.1 with a User Datagram Protocol (UDP).
To change the network that RAC uses, you can change the order of the network(s) in the operating system–dependent network configurations, such as /etc/hosts or by using the cluster_interconnects parameter.
A little caution is required while using the cluster_ interconnects parameter. With Oracle 10g, interconnect information is stored in OCR, so you don't need to specify the interconnect using the cluster_interconnects parameter. Although this parameter supports load balancing among the specified interfaces, it does not provide failover capabilities, and in case one of several interfaces goes down, Oracle will treat it as a complete failure and start evicting instances. You can still use cluster_ interconnects with one IP address—for example, to work around a bug in the network selection or to use a specific NIC for a particular instance for test purposes.
The moment the first instance is started, its alert log gets populated with important information. The first thing to notice in the preceding example after the list of non-default parameters is the IP address used for the interconnect. On the first instance, mentioning the cluster_interconnects parameter is necessary so that the correct interface is picked up—due to a bug that doesn't pick up the correct interface for the interconnect. Also shown is the protocol used—in our case, UDP. The value shown would depend on your platform-specific usage—RSM could be used on Sun, HyperFabric on HP, and so on. Ensure that you see the correct protocol that you have configured for the interconnect usage. The supported interconnect protocols with respect to various operating systems are listed in Chapter 3.
Mon Aug 29 07:25:11 2005 lmon registered with NM - instance id 1 (internal mem no 0) Mon Aug 29 07:25:11 2005 Reconfiguration started (old inc 0, new inc 1) List of nodes: 0 Global Resource Directory frozen Update rdomain variables Communication channels reestablished Master broadcasted resource hash value bitmaps Non-local Process blocks cleaned out Mon Aug 29 07:25:11 2005 LMS 1: 0 GCS shadows cancelled, 0 closed Mon Aug 29 07:25:11 2005 LMS 0: 0 GCS shadows cancelled, 0 closed Set master node info Submitted all remote-enqueue requests Dwn-cvts replayed, VALBLKs dubious All grantable enqueues granted Post SMON to start 1st pass IR Mon Aug 29 07:25:11 2005 LMS 0: 0 GCS shadows traversed, 0 replayed Mon Aug 29 07:25:11 2005 LMS 1: 0 GCS shadows traversed, 0 replayed Mon Aug 29 07:25:11 2005 Submitted all GCS remote-cache requests Post SMON to start 1st pass IR Fix write in gcs resources Reconfiguration complete LCK0 started with pid=17, OS id=31272
When an instance starts up, it's the Lock Monitor's (LMON) job to register with the Node Monitor (NM). That's what we see in the alert.log with the instance ID that is getting registered. When any node joins or leaves a cluster, the global resource directory undergoes a reconfiguration event. We see the start of the reconfiguration event along with the old and new incarnation. Next, we see the number of nodes that have joined the cluster. As this was the first node to be started up, in the list of nodes we see only one node listed, and the number starts with 0. A reconfiguration event is a seven-step procedure and upon completion the "reconfiguration complete" message is logged into the alert.log.
The messages logged in the alert.log are summaries of the reconfiguration event. The LMON trace file would have more information about the reconfiguration. Following are the contents of the LMON trace file:
*** 2005-08-29 07:25:11.235 kjxgmrcfg: Reconfiguration started, reason 1 kjxgmcs: Setting state to 0 0.
Here, you can see the reason for the reconfiguration event. The most common reasons would be 1, 2, or 3. Reason 1 means that the NM initiated the reconfiguration event, as typically seen when a node joins or leaves a cluster. A reconfiguration event is initiated with reason 2 when an instance death is detected. How is an instance death detected? Every instance updates the control file with a heartbeat through its Checkpoint (CKPT) process. If heartbeat information is not present for x amount of time, the instance is considered to be dead and the Instance Membership Recovery (IMR) process initiates reconfiguration. This type of reconfiguration is commonly seen when significant time changes occur across nodes, the node is starved for CPU or I/O times, or some problems occur with the shared storage.
A reason 3 reconfiguration event is due to a communication failure. Communication channels are established between the Oracle processes across the nodes. This communication occurs over the interconnect. Every message sender expects an acknowledgment message from the receiver. If a message is not received for a timeout period, then a "communication failure" is assumed. This is more relevant for UDP, as Reliable Shared Memory (RSM), Reliable DataGram protocol (RDG), and Hyper Messaging Protocol (HMP) do not need it, since the acknowledgment mechanisms are built into the cluster communication and protocol itself.
When the block is sent from one instance to another using wire, especially when unreliable protocols such as UDP are used, it is best to get an acknowledgment message from the receiver. The acknowledgment is a simple side channel message that is normally required for most of the UNIX systems where UDP is used as the default IPC protocol. When user-mode IPC protocols such as RDG (on HP Tru64 UNIX TruCluster) or HP HMP are used, the additional messaging can be disabled by setting _reliable_block_sends=TRUE. For Windows-based systems, it is always recommended to leave the default value as is.
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE). Completed: alter database mount
Since this is an RAC database, every instance mounts the database in shared mode. Sometimes you want to mount the database in exclusive mode, as in completing the actions of a patch set application. Checking the alert log is one way to confirm that:
Mon Aug 29 15:36:53 2005 alter database open This instance was first to open Picked Lamport scheme to generate SCNs
You may see the message depending on the version of the RAC and setting of the parameter max_commit_propagation_delay. In version 9i, every commit System Commit Numbers (SCN) is broadcasted to all the nodes, and the log writer is held up until all the interested redos are written to the disk. Starting with 10g, the wait is greatly reduced as the broadcast and commit are asynchronous. This means the system waits until it is sure that all nodes have seen the commit SCN. Any message with an SCN greater than commit SCN is deemed sufficient.
Before doing a broadcast, the process checks whether it has already received a higher SCN from that instance. It used the same SCN to determine whether a foreground or an LMS has to be posted. With 10g, this is decoupled: an SCN to release foregrounds and an SCN needed for shipping buffers. The init.ora parameter _lgwr_async_broadcasts = true can be used to change the broadcast method.
Use the following table of contents to navigate to chapter excerpts or click here to view RAC Troubleshooting in its entirety.
Oracle Database 10g: Real Application Clusters Handbook
Home: Oracle RAC troubleshooting: Introduction
1: Oracle RAC: Log directory structure in cluster ready services
2:Oracle RAC: Log directory structure in Oracle RDBMS
3: Oracle RAC and the Lamport algorithm
4: Oracle RAC: ON and OFF
5: Oracle RAC: Database performance issues
6: Oracle RAC: Debugging node eviction issues
7: Oracle RAC: Member voting
8: Oracle RAC: Cluster reconfiguration steps
9: Oracle RAC: Debugging CRS and GSD using DTRACING
|About the book:|
|Oracle Database 10g: Real Applications Clusters Handbook Learn to implement Oracle real application clusters from the ground up. Maximize database availability, scalability, and efficiency. Find RAC concepts, administration, tuning, and troubleshooting information. You'll learn how to prepare and create Oracle RAC databases and servers, and automate administrative tasks. You'll also get full coverage of cutting-edge Oracle RAC diagnostic tools, backup and recovery procedures, performance tweaks and custom application design strategies. Buy this book atMcGraw-Hill/Osborne|
|About the author:|
|K Gopalakrishnan is a senior principal consultant with the Advanced Technology Services group at Oracle Corporation, specializing exclusively in performance tuning, high availability, and disaster recovery. He is a recognized expert in Oracle RAC and Database Internals and has used his extensive expertise in solving many vexing performance issues all across the world for telecom giants, banks, financial institutions, and universities.|