Monday, December 17, 2012

So, this might be old news to most of anyone that reads this, but, despite my sage experience, I got caught out by this today.

First things first:

11.2.0.2.6 on AIX with 4 node RAC cluster.

I have a scheduler job that runs once an hour to gather statistics in the database.  The jobs calls a home grown procedure which in turn calls DBMS_STATS.GATHER_TABLE_STATS.  When this job runs, it uses a combination of logic and TABLE_PREFS to control how statistics are gathered.  Ultimately, it sets and passes the various values to DBMS_STATS.GATHER_TABLE_STATS.  In this particular instance, I was changing the way the job sets METHOD_OPT from "FOR ALL COLUMNS SIZE " to instead just use the stored TABLE_PREFS.  Before my change, my procedure passed in METHOD_OPT.  After the change, METHOD_OPT was no longer passed relying on the TABLE_PREF setting for each individual table.

The problem:  After I changed my procedure, the job still ran with the old procedure.  Hence, statistics were gathered the old way and not using the TABLE_PREFS.  I knew this because my procedure logs the code it executes for each individual table.

The question: Why did my job not execute my new code?  Does it not just call the underlying procedure? (OWNER.PACKAGE.PROCEDURE).

The solution:  When you create and enable a job, Oracle must "store" the procedure code internally.  This is the only thing I can think of.  So, to get your new procedure to run, you must simply DISABLE and ENABLE the scheduler job.

After doing this, my job executed the way I expected.

Let me know if you have a different experience with this.

Thursday, August 23, 2012

DBMS_AUDIT_MGMT Audit Purge Gotcha!

Auditing is an important part of database security.  Along with that is archiving/cleaning up auditing entries.  Oracle 11g enhanced our ability to automatically clean up audit entries by allowing us to set threshold and have the Oracle Scheduler clean them out with a purge job using the supplied DBMS_AUDIT_MGMT package.

I discovered this week that  there is something you need to be aware of when setting up automatic audit entry purging in an Oracle 11gR2 database.  The whole key to this post is that if your database has been cloned or your DBID has changed, the automatic purge job will not clean up all your audit entries.  You see, the automatic purge job uses the CURRENT DBID to select and remove audit entries based on your date threshold.  Therefore, if you have left over audit entries from when your database had a different DBID, they are skipped right over.  This baffled me for about 2 days off and on.  With the help of a 10046 Oracle trace, I figured out what was up.

Here's the situation:

First things first:

Oracle 11.2.0.2.5 on AIX 6.3
3 node RAC Cluster

Audit Setup - I may have omitted some steps that are not relevant to this example

SELECT table_name, tablespace_name
FROM dba_tables
WHERE table_name IN ('AUD$', 'FGA_LOG$')
ORDER BY table_name;

TABLE_NAME                     TABLESPACE_NAME
------------------------------ ------------------------------
AUD$                           SYSTEM
FGA_LOG$                       SYSTEM

Storage setup

Create new tablespace:

May need to specify datafile and sizing info. I am using ASM with OMF so that is not needed.

CREATE TABLESPACE ORACLE_AUDIT_DATA /* */;

BEGIN
 DBMS_AUDIT_MGMT.set_audit_trail_location(
  audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
  audit_trail_location_value => 'ORACLE_AUDIT_DATA');
END;
/

BEGIN
 DBMS_AUDIT_MGMT.set_audit_trail_location(
  audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_FGA_STD,
  audit_trail_location_value => 'ORACLE_AUDIT_DATA');
END;
/


Verify New Storage:

SELECT table_name, tablespace_name
FROM dba_tables
WHERE table_name IN ('AUD$', 'FGA_LOG$')
ORDER BY table_name;

TABLE_NAME                     TABLESPACE_NAME
------------------------------ ------------------------
AUD$                           ORACLE_AUDIT_DATA
FGA_LOG$                       ORACLE_AUDIT_DATA


 Auditing Property Configuration

Create a new audit file each day for OS files:

BEGIN
    DBMS_AUDIT_MGMT.SET_AUDIT_TRAIL_PROPERTY(
        audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_OS,
        audit_trail_property => DBMS_AUDIT_MGMT.OS_FILE_MAX_AGE,
        audit_trail_property_value => 1);
END;
/


Delete DB audit records in batches of 10000:

BEGIN
    DBMS_AUDIT_MGMT.SET_AUDIT_TRAIL_PROPERTY(
               audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
               audit_trail_property => DBMS_AUDIT_MGMT.DB_DELETE_BATCH_SIZE,
               audit_trail_property_value => 10000 /* delete batch size */);

    DBMS_AUDIT_MGMT.SET_AUDIT_TRAIL_PROPERTY(
               audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_FGA,
               audit_trail_property => DBMS_AUDIT_MGMT.DB_DELETE_BATCH_SIZE,
               audit_trail_property_value => 10000 /* delete batch size */);

END;
/


Check Audit config settings:

col parameter_name for a30
col parameter value for a20

SELECT *
FROM dba_audit_mgmt_config_params;

PARAMETER_NAME                 PARAMETER_VALUE      AUDIT_TRAIL
------------------------------ -------------------- --------------------
DB AUDIT TABLESPACE            ORACLE_AUDIT_DATA    STANDARD AUDIT TRAIL
DB AUDIT TABLESPACE            ORACLE_AUDIT_DATA    FGA AUDIT TRAIL
AUDIT FILE MAX SIZE            10000                OS AUDIT TRAIL
AUDIT FILE MAX SIZE            10000                XML AUDIT TRAIL
AUDIT FILE MAX AGE             1                    OS AUDIT TRAIL
AUDIT FILE MAX AGE             5                    XML AUDIT TRAIL
DB AUDIT CLEAN BATCH SIZE      10000                STANDARD AUDIT TRAIL
DB AUDIT CLEAN BATCH SIZE      10000                FGA AUDIT TRAIL
OS FILE CLEAN BATCH SIZE       1000                 OS AUDIT TRAIL
OS FILE CLEAN BATCH SIZE       1000                 XML AUDIT TRAIL


Set up Audit Clean up

Setup cleanup job in your schema

-- Grant required privileges

grant execute on DBMS_AUDIT_MGMT to ;
grant execute on dbms_system to ;


Initialize cleanup to run every 24 hours by default:

BEGIN

    DBMS_AUDIT_MGMT.init_cleanup(
    audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_ALL,
    default_cleanup_interval => 24 /* hours */);

END;
/


Create procedure:

CREATE OR REPLACE PROCEDURE .purge_db_audit_trails (
  keepmonthsback IN NUMBER DEFAULT 4)
AS

  curdate date;
  last_archtime date;
 
  /* This procedure will be called by. JOB_PURGE_DB_AUDIT_TRAILS job
     to manage the audit trail cleanup
     PRE-REQUISITE: requires object grants as SYS
     grant execute on DBMS_AUDIT_MGMT to ;
     grant execute on dbms_system to ;
  */

BEGIN

  curdate := SYSTIMESTAMP;
  last_archtime := add_months(curdate, -keepmonthsback);
 
    -- set last archive timestamp for both AUD$ and FGA$
    SYS.DBMS_AUDIT_MGMT.set_last_archive_timestamp (
        audit_trail_type => SYS.DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
        last_archive_time => last_archtime);

--    SYS.DBMS_AUDIT_MGMT.set_last_archive_timestamp (
--        audit_trail_type => SYS.DBMS_AUDIT_MGMT.AUDIT_TRAIL_FGA_STD,
--        last_archive_time => last_archtime);

       SYS.DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP(audit_trail_type  => SYS.DBMS_AUDIT_MGMT.AUDIT_TRAIL_OS,
                                             last_archive_time => last_archtime, --this says no audit records beyond date
                         rac_instance_number => 1);

       SYS.DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP(audit_trail_type  => SYS.DBMS_AUDIT_MGMT.AUDIT_TRAIL_OS,
                                             last_archive_time => last_archtime, --this says no audit records beyond date
                         rac_instance_number => 2);

       SYS.DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP(audit_trail_type  => SYS.DBMS_AUDIT_MGMT.AUDIT_TRAIL_OS,
                                             last_archive_time => last_archtime, --this says no audit records beyond date
                         rac_instance_number => 3);

--   SYS.DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP(audit_trail_type  => SYS.DBMS_AUDIT_MGMT.AUDIT_TRAIL_OS,
--                                           last_archive_time => last_archtime, --this says no audit records beyond date
--                         rac_instance_number => 4);


    -- write activity to alert log
      SYS.dbms_system.ksdwrt(2,'AUDIT: Purging Audit Trail until ' || last_archtime || ' beginning');

    -- perform purge up to last time stamp for both AUD$ and FGA$ and OS Files

    SYS.DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL (
        audit_trail_type => SYS.DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
        use_last_arch_timestamp => TRUE);
       
--    SYS.DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL (
--        audit_trail_type => SYS.DBMS_AUDIT_MGMT.AUDIT_TRAIL_FGA_STD,
--        use_last_arch_timestamp => TRUE);

    SYS.DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL (
        audit_trail_type => SYS.DBMS_AUDIT_MGMT.AUDIT_TRAIL_OS,
        use_last_arch_timestamp => TRUE);
       
    -- write activity to alert log
      SYS.dbms_system.ksdwrt(2,'AUDIT: Purging Audit Trail until ' || last_archtime || ' completed');

END;
/

Create scheduler job:

BEGIN
  SYS.DBMS_SCHEDULER.CREATE_JOB (
        job_name => '.JOB_PURGE_AUDIT_TRAILS',
        schedule_name => 'SYS.MAINTENANCE_WINDOW_GROUP',
        job_class => 'DEFAULT_JOB_CLASS',
        job_type => 'PLSQL_BLOCK',
        job_action => 'BEGIN .purge_db_audit_trails(); END;',
        comments => 'Audit clean job for AUD$ and FGA$ and OS'
    );
   
    SYS.DBMS_SCHEDULER.ENABLE(name => '.JOB_PURGE_AUDIT_TRAILS');
   
END;
/


Testing

Okay, now, this job is schedule to run every 24 hours when the normal database maintance job window group opens which is by default at 10:00 PM.

Of course, I wanted to make sure that this was going to work, so I ran the job manually to see if it purged the data I have in my AUD$ table that went back 2.5 years.  This job as seen above was scheduled to purge anything older than 4 months.

Before running the job:

SQL> select min(ntimestamp#), max(ntimestamp#)
  2  from aud$;

MIN(NTIMESTAMP#)             MAX(NTIMESTAMP#)
---------------------------- ----------------------------
09-DEC-09 09.02.00.903281 PM 23-AUG-12 08.00.12.321571 PM


Run my proc manually:

SQL> exec purge_db_audit_trails

PL/SQL procedure successfully completed.

Check results:

SQL> select min(ntimestamp#), max(ntimestamp#)
  2  from aud$;

MIN(NTIMESTAMP#)             MAX(NTIMESTAMP#)
---------------------------- ----------------------------
09-DEC-09 09.02.00.903281 PM 23-AUG-12 08.00.12.321571 PM



#@$&?!?!

On a side note, the job did delete the OS audit files from the audit_file_dest on all three of my RAC servers.  It just did not clear out SYS.AUD$ back to 23-APR-2012.

So, next thing is to set a trace:

SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

SQL> exec purge_db_audit_trails

PL/SQL procedure successfully completed.

SQL> alter session set events '10046 trace name context off';

Session altered.

SQL>

Upon reviewing the trace file, you will see the job working its way through all the settings for audit trail management.  Some you can run yourself to follow along, some you can't unless you trapped the bind variables.

The SQL that finally caught my attention was this:

SQL ID: 1rhcua24xym50 Plan Hash: 4226404550

DELETE FROM SYS.AUD$
WHERE
 DBID = 671363882  AND NTIMESTAMP# < to_timestamp('2012-04-23 14:02:25',
  'YYYY-MM-DD HH24:MI:SS.FF')  AND ROWNUM <= 10000


Of course, I immediately took to SQL*Plus to try and run this as SYS:

0 rows deleted.

So, knowing I had data much older than this date in the AUD$ table, the only other thing in question as far as the where clause goes in the DBID.  Lets see how many DBIDs there are and check the record counts in relation to DBID:

Note, I am doing this after I figured this out and removed the row, hence, the flasback query...

SQL> select dbid, count(*), min(ntimestamp#), max(ntimestamp#)
  2  from aud$ as of timestamp sysdate - 1/8
  3  group by dbid;

      DBID   COUNT(*) MIN(NTIMESTAMP#)             MAX(NTIMESTAMP#)
---------- ---------- ---------------------------- ----------------------------
 665625084       9675 09-NOV-10 08.54.20.879710 PM 05-DEC-10 01.47.39.932012 AM
 671363882      97230 23-APR-12 09.02.00.903281 PM 23-AUG-12 05.19.32.653304 PM
1464858415    1946442 05-DEC-10 05.17.23.070532 PM 28-MAR-12 08.09.44.957081 AM
4034646867         12 19-DEC-09 06.46.10.367106 AM 19-DEC-09 07.58.05.707815 AM

Well son of a...there's the problem.

Oracle by default uses the DBID in the DELETE statement, therefore, any entries made prior to a clone or DBID change are not collected when the automated purge job runs.

This is a perfect example on how new is not always better (well, at least until you understand how it works).  Prior to 11g, we would have manually deleted from the SYS.AUD$ table and probably would not have even considered using the DBID in our where clause.  I know I never did.  Since I tried to get all fancy with the new feature without fully understanding how it worked, I lost a few hours of work time to head scratching...especially since I found no other document on Metalink or on the Interweb that explained this...FIRSTIES!

Wednesday, June 13, 2012

Tuesday, April 17, 2012

Package Compile Hanging, Stuck, etc...

Note: Some names have been changed in this demo to protect client identity.

Today’s Issue:

Trying to compile a PACKAGE BODY and the session was hanging.

It was hanging on the following wait:

SELECT SID, EVENT, P1TEXT, P1, P2TEXT, P2
FROM V$SESSION
WHERE sid = 485;

SID EVENT P1TEXT P1 P2TEXT P2
----- ------------------------------ ------- -- ------- ---
485 read by other session file# 1 block# 828

Normally, I would think that there is some sort of user lock in the database that is preventing my compile from happening. However, in this case, the lock was on an internal object and it was a read by other session (buffer busy wait).

This started to stink of corruption.

There was no problem compiling the package spec. Also, this very same package was modified and compiled in an identical-to-the-block DEV database. This problem was occurring in my TEST copy of the database.

I saw no other sessions actively accessing this object.

I decided to try compiling the object as SYS to see if that made any difference:

alter package . compile body;
*
ERROR at line 1:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00600: internal error code, arguments: [ktcdso-1], [], [], [], [], [], [], [], [], [], [], []
Process ID: 12487
Session ID: 251 Serial number: 10663

Well that did not work.

Since it was a block in the buffer cache, I thought, let’s try flushing the buffer cache…this is test after all:

alter system flush buffer_cache;

System altered.

Still, no luck, the compile hung.

I used the ORA-600 lookup tool on Metalink to see what the ORA-00600 was all about. Not really much was returned, there were three issues, none of which really applied to me. There was one that was a little intriguing that had to do with corruption.

I figured I would do a Database Verify against the system tablespace datafile:

$ dbv blocksize=8192 file=/db/oradata/prod/system01.dbf
logfile=system_dbv.out

DBVERIFY: Release 11.2.0.2.0 - Production on Tue Apr 17 11:23:34 2012

Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.

$ cat system_dbv.out

DBVERIFY: Release 11.2.0.2.0 - Production on Tue Apr 17 11:23:34 2012

Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.

DBVERIFY - Verification starting : FILE = /db/oradata/prod/system01.dbf

DBVERIFY - Verification complete

Total Pages Examined : 185344
Total Pages Processed (Data) : 107060
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 35567
Total Pages Failing (Index): 0
Total Pages Processed (Other): 4087
Total Pages Processed (Seg) : 1
Total Pages Failing (Seg) : 0
Total Pages Empty : 38630
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Total Pages Encrypted : 0
Highest block SCN : 3061770519 (16.3061770519)

No obvious corruption.

Next, I decided to use the information I gleaned from V$SESSION_WAIT. Particularly the file# and block# to look up what object my compile was waiting on:

SELECT relative_fno, owner, segment_name, segment_type FROM dba_extents
WHERE file_id = 1
AND 828 BETWEEN block_id AND block_id + blocks - 1;

RELATIVE_FNO OWNER SEGMENT_NAME SEGMENT_TYPE
------------ --------------- ------------- ------------
1 SYS I_PROCEDURE1 INDEX

So, we have an internal index that was not allowing my compile to run. Still no reason why as I said it seems no one else was accessing this procedure. Maybe there was a bit of logical corruption in the index?

Let’s rebuild it online…

Disclaimer: There are all sorts of rules about modifying internal objects. I do not endorse or condone what I am about to do without knowing more about YOUR system. Mine happened to be a TEST system that could easily be refreshed from production. Plus, I figured it was rebuild...not actually changing any structures or anything...

As SYS:

alter index i_procedure1 rebuild online;

Index altered.

Guess what? That fixed the issue. My procedure immediately compiled after rebuilding the index.

Tuesday, January 10, 2012

ORA-19607 XXXXX is an active control file

Getting an error running new RMAN backups against a standby database:

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of delete command on ORA_DISK_2 channel at 01/08/2012 09:00:14
ORA-19607: /u20d/oradata/standby/stbyitst/standby01.ctl is an active control file


Turns out, there is a Metalink article to solve this issue: 444269.1

They don't call it a bug, but, somehow, the standby controlfile gets registered as a controlfile copy in the RMAN Metadata. The solution is to "UNCATALOG" this controlfile copy:

RMAN> list copy of controlfile;

using target database control file instead of recovery catalog
List of Control File Copies
===========================

Key S Completion Time Ckp SCN Ckp Time
------- - --------------- ---------- ---------------
6 X 18-AUG-10 2315490331 18-AUG-10
Name: /u20d/oradata/standby/stbyitst/standby01.ctl
Tag: TAG20100818T094510


RMAN> report obsolete;

RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 1
Report of obsolete backups and copies
Type Key Completion Time Filename/Handle
-------------------- ------ ------------------ --------------------
Control File Copy 6 18-AUG-10 /u20d/oradata/standby/stbyitst/standby01.ctl

RMAN> change copy of controlfile uncatalog;

uncataloged control file copy
control file copy file name=/u20d/oradata/standby/stbyitst/standby01.ctl RECID=6 STAMP=727350310
Uncataloged 1 objects


RMAN> report obsolete;

RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 1
no obsolete backups found

RMAN> delete obsolete;

RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 1
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=135 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=133 device type=DISK
no obsolete backups found


After following the above procedure, no more problem with backups!