An Overview of Oracle Change Data Capture – Evolution and Methods
If you are looking for a technology that facilitates real-time data integration across enterprises, improves the performance and availability of databases, and reduces data warehousing times, Oracle CDC is the right solution for you.
When deployed with the right tools and methods, Oracle Change Data Capture can handle a wide range of replication tasks without noticeable lag in performance. These include offloading analytics and queries from production databases to data warehouses, migrating databases to the cloud without interrupting other activity in the system, and extracting incremental data from multiple sources for loading into a data warehouse.
So what is Change Data Capture (CDC)? CDC is a set of software design patterns used to monitor and track changes to data so that downstream action can be taken on them. It is also a component of data integration, covering the identification, capture, and delivery of all changes made in enterprise databases.
One of the critical functions of Oracle CDC is capturing and preserving the state of data as it changes. This is most common in data warehouse environments, but it can be used with any data repository system. Developers can implement CDC functionality in several ways: in physical storage, in application logic, or in a combination of the two across system layers.
The Development of Oracle CDC Technology
Change Data Capture first appeared as a built-in Oracle feature in Oracle 9i, where it was used to track and record changes made to user tables in a database. All changes were stored in dedicated change tables for use by ETL applications, and the stored data could later be processed for migration into other databases and data warehouses. The mechanism worked by creating triggers on the source tables. This process was complex and tedious, and database administrators were not keen on this form of Oracle CDC.
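As a rough illustration of how an ETL job might consume change tables of this kind, the sketch below replays captured rows against a target in capture order. It is not Oracle's API: the tuple layout `(seq, op, key, values)`, the operation codes `I`/`U`/`D`, and the function name `apply_changes` are assumptions made for the example, and the target is an in-memory dict standing in for a table.

```python
def apply_changes(target, change_rows):
    """Replay change rows against `target`, a dict keyed by primary key.

    Each row is a hypothetical (seq, op, key, values) tuple, where op is
    "I" (insert), "U" (update), or "D" (delete).
    """
    # Apply in capture order so later changes win over earlier ones.
    for seq, op, key, values in sorted(change_rows, key=lambda row: row[0]):
        if op in ("I", "U"):
            target[key] = values
        elif op == "D":
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation {op!r} at seq {seq}")
    return target


# Rows may arrive out of order; the sequence number restores the order.
changes = [
    (2, "U", 1, {"status": "shipped"}),
    (1, "I", 1, {"status": "new"}),
    (3, "I", 2, {"status": "new"}),
    (4, "D", 2, None),
]
print(apply_changes({}, changes))
```

After the replay, only the row with key 1 remains, carrying its updated status; the row with key 2 was inserted and then deleted.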
Oracle addressed this in Oracle 10g with a less intrusive form of CDC that reads the redo logs of the source database. This form was built on Oracle Streams, Oracle's built-in replication tool, and made it much easier to detect change data and transfer it to a target data repository without interfering with the operation of the source database. However, Oracle Streams was deprecated in Oracle 12c, and the built-in Oracle CDC feature is no longer included from that release onward. Users must either pay for Oracle GoldenGate or use alternative Oracle replication and CDC solutions.
In its basic form, Oracle CDC can be defined as the process by which changes made to a source database are applied in real time to a target database. The source and target may be different databases or the same one. Even when they are the same, Oracle optimizes the recording of changes, and several CDC solutions can coexist in the same system.
Oracle CDC can also be implemented with Oracle Data Integrator (ODI), which identifies changes made to source data for use by other applications through two journalizing modes. Simple Journalizing tracks changes made to individual datastores. Consistent Set Journalizing tracks changes to a group of datastores, taking into account the referential integrity between them. Setting up CDC in Oracle Data Integrator is largely automated and not complicated.
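The difference between the two journalizing modes can be pictured with a toy model. The sketch below is not ODI code and does not use any ODI API: the journal layout (dicts with `seq` and `key`), the datastore names, and both function names are assumptions made for illustration. It only shows the idea that a consistent set is read up to one shared cut-off and in an order that puts parent datastores before their children.

```python
def consume_simple(journal, last_seen):
    """Simple mode: read one datastore's journal on its own."""
    return [change["key"] for change in journal if change["seq"] > last_seen]


def consume_consistent_set(journals, order, window_end):
    """Consistent-set mode: read several journals up to one shared cut-off.

    `order` lists datastore names with parents before children, so a
    consumer never sees a child row before the parent it refers to.
    """
    batch = []
    for name in order:
        for change in journals[name]:
            if change["seq"] <= window_end:
                batch.append((name, change["key"]))
    return batch


# Hypothetical journals for a parent table and its child table.
journals = {
    "orders": [{"seq": 1, "key": 10}, {"seq": 4, "key": 11}],
    "order_lines": [
        {"seq": 2, "key": 100},
        {"seq": 3, "key": 101},
        {"seq": 5, "key": 102},
    ],
}
print(consume_simple(journals["order_lines"], last_seen=2))
print(consume_consistent_set(journals, ["orders", "order_lines"], window_end=3))
```

With a cut-off of 3, the later order (seq 4) and the later order line (seq 5) are both held back for the next window, so the consumer sees a mutually consistent slice of the two datastores.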
Capturing and Publishing CDC
There are two ways to track changed data with CDC.
The first is Synchronous mode, in which triggers on the source database capture changed data immediately, as each SQL statement performs a DML (Data Manipulation Language) operation, that is, an INSERT, UPDATE, or DELETE. Every change made to the source tables is captured as part of the transaction that makes it. This mode is available in both Oracle Standard Edition and Oracle Enterprise Edition.
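The trigger-based pattern can be tried out in miniature. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a stand-in for an Oracle source database; the table names `orders` and `orders_ct`, the trigger names, and the column layout are assumptions for the example and do not reflect how Oracle's own change tables are defined.

```python
import sqlite3


def build_demo_db():
    """Create a source table, a change table, and one trigger per DML type."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
        -- Hypothetical change table: one row per captured DML operation.
        CREATE TABLE orders_ct (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            op TEXT, id INTEGER, status TEXT
        );
        CREATE TRIGGER orders_ai AFTER INSERT ON orders BEGIN
            INSERT INTO orders_ct (op, id, status)
            VALUES ('I', NEW.id, NEW.status);
        END;
        CREATE TRIGGER orders_au AFTER UPDATE ON orders BEGIN
            INSERT INTO orders_ct (op, id, status)
            VALUES ('U', NEW.id, NEW.status);
        END;
        CREATE TRIGGER orders_ad AFTER DELETE ON orders BEGIN
            INSERT INTO orders_ct (op, id, status)
            VALUES ('D', OLD.id, OLD.status);
        END;
    """)
    return conn


def run_demo():
    """Run one insert, update, and delete; return the captured change rows."""
    conn = build_demo_db()
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO orders VALUES (1, 'new')")
        conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")
        conn.execute("DELETE FROM orders WHERE id = 1")
    rows = conn.execute(
        "SELECT op, id, status FROM orders_ct ORDER BY seq"
    ).fetchall()
    conn.close()
    return rows


print(run_demo())
```

Because the triggers fire inside the same transaction as the DML, the change rows are written, committed, or rolled back together with the source change. That is also why this style of capture adds work to every write on the source table.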
The second is Asynchronous mode, in which change data is read from the redo log files. Capture is not instantaneous: it happens only after the SQL statement performing the DML operation has been committed. Because changed data is not captured as part of the transaction that modifies the source table, capture has no effect on that transaction. Asynchronous CDC has three modes: HotLog, Distributed HotLog, and AutoLog. Oracle provides asynchronous mode as a relational interface to Oracle Streams. Oracle CDC has gone a long way toward streamlining database administration as well as replication and migration activities.
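The log-based approach can be sketched with a toy model as well. The code below does not read real Oracle redo logs and uses no Oracle or Streams API: the `LogEntry` record, its fields, and the function `capture_committed` are hypothetical. It only illustrates the core idea that a separate process scans the log after the fact and publishes the changes of committed transactions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    """One record in a simplified, hypothetical redo-style log."""
    txn: int
    kind: str                   # "DML", "COMMIT", or "ROLLBACK"
    op: Optional[str] = None    # "I", "U", or "D" for DML entries
    row: Optional[tuple] = None


def capture_committed(log):
    """Scan the log and return the changes of committed transactions only.

    DML entries are buffered per transaction and published when the
    matching COMMIT is seen; rolled-back or unfinished work is dropped.
    """
    pending = {}
    changes = []
    for entry in log:
        if entry.kind == "DML":
            pending.setdefault(entry.txn, []).append((entry.op, entry.row))
        elif entry.kind == "COMMIT":
            changes.extend(pending.pop(entry.txn, []))
        elif entry.kind == "ROLLBACK":
            pending.pop(entry.txn, None)
    return changes


log = [
    LogEntry(1, "DML", "I", (1, "new")),
    LogEntry(2, "DML", "I", (2, "new")),
    LogEntry(1, "DML", "U", (1, "shipped")),
    LogEntry(2, "ROLLBACK"),
    LogEntry(1, "COMMIT"),
    LogEntry(3, "DML", "D", (1, "shipped")),  # never committed
]
print(capture_committed(log))
```

Only the two changes from transaction 1 are published. The source transactions never wait on the capture step, which is the property that makes this mode less intrusive than triggers.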