ODI Series – Developing Efficient Integration

It has been a long time that I posted anything , and thought of writing something I have been working with lately – ODI.
Recently, I was training few newbies in ODI when I was asked a question – “How can we be sure whether we have developed an interface which is an efficient one? “
I guess there is no standard answer to this question. It all depends on the technology we are working with and the transformation we intend to achieve. We need to remember that ODI utilizes the underlying technology to perform its transformation and as a developer we need to make use of it in the best way. In the following post, I will be talking about the process of analyzing our data sources and technology to improve integration process.
Let’s consider following scenario and data sources:

Flat File called Products.txt which stores information about product codes


Staging DB which stores a Translation table. This table has the Alias names for all the product codes which should be populated in target


Target DB which stores Product information pulled from flat file and translation table both with following structure of table


We proceed with the assumption that we know how to reverse engineer flat files and RDBMS tables and bring them into ODI as models/datastores. Once reversed, the Datastores can be dragged on to interface as source and target and mapped as required. The interface should look something like this:


One easy way to optimize your integration is by minimizing the number of steps involved for data movement and transformation between source and target. Though, this may not matter much with small chunks of data files or records, but may severely affect integration time when involving huge volumes of data. In the following scenarios we would see how this can be handled well.

Scenario 1: Staging layer is neither source nor target
Let’s consider a scenario where you choose a staging area which is neither the source not target. In this interface we have flat file and RDBMS table (schema called DBStaging) as source and RDBMS table (schema called DBTarget) and thus we choose In-Memory Engine: SUNOPSIS_MEMORY_ENGINE as our staging area.



The flow tab diagram shows the flow of data from source to staging and then to target. The execution takes 15 steps to complete movement of data


Scenario 2: Staging layer is one of the source schema
Since one of our source is RDBMS (Translation table schema) we can use the same schema as our staging layer. This will reduce the step 1  of moving data from RDBMS table to SrcSet0 in the above scenario as the table exists in the same schema.



The execution now takes 11 steps to complete data movement. We actually skipped the steps to move data from Translation Table into SrcSet0 in Scenario 1

Scenario 3: Explore Knowledge Modules
It's always good to explore all the available Knowledge Modules and understand their features. In the above two scenarios, we used IKM SQL Control Append for populating data into target, while for this scenario we would use IKM SQL to SQL Control Append.
IKM SQL Control Append can only read data from single source and due to this reason we used LKM SQL to SQL at the staging layer. This performs the join between SrcSets and load data to a temp table. IKM SQL Control Append then pulls data from this temp table. On the other hand IKM SQL to SQL Control Append has the ability to read data from multiple sources. This can be observed in the flow diagram below, where LKM SQL to SQL is no longer required.



The execution now takes just 7 steps to complete data movement. We skipped the steps to load data to Temp table at the staging layer by using IKM SQL to SQL Control Append, reducing steps for overall integration.

As said earlier, all these optimization steps are dependent on underlying technology being used in integration and the transformation activity. Since the target here was a RDBMS schema, we were able to make use of IKM SQL to SQL Control Append; had the target been either Essbase or Planning, we would be bound to use the specific knowledge modules and need to look out for alternate ways to optimize our steps.

Hope this helps !!!



Comments

Popular posts from this blog

Power BI + Oracle PBCS - Part 1: REST API Structure

Unexpected Essbase error 1007083

Groovy Series 4: Playing with cell level formatting