ODI Series – Developing Efficient Integration
It has been a long time since I posted anything, so I thought I would write about something I have been working with lately – ODI.
Recently, I was training a few newbies in ODI when I was asked a question – “How can we be sure that the interface we have developed is an efficient one?”
I guess there is no standard answer to this question. It all depends on the technology we are working with and the transformation we intend to achieve. We need to remember that ODI uses the underlying technology to perform its transformations, and as developers we need to make the best use of it. In this post, I will walk through the process of analyzing our data sources and technology to improve the integration process.
Let’s consider the following scenario and data sources:
A flat file called Products.txt, which stores product codes
A target DB that stores product information pulled from the flat file and a translation table, both with the following table structure
We proceed with the assumption that we know how to reverse-engineer flat files and RDBMS tables and bring them into ODI as models/datastores. Once reversed, the datastores can be dragged onto the interface as source and target and mapped as required. The interface should look something like this:
One easy way to optimize your integration is to minimize the number of steps involved in moving and transforming data between source and target. This may not matter much with small files or record sets, but it can severely affect integration time when huge volumes of data are involved. The following scenarios show how this can be handled well.
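To keep the rest of the post concrete, here is a minimal sketch of what the interface logically does: look up each product code from the flat file in the translation table and load the result into the target. It uses Python with sqlite3 standing in for the RDBMS, and made-up column names such as PRODUCT_CODE and PRODUCT_DESC, since the actual structures come from the reversed datastores.

```python
import csv
import io
import sqlite3

# Stand-in for Products.txt (the sample codes are made up for illustration).
products_txt = io.StringIO("P100\nP200\nP300\n")
product_codes = [row[0] for row in csv.reader(products_txt)]

# sqlite3 stands in for the RDBMS holding the translation and target tables;
# the column names are assumptions, not the actual reversed structure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TRANSLATION (PRODUCT_CODE TEXT, PRODUCT_DESC TEXT)")
conn.execute("CREATE TABLE PRODUCT (PRODUCT_CODE TEXT, PRODUCT_DESC TEXT)")
conn.executemany("INSERT INTO TRANSLATION VALUES (?, ?)",
                 [("P100", "Widget"), ("P200", "Gadget"), ("P300", "Gizmo")])

# The mapping: join each file code to the translation table and load the target.
for code in product_codes:
    conn.execute(
        "INSERT INTO PRODUCT "
        "SELECT PRODUCT_CODE, PRODUCT_DESC FROM TRANSLATION WHERE PRODUCT_CODE = ?",
        (code,))
conn.commit()
print(conn.execute("SELECT * FROM PRODUCT").fetchall())
```

How this logical mapping is physically executed – where the join runs and how many times the data is copied – is exactly what the scenarios below vary.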
Scenario 1: Staging layer is neither source nor target
Let’s consider a scenario where you choose a staging area that is neither the source nor the target. In this interface we have a flat file and an RDBMS table (in a schema called DBStaging) as sources and an RDBMS table (in a schema called DBTarget) as the target, and we choose the In-Memory Engine: SUNOPSIS_MEMORY_ENGINE as our staging area.
The flow tab diagram shows the flow of data from source to staging and then to target. The execution takes 15 steps to complete the data movement.
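To picture why this costs extra steps, here is a rough sketch of the data flow in the same Python/sqlite3 style. All table names are made up for illustration (ODI generates its own work tables through the KMs): every source is first copied onto the in-memory staging engine, joined there, and only then moved on to the target – two hops for the data.

```python
import sqlite3

staging = sqlite3.connect(":memory:")   # stands in for SUNOPSIS_MEMORY_ENGINE
target = sqlite3.connect(":memory:")    # stands in for the DBTarget schema

# First group of steps: BOTH sources are copied onto the staging engine
# (the translation table into SrcSet0, the flat file rows into another work table).
staging.execute("CREATE TABLE SRCSET0 (PRODUCT_CODE TEXT, PRODUCT_DESC TEXT)")
staging.executemany("INSERT INTO SRCSET0 VALUES (?, ?)",
                    [("P100", "Widget"), ("P200", "Gadget")])
staging.execute("CREATE TABLE SRC_FILE (PRODUCT_CODE TEXT)")
staging.executemany("INSERT INTO SRC_FILE VALUES (?)", [("P100",), ("P200",)])

# Second group of steps: the join is resolved on the staging engine into a temp table.
staging.execute(
    "CREATE TABLE TMP_PRODUCT AS "
    "SELECT f.PRODUCT_CODE, t.PRODUCT_DESC "
    "FROM SRC_FILE f JOIN SRCSET0 t ON f.PRODUCT_CODE = t.PRODUCT_CODE")

# Final group of steps: the joined rows are moved again, from staging to the target.
rows = staging.execute("SELECT * FROM TMP_PRODUCT").fetchall()
target.execute("CREATE TABLE PRODUCT (PRODUCT_CODE TEXT, PRODUCT_DESC TEXT)")
target.executemany("INSERT INTO PRODUCT VALUES (?, ?)", rows)
target.commit()
```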
Scenario 2: Staging layer is one of the source schemas
Since one of our sources is an RDBMS (the Translation table schema), we can use the same schema as our staging layer. This removes the step of moving data from the RDBMS table to SrcSet0 seen in the scenario above, as the table already exists in the staging schema.
The execution now takes 11 steps to complete the data movement. We have skipped the steps that moved data from the Translation table into SrcSet0 in Scenario 1.
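Sketching the same flow with the DBStaging schema acting as the staging area shows what disappears: the translation table is already where the join runs, so only the flat file still needs a load step (again, table and column names are assumptions for illustration).

```python
import sqlite3

staging = sqlite3.connect(":memory:")   # DBStaging: source AND staging area
target = sqlite3.connect(":memory:")    # DBTarget

# The translation table already lives in the staging schema – no load step for it.
staging.execute("CREATE TABLE TRANSLATION (PRODUCT_CODE TEXT, PRODUCT_DESC TEXT)")
staging.executemany("INSERT INTO TRANSLATION VALUES (?, ?)",
                    [("P100", "Widget"), ("P200", "Gadget")])

# Only the flat file has to be staged: one load instead of two.
staging.execute("CREATE TABLE SRC_FILE (PRODUCT_CODE TEXT)")
staging.executemany("INSERT INTO SRC_FILE VALUES (?)", [("P100",), ("P200",)])

# The join now runs directly against the existing translation table.
rows = staging.execute(
    "SELECT f.PRODUCT_CODE, t.PRODUCT_DESC "
    "FROM SRC_FILE f JOIN TRANSLATION t ON f.PRODUCT_CODE = t.PRODUCT_CODE"
).fetchall()

# Final step: move the joined rows into the target schema.
target.execute("CREATE TABLE PRODUCT (PRODUCT_CODE TEXT, PRODUCT_DESC TEXT)")
target.executemany("INSERT INTO PRODUCT VALUES (?, ?)", rows)
target.commit()
```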
Scenario 3: Explore Knowledge Modules
It's always good to explore the available Knowledge Modules and understand their features. In the two scenarios above, we used IKM SQL Control Append to populate data into the target; for this scenario we will use IKM SQL to SQL Control Append.
IKM SQL Control Append can only read data from a single source, which is why we used LKM SQL to SQL at the staging layer. The LKM performs the join between the SrcSets and loads the result into a temp table, and IKM SQL Control Append then pulls data from that temp table. IKM SQL to SQL Control Append, on the other hand, can read data from multiple sources. This can be observed in the flow diagram below, where LKM SQL to SQL is no longer required.
The execution now takes just 7 steps to complete the data movement. By using IKM SQL to SQL Control Append we skipped the steps that loaded data into the temp table at the staging layer, reducing the overall number of integration steps.
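Here is a sketch of the effect of the multi-source IKM, under the same illustrative names: the joined rows are read from the sources and written to the target in a single pass, with no intermediate temp table to create, load, and read back.

```python
import sqlite3

staging = sqlite3.connect(":memory:")   # DBStaging: translation table + staged file
target = sqlite3.connect(":memory:")    # DBTarget

staging.execute("CREATE TABLE TRANSLATION (PRODUCT_CODE TEXT, PRODUCT_DESC TEXT)")
staging.executemany("INSERT INTO TRANSLATION VALUES (?, ?)",
                    [("P100", "Widget"), ("P200", "Gadget")])
staging.execute("CREATE TABLE SRC_FILE (PRODUCT_CODE TEXT)")
staging.executemany("INSERT INTO SRC_FILE VALUES (?)", [("P100",), ("P200",)])

target.execute("CREATE TABLE PRODUCT (PRODUCT_CODE TEXT, PRODUCT_DESC TEXT)")

# Single read-and-write pass: no "create temp table, load it, read it back".
joined = staging.execute(
    "SELECT f.PRODUCT_CODE, t.PRODUCT_DESC "
    "FROM SRC_FILE f JOIN TRANSLATION t ON f.PRODUCT_CODE = t.PRODUCT_CODE")
target.executemany("INSERT INTO PRODUCT VALUES (?, ?)", joined)
target.commit()
print(target.execute("SELECT * FROM PRODUCT").fetchall())
```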
As said earlier, all these optimizations depend on the underlying technology used in the integration and on the transformation activity. Since the target here was an RDBMS schema, we were able to make use of IKM SQL to SQL Control Append; had the target been Essbase or Planning, we would have been bound to their specific knowledge modules and would need to look for alternate ways to optimize our steps.
Hope this helps!!!