We start by creating a new procedure that uses the Jython technology and adding a step named 'PullFile' to it.
Define an option on the procedure, SOURCE_DIR, to store the source directory name.
'Command on Source' points to the RDBMS schema that holds the FILENAMES table, and 'Command on Target' uses Jython code to pull each file name along with its last-modified timestamp. This information is then pushed to the SQL table.
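As a rough illustration of what the 'PullFile' step does, here is a minimal sketch in plain Python rather than the actual Jython from the procedure. It assumes an in-memory SQLite database in place of the RDBMS schema, and the column names FILE_NAME and LAST_MODIFIED are made up for the example; only FILENAMES, PRCSD_FLAG and SOURCE_DIR come from this post.

```python
import os
import sqlite3
import tempfile
from datetime import datetime


def create_filenames_table(conn):
    # FILE_NAME and LAST_MODIFIED are assumed column names; only the table
    # name FILENAMES and the PRCSD_FLAG column come from the post.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS FILENAMES ("
        "FILE_NAME TEXT, LAST_MODIFIED TEXT, PRCSD_FLAG TEXT DEFAULT 'N')"
    )


def pull_files(source_dir, conn, prefix="DtDemo"):
    """Insert one row per matching file and return the inserted rows."""
    rows = []
    for name in sorted(os.listdir(source_dir)):
        path = os.path.join(source_dir, name)
        if name.startswith(prefix) and os.path.isfile(path):
            # Last-modified time of the file, stored as sortable text.
            modified = datetime.fromtimestamp(os.path.getmtime(path))
            rows.append((name, modified.strftime("%Y-%m-%d %H:%M:%S")))
    conn.executemany(
        "INSERT INTO FILENAMES (FILE_NAME, LAST_MODIFIED) VALUES (?, ?)", rows
    )
    conn.commit()
    return rows


if __name__ == "__main__":
    source_dir = tempfile.mkdtemp()  # stands in for the SOURCE_DIR option
    with open(os.path.join(source_dir, "DtDemoBasic.txt"), "w") as handle:
        handle.write("demo")
    conn = sqlite3.connect(":memory:")
    create_filenames_table(conn)
    for row in pull_files(source_dir, conn):
        print(row)
```

Each run appends new rows instead of replacing old ones, so the table keeps a history of timestamps per file.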
The SQL table now holds all files whose names start with 'DtDemo'. I played around with 'DtDemoBasic.txt', saving it at different times to get different timestamps for this file. Every time the procedure is executed, it loads the file names with their last-modified timestamps. The name of the latest file can be pulled by doing a MAX on the timestamp.
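The MAX lookup could look roughly like the sketch below. It again uses SQLite as a stand-in database, the hypothetical columns FILE_NAME and LAST_MODIFIED, and invented sample rows; the real query depends on how the FILENAMES table is actually defined.

```python
import sqlite3

# Invented sample rows: the same file saved twice, plus one older file.
SAMPLE_ROWS = [
    ("DtDemoBasic.txt", "2014-01-10 09:00:00"),
    ("DtDemoBasic.txt", "2014-01-10 09:30:00"),
    ("DtDemoOther.txt", "2014-01-09 18:00:00"),
]


def latest_file(conn):
    """Return (FILE_NAME, LAST_MODIFIED) for the most recent timestamp."""
    return conn.execute(
        "SELECT FILE_NAME, LAST_MODIFIED FROM FILENAMES "
        "WHERE LAST_MODIFIED = (SELECT MAX(LAST_MODIFIED) FROM FILENAMES)"
    ).fetchone()


def build_sample_db():
    conn = sqlite3.connect(":memory:")
    # Column names are assumptions made for this example.
    conn.execute("CREATE TABLE FILENAMES (FILE_NAME TEXT, LAST_MODIFIED TEXT)")
    conn.executemany("INSERT INTO FILENAMES VALUES (?, ?)", SAMPLE_ROWS)
    return conn


if __name__ == "__main__":
    print(latest_file(build_sample_db()))
    # prints ('DtDemoBasic.txt', '2014-01-10 09:30:00')
```

Storing the timestamp in a year-first text format keeps string comparison and chronological order in agreement, which is why MAX works here; a native date or timestamp column would behave the same way.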
It was not long before Linga added another requirement to optimize his integration: process the file only if it has been updated, otherwise ignore it.
All I could think of was to compare the timestamps of the latest two files and then decide whether to process the file or not. Let's create a procedure that executes an update command to set PRCSD_FLAG to 'Y' for the latest file. This procedure should be executed only when the timestamp of the latest file differs from that of the second latest file. We do this check in an ODI variable and then call the procedure based on the variable's value.
The variable is refreshed on the RDBMS schema that holds the FILENAMES table, and its query compares the timestamps of the first and second files when the files are arranged in descending order of timestamp.
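One possible shape for the variable's refresh query is sketched below, run through SQLite so it can be tried locally. It assumes the variable returns 0 when the two latest timestamps differ and 1 when they match, which is what the package steps imply but is not spelled out in the post. LAST_MODIFIED is a hypothetical column name, and LIMIT is SQLite syntax; other databases need their own top-N construct.

```python
import sqlite3

# Assumed convention: 0 = the two latest timestamps differ (file changed),
# 1 = they are identical (file unchanged since the previous load).
COMPARE_SQL = (
    "SELECT CASE WHEN COUNT(*) = 2 AND COUNT(DISTINCT LAST_MODIFIED) = 1 "
    "THEN 1 ELSE 0 END "
    "FROM (SELECT LAST_MODIFIED FROM FILENAMES "
    "ORDER BY LAST_MODIFIED DESC LIMIT 2)"
)


def refresh_variable(conn):
    """Return the value the ODI variable would hold after a refresh."""
    return conn.execute(COMPARE_SQL).fetchone()[0]


def build_db(rows):
    conn = sqlite3.connect(":memory:")
    # Column names are assumptions made for this example.
    conn.execute("CREATE TABLE FILENAMES (FILE_NAME TEXT, LAST_MODIFIED TEXT)")
    conn.executemany("INSERT INTO FILENAMES VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    changed = build_db([
        ("DtDemoBasic.txt", "2014-01-10 09:00:00"),
        ("DtDemoBasic.txt", "2014-01-10 09:30:00"),
    ])
    print(refresh_variable(changed))    # prints 0

    unchanged = build_db([
        ("DtDemoBasic.txt", "2014-01-10 09:30:00"),
        ("DtDemoBasic.txt", "2014-01-10 09:30:00"),
    ])
    print(refresh_variable(unchanged))  # prints 1
```

With fewer than two rows in the table the sketch also returns 0, so a first-ever load is treated as a change.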
The complete flow can be arranged in a package with the following steps:
- Pull all file names from the directory into the table
- Refresh the variable with SQL to compare the timestamps of the top two files
- Evaluate the variable to check if value is '0' (zero)
- Execute procedure to update flag if variable value is zero
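The last two steps, evaluating the variable and conditionally running the update procedure, can be sketched as follows. This is only an approximation of the package logic in plain Python with SQLite. The function name `update_flag_if_changed`, the columns FILE_NAME and LAST_MODIFIED, and the sample rows are all hypothetical; PRCSD_FLAG and the zero check come from the post.

```python
import sqlite3


def update_flag_if_changed(conn, variable_value):
    """Set PRCSD_FLAG to 'Y' on the latest row, but only when the variable is 0.

    Returns the number of rows updated.
    """
    if variable_value != 0:
        return 0  # evaluate step fails, so the update procedure is skipped
    cursor = conn.execute(
        "UPDATE FILENAMES SET PRCSD_FLAG = 'Y' "
        "WHERE LAST_MODIFIED = (SELECT MAX(LAST_MODIFIED) FROM FILENAMES)"
    )
    conn.commit()
    return cursor.rowcount


def build_db():
    conn = sqlite3.connect(":memory:")
    # Column names other than PRCSD_FLAG are assumptions for this example.
    conn.execute(
        "CREATE TABLE FILENAMES ("
        "FILE_NAME TEXT, LAST_MODIFIED TEXT, PRCSD_FLAG TEXT DEFAULT 'N')"
    )
    conn.executemany(
        "INSERT INTO FILENAMES (FILE_NAME, LAST_MODIFIED) VALUES (?, ?)",
        [
            ("DtDemoBasic.txt", "2014-01-10 09:00:00"),
            ("DtDemoBasic.txt", "2014-01-10 09:30:00"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_db()
    print(update_flag_if_changed(conn, 1))  # prints 0
    print(update_flag_if_changed(conn, 0))  # prints 1
```

In the package itself this branching is handled by the evaluate-variable step, so the procedure needs only the UPDATE statement.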