Using Pentaho Spoon for ETL using staging

We want to start using Pentaho Spoon for ETL. During the transformation we will use a staging table. I know that this is possible and works well in MS SQL SSIS.

In Spoon we have the Table Input and Table Output steps, and I need to know which step we would use to transform the data from the staging table into the final table.

Can someone point me to some resources on this? So far I have not found anything specific.

We are migrating from MS SQL to PostgreSQL.

Here is a transformation query that we currently use in MS SQL SSIS:

MERGE AR_TRANSACTIONS AS D
USING AR_TRANSACTIONS_STAGE AS S
ON (
    D.CUST     = S.CUST     AND
    D.FOLIO    = S.FOLIO    AND
    D.RECNO    = S.RECNO    AND
    D.TRANS_NO = S.TRANS_NO AND
    D.REF_NO   = S.REF_NO   AND
    D.CODE     = S.CODE
)
WHEN NOT MATCHED THEN
    INSERT (
        CUST, ARF_DATE, DUE_DATE, CODE, TRANS_NO, FOLIO, RECNO, REF_NO,
        D_AMOUNT, C_AMOUNT, BALANCE, OPEN_CLOSE_FLAG, BVRVMODDATE, BVRVMODTIME
    )
    VALUES (
        S.CUST, S.ARF_DATE, S.DUE_DATE, S.CODE, S.TRANS_NO, S.FOLIO, S.RECNO, S.REF_NO,
        S.D_AMOUNT, S.C_AMOUNT, S.BALANCE, S.OPEN_CLOSE_FLAG, S.BVRVMODDATE, S.BVRVMODTIME
    )
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
WHEN MATCHED AND (
    S.BVRVMODDATE <> D.BVRVMODDATE OR
    S.BVRVMODTIME <> D.BVRVMODTIME
) THEN
    UPDATE SET
        CUST            = S.CUST,
        ARF_DATE        = S.ARF_DATE,
        DUE_DATE        = S.DUE_DATE,
        CODE            = S.CODE,
        TRANS_NO        = S.TRANS_NO,
        FOLIO           = S.FOLIO,
        RECNO           = S.RECNO,
        REF_NO          = S.REF_NO,
        D_AMOUNT        = S.D_AMOUNT,
        C_AMOUNT        = S.C_AMOUNT,
        BALANCE         = S.BALANCE,
        OPEN_CLOSE_FLAG = S.OPEN_CLOSE_FLAG,
        BVRVMODDATE     = S.BVRVMODDATE,
        BVRVMODTIME     = S.BVRVMODTIME;
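
For reference, here is a rough sketch of how I understand the same logic could be written natively in PostgreSQL without MERGE. This is only a sketch: it assumes PostgreSQL 9.5+ (for INSERT ... ON CONFLICT) and a unique index on the six key columns. The separate DELETE covers the WHEN NOT MATCHED BY SOURCE branch, and the key columns are left out of the SET list since they already match on a conflict.

-- Delete target rows that no longer exist in the staging table
-- (the WHEN NOT MATCHED BY SOURCE branch of the original MERGE).
DELETE FROM AR_TRANSACTIONS D
WHERE NOT EXISTS (
    SELECT 1
    FROM AR_TRANSACTIONS_STAGE S
    WHERE S.CUST     = D.CUST
      AND S.FOLIO    = D.FOLIO
      AND S.RECNO    = D.RECNO
      AND S.TRANS_NO = D.TRANS_NO
      AND S.REF_NO   = D.REF_NO
      AND S.CODE     = D.CODE
);

-- Insert new rows and update changed ones
-- (the WHEN NOT MATCHED and WHEN MATCHED branches).
-- Assumes a unique index on (CUST, FOLIO, RECNO, TRANS_NO, REF_NO, CODE).
INSERT INTO AR_TRANSACTIONS (
    CUST, ARF_DATE, DUE_DATE, CODE, TRANS_NO, FOLIO, RECNO, REF_NO,
    D_AMOUNT, C_AMOUNT, BALANCE, OPEN_CLOSE_FLAG, BVRVMODDATE, BVRVMODTIME
)
SELECT
    CUST, ARF_DATE, DUE_DATE, CODE, TRANS_NO, FOLIO, RECNO, REF_NO,
    D_AMOUNT, C_AMOUNT, BALANCE, OPEN_CLOSE_FLAG, BVRVMODDATE, BVRVMODTIME
FROM AR_TRANSACTIONS_STAGE
ON CONFLICT (CUST, FOLIO, RECNO, TRANS_NO, REF_NO, CODE) DO UPDATE
SET ARF_DATE        = EXCLUDED.ARF_DATE,
    DUE_DATE        = EXCLUDED.DUE_DATE,
    D_AMOUNT        = EXCLUDED.D_AMOUNT,
    C_AMOUNT        = EXCLUDED.C_AMOUNT,
    BALANCE         = EXCLUDED.BALANCE,
    OPEN_CLOSE_FLAG = EXCLUDED.OPEN_CLOSE_FLAG,
    BVRVMODDATE     = EXCLUDED.BVRVMODDATE,
    BVRVMODTIME     = EXCLUDED.BVRVMODTIME
-- Only update rows whose modification date/time actually changed,
-- as in the WHEN MATCHED AND (...) condition of the original MERGE.
WHERE AR_TRANSACTIONS.BVRVMODDATE <> EXCLUDED.BVRVMODDATE
   OR AR_TRANSACTIONS.BVRVMODTIME <> EXCLUDED.BVRVMODTIME;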


Asked by Gerhardpet
Ferruccio Guicciardi (Database Administrator) commented:
Hi Gerhardpet:

In between the 'Table Input' and 'Table Output' steps you can add the following steps:

1. 'Strings cut': cut field contents so they fit the schema of the fact table defined in the 'Table Output' step. Similar to applying a substring function.

2. 'Select values': perform conversions by manipulating the metadata of the data stream. A final 'Select values' step just before the 'Table Output' is needed so the data stream maps cleanly onto the fact table.

3. 'Sort rows': sorts rows based upon field values (ascending or descending).

4. 'Append streams': used to make a union of two data streams with the same schema; appends the two streams in an ordered way.

5. 'Combination lookup/update': update a junk dimension in a data warehouse, or look up information in that dimension. The primary key of a junk dimension consists of all its fields.

6. 'Dimension lookup/update': Update a slowly changing dimension in a data warehouse. Alternatively, look up information in this dimension.

7. 'Merge Join': joins two streams on a given key and outputs a joined set. The input streams must be sorted on the join key.

8. 'Merge Rows (diff)': merge two streams of rows, sorted on a certain key. The two streams are compared and the identical, changed, deleted and new rows are flagged (a sketch of two sorted 'Table Input' queries feeding this step is shown after this list).

9. 'Java Filter': filter rows using Java code.

10. 'Notes': the small yellow notes that I stick on several parts of the data stream to document what each step of the KTR does.
Good example here:
http://churriwifi.files.wordpress.com/2010/06/transformacion-etl-tabla-hechos-ventas.jpg
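
As a sketch only (re-using your table and key columns), the two 'Table Input' steps feeding a 'Merge Join' or 'Merge Rows (diff)' could each sort on the compare key directly in their SQL, for example:

-- 'Table Input' #1: reference stream (current final table),
-- sorted on the compare key so 'Merge Rows (diff)' can compare row by row.
SELECT CUST, FOLIO, RECNO, TRANS_NO, REF_NO, CODE,
       ARF_DATE, DUE_DATE, D_AMOUNT, C_AMOUNT, BALANCE,
       OPEN_CLOSE_FLAG, BVRVMODDATE, BVRVMODTIME
FROM AR_TRANSACTIONS
ORDER BY CUST, FOLIO, RECNO, TRANS_NO, REF_NO, CODE;

-- 'Table Input' #2: compare stream (staging table), same column order
-- and the same ORDER BY so the two streams line up on the key.
SELECT CUST, FOLIO, RECNO, TRANS_NO, REF_NO, CODE,
       ARF_DATE, DUE_DATE, D_AMOUNT, C_AMOUNT, BALANCE,
       OPEN_CLOSE_FLAG, BVRVMODDATE, BVRVMODTIME
FROM AR_TRANSACTIONS_STAGE
ORDER BY CUST, FOLIO, RECNO, TRANS_NO, REF_NO, CODE;

Sorting in the query itself saves a separate 'Sort rows' step, as long as both streams use exactly the same key order.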

These are not all the possible steps, but they are the ones I use most. I am not using the 'Text file input' step because I combine Pentaho with another tool called Pervasive Data Integrator, which better handles the queue of CSV extracts arriving day by day via FTP.
More info:
http://integration.pervasive.com/

So, I handle the CSV-to-staging-table load with Pervasive, and the staging-to-fact-and-dimension loads with Pentaho. On top of all this, you need to build the Mondrian XML schema to define and publish the cube.

For a complete reference of all the Pentaho Kettle steps available, plus a good set of sample Kettle designs and models:
http://wiki.pentaho.com/display/EAI/.09+Transformation+Steps
http://wiki.pentaho.com/display/EAI/Pentaho+Data+Integration+Steps
http://infocenter.pentaho.com/help/index.jsp?topic=%2Fpdi_user_guide%2Fconcept_pdi_usr_transformations.html
http://www.mooreds.com/wordpress/archives/category/pentaho-data-integration
http://kettle.bleuel.com/
http://churriwifi.wordpress.com/2010/06/01/comparing-talend-open-studio-and-pentaho-data-integration-kettle/
