Data Profiling Task to Validate data before load to EDW

Alpesh PatelSolution Architect at TCS (SAFe®Agilist, TOGAF® Foundation, PSM I®, MCP, MCSD)
CERTIFIED EXPERT
Published:
Updated:
Everyone has problem when going to load data into Data warehouse (EDW). They all need to confirm that data quality is good but they don't no how to proceed.

Microsoft has provided new task within SSIS 2008 called "Data Profiler Task". It solve the problem of all SSIS developer.

The task analyze data in a SQL Server database and, from the results of that analysis, generate XML reports that can be saved to a file or an SSIS variable. By configuring one or more of the task's profile types, you can generate a report that provides details such as a column's minimum and maximum values, or the number and percentage of null values, or a column's min and max length with row count, or a column's value with row count, or a column's value's pattern with frequency, or column's functional dependency or a column's candidate key ratio etc.


 
Despite the variety of statistics that the Data Profiling task can provide, the task's practical applications might appear limited when you first try to implement it. After all, what use are data statistics to the automated processes in an SSIS package? However, by accessing the results generated by the Data Profiling task, you can design a workflow that automatically determines the appropriate actions to take based on the validity of the source data.

In this article, I describe a sample SSIS package that includes the Data Profiling task. The task is configured to generate a report based on the Value Inclusion profile type, which provides details that let you determine whether one or more columns in the source table are suitable as a foreign key. Although this type of profile might not seem particularly useful to an SSIS package, it can actually be quite valuable when loading data because you can use it to compare source data with valid data in your target database to determine the validity of the source data.

Let's take sample to understand the functionality of Data Profiler Task.

Data profiler1. You want to check length of particular column data length before load to actual table. It may help to prevent your package to crash if data length is higher than the column width.

This moment you can use "ColumnLengthDistributionProfile" and get consolidated details related to given table.

 Data profilerYou have to pass the name of table and columns which you wish to check for data length.


2. You want to check Null value in columns before load to actual table. It may help to prevent your package to crash due to Null insertion.

This moment you can use "ColumnNullRatioProfile" and get consolidated details related to given table.

Data profilerYou have to pass the name of table and columns which you wish to check for Null value count.


3. You want to check min and max value in columns with it's frequency

This moment you can use "ColumnStatisticsProfile" and get consolidated details related to given table.

Data profilerYou have to pass the name of table and columns which you wish to check for min and max value and it's frequency in column.


4. You want to check value in columns before load to actual table. It may help to load conditional to warehouse.

This moment you can use "ColumnValueDistributionProfile" and get consolidated details related to given table.

Data profilerYou have to pass the name of table and columns which you wish to check for value and frequency of that value in column.


5. You want to check value pattern in columns before load to actual table. It may help to load conditional to warehouse.

This moment you can use "ColumnPatternProfile" and get consolidated details related to given table.

Data profilerYou have to pass the name of table and columns which you wish to check for pattern value and frequency of that value in column.

Please refer Log-File.xml for sample Data Profiler Output.

You can view the output using the Tool, which is available at
    %programfiles%\Microsoft SQL Server\100\DTS\Binn\DataProfileViewer.exe

Please refer link for more details.

Thanks,

Alpesh
Log-File.xml
Capture2.PNG
1
5,548 Views
Alpesh PatelSolution Architect at TCS (SAFe®Agilist, TOGAF® Foundation, PSM I®, MCP, MCSD)
CERTIFIED EXPERT

Comments (0)

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.