PATTERN MATCHING

VIVEKANANDHAN_PERIASAMY
VIVEKANANDHAN_PERIASAMY used Ask the Experts™
on
Abc
Created by ABC Rule with ID [XXXX]  XXXXXXX,,XXXX_XXX has no data for last 1 hour(s) 4 minutes

I have a column name abc like above, need to do pattern matching. If it's find anything after "has" followed by " "(space) "No Data"(Mactching word or wildcard) then  followed by space "for last" then it should group under similar title header No data.

Please don't use like command.
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®

Commented:
I can't see why you describe the string "has no data for last" in such a complicated manner.

Your question is rather vague and what's the reason no LIKE command can't be used ?
This does sound like a homework assignment....
This is not a homewrok assignment, I am using union some to this query and there many types of header present inside this column, so when run this query i don't want those column to be neglected. So if use like command it will check only for those column which contain value "No data". This is should happened.

That's a reason,i told like command should be include.

Commented:
You can use wild characters in the LIKE to make sure the string you need is selected.
So:
like '%has no data for last %' will only retrieve rows with that string embedded in the field and
like '%has no data*for last %' will allow any number of characters between the 'data' and 'for' and
like '%has no data?for last %' will allow one character (for each ?) in the string.

Will this allow you to construct the unique string needed to identify the rows to select ?
Top Expert 2012

Commented:
"*" and "?" are not supported in T-SQL, the only wildcards are "%" and "_"
Nope,

I have column with not only these description.By using the wildcard it will check only for has no data desciption.
I also have different types of description which doesn't contain "has no data for last" any of these key words.So by using the like , other types of description will be elimainated.

And i cannot think of union all types descrition as , there are morethan 1000's types of description  present.

So please dont use Like ,instead i like to use pat-index
Top Expert 2012

Commented:
So please dont use Like ,instead i like to use pat-index
I am afraid there is no difference whatsoever.  

But unfortunately, I have no idea what you are asking.  You may want to post a SQL Script containing the data and the expected output as clearly we are not following you.
select substring(TicketDesc,PATINDEX('%has ',TicketDesc)+1,PATINDEX('%for last% ',TicketDesc)-1) from CT
        where patindex('%has no data for last%',TicketDesc) > 0 and
        pd =111
and date >='2012-03-11' and date <='2012-03-14'

I am getting below error:

Invalid length parameter passed to the LEFT or SUBSTRING function.

I am getting output for below query
select
 where patindex('%has no data for last%',TicketDesc) > 0 and
        pd =111
and date >='2012-03-11' and date <='2012-03-14'


expected output:
title              count
No data        20
Top Expert 2012

Commented:
Again, what we need is a SQL script that contains the data so that we can duplicate the problem and suggest solutions.

Having said that, you are overlooking the definition of SUBSTRING().  The third parameter is the length of the string.  Run the following query and you should see the problem:
SELECT  --SUBSTRING(TicketDesc, PATINDEX('%has ', TicketDesc) + 1, PATINDEX('%for last% ', TicketDesc) - 1)
        TicketDesc,
        PATINDEX('%has ', TicketDesc) + 1,
        PATINDEX('%for last% ', TicketDesc) - 1
FROM    CT
WHERE   PATINDEX('%has no data for last%', TicketDesc) > 0
        AND pd = 111
        AND date >= '2012-03-11'
        AND date <= '2012-03-14'

Open in new window


Note:  This is not the solution.  Just a learning tool so that you can understand what is going on.
Top Expert 2012

Commented:
Take a look at this SQL script (this is the best I can do without having the SQL script I requested):
DECLARE @TicketDesc varchar(100) = 'Created by ABC Rule with ID [XXXX]  XXXXXXX,,XXXX_XXX has no data for last 1 hour(s) 4 minutes'

SELECT  --SUBSTRING(TicketDesc, PATINDEX('%has ', TicketDesc) + 1, PATINDEX('%for last% ', TicketDesc) - 1)
        @TicketDesc TicketDesc,
        PATINDEX('%has ', @TicketDesc) + 1 Start,
        PATINDEX('%for last% ', @TicketDesc) - 1 Length

Open in new window


Output:
TicketDesc											Start	Length
Created by ABC Rule with ID [XXXX]  XXXXXXX,,XXXX_XXX has no data for last 1 hour(s) 4 minutes	1	-1

Open in new window

Do you see the problem now?
Sorry, I won't be able to give the query.It's confidential.I know i can mask it,but i am noy willing yo post the query.

I am also most done.

This is the result i am getting, need to truncate still before for last .....
Final result i want only "No data"

column
no data for last 1 hour(s) 4 minutes

Query:
SELECT  SUBSTRING(TicketDesc,

       
        charINDEX('no ', TicketDesc) ,
        charINDEX(' for last', TicketDesc)-1)
FROM    CT
WHERE   PATINDEX('%has no data for last%', TicketDesc) > 0
        AND pD = 111
        AND CreatedDtim >= '2012-03-11'
        AND CreatedDtim <= '2012-03-14'
Top Expert 2012

Commented:
Sorry, I won't be able to give the query.It's confidential.I know i can mask it,but i am noy willing yo post the query.
I was not asking for the query.  Just a SQL script to populate some sample data that illustrates the problem.
These are the sample data from ticketdesc

Created by ABC  Rule with ID [XXXX]  SERVERNAME,,XXXXX has no data for last 1 hour(s) 4 minutes
Created by ABC  Rule with ID [XXXX]  SERVERNAME,,XXXXX_XXX has no data for last 1 hour(s) 4 minutes
Created by ABC  Rule with ID [XXXX] SERVERNAME,,XXXXX has no data for last 2 hour(s) -55 minutes

I want only NO data from those junk values.

expected result:

title                 Count
No data            3

Commented:
Looks to me that patternmatching with LIKE or (sub)stringing and/or charindex are all capable of producing the same result.

With the above sample a simple:
select 'No data', count(*) from tblSource where abc LIKE '%has no data for last%' 

Open in new window

will give the described result.

When there are more descriptions that hold this string and shouldn't be selected, than you need the wild characters acperkins described to change the pattern so it's uniquely selecting the needed rows. When you can't describe such a unique pattern, than you'll never get the needed result, as other commands won't solve your matching problem when it's logically impossible to define.
If like and charindex,patindex are same,then why do we have those?Kindly explain.
Is it because of PERFORMANCE?

Commented:
The result of the statements:
 PATINDEX('%has no data for last%', TicketDesc) > 0
and
LIKE '%has no data for last%'
will be exactly the same, but when you need the startposition of the string (e.g. for inserting additional characters in a string) you should use PATINDEX(). Here it will return 0 when not found and some value when found (tested with >0)
As PATINDEX() is a function, the database engine will have to activate the code that will determine the startposition and you can imagine that a standard SQL function only having to determine or the sting is present will be faster.

Personally I prefer solutions "as simple as possible" and the LIKE is easier to read and used in almost every SQL dialect, as the PATINDEX() function is a typical MS SQL function that can't be used in e.g. Oracle or MS Access like the LIKE :-)

To test the performance I often create a large dummy table and check the duration of either solution by running the query several times.

Commented:
I appreciate that there are serious performance implications in using leading wild card characters in a WHERE clause.  Would it be possible to add a computed column (or use a trigger to populate a column with the required piece of text) to the table and index it?  My thought here is that you could use a substring function, or the pattern function to populate the column value and the load would only occur on data update and at a row per time removing the overhead when you wanted to perform the SELECT.
experts since there is performance issue can any one help me to get index query for the same?

Created by ABC  Rule with ID [XXXX]  SERVERNAME,,XXXXX has no data for last 1 hour(s) 4 minutes
Created by ABC  Rule with ID [XXXX]  SERVERNAME,,XXXXX_XXX has no data for last 1 hour(s) 4 minutes
Created by ABC  Rule with ID [XXXX] SERVERNAME,,XXXXX has no data for last 2 hour(s) -55 minutes

I want only NO data from those junk values.

expected result:

title                 Count
No data            3
experts since there is performance issue can any one help me to get index query for the same?

Created by ABC  Rule with ID [XXXX]  SERVERNAME,,XXXXX has no data for last 1 hour(s) 4 minutes
Created by ABC  Rule with ID [XXXX]  SERVERNAME,,XXXXX_XXX has no data for last 1 hour(s) 4 minutes
Created by ABC  Rule with ID [XXXX] SERVERNAME,,XXXXX has no data for last 2 hour(s) -55 minutes

I want only NO data from those junk values.

expected result:

title                 Count
No data            3

Commented:
Did you try the LIKE ?
yes, it's working. please help in getting those o/p using charindex,patindex

Commented:
Why use charindex and patindex for the same, knowing it will be slower....
Reason is because,I have lots of junk data in the column and want to remove all the junk data and group them as per the index pattern.
This column does not have few captions,it has more than 1 million captions and it's diffcuilt for me to put in case or in IF statement.

Sample column:

ShortDesc: MemoryAvailableMBytes.xxxx;  Desc: xxxx:  xxxx xxxx:   Value = 198; Context: |xxxx:xxxx|xxxxx:xxxxx xxxx|Ixxx:|Txxx:200|xxx xxx:10|xxxx:120|xxxxr:less|xxxpe:xxxxx|

ShortDesc: SQLServerGeneralStatisticsProcessesblocked.SQLServer2K8;  Desc: xxx:xxx xxxx:  Pxxx xxx:   Value = 36; xxxxt: |xxx:xxx:xxxl xxx|xxxr:xxxx xxx|xxxe:|Txxx:10|Rxxxt:0|xxxxal:900|Oxxxxr:greater|Txxx:xxx||

I have mask the actual data with chareacter "xxx"

So I want to group them by name and give the count.
Need to take the character after ShortDesc: and end still ;  Desc:

So my Expected output:
title                                                                                              count
MemoryAvailableMBytes.xxxx                                                    1
SQLServerGeneralStatisticsProcessesblocked.SQLServer2K8 1

Commented:
Why not use a UNION with a query per LIKE string like:

select "MemoryAvailableMBytes" as title, count(*) from tblX where description like "%MemoryAvailableMBytes%"
UNION ALL
select "No data" as title, count(*) from tblX where description like "% has no data for %"
UNION ALL
select "SQLServerGeneralStatisticsProcessesblocked.SQLServer2K8 " as title, count(*) from tblX where description like "%SQLServerGeneralStatisticsProcessesblocked.SQLServer2K8 %"
UNION ALL
...etc

Open in new window


You could even add a (select "Total" as Title, count(*) from tblX) to be able to show how many records haven't been counted by the other LIKE strings.

Commented:
To make it flexible you can use:

substring(Description,charindex(':',Description)+1,charindex(';',Description)-1)

to extract just the string between ':' and ';'

You'll need to exclude the "has no data for " string rows and use a separate query for them in a UNION.
select substring(desc,
patindex('%ShortDesc:%',desc)+1,
patindex('%;  Desc:%',desc)-1) from CT
WHERE Id in(111)
and CreatedDatetime between DATEADD(dd,-7,getdate()) and getdate()
and desc like '%ShortDesc:%'

and i'm getting like below

hortDesc: Eventxxxx.MSSQLSERVER.Application.xxxTIPSR.Error50000Severity16State1.Alert;  Desc: Context: |xxxxx:17063|EventSource:MSS
It's not excluding shortdesc: and desc :

As I said, its hard to find the unique header which is morethan than 1 million.So it will be helpful if i'm using charindex or patindex.
Even in future if someone is adding the new header i don't want to add it manually.
As above(using like ) it almost hard coded.i need it dynamically which can pick up new header even it's added in future.

Commented:
I just used the ':' and ";" as separators as I assume they make the unique markers for the needed description.
(Your longer string will give another value for the start of the string and you would have to add the stringlength to correct the position of the start of the needed string.)
Just give it a try with my markers ( the ':' and ";" ) and see the result.
yes it works  from starting point but it didn't for end point.

Eventxxxx.MSSQLSERVER.Application.xxxTIPSR.Error50000Severity16State1.Alert;  Desc: Context: |xxxxx:17063|EventSource:MSS

can we use" Desc:" at a end point.
I tried but it's not working

Commented:
Try to copy/paste the ";" from such a description into the CHARINDEX function and try again please.
Sorry Sir, Didn't work!!! Is it working for you Sir?
Commented:
Oops, the second substring parameter is the length and not the position of the ';' thus we need to calculate the difference like:

substring(Description,charindex(':',Description)+1,charindex(';',Description)-(charindex(':',Description)+1))

Open in new window

and yes, this works for me with:

select substring('Test:Eventxxxx.MSSQLSERVER.Application.xxxTIPSR.Error50000Severity16State1.Alert;  Desc: Context: |xxxxx:17063|EventSource:MSS',charindex(':','Test:Eventxxxx.MSSQLSERVER.Application.xxxTIPSR.Error50000Severity16State1.Alert;  Desc: Context: |xxxxx:17063|EventSource:MSS')+1,charindex(';','Test:Eventxxxx.MSSQLSERVER.Application.xxxTIPSR.Error50000Severity16State1.Alert;  Desc: Context: |xxxxx:17063|EventSource:MSS')-(charindex(':','Test:Eventxxxx.MSSQLSERVER.Application.xxxTIPSR.Error50000Severity16State1.Alert;  Desc: Context: |xxxxx:17063|EventSource:MSS')+1))

Open in new window


Giving:
Eventxxxx.MSSQLSERVER.Application.xxxTIPSR.Error50000Severity16State1.Alert

Open in new window

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial