Link to home
Start Free TrialLog in
Avatar of Oneill0003
Oneill0003

asked on

Parsing Data into Multiple Fields

Experts:
Got the following string and I need to parse it into 9 fields:

1236/14852/369215852/147852/3652/147896/1456398/14523/1254896

I will like to parse this data into 9 fields, since the most the set of ids that this string can have is 9, but not all the time is 9 it can range between 1 to 9 sets of tickets.

I got some logic worked but Im only pulling the first and second set of numbers:

Select substring('123456/147852/369852/147852/369852/147896/1456398/14523/1254896/123654',1,CHARINDEX('/','123456/147852/369852/147852/369852/147896/1456398/14523/1254896/123654
')- 1) AS ID1

I will really appreciate any tip or idea,
Thanks!
Avatar of grant300
grant300

First, which version of Sybase?  If you are on 15.0.2, there is a slicker way to handle this using T-SQL Functions.  For now, I will assume you are running an older version and do not have the JAVA extension licensed and installed.

Next question, are you trying to come up with 9 records, or just 9 fields?

In either case, the problem is that CHARINDEX (and PATINDEX) only return the the first instance of the character or pattern you are searching for.  I have not been able to find a way around that.  As a result, you will have to do this programmatically.  Bottom line, unless you have UDFs available (either JAVA or T-SQL), you cannot accomplish this in a single query.

Now, if you were willing to pad the ticket number out to a fixed width, you can pull it apart with no problem with 9 substring operators.  That would be my recommendation since the design of stuffing variable length delimited strings into a varchar is so highly non-relational.  Just not a good way to go at all.  The really relational thing to do is to normalize the ticket numbers into an additional table but that may not be practical at this point.

As far as an algorithm to pull the ticket numbers apart goes, you need to setup a WHILE loop to iterate 9 times.  Then each time through the loop, you use CHARINDEX to find the first '/' character, substring out what you need off of the front of it, and then chop the front off the string.  You can do that with either another substring or a STUFF function.

You can, of course, exit the WHILE loop early when the remaining length of the string is zero.

Sorry there was no magic way to make this work but the functionality does not exist in native T-SQL.

Regards,
Bill
Avatar of Oneill0003

ASKER

Hey Bill:
I'm running Sybase 9. I'm trying to come up with 9 fields.

Sound like is going to be a trip to get this done. I'm new to Sybase i work usually with MS SQL Server.
Do you think you can give me a hand on coming up with the loop? I have been working with CHARINDEX and actually LOCATE  and SUBSTRING in sybase but with no results. So do I need more then 1 query for this?
Not to sure what you mean by "if you were willing to pad the ticket number out to a fixed width"?
The length of the ticket number right now seems to be always 10 characters but it could change.

Thanks Bill!
Oh, you are not running ASE, you are running Sybase ASA (Adaptive Server Anywhere).

Well the good news is that you have the LOCATE function and can create User Defined Functions so you have a fighting chance.

I don't have an ASA 9 database handy so I can't create the actual code for you but I can outline it and you can slog on from there.

First, you will use the CREATE FUNCTION syntax to create a function that takes the field/string you want to parse, the separator, and the number of the element you want to return.  The function then returns that element as a string.

As I said before, you will build a WHILE loop that takes the first element off of the beginning of the string and then removes that portion and the delimiter from the front of the string.  You will do that N times until you get the element you want out of the string.

To use this in a query, you will have to call the function 9 times, one for each of the elements you are trying to retrieve.  This won't be particularly fast and you will have a lot of edge conditions to deal with in the code including the first element, the last element (since there is no trailing delimiter, a single element in the string, and an empty string.

The example you gave shows 5, 6, and 7 digit long ticket numbers, so I am not sure how you can say they are always 10 digits.  It may be they are a maximum of 10 digits but that does you no good at all.

When I said you should pad them out, I meant that they should all be the same length in the string.  Whether you pad them with leading zeros or spaces doesn't matter but it would make life so much easier you should really consider it.

0002345601/0000001234/0008765432 --- etc. or
      2345601/            1234/      8765432 ........

That way you could create a simple function that took the field, a field length, and an element number and pulled the element you wanted with a simple substring.  Definitely the way to go.

Regards,
Bill
Bill:
Yes you are correct Im using ASA.
At this point I dont have enough privileges to create a User Defined Functions, our DBAs are very picky and to get that kind of access I have to go throw many hoops and that this point I dont have the time. I know this is not the most professional way to get this accomplish but Im going to have to do this just with brute force right in the query. I know Im going to be creating a lot of syntax but like I mention right now I dont have the level of access to create UDF. Im creating a DTS package and connecting to Sybase and generating a SQL task in it to accomplish this. I will love to do the UDF but cant right now.

I now that the example I provided shows 5, 6 and 7 digits all I was trying to do was to show that the length of each set of number can change but on the existing data they all seem to have a max of 10 digits but it can change eventually in the near future and I just wanted to display a worse case scenario.

Im not to sure how to do the padding, remember that they could be less then 10 or greater then 10 digits.

I know Im not giving you many options but many of the times they dont give me the right tools that I need or the level of access  to get stuff accomplish and I just have to work with what I have.

I really appreciate all of your help Bill and trying to brain stormed a solution. Im going to keep playing with LOCATE and SUBSTRING and see were I get.

Thanks Again!
Don't bother.  With the constraints you have, there is literally no way to do what you need to do.  You can not get LOCATE, CHAR_INDEX, or PATINDEX to give you the second through ninth elements.  The parser will give up long before you manage to write the complex in-line expressions nested 9 levels deep.  Again, the database just won't do it.

 - You are stuck with a crappy data design
 - You have no privileges, and
 - It appears you are unable to change the application to store the ticket#s in a useful form

Game over!

When your DBA is the obstacle, I have found great success in publicly dumping the problem in their lap.  First, just ask for help.  If that doesn't work....

Write a sickeningly sweet e-mail to the DBA explaining what you need done, all the constraints (including the lack of privileges to actually use the database as it was intended), that you can't figure out how to do it in plain SQL and that you need his/her help.  Carbon copy your boss and his (if they are not the same) and make it clear that you are dead in the water until this is resolved.  Now you have made it his problem, not yours, and made him and the problem visible to management.

One of two things will happen; either the DBA will write the function you need or he will grant you the privileges to write stored procedures and functions.  This setup your in is really kind of stupid since you should be working in a development environment where you can't really hurt anything anyway.  If the DBA wants to review the code before it goes in the production database, that makes sense though I can guarantee you that trying to do everything in SQL statements alone leads to some very, very bad (undesirably complex, inefficient, difficult to maintain, and impossible to tune) code.

Best of luck,
Bill
ASKER CERTIFIED SOLUTION
Avatar of bret
bret
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Bret,

That works, but the updates to temp table are really expensive, particularly if there is any real data volume.  It also assumes the developer even has privileges to create a temp table.  We have both seen questions from the poor souls trying to do work against that kind of nonsense.

If he can't create a stored procedure, he could still create a block of T-SQL, declare variabales, a cursor against the input table, the while loop, and use the "pull one off the front and do it again" methodology.

I guess the usefulness of doing this in a block of T-SQL depends on what context it all has to happen in.  If it is a report of some kine, this just might work.  If you have to call it from a reporting tool, e.g. Cognos, Crystal, etc., I think it is going to be an uphill battle.

Oneill0003, what is the context for this effort?

Regards,
Bill
Hi guys:

Bret thanks for your input on this.

Bill, this data will be exported to a delimiter file from my dts package, which it will be delivery to another user that will either import it into excel or an Access database to analyze it.

Thanks
Isaac
I think the question may be more on how much data will you be dealing with and how frequently?  Are you having to just occasionally parse out a single input, or will you be parsing out batches (of what size?) of inputs at a time?
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hi Guys:
Once again I really appreciate all of your input on this issue.
Talking to my boss we end up using one of our BIA tools BRIO(Hyperion) which has a beautiful functions that lets you return the position of the mth occurrence of the s2 in S1 starting at n (Thats the definition of the function). I used it and it let me parse that field with no problem. We were trying our best to stay away from BRIO because of some other requirements of the project plus we had to do some other stuff that we cant do in BRIO, but because of this issue we have to do part of it in my DTS package and then do the parsing and the export of the files through BRIO.
I will definitely keep all of your suggestions Bill and we still agree that is better to have a UDF that will do this for us in a much better and cleaner way.

I will double the points and split them.

Thanks again for all of your Help!!!!