We help IT Professionals succeed at work.

oracle sql splitting a comma-separated string but ignoring commas in quotes

Hi,

Kindly help on the following problem. I am using Oracle SQL to split comma separated string values. However, need to ignore the values with comma.

Below is the sample data, which needs to be separated/split (comma seperated)
'franchise': 'Px Oncology (Solid Tumors, all XA Onc projects)', 'zlinical_xombination_study': 'No'

Open in new window


sample output would be,
franchise: Px Oncology (Solid Tumors, all XA Onc projects)
zlinical_xombination_study: No

Open in new window



problem is, the value 'franchise': 'Px Oncology (Solid Tumors, all XA Onc projects)' is also having a comma within its value, need to ignore this comma while splitting

I am using the below code,
select LTRIM (REGEXP_SUBSTR (ERROR_TEXT, '[^,]+', 1, 1), ',')    AS part_1 FROM ERROR_LOG_TEMP_5;

Open in new window



The output I am getting is ['franchise': 'Px Oncology (Solid Tumors]

It should provide,  ['franchise': 'Px Oncology (Solid Tumors, all XA Onc projects)']
Comment
Watch Question

SILVER EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:
Will the string ALWAYS have a ', as the delimiter?

See if this works for you:
rtrim(regexp_substr(error_text,'''.*'','),',')


ste5anSenior Developer
BRONZE EXPERT

Commented:
hmm, I'm not aware of a bulletproof version for plain PL/SQL, cause CSV is pretty complex with it simple rules - almost like Go.

So I would:

1) Convince the provider of that data to use XML or JSON.
2) see 1).
3) really see 1).
4) Implement the following logic, it should cover the simpler use-cases:

    a) Split at the comma.
    b) Count the single quotes per item.
    c) If an item has an odd number of single quotes, then merge it with the next item and restore the content comma.
    d) When a merge occurred goto b).
suvmitraManager

Author

Commented:
Hi  slightwv (䄆 Netminder) ,
Your solution is not providing me the expected result which is 'franchise': 'Px Oncology (Solid Tumors, all XA Onc projects)'  instead it is just returning the actual data set.
SILVER EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:
>>Your solution is not providing me the expected result which is

It seems to provide that for me.

I pass in the full string and from the fiddle below I get:
'franchise': 'Px Oncology (Solid Tumors, all XA Onc projects)'


Fiddle here:
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=77ee8e88e9af0e56613d4a210b90de90 

suvmitraManager

Author

Commented:
with error_log_temp_5 as (
   select q'['franchise': 'Px Oncology (Solid Tumors, all XA Onc projects)', 'zlinical_xombination_study': 'No', 'clinical_comparator_study': 'No]'
      error_text from dual
)
select
rtrim(regexp_substr(error_text,'''.*'','),',')
from error_log_temp_5
/

Thanks, I understand the problem here.

Please check the code above, I added one more value in the string and now it is returning the first two values ignoring comma. So, what I need is only the franchise value not others. 
suvmitraManager

Author

Commented:
Just one more pattern of the string, may be useful for coding.

The franchise value will be always at the beginning of the string. It may be upto 65 chars. So, incase the string is too long say 4,000 chars - we may need to look at first 100 chars at max to get the franchise value. All other values are separated by comma.

But the franchise value may also contain a comma. need to ignore the comma and get the full value from franchise.
SILVER EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:
Try this one.  

I added what I'm thinking may be the next step and split ALL of them into individual rows.

The key is to replace ', with a character that will NEVER be in the field.  I guessed chr(254).  Use whatever you want.

with error_log_temp_5 as (
   select replace(
         q'['franchise': 'Px Oncology (Solid Tumors, all XA Onc projects)', 'zlinical_xombination_study': 'No', 'clinical_comparator_study': 'No]'
         ,''',',chr(254)
      ) error_text
   from dual
)
select
trim(regexp_substr(error_text,'[^' || chr(254) || ']+', 1, column_value)) || ''''
from error_log_temp_5,
   table(
      select collect(level) from dual
      connect by level <= length(error_text) - length(replace(error_text, chr(254))) + 1
   )
/


suvmitraManager

Author

Commented:
some issue with the code. Brackets related.

Error:
ORA-12726: unmatched bracket in regular expression
12726. 00000 -  "unmatched bracket in regular expression"
*Cause:    The regular expression did not have balanced brackets.
*Action:   Ensure the brackets are correctly balanced.

                                  
SILVER EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:
Cool "bug".  I'm running 12.2 and what I posted runs fine on my database.  On dbfiddleuk 11g and 18c and livesql, it fails.

Change chr(254) for chr(1) in all three places and it works in dbfiddleuk and livesql.
suvmitraManager

Author

Commented:
Hi  slightwv (䄆 Netminder) ,

Yes, this is working. Thank you.

I just need to know further, instead of the hardcoded value, what would be the updated code if we use the field/column directly from table.

Source field here is 
error_text
So, I guess the below code needs further update.

with error_log_temp_5 as (
   select replace(
         q'['franchise': 'Px Oncology (Solid Tumors, all XA Onc projects)', 'zlinical_xombination_study': 'No', 'clinical_comparator_study': 'No]'
         ,''',',chr(254)
      ) error_text
   from dual
)
SILVER EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:
>>what would be the updated code if we use the field/column directly from table.

Remove the CTE (WITH).

select
trim(regexp_substr(error_text,'[^' || chr(1) || ']+', 1, column_value)) || ''''
from error_log_temp_5,
   table(
      select collect(level) from dual
      connect by level <= length(error_text) - length(replace(error_text, chr(1))) + 1
   )
/

I used the CTE to have the same table_name, column name and data as your original SQL.
suvmitraManager

Author

Commented:
Hi  slightwv (䄆 Netminder) ,

Thanks again. I think it is very close but failing at my end. I am attaching data after masking. Kindly just use this data to get the desired result. I am good otherwise.

Book1.xlsx
SILVER EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:
I asked a question you never answered back in my first post:
Will the string ALWAYS have a ', as the delimiter?

The answer is now:  No.

What is your Oracle version?

What is the final goal here?

What you appear to have is semi-JSON.  If you wrap the text in '{}', it is JSON.

Depending on your final goal, it might be best to forget hacking together parsing strings, and parse using JSON functions.
suvmitraManager

Author

Commented:
Well, the source is an XML file. But, then I have imported the XML as CSV into Oracle DB. Then after trimming this is the final output from Error_Text field which I posted in Book1.xlsx

So now, it is just a string coming from oracle table. As I mentioned earlier, what if we just consider first 100 characters of the field value and then perform the extraction for franchise value?

Having said that, I just want to acknowledge, I am really happy with your approach so far. And if you want me to close the thread, I will.
SILVER EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:
If you have access to the original XML, why are we working with "parsed" data?  Why not just work directly with the RAW XML to extract what you want?  It will be MUCH easier than trying to hack around an XML to CSV hack.

No need to close this out until we eventually get you what you want.
suvmitraManager

Author

Commented:
This is the requirement, making it more readable for the user. In XML file also it is same way coming for the field. The original application is written in python which creates this error log in XML format.
SILVER EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:
Splitting out the CSV you have as you say you want it really isn't that mush easier to read.

I'm betting you then want the final results in the key/pair format as rows and columns:
key                  value
------------------   -----------------------------------------------
franchise            Px Oncology (Solid Tumors, all MA Onc projects)
...
DBP-complexity       (null)
...
approved_countries   'JPN'
...


Do you have access to the Python script or the original log files?  I'm betting it has to be JSON in the original format.  Not sure why the CSV string is added to XML in that way, but I would look at changing that!
suvmitraManager

Author

Commented:
Well, you may be 100% correct.
But, right now I am just concerned about the franchise value not anything else.

I am not concerned about changing the source code, could be too much of work there.

Can we just have a try with first 100 chars of the string provided in excel and then try extracting the franchise value ignoring comma within? It will be really nice.
SILVER EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:
If all you want is the first value, the SQL I posted will do that.

Where it broke was when it got into number JSON fields.  Just remove the TABLE piece and grab the first one:
with error_log_temp_5 as (
   select replace(
         q'['franchise': 'Px Oncology (Solid Tumors, all XA Onc projects)', 'zlinical_xombination_study': 'No', 'clinical_comparator_study': 'No]'
         ,''',',chr(254)
      ) error_text
   from dual
)
select
trim(regexp_substr(error_text,'[^' || chr(254) || ']+', 1, 1)) || ''''
from error_log_temp_5
/

Continuing this path will only cause you problems when you need to extract one of the number or array columns...  It will ONLY wok on string values.
suvmitraManager

Author

Commented:
Hi,
This is still not working.

SILVER EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019
Commented:
Simple copy/paste error.  I forgot to replace chr(254) with chr(1).  I copied it from above.

with error_log_temp_5 as (
   select replace(
         q'['franchise': 'Px Oncology (Solid Tumors, all XA Onc projects)', 'zlinical_xombination_study': 'No', 'clinical_comparator_study': 'No]'
         ,''',',chr(1)
      ) error_text
   from dual
)
select
trim(regexp_substr(error_text,'[^' || chr(1) || ']+', 1, 1)) || ''''
from error_log_temp_5
/
Fiddle here:
https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=d06e43f5b830267d35cf8220eb65c491