Avatar of jknj72
jknj72
 asked on

Oracle question related to question ID: 27979543

I posted a question to get the text out of a string without special characters included. I got the answer but upon looking at what I actually needed its not that simple. I had an example that looked something like the following

'~(|_|*|football)~' and I wanted football returned and I am able to do that

I then look at the data in the field and it has the following

blablablablablablabla '~(|_|*|football)~' and then more text blablablablabla

so the part of the string with the special characters is only a portion of the text in the field

I would need the following returned
blablablablablablabla football and then more text blablablablabla

I tried figuring it out but the ticks(') are really screwing me up and whenever I run it the variable screen comes up..

Any thoughts?
Oracle DatabaseSQL

Avatar of undefined
Last Comment
jknj72

8/22/2022 - Mon
slightwv (䄆 Netminder)

If 'valid' characters are letters, numbers and a space try this:

select regexp_replace('blablablablablablabla ''~(|_|*|football)~'' and then more text blablablablabla','[^a-zA-Z0-9 ]') from dual;

What '[^a-zA-Z0-9 ]' does is says if it is NOT '^' a lowercase letter, a-z, an upper case letter, A-Z, a number, 0-9 or a space (just inside the closing ']'), replace it with nothing.
David VanZandt

You say you are seeing the variable screen -- meaning you are prompted to replace some placeholder with a value?  A simple example is to code &zzz -- the prompt is then to replace "zzz".

If I am on track, then your session is set with some substibution value other than the default "&" -- snapshot the prompt window, please.  Secondly, if you are not substituting a value, then make sure to begin with, SET DEFINE OFF .
awking00

This query takes all characters before the '~ (whether or not they exist) and concatenates them with the function sdstuber provided against all of the characters between the '~ and ~' and concatenates them with all of the characters after the ~'  (whether or not they exist), adding a space between the three parts then trimming any outside spaces -

select trim(substr(text,instr(text,'~'||chr(39)) + 2)||' '||
REPLACE(SUBSTR(REGEXP_SUBSTR(text, '~[^~]+~'), INSTR(REGEXP_SUBSTR(text, '~[^~]+~'), '|', -1) + 1), ')~')||' '||
substr(text,instr(text,'~'||chr(39)) + 2)) mystring
from txt;

MYSTRING
-------------------------------------------------------
football
Mobile, Alabama,
blahblahblah football blahblahblah
I started with Experts Exchange in 2004 and it's been a mainstay of my professional computing life since. It helped me launch a career as a programmer / Oracle data analyst
William Peck
awking00

To show the beginning strings -
SQL> select * from txt;
TEXT
-------------------------------------------
'~(`|¤|football)~'
'~(`|¤|{link:10098}|Mobile, Alabama,)~'
blahblah'~(`|¤|football)~'blahblahblah
jknj72

ASKER
awking00 I get an invalid character error when trying to run. Please note that I am very new to Oracle, but learning more every day, and somehow I cant figure out how to declare a variable 'text' and set it and then run the select statement using the variable in TOAD. So I just replaced 'text' everywhere with the actual string so Im not sure that if I figure out how to do that, that it will work but I will keep trying.
Anyway, I have attached a screenshot of the Select statement and you will notice that it is giving me an invalid character when it reaches the football portion of the string. It actually has the open paran highlighted for the error.

I appreciate the help!!
Screenshot-Error.jpg
awking00

Note how I showed the strings in my txt table. They already included the apostrophes. The problem is when you use the command substr('blabla... '~...',..., it's expecting the string to end after the second apostrophe followed by a comma, then the second parameter, etc. To get around this, you would need to issue the command as substr('blabla...''~... with an additional apostrophe. This is what I had to do in order to get the "actual" string (that conatined apostrophes) inserted into the table.
⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.
awking00

In your original question, you stated that you had a text field, presumably in a table, that you needed to manipulate. You should be able to manipulate that field with my query by substituting that field name for "text" and that tablename for "txt."
jknj72

ASKER
ok Im sorry but Im confused. the string will be in the table with just a single quote around that portion of the string? If I put the extra tic mark around the '~(`|¤|football)~' to be
''~(`|¤|football)~'' then it will work but the string is in the table as
'blablablablablablabla '~(|_|*|football)~' and then more text blablablablabla'
How would I get the additional tics in there in order to get this to work. FYI, I really dont wanna do that but I dont know if it can be done otherwise. Also, if I get a list from the front end developer of the characters being used in the football portion of the string would that help? Maybe we can do a replace or something? I know we are close though!!!!
awking00

If it's already in the table with the single quotes, then just apply the query to that field.
Select ... substr(thatfield,...)
as opposed to
Select ... substr('blablablablabla'~(|_|*|football)~' and then more text blablablablabla', ...
since this requires the apostrophes to identify the first argument as a string expression, but "thatfield" is already identified as a string expression. It worked before using sdstuber's method. It's just that now the argument for the middle portion that uses that method is not the text field itself but the substring of the text field between the '~ and the ~'. You can still insert the test field in the substring used to get the first and last portions.
By the way, if there already a space before and after the blabla... , there's no need to concatenate the space and use the trim function. I also see I made a mistake copying and pasting

Breaking it up might make it more understandable -

First part of text field is
select substr(text,1, instr(text,'~'||chr(39)) - 1)
Second part is
REPLACE(SUBSTR(REGEXP_SUBSTR(text, '~[^~]+~'), INSTR(REGEXP_SUBSTR(text, '~[^~]+~'), '|', -1) + 1), ')~') Where the part in bold replaces what sdstuber called "yrcol" in the accepted answer to your previous question.
And the third part is
substr(text,instr(text,'~'||chr(39)) + 2)) mystring
All of life is about relationships, and EE has made a viirtual community a real community. It lifts everyone's boat
William Peck
jknj72

ASKER
ok I ran the following query without the logic

select rownum, para_text
FROM omni_ent_paragraph where rownum < 3 and para_text like '%|%'
order by para_text ;

And I get these results

Morris and Essex Line  to Liberty State Park - '~(`|¤|6:00am-10:00am)~'
Convent Station
Madison Station
Chatham Station


Then I run this with the logic

select rownum, trim(substr(para_text,instr(para_text,'~'||chr(39)) + 2)||' '||
REPLACE(SUBSTR(REGEXP_SUBSTR(para_text, '~[^~]+~'), INSTR(REGEXP_SUBSTR(para_text, '~[^~]+~'), '|', -1) + 1), ')~')||' '||
substr(para_text,instr(para_text,'~'||chr(39)) + 2)) mystring
from omni_ent_paragraph where rownum < 3 and para_text like '%|%'
order by para_text;

and got these results

Convent Station
Madison Station
Chatham Station
 6:00am-10:00am  
Convent Station
Madison Station
Chatham Station

Here is another result from each of the queries

With logic

Gladstone Station
Peapack Station
Far Hills Station
Millington Station
Gillette Station
Murray Hill Station
 6:00am-10:00am
Gladstone Station
Peapack Station
Far Hills Station
Millington Station
Gillette Station
Murray Hill Station

Without logic

Gladstone Branch to Summit  Station -  '~(`|¤|6:00am-10:00am)~'
Gladstone Station
Peapack Station
Far Hills Station
Millington Station
Gillette Station
Murray Hill Station

It seemns as if the portion after the parsing is being duplicated and replacing the front of the string somehow?

Sorry so long just wanted to make sure Im explaining this correct....

Thanks
slightwv (䄆 Netminder)

For this row:
Morris and Essex Line  to Liberty State Park - '~(`|¤|6:00am-10:00am)~'

What are your expected results?
Do you just want the text or do you want specific text on their own individual rows?
awking00

>>Then I run this with the logic
select rownum, trim(substr(para_text,instr(para_text,'~'||chr(39)) + 2)||' '||<<
Did you notice that I had made a copy and paste error and the first part should have been -
select [rownum],substr(text,1, instr(text,'~'||chr(39)) - 1) ...
⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.
jknj72

ASKER
I didnt see the error. I added the line you provided and got these results

Gladstone Branch  to Liberty State Park -  '~(`|¤|6:00am-10:00am) 6:00am-10:00am  
Bernardsville Station
Basking Ridge Station
Lyons Station
Berkeley Heights Station
New Providence Station

Im looking for this

Gladstone Branch  to Liberty State Park -  6:00am-10:00am  
Bernardsville Station
Basking Ridge Station
Lyons Station
Berkeley Heights Station
New Providence Station


So we are really close.....
slightwv (䄆 Netminder)

Did you ever look at what I posted in my first post?

Check the following simple test case.  I did add some additional characters to not be replaced based on your latest data.

drop table tab1 purge;
create table tab1(col1 varchar2(100));

insert into tab1 values('Gladstone Branch  to Liberty State Park -  ''~(`|¤|6:00am-10:00am) 6:00am-10:00am');

select regexp_replace(col1,'[^a-zA-Z0-9: -]') from tab1;

Open in new window

awking00

Two issues to resolve. One, I actually need to correct my correction.
This -
select substr(text,1, instr(text,'~'||chr(39)) - 1)
should have been this -
select substr(text,1, instr(text,chr(39)||'~') - 1)

Two, previously your examples would have shown this -
Gladstone Branch  to Liberty State Park -  '~(`|¤|6:00am-10:00am) 6:00am-10:00am
like this -
Gladstone Branch  to Liberty State Park -  '~(`|¤|6:00am-10:00am)~' 6:00am-10:00am
with the ~' occurring after the parenthesis
So the incorrect first part would return nothing. Because the ~' did not exist, it was looking for a substring starting at position 1 for a length of -1.
The second part would return nothing because there is no substring beginning with '~ and ending with~'.
And the third part should have returned the entire string starting at the second position Is there possibly a space at the beginning of that string or are you sure it didn't return a string starting with "ladstone" ... instead of "Gladstone" ...?

Given all of that explanation, my next question is in order to get 'Gladstone Branch  to Liberty State Park -  6:00am-10:00am' as your desired result, "Is that because the '6:00am - 10:00 am' comes after the closing parenthesis or because it comes after the last pipe ("|") and before the closing parenthesis?"
Experts Exchange has (a) saved my job multiple times, (b) saved me hours, days, and even weeks of work, and often (c) makes me look like a superhero! This place is MAGIC!
Walt Forbes
jknj72

ASKER
awking
>>Given all of that explanation, my next question is in order to get 'Gladstone Branch  to Liberty State Park -  6:00am-10:00am' as your desired result, "Is that because the '6:00am - 10:00 am' comes after the closing parenthesis or because it comes after the last pipe ("|") and before the closing parenthesis?"

I need everything before the '~(
and whatever text after the last pipe and before the ~)' without the special characters

Then everything after the ~)'

all concatenated

slight
Im going to try it again now

Thanks to both of you!!
slightwv (䄆 Netminder)

>>I need everything before the '~( and whatever text after the last pipe and before the ~)' without the special characters

Then wouldn't:
Gladstone Branch  to Liberty State Park -  '~(`|¤|6:00am-10:00am) 6:00am-10:00am  

Become:
Gladstone Branch  to Liberty State Park -  6:00am-10:00am 6:00am-10:00am  

and not what you posted as desired?
jknj72

ASKER
>>Then wouldn't:
Gladstone Branch  to Liberty State Park -  '~(`|¤|6:00am-10:00am) 6:00am-10:00am  

Become:
Gladstone Branch  to Liberty State Park -  6:00am-10:00am 6:00am-10:00am  

It never had '~(`|¤|6:00am-10:00am) 6:00am-10:00am
It was always just '~(`|¤|6:00am-10:00am)~'

Bottom line
ABC '~(`|¤|6:00am-10:00am)~' DEF

I need
ABC 6:00am-10:00am DEF

Sorry if Ive gotten off my orig post somewhat....

Also, slight yours is real close I just tried it and I think I may be able to tweak it a little to give me the desried results. I will keep trying
⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.
slightwv (䄆 Netminder)

>>Also, slight yours is real close

Please post where it is failing and I'll tweak it with you.
awking00

>>It never had '~(`|¤|6:00am-10:00am) 6:00am-10:00am
It was always just '~(`|¤|6:00am-10:00am)~' <<
So the actual data was
Gladstone Branch  to Liberty State Park -  '~(`|¤|6:00am-10:00am)~' ?
jknj72

ASKER
>>So the actual data was
Gladstone Branch  to Liberty State Park -  '~(`|¤|6:00am-10:00am)~' ?

The actualy data for this record is:
Gladstone Branch  to Liberty State Park -  '~(`|¤|6:00am-10:00am)~'  
Bernardsville Station
Basking Ridge Station
Lyons Station
Berkeley Heights Station
New Providence Station
This is the best money I have ever spent. I cannot not tell you how many times these folks have saved my bacon. I learn so much from the contributors.
rwheeler23
awking00

Can we try again? This seems to work -
SQL> select * from txt;
TEXT
---------------------------------------------------------------------
'~(`|¤|football)~'
'~(`|¤|{link:10098}|Mobile, Alabama,)~'
blahblah'~(`|¤|football)~'blahblahblah
Gladstone Branch  to Liberty State Park -  '~(`|¤|6:00am-10:00am)~'

SQL> select substr(text,1, instr(text,chr(39)||'~') - 1)||
  2  REPLACE(SUBSTR(REGEXP_SUBSTR(text, '~[^~]+~'), INSTR(REGEXP_SUBSTR(text, '~
[^~]+~'), '|', -1) + 1), ')~')||
  3  substr(text,instr(text,'~'||chr(39)) + 2) mystr
  4  from txt;
MYSTR
--------------------------------------------------------------------------------
football
Mobile, Alabama,
blahblahfootballblahblahblah
Gladstone Branch  to Liberty State Park -  6:00am-10:00am
awking00

Are there any other cases such as no '~' and no '|' or just '~' without '|' or just '|' without '~'? If so, what is the selection criteria?
slightwv (䄆 Netminder)

>>This seems to work -

Seems like a lot of extra calls?

The sample below seems to replicate the output above with a single call.

More accurate requirements would help greatly.  for example do you ALWAYS have the closing )~' that will ALWAYS match a starting '~( ?

drop table tab1 purge;
create table tab1(col1 varchar2(100));

insert into tab1 values('''~(`|¤|football)~''');
insert into tab1 values('''~(`|¤|{link:10098}|Mobile, Alabama,)~''');
insert into tab1 values('blahblah''~(`|¤|football)~''blahblahblah');
insert into tab1 values('Gladstone Branch  to Liberty State Park -  ''~(`|¤|6:00am-10:00am)~''');
commit;

select regexp_replace(col1,'([^'']*)(.*\|)(.*)(\).*[''])(.*)','\1\3\5') from tab1; 

Open in new window

⚡ FREE TRIAL OFFER
Try out a week of full access for free.
Find out why thousands trust the EE community with their toughest problems.
jknj72

ASKER
yes awkin there are times when there are no special characters and those records are not coming back correct. Furthermore there are instances where there are multiple cases where there are the special characters.

FYI,
the premise behind what we are doing is we are giving users the ability to create web pages and in this paragraph section they can make any words they want italicized, bold, underlined, etc... Thats where the special characters come in, and they can do this multiple times throughout their paragraph or they dont have to do it at all.

slight, your initial replace is working but I guess my next problem is if there is a crlf it is not keeping it. I think if I can get that figured out then I will be good. I have an example if youd like. I also tried to run what you just posted and it runs for a while so I stop it before it executes.

I appreciate all the work you both have been putting in to help me. I hopefully can pass it on one day!!
ASKER CERTIFIED SOLUTION
slightwv (䄆 Netminder)

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
GET A PERSONALIZED SOLUTION
Ask your own question & get feedback from real experts
Find out why thousands trust the EE community with their toughest problems.
jknj72

ASKER
they are being put there on purpose in the database and they are being displayed without the speical characters in another portion so its not being saved without the characters its just how I am displaying them.

I will try now..Thanks
jknj72

ASKER
That worked great. Very impressive....THANK YOU!!!
Your help has saved me hundreds of hours of internet surfing.
fblack61
slightwv (䄆 Netminder)

>>That worked great

Be careful of edge cases!  That is data that might slip through the cracks that you didn't expect.
jknj72

ASKER
great job