Solved

text data manipulation

Posted on 2009-05-14
Last Modified: 2013-11-15
I recently received a screen scrape of a website. The references to images in the content were scrambled. Fortunately, there is a pattern to the scramble, so I believe an Update statement or query can remedy the situation. I would appreciate the statement and/or string manipulation code to get this done.

Here is the broken pattern:   <IMG SRC="filename.pdf? images kb site>
Here is what it needs to be:  <IMG SRC="/kb/images/filename.pdf">

We have removed the "site" folder in our new system.
Question by:plord1234
LVL 65

Expert Comment

by:rockiroads
ID: 24389171
How about this, assuming there is always a ? and double quotes:

Replace(Left$(x, InStr(1, x, "?") - 1), Chr$(34), Chr$(34) & "/kb/images/") & Chr$(34) & ">"

where x is your fieldname (note the two occurrences of it)
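
As a quick check, the expression can be run against the broken pattern from the question. This is a minimal sketch; the Sub name DemoSingleTag is made up for illustration, and x holds a single scrambled tag rather than a real field value.

```vba
Sub DemoSingleTag()
    Dim x As String

    ' The broken pattern from the question; doubled quotes embed a literal "
    x = "<IMG SRC=""filename.pdf? images kb site>"

    ' Keep everything before the "?", insert the folder after the opening quote,
    ' then close the quote and the tag
    Debug.Print Replace(Left$(x, InStr(1, x, "?") - 1), Chr$(34), Chr$(34) & "/kb/images/") & Chr$(34) & ">"
    ' Prints: <IMG SRC="/kb/images/filename.pdf">
End Sub
```

If x contains no "?", InStr returns 0 and Left$ is called with a length of -1, which raises a run-time error, so the expression only suits values that hold exactly one scrambled tag.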
 

Author Comment

by:plord1234
ID: 24389613
thanks rocki,

Your solution is valid, however:

I did not explain in my original question that the text could contain multiple images and that a question mark (?) in the text does not necessarily follow an image file name. However, an image file name is always followed by a "?".
 
LVL 65

Expert Comment

by:rockiroads
ID: 24389701
Here is the broken pattern:   <IMG SRC="filename.pdf? images kb site>
Here is what it needs to be:  <IMG SRC="/kb/images/filename.pdf">

Right, multiple images. Can you provide an example? And are you saying the image filename can be either before or after the ?

 

Author Comment

by:plord1234
ID: 24390065
Example in code snippet
<div class="box greyBox"><table cellspacing="0" cellpadding="3" width="50%" align="center" border="0"><tbody><TR><TD width="100%"><P align=center><B><FONT color=#000046 size=3 face=Arial>Rodeo Advantage  - FAQs</FONT></B></P></TD></TR><TR><TD><FONT size=1>All files and documentation are offered on an *AS IS* basis and you assume full responsibility for using them.</FONT></TD></TR></TBODY></TABLE><FONT size=2 face="arial, veranda"><TABLE border=0 cellPadding=5 width="40%" align=center><TBODY><TR bgColor=#aaaaaa><TH colSpan=2>Effects Tab - Turn ON / OFF Effects</TH></TR><TR><TD><PRE style="font-family: Trebuchet MS, Arial, Verdana;font-size: 10pt">Click on the tab to activate the controls <BR>and Virtual Surround Sound effects.<BR><BR><IMG SRC="aa_efare1.gif? kb_files images site><BR><IMG <IMG SRC="aa_efare2.gif? kb_files images site ><BR><BR>ENVIRONMENTS: 

 

Author Comment

by:plord1234
ID: 24390202
Rocki,

Also, what I meant was that a (?) could be in the text and be unrelated to an image file. For instance, a question could be asked in the text.
 
LVL 65

Expert Comment

by:rockiroads
ID: 24393821
Is the prefix always /kb/images?

If so, this will do it; otherwise a little more tweaking is required.
This reads a line and then prints the parsed image tags to the Immediate window.

    Dim sHtml As String
    Dim i As Integer
    Dim sSplitHtml() As String
    Dim sImages As String
   
    sHtml = "MY HTML LINE GOES HERE"
   
    ' Every element after the first starts just past an <IMG SRC= marker
    sSplitHtml = Split(sHtml, "<IMG SRC=")
    For i = 1 To UBound(sSplitHtml)
       
        ' Keep the text up to the closing ">" and drop the double quotes
        sSplitHtml(i) = Replace(Trim$(Left$(sSplitHtml(i), InStr(1, sSplitHtml(i), ">") - 1)), Chr$(34), "")
        ' The filename is the text before the first space, minus the "?"
        sImages = "<IMG SRC=" & Chr$(34) & "/kb/images/" & Replace(Left$(sSplitHtml(i), InStr(1, sSplitHtml(i), " ") - 1), "?", "") & Chr$(34) & ">"
   
'HERE ARE YOUR RESULTS        
        Debug.Print sImages
    Next


So from the last sample you gave, it comes up with

            <IMG SRC="/kb/images/aa_efare1.gif">
            <IMG SRC="/kb/images/aa_efare2.gif">

 

Author Comment

by:plord1234
ID: 24395671
This is great Rocki.

Could you modify your code slightly so that all of the html is returned with the repaired image references?
 
LVL 65

Expert Comment

by:rockiroads
ID: 24398622
That makes it a little more interesting. The worry is the length of the line. Also, as you don't want individual values but the whole line, it might need a rethink; the current code probably won't support this.
Is there anything else, or is this it? It helps if you give as much info as possible.
 
LVL 65

Accepted Solution

by:
rockiroads earned 500 total points
ID: 24398759
OK. I thought about how to reuse what has been done so far, and this is what I came up with.
The first pass goes through and fixes each IMG SRC.
Then we loop again and display the data. Note that the loops start from different indexes; this is intentional.

    Dim sHtml As String
    Dim i As Integer
    Dim j As Integer
    Dim sSplitHtml() As String
    Dim sImages As String
    Dim iChev As Integer
    Dim sLine As String
   
   
   
'    <div class="box greyBox"><table cellspacing="0" cellpadding="3" width="50%" align="center" border="0"><tbody><TR><TD width="100%"><P align=center><B><FONT color=#000046 size=3 face=Arial>Rodeo Advantage  - FAQs</FONT></B></P></TD></TR><TR><TD><FONT size=1>All files and documentation are offered on an *AS IS* basis and you assume full responsibility for using them.</FONT></TD></TR></TBODY></TABLE><FONT size=2 face="arial, veranda"><TABLE border=0 cellPadding=5 width="40%" align=center><TBODY><TR bgColor=#aaaaaa><TH colSpan=2>Effects Tab - Turn ON / OFF Effects</TH></TR><TR><TD><PRE style="font-family: Trebuchet MS, Arial, Verdana;font-size: 10pt">Click on the tab to activate the controls <BR>and Virtual Surround Sound effects.<BR><BR><IMG SRC="aa_efare1.gif? kb_files images site><BR><IMG <IMG SRC="aa_efare2.gif? kb_files images site ><BR><BR>ENVIRONMENTS:

    sHtml = "YOUR HTML LINE ABOVE HERE - HOWEVER YOU READ IT"
   
    sSplitHtml = Split(sHtml, "<IMG SRC=")
   
    For i = 0 To UBound(sSplitHtml)
        Debug.Print "Before", sSplitHtml(i)
    Next i
   
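    ' Element 0 is the text before the first <IMG SRC=, so repairs start at 1
    For i = 1 To UBound(sSplitHtml)
       
        iChev = InStr(1, sSplitHtml(i), ">")
        If iChev > 0 Then
            ' Tag contents up to the closing ">", without the double quotes
            sLine = Replace(Trim$(Left$(sSplitHtml(i), iChev - 1)), Chr$(34), "")

            ' Rebuild the tag from the filename (the text before the first space, minus "?"),
            ' then re-attach whatever followed the ">"
            sSplitHtml(i) = "<IMG SRC=" & Chr$(34) & "/kb/images/" & Replace(Left$(sLine, InStr(1, sLine, " ") - 1), "?", "") & Chr$(34) & ">" & Mid$(sSplitHtml(i), iChev + 1)
   
        End If
    Next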

    For i = 0 To UBound(sSplitHtml)
        Debug.Print "After", sSplitHtml(i)
    Next i
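
To get the whole line back as one string instead of printing the pieces, the same split-and-repair idea could be wrapped in a function. This is a minimal sketch, not the accepted code itself: the name FixImgRefs is hypothetical, the /kb/images/ prefix is assumed to be fixed, and a tag is only rewritten when its filename ends in "?", as the asker described.

```vba
' Sketch only. FixImgRefs is a hypothetical name, not from the accepted answer.
Public Function FixImgRefs(ByVal sHtml As String) As String
    Const MARKER As String = "<IMG SRC="
    Dim sParts() As String
    Dim i As Long, iChev As Long, iSpace As Long
    Dim sLine As String, sName As String

    sParts = Split(sHtml, MARKER)

    ' Element 0 is the text before the first marker, so start at 1
    For i = 1 To UBound(sParts)
        sName = ""
        iChev = InStr(1, sParts(i), ">")
        If iChev > 0 Then
            ' Tag contents up to the closing ">", without the double quotes
            sLine = Replace(Trim$(Left$(sParts(i), iChev - 1)), Chr$(34), "")
            iSpace = InStr(1, sLine, " ")
            If iSpace > 1 Then sName = Left$(sLine, iSpace - 1)
        End If

        If Len(sName) > 1 And Right$(sName, 1) = "?" Then
            ' Scrambled tag: keep the filename, drop the "?" and the trailing words
            sParts(i) = MARKER & Chr$(34) & "/kb/images/" & _
                        Left$(sName, Len(sName) - 1) & Chr$(34) & ">" & _
                        Mid$(sParts(i), iChev + 1)
        Else
            ' Not the scrambled pattern: put the marker back and leave it alone
            sParts(i) = MARKER & sParts(i)
        End If
    Next i

    ' Every element from 1 onward already carries its own marker again
    FixImgRefs = Join(sParts, "")
End Function
```

In Access, a public function in a standard module can be called from a query, so an Update along the lines of UPDATE tblArticles SET Body = FixImgRefs([Body]) WHERE [Body] Like "*<IMG SRC=*" should apply it to each row (tblArticles and Body are placeholder names). The stray "<IMG " that precedes the second image in the sample is not touched by this sketch and would still need cleaning up separately.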
 
 

Author Comment

by:plord1234
ID: 24424219
Thanks Rocki!

This worked great. Is there any way for me to send you $?