Solved

Remove HTML Code by Query from a Field in Microsoft Access

Posted on 2007-12-06
32
1,846 Views
Last Modified: 2013-11-27
i manage monthly uploads to a number of different shopping channels. google base is one of these channels. the data that i have comes preformatted by my client. for google base, the feed can be in text delimited format. i manage this data in microsoft access 2003. recently, my feeds are being disapproved because they have changed their requirements. a required field is a description of the product but cannot be the same as the product title. the description field that i have in the preformatted file from my client includes HTML code, with many different tags. i need to remove all these tags before upload. obviously each tag begins with "<" and ends with ">", but there are many variations within the tag.

my question is, how do i write this query (queries)? query (queries) would be run monthly, as the client sends me a new database for upload every month.

i greatly appreciate your expertise and look forward to your helpful resolution.

the database holds >20,000 records.
0
Comment
Question by:joibrooks
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 13
  • 11
  • 7
32 Comments
 
LVL 50

Expert Comment

by:Gustav Brock
ID: 20419215
You can easily write a clean-up function with the free ActiveX String Component from ChilkatSoft:

http://www.example-code.com/asp/asp_html_entity_decode.asp 

<quote>
The Chilkat ASP String Component is freeware and may be used in any ASP
application. To install, download and unzip the ASP String Component.
Then copy the CkString.dll to a directory on your ASP Web Server and
register it with regsvr32.exe.
The CkString.dll is a standard ActiveX component. It may also be used in
Visual Basic 6.0, Delphi, Cold Fusion, Visual FoxPro, PowerBuilder, MFC,
or any other programming language that supports ActiveX components.
</quote>

http://www.example-code.com/asp/aspstring.asp 

/gustav
Example code here:
 
<snip>
set cks = Server.CreateObject("CkString.CkString")
 
' Strip HTML from a string.
cks.Str = "<html><body><p>This is a test<br>Here is a <a
href=""http://www.chilkatsoft.com"">test link</a></p></body></html>"
cks.StripHtml
' Prints "This is a testHere is a test link"
Response.Write cks.Str + "<br>"
 
' Perhaps it's better to replace <br> tags with SPACE chars first...
cks.Str = "<html><body><p>This is a test<br>Here is a <a
href=""http://www.chilkatsoft.com"">test link</a></p></body></html>"
cks.ReplaceAll "<br>"," "
cks.ReplaceAll "<BR>"," "
cks.StripHtml
' Prints "This is a test Here is a test link"
Response.Write cks.Str
</snip>

Open in new window

0
 

Author Comment

by:joibrooks
ID: 20419626
i don't have access to an asp server.

i was trying to develop an update query that would find all instances of:

like "<*>"

and replace it with

""
without removing the rest of the character data in the string. i'm looking through all my resource books and the internet to find the correct syntax.

thanks for your response. i wish i could utilize the resolution.
0
 
LVL 50

Expert Comment

by:Gustav Brock
ID: 20419748
> without removing the rest of the character data in the string

If so, just use Replace:

strClean = Replace(Replace(strHtml, "<", ""), ">", "")

/gustav
0
Three Reasons Why Backup is Strategic

Backup is strategic to your business because your data is strategic to your business. Without backup, your business will fail. This white paper explains why it is vital for you to design and immediately execute a backup strategy to protect 100 percent of your data.

 
LVL 26

Expert Comment

by:jerryb30
ID: 20419842
0
 

Author Comment

by:joibrooks
ID: 20419870
getting warmer. something like building a query with an expression like:

Replace([LongDesc],"<*","",">","")

but this isn't working. i'm getting a data type mismatch in query expression.
0
 
LVL 50

Expert Comment

by:Gustav Brock
ID: 20419915
Try with:

Replace(Replace([LongDesc], "<", ""), ">", "")

/gustav
0
 

Author Comment

by:joibrooks
ID: 20419983
darn! LongDesc is set to Memo datatype and not Text. the field often holds more than 255 characters. what next?
0
 
LVL 50

Expert Comment

by:Gustav Brock
ID: 20420017
Try with:

Replace(Replace(CStr([LongDesc]), "<", ""), ">", "")

/gustav
0
 
LVL 26

Expert Comment

by:jerryb30
ID: 20420041
I think I see some confusion here.
gustav is replace two specific characters, and not the entire tag. While there may be a way to invoke regular expressions in a query with some sort of UDF, I do not know it. If replace("<*", "") worked, you word be deleting everything after the first tag.
Did you try the link I posted?
0
 

Author Comment

by:joibrooks
ID: 20420057
getting "invalid use of null" error message. btw...thanks for hanging in there...
0
 

Author Comment

by:joibrooks
ID: 20420090
yes jerry. i did look at that and tried building the function but got a compile error.

user-defined type not defined

referring to Dim varRegex As RegExp

??
0
 
LVL 26

Expert Comment

by:jerryb30
ID: 20420112
Did you add the reference to regular expressions I also mentioned?
0
 
LVL 50

Expert Comment

by:Gustav Brock
ID: 20420132
Try with:

Replace(Replace(CStr(Nz([LongDesc], "")), "<", ""), ">", "")

/gustav
0
 

Author Comment

by:joibrooks
ID: 20420359
Replace(Replace(CStr(Nz([LongDesc], "")), "<", ""), ">", "")

WORKS!!!!!!!

can you explain the syntax. i'd like to understand the Nz function.

:c)
0
 
LVL 26

Expert Comment

by:jerryb30
ID: 20420732
As far as I can tell, this only removes < and > from field, and not the tags.  If that is what you wanted, fine, but it is not what you asked for in original question. :)
0
 
LVL 50

Expert Comment

by:Gustav Brock
ID: 20420886
Nz() replaces a Null value with something else, here a zero length string.

Replace must have a string to read and CStr() takes care of that, though it should not be needed, as memos normally are read as text. You could try with:

Replace(Replace(Nz([LongDesc], ""), "<", ""), ">", "")

/gustav
0
 

Author Comment

by:joibrooks
ID: 20421049
jerry, i really appreciate your effort to resolve my problem correctly.

i did test the function in Access through a select query in design mode and it produced an exportable result. perhaps i didn't word my original request properly. the replace function that gustav provided removes  opening"<" tags and closing ">" tags and everything between them. this function leaves me with straight text, which is my goal.

-jb
0
 
LVL 26

Expert Comment

by:jerryb30
ID: 20421632
That is, of course, the purpose of allowing comments after a question is closed.  And for allowing a question to be re-opened for scoring.
In this case, /gustav provided me such a wealth of knowledge, I do not begrudge him the points.
0
 

Author Comment

by:joibrooks
ID: 20426673
jerryb30 - you were correct!!!!!. what i did not notice was that the replace function was filtering my dataset. how may i reopen this item?

i am now attempting your VB solution and am trying to understand what you meant by:

"add the reference to regular expressions I also mentioned?"

thank you!
0
 
LVL 26

Expert Comment

by:jerryb30
ID: 20431411
Sorry, our internet has been down.
In your module, select Tools, References.
Scroll down the list until you see the one for regular expressions:
 Microsoft VBScript Regular Expressions 5.5
and check the box next to it.
0
 

Author Comment

by:joibrooks
ID: 20432665
okay, so the script works as jerryb30 detailed in his last discussion note. but now...

1. some of the records in this new field (i'm calling this nothtml field description) display "#Error"
2. the text in most of these fields have carriage returns (as identified by a box character)
3. the export tab delim file errors out because of number 1 and 2
4. there are fields that are null, but they were null before we ran the select query

0
 
LVL 26

Expert Comment

by:jerryb30
ID: 20432835
(I am thinking this is another question.) Without the actual DB, I cannot think of how to diagnose this issue. If it is a matter of how the text is formatted before it is imported, maybe it can be fixed before this is done.
Do the null fields give #Error##? Is this in the db, or in the exported data?
0
 

Author Comment

by:joibrooks
ID: 20433576
the db comes to me from my client and is littered with inconsistent keystrokes. in this field that i'm trying to standardize there may be null or " " data, text data or html coded data. there is no standard procedure for data input. the datatype is MEMO.

the "#Error" is displayed in the newly created 'nohtml' field - which i've named [description] - generated by our SELECT query. "#Error" is not in every field that appears to be blank. i am thinking that this error message is caused by empty fields or fields with " " data.

as i mentioned, this data is going to Google's shopping channel, called Google Base (aka Froogle). the datafeed is processed but fails to meet their standards. if i push the data as it is with html in the [description] field, the upload is flat-out terminated. with the carriage returns generated by the stripped html data, the text delim file has crazy line breaks. while the upload completes... the file is rejected because it appears to have too many fields. this [description] field is required and Google is apparently running a validity check on the field's data - the field cannot contain the same data as displayed in the record's [title] field, it cannot contain canned marketing phrases, it cannot contain the same data as other records.

i have been pushing this same data as provided by my client on a monthly basis for over five years to various shopping channels. each channel has different requirements. as the channels change their requirements, i further manipulate the data. this current requirement from Google has me stumped.

a little further history of this db: the client uses this data as-is to populate their primary e-commerce system.

i'm upping the point value to 500. this isn't as simple as we all first thought it would be.
0
 
LVL 50

Expert Comment

by:Gustav Brock
ID: 20439846
> i did not notice was that the replace function was filtering my dataset.

Well, when you remove something from the data, you are filtering it whatever method you use.

/gustav

0
 
LVL 26

Expert Comment

by:jerryb30
ID: 20443488
Again, sorry for delay in response. Without trying to recreate your situation, inconsistent inputs are going to be a problem.  While HTML has substitutes for CR/LF, which works well when reading with a browser, it can play havoc with your display.  The inability to export a text-type memo field with a CR seems a little odd.  Maybe playing with the NZ function will take away for first #ERROR# problems, an allow the export.
Otherwise, you might have to post a sample DB with a few records which represent a range of inconsistent inputs at www-ee-stuff.com


0
 
LVL 26

Expert Comment

by:jerryb30
ID: 20443492
That is: www.ee-stuff.com
0
 

Author Comment

by:joibrooks
ID: 20443834
i've uploaded a +3000 record zipped db with a variety of entries that demonstrate the issues i'm having manipulating the data. thank you!
0
 
LVL 26

Expert Comment

by:jerryb30
ID: 20443888
Please post your exact SQL which gives the errors.
I will look at this when I can, but cannot promise immediate response.
0
 

Author Comment

by:joibrooks
ID: 20444243
jerryb30 - i can't thank you enough for your patience. i appreciate all that you can do for me.

here is the SQL statement:

SELECT tabCatalog.ID AS id, tabCatalog.Price AS price, "http://www.companyname.com/default.asp?CatalogID=" & [catalogID] & "&SubfolderID=76&AssocID=froogle" AS link, IIf(tabcatalog.imagefilename="" Or tabcatalog.imagefilename Is Null,"http://www.companyname.com/merchant/logo.gif","http://www.companyname.com/merchant/" & tabcatalog.imagefilename) AS image_link, "2007-12-30" AS expiration_date, IIf([desc4]="" Or [desc4] Is Null,"Vitamins & Supplements",Trim([desc4])) AS product_type, tabCatalog.Desc6 AS brand, "US:Ground:6.95" AS shipping, tabCatalog.Desc9 AS upc, [desc1] & " - " & [ID] AS title, "New" AS condition, tabCatalog.ID AS mpn, REP(tabCatalog.LongDesc) AS description INTO froogletable
FROM tabCatalog
WHERE (((tabCatalog.ID) Not Like "mpi-*"));
0
 
LVL 26

Accepted Solution

by:
jerryb30 earned 500 total points
ID: 20508964
OK, I looked at the database, and while I cannot figure out the pattern of errors, I did add some NZ checking which appeared to prevent values from changing to null, causing errors, etc.

SELECT tabCatalog.ID AS id, tabCatalog.Price AS price, "http://www.companyname.com/default.asp?CatalogID=" & [catalogID] & "&SubfolderID=76&AssocID=froogle" AS link, IIf(tabcatalog.imagefilename="" Or tabcatalog.imagefilename Is Null,"http://www.companyname.com/merchant/logo.gif","http://www.companyname.com/merchant/" & tabcatalog.imagefilename) AS image_link, "2007-12-30" AS expiration_date, IIf([desc4]="" Or [desc4] Is Null,"Vitamins & Supplements",Trim([desc4])) AS product_type, tabCatalog.Desc6 AS brand, "US:Ground:6.95" AS shipping, tabCatalog.Desc9 AS upc, [desc1] & " - " & [ID] AS title, "New" AS condition, tabCatalog.ID AS mpn, REP(nz([tabCatalog].[LongDesc],"NONE")) AS description INTO froogletable                                       ^ ------------------------------------->^                
FROM tabCatalog
WHERE (((tabCatalog.ID) Not Like "mpi-*"));
0
 

Author Closing Comment

by:joibrooks
ID: 31413118
jerryb30 has well-earned his rank as GENIUS. many many thanks.
0

Featured Post

Comprehensive Backup Solutions for Microsoft

Acronis protects the complete Microsoft technology stack: Windows Server, Windows PC, laptop and Surface data; Microsoft business applications; Microsoft Hyper-V; Azure VMs; Microsoft Windows Server 2016; Microsoft Exchange 2016 and SQL Server 2016.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Describes a method of obtaining an object variable to an already running instance of Microsoft Access so that it can be controlled via automation.
In Part II of this series, I will discuss how to identify all open instances of Excel and enumerate the workbooks, spreadsheets, and named ranges within each of those instances.
In Microsoft Access, learn how to use Dlookup and other domain aggregate functions and one method of specifying a string value within a string. Specify the first argument, which is the expression to be returned: Specify the second argument, which …
The viewer will learn how to  create a slide that will launch other presentations in Microsoft PowerPoint. In the finished slide, each item launches a new PowerPoint presentation and when each is finished it automatically comes back to this slide: …

735 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question