Solved

Remove HTML Code by Query from a Field in Microsoft Access

Posted on 2007-12-06
32
1,740 Views
Last Modified: 2013-11-27
i manage monthly uploads to a number of different shopping channels. google base is one of these channels. the data that i have comes preformatted by my client. for google base, the feed can be in text delimited format. i manage this data in microsoft access 2003. recently, my feeds are being disapproved because they have changed their requirements. a required field is a description of the product but cannot be the same as the product title. the description field that i have in the preformatted file from my client includes HTML code, with many different tags. i need to remove all these tags before upload. obviously each tag begins with "<" and ends with ">", but there are many variations within the tag.

my question is, how do i write this query (queries)? query (queries) would be run monthly, as the client sends me a new database for upload every month.

i greatly appreciate your expertise and look forward to your helpful resolution.

the database holds >20,000 records.
0
Comment
Question by:joibrooks
  • 13
  • 11
  • 7
32 Comments
 
LVL 49

Expert Comment

by:Gustav Brock
ID: 20419215
You can easily write a clean-up function with the free ActiveX String Component from ChilkatSoft:

http://www.example-code.com/asp/asp_html_entity_decode.asp

<quote>
The Chilkat ASP String Component is freeware and may be used in any ASP
application. To install, download and unzip the ASP String Component.
Then copy the CkString.dll to a directory on your ASP Web Server and
register it with regsvr32.exe.
The CkString.dll is a standard ActiveX component. It may also be used in
Visual Basic 6.0, Delphi, Cold Fusion, Visual FoxPro, PowerBuilder, MFC,
or any other programming language that supports ActiveX components.
</quote>

http://www.example-code.com/asp/aspstring.asp

/gustav
Example code here:
 

<snip>

set cks = Server.CreateObject("CkString.CkString")
 

' Strip HTML from a string.

cks.Str = "<html><body><p>This is a test<br>Here is a <a

href=""http://www.chilkatsoft.com"">test link</a></p></body></html>"

cks.StripHtml

' Prints "This is a testHere is a test link"

Response.Write cks.Str + "<br>"
 

' Perhaps it's better to replace <br> tags with SPACE chars first...

cks.Str = "<html><body><p>This is a test<br>Here is a <a

href=""http://www.chilkatsoft.com"">test link</a></p></body></html>"

cks.ReplaceAll "<br>"," "

cks.ReplaceAll "<BR>"," "

cks.StripHtml

' Prints "This is a test Here is a test link"

Response.Write cks.Str

</snip>

Open in new window

0
 

Author Comment

by:joibrooks
ID: 20419626
i don't have access to an asp server.

i was trying to develop an update query that would find all instances of:

like "<*>"

and replace it with

""
without removing the rest of the character data in the string. i'm looking through all my resource books and the internet to find the correct syntax.

thanks for your response. i wish i could utilize the resolution.
0
 
LVL 49

Expert Comment

by:Gustav Brock
ID: 20419748
> without removing the rest of the character data in the string

If so, just use Replace:

strClean = Replace(Replace(strHtml, "<", ""), ">", "")

/gustav
0
 
LVL 26

Expert Comment

by:jerryb30
ID: 20419842
0
 

Author Comment

by:joibrooks
ID: 20419870
getting warmer. something like building a query with an expression like:

Replace([LongDesc],"<*","",">","")

but this isn't working. i'm getting a data type mismatch in query expression.
0
 
LVL 49

Expert Comment

by:Gustav Brock
ID: 20419915
Try with:

Replace(Replace([LongDesc], "<", ""), ">", "")

/gustav
0
 

Author Comment

by:joibrooks
ID: 20419983
darn! LongDesc is set to Memo datatype and not Text. the field often holds more than 255 characters. what next?
0
 
LVL 49

Expert Comment

by:Gustav Brock
ID: 20420017
Try with:

Replace(Replace(CStr([LongDesc]), "<", ""), ">", "")

/gustav
0
 
LVL 26

Expert Comment

by:jerryb30
ID: 20420041
I think I see some confusion here.
gustav is replace two specific characters, and not the entire tag. While there may be a way to invoke regular expressions in a query with some sort of UDF, I do not know it. If replace("<*", "") worked, you word be deleting everything after the first tag.
Did you try the link I posted?
0
 

Author Comment

by:joibrooks
ID: 20420057
getting "invalid use of null" error message. btw...thanks for hanging in there...
0
 

Author Comment

by:joibrooks
ID: 20420090
yes jerry. i did look at that and tried building the function but got a compile error.

user-defined type not defined

referring to Dim varRegex As RegExp

??
0
 
LVL 26

Expert Comment

by:jerryb30
ID: 20420112
Did you add the reference to regular expressions I also mentioned?
0
 
LVL 49

Expert Comment

by:Gustav Brock
ID: 20420132
Try with:

Replace(Replace(CStr(Nz([LongDesc], "")), "<", ""), ">", "")

/gustav
0
 

Author Comment

by:joibrooks
ID: 20420359
Replace(Replace(CStr(Nz([LongDesc], "")), "<", ""), ">", "")

WORKS!!!!!!!

can you explain the syntax. i'd like to understand the Nz function.

:c)
0
 
LVL 26

Expert Comment

by:jerryb30
ID: 20420732
As far as I can tell, this only removes < and > from field, and not the tags.  If that is what you wanted, fine, but it is not what you asked for in original question. :)
0
Maximize Your Threat Intelligence Reporting

Reporting is one of the most important and least talked about aspects of a world-class threat intelligence program. Here’s how to do it right.

 
LVL 49

Expert Comment

by:Gustav Brock
ID: 20420886
Nz() replaces a Null value with something else, here a zero length string.

Replace must have a string to read and CStr() takes care of that, though it should not be needed, as memos normally are read as text. You could try with:

Replace(Replace(Nz([LongDesc], ""), "<", ""), ">", "")

/gustav
0
 

Author Comment

by:joibrooks
ID: 20421049
jerry, i really appreciate your effort to resolve my problem correctly.

i did test the function in Access through a select query in design mode and it produced an exportable result. perhaps i didn't word my original request properly. the replace function that gustav provided removes  opening"<" tags and closing ">" tags and everything between them. this function leaves me with straight text, which is my goal.

-jb
0
 
LVL 26

Expert Comment

by:jerryb30
ID: 20421632
That is, of course, the purpose of allowing comments after a question is closed.  And for allowing a question to be re-opened for scoring.
In this case, /gustav provided me such a wealth of knowledge, I do not begrudge him the points.
0
 

Author Comment

by:joibrooks
ID: 20426673
jerryb30 - you were correct!!!!!. what i did not notice was that the replace function was filtering my dataset. how may i reopen this item?

i am now attempting your VB solution and am trying to understand what you meant by:

"add the reference to regular expressions I also mentioned?"

thank you!
0
 
LVL 26

Expert Comment

by:jerryb30
ID: 20431411
Sorry, our internet has been down.
In your module, select Tools, References.
Scroll down the list until you see the one for regular expressions:
 Microsoft VBScript Regular Expressions 5.5
and check the box next to it.
0
 

Author Comment

by:joibrooks
ID: 20432665
okay, so the script works as jerryb30 detailed in his last discussion note. but now...

1. some of the records in this new field (i'm calling this nothtml field description) display "#Error"
2. the text in most of these fields have carriage returns (as identified by a box character)
3. the export tab delim file errors out because of number 1 and 2
4. there are fields that are null, but they were null before we ran the select query

0
 
LVL 26

Expert Comment

by:jerryb30
ID: 20432835
(I am thinking this is another question.) Without the actual DB, I cannot think of how to diagnose this issue. If it is a matter of how the text is formatted before it is imported, maybe it can be fixed before this is done.
Do the null fields give #Error##? Is this in the db, or in the exported data?
0
 

Author Comment

by:joibrooks
ID: 20433576
the db comes to me from my client and is littered with inconsistent keystrokes. in this field that i'm trying to standardize there may be null or " " data, text data or html coded data. there is no standard procedure for data input. the datatype is MEMO.

the "#Error" is displayed in the newly created 'nohtml' field - which i've named [description] - generated by our SELECT query. "#Error" is not in every field that appears to be blank. i am thinking that this error message is caused by empty fields or fields with " " data.

as i mentioned, this data is going to Google's shopping channel, called Google Base (aka Froogle). the datafeed is processed but fails to meet their standards. if i push the data as it is with html in the [description] field, the upload is flat-out terminated. with the carriage returns generated by the stripped html data, the text delim file has crazy line breaks. while the upload completes... the file is rejected because it appears to have too many fields. this [description] field is required and Google is apparently running a validity check on the field's data - the field cannot contain the same data as displayed in the record's [title] field, it cannot contain canned marketing phrases, it cannot contain the same data as other records.

i have been pushing this same data as provided by my client on a monthly basis for over five years to various shopping channels. each channel has different requirements. as the channels change their requirements, i further manipulate the data. this current requirement from Google has me stumped.

a little further history of this db: the client uses this data as-is to populate their primary e-commerce system.

i'm upping the point value to 500. this isn't as simple as we all first thought it would be.
0
 
LVL 49

Expert Comment

by:Gustav Brock
ID: 20439846
> i did not notice was that the replace function was filtering my dataset.

Well, when you remove something from the data, you are filtering it whatever method you use.

/gustav

0
 
LVL 26

Expert Comment

by:jerryb30
ID: 20443488
Again, sorry for delay in response. Without trying to recreate your situation, inconsistent inputs are going to be a problem.  While HTML has substitutes for CR/LF, which works well when reading with a browser, it can play havoc with your display.  The inability to export a text-type memo field with a CR seems a little odd.  Maybe playing with the NZ function will take away for first #ERROR# problems, an allow the export.
Otherwise, you might have to post a sample DB with a few records which represent a range of inconsistent inputs at www-ee-stuff.com


0
 
LVL 26

Expert Comment

by:jerryb30
ID: 20443492
That is: www.ee-stuff.com
0
 

Author Comment

by:joibrooks
ID: 20443834
i've uploaded a +3000 record zipped db with a variety of entries that demonstrate the issues i'm having manipulating the data. thank you!
0
 
LVL 26

Expert Comment

by:jerryb30
ID: 20443888
Please post your exact SQL which gives the errors.
I will look at this when I can, but cannot promise immediate response.
0
 

Author Comment

by:joibrooks
ID: 20444243
jerryb30 - i can't thank you enough for your patience. i appreciate all that you can do for me.

here is the SQL statement:

SELECT tabCatalog.ID AS id, tabCatalog.Price AS price, "http://www.companyname.com/default.asp?CatalogID=" & [catalogID] & "&SubfolderID=76&AssocID=froogle" AS link, IIf(tabcatalog.imagefilename="" Or tabcatalog.imagefilename Is Null,"http://www.companyname.com/merchant/logo.gif","http://www.companyname.com/merchant/" & tabcatalog.imagefilename) AS image_link, "2007-12-30" AS expiration_date, IIf([desc4]="" Or [desc4] Is Null,"Vitamins & Supplements",Trim([desc4])) AS product_type, tabCatalog.Desc6 AS brand, "US:Ground:6.95" AS shipping, tabCatalog.Desc9 AS upc, [desc1] & " - " & [ID] AS title, "New" AS condition, tabCatalog.ID AS mpn, REP(tabCatalog.LongDesc) AS description INTO froogletable
FROM tabCatalog
WHERE (((tabCatalog.ID) Not Like "mpi-*"));
0
 
LVL 26

Accepted Solution

by:
jerryb30 earned 500 total points
ID: 20508964
OK, I looked at the database, and while I cannot figure out the pattern of errors, I did add some NZ checking which appeared to prevent values from changing to null, causing errors, etc.

SELECT tabCatalog.ID AS id, tabCatalog.Price AS price, "http://www.companyname.com/default.asp?CatalogID=" & [catalogID] & "&SubfolderID=76&AssocID=froogle" AS link, IIf(tabcatalog.imagefilename="" Or tabcatalog.imagefilename Is Null,"http://www.companyname.com/merchant/logo.gif","http://www.companyname.com/merchant/" & tabcatalog.imagefilename) AS image_link, "2007-12-30" AS expiration_date, IIf([desc4]="" Or [desc4] Is Null,"Vitamins & Supplements",Trim([desc4])) AS product_type, tabCatalog.Desc6 AS brand, "US:Ground:6.95" AS shipping, tabCatalog.Desc9 AS upc, [desc1] & " - " & [ID] AS title, "New" AS condition, tabCatalog.ID AS mpn, REP(nz([tabCatalog].[LongDesc],"NONE")) AS description INTO froogletable                                       ^ ------------------------------------->^                
FROM tabCatalog
WHERE (((tabCatalog.ID) Not Like "mpi-*"));
0
 

Author Closing Comment

by:joibrooks
ID: 31413118
jerryb30 has well-earned his rank as GENIUS. many many thanks.
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

Entering a date in Microsoft Access can be tricky. A typo can cause month and day to be shuffled, entering the day only causes an error, as does entering, say, day 31 in June. This article shows how an inputmask supported by code can help the user a…
Outlook Free & Paid Tools
What’s inside an Access Desktop Database. Will look at the basic interface, Navigation Pane (Database Container), Tables, Queries, Forms, Report, Macro’s, and VBA code.
The viewer will learn how to create two correlated normally distributed random variables in Excel, use a normal distribution to simulate the return on different levels of investment in each of the two funds over a period of ten years, and, create a …

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

16 Experts available now in Live!

Get 1:1 Help Now