PvBredow

Is CreateObject("InternetExplorer.Application") incompatible with ASP?

As part of a search engine I am developing, I was having trouble searching the text of files without also searching their formatting markup.
If I opened each candidate file as a TextStream using a FileSystemObject, I could not distinguish the visible text from the formatting strings, and occasionally ended up with false matches.

To get around this, I thought I could have my ASP page open an InternetExplorer application, have IE open each file, and use IE.document.Body.innerText to get the visible text, which could then be searched. I wasn't sure how slowly it would run, but I thought it would be effective.

However, it seems that my ASP page hangs on the line:
      Set IE = CreateObject("InternetExplorer.Application")

Once it reaches this point the page seems to load forever, and it does not time out until about 20 minutes have passed. By commenting out lines, I have confirmed that this line is the source of the problem.

This leads to my questions:
- Is there any way that I can open an IE application from ASP?
- Is there a setting so that if my page can't be loaded within a certain length of time, it times out and the browser displays 'page can not be loaded'?

               thanks,
                             PvBredow
kevinkcw

I think you're taking the long road. Why not just load the text into a variable and parse out the tags? Drop everything between and including '<' and '>', or use whatever the ASP equivalent of PHP's strip_tags() function is, and you're golden. If I'm missing your point entirely, I'm sorry.
You probably want to use ServerXMLHTTP instead of opening a browser on your web server.
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/xmlsdk/html/e5c17f89-0197-496c-9164-ce0bbbd8490f.asp
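For reference, the ServerXMLHTTP suggestion might look roughly like the sketch below. The function name fetchSource and the timeout values are assumptions made for illustration, not something posted in this thread. Note that responseText is the raw response body, so it is tags and all; it would still need a tag-stripping step before searching.

```vbscript
<%
' Hypothetical helper (name and timeout values are assumptions):
' fetch a page's raw source from the server side, returning "" on failure.
Function fetchSource(url)
  Dim http
  fetchSource = ""
  On Error Resume Next            ' a timeout or bad host raises an error on send
  Set http = Server.CreateObject("MSXML2.ServerXMLHTTP")
  ' Timeouts in milliseconds: resolve, connect, send, receive.
  http.setTimeouts 5000, 5000, 10000, 10000
  http.open "GET", url, False     ' False = synchronous request
  http.send
  If Err.Number = 0 Then
    If http.status = 200 Then fetchSource = http.responseText
  End If
  Set http = Nothing
  On Error GoTo 0
End Function
%>
```

Because the timeouts are set explicitly, a slow or unreachable page should fail after a bounded wait rather than holding the ASP request open.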
Is there a problem with that method if the HTML response is not properly formatted XML? Won't it bomb out?
ServerXMLHTTP will not help, as it returns the full page source, which is what PvBredow is trying to avoid. I think there must be a way to strip the tags out using ASP... I'll have a quick scout now.
You can read the page source using FSO, strip the HTML tags from it with this function, and then search the document's viewable text.

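<%
' Remove anything that looks like an HTML tag from val and return the rest.
Function stripHTMLTags(val)
  Dim re
  Set re = New RegExp
  re.Pattern = "<[\w/]+[^<>]*>"  ' an opening or closing tag with its attributes
  re.Global = True               ' replace every match, not just the first
  stripHTMLTags = re.Replace(val, "")
End Function
%>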
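One caveat with stripping tags alone is that the contents of script and style blocks stay in the text and can still produce false matches. The sketch below is one possible extension, not the poster's code: visibleText and fileMatches are hypothetical names, and the patterns are a rough approximation of what a browser shows rather than a real HTML parser.

```vbscript
<%
' Hypothetical helper: reduce HTML source to roughly the text a browser shows.
' ByVal so the caller's variable is not modified.
Function visibleText(ByVal html)
  Dim re
  Set re = New RegExp
  re.Global = True
  re.IgnoreCase = True
  ' Drop script and style blocks together with their contents.
  re.Pattern = "<(script|style)\b[^>]*>[\s\S]*?</\1\s*>"
  html = re.Replace(html, " ")
  ' Drop comments, then any remaining tags.
  re.Pattern = "<!--[\s\S]*?-->"
  html = re.Replace(html, " ")
  re.Pattern = "<[^>]+>"
  html = re.Replace(html, " ")
  ' Collapse the whitespace left behind by the removals.
  re.Pattern = "\s+"
  visibleText = Trim(re.Replace(html, " "))
End Function

' Hypothetical helper: True if term occurs in the visible text of the file.
Function fileMatches(path, term)
  Dim fso, ts, source
  Set fso = Server.CreateObject("Scripting.FileSystemObject")
  Set ts = fso.OpenTextFile(path, 1)          ' 1 = ForReading
  source = ""
  If Not ts.AtEndOfStream Then source = ts.ReadAll
  ts.Close
  ' Case-insensitive search of the stripped text.
  fileMatches = (InStr(1, visibleText(source), term, vbTextCompare) > 0)
End Function
%>
```

Replacing each tag with a space instead of an empty string keeps words on either side of a tag from being glued together. HTML entities such as &amp;amp; are not decoded here, so a search term containing such characters would need extra handling.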

ASKER

My original question was whether it is possible to open an InternetExplorer.Application object from within ASP and interact with it, for example by reading IE.document.Body.innerText as a way to get the text.

I appreciate the responses, but they have all recommended alternate approaches, not the method I was asking about.

Since I couldn't resolve my original design problems, I have redesigned with a different approach, so this question is no longer important to me.

Thanks for the suggestions, but I would now like to close/withdraw the question.
ASKER CERTIFIED SOLUTION
davecestria

This solution is only available to members.
SOLUTION
This solution is only available to members.
If you want to keep trying to use IE, the first thing I would check is that IUSR_MACHINE has permissions on C:\Documents and Settings\IUSR_MACHINE.
For testing, maybe you could add IUSR_MACHINE to the admin group, or try running your code as a .vbs script.
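A standalone test along those lines could look like the sketch below, run with cscript.exe from a normal logged-in account. It only shows whether the IE automation itself works outside IIS; the file path and the 30-second limit are placeholders chosen for illustration.

```vbscript
' Save as a .vbs file and run with cscript.exe (not inside ASP).
Option Explicit
Dim ie, waited

Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = False
ie.Navigate "C:\test\page.htm"    ' hypothetical path; substitute a real file

' Poll until the document has loaded (ReadyState 4 = complete),
' giving up after roughly 30 seconds so the script cannot hang forever.
waited = 0
Do While (ie.Busy Or ie.ReadyState <> 4) And waited < 300
  WScript.Sleep 100
  waited = waited + 1
Loop

If ie.ReadyState = 4 Then
  WScript.Echo ie.Document.body.innerText
Else
  WScript.Echo "Timed out waiting for the page to load."
End If

ie.Quit
Set ie = Nothing
```

If this prints the page text from the command line but the same CreateObject call hangs under IIS, that would point toward the account or session the ASP page runs in rather than the code.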
I think we posted usable solutions... split is fine for me.