thenone

asked on

IE Object

How would you use the Internet Explorer object to take an HTML page that is held in a string, save the output as plain text in another string, and wrap this in a function?
Brian Mulder

Hello thenone,

you could try to use the xmlhttp object for this

something like
--------------
Sub h()
  GetAndSave "https://www.experts-exchange.com"
End Sub

Public Function GetAndSave(sInput As String) As Boolean
Dim objXML As Object
Dim fso As Object, fs As Object
Dim strMOIOutput As String

' Fetch the page synchronously (third argument False = not asynchronous)
Set objXML = CreateObject("Microsoft.XMLHTTP")

objXML.Open "GET", sInput, False
objXML.send ""
strMOIOutput = objXML.responseText

' CreateTextFile(filename, overwrite, unicode)
Set fso = CreateObject("Scripting.FileSystemObject")
Set fs = fso.CreateTextFile("d:\ResultPage.html", True, True)
fs.write strMOIOutput
fs.Close

Set fs = Nothing
Set fso = Nothing
Set objXML = Nothing

' Report success to the caller
GetAndSave = True
End Function
--------------

sub h is only a test to see if it works

hope this helps a bit
bruintje
Avatar of thenone
thenone

ASKER

The thing is, I don't need to get the page. I already have the page in a string.
Then I miss the point: you already have the page in a string and now need to copy it to another string. Why do you need IE for that?
Avatar of thenone

ASKER

I need IE to strip the HTML and just put the result in another string as output. Or another good way of doing this.
Not sure if you can load a string into the IE object directly. I saw Azrasound do that once in a VB thread, but I'm in a bit of a hurry now, so I'll check back later if no one else has commented.
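
One way to load a string directly, without starting a full Internet Explorer window, may be the late-bound "htmlfile" document object provided by IE's MSHTML engine. The sketch below is not from this thread: the function name HtmlToText is made up for illustration, and it assumes a Windows machine where MSHTML is registered. It assigns the markup to body.innerHTML and reads the text back from body.innerText.

```vb
Option Explicit

' Hypothetical helper (not from the thread): converts an HTML string to plain
' text using the MSHTML "htmlfile" document instead of a full IE instance.
Public Function HtmlToText(ByVal strHTML As String) As String
    Dim objDoc As Object

    On Error GoTo Err_HtmlToText

    Set objDoc = CreateObject("htmlfile")
    objDoc.body.innerHTML = strHTML       ' let MSHTML parse the markup
    HtmlToText = objDoc.body.innerText    ' text content without the tags

Exit_HtmlToText:
    Set objDoc = Nothing
    Exit Function

Err_HtmlToText:
    HtmlToText = ""                       ' return an empty string on failure
    Resume Exit_HtmlToText
End Function
```

Usage would be something like MsgBox HtmlToText(strHTML). Whether this removes every script fragment in your pages is something to check against your own data.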
Avatar of thenone

ASKER

ok thanks
Avatar of thenone

ASKER

I do have this function. I tried modifying it because it causes my program to not run smoothly: the hourglass on my cursor keeps coming on.

Public Function strStrip_HTML_Tags(ByVal strText As String) As String

  Dim objInternetExplorer_Application                   As Object
  Dim strReturn                                         As String
 
  On Error GoTo Err_strStrip_HTML_Tags
 
  Set objInternetExplorer_Application = CreateObject("InternetExplorer.Application")
 
  If Not (objInternetExplorer_Application Is Nothing) Then
     objInternetExplorer_Application.Navigate "about:blank"
     
     objInternetExplorer_Application.Application.Document.Open
     objInternetExplorer_Application.Application.Document.Write (strText)
     objInternetExplorer_Application.Application.Document.Close
     
     DoEvents
     
     strReturn = objInternetExplorer_Application.Document.body.InnerText
     
     objInternetExplorer_Application.Quit
  End If
 
Exit_strStrip_HTML_Tags:

  On Error Resume Next
 
  Set objInternetExplorer_Application = Nothing
 
  strStrip_HTML_Tags = strReturn
 
  Exit Function
 
Err_strStrip_HTML_Tags:

  On Error Resume Next
 
  strReturn = ""
 
  Resume Exit_strStrip_HTML_Tags
 
End Function

Avatar of thenone

ASKER

Maybe a modification of the above would work.
ASKER CERTIFIED SOLUTION
Brian Mulder
This solution is only available to members.
Avatar of [ fanpages ]
I think I was the original author of "strStrip_HTML_Tags()" [it certainly looks like something I would have written in a PAQ].  If I recall correctly, that question asked how to retrieve the contents of a referenced URL, unlike your case, where the page contents already exist in a string variable.

I would suggest looking at Regular Expressions, as the use of the "InternetExplorer.Application" object is overkill here.

BFN,

fp.

For example:

"how to remove html from excel spreadsheet"
[ https://www.experts-exchange.com/questions/21359240/how-to-remove-html-from-excel-spreadsheet.html ]

Using this function:
Function regExpReplace(strSource, strSearchPattern As String, strReplacePattern As String, Optional IgnoreCase As Boolean = True)
    Dim regEx As Object
    Set regEx = CreateObject("vbscript.regexp")
   
    regEx.Pattern = strSearchPattern
    regEx.IgnoreCase = IgnoreCase
    regEx.Global = True
    regExpReplace = regEx.Replace(strSource, strReplacePattern)
End Function


Usage would be:

Dim strHTML As String

strHTML = "<a href=http://NigelLee.info>Click to view my page</a>"

MsgBox regExpReplace(strHTML, "<[^>]*>", "")


BFN,

fp.
Avatar of thenone

ASKER

Fanpages, great that you are here. I believe you were the one who helped me with this in the past. I want to be able to use IE because it does remove every single kind of script from the HTML. I've experimented with a lot of things that I thought would work, but they don't. I just want to really make sure that all tags are gone! Otherwise it will really mess up my program. I've started over and over again and I'm really frustrated.
Avatar of thenone

ASKER

For example, if I have JavaScript, Perl, etc. inside the page, it looks like Internet Explorer does turn everything into regular text.
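
If the main worry with the regular expression route is script code left behind, one option is to delete whole script and style blocks (tags plus contents) and HTML comments before removing the remaining tags. The following is only a sketch built on the same "VBScript.RegExp" object used above. The function name StripHtmlRegex is hypothetical, and regular expressions will not cope with every malformed page, so treat it as a starting point rather than a guarantee.

```vb
Option Explicit

' Hypothetical helper (not from the thread): strips script/style blocks,
' comments and then all remaining tags from an HTML string.
Public Function StripHtmlRegex(ByVal strHTML As String) As String
    Dim regEx As Object

    Set regEx = CreateObject("VBScript.RegExp")
    regEx.Global = True        ' replace every match, not just the first
    regEx.IgnoreCase = True    ' match <SCRIPT> as well as <script>

    ' 1. Remove <script>...</script> and <style>...</style> with their contents
    regEx.Pattern = "<(script|style)\b[^>]*>[\s\S]*?</\1\s*>"
    strHTML = regEx.Replace(strHTML, "")

    ' 2. Remove HTML comments
    regEx.Pattern = "<!--[\s\S]*?-->"
    strHTML = regEx.Replace(strHTML, "")

    ' 3. Remove whatever tags are left
    regEx.Pattern = "<[^>]*>"
    StripHtmlRegex = regEx.Replace(strHTML, "")

    Set regEx = Nothing
End Function
```

Usage would be MsgBox StripHtmlRegex(strHTML). Note that entities such as &amp;nbsp; are left as they are, which is one thing the IE innerText approach handles for you.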
Hi,

If you make your application available for download (or e-mail to me & I'll post on my web site for everyone else to review), then we can see if the hourglass issue is only evident in your environment or not.

BFN,

fp.
Avatar of thenone

ASKER

No offense, but I would rather not do that. I think my problem is the looping in what you wrote. I believe I remember that in what you wrote earlier, a global variable outside of the function made it work faster. Your comment, I believe, was "I'm no expert but I'm getting there." So maybe we could modify it so there is no looping at all: just put the string straight into IE and return the output text as a string, with the global IE object created outside of the function so it's not being loaded over and over again. I looked at this function for about a month, but I'm really stubborn, and I finally came to the realization that this is what will best suit my needs. I've tried stripping with regular functions and it didn't strip everything, because there are zillions of different ways of writing code. If it is at all possible to help me with this suggestion, it would be greatly appreciated. By the way, fanpages, it seems like the more questions I ask, the more you will be better than an expert in here.
Avatar of thenone

ASKER

bruintje, I hope you didn't take offense. I did look at your suggestion and tried it. When I tried calling it with IE, the compiler returned an error that the reference to IE is invalid for this call: strStrip_HTML_Tags(strHTML, myIE)
Avatar of thenone

ASKER

Fanpages or bruintje, how would I write it without the loop and set up the global properly?
OK, I'm not sure what your setup is, but you call a function to strip the HTML. I'm not sure where the hourglass comes from then, except for the creation and deletion of the IE object, which can instead be set up in a global variable.

in the top of your module put something like

Public m_myIE as Object

Then in your initializing code call the creation of the object

Set m_myIE = CreateObject("InternetExplorer.Application")

put this line in the closure of your application

Set m_myIE = Nothing

Now, wherever you are in your app, you can call the m_myIE object to perform its tricks, as long as you do not set it to Nothing while you still need it.

call it by the lines

DoEvents
strStrip_HTML_Tags(strHTML)

Public Function strStrip_HTML_Tags(ByVal strText As String) As String

  Dim strReturn As String
 
  On Error GoTo Err_strStrip_HTML_Tags
 
  If Not (m_myIE Is Nothing) Then
     m_myIE.Navigate "about:blank"
     m_myIE.Application.Document.Open
     m_myIE.Application.Document.write (strText)
     m_myIE.Application.Document.Close
     strReturn = m_myIE.Document.body.innerText
  End If
 
Exit_strStrip_HTML_Tags:

  On Error Resume Next
  strStrip_HTML_Tags = strReturn
  Exit Function

Err_strStrip_HTML_Tags:

  On Error Resume Next
  strReturn = ""
  Resume Exit_strStrip_HTML_Tags
End Function
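
One possible cause of empty output from a function like this is that Navigate returns before "about:blank" has finished loading, so the document is not ready when it is written to. A minimal sketch of a wait helper follows, assuming the object is an "InternetExplorer.Application" instance. The name WaitForIE and the timeout parameter are made up for illustration; 4 is the value of READYSTATE_COMPLETE. It could be called right after the Navigate line.

```vb
Option Explicit

' Hypothetical helper (not from the thread): waits until the IE object has
' finished navigating, or until the timeout expires.
Public Sub WaitForIE(ByVal objIE As Object, _
                     Optional ByVal sngTimeoutSeconds As Single = 10)
    Dim sngStart As Single
    Dim sngElapsed As Single

    sngStart = Timer                      ' seconds since midnight

    Do While objIE.Busy Or objIE.ReadyState <> 4   ' 4 = READYSTATE_COMPLETE
        DoEvents                          ' keep the form responsive

        sngElapsed = Timer - sngStart
        If sngElapsed < 0 Then sngElapsed = sngElapsed + 86400  ' passed midnight
        If sngElapsed > sngTimeoutSeconds Then Exit Do
    Loop
End Sub
```

For example: m_myIE.Navigate "about:blank" followed by WaitForIE m_myIE. This does add a loop, but it only spins while the blank page loads, which should be very short.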
Avatar of thenone

ASKER

I call the strip_Html in another function for example
Private sub_Onclick()
text3 = strip_Html(strhtml)
bla bla bla

then besides the inclusion of the declaration above in the module

it would become something like

Private sub_Onclick()
DoEvents
text3 = strStrip_HTML_Tags(strHTML)
Avatar of thenone

ASKER

so
Private sub_Onclick()

Set m_myIE = CreateObject("InternetExplorer.Application")
Set m_myIE = Nothing

DoEvents
text3 = strStrip_HTML_Tags(strHTML)


did the following

had a form and a button in vb

pasted this code in the form module

---------------------
Option Explicit

Private Sub Form_Load()
  Set m_myIE = CreateObject("InternetExplorer.Application")
End Sub

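Private Sub Command1_Click()
  Dim strHTML As String
  strHTML = "<b>test</b>"   ' sample input only; replace with your own HTML string
  DoEvents
  MsgBox strStrip_HTML_Tags(strHTML)   ' show the stripped text
End Sub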

Private Sub Form_Unload(Cancel As Integer)
  Set m_myIE = Nothing
End Sub
---------------------

inserted a module and pasted this code

---------------------
Option Explicit

Public m_myIE As Object

Public Function strStrip_HTML_Tags(ByVal strText As String) As String

  Dim strReturn As String
 
  On Error GoTo Err_strStrip_HTML_Tags
 
  If Not (m_myIE Is Nothing) Then
     m_myIE.Navigate "about:blank"
     m_myIE.Application.Document.Open
     m_myIE.Application.Document.write (strText)
     m_myIE.Application.Document.Close
     strReturn = m_myIE.Document.body.innerText
  End If
 
Exit_strStrip_HTML_Tags:

  On Error Resume Next
  strStrip_HTML_Tags = strReturn
  Exit Function

Err_strStrip_HTML_Tags:

  On Error Resume Next
  strReturn = ""
  Resume Exit_strStrip_HTML_Tags
End Function

---------------------


Avatar of thenone

ASKER

oh ok on form load is when you set the object!!
Yes, because it will be set only once and deleted only once, on program init and terminate.
This way you do not have to create and clean up the object while doing the HTML strip, but whether that takes away the hourglass I do not know.
Avatar of thenone

ASKER

I will test it out and let you know. It probably should, since it only sets it once and not over and over again.
Avatar of thenone

ASKER

ok im getting no output when I set it outside on form load.
Avatar of thenone

ASKER

It works great, thanks for your help. I was missing the Option Explicit.
glad it works now, thanks for the grade :)

if there are still problems with this just comment
Avatar of thenone

ASKER

One last question: what does DoEvents do?
The DoEvents() function yields execution so that the operating system can process other events executing concurrently.
DoEvents yields operation to the operating system so that it can process other events

So if you are processing long calculations or something like a download, the processor timeslices are shared between the rest of the running program and the long operation.
This way the user has the impression that the program keeps going while the long operation runs in the background, while in reality they each do a bit of their work in a tit-for-tat sharing of processor time.
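
To make that concrete, here is a small sketch of a long loop that calls DoEvents every so often. The function name SumWithYield and the interval of 1000 iterations are arbitrary choices for illustration, not anything from your program.

```vb
Option Explicit

' Hypothetical example: sums 1..lngCount while yielding periodically so the
' form can repaint and respond to clicks during a long loop.
Public Function SumWithYield(ByVal lngCount As Long) As Double
    Dim i As Long
    Dim dblTotal As Double

    For i = 1 To lngCount
        dblTotal = dblTotal + i           ' stand-in for one unit of real work
        If i Mod 1000 = 0 Then DoEvents   ' yield every 1000 iterations
    Next i

    SumWithYield = dblTotal
End Function
```

One thing to watch: while DoEvents is running, the user can click the same button again and re-enter your code, so you may want to disable the button until the loop has finished.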
Avatar of thenone

ASKER

Oh, so in other words it won't give all of the processor to the current function.
Yes, that's correct.

Concurrently executing applications will be offered their respective timeslice of the multi-tasking environment.

DoEvents (or "Yield") calls were more useful during Windows 3.1 (or earlier) when non-pre-emptive multitasking was used.  This approach allocated the machine's CPU to a process (or application) until that process yielded, or completed.  It was common for the computer to "freeze" or "crash" because a single process had failed, and hence was not providing the relevant system call to yield so that other applications could be serviced.

However, the current Windows approach to use pre-emptive multitasking enables the operating system to switch between processes (applications/programs) at a pre-defined interval time to prevent any single process from taking complete control of the processor.  If a process were to fail, then the remaining processes could continue unaffected.

BFN,

fp.