Nirvana (India) asked:
Screen Print

Hi

An https website has multiple links; I have to click on each link and, once it opens, take a screen print and save it as a TIFF in a specific folder. Is there a way to automatically download all the files from the webpage? I am trying to automate this with wget, but I am not able to figure out the complete path of the website's sub-menus.

Tnx
UK
Nirvana (ASKER):
Thank you
dbrunton:
Try WinHTTrack: https://www.httrack.com/

It's a website copier that might do what you want. Might. It's not perfect.
Nirvana (ASKER):
I have tried HTTrack and wget as well; they will not solve my problem. I am not able to get the complete path of the URLs from the sub-pages. For example, say there are 30 links on yahoo.com and I have to click each link and take a screen print; however, I am not able to see the path of the URL. I can see it only as yahoo.com, but not the path.
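
One reason tools show only yahoo.com and not the path is that links inside sub-pages are usually relative hrefs, which the browser silently resolves against the base URL. A minimal .NET sketch of that resolution (the addresses below are made-up placeholders, not real pages):

' Hedged sketch: resolve a relative href against a base URL.
Dim baseUri As New Uri("https://www.yahoo.com/")
Dim fullUri As New Uri(baseUri, "news/world/index.html")
' Prints https://www.yahoo.com/news/world/index.html
Console.WriteLine(fullUri.AbsoluteUri)
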
Nirvana (ASKER):
What I am looking for is this: I work for a company where a lot of customers send their invoices as PDFs by scanning them. Once an invoice is scanned it is saved in a database and a lunk will be created for that particular invoice. I need to click on each link, and once the invoice opens I will take a screen print for audit purposes.
Would it help you to use Greenshot to capture the page? Greenshot will capture the entire page or a region, watermark it, save it, and also print it; it includes a screen editor. Greenshot only works with Internet Explorer, however.
http://getgreenshot.org/
Greenshot - Screen Capture how-to:
https://www.youtube.com/watch?v=VTtQPx8F9O8
I'm confused by what you are asking. At first it sounds like you just want to capture screen prints. But in your follow-up, it sounds like you are getting PDFs sent to you and now need to save them in some form.

Are the PDF invoices coming to you from vendors to be paid? Or are they invoices you sent out, which your customers saved and sent in for one reason or another?

Then what is saved to a database? The PDF, or are you trying to extract text?

"a lunk will be created"  What is a lunk?

"take a screen print for audit purpose."  What are you doing with the screen shot? Saving the image, or printing it out?

If you can detail your workflow even more, we can help you.  I think what you need and what you think you need are two different things.
Nirvana (ASKER):
Hi Scott,

Sorry for not detailing it out properly. Here are the detailed steps.

1. I open a website (epserver.com)
2. Log in with user credentials
3. Browse to the respective vendor page (selecting from a drop-down)
4. The page contains links; clicking one opens an invoice
5. Take the screen print
6. Save it to the local drive

I am trying to automate steps 4, 5 and 6. Greenshot would do steps 5 and 6; however, clicking on each link in the page is a challenge. Is there a way that I can click on each link automatically (see the sketch below)?

Solutions that I have already tried:

1. wget: with wget I am unable to browse to the exact page (the vendor page)
2. Greenshot: yes, it is useful for steps 5 and 6

Hope the question is clear.

By the way, 'lunk' is a typo; it should be 'link'.
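
For step 4, one possible approach (a sketch under assumptions, not a tested solution) is to drive a hidden WinForms WebBrowser control: log in inside the control so the session cookies are available, then enumerate the page's anchors and visit each one. The control name wb and the capture step are illustrative:

' Hedged sketch: collect every link target first, then visit each one.
' Assumes a WebBrowser control named wb whose document has finished loading.
Dim hrefs As New List(Of String)
For Each el As HtmlElement In wb.Document.GetElementsByTagName("a")
    hrefs.Add(el.GetAttribute("href"))   ' absolute URL of the anchor
Next
For Each href In hrefs
    wb.Navigate(href)
    ' ...wait for DocumentCompleted here, then capture and save the page...
Next
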
Do you control the server?
Nirvana (ASKER):
No, I don't have control over the server.
Where does the database come in? What are you saving to the database: a link, an image, text? And what are you doing with the screen print: printing it, saving it?
Nirvana (ASKER):
The 'database' I was referring to is the shared drive where the screen prints are saved; I shouldn't have called it a database.

The link opens an image; I am not sure what format it is (JPEG, PNG, TIFF).

The screen print is saved on a shared drive.
You can use Adobe Acrobat Pro to manage this workflow: https://acrobat.adobe.com/us/en/products/acrobat-pro.html

If you are looking to script something on your own, I have ImageMagick on my server (http://www.imagemagick.org/script/index.php) and have used it to create automated workflows. You essentially feed it command lines via your script. There are other similar software products that can do the same. Joe Winograd has some articles on this topic: https://www.experts-exchange.com/articles/13696/Batch-Conversion-of-PDF-and-TIFF-files-via-Command-Line-Interface.html

If you want to script something from scratch, you should make a start on your own, and Experts here can help you troubleshoot any issues you have.
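
As a hedged illustration of "feeding it command lines via your script": a .NET program can launch ImageMagick through Process.Start. The file names below are placeholders, and the executable name is an assumption (classic installs ship "convert"; ImageMagick 7 uses "magick"):

Imports System.Diagnostics

Module ConvertSketch
    Sub Main()
        ' Sketch: convert a scanned PDF invoice to TIFF via ImageMagick.
        Dim psi As New ProcessStartInfo With {
            .FileName = "convert",
            .Arguments = "invoice.pdf invoice.tiff",
            .UseShellExecute = False
        }
        Using p As Process = Process.Start(psi)
            p.WaitForExit()   ' block until the conversion finishes
        End Using
    End Sub
End Module
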
Nirvana (ASKER):
Thanks a lot, Scott, I will try these today at work. I will try to provide you some screenshots of what I am trying to achieve, so that we are on the same page.
Nirvana (ASKER):
Anybody else who has a solution? I thought it would be a damn easy one for all the geniuses out here.
Nirvana (ASKER):
OK, let's break this down.

From spreadsheet cell A1, I copy the value and search for it in the webpage; when I find the contents of cell A1, I click it, and then loop the same way for cell A2.

Can this be automated?
Assuming you have the HTML code of the main page, here is a function to grab all links ("<a>...</a>" tags) from the HTML:
Imports System.Net
Imports System.Text.RegularExpressions

    ' Holds the caption and href of a single <a> tag.
    Private Class LinkItem
        Public Text As String
        Public url As String
    End Class

    ' Extracts all links from the <body> of the supplied HTML.
    Private Function getLinks(html As String) As List(Of LinkItem)
        Dim lst As New List(Of LinkItem)
        ' Limit the search to the document body.
        Dim m = Regex.Match(html, "<body[^>]*>(.*?)</body>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then
            ' Capture the href (url) and the caption (name) of each anchor.
            Dim matches As MatchCollection = Regex.Matches(m.Value, "<a.*?href=[""'](?<url>.*?)[""'].*?>(?<name>.*?)</a>", RegexOptions.IgnoreCase)
            For Each m1 As Match In matches
                lst.Add(New LinkItem With {.Text = m1.Groups("name").Value, .url = m1.Groups("url").Value})
            Next
        End If
        Return lst
    End Function


Next, using a hidden WebBrowser control, you can make a screenshot of each href's URL. Full example:
Imports System.Net
Imports System.Text.RegularExpressions

Public Class Form1
    ' Holds the caption and href of a single <a> tag.
    Private Class LinkItem
        Public Text As String
        Public url As String
    End Class

    Private folderPath As String = "e:\temp", fileName As String = "", _loaded As Boolean
    ' Hidden browser used to render each page before capturing it.
    Private WithEvents wb As New WebBrowser With {.ScriptErrorsSuppressed = True}

    Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
        Dim baseUrl = "http://www.vbtutor.net/vb2010/index.html"
        Using client As New WebClient
            If client.Proxy IsNot Nothing Then
                client.Proxy.Credentials = CredentialCache.DefaultCredentials
            End If
            ' Download the main page and collect its links.
            Dim html = client.DownloadString(baseUrl)
            Dim links = getLinks(html)
            Dim count As Integer
            For Each link In links
                ' Only follow links whose caption contains "Managing".
                If link.Text.Contains("Managing") Then
                    count += 1
                    Label1.Text = "Loading " & link.Text & "..."
                    ' Strip characters that are illegal in file names.
                    fileName = String.Join("_", link.Text.Split(IO.Path.GetInvalidFileNameChars())) & ".tiff"
                    _loaded = False
                    ' Resolve relative hrefs against the base URL.
                    Dim url = link.url
                    If Not url.StartsWith("http") Then
                        url = baseUrl.Substring(0, baseUrl.LastIndexOf("/") + 1) & link.url
                    End If
                    wb.Navigate(url)
                    ' Wait until DocumentCompleted has saved the screenshot.
                    Do While Not _loaded
                        My.Application.DoEvents()
                        Threading.Thread.Sleep(200)
                    Loop
                End If
            Next
        End Using
        MsgBox("Done")
    End Sub

    ' Extracts all links from the <body> of the supplied HTML.
    Private Function getLinks(html As String) As List(Of LinkItem)
        Dim lst As New List(Of LinkItem)
        Dim m = Regex.Match(html, "<body[^>]*>(.*?)</body>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then
            Dim matches As MatchCollection = Regex.Matches(m.Value, "<a.*?href=[""'](?<url>.*?)[""'].*?>(?<name>.*?)</a>", RegexOptions.IgnoreCase)
            For Each m1 As Match In matches
                lst.Add(New LinkItem With {.Text = m1.Groups("name").Value, .url = m1.Groups("url").Value})
            Next
        End If
        Return lst
    End Function

    ' Fires when the hidden browser finishes loading: sizes the control to the
    ' full page height, renders it to a bitmap and saves it as TIFF.
    Private Sub wb_DocumentCompleted(sender As Object, e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles wb.DocumentCompleted
        wb.ClientSize = New Size(1024, 768)
        Dim Height As Integer = wb.Document.Body.ScrollRectangle.Bottom
        If Height = 0 Then Height = 768
        wb.ClientSize = New Size(1024, Height)
        Using Bmp = New Bitmap(wb.Bounds.Width, Height)
            wb.DrawToBitmap(Bmp, wb.Bounds)
            Bmp.Save(IO.Path.Combine(folderPath, fileName), Drawing.Imaging.ImageFormat.Tiff)
        End Using
        _loaded = True
    End Sub
End Class


Note that for an https secure connection you can get a security alert. To avoid it, call
ServicePointManager.ServerCertificateValidationCallback = (Function(sender, certificate, chain, sslPolicyErrors) True)


before requesting the main page.
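
For instance, in the example above the callback could sit at the top of Button1_Click, before the WebClient request. A sketch (note it trusts every certificate, so only use it against servers you trust):

    Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
        ' Disable certificate validation for self-signed https sites.
        ServicePointManager.ServerCertificateValidationCallback =
            Function(s, cert, chain, sslErr) True
        ' ... the WebClient download and link loop shown above ...
    End Sub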
Nirvana (ASKER):
Thanks a lot, Ark, I will try this. Thanks a lot again.
Nirvana (ASKER):
Hi Ark, thank you, and I'm extremely sorry for the late reply. Considering I am very new to VBA, could you let me know how I run the code you provided?
ASKER CERTIFIED SOLUTION
Ark (Russian Federation)
Nirvana (ASKER):
Thank you very much