Nirvana
asked on
Screen Print
Hi
An HTTPS website has multiple links; I have to click on a link and, once it opens, take a screen print and save it as a TIFF in a specific folder. Is there a way I can automatically download all the files from the webpage? I am trying to automate this with wget, but I am not able to figure out the complete path of the website's submenus.
Tnx
UK
Try Win HTTrack https://www.httrack.com/
Website copier that might do what you want. Might. It's not perfect.
ASKER
I have tried HTTrack and wget as well; they will not solve my problem. I am not able to get the complete path of the URL from the sub-pages. For example, say there are 30 links on yahoo.com and I have to click each link and take a screen print; however, I am not able to see the path of the URL. I can see only yahoo.com, but not the full path.
ASKER
What I am looking for is this: I work for a company where a lot of customers send their invoices as scanned PDFs. Once an invoice is scanned, it is saved in a database and a link is created for it. I need to click on each link, and once the invoice opens I take a screen print for audit purposes.
Would it help you to use Greenshot to capture the page? Greenshot will capture the entire page
or a region, watermark it, save it, and also print it; it includes a screen editor.
Greenshot only works with Internet Explorer, however.
http://getgreenshot.org/
Greenshot - Screen Capture how to
https://www.youtube.com/watch?v=VTtQPx8F9O8
I'm confused by what you are asking. At first it sounds like you just want to capture screen prints, but in your follow-up it sounds like you are receiving PDFs and now need to save them in some form.
Are the PDF invoices coming to you from vendors to be paid? Or are they invoices you sent out that your customers saved and sent in for one reason or another?
Then what is saved to a database? The PDF, or are you trying to extract text?
"a lunk will be created": what is a lunk?
"take a screen print for audit purpose": what are you doing with the screenshot? Saving the image, or printing it out?
If you can detail your workflow even more, we can help you. I think what you need and what you think you need are two different things.
ASKER
Hi Scott,
Sorry for not detailing it properly. Here are the detailed steps:
1. I open a website (epserver.com)
2. Log in with user credentials
3. Browse to the respective page by vendor (selecting from a drop-down)
4. The page has links; clicking a link opens an invoice
5. Take the screen print
6. Save it to the local drive
I am trying to automate steps 4, 5 and 6. Greenshot would do steps 5 and 6; however, clicking on each link on the page is a challenge. Is there a way that I can click on each link automatically?
Solutions that I have already tried:
1. wget: with wget I am unable to browse to the exact page (the vendor page)
2. Greenshot: yes, it is useful for steps 5 and 6
Hope the question is clear.
By the way, 'lunk' is a typo; it should be 'link'.
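Step 4 above (finding every link on the page so each can be visited in turn) can be sketched in Python. This is a sketch only, not part of the thread's answers: in the real workflow the HTML would come from an authenticated session after steps 1-3, and the markup shown is hypothetical.

```python
# Sketch: collect the text and href of every <a> tag in a page's HTML,
# so each invoice link can be opened automatically (step 4).
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects (text, href) pairs for every anchor tag."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

def get_links(html):
    """Return a list of (link text, href) pairs found in html."""
    p = LinkCollector()
    p.feed(html)
    return p.links
```

For example, `get_links('<a href="inv1.pdf">Invoice 1</a>')` returns `[("Invoice 1", "inv1.pdf")]`; each href could then be opened and captured in turn.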
Do you control the server?
ASKER
No, I don't have control over the server.
Where does the database come in? What are you saving to the database: a link, an image, or text? And what are you doing with the screen print: printing or saving it?
ASKER
What I was referring to as a database is the screen prints saved to a shared drive; I shouldn't have called it a database.
The link opens an image; I am not sure what format it is (JPEG, PNG, TIFF).
The screen print is saved to a shared drive.
You can use Adobe Acrobat Pro to manage this workflow: https://acrobat.adobe.com/us/en/products/acrobat-pro.html
If you are looking to script something on your own, I have ImageMagick on my server (http://www.imagemagick.org/script/index.php) and have used it to create automated workflows. You essentially feed it command lines via your script. There are other similar software products that can do the same. Joe Winograd has some articles on this topic: https://www.experts-exchange.com/articles/13696/Batch-Conversion-of-PDF-and-TIFF-files-via-Command-Line-Interface.html
If you want to script something from scratch, you should make a start on your own, and Experts here can help you troubleshoot any issues you have.
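The "feed it command lines via your script" idea above can be sketched in Python. Assumptions, not part of the thread: the ImageMagick 7 `magick` CLI is on the PATH (older installs use `convert` instead), and the file paths are hypothetical.

```python
# Sketch: build one ImageMagick command line per PDF, converting each
# to a same-named TIFF in an output folder.
from pathlib import Path

def build_convert_commands(pdf_paths, out_dir):
    """Return a list of 'magick' command lines, one per input PDF."""
    cmds = []
    for pdf in pdf_paths:
        pdf = Path(pdf)
        out = Path(out_dir) / (pdf.stem + ".tiff")
        # -density raises the rasterization resolution before TIFF output
        cmds.append(["magick", "-density", "150", str(pdf), str(out)])
    return cmds
```

Each command can then be executed with `subprocess.run(cmd, check=True)`; building the commands separately makes them easy to log or dry-run first.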
ASKER
Thanks a lot, Scott. I will try these today at work, and I will try to provide you some screenshots of what I am trying to achieve so that we are on the same page.
ASKER
Anybody else who has a solution? I thought it should be a damn easy one for all the geniuses out here.
ASKER
OK, let's break this down.
From spreadsheet cell A1, I copy the value and search for it on the webpage; when I find the contents of cell A1, I click it, then repeat the loop with cell A2.
Can this be automated?
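The cell-by-cell loop described above can be sketched in Python. Assumptions, not part of the thread: the spreadsheet column has already been read into a list (a real workbook would need a library such as openpyxl), and the page's links are available as (text, href) pairs.

```python
# Sketch: for each spreadsheet cell value (A1, A2, ...), search the page's
# links for a match and record which URL would be "clicked".
def match_cells_to_links(cell_values, links):
    """links is a list of (text, href) pairs.
    Returns {cell_value: matching href, or None if no link matches}."""
    result = {}
    for value in cell_values:
        found = None
        for text, href in links:
            if value in text:   # the "search the webpage" test from the post
                found = href
                break           # click the first match, then move to next cell
        result[value] = found
    return result
```

For example, with links `[("Invoice INV-001", "/inv/1")]`, the cell value `"INV-001"` maps to `/inv/1`; an unmatched cell maps to `None` so missing invoices are easy to spot.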
Assuming you have the HTML code of the main page, here is a function to grab all links ("<a>...</a>" tags) from the HTML:
Imports System.Net
Imports System.Text.RegularExpressions

Private Class LinkItem
    Public Text As String
    Public url As String
End Class

Private Function getLinks(html As String) As List(Of LinkItem)
    Dim lst As New List(Of LinkItem)
    ' Limit the search to the <body> of the page
    Dim m = Regex.Match(html, "<body[^>]*>(.*?)</body>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
    If m.Success Then
        Dim matches As MatchCollection = Regex.Matches(m.Value, "<a.*?href=[""'](?<url>.*?)[""'].*?>(?<name>.*?)</a>", RegexOptions.IgnoreCase)
        For Each m1 As Match In matches
            lst.Add(New LinkItem With {.Text = m1.Groups("name").Value, .url = m1.Groups("url").Value})
        Next
    End If
    Return lst
End Function
Next, using a hidden WebBrowser control, you can take a screenshot of each link's URL. Full example:

Imports System.Net
Imports System.Text.RegularExpressions

Public Class Form1
    Private Class LinkItem
        Public Text As String
        Public url As String
    End Class

    Private folderPath As String = "e:\temp", fileName As String = "", _loaded As Boolean
    Private WithEvents wb As New WebBrowser With {.ScriptErrorsSuppressed = True}

    Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
        Dim baseUrl = "http://www.vbtutor.net/vb2010/index.html"
        Using client As New WebClient
            If client.Proxy IsNot Nothing Then
                client.Proxy.Credentials = CredentialCache.DefaultCredentials
            End If
            Dim html = client.DownloadString(baseUrl)
            Dim links = getLinks(html)
            Dim count As Integer
            For Each link In links
                If link.Text.Contains("Managing") Then
                    count += 1
                    Label1.Text = "Loading " & link.Text & "..."
                    fileName = link.Text & ".tiff"
                    _loaded = False
                    ' Resolve relative URLs against the base page
                    Dim url = link.url
                    If Not url.StartsWith("http") Then
                        url = baseUrl.Substring(0, baseUrl.LastIndexOf("/") + 1) & link.url
                    End If
                    wb.Navigate(url)
                    ' Wait until DocumentCompleted has saved the screenshot
                    Do While Not _loaded
                        My.Application.DoEvents()
                        Threading.Thread.Sleep(200)
                    Loop
                End If
            Next
        End Using
        MsgBox("Done")
    End Sub

    Private Function getLinks(html As String) As List(Of LinkItem)
        Dim lst As New List(Of LinkItem)
        Dim m = Regex.Match(html, "<body[^>]*>(.*?)</body>", RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then
            Dim matches As MatchCollection = Regex.Matches(m.Value, "<a.*?href=[""'](?<url>.*?)[""'].*?>(?<name>.*?)</a>", RegexOptions.IgnoreCase)
            For Each m1 As Match In matches
                lst.Add(New LinkItem With {.Text = m1.Groups("name").Value, .url = m1.Groups("url").Value})
            Next
        End If
        Return lst
    End Function

    Private Sub wb_DocumentCompleted(sender As Object, e As System.Windows.Forms.WebBrowserDocumentCompletedEventArgs) Handles wb.DocumentCompleted
        ' Size the control to the full page height, then render it to a bitmap
        wb.ClientSize = New Size(1024, 768)
        Dim Height As Integer = wb.Document.Body.ScrollRectangle.Bottom
        If Height = 0 Then Height = 768
        wb.ClientSize = New Size(1024, Height)
        Using Bmp = New Bitmap(wb.Bounds.Width, Height)
            wb.DrawToBitmap(Bmp, wb.Bounds)
            Bmp.Save(IO.Path.Combine(folderPath, fileName), Drawing.Imaging.ImageFormat.Tiff)
        End Using
        _loaded = True
    End Sub
End Class
Note that with an HTTPS secure connection you can get a security alert. To avoid it, call

ServicePointManager.ServerCertificateValidationCallback = (Function(sender, certificate, chain, sslPolicyErrors) True)

before requesting the main page.
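For comparison only (this is a sketch, not part of Ark's VB.NET answer): the analogous step in a Python script would be an SSL context that skips certificate validation. As with the .NET callback, this is only advisable for a trusted internal server.

```python
# Sketch: an SSL context that skips certificate validation, the Python
# analogue of returning True from ServerCertificateValidationCallback.
import ssl

def unverified_context():
    ctx = ssl.create_default_context()
    ctx.check_hostname = False        # must be disabled before verify_mode
    ctx.verify_mode = ssl.CERT_NONE   # do not verify the certificate chain
    return ctx

# Pass it to e.g. urllib.request.urlopen(url, context=unverified_context())
```

Note the order: `check_hostname` must be set to `False` before `verify_mode`, or Python raises a `ValueError`.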
ASKER
Thanks a lot, Ark, will try this. Thanks a lot again.
ASKER
Hi Ark, thank you, and extremely sorry for the late reply. Considering I am very new to VBA, could you let me know how to run the code you provided?
ASKER CERTIFIED SOLUTION
ASKER
Thank you very much