WaterStreet
asked on
Automated procedure for cutting and pasting sequential web pages.
Hi
I'm doing personal research on a certain web site that lets me cut and paste only one hundred items per page, one page at a time, into Excel or NotePad, and a search can run to maybe 20 to 60 pages. I often need to do this tedious process for various searches.
Is there some way to automate this so that all pages of the search are cut and pasted sequentially into one Excel or NotePad file?
Thanks in advance.
WS
ASKER
Isn't there a Java script or Win 8 macro procedure that can do the heavy lifting on this?
Or, for example, any good macro recorder software?
Thanks
WS
You may already know how to handle this, but consider the size issue: 60 pages of unknown content could be large. Excel has a row limit and NotePad a file-size limit, so why not use Word, or at least something more like a dump file, say .data or .raw?
ASKER
I'm using TextPad as my default text editor instead of NotePad. It does not have the size limit of NotePad. But see what I'm saying about Word, below.
According to MS, in Excel 2013, which I use, the maximum worksheet size is 1048576 rows by 16384 columns.
I had just been ignoring Word as a place to receive the cut and pasting. Tried it. Actually, Word works better for me than the ".txt" editors for the formatting of the copied pages. It's easier to clean up certain things there than in Excel. Thanks.
I use Chrome as my browser. I don't know how to do a .data or .raw dump from Chrome, if you were suggesting doing that from the browser.
WS
I hesitate to use terms like "flat file" or "variable-length file". My guess would be to prefer plain text without confining it to NotePad, since I don't know whether the input is text, pictures, 256-value (single-byte) characters, which language, etc. I'm just refining the question. By "raw" I mean the original single file before it has been massaged. That can then become a .csv, for example, which is not the same as an Excel file: it stays text-only, with variable-length fields, and isn't formatted for Word. "Raw" means unprocessed content of unknown type, as does "dump", but a dump more often means a bit-for-bit, lossless copy, which I presume isn't necessary here, though that depends on the goal. Aramaic? Chinese? I'd doubt it.
ASKER
None of the above comes close to answering my question. For example, there is commercial software that does scripting (or macros), I think some macros can be done in Win 8.1, and there might even be Java code that does this.
I don't really expect a full solution where I simply hit a hotkey, but I do expect to find something that will automate most of the keyboard or mouse operations.
WS
ASKER
Maybe I posted this in the wrong Zones, especially for what I last posted, above. Asking Community Support for help.
I suspected the zones were right except for Web Browsers, and that scripting should be listed first. You might try swapping https://www.experts-exchange.com/Programming/Languages/Java/ in for the browser zone. I suspected Misc would cover off-the-shelf prewritten code or packages, perhaps at a cost, and that others (e.g. Viki) would soon have contributed on that, and that the Java angle would be addressed by those wanting to contribute their own code. I also suspected a macro emulating keystrokes would be a partial but incomplete solution, and that you'd consider programming in VB, C, etc. to be outside the set of acceptable answers.
ASKER CERTIFIED SOLUTION
ASKER
Qlemo,
Looked at and downloaded both AutoIt and AutoHotkey, and read their reviews. My guess is that AutoHotkey should meet my various needs. It's robust and well reviewed, though it's said to have a steep learning curve (it does have a good manual).
Thanks
Try an initial workaround in a DOS (CMD) window:
Convert each page to text.
Number the pages sequentially (page1.txt, page2.txt, ...).
Use the copy command to append them together into one file (pages.txt).
Now there's one file to search, but it's still full of junk.
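The append step can also be scripted. In CMD itself, `copy page1.txt+page2.txt+page3.txt pages.txt` joins files in the order listed; with a wildcard the order may follow the directory listing instead, so page10.txt could land before page2.txt. Here is a minimal Python sketch, assuming the pages have already been saved as page1.txt, page2.txt, ... in one folder (the folder and output names are illustrative), that sorts them by page number before appending.

```python
import re
from pathlib import Path

# Assumed naming scheme from the steps above: page1.txt, page2.txt, ...
PAGE_NAME = re.compile(r"page(\d+)\.txt")


def merge_pages(folder: str, output_name: str = "pages.txt") -> Path:
    """Append every pageN.txt in `folder` into one file, in numeric page order."""
    src = Path(folder)
    numbered = []
    for path in src.glob("page*.txt"):
        match = PAGE_NAME.fullmatch(path.name)
        if match:  # skips pages.txt itself and anything not named pageN.txt
            numbered.append((int(match.group(1)), path))
    numbered.sort()  # numeric sort: page2 comes before page10
    out_path = src / output_name
    with out_path.open("w", encoding="utf-8") as out:
        for _, path in numbered:
            out.write(path.read_text(encoding="utf-8"))
            out.write("\n")  # keep the last line of one page off the next page
    return out_path
```

Sorting on the extracted number rather than the file name is the main difference from a plain wildcard copy; re-running it is safe because the output file does not match the pageN.txt pattern.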
Recognize that a web page is an HTML file, so it can be saved and copied like any other file.
Knowing the page source can help.
For example, the site may already number its source pages sequentially.
There is likely content, such as images, that isn't needed.
Some images may contain text worth searching, which would require an OCR reader, and there may be tables that may or may not be worth searching.
The page may be laid out in left, middle, and right columns, may offer a 'text version', and may carry numerous links and ads.
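Since a saved page is just HTML, the junk can be thinned out before searching. A minimal sketch using Python's standard `html.parser` module, assuming the page has been saved locally as HTML: it keeps visible text, drops `<script>` and `<style>` contents, and naturally loses images (they carry no text) and link targets. It does no OCR, and ad text in side columns would still come through.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

Each text run becomes its own line, which keeps the output searchable in NotePad or TextPad without the markup.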
Maybe go back and (re)define the goal. For example, if you 'select all' and then copy or save into NotePad, the images and clickable links go away. But the question is also directed at Excel, where everything could be pasted, with links preserved or removed, and where you might want to keep the columns and thus retain the tabular information.
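If the items sit in an HTML table, the columns can be kept by turning the rows into CSV, which Excel opens directly. A minimal sketch with the standard `html.parser` and `csv` modules, assuming well-formed tables with explicit `</td>`/`</tr>` closing tags and no nested tables; real pages may need more tolerant handling.

```python
import csv
import io
from html.parser import HTMLParser
from typing import Optional


class TableExtractor(HTMLParser):
    """Collect the cells of every <tr> as a list of strings."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row: Optional[list[str]] = None
        self._cell: Optional[list[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def tables_to_csv(html: str) -> str:
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer).writerows(parser.rows)  # quotes cells containing commas
    return buffer.getvalue()
```

To save the result for Excel, write the returned string to a .csv file opened with `newline=""` so the `\r\n` row endings the `csv` module produces are kept as-is.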
One thing I don't understand is the main limiting factor beyond your time: "only 100 items per page". I'd like to think a manual Ctrl+A 'select all' gets each page copied in bulk easily enough. Is the per-page limit there to keep outsiders scripting against the site online from slowing down the network?
Assumption: for 10-20 pages a manual method should suffice. Create a NotePad .txt file, type a sequence of separators (Page 1, Page 2, Page 3, ... up to 20), then paste each page under its separator. Carry on with the processing, and then see what more (or less) should or could have been done.
For 10-60 pages, if the previous method works out, copy each page's link into NotePad (or each URL into an Excel file), pasting once each in sequence. You now have a list of files, say 50 filenames, all in sequence. You then ask here and get an answer on how to merge them into a single file. Now you have a single file with all the text in sequence, yet that is not the answer sought. Why not? Notably, Excel was a potential goal, not MS Word, even though one goal was to search (presumably for words). Why? Presumably to dispose of ads and images more easily. So OCR may not be part of the question?
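Once the links are in a list, fetching and merging them can be done in one pass. A minimal Python sketch, assuming the URLs are plain public pages (one per line, e.g. read from a hypothetical urls.txt) that a simple `urllib.request.urlopen` call can retrieve and that are UTF-8 encoded; pages behind a login or built by JavaScript would not come through this way. Each page goes under a "Page N" header, mirroring the manual separators.

```python
from typing import Callable, Iterable
from urllib.request import urlopen


def fetch_url(url: str) -> str:
    """Download one page; assumes the server sends UTF-8 text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def merge_urls(urls: Iterable[str], out_path: str,
               fetch: Callable[[str], str] = fetch_url) -> int:
    """Fetch each URL in order and write all pages into one file.

    Blank lines in the input are ignored. Returns the number of pages written.
    """
    cleaned = [u.strip() for u in urls if u.strip()]
    with open(out_path, "w", encoding="utf-8") as out:
        for number, url in enumerate(cleaned, start=1):
            out.write(f"===== Page {number}: {url} =====\n")  # separator header
            out.write(fetch(url))
            out.write("\n")
    return len(cleaned)
```

Usage might look like `merge_urls(open("urls.txt", encoding="utf-8"), "all_pages.txt")`. The `fetch` parameter is there so a different downloader (or a test stub) can be swapped in without touching the merge logic.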
How would you know when and how to go from page 21 to 22, etc.? Is that, or could that be, automated?
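Moving from page 21 to page 22 can often be automated when the page number shows up in the address bar. A minimal sketch using `urllib.parse`, assuming (hypothetically) that the site selects pages with a query parameter such as `?page=21`; the parameter name is a guess to be checked against the real URL, and some sites use an item offset (e.g. 101, 201, ...) instead.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_urls(first_url: str, last_page: int, param: str = "page", start: int = 1) -> list[str]:
    """Return URLs for pages start..last_page by rewriting one query parameter.

    `param` is an assumed parameter name; inspect the site's own URLs to find it.
    """
    parts = urlsplit(first_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    urls = []
    for n in range(start, last_page + 1):
        # Drop any existing value of the page parameter, then append the new one.
        new_query = [(k, v) for k, v in query if k != param] + [(param, str(n))]
        urls.append(urlunsplit(parts._replace(query=urlencode(new_query))))
    return urls
```

The other search parameters are kept unchanged, so one search URL expands into the full run of page URLs, which can then be saved as a list or downloaded in sequence.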
https://en.wikipedia.org/wiki/Screen_scraping#Screen_scraping