Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

How to extract text from an .XPS file using VBA

Posted on 2010-09-23
19
Medium Priority
?
3,239 Views
Last Modified: 2013-11-27
I am connected to Corporate via VPN. I cannot install ANY software. I am trying to "screen scrape" from a web page. I can copy-paste, but I need to save up to 5 screens for each of 2,000 customers. I can print to a file, but ONLY to a .XPS file. I am further constrained in that I can only use Internet Explorer 7 (drat, no Greasemonkey).

If I change .XPS to .ZIP, then I can see all of the internal files and folders. I've tried to manually strip the text from the "Glyph" nodes I found in one of the XML files, but that was less than satisfactory (sentences were returned as individual words; all of the text was duplicated).

I am looking for a way to automate this copy-paste nightmare. I can use Office 2007 (either Access or Excel). If I knew how, I would attach to the Internet Explorer process via the Windows API, but alas, my knowledge in this area is limited.

I am a seasoned veteran of the I.T. world (meaning I remember xenix). You can throw technical stuff at me and I will not recoil from it.

I will do ANYTHING to overcome this dilemma, within the restrictions placed upon me by Corporate. And no, They will not grant me authority to do anything, As far as They are concerned, I do not exist.

I am out of the office until 6:00 PM this evening, so please do not think I am ignoring your valued responses.
0
Comment
Question by:Badotz
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 8
  • 5
  • 4
  • +2
19 Comments
 
LVL 82

Expert Comment

by:leakim971
ID: 33745259
Why Javascript ? You should replace it by a VB zone
0
 
LVL 6

Expert Comment

by:CRJ2000
ID: 33749342
Could you provide a bit more detail about the structure of the web page you're trying to "screen scrape?" There are a couple methods I've used in the past for such things, but they're very dependent on the structure of the web page(s).

1) Parse the HTML source. This works if the page is truly rendered as HTML (i.e. JavaScript isn't used to change the data display dynamically).

2) Embed a WebBrowser control in a form in Excel or Access, and drive it programatically. You can reference the DOM Document (and related) objects to step through the items you care about. This is very dependent on the structure of the data on the page.

Any additional information you can offer will help us understand what approaches may work,.

Chris
0
 
LVL 74

Expert Comment

by:Jeffrey Coachman
ID: 33750635
Have you looked into all/any of the apps that claim that they can do the Screen Scraping for you?

http://www.google.com/#sclient=psy&hl=en&q=screen+scraper&aq=0&aqi=g5&aql=&oq=&gs_rfai=&pbx=1&fp=8d9c50a61d5b9175

Jeff
0
Get your Disaster Recovery as a Service basics

Disaster Recovery as a Service is one go-to solution that revolutionizes DR planning. Implementing DRaaS could be an efficient process, easily accessible to non-DR experts. Learn about monitoring, testing, executing failovers and failbacks to ensure a "healthy" DR environment.

 
LVL 29

Author Comment

by:Badotz
ID: 33754000
leakim971:
Why JavaScript? Because if there is an IE-7 version of something like GreaseMonkey, I would try that. A quick Google search was not promising, although perhaps a bookmarklet?

CRJ2000:
After system startup, CTRL-ALT-DEL logs me into the Corporate VPN, with a user name and password.

The web page I want to scrape is a DOS program running in an <iframe> - no chrome whatsoever. I have to use the mouse to select all of the screen because CTRL-A does not work (go figure). I can use the arrow keys to move to the top of the screen and then holding down SHIFT select the screen one character across then by line. There is no accelerator key (again, go figure).

After capturing the first screen, I must press Enter to move to the next screen, where once more I use the mouse to select all of the screen because CTRL-A does not work (go figure).

I do not remember if I can view the source of this Dos beast (I do not think I can). I can view the source of the <iframe>, but honestly I do not recall seeing any of the content I am after. I will double-check today.

boag2000:
Thanks, but I cannot install any applications. I can only use what is already on the machine. Corporate pre-installs what They think we will need, and that is that.

Of course, my wanting to do this is because Corporate has put an end-of-life date on the DOS app. Consequently, we are moving all of our clients from the DOS program into a new, shiny web app. Corporate will not provide us with any sort of the data conversion (don't ask). They actually recommend that we cut-and-paste from the old into the new. Truthfully, the apps present data in such different ways that there is no straightforward way to copy and paste; we must copy into Notepad, then pick that data apart (phone numbers, driver's license numbers, dates, comments, etc) and paste each item into a specific portion of the web app. And the web app is a p.o.s. - designed by sadists with no concept of workflow. I want to yell at someone, but all we have is the Corporate Help Desk, which is staffed by troglodytes. There is no way for me to speak to the Gods in their ivory tower.

But that is my cross to bear. I am just trying to make my life a little easier.
0
 
LVL 6

Expert Comment

by:CRJ2000
ID: 33754490
It sounds like there's actually an embedded control that is displaying the content -- I don't think you're really looking for something to scrape a webpage, but rather than a "window" within the webpage. Could you confirm this by looking at the source of the IFRAME and looking for an embedded object?

If it truly isn't a web page that you're trying to scrape, then you may find a macro application such as Macro Express or AutoHotKey to be of help. If you're not afraid to learn a new scripting language (it's pretty straightforward), AutoHotKey would be a good way to go on this one. It could easily automate the repetitive cut-paste stuff.

To clarify -- the "p.o.s." web app is the new one that you're putting this information into? You may be able to automate some of that work, too, using one of these programs.

Here are the links to the product websites. Macro Express is a commercial product. AutoHotKey is freeware.

http://www.macroexpress.com/
http://www.autohotkey.com/

Chris




0
 
LVL 29

Author Comment

by:Badotz
ID: 33754699
Thanks, but I cannot install any applications, as much as I might like to. Corporate has a draconian policy (which we in the office AND the owner had to sign), and there are -=NO=- exceptions. The policy states:

"...the result of installing Outside Applications upon a (CORPORATION NAME HERE) desktop and/or laptop system shall result in immediate termination of the individual registered to said desktop and/or laptop system; said termination may ultimately result in revocation of the owner's reseller license."

I will investigate the environment this morning, but I will not be able to respond to any messages until after 6:00 PM.

Thanks for your help.
0
 
LVL 6

Expert Comment

by:CRJ2000
ID: 33754776
I understand about the no-install policy. Anything you can tell us about what's "really" happening on the web page would be helpful in figuring out next steps.

Chris
0
 
LVL 29

Author Comment

by:Badotz
ID: 33758333
The DOS app is indeed running in an <iframe>- the @SRC@ attribute points to a file with a @.DO@ extension - could be anything, since it is on the Corporate server.
0
 
LVL 29

Author Comment

by:Badotz
ID: 33758336
Not sure what is up with @ instead of "...either my KB is borked or EE borked something...looks like reboot time (sigh)...
0
 
LVL 6

Expert Comment

by:CRJ2000
ID: 33759361
.DO is the default extension for Struts (a Java-based framework). You can find more information about it here: http://struts.apache.org/#Welcome

I'm not sure what to suggest to you at this point. It's difficult to understand what this thing really looks and feels like, and there are many constraints. Unless you can get someone in your IT department to help out a bit, I'm afraid you may be stuck with a manual process.

Chris
0
 
LVL 40

Expert Comment

by:als315
ID: 33759410
Did you try to record macro from Excel or Word, which is switching to your window, copy all necessary information and paste it to Excel or Word?
Scenario can be complicated enough.
0
 
LVL 29

Author Comment

by:Badotz
ID: 33759548
CRJ2000:
Yeah, I'm just hosed. There is no way for me - a mere user - to reach anyone of import in the  I.T. department, except indirectly through the Help(less) Desk.

I might investigate using the Windows API to attach to the window, but I feel that the more complicated I make the process, the less likely it is to work. Diminishing returns, and all that.

als315:
Copying into Notepad works just fine, thanks. There is no need to use Excel or Word - especially NOT Word: yuck.
0
 
LVL 40

Expert Comment

by:als315
ID: 33759708
In recorded office macro you can automate process of select field-copy-paste > move to next field > select-copy-paste, etc, which can't be done in Notepad.
0
 
LVL 29

Author Comment

by:Badotz
ID: 33760590
I have written VBA code that rips the five captured screens apart and stuffs a database with the results. I then prepare an HTML report from this which, after printing, is filed for reference.

While Excel might be able to automate some formatting tasks, it cannot cut from the DOS app and paste into the web app.

Here's my question to you: how, exactly, would Excel help me? Please be specific.
0
 
LVL 40

Accepted Solution

by:
als315 earned 2000 total points
ID: 33760728
"I am trying to "screen scrape" from a web page".
If text in your application can be selected with mouse and and keys (Shift+Arrows), you can send this keys from macro. Example from help:
Dim ReturnValue, I
ReturnValue = Shell("CALC.EXE", 1)    ' Run Calculator.
AppActivate ReturnValue     ' Activate the Calculator.
For I = 1 To 100    ' Set up counting loop.
    SendKeys I & "{+}", True    ' Send keystrokes to Calculator
Next I    ' to add each value of I.
SendKeys "=", True    ' Get grand total.
SendKeys "%{F4}", True    ' Send ALT+F4 to close Calculator.

Switch to your application, select, then copy (Ctrl+C) and paste (CTRL+V) text to word (excel)

0
 
LVL 29

Author Comment

by:Badotz
ID: 33760792
Instead of CALC.EXE, how do I refer to a running instance of Internet Explorer?
0
 
LVL 40

Expert Comment

by:als315
ID: 33760812
You can try to use Alt+Tab to switch from one window to other.
0
 
LVL 29

Author Closing Comment

by:Badotz
ID: 33760848
I never thought about SendKeys, and for that, I thank you. And while your example wasn't the exact solution, I think I can now figure it out.

Again, thanks for the prod in the right direction!
0
 
LVL 40

Expert Comment

by:als315
ID: 33760996
Thanks. It will be not so simple, but may be possible.
0

Featured Post

New feature and membership benefit!

New feature! Upgrade and increase expert visibility of your issues with Priority Questions.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article describes how you can use Custom Document Properties to store settings and other information in your workbook so that they will be available the next time you open the workbook.
Explore the ways to Unlock VBA Project Password Excel 2010 & 2013 documents. Go through the article and perform the steps carefully to remove VBA Excel .xls file.
This Micro Tutorial will demonstrate the scrolling table in Microsoft Excel using the INDEX function.
Many functions in Excel can make decisions. The most simple of these is the IF function: it returns a value depending on whether a condition you describe is true or false. Once you get the hang of using the IF function, you will find it easier to us…

604 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question