Link to home
Start Free TrialLog in
Avatar of dustock
dustockFlag for United States of America

asked on

Copy certain text from word to excel

I have a pretty tall order on my hands to copy certain text from word documents in a folder and past it into an excel document.  Each folder could contain anywhere from 10 to 2500 word documents.  Plus there are multiples of some documents and I only need the most recent one.  For example I have 123456-A-1.doc, 123456-A-2.doc, 123456-B-1.doc.  And I only need the information out of 123456-B-1.doc.  As far as the information I need, I’ll attempt to show you what a document looks like.

Part Name     : 123456-B
Material          : Aluminum
Make Qty        : 10
_____________________________________________________________________________

Program Notes: Load sheet left
Station      Shape       Orient      Bin      Notes
_____________________________________________________________________________
T3                  RND            0       CAB      Pgmr> Check clamps
T7                  OBRND       0       CAB      Pgrm>
T1                  SQR            0      CAB      Pgrm> Check orient

Sometimes there are no Program notes and as you can see sometimes there are no other notes next to Pgrm>.  Also the stations can range from 3 to 30 depending on the document.  I also need to extract the Rev from the part name and put it in a separate column.  Here is my idea of what the excel document should look like.

PART_NUM   REV    PGRM_NOTES        NOTES_1        NOTES_2     NOTES_3      NOTES_4   etc..
123456          B        Load sheet left       Check clamps                      Check Orient      

the formatting is a little off but hopefully you get the gist of it

Am I asking for the impossible?  Is a macro not the best solution to this problem?

Thanks,
Avatar of redmondb
redmondb
Flag of Afghanistan image

dustock,

The only real issue in this is the actual layout of the Word documents. Could you post sample copies - one with "Program Notes" and one without, please?

Thanks,
Brian.
Avatar of dustock

ASKER

Brian,

Here are the files.  One thing to note, the file without notes doesnt have a rev letter but a rev number.
3-02346-1-1.doc
5-02527-C-1.doc
dustock,

Thanks for those. How long more today will you be looking in on the site?

Regards,
Brian.
Avatar of dustock

ASKER

I'll be on for at least a few more hours
dustock,

The bit I was most concerned about was the layout of the Word document, so I did a quick and dirty to check it - completely ignoring folder and version processing.

In the attached, please change the directory name in "OpenAndReadWordDoc" to point to the folder where the two specimen files are. Then run it (or simply click on the button on Sheet1).

Few things...
 - It'll process the specified word documents, creating a separate sheet for each. (I know you actually want the final output to be a single sheet with a row for each document, but bear with me!)
 - Each sheet will show that documents output row.
 - I added an extra column ("Comments") to store any funnies (no Stations, a Station without the word "Pgmr>", more than 30 Stations).
 - If you want to run this more than once you'll need to delete the sheets created from the earlier run(s). ("Quick and dirty").

Let me know if I'm on the right track,

Regards,
Brian.
Copy-certain-text-from-word-to-e.xlsm
Avatar of dustock

ASKER

Brian,

This is exaclty what I was looking for!  I know its quick and dirty but i'm going to ask a couple things just to make sure were on the same wave length here.
With the final product, will the entire document be copied into column A like it was here?
Will it be able to just grab any word doc in a given folder and process it?  (I see the macro does currently call out the current file names)

Just an FYI, I did get a 'Compile error: Can't find project or library'.  I clicked ok and it has 'MISSING: Microsoft Word 14.0 Object Library' highlighted.  I pointed it back to MSWORD.olb and then I 'Microsoft Word 12.0 Object Library' shows as highlighted.  I checked the 12.0 unchecked 14.0 and saved and it worked.
dustock,

(1) No, as I understand your requirements, your final output should be a single sheet with a row for each document with "PART_NUM" in column A, etc.

(2) I presumed that you want to be prompted for a folder and then have everything happen automatically after that (i.e. process all files, but ignore "older" versions) . Is that correct? (It would actually be easier to give you the prompt and let you pick the files, thus avoiding the macro having to worry about versions.)

(3) My bad, I should have mentioned that you'd need to change the reference back to 2007.

BTW, In the "Rev" column I was simply using the 8th character of the file name. Couple of things about this (a) Shouldn't I be using three characters (e.g. "A-2" from 123456-A-2.doc)? (b) Is there any possibility that the file's name and the Part No. could, in error, be different? The macros, as it stands wouldn't care but you might!

Regards,
Brian.
Avatar of dustock

ASKER

Brian,

1) Sounds good

2) You are correct, process newer files first.  If its easier for me to select files I can, but some folders have 2500 documents that I need to select the latest version which could get painful, but I can always just select all, and use excel to get rid of the ones I dont want.

3) No worries on the version

The Rev column might create issues as we have some part number that are differet lengths like:
1672054-A-2
10036684-33-B-1
300001717783-1-1
2-55555-0172-A-1

Good question on why your are using A vs A-2.  The reason for this is because of another application that stops seeing the A-1 Version so an A-2 version gets created.  In some cases we are up to A-4 but the -# means nothing for what we are actually doing.  In the document file itself after the -A or -B there is no -#.

I'm going to be heading out soon, but i'll monitor this as I have time and answer any additonal questions you have.  Thank you very much for all your help!

~Dustin
Dustin,

Thanks for all that!

I'm just being lazy, it would be silly for you to have to manually manage the versions. (BTW, I'll probably output the file name as the last column for checking purposes, but we can drop that at the end.)

I reckon you've given me everything I need. Managing my time, could you give me an idea how long it'll be before you're back on site again?

Thanks,
Brian.
Dustin,

OK, the attached seems to be doing what I want it to to. Over to you to see whether it meets your needs!

(1) Click on the blue button to start it. You'll be prompted for files - navigate to the required folder, select the required files and click on "Open". Be careful, the list of files will only show files of the form "*-?-?.doc" where the question marks are single characters. Let me know if this is a problem.

(2) This will take a while for a lot of files. To let you know it's actually doing something, the status bar shows the current file name, how many files processed and the the total number. (Don't worry if processed files jumps - that's older versions being skipped.)

(3) The file names ended up being output in the first column. There's a commented out line of code to delete the column (look for "Uncomment the following to drop the file name column from the Output sheet..."). I strongly suggest leaving it alone until you're happy.

(4) I've tried to anticipate funnies in the data, but there are bound to be surprises. I've deliberately set things up so that commenting out a few deletes will leave all the workings behind.

Regards,
Brian.
Copy-certain-text-from-word-to-e.xlsm
Dustin,

Couple of changes to my previous post...

(5) There's now an edit that the Part No.+Rev. in the file name agrees with that in the file.

(6) Because of this, I've dropped the file name column that I added ((3) above).

Regards,
Brian.
Copy-certain-text-from-word-to-e.xlsm
Avatar of dustock

ASKER

Brian,

This is awesome!!  I ran this againts a folder with 490 files, it took about 10 or 15 minutes and the output is pretty close to perfect.  I did run into something unexpected though.  The attached document has a second line of information below Program Notes:.  Can it be modified to accept up to 4 program notes just in case?  So we could end up with PGRM_NOTES_1, PGRM_NOTES_2 and so on to 4.  That way we can ensure we are capturing that information.  I did also see one where they put the note above program notes, but i'm not even going to bother looking for that instance.  If we can get 95% of the data(which it looks like we did) i'll be happy.  This is nothing sort of amazing as far as turn around!  It would have taken me weeks of trial and error to get half of what you gave me here.

Thank you,

Dustin
1832901-A-1.doc
Dustin,

Hey, thanks for the kind words!

That extra line in "Program Notes" was what I was worried about from the start, but 2 in 490 is fewer surprises than I thought you'd get. I'll look at a change for that.

(1) Finding that a part is in error, can you go to the source system and
modify the entry so it's subsequently correct? The succeeding extra line(s) sounds like it could be "correct" (Part genuinely has multiple notes) but the preceding one sounds like a bug. Could you send me that one?

(2) I was concerned about run time. However, that may simply be the file processing overhead. If an hour or so is too long for the 2,500 files (is it?)...
(a) I've never tested the timing of this, but perhaps it should be a Word macro with a little bit of Excel at the end?
(b) This isn't the cleanest code I've ever written, doing a lot of shuffling of data back and forwards to sheets. (Perhaps I was over-concerned about you being able to "debug" it by leaving worksheets behind.)
(c) Is the file overhead being added to by a virus checker?

Regards,
Brian.

Avatar of dustock

ASKER

Brian,

I spoke too soon with the 4 lines for program notes, lets go to 6 if possible.

1) I can modify the documents so they'll be process but I need a way to pin point them.  Is there a way we could find out if there is text before Program Notes that's after that first line break?  Or if the text Program Notes is missing?  And Also if Pgmr> is missing?  If it can just spit out the doc name, into another sheet, or file then we know what we need to correct to get it to run through?  2 of 490 wasnt bad, but when I get the folder with 2500 files I have a feeling we'll have more issues.  I attached some files so you can see what I mean for these.

2) Run time is nothing to worry about.  This could take all night for all I care, i'm just happy to have a solution.  
a) If a word macro is easier, then by all means we can do it that way.  
b) I did notice a few comments in your code which did help the area's I dont understand.  I have some basic knowledge with VBscript, VB.NET and C#.NET but when it comes to office macro's I'm pretty limited so maybe a couple heading comments would be nice if there is a minor modification I need to do.  But I think we've figured a lot of the issues out.

c) I've scanned the file with antivirus, are you having issues with it?


12222755-36-K-1.doc
2338195-B-1.doc
Dustin,

Relax - I didn't mean my file, I was referring to your anti-virus scanning your word files during the run!

OK lots here...
(1) "I need a way to pin point them" etc.
All of those edits are already there, did you notice "Comment" (column AH)?
(2) OK.
(3) Nope! It's already written so converting is more work.

I'd made the change for multiple "Program Notes", but I'll hold it until you've checked out (1) above. If you hadn't seen that there are likely to be other issues!

No pressure, but I'll be gone in an hour, I'm afraid.

Regards,
Brian.
Dustin,

Best laid plans and all that,  I'm still here.

Regards,
Brian.
Dustin,

Definitely gone in 35 minutes!

Regards,
Brian.
Avatar of dustock

ASKER

Brian,

Sorry for the lack of response last night.

I now understand what you meant about the antivirus.
1) II did not notice this column.  Let me go back and rerun the 490 and i'll figure out if there are other issues.  I like how you thought of this already!!  I'll get back to once I have reviewed.
Dustin,

Unless you've deleted your first run, don't bother to re-run yet! There's a newer version - I was just waiting to incorporate any errors from the first run.

Regards,
Brian.
Avatar of dustock

ASKER

I saw no major errors/issues, so i'll wait and run the new updated version.
Dustin,

Sure, please see attached.

Could I suggest you run it for a few of the funnies (especially around "Program Notes") to see that it's what you want.

Regards,
Brian.


Copy-certain-text-from-word-to-e.xlsm
Avatar of dustock

ASKER

Brian,

I'll run it againsts the folder with 490 and i'll throw a couple dummy part numbers in there to verify its working.  I'll let you know if there are any issues, but it'll probably be Monday or Tuesday.  Have a good weekend.

Dustin
Dustin,

Sure.

Have a good one.

Brian.
Avatar of dustock

ASKER

Hey sorry about the wait.  I am getting run time error's every once and awhile with the latest version you sent me.  Its saying the word file is in use and ask's if I want to open it read only.  I choose cancel and then word opens and I get Run time error 4198.  If I go into task manager and kill winword it seems to work fine, but then I get Run time error 1004 Method 'Paste' of object '_Wordkseet' failed.

Thanks,

Dustin
dustock,

No problem.

I think I know what the problem is. The macro issues the command to open the word file, but is too quick in moving on to the next command.

Let me do some checking and I'll submit a new version shortly.

Regards,
Brian.
Avatar of dustock

ASKER

Thanks Brian!
dustock,

OK, I've made some changes...
 - Added a 5 second delay after Word is started.
 - Added a 2 second delay after each Word file is opened.
 - Commented out the delete of the Files sheet.

Please do the following (or as much of it as you reasonably can) ...
(1) Reboot the PC and wait until everything is stable before continuing.
(2) Start Task Manager.
(3) Run the Macro and don't run anything else until it's finished. ( I should probably have said this originally - it's far to easy to mess up the clipboard!).
(4) If the error occurs again,
          - Make a note of the file name (assuming the message tells you).
          - Look at Task Manager's Performance tab and see if "Physical Memory Usage
             History" chart (the bottom one) has gone close to 100%.
          - Allow the file to open read-only. (The macro doesn't change the Word file, so no
             harm done.)
(5) When the run is finished, assuming the error happened, please send me the Word document and the File_List sheet. Let me know what the memory usage was at the error.

BTW, I'm here for an hour or so more, but I'm afraid I'm going away for the week-end then.

Regards,
Brian.
Copy-certain-text-from-word-to-e.xlsm
Avatar of dustock

ASKER

Brian,

I have a meeting starting in 10 minutes that will go for a couple of hours.  I'll catch up with you next week.

Thanks,

Dustin
Dustin,

Sure. Sorry about not being available. If this delay causes you any problem, please feel free to close this (allocate all points to yourself) and create a linking question.

Regards,
Brian.
Dustin,

Hi, I'm back.

Regards,
Brian.
Avatar of dustock

ASKER

Brian,

I did some testing today and its working perfectly.  One thing I seemed to overlook this whole time was that I do need the Machine Name in the output also.  Is that hard to add?

Thanks,

Dustin
ASKER CERTIFIED SOLUTION
Avatar of redmondb
redmondb
Flag of Afghanistan image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of dustock

ASKER

Thanks Brian!  I am going home for the day if I have the chance to test tonight I will otherwise I will do it first thing tomorrow.  

Thanks,

Dustin
Avatar of dustock

ASKER

Thanks again for all the help Brian!
Thanks, Dustin. All the best with your files!
Avatar of dustock

ASKER

Brian,

Can I ask another favor?  I guess there was one more piece of information I needed...