Solved

Move PDF files according to the customer information inside them

Posted on 2016-11-03
Last Modified: 2016-11-04
Hi all,

We generate invoices for all of our customers using our own invoice numbers, so you cannot tell from an invoice number which customer the invoice belongs to. Is there any way to set up a PowerShell script that looks inside each file for the customer name, such as *abc chemicals*, and then moves that PDF to the folder belonging to ABC Chemicals?

I am then planning to write a script that will pull all PDFs from that folder of ABC chemicals and email all those PDFs to the customer.

Thanks guys.
Question by:Exchange User
11 Comments
 
LVL 51

Expert Comment

by:Bill Prew
Can you provide a sample of the PDF involved?

~bp
 

Expert Comment

by:Manuel Del Villar
You may be able to convert your existing PDF to an OCR document. Then you can set up a PowerShell/SQL script to read the file contents and rename the file before or after it is moved to its destination. You can let SQL Agent run it until processing is complete.

This is something that a document management system can do easily. With the right workflow you can tailor your processes according to your needs.
 
LVL 3

Author Comment

by:Exchange User
@bill, I just created a custom pdf as an example
ABC.pdf
 
LVL 3

Author Comment

by:Exchange User
@Manuel,

Thank you for your reply. So there is no straightforward way to do it?
 
LVL 51

Accepted Solution

by:
Joe Winograd, EE MVE earned 500 total points
This EE article discusses a program that I wrote a few years ago that does what you're looking for:
How To Rename-Move a Batch of PDF Files Based on Contents of the Files

As stated in the article, I removed the source code last year. I thought that would be temporary, but now I'm thinking it may be permanent. In any event, I have done numerous customizations of the program, which include a Windows installer (a Setup.exe), for numerous EE members (all back in the day of the "Hire Me" button, which no longer exists and has been replaced by Live and Gigs). I'll check to see if one of my existing programs will work on your sample PDF, either as-is or with minor changes. The program itself already recurses into the source folder to an unlimited depth, so there's no need for PowerShell or a script of any kind.

I just tested your PDF with a key part of the solution (the PDFtoText tool, which is one of the Xpdf utilities), and it worked fine, creating this text:

ABC Company  Invoice # 12345
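
For anyone who wants to try that extraction step, here is a minimal sketch in Python (chosen only for illustration; it is not the program described in the article). It assumes the `pdftotext` executable is installed and on the PATH. The helper names `build_command` and `extract_text` are made up for this example. The `-layout` option asks pdftotext to keep the physical layout of the page, `-enc UTF-8` selects the output encoding, and `-` as the output file name sends the text to stdout.

```python
import subprocess


def build_command(pdf_path, exe="pdftotext"):
    """Build the pdftotext command line for one PDF (hypothetical helper).

    -layout keeps the physical page layout, -enc UTF-8 sets the output
    encoding, and "-" as the text file makes pdftotext write to stdout.
    """
    return [exe, "-layout", "-enc", "UTF-8", str(pdf_path), "-"]


def extract_text(pdf_path, exe="pdftotext"):
    """Run pdftotext and return the extracted text as a string.

    Assumes `exe` can be found on the PATH; raises CalledProcessError
    if pdftotext exits with an error.
    """
    result = subprocess.run(
        build_command(pdf_path, exe),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout
```

Calling `extract_text("ABC.pdf")` on the sample file should return text along the lines of the output shown above, although the exact spacing depends on the pdftotext version and options.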

Questions for you:

(1) Are the Customer Name and Invoice Number always on the first line of the page? If not, where can they be on the page?

(2) Are the Customer Name and Invoice Number always in the same columns? If not, where can they be on the line? And is any other text between them?

(3) Are invoices all one page or can an invoice contain multiple pages?

(4) Is a single PDF file just one invoice or can there be multiple invoices in one PDF file?

(5) Do you want to rename each file with the Customer Name (and Invoice Number?) or leave the file name as-is and simply move the file into the folder for that Customer Name?

(6) Will the destination folder for each Customer Name already exist or will it need to be created when it doesn't exist?

(7) If a folder needs to be created based on Customer Name, what should the program do if the Customer Name has a character that is illegal in a Windows folder/file name? For example, E*Trade, "K" Line, Royal Dutch/Shell, etc. — what to do with the asterisk, double quote, and slash? FYI, the following characters are not allowed in folder/file names (other than in the drive/path and as wildcards):

< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)

Btw, I do not recommend using OCR for this. Since your PDFs are PDF Normal files, meaning they already have text, it's not a good idea to go from text to image and then back to text via OCR. Regards, Joe

 
LVL 3

Author Comment

by:Exchange User
Hi Joe,

Many thanks for your reply. I will check out the article right now, but I thought about replying to your questions first.

(1) Are the Customer Name and Invoice Number always on the first line of the page? If not, where can they be on the page?

No, the company name is not on the first line of the page, I just created a sample PDF as an example. I tried a tool called A-PDF Automail, in which I could point a macro at the location of the company name, and that point on the page gives a reading of (68,154). Not sure if that helps.

(2) Are the Customer Name and Invoice Number always in the same columns? If not, where can they be on the line? And is any other text between them?

Yes customer name and invoice numbers are always in the same columns. And yes there is the address of the customer in between them and probably a serial number.

(3) Are invoices all one page or can an invoice contain multiple pages?

Invoices are usually one page. It depends on the number of items being invoiced, but I would say 90% of the time it is one page.

(4) Is a single PDF file just one invoice or can there be multiple invoices in one PDF file?

We actually want single invoice per PDF. Some of our customers don't like all the invoices being merged into one PDF.

(5) Do you want to rename each file with the Customer Name (and Invoice Number?) or leave the file name as-is and simply move the file into the folder for that Customer Name?

No we dont want to rename any file. Or if there is a neat option to rename files and put customer name and invoice number together, then we can take a look at it.

(6) Will the destination folder for each Customer Name already exist or will it need to be created when it doesn't exist?

Destination folder already exists, but again, this is a setting which we can change easily.

(7) If a folder needs to be created based on Customer Name, what should the program do if the Customer Name has a character that is illegal in a Windows folder/file name? For example, E*Trade, "K" Line, Royal Dutch/Shell, etc. — what to do with the asterisk, double quote, and slash? FYI, the following characters are not allowed in folder/file names (other than in the drive/path and as wildcards):

I think the program should not do anything in such case or create folders based on first few letters of the customer name ?

< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
> No, the company name is not on the first line of the page, I just created a sample PDF as an example.

OK, but if not the first line, is it always on the same line? If not, can you change the format of the PDF to have a unique identifier for the name, something like this:

Customer Name: ABC Company

The point is, the program must know how/where to find the Customer Name and must also know what ends it — end-of-line? two or more spaces in a row? a special character? The two methods I've used are (1) a fixed location (line number and column number) and (2) a unique ID, such as shown above (Customer Name:), which typically ends at the end-of-line.
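
As an illustration only (this is not the source of the program discussed in the article), the two lookup methods could be sketched in Python roughly as follows. The function names are hypothetical, the input is assumed to be the plain text produced by PDFtoText, and line and column numbers are assumed to be 1-based. `cut_at_gap` shows one possible "what ends it" rule: stop at the first run of two or more spaces.

```python
import re


def name_by_label(text, label="Customer Name:"):
    """Method 2: return whatever follows the label, up to end-of-line."""
    for line in text.splitlines():
        pos = line.find(label)
        if pos != -1:
            return line[pos + len(label):].strip()
    return None  # label not found anywhere in the text


def name_by_position(text, line_no, col_start, col_end=None):
    """Method 1: return the text at a fixed line and column range.

    line_no and col_start are 1-based; col_end is inclusive, and None
    means "through end-of-line".
    """
    lines = text.splitlines()
    if not 1 <= line_no <= len(lines):
        return None
    return lines[line_no - 1][col_start - 1:col_end].strip()


def cut_at_gap(value):
    """End the value at the first run of two or more spaces."""
    return re.split(r" {2,}", value.strip(), maxsplit=1)[0]
```

For example, `cut_at_gap(name_by_position(text, 1, 1))` would pull `ABC Company` out of a first line that reads `ABC Company  Invoice # 12345`.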

> (68,154)

Those are probably the (X,Y) coordinates in pixels.

> Yes customer name and invoice numbers are always in the same columns.

That's good. And will be even better if they're always on the same line number.

> And yes there is the address of the customer in between them and probably a serial number.

Not an issue in this case, since the program doesn't need to get the invoice number from the file's contents.

> but I would say 90% of the time it is one page

Not an issue, since there's only one invoice per PDF file.

> We actually want single invoice per PDF.

That's good — makes the solution easier.

> No we dont want to rename any file.

Also makes for an easier solution.

> Or if there is a neat option to rename files and put customer name and invoice number together, then we can take a look at it.

Yes, that's the most common approach that my clients have wanted — some want to replace the file name completely, but most want to add something in the file contents (such as Customer Name) to the current file name (such as Invoice Number or Account Number).

> Destination folder already exists, but again, this is a setting which we can change easily.

The program can easily check for the existence of the folder and create it when it doesn't exist, but the exact name is important. In other words, if the folder name is ABC Corporation, but the invoice says ABC Corp, an "exact match" search won't get a hit. Punctuation can also be an issue, such as ABC LLC vs. ABC, LLC vs. ABC L.L.C.
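
A rough Python sketch of that folder check follows, again only as an illustration and not the actual program. It assumes the destination root already exists and that the customer name is already legal as a folder name. The names `normalize`, `find_customer_folder`, and `move_invoice` are hypothetical. Comparing names after dropping case and punctuation lets ABC LLC, ABC, LLC, and ABC L.L.C. land in the same folder. It does not reconcile abbreviations such as ABC Corp vs. ABC Corporation, which would need a lookup table or manual review.

```python
import shutil
from pathlib import Path


def normalize(name):
    """Casefold the name and keep only letters and digits."""
    return "".join(ch for ch in name.casefold() if ch.isalnum())


def find_customer_folder(dest_root, customer):
    """Return the existing folder whose normalized name matches, else None."""
    wanted = normalize(customer)
    for entry in Path(dest_root).iterdir():
        if entry.is_dir() and normalize(entry.name) == wanted:
            return entry
    return None


def move_invoice(pdf_path, dest_root, customer):
    """Move the PDF into the customer's folder, creating it when missing."""
    folder = find_customer_folder(dest_root, customer)
    if folder is None:
        # No match after normalization: create a folder named as on the invoice.
        folder = Path(dest_root) / customer
        folder.mkdir(parents=True)
    target = folder / Path(pdf_path).name
    return Path(shutil.move(str(pdf_path), str(target)))
```

The sketch does not decide what should happen when a file with the same name already exists in the destination folder; that policy would need to be added.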

> I think the program should not do anything in such case or create folders based on first few letters of the customer name ?

Another idea is to substitute a legal character for each illegal one, but make it a character that doesn't appear in company names, such as a tilde or underscore (otherwise, the renamed company name could bump into another company's name).
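
A minimal Python sketch of that substitution, assuming the nine characters listed earlier in the thread are the only ones that need replacing. The function name `safe_folder_name` is made up for this example, and the underscore default is just one of the choices mentioned above.

```python
# The nine characters listed earlier as illegal in Windows folder/file names.
ILLEGAL_CHARS = '<>:"/\\|?*'


def safe_folder_name(customer, replacement="_"):
    """Replace each illegal character in the name with `replacement`."""
    table = str.maketrans({ch: replacement for ch in ILLEGAL_CHARS})
    return customer.translate(table)


print(safe_folder_name("E*Trade"))            # E_Trade
print(safe_folder_name('"K" Line'))           # _K_ Line
print(safe_folder_name("Royal Dutch/Shell"))  # Royal Dutch_Shell
```

Passing `replacement="~"` switches to a tilde. The same function would have to be applied both when creating folders and when looking them up, otherwise the names will not match.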

> I will check out the article right now

OK, let me know what you think after reading it. Also, I recommend reading the user comments, which resulted in some interesting discussions. Regards, Joe
 
LVL 3

Author Comment

by:Exchange User
Hi Joe,

I went through the article and the comments. I think this is exactly what I was looking for.

Regards
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
Good to know that's what you're looking for. Now go back and answer the questions in my previous post.
 
LVL 3

Author Comment

by:Exchange User
OK, but if not the first line, is it always on the same line? If not, can you change the format of the PDF to have a unique identifier for the name, something like this:

Customer Name: ABC Company

The point is, the program must know how/where to find the Customer Name and must also know what ends it — end-of-line? two or more spaces in a row? a special character? The two methods I've used are (1) a fixed location (line number and column number) and (2) a unique ID, such as shown above (Customer Name:), which typically ends at the end-of-line.

Yes, the customer number is always on the same line. But where it ends ? I have copied below from one of our invoices. Maybe this will answer your question ?

ABC Company Limited
12345 Young Avenue
Attn: John Smith
City State Postal Code


> Destination folder already exists, but again, this is a setting which we can change easily.

The program can easily check for the existence of the folder and create it when it doesn't exist, but the exact name is important. In other words, if the folder name is ABC Corporation, but the invoice says ABC Corp, an "exact match" search won't get a hit. Punctuation can also be an issue, such as ABC LLC vs. ABC, LLC vs. ABC L.L.C.

I'm sure we can test this one out as well. If the program requires a folder with exactly the same name as on the invoice, then we can probably ask the program to create folders according to the invoice name convention.

> I think the program should not do anything in such case or create folders based on first few letters of the customer name ?

Another idea is to substitute a legal character for each illegal one, but make it a character that doesn't appear in company names, such as a tilde or underscore (otherwise, the renamed company name could bump into another company's name).

Yes that is a very good option we can test with.
 
LVL 51

Expert Comment

by:Joe Winograd, EE MVE
> Yes, the customer number is always on the same line.

What line number?

> But where it ends ?

If it always looks like that — with ABC Company Limited on a line by itself — then the company name ends at the end-of-line, which is easy to handle.

> create folders according the invoice name convention

That should work.

> Yes that is a very good option we can test with.

OK, so have your program that creates the folders substitute an underscore for any character that is illegal in a file name.

At this point I need a real invoice to test. I realize there may be private/sensitive info on it, so feel free to change whatever you don't want to expose, such as company name, address, phone, etc. If you don't want to post it publicly, send it to me via EE's Message system.

One other question — what is your company's budget for this software? It doesn't have to be exact — just looking for a ballpark figure. Regards, Joe