Exchange User


Move PDF files according to the customer information inside them

Hi all,

We generate invoices for many different customers using our own invoice numbers. From the invoice number alone, you cannot tell which customer an invoice belongs to. Is there any way we can set up a PowerShell script that looks inside each file for a customer name like *abc chemicals* and then moves that PDF to the folder belonging to ABC Chemicals?
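A minimal sketch of the matching-and-moving step, written in Python rather than PowerShell. It assumes the PDF text has already been extracted (for example with the third-party pypdf package, `PdfReader(path).pages[0].extract_text()`) and that you keep a list of known customer names. The function names `find_customer` and `route_invoice` and the one-folder-per-customer layout are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Optional, Sequence


def find_customer(text: str, customers: Sequence[str]) -> Optional[str]:
    """Return the first known customer name found in the text (case-insensitive)."""
    haystack = text.casefold()
    for name in customers:
        if name.casefold() in haystack:
            return name
    return None


def route_invoice(pdf_path: Path, text: str, customers: Sequence[str],
                  base_dir: Path) -> Optional[Path]:
    """Move pdf_path into base_dir/<customer>/ when a known customer name is found.

    Unmatched files are left where they are so someone can review them by hand.
    """
    customer = find_customer(text, customers)
    if customer is None:
        return None
    dest_dir = base_dir / customer
    dest_dir.mkdir(parents=True, exist_ok=True)
    # shutil.move returns the destination path as a string.
    return Path(shutil.move(str(pdf_path), str(dest_dir / pdf_path.name)))
```

Because this is a plain substring search, list longer names before shorter names they contain (for example "ABC Chemicals" before "ABC"); otherwise the short name can claim the wrong invoice.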

I then plan to write a script that will pull all PDFs from the ABC Chemicals folder and email them to the customer.
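For the follow-up email step, a minimal sketch using Python's standard `email` package. The function name, subject line, and addresses are hypothetical, and the function only builds the message; sending it is left to `smtplib`.

```python
from email.message import EmailMessage
from pathlib import Path


def build_invoice_email(folder: Path, sender: str, recipient: str) -> EmailMessage:
    """Build one email with every PDF in the customer's folder attached."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Invoices for {folder.name}"
    pdfs = sorted(folder.glob("*.pdf"))
    msg.set_content(f"Please find {len(pdfs)} invoice(s) attached.")
    for pdf in pdfs:
        # Each invoice stays a separate attachment; nothing is merged.
        msg.add_attachment(pdf.read_bytes(), maintype="application",
                           subtype="pdf", filename=pdf.name)
    return msg
```

To send it, something like `with smtplib.SMTP("mail.example.com") as smtp: smtp.send_message(msg)` should work, where the host name is a placeholder for your own mail server.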

Thanks guys.
Bill Prew

Can you provide a sample of the PDF involved?

~bp
You may be able to convert your existing PDFs to OCR documents. Then you can set up a PowerShell/SQL script to read the file contents and rename each file before or after it is moved to its destination. You can let SQL Agent run it until everything is processed.

This is something that a document management system can do easily. With the right workflow you can tailor your processes according to your needs.
Exchange User

ASKER

@Bill, I just created a custom PDF as an example:
ABC.pdf
@Manuel,

Thank you for your reply. So there is no straightforward way to do it?
ASKER CERTIFIED SOLUTION
Joe Winograd

This solution is only available to members.
Hi Joe,

Many thanks for your reply. I will check out the article right now, but I thought I would reply to your questions first.

(1) Are the Customer Name and Invoice Number always on the first line of the page? If not, where can they be on the page?

No, the company name is not on the first line of the page; I just created a sample PDF as an example. I tried a tool called A-PDF Automail, in which I could point a macro to the location of the company name, and that point on the page gives a reading of (68,154). Not sure if that helps.

(2) Are the Customer Name and Invoice Number always in the same columns? If not, where can they be on the line? And is any other text between them?

Yes, customer name and invoice numbers are always in the same columns. And yes, there is the address of the customer in between them, and probably a serial number.

(3) Are invoices all one page or can an invoice contain multiple pages?

Invoices are usually one page; it depends on the number of items being invoiced, but I would say 90% of the time it is one page.

(4) Is a single PDF file just one invoice or can there be multiple invoices in one PDF file?

We actually want a single invoice per PDF. Some of our customers don't like all their invoices merged into one PDF.

(5) Do you want to rename each file with the Customer Name (and Invoice Number?) or leave the file name as-is and simply move the file into the folder for that Customer Name?

No, we don't want to rename any files. Or, if there is a neat option to rename files and put the customer name and invoice number together, then we can take a look at it.

(6) Will the destination folder for each Customer Name already exist or will it need to be created when it doesn't exist?

Destination folder already exists, but again, this is a setting which we can change easily.

(7) If a folder needs to be created based on Customer Name, what should the program do if the Customer Name has a character that is illegal in a Windows folder/file name? For example, E*Trade, "K" Line, Royal Dutch/Shell, etc.: what to do with the asterisk, double quote, and slash? FYI, the following characters are not allowed in folder/file names (other than in the drive/path and as wildcards):

< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)

Perhaps the program should do nothing in such a case, or create folders based on the first few letters of the customer name?
> No, the company name is not on the first line of the page, I just created a sample PDF as an example.

OK, but if not the first line, is it always on the same line? If not, can you change the format of the PDF to have a unique identifier for the name, something like this:

Customer Name: ABC Company

The point is, the program must know how/where to find the Customer Name and must also know what ends it — end-of-line? two or more spaces in a row? a special character? The two methods I've used are (1) a fixed location (line number and column number) and (2) a unique ID, such as shown above (Customer Name:), which typically ends at the end-of-line.
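The second method (a unique label such as `Customer Name:` that ends at end-of-line) might look like this in Python. It assumes the PDF text has already been extracted into a string; the function name and default label are hypothetical.

```python
import re
from typing import Optional


def name_after_label(text: str, label: str = "Customer Name:") -> Optional[str]:
    """Return whatever follows the label on its line, or None if absent or blank."""
    # ^ and $ match at every line boundary because of re.MULTILINE,
    # and '.' never crosses a newline, so the capture stops at end-of-line.
    pattern = re.compile(r"^[ \t]*" + re.escape(label) + r"(.*)$", re.MULTILINE)
    match = pattern.search(text)
    if match is None:
        return None
    value = match.group(1).strip()  # also drops a trailing '\r' from CRLF text
    return value or None
```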

> (68,154)

Those are probably the (X,Y) coordinates in pixels.

> Yes customer name and invoice numbers are always in the same columns.

That's good. And will be even better if they're always on the same line number.

> And yes there is the address of the customer in between them and probably a serial number.

Not an issue in this case, since the program doesn't need to get the invoice number from the file's contents.

> but I would say 90% of the time it is one page

Not an issue, since there's only one invoice per PDF file.

> We actually want single invoice per PDF.

That's good — makes the solution easier.

> No we dont want to rename any file.

Also makes for an easier solution.

> Or if there is a neat option to rename files and put customer name and invoice number together, then we can take a look at it.

Yes, that's the most common approach that my clients have wanted — some want to replace the file name completely, but most want to add something in the file contents (such as Customer Name) to the current file name (such as Invoice Number or Account Number).
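If you do decide to rename, one way to keep the current file name (the invoice number) and append the customer name is sketched below. The separator and function name are hypothetical, and the customer name must already be free of illegal characters (a name containing a slash makes `Path.with_name` raise `ValueError`).

```python
from pathlib import Path


def renamed_with_customer(pdf_path: Path, customer: str, sep: str = " - ") -> Path:
    """Return '<current stem><sep><customer><suffix>' in the same folder."""
    return pdf_path.with_name(f"{pdf_path.stem}{sep}{customer}{pdf_path.suffix}")
```

The actual rename would then be `pdf_path.rename(renamed_with_customer(pdf_path, "ABC Company"))`, turning `INV-1001.pdf` into `INV-1001 - ABC Company.pdf`.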

> Destination folder already exists, but again, this is a setting which we can change easily.

The program can easily check for the existence of the folder and create it when it doesn't exist, but the exact name is important. In other words, if the folder name is ABC Corporation, but the invoice says ABC Corp, an "exact match" search won't get a hit. Punctuation can also be an issue, such as ABC LLC vs. ABC, LLC vs. ABC L.L.C.
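One way to soften the exact-match problem is to compare normalized names: case-folded, with periods and commas removed and whitespace collapsed, so that ABC LLC, ABC, LLC and ABC L.L.C. all compare equal. This is a sketch with hypothetical helper names; it does not equate abbreviations such as Corp and Corporation, which would need a separate alias list.

```python
import re
from pathlib import Path


def normalize_name(name: str) -> str:
    """Case-fold, drop periods and commas, and collapse runs of whitespace."""
    cleaned = re.sub(r"[.,]", "", name.casefold())
    return " ".join(cleaned.split())


def folder_for(base_dir: Path, customer: str) -> Path:
    """Reuse an existing folder whose normalized name matches; otherwise create one.

    Assumes base_dir exists and customer contains no characters illegal in folder names.
    """
    target = normalize_name(customer)
    for entry in sorted(base_dir.iterdir()):
        if entry.is_dir() and normalize_name(entry.name) == target:
            return entry
    new_dir = base_dir / customer
    new_dir.mkdir(exist_ok=True)
    return new_dir
```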

> I think the program should not do anything in such case or create folders based on first few letters of the customer name ?

Another idea is to substitute a legal character for each illegal one, but make it a character that doesn't appear in company names, such as a tilde or underscore (otherwise, the renamed company name could bump into another company's name).
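Substituting one legal character for each of the nine illegal characters listed earlier takes a single regular expression. This is a sketch with a hypothetical function name; Windows naming rules also exclude control characters, trailing spaces or periods, and reserved names such as CON and NUL, which it does not handle.

```python
import re

# The characters Windows does not allow in folder/file names: < > : " / \ | ? *
ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*]')


def safe_folder_name(name: str, replacement: str = "_") -> str:
    """Replace every illegal character with the replacement character."""
    return ILLEGAL_CHARS.sub(replacement, name)
```

Running the same function both when creating the folders and when looking them up keeps the two consistent.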

> I will check out the article right now

OK, let me know what you think after reading it. Also, I recommend reading the user comments, which resulted in some interesting discussions. Regards, Joe
Hi Joe,

I went through the article and the comments. I think this is exactly what I was looking for.

Regards
Good to know that's what you're looking for. Now go back and answer the questions in my previous post.
> OK, but if not the first line, is it always on the same line? If not, can you change the format of the PDF to have a unique identifier for the name, something like this:
>
> Customer Name: ABC Company
>
> The point is, the program must know how/where to find the Customer Name and must also know what ends it: end-of-line? two or more spaces in a row? a special character? The two methods I've used are (1) a fixed location (line number and column number) and (2) a unique ID, such as shown above (Customer Name:), which typically ends at the end-of-line.

Yes, the customer number is always on the same line. But where does it end? I have copied a block from one of our invoices below. Maybe this will answer your question?

ABC Company Limited
12345 Young Avenue
Attn: John Smith
City State Postal Code


> Destination folder already exists, but again, this is a setting which we can change easily.

> The program can easily check for the existence of the folder and create it when it doesn't exist, but the exact name is important. In other words, if the folder name is ABC Corporation, but the invoice says ABC Corp, an "exact match" search won't get a hit. Punctuation can also be an issue, such as ABC LLC vs. ABC, LLC vs. ABC L.L.C.

I'm sure we can test this one out as well. If the program requires a folder whose name exactly matches the name on the invoice, then we can probably have the program create folders according to the invoice naming convention.

> I think the program should not do anything in such case or create folders based on first few letters of the customer name ?

> Another idea is to substitute a legal character for each illegal one, but make it a character that doesn't appear in company names, such as a tilde or underscore (otherwise, the renamed company name could bump into another company's name).

Yes, that is a very good option we can test with.
> Yes, the customer number is always on the same line.

What line number?

> But where does it end?

If it always looks like that — with ABC Company Limited on a line by itself — then the company name ends at the end-of-line, which is easy to handle.
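With the name on a line by itself, the fixed-location method reduces to taking one line of the extracted text. A sketch, assuming the text extractor preserves line breaks; the function name is hypothetical, and the line number depends on how your extractor orders the page, so it is something to measure on a real invoice.

```python
from typing import Optional


def name_at_line(text: str, line_number: int) -> Optional[str]:
    """Return the stripped text of a 1-based line, or None if missing or blank."""
    lines = text.splitlines()
    if not 1 <= line_number <= len(lines):
        return None
    name = lines[line_number - 1].strip()  # the name ends at end-of-line
    return name or None
```

With the address block above as the text, `name_at_line(block, 1)` would return `"ABC Company Limited"`.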

> create folders according to the invoice naming convention

That should work.

> Yes that is a very good option we can test with.

OK, so have your program that creates the folders substitute an underscore for any character that is illegal in a file name.

At this point I need a real invoice to test. I realize there may be private/sensitive info on it, so feel free to change whatever you don't want to expose, such as company name, address, phone, etc. If you don't want to post it publicly, send it to me via EE's Message system.

One other question — what is your company's budget for this software? It doesn't have to be exact — just looking for a ballpark figure. Regards, Joe