Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Importing invoice lines in an array in powershell

Posted on 2014-12-12
14
Medium Priority
?
122 Views
Last Modified: 2014-12-12
Hello,

I need to script the import of invoices into an array.
The problem is that the "table" is not always the same....
I have been trying to do it manually with notepad and just with find and replace create char delimited file.
The only way I see it working is to start at the end:

so first put the "SR" or "OS" in a column, then the total price if it exists, then the price, then the amount, then the description and then the partnumber.

I have no idea if it is possible due to the bad source...

so my output would be 6 columns with data

THanks
invoice-lines.txt
0
Comment
Question by:Hans de Jongh
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 4
  • 3
  • +1
14 Comments
 
LVL 28

Expert Comment

by:Dan McFadden
ID: 40495741
Could adjust the data source?  What I mean is, could the rows with the 2 missing columns, have "0" entered if no entry exists.  If the data is coming out of a database, can the 2 fields in the table be defaulted to have an automatic entry of "0" when a record is created?

This would make every row have 6 columns.  If make it only a little easier to import.

Otherwise... You could read in a line and split it on the separator 'space'.  Then trim the whitespace, so there are no extra blank characters on the field.  Check to see if the split destination array has either 4 or 6 elements, if there are 6 elements, just copy that data into a new array.  If the array has 4 elements, move the data into the new array but add the missing fields and define them as "0".

Now you'll have the final array with 6 elements, and all values are populated.

Dan
0
 

Author Comment

by:Hans de Jongh
ID: 40495755
hi,

no these are converted pdf's. so no adjusting possible.
No I thought about that aswell but the problem is that sometimes in the text you find whitepaces of 6 elements so that doesn't work either...
0
 
LVL 41

Accepted Solution

by:
footech earned 1400 total points
ID: 40495870
Regex is very powerful.  Luckily there appears to be enough of a pattern to match here.  It was a bit of a pain that the description can include a block of spaces.  Anyway, here's what I have.
$pattern = "(?<part>^[0-9]{3}-[a-z0-9]{4,5}) +(?<desc>.+?) {7,}(?<qty>[0-9]{1,3}) +(?:(?<price>[0-9.,]+) +(?<total>[0-9.,]+) +)?(?<code>(SR|OS))$"
Get-Content .\temp\invoice-lines.txt | Where {$_ -match $pattern} | ForEach `
{
    New-Object PsObject -Property ([ordered]@{
                    Part = $Matches.part
                    Description = $Matches.Desc
                    Quantity = [int32]$Matches.qty
                    Price = $Matches.price
                    Total = $Matches.total
                    Code =  $Matches.code
                    })
}

Open in new window

I might even replace line 6 with the following to remove the blocks of spaces from the description.
                    Description = $Matches.Desc -replace " {2,}"," "

Open in new window

0
Introducing the WatchGuard 420 Access Point

WatchGuard's newest access point includes an 802.11ac Wave 2 chipset, providing the fastest speeds for VoIP, video and music streaming, and large data file transfers. Additionally, enjoy the benefits of strong security as the 3rd radio delivers dedicated WIPS protection!

 
LVL 28

Expert Comment

by:Dan McFadden
ID: 40495875
So here's a partial solution.  The code below reads the file into a variable, then goes thru each line and replaces anything that match 2 or more space with a pipe.

$SourceData = gc source.txt -totalcount 10
foreach ($row in $SourceData)
{
    $row = [regex]::replace($row,"\b[ ]{2,}\b",'|')
    $row
}

Open in new window


This leaves 6 lines with an issue because there is only 1 space between the first field and second.  

Dan
0
 
LVL 28

Expert Comment

by:Dan McFadden
ID: 40495883
The example above is for test and only looks at the first 10 lines.  Limits the data for testing.

Only thing left is to try to find lines with only 1 space between the first field and second.  Knowing that the second field can contain a phrase that may contain single spaces between words.

Dan
0
 
LVL 71

Assisted Solution

by:Qlemo
Qlemo earned 600 total points
ID: 40495894
Similar solution as footech, who had to post while I was assembling something @*#. Doesn't rely on PowerShell 4:
cls
Get-Content c:\temp\ee\invoice-lines.txt |
  ? {$_ -match '(?<PartNo>\S*)\s+(?<Description>.*)\s\s+(?<Amount>\d+)\s+(?<Prize>[0-9.]+,\d\d)?\s+(?<Total>[0-9.]+,\d\d)?\s+(?<SR>..)'} |
  % {
    New-Object PsObject -Property @{
      PartNo      = $matches.PartNo
      Description = $matches.Description
      Amount      = $matches.Amount
      Prize       = $matches.Prize
      Total       = $matches.Total
      SR          = $matches.SR
    }
  } |
  select PartNo, Description, Amount, Prize, Total, SR | ft -AutoSize

Open in new window

0
 

Author Comment

by:Hans de Jongh
ID: 40495933
thanks everybody, I guess I give the points to the first who solved this? (sorry)
0
 

Author Comment

by:Hans de Jongh
ID: 40495934
check 10000 lines and not a single failure!
0
 
LVL 41

Expert Comment

by:footech
ID: 40495948
:)
I find it a bit interesting to see the differing regex patterns that others use to parse the same input.

I know Qlemo knows this, but for others who may not, you can adjust mine so that it doesn't rely on PowerShell 3+ (not just 4) just by removing [ordered] from line 4.  You would then have a use a Select statement like Qlemo did to order the columns if so desired.
0
 
LVL 41

Expert Comment

by:footech
ID: 40495955
It's a judgement call.  When you have multiple working solutions posted at about the same time, I think it's appropriate to split points.  As a topic adviser, Qlemo could probably advise you better on this than I.
0
 
LVL 28

Expert Comment

by:Dan McFadden
ID: 40495956
Interesting parsing challenge!  Glad the issue was resolved and I get to take away a nice regex lesson.

Dan
0
 

Author Comment

by:Hans de Jongh
ID: 40495967
so how can I get back the points and give them away (split them)
in a 70/30 split
0

Featured Post

What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

How to get Spreadsheet Compare 2016 working with the 64 bit version of Office 2016
In this post we will be converting StringData saved within a text file into a hash table. This can be further used in a PowerShell script for replacing settings that are dynamic in nature from environment to environment.
The viewer will learn how to create two correlated normally distributed random variables in Excel, use a normal distribution to simulate the return on different levels of investment in each of the two funds over a period of ten years, and, create a …
This Micro Tutorial will demonstrate how to create pivot charts out of a data set. I also added a drop-down menu which allows to choose from different categories in the data set and the chart will automatically update.

705 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question