Link to home
Start Free TrialLog in
Avatar of RichardPWolf
RichardPWolfFlag for United States of America

asked on

Remove duplicate value in header of csv

I'm trying to create a monthly report and I've run into a stumbling block.

Setup; each month I get an exported .csv from a security appliance and it has a lengthy csv file.
On the header line I've got 2 cells that are very similar "DestinationLogonID" and "DestinationLogonid". As you can see the "only" difference is the case of the last 2 characters.

import-csv won't work as it sees these 2 header items as identical.
Now the good part is I don't need either of these columns as they are blank so either renaming one of them or deleting one or both columns are an option.

Using the -header option isn't really an option as there are about 147 columns.

My preferred method would be to rename one of the offending column headers.

I can read the file using the Get-Content and that "seems" to be the way to get the data into an array but that's where I'm stuck.

Here's where I'd like to go with this script;
-> Read input file and in this process rename offending header
-> Process specific columns into my report.
-> Write output file for report.

At this point I only need help with the first step as the others are more or less canned scripts that I've already created.

Thanks.
Richard
Avatar of Steve Whitcher
Steve Whitcher

If you can get the data into an array, could you use the .replace method on the array (or the specific element of the array) to replace the bit needed?  for example:

$array = "One","Two","Three"
$array2 = $array.replace("Two", "Four")

or to address the specific element,

$array2 = $array[2].replace("Three", "Five")
Avatar of RichardPWolf

ASKER

I suppose but I was hoping for a search and replace as there are at least 147 headers and I didn't want to make the code "too" unwieldy. But maybe I'm making a mountain out of a molehill. One difficulty is some of these headers have a  rather long name.
Avatar of oBdA
Brute-force approach: get the content, replace one of the offending names in the first element, convert from csv to PS objects, and filter out the offending columns:
$Content = Get-Content C:\Temp\test.csv
$Content[0] = $Content[0].Replace("DestinationLogonid", "DestLogonId")
$Csv = $Content | ConvertFrom-Csv | Select-Object -Property * -ExcludeProperty "DestinationLogonID", "DestLogonId"

Open in new window

OK, looks good but one question since powershell is generally case insensitive how do I handle the two headers that are identical except for the case?
The .replace method is case sensitive.

PS C:\> $array
Test
Three
PS C:\> $array.replace("Three","One")
Test
One
PS C:\> $array.replace("three","One")
Test
Three
PS C:\>
Unlike the "-replace" operator, the "Replace()" method of a string object is case sensitive, so line 2 in the script above will only replace the "DestinationLogonid" with the lower case "id" with "DestLogonId"; "DestinationLogonID" will remain untouched.
From then on, the header is okay and the content can be converted to an object array.
Cool.
did just run it and this is what I got. I did change one variable to go with my script $filecontent.

ConvertFrom-Csv : The member "DestinationLogonId" is already present.
At E:\Scripts\VPN Report\Get-VPNReport.ps1:61 char:24
+     $Csv = $FileContent | ConvertFrom-Csv | Select-Object -Property * -ExcludePrope ...
+                           ~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [ConvertFrom-Csv], ExtendedTypeSystemException
    + FullyQualifiedErrorId : AlreadyPresentPSMemberInfoInternalCollectionAdd,Microsoft.PowerShell.Commands.ConvertFro
   mCsvCommand
Try manually running the 1st 2 lines oDbA provided, then "$content.[0]" and see if the target header was in fact replaced as intended.
The column in the error message is called "DestinationLogonId" with an uppercase "I" and a lowercase "d".
My script has the "DestinationLogonid" with a lowercase "i" and a lowercase "d" as per your first comment.
$Content = Get-Content C:\Temp\test.csv
$Content[0] = $Content[0].Replace("DestinationLogonId", "DestLogonId")
$Csv = $Content | ConvertFrom-Csv | Select-Object -Property * -ExcludeProperty "DestinationLogonID", "DestLogonId"

Open in new window

One thing I just tried was to use the "first" iteration of the duplicate header which is DestinationLogonID and that allowed the script to run.

For testing purposes I added this -> Export-CSV .\vpn-outfile2.csv -notype but didn't get any data in the output file.
Ahhh. Let me try that.
OK, that fixed the original script.

Output file only has two elements A1=Length and A2 = 15.
Everything's running here as expected.
Here's the sophisticated csv I used:
"Header1","Header2","DestinationLogonID","DestinationLogonId","Header5"
"Cola1","Cola2","Cola3","COLa4","Cola5"
"Colb1","Colb2","Colb3","COLb4","Colb5"
"Colc1","Colc2","Colc3","COLc4","Colc5"

Open in new window

Here's the script:
$Content = Get-Content C:\Temp\test.csv
$Content[0] = $Content[0].Replace("DestinationLogonId", "DestLogonId")
$Csv = $Content | ConvertFrom-Csv | Select-Object -Property * -ExcludeProperty "DestinationLogonID", "DestLogonId"
$Csv | Export-Csv C:\Temp\Outfile.csv -NoTypeInformation

Open in new window

And here's the content of Outfile.csv:
"Header1","Header2","Header5"
"Cola1","Cola2","Cola5"
"Colb1","Colb2","Colb5"
"Colc1","Colc2","Colc5"

Open in new window

I'll be offline for a few hours, so I'm not ignoring you if you don't hear from me ...
Hmmmm. Here's my information. As you can see there are a LOT of headers.
Header line from csv -> Event Name      EventInfo      InsertionIP      Manager      DetectionIP      InsertionTime      DetectionTime      Severity      ToolAlias      InferenceRule      ProviderSID      ExtraneousInfo      UniqueID      AccessGranted      AccessProperties      AccessRequested      AccessTarget      ActionTaken      AgentAddress      AgentUid      AgentUniqueId      AgentVersion      AlertActivityType      ApplicationName      ApplicationVersion      AuthPackage      BIOSVersion      BootCleaned      BootSectorInfected      BootSectorsScanned      Category      ChangeDetails      ChangeType      Component      ConnectionName      ConnectionStatus      DN      Description      DestinationAccount      DestinationAccountType      DestinationDomain      DestinationDomainID      DestinationDomainType      DestinationHandleID      DestinationLogonID      DestinationLogonId      DestinationMACAddress      DestinationMachine      DestinationPort      DestinationProcessID      Detail      DisplayName      DomainMember      DomainName      DumpFile      ElementName      ElementProperty      EventMessage      FailureCount      FailurePoint      FailureReason      FastPack      FeaturePackage      FileHandleID      FileName      FilesCleaned      FilesDeleted      FilesInfected      FilesLeftAlone      FilesQuarantined      FilesScanned      Filter      GroupMember      GroupName      GroupType      ImageFile      InfectedFile      InfoMessage      InformationName      InstallOperation      Interface      Interfaces      InternalUserName      LicenseMaximum      LicensedTo      LinkName      ListName      Locale      LogonProcess      LogonType      Machine      MachineType      ManagerVersion      Manufacturer      MechanismName      MechanismType      MemberID      MemberName      MemoryCleaned      MemoryInfected      Message      ModType      Mode      Model      Modification      ModificationType      NewPropertyValue      NewVersion      ObjectHandleID      ObjectName      ObjectServer      ObjectType      OldPropertyValue      OldVersion      OperationID      OperationType      OriginalAccount      OriginalAccountID      PackVersion      Parameters      ParentPID      PatchId      PeerIdentity      PolicyName      Port      PrimaryActionAttempt      PrivilegesExercised      PrivilegesUsed      ProcessID      ProductId      ProfileConfiguration      ProfileName      Protocol      QueryCommand      References      RegistryKeyHandleID      RegistryKeyName      RiskFactor      ScanMessage      ScanType      SecondaryActionAttempt      Service      ServiceName      ServingProcess      SetCommand      SignatureName      SoftwareDate      SoftwarePackage      SoftwareSource      SourceAccount      SourceDomain      SourceHandleID      SourceLogonID      SourceMACAddress      SourceMachine      SourcePort      SourceProcessID      StackTrace      StartMessage      StatusLevel      StatusMessage      StopCondition      StopMessage      SuggestedSolution      TargetMachineList      ToolID      TrojanName      URL      Version      VirusDetected      VirusName      WarningMessage

Code at this point ->
      $FileContent = Get-Content .\vpn-infile.csv
      $FileContent[0] = $FileContent[0].Replace("DestinationLogonid", "DestLogonid")
      $Csv = $FileContent | ConvertFrom-Csv | Select-Object -Property * -ExcludeProperty "DestinationLogonID", "DestLogonId"
      Export-CSV .\vpn-outfile2.csv -notype

Results as before A1 = Length, A2 = 15
Got it. missing a pipe before export-csv.
It actually looks like it deleted both columns. Not a problem but I will need to look more closely at the output. Will save that for tomorrow.

So far Thanks.
You excluded both columns with your Select statement.

It may be unnecessary to post this, since oDbA's solution might do everything you need, but I had started it already, so...
Here's something that doesn't require you to know the name of the duplicate headers.  It just appends a "0" to each duplicate header it finds.  Because it works recursively, it doesn't care how many duplicates there are, it'll just keep appending more "0"s.  Not real pretty, but should handle whether your fields in the .CSV are surrounded by quotes or not.
function unique-header ($header)
{
    if ($headers -contains $header)
    {
        $header = """$($header -replace '"')$i"""
        unique-header $header 
    }
    Else
    {
        $Global:headers += $header
        $header
    }
}
$data = Get-Content  somefile.csv
$data[0] = ($data[0] -split "," | % -begin {$headers = @(); $i=0 } -process { unique-header $_ }) -join ","
$data | ConvertFrom-Csv

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of oBdA
oBdA

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thanks to both oBdA and footech. I'll work with both of these and see which will work better.
-oBda - I see now. What I didn't understand was the -ExcludeProperty statement. I've been working with Powershell for about a year now but still have much to learn.

-footech- I like the method you have presented as I don't know if any of these columns will ever have anything that I need.

I'll work with both these ideas and let everyone know later today.

Thanks.
-footech- Sorry but still same problem;
ConvertFrom-Csv : The member "DestinationLogonId" is already present.
At E:\Scripts\VPN Report\Get-VPNReport.ps1:90 char:9
+ $data | ConvertFrom-Csv | Export-CSV .\vpn-outfile2.csv -notype
+         ~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [ConvertFrom-Csv], ExtendedTypeSystemException
    + FullyQualifiedErrorId : AlreadyPresentPSMemberInfoInternalCollectionAdd,Microsoft.PowerShell.Commands.ConvertFro
   mCsvCommand
Hmmm...  I can't replicate your error with my code.  I even tested with the big list that you have above.
I did make one change to handle quotes better.
function unique-header ($header)
{
    if ($headers -contains $header)
    {
        $noquote = $header -replace '"'
        $header = $header -replace "$noquote","$noquote$i"
        unique-header $header
    }
    Else
    {
        $Global:headers += $header
        $header
    }
}

Open in new window


One thing it wouldn't handle would be if your .CSV wasn't using a comma as a delimiter, or if there were commas inside a field.
Thanks, I'll try it out.
Thank you to both of you but in the end I've used oDbA solution as it worked out better for my situation. Definitely opened up possibilities with the .Replace object as I've never seen that before.

Again thank you both.
Richard