John Darby
asked on
trim the first 2 characters from each line in a file
I have 4 types of files I receive which are outputs from a source I have no control over. Each file has in common that there are a consistent 2-4 characters in front of each line (in the 1st column for XLS docs). I want a scripted way to remove these. Can you suggest a method?
Thank you!
JohnD
Thank you!
JohnD
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Thanks folks! I don't have an example for you atm, but the extra characters seem to be random ASCII which precede the contents of each line in the docs. Usually it is 2 characters, sometimes 4 and occasionally it will be 5-9 characters, including one or more spaces, between the "garbage" characters.
The docs come from text dumps (they call them reports) from an MVS app called Universe, where one of the conversions that happens is EBCDIC-->ASCII. It's a black box to me.
The docs come from text dumps (they call them reports) from an MVS app called Universe, where one of the conversions that happens is EBCDIC-->ASCII. It's a black box to me.
i think that we really need you to prepare an example...
there must be something to figure out how many characters to delete...
How can we identify those "garbage" characters...
there is something missing in the formulation of this problem.
there must be something to figure out how many characters to delete...
How can we identify those "garbage" characters...
there is something missing in the formulation of this problem.
ASKER
I can get you an example...I just need to strip customer-specific data, first since this has payroll info in it. :)
Agree with that. We are waiting for you then.
Here is a code to select an Excel file and replace string bits from the 1st column. Note, multi-select Excel file is possible as well as the string bits (for cleaning) and Column Index (data column) is configurable.
To run the code in the attached file, press CNTRL + SHIFT + M
To run the code in the attached file, press CNTRL + SHIFT + M
Option Explicit
Private Const sJunk As String = "#$%!"
Dim sJunkFormula As String
Public Sub CleanFileData()
Dim oOut, oWB As Workbook, nCtr As Integer
On Error Resume Next
oOut = Application.GetOpenFilename("Excel Files, *.xl*;*.xls;*.xlt", 2, "Select the files to clean", , True)
If Not IsArray(oOut) Then GoTo ErrCancel
sJunkFormula = ""
For nCtr = 1 To UBound(oOut)
Set oWB = Application.Workbooks.Open(oOut(nCtr))
CleanRange oWB.Sheets(1).Range("A:A")
oWB.Save
oWB.Close
Set oWB = Nothing
Next
Exit Sub
ErrCancel:
MsgBox "No File was selected.", vbCritical
End Sub
Public Sub CleanRange(oRange As Range)
Dim nCtr As Integer, oCell As Range
Dim m_sJunkFormula As String
If sJunkFormula = "" Then
sJunkFormula = """~|~"""
For nCtr = 1 To Len(sJunk)
sJunkFormula = "SUBSTITUTE(" & sJunkFormula & ",""" & Mid(sJunk, nCtr, 1) & ""","""")"
Next
End If
Set oRange = Application.Intersect(oRange, oRange.Worksheet.UsedRange)
oRange.NumberFormat = "General"
'oCell.Formula = "=" & Replace(sJunkFormula, "~|~", Replace(oRange.Cells(1).Address, "$", ""))
For Each oCell In oRange.Cells
m_sJunkFormula = "=" & Replace(sJunkFormula, "~|~", oCell.Value)
oCell.Value = Application.Evaluate(m_sJunkFormula)
Next
End Sub
CleanFiles.xlsm
John
If you can open these files in Excel and all you want to do is remove the first 2 characters of each line you could use Data>Text to columns... with fixed width.
You would only need one break line after the characters you don't need and you can chose not to import that column on the 3rd step.
If it isn't going to be the first 2 characters each time then you could create code that does the text to columns for however many characters it is.
Here's the code for 2 characters.
Selection.TextToColumns Destination:=Range("A1"), DataType:=xlFixedWidth, _
FieldInfo:=Array(Array(0, 9), Array(2, 1)), TrailingMinusNumbers:=True
The Array... part breaks down to this.
Array(0,9) - 1st column, do not import
Array(2,1) -2nd column, import starting at character 2
Note the character position starts at 0.
So for 3 characters you would change Array(2,1) to Array(3,1), for 4 Array(4,1)... and so on.
This would work if you can open the file in Excel and determine the no of characters to ignore.
If you can open these files in Excel and all you want to do is remove the first 2 characters of each line you could use Data>Text to columns... with fixed width.
You would only need one break line after the characters you don't need and you can chose not to import that column on the 3rd step.
If it isn't going to be the first 2 characters each time then you could create code that does the text to columns for however many characters it is.
Here's the code for 2 characters.
Selection.TextToColumns Destination:=Range("A1"), DataType:=xlFixedWidth, _
FieldInfo:=Array(Array(0, 9), Array(2, 1)), TrailingMinusNumbers:=True
The Array... part breaks down to this.
Array(0,9) - 1st column, do not import
Array(2,1) -2nd column, import starting at character 2
Note the character position starts at 0.
So for 3 characters you would change Array(2,1) to Array(3,1), for 4 Array(4,1)... and so on.
This would work if you can open the file in Excel and determine the no of characters to ignore.
ASKER
Guys, I am having a hard time getting you a sample file. The problem lies with how much data (SSN, names, addresses, policy number...) is sensitive. I think I just have to be thnkful for the help you've given already. Thank you so much!
JohnD
JohnD
ASKER
Thank you again!
=RIGHT(A2,LEN(A2)-2)
Also, you say between 2-4 characters. What is the logic behind removing 2 or 4?