Solved

Removing Extra Line Breaks from a field in CSV file

Posted on 2011-03-02
27
2,049 Views
Last Modified: 2013-11-15
I have a CSV file that was exported from a Goldmine database and I am attempting to import into another Goldmine database.

There are roughly 30ish fields that are exported with field headers.

One of the fields in NOTES.  In the notes field, there are line breaks/returns that break up having each record on its own row. I have to bring into Crystal Reports and then export to be "Goldmine" Friendly.

The problem is that it's reading these line breaks and screwing up all the data - there are roughly 90kish records so this isn't something I can go and fix manually.

How can I go about removing the line breaks just in this NOTES field before bringing it into Crystal Reports?

NOTE ADDITIONAL: This notes field has html code for formatting that the version of Goldmine I am importing into can't support anyway.  If I have to delete the NOTES column altogether I might have to, but I have no way of accessing that column due to the fact the line breaks push data into other columns!
0
Comment
Question by:webdevsquare
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 8
  • 5
  • 5
  • +3
27 Comments
 
LVL 40

Expert Comment

by:lcohan
ID: 35020710
Why not export the data again into a new csv file that does not include the "notes" column?
0
 
LVL 40

Expert Comment

by:lcohan
ID: 35020746
0
 
LVL 12

Expert Comment

by:Amick
ID: 35020805
In Windows PowerShell try  Get-Content .\MyDocument.txt | %{ $_.Replace("`n",''); }
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 11

Accepted Solution

by:
stevengraff earned 375 total points
ID: 35020825
Other options:

1. Why export/import? Are you changing GoldMine versions? Otherwise you could simply copy/paste and re-attach.
2. If you really must export/import, use dbf, not csv or ascii. Text-based "databases" are the least manageable, least reliable to work with.
0
 

Author Comment

by:webdevsquare
ID: 35021022
@stevengraff yes they are different Goldmine Versions and I don't have access to original source any more

@icohan I can't get access to source anymore.  On those utilities, how does it know not to replace the line break where it needs to be: at the end of the record only?

@Amick, I have no idea what windows Powershell is
0
 
LVL 40

Expert Comment

by:lcohan
ID: 35021104
True - you don't know what it would replace however if you have the CSV file only and can't re-export then I would just open it with EXCEL or OpenOffice and delete the "notes" column entirely if data is no use anyway!
0
 
LVL 40

Expert Comment

by:lcohan
ID: 35021111
Obviously....save a origunal copy somehere safe before any edits to the file. Sorry I missed that
0
 
LVL 11

Expert Comment

by:stevengraff
ID: 35021113
If you import it into Excel... doesn't Excel prompt you to identify field and record delimiters?
0
 

Author Comment

by:webdevsquare
ID: 35021127
That sounds good, except I cant open with Excel/Open Office accurately. Everything gets jumbled in other columns.

when I open in Wordpad and remove the line breaks in the notes column, I'm golden. But cant do that for everyone - there are 90k records!
0
 
LVL 11

Assisted Solution

by:stevengraff
stevengraff earned 375 total points
ID: 35021156
Ouch!!! I'm sorry.

Hey, check out GoldBox. I'm not sure, specifically, what it can do with that particular csv, but one of its purposes in life is to import data into GoldMine. It's no longer supported, but I know you can still buy it. It's either at the add-on store or at redstonesoftbase.com.
0
 
LVL 12

Expert Comment

by:Amick
ID: 35021180
@webdevsquare
Windows PowerShell is Microsoft's task automation framework.

From the Windows command prompt type powershell to access the tool.
0
 
LVL 40

Expert Comment

by:lcohan
ID: 35021282
Did you tried Open Office?

http://download.openoffice.org/
0
 
LVL 13

Assisted Solution

by:Superdave
Superdave earned 125 total points
ID: 35021410
The possibilities of how it is supposed to work in the first place are:
1.  Use linefeed character within the field, but carriage-return linefeed combination at the end of lines.  Excel does this, so if that didn't work, either Goldmine is using a different convention, or your file got mangled by some piece of software trying to fix line endings.

2.  Maybe the fields are all in quotation marks, in which case you will have to figure out how to use a search-and-replace tool (powershell or sed or whatever) to only replace line endings after an odd number of quotes on a line.
0
 
LVL 11

Expert Comment

by:stevengraff
ID: 35021506
MS Word is pretty good and doing global search and replace on control characters, isn't it?
0
 
LVL 101

Expert Comment

by:mlmcc
ID: 35021530
Are you trying to eliminate the move to Crystal step?

mlmcc
0
 

Author Comment

by:webdevsquare
ID: 35021587
@mlmcc I use Crystal because Goldmine needs the quotations around every field and from what I know, Excel won't do that for me automatically (could be wrong)

@superdave  when I open with excel, it jumbles. When I use the Import Data feature, I set it as comma delimited. where is the carriage return info?

0
 
LVL 101

Expert Comment

by:mlmcc
ID: 35021615
Can you create a small CSV file that shows the issue.

I assume by using Crystal you can massage the CSV file so it works on input.

mlmcc
0
 

Author Comment

by:webdevsquare
ID: 35021668
sampledb.txt

Check it out.
0
 

Author Comment

by:webdevsquare
ID: 35022068
Any thoughts guys?
0
 
LVL 13

Expert Comment

by:Superdave
ID: 35022255
It has CRLF everywhere but all fields are in quotes, so basically, if there are an odd number of quotation marks on a line, append the next line to get rid of the interior line feed.  Then repeat the process starting with the same line.

I could figure out how to do it in the Vim editor or in Python or Perl; you could probably use Powershell too but I don't know about that.
0
 

Author Comment

by:webdevsquare
ID: 35022379
@superdave, thanks for the feedback.  I see what you mean about the odd number of quotation marks logic.  Are any of the other editors you mentioned (free) and easy to download/use?
0
 
LVL 13

Expert Comment

by:Superdave
ID: 35022432
Vim is an editor; the other two are scripting languages.  They're all free and available at the obvious www.*.org URLs.
0
 
LVL 12

Expert Comment

by:Amick
ID: 35023541
Because the account number appears to be a distinct pattern, perhaps that's where the logic should be applied. Just keep appending segments together until the beginning of the segment matches the account number pattern.
0
 

Author Comment

by:webdevsquare
ID: 35023579
I appreciate all the feedback, but my strength is definitely not in figuring out a new editor and applying this logic.  
0
 
LVL 11

Assisted Solution

by:stevengraff
stevengraff earned 375 total points
ID: 35023630
Clicking one's profile may lead to contact information. I'd love to take a crack at it. Post a snippet if you'd like.
0
 

Author Comment

by:webdevsquare
ID: 35023651
@mlmcc didn't know. thanks.
0

Featured Post

[Live Webinar] The Cloud Skills Gap

As Cloud technologies come of age, business leaders grapple with the impact it has on their team's skills and the gap associated with the use of a cloud platform.

Join experts from 451 Research and Concerto Cloud Services on July 27th where we will examine fact and fiction.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

When it comes to protecting Oracle Database servers and systems, there are a ton of myths out there. Here are the most common.
This post looks at MongoDB and MySQL, and covers high-level MongoDB strengths, weaknesses, features, and uses from the perspective of an SQL user.
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
This is a high-level webinar that covers the history of enterprise open source database use. It addresses both the advantages companies see in using open source database technologies, as well as the fears and reservations they might have. In this…

623 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question