Remove single characters, extra spaces, insert brackets around keyword

Posted on 2010-09-05
Last Modified: 2012-05-10
I'm looking for a way to remove single characters, extra spaces and insert brackets around keywords:

a word a word
b words words
words z words
z z z words
i words i words etc

Output I'm interested in:
word word
words words
words words
words words
Question by: faithless1

Expert Comment

ID: 33608218
I made this little script. It seems to work, though it is in Python.

import sys

# Print every word longer than one character, separated by single spaces.
with open("a.txt", "r") as hFile:
    for line in hFile:
        iCnt = 0    # length of the word being collected
        iLast = ""  # the word being collected
        for by in line + " ":  # trailing space flushes the last word
            if by == " " or by == "\n":
                if iCnt > 1:
                    sys.stdout.write(iLast + " ")
                iCnt = 0
                iLast = ""
            else:
                iLast += by
                iCnt += 1
        sys.stdout.write("\n")

Accepted Solution

ozo earned 500 total points
ID: 33608238
I don't see any brackets in the output you say you are interested in.
perl -lpe 's/etc//g;s/\b\w\b//g;s/\s+/ /g;s/^\s+//;s/\s+$//;' << ENDHERE
a word a word
b words words
words z words
z z z words
i words i words etc
ENDHERE
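
For readers who do not use Perl, the substitutions in the one-liner above can be sketched in Python with the standard re module. The function name clean_line is made up for this illustration; the patterns are the ones from the one-liner, applied to one line at a time as perl -p does.

```python
import re

def clean_line(line):
    """Apply the same substitutions as the Perl one-liner to a single line."""
    line = re.sub(r'etc', '', line)     # s/etc//g: drop the literal "etc"
    line = re.sub(r'\b\w\b', '', line)  # s/\b\w\b//g: drop one-character words
    line = re.sub(r'\s+', ' ', line)    # s/\s+/ /g: collapse whitespace runs
    return line.strip()                 # s/^\s+//; s/\s+$//: trim both ends

# Sample input taken from the question.
sample = [
    "a word a word",
    "b words words",
    "words z words",
    "z z z words",
    "i words i words etc",
]

for line in sample:
    print(clean_line(line))
```

Note that the first pattern removes "etc" wherever it occurs, including inside longer words, so it may need word boundaries if the real data contains such words.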

Expert Comment

ID: 33609160
This procedure takes a line and prints all the words with length > 1:

sub SplitString {

      my ($line) = shift;

      # split(' ', ...) skips leading whitespace and splits on runs of whitespace
      my @myStrings = split(' ', $line);

      foreach (@myStrings) {
            my $lineLen = length($_);
            if ($lineLen > 1) {
                  print $_ . ' ';
            }
      }
      print "\n";
}

The only open issue is what you mean by brackets, since you say to insert them around keywords. If so, you might need to handle the keywords and wrap them as you iterate.
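
The wrapping mentioned above might look roughly like this in Python. The question never says which words count as keywords or which kind of bracket is wanted, so the KEYWORDS set, the square brackets, and the function name filter_and_wrap are all assumptions made for illustration.

```python
# Hypothetical keyword list; the question does not define one.
KEYWORDS = {"word"}

def filter_and_wrap(line, keywords=KEYWORDS):
    """Drop one-character words and wrap keywords in square brackets."""
    kept = []
    for word in line.split():        # split() also discards extra spaces
        if len(word) <= 1:
            continue                 # skip single characters
        if word in keywords:
            word = "[" + word + "]"  # bracket style is an assumption
        kept.append(word)
    return " ".join(kept)

print(filter_and_wrap("a word a  words"))  # prints: [word] words
```

Doing the filtering and the wrapping in the same loop keeps it to a single pass over each line.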

Expert Comment

ID: 33611877
To put ozo's regular expressions into PHP, it would be something like this, if $string holds one line of your original file at a time (applied to the whole file at once, the \s+ replacement would also collapse the line breaks):

$string = preg_replace('/\b\w\b/', '', $string);
$string = preg_replace('/\s+/', ' ', $string);
$string = preg_replace('/^\s+/', '', $string);
$string = preg_replace('/\s+$/', '', $string);


Expert Comment

ID: 33612408
The Python solution is just:
for line in open('data'):
  print(' '.join(word for word in line.strip().split() if len(word) > 1))



Expert Comment

ID: 33633893
I don't think your example makes the question very clear. Can you give some more examples?


Question has a verified solution.
