Solved

Matching a random pattern with one common character

Posted on 2016-11-18
Last Modified: 2016-11-22
Hi, I have a file with a large number of character conversion errors: all non-ASCII characters were converted to question marks ("?"), so there are many instances of strings such as Jos?, Company?s, ???????ahs-dhdh, The???hdh--dhd?, etc. The length of each string varies, as does the number of question marks in it.

Is there a regular expression (or set of expressions) I can use in a Perl script that will match any string with x number of characters and at least one question mark in it? Thanks
Question by:hadrons

Accepted Solution

by:jmcg (earned 500 total points)
ID: 41893815
Perhaps this little snippet will get you started.

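use strict;
use warnings;

# Report whether each test string contains at least one literal question mark.
my @TestStrings = ("NoMatch", "Jos?", "Company?s", "???????ahs-dhdh", "The???hdh--dhd?");
for my $str (@TestStrings) {
        printf "%s: %s\n", $str, ($str =~ m/\?/ ? "matched" : "not matched");
}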



The part of your question I'm not sure I understand is the "x number of characters". With the above script, you decide how to split your data into individual strings, then check each one for whether or not it contains a question mark. The results look like:
NoMatch: not matched
Jos?: matched
Company?s: matched
???????ahs-dhdh: matched
The???hdh--dhd?: matched
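
If "x number of characters" means the string length has to fall within some range, one possible reading is to combine the question-mark test with a plain length check. The following is only a sketch under that assumption; the helper name has_qmark_in_range and the bounds 4 and 9 are made up for illustration and are not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $s contains at least one "?" and its
# length is between $min and $max characters (inclusive), otherwise 0.
sub has_qmark_in_range {
    my ($s, $min, $max) = @_;
    my $len = length $s;
    return ($s =~ /\?/ && $len >= $min && $len <= $max) ? 1 : 0;
}

# The bounds 4 and 9 are arbitrary example values.
for my $s ("NoMatch", "Jos?", "Company?s", "???????ahs-dhdh") {
    printf "%s: %s\n", $s,
        has_qmark_in_range($s, 4, 9) ? "matched" : "not matched";
}
```

Keeping the length test outside the regular expression makes the bounds easy to change; if the length does not matter at all, the plain m/\?/ test above is enough.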



Another approach might look something like the following:
# Extract every run of word characters, "?" and "-" that contains at least one "?".
my $TestData = "NoMatch Jos? Company?s ???????ahs-dhdh The???hdh--dhd?";
for my $hit ($TestData =~ m/([\w\?\-]*\?[\w\?\-]*)/g) {
        printf "%s: %s\n", $hit, "matched";
}


In this case, you're pulling the strings of interest out of a large batch of data. The character class [\w\?\-] can be expanded if there are other characters you want considered part of your strings. Here the results leave out the first, non-matching string and look like:
Jos?: matched
Company?s: matched
???????ahs-dhdh: matched
The???hdh--dhd?: matched
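
Rather than listing every allowed character, a broader variant is to treat any run of non-whitespace characters as one string by using \S. The sketch below assumes the strings in the file are separated by whitespace; the helper name damaged_tokens and the sample text are invented for illustration, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every whitespace-delimited token in $text
# that contains at least one "?".
sub damaged_tokens {
    my ($text) = @_;
    my @hits = $text =~ /(\S*\?\S*)/g;
    return @hits;
}

# Made-up sample data; open a real file here instead.
my $sample = "Jos? works at Company?s office\n"
           . "No damage here\n"
           . "The???hdh--dhd? and caf?\n";

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
while (my $line = <$fh>) {
    # $. holds the current line number of the last-read filehandle.
    printf "line %d: %s\n", $., $_ for damaged_tokens($line);
}
close $fh;
```

Note that this also catches genuine question marks, such as a word that ends a question, so the hits still need a manual review.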


I don't envy whoever has the task of trying to make sensible back-substitutions for the lost characters.
 

Author Comment

by:hadrons
ID: 41897954
Hi, a few days ago I thought I had clicked the best-solution button, but it may not have gone through. The solution worked great; thanks.