?
Solved

Matching a random pattern with one common character

Posted on 2016-11-18
2
Medium Priority
?
150 Views
Last Modified: 2016-11-22
Hi, I have a file with a large number of character conversion errors and all non-ASCII characters were converted to question marks - "?" - so there are a number of instances of such strings as: Jos?, Company?s, ???????ahs-dhdh, The???hdh--dhd?, etc. The length of the string will vary along with the number of questions mark in it

Is there a regular expression(s) I can use in a Perl script that will match any string with x number of characters and at least one question mark or more in it? Thanks
0
Comment
Question by:hadrons
2 Comments
 
LVL 20

Accepted Solution

by:
jmcg earned 2000 total points
ID: 41893815
Perhaps this little snippet will get you started.

my @TestStrings = ("NoMatch", "Jos?", "Company?s", "???????ahs-dhdh", "The???hdh--dhd?");
for (@TestStrings) {
        printf "%s: %s\n", $_, ($_ =~ m/\?/ ? "matched" : "not matched");
        }

Open in new window


The part of your question I'm not sure I'm understanding properly is the "x number of characters". Using the above script, you can decide what divides strings into strings, then check each one for whether or not it contains a question mark. The results look like:
NoMatch: not matched
Jos?: matched
Company?s: matched
???????ahs-dhdh: matched
The???hdh--dhd?: matched

Open in new window


Another approach might look something like the following:
my $TestData = "NoMatch Jos? Company?s ???????ahs-dhdh The???hdh--dhd?";
for ($TestData =~ m/([\w\?\-]*\?[\w\?\-]*)/g) {
        printf "%s: %s\n", $_, "matched";
        }

Open in new window

In this case, you're pulling out the strings of interest from a large batch of data. The character class [\w\?\-] can be expanded if there are other characters you want considered part of your strings. In this case, the results leave out that first non-matched string and look like:
Jos?: matched
Company?s: matched
???????ahs-dhdh: matched
The???hdh--dhd?: matched

Open in new window

I don't envy whoever has the task of trying to make sensible back-substitutions for the lost characters.
0
 

Author Comment

by:hadrons
ID: 41897954
Hi, I few days I thought I hit the best solutions button, but it may not have went thru, but the solution worked great; thanks
0

Featured Post

Upgrade your Question Security!

Your question, your audience. Choose who sees your identity—and your question—with question security.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

by Batuhan Cetin Regular expression is a language that we use to edit a string or retrieve sub-strings that meets specific rules from a text. A regular expression can be applied to a set of string variables. There are many RegEx engines for u…
Whatever be the reason, if you are working on web development side,  you will need day-today validation codes like email validation, date validation , IP address validation, phone validation on any of the edit page or say at the time of registration…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Six Sigma Control Plans
Suggested Courses

601 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question