
Java ReplaceAll Control Characters

Is there a better way to replace all control characters in a String in Java than the code below?
I am sure there is some better regex I could use.
product.getDescription().replaceAll("\\v", " ").replaceAll("\\c_","").replaceAll("\\c]","")
.replaceAll("\\c^", "").replaceAll("\\cA", "").replaceAll("\\cB","").replaceAll("\\cC","")
.replaceAll("\\cD", "").replaceAll("\\cE", "").replaceAll("\\cF","").replaceAll("\\cG","")
.replaceAll("\\cH","").replaceAll("\\cI","").replaceAll("\\cJ","").replaceAll("\\cK","")
.replaceAll("\\cL","").replaceAll("\\cM","").replaceAll("\\cN","").replaceAll("\\cO","")
.replaceAll("\\cP","").replaceAll("\\cQ","").replaceAll("\\cR","").replaceAll("\\cS","")
.replaceAll("\\cT","").replaceAll("\\cU","").replaceAll("\\cV","").replaceAll("\\cW","")
.replaceAll("\\cX","").replaceAll("\\cY","").replaceAll("\\cZ","");
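For reference, in java.util.regex `\cX` is "the control character corresponding to X". Assuming the usual caret-notation mapping (X XOR 0x40, so `\cA` is U+0001 and `\c_` is U+001F), the rough sketch below checks which `\p{Cntrl}` characters the chain above never matches. `ChainCoverage` and `missed()` are made-up names, and the leading `\v` replacement is left out of the check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ChainCoverage {
    // The characters X used in the question's \cX escapes (the \v replacement is left out)
    static final String ESCAPE_LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ]^_";

    // Returns the \p{Cntrl} code points that none of the question's \cX escapes match
    static List<Integer> missed() {
        List<Integer> result = new ArrayList<>();
        for (int cp = 0; cp <= 0x7F; cp++) {
            String ch = String.valueOf((char) cp);
            if (!Pattern.matches("\\p{Cntrl}", ch)) {
                continue; // only look at ASCII control characters
            }
            boolean covered = false;
            for (char x : ESCAPE_LETTERS.toCharArray()) {
                // Builds the regex \cX, e.g. \cA for U+0001
                if (Pattern.matches("\\c" + x, ch)) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int cp : missed()) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```

If the assumption holds, this prints U+0000, U+001B, U+001C and U+007F: NUL, ESC, FS and DEL slip through the chain, which is one more reason to use a single character class instead.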

Asked by: booktopia
 
objects commented:
There are various predefined character classes available.
Check the Javadoc to see which one matches your needs:

http://download.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
 
for_yan commented:

This one is about removing all control characters in PHP; perhaps it could help:
http://stackoverflow.com/questions/1497885/remove-control-characters-from-php-string
 
objects commented:
E.g.:

\p{Cntrl}      A control character: [\x00-\x1F\x7F]

Or, if none of the classes meets your needs, use a range.
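A small sketch (the class and method names are hypothetical) that checks the claim above: in Java's default matching mode, `\p{Cntrl}` and the explicit range `[\x00-\x1F\x7F]` pick out exactly the same characters, at least over U+0000 to U+00FF.

```java
import java.util.regex.Pattern;

public class CntrlVsRange {
    // POSIX-style class; ASCII-only unless UNICODE_CHARACTER_CLASS is set
    static final Pattern CNTRL = Pattern.compile("\\p{Cntrl}");
    // The same set spelled out as an explicit range
    static final Pattern RANGE = Pattern.compile("[\\x00-\\x1F\\x7F]");

    static String stripCntrl(String s) { return CNTRL.matcher(s).replaceAll(""); }
    static String stripRange(String s) { return RANGE.matcher(s).replaceAll(""); }

    public static void main(String[] args) {
        // Compare both patterns on every character from U+0000 to U+00FF
        for (int cp = 0; cp <= 0xFF; cp++) {
            String ch = String.valueOf((char) cp);
            if (CNTRL.matcher(ch).matches() != RANGE.matcher(ch).matches()) {
                throw new AssertionError("mismatch at U+" + Integer.toHexString(cp));
            }
        }
        // Tab, BEL and DEL are all removed
        System.out.println(stripCntrl("a\tb\u0007c\u007Fd")); // abcd
    }
}
```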
 
for_yan commented:
This seems to work with those control characters I was able to check:

 System.out.println(s.replaceAll("[\\p{Cntrl}]", ""));
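`String.replaceAll(regex, repl)` is specified to give the same result as `Pattern.compile(regex).matcher(str).replaceAll(repl)`, so it compiles the pattern on every call. If many descriptions are cleaned in a loop, a sketch like this one compiles it once (the names `StripControlChars` and `strip` are made up). The square brackets are also not needed, because `\p{Cntrl}` is already a character class.

```java
import java.util.regex.Pattern;

public class StripControlChars {
    // Compiled once and reused for every call
    private static final Pattern CONTROL = Pattern.compile("\\p{Cntrl}");

    static String strip(String s) {
        return CONTROL.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String description = "Line one\r\nTab\there\u0001\u001F end";
        // CR, LF, tab, U+0001 and U+001F are all removed
        System.out.println(strip(description)); // Line oneTabhere end
    }
}
```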
 
CEHJ commented:
You really need to define exactly what you want: some characters are unprintable but are NOT control characters.
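To illustrate the point, here is a sketch (hypothetical names) that contrasts `\p{Cntrl}`, which covers ASCII controls only, with the Unicode general categories `\p{Cc}` (controls, including C1 controls such as U+0085 NEXT LINE) and `\p{Cf}` (invisible format characters such as U+200B ZERO WIDTH SPACE, which is unprintable but not a control character).

```java
import java.util.regex.Pattern;

public class BeyondCntrl {
    // Default (non-Unicode) mode: ASCII controls only, [\x00-\x1F\x7F]
    static final Pattern ASCII_CNTRL = Pattern.compile("\\p{Cntrl}");
    // Unicode category Cc: ASCII controls plus the C1 controls U+0080 to U+009F
    static final Pattern CC = Pattern.compile("\\p{Cc}");
    // Cc plus category Cf (invisible format characters)
    static final Pattern CC_OR_CF = Pattern.compile("[\\p{Cc}\\p{Cf}]");

    static String strip(Pattern p, String s) {
        return p.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // U+0001 is an ASCII control, U+0085 a C1 control, U+200B a format character
        String s = "a\u0001b\u0085c\u200Bd";
        System.out.println(strip(ASCII_CNTRL, s).length()); // 6
        System.out.println(strip(CC, s).length());          // 5
        System.out.println(strip(CC_OR_CF, s));             // abcd
    }
}
```

Whether removing `Cf` is appropriate depends on the text: the category also contains the soft hyphen U+00AD and the bidirectional marks used in right-to-left text.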
 
for_yan commented:

This really seems to work:
replaceAll("[\\000-\\037].*?","")

            String s = "a\011tr\\\020?,\026c\013/\023ab" + '\024' + '\025' +"!\003df";
            System.out.println("s (original):  " + s);
            System.out.println("s (processed): " +  s.replaceAll("[\\000-\\037].*?",""));


Output:

s (original):  a	tr\?,c/ab!df
s (processed): atr\?,c/ab!df
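One observation about the pattern above: the trailing `.*?` is reluctant and nothing follows it, so it always matches the empty string, and `[\\000-\\037]` on its own should give the same result. A quick sketch (the class name is hypothetical) that compares both forms on the same test string:

```java
public class LazySuffixCheck {
    // for_yan's pattern: one character in \000-\037 (octal) followed by a reluctant .*?
    static final String WITH_LAZY = "[\\000-\\037].*?";
    // The same character class with the suffix dropped
    static final String PLAIN = "[\\000-\\037]";

    public static void main(String[] args) {
        String s = "a\011tr\\\020?,\026c\013/\023ab" + '\024' + '\025' + "!\003df";
        String withLazy = s.replaceAll(WITH_LAZY, "");
        String plain = s.replaceAll(PLAIN, "");
        // .*? with nothing after it settles on the empty string, so the results are identical
        System.out.println(withLazy.equals(plain)); // true
        System.out.println(plain);                  // atr\?,c/ab!df
    }
}
```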

 
for_yan commented:
Note that this
replaceAll("[\\000-\\037].*?","")
will also remove tabs and new lines.

If we want to keep those, we should rather use this:
s.replaceAll("[\\000-\\010\\013\\014\\016-\\037].*?","")

code:
            String s = "a\011tr\t\\\020?,\026c\013/\023ab" + "\015\012" /*System.getProperty("line.separator")*/ +'\024' + '\025' + "!\003df";
            System.out.println("s (original):  " + s);
            System.out.println("s (processed): " +  s.replaceAll("[\\000-\\010\\013\\014\\016-\\037].*?",""));


output:
s (original):  a	tr	\?,c/ab
!df
s (processed): a	tr	\?,c/ab
!df
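An alternative to listing octal ranges is Java's character class intersection (`&&`): take `\p{Cntrl}` and subtract tab, LF and CR. A sketch with hypothetical names is below. Note one behavioural difference: `\p{Cntrl}` includes DEL (`\177`), which the octal range above leaves in place.

```java
import java.util.regex.Pattern;

public class KeepWhitespace {
    // Character class intersection: every \p{Cntrl} character except tab, LF and CR.
    // Unlike [\000-\010\013\014\016-\037], this also removes DEL (\177).
    static final Pattern CONTROL_EXCEPT_TAB_NL =
            Pattern.compile("[\\p{Cntrl}&&[^\\t\\n\\r]]");

    static String strip(String s) {
        return CONTROL_EXCEPT_TAB_NL.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String s = "a\ttr\\\020?,\026c\013/\023ab\r\n!\003df\177";
        String cleaned = strip(s);
        // Tab and CRLF survive; \020, \026, \013, \023, \003 and DEL are gone
        System.out.println(cleaned.equals("a\ttr\\?,c/ab\r\n!df")); // true
    }
}
```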

 
CEHJ commented:
Char cleaning will also depend on your default charset, which can be determined by executing
System.out.println(System.getProperty("file.encoding"));

 
objects commented:
split 35489937 35495864
 
for_yan commented:
Objects,

They selected this answer because it actually works.
I spent half a day on this question: testing, figuring things out, actually running all the variants and comparing the results.

And you should stop your senseless attacks, which only distract people from the business at hand.

 
 
booktopia (Author) commented:
Thank you Yan for putting it so elegantly.
 
for_yan commented:
Thanks a lot, booktopia, I appreciate it.
Most of the other authors unfortunately do not dare to tell the truth.
 
CEHJ commented:
As I mentioned at http:#35496625, the identification of control characters depends on the character encoding. While it's safer to make assumptions about the lower end of the 8-bit range, it's not as safe once you get to unprintable characters elsewhere in the range.
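A small sketch of the encoding point (the class and method names are hypothetical): the same byte 0x85 decodes to U+0085 NEXT LINE, a C1 control, under ISO-8859-1, but is malformed on its own under UTF-8, where `new String(byte[], Charset)` substitutes the replacement character U+FFFD. Under windows-1252 the same byte is the ellipsis character U+2026. Decoding with an explicitly chosen charset, rather than relying on `file.encoding`, at least makes the outcome predictable.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class EncodingMatters {
    // Unicode category Cc: C0 controls, DEL and the C1 controls U+0080 to U+009F
    static final Pattern CC = Pattern.compile("\\p{Cc}");

    // Decode with an explicitly chosen charset rather than the platform default
    static String decodeAndStrip(byte[] raw, Charset cs) {
        return CC.matcher(new String(raw, cs)).replaceAll("");
    }

    public static void main(String[] args) {
        byte[] raw = {'a', (byte) 0x85, 'b'};
        // ISO-8859-1 maps byte 0x85 to U+0085 (NEXT LINE), a C1 control, so it is stripped
        System.out.println(decodeAndStrip(raw, StandardCharsets.ISO_8859_1)); // ab
        // In UTF-8 a lone 0x85 is malformed; new String(...) substitutes U+FFFD, which is not Cc
        String utf8 = decodeAndStrip(raw, StandardCharsets.UTF_8);
        System.out.println(utf8.length()); // 3
    }
}
```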