
Java regex replace

I have a string in an XML file, <Boxed_Length>   </Boxed_Length>, and I need a regular expression in Java to remove all the spaces so that it reads <Boxed_Length></Boxed_Length> instead.
Asked by: samjud
 
käµfm³d 👽 commented:
You can try:

<Boxed_Length>\s+</Boxed_Lenth>
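A minimal plain-Java sketch of that suggestion, assuming the XML is already held in a String. The class and method names are illustrative, and the closing tag is spelled Boxed_Length to match the data. Inside a Java string literal the regex token \s has to be written as \\s:

```java
public class StripBoxedLength {

    // Removes the whitespace between an opening and a closing Boxed_Length tag.
    static String strip(String xml) {
        // In Java source, "\\s" is the two characters '\' and 's', which the
        // regex engine reads as \s (any whitespace character).
        return xml.replaceAll("<Boxed_Length>\\s+</Boxed_Length>",
                              "<Boxed_Length></Boxed_Length>");
    }

    public static void main(String[] args) {
        System.out.println(strip("<Boxed_Length>   </Boxed_Length>"));
        // prints <Boxed_Length></Boxed_Length>
    }
}
```

An element that contains real data, such as <Boxed_Length>12</Boxed_Length>, is left unchanged because \s+ cannot match the digits.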


 
samjud (Author) commented:
I am running this in Talend and got an "invalid escape sequence" error, so I tried <Boxed_Length>\\s+</Boxed_Lenth>. It still does not work.
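The backslash doubling is a Java string-literal rule rather than a regex rule, assuming Talend compiles the expression as Java source (not confirmed in this thread). Before Java 15, \s is not a legal escape inside a string literal, so the compiler rejects it, while \\s stores a single backslash followed by s, which the regex engine reads as the whitespace class. From Java 15 on, \s does compile, but it means one space character, so "\s+" would silently become " +" and match only ordinary spaces. A small sketch showing what the doubled form holds at run time:

```java
import java.util.regex.Pattern;

public class EscapeDemo {

    // Written \\s in source; at run time the string holds '\' followed by 's'.
    static final String REGEX = "<Boxed_Length>\\s+</Boxed_Length>";

    public static void main(String[] args) {
        // Shows the pattern exactly as the regex engine receives it.
        System.out.println(REGEX); // prints <Boxed_Length>\s+</Boxed_Length>

        // The engine reads \s as "any whitespace character": tab, space and newline all match.
        System.out.println(Pattern.matches(REGEX, "<Boxed_Length>\t \n</Boxed_Length>")); // prints true
    }
}
```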
 
samjud (Author) commented:
Here is the full string I tried: "<Boxed_Length>\\s+</Boxed_Length>&#(.*);"
 
Terry Woods (IT Guru) commented:
Different tools and languages have different standards for regular expression syntax. The string you're using looks good for Java, but I don't know whether Talend uses the same syntax; do you know?

It's worth noting, in case you didn't notice, that kaufmed's pattern was missing the letter g from the closing tag, i.e. Boxed_Lenth should be Boxed_Length. It looks like you've fixed it in your latest pattern, but I'm pointing it out in case it makes a difference.
 
samjud (Author) commented:
As far as I know, Talend uses standard Java regular expression syntax, and yes, I did fix the spelling in my test.
 
Terry Woods (IT Guru) commented:
If it's easy and quick to change the pattern and retest, a good technique is to start with a very simple pattern like "Boxed_Length" and build up the complexity once you have the simple one working.

Alternatively, you may like to provide a copy and paste of the data you're trying to match. There might be (I'm guessing almost certainly is) something in the data that causes it to not match. It might be something as simple as a space between the > and & characters.
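One data problem worth ruling out, offered as a guess rather than a diagnosis: by default Java's \s matches only ASCII whitespace ([ \t\n\x0B\f\r]). If the "spaces" in the file are actually non-breaking spaces (U+00A0, for example after an entity such as &#160; has been decoded), \s+ will not match them. The sketch below prints the code points between the tags and shows that the embedded (?U) flag (Pattern.UNICODE_CHARACTER_CLASS) makes \s match Unicode whitespace too. The sample string and all names are hypothetical:

```java
import java.util.regex.Pattern;

public class WhitespaceCheck {

    // Default mode: \s matches only [ \t\n\x0B\f\r].
    static final String ASCII_ONLY = "<Boxed_Length>\\s+</Boxed_Length>";
    // (?U) enables UNICODE_CHARACTER_CLASS, so \s also matches U+00A0 and other Unicode spaces.
    static final String UNICODE_AWARE = "(?U)<Boxed_Length>\\s+</Boxed_Length>";

    // True if the pattern occurs anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Prints the code point of each character between the first '>' and the first "</".
    static void dumpContent(String input) {
        int start = input.indexOf('>') + 1;
        int end = input.indexOf("</");
        for (int i = start; i < end; i++) {
            System.out.printf("U+%04X%n", (int) input.charAt(i));
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: non-breaking spaces instead of ordinary spaces.
        String sample = "<Boxed_Length>\u00A0\u00A0\u00A0</Boxed_Length>";
        dumpContent(sample);                              // prints U+00A0 three times
        System.out.println(found(ASCII_ONLY, sample));    // prints false
        System.out.println(found(UNICODE_AWARE, sample)); // prints true
    }
}
```

If dumpContent prints U+0020 for every character, the whitespace is ordinary and the cause lies elsewhere.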
 
samjud (Author) commented:
Below is an example of one XML record:

<Row>
            <Item_SKU>MANOWAR16</Item_SKU>
            <Promo_Title>12" Guitar Speakers</Promo_Title>
            <QTY_on_hand>7</QTY_on_hand>
            <COST>61</COST>
            <UPC>876358001583</UPC>
            <Weight>9.9</Weight>
            <Brand>EMINENCE</Brand>
            <MSRP>89.99</MSRP>
            <UAP>89.99</UAP>
            <TOPCATEGORY>DJ,HOME,STAGE,PERSONAL,RECREATION,SCHOOL,INSTRUMENTAL,PORTABLE,CLUB</TOPCATEGORY>
            <Boxed_Length>   </Boxed_Length>
            <Boxed_Height />
            <Boxed_Width>   </Boxed_Width>
            <CCREATEDATE>2011-11-29T12:40:50.68</CCREATEDATE>
            <DATEMODIFIED>2014-1-22</DATEMODIFIED>
            <MFG_PROD_ID>MANOWAR16</MFG_PROD_ID>
            <Image_x0020_URL>/images/XYZ123/MANOWAR16.jpg</Image_x0020_URL>
            <CATEGORY>WOOFERS-GUITAR-12IN</CATEGORY>
            <MFGCOUNTRY>CHINA</MFGCOUNTRY>
            <Long_Description>SPECIFICATION    
Nominal Basket Diameter  12", 304.8mm
Nominal Impedance*  16 ohms
Power Rating    
Watts 120W
Music Program  
Resonance 102Hz
Usable Frequency Range  70Hz-5.5kHz
Sensitivity*** 101.6
Magnet Weight  38 oz.
Gap Height  0.312", 7.92mm
Voice Coil Diameter 1.75", 44.5mm
                SOUND CLIPS
THIELE &amp; SMALL PARAMETERS          Clean    Heavy    OD
Resonant Frequency (fs)  102Hz  
DC Resistance (Re)  13.1  
Coil Inductance (Le)  0.74mH      
      Download PDF Spec Sheet  
 
Mechanical Q (Qms)  12.39
Electromagnetic Q (Qes)  0.97
Total Q (Qts)  0.85
Compliance Equivalent Volume (Vas)  31.5 liters / 1.1 cu.ft.
Mechanical Compliance of Suspension (Cms)  0.08mm/N  
BL Product (BL)  16.5 T-M  
Diaphram Mass inc. Airload (Mms)  30 grams  
Efficiency Bandwidth Product (EBP)  105  
Maximum Linear Excursion (Xmax)  0.8mm  
Surface Area of Cone (Sd)  519.5 cm2  
Maximum Mechanical Limit (Xlim)    
     
MOUNTING INFORMATION      
Recommended Enclosure      
Sealed Acceptable  
Vented Acceptable  
Overall Diameter  12.02", 305.3mm  
Baffle Hole Diameter  10.97", 278.6mm  
Front Sealing Gasket  fitted as standard  
Rear Sealing Gasket  fitted as standard  
Mounting Holes Diameter  0.25", 6.4mm  
Mounting Holes B.C. D.  11.63", 295.4mm  
Depth 5.2", 132mm  
Net Weight  8.1 lbs., 3.7 kg  
Shipping Weight  9.9 lbs., 4.5 kg  
     
MATERIALS OF CONSTRUCTION      
Coil Construction  Copper voice coil  
Coil Former Polyimide former  
Magnet Composition  Ferrite magnet  
Core Details  Non-vented core  
Basket Materials  Pressed steel basket    
Cone Composition  Paper Cone  
Cone Edge Composition  Paper cone edge  
Dustcap Composition Zurette dust cap  
   
</Long_Description>
      </Row>
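Run against a record like the one above, a pattern naming only Boxed_Length would leave <Boxed_Width>   </Boxed_Width> untouched. A hypothetical generalization, not something proposed in this thread, captures the tag name in a group and requires the same name in the closing tag via the backreference \1, so every element containing only whitespace is emptied. It assumes tag names consist of word characters (\w), which holds for this record but not for all XML names (hyphens and dots are also legal):

```java
public class EmptyBlankElements {

    // Group 1 captures the tag name; \1 requires the same name in the closing tag.
    // Only ASCII whitespace is matched unless the (?U) flag is added.
    private static final String BLANK_ELEMENT = "<(\\w+)>\\s+</\\1>";

    static String emptyBlankElements(String xml) {
        // $1 in the replacement re-inserts the captured tag name.
        return xml.replaceAll(BLANK_ELEMENT, "<$1></$1>");
    }

    public static void main(String[] args) {
        // Abbreviated excerpt of the sample record.
        String record = String.join("\n",
            "<Row>",
            "  <TOPCATEGORY>DJ,HOME,STAGE</TOPCATEGORY>",
            "  <Boxed_Length>   </Boxed_Length>",
            "  <Boxed_Height />",
            "  <Boxed_Width>   </Boxed_Width>",
            "</Row>");
        System.out.println(emptyBlankElements(record));
    }
}
```

Elements with data (TOPCATEGORY), self-closing elements (Boxed_Height) and the enclosing Row are not matched, because each requires only whitespace between a matching open and close tag.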
 
CEHJ commented:
Some considerations:

a. Is this actually a Java question?
b. Why, in fact, are you concerned with that particular whitespace? It's not as if there's much of it.
 
samjud (Author) commented:
a. I think so.
b. It looks like Talend has a feature (bug): when a record in an XML file has only whitespace and no data, it breaks and does not continue. Hence the need to remove the whitespace.
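Since the underlying goal is to empty elements that contain nothing but whitespace, a regex is not the only option. As an alternative technique not proposed in this thread, the sketch below parses the XML with the JDK's DOM API, removes whitespace-only text from elements that have no other content, and serializes the result. The class and method names are hypothetical, and whether a step like this can run before the failing Talend step is an assumption. The serializer may write an emptied element in self-closing form (<Boxed_Length/>), which is equivalent XML:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EmptyBlankElementsDom {

    // Parses the XML, empties every element whose only children are
    // whitespace text nodes, and returns the re-serialized document.
    static String emptyBlankElements(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList elements = doc.getElementsByTagName("*"); // every element
            for (int i = 0; i < elements.getLength(); i++) {
                Element el = (Element) elements.item(i);
                if (el.hasChildNodes() && isWhitespaceOnly(el)) {
                    while (el.getFirstChild() != null) {
                        el.removeChild(el.getFirstChild());
                    }
                }
            }

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    // True if every child is a text node that trim() reduces to "".
    private static boolean isWhitespaceOnly(Element el) {
        for (Node c = el.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() != Node.TEXT_NODE || !c.getNodeValue().trim().isEmpty()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(emptyBlankElements(
            "<Row><Boxed_Length>   </Boxed_Length><UPC>876358001583</UPC></Row>"));
    }
}
```

Unlike the regex, this does not depend on how tag names are spelled, and an element such as Row, which contains child elements, is never treated as blank.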
 
Terry Woods (IT Guru) commented:
Unless I'm missing something, the pattern you said you tried:
"<Boxed_Length>\\s+</Boxed_Length>&#(.*);"


won't work, because there are no &# characters immediately after the closing Boxed_Length tag.

The more basic pattern:
"<Boxed_Length>\\s+</Boxed_Length>"


should work, provided that Talend really accepts the same pattern syntax as Java. Are you sure the backslash needs the extra escape character, for example?

As I previously mentioned above, a good technique for fixing patterns that don't work is to start with a simple pattern and get that working before building on it. Start with something like:
"Boxed_Length"


Once the above pattern is known to work, try:
"<Boxed_Length>"


then try:
"<Boxed_Length>\\s+"


and keep building up the pattern until you either encounter a problem (in which case you can ask for more help) or you have the final result you need. This isn't difficult; it just requires a number of iterations of testing/debugging.
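The build-up steps above can be scripted so that each candidate is tried in order against a sample line and the first one that stops matching is reported. A plain-Java sketch, assuming the patterns are tested outside Talend; the class name and sample line are illustrative, and the last step is the earlier &#(.*); pattern, included to show where it breaks:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class BuildUpPattern {

    // Tries each pattern in order; returns the first one not found in the input,
    // or null if every step matches.
    static String firstFailure(List<String> patterns, String input) {
        for (String p : patterns) {
            boolean ok = Pattern.compile(p).matcher(input).find();
            System.out.println((ok ? "match    " : "NO MATCH ") + p);
            if (!ok) {
                return p;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String sample = "            <Boxed_Length>   </Boxed_Length>";
        List<String> steps = Arrays.asList(
            "Boxed_Length",
            "<Boxed_Length>",
            "<Boxed_Length>\\s+",
            "<Boxed_Length>\\s+</Boxed_Length>",
            "<Boxed_Length>\\s+</Boxed_Length>&#(.*);");
        String failed = firstFailure(steps, sample);
        System.out.println(failed == null ? "all steps matched" : "first failing step: " + failed);
    }
}
```

Because each step only adds a little to the previous one, the first failing step points directly at the fragment that does not fit the data.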
 
CEHJ commented:
"b. it looks like talend has a feature (bug) so that when there is a record in an xml file that just has whitespace without any data it breaks and does not continue.. hence the need to remove the whitespace."
Then that's your real problem. Needless to say, it shouldn't be necessary to find workarounds like this for such a major platform.