Hey guys, gals and supreme overlords.
I'm developing an application to simplify a shop's stock inventory.
The shop has POS software that uses an MS Access database - easy enough to pull a list of products out of it.
The stock table I want to append to the DB is a CSV file.
My problem is that there is no common field between the shop data source and the delivery CSV.
Perhaps some kind of "name similarity" comparison class would help, but I wouldn't know where to start searching for such a class.
I need to create an easy-to-use form to allow the users to match existing stock records with the new "unknown" records.
Take the following data sources as an example:
SHOP:
#stock_id,barcode,description,longdesc,cost,sell,quantity
1,11,Steam Cooker,,$40.91,$63.63,6
2,98,Overseas Freight,,$0.00,$25.00,0
3,99,Freight,,$0.00,$9.09,2.5
4,12,Saucepan Set - Non Stick,,$53.64,$90.90,6
DELIVERY:
#UID,description,qty,price,total
123,Pressure Cooker,1,40.1,40.1
456,Overseas Freight,3,0,0
789,Freight,15,0,0
1122,Saucepan Set - Teflon,1,53.6366,53.6366
by: gernst, posted on 2004-06-12 at 04:39:34, ID: 11295247
Hi ASCII_Man,
I found this, and I think it does what you want: it gives a sort of match score between two strings.
This is a match that I find quite useful. It's based on correlation. It is flawed in that characters
outside the 0-255 range (char values above 255) will break it, but this can be fixed.
The method returns a score for every possible alignment of the match string against the data. A larger
value is a better match. It is quite flexible in that you can assign values (in the table) for different kinds of
match to tailor it to your needs. At the moment it performs a straight case-insensitive match on letters
only (digits, spaces and punctuation score zero). If you want to make it better able to handle typos on a
QWERTY keyboard, you could modify the values in the table to be inversely proportional to the distance
between two keys. A Gaussian-type function may work better.
/**
 * This method creates the weight table for the correlation. It just sets up a simple case-insensitive
 * match. All characters that do not match have a zero entry. All characters that do match have a 1.0f entry.
 * Depending on your domain you may wish to refine these weights (perhaps to account for typos on a
 * keyboard).
 */
public static float[][] generateWeights() {
    // build case-insensitive weight matrix, indexed by char value (0-255 only)
    float[][] tmp = new float[256][256];
    int r = (int)'A';
    for(int i=(int)'a';i<=(int)'z';i++) {
        // lower/lower, lower/upper, upper/lower and upper/upper all count as a match
        tmp[i][i] = 1.0f;
        tmp[i][r] = 1.0f;
        tmp[r][i] = 1.0f;
        tmp[r][r] = 1.0f;
        //System.out.println("Processing "+(char)i+" "+(char)r);
        r++;
    }
    return tmp;
}
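To make the keyboard idea a bit more concrete, here is a rough sketch of a table that uses a Gaussian of the distance between two keys instead of the plain 0/1 entries. None of this comes from the code I found: the class name KeyboardWeights, the method generateKeyboardWeights, the sigma parameter and the simplified QWERTY grid (row stagger ignored) are all assumptions made for illustration.

```java
class KeyboardWeights {
    // Letter rows of a US QWERTY keyboard. Assumption: keys sit on a plain grid
    // (row stagger ignored), so distance is measured in whole rows and columns.
    private static final String[] ROWS = {"qwertyuiop", "asdfghjkl", "zxcvbnm"};

    /**
     * Builds a 256x256 table where the weight of two letters is
     * exp(-d^2 / (2 * sigma^2)), d being the grid distance between their keys.
     * Identical letters get 1.0; non-letters keep a zero entry.
     */
    static float[][] generateKeyboardWeights(double sigma) {
        if (sigma <= 0.0) throw new IllegalArgumentException("sigma must be positive");
        float[][] w = new float[256][256];
        for (int r1 = 0; r1 < ROWS.length; r1++) {
            for (int c1 = 0; c1 < ROWS[r1].length(); c1++) {
                for (int r2 = 0; r2 < ROWS.length; r2++) {
                    for (int c2 = 0; c2 < ROWS[r2].length(); c2++) {
                        double d2 = (r1 - r2) * (r1 - r2) + (c1 - c2) * (c1 - c2);
                        float score = (float) Math.exp(-d2 / (2.0 * sigma * sigma));
                        char lo1 = ROWS[r1].charAt(c1);
                        char lo2 = ROWS[r2].charAt(c2);
                        char up1 = Character.toUpperCase(lo1);
                        char up2 = Character.toUpperCase(lo2);
                        // Fill all four case combinations to stay case insensitive.
                        w[lo1][lo2] = score;
                        w[lo1][up2] = score;
                        w[up1][lo2] = score;
                        w[up1][up2] = score;
                    }
                }
            }
        }
        return w;
    }
}
```

A table built this way can be passed wherever the 0/1 table is used. A small sigma behaves almost like the exact match, while a larger one is more forgiving of neighbouring-key typos; the right value would have to be tuned on real data.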
/**
 * Performs a correlation match on the two strings. The weighting table is used to hold the
 * measure of similarity between two characters. Returns an array of floats which are in the range
 * 0 to 1 inclusive. The number of elements in the returned array is data.length()+match.length(). To
 * find the index into the data array from a position in the result array you subtract match.length().
 * If this results in a value less than zero then it indicates that a partial match exists between
 * match.substring(Math.abs(pos-match.length()),match.length()) and the start of the data String.
 */
public static float[] softMatch(String data, String match, float[][] weighting) {
    if((data==null)||(match==null)) throw new IllegalArgumentException("Arguments must not be null");
    // an empty match string would divide by zero below
    if(match.length()==0) throw new IllegalArgumentException("match must not be empty");
    // pad the input string to make the correlation easier.
    // ideally this would not happen but not doing it makes the correlation loop more complex
    StringBuffer b = new StringBuffer(data.length()+(match.length()));
    for(int i=0;i<match.length();i++) b.append(" ");
    b.append(data);
    // one result per alignment: match.length() leading pads plus data.length()
    float[] results = new float[b.length()];
    for(int i=0;i<match.length();i++) b.append(" ");
    for(int n=0;n<(b.length()-match.length());n++){
        float res = 0.0f;
        for(int m=0;m<match.length();m++) {
            // System.out.println("Processing "+b.charAt(n+m)+" "+match.charAt(m));
            // characters above 255 fall outside the table and will throw here
            res+=weighting[(int)b.charAt(n+m)][(int)match.charAt(m)];
        }
        results[n] = res/match.length();
    }
    return results;
}
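As a rough sketch of how this could be applied to your shop and delivery descriptions: take the peak of the softMatch scores for each pair and suggest the shop record with the highest peak. The class DescriptionMatcher and the helpers similarity and bestMatch are hypothetical names of mine, and passing the longer string as the match argument is my own choice, so that a short name like "Freight" does not score a perfect 1.0 inside "Overseas Freight". The two table and correlation methods are condensed copies of the ones above, using the plain 0/1 letter table.

```java
class DescriptionMatcher {
    /** 0/1 case-insensitive letter table (condensed generateWeights). */
    static float[][] generateWeights() {
        float[][] tmp = new float[256][256];
        for (char c = 'a'; c <= 'z'; c++) {
            char u = Character.toUpperCase(c);
            tmp[c][c] = tmp[c][u] = tmp[u][c] = tmp[u][u] = 1.0f;
        }
        return tmp;
    }

    /** Condensed softMatch: one score per alignment of match against padded data. */
    static float[] softMatch(String data, String match, float[][] weighting) {
        StringBuilder b = new StringBuilder();
        for (int i = 0; i < match.length(); i++) b.append(' ');
        b.append(data);
        float[] results = new float[b.length()];
        for (int i = 0; i < match.length(); i++) b.append(' ');
        for (int n = 0; n < b.length() - match.length(); n++) {
            float res = 0.0f;
            for (int m = 0; m < match.length(); m++) {
                res += weighting[b.charAt(n + m)][match.charAt(m)];
            }
            results[n] = res / match.length();
        }
        return results;
    }

    /** Hypothetical helper: peak score, normalised by the longer of the two strings. */
    static float similarity(String a, String b, float[][] weights) {
        String shorter = a.length() <= b.length() ? a : b;
        String longer = a.length() <= b.length() ? b : a;
        if (longer.length() == 0) return 0.0f;
        float best = 0.0f;
        for (float s : softMatch(shorter, longer, weights)) best = Math.max(best, s);
        return best;
    }

    /** Hypothetical helper: the candidate with the highest similarity to the query. */
    static String bestMatch(String query, String[] candidates, float[][] weights) {
        String best = null;
        float bestScore = -1.0f;
        for (String c : candidates) {
            float s = similarity(query, c, weights);
            if (s > bestScore) {
                bestScore = s;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        float[][] weights = generateWeights();
        String[] shop = {"Steam Cooker", "Overseas Freight", "Freight", "Saucepan Set - Non Stick"};
        String[] delivery = {"Pressure Cooker", "Overseas Freight", "Freight", "Saucepan Set - Teflon"};
        for (String d : delivery) {
            String s = bestMatch(d, shop, weights);
            System.out.println(d + " -> " + s + " (" + similarity(d, s, weights) + ")");
        }
    }
}
```

With the 0/1 table, spaces and punctuation score zero, so even identical multi-word descriptions come out slightly below 1.0, while "Freight" against "Freight" scores exactly 1.0. For the matching form it is probably safer to show the user the top few candidates with their scores and let them confirm, rather than accept the best one automatically.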
Regards,
Gerben Ernst