Peter Kwan
asked on
How to tune the following function to be faster?
Hi, I have a J2EE application in which one function is to retrieve all author names in the database and their corresponding number of strokes in Chinese. The application is to be run on JBoss application server 7.1.
I have designed the following SQL tables:
BIOGRAPHY (Storing real name and display name (which equals either real name or one of the pen name of the author) in Traditional Chinese)
AUTHOR_ID, REAL_NAME, DISPLAY_NAME, SHOW_REAL_NAME, IS_DELETED, <other fields>
BIO_PENNAME (Storing pen names of the author in Traditional Chinese)
AUTHOR_ID, PEN_NAME, DISPLAY_NAME, DISPLAY_IN_AUTHOR_LIST
UNICODE_CHAR_STROKE_MAP (Storing all the unicode characters and their corresponding number of strokes)
UNI_CHAR, STROKE_NO
Below please find our program that performs the retrieval of author name list:
I have tried to check the code and found that the bottleneck is the above code. It takes around 4 second to return (around 3000 records) which is not acceptable. However, if I commented out the section "Getting stroke number", results returns within 300ms, but it does not contain the stroke number information which is crucial.
May I know how can I modify this code to make the retrieval to return results within 1 second? Please note that we need the stroke number of the author names that are in corresponding language (Simplified / Traditional Chinese), which means that some characters may have different stroke number before and after converting the author names in Traditional Chinese to Simplified Chinese.
Our database is SQL Server 2008 R2 and using Java 6 (JBoss AS 7.1.1).
Grateful if someone can help. Thank you.
I have designed the following SQL tables:
BIOGRAPHY (Storing real name and display name (which equals either real name or one of the pen name of the author) in Traditional Chinese)
AUTHOR_ID, REAL_NAME, DISPLAY_NAME, SHOW_REAL_NAME, IS_DELETED, <other fields>
BIO_PENNAME (Storing pen names of the author in Traditional Chinese)
AUTHOR_ID, PEN_NAME, DISPLAY_NAME, DISPLAY_IN_AUTHOR_LIST
UNICODE_CHAR_STROKE_MAP (Storing all the unicode characters and their corresponding number of strokes)
UNI_CHAR, STROKE_NO
Below please find our program that performs the retrieval of author name list:
public List<AuthorName> getAuthorNameList(String lang) throws SQLException {
List<AuthorName> nameList = new ArrayList<AuthorName>();
PreparedStatement stmt = connection.prepareStatement("SELECT AUTHOR_ID, REAL_NAME AS NAME, DISPLAY_NAME, SHOW_REAL_NAME AS SHOW_NAME FROM PUBLISH_BIOGRAPHY WHERE IS_DELETED = 0 UNION "
+ "SELECT N.AUTHOR_ID, N.PENNAME AS NAME, B.DISPLAY_NAME, N.DISPLAY_IN_AUTHOR_LIST AS SHOW_NAME FROM PUBLISH_BIO_PENNAME N, PUBLISH_BIOGRAPHY B "
+ "WHERE N.AUTHOR_ID = B.AUTHOR_ID AND B.IS_DELETED = 0");
ResultSet rs = stmt.executeQuery();
while (rs.next()) {
AuthorName authorName = new AuthorName();
authorName.setAuthorID(rs.getString("AUTHOR_ID"));
String name = rs.getString("NAME");
String displayName = rs.getString("DISPLAY_NAME");
boolean isShowName = rs.getBoolean("SHOW_NAME");
int strokeNo = -1;
if (AppConstants.LANG_SIMPLIFIED_CHINESE.equals(lang)) {
name = StringUtil.convertToSimplified(name);
displayName = StringUtil.convertToSimplified(displayName);
}
/**** Getting Stroke number ******/
PreparedStatement stmt2 = connection.prepareStatement("SELECT STROKE_NO FROM UNICODE_CHAR_STROKE_MAP WHERE UNI_CHAR = ?");
stmt2.setString(1, name.substring(0, 1));
ResultSet rs2 = stmt2.executeQuery();
if (rs2.next())
strokeNo = rs2.getInt("STROKE_NO");
rs2.close();
stmt2.close();
/******* End getting stroke number ******/
authorName.setName(name);
authorName.setDisplayName(displayName);
authorName.setShowName(isShowName);
authorName.setNoStroke(strokeNo);
nameList.add(authorName);
}
rs.close();
stmt.close();
final Map<Integer, Integer> chineseStrokeMap = getUnicodeCharMap();
Collections.sort(nameList, new Comparator<AuthorName>() {
@Override
public int compare(AuthorName n1, AuthorName n2) {
// TODO Auto-generated method stub
return StringUtil.compareString(n1.getName(), n2.getName(), chineseStrokeMap);
}
});
return nameList;
}
Since our application will need to display either Traditional Chinese or Simplified Chinese whenever the public user wants to click on a button to change the language. If the user selects "Simplified Chinese", then the author names will be converted into Simplified Chinese using Java library and their corresponding stroke number will be returned.I have tried to check the code and found that the bottleneck is the above code. It takes around 4 second to return (around 3000 records) which is not acceptable. However, if I commented out the section "Getting stroke number", results returns within 300ms, but it does not contain the stroke number information which is crucial.
May I know how can I modify this code to make the retrieval to return results within 1 second? Please note that we need the stroke number of the author names that are in corresponding language (Simplified / Traditional Chinese), which means that some characters may have different stroke number before and after converting the author names in Traditional Chinese to Simplified Chinese.
Our database is SQL Server 2008 R2 and using Java 6 (JBoss AS 7.1.1).
Grateful if someone can help. Thank you.
This will probably give you the best performance, but you will need to create a table-valued function in SQL Server. This function will sum up the number of strokes for each character in the [name] field:
CREATE FUNCTION [dbo].[fGetStrokeNo]
(
@name NVARCHAR(2000)
)
RETURNS TABLE
AS
RETURN
SELECT SUM([Value]) AS NumStrokes FROM
(
SELECT
[Value] = s.STROKE_NO
FROM (
SELECT Number = ROW_NUMBER() OVER (ORDER BY name)
FROM sys.all_objects
) AS x
JOIN UNICODE_CHAR_STROKE_MAP s
ON s.UNI_CHAR = SUBSTRING(@name, [Number], 1)
WHERE Number <= LEN(@name)
) AS y
;
Then you will be able to get the number of strokes during your first PreparedStatement call to the database by replacing that query with this query, using CROSS APPLY to get the number of strokes from the function above:SELECT p.AUTHOR_ID, p.REAL_NAME AS NAME, p.DISPLAY_NAME, p.SHOW_REAL_NAME AS SHOW_NAME, s.NumStrokes
FROM PUBLISH_BIOGRAPHY p
CROSS APPLY fGetStrokeNo(p.REAL_NAME) s
WHERE IS_DELETED = 0
UNION ALL
SELECT N.AUTHOR_ID, N.PENNAME AS NAME, B.DISPLAY_NAME, N.DISPLAY_IN_AUTHOR_LIST AS SHOW_NAME, s.NumStrokes
FROM PUBLISH_BIO_PENNAME N
JOIN PUBLISH_BIOGRAPHY B
ON N.AUTHOR_ID = B.AUTHOR_ID
CROSS APPLY fGetStrokeNo(N.PENNAME) s
WHERE B.IS_DELETED = 0
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Hi all. Thanks all for your help.
Brian, my queries takes less than 100ms for each query.
Ruzell, I am afraid it would be hard to put it into a single SQL because if I am choosing Simplified Chinese, I need to use Java to convert the data stored in database in Traditional Chinese to Simplified Chinese before I can select the number of strokes from the UNICODE_CHAR_STROKE_MAP table.
Chris, all UNI_CHAR values are unique, I will try to see if adding index to it can improve the performance.
Brian, my queries takes less than 100ms for each query.
Ruzell, I am afraid it would be hard to put it into a single SQL because if I am choosing Simplified Chinese, I need to use Java to convert the data stored in database in Traditional Chinese to Simplified Chinese before I can select the number of strokes from the UNICODE_CHAR_STROKE_MAP table.
Chris, all UNI_CHAR values are unique, I will try to see if adding index to it can improve the performance.
How the Get Stroke Number works? It's only for the first character of the Name?
ASKER
Vitor, I just get the stroke number of the first character.
Chris, your suggestion is very good, it speeds up the retrieval.
Chris, your suggestion is very good, it speeds up the retrieval.
The index is the real performance booster here but you can use only a single Select statement to reduce the number of queries on the database:
strokeNo = rs.getInt("STROKE_NO");
SELECT AUTHOR_ID, REAL_NAME AS NAME, DISPLAY_NAME, SHOW_REAL_NAME AS SHOW_NAME
, (SELECT STROKE_NO
FROM UNICODE_CHAR_STROKE_MAP
WHERE UNI_CHAR = LEFT(PUBLISH_BIOGRAPHY.REAL_NAME,1)) AS STROKE_NO
FROM PUBLISH_BIOGRAPHY
WHERE IS_DELETED = 0
UNION
SELECT N.AUTHOR_ID, N.PENNAME AS NAME, B.DISPLAY_NAME, N.DISPLAY_IN_AUTHOR_LIST AS SHOW_NAME
, (SELECT STROKE_NO
FROM UNICODE_CHAR_STROKE_MAP
WHERE UNI_CHAR = LEFT(N.PENNAME,1)) AS STROKE_NO
FROM PUBLISH_BIO_PENNAME N, PUBLISH_BIOGRAPHY B
WHERE N.AUTHOR_ID = B.AUTHOR_ID AND B.IS_DELETED = 0
So no need to the GetStrokeNo section in your code as you can get it directly from the main query:strokeNo = rs.getInt("STROKE_NO");
ASKER
Vitor, I need to check if my input language before getting the stroke number, so single query does not work.
If you run the queries in SSMS how long do they take?