asked on

How to tune the following function to be faster?

Hi, I have a J2EE application in which one function is to retrieve all author names in the database and their corresponding number of strokes in Chinese. The application is to be run on JBoss application server 7.1.

I have designed the following SQL tables:

BIOGRAPHY (Storing real name and display name (which equals either real name or one of the pen name of the author) in Traditional Chinese)
AUTHOR_ID, REAL_NAME, DISPLAY_NAME, SHOW_REAL_NAME, IS_DELETED, <other fields>

BIO_PENNAME (Storing pen names of the author in Traditional Chinese)
AUTHOR_ID, PEN_NAME, DISPLAY_NAME, DISPLAY_IN_AUTHOR_LIST

UNICODE_CHAR_STROKE_MAP (Storing all the unicode characters and their corresponding number of strokes)
UNI_CHAR, STROKE_NO

Below please find our program that performs the retrieval of author name list:

	public List<AuthorName> getAuthorNameList(String lang) throws SQLException {
		List<AuthorName> nameList = new ArrayList<AuthorName>();
		
		PreparedStatement stmt = connection.prepareStatement("SELECT AUTHOR_ID, REAL_NAME AS NAME, DISPLAY_NAME, SHOW_REAL_NAME AS SHOW_NAME FROM PUBLISH_BIOGRAPHY WHERE IS_DELETED = 0 UNION "
				+ "SELECT N.AUTHOR_ID, N.PENNAME AS NAME, B.DISPLAY_NAME, N.DISPLAY_IN_AUTHOR_LIST AS SHOW_NAME FROM PUBLISH_BIO_PENNAME N, PUBLISH_BIOGRAPHY B "
				+ "WHERE N.AUTHOR_ID = B.AUTHOR_ID AND B.IS_DELETED = 0");
		ResultSet rs = stmt.executeQuery();
		
		while (rs.next()) {
			AuthorName authorName = new AuthorName();
			authorName.setAuthorID(rs.getString("AUTHOR_ID"));
			
			String name = rs.getString("NAME");
			String displayName = rs.getString("DISPLAY_NAME");
			boolean isShowName = rs.getBoolean("SHOW_NAME");
			int strokeNo = -1;
			
			if (AppConstants.LANG_SIMPLIFIED_CHINESE.equals(lang)) {
				name = StringUtil.convertToSimplified(name);
				displayName = StringUtil.convertToSimplified(displayName);
			}

/****  Getting Stroke number ******/
			PreparedStatement stmt2 = connection.prepareStatement("SELECT STROKE_NO FROM UNICODE_CHAR_STROKE_MAP WHERE UNI_CHAR = ?");
			stmt2.setString(1, name.substring(0, 1));
			ResultSet rs2 = stmt2.executeQuery();
			if (rs2.next())
				strokeNo = rs2.getInt("STROKE_NO");
			rs2.close();
			stmt2.close(); 
/******* End getting stroke number ******/

			authorName.setName(name);
			authorName.setDisplayName(displayName);
			authorName.setShowName(isShowName);
			authorName.setNoStroke(strokeNo);
			nameList.add(authorName);
		}
		rs.close();
		stmt.close();
		
		final Map<Integer, Integer> chineseStrokeMap = getUnicodeCharMap();
		
		Collections.sort(nameList, new Comparator<AuthorName>() {

			@Override
			public int compare(AuthorName n1, AuthorName n2) {
				// TODO Auto-generated method stub
				return StringUtil.compareString(n1.getName(), n2.getName(), chineseStrokeMap);
			}
			
		});
		
		return nameList;
	}

Open in new window

Since our application will need to display either Traditional Chinese or Simplified Chinese whenever the public user wants to click on a button to change the language. If the user selects "Simplified Chinese", then the author names will be converted into Simplified Chinese using Java library and their corresponding stroke number will be returned.

I have tried to check the code and found that the bottleneck is the above code. It takes around 4 second to return (around 3000 records) which is not acceptable. However, if I commented out the section "Getting stroke number", results returns within 300ms, but it does not contain the stroke number information which is crucial.

May I know how can I modify this code to make the retrieval to return results within 1 second? Please note that we need the stroke number of the author names that are in corresponding language (Simplified / Traditional Chinese), which means that some characters may have different stroke number before and after converting the author names in Traditional Chinese to Simplified Chinese.

Our database is SQL Server 2008 R2 and using Java 6 (JBoss AS 7.1.1).

Grateful if someone can help. Thank you.

Brian Crowe

DDL (Data Design Language) would be helpful in this case so we can see the datatypes involved. You're not returning many rows of data so either your data is very wide or you have network issues because your queries are very simple and don't leave much room for optimization. I would suggest moving the queries to the database side but I can't claim that it would drastically improve your performance.

If you run the queries in SSMS how long do they take?

Russell Fox

This will probably give you the best performance, but you will need to create a table-valued function in SQL Server. This function will sum up the number of strokes for each character in the [name] field:

CREATE FUNCTION [dbo].[fGetStrokeNo]
	(
		@name NVARCHAR(2000)
	)
	RETURNS TABLE
	AS
		RETURN 
		SELECT SUM([Value]) AS NumStrokes FROM 
		  ( 
			SELECT 
				[Value] = s.STROKE_NO
			FROM (
				SELECT Number = ROW_NUMBER() OVER (ORDER BY name)
				FROM sys.all_objects
				) AS x
				JOIN UNICODE_CHAR_STROKE_MAP s
					ON s.UNI_CHAR = SUBSTRING(@name, [Number], 1)
			WHERE Number <= LEN(@name)
			) AS y
		;

Open in new window

Then you will be able to get the number of strokes during your first PreparedStatement call to the database by replacing that query with this query, using CROSS APPLY to get the number of strokes from the function above:

SELECT p.AUTHOR_ID, p.REAL_NAME AS NAME, p.DISPLAY_NAME, p.SHOW_REAL_NAME AS SHOW_NAME, s.NumStrokes
FROM PUBLISH_BIOGRAPHY p
	CROSS APPLY fGetStrokeNo(p.REAL_NAME) s
WHERE IS_DELETED = 0 
UNION ALL
SELECT N.AUTHOR_ID, N.PENNAME AS NAME, B.DISPLAY_NAME, N.DISPLAY_IN_AUTHOR_LIST AS SHOW_NAME, s.NumStrokes
FROM PUBLISH_BIO_PENNAME N
	JOIN PUBLISH_BIOGRAPHY B
		ON N.AUTHOR_ID = B.AUTHOR_ID
	CROSS APPLY fGetStrokeNo(N.PENNAME) s
WHERE B.IS_DELETED = 0

Open in new window

ASKER CERTIFIED SOLUTION

Chris Luttrell

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Peter Kwan

ASKER

Hi all. Thanks all for your help.

Brian, my queries takes less than 100ms for each query.
Ruzell, I am afraid it would be hard to put it into a single SQL because if I am choosing Simplified Chinese, I need to use Java to convert the data stored in database in Traditional Chinese to Simplified Chinese before I can select the number of strokes from the UNICODE_CHAR_STROKE_MAP table.
Chris, all UNI_CHAR values are unique, I will try to see if adding index to it can improve the performance.

Vitor Montalvão

How the Get Stroke Number works? It's only for the first character of the Name?

Peter Kwan

ASKER

Vitor, I just get the stroke number of the first character.
Chris, your suggestion is very good, it speeds up the retrieval.

Vitor Montalvão

The index is the real performance booster here but you can use only a single Select statement to reduce the number of queries on the database:

SELECT AUTHOR_ID, REAL_NAME AS NAME, DISPLAY_NAME, SHOW_REAL_NAME AS SHOW_NAME
	, (SELECT STROKE_NO 
	FROM UNICODE_CHAR_STROKE_MAP 
	WHERE UNI_CHAR = LEFT(PUBLISH_BIOGRAPHY.REAL_NAME,1)) AS STROKE_NO
FROM PUBLISH_BIOGRAPHY 
WHERE IS_DELETED = 0 
UNION 
SELECT N.AUTHOR_ID, N.PENNAME AS NAME, B.DISPLAY_NAME, N.DISPLAY_IN_AUTHOR_LIST AS SHOW_NAME 
	, (SELECT STROKE_NO 
	FROM UNICODE_CHAR_STROKE_MAP 
	WHERE UNI_CHAR = LEFT(N.PENNAME,1)) AS STROKE_NO
FROM PUBLISH_BIO_PENNAME N, PUBLISH_BIOGRAPHY B 
WHERE N.AUTHOR_ID = B.AUTHOR_ID AND B.IS_DELETED = 0

Open in new window

So no need to the GetStrokeNo section in your code as you can get it directly from the main query:
strokeNo = rs.getInt("STROKE_NO");

Peter Kwan

ASKER

Vitor, I need to check if my input language before getting the stroke number, so single query does not work.