How to tune the following function to be faster?

Peter Kwan
Peter Kwan used Ask the Experts™
on
Hi,  I have a J2EE application in which one function is to retrieve all author names in the database and their corresponding number of strokes in  Chinese.  The application is to be run on JBoss application server 7.1.

I have designed the following SQL tables:

BIOGRAPHY (Storing real name and display name (which equals either real name or one of the pen name of the author) in Traditional Chinese)
AUTHOR_ID, REAL_NAME, DISPLAY_NAME, SHOW_REAL_NAME, IS_DELETED, <other fields>

BIO_PENNAME (Storing pen names of the author in Traditional Chinese)
AUTHOR_ID, PEN_NAME, DISPLAY_NAME, DISPLAY_IN_AUTHOR_LIST

UNICODE_CHAR_STROKE_MAP (Storing all the unicode characters and their corresponding number of strokes)
UNI_CHAR, STROKE_NO

Below please find our program that performs the retrieval of author name list:
	public List<AuthorName> getAuthorNameList(String lang) throws SQLException {
		List<AuthorName> nameList = new ArrayList<AuthorName>();
		
		PreparedStatement stmt = connection.prepareStatement("SELECT AUTHOR_ID, REAL_NAME AS NAME, DISPLAY_NAME, SHOW_REAL_NAME AS SHOW_NAME FROM PUBLISH_BIOGRAPHY WHERE IS_DELETED = 0 UNION "
				+ "SELECT N.AUTHOR_ID, N.PENNAME AS NAME, B.DISPLAY_NAME, N.DISPLAY_IN_AUTHOR_LIST AS SHOW_NAME FROM PUBLISH_BIO_PENNAME N, PUBLISH_BIOGRAPHY B "
				+ "WHERE N.AUTHOR_ID = B.AUTHOR_ID AND B.IS_DELETED = 0");
		ResultSet rs = stmt.executeQuery();
		
		while (rs.next()) {
			AuthorName authorName = new AuthorName();
			authorName.setAuthorID(rs.getString("AUTHOR_ID"));
			
			String name = rs.getString("NAME");
			String displayName = rs.getString("DISPLAY_NAME");
			boolean isShowName = rs.getBoolean("SHOW_NAME");
			int strokeNo = -1;
			
			if (AppConstants.LANG_SIMPLIFIED_CHINESE.equals(lang)) {
				name = StringUtil.convertToSimplified(name);
				displayName = StringUtil.convertToSimplified(displayName);
			}

/****  Getting Stroke number ******/
			PreparedStatement stmt2 = connection.prepareStatement("SELECT STROKE_NO FROM UNICODE_CHAR_STROKE_MAP WHERE UNI_CHAR = ?");
			stmt2.setString(1, name.substring(0, 1));
			ResultSet rs2 = stmt2.executeQuery();
			if (rs2.next())
				strokeNo = rs2.getInt("STROKE_NO");
			rs2.close();
			stmt2.close(); 
/******* End getting stroke number ******/

			authorName.setName(name);
			authorName.setDisplayName(displayName);
			authorName.setShowName(isShowName);
			authorName.setNoStroke(strokeNo);
			nameList.add(authorName);
		}
		rs.close();
		stmt.close();
		
		final Map<Integer, Integer> chineseStrokeMap = getUnicodeCharMap();
		
		Collections.sort(nameList, new Comparator<AuthorName>() {

			@Override
			public int compare(AuthorName n1, AuthorName n2) {
				// TODO Auto-generated method stub
				return StringUtil.compareString(n1.getName(), n2.getName(), chineseStrokeMap);
			}
			
		});
		
		return nameList;
	}

Open in new window

Since our application will need to display either Traditional Chinese or Simplified Chinese whenever the public user wants to click on  a button to change the language. If the user selects "Simplified Chinese", then the author names will be converted into Simplified Chinese using Java library and their corresponding stroke number will be returned.

I have tried to check the code and found that the bottleneck is the above code.  It takes around 4 second to return (around 3000 records) which is not acceptable.  However, if I commented out the section "Getting stroke number", results returns within 300ms, but it does not contain the stroke number information which is crucial.

May I know how can I modify this code to make the retrieval to return results within 1 second? Please note that we need the stroke number of the author names that are in corresponding language (Simplified / Traditional Chinese), which means that some characters may have different stroke number before and after converting the author names in Traditional Chinese to Simplified Chinese.

Our database is SQL Server 2008 R2 and using Java 6 (JBoss AS 7.1.1).

Grateful if someone can help.  Thank you.
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Brian CroweDatabase Administrator
Top Expert 2005

Commented:
DDL (Data Design Language) would be helpful in this case so we can see the datatypes involved.  You're not returning many rows of data so either your data is very wide or you have network issues because your queries are very simple and don't leave much room for optimization.  I would suggest moving the queries to the database side but I can't claim that it would drastically improve your performance.

If you run the queries in SSMS how long do they take?
Russell FoxDatabase Developer
Top Expert 2014

Commented:
This will probably give you the best performance, but you will need to create a table-valued function in SQL Server. This function will sum up the number of strokes for each character in the [name] field:
CREATE FUNCTION [dbo].[fGetStrokeNo]
	(
		@name NVARCHAR(2000)
	)
	RETURNS TABLE
	AS
		RETURN 
		SELECT SUM([Value]) AS NumStrokes FROM 
		  ( 
			SELECT 
				[Value] = s.STROKE_NO
			FROM (
				SELECT Number = ROW_NUMBER() OVER (ORDER BY name)
				FROM sys.all_objects
				) AS x
				JOIN UNICODE_CHAR_STROKE_MAP s
					ON s.UNI_CHAR = SUBSTRING(@name, [Number], 1)
			WHERE Number <= LEN(@name)
			) AS y
		;

Open in new window

Then you will be able to get the number of strokes during your first PreparedStatement call to the database by replacing that query with this query, using CROSS APPLY to get the number of strokes from the function above:
SELECT p.AUTHOR_ID, p.REAL_NAME AS NAME, p.DISPLAY_NAME, p.SHOW_REAL_NAME AS SHOW_NAME, s.NumStrokes
FROM PUBLISH_BIOGRAPHY p
	CROSS APPLY fGetStrokeNo(p.REAL_NAME) s
WHERE IS_DELETED = 0 
UNION ALL
SELECT N.AUTHOR_ID, N.PENNAME AS NAME, B.DISPLAY_NAME, N.DISPLAY_IN_AUTHOR_LIST AS SHOW_NAME, s.NumStrokes
FROM PUBLISH_BIO_PENNAME N
	JOIN PUBLISH_BIOGRAPHY B
		ON N.AUTHOR_ID = B.AUTHOR_ID
	CROSS APPLY fGetStrokeNo(N.PENNAME) s
WHERE B.IS_DELETED = 0

Open in new window

Senior Database Architect
Commented:
Can you add an index on UNI_CHAR on your UNICODE_CHAR_STROKE_MAP table, even better if it can be unique?  If that is where is slows down then there must be a lot of data in that table.
CREATE [UNIQUE?] INDEX idx_name ON UNICODE_CHAR_STROKE_MAP (UNI_CHAR) INCLUDE (STROKE_NO);

Open in new window

Ensure you’re charging the right price for your IT

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

Peter KwanAnalyst Programmer

Author

Commented:
Hi all.  Thanks all for your help.

Brian, my queries takes less than 100ms for each query.
Ruzell, I am afraid it would be hard to put it into a single SQL because if I am choosing Simplified Chinese, I need to use Java to convert the data stored in database in Traditional Chinese to Simplified Chinese before I can select the number of strokes from the UNICODE_CHAR_STROKE_MAP table.
Chris, all UNI_CHAR values are unique, I will try to see if adding index to it can improve the performance.
Vitor MontalvãoIT Engineer
Distinguished Expert 2017

Commented:
How the Get Stroke Number works? It's only for the first character of the Name?
Peter KwanAnalyst Programmer

Author

Commented:
Vitor, I just get the stroke number of the first character.
Chris, your suggestion is very good, it speeds up the retrieval.
Vitor MontalvãoIT Engineer
Distinguished Expert 2017

Commented:
The index is the real performance booster here but you can use only a single Select statement to reduce the number of queries on the database:
SELECT AUTHOR_ID, REAL_NAME AS NAME, DISPLAY_NAME, SHOW_REAL_NAME AS SHOW_NAME
	, (SELECT STROKE_NO 
	FROM UNICODE_CHAR_STROKE_MAP 
	WHERE UNI_CHAR = LEFT(PUBLISH_BIOGRAPHY.REAL_NAME,1)) AS STROKE_NO
FROM PUBLISH_BIOGRAPHY 
WHERE IS_DELETED = 0 
UNION 
SELECT N.AUTHOR_ID, N.PENNAME AS NAME, B.DISPLAY_NAME, N.DISPLAY_IN_AUTHOR_LIST AS SHOW_NAME 
	, (SELECT STROKE_NO 
	FROM UNICODE_CHAR_STROKE_MAP 
	WHERE UNI_CHAR = LEFT(N.PENNAME,1)) AS STROKE_NO
FROM PUBLISH_BIO_PENNAME N, PUBLISH_BIOGRAPHY B 
WHERE N.AUTHOR_ID = B.AUTHOR_ID AND B.IS_DELETED = 0

Open in new window

So no need to the GetStrokeNo section in your code as you can get it directly from the main query:
strokeNo = rs.getInt("STROKE_NO");
Peter KwanAnalyst Programmer

Author

Commented:
Vitor, I need to check if my input language before getting the stroke number, so single query does not work.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial