Link to home
Start Free TrialLog in
Avatar of CoolestBananas
CoolestBananas

asked on

Windows Service, importing large amounts of data

This is a beast.

Currently there is a Windows service which runs on ~200 servers, collecting information: files, folders, groups, permissions, sessions and disk usage. This runs fine, and dumps all collected files on a server dedicated to importing. The aim is to get this information into SQL Server as quickly as possible.

Disk usage and sessions are continuously imported by building a XML formatted string and doing a single insert/update. Some of these can involve 8000 records being inserted at once, which takes around a minute. So the session information for 200 servers is taking around an hour - 24 session imports per server per day. Session information and disk usage alone could be ~750,000 new rows of data every hour.

For 12 hours of the day, file and folder information is being imported using SSIS - the packages are executed from the service. For the other 12 hours (yeah right) of the day, SQL Server indexes are rebuilt, and then the database file (around 90 GB) is copied to the live servers which deliver reports via a web application. It is important to remember that the information on these servers if for around 300 million files across 30 million folders.

When all of these processes are running simultaniously (multithreaded), certain parts run drastically slower. For example, there is a SSIS task which builds a table full of duplicate file information across all servers. When the t-sql to populate this table is run in SQL Server Management Studio against a server with no other processes it takes ~10 minutes to generate 1.3 million rows. When running with all of the other processes it is taking ~12 hours.

I realise hardware could be an issue here: 2 x Intel Xeon 3.00, 3.5 GB

I am not looking for any answers here because this thing is huge. A few pointers and other random thoughts would be great. Maybe some of you have been faced with similar problems?
Avatar of ZachSmith
ZachSmith
Flag of United States of America image

Parsing the XML string is a processor intensive task for SQL to handle. Have you looked at using a different approach for inserting the data? Maybe a bulk insert or a bunch of small inserts for each piece of data?
Avatar of CoolestBananas
CoolestBananas

ASKER

I have been thinking about the performance hit of ~750,000 records being inserted every hour. There is no doubt that it will be affecting the other processes.

I am not convinced that avoiding the XML import will be the answer. I have attached the CPU usage graphs whilsts disk usage, session information and SSIS is running. Each processor appears to be ticking over at 10-20%.

Maybe while the SSIS import packages are running, disk usage and session information imports should be stopped?
cpu-usage.GIF
Is that the graph of the SQL server or the graph of the server importing the data? We have had MAJOR problems at my company doing SSIS packages with a lot of data. In fact, we've had situations where the SSIS package totally blocked other calls to the database.

Also, we were using XML for a good portion of our auditing, and have noticed that parsing the XML in SQL can eat up CPU power. We ended up having to turn the auditing off on several tables. The XML parsing in SQL leaves A LOT to be desired!
The SQL Server is the same server that is running the import Windows service.

What I will do today is switch off the importing of disk usage and session information and see what difference that makes. It there is a significant difference then it will be necessary to reduce the frequency the XML data is imported.
The first attached file is a Performance Monitor for the the sqlservr process while the service is running; importing XML and running SSIS packages.

As you can see, the % Processor Time is pinned at 100% and has been for the past 10 minutes with rare slight dips.

The second graph shows IO read/write - does this look normal?
sqlserver-perfmon-cpu.GIF
drives-perfmon-io.GIF
NB: SQL Server resides on G:\ and the files to be imported are on H:\.
Also, this is a more reliable CPU usage graph for the box.
cpu-usage2.GIF
And it occasionally looks like this.
cpu-usage3.GIF
When it goes to 100% what is happening? Is this the point at which you're importing the XML into SQL?
I'm referring to the last graph you posted with my comment above. Sorry, forgot to mention that.
Yes, I am 99% sure it is.

What I have done is put the XML importing thread to sleep for 1 minute after each iteration. I am thinking that the CPU hits 100% once a minute when the XML is being inserted. The import packages do appear to be running quicker but I will not have definite answers until tomorrow.

The XML string and inserting method seemed to be the best for inserting large numbers of records. Is there another method you would used? I build the XML string, pass it to the stored procedure which then deals with it. The T-SQL for handling the string is attached.


DECLARE @TempSessions TABLE
(
ip_address BIGINT,
server_id INT,
share_id INT,
share_name VARCHAR(100),
[user_id] VARCHAR(100),
session_date DATETIME
)
 
INSERT INTO @TempSessions
(ip_address, server_id, share_id, share_name, [user_id], session_date)
SELECT
ParamValues.ID.value('ip_address[1]','BIGINT') AS ip_address,
ParamValues.ID.value('server_id[1]','INT') AS server_id,
ParamValues.ID.value('share_id[1]','INT') AS share_id,
ParamValues.ID.value('share_name[1]','VARCHAR(100)') AS share_name,
ParamValues.ID.value('user_id[1]','VARCHAR(100)') AS [user_id],
ParamValues.ID.value('session_date[1]','DATETIME') AS session_date
FROM @xml.nodes('//sessions/session') as ParamValues(ID)
 
INSERT INTO Sessions
SELECT NEWID() AS session_guid, TS.ip_address, TS.server_id, Shares.share_id, TS.[user_id], TS.session_date
FROM @TempSessions As TS
JOIN
Shares on Shares.share_name=TS.share_name and Shares.server_id=TS.server_id

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of ZachSmith
ZachSmith
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial