satmisha asked:
OLTP In-memory feature of Sql Server 2014

Hi Experts,

I have 4 tables which are very huge, around 2-3 million in size, and which I am using in my SP. Is it possible to use the In-Memory OLTP feature to gain from a performance perspective? Here are the details:

1. The SP and tables were built in SQL 2008; I am now migrating them to SQL 2014 to take advantage of the in-memory feature, if possible.
2. My tables contain various data types like datetime, varchar(max), int, etc.
3. I am not using an identity seed column.
4. My SP is complex and uses dynamic queries to insert/update/delete records from database 'A' to database 'B' based on a lot of business logic.

Looking forward to hearing from you.
Guy Hengel [angelIII / a3] replied:
See this documentation reference for a complete overview:
https://msdn.microsoft.com/en-us/library/dn133186.aspx
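For illustration, a minimal sketch of what creating a memory-optimized table looks like (the database, filegroup, table and column names below are made up; note also that SQL Server 2014 does not allow LOB types such as varchar(max) in these tables):

-- The database first needs a filegroup for memory-optimized data
ALTER DATABASE MyDb ADD FILEGROUP imoltp_fg CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE MyDb ADD FILE (NAME = 'imoltp_data', FILENAME = 'C:\Data\imoltp_data')
    TO FILEGROUP imoltp_fg;

-- A memory-optimized table needs a primary key backed by a hash or range index
CREATE TABLE dbo.Product_InMemory
(
    ProductId INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 4000000),
    StoreId   INT NOT NULL,
    UpdatedOn DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);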
satmisha (Asker):
Thanks, Hengel, for your prompt reply.

I am a little concerned, as my existing tables have limitations; for example, they have indexes on nullable columns, which per MS won't work.

My question is: despite these limitations, would I be able to use the in-memory feature or not if I migrate to SQL 2014?
I have not seen any constraint for indexes on nullable fields so far...
Can you show me where you found that?

>My question is: despite these limitations, would I be able to use the in-memory feature or not if I migrate to SQL 2014?
Yes:
https://msdn.microsoft.com/en-us/library/dn133186%28v=sql.120%29.aspx
I have 4 tables which are very huge, around 2-3 million in size
2-3 million what? Rows or MB? If rows then I'm afraid they aren't even huge.

4. My SP is complex and uses dynamic queries to insert/update/delete records from database 'A' to database 'B' based on a lot of business logic.
Well, I think this explains why you have such poor performance. Do you want to move to In-Memory Optimized tables, or do you want to tune the current process?
Thanks, Vitor.

>2-3 million what? Rows or MB? If rows then I'm afraid they aren't even huge.
2-3 million rows, in the tables where I do the updates/inserts; in GB terms it comes out to about 10 GB.

I am indeed going to try both: two POCs, one where I try to optimize the current process as much as possible, and another where I use the in-memory feature to see how much I could gain from it.
SOLUTION by Vitor Montalvão (solution text available to Experts Exchange members only)
Thanks, Victor, for this vital information. I guess we have dedicated DBAs who can take charge of these points once I come up with some performance gain using this feature.

I am trying to do a simple POC using my existing DB objects. Once it is done, it will help management decide which direction we have to move in.
Currently I am inserting data into the new DB, which is taking too much time to actually give significant info on this. Once done, I'll post more details.
Sorry, it's Vitor... typo.
Thanks, Experts. Apologies for the late response; I was stuck in the migration.

I have migrated to SQL 2014 and found that most of the suggestions from the CE are not appropriate, or didn't provide any performance gain so far.

1. The CE is not as good as I thought.
2. I need to check the new features against my existing process to see if they can bring any performance improvement, though I am afraid 2014 has a lot of restrictions which are lifted in 2016. But I would give it a try if there is any drastic change.
3. I am open to any of your suggestions.

Looking forward to hearing from you.
Hi Experts,

I ran the Profiler on the live server to see whether the SP queries which I run on the stage server are executing on live or not, and found that the distributed queries are running on stage:

Stage server: 500 records in the table
Live server: millions of records in the tables

I am applying business logic and making joins with the live tables, which seems to be causing the main performance blockage.

Is there anything I can do to ensure that my queries execute on the live server?


Example:
SET @QUERY = 'UPDATE DEST WITH (ROWLOCK)      
  SET '+@UPDATE_PRODUCT_COL_LIST_ORG+',DEST.CONTRIBUTOR_DISPLAY_Original=SOURCE.CONTRIBUTOR_DISPLAY,PRODUCT_ID= SOURCE.PRODUCT_ID ,ISUPDATED = 1, CONTRIBUTOR_FREEZE_FLAG= CASE WHEN SOURCE.CONTRIBUTOR_FREEZE_FLAG = 1 THEN 1 ELSE 0 END, PRODUCT_FREEZE_FLAG=CASE WHEN SOURCE.FREEZE_FLAG=1 THEN 1  ELSE 0 END    
  FROM  ['+@TABLENAME+'_MAIN]  DEST    
  INNER JOIN '+@LiveDbName+'.DBO.PRODUCT  SOURCE WITH (NOLOCK)  ON  SOURCE.ISBN =CAST(DEST.ISBN AS VARCHAR) '  
  EXECUTE SP_EXECUTESQL @QUERY    
PRINT @QUERY  

Can this be rewritten to execute on the remote server?
Is checking through SQL Profiler whether my distributed queries are running on the remote server or not the right approach, or is something wrong with it?
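For what it's worth, a hedged sketch: if the live database were reachable through a linked server (the name LiveServer below is made up, and the linked server's 'RPC Out' option must be enabled), EXECUTE ... AT pushes a dynamic batch to that server instead of running it locally:

DECLARE @QUERY NVARCHAR(MAX) =
    N'UPDATE DBO.SOME_TABLE SET SOME_FLAG = 1 WHERE SOME_FLAG = 0';  -- placeholder statement
-- Runs on the linked server LiveServer rather than on the local instance
EXECUTE (@QUERY) AT LiveServer;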
Let me rule out the remote server, as we have both databases on the same server.

Server1
    - Stage DB (10 GB of data)
    - Live DB (500 GB of data)

I need to move data from stage to live, making joins with the live tables, and I am getting the performance issue there. Totally stuck now; experts, what options am I left with?
Hi Experts,

I have used the following features of SQL 2014 to see the performance gain:

1. In-Memory OLTP: my existing tables are not suited for it, as I have lots of constraints, primary/foreign keys, triggers and custom data types, and SQL 2014 does not allow any of them. So I ruled out this option.
2. Clustered columnstore index: my actual performance decreased after using it, and it also has lots of limitations. Not feasible to use this in the existing application.

It seems like the performance features claimed for MS SQL 2014 are just an illusion, as they come with a lot of restrictions, and it is not possible to release an e-commerce application to a production environment with such basic-level restrictions. Also, I didn't find any performance gain either.
Honestly speaking, I am now directionless about meeting the performance objective. Looking for some direction in which I could move.

I tried the following:
1. Tried to tune the existing queries, joins, etc., but there is no significant gain in performance.
2. Moved to MS SQL 2014 to use the performance features incorporated by MS, but found that these features are not yet mature enough and have lots of basic restrictions, which makes them an illusion and nothing else.
Mind that you're only presenting your situation in a high-level view and we don't have access to your environment, so we are only giving you opinions based on a generic situation.
Basically, what SQL Server needs is RAM. The more, the better.
Then fast storage and network.
And finally, a very good database model and query design. With all of these, you only need features like In-Memory Optimized Tables in fewer situations. Another feature that can help improve the queries is partitioned tables.
Thanks Vitor.

Yes, you are right that I am only presenting a high-level view.

On the prod environment we do have a very good hardware config, with around 45 GB RAM; hardware/software is not an issue at all.

SQL 2014 has a lot of limitations which are lifted in SQL 2016. But since they released that version only recently, it is not yet stable and needs some more time to stabilize, so it is difficult to convince my organisation to switch over to it.

I guess I should look into the partitioned tables concept now.
45 GB RAM might be short for your needs. It's not even near the 256 GB RAM limitation for In-Memory Optimized tables, for example.
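As a pointer only, a small sketch of the standard DMVs one might query to see how much physical memory the instance sees and how much the memory-optimized tables actually consume:

-- Physical memory as seen by the instance
SELECT total_physical_memory_kb, available_physical_memory_kb
FROM   sys.dm_os_sys_memory;

-- Memory consumed per memory-optimized table (SQL 2014 and later)
SELECT OBJECT_NAME(object_id) AS table_name,
       memory_allocated_for_table_kb,
       memory_used_by_table_kb
FROM   sys.dm_db_xtp_table_memory_stats;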
I think a "gigs" project, to remotely check "on hands" on the server would be the best option here, to see what is happening really on the machine ...
Thanks, Hengel. I am also exploring the SQL 2016 features, as MS claims they have lifted many of the limitations they had in SQL 2014, especially for the In-Memory OLTP feature. I am in the process of restoring the DB and will then be able to comment further. I have also implemented the points suggested by Vitor.

Will keep you posted on this shortly and assure you I will close this as early as possible. I am really thankful to all the experts who are motivating me by giving options.
Here are my findings, guys, for the SQL 2016 features:

1. Implemented a columnstore index: my actual performance went down by 2%. Not useful at all.

2. Implemented the In-Memory OLTP feature: extremely disappointed to see how MS is giving illusions rather than providing a solution to the problem. The very basic concept of foreign keys is not supported between in-memory and disk-based tables. Anyone implementing this feature would expect that basic support, but it is not there. Here is the link to explain what I meant:
http://blog.sqlauthority.com/2016/01/29/sql-server-2016-error-reasons-msg-10778-level-16-foreign-key-relationships-between-memory-optimized-tables-and-non-memory-optimized-tables-are-not-supported/

Seems like I have no other option but to tune my existing process.
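For reference, a minimal repro sketch of the restriction described above (object names are made up); creating the second, disk-based table fails with Msg 10778 because foreign keys between memory-optimized and disk-based tables are not supported:

CREATE TABLE dbo.Product_InMem
(
    ProductId INT NOT NULL PRIMARY KEY NONCLUSTERED
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

-- Fails: a disk-based table cannot reference (or be referenced by) a memory-optimized table
CREATE TABLE dbo.Product_Item_Disk
(
    Product_ItemId INT NOT NULL PRIMARY KEY,
    ProductId      INT NOT NULL
        FOREIGN KEY REFERENCES dbo.Product_InMem (ProductId)
);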
I highly recommend you go for a partitioned table solution.
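To make the recommendation concrete, a sketch of the moving parts of table partitioning (the partition column, boundary values and names below are hypothetical; the right partitioning key for your tables still has to be chosen):

-- Partition function: which boundary values split the data (here, by month)
CREATE PARTITION FUNCTION pf_ByMonth (DATETIME)
AS RANGE RIGHT FOR VALUES ('2016-01-01', '2016-02-01', '2016-03-01');

-- Partition scheme: which filegroup each partition lands on (ALL TO PRIMARY keeps it simple)
CREATE PARTITION SCHEME ps_ByMonth
AS PARTITION pf_ByMonth ALL TO ([PRIMARY]);

-- The table (or its clustered index) is then created on the scheme
CREATE TABLE dbo.Product_Item_Partitioned
(
    Product_ItemId BIGINT   NOT NULL,
    Created_On     DATETIME NOT NULL
) ON ps_ByMonth (Created_On);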
Thanks Vitor for your reply.
Hi Vitor,

I thought I'd give it a shot and found:

1. My table does not have a good partition candidate; in most examples I could find, the partitioning is done on a date, like month/year etc. That's not true in my case.
2. Table partitioning is a metadata operation and a good candidate for switch-in and switch-out operations, whereas in my case we are updating/inserting new records into an existing table.
3. I also went through one of the articles (a video) which discusses this:

https://www.youtube.com/watch?v=PXvgsaBiVOo

So I am doubtful whether I should go for it or not.
Yes, better candidates are usually date columns.
How do you define historical records in your table?

A good alternative is to partition by region or country.
What is in your WHERE clause?
Thanks, Vitor.

We have parent/master tables like:

<<Product Table>>
ProductId numeric (primary key)
StoreID int

<<Product_item Table>>
Product_ItemID numeric (primary key)
ProductId numeric (foreign key)
StoreID int
source_item_id varchar

In the WHERE condition we have the following columns:
StoreID
Product_ItemID
ProductId
source_item_id

Apart from that, I realize that we have been making joins between light and heavy tables:

LightTable contains 500 records
HeavyTable contains 40 million records

We are making an inner join to update the light table's columns based on certain conditions. I guess this is also a heavy operation.
<based on certain conditions>
This needs clarification; otherwise it cannot be commented on with potential tuning suggestions.
Thanks Hengel.
<based on certain conditions>
The following columns are being compared in the inner join:

LightTable.ProductID=HeavyTable.ProductID
LightTable.Product_ItemID = HeavyTable.Product_ItemID
LightTable.source_item_id = HeavyTable.source_item_id
Those are the JOIN predicates. Doesn't the query have a WHERE clause?
Thanks Vitor,

In the WHERE condition we have conditions like:

where HeavyTable.Product_ItemID is null and LightTable.Rnumauth < 2
where LightTable.ProductID is null

"Rnumauth is bigint type"
Are you using a LEFT JOIN between Light and Heavy tables?
Otherwise this filter won't make sense: HeavyTable.Product_ItemID is null

Rnumauth might be a good candidate to be indexed but I need to see the actual Query Execution Plan to confirm that.

Btw, how much time does the query take to execute completely?
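For clarity, this is the pattern being asked about (a sketch only; the table and column names are taken from your posts, not from the real query):

-- With a LEFT JOIN, "HeavyTable.Product_ItemID IS NULL" keeps the light-table rows
-- that have no match in the heavy table; with an INNER JOIN the IS NULL filter
-- would only match rows where the column itself is NULL
SELECT L.*
FROM   LightTable AS L
LEFT JOIN HeavyTable AS H
       ON  H.ProductID      = L.ProductID
       AND H.Product_ItemID = L.Product_ItemID
       AND H.source_item_id = L.source_item_id
WHERE  H.Product_ItemID IS NULL
  AND  L.Rnumauth < 2;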
Hi Vitor,

In the product_item table, which has 4 crore (40 million) records, we are inserting 500 records from the lighter table, i.e. 'BOOKSCAPE17_MAIN', which has 500 records, like this:

I am just trying to partition the 'product_item' table in case I could gain any performance.
The execution plan is already there, as you know; in my process there are 143 T-SQL statements that take around 1.35 minutes to complete.
INSERT INTO DBO.PRODUCT_ITEM(PRODUCT_ITEM_ID, PRODUCT_ID, QUANTITY_ON_HAND,RETAIL_PRICE,ITEM_NUMBER,STORE_ID,DUSTJACKET_CONDITION,LOCATOR_CODE,CONDITION,LAST_RECV_DATE,ACCOMPANYING_MATERIAL,SLIPCASE_CONDITION,SHELVE_BY,SHELVE_UNDER,AISLE,COPY_EDITION_NOTE,SOURCE_ITEM_ID,COST_PRICE,SHELF_LIFE_WEEK,SHELF_LIFE_SALES,SET1,ZONE,DEPT_CAT_DESC,DEPT_CAT_CODE,CREATED_BY, CREATED_ON, UPDATE_TIMESTAMP    
  , ACTIVE_FLAG    
  , LIST_PRICE_EXCL_TAX, LIST_PRICE_INCL_TAX, SOLD_ONLINE_FLAG,POWELLS_FLAG,SKU, RARE_FLAG, SHORT_DISCOUNT_FLAG,ACCOUNTING_CATEGORY,SIGNED_FLAG)    
   SELECT (307341306 + ROW_NUMBER() OVER (ORDER BY CAST(PA_ProdAvail_ID AS VARCHAR))) AS [PRODUCT_ITEM_ID], PRODUCT_ID,ONHAND,RETAILPRICE,ITEMNUMBER,STORE,DUSTJACKET,LOCATORCODE,CONDITIONCODE,LASTRECVDATE,ACCOMPANYINGMATERIALPRESENT,SLIPCASECONDITION,SHELVEBY,SHELFUNDER,ASILE,COPYEDITIONNOTES,PA_ProdAvail_ID,UnitCost,ShelfLifew,ShelfLifes,Set1,ZONE,DEPTCATFriendlyName,DEPTCATCode    
  ,'CATALOG IMPORT_511', GETDATE(), GETDATE(),1    
  , 0, 0, 0,1,SKU, CASE   
  WHEN SUBSTRING(DEPTCATDESC, 1, 4) = 'RBR-' AND  CAST(DEPTCATDESC AS VARCHAR) != 'RBR-GIFT-RARE BOOK ROOM' THEN 1  
  ELSE 0 END AS RAREFLAG, CASE WHEN BOOKCLASSDESC = 'SD' THEN 1 ELSE 0 END AS ShortDiscountFlag,CASE WHEN BOOKCLASSDESC='SD' THEN 'N' ELSE BOOKCLASSDESC END,CASE WHEN COPYEDITIONNOTES LIKE '%signed%' THEN 1 ELSE 0 END   
  FROM  (SELECT  ROW_NUMBER() OVER (PARTITION BY  SKU ORDER BY SKU)RNUMSKU, ROW_NUMBER() OVER (PARTITION BY  PA_ProdAvail_ID ORDER BY PA_ProdAvail_ID)RNUMAUTH,*    
    FROM [BOOKSCAPE17_MAIN])MAIN          
  WHERE PRODUCT_ITEM_ID IS NULL AND MAIN.RNUMAUTH<2 AND MAIN.RNUMSKU<2



Is it possible to partition this table to see if there is any significant gain while updating/inserting records into this 40-million-row table?
That statement is basically reducible to this one (in regard to the performance review):

INSERT INTO ... SELECT ... 
FROM ( SELECT  ROW_NUMBER() OVER (PARTITION BY  SKU ORDER BY SKU ) RNUMSKU
      , ROW_NUMBER() OVER (PARTITION BY  PA_ProdAvail_ID ORDER BY PA_ProdAvail_ID) RNUMAUTH
    ,*    
     FROM [BOOKSCAPE17_MAIN]
)MAIN          
  WHERE PRODUCT_ITEM_ID IS NULL AND MAIN.RNUMAUTH<2 AND MAIN.RNUMSKU<2



I am wondering why you have PARTITION BY field ORDER BY the_same_field ...
would you not ORDER BY another field?

Anyhow, this gives you 2 partition keys, while a table can only be partitioned on 1 field (to start with).
Note: the partition scheme can rely on 2 fields, but only 1 of the 2 fields can be first ...
So, what you would need is either:
* 2 indexes, one on SKU and one on PA_ProdAvail_ID, one of them being a clustered index
* 1 index from the above, and the other field used for a table partition
* a completely different approach, avoiding the ROW_NUMBER() function during your SELECT...

Note: if you used another field for the ORDER BY of ROW_NUMBER(), that would be the second field for the index.
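A sketch of the first option (the index names are made up; which of the two columns gets the clustered index is a choice to make based on the data):

-- Supports the PARTITION BY SKU window function
CREATE CLUSTERED INDEX IX_BOOKSCAPE17_MAIN_SKU
    ON dbo.BOOKSCAPE17_MAIN (SKU);

-- Supports the PARTITION BY PA_ProdAvail_ID window function
CREATE NONCLUSTERED INDEX IX_BOOKSCAPE17_MAIN_PA_ProdAvail_ID
    ON dbo.BOOKSCAPE17_MAIN (PA_ProdAvail_ID);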
Thanks, Hengel, for your prompt reply.

But I couldn't understand it well. Are you suggesting that I partition the Product_Item (40 million rows) table on either the SKU or the PA_ProdAvail_ID column?
The execution plan is already there, as you know; in my process there are 143 T-SQL statements that take around 1.35 minutes to complete.
Sorry, I didn't relate this question to those queries. As far as I remember, the whole process is heavy, and running all 143 queries in a bit more than a minute and a half is good performance.

40 million rows shouldn't be enough reason to partition a table, but you may try the SKU column. What is stored in that field?
Is there any index on the PA_ProdAvail_ID and SKU fields?
Thanks Vitor.

"running all 143 queries in few more than a minute and half is good performance"
Need to do something either in sql or any other technology to gain 2 or 3 times performance.

SKU is a varchar field that contains a unique ID like '61-8967453423245-1'.
One and a half minutes for only a 500-record insert/update is quite a lot, as we need to insert millions of records. You can imagine: if we go and insert/update 10 million records on the prod server it will not be acceptable to anyone. Even though on prod it takes 12 seconds per 500 records, for 10 million it would be around three days, on an e-commerce platform where the DB must be available to all users 24x7.

Is there any way to gain 2-3 times the performance from this, irrespective of technology? I am open to giving it a shot.
>it will not be acceptable to anyone ... for 10 million it would be around three days
The relation shouldn't be that direct. Did you try with 5000 records? I guess that won't take 120 seconds (10x more records = 10x more time).
Yes, we tried with 2000 and 5000 records at a time, but found locking issues, as other processes were also doing work (it is a live platform, available 24x7 to users), and we finally divided this into batches of 500 to avoid the locking issue as much as possible.
In the meantime I partitioned the table on the created_On (datetime) column into 12 months to see the impact. Is it right to partition the table on a date column which is not being used in the WHERE condition?
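For reference, a minimal sketch of the 500-row batching pattern described above; the table name and the IsProcessed driving flag are made up for illustration, and the placeholder update stands in for the real insert/update work:

DECLARE @rows INT = 1;
WHILE @rows > 0
BEGIN
    BEGIN TRANSACTION;

    -- Keep each transaction short so locks are held only briefly
    UPDATE TOP (500) L
       SET L.IsProcessed = 1            -- placeholder for the real work
      FROM dbo.LightTable AS L
     WHERE L.IsProcessed = 0;

    SET @rows = @@ROWCOUNT;
    COMMIT TRANSACTION;
END;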
That partitioning will not "help" this WHERE condition; it may only "parallelize" some work.
Thanks, Hengel. So does that mean:

1. My insert/update operations will not gain anything from partitioning?
2. I should partition it on the 'PRODUCT_ITEM_ID' column, which is used in the WHERE condition?
ASKER CERTIFIED SOLUTION (solution text available to Experts Exchange members only)
Thanks Hengel.

I tried and implemented partitioning on the 'PRODUCT_ITEM_ID' column for one of the tables having 40 million records, but found that the performance (insert/update) has decreased by a couple of seconds.

"step back" and have a look at the larger picture"
I agreed on suggested point but directionless as of now.
Thanks a lot, Vitor and Hengel, from the bottom of my heart.

I am closing this question and am happy that I got the chance to explore different options as suggested by you guys. It seems like there is no point in keeping this question open forever, but I will keep updating it based on the progress at my end. As you know, my issue is not yet resolved, so I am continuing my search for the best-fit solution to my technical problem.

Thank you so much.