marrowyung


PolyBase on MSSQL

Hi,

Has anyone used PolyBase on MSSQL with Hadoop? Is the PolyBase scale-out feature working fine? Is load balancing working well?
Raja Jegan R

I haven't personally used PolyBase on MSSQL, but I have checked with a few of my friends and the scale-out feature seems to be working fine.
marrowyung

ASKER

Wow. How many nodes are allowed in a PolyBase scale-out group? All the nodes in the AOG?


Did they also try the scalable SSIS ETL? Does it scale well too? I heard it is not that good.
ASKER CERTIFIED SOLUTION
Raja Jegan R
"No, they haven't tried Scalable SSIS ETL. (Probably you can open a new question with the topics ETL and SSIS to get a better response.)"

Any reason for not trying it? ETL is usually one of the bottlenecks.

"They had around 1 Head node and 3 compute nodes in their Scale out groups"

How can they verify that it is scaling out well?
>> Any reason for not trying it? ETL is usually one of the bottlenecks.

It seems they didn't have such a requirement.
I wouldn't say ETL is a bottleneck, provided we design it properly.

>> How can they verify that it is scaling out well?

They tried adding a few compute nodes and measured the performance before and after the new compute node addition, and it appeared good to them. They also confirmed the same by removing the nodes and measuring the performance again.
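A before/after comparison like this can be turned into numbers. The following is only a rough sketch, not what they actually ran: it assumes you have timed the same PolyBase query several times at each compute node count, and the function name `scaling_report` and all the timings are hypothetical, made up for illustration.

```python
from statistics import median


def scaling_report(timings_by_nodes):
    """Summarise scale-out behaviour from repeated query timings.

    timings_by_nodes maps a compute node count to a list of elapsed
    seconds for the same query at that node count.
    Returns {node_count: (median_seconds, speedup, efficiency)}, where
    speedup and efficiency are relative to the smallest node count.
    """
    base_nodes = min(timings_by_nodes)
    base_time = median(timings_by_nodes[base_nodes])
    report = {}
    for nodes in sorted(timings_by_nodes):
        elapsed = median(timings_by_nodes[nodes])
        speedup = base_time / elapsed
        # Efficiency of 1.0 means perfectly linear scaling.
        efficiency = speedup / (nodes / base_nodes)
        report[nodes] = (elapsed, speedup, efficiency)
    return report


# Hypothetical timings in seconds, NOT real measurements.
sample = {
    1: [120.0, 118.0, 122.0],
    2: [66.0, 64.0, 65.0],
    3: [48.0, 50.0, 49.0],
}

for nodes, (elapsed, speedup, eff) in scaling_report(sample).items():
    print(f"{nodes} compute node(s): {elapsed:.1f}s, "
          f"speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

If the efficiency stays close to 100% as nodes are added, and the timings go back up when nodes are removed, that suggests the scale-out group is really spreading the work.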
"I wouldn't say ETL is a bottleneck provided we design it properly"

ETL always loads a lot of data, which is why it is slow from time to time.

People here are all concerned about it.

"They tried adding a few compute nodes and measured the performance before and after the new compute node addition, and it appeared good to them. They also confirmed the same by removing the nodes and measuring the performance again."

excellent.
>> ETL always loads a lot of data, which is why it is slow from time to time. People here are all concerned about it.

If we design the ETL better, we need not worry much about the increasing data load.
Is there any good ETL design guide?

What I am thinking is: if ETL could be designed that well, there would be no need to build a scale-out feature for it. Yet this feature has been requested for a very long time.

When I interviewed for a DBA job a long time ago, the interviewer already knew this was one of the bottlenecks.

Please share some good design guidelines with me.
A few of the best design guides I've found are:
https://www.timmitchell.net/etl-best-practices/
https://blog.westmonroepartners.com/10-best-practices-for-high-performance-etl-processing/

Of course, not all of them can be implemented for your current requirements, but you need to work out the trade-offs and design accordingly.
Agreed! Some people use Sqoop from a Hadoop distribution to transfer data from NoSQL to SQL. Is it scalable like PolyBase?
It also seems that Sqoop for Hadoop can't scale out, right?
I haven't used Sqoop for Hadoop, so I can't answer that.
I kindly request that you raise a new question with the appropriate topics so that you can get answers accordingly.
Thanks.
Okay, hope some Big Data experts can help you out..
I can't see why this site, http://sqoop.apache.org/, has no "ask me" or "contact us" button!