PolyBase on MSSQL

marrowyung
Hi,

Has anyone used PolyBase on MSSQL with Hadoop? Is the scale-out feature of PolyBase working fine? Is load balancing working well?
Raja Jegan R, SQL Server DBA & Architect, EE Solution Guide
Awarded 2009
Distinguished Expert 2018

Commented:
I haven't personally used PolyBase on MSSQL, but I have checked with a few of my friends and the scale-out feature seems to be working fine.
marrowyung, Senior Technical architecture (Data)

Author

Commented:
Wow. How many nodes are allowed in a PolyBase scale-out group? All nodes in the AOG?

Did they also try scalable SSIS ETL? Does it also scale well? I heard it is not that good.
Raja Jegan R, SQL Server DBA & Architect, EE Solution Guide
Commented:
They had around 1 head node and 3 compute nodes in their scale-out group. Yes, 2 of the nodes were AOG secondary servers.
No, they haven't tried scalable SSIS ETL. (You can probably open a new question under the ETL and SSIS topics to get a better response.)
marrowyung, Senior Technical architecture (Data)

Author

Commented:
"No, they haven't tried Scalable SSIS ETL.(probably you can open a new question with topics ETL, SSIS to get better response)"

Any reason for not trying it? ETL is usually one of the bottlenecks.

"They had around 1 Head node and 3 compute nodes in their Scale out groups"

How can they verify that it is scaling out well?
Raja Jegan R, SQL Server DBA & Architect, EE Solution Guide

Commented:
>> any reason not trying it ? ETL is usually one of the bottleneck.

It seems they didn't have such a requirement.
I wouldn't say ETL is a bottleneck, provided we design it properly.

>> how can they verify it is scaling out good ?

They tried adding a few compute nodes and measured the performance before and after each new compute node was added, and it appeared good to them. They also confirmed this by removing the nodes and measuring the performance again.
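
That kind of before-and-after comparison can be sketched roughly as below. This is only an illustration, not what they actually ran: the function name `speedup_report` and the timings are hypothetical, and it assumes the same external-table query is timed several times at each compute node count.

```python
from statistics import median


def speedup_report(timings_by_nodes):
    """Summarise scaling from repeated timings of the same query.

    timings_by_nodes maps a compute node count to a list of elapsed
    seconds measured with that many compute nodes in the group.
    Returns {node_count: (median_seconds, speedup, efficiency)}.
    """
    baseline_nodes = min(timings_by_nodes)
    baseline = median(timings_by_nodes[baseline_nodes])
    report = {}
    for nodes in sorted(timings_by_nodes):
        elapsed = median(timings_by_nodes[nodes])
        speedup = baseline / elapsed
        # Efficiency of 1.0 means perfectly linear scaling versus the baseline.
        efficiency = speedup / (nodes / baseline_nodes)
        report[nodes] = (elapsed, speedup, efficiency)
    return report


# Hypothetical timings in seconds, invented purely for illustration.
sample = {
    1: [120.0, 118.0, 122.0],
    2: [64.0, 66.0, 65.0],
    3: [48.0, 47.0, 49.0],
}

for nodes, (elapsed, speedup, efficiency) in speedup_report(sample).items():
    print(f"{nodes} compute node(s): {elapsed:.1f}s, "
          f"speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Using the median of several runs makes the comparison less sensitive to one slow run. If efficiency drops sharply as nodes are added, the extra nodes are not helping much for that query.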
marrowyung, Senior Technical architecture (Data)

Author

Commented:
"I wouldn't say ETL is a bottleneck provided we design it properly"

ETL always loads a lot of data; that's why it is slow from time to time.

People here are all concerned about it.

"They tried adding few Compute nodes and measured the performance before and after the New Compute node addition and it appeared good to them. Also confirmed the same by removing the nodes and measured the performance again."

excellent.
Raja Jegan R, SQL Server DBA & Architect, EE Solution Guide

Commented:
>> As ETL always load a lot of data, that's why it is slow from time to time. People here all concern about it.

If we design the ETL better, then we need not worry much about increasing data loads.
marrowyung, Senior Technical architecture (Data)

Author

Commented:
Is there any good ETL design guide?

What I am thinking is, if ETL can be designed that well, there is no need to build a scale-out function for it. This feature has been requested for a long, long time.

When I interviewed for a DBA job a long time ago, the interviewer already knew this was one of the bottlenecks.

Please share some good design guidelines with me.
Raja Jegan R, SQL Server DBA & Architect, EE Solution Guide

Commented:
A few of the best design guides I've found are:
https://www.timmitchell.net/etl-best-practices/
https://blog.westmonroepartners.com/10-best-practices-for-high-performance-etl-processing/

Of course, not all of them can be implemented for your current requirements, so you need to weigh the trade-offs and design accordingly.
marrowyung, Senior Technical architecture (Data)

Author

Commented:
Agree! Some people use Sqoop from a Hadoop distribution to transfer data from NoSQL to SQL. Is it scalable like PolyBase?
marrowyung, Senior Technical architecture (Data)

Author

Commented:
It also seems that Sqoop for Hadoop can't scale out, right?
Raja Jegan R, SQL Server DBA & Architect, EE Solution Guide

Commented:
I haven't used Sqoop for Hadoop and hence can't answer that.
I kindly request that you raise a new question with the appropriate topics so that you can get answers accordingly.
marrowyung, Senior Technical architecture (Data)

Author

Commented:
Thanks.
Raja Jegan R, SQL Server DBA & Architect, EE Solution Guide

Commented:
Okay, I hope some Big Data experts can help you out.
marrowyung, Senior Technical architecture (Data)

Author

Commented:
I can't see why this link, http://sqoop.apache.org/, has no "ask me" or "contact us" button!
