PolyBase on MSSQL

Hi,

Has anyone used PolyBase on MSSQL with Hadoop? Is the PolyBase scale-out feature working fine? Is load balancing working well?
marrowyung, Senior Technical architecture (Data), asked:

Raja Jegan R, SQL Server DBA & Architect, EE Solution Guide, commented:
I haven't personally used PolyBase on MSSQL, but I have checked with a few of my friends, and the scale-out feature seems to be working fine.
marrowyung, Senior Technical architecture (Data), author, commented:
Wow. How many nodes are allowed in a PolyBase scale-out group? Can all nodes in the AOG be used?


Did they also try scalable SSIS ETL? Is it scaling well too? I heard it is not that good.
Raja Jegan R, SQL Server DBA & Architect, EE Solution Guide, commented:
They had 1 head node and 3 compute nodes in their scale-out group. And yes, 2 of those nodes were AOG secondary servers.
No, they haven't tried scalable SSIS ETL. (You could open a new question under the ETL and SSIS topics to get a better response.)

marrowyung, Senior Technical architecture (Data), author, commented:
"No, they haven't tried scalable SSIS ETL. (You could open a new question under the ETL and SSIS topics to get a better response.)"

Any reason for not trying it? ETL is usually one of the bottlenecks.

"They had 1 head node and 3 compute nodes in their scale-out group."

How did they verify that it scales out well?
Raja Jegan R, SQL Server DBA & Architect, EE Solution Guide, commented:
>> Any reason for not trying it? ETL is usually one of the bottlenecks.

It seems they didn't have such a requirement.
I wouldn't say ETL is a bottleneck, provided it is designed properly.

>> How did they verify that it scales out well?

They added a few compute nodes and measured performance before and after the addition, and the results looked good to them. They then confirmed this by removing the nodes and measuring performance again.
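To make that before-and-after comparison concrete, here is a minimal Python sketch of how such measurements could be turned into a speedup and a scaling-efficiency figure. The `Run` class, the function names and the timings in the example are all hypothetical; in practice the timings would come from running the same PolyBase query several times at each scale-out group size.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Run:
    """Repeated timings of the same query at one scale-out group size (hypothetical)."""

    compute_nodes: int    # compute nodes in the group during these runs
    timings: list[float]  # elapsed seconds, one entry per execution

    @property
    def seconds(self) -> float:
        # The median is less sensitive than the mean to a single outlier run.
        return median(self.timings)


def speedup(baseline: Run, scaled: Run) -> float:
    """How many times faster the scaled configuration is than the baseline."""
    return baseline.seconds / scaled.seconds


def scaling_efficiency(baseline: Run, scaled: Run) -> float:
    """Observed speedup divided by the ideal (linear) speedup.

    1.0 means the added nodes helped in exact proportion to their number;
    values well below 1.0 mean they were not being used effectively.
    """
    ideal = scaled.compute_nodes / baseline.compute_nodes
    return speedup(baseline, scaled) / ideal


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    before = Run(compute_nodes=3, timings=[118.0, 120.0, 125.0])
    after = Run(compute_nodes=4, timings=[100.0, 99.0, 104.0])
    print(f"speedup: {speedup(before, after):.2f}x")
    print(f"efficiency: {scaling_efficiency(before, after):.0%}")
```

Removing the nodes again, as they did, is the same comparison in reverse: if timings return to the earlier level, the gain came from the extra nodes rather than from something incidental such as a warmer cache.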
marrowyung, Senior Technical architecture (Data), author, commented:
"I wouldn't say ETL is a bottleneck, provided it is designed properly."

ETL always loads a lot of data, and that's why it is slow from time to time.

People here are all concerned about it.

"They added a few compute nodes and measured performance before and after the addition, and the results looked good to them. They then confirmed this by removing the nodes and measuring performance again."

Excellent.
Raja Jegan R, SQL Server DBA & Architect, EE Solution Guide, commented:
>> ETL always loads a lot of data, and that's why it is slow from time to time. People here are all concerned about it.

If we design the ETL better, we need not worry much about increasing data loads.
marrowyung, Senior Technical architecture (Data), author, commented:
Any good ETL design guides?

What I am thinking is: if ETL could be designed that well, there would be no need to build a scale-out feature for it. Yet scale-out for ETL has been requested for a long, long time.

When I interviewed for a DBA job a long time ago, the interviewer already knew that this is one of the bottlenecks.

Please share some good design guidelines with me.
Raja Jegan R, SQL Server DBA & Architect, EE Solution Guide, commented:
A few of the best design guides I've found are:
https://www.timmitchell.net/etl-best-practices/
https://blog.westmonroepartners.com/10-best-practices-for-high-performance-etl-processing/

Of course, not all of these can be implemented for your current requirements, but you need to identify the trade-offs and design accordingly.
marrowyung, Senior Technical architecture (Data), author, commented:
Agreed! Someone here uses Sqoop from a Hadoop distribution to transfer data from NoSQL to SQL. Is it scalable like PolyBase?
marrowyung, Senior Technical architecture (Data), author, commented:
It also seems that Sqoop for Hadoop can't scale out, right?
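On the Sqoop question: Sqoop's parallelism comes from running several map tasks (`--num-mappers`, short form `-m`). For imports, the Sqoop User Guide describes retrieving the minimum and maximum of a split column (the primary key by default, or the column given with `--split-by`) and giving each map task an evenly sized slice of that range. The Python below is a simplified model of that description, not Sqoop's actual code; `split_ranges`, `mapper_queries` and the `orders`/`id` names are hypothetical. Moving data from Hadoop into a database uses `sqoop export`, which also accepts `--num-mappers` but divides its input data among the map tasks rather than splitting a key range.

```python
def split_ranges(lo: int, hi: int, num_mappers: int) -> list[tuple[int, int]]:
    """Divide the split column's range [lo, hi] into num_mappers slices.

    Each slice is half-open [start, end); the last one ends at hi + 1 so the
    maximum value is included. Very small ranges may yield empty slices.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    span = hi - lo
    bounds = [lo + span * i // num_mappers for i in range(num_mappers)]
    bounds.append(hi + 1)
    return list(zip(bounds, bounds[1:]))


def mapper_queries(table: str, column: str, lo: int, hi: int, num_mappers: int) -> list[str]:
    """One SELECT per map task; each task reads only its own slice."""
    # Illustration only: real code should not build SQL by string formatting.
    return [
        f"SELECT * FROM {table} WHERE {column} >= {start} AND {column} < {end}"
        for start, end in split_ranges(lo, hi, num_mappers)
    ]


if __name__ == "__main__":
    # Hypothetical example: id ranges from 0 to 1000, split across 4 map tasks.
    for query in mapper_queries("orders", "id", 0, 1000, 4):
        print(query)
```

Whether this scales in practice depends on the source database coping with several concurrent queries and on the split column being evenly distributed: with a skewed key, one map task can end up reading most of the rows.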
Raja Jegan R, SQL Server DBA & Architect, EE Solution Guide, commented:
I haven't used Sqoop with Hadoop, so I can't answer that.
I'd kindly suggest raising a new question with the appropriate topics so that you can get answers accordingly.
marrowyung, Senior Technical architecture (Data), author, commented:
Thanks.
Raja Jegan R, SQL Server DBA & Architect, EE Solution Guide, commented:
Okay, hopefully some Big Data experts can help you out.
marrowyung, Senior Technical architecture (Data), author, commented:
I can't see why this site, http://sqoop.apache.org/, has no "Ask me"/"Contact us" button!