Use Azure to implement big data

Hi,

Is there any procedure or briefing on how to use Azure to implement big data in a cost-efficient way?

I am now studying how to set up a big data cluster using SQL Server 2019 on Linux, but I know this is not enough.

And when connecting to Azure SQL Server, what tools should I use: SSMS 2019 or Azure Data Studio? Is there any guide on that?

lcohan, Database Analyst
Commented:
There is a nice blog post at https://www.sherweb.com/blog/cloud-server/building-big-data-solution-azure/ where you can find the steps and the tools needed to build a big data solution on Azure. As it says, "... this solution on Azure requires the deployment of a suite of complementary product technologies which integrate seamlessly and collectively to create a comprehensive Big Data offering."[...]
"Microsoft Azure HDInsight is a Microsoft’s Big Data solution and is a 100% Apache Hadoop-based service in the Azure cloud. It is a fully managed cloud service making processing massive amounts of data easy, fast, and cost-effective allowing you to use widely accepted Big Data open source frameworks like Hadoop, Spark, Hive, and R among others."

And at https://azure.microsoft.com/en-ca/solutions/big-data/ you can find some "Related solution architectures" diagrams to choose from; apply the one that best suits your needs.
marrowyung, Senior Technical architecture (Data) (Author)
Commented:
" this solution on Azure requires the deployment of a suite of complementary product technologies which integrate seamlessly and collectively to create a comprehensive Big Data offering"

Does that single link already include everything in detail on how to build a big data solution using MS SQL 2019, Azure, and even non-MS big data solutions? It looks like it covers Azure only.

Are there any other links for non-MS and Azure big data solutions?
lcohan, Database Analyst
Commented:
I believe https://docs.microsoft.com/en-us/azure/architecture/guide/architecture-styles/big-data covers exactly what you described, right? That is, the data sources feeding the Data "Lake" storage are heterogeneous. Isn't that what you need?
I think you can still build your SQL 2019 on Linux and integrate it with Azure and use Azure Data Studio to manage it https://www.sqlshack.com/sql-server-2019-on-linux-with-ubuntu-and-azure-data-studio/ 

When to use this architecture
Consider this architecture style when you need to:

Store and process data in volumes too large for a traditional database.
Transform unstructured data for analysis and reporting.
Capture, process, and analyze unbounded streams of data in real time, or with low latency.
Use Azure Machine Learning or Microsoft Cognitive Services.
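
The four criteria above can be turned into a quick self-check. This is only an illustrative sketch: the `Workload` fields and the helper functions are hypothetical names, not part of any Azure SDK, and the rule "at least one criterion applies" is an assumption about how to read the list.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class Workload:
    """Hypothetical description of a workload; one flag per criterion."""
    too_large_for_traditional_db: bool = False
    has_unstructured_data: bool = False
    needs_realtime_streams: bool = False
    uses_azure_ml_or_cognitive_services: bool = False


def matching_criteria(workload: Workload) -> List[str]:
    """Return the names of the criteria that are true for this workload."""
    return [f.name for f in fields(workload) if getattr(workload, f.name)]


def fits_big_data_style(workload: Workload) -> bool:
    """Assumption: the style is worth considering if any criterion applies."""
    return bool(matching_criteria(workload))


if __name__ == "__main__":
    example = Workload(has_unstructured_data=True, needs_realtime_streams=True)
    print(matching_criteria(example))
    print(fits_big_data_style(example))
```

If none of the flags apply, a traditional database is probably still the cheaper starting point.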
marrowyung, Senior Technical architecture (Data) (Author)
Commented:
Thanks,

"I think you can still build your SQL 2019 on Linux and integrate it with Azure and use Azure Data Studio to manage it https://www.sqlshack.com/sql-server-2019-on-linux-with-ubuntu-and-azure-data-studio/"

Does that mean we can only create a big data cluster with SQL Server 2019 on Linux, but not on Windows?
marrowyung, Senior Technical architecture (Data) (Author)
Commented:
BTW, what big data solution does your own company use?

Basically, is it possible to set up an Azure cloud on my own laptop and test all of the above? Is there any procedure for it?
Database Analyst
Commented:
I can't disclose too much detail about the architecture and its implementation; however, the solution used is based on Teradata.
As for setting up a local Azure cloud, it all depends on the architecture you need to implement. I believe Azure Stack (https://azure.microsoft.com/en-ca/overview/azure-stack/) can offer that, with more information about what it is and how to implement it on-premises here: https://azure.microsoft.com/en-us/resources/videos/microsoft-azure-stack-azure-services-on-premises/
marrowyung, Senior Technical architecture (Data) (Author)
Commented:
"I can't disclose too much detail about the architecture and its implementation"

Sorry, I don't mean your architecture, just the platform.

"the solution used is based on Teradata."

So that one also relates to big data? Is there any resource for learning how to build a big data platform with it? So Teradata is a non-MS alternative for building a big data solution for a company? Are any other platforms good for big data?
lcohan, Database Analyst
Commented:
I believe it all starts with your own definition of "big data": what types of enterprise data you want to include in your "data lake" (to use the new Azure terminology), where that data resides, and how you expect it to be used. Once you have all of this defined at a high level, I believe it is easier to look at the options you have and the cost of implementing one solution vs. another, instead of trying to figure out the cost of implementing something you're not even sure will fit your needs.

"I am now studying how to set up a big data cluster using SQL Server 2019 on Linux, but I know this is not enough."
If all that is needed is to set up a "big data cluster" locally/on-premises, maybe it is good enough to use MongoDB clusters to store/use your "big data" internally - https://www.mongodb.com/big-data-explained
If you want to stick with SQL Server 2019 running on Linux, the solution is explained here: https://docs.microsoft.com/en-us/sql/big-data-cluster/big-data-cluster-overview?view=sql-server-ver15
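
For scripted connectivity tests against either a SQL Server 2019 instance on Linux or an Azure SQL database (as opposed to using SSMS or Azure Data Studio interactively), a minimal sketch of building an ODBC connection string might look like the following. The server, database, and user names are placeholders, the helper `build_connection_string` is a hypothetical name, and it assumes "ODBC Driver 18 for SQL Server" is installed. The resulting string could then be handed to a client library such as pyodbc, which is an assumption here and is not imported.

```python
def build_connection_string(
    server: str,
    database: str,
    user: str,
    password: str,
    port: int = 1433,
    driver: str = "ODBC Driver 18 for SQL Server",  # assumed to be installed
    trust_server_certificate: bool = False,
) -> str:
    """Build an ODBC connection string for SQL Server or Azure SQL.

    Note: this sketch does not escape special characters (such as ';')
    in the values, so avoid them in test credentials.
    """
    parts = {
        "Driver": "{" + driver + "}",
        "Server": f"tcp:{server},{port}",
        "Database": database,
        "Uid": user,
        "Pwd": password,
        "Encrypt": "yes",
        "TrustServerCertificate": "yes" if trust_server_certificate else "no",
    }
    # Dicts keep insertion order, so the keywords appear as listed above.
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


if __name__ == "__main__":
    # Placeholder values only; replace with your own server and credentials.
    conn_str = build_connection_string(
        "myserver.database.windows.net", "mydb", "myuser", "mypassword"
    )
    print(conn_str)
```

For a local SQL Server 2019 on Linux that uses a self-signed certificate, passing `trust_server_certificate=True` may be needed, for testing only.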
marrowyung, Senior Technical architecture (Data) (Author)
Commented:
"If all that is needed is to set up a "big data cluster" locally/on-premises, maybe it is good enough to use MongoDB clusters to store/use your "big data" internally"

But that one is only for unstructured data, right?

Not for all data, right?

But thanks for the link anyway.