Solved

Blade or normal PC: which is better for big data?

Posted on 2013-06-08
Last Modified: 2016-03-23
I am trying to set up a demo big data system. It can be several blades or normal PCs. Normal PCs are cheap, but I am not sure whether they need to be linked up by fibre.

A friend said Hadoop does distributed calculation, not parallel calculation, so a normal RJ45 cable is good enough. Is that true?

They also said I can build DIY servers myself to save money. Or are there any cheap blades?

Any suggestions? Thanks.
Question by:turbot_yu
20 Comments
 
LVL 117

Assisted Solution

by:Andrew Hancock (VMware vExpert / EE MVE)
Andrew Hancock (VMware vExpert / EE MVE) earned 92 total points
If you want to build a cluster for calculations, you will need good bandwidth between the host servers; fibre would be better.

But you can get some serious horsepower in GPUs and CPUs today; it depends on what calculations you are performing.
 
LVL 26

Assisted Solution

by:skullnobrains
skullnobrains earned 273 total points
a normal pc can contain the same motherboard as a blade if it is server-class, and can use fibre channel just like a blade. the choice is more a matter of cleanliness than performance. blades are also noticeably noisier.

blades do not have to be expensive. you can easily set up a high-performance machine with 5TB of mirrored, high-performance (>400Mb/s) storage for less than 2500€ (so not many more $) by purchasing a cheap 2U case, a motherboard + cpu(s) + ram, a low-end raid card and a bunch of reasonably cheap SATA disks. with proper config, and using raid10, you end up with better performance than with a 20-50k SAN and fiber connectors to a bunch of machines. you can lagg (aggregate) your network cards to get redundancy and 2Gb bandwidth between machines, which you will probably have a hard time saturating unless hadoop works mainly from ram, which does not really fit the bigdata description even with efficient caches.
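
As a rough sanity check of the bandwidth argument above, here is a minimal Python sketch. The unit in ">400Mb/s" is ambiguous (megabits or megabytes), so the sketch evaluates both readings against a 2Gb/s aggregated link. The helper names are hypothetical and the numbers are only the ones quoted in the post.

```python
def mbit_to_mbyte(mbit_per_s):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8


def link_utilisation(source_mbit_per_s, link_mbit_per_s):
    """Fraction of the link a source would fill if it streamed flat out."""
    return source_mbit_per_s / link_mbit_per_s


LAGG_LINK_MBIT = 2 * 1000  # two aggregated 1 Gb/s cards, as in the post
STORAGE_FIGURE = 400       # the ">400Mb/s" figure; unit is an assumption either way

# Reading 1: the figure is megabits per second.
as_megabits = link_utilisation(STORAGE_FIGURE, LAGG_LINK_MBIT)

# Reading 2: the figure is megabytes per second (8x as many megabits).
as_megabytes = link_utilisation(STORAGE_FIGURE * 8, LAGG_LINK_MBIT)

print(f"link capacity: {mbit_to_mbyte(LAGG_LINK_MBIT):.0f} MB/s")
print(f"storage read as 400 Mb/s fills {as_megabits:.0%} of the link")
print(f"storage read as 400 MB/s fills {as_megabytes:.0%} of the link")
```

Under the second reading a single array streaming flat out could exceed the link, but that only matters if a job ships whole datasets between machines instead of working on local disks.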
 

Author Comment

by:turbot_yu
Thanks a lot, skullnobrains. Is there a detailed example or link on setting up the blades myself, or on linking up the PCs by fibre?

I really need to do it now.
 
LVL 26

Assisted Solution

by:skullnobrains
skullnobrains earned 273 total points
no. sticking a motherboard in an empty case only takes a screwdriver and a few minutes of your time.

i was not suggesting the use of fiber channel. if you have that much money to spend, you had better hire someone to buy and build the machines for you.

i'm sorry to say so, but it looks to me like you're in way over your head. i do not know what you actually need to be able to show, but i do not feel like making you spend lots of time and money on something that may not even be useful.

if you need to set up hadoop for testing or for a demonstration of scalability, you may as well set it up on any low-performance machines already available.

i'm ready to help, but i'd like to know what you are trying to achieve. you were talking about fiber channel... on a hunch, or do you actually have a specific need that can only be met using fiber channel?
 
LVL 117

Assisted Solution

by:Andrew Hancock (VMware vExpert / EE MVE)
Andrew Hancock (VMware vExpert / EE MVE) earned 92 total points
If you are going to use fibre for storage, you also need a fibre channel storage device. Otherwise, fibre for networking is just a matter of inserting a card and connecting it to a fibre ethernet connector.
 

Author Comment

by:turbot_yu
Any link for a cheap fibre storage device?

Hi skullnobrains, it is difficult for me to find such a person; can you recommend one?

Or I could just quickly set up some PCs, then move on to Hadoop, MapReduce and Hive.

When the demo is ready, I may show it and see if I can get a project in power grid or traffic video monitoring.
 
LVL 32

Assisted Solution

by:shalomc
shalomc earned 135 total points
Your friend was right: Hadoop works in a distributed manner.
What is the size of your data set?
Do you have lots of smallish files, or a small number of huge files?
 

Author Comment

by:turbot_yu
The data comes in as a stream, such as a video stream or a data flow.

Each data point is about 100M in full mode, or several hundred k in filtering mode.
 

Author Comment

by:turbot_yu
Another question is about storage.

If using PCs, does every PC have its own hard disk?

If using blades, is it centralized storage?
 
LVL 32

Assisted Solution

by:shalomc
shalomc earned 135 total points
central storage will be a disaster.
Hadoop is engineered to benefit from separate server instances and independent storage units.
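
To illustrate why independent per-node storage suits this model, here is a plain-Python imitation of the map/reduce pattern. It is not Hadoop's API; the node names, record layout and function names are invented for the sketch. Each node summarises only the records on its own disk, and only the small summaries are combined.

```python
from collections import Counter


def map_on_node(local_records):
    """Runs where the data lives: count records per sensor on one node."""
    return Counter(sensor_id for sensor_id, _value in local_records)


def reduce_counts(partials):
    """Merge the small per-node summaries into one overall result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical contents of each node's local disk: (sensor_id, reading).
node_disks = {
    "node1": [("s1", 0.5), ("s2", 0.7), ("s1", 0.9)],
    "node2": [("s2", 0.1), ("s3", 0.4)],
}

# Map step: every node works on its own data; no raw records are moved.
partials = [map_on_node(records) for records in node_disks.values()]

# Reduce step: only the per-node counters would cross the network.
totals = reduce_counts(partials)
print(dict(totals))
```

With central storage, every node would instead pull its raw records through the same shared link before the map step could start.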
 
LVL 26

Assisted Solution

by:skullnobrains
skullnobrains earned 273 total points
forget about the blades vs pc stuff. they are the same, come with the same motherboards, use the same storage types (internal and external), ... the only real difference is the shape of the box.

i concur with shalomc. using central storage with a distributed calculation system would be foolish. this is the reason why i do not believe in fiber stuff in your case, unless you feel like plugging a SAN into each machine. just stick performant disks in your machines and make sure they use proper controllers, raid levels...

how much power do you need? i assume we should only focus on disk throughput?

why would you need hadoop? is it a prerequisite or just something you feel you can use? if you expect to stream data, i'm unsure hadoop will help you much. what are you actually building? you need to either come with a specific requirement (for example: i want to build machines that can stream data at 1Gb/s with 10 parallel rw threads) or give us more information on your project so we can make a good guess and give proper advice. you should also tell us how comfortable you are with unix systems, the command line, ...
 

Author Comment

by:turbot_yu
Actually, I am also not very sure how much power is needed; I just want the system to be able to scale.

Hadoop is the most stable big data OS, so I want to try it. The raw data for one sensor may be 100M/s; after filtering, it may be less than 1M. Every station may have <100 sensors. Theoretically, there may be 150k sensors.

eliyart(at)hotmail.com
 
LVL 26

Assisted Solution

by:skullnobrains
skullnobrains earned 273 total points
do you actually know what hadoop does ?

you never talked about sensors before. i thought you were dealing with video files ??? what does your project actually do ???

---

if i understand your requirements:
 
you receive hundreds of data flows at 100Mb/s, perform some real-time treatment that converts them into 1Mb/s flows and do not keep the data itself.

if that is the case, you do not need hadoop or any "big data" whatever system. if not, please clarify what you are doing.
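
To put rough numbers on the flows discussed in this thread, here is a small Python sketch. It assumes the quoted figures are upper bounds and that "M" means megabits per second; neither is confirmed by the original poster, and the function names are hypothetical.

```python
RAW_MBIT_PER_SENSOR = 100      # "100M/s" raw, read as megabits per second
FILTERED_MBIT_PER_SENSOR = 1   # "less than 1M" after filtering, upper bound
SENSORS_PER_STATION = 100      # "<100 sensors" per station, upper bound
TOTAL_SENSORS = 150_000        # theoretical maximum quoted in the thread


def aggregate_mbit(per_sensor_mbit, sensors):
    """Total flow in Mb/s for a number of identical sensors."""
    return per_sensor_mbit * sensors


def terabytes_per_day(mbit_per_s):
    """Decimal terabytes accumulated in 24 hours at a constant rate."""
    mbyte_per_s = mbit_per_s / 8
    return mbyte_per_s * 86_400 / 1_000_000


station_raw = aggregate_mbit(RAW_MBIT_PER_SENSOR, SENSORS_PER_STATION)
station_filtered = aggregate_mbit(FILTERED_MBIT_PER_SENSOR, SENSORS_PER_STATION)
fleet_filtered = aggregate_mbit(FILTERED_MBIT_PER_SENSOR, TOTAL_SENSORS)

print(f"one station, raw:      {station_raw / 1000:.0f} Gb/s")
print(f"one station, filtered: {station_filtered} Mb/s")
print(f"all sensors, filtered: {fleet_filtered / 1000:.0f} Gb/s")
print(f"stored per day if kept: {terabytes_per_day(fleet_filtered):.0f} TB")
```

Whether any of this needs distributed storage depends on whether the filtered data is kept, which is the open question in this thread.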
 

Author Comment

by:turbot_yu
It is as you described. Hadoop is needed because data mining and big storage are needed.
 

Author Comment

by:turbot_yu
There are 2 demos: one for sensors, one for video.
 
LVL 32

Assisted Solution

by:shalomc
shalomc earned 135 total points
This is a POC/pilot, right?

The hadoop setup is complex enough so don't worry about running your POC with commodity hardware. You will definitely change the hardware when you go to production :)

Just get a bunch of not too old PCs or servers having decent disks and at least 8GB memory on the same 1Gbit LAN, preferably on their own switch.

here is some sizing for production environments, as well as some sensible recommendations for power, disks and network.

http://docs.hortonworks.com/CURRENT/index.htm#About_Hortonworks_Data_Platform/Hardware_Recommendations_For_Apache_Hadoop.htm
 
LVL 26

Accepted Solution

by:skullnobrains
skullnobrains earned 273 total points
then let's get back to your original question

"Trying to setup a demo big data system, it can be several blades or normal pcs. Normal pc are cheap but not sure, if it need to linkup by fibre."

- normal pcs do not have to be cheaper. if you compare equivalent hardware, the blade cases are marginally more expensive, but the difference is next to nothing compared to motherboards, disks, raid cards...

- you DO NOT need fiber. as was pointed out above, using shared storage for distributed calculation in hadoop is foolish performance-wise, so stick with internal disks. if you were considering fiber channel for the network part while using internal storage, read below.

"Some friend said hadoop are distributed calculation, not parallel calculation, so normal rj45 cable is good enough, is it true."

it mostly is. usually, distributed calculations do not need a lot of bandwidth to keep things in sync. i believe you do not need much more throughput than the disks can handle, meaning a 100Mb/s connection is likely to be enough for a desktop PC, and one or several gigabit links should be enough for a server equipped with a fast raid array. i'm ignoring cpu power as i understand this is not your issue, and i'm not considering latency either, because fiber will not make that much of a difference unless the machines are really far away from one another.

"Or they said can DIY servers by self to save money. Or any cheapest blades."

yes. see my first post. this applies to blades and regular boxes all the same. just make sure your box is meant to accept the required number of disks.

if you really need speed and do not care about data loss, you can switch to RAID0.
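
For reference, the capacity side of the RAID10 versus RAID0 trade-off can be sketched in Python. The disk count and size below are made-up examples and the function name is hypothetical; only the halving for mirrored stripes and the lack of redundancy in RAID0 are general facts.

```python
def usable_capacity_tb(disks, disk_tb, level):
    """Usable capacity in TB for an array of identical disks."""
    if level == "raid0":
        # Striping only: all capacity is usable, but losing any one disk
        # loses the whole array.
        return disks * disk_tb
    if level == "raid10":
        # Striped mirrors: half the raw capacity, needs an even count of
        # at least 4 disks.
        if disks < 4 or disks % 2:
            raise ValueError("raid10 needs an even number of disks, at least 4")
        return disks * disk_tb / 2
    raise ValueError(f"unsupported level: {level}")


# Example figures (assumed): four 2.5 TB disks.
for level in ("raid10", "raid0"):
    print(level, usable_capacity_tb(4, 2.5, level), "TB usable")
```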

---

since we have no idea about your needs, i believe that using any machines you have available for your demo will be good enough and will let you determine the requirements easily. it might also make the demo more impressive if you manage to run lots of calculations smoothly on poor hardware
 

Author Comment

by:turbot_yu
Thanks a lot, it is quite clear. I am trying to buy some small form factor HP PCs now, to see if they can work.
 

Author Comment

by:turbot_yu
Hi skullnobrains, is it possible to learn your background?
 
LVL 26

Assisted Solution

by:skullnobrains
skullnobrains earned 273 total points
i'm not sure i understand what you mean by "learn your background". if that is private information, i'll answer more questions in a private thread. i'm not allowed to disclose much personal information here, and i do not wish to either. sorry.