Hi experts!
I need to design an architecture for a big web application: a big news portal plus media hosting (pics, video, audio). It needs to hold 400k active sessions (up to 600k at peak), 50GB of daily video uploads, 10GB of photos, etc. Check the attached picture. I'm doing such a thing for the first time. If you can see any flaw in here, please give me a hint.
Thanks.
The cluster:
Going to use VMware ESX; the real servers are 24 x Xeon Quad Core blade servers.
Storage: VMFS-connected to the virtual servers, SAN connected to the real servers via FC HBA, stores media files, >2TB
front servers: nginx, connected to shared storage to read static data
app servers: apache+php, serving dynamic HTML, authorization, media upload, connected to shared storage
memcache: memcached servers, session caching, heavy calculation caching etc.
mysql cluster: going to run ndb+mysqld+ndb manager on each server, use load balancers
vconv: video converting, ffmpeg, connected to shared storage
admin will be the only one allowed to access all other servers via SSH, plus a few other things.
Make me think I did everything right.
Thanks.
Decided to use storage on the front end because of the heavy content (video, pics); this will allow reading it straight from the SAN.
Going to use memcache to store sessions (probably the only reason; I want to take this job away from the database).
Not sure, will talk to the hosting guys, but I want to try using faster storage for the SQL servers (that FC-connected SAN or SSD).
Changing MySQL to PostgreSQL will be painful because the application is 80% done.
To summarize, I'm pretty sure about the architecture; the only question is storage (shared, and faster for the DB). I will discuss it with the hardware and "$$$" guys; not sure if the budget will allow any improvements here right now. It's not polished yet. Going to finish by the end of the week.
A line of Squid servers in the position of loadb and front can do a lot of the caching work, given templates for the static content it should expect.
memcache links to app ONLY, so for simplicity it could be integrated there in the form of extra RAM or SSD.
OK - you have one set of wires, just pay attention to QoS and you will be fine.
Read this http://en.wikipedia.org/wi
I propose making some effort to compact your picture vertically: fewer types of systems to be replicated, less headache, etc.
by: gheist Posted on 2009-09-06 at 00:35:43 ID: 25269049
Squid in front of Apache will be able to handle a heap of TCP connections and keep the private systems off the public wires.
Apache can use sendfile() to serve static files off the NAS at almost no CPU cost.
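To make the sendfile() idea concrete, here is a minimal Python sketch. It is not how Apache is implemented; it only shows the same system-call idea through Python's `socket.sendfile()`, which uses `os.sendfile()` where the platform supports it and falls back to plain `send()` otherwise. The function name `send_static_file` is made up for this illustration.

```python
import socket


def send_static_file(conn: socket.socket, path: str) -> int:
    """Send the file at `path` over `conn` and return the number of bytes sent.

    `send_static_file` is a hypothetical helper for illustration only.
    """
    with open(path, "rb") as f:
        # Where os.sendfile() is available, the kernel copies file data
        # straight to the socket without passing it through user space.
        # On other platforms Python falls back to regular send() calls.
        return conn.sendfile(f)
```

The point of the technique is that the web server process never touches the file contents itself, which is why serving static media this way is cheap in CPU terms.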
PHP connection pooling with careful programming will keep the SQL backend cluster happy.
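The pooling idea can be sketched in a few lines. This is a generic Python illustration, not PHP and not any particular MySQL driver's API: `ConnectionPool` and the `connect` factory are hypothetical names, and a real pool would also need health checks and reconnect logic.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: at most `size` backend connections ever exist."""

    def __init__(self, connect, size):
        # `connect` is any zero-argument callable returning a connection object.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks (up to `timeout` seconds) when every connection is in use,
        # so the database never sees more than `size` clients from this process.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the caller raised.
            self._idle.put(conn)
```

The "careful programming" part is mostly about returning connections promptly and in a clean state (no open transactions), so that a small pool can serve many requests.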
I see no purpose for memcache: network interconnects run at about 8Gbps for FC (800MB/s) or 1Gbps for 1GbE (100MB/s), while computer memory runs at about 8GB/s, so it is better to add more RAM than dedicated cache machines.
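A rough way to see the scale of that difference is to compute how long one cached object takes to move at each of the quoted rates. The sketch below uses only the approximate throughput figures from the post; the 1 MB object size is an assumption picked for illustration, and per-request network latency is ignored, which flatters the network links.

```python
# Approximate throughputs quoted above, in decimal MB/s.
THROUGHPUT_MB_PER_S = {
    "1GbE": 100,
    "8Gbps FC": 800,
    "local RAM": 8000,
}


def transfer_ms(size_bytes: int, mb_per_s: float) -> float:
    """Milliseconds needed to move `size_bytes` at `mb_per_s`, ignoring latency."""
    return size_bytes / (mb_per_s * 1_000_000) * 1000


if __name__ == "__main__":
    one_mb = 1_000_000  # hypothetical cached object size
    for name, rate in THROUGHPUT_MB_PER_S.items():
        print(f"{name}: {transfer_ms(one_mb, rate):.3f} ms")
```

With these figures, a 1 MB object takes about 10 ms over 1GbE, 1.25 ms over FC and 0.125 ms from local memory.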
PostgreSQL or Firebird has more functionality built into the database itself to serve as a backend.
Sharing dependent FC NAS spaces between databases and lots of other IO can slow it down.
If you have that multi-controller SAN you are fine.
Databases can be sped up by putting the transaction logs on solid-state disks; the smallest of the fastest models (for example two in RAID0) are the best choice.
Gentoo is known to get the most out of the CPU for the vconv part; for the rest, just about anything will do with kernel tuning.
VMFS does not share files between machines; it only shares the system images used for live migration.
To summarize: the frontend Squid(s) do only network processing, use no storage, keep no secrets at the end of the wire, and can do logging, so Gentoo is a good fit there. For the rest, a normal system can be tuned to spec.
A quad-core Xeon actually uses hyperthreading, so you have to use processor binding in VMware.
Why not a dual quad-core processor setup?
Something is strange with your assumptions: 10GB of photos is at most 10000 files, which is what one normal digital camera is able to produce running non-stop.
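The arithmetic behind that remark can be checked quickly. The sketch below assumes roughly 1 MB per photo, which is what 10GB over 10000 files implies, and decimal units (1 GB = 10^9 bytes); the function names are made up for illustration. It also converts the 50GB of daily video from the question into an average upload rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60
GB = 10**9
MB = 10**6


def files_per_day(daily_bytes: int, avg_file_bytes: int) -> int:
    """Number of files that fit in the daily volume at the assumed average size."""
    return daily_bytes // avg_file_bytes


def seconds_between_files(files: int) -> float:
    """Average gap between uploads if they are spread evenly over a day."""
    return SECONDS_PER_DAY / files


def average_mbit_per_s(daily_bytes: int) -> float:
    """Average line rate in Mbit/s needed to carry the daily volume."""
    return daily_bytes * 8 / SECONDS_PER_DAY / 1_000_000


if __name__ == "__main__":
    photos = files_per_day(10 * GB, 1 * MB)  # 1 MB per photo is an assumption
    print(photos, "photos/day, one every", seconds_between_files(photos), "s")
    print("50GB/day of video averages", round(average_mbit_per_s(50 * GB), 2), "Mbit/s")
```

Under these assumptions, 10GB a day is one photo every 8.64 seconds, and 50GB a day of video averages about 4.63 Mbit/s, both small next to 400k active sessions. Real traffic will be bursty, so peaks will be well above these averages.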
Hope I caused some doubts; if you have more, feel free to share.