Big Data

Big data describes data sets that are so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying and information privacy. The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set.


Hi there,

I know it's kind of a ridiculous question, since the Cisco Nexus series is high-end data center hardware and the Cisco SG500X is SMB gear. But for my home lab I am planning for the future, and a good friend who runs a big data center offered to sell me some nice Cisco Nexus equipment. I could get it very, very cheap: a few hundred bucks versus around $1k for the SG500X-24. To be specific, it would be a Nexus 5596UP with a 2248TP fabric extender.

Would you go for the Nexus or for the SG500X? What are the gotchas with the Nexus?

I know that the SG500X does L3 out of the box, while the Nexus 5596UP needs the L3 module and the right license file for it. Also, the Nexus 5596UP can't do 100 Mbit, but I guess that's solved with the 2248TP fabric extender.
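
If you do go the Nexus route, a quick way to see what you are actually getting (assuming console access to the switch; these are standard NX-OS show commands, though the exact license package names vary by release) would be:

show module                              (is the L3 daughter card actually installed?)
show license usage                       (which license packages are installed and in use)
show feature | include interface-vlan    (L3 SVIs require this feature to be enabled)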

Thanks,
Yves

I'm building a Microsoft Access application with a login authentication feature. I want to avoid making users enter a login and password each time they use the app, so I thought of using biometrics. How can I implement biometric login for a Microsoft Access application?
I'm working on an ad campaign management app. It has a feature where advertisers can assign caps to a campaign, based on spending or conversions, on a daily, monthly, or lifetime basis, and there can be multiple caps per campaign. As soon as a campaign reaches 80% of a cap we send a notification to all publishers, and once a cap is reached we have to stop the campaign immediately. We're receiving thousands of events per second. Currently I'm querying the reporting table every second, but that's quite inefficient, and sometimes campaigns have already exceeded their caps by the time I detect it. So my question is:

What are the existing efficient programmatic or architectural solutions in industry to handle this kind of situation?
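
For what it's worth, the usual answer is to enforce caps on the event stream itself rather than by polling a reporting table: keep a running counter per campaign and cap, update it as each event arrives, and fire the 80% notification and the stop action the moment a threshold is crossed. Below is a minimal single-process sketch of the idea; every name in it (Cap, on_event, the hook functions) is invented for illustration, and in production this state would live in something like Redis atomic counters or a stream processor such as Kafka Streams or Flink rather than a Python dict.

# Sketch: update cap counters as each spend/conversion event arrives,
# instead of polling the reporting table once a second.
from dataclasses import dataclass, field

@dataclass
class Cap:
    limit: float          # e.g. daily spend limit
    spent: float = 0.0
    warned: bool = False  # has the 80% notification been sent?

@dataclass
class CampaignState:
    caps: dict = field(default_factory=dict)  # cap name -> Cap
    stopped: bool = False

campaigns = {}  # campaign id -> CampaignState

def notify_publishers(campaign_id, cap_name):  # stub notification hook
    print(f"{campaign_id}: 80% of {cap_name} cap reached")

def stop_campaign(campaign_id):                # stub stop hook
    print(f"{campaign_id}: cap reached, stopping")

def on_event(campaign_id, amount):
    """Called once per incoming spend/conversion event."""
    state = campaigns.setdefault(campaign_id, CampaignState())
    if state.stopped:
        return
    for name, cap in state.caps.items():
        cap.spent += amount
        if not cap.warned and cap.spent >= 0.8 * cap.limit:
            cap.warned = True
            notify_publishers(campaign_id, name)
        if cap.spent >= cap.limit:
            state.stopped = True
            stop_campaign(campaign_id)
            return

# Example: a $500 daily cap crossing its 80% threshold.
campaigns["cmp-1"] = CampaignState(caps={"daily_spend": Cap(limit=500.0)})
on_event("cmp-1", 450.0)

Counting at ingest makes the reaction time per-event instead of per polling interval; the reporting table is then only used to reconcile, not to enforce.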
I have a large number of PDF documents from which I need to extract text; the extracted text is then used for further processing. I did this for a small subset of documents using the Tesseract API in a linear approach, and I get the required output. However, this takes a very long time when there are many documents.

I tried to use the Hadoop environment's processing capabilities (MapReduce) and storage (HDFS) to solve this issue. However, I am facing problems implementing the Tesseract API in the Hadoop (MapReduce) approach. Since Tesseract converts the files into intermediate image files, I am confused about how the intermediate image files of the Tesseract API process can be handled inside HDFS.

I have searched and unsuccessfully tried a few options, such as:

    I extracted text from PDFs by extending the FileInputFormat class into my own PdfInputFormat class using Hadoop MapReduce, using Apache PDFBox to extract the text. But when it comes to scanned PDFs, which contain images, this solution does not give me the required results.

    I found a few answers on the same topic stating that one should use -Fuse, or that one should generate the image files locally and then upload them into HDFS for further processing. I am not sure if this is the correct approach.

I would like to know what approaches exist for this.
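
For what it's worth, one common pattern is to keep the intermediate images off HDFS entirely: each map task copies one PDF to its local disk, rasterizes it into a temporary directory, runs Tesseract on the page images, and emits only the text. Below is a minimal sketch of that idea as a Hadoop Streaming mapper in Python. It assumes pdf2image and pytesseract (plus the poppler and tesseract binaries) are installed on every worker node, and that stdin carries one HDFS path per line; none of that comes from the original post.

#!/usr/bin/env python3
# Streaming mapper sketch: intermediate page images live only in a local
# tempfile directory, never in HDFS; only the OCR text is emitted.
import subprocess
import sys
import tempfile

from pdf2image import convert_from_path  # rasterizes PDF pages to PIL images
import pytesseract                       # Python wrapper around tesseract

for line in sys.stdin:
    hdfs_path = line.strip()
    if not hdfs_path:
        continue
    with tempfile.TemporaryDirectory() as tmp:
        local_pdf = f"{tmp}/doc.pdf"
        # Copy the PDF out of HDFS onto this task's local disk.
        subprocess.run(["hdfs", "dfs", "-get", hdfs_path, local_pdf], check=True)
        pages = convert_from_path(local_pdf, dpi=300)  # intermediate images, local only
        text = " ".join(pytesseract.image_to_string(p) for p in pages)
        # Emit key<TAB>value: original path and its flattened OCR text.
        print(hdfs_path + "\t" + text.replace("\t", " ").replace("\n", " "))

The same idea carries over to the Java API: write Tesseract's intermediate files to the task's local working directory and only persist the final text back to HDFS.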
https://www.google.com/search?biw=1918&bih=974&tbs=dur%3Al&tbm=vid&q=taxi+mafia+-android+-walkthrough+-gameplay+-game+-%22video+game%22+-playstation+-xbox&oq=taxi+mafia+-android+-walkthrough+-gameplay+-game+-%22video+game%22+-playstation+-xbox&gs_l=serp.3...14103.18629.0.19563.19.18.0.0.0.0.113.1362.16j2.18.0....0...1.1.64.serp..1.0.0.KTXCbd4TVOY


Copy that search into a browser. It is a Google video search filtered to videos longer than 20 minutes.

I am looking for a mob documentary (in the vein of the Tony Soprano TV show, "The Sopranos", on HBO) about taxi drivers.
Please don't completely edit the search to

tony soprano taxi cab

to find Tony Soprano riding in a taxi cab, because this is an algorithm question.

Tony Soprano riding in a taxi cab is an example of a correct answer, but it isn't the only correct answer to the question, so please don't edit it too much.


All the results are video game play-throughs: people filming themselves playing Xbox, PlayStation, or Nintendo.

It seems like all the specific searches turn up something else, so this is more of a big data question.

I usually just watch netflix.com because it is easier than finding something specific. Most people just watch regular TV because TV is easy to turn on.

This question is a puzzle; I am not looking for a link explaining the meaning of the + and - symbols. So if you have a better picture of Google custom search terms, that is not the answer.
google-puzzle
Note: this question is not homework and is not a puzzle for a job interview -YET.…
Hi,

I am trying to find a sample dataset of (cloud) storage server file access logs for my research project. Can anyone suggest ideas or places to find this type of sample file? I think something like an FTP server's log dataset might work, because my project focuses on file access, not web page access.

Thanks in advance.
I have an LDAP directory that contains a huge volume of data (approx. 200 million entries). The requirement is to download all 200M entries to files. The current scripts pull data based on certain search criteria using the LDAP SEARCH command; they pull approx. 100M entries in about 10 hours. Is there a better way to optimize the search so that the 200M records can be downloaded in approx. 5 hours or so? Any suggestions are welcome.
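
For what it's worth, two things usually help here: the RFC 2696 paged-results control, so the server streams entries instead of building one huge result set, and splitting the export into disjoint filter slices that run in parallel. Below is a minimal Python sketch using the ldap3 package; the host, base DN, anonymous bind, and slicing on the first character of uid are all placeholder assumptions, not details from the post.

# Sketch: paged LDAP export split into disjoint uid slices run in parallel.
import string
from concurrent.futures import ThreadPoolExecutor

from ldap3 import Connection, Server, SUBTREE

HOST = "ldap.example.com"              # placeholder
BASE = "ou=people,dc=example,dc=com"   # placeholder

def export_slice(prefix):
    """Dump every entry whose uid starts with `prefix` to its own file."""
    conn = Connection(Server(HOST), auto_bind=True)  # anonymous bind, for the sketch
    count = 0
    with open(f"export_{prefix}.out", "w") as out:
        # paged_search fetches 1000 entries at a time via the paged-results
        # control, so neither side holds all 200M entries in memory.
        for entry in conn.extend.standard.paged_search(
                BASE, f"(uid={prefix}*)", search_scope=SUBTREE,
                attributes=["*"], paged_size=1000, generator=True):
            if entry.get("type") == "searchResEntry":
                out.write(entry["dn"] + "\n")
                count += 1
    conn.unbind()
    return count

if __name__ == "__main__":
    # The slices must jointly cover the whole namespace; tune the worker
    # count to whatever the directory server can tolerate.
    with ThreadPoolExecutor(max_workers=8) as pool:
        totals = pool.map(export_slice, string.ascii_lowercase + string.digits)
    print(sum(totals), "entries exported")

If the directory server offers a bulk-export mechanism (for example, an offline dump of the backend database), that will beat any LDAP search and is worth asking the directory team about as well.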
I have to process 100-200 GB of text files per day, at roughly 2 GB each.

Currently my Python code architecture looks like this:

def parsers(data):
    if ...:        # condition elided in the original post
        regex_email(data)
    elif ...:      # condition elided
        regex_ip(data)
    elif ...:      # condition elided
        regex_url(data)

Now I want to run multiple instances of the parsers method at a time on different files, with the regex methods called in parallel.
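
A minimal sketch of one way to do that is below. Since regex parsing is CPU-bound, processes sidestep the GIL better than threads; the file glob, chunk size, and stub handlers are illustrative, not from the post.

# Sketch: one worker process per file; each file is read in bounded chunks
# so a 2 GB file never has to fit in memory at once.
import glob
from multiprocessing import Pool

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB per read; tune to taste

def regex_email(data): ...     # placeholders for the real regex handlers
def regex_ip(data): ...
def regex_url(data): ...

def parsers(data):
    ...  # the poster's if/elif dispatch goes here

def process_file(path):
    """Run parsers() over one file, chunk by chunk."""
    with open(path, "r", errors="replace") as f:
        while chunk := f.read(CHUNK_SIZE):
            parsers(chunk)
    return path

if __name__ == "__main__":
    files = glob.glob("/data/incoming/*.txt")  # illustrative path
    # Pool defaults to one process per CPU core; each works on its own file.
    with Pool() as pool:
        for done in pool.imap_unordered(process_file, files):
            print("finished", done)

One caveat: fixed-size chunks can split a match across a boundary, so real code typically overlaps chunks slightly or reads line by line.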
I've been reading about Microsoft Delve and its ability to understand one's working habits.

Is this considered big data analytics?

How does it work exactly?
I am writing a MapReduce program in Hadoop and have executed it successfully. Attached is a snapshot of the output, showing keys with their values. Out of these values I need the top 5, so I ran the following command in the terminal:

hadoop fs -cat /home/yogesh/Work/outputs/part-r-00000 | sort –n –k2 –r | head  –n5

Now I am getting this error (I have attached a snapshot of the error as well):
head: cannot open ‘–n5’ for reading: No such file or directory
sort: cannot read: –n: No such file or directory
cat: Unable to write to output stream.

I think it is giving this error due to some permission issue.

Please help me figure out how to solve this problem. Any ideas are highly appreciated. Is it something I need to add to hdfs-site.xml?
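
One thing worth ruling out before permissions: judging by the error text, the dashes in the command as typed are typographic en-dashes (–), which sort and head treat as file names ("cannot open –n5 for reading") rather than as options. Retyped with plain ASCII hyphens, the pipeline would be:

hadoop fs -cat /home/yogesh/Work/outputs/part-r-00000 | sort -n -k2 -r | head -n5

Nothing needs to change in hdfs-site.xml for this; sort and head run locally on the streamed output.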
hadoop-command-and-error.png

I have an FTP program that lets me schedule scripts that connect to FTP sites and download documents into folders created that day. One of the sites recently changed: it is no longer plain FTP but encrypted, accessed with WinSCP. This works if someone manually connects, finds the files, and downloads them, but I am tasked with making it happen automatically again.

The problem is that my existing program has no way to enter the passphrase for the key that is required after the initial login name and password. I have read about scripts that can be created and used within WinSCP, but none that address my problem, and I have looked at other software packages but found none so far that will work.

Does anyone have a script that handles the login with the username and password and then enters the passphrase when the screen asking for it comes up? I would like to salvage the pieces of my existing script, which creates a new folder each day using the date as the folder name and downloads the files from the site. I am not a script person, which is part of my problem, but I understand the basics, so if someone could share this info, or let me know if there is a program that will do this, it would be greatly appreciated.
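
For what it's worth, recent WinSCP releases can answer the key-passphrase prompt non-interactively through the -passphrase switch of the scripting open command, which would remove the manual step entirely. Below is a sketch of such a script; every host name, credential, path, and the host-key fingerprint is a placeholder, and the -passphrase and %TIMESTAMP% features should be verified against your installed WinSCP version.

# download.txt  (run with: winscp.com /script=download.txt)
option batch abort
option confirm off
# -passphrase supplies the key passphrase after the username/password,
# so no one has to type it at the prompt.
open sftp://myuser:mypassword@sftp.example.com/ -privatekey="C:\keys\site.ppk" -passphrase="mykeypassphrase" -hostkey="ssh-rsa 2048 xxxxxxxxxxxxxxxx"
# %TIMESTAMP#yyyy-mm-dd% expands to today's date, recreating the
# folder-per-day layout of the existing script.
get /outbound/*.* "C:\Downloads\%TIMESTAMP#yyyy-mm-dd%\"
exit

Replace the host-key placeholder with the site's real fingerprint (WinSCP shows it on the first manual connection); scripted sessions refuse to connect without a matching key.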
We are looking for a social media analytics tool (a single tool) that supports the following:

1- It can automatically pull data periodically from a given Facebook page and Twitter account.
2- It can analyze the data statistically and for sentiment.
3- Its sentiment engine can be updated by adding custom user keywords.
4- It can categorize posts by topic (e.g. maintenance, service, news, complaint...).
5- It can provide results as structured raw data, so we can build custom reports from that data on another platform, for example:
a list of posts [post ID, post, topic (or category), sentiment analysis of comments (# positive, # neutral, # negative), # of likes, # of shares, created date...]
a list of analyzed comments for each post [post ID, comment, sentiment (positive, neutral, negative), created date, location...]

