Big Data

Big data describes data sets that are so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying and information privacy. The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set.

I have some data I am trying to normalize/weight. I have 4 regions, the number of people in each region who have missing training certificates, and the number of people in each region. Originally I was going to divide the number of missing training certificates by the number of people for each region to normalize the data. However, the numbers look really small when I do that, like 100/40000. I don't really want to graph such small numbers, but I need some way to bring in the number of people. Should I multiply by 100 and then just say this data is per 100 employees? Would that make sense?

Any other suggestions?
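One way to sanity-check the "per 100 employees" idea is to compute the rate directly. The following is a minimal Python sketch, not a definitive method: the function name rate_per and the region names are hypothetical, and every figure except 100/40000 is made up for illustration.

```python
def rate_per(missing: int, headcount: int, per: int = 100) -> float:
    """Return the number of missing certificates per `per` employees."""
    if headcount <= 0:
        raise ValueError("headcount must be positive")
    # Multiply before dividing so that simple cases stay exact.
    return missing * per / headcount


# Hypothetical regions: (missing certificates, employees).
# Only the 100 / 40000 pair comes from the question.
regions = {
    "Region A": (100, 40000),
    "Region B": (90, 12000),
    "Region C": (30, 6000),
    "Region D": (200, 50000),
}

rates = {name: rate_per(m, n) for name, (m, n) in regions.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} missing certificates per 100 employees")

# 100 missing out of 40000 is 0.25 per 100 employees,
# or 2.5 per 1000 employees if a larger base reads better.
```

Scaling by a constant does not change how the regions compare; it only changes the unit, so the chart axis should name the base (for example, "missing certificates per 100 employees").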

Hello Community,

I have created my first HQL code (see below), and I can't get any data to appear. I have recently installed Sandbox. The installation comes with a few sample databases. I'm using the database called sample_07 to guide me with my own .hql code.

My hql code is as follows:

CREATE EXTERNAL TABLE mysample
(
 code STRING,
 description STRING,
 total_emp INT,
 salary INT
)
ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/root/music'
TBLPROPERTIES ("skip.header.line.count" = "1");

However, when I run the code using Zeppelin Notebook with the following code, I can see the tables, but no data appears:

%jdbc(hive)
select * from mysample limit 14

However, when I run the same code using the sample database called sample_07, both the tables and data appear.

I'm sure there is something very simple that I'm missing.

Can someone please let me know where I'm going wrong?

I have been asked to move data dated before 2004. Is there an easy way of doing this without going through each folder, sorting by created date and then moving?

I'm interested in using Visual Studio in the field of big data and artificial intelligence.

At the moment the latest version of Visual Studio is 2017.
When is the next version due out?
What spec machine is needed for it to run smoothly in terms of processor, RAM and disk space (and anything else that is relevant)?
I found that with the Express edition, I could not use StreamWriter. Is this expected?

Hi,

What is the difference between MariaDB ColumnStore 1.0 and MS SQL SSIS + SSAS? if MariaDB ColumnStore 1.0 ?

What is big data Hadoop? How does it work, and what software is required to run it?

Hi experts, I'm having trouble sending a file to an .asp server using Ajax, as shown in the code below. I've observed that too many characters can prevent the ASP server from receiving the request; it returns an error saying, "The resource you are looking for has been removed, had its name changed, or is temporarily unavailable". My question is: is there another way to force the server to receive big data using Ajax, like the way I used below? What approach should I use to send big data to an .asp server? Thanks experts!

 
function iGetPerona(x, code, inv)
{
    var xmlhttp;
    if (window.XMLHttpRequest) {
        // code for IE7+, Firefox, Chrome, Opera, Safari
        xmlhttp = new XMLHttpRequest();
    } else {
        // code for IE6, IE5
        xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
    }
    xmlhttp.onreadystatechange = function () {
        // readyState 4 means the request has completed
        if (this.readyState == 4 && this.status == 200) {
            var msg = this.response;
            alert(msg);
        }
    };
    // All values travel in the query string of a GET request.
    // mcCode is not defined in this snippet; it is presumably set elsewhere.
    xmlhttp.open("GET", "SavePersona.asp?a=" + x + "&idb=" + mcCode + "&b=" + code + "&c=" + inv, true);
    xmlhttp.send();
}

In this article, I read "Even in the optimistic scenario, just mining one bitcoin in 2020 would require a shocking 5,500 kWh, or about half the annual electricity consumption of an American household."

https://motherboard.vice.com/en_us/article/aek3za/bitcoin-could-consume-as-much-electricity-as-denmark-by-2020

So, I am trying to understand what exactly is meant by "mining one bitcoin."

Does this mean looking through the entire ledger to trace the history of a single bitcoin?

How large is that ledger, in record count...

Does the BlockChain database format have any query capabilities?

Please tell me what you can, since I find this entire problem very daunting.

Thanks

Hi there,

I know it's kind of a ridiculous question, since the Cisco Nexus series is high-end data center hardware and the Cisco SG500X is SMB. But I am planning my home lab for the future, and a good friend who runs a big data center wanted to sell me some nice Cisco Nexus gear. I could get it very cheap: a few hundred bucks vs. around 1k for the SG500X-24. To be specific, it would be a Nexus 5596UP with a 2248TP expansion.

Would you go for the Nexus or for the SG500X? What are the gotchas with the Nexus?

I know that the SG500X does L3 stuff out of the box. The Nexus 5596UP needs the L3 module and the right license file for it. Also, the Nexus 5596UP can't do 100 Mbit, but I guess that's solved with the 2248TP expansion.

Thanks,
Yves

I'm building a Microsoft Access application with a login authentication feature. I wanted to avoid the need for users to enter a login and password each time they use the app, and thought of using biometrics. How can I implement a biometric login for a Microsoft Access application?

Hey,

I have an audio file, many actually, that are an interview between the interviewer and interviewee.  The same person is asking questions in each file, while the people answering are different.

I need to separate the answers out by generating silence over the interview questions. I'm currently doing this by hand with Audacity, but it is extremely time-consuming.

Any help would be greatly appreciated. I am a software developer, but audio is not my area, so code is an option if there isn't a program available.

Thanks

https://www.google.com/search?biw=1918&bih=974&tbs=dur%3Al&tbm=vid&q=taxi+mafia+-android+-walkthrough+-gameplay+-game+-%22video+game%22+-playstation+-xbox&oq=taxi+mafia+-android+-walkthrough+-gameplay+-game+-%22video+game%22+-playstation+-xbox&gs_l=serp.3...14103.18629.0.19563.19.18.0.0.0.0.113.1362.16j2.18.0....0...1.1.64.serp..1.0.0.KTXCbd4TVOY


Copy that search into a browser.
It is a Google search for videos longer than 20 minutes.

I am looking for a mob documentary about taxi drivers (mob as in Tony Soprano from the HBO TV show "The Sopranos").
Please don't completely edit the search to

tony soprano taxi cab

to find Tony Soprano riding in a taxi cab, because this is an algorithm question.

Tony Soprano riding in a taxi cab is an example of a correct answer, but it isn't the only correct answer to the question. Please don't edit the answer too much.

All the results are videos of games being played:
people filming themselves playing Xbox, PlayStation or Nintendo.

It seems like all the specific searches return something else,
so this is more of a big data question.

I usually just watch netflix.com because it is easier than looking for something specific.
Most people just watch regular TV because TV is easy to turn on.

This question is a puzzle; I am not looking for a link explaining the meaning of the + and - symbols,
so a better picture of Google custom search terms is not the answer.
google-puzzle
Note: this question is not homework and is not a puzzle for a job interview -YET.…

hi,

I am reading the introduction to Oracle GoldenGate:

http://www.oracle.com/technetwork/middleware/goldengate/overview/index.html

"

Oracle GoldenGate is a comprehensive software package for real-time data integration and replication in heterogeneous IT environments. The product set enables high availability solutions, real-time data integration, transactional change data capture, data replication, transformations, and verification between operational and analytical enterprise systems. Oracle GoldenGate 12c brings extreme performance with simplified configuration and management, tighter integration with Oracle Database, support for cloud environments, expanded heterogeneity, and enhanced security.

In addition to the Oracle GoldenGate core platform for real-time data movement, Oracle provides the Management Pack for Oracle GoldenGate—a visual management and monitoring solution for Oracle GoldenGate deployments—as well as Oracle GoldenGate Veridata, which allows high-speed, high-volume comparison between two in-use databases.
"

So it is for ETL and replication, but what is Oracle GoldenGate for Big Data? GoldenGate is not for big data, right?

Please share your ideas.

Hi Wizards, I think everyone nowadays hears about it every day. So how is your experience with Bitcoin so far? We have 4-5 free servers; can we use them to mine a few cents? ;-)

Any recommendations for procedures and setup are appreciated. Many thanks as always.

Regular Gmail, not G Suite. One label.

gmail label

I only want emails from one sender to reach the inbox:
admin@ee.com

All the other emails are not important.

Is there a Gmail filter using the word NOT?

Hi All,

I'd like to know what kind of performance suggestions and tweaks apply to a very large VM deployment.

I've got one VM running a Tableau application which processes data from multiple SQL Server databases, then crunches the numbers before presenting them to the executive management team.

The specs:

16x vCPU
112 GB vRAM
1 TB D:\ as Thin Provisioned VMDK on VMFS 5

Somehow it is running slower every month. So what's the best-practice recommendation for deploying such a large VM?

Any tips and suggestions would be greatly appreciated.

Could I see the
do not call list?

How can individuals know which numbers not to call if they can't see the list?

I am not sure which zone this question should be in, so please add zones.

We have a table that lists dates as a number (double), i.e., 20170417.
We would like to place this as a date into a date field, preferably in the format YYYY-MM-DD.
What's the most efficient way to accomplish this?
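The question does not say which database or tool holds the table, so the following is only a language-neutral illustration written in Python. The function name yyyymmdd_to_date is hypothetical; the idea is to drop the fractional part of the double, treat the eight digits as YYYYMMDD, and let the date parser validate them.

```python
from datetime import date, datetime


def yyyymmdd_to_date(value: float) -> date:
    """Convert a number such as 20170417.0 into a date object."""
    text = str(int(value))  # drop the fractional part of the double
    if len(text) != 8:
        raise ValueError(f"expected 8 digits (YYYYMMDD), got {text!r}")
    # strptime also rejects impossible dates such as 20170231.
    return datetime.strptime(text, "%Y%m%d").date()


converted = yyyymmdd_to_date(20170417.0)
print(converted.isoformat())  # 2017-04-17
```

Most databases offer an equivalent built-in conversion from an 8-character string to a date, which is usually faster than row-by-row code; the sketch above only shows the logic.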
0
Hi,
A couple of years ago, our client developed a "Document Management" system of their own (it has specific business rules).
Currently, they have 10 million documents and 8 TB of information.

They currently have the system running on 2 platforms (both perform slowly):
1. Windows environment (Windows Server 2012 R2, MS SQL Server 2012 R2 and IIS)
2. Linux (Red Hat Linux 6, MySQL and Apache)

As you can guess, managing this system has become terribly difficult for 2 main reasons:

1. Displaying 'search results' or 'document reports' (listing documents and properties) takes more than 30 minutes (on employees' computers).
2. Backups have to be done in several steps, and the night is not enough to make a full backup (on employees' computers).

So they have asked us to improve their system; we are developers.
They have also asked us to propose a new platform for the improved Document Management system.

We have done our research on Google, but we are not satisfied with what seems to be the new platform, so I would like to receive your recommendations or suggestions about it.

Our initial thinking is that the following should do the job just fine:
- Amazon Elastic (filesystem)
- Amazon DynamoDB (database)
- Apache Hadoop (web server)
- php/laravel (programming)

Your comments are very welcome.
Thanks a lot.

Hoping to get some opinions.

Plain and simple. I would say 90% of our data stored on the network are PST files created from Outlook. Now when I say 90%, I am talking about hundreds and hundreds of GB, maybe even a TB of data that consists of purely PST files.

What do other companies do to combat people "needing" to save 10 years of email history? I know one of my options is to buy more storage, but I want to know what other options are out there, or what other people are doing.

I've loaded several months of data to Hive using SAS. I have confirmed that everything loaded successfully and can query the data with no issues.
However, when I move to using an Impala editor (local here, Hue/Hadoop) and refresh/update the tables, I get an error when running the following query: SELECT * from data_table LIMIT 1000.

The error is:
Your query has the following error(s):
IllegalStateException: null

It seems that it can't see the table.
Any ideas?

How do I use the rand() function to divide a data set into 3 parts? The randomness is for the purpose of statistical data analysis.
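The question does not name the tool that provides rand(), so here is a minimal Python sketch of the general idea: draw one uniform random number per row and use it to pick one of three parts. The function name split_three_ways and the 300-row example are hypothetical.

```python
import random


def split_three_ways(rows, seed=None):
    """Assign each row to one of three parts using a uniform random draw."""
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    parts = ([], [], [])
    for row in rows:
        r = rng.random()  # uniform in [0, 1), like rand() in many tools
        if r < 1 / 3:
            parts[0].append(row)
        elif r < 2 / 3:
            parts[1].append(row)
        else:
            parts[2].append(row)
    return parts


# Hypothetical data set of 300 row ids.
part_a, part_b, part_c = split_three_ways(range(300), seed=42)
print(len(part_a), len(part_b), len(part_c))
```

With this approach the three parts are only roughly equal in size. If exactly equal parts are needed, a common alternative is to sort the rows by a random value (or shuffle them) and then cut the result into thirds.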
1
Hi

I am putting together a presentation to the business on the pros of creating a 360 customer view using our data. Does anyone have any information I can include that may help please?

Thank you

I've been searching for non-JavaScript-based charts/graphs to display MySQL data. We currently use nvd3, but that is becoming a problem when trying to integrate our software with other products.

I need at least 5 or 6 leads showing the possibilities of creating nice-looking charts/graphs to represent MySQL-based data without needing JS behind it at all. Obviously, we'll have to build the intermediate layer between the charts/graphs and MySQL, but first I'm trying to find out whether there are any such solutions.

I've come across a few HTML5 things, but nothing that is really definitive and truly usable today. I'm looking for any and all alternatives to JS that can show nice charts/graphs.

Hi, I am trying to use something similar to a VLOOKUP in DAX and am not able to get it working.

I've attached an example workbook.

In table "Sheet1" I have a column named "BucketID", which is generated by a formula (from dates being completed or not, giving me a string of 1's and 0's). I am trying to take that string of numbers and look it up in table "BucketID" by looking up the tableID and returning the corresponding text. (This output/formula will be in column "Bucket".)

Can anyone help me?

Attached is the example
Example.xlsx