

Big Data

Big data describes data sets that are so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying and information privacy. The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set.


Hello Experts,

I have run an HQL script called samplehive.hql (see attached). However, the script fails with the following error:

FAILED: ParseException line 1:2 cannot recognize input near 'D' 'R' 'O'
18/01/17 20:46:46 [main]: ERROR ql.Driver: FAILED: ParseException line 1:2 cannot recognize input near 'D' 'R' 'O'
org.apache.hadoop.hive.ql.parse.ParseException: line 1:2 cannot recognize input near 'D' 'R' 'O'

I'm very new to Hadoop Hive. Can someone take a look at the script and let me know where I'm going wrong?

Thanks
samplehive.txt
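
A common cause of a ParseException at line 1, column 2 is a hidden character in front of the first keyword: a UTF-8 BOM, smart quotes pasted from a word processor, or the file being saved as UTF-16 (which puts a NUL byte between every letter, so Hive sees 'D' 'R' 'O' as separate tokens). A minimal sketch, in Python, to inspect the first bytes of the script (the filename is the one from the question):

# Minimal sketch: dump the first bytes of the script to expose a UTF-8 BOM,
# UTF-16 NUL padding, or other hidden characters before the first keyword.
with open("samplehive.hql", "rb") as f:
    head = f.read(16)

print(head)  # a clean ASCII file should start with b'DROP' (or similar)
if head.startswith(b"\xef\xbb\xbf"):
    print("UTF-8 BOM found - re-save the file without a BOM")
if b"\x00" in head:
    print("NUL bytes found - the file is probably UTF-16; re-save as UTF-8")

If hidden bytes turn up, re-saving the file as plain UTF-8 (without a BOM) in a code editor usually resolves the error.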

I have been asked to move data dated before 2004. Is there an easy way of doing this, without going through each folder, sorting by created date, and then moving files manually?
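
One possible approach is a small script that walks the tree and moves anything with a timestamp before 2004. A rough sketch in Python; the folder paths are hypothetical, and note that on Windows os.path.getctime() returns the creation time, while on Linux it is the inode change time (use getmtime() there instead):

import os
import shutil
from datetime import datetime

SRC = r"C:\data"          # hypothetical source folder
DST = r"C:\data_pre2004"  # hypothetical destination (keep it outside SRC)
CUTOFF = datetime(2004, 1, 1).timestamp()

os.makedirs(DST, exist_ok=True)
for root, dirs, files in os.walk(SRC):
    for name in files:
        path = os.path.join(root, name)
        # getctime() = creation time on Windows; watch for name collisions,
        # since this flattens the folder structure into DST.
        if os.path.getctime(path) < CUTOFF:
            shutil.move(path, os.path.join(DST, name))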
I'm interested in using Visual Studio in the field of big data and artificial intelligence.

At the moment the latest version of Visual Studio is 2017.
When is the next version due out?
What spec machine is needed for it to run smoothly in terms of processor, RAM, and disk space (and anything else that is relevant)?
I found that with the Express edition, I could not use the StreamWriter class. Is this expected?
So I have a dataset with an account number and a "days past due" value on every observation. For every account number, as soon as the "days past due" column hits a code like "DLQ3", I want to remove the rest of the rows for that account (even if DLQ3 is the first observation for that account).

My dataset looks like :

Observation date    Account num    Days past due
2016-09             200056         DLQ1
2016-09             200048         DLQ2
2016-09             389490         NORM
2016-09             383984         DLQ3 ...

So for account 383984, I want to remove all the rows after 2016-09, as the account is now in default.

In other words, I want to detect when an account hits DLQ3 and, when it does, remove all the rows after the first DLQ3 observation.
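
If the data can be loaded into Python, a pandas sketch of that logic might look like this (the file and column names are assumptions; adjust them to the real dataset). It keeps the first DLQ3 row itself and drops everything after it, per account:

import pandas as pd

# Minimal sketch, assuming columns named obs_date, account_num and dpd.
df = pd.read_csv("accounts.csv")  # hypothetical input file
df = df.sort_values(["account_num", "obs_date"])

# Count, per account, how many DLQ3 rows occur strictly before each row;
# keep only rows with no earlier DLQ3, so the first DLQ3 row itself stays.
is_dlq3 = df["dpd"].eq("DLQ3")
prior_dlq3 = (is_dlq3.groupby(df["account_num"])
                     .transform(lambda s: s.shift(fill_value=False).cumsum()))
df_clean = df[prior_dlq3 == 0]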
Big Data Projects
The pressure to deliver ‘more for less’ is increasing day by day across all industries and business sectors. But with a deep understanding of technologies like big data, your company can deliver even more value.
My business is exploring the option of recoding item codes, as currently it's all over the place. Ideally, going forward, we would like to have only one serial number generated per item, and that serial number would be the same as the item number.

Is it possible, and what impact would it have on the business?

Thanks
Hello,

I am new to Hadoop. I have a question regarding YARN memory allocation. If we have 16 GB of memory in the cluster, we can have at least three 4 GB containers and keep 4 GB for other uses. If a job needs 10 GB of RAM, would it use three containers, or would it use one container and start using the rest of the RAM?
Hello Guys,

We would like to keep our Hadoop prod, dev, and QA environments on standard settings, with their configurations kept in sync, even though we have 100+ data nodes in prod and only 8 nodes each in dev and QA. What is the best practice for keeping them the same?
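
The usual practice is to manage the configuration from one templated source (Ambari/Cloudera Manager configuration groups, or Ansible/Puppet templates in version control) and push it to every environment, keeping per-environment differences such as heap sizes and node counts in a small override layer. As a minimal drift check, a sketch in Python that compares exported config directories from two environments (the paths are hypothetical):

import filecmp

# Minimal drift-detection sketch: compare exported Hadoop config directories
# from two environments and list the files that differ.
PROD = "/tmp/conf_prod"  # hypothetical export of prod's /etc/hadoop/conf
DEV = "/tmp/conf_dev"    # hypothetical export of dev's /etc/hadoop/conf

cmp = filecmp.dircmp(PROD, DEV)
for name in cmp.diff_files:
    print("differs:", name)
for name in cmp.left_only:
    print("only in prod:", name)
for name in cmp.right_only:
    print("only in dev:", name)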
Hi,

What is the difference between MariaDB ColumnStore 1.0 and MS SQL Server SSIS + SSAS?
dear all,
I have video and audio files that I need to segment based on their text.
I need to segment all the files; for example, a single word corresponds to n audio frames and n visual frames (images).
Can anyone help or advise on how I can do this?

Thanks
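
Word-level segmentation of audio against a transcript is normally done with forced alignment (tools such as Gentle or aeneas), which returns start/end times per word; the video frames for each word can then be cut out using those times. As a rough first pass that only approximates boundaries by pauses, a sketch with pydub (the filename is hypothetical):

from pydub import AudioSegment
from pydub.silence import split_on_silence

# Rough sketch: split an audio track at pauses as an approximation of
# word/phrase boundaries. Real word-level segmentation needs forced
# alignment against the transcript (e.g. Gentle or aeneas).
audio = AudioSegment.from_file("recording.wav")  # hypothetical file
chunks = split_on_silence(
    audio,
    min_silence_len=300,             # ms of silence that counts as a boundary
    silence_thresh=audio.dBFS - 16,  # 16 dB below the average level
)
for i, chunk in enumerate(chunks):
    chunk.export("segment_%04d.wav" % i, format="wav")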

What is big data Hadoop? How does it work, and what software is required to run it?
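
In short: Hadoop stores data across many machines (HDFS) and runs computation where the data lives (MapReduce, scheduled by YARN). To run it you need Java plus a Hadoop distribution, either plain Apache Hadoop or a packaged one such as Cloudera or Hortonworks. As a tiny illustration, the classic word count written for Hadoop Streaming, which lets the mapper and reducer be plain Python scripts:

#!/usr/bin/env python
# mapper.py -- emits "word<TAB>1" for every word read from stdin
import sys

for line in sys.stdin:
    for word in line.split():
        print(word + "\t1")

#!/usr/bin/env python
# reducer.py -- sums the counts per word (Hadoop sorts mapper output by key)
import sys

current, total = None, 0
for line in sys.stdin:
    word, count = line.rsplit("\t", 1)
    if word != current:
        if current is not None:
            print(current + "\t" + str(total))
        current, total = word, 0
    total += int(count)
if current is not None:
    print(current + "\t" + str(total))

These run with something like: hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /in -output /out (the exact jar path depends on the distribution).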
Hi,

I am curious if someone knows the best way to set up alerts based on certain keywords for financial filings such as 8-K, 10-K, etc. For example, I want to create an alert such that when the following filing appears on the website and contains a keyword like "PSU", I get an alert: https://www.sec.gov/Archives/edgar/data/1115128/000156459017019148/quot-8k_20170928.htm

Thanks
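
One simple pattern is to poll the filing pages (or EDGAR's RSS feeds) on a schedule and search the text for the keywords. A minimal sketch in Python; the URL and keyword are the ones from the question, and a production version should respect SEC rate limits and identify itself properly in the User-Agent:

import time
import urllib.request

# Minimal polling sketch. In practice you would maintain a list of filing
# URLs to watch (e.g. harvested from EDGAR's RSS feeds).
URL = ("https://www.sec.gov/Archives/edgar/data/1115128/"
       "000156459017019148/quot-8k_20170928.htm")
KEYWORD = "PSU"

def contains_keyword(url, keyword):
    req = urllib.request.Request(
        url, headers={"User-Agent": "alert-bot example@example.com"})
    html = urllib.request.urlopen(req).read().decode("utf-8", errors="ignore")
    return keyword in html

while True:
    if contains_keyword(URL, KEYWORD):
        print("ALERT: %s found in %s" % (KEYWORD, URL))  # swap in email/SMS
        break
    time.sleep(3600)  # poll hourly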
Hello,

When we create datanodes, should we use local disks or SAN disks? Most recommendations favor local disks. Why do we need to have local disks?
Detail data blocks will not query when one of them changes.
I had this question after viewing Advice for vb.net web application structure with code generator - refactoring, rewrite, change ORM?.

Hi Mr. tablaFreak,

Actually, I was looking for a similar code generator that would enable me to create data-intensive ASP.NET web applications in VB.NET, and after reading this article I think this is the best-performing approach for CRUD operations with big data. However, I am really not aware of how to bind class records to write literal HTML in the code-behind, as you mentioned, so kindly provide your code generator along with a few samples that can help with the same.

Your assistance is highly appreciated.
My email is SherifMazar@gmail.com
Hi experts, I'm having trouble sending a file to an .asp server using Ajax, as in the code shown below. I've observed that too many characters can prevent the ASP server from receiving the request; it returns an error saying "The resource you are looking for has been removed, had its name changed, or is temporarily unavailable." My question is: is there another way to force the server to receive big data using Ajax, like the way I use it below? What approach should I use to send big data to an .asp server? Thanks, experts!

 
function iGetPerona(x, code, inv)
{
    var xmlhttp;
    if (window.XMLHttpRequest)
    {   // code for IE7+, Firefox, Chrome, Opera, Safari
        xmlhttp = new XMLHttpRequest();
    }
    else
    {   // code for IE6, IE5
        xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
    }
    xmlhttp.onreadystatechange = function ()
    {
        if (this.readyState == 4 && this.status == 200)
        {
            var msg = this.response;
            alert(msg);
        }
    };
    // Note: GET puts all the data in the URL, which servers cap (IIS defaults
    // to roughly 2 KB of query string); large payloads should be sent with
    // POST in the request body instead. mcCode is assumed to be a global
    // defined elsewhere on the page.
    xmlhttp.open("GET", "SavePersona.asp?a=" + encodeURIComponent(x)
        + "&idb=" + encodeURIComponent(mcCode)
        + "&b=" + encodeURIComponent(code)
        + "&c=" + encodeURIComponent(inv), true);
    xmlhttp.send();
}


In this article, I read "Even in the optimistic scenario, just mining one bitcoin in 2020 would require a shocking 5,500 kWh, or about half the annual electricity consumption of an American household."

https://motherboard.vice.com/en_us/article/aek3za/bitcoin-could-consume-as-much-electricity-as-denmark-by-2020

So, I am trying to understand what exactly is meant by "mining one bitcoin."

Does this mean looking through the entire ledger to trace the history of a single bitcoin?

How large is that ledger, in record count...

Does the BlockChain database format have any query capabilities?

Please tell me what you can, since I find this entire problem very daunting.

Thanks
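
"Mining" is not a search through the ledger: it is the proof-of-work race in which machines repeatedly hash candidate block headers with different nonces until one hash falls below the network's difficulty target, and that brute-force hashing is what consumes the electricity. A toy sketch of the idea in Python (real Bitcoin uses double SHA-256 at a vastly higher difficulty):

import hashlib

# Toy proof-of-work sketch: find a nonce so that the SHA-256 hash of the
# block data starts with N zero hex digits. Each extra digit multiplies the
# expected work by 16, which is where the electricity goes.
def mine(block_data, difficulty):
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256((block_data + str(nonce)).encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

print(mine("example block", 4))  # raise the difficulty and watch it slow down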
By Vadim Tkachenko. In this article we'll look at ClickHouse on its one-year anniversary.

Hi there,

I know it's kind of a ridiculous question, since the Cisco Nexus series is high-end data center hardware and the Cisco SG500X is SMB. But for my home lab I am planning for the future, and a good friend who runs a big data center wanted to sell me some nice Cisco Nexus gear. So I could get it very, very cheap: a few hundred bucks vs. around 1k for the SG500X-24. To be specific, it would be a Nexus 5596UP with a 2248TP expansion.

Would you go for the Nexus or for the SG500X? What are the gotchas with the Nexus?

I know that the SG500X does L3 out of the box. The Nexus 5596UP needs the L3 module and the right license file for it. Also, the Nexus 5596UP can't do 100 Mbit, but I guess that's solved with the 2248TP expansion.

Thanks,
Yves
I'm building a Microsoft Access application with a login authentication feature. I want to avoid the need for users to enter a login and password each time they use the app, so I thought of using biometrics. How can I implement biometric login for a Microsoft Access application?
I'm working on an ad campaign management app. There's a feature where advertisers can assign caps to a campaign based on spending or conversions on a daily, monthly, or lifetime basis, and there can be multiple caps per campaign. As soon as a campaign reaches 80% of a cap we send a notification to all publishers, and once the cap is reached we have to stop the campaign immediately. We're receiving thousands of events per second. Currently I'm querying the reporting table every second, but it's quite inefficient, and sometimes campaigns have already exceeded their caps by the time I detect it. So my question is:

What are the existing efficient programmatic or architectural solutions in industry for handling these kinds of situations?
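
The common pattern is to maintain running counters on the event stream itself, updated and checked as each event arrives (in a shared store like Redis with atomic increments, or in a stream processor such as Kafka Streams or Flink), instead of re-querying the reporting table. A minimal single-process sketch of that idea in Python; the notification hooks are hypothetical:

# Minimal in-process sketch of threshold checking on the event stream.
# In production the counter would live in a shared store such as Redis
# (atomic INCRBY) so every node sees the same totals.
def notify_publishers(campaign_id):
    print("80% of cap reached for", campaign_id)   # hypothetical hook

def stop_campaign(campaign_id):
    print("campaign stopped:", campaign_id)        # hypothetical hook

class CampaignCap:
    def __init__(self, campaign_id, cap):
        self.campaign_id = campaign_id
        self.cap = cap
        self.spent = 0.0
        self.warned = False
        self.stopped = False

    def record(self, amount):
        """Apply one spend event; fire the hooks at 80% and 100% of the cap."""
        if self.stopped:
            return
        self.spent += amount
        if not self.warned and self.spent >= 0.8 * self.cap:
            self.warned = True
            notify_publishers(self.campaign_id)
        if self.spent >= self.cap:
            self.stopped = True
            stop_campaign(self.campaign_id)

cap = CampaignCap("cmp-42", cap=1000.0)
for spend in (300, 300, 250, 200):  # simulated incoming spend events
    cap.record(spend)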
I have a large number of PDF documents from which I need to extract text; the extracted text is used for further processing. I did this for a small subset of documents using the Tesseract API in a linear approach, and I get the required output. However, it takes a very long time when there are many documents.

I tried to use the Hadoop environment's processing capabilities (MapReduce) and storage (HDFS) to solve this issue. However, I am facing problems implementing the Tesseract API in the Hadoop (MapReduce) approach. Because Tesseract converts the files into intermediate image files, I am confused as to how those intermediate image files of the Tesseract API process can be handled inside HDFS.

I have searched and unsuccessfully tried a few options:

    I extracted text from PDFs by extending the FileInputFormat class into my own PdfInputFormat class using Hadoop MapReduce, using Apache PDFBox to extract the text. But for scanned PDFs that contain images, this solution does not give me the required results.

    I found a few answers on the same topic stating that one should use Fuse, or that one should generate the image files locally and then upload them into HDFS for further processing. I am not sure either is the correct approach.

I would like to know about approaches to this.
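
One way to avoid managing Tesseract's intermediate image files in HDFS is to keep the PDF-to-image conversion entirely in memory inside each map task, so only the final text is written out. A single-node sketch of the per-document step in Python using pdf2image and pytesseract (these libraries are an assumption about tooling; they require the poppler and tesseract binaries to be installed):

import pytesseract
from pdf2image import convert_from_path

# Per-document OCR step, kept entirely in memory so no intermediate image
# files need to live in HDFS. Inside a streaming map task you would copy the
# PDF from HDFS to local scratch space, run this, and emit the text.
def ocr_pdf(path):
    pages = convert_from_path(path, dpi=300)  # PIL images, held in memory
    return "\n".join(pytesseract.image_to_string(p) for p in pages)

print(ocr_pdf("scanned_document.pdf"))  # hypothetical input file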
Hey,

I have many audio files that are interviews between an interviewer and an interviewee. The same person asks the questions in each file, while the people answering are different.

I need to separate the answers out by generating silence over the interview questions. I'm currently doing this by hand in Audacity, but it is extremely time-consuming.

Any help would be greatly appreciated. I am a software developer, but audio is not my area, so code is an option if there isn't a program available.

Thanks
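
Once the time ranges of the questions are known, muting them can be scripted. A minimal pydub sketch in Python that overlays silence on given ranges (the ranges here are made up; finding them automatically would need speaker diarization upstream, e.g. pyannote-audio, since it is the same voice asking questions in every file):

from pydub import AudioSegment

# Minimal sketch: replace given time ranges (the interviewer's questions)
# with silence. Exporting to mp3 requires ffmpeg to be installed.
QUESTION_RANGES_MS = [(0, 8000), (45000, 52000)]  # hypothetical ranges

audio = AudioSegment.from_file("interview.mp3")   # hypothetical file
for start, end in QUESTION_RANGES_MS:
    gap = AudioSegment.silent(duration=end - start, frame_rate=audio.frame_rate)
    audio = audio[:start] + gap + audio[end:]
audio.export("interview_answers_only.mp3", format="mp3")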
https://www.google.com/search?biw=1918&bih=974&tbs=dur%3Al&tbm=vid&q=taxi+mafia+-android+-walkthrough+-gameplay+-game+-%22video+game%22+-playstation+-xbox&oq=taxi+mafia+-android+-walkthrough+-gameplay+-game+-%22video+game%22+-playstation+-xbox&gs_l=serp.3...14103.18629.0.19563.19.18.0.0.0.0.113.1362.16j2.18.0....0...1.1.64.serp..1.0.0.KTXCbd4TVOY


Copy that search into a browser: it is a Google video search filtered to videos longer than 20 minutes.

I am looking for a mob documentary about taxi drivers (mob as in Tony Soprano from the TV show "The Sopranos" on HBO).
Please don't completely edit the search to

Tony Soprano taxi cab

just to find Tony Soprano riding in a taxi cab, because this is an algorithm question.

Tony Soprano riding in a taxi cab is an example of a correct answer, but it isn't the only correct answer to the question; please don't edit the answer too much.


All the results are video-game playthroughs:
people filming themselves playing Xbox, PlayStation, or Nintendo.

It seems like all the specific searches return something else.
So this is more of a big data question.

I usually just watch netflix.com because it is easier than finding something specific.
Most people just watch regular TV because TV is easy to turn on.

This question is a puzzle; I am not looking for a link explaining the meaning of the + and - symbols,
so if you have a better picture of Google custom search terms, that is not the answer.
google-puzzle
Note: this question is not homework and is not a puzzle for a job interview -YET.…
