Hadoop

Apache™ Hadoop® is an open-source framework that allows large data sets to be distributed and processed across clusters of commodity computers.

My Hive external table's location is set to a tmp location. We need to change the location of the Hive external table's data files. How can I do it? Please let me know.

hdfs dfs -ls -R /base/data/hive/landing/dw_transaction/ |grep "^d"

drwxr-xr-x   - grp_scoring_prod grp_scoring          0 2018-11-09 11:00 /base/data/hive/landing/dw_transaction/>
drwxr-xr-x   - grp_scoring_prod grp_scoring          0 2019-04-23 04:28 /base/data/hive/landing/dw_transaction/>/tmp
drwxr-xr-x   - grp_scoring_prod grp_scoring          0 2019-03-23 19:22 /base/data/hive/landing/dw_transaction/>/tmp/hive_hive_2019-03-23_16-04-31_798_8057101057507479986-1
drwxr-xr-x   - grp_scoring_prod grp_scoring          0 2019-03-24 10:59 /base/data/hive/landing/dw_transaction/>/tmp/hive_hive_2019-03-23_16-04-31_798_8057101057507479986-1/-ext-10001
drwxr-xr-x   - grp_scoring_prod grp_scoring          0 2019-03-24 09:40 /base/data/hive/landing/dw_transaction/>/tmp/hive_hive_2019-03-23_16-04-31_798_8057101057507479986-1/_task_tmp.-ext-10000
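
If the goal is to move the data out of that tmp path, one common sequence (a sketch, assuming a hypothetical table name dw_transaction and a destination directory you have already created and moved the files into with hdfs dfs -mv) is to repoint the table in Hive:

-- Repoints the external table; this changes only metastore metadata,
-- it does not move any files.
ALTER TABLE dw_transaction SET LOCATION '/base/data/hive/landing/dw_transaction/data';

-- A partitioned table stores a location per partition, so each partition
-- must be repointed as well, e.g.:
-- ALTER TABLE dw_transaction PARTITION (load_date='2019-04-23')
--   SET LOCATION '/base/data/hive/landing/dw_transaction/data/load_date=2019-04-23';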

In my Cloudera 5.12 cluster, yarn-nm-state is almost 84% full. How do I resolve it? Please let me know.

Kylin is not starting

Here is the error I'm getting:


kafka dependency is /opt/apache-kylin-2.2.0-bin/lib/kafka-clients-1.0.0.jar
Retrieving Spark dependency...
Error: Could not find or load main class exists
ERROR: Unknown error. Please check full log.

We are considering Splunk, ELK, or Apache Metron (Hadoop-based) for SIEM.

Q1:
I've encountered nightmares with a top-end SIEM in the past when querying/retrieving data: it could take days and even crash. Which of the above has excellent, high-speed log management and querying?

Q2:
I was told by an ex-colleague that ArcSight/Splunk require CEF (Common Event Format) or syslog format to be piped to them, as they can't accept any other format. A vendor using QRadar told me QRadar requires syslog/CEF-format inputs too. I have SNMP traps / MIB events (e.g., from Cisco and proprietary devices) that my ex-colleague told me can't be accepted by Splunk/ArcSight, so I would like to know whether any of the three tools above are more readily able to accept SNMP and other event formats.

Q3:
I've heard that ELK lacks built-in policies, which in the long run will be costlier if we bring in consultants to customize: do the other two products have this concern?
Also, Splunk Enterprise is licensed by log volume, and we're concerned that a high volume (possibly 500MB/month) will make the cost high. Weighing customization/setup (professional-services) effort against volume-based licensing costs (I guess we can archive off older logs to reduce the license cost), which of the three is more cost-effective?

Hi,

How can MySQL work with Hadoop, i.e., load data from and save data to it? Are there any built-in tools for it?

Is it a scalable solution?
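
For bulk loads in either direction, Apache Sqoop is the standard tool; it scales out by running the transfer as parallel map tasks. As a SQL-only alternative, recent Hive versions (3.x) include a JDBC storage handler that can mount a MySQL table directly. A rough sketch, with made-up connection details:

-- Assumes Hive 3+ with the JDBC storage handler available; every name and
-- credential below is a placeholder.
CREATE EXTERNAL TABLE mysql_orders (
  order_id INT,
  amount   DOUBLE
)
STORED BY 'org.apache.hive.storage.jdbc.JdbcStorageHandler'
TBLPROPERTIES (
  "hive.sql.database.type" = "MYSQL",
  "hive.sql.jdbc.driver"   = "com.mysql.jdbc.Driver",
  "hive.sql.jdbc.url"      = "jdbc:mysql://mysql-host:3306/shop",
  "hive.sql.dbcp.username" = "hive",
  "hive.sql.dbcp.password" = "secret",
  "hive.sql.table"         = "orders"
);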

Hi,

Is there any Hadoop-to-DB2 gateway/proxy scale-out solution for DB2?

Hi,

Is there any DB2 product that can get data from Hadoop and store it in a structured format?

Hi,

Is there any Oracle product that helps transfer data into and out of Hadoop using only PL/SQL?

And can it also do parallel data processing for that feature?
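
Oracle Big Data SQL (a separately licensed option) exposes Hive/HDFS data as Oracle external tables, queryable from plain SQL and PL/SQL with Oracle-side parallelism. A rough sketch; the names are hypothetical and the exact access-parameter spelling should be checked against your Big Data SQL release:

-- Maps a Hive table (default.sales, hypothetical) into Oracle.
CREATE TABLE sales_ext (
  sale_id NUMBER,
  amount  NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HIVE
  DEFAULT DIRECTORY DEFAULT_DIR
  ACCESS PARAMETERS (com.oracle.bigdata.tablename=default.sales)
)
REJECT LIMIT UNLIMITED;

Sqoop and Oracle Loader for Hadoop cover the bulk-transfer side if Big Data SQL is not an option.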

Hi,

Does anyone use PolyBase on MS SQL for Hadoop? Does PolyBase's scale-out feature work fine? Does its load balancing work well?
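
For reference, PolyBase reaches Hadoop through external tables, and its scale-out groups spread HDFS reads across multiple SQL Server compute nodes. A minimal sketch with made-up names, assuming the PolyBase feature is installed and Hadoop connectivity is enabled:

-- External data source pointing at the Hadoop NameNode (hypothetical host).
CREATE EXTERNAL DATA SOURCE MyHadoop
WITH (TYPE = HADOOP, LOCATION = 'hdfs://namenode:8020');

-- Describes the layout of the files being read.
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

-- The external table can then be queried and joined like any local table.
CREATE EXTERNAL TABLE dbo.SalesHdfs (
    sale_id INT,
    amount  DECIMAL(10, 2)
)
WITH (LOCATION = '/data/sales/',
      DATA_SOURCE = MyHadoop,
      FILE_FORMAT = CsvFormat);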

Hi,

Does anyone know how to integrate MariaDB and MongoDB so that they work together well?

How about MariaDB and Hadoop?
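
On the MongoDB side, MariaDB's CONNECT storage engine can present a collection as an ordinary table, so the two can be joined in SQL. A rough sketch, with connection details as placeholders (assumes the CONNECT plugin is installed):

-- Maps a MongoDB collection (test.customers, hypothetical) as a table.
CREATE TABLE mongo_customers (
  _id  VARCHAR(24),
  name VARCHAR(64)
)
ENGINE=CONNECT TABLE_TYPE=MONGO
CONNECTION='mongodb://localhost:27017'
DBNAME='test' TABNAME='customers';

For Hadoop there is no equivalent native engine; Sqoop-style bulk transfer between MariaDB and HDFS/Hive is the usual route.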

Hi,

Does anyone know how to integrate MS SQL and MongoDB so that they work together well?

How about MS SQL and Hadoop?

I have been asked to stand up a weighted search appliance for a company. The decision was to use Solr to create the search tool so they can use the associated REST API for searches and recommendations.

I'm still a beginner in Solr and have to ask a basic architecture question. I have a table with 220 fields, 130 million records strong, growing by 5 million a year.

Does this become a Hadoop solution, or can this still be done with a single Solr engine? I need to know which direction to start with so I do this right.

Thanks much.
Hello Experts,

The following Hive script retrieves data from HDFS on Hadoop, from the directory '/user/hive/geography'.

I would like to store the results on a local drive called '/hadoop/hdfs'.

Can someone please show me how to modify the script so that it doesn't store the results of the query in '/user/hive/geography', but instead stores them in '/hadoop/hdfs' (or any local drive)?

The script is as follows:

DROP TABLE IF EXISTS HiveSampleIn; 
CREATE EXTERNAL TABLE HiveSampleIn 
(
 anonid int,
 eprofileclass int,
 fueltypes STRING,
 acorn_category int,
 acorn_group STRING,
 acorn_type int,
 nuts4 STRING,
 lacode STRING,
 nuts1 STRING,
 gspgroup STRING,
 ldz STRING,
 gas_elec STRING,
 gas_tout STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION '/user/hive/geography'; 

DROP TABLE IF EXISTS HiveSampleOut; 
CREATE EXTERNAL TABLE HiveSampleOut 
(

acorn_category int,
acorn_categorycount int )

ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION '/user/hive/geography';


INSERT OVERWRITE TABLE HiveSampleOut
Select 
   acorn_category,
   count(*) as acorn_categorycount 
FROM HiveSampleIn Group by acorn_category

Thanks
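
One way to do this, sketched below: Hive's INSERT OVERWRITE LOCAL DIRECTORY writes query results straight to the local filesystem of the machine running Hive, so the second external table isn't needed at all. (The ROW FORMAT clause on directory inserts requires Hive 0.11 or later.)

-- Writes the aggregated results to the local path instead of an HDFS location.
INSERT OVERWRITE LOCAL DIRECTORY '/hadoop/hdfs'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT acorn_category,
       count(*) AS acorn_categorycount
FROM HiveSampleIn
GROUP BY acorn_category;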

Techies, can someone isolate where I'm dropping the ball on getting the regex matches I'm expecting in this data flow? My goal is to route the matched records to one Kafka topic and the unmatched records to another. Attached is the client.csv test file.

Here's what the data flow looks like, with the regex used in the ExtractText config. NiFi uses Java's flavor of regular expressions.

Hello Experts,

I would like to run a query on the attached file, but I don't know what type of information is in the file, in order to run a query on it.

Can someone let me know how to determine what information is included in the file?

Regards

Carlton
VANQ_TRIAD_COLLS_20180118
Hello Community,

I have created my first HQL script (see below) and I can't get any data to appear. I have recently installed the Sandbox. The installation comes with a few sample databases; I'm using the one called sample_07 to guide me with my own .hql code.

My HQL code is as follows:

CREATE EXTERNAL TABLE mysample
(
 code STRING,
 description STRING,
 total_emp INT,
 salary INT
)
ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/root/music'
TBLPROPERTIES ("skip.header.line.count" = "1");

However, when I run the following query in a Zeppelin notebook, I can see the table, but no data appears:

%jdbc(hive)
select * from mysample limit 14

However, when I run the same query against the sample database called sample_07, both the table and the data appear.

I'm sure there is something very simple that I'm missing.

Can someone please let me know where I'm going wrong?
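
One thing worth checking, since the DDL itself looks fine: LOCATION must be an HDFS path, and '/root/music' looks like a local Linux path, so the table may be pointing at an empty (auto-created) HDFS directory rather than at your data files. Hive will show exactly where it is reading from:

-- Prints the resolved location, serde, and table properties.
DESCRIBE FORMATTED mysample;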
Hi Experts.

I'm having trouble configuring Flume to stream data from a website to my HDFS. The tutorials I've read on the Internet, such as TutorialPoint and Hadoop Pravendees, all use the same example: streaming data from Twitter to HDFS via the Twitter Apps API.

Is there any source code in PHP, Java, or ASP.NET to do this without getting a token like that example requires? What I want to do is set up an agent on the website I want to collect data from, and have the data stream to my HDFS architecture.

Thanks for reading; best regards.
I'm in the Business Intelligence Department, but practically speaking we're the Reporting Department, your basic operational type of reports - lists, lists, and more lists.

I'm at an institution of higher learning, and a new project has come up for the Math Department. They want to know relationships between courses, grades, etc.

Examples:

- if someone gets a D in Calc I, what's the likelihood of graduation (with various permutations, like taking Calc I again)?
- what's the likelihood that someone who gets a D in Calc I will get a D or F in Calc II?
- for placing incoming students in Pre-Calc or Calc I, what are the factors that indicate success (such as Verbal SAT)?

So I think I've targeted the right discipline (analytics), but I'm not sure where to take this project.
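
Before committing to a full analytics platform, questions like these can be prototyped as conditional probabilities in plain SQL. A sketch against a hypothetical schema grades(student_id, course, grade) and students(student_id, graduated):

-- Estimates P(graduated | grade = 'D' in Calc I); schema is hypothetical.
SELECT AVG(CASE WHEN s.graduated = 1 THEN 1.0 ELSE 0.0 END) AS grad_rate
FROM grades g
JOIN students s ON s.student_id = g.student_id
WHERE g.course = 'CALC1'
  AND g.grade  = 'D';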
I've loaded several months of data to Hive using SAS. I have confirmed that everything loaded successfully and can query the data with no issues.
However, when I move to an Impala editor (locally here, Hue/Hadoop) and refresh/update the tables, I get an error when running the following query: SELECT * FROM data_table LIMIT 1000.

The error is:
Your query has the following error(s):
IllegalStateException: null

It seems that Impala can't see the table.
Any ideas?
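
A common cause worth ruling out first: Impala caches Hive metastore state, so tables created or loaded outside Impala (here, via SAS/Hive) aren't visible until the metadata is reloaded:

-- Reloads the metastore entry for the table in Impala.
INVALIDATE METADATA data_table;
-- If Impala already knew the table and only the files changed:
REFRESH data_table;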
Hello,

I have just started to work on Big Data Hadoop configuration. I have a quick question regarding the admin NameNode. Does the NameNode need to be clustered? Because if it goes down, there will be no access to the DataNodes.
