Hadoop

Apache™ Hadoop® is an open-source framework that allows large data sets to be processed and distributed across clusters of commodity computers.


Hi Experts,

Could anybody please guide me on how to load data from an Oracle DB into Hadoop HDFS, and then load the results back into the Oracle DB?

Thanks in advance!
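For what it's worth, one common route is Apache Sqoop, which moves data between relational databases and HDFS over JDBC. A minimal sketch, where the host, service name, user, table names, and paths are all placeholders:

# import the Oracle table into HDFS (all names below are placeholders)
sqoop import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username etl_user --password-file /user/etl/.ora_pass \
  --table SOURCE_TABLE \
  --target-dir /data/source_table \
  -m 4

# ...process the data on the cluster, then push the results back to Oracle
# (the target table must already exist on the Oracle side)
sqoop export \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username etl_user --password-file /user/etl/.ora_pass \
  --table RESULT_TABLE \
  --export-dir /data/result_table \
  -m 4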

Issue with a high number of TCP CLOSE_WAIT socket connections on Hortonworks (HDP 2.6.4) NameNodes and Metastore server.
We frequently see a very high number of CLOSE_WAIT socket connections on the Hadoop servers, and as a result the Hadoop services become unavailable on the NameNode servers. This happens after heavy ingestion of data into the cluster, and I end up having to restart the cluster after rebooting the affected servers.
I tried resetting the values of several TCP attributes on the servers, but this did not solve the problem.
Using lsof | grep CLOSE_WAIT, I can identify the processes holding CLOSE_WAIT socket connections; I killed those processes and tried to restart the Hadoop services, but this did not solve the problem either.
I have been monitoring the servers for the number of CLOSE_WAIT socket connections, and whenever that number keeps rising, it is a symptom that the Hadoop services on the NameNode are going to go down within a couple of minutes.
Any ideas for solving this issue are welcome.
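For diagnosis, a small sketch with standard Linux tools (the process names you see will vary): CLOSE_WAIT means the peer closed the connection but the local process never called close(), so counting the stuck sockets per process points at the service that is leaking them:

# total CLOSE_WAIT sockets on the box
ss -tan state close-wait | wc -l

# which processes hold them, busiest first (PID and command)
lsof -n -iTCP -sTCP:CLOSE_WAIT | awk 'NR>1 {print $2, $1}' | sort | uniq -c | sort -rn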
So I have a user schema like this:

var user_schema = new Schema({
   username:{type:String,required: true, unique : true, trim: true},
   college:{type:String,required: true},
   password:{type:String,required: true, trim: true},
   email:{type:String,required: true, unique : true, trim: true},
   phone:{type:Number,required: true, unique : true, trim: true},
   dp:{type:String},
   tag:{type:String},
   description:{type:String},
   friends:[{type:String}],
   pending:[{type:String}],
   skills:{type:String},
   bucket:[{type:String}]
  });


and my objective is to search all the documents in the collection to find people matching the following conditions:

1. They should not be in the user's "friends" array.
2. They should not be in the user's "pending" array.
3. They should have the same "tag" (a string value) as the user.

So basically, I have to compare the user's fields ("friends", "pending", and "tag") with the fields of all documents in the whole collection.

How do I do this using Mongoose (the Node.js MongoDB library)?
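A minimal sketch of one way to express this with Mongoose, assuming the friends and pending arrays store usernames and that a User model is compiled from the schema above (the model and function names are made up):

var mongoose = require('mongoose');
var User = mongoose.model('User', user_schema);

// me: the already-loaded document of the current user
function findCandidates(me) {
  return User.find({
    tag: me.tag,  // condition 3: same tag as the user
    username: {
      // conditions 1 and 2: not already a friend, not pending,
      // and exclude the user themself
      $nin: me.friends.concat(me.pending, [me.username])
    }
  }).exec();  // returns a Promise of matching documents
}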
My requirement is to store multiple data types in the same column of a Hive table, and also to be able to read that data back; e.g. if one record has an array value for that column and another record has a struct or string value, I should be able to fetch each value accordingly.

I could store the JSON data as Avro in a Hive table with a data type of string for that particular column (since Spark/SQLContext inferred the data type as string for the mixed data types in the same column), but I am not able to operate on that data; I can only read it with a simple select columnname from table.

I have tried using uniontype to load that column's data (string) into another table with uniontype as the data type for that column, but it erred with a mismatch:

Error while compiling statement: FAILED: SemanticException [Error 10044]: line 18:36
Cannot insert into target table because column number/types are different 'test_uniontyp':
Cannot convert column 1 from string to uniontype<array<string>,string,struct<abd:array<struct<ax:string,bx:string>>>>


Any suggestions?
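For what it's worth, a heavily hedged sketch: Hive will not implicitly cast a string into a uniontype, so each inserted value has to be wrapped explicitly with the create_union() UDF, passing the 0-based tag of the branch that matches the value's actual type plus an expression for every branch. The union definition below is copied from the error message; the source table name and the placeholder expressions for the unused branches are assumptions:

INSERT INTO TABLE test_uniontyp
SELECT create_union(
         1,                                  -- tag 1 selects the string branch
         array(cast(null AS string)),        -- branch 0: array<string> placeholder
         columnname,                         -- branch 1: the actual string value
         named_struct('abd', array(named_struct('ax', cast(null AS string),
                                                'bx', cast(null AS string))))  -- branch 2 placeholder
       )
FROM source_table;                           -- source_table is hypothetical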
hi,

How can MySQL work with (load and save data from) Hadoop? Are there any built-in tools for it?

Is it a scalable solution?
hi,

Is there any Hadoop-to-DB2 gateway/proxy scale-out solution for DB2?
hi,

Is there any DB2 product that can get data from Hadoop and store it in a structured format?
hi,

Is there any Oracle product that helps transfer data in and out of Hadoop using the PL/SQL language alone?

And can it also do parallel data processing for that feature?
hi,

Has anyone used PolyBase on MS SQL for Hadoop? Is the scale-out feature of PolyBase working fine? Is load balancing working well?
hi,

Does anyone know how to integrate MariaDB and MongoDB so that they work together well?

How about MariaDB and Hadoop?

hi,

Does anyone know how to integrate MS SQL and MongoDB so that they work together well?

How about MS SQL and Hadoop?
I have been asked to stand up a weighted search appliance for a company. The decision was to use SOLR to create the search tool so they can use the associated REST API for searches and recommendations.

I'm still a beginner in SOLR and have to ask a basic architecture question. I have a table with 220 elements, 130 million records strong, growing by 5 million a year.

Does this become a Hadoop solution, or can this still be done with a single SOLR engine? I need to know which direction to start with so I do this right.

Thanks much.
There is a partitioned Hive table (Table 1) with 3 columns and 1 partition column (batch_date).

I'm trying to execute: INSERT INTO TABLE PARTITION(batch_date='2018-02-22') select column 1, column 2, column 3 from Table 1 where column 1 = "ABC";

It returns zero records, and in HDFS it creates 3 empty files.

Can you please suggest a solution for preventing these small files from being created in HDFS?


Note: Before running the INSERT statements, I set the Hive properties below.

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
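If the goal is to stop Hive from leaving small (or empty) files behind, the merge settings below are the usual levers for this (a sketch; the threshold value is illustrative, and which of the three merge flags matters depends on the execution engine):

set hive.merge.mapfiles=true;                 -- merge outputs of map-only jobs
set hive.merge.mapredfiles=true;              -- merge outputs of map-reduce jobs
set hive.merge.tezfiles=true;                 -- merge outputs when running on Tez
set hive.merge.smallfiles.avgsize=128000000;  -- merge when avg output file size falls below this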
Basically, I would like to create a sample "hello world" program in MapReduce and Scala which can hit the Job History Server.
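A minimal sketch of a word-count job in Scala on the plain Hadoop MapReduce Java API (assumptions: Scala 2.12+, and all class names here are made up for the example). Any MapReduce job that runs to completion on YARN is recorded by the Job History Server, so this qualifies as a "hello world" for it:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// emits (word, 1) for every whitespace-separated token in a line
class TokenMapper extends Mapper[Object, Text, Text, IntWritable] {
  private val one  = new IntWritable(1)
  private val word = new Text()
  override def map(key: Object, value: Text,
                   ctx: Mapper[Object, Text, Text, IntWritable]#Context): Unit =
    value.toString.split("\\s+").filter(_.nonEmpty).foreach { t =>
      word.set(t); ctx.write(word, one)
    }
}

// sums the counts for each word
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    var sum = 0
    values.forEach(v => sum += v.get)
    ctx.write(key, new IntWritable(sum))
  }
}

object HelloWordCount {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "hello-wordcount")
    job.setJarByClass(getClass)
    job.setMapperClass(classOf[TokenMapper])
    job.setReducerClass(classOf[SumReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))     // input dir, e.g. /tmp/in
    FileOutputFormat.setOutputPath(job, new Path(args(1)))   // output dir, e.g. /tmp/out
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}

Once the job completes, it should appear in the Job History Server UI (by default on port 19888).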
Hello Experts,

The following Hive script retrieves data on Hadoop from the HDFS directory '/user/hive/geography'.

I would like to store the results on a local drive called '/hadoop/hdfs'.

Can someone please show me how to modify the script so that it doesn't store the results of the query back to '/user/hive/geography', but instead stores them to '/hadoop/hdfs' (or any local drive)?

The script is as follows:

DROP TABLE IF EXISTS HiveSampleIn; 
CREATE EXTERNAL TABLE HiveSampleIn 
(
 anonid int,
 eprofileclass int,
 fueltypes STRING,
 acorn_category int,
 acorn_group STRING,
 acorn_type int,
 nuts4 STRING,
 lacode STRING,
 nuts1 STRING,
 gspgroup STRING,
 ldz STRING,
 gas_elec STRING,
 gas_tout STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION '/user/hive/geography'; 

DROP TABLE IF EXISTS HiveSampleOut; 
CREATE EXTERNAL TABLE HiveSampleOut 
(

acorn_category int,
acorn_categorycount int )

ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION '/user/hive/geography';


INSERT OVERWRITE TABLE HiveSampleOut
Select 
   acorn_category,
   count(*) as acorn_categorycount 
FROM HiveSampleIn Group by acorn_category;

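One way to do this, as a sketch: Hive's INSERT OVERWRITE LOCAL DIRECTORY writes query results to the local filesystem of the machine the job runs on, so the HiveSampleOut table isn't needed at all for a local copy:

INSERT OVERWRITE LOCAL DIRECTORY '/hadoop/hdfs'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT acorn_category,
       count(*) AS acorn_categorycount
FROM HiveSampleIn
GROUP BY acorn_category;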


Thanks
Hello Experts,

I have created the following Hadoop Hive Script.

The script is attempting to store the results of a query into the following location:

LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/';

However, I keep on getting the following error:

FAILED: ParseException line 9:0 Failed to recognize predicate 'ROW'. Failed rule: 'identifier' in table or column identifier
18/01/30 16:08:06 [main]: ERROR ql.Driver: FAILED: ParseException line 9:0 Failed to recognize predicate 'ROW'. Failed rule: 'identifier' in table or column identifier
org.apache.hadoop.hive.ql.parse.ParseException: line 9:0 Failed to recognize predicate 'ROW'. Failed rule: 'identifier' in table or column identifier



The Hive script is as follows:

DROP TABLE IF EXISTS geography;
CREATE EXTERNAL TABLE geography
(
 anonid INT,
 eprofileclass INT,
 fueltypes STRING,
 acorn_category INT,
 acorn_group STRING,
 acorn_type INT,
 nuts4 STRING,
 lacode STRING,
 nuts1 STRING,
 gspgroup STRING,
 ldz STRING,
 gas_elec STRING,
 gas_tout STRING
)
ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/'
TBLPROPERTIES ("skip.header.line.count" = "1");

Create table acorn_category_frequency
 as
select acorn_category,
 count(*) as acorn_categorycount
from geography
group by acorn_category,
ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/';



Can someone please help figure out where I'm going wrong in the script?
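For what it's worth, a hedged guess at the problem: in a CREATE TABLE ... AS SELECT, the ROW FORMAT and STORED AS clauses belong before the AS, and the trailing comma after group by acorn_category leaves the parser expecting another identifier, which is why it chokes on 'ROW'. A version of the second statement that should parse (the LOCATION is dropped here so Hive writes to its default warehouse path; re-adding it is a separate decision):

CREATE TABLE acorn_category_frequency
ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ','
STORED AS TEXTFILE
AS
SELECT acorn_category,
       count(*) AS acorn_categorycount
FROM geography
GROUP BY acorn_category;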

Thanks
Techies, can someone isolate where I'm dropping the ball on getting the regex matches I'm expecting in this dataflow? My goal is to route the matched versions to one Kafka topic and the unmatched ones to another Kafka topic. Attached is the client.csv test file.

Here's what the data flow looks like, with the regex used in the ExtractText config. NiFi uses Java's flavor of regular expressions.

ExtractFileProcessorNiFiwithRegexclient.csv
Hello Experts,

I have created the following Hadoop Hive HQL script, however, I keep on getting the following error

FAILED: ParseException line 21:44 missing EOF at ',' near ')'
18/01/29 21:37:15 [main]: ERROR ql.Driver: FAILED: ParseException line 21:44 missing EOF at ',' near ')'



The script is as follows:
DROP TABLE IF EXISTS HiveSampleIn;
CREATE EXTERNAL TABLE HiveSampleIn
(
 anonid INT,
 eprofileclass INT,
 fueltypes STRING,
 acorn_category INT,
 acorn_group STRING,
 acorn_type INT,
 nuts4 STRING,
 lacode STRING,
 nuts1 STRING,
 gspgroup STRING,
 ldz STRING,
 gas_elec STRING,
 gas_tout STRING
)
ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/';
tblproperties ("skip.header.line.count"="2");

DROP TABLE IF EXISTS HiveSampleOut; 
CREATE EXTERNAL TABLE HiveSampleOut 
(    
    acorn_category int
    
) 
ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/';

INSERT OVERWRITE TABLE HiveSampleOut
Select 
   acorn_category
   
FROM HiveSampleIn Group by acorn_category;



Any help in fixing this problem will be greatly appreciated.
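A hedged guess at the culprit: the semicolon at the end of the LOCATION line terminates the CREATE TABLE statement early, so the tblproperties ("skip.header.line.count"="2"); line is parsed as a statement of its own and the parser falls over. Dropping that stray semicolon so TBLPROPERTIES stays attached to the CREATE should let the script parse:

LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/'
tblproperties ("skip.header.line.count"="2");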
Hello Community,

The Hive script I have created keeps throwing the following error:

Time taken: 2.634 seconds
FAILED: ParseException line 17:2 missing EOF at 'COLUMN' near ')'
18/01/29 10:29:53 [main]: ERROR ql.Driver: FAILED: ParseException line 17:2 missing EOF at 'COLUMN' near ')'
org.apache.hadoop.hive.ql.parse.ParseException: line 17:2 missing EOF at 'COLUMN' near ')'



Can someone please take a look at the Hive script and let me know where I might be going wrong?

DROP TABLE IF EXISTS HiveSampleIn; 
CREATE EXTERNAL TABLE HiveSampleIn 
(
 anonid int,
 eprofileclass int,
 fueltypes STRING,
 acorn_category int,
 acorn_group STRING,
 acorn_type int,
 nuts4 STRING,
 lacode STRING,
 nuts1 STRING,
 gspgroup STRING,
 ldz STRING,
 gas_elec STRING,
 gas_tout STRING
) COLUMN FORMAT DELIMITED FIELDS TERMINATED BY (',') LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/'; 

DROP TABLE IF EXISTS HiveSampleOut; 
CREATE EXTERNAL TABLE HiveSampleOut 
(    
    acorn_category int
    
) COLUMN FORMAT DELIMITED FIELDS TERMINATED BY (',') LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/sampleout/';

INSERT OVERWRITE TABLE HiveSampleOut
Select 
   acorn_category,
   count(*) as acorn_categorycount 

FROM HiveSampleIn Group by acorn_category

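A guess at the fix, for what it's worth: Hive's DDL spells this clause ROW FORMAT DELIMITED, not COLUMN FORMAT DELIMITED, and the field terminator is a plain quoted string rather than a parenthesized one. The first table's closing line would become:

) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION 'wasb://adfgetstarted@geogstoreacct.blob.core.windows.net/samplein/';

The same change applies to HiveSampleOut, and the final INSERT ... GROUP BY statement also needs a terminating semicolon.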

Cheers

Carlton

Hello Experts,

I would like to run a query on the attached file, but I don't know what type of information is in the file, so I can't write a query against it.

Can someone let me know how to determine what information is included in the file?

Regards

Carlton
VANQ_TRIAD_COLLS_20180118
Hello Community,

I have created my first HQL script (see below) and I can't get any data to appear. I have recently installed the Sandbox; the installation comes with a few sample databases, and I'm using the one called sample_07 to guide me with my own .hql code.

My HQL code is as follows:

CREATE EXTERNAL TABLE mysample
(
 code STRING,
 description STRING,
 total_emp INT,
 salary INT
)
ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/root/music'
TBLPROPERTIES ("skip.header.line.count" = "1");



However, when I run the following code in a Zeppelin notebook, I can see the table, but no data appears:

%jdbc(hive)
select * from mysample limit 14



However, when I run the same code against the sample database called sample_07, both the table and the data appear.


I'm sure there is something very simple that I'm missing.

Can someone please let me know where I'm going wrong?
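One quick thing worth checking first (a guess, reusing the path from the script): LOCATION refers to a path in HDFS, not the local Linux filesystem, and an external table only returns rows if data files actually exist under that path. From a shell on the Sandbox:

# list whatever the external table's LOCATION actually contains in HDFS
hdfs dfs -ls /root/music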
Hello Experts,

I have run an HQL script called samplehive.hql (see attached). However, the script fails with the following error:

FAILED: ParseException line 1:2 cannot recognize input near 'D' 'R' 'O'
18/01/17 20:46:46 [main]: ERROR ql.Driver: FAILED: ParseException line 1:2 cannot recognize input near 'D' 'R' 'O'
org.apache.hadoop.hive.ql.parse.ParseException: line 1:2 cannot recognize input near 'D' 'R' 'O'

I'm very new to Hadoop Hive; can someone take a look at the script and let me know where I'm going wrong?
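A hedged hunch worth ruling out first: when Hive reports single letters such as 'D' 'R' 'O' as separate tokens, the script file is often not plain UTF-8 text, e.g. it was saved as UTF-16 by a Windows editor (which interleaves null bytes) or carries other invisible characters. Two quick checks from the shell:

# both commands just inspect the file's encoding/bytes
file samplehive.hql
hexdump -C samplehive.hql | head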

Thanks
samplehive.txt
Hello,

I am new to Hadoop. I have a question regarding YARN memory allocation. If we have 16GB of memory in a cluster, we can have at least three 4GB containers and keep 4GB for other uses. If a job needs 10GB of RAM, would it use 3 containers, or would it use one container and start consuming the rest of the RAM?
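For context, a sketch of the two yarn-site.xml knobs that govern this (the values are illustrative, not defaults): a single container can never be granted more than the scheduler's per-container maximum, so a 10GB requirement is met by asking for several containers rather than by one container growing beyond its allocation.

<!-- yarn-site.xml; illustrative values -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>12288</value>  <!-- memory YARN may hand out on each node -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>   <!-- hard ceiling for any single container -->
</property>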
Hello,
I am new to Hadoop. When you configure the Hive server and YARN, can we pick any node, or do we need a special node for it? Or can we use the NameNode?
Hello Guys,

We would like to keep the Hadoop prod, dev, and QA environments on standard settings, with configurations kept in sync, even though we have 100+ data nodes in prod and only 8 nodes each in dev and QA.

We need to make sure all of them stay in sync. What is the best practice for keeping them the same?