Hadoop

Apache™ Hadoop® is an open-source framework that allows large data sets to be processed and distributed across clusters of commodity computers.


We are considering Splunk, ELK, or Apache Metron (Hadoop-based) for SIEM.

Q1:
I've had nightmares with a top-end SIEM in the past when querying/retrieving data: queries took days and even crashed. Which of the above offers excellent, high-speed log management and querying?

Q2:
An ex-colleague told me that ArcSight/Splunk require events to be piped to them in CEF (Common Event Format) or syslog format, as they can't accept any other format. A vendor using QRadar told me QRadar requires syslog/CEF inputs too. I have SNMP traps / MIB events (e.g., from Cisco and proprietary devices) that my ex-colleague said can't be accepted by Splunk/ArcSight, so I'd like to know whether any of the three tools above can more readily accept SNMP and other event formats.

Q3:
I've heard that ELK lacks built-in policies, which in the long run will be costlier if we engage consultants to customize it: do the other two products share this concern?
Also, Splunk Enterprise is licensed by log volume, and we're concerned that too many logs (can be 500MB/month) will make the cost high. Weighing customization/setup (professional services) effort against licensing costs based on log volume (which I guess we can reduce by archiving off older logs), which of the three is most cost-effective?

hi,

How can MySQL work with / load and save data from Hadoop? Are there any built-in tools for it?

Is it a scalable solution?
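A common approach (not the only one) is Apache Sqoop, which moves data between MySQL and HDFS over JDBC. A minimal sketch, assuming hypothetical host, database, and table names:

# Import a MySQL table into HDFS (4 parallel map tasks)
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username loader --password-file /user/etl/.dbpass \
  --table orders \
  --target-dir /data/orders \
  --num-mappers 4

# Export HDFS data back into a MySQL table
sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username loader --password-file /user/etl/.dbpass \
  --table orders_summary \
  --export-dir /data/orders_summary

Each transfer runs as parallel map tasks on the cluster, so it scales with --num-mappers and the cluster size; the MySQL server itself is usually the bottleneck.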
hi,

Is there any Hadoop-to-DB2 gateway/proxy solution for DB2 that can scale out?
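Not a dedicated gateway, but one hedged option: Apache Sqoop talks to DB2 through its standard JDBC driver and parallelizes the transfer, which gives a degree of scale-out. A sketch with hypothetical connection details:

# Export HDFS data into a DB2 table via the DB2 JDBC (Type 4) driver
sqoop export \
  --connect jdbc:db2://db2host:50000/SAMPLE \
  --driver com.ibm.db2.jcc.DB2Driver \
  --username loader --password-file /user/etl/.dbpass \
  --table STAGING.ORDERS \
  --export-dir /data/orders \
  --num-mappers 8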
hi,

Is there any DB2-side product that can get data from Hadoop and store it in a structured format?
hi,

Is there any Oracle product that helps transfer data in and out of Hadoop using only PL/SQL?

And can it do parallel data processing for that?
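Oracle Big Data SQL (a separately licensed product) is aimed at exactly this: it exposes HDFS data as an external table, so plain SQL/PLSQL can read it and the database parallelizes the scan. A hedged sketch, assuming hypothetical table and path names:

-- External table over an HDFS directory (requires Oracle Big Data SQL)
CREATE TABLE hdfs_orders (
  order_id NUMBER,
  status   VARCHAR2(20)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HDFS
  DEFAULT DIRECTORY DEFAULT_DIR
  LOCATION ('hdfs:/data/orders')
);

-- Queried like any other table, in parallel
SELECT status, COUNT(*) FROM hdfs_orders GROUP BY status;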
hi,

Has anyone used PolyBase on MS SQL Server for Hadoop? Does PolyBase's scale-out feature work fine? Does load balancing work well?
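For anyone comparing notes, this is roughly the setup involved (a hedged sketch with hypothetical names; scale-out additionally needs a PolyBase scale-out group configured across the SQL Server instances):

-- Point SQL Server at the Hadoop cluster
CREATE EXTERNAL DATA SOURCE HadoopCluster WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://namenode:8020'
);

-- Describe the file layout
CREATE EXTERNAL FILE FORMAT CsvFormat WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);

-- Expose an HDFS directory as a queryable table
CREATE EXTERNAL TABLE dbo.Orders_Hdfs (
    order_id INT,
    status   VARCHAR(20)
) WITH (
    LOCATION = '/data/orders/',
    DATA_SOURCE = HadoopCluster,
    FILE_FORMAT = CsvFormat
);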
hi,

Does anyone know how to integrate MariaDB and MongoDB so that they work together well?

How about MariaDB and Hadoop?
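One hedged option: MariaDB's CONNECT storage engine has a MONGO table type that maps a MongoDB collection onto a SQL table (and a JDBC table type that can, in principle, point at Hive for the Hadoop side). A sketch with hypothetical names:

-- Requires the CONNECT storage engine, built with MongoDB support
INSTALL SONAME 'ha_connect';

CREATE TABLE mongo_orders (
  order_id INT,
  status   CHAR(20)
) ENGINE=CONNECT TABLE_TYPE=MONGO
  TABNAME='orders' DBNAME='sales'
  CONNECTION='mongodb://mongohost:27017';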
hi,

Does anyone know how to integrate MS SQL and MongoDB so that they work together well?

How about MS SQL and Hadoop?
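For both halves of this question, PolyBase may cover it: SQL Server has supported Hadoop external tables since 2016, and SQL Server 2019 added MongoDB as an external data source, so both end up queryable with plain T-SQL. A hedged fragment, assuming hypothetical names and an existing database-scoped credential:

-- SQL Server 2019+ PolyBase: MongoDB as an external data source
CREATE EXTERNAL DATA SOURCE MongoSrc WITH (
    LOCATION = 'mongodb://mongohost:27017',
    CREDENTIAL = MongoCred
);

An external table is then created over a collection much as in the PolyBase/Hadoop example above.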
I have been asked to stand up a weighted search appliance for a company. The decision was to use SOLR to create the search tool so they can use the associated REST API for searches and recommendations.

I'm still a beginner in SOLR and have to ask a basic architecture question. I have a table with 220 columns and 130 million records, growing by 5 million a year.

Does this become a Hadoop solution, or can this still be done with a single SOLR engine? I need to know which direction to start with so I do this right.

Thanks much.
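For scale context (a hedged sketch, not a sizing recommendation): 130 million documents is usually SolrCloud territory rather than Hadoop territory; sharding the collection across a few nodes keeps it one Solr system:

# Start Solr in cloud mode (embedded ZooKeeper, fine for evaluation)
bin/solr start -c

# Create a collection split into 4 shards with 2 replicas each
bin/solr create -c records -shards 4 -replicationFactor 2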
Hello Experts,

The following Hive script retrieves data from HDFS on Hadoop, from the directory '/user/hive/geography'.

I would like to store the results on a local drive called '/hadoop/hdfs'.

Can someone please show me how to modify the script so that it doesn't store the results of the query to '/user/hive/geography', but instead stores them to '/hadoop/hdfs' (or any local drive)?

The script is as follows:

DROP TABLE IF EXISTS HiveSampleIn; 
CREATE EXTERNAL TABLE HiveSampleIn 
(
 anonid int,
 eprofileclass int,
 fueltypes STRING,
 acorn_category int,
 acorn_group STRING,
 acorn_type int,
 nuts4 STRING,
 lacode STRING,
 nuts1 STRING,
 gspgroup STRING,
 ldz STRING,
 gas_elec STRING,
 gas_tout STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION '/user/hive/geography'; 

DROP TABLE IF EXISTS HiveSampleOut; 
CREATE EXTERNAL TABLE HiveSampleOut 
(
 acorn_category int,
 acorn_categorycount int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '10' STORED AS TEXTFILE LOCATION '/user/hive/geography';

INSERT OVERWRITE TABLE HiveSampleOut
SELECT
   acorn_category,
   count(*) AS acorn_categorycount
FROM HiveSampleIn GROUP BY acorn_category;


Thanks
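A hedged sketch of one way to do this (assuming '/hadoop/hdfs' exists on the machine where Hive runs): skip the second external table entirely and write the query results straight to a local directory with INSERT OVERWRITE LOCAL DIRECTORY:

-- Write results to the local filesystem instead of a Hive table in HDFS
INSERT OVERWRITE LOCAL DIRECTORY '/hadoop/hdfs'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT acorn_category,
       count(*) AS acorn_categorycount
FROM HiveSampleIn
GROUP BY acorn_category;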

Techies, can someone isolate where I'm dropping the ball on getting the regex matches I'm expecting in this dataflow? My goal is to route the matched FlowFiles to one Kafka topic and the unmatched ones to another. Attached is the client.csv test file.

Here's what the data flow looks like, with the regex used in the ExtractText config. NiFi uses Java's flavor of regular expressions.

[Attached screenshots: ExtractText processor config and NiFi flow with regex; test file: client.csv]
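A common tripwire here (a hedged guess, without seeing the actual config): NiFi's ExtractText processor expects each user-defined property's Java regex to contain a capture group, and a FlowFile only goes to the matched relationship when one of those patterns captures something. A hypothetical property and routing sketch:

# ExtractText user-defined property (hypothetical name/pattern):
#   client.id  =  ^(\d+),
# The parentheses are the capture group; without one, nothing is extracted.
#
# Routing: ExtractText "matched"   -> PublishKafka (topic: clients.matched)
#          ExtractText "unmatched" -> PublishKafka (topic: clients.unmatched)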
Hello Community,

I have created my first HQL script (see below) and I can't get any data to appear. I have recently installed the Sandbox. The installation comes with a few sample databases; I'm using the one called sample_07 to guide me with my own .hql code.

My HQL code is as follows:

CREATE EXTERNAL TABLE mysample
(
 code STRING,
 description STRING,
 total_emp INT,
 salary INT
)
ROW FORMAT DELIMITED
 FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/root/music'
TBLPROPERTIES ("skip.header.line.count" = "1");



However, when I run the following query in a Zeppelin Notebook, I can see the table, but no data appears:

%jdbc(hive)
select * from mysample limit 14

However, when I run the same query against the sample database sample_07, both the table and the data appear.

I'm sure there is something very simple that I'm missing.

Can someone please let me know where I'm going wrong?
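One thing worth ruling out (a hedged guess, since external tables simply return zero rows when their LOCATION is empty): the LOCATION clause refers to an HDFS path, not a local one, so files sitting in /root/music on the local disk won't be seen by the table. Commands to check, assuming the Sandbox shell:

# Does the table's LOCATION exist in HDFS, and does it hold the data files?
hdfs dfs -ls /root/music

# If the data is only on the local filesystem, copy it into HDFS first
hdfs dfs -mkdir -p /root/music
hdfs dfs -put /root/music/*.csv /root/music/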
Hi Experts.

I'm having trouble configuring Flume to stream data from a website to my HDFS. The tutorials I've read on the Internet, such as TutorialsPoint and Hadoop Pravendees, all use the same example: streaming data from Twitter to HDFS via the Twitter Apps API.

Is there any source code (PHP, Java, or ASP.NET) to do this without getting a token like that example does? What I want is to set up an agent on the website I want to collect data from and have the data stream to my HDFS architecture.

Thanks for reading this, best regards.
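One token-free pattern (a sketch with hypothetical names, ports, and paths): run a Flume agent with an HTTP source that the website POSTs its events to, and an HDFS sink; started with flume-ng agent -n a1 -f flume-agent.conf:

# flume-agent.conf
a1.sources  = web
a1.channels = mem
a1.sinks    = hdfs-out

# HTTP source: the website POSTs JSON events to this port
a1.sources.web.type = http
a1.sources.web.bind = 0.0.0.0
a1.sources.web.port = 44444

# Buffer events in memory between source and sink
a1.channels.mem.type = memory
a1.channels.mem.capacity = 10000

# Write events into HDFS, bucketed by day
a1.sinks.hdfs-out.type = hdfs
a1.sinks.hdfs-out.hdfs.path = hdfs://namenode:8020/flume/web-events/%Y-%m-%d
a1.sinks.hdfs-out.hdfs.fileType = DataStream
a1.sinks.hdfs-out.hdfs.useLocalTimeStamp = true

a1.sources.web.channels = mem
a1.sinks.hdfs-out.channel = mem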
I'm in the Business Intelligence Department, but practically speaking we're the Reporting Department: your basic operational type of reports, i.e. lists, lists, and more lists.

I'm at an institution of higher learning, and a new project has come up for the Math Department. They want to know the relationships between courses, grades, etc.

Examples:

- If someone gets a D in Calc I, what's the likelihood of graduation? With various permutations, like taking Calc I again.
- If someone gets a D in Calc I, what's the likelihood of getting a D or F in Calc II?
- For placing incoming students in Pre-Calc or Calc I, what are the factors that indicate success, such as Verbal SAT?

So I think I've targeted the right discipline (Analytics), but I'm not sure where to take this project.
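These are classic classification questions (predict a binary outcome from prior grades and test scores), so logistic regression is a reasonable first tool. A toy Python sketch with hypothetical column names, just to show the shape of the work:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Hypothetical extract: one row per student
df = pd.read_csv("students.csv")  # columns: calc1_grade_points, verbal_sat, retook_calc1, graduated

X = df[["calc1_grade_points", "verbal_sat", "retook_calc1"]]
y = df["graduated"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Estimated probability of graduation for the held-out students
probs = model.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, probs))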
I've loaded several months of data to Hive using SAS. I have confirmed that everything loaded successfully, and I can query the data with no issues.
However, when I move to an Impala editor (locally here, Hue/Hadoop) and refresh/update the tables, I get an error when running the following query: SELECT * FROM data_table LIMIT 1000.

The error is:
Your query has the following error(s):
IllegalStateException: null

It seems that it can't see the table.
Any ideas?
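One thing worth trying (a hedged suggestion, since Impala caches table metadata separately from Hive's metastore): tables created or loaded through Hive aren't visible to Impala until its metadata cache is refreshed.

-- Make Impala re-read the metastore entry for this table
INVALIDATE METADATA data_table;

-- Or, if the table was already known and only its data changed:
REFRESH data_table;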
Hello,

I have just started working on big data Hadoop configuration. I have a quick question regarding the NameNode: does the NameNode need to be clustered? Because if it goes down, there will be no access to the DataNodes.
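For exactly that reason, HDFS supports NameNode High Availability: an active/standby NameNode pair sharing edit logs through a JournalNode quorum, with automatic failover via ZooKeeper. A trimmed hdfs-site.xml sketch with hypothetical host names:

<!-- hdfs-site.xml (trimmed; hypothetical hosts) -->
<property><name>dfs.nameservices</name><value>mycluster</value></property>
<property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>namenode1:8020</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>namenode2:8020</value></property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>
<property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>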
