Big Data

84

Solutions

4

Articles & Videos

215

Contributors

Big data describes data sets that are so large or complex that traditional data processing applications are inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying and information privacy. The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set.

Hey,

I have many audio files, each an interview between an interviewer and an interviewee. The same person asks the questions in every file, while the people answering are different.

I need to separate the answers out by generating silence over the interview questions. I'm currently doing this by hand in Audacity, but it is extremely time-consuming.

Any help would be greatly appreciated. I am a software developer, but audio is not my area, so code is an option if there isn't a program available.

Thanks
0
hi,

I am reading the introduction to Oracle GoldenGate:

http://www.oracle.com/technetwork/middleware/goldengate/overview/index.html

"

Oracle GoldenGate is a comprehensive software package for real-time data integration and replication in heterogeneous IT environments. The product set enables high availability solutions, real-time data integration, transactional change data capture, data replication, transformations, and verification between operational and analytical enterprise systems. Oracle GoldenGate 12c brings extreme performance with simplified configuration and management, tighter integration with Oracle Database, support for cloud environments, expanded heterogeneity, and enhanced security.

In addition to the Oracle GoldenGate core platform for real-time data movement, Oracle provides the Management Pack for Oracle GoldenGate—a visual management and monitoring solution for Oracle GoldenGate deployments—as well as Oracle GoldenGate Veridata, which allows high-speed, high-volume comparison between two in-use databases.
"

So it is for ETL and replication, but what is Oracle GoldenGate for Big Data? GoldenGate is not for big data, right?

Please share your ideas.
0
Hi Wizards, I think everyone hears about Bitcoin every day now. So how has your experience with it been so far? We have 4-5 free servers; can we use them to mine a few cents? ;-)

Any recommendations for procedures and setup are appreciated. Many thanks as always.
0
This is regular Gmail, not G Suite, with one label.

I only want emails from one sender to reach the inbox:
admin@ee.com

All the other emails are not important.

Is there a Gmail filter that uses the word NOT?
0
Hi All,

I'd like to know what performance suggestions and tweaks apply to a very large VM deployment.

I've got one VM running the Tableau application, which processes data from multiple SQL Server databases and crunches the numbers before presenting them to the executive management team.

The specs:

16x vCPU
112 GB vRAM
1 TB D:\ as Thin Provisioned VMDK on VMFS 5

Somehow it is running slower every month. So what's the best-practice recommendation for deploying such a large VM?

Any tips and suggestions would be greatly appreciated.
0
Could I see the Do Not Call list?

How can individuals know which numbers not to call if they can't see the list?

I am not sure which zone this question should be in, so please add zones.
0
We have a table that stores dates as numbers (double), e.g. 20170417.
We would like to place this as a date into a date field, preferably in the format YYYY-MM-DD.
What's the most efficient way to accomplish this?
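To make the requested conversion concrete, here is a minimal Python sketch. It assumes the value arrives as a float such as 20170417.0 and always encodes YYYYMMDD; the function name `yyyymmdd_to_iso` is hypothetical. The post does not say which database is in use, so the in-database equivalent (typically a cast to string followed by that engine's date-parsing function) is left open.

```python
from datetime import datetime


def yyyymmdd_to_iso(value: float) -> str:
    """Convert a numeric date such as 20170417.0 to 'YYYY-MM-DD'."""
    # Drop the fractional part first, then parse the eight digits as a date.
    digits = str(int(value))
    # strptime raises ValueError for impossible dates such as 20170231.
    return datetime.strptime(digits, "%Y%m%d").date().isoformat()


print(yyyymmdd_to_iso(20170417.0))  # 2017-04-17
```

Parsing through a real date type, rather than slicing the string, also rejects values that are not valid calendar dates.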
0
Hi,
A couple of years ago, our client developed a "Document Management" system for their own use (it has specific business rules).
Currently, they have 10 million documents and 8 TB of information.

They currently have the system running on 2 platforms (both perform slowly):
1. Windows Environment (Windows Server 2012 R2, MS SQL Server 2012 R2 and IIS)
2. Linux (Red Hat Linux 6, MySQL and Apache)

As you can guess, managing this system has become terribly difficult for 2 main reasons:

1. Displaying 'search results' or 'document reports' (lists of documents and properties) takes more than 30 minutes (on employees' computers).
2. Backups have to be done in several steps, and the night is not long enough to make a full backup (on employees' computers).

So they have asked us to improve their system; we are developers.
They have also asked us to propose a new platform for the improved Document Management system.

We have done our research on Google, but we are not satisfied with what seems to be the new platform, so I would like to receive your recommendations or suggestions.

What we initially think is that the following should do the work just fine:
- Amazon Elastic (filesystem)
- Amazon DynamoDB (database)
- Apache Hadoop (web server)
- PHP/Laravel (programming)

Your comments are very welcome.
Thanks a lot.
0
Hoping to get some opinions.

Plain and simple. I would say 90% of our data stored on the network are PST files created from Outlook. Now when I say 90%, I am talking about hundreds and hundreds of GB, maybe even a TB of data that consists of purely PST files.

What do other companies do to combat people "needing" to save 10 years of email history? I know one of my options is buy more storage, but want to know what other options are out there, or what other people are doing.
0
I've loaded several months of data to Hive using SAS. I have confirmed that everything loaded successfully and can query the data with no issues.
However, when I move to using an Impala editor (local here Hue/Hadoop) and refresh/update the tables, I get an error when running the following query: SELECT * from data_table LIMIT 1000.

The error is:
Your query has the following error(s):
IllegalStateException: null

It seems that it can't see the table.
Any ideas?
0
How do I use the rand() function to divide a data set into 3 parts? Randomness for the purpose of statistical data analysis.
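As an illustration of what such a split looks like, here is a minimal Python sketch. It assumes the data set fits in memory and that three near-equal parts are wanted; the function name `split_three_ways` and the fixed seed are hypothetical choices. The post does not say which tool's rand() is meant, so this shows the idea rather than that tool's syntax.

```python
import random


def split_three_ways(rows, seed=42):
    """Shuffle rows reproducibly and deal them into three near-equal parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Slicing with a step of 3 sends each row to exactly one part.
    return shuffled[0::3], shuffled[1::3], shuffled[2::3]


part_a, part_b, part_c = split_three_ways(range(10))
print(len(part_a), len(part_b), len(part_c))  # 4 3 3
```

Shuffling once and then dealing keeps the parts disjoint and nearly equal in size, whereas drawing a separate random number per row only gives roughly equal parts.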
1
I've been searching for non javascript based charts/graphs to display mysql data. We currently use nvd3 but that is becoming a problem when trying to integrate our software with other products.

I need at least 5 or 6 leads showing the possibilities of creating nice looking charts/graphs to represent mysql based data without needing js behind it, at all. Obviously, we'll have to build the intermediate between the charts/graphs and mysql but first trying to find if there are any such solutions.

I've come across a few html5 things but nothing that is really definitive and truly usable today. I'm looking for any and all alternatives to using JS which can show nice charts/graphs.
0
Hi, I am trying to use something similar to a VLOOKUP in DAX and am not able to get it working.

I've attached an example workbook.

In table "Sheet1" I have a column named "BucketID", which is generated from a formula (from dates being completed or not, giving me a string of 1's and 0's). I am trying to take that string of numbers and look it up in table "BucketID", by looking up the tableID and then returning the corresponding text. (This output/formula will be in column "Bucket".)

can anyone help me?

Attached is the example
Example.xlsx
0
I am trying to migrate SQL Server data to Google BigQuery. The table is 285 GB. How do I migrate it?
0
I have a couple of tables in a database that are using the LONG data type. It's a horrible data type and we need to get rid of it. The question is what do I replace it with? I've narrowed it down to LOB, BLOB, or CLOB but the differences between those seem subtle and I can't quite figure out which is the best choice and why. Some of the LONG fields are storing bitmap image data. Others are storing HTML markup text. I'm OK with using a different data type for each of those. I just need to get rid of the LONG.
0
Is it actually possible for a transportation freight broker, or even a manufacturer, to create a predictive analytics tool accurate enough to predict what truckload capacity will cost in any given lane 1 to 6 months out?

For example:

Atlanta, GA to Chicago, IL
Everything is average as far as weight, product and equipment. The current cost for a truckload carrier to haul is $1000. What will this same move cost in 3 months?

My point is that freight hauling involves too many variables, so each time I read about some transportation freight company advertising a rate prediction tool, I ask: is this really possible to predict?
0
I have a 3-node DataStax Cassandra (Community) cluster with a huge amount of data. A few tables contain 3-5 billion records each. I want to delete data older than 90 days from those tables.

The problem is how to run a select query that does not time out. I am currently running the query below:

NOW=$(date -d "-3 month" +"%Y-%m-%d")
select day_ts from table_name where minute_ts < '$NOW' LIMIT 100000 ALLOW FILTERING;


Even if I limit the select query result, it still parses the whole 3-5 billion records and then filters the data.

Please suggest an efficient way to do this.
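One possible direction, sketched here under a strong assumption the post does not confirm: that `day_ts` is the table's sole partition key and holds values like '2017-01-01'. If so, each expired day can be addressed directly instead of filtering across all partitions. The helper name `expired_day_deletes` and the start date are hypothetical, and the snippet only builds the statements; it does not talk to a cluster.

```python
from datetime import date, timedelta


def expired_day_deletes(table, oldest, today, keep_days=90):
    """Yield one DELETE statement per day partition older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    day = oldest
    while day < cutoff:
        # Each statement names a full partition key, so no scan is needed.
        yield f"DELETE FROM {table} WHERE day_ts = '{day.isoformat()}';"
        day += timedelta(days=1)


for statement in expired_day_deletes("table_name", date(2017, 1, 1), date(2017, 4, 5)):
    print(statement)
```

If `day_ts` is not the partition key, or is stored with a different type, the WHERE clause would need to change accordingly.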
0
Hi Experts,

I want to enroll in a big data course. While searching, I found many, such as:
1- Big data schools (BDSCP).
2- EMC Data Science Associate (EMCDSA).
3- Cloudera Certified Associate (CCA).
4- MCSE: Business Intelligence.
Others.....

Which do you recommend I start with, considering these factors:
1- I'm a beginner in the big data field.
2- I have basic programming experience (in Microsoft technologies only).
3- I need the certificate most in demand in the market.
4- It can be studied online, without attending a training center.


Thanks a lot in advance.
Harreni
0
I read that the only way to use MicroStrategy.NET is with VB.NET and aspx pages. Is this still true today?

Thanks
0
I need to learn this tool before my job starts and hope I can get my hands dirty with it.

Is this possible?

Thanks
0
We are migrating data onto a new server and were wondering whether robocopy can do a form of incremental copy.

We started copying data from one server to another, but the copy was interrupted, so we need to run the transfer again and copy only the differences, not the whole lot.

Any advice on robocopy switches would be appreciated.
0
I have a financial project that receives real-time stock data from a data vendor, saves it into a MySQL database, then retrieves the data and sends it to the end user's browser. The client software the vendor provides to receive the stock data is a C/C++ program running on the server. This client can save the data into the MySQL database (it does not have to be MySQL; it could be switched to any other database). To retrieve the data from the database as quickly as possible, is there a framework I can use? I have heard about CES or ESP, and Spark Streaming. Can any of them be used for my project? If not, how can I retrieve only the unread data as soon as it reaches the database? The stock data feed is probably a maximum of about 1000 records a second (my wild guess, might not be correct). See the sample below.

+---------------------+--------+-------------------+-------------+
| insertTime          | symbal | trade_time        | trade_price |
+---------------------+--------+-------------------+-------------+
| 2016-09-15 04:00:00 | AAPL   | 20160915040000017 |      111.70 |
| 2016-09-15 04:00:00 | AAPL   | 20160915040000017 |      111.70 |
| 2016-09-15 04:00:00 | AAPL   | 20160915040000200 |      111.69 |
| 2016-09-15 04:00:00 | AAPL   | 20160915040000200 |      111.69 |
| 2016-09-15 04:00:00 | AAPL   | 20160915040000272 |      111.51 |
| 2016-09-15 04:01:14 | AAPL   | 20160915040113878 |      111.57 |
| 2016-09-15 04:01:14 | AAPL   …
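The "only the unread rows" part of the question can be sketched as a polling loop that remembers the last row it saw. This is a hedged illustration only: it assumes an auto-increment `id` column, which the sample above does not show, and uses an in-memory SQLite table named `ticks` (both hypothetical) as a stand-in for the real database.

```python
import sqlite3


def fetch_unread(conn, last_seen_id):
    """Return rows inserted after last_seen_id and the new high-water mark."""
    rows = conn.execute(
        "SELECT id, symbal, trade_time, trade_price FROM ticks "
        "WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    # Advance the mark only when something new arrived.
    new_mark = rows[-1][0] if rows else last_seen_id
    return rows, new_mark


# In-memory SQLite stands in for the real database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticks (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "symbal TEXT, trade_time TEXT, trade_price REAL)"
)
conn.execute(
    "INSERT INTO ticks (symbal, trade_time, trade_price) VALUES (?, ?, ?)",
    ("AAPL", "20160915040000017", 111.70),
)
rows, mark = fetch_unread(conn, 0)           # first poll returns the one row
rows_again, mark = fetch_unread(conn, mark)  # second poll finds nothing new
print(len(rows), len(rows_again), mark)  # 1 0 1
```

Polling on an indexed, ever-increasing key keeps each query cheap; whether a streaming framework is a better fit depends on details the post does not give.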
0
I am new to RDS but wonder whether developing that skill could count as Big Data. That's the term I see on many .NET Full Stack jobs, so I wonder if RDS is a subset of Big Data.

From a .NET Full Stack perspective, what other tools are the most likely to be an element of Big Data?


Does Entity Framework or NHibernate fit in at all?

Thanks.
0
In the past, with a new PC/laptop, I have always created a D: partition for all my data files to shield them from the Internet.
Is this necessary with Windows 10?
If so, how should I do this?
I have not taken to OneDrive.
I like to have all my data in one place with its subfolders, as I have in the past.
Please advise.
0
hi all,

I am looking for a good course for data science and AI. Some courses teach Python and others the R language; which one is better?

I am focusing on data analysis and AI using MS SQL Server 2016. Can Python do analytics on SQL Server 2016, or must it be R?
0
