Solved

ETL using Google App Engine?

Posted on 2015-01-27
Last Modified: 2015-04-14
Our company is seriously considering implementing our data warehouse and business intelligence on Google's Cloud Platform. Datasets would be queried using BigQuery and resulting data would be read through Google Sheets, QlikView, Tableau, etc. The way this works seems rather straightforward, but the more complex part is the ETL process that needs to happen before the data is loaded into BigQuery.
We've been advised that it's possible to use the Google Compute/App Engine to handle our ETL, but there is little information out there in the form of examples or case studies.
I'm wondering if there are experts here who have experience with this technology and who can share their experiences. I'm looking for an ETL solution that ideally does not require a lot of manual programming/scripting and that runs with minimal user interaction once it's set up. Our system data is all kept in MSSQL 2008 R2 databases, which are currently not cloud based. Data must be loaded from multiple databases across multiple systems.
I realize this is a broad question with many possible answers and opinions so I'll divide the points over the most useful posts.
Thanks in advance.
Question by:Koen Van Wielink
4 Comments
 
LVL 11

Expert Comment

by:SThaya
ID: 40574610
Hi,

I found a few interesting results while searching the net. Please see below.

1. A few ETL tools are available for loading data:
https://cloud.google.com/bigquery/third-party-tools

2. Direct data insert into BigQuery:
http://codereview.stackexchange.com/questions/51828/insert-an-sql-server-table-rows-into-a-bigquery-table-in-one-bloc

3. SSIS ETL component:
http://www.rssbus.com/ssis/
http://www.rssbus.com/ssis/bigquery/download.aspx

I hope this helps you move further.
 
LVL 13

Author Comment

by:Koen Van Wielink
ID: 40574671
Hi SThaya,

Thanks for the reply.

I found the first point as well, but we'd prefer not to have to buy an expensive 3rd party ETL tool considering that, if we do require a separate tool, we have SSIS available.

I'm fairly sure the suggestion made in the forum you're referencing in point 2 is not possible, as you can only insert data through the Google API, and not as a linked server.

The SSIS ETL component you're referring to in point 3 looks interesting, but it bypasses Google Cloud Storage completely. The main drawback I see here is that BigQuery does not support updates/deletes on existing records. If changes are required, the entire table has to be dropped and re-created with the new data. As such, if we have to reload the tables, I'd much rather do this from Cloud Storage than upload huge amounts of data from our server directly into BigQuery each time. At least with Cloud Storage there should be some possibility to keep track of only the changes in CSV or JSON files.
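To illustrate the change-tracking idea, here is a minimal sketch in Python, not a tested part of any real pipeline. It assumes each source row carries a `modified_at` timestamp (a hypothetical column name) and writes only the rows changed since the previous run as newline-delimited JSON, which is a format BigQuery can load. The `write_changes` function and the sample rows are made up for illustration; querying MSSQL and uploading the file to Cloud Storage are left out.

```python
import io
import json
from datetime import datetime


def write_changes(rows, last_run, out):
    """Write rows modified after last_run to out as newline-delimited JSON.

    Returns the newest modification time seen, to use as the next watermark.
    """
    watermark = last_run
    for row in rows:
        modified = row["modified_at"]
        if modified <= last_run:
            continue  # unchanged since the previous export
        # Timestamps are not JSON serializable, so store them as ISO strings.
        record = dict(row, modified_at=modified.isoformat())
        out.write(json.dumps(record, sort_keys=True) + "\n")
        watermark = max(watermark, modified)
    return watermark


# Hypothetical sample rows standing in for a query against the source database.
rows = [
    {"id": 1, "name": "alpha", "modified_at": datetime(2015, 1, 20, 8, 0)},
    {"id": 2, "name": "beta", "modified_at": datetime(2015, 1, 26, 9, 30)},
    {"id": 3, "name": "gamma", "modified_at": datetime(2015, 1, 27, 7, 15)},
]

buffer = io.StringIO()
next_watermark = write_changes(rows, datetime(2015, 1, 25), buffer)
print(buffer.getvalue(), end="")
```

Each run would then produce a small change file instead of a full extract, and the returned watermark would be stored for the next run. How those change files get merged into the final table is a separate question that this sketch does not address.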

Does anyone here on EE have first hand experience with BigQuery and, more importantly, an ETL process supporting it?
 
LVL 13

Accepted Solution

by:
Koen Van Wielink earned 0 total points
ID: 40714964
Just want to update this before requesting to close the question.
We are currently evaluating a tool called X-Plenty, a cloud-based ETL system that does not require any on-premise installation. We're probably also going to move away from Google BigQuery and use Amazon Redshift instead, although the final decision has yet to be made.
 
LVL 13

Author Closing Comment

by:Koen Van Wielink
ID: 40722598
No satisfactory answer provided, and our own research has led us to the solution stated in this answer.
