[2 days left] What’s wrong with your cloud strategy? Learn why multicloud solutions matter with Nimble Storage.Register Now

x
?
Solved

ETL using Google App Engine?

Posted on 2015-01-27
4
Medium Priority
?
390 Views
Last Modified: 2015-04-14
Our company is seriously considering implementing our data warehouse and business intelligence on Google's Cloud Platform. Datasets would be queried using BigQuery and resulting data would be read through Google Sheets, QlikView, Tableau, etc. The way this works seems rather straightforward, but the more complex part is the ETL process that needs to happen before the data is loaded into BigQuery.
We've been advised that it's possible to use the Google Compute/App Engine to handle our ETL, but there is little information out there in the form of examples or case studies.
I'm wondering if there are experts here who have experience with this technology, and who can share their experiences. I'm looking for an ETL solution that can ideally does not require a lot of manual programming/scripting and should run with minimal user interaction once it's created/set up. Our system data is all kept in MSSQL 2008 R2 databases, which are currently not cloud based. Data must be loaded from multiple databases across multiple systems.
I realize this is a broad question with many possible answers and opinions so I'll divide the points over the most useful posts.
Thanks in advance.
0
Comment
Question by:Koen Van Wielink
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
4 Comments
 
LVL 11

Expert Comment

by:SThaya
ID: 40574610
Hi,

  I got few interesting results while i am searching on net.please find the below.

1. There are few ETL tools available for data load
https://cloud.google.com/bigquery/third-party-tools

2.Direct data insert into Big Query
http://codereview.stackexchange.com/questions/51828/insert-an-sql-server-table-rows-into-a-bigquery-table-in-one-bloc
3.SSIS ETL component :

http://www.rssbus.com/ssis/
http://www.rssbus.com/ssis/bigquery/download.aspx

i hope this will help you to move furthur
0
 
LVL 13

Author Comment

by:Koen Van Wielink
ID: 40574671
Hi SThaya,

Thanks for the reply.

I found the first point as well, but we'd prefer not to have to buy an expensive 3rd party ETL tool considering that, if we do require a separate tool, we have SSIS available.

I'm fairly sure the suggestion made in the forum you're referencing in point 2 is not possible, as you can only insert data through the Google API, and not as a linked server.

The SSIS ETL component you're referring to in point 3 looks interesting, but it bypasses the Google Cloud Storage completely. The main drawback I see here is that BigQuery does not support updates/deletes on existing records. If changes are required, the entire table has to be dropped and re-created with the new data. As such, if we have to reload the tables I'd much rather to this from the cloud storage, rather than having to upload huge amounts of data each time from our server directly into BigQuery. At least with Cloud Storage there should be some possibility to only keep track of the changes in CSV or JSON files.

Does anyone here on EE have first hand experience with BigQuery and, more importantly, an ETL process supporting it?
0
 
LVL 13

Accepted Solution

by:
Koen Van Wielink earned 0 total points
ID: 40714964
Just want to update this before requesting to close the question.
We are currently evaluating a tool called X-Plenty which is a cloud based ETL system which does not require any on-premise installation. We're probably also going to move away from Google BigQuery and use Amazon Redshift instead, although the final decision has yet to be made.
0
 
LVL 13

Author Closing Comment

by:Koen Van Wielink
ID: 40722598
No satisfactory answer provided, and our own research has led us to the solution stated in this answer.
0

Featured Post

VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In the first part of this tutorial we will cover the prerequisites for installing SQL Server vNext on Linux.
Windocks is an independent port of Docker's open source to Windows.   This article introduces the use of SQL Server in containers, with integrated support of SQL Server database cloning.
This Micro Tutorial demonstrates the importance of annotations in Google Analytics and how they should be used to document changes made to a site, Google updates (Ex: Panda & Penguin), marketing campaigns, and any other events that might have contri…
Both in life and business – not all partnerships are created equal. Spend 30 short minutes with us to learn:   • Key questions to ask when considering a partnership to accelerate your business into the cloud • Pitfalls and mistakes other partners…
Suggested Courses

649 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question