Solved

ETL using Google App Engine?

Posted on 2015-01-27
Last Modified: 2015-04-14
Our company is seriously considering implementing our data warehouse and business intelligence on Google's Cloud Platform. Datasets would be queried using BigQuery and resulting data would be read through Google Sheets, QlikView, Tableau, etc. The way this works seems rather straightforward, but the more complex part is the ETL process that needs to happen before the data is loaded into BigQuery.
We've been advised that it's possible to use the Google Compute/App Engine to handle our ETL, but there is little information out there in the form of examples or case studies.
I'm wondering if there are experts here who have experience with this technology and can share it. I'm looking for an ETL solution that ideally does not require a lot of manual programming/scripting and runs with minimal user interaction once it's set up. Our system data is all kept in MSSQL 2008 R2 databases, which are currently not cloud-based. Data must be loaded from multiple databases across multiple systems.
I realize this is a broad question with many possible answers and opinions so I'll divide the points over the most useful posts.
Thanks in advance.
Question by:Koen Van Wielink
LVL 11

Expert Comment

by:SThaya
Hi,

I found a few interesting results while searching the net. Please see below.

1. A few third-party ETL tools are available for loading data:
https://cloud.google.com/bigquery/third-party-tools

2. Direct data insert into BigQuery:
http://codereview.stackexchange.com/questions/51828/insert-an-sql-server-table-rows-into-a-bigquery-table-in-one-bloc

3. SSIS ETL component:
http://www.rssbus.com/ssis/
http://www.rssbus.com/ssis/bigquery/download.aspx

I hope this helps you move further.
 
LVL 12

Author Comment

by:Koen Van Wielink
Hi SThaya,

Thanks for the reply.

I found the first point as well, but we'd prefer not to have to buy an expensive 3rd party ETL tool considering that, if we do require a separate tool, we have SSIS available.

I'm fairly sure the suggestion made in the forum you're referencing in point 2 is not possible, as you can only insert data through the Google API, and not as a linked server.
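
To illustrate what inserting "through the Google API" involves, here is a minimal sketch of the request body that BigQuery's tabledata.insertAll method expects, as far as the public REST reference describes it. The helper name build_insert_all_body and the order_id/amount columns are made up for illustration, and the authenticated HTTP call itself is left out.

```python
import json


def build_insert_all_body(rows, key_field):
    """Build a request body in the shape used by tabledata.insertAll.

    rows      -- list of dicts, one per table row (hypothetical data)
    key_field -- column used as insertId so retries can be de-duplicated
    """
    return {
        "kind": "bigquery#tableDataInsertAllRequest",
        "rows": [
            {"insertId": str(row[key_field]), "json": row}
            for row in rows
        ],
    }


# Hypothetical rows standing in for a SQL Server query result
body = build_insert_all_body(
    [{"order_id": 1, "amount": 10.0}, {"order_id": 2, "amount": 25.5}],
    key_field="order_id",
)
print(json.dumps(body, indent=2))
```

The body would then be POSTed to the table's insertAll endpoint by an authenticated client, which is why a linked server cannot write to BigQuery directly.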

The SSIS ETL component you're referring to in point 3 looks interesting, but it bypasses Google Cloud Storage completely. The main drawback I see here is that BigQuery does not support updates/deletes on existing records. If changes are required, the entire table has to be dropped and re-created with the new data. As such, if we have to reload the tables, I'd much rather do this from Cloud Storage than upload huge amounts of data from our server directly into BigQuery each time. At least with Cloud Storage there should be some possibility of keeping track of only the changes in CSV or JSON files.
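
A minimal sketch of that change-tracking idea, assuming each source table carries a last-modified column (here the hypothetical modified_at) and that the extract is written as newline-delimited JSON, one of the file formats BigQuery can load from Cloud Storage. The function name, column names and sample rows are illustrative only; the upload to Cloud Storage and the load job are not shown.

```python
import json
from datetime import datetime


def export_changes(rows, last_loaded, out_path):
    """Write rows modified after last_loaded as newline-delimited JSON.

    Returns the new high-water mark (the latest modified_at written),
    or last_loaded unchanged if nothing was newer.
    """
    high_water = last_loaded
    with open(out_path, "w", encoding="utf-8") as fh:
        for row in rows:
            modified = row["modified_at"]
            if modified <= last_loaded:
                continue  # already loaded in a previous run
            # Serialise the timestamp as 'YYYY-MM-DD HH:MM:SS'
            record = dict(row, modified_at=modified.isoformat(sep=" "))
            fh.write(json.dumps(record) + "\n")
            high_water = max(high_water, modified)
    return high_water


# Hypothetical rows standing in for a SQL Server query result
sample_rows = [
    {"order_id": 1, "amount": 10.0, "modified_at": datetime(2015, 1, 20, 8, 0)},
    {"order_id": 2, "amount": 25.5, "modified_at": datetime(2015, 1, 26, 9, 30)},
    {"order_id": 3, "amount": 7.25, "modified_at": datetime(2015, 1, 27, 14, 15)},
]
```

Calling export_changes(sample_rows, datetime(2015, 1, 25), "changes.json") would write only orders 2 and 3, and the returned timestamp can be stored as the starting point for the next run.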

Does anyone here on EE have first hand experience with BigQuery and, more importantly, an ETL process supporting it?
 
LVL 12

Accepted Solution

by:
Koen Van Wielink earned 0 total points
Just want to update this before requesting to close the question.
We are currently evaluating a tool called Xplenty, a cloud-based ETL system that does not require any on-premise installation. We'll probably also move away from Google BigQuery and use Amazon Redshift instead, although the final decision has yet to be made.
 
LVL 12

Author Closing Comment

by:Koen Van Wielink
No satisfactory answer provided, and our own research has led us to the solution stated in this answer.