Solved

ETL using Google App Engine?

Posted on 2015-01-27
4
300 Views
Last Modified: 2015-04-14
Our company is seriously considering implementing our data warehouse and business intelligence on Google's Cloud Platform. Datasets would be queried using BigQuery and resulting data would be read through Google Sheets, QlikView, Tableau, etc. The way this works seems rather straightforward, but the more complex part is the ETL process that needs to happen before the data is loaded into BigQuery.
We've been advised that it's possible to use the Google Compute/App Engine to handle our ETL, but there is little information out there in the form of examples or case studies.
I'm wondering if there are experts here who have experience with this technology, and who can share their experiences. I'm looking for an ETL solution that can ideally does not require a lot of manual programming/scripting and should run with minimal user interaction once it's created/set up. Our system data is all kept in MSSQL 2008 R2 databases, which are currently not cloud based. Data must be loaded from multiple databases across multiple systems.
I realize this is a broad question with many possible answers and opinions so I'll divide the points over the most useful posts.
Thanks in advance.
0
Comment
Question by:Koen Van Wielink
  • 3
4 Comments
 
LVL 11

Expert Comment

by:SThaya
ID: 40574610
Hi,

  I got few interesting results while i am searching on net.please find the below.

1. There are few ETL tools available for data load
https://cloud.google.com/bigquery/third-party-tools

2.Direct data insert into Big Query
http://codereview.stackexchange.com/questions/51828/insert-an-sql-server-table-rows-into-a-bigquery-table-in-one-bloc
3.SSIS ETL component :

http://www.rssbus.com/ssis/
http://www.rssbus.com/ssis/bigquery/download.aspx

i hope this will help you to move furthur
0
 
LVL 13

Author Comment

by:Koen Van Wielink
ID: 40574671
Hi SThaya,

Thanks for the reply.

I found the first point as well, but we'd prefer not to have to buy an expensive 3rd party ETL tool considering that, if we do require a separate tool, we have SSIS available.

I'm fairly sure the suggestion made in the forum you're referencing in point 2 is not possible, as you can only insert data through the Google API, and not as a linked server.

The SSIS ETL component you're referring to in point 3 looks interesting, but it bypasses the Google Cloud Storage completely. The main drawback I see here is that BigQuery does not support updates/deletes on existing records. If changes are required, the entire table has to be dropped and re-created with the new data. As such, if we have to reload the tables I'd much rather to this from the cloud storage, rather than having to upload huge amounts of data each time from our server directly into BigQuery. At least with Cloud Storage there should be some possibility to only keep track of the changes in CSV or JSON files.

Does anyone here on EE have first hand experience with BigQuery and, more importantly, an ETL process supporting it?
0
 
LVL 13

Accepted Solution

by:
Koen Van Wielink earned 0 total points
ID: 40714964
Just want to update this before requesting to close the question.
We are currently evaluating a tool called X-Plenty which is a cloud based ETL system which does not require any on-premise installation. We're probably also going to move away from Google BigQuery and use Amazon Redshift instead, although the final decision has yet to be made.
0
 
LVL 13

Author Closing Comment

by:Koen Van Wielink
ID: 40722598
No satisfactory answer provided, and our own research has led us to the solution stated in this answer.
0

Featured Post

Simplifying Server Workload Migrations

This use case outlines the migration challenges that organizations face and how the Acronis AnyData Engine supports physical-to-physical (P2P), physical-to-virtual (P2V), virtual to physical (V2P), and cross-virtual (V2V) migration scenarios to address these challenges.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
SQL Server Designer 19 45
SQL Replication question 9 43
Help Required 3 97
optimize stored proc 14 22
For both online and offline retail, the cross-channel business is the most recent pattern in the B2C trade space.
Moving applications to the cloud or switching services to cloud-based ones, is a stressful job.  Here's how you can make it easier.
This Micro Tutorial will demonstrate Google Calendar to monitor updates with top sites, such as Facebook, Google, Twitter, etc. with Marketing News. Each update of Google Calendar can be monitored, correlate dips and spikes in your website traffic, …
This Micro Tutorial will demonstrate common damaging and frequent mistakes I see in most analytic audits. Most of them are campaign tagging mistakes, so this video will break it down into simple steps.

831 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question