API data as a stream to Google Pub/Sub

Eric Ullmann
I have a public API endpoint from which I pull a JSON file every 30 minutes. Right now I use Python with a pandas DataFrame to pull the file and upload it to a Cloud Storage bucket; from there it is sent to Pub/Sub for processing and loaded into BigQuery (BQ). The problem is that the file name stays the same, and even though I have a GCS text stream to Pub/Sub, once it reads the file it never reads it again, even though the file attributes have changed. My question: can anyone help me with code that pulls from an API web link and streams the data directly to Pub/Sub?

Sample code below:
import pandas as pd
from sodapy import Socrata
import datalab.storage as gcs

# Unauthenticated client for the public Socrata endpoint
client = Socrata("sample.org", None)
results = client.get("xxx")

# Convert to pandas DataFrame
columns = ['segmentid', 'street', '_direction', '_fromst', '_tost', '_length',
           '_strheading', '_comments', 'start_lon', '_lif_lat', 'lit_lon',
           '_lit_lat', '_traffic', '_last_updt']
results_df = pd.DataFrame.from_records(results, columns=columns)

# Send results to GCP as newline-delimited JSON.
# Note: the object name is always 'data.json', so every run overwrites it.
gcs.Bucket('test-temp').item('data.json').write_to(
    results_df.to_json(orient='records', lines=True), 'text/json')
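
If the file-based route is kept, one possible workaround for the "same file name" problem is to give every upload a unique object name, on the assumption that the GCS-to-Pub/Sub stream tracks files by name and so only picks up names it has not seen before. The helper below is a hypothetical sketch (the name timestamped_name and its parameters are not part of any library); it only builds the name and does not touch GCS.

```python
from datetime import datetime, timezone


def timestamped_name(prefix="data", ext="json", now=None):
    """Return an object name such as 'data_20240101T120000Z.json'.

    `now` may be passed in for testing; by default the current UTC time is used.
    """
    now = now or datetime.now(timezone.utc)
    return f"{prefix}_{now.strftime('%Y%m%dT%H%M%SZ')}.{ext}"
```

In the sample code above, this would replace the fixed 'data.json' argument, for example gcs.Bucket('test-temp').item(timestamped_name()).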

David Favor, Fractional CTO
Distinguished Expert 2018

This sounds like a caching problem.

So even though your file attributes are changing, the data consumer appears to have some sort of caching between the file + its code.

I'm unfamiliar with your code stack, so you'll have to go through every layer of code + read the docs + disable any caching done by each layer.
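
Another way to sidestep any file-level caching is the route the question asks about: skip the bucket and publish each record to Pub/Sub straight from the API response. The following is a minimal sketch, not a tested pipeline. It assumes the google-cloud-pubsub and sodapy packages are installed, that credentials are available in the environment, and that client.get returns a list of dicts as in the question; "my-project" and "my-topic" are placeholder names, and encode_records, publish_records and main are hypothetical helpers.

```python
import json


def encode_records(records):
    """Yield one UTF-8 encoded JSON payload per record."""
    for record in records:
        yield json.dumps(record).encode("utf-8")


def publish_records(records, publish):
    """Call publish(payload) once per record and return the list of results."""
    return [publish(payload) for payload in encode_records(records)]


def main(project_id="my-project", topic_id="my-topic"):  # placeholder names
    # Imported here so the helpers above work without these packages installed.
    from google.cloud import pubsub_v1
    from sodapy import Socrata

    client = Socrata("sample.org", None)
    results = client.get("xxx")  # same elided dataset id as in the question

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)

    # publisher.publish returns a future for each message.
    futures = publish_records(
        results, lambda data: publisher.publish(topic_path, data)
    )

    # Block until Pub/Sub has accepted every message.
    for future in futures:
        future.result()
```

Calling main() every 30 minutes from whatever scheduler runs the current script would send one message per record, so the downstream subscriber no longer depends on noticing a changed file.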
