Member_2_7966113

Python Code Challenge to Match Schema

Hello Experts, I have a Python task that involves matching schemas.

Basically, I need to modify the columns in the dataset df to match the provided schema and return the first three rows of the dataset.

The steps to be completed are as follows:

Write a function matchSchema(df) that achieves the following:
1. Converts column active to type Boolean
2. Creates the column ‘price’ by converting the column ‘counts’ to type Double and dividing by 100
3. Drops the ‘counts’ column
4. Returns the first three rows of the resulting DataFrame

I have attempted the task with the following script:

import numpy as np
import pandas as pd

df = pd.read_csv(r'D:\matchSchema.csv')

def matchSchema(df):
    df['active'] = df['active'].astype('bool')
    df['price'] = df['cents']/100
    df.drop('cents', axis=1, inplace=True)
    return df.head(3)

matchSchema(df)

However, I'm failing the following checks:
Return Array of correct size
Return Array of rows with correct number of rows
noci

Almost there:

import numpy as np
import pandas as pd

df = pd.read_csv(r'D:\matchSchema.csv')

def matchSchema(df):
    df['active'] = df['active'].astype('bool')
    df['price'] = df['counts']/100
    df.drop('counts', axis=1, inplace=True)
    return df,df.head(3)

(dataset, sample) = matchSchema(df)

print(dataset)
print(sample)

You had everything in place; the only thing missing was returning the right items.
To demonstrate, I added some print statements.

The caller then needs (dataset, sample) = ... to receive the two return values.
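
As a small illustration of that last point, here is a sketch of how a function that returns two values is unpacked by the caller. The helper name split_frame and the sample data are made up for the example; they are not from the original CSV.

```python
import pandas as pd

def split_frame(df):
    # Hypothetical helper: returns the whole frame and its first three rows.
    return df, df.head(3)

# Made-up data standing in for the CSV.
df = pd.DataFrame({"id": [1, 2, 3, 4, 5]})

# "return a, b" returns a single tuple; unpack it into two names.
dataset, sample = split_frame(df)

print(len(dataset))  # number of rows in the full frame
print(len(sample))   # number of rows in the head(3) slice
```

If the grader only expects the first three rows, returning df.head(3) on its own (as in the original attempt) may be what it wants; the tuple form is just one way to get both objects back.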

ASKER

Hi noci,

When I run your code, I get the following error:

KeyError: 'counts'

noci

For some silly reason the sign-on button to download from Databricks doesn't work for me.
(I block various data trackers quite aggressively; possibly the code behind the button comes from Marketo, Facebook, or one of the four other trackers they use.)

So I can't download it.

Anyway, the question says Boolean and your code had 'bool', so maybe try pandas' nullable 'boolean' dtype instead (the alias is lowercase; 'Boolean' with a capital B is not a recognised dtype string).
Like this:
import numpy as np
import pandas as pd

df = pd.read_csv(r'D:\matchSchema.csv')

def matchSchema(df):
    df['active'] = df['active'].astype('boolean')
    df['price'] = df['counts']/100
    df.drop('counts', axis=1, inplace=True)
    return df,df.head(3)

(dataset, sample) = matchSchema(df)

print(dataset)
print(sample)
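
For what it's worth, here is a quick sketch of how the two pandas dtype strings differ. The sample values are made up (I have not seen the real 'active' column), and the nullable 'boolean' dtype assumes pandas 1.0 or later.

```python
import pandas as pd

# Made-up values; the real 'active' column may look different.
active = pd.Series([True, False, None], dtype="object")

# NumPy-backed bool: there is no room for a missing value,
# so the None is silently forced to a plain True/False.
as_bool = active.astype("bool")

# Nullable boolean extension dtype: the missing value stays missing (<NA>).
as_nullable = active.astype("boolean")

print(as_bool.dtype, as_bool.tolist())
print(as_nullable.dtype, as_nullable.isna().tolist())
```

Which of the two the grader accepts is something I can't tell from here; if the column has no missing values, both should hold the same True/False data.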

I can't determine which types are actually used. I can tell you how to return the data in Python, though (IMHO that is the real problem in your code).
The question states 'counts' but your code had 'cents'. I used a CSV that had 'counts' in it, so it balked on 'cents'. If the column in your CSV is actually called 'cents', use that name instead.
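
To make the column-name mismatch concrete, here is a rough pandas sketch of the four steps that checks the column name before using it. The source_col parameter, its default of 'cents', and the sample rows are my assumptions, since I could not see the real file; treat it as a sketch, not as the expected solution.

```python
import pandas as pd

def matchSchema(df, source_col="cents"):
    # source_col is an assumption: use whatever name the CSV header really has.
    if source_col not in df.columns:
        raise KeyError(f"{source_col!r} not found; columns are {list(df.columns)}")
    out = df.copy()  # leave the caller's frame untouched
    out["active"] = out["active"].astype("bool")             # step 1
    out["price"] = out[source_col].astype("float64") / 100   # step 2
    out = out.drop(columns=[source_col])                     # step 3
    return out.head(3)                                       # step 4

# Made-up rows standing in for matchSchema.csv.
df = pd.DataFrame({"active": [1, 0, 1, 1], "cents": [1999, 250, 5, 100]})

result = matchSchema(df)
print(list(df.columns))  # quick way to see which column names are present
print(result)
```

Printing df.columns right after read_csv is usually the fastest way to find out whether the file says 'cents' or 'counts'.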