I have been given a massive text file (40 GB) that I need to process in Netezza. I figured I could import the data in pieces (say, 10,000 rows at a time), run my logic against each piece, which filters out a lot of rows, and append the surviving rows to a results table (see the per-batch sketch after the code below). I've tried the Import Wizard, but I don't see any way of limiting the number of rows. I've also tried the following:
CALL DROP_IF_EXISTS('IMPORTED_DATA');
CREATE TABLE IMPORTED_DATA
(
MEMBER_ID VARCHAR (11),
AGE_BUCKET VARCHAR (7),
ZIPCODE VARCHAR (5),
CLM_NBR VARCHAR (10),
DTL_LN_NBR VARCHAR (3),
FST_SRVC_DT DATE,
LST_SRVC_DT DATE
);
insert into imported_data
select * from
external 'Q:\\Dept\\data\\folder\\huge_file.txt'
USING(
DELIMITER '|'
MAXROWS 10
REMOTESOURCE 'ODBC'
DATESTYLE 'DMONY'
MAXERRORS 200
Y2BASE 2000
ENCODING 'internal'
);
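For the batching itself, here is roughly the per-batch shape I have in mind. This is only a sketch: I'm assuming Netezza's SKIPROWS option can be combined with MAXROWS to walk the file in slices (I haven't verified that against an ODBC remote source), the 20000 offset is just an example, and FILTERED_DATA plus the WHERE clause are placeholders for my real filtering logic:
-- Batch N: skip the rows earlier batches consumed, then take the next 10000.
insert into imported_data
select * from
external 'Q:\\Dept\\data\\folder\\huge_file.txt'
USING(
DELIMITER '|'
SKIPROWS 20000 -- hypothetical running offset; advance by 10000 each batch
MAXROWS 10000
REMOTESOURCE 'ODBC'
DATESTYLE 'DMONY'
MAXERRORS 200
Y2BASE 2000
ENCODING 'internal'
);
-- Keep only the rows that pass the filter, then clear the staging table.
insert into filtered_data
select * from imported_data
where ZIPCODE = '12345'; -- placeholder for my actual filter
truncate table imported_data;
My worry is that every batch would still re-read the file from the top, so even if this works it may not scale to 40 GB.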
The MAXROWS 10 insert might be working, but it's taking forever, so I'm wondering whether MAXROWS is actually being applied. Can I do something like this with the insert statement instead:
insert into imported_data
select * from
external 'Q:\\Dept\\data\\folder\\huge_file.txt'
USING(
DELIMITER '|'
MAXROWS 10
REMOTESOURCE 'ODBC'
DATESTYLE 'DMONY'
MAXERRORS 200
Y2BASE 2000
ENCODING 'internal'
) limit 10;
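In the meantime, to check whether MAXROWS is being honored at all, I figure I can run a quick count against the external source. This is just a sketch: I'm assuming the transient external-table form accepts SAMEAS to borrow the column layout from IMPORTED_DATA (if not, the columns could be spelled out inline instead):
-- Should return 10 almost immediately if MAXROWS caps the read.
select count(*) from
external 'Q:\\Dept\\data\\folder\\huge_file.txt'
SAMEAS imported_data
USING(
DELIMITER '|'
MAXROWS 10
REMOTESOURCE 'ODBC'
DATESTYLE 'DMONY'
MAXERRORS 200
Y2BASE 2000
ENCODING 'internal'
);
If that comes back quickly with 10, the option is working and the slowness is elsewhere; if it also runs forever, MAXROWS probably isn't limiting the read.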
Or if anyone can point me to other ways to read this data in, that would be very helpful. Thanks