I have been given a massive text file (40 GB) that I need to process in Netezza. I figured I could import the data in pieces (say, 10,000 rows at a time), run my logic against each piece, which filters out a lot of rows, and append the filtered rows to a results table. I've tried the Import Wizard, but I don't see any way of limiting the number of rows. I've also tried the following:
CALL DROP_IF_EXISTS('IMPORTED_DATA');
CREATE TABLE IMPORTED_DATA
(
    MEMBER_ID   VARCHAR(11),
    AGE_BUCKET  VARCHAR(7),
    ZIPCODE     VARCHAR(5),
    CLM_NBR     VARCHAR(10),
    DTL_LN_NBR  VARCHAR(3),
    FST_SRVC_DT DATE,
    LST_SRVC_DT DATE
);
INSERT INTO IMPORTED_DATA
SELECT * FROM
EXTERNAL 'Q:\\Dept\\data\\folder\\huge_file.txt'
USING(
    DELIMITER '|'
    MAXROWS 10
    REMOTESOURCE 'ODBC'
    DATESTYLE 'DMONY'
    MAXERRORS 200
    Y2BASE 2000
    ENCODING 'internal'
);
This might be working, but it's taking forever, so I'm wondering whether MAXROWS is actually being applied. Can I do something like this with the insert statement:
INSERT INTO IMPORTED_DATA
SELECT * FROM
EXTERNAL 'Q:\\Dept\\data\\folder\\huge_file.txt'
USING(
    DELIMITER '|'
    MAXROWS 10
    REMOTESOURCE 'ODBC'
    DATESTYLE 'DMONY'
    MAXERRORS 200
    Y2BASE 2000
    ENCODING 'internal'
) LIMIT 10;
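To make the chunking idea concrete, what I'm ultimately hoping to run is something like the pass below, once per chunk. This is only a sketch: I'm assuming SKIPROWS can be combined with MAXROWS to walk through the file a chunk at a time, and FILTERED_DATA (a results table with the same layout as IMPORTED_DATA) plus the WHERE clause are just placeholders for my real output table and filter logic.

-- One pass: clear the staging table, pull the next chunk of the file into it,
-- then append only the rows that survive the filter.
TRUNCATE TABLE IMPORTED_DATA;

INSERT INTO IMPORTED_DATA
SELECT * FROM
EXTERNAL 'Q:\\Dept\\data\\folder\\huge_file.txt'
USING(
    DELIMITER '|'
    SKIPROWS 20000     -- rows already read on earlier passes: 0, 10000, 20000, ...
    MAXROWS 10000      -- size of this chunk
    REMOTESOURCE 'ODBC'
    DATESTYLE 'DMONY'
    MAXERRORS 200
    Y2BASE 2000
    ENCODING 'internal'
);

INSERT INTO FILTERED_DATA
SELECT *
FROM IMPORTED_DATA
WHERE AGE_BUCKET = '65+';   -- placeholder for the real filtering logic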
Or, if anyone can point me to other ways to read this data in, that would be very helpful. Thanks!