Solved

Importing very large 1.5Gb xml file into Access

Posted on 2013-05-29
12
363 Views
Last Modified: 2013-06-05
Hi All,

I posted a very similar question last week which @aikimark very helpfully solved, so I am hoping this should be easy for someone...  have tried to modify but unfortunately just am not sure what I'm doing!

I have daily 1-2Gb xml files which need to be processed, the following being the first 20,000 chars of a random one:

<?xml version="1.0"?><Extract><Record><EA>9780029314906</EA><I3>9780029314906</I3><TP>The</TP><TI>Theory of the Novel</TI><AUS><AU>Stevick, Philip</AU></AUS><BCS><BC>DSB</BC><BC>DSK</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19671001</PD><PA>452 pages, black &amp; white illustrations</PA><NP>452</NP><RP>14.99</RP><RI>14.99</RI><RE>14.99</RE><DI>216 x 140 x 25</DI><PU>Simon &amp; Schuster</PU><YP>1967</YP><RSS><RS RC="P">Professional &amp; Vocational</RS><RS RC="UP">Postgraduate, Research &amp; Scholarly</RS><RS RC="UU">Undergraduate</RS></RSS><IU>black &amp; white illustrations</IU><RF>F</RF><WE>571</WE><SG>0</SG></Record><Record><EA>9780065017038</EA><I3>9780065017038</I3><TI>Basic Concepts of Chemistry, the Cell and Tissues</TI><AUS><AU>Bastian, Glenn</AU></AUS><BCS><BC>MFCC</BC><BC>MFCH</BC><BC>PNN</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19931206</PD><PA>144 pages, illustrations, glossary</PA><NP>144</NP><RP>36.99</RP><RI>36.99</RI><RE>3
6.99</RE><DI>289 x 203 x 8</DI><PU>HarperCollins College</PU><YP>1993</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><IU>illustrations, glossary</IU><RF>F</RF><WE>392</WE><SG>1</SG></Record><Record><EA>9780065017076</EA><I3>9780065017076</I3><TI>Cardiovascular System</TI><AUS><AU>Bastian, Glenn</AU></AUS><BCS><BC>MFC</BC><BC>MFG</BC><BC>MJD</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19931230</PD><PA>208 pages, illustrations, glossary</PA><NP>208</NP><RP>34.99</RP><RI>34.99</RI><RE>34.99</RE><DI>290 x 205 x 13</DI><PU>HarperCollins College</PU><YP>1993</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><IU>illustrations, glossary</IU><RF>F</RF><WE>550</WE><SG>1</SG></Record><Record><EA>9780065017090</EA><I3>9780065017090</I3><TP>The</TP><TI>Respiratory System</TI><AUS><AU>Bastian, Glenn</AU></AUS><BCS><BC>MFC</BC><BC>MFG</BC><BC>MJL</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19940801</PD><PA>112 pages, illustrations, gloss
ary</PA><NP>112</NP><RP>28.99</RP><RI>28.99</RI><RE>28.99</RE><DI>290 x 206 x 7</DI><PU>HarperCollins College</PU><YP>1994</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><IU>illustrations, glossary</IU><RF>F</RF><WE>310</WE><SG>1</SG></Record><Record><EA>9780080359410</EA><I3>9780080359410</I3><TI>Enzymes in Synthetic Organic Chemistry</TI><AUS><AU>Wong, C.H.</AU><AU>Whitesides, G. M.</AU></AUS><BCS><BC>PNN</BC><BC>PSBZ</BC></BCS><AV>MD</AV><BI>Hardback</BI><CO>United Kingdom</CO><MP>No</MP><PD>19940617</PD><PA>392 pages, 1</PA><NP>392</NP><RP>34.99</RP><RI>34.99</RI><RE>34.99</RE><DI>229 x 152 x 20</DI><PU>Elsevier Science &amp; Technology</PU><YP>1994</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><SR>Tetrahedron Organic Chemistry</SR><IU>1</IU><DE>Covering the recent development in enzymatic organic synthesis, this text focuses on the use of isolated enzymes. It includes a discussion of the the characteristics of enzymes as catalysts and different types of chemical transformation
s.</DE><RF>F</RF><WE>700</WE><SG>0</SG></Record><Record><EA>9780080425757</EA><I3>9780080425757</I3><TI>Contemporary Issues in Commercial Policy</TI><EDS><ED>Kreinin, Mordechai E.</ED></EDS><BCS><BC>KCLF</BC><BC>KCLT</BC><BC>KJK</BC></BCS><AV>MD</AV><BI>Hardback</BI><CO>United Kingdom</CO><MP>No</MP><PD>19951124</PD><PA>240 pages, Illustrations</PA><NP>240</NP><RP>51.95</RP><RI>51.95</RI><RE>51.95</RE><DI>234 x 156 x 16</DI><PU>Emerald Group Publishing Limited</PU><YP>1995</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><SR>Series in International Business &amp; Economics</SR><IU>Illustrations</IU><DE>Addresses a range of topical issues in commercial policy. This book examines bilateral trade agreements, foreign direct investment strategies, regional and global integration; trade reforms; privatization; capital flows; portfolio diversification, and international and technological competitiveness.</DE><RF>F</RF><WE>516</WE><SG>0</SG></Record><Record><EA>9780120455911</EA><I3>9780120455911</I3><TP>
An</TP><TI>Introduction to Human Evolutionary Anatomy</TI><AUS><AU>Aiello, Leslie C.</AU><AU>Dean, Christopher</AU></AUS><ILS><IL>Cameron, Joanna</IL></ILS><BCS><BC>HD</BC><BC>PSXM</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19900911</PD><PA>608 pages</PA><NP>608</NP><RP>61.99</RP><RI>61.99</RI><RE>61.99</RE><DI>254 x 175 x 32</DI><PU>Elsevier Science Publishing Co Inc</PU><YP>1990</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><DE>An anthropologist and an anatomist have combined their skills in this book to provide the essentials of anatomy and the means to apply these to investigations into hominid form and function.</DE><RF>F</RF><WE>1196</WE><SG>0</SG></Record><Record><EA>9780121821869</EA><I3>9780121821869</I3><TI>Cumulative Subject Index: Volumes 263, 264, 266-289</TI><BCS><BC>GBC</BC><BC>PSBZ</BC></BCS><AV>MD</AV><BI>Hardback</BI><CO>United States</CO><MP>No</MP><PD>19980319</PD><PA>345 pages</PA><NP>345</NP><RP>118.00</RP><RI>118.00</RI><RE>118.00</RE><DI>
229 x 152 x 28</DI><PU>Elsevier Science Publishing Co Inc</PU><YP>1998</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><SR>Methods in Enzymology</SR><DE>Includes a cumulative subject index for "Methods in Enzymology" that covers volumes 263, 264, and 266-289. This volume contains subject index and the contents of each volume indexed and a contributor's index.</DE><RF>F</RF><WE>821</WE><SG>0</SG></Record><Record><EA>9780124019409</EA><I3>9780124019409</I3><TI>Lasers and Optical Fibers in Medicine</TI><AUS><AU>Katzir, Abraham</AU></AUS><BCS><BC>MBG</BC><BC>MQW</BC><BC>TTB</BC></BCS><AV>MD</AV><BI>Hardback</BI><CO>United States</CO><MP>No</MP><PD>19931117</PD><PA>317 pages, glossary, bibliography, index</PA><NP>317</NP><RP>145.00</RP><RI>145.00</RI><RE>145.00</RE><DI>229 x 152 x 22</DI><PU>Elsevier Science Publishing Co Inc</PU><YP>1993</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><SR>Physical Techniques in Biology and Medicine</SR><IU>glossary, bibliography, index</IU><DE>Presents a 
comprehensive examination of lasers and optical fibres in an hierarchical, three-tier system. Each chapter is divided into three basic sections: the fundamentals, the principles and the advances sections.</DE><RF>F</RF><WE>669</WE><SG>0</SG></Record><Record><EA>9780124912526</EA><I3>9780124912526</I3><TI>Trace Elements in Human and Animal Nutrition</TI><AUS><AU>Underwood, E.J.</AU></AUS><EDS><ED>Mertz, Walter</ED></EDS><BCS><BC>PSVW7</BC></BCS><AV>MD</AV><BI>Hardback</BI><CO>United States</CO><MP>No</MP><PD>19860421</PD><PA>499 pages, illustrations</PA><NP>499</NP><RP>150.00</RP><RI>150.00</RI><RE>150.00</RE><DI>229 x 152 x 33</DI><EI>5 Rev ed</EI><PU>Elsevier Science Publishing Co Inc</PU><YP>1986</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><SR>Trace Elements in Human &amp; Animal Nutrition S.</SR><IU>illustrations</IU><DE>Includes chapters on such subjects as 'Methodology of Trace Element Research' and 'Quality Assurance for Trace Element Analysis'. This title provides the discussion of lit
hium and aluminum.</DE><RF>F</RF><WE>826</WE><SG>0</SG></Record><Record><EA>9780125870559</EA><I3>9780125870559</I3><TI>Allelopathy</TI><AUS><AU>Rice, Elroy L.</AU></AUS><BCS><BC>PSTS</BC><BC>RNC</BC></BCS><AV>MD</AV><BI>Hardback</BI><CO>United States</CO><MP>No</MP><PD>19840301</PD><PA>368 pages, illustrations</PA><NP>368</NP><RP>210.00</RP><RI>210.00</RI><RE>210.00</RE><DI>229 x 152 x 28</DI><EI>2 Rev ed</EI><PU>Elsevier Science Publishing Co Inc</PU><YP>1984</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><SR>Physiological Ecology</SR><IU>illustrations</IU><RF>F</RF><WE>880</WE><SG>0</SG></Record><Record><EA>9780130302069</EA><I3>9780130302069</I3><TP>A</TP><TI>European Introduction to Financial Accounting</TI><AUS><AU>Alexander, David</AU><AU>Nobes, Christopher</AU></AUS><BCS><BC>1QFE</BC><BC>KFCF</BC><BC>LNP</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19940101</PD><PA>480 pages</PA><NP>480</NP><RP>56.99</RP><RI>56.99</RI><RE>56.99</RE><DI>233 x 172 x 27</DI><P
U>Pearson Education (US)</PU><YP>1994</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><DE>Providing a first course in financial accounting in the context of the rules of the EC and the International Accounting Standards Committee, this text covers the principles of financial accounting. It discusses important international aspects and presents significant international comparisons.</DE><RF>F</RF><WE>818</WE><SG>1</SG></Record><Record><EA>9780130302229</EA><I3>9780130302229</I3><TP>The</TP><TI>Essence of Change</TI><AUS><AU>Clarke, Liz</AU></AUS><BCS><BC>KJU</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19940301</PD><PA>216 pages, Illustrations</PA><NP>216</NP><RP>47.99</RP><RI>47.99</RI><RE>47.99</RE><DI>234 x 161 x 13</DI><PU>Pearson Education (US)</PU><YP>1994</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><IU>Illustrations</IU><DE>This work is part of a series which analyzes approaches to key topics in good business practice. Arguing that change in bu
siness today is inevitable, especially in response to environmental pressures, this book is an introduction to managing change effectively for competitive advantage.</DE><RF>F</RF><WE>356</WE><SG>1</SG></Record><Record><EA>9780131659452</EA><I3>9780131659452</I3><TI>Computer Science</TI><ST>A Modern Introduction</ST><AUS><AU>Goldschlager, Les</AU><AU>Lister, A.M.</AU></AUS><BCS><BC>UY</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19880101</PD><PA>344 pages, black &amp; white illustrations</PA><NP>344</NP><RP>64.99</RP><RI>64.99</RI><RE>64.99</RE><DI>229 x 152 x 20</DI><EI>2 Rev ed</EI><PU>Pearson Education (US)</PU><YP>1988</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><IU>black &amp; white illustrations</IU><DE>'Computer Science: A Modern Introduction' provides an introductory overview of the discipline of computer science, using the notion of algorithms as the unifying concept.</DE><RF>F</RF><WE>502</WE><SG>0</SG></Record><Record><EA>9780132033657</EA><I3>9780132
033657</I3><TI>Strategic Business Marketing</TI><AUS><AU>Chisnall, Peter M.</AU></AUS><BCS><BC>KJC</BC><BC>KJK</BC><BC>KJS</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19941202</PD><PA>520 pages, black &amp; white illustrations</PA><NP>520</NP><RP>53.99</RP><RI>53.99</RI><RE>53.99</RE><DI>235 x 178 x 0</DI><EI>3 Rev ed</EI><PU>Pearson Education (US)</PU><YP>1994</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><IU>black &amp; white illustrations</IU><DE>This text offers information about the nature and techniques of marketing applied to industrial products and services, businesses and public sector undertakings. It includes new material on internationalism, the growth in regionalism, the EU, Pacific Basin countries, NAFTA and Eastern Europe.</DE><RF>F</RF><WE>1459</WE><SG>1</SG></Record><Record><EA>9780132848787</EA><I3>9780132848787</I3><TP>The</TP><TI>Essence of Effective Communication</TI><AUS><AU>Ludlow, Ron</AU><AU>Panton, Fergus</AU></AUS><BCS><BC>KJM</BC><BC>K
JP</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19920101</PD><PA>160 pages, black &amp; white illustrations</PA><NP>160</NP><RP>54.99</RP><RI>54.99</RI><RE>54.99</RE><DI>233 x 156 x 10</DI><PU>Pearson Education (US)</PU><YP>1992</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><IU>black &amp; white illustrations</IU><DE>Managing relationships with others is a major part of a managers' job in an organization. Discussing the importance of effective communication, this work also gives advice on developing your own communication skills. An ELBS/LPBB edition is available.</DE><RF>F</RF><WE>284</WE><SG>1</SG></Record><Record><EA>9780132853705</EA><I3>9780132853705</I3><TP>The</TP><TI>Essence of Women in Management</TI><AUS><AU>Vinnicombe, Susan</AU></AUS><BCS><BC>JFSJ1</BC><BC>KJM</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19950719</PD><PA>176 pages, illustrations</PA><NP>176</NP><RP>41.99</RP><RI>41.99</RI><RE>41.99</RE><DI>234 x 156 x 11<
/DI><PU>Pearson Education (US)</PU><YP>1995</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><IU>illustrations</IU><DE>Examines the rationale for expanding opportunities for women to achieve a more equal representation in management positions. The book explores research on the status of women managers in Europe, analyses female roles in organisations, looks at the balance between work and family and illuminates the underlying issues of power and powerlessness.</DE><RF>F</RF><WE>282</WE><SG>1</SG></Record><Record><EA>9780135434062</EA><I3>9780135434062</I3><TI>Making Management Decisions</TI><AUS><AU>Cooke, Steve</AU><AU>Slack, Nigel</AU></AUS><BCS><BC>KJM</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19910401</PD><PA>352 pages, black &amp; white illustrations</PA><NP>352</NP><RP>73.99</RP><RI>73.99</RI><RE>73.99</RE><DI>234 x 172 x 20</DI><EI>2 Rev ed</EI><PU>Pearson Education (US)</PU><YP>1991</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><IU>black &amp
; white illustrations</IU><DE>A long-time market leader in this discipline, Slack has set the standards in Operations Management which other textbooks seek to emulate. Expert authorship, an engaging writing style, and an interesting collection of cases combine to communicate the importance of managing operations and processes within a successful organisation.</DE><RF>F</RF><WE>608</WE><SG>1</SG></Record><Record><EA>9780138968380</EA><I3>9780138968380</I3><TI>Computer Hardware and Data Communications</TI><AUS><AU>Goupille, P.A.</AU></AUS><BCS><BC>TJFC</BC><BC>UKS</BC><BC>UYF</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19930201</PD><PA>352 pages</PA><NP>352</NP><RP>49.99</RP><RI>49.99</RI><RE>49.99</RE><DI>235 x 173 x 0</DI><PU>Pearson Education (US)</PU><YP>1993</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><DE>This introductory textbook covers basic logic and architecture and relates these to real computer designs. Taking a unique approach the book treats a wide 
range of peripheral devices and covers data communications and local area networks.</DE><RF>F</RF><WE>988</WE><SG>0</SG></Record><Record><EA>9780139503955</EA><I3>9780139503955</I3><TI>Water Supply Byelaws Guide</TI><AUS><AU>White, S.F.</AU><AU>Mays, G.D.</AU></AUS><BCS><BC>RNH</BC><BC>TQS</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19900501</PD><PA>264 pages</PA><NP>264</NP><RP>41.99</RP><RI>41.99</RI><RE>41.99</RE><DI>194 x 130 x 0</DI><EI>2 Rev ed</EI><PU>Pearson Education (US)</PU><YP>1990</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><RF>F</RF><WE>741</WE><SG>1</SG></Record><Record><EA>9780201416794</EA><I3>9780201416794</I3><TI>Programming with Top Speed Modula-2</TI><AUS><AU>Cornelius, Barry John</AU></AUS><BCS><BC>UMX</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19910701</PD><PA>624 pages, Ill.</PA><NP>624</NP><RP>41.99</RP><RI>41.99</RI><RE>41.99</RE><DI>230 x 172 x 31</DI><PU>Pearson Education (US)</PU><YP>1991</YP><RSS><R
S RC="P">Professional &amp; Vocational</RS></RSS><IU>Ill.</IU><DE>This text is an introduction to the Modula-2 language, emphasising good programming style and the importance of defining modules for abstract data types and abstract state machines.</DE><RF>F</RF><WE>1751</WE><SG>1</SG></Record><Record><EA>9780201416961</EA><I3>9780201416961</I3><TI>CASE Method</TI><ST>Entity Relationship Modelling</ST><PTS><PT PartNumber="1">Entity Relationship Modelling</PT></PTS><AUS><AU>Barker, Richard</AU><AU>Longman, Cliff</AU></AUS><BCS><BC>UGK</BC><BC>UMZ</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19900101</PD><PA>240 pages, black &amp; white illustrations</PA><NP>240</NP><RP>65.99</RP><RI>65.99</RI><RE>65.99</RE><DI>249 x 189 x 15</DI><PU>Pearson Education (US)</PU><YP>1990</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><IU>black &amp; white illustrations</IU><RF>F</RF><WE>492</WE><SG>1</SG></Record><Record><EA>9780202260860</EA><I3>9780202260860</I3><TP>The</TP><TI>Other 
23 Hours</TI><ST>Child Care Work with Emotionally Disturbed Children in a Therapeutic Milieu</ST><AUS><AU>Trieschman, Albert E.</AU><AU>Whittaker, James</AU><AU>Brendtro, Larry K.</AU></AUS><BCS><BC>MJW</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19621231</PD><PA>258 pages, black &amp; white illustrations</PA><NP>258</NP><RP>27.50</RP><RI>27.50</RI><RE>27.50</RE><DI>216 x 140 x 14</DI><PU>Transaction Publishers</PU><YP>1962</YP><RSS><RS RC="P">Professional &amp; Vocational</RS><RS RC="UP">Postgraduate, Research &amp; Scholarly</RS><RS RC="UU">Undergraduate</RS></RSS><IU>black &amp; white illustrations</IU><RF>F</RF><WE>332</WE><SG>0</SG></Record><Record><EA>9780273625216</EA><I3>9780273625216</I3><TP>The</TP><TI>Investors Chronicle A-Z of Investment</TI><ST>Essential Guide to Tools, Terms and Techniques</ST><AUS><AU>Sefton, Caroline</AU></AUS><BCS><BC>KFFM</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United Kingdom</CO><MP>No</MP><PD>19961028</PD><PA>272 pages, illustrations</PA
><NP>272</NP><RP>19.99</RP><RI>19.99</RI><RE>19.99</RE><DI>216 x 139 x 10</DI><PU>Pearson Education Limited</PU><YP>1996</YP><RSS><RS RC="P">Professional &amp; Vocational</RS></RSS><IU>illustrations</IU><DE>Breaks investment terms, tools and techniques into easily digestible mouthfuls, providing strong, practical information, which will enable you to make sound investment decisions.</DE><RF>F</RF><WE>358</WE><SG>0</SG></Record><Record><EA>9780300105261</EA><I3>9780300105261</I3><TP>The</TP><TI>Psychoanalytic Theory of Greek Tragedy</TI><AUS><AU>Alford, C., Fred</AU></AUS><BCS><BC>DD</BC><BC>JM</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19921118</PD><PA>232 pages, black &amp; white illustrations</PA><NP>232</NP><RP>20.50</RP><RI>20.50</RI><RE>20.50</RE><DI>229 x 152 x 13</DI><PU>Yale University Press</PU><YP>1992</YP><RSS><RS RC="G">General (US: Trade)</RS></RSS><IU>black &amp; white illustrations</IU><RF>F</RF><WE>347</WE><SG>0</SG></Record><Record><EA>9780300105292</EA><I3>9
780300105292</I3><TI>Beethoven in German Politics, 1870-1989</TI><AUS><AU>Dennis, David, B.</AU></AUS><BCS><BC>1DFG</BC><BC>HBJD</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19960417</PD><PA>264 pages, black &amp; white illustrations</PA><NP>264</NP><RP>16.95</RP><RI>16.95</RI><RE>16.95</RE><DI>229 x 152 x 15</DI><PU>Yale University Press</PU><YP>1996</YP><RSS><RS RC="G">General (US: Trade)</RS></RSS><IU>black &amp; white illustrations</IU><RF>F</RF><WE>392</WE><SG>0</SG></Record><Record><EA>9780300105315</EA><I3>9780300105315</I3><TI>Journey of Purpose</TI><ST>Reflections on the Presidency, Multiculturalism, and Third Parties</ST><AUS><AU>Tsongas, Paul E.</AU></AUS><BCS><BC>1KBB</BC><BC>JPH</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19960214</PD><PA>124 pages, black &amp; white illustrations</PA><NP>124</NP><RP>14.50</RP><RI>14.50</RI><RE>14.50</RE><DI>203 x 127 x 7</DI><PU>Yale University Press</PU><YP>1996</YP><RSS><RS RC="P">Professional &am
p; Vocational</RS><RS RC="UP">Postgraduate, Research &amp; Scholarly</RS><RS RC="UU">Undergraduate</RS></RSS><SR>Castle Lectures Series in Ethics, Politics &amp; Economics</SR><IU>black &amp; white illustrations</IU><RF>F</RF><WE>143</WE><SG>0</SG></Record><Record><EA>9780300105407</EA><I3>9780300105407</I3><TP>The</TP><TI>Fate of Marxism in Russia</TI><AUS><AU>Yakovlev, Alexander</AU></AUS><TRS><TR>Fitzpatrick, Catherine, A.</TR></TRS><BCS><BC>JP</BC></BCS><AV>MD</AV><BI>Paperback</BI><CO>United States</CO><MP>No</MP><PD>19931001</PD><PA>276 pages, black &

Open in new window


The format is basically:

<?xml version="1.0"?>
    <Extract>
        <Record>
            <XX>Variable text/numerical values, up to many 1000 chars long</XX>
            <YY>...</YY>
            <ZZ>...</ZZ>
            <AA>...</AA>
        </Record>
        <Record>
            <XX>...</XX>
            <YY>...</YY>
            <ZZ>...</ZZ>
            <AA>...</AA>
        </Record>
    </Extract>

Open in new window


So pretty straightforward. Ideally it would be best to get this into a DOMDocument and use the xml structure to parse each <Record>, however it's so structered that even using Split() would suffice... IF I could import the file!

Using DOMDocument.Load sFilePath takes about 15 minutes, however then DOMDocument.parseError returns 'Not enough storage is available to complete this operation.' (code-2147024882) and the object is not usable.

Using Input() fails, as does the two other methods I've mentioned on the previous question - this was the question last time.

Any ideas guys? Many thanks!

Katerina.
0
Comment
Question by:katerina-p
  • 5
  • 5
12 Comments
 
LVL 77

Expert Comment

by:peter57r
ID: 39204382
"I have daily 1-2Gb xml files which need to be processed"

I take it you are aware that Access has a limit of 2gb for the size of a mdb/accdb file
0
 

Author Comment

by:katerina-p
ID: 39204667
I take it you are aware that Access has a limit of 2gb for the size of a mdb/accdb file

Hi, yes, thanks though - why is partially why I can't (even if it would work) just use the 'importxml' function, as the data will need to be sent to multiple back ends.
0
 
LVL 45

Accepted Solution

by:
aikimark earned 500 total points
ID: 39208487
The code in the prior related question can be tweaked to  parse the large XML file into chunks that can be processed by the MSXML2 object.  The following isn't the cleanest code I've posted, but it is an example of how you might process the XML with buffered I/O.

Notes:
* You will need to add a reference to MSXML2
* You will need to change the path to your copy of the XML file
* You will probably want to run some more performance tests with different buffer sizes
* I encountered problems when I tested the code. I had to tweak the XML a few times. Some of the tags were being split in the file you posted.  I think we had a similar problem with byte code order in the prior question.

Sub Q_28133857()
    Dim intFN As Integer
    Dim strBuffer As String
    Dim strTrailing As String
    Const cRec As String = "<Record>"
    Const cEndRec As String = "</Record>"
    Const cBufSize As Long = 5000
    Dim oXML As New MSXML2.DOMDocument
    Dim oRecordNodes As MSXML2.IXMLDOMNodeList
    Dim oRec As MSXML2.IXMLDOMNode
    Dim oNode As MSXML2.IXMLDOMNode
    Dim vItem As Variant
    Dim lngRecNum As Long
    
    intFN = FreeFile
    Open "c:\users\aikimark\downloads\Q_28141594.xml" For Binary As #intFN
    strBuffer = Space(cBufSize)
    Get #intFN, , strBuffer
    'remove XML header and <Extract> tag
    strBuffer = Mid$(strBuffer, InStr(strBuffer, cRec))
    Do
        strTrailing = Mid$(strBuffer, InStrRev(strBuffer, cEndRec) + Len(cEndRec))
        strBuffer = Left$(strBuffer, InStrRev(strBuffer, cEndRec) + Len(cEndRec) - 1)
        oXML.loadXML "<Extract>" & strBuffer & "</Extract>"
        Set oRecordNodes = oXML.getElementsByTagName("Record")
        For Each oRec In oRecordNodes
            lngRecNum = lngRecNum + 1
            Debug.Print "* * *", lngRecNum
            For Each oNode In oRec.childNodes
                If oNode.childNodes.Length > 1 Then
                    Debug.Print "[" & oNode.nodeName & "]", oNode.Text
                Else
                    Debug.Print oNode.nodeName, oNode.Text
                End If
            Next
        Next
        strBuffer = Space(cBufSize)
        Get #intFN, , strBuffer
        strBuffer = strTrailing & strBuffer
    Loop Until EOF(intFN)
    'Process last buffer
    oXML.loadXML "<Extract>" & strBuffer
    Set oRecordNodes = oXML.getElementsByTagName("Record")
    For Each oRec In oRecordNodes
            lngRecNum = lngRecNum + 1
            Debug.Print "* * *", lngRecNum
            For Each oNode In oRec.childNodes
                If oNode.childNodes.Length > 1 Then
                    Debug.Print "[" & oNode.nodeName & "]", oNode.Text
                Else
                    Debug.Print oNode.nodeName, oNode.Text
                End If
            Next
    Next
    Close intFN
End Sub

Open in new window

0
 
LVL 45

Expert Comment

by:aikimark
ID: 39212057
Here's a tidied-up version of the XML file.  I truncated it to the last whole record, added a
"</Extract>" tag, and opened/saved it with XML Notepad.
Q-28141594.xml
0
 

Author Comment

by:katerina-p
ID: 39213039
Thank you very much  aikimark - will check this through on Monday; looks good though!

I've just realised that the format of the file is slightly more complex than I first wrote; each node can actually be repeated within a container so it's actually:

<?xml version="1.0"?>
    <Extract>
        <Record>
            <XX>Variable text/numerical values, up to many 1000 chars long</XX>
            <YYS>
                <YY>1</YY>
                <YY>2...</YY>
            </YYS>
            <ZZS>
                <YY>1</YY>
                <YY>2...</YY>
            </ZZS>
            <AA>...</AA>
        </Record>
        <Record>
            <XX>...</XX>
            <YY>...</YY>
            <ZZ>...</ZZ>
            <AA>...</AA>
        </Record>
    </Extract>

Open in new window


So, after "For Each oNode In oRec.childNodes" I guess I'll just have to put a "For Each oNode2 In oNode.childNodes" loop to iterate through any? I guess that's the distinction you wree making with your If [oNode.childNodes.Length > 1] conditional?
0
Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 45

Expert Comment

by:aikimark
ID: 39213338
Yes.  You can also package the code into a routine that is passed a parent and then iterates the child nodes.  Then the count (length) of child nodes is greater than one, the routine recursively calls itself.  When looking at the output, such nodes were surrounded by square brackets.

Of course, you may need to traverse the nodes one level at a time, rather than diving down into the child nodes as you encounter them.  It really depends on what (save/ignore) decisions your code needs to make with the XML data and how your tables are defined and related.

You might want to 'play' with the cleaned-up version of the XML that I posted before tackling the large XML file.  That way, you will be sure the logic is correct and your trial runs will be much quicker.

I don't know if it will be quicker than what I'm doing, but if there is some logic to which records are eligible for saving, you might be able to use some of the XPath expressions to limit which (and how many) nodes your VBA code has to process.
0
 

Author Closing Comment

by:katerina-p
ID: 39221734
Thank you once again!
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39221799
What kind of performance are you getting with this?
What was the 'sweet spot' buffer size?

There is at least one ODBC driver that will consume XML and feed your program a recordset.  That might be worth looking at for a commercial application or high performance application.
0
 

Author Comment

by:katerina-p
ID: 39222606
What kind of performance are you getting with this?
What was the 'sweet spot' buffer size?

Tests running at the moment. Will post results.


There is at least one ODBC driver that will consume XML and feed your program a recordset.  That might be worth looking at for a commercial application or high performance application.

Will see after performance testing, however might perhaps be an idea, yes.
0
 

Author Comment

by:katerina-p
ID: 39224374
First pass (iteration only, no data/rst manipulation) of testing dump! :)

buffer of 5,000 took 3 seconds to import 5,000,000, i.e. 1,636,829 per second
buffer of 10,000 took 6 seconds to import 10,000,000, i.e. 1,797,753 per second
buffer of 15,000 took 7 seconds to import 15,000,000, i.e. 2,112,211 per second
buffer of 20,000 took 10 seconds to import 20,000,000, i.e. 1,970,747 per second
buffer of 25,000 took 13 seconds to import 25,000,000, i.e. 1,934,704 per second
buffer of 30,000 took 15 seconds to import 30,000,000, i.e. 1,950,229 per second
buffer of 35,000 took 19 seconds to import 35,000,000, i.e. 1,891,093 per second
buffer of 40,000 took 20 seconds to import 40,000,000, i.e. 1,983,727 per second
buffer of 50,000 took 26 seconds to import 50,000,000, i.e. 1,938,219 per second
buffer of 60,000 took 32 seconds to import 60,000,000, i.e. 1,848,820 per second
buffer of 70,000 took 37 seconds to import 70,000,000, i.e. 1,889,897 per second
buffer of 80,000 took 41 seconds to import 80,000,000, i.e. 1,930,618 per second
buffer of 90,000 took 48 seconds to import 90,000,000, i.e. 1,882,045 per second
buffer of 100,000 took 5 seconds to import 10,000,000, i.e. 1,960,184 per second
buffer of 150,000 took 8 seconds to import 15,000,000, i.e. 1,921,922 per second
buffer of 200,000 took 10 seconds to import 20,000,000, i.e. 1,911,875 per second
buffer of 250,000 took 14 seconds to import 25,000,000, i.e. 1,771,872 per second
buffer of 300,000 took 17 seconds to import 30,000,000, i.e. 1,749,431 per second
buffer of 350,000 took 20 seconds to import 35,000,000, i.e. 1,789,852 per second
buffer of 400,000 took 22 seconds to import 40,000,000, i.e. 1,792,717 per second
buffer of 500,000 took 29 seconds to import 50,000,000, i.e. 1,753,425 per second
buffer of 600,000 took 35 seconds to import 60,000,000, i.e. 1,736,378 per second
buffer of 700,000 took 41 seconds to import 70,000,000, i.e. 1,715,161 per second
buffer of 800,000 took 50 seconds to import 80,000,000, i.e. 1,584,158 per second
buffer of 900,000 took 53 seconds to import 90,000,000, i.e. 1,693,371 per second
0
 
LVL 45

Expert Comment

by:aikimark
ID: 39224526
Looks like if you go with the 15K buffer size, you will only need about 11 minutes to read the 1.5GB file.  If you have an SSD or some other high performance storage device (and channel) you might be able to read the XML file faster.

If this isn't satisfactory, you might also look at Powershell to process the XML. You would use a Get-Content command with a -readcount 15000 parameter to read the file in chunks.

You might also be able to split the XML into smaller pieces with a utility like XMLSplit
http://www.xponentsoftware.com/XmlSplit.aspx
and then spawn multiple jobs/processes on a multi-core machine to process the XML in parallel.

If you are still not getting the performance you need, you might consider using the SAX2  library and bypass the DOM.  Not for the squeemish.
http://msdn.microsoft.com/en-us/library/ms994312.aspx
http://msdn.microsoft.com/en-us/library/aa923272.aspx
http://msdn.microsoft.com/en-us/library/ms994347.aspx  (VB example)

=============
FYI: link to MS XML-related products page
http://msdn.microsoft.com/en-us/library/aa286549.aspx
0

Featured Post

What Is Threat Intelligence?

Threat intelligence is often discussed, but rarely understood. Starting with a precise definition, along with clear business goals, is essential.

Join & Write a Comment

A simple tool to export all objects of two Access files as text and compare it with Meld, a free diff tool.
If you need to start windows update installation remotely or as a scheduled task you will find this very helpful.
In Microsoft Access, learn different ways of passing a string value within a string argument. Also learn what a “Type Mis-match” error is about.
Access reports are powerful and flexible. Learn how to create a query and then a grouped report using the wizard. Modify the report design after the wizard is done to make it look better. There will be another video to explain how to put the final p…

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now