Link to home
Start Free TrialLog in
Avatar of Fril
Fril

asked on

data import - html tables -> access

Hello everyone,

Firstly, apologies if this has been asked before, I'm guessing it may be a common question, but i've searched around and can't find any specific answers..

I have an extremely large HTML document (around 1000 tables) that i would like to import into an existing table into an access database, but i'm having a few problems...[unfortunately I have to use MS Access.]

the html table lists its data vertically.. so the table headings are horizontal to the data.. eg.

<html tables>
primary key:   00001
address:        blah
postcode:      blah001

as opposed to the access database, where the table headings are spread horizontally...

<access database>
primary key  address    postcode
00001           blah         blah01

(apologies if the formatting messes up there..)

i've painfully changed all of the table headings inside the html doc, so they are an exact match of the access database.. however when i go to import, the database, it only imports one HTML table, and lists all of the data in the first column of the access table.  to throw a spanner in the works, some of the columns..[well rows] in the HTML document will not need to be imported, as the data is not necessary..

Does anyone know any ways around this? I'm guessing one of the following may be applicable;

1. to flip/pivot the HTML tables into the same format, before import? (probably a cpu heavy process)
2. convert the html doc to xml somehow? and then import
3. make access see how the data should be going in, and make it do some thinking and processing of its own? (could be dangerous!)
4. throw the machine out of the window and deny all knowledge of any 'databases'?

unfortunately, the ppl who gave me the html doc are unable to provide it to me in xml form :( i've begged

it would be great if an answer didn't include me having to script to much.. i'm familiar with oracle dbs, pl/sql .. but access isn't my strong point

If anyone could point me to any links where people have had similar problems, or has any specific advice, it would be greatly appreciated!

Thanks in advance!!

Phil
Avatar of Fril
Fril

ASKER

to clarify.. when importing the html table to access.. the following layout occurs..

primary key        address               postcode
<existing data>  <existing data>    <existing data>
0001
blah
blah01


Avatar of peter57r
Hi Fril,
Have you tried importing the html file into XL first and saving as a spreadsheet which Access will handle easily.

Pete
Avatar of Fril

ASKER

Hi pete,

i never thought of that.. thanks

i've just tried it however, and the data is still the 'wrong way round'  headings on the left, data on the right. instead of headings above, data below

You know of any way to flip/rotate/pivot this information automatically? at least it should be easier for me to manipulate in excel.

Thanks for your quick response.

Phil
Fril,
How would you tell (a program) where each record ended?

Pete
another way to do this is to save the content of the page as a text file, parse and saved recordset to the
access table.

sample

https://www.experts-exchange.com/questions/21312509/To-Export-URL-Data-Into-Table.html

if  you are interested post the URL here, and what information you want extracted and place in a table.

After you import into Excel, then import into Access, you can use a crosstab query to transform the data.  The crosstab query wizard should help you do that.  You can then create a make table query to take the output of the crosstab query to a new table.

I've sent an email to jjafferr at yahoo dot com to look at your question, he likes these problems.
I would create a table with an autonumber and one text field then just pull the whole html file in line by line as if it were a text file.

Then after it is in parse the table. This is the outline of a function to parse it just put it into a module and edit it to work for you. It should work quickly and efficiently. I've had to do this kind of thing before.
-------------------------------------------------------------------
Public Function ParseTable()
'Declaring variables
Dim DB As Database
Dim InRS As Recordset
Dim OutRS As Recordset

Dim SQL As String

Dim PrimaryKey As String
Dim Address As String
Dim PostCode As String
'Set up to open the input table.
SQL = "SELECT Index_Num, Text_Field " & _
    "FROM Import_Table " & _
    "ORDER BY Index_Num "

Set DB = CurrentDb()
Set InRS = DB.OpenRecordset(SQL)                'actually open the import table
Set OutRS = DB.OpenRecordset("DataTable")       'open the the output table
PrimaryKey = "BLANK1"                           'Set the primary key variable to something
                                                'so the first pass gets skipped writing output

If InRS.EOF = False Then InRS.MoveFirst         'move to the first record for input

Do Until InRS.EOF = True
    'On each pass it will check if it is seeing the primary key and then it knows _
     to write the data in the variables and clear them for the next pass.
    If InStr(1, InRS!text_field, "primary key:", vbTextCompare) <> 0 _
        And PrimaryKey <> "BLANK1" Then
        With OutRS
            .AddNew
            !Primary_Key = PrimaryKey
            !Address = Address
            !Post_Code = PostCode
            .Update
        End With
        PrimaryKey = ""
        Address = ""
        PostCode = ""
    End If
    'If the input field has "primary key:" in the text then assign it to the variable _
     and move onto the next record. and so on and so forth.
    If InStr(1, InRS!text_field, "primary key:", vbTextCompare) <> 0 Then
        PrimaryKey = Mid(InRS!text_field, InStr(1, InRS!text_field, "primary key:", vbTextCompare), 500)
        InRS.MoveNext
    End If
    If InStr(1, InRS!text_field, "Address:", vbTextCompare) <> 0 Then
        PrimaryKey = Mid(InRS!text_field, InStr(1, InRS!text_field, "Address:", vbTextCompare), 500)
        InRS.MoveNext
    End If
    If InStr(1, InRS!text_field, "PostCode:", vbTextCompare) <> 0 Then
        PrimaryKey = Mid(InRS!text_field, InStr(1, InRS!text_field, "PostCode:", vbTextCompare), 500)
        InRS.MoveNext
    End If
    InRS.MoveNext
Loop

Set InRS = Nothing
Set OutRS = Nothing
Set DB = Nothing

End Function
Avatar of Fril

ASKER

thanks for your responses, sorry i've been away for a few hrs...

pete: the data is seperated within the excel document by 2 clear rows.. (<br><br> in the html)

capricorn: thanks for your suggestion, unfortunately the data is 'sensitive' so i can't really publish it :(

thenelson: seems like a nice solution, and it is able to import one [html] table correctly, however, when i attempt to pull in loads of [html]tables, it likes to change lots of my data to numbers.. and only lists things in the first column.. if you could expand on that, it would be great! - i'll get some more specific information for you in a minute (it may just be the way i'm performing the crosstab query)

jimpen:  let me check that and get back to you..


Hey Nelson, thanks for the email pointing to this Q, I appreciate that :o)

Fril

The best way to do this is to save the HTML page as text, then extract the data with Mid() command,
if you look at the following link, its similar to what you are asking:
https://www.experts-exchange.com/questions/20940488/Parse-emails-from-Outlook-Express.html

Your case would be easier, as you would deal with the text file right away.
and if you don't want to import any part of the text file, don't include it in the if statement.

It would help greatly if you could post here a few lines of your document, OR if there was any way to get the whole document, then nothing like it.

jaffer
Your code will look something like this:

           Open "C:\DeleteMe.txt" For Input As #1    ' Open file for input.

            Do While Not EOF(1)    ' Loop until end of file.
            DoCmd.GoToRecord , "frmPersonal_CVs", acNewRec
            Line Input #1, TextLine    ' Read line into variable.

If Mid(TextLine, 1, 5) = "post:" Then Forms![frmPersonal_CVs]![post] = Mid(TextLine, 7, Len(TextLine) - 5)
If Mid(TextLine, 1, 5) = "name:" Then Forms![frmPersonal_CVs]![name1] = Mid(TextLine, 7, Len(TextLine) - 5)
If Mid(TextLine, 1, 7) = "gender:" Then Forms![frmPersonal_CVs]![gender] = Mid(TextLine, 9, Len(TextLine) - 7)
If Mid(TextLine, 1, 8) = "dob_day:" Then Forms![frmPersonal_CVs]![dob_day] = Mid(TextLine, 10, Len(TextLine) - 8)
If Mid(TextLine, 1, 10) = "dob_month:" Then Forms![frmPersonal_CVs]![dob_month] = Mid(TextLine, 12, Len(TextLine) - 10)
If Mid(TextLine, 1, 9) = "dob_year:" Then Forms![frmPersonal_CVs]![dob_year] = Mid(TextLine, 11, Len(TextLine) - 9)

           Loop
           Close #1


jaffer
I've done it both ways in the past.  I find it easier to just dump it into a table. Usually I preprocess the table to blow out blank lines and other extraneous like page numbers.

The other plus is that you can see your data as you code as opposed to living in the debug.print of the debug window.

It works either way. It is more of a preference thing.
Avatar of Fril

ASKER

Hi,

Thanks for all your responses on this,  I was hoping for a way to resolve this without scripting as I don't have much experience with VB, and i can't even manage to get it installed on this system to try it out :( .. i'll only be able to have a word with our vb guy tomorrow ..

The additional problems i am faced with, is some of the rows have no data on certain records, and full data on others.. as well as the issue of not wanting to import every row, but if thats solved by simply not including the row header, then that is fine...

heres an example of one table (in HTML - hope it formats ok..)... don't know why I didn't post this sooner! Sorry..

Thanks!

Phil


------------------------------------


<HTML>
<table width="640" border>
<tr valign="top">
<td width="30%"><font face="Arial" size="2"><b>Manufacturer Ticket
Number</b></font></td>
<td width="70%">0123456</td>
</tr>
<tr valign="top">
<td width="30%"><b>Manufacturer</b></td>
<td width="70%">anymanufacturer</td>
</tr>
<tr valign="top">
<td width="30%"><b>Location</b></td>
<td width="70%">Anytown</td>
</tr>
<tr valign="top">
<td width="30%"><b>Ticket Status</b></td>
<td width="70%">CL</td>
</tr>
<tr valign="top">
<td width="30%"><b>Priority</b></td>
<td width="70%">M</td>
</tr>
<tr valign="top">
<td width="30%"><b>Overdue Status</b></td>
<td width="70%">?</td>
</tr>
<tr valign="top">
<td width="30%"><b>Hotsite Status</b></td>
<td width="70%">G</td>
</tr>
<tr valign="top">
<td width="30%"><b>Contact</b></td>
<td width="70%">name surname</td>
</tr>
<tr valign="top">
<td width="30%"><b>Notification Description</b></td>
<td width="70%">short description</td>
</tr>
<tr valign="top">
<td width="30%"><b>Customer Contact</b></td>
<td width="70%">name surname</td>
</tr>
<tr valign="top">
<td width="30%"><b>Tech Department</b></td>
<td width="70%">CSC</td>
</tr>
<tr valign="top">
<td width="30%"><b>Days Raised</b></td>
<td width="70%">92</td>
</tr>
<tr valign="top">
<td width="30%"><b>Equipment/Product Type</b></td>
<td width="70%">prod01234</td>
</tr>
<tr valign="top">
<td width="30%"><b>UK Dpar</b></td>
<td width="70%"></td>
</tr>
<tr valign="top">
<td width="30%"><b>Ticket Message</b></td>
<td width="70%">Normal Ticket</td>
</tr>
<tr valign="top">
<td width="30%"><b>Notes</b></td>
<td width="70%">very very very long notes bit...</td>
</tr>
<tr valign="top">
<td width="30%"><b>Content Changed</b></td>
<td width="70%">10/02/2004</td>
</tr>
<tr valign="top">
<td width="30%"><b>Scenario to Design</b></td>
<td width="70%">22-Nov-2003</td>
</tr>
<tr valign="top">
<td width="30%"><b>Fix to Support</b></td>
<td width="70%">31-Mar-2004</td>
</tr>
<tr valign="top">
<td width="30%"><b>Fix to Maintainer</b></td>
<td width="70%">21-Apr-2004</td>
</tr>
<tr valign="top">
<td width="30%"><b>Completion Date</b></td>
<td width="70%">5-May-2004</td>
</tr>
<tr valign="top">
<td width="30%"><b>Start Date</b></td>
<td width="70%">10-Nov-2003</td>
</tr>
</table>
</body>
</HTML>
Avatar of Fril

ASKER

and in text...

Manufacturer Ticket Number      123456
Manufacturer      anymanufacturer
Location      Anytown
Ticket Status      CL
Priority      M
Overdue Status      ?
Hotsite Status      G
Contact      name surname
Notification Description      short description
Customer Contact      
Tech Department      CSC      
Days Raised      92
Equipment/Product Type      prod01234
UK Dpar      
Ticket Message      Normal Ticket
Notes      very very very long notes bit...
Content Changed      10/02/2004
Scenario to Design      22-Nov-03
Fix to Support      31-Mar-04
Fix to Maintainer      21-Apr-04
Completion Date      05-May-04
Start Date      10-Nov-03
Note that Visual Basic for Applications should install by default with Access. Open a DB in Access and if you hit Alt+F11 it should bring up the VB window.
ASKER CERTIFIED SOLUTION
Avatar of jjafferr
jjafferr
Flag of Oman image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Fril

ASKER

thanks to everyone for all your responses and help on this..

I used jjaffers script, which worked a treat!

the only issue i had, were some of the fields were blank, and i had to remove the allow zero length, and some of the dates had been put in an incorrect format, so had to remove the date function.. apart from that, it was a nice easy process thanks to the code you gave..

thanks again!
Phil
Avatar of Fril

ASKER

Well.. someone else in the business has found out about "my" immense skills (i gave proper acknoledgemnts :) converting vertical to horizontal data, and they had a load of column headings that they wanted converting into row headings.. basically what i've just done, except from horizontal to vertical.. there is no data in the tables concerned, and i found this a nice easy solution without the need for vb programming.. (just thought i'd include it on this Q as a reference.. seems a fairly useful point)

CONVERTING COLUMN HEADINGS TO ROW HEADINGS - DOES NOT WORK WITH DATA

1. copy and paste the table into word -
2. in word, highlight the table and click table / convert / table to text
3. Choose seperate text with PARAGRAPH MARKS
4. watch as your data is neatly transformed from horizontal to vertical.

i found a little glitch which basically causes word to avoid applying formatting on any rows greater than 62. meaning, only 62 column headings can be converted at any one time.. although surely thats better than one at a time with copy and paste :)

below is the expert exchange Q this was shamelessly butchered from

https://www.experts-exchange.com/questions/20273540/Excel-Spread-column-data-across-rows.html
Hi Fril

I am glad my code worked for you, and we are always for the lookout for easier and nicer ways, but it is fun to code ;o)

Thanks for the points and the grade.

jaffer
Avatar of Fril

ASKER

no problem jaffer, appreciate the time and effort you put into this, saved me a nightmare data entry job :)

.. agreed coding is fun, you would have liked to see my 2700 records, with all the data 'magically' converting into the format you want.... and its made me want to learn a little VB :) ..

<<but it is fun to code>>
<<agreed coding is fun>>
YOU GUYS ARE WARPED!!!

But not me, I'm the only sane person I know.
Some coding is fun. When you can see the definite target it is easy to whip out some code that will work quickly and easily.
> you would have liked to see my 2700 records, with all the data 'magically' converting into the format you want
Ah, you should have said so, I would insert this line of code in the middle of any IF statement:

MyCount=MyCount+1
Call SysCmd(acSysCmdSetStatus, "Records Entered in the Table: " & MyCount)

This would show you the Records Entered in the Table, this will be displayed on your status Bar (at the bottom of your Form screen), I am sure
http://en.wikipedia.org/wiki/Count_von_Count
would have done the same, but louder ;o)

jaffer