[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
?
Solved

Take details from website and add to database.

Posted on 2012-08-24
30
Medium Priority
?
671 Views
Last Modified: 2012-09-05
I want to make a database of items and item details from a specific website, I want to be able to  enter the Url into one field of the database, and for it to automatically fetch the information for the other fields.

The Website I wish to import this information from
http://www.dealextreme.com
Example Page
http://www.dealextreme.com/p/93909

Example Data:
Name                   |   SKU   | Price US  | Price AUS  | Specifications |
MiLi Photo Printer etc. | 93909 |  273.00   |    261.67    | Dimensions: 6.93 etc.

Open in new window

Preferably I'd like this to be able to be done in Excel, but thats not 100% needed.

I don't know how well I explained what I want here, so if there are any questions, please ask.
Thanks, Kemsypt.
0
Comment
Question by:kemsypt
  • 16
  • 14
30 Comments
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38331192
You have provided too little information for me to make the database connection but here is a first setup in Excel VBA.
Option Explicit

Const C_PAGE = "http://www.dealextreme.com/p/"

Sub GetDetails()

    ' note: use menu Tools|References to add "Microsoft VBScript Regular Expressions 5.5"
    Dim objRE As New RegExp

    Dim objXMLHTTP, strHtml
    Set objXMLHTTP = CreateObject("Microsoft.XMLHTTP")

    [A2].Select
    Dim str_sku
    For Each str_sku In Array(93909, 93910) ' numbers or strings (for letters or leading zeroes)
        ' get page html
        objXMLHTTP.Open "GET", C_PAGE & str_sku, False
        objXMLHTTP.Send
        strHtml = objXMLHTTP.ResponseText
        ' find text parts
        objRE.IgnoreCase = True
        objRE.MultiLine = True
        objRE.Pattern = "<div class=""SectionContents"">\s*Item: (.*?)\s*<br />\s*<div><font face=""Arial"" size=2>(.*?)</font></div>\s*</div>"
        Dim objMatches, objMatch
        Set objMatches = objRE.Execute(strHtml)
        If objMatches.Count > 0 Then
            ActiveCell.Value = objMatches(0).SubMatches(0)
            ActiveCell.Offset(0, 1).Value = str_sku
            ActiveCell.Offset(0, 4).Value = objMatches(0).SubMatches(1)
        End If
        Set objMatches = Nothing
        ' search price US
        objRE.Pattern = "<strong>\s*Price:</strong>\s*<span id=""ctl00_content_Price"" style=""font-family:Arial;font-size:15pt;font-weight:bold;"">\$([0-9,.]+)</span>"
        Set objMatches = objRE.Execute(strHtml)
        If objMatches.Count > 0 Then
            ActiveCell.Offset(0, 2).Value = objMatches(0).SubMatches(0)
        End If
        Set objMatches = Nothing
        ' search price AUD
        objRE.Pattern = "<span id='AUD'>\s*([0-9,.]+)\s*</span>"
        Set objMatches = objRE.Execute(strHtml)
        If objMatches.Count > 0 Then
            ActiveCell.Offset(0, 3).Value = objMatches(0).SubMatches(0)
        End If
        Set objMatches = Nothing
        ' next row
        ActiveCell.Offset(1, 0).Select
    Next
    Set objRE = Nothing
    Set objXMLHTTP = Nothing
End Sub

Open in new window

0
 

Author Comment

by:kemsypt
ID: 38335966
Sorry robert_schutt, I think maybe an some Images will explain what I want more:

What I want is to be able to fill in the SKU field, and for the other information (Name, Price US, Price AUS and specifications) to automatically (or done in batch) load in the specified fields.

(Circled in red is the details I want to import.
Information from web page
exel example
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38336288
Yep, that's exactly what the code does.
captureThere is a list of SKU's hard coded in the routine now, probably that's not what you want.

Can you give a little more info on the database you're going to use? Which type, what does the table look like, etc. Also how do you want to trigger the code, on open or with a button for example? Then I will extend the routine and post a working excel sheet.
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 

Author Comment

by:kemsypt
ID: 38336353
Table you have there is the exact layout I'm looking for. I really only need a basic table.

And I'd like the code to work on the click of a button preferably.

Thanks, Kemsypt.
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38336522
Ok, I already started on a mostly generic solution. Should be easy to change to for example Access or MySQL but I made it for MSSQL (express) for the moment.

Starting with the table I created in a local database "ee" (I did that in design mode, this is the code that Management Studio generated):
CREATE TABLE [tblDX](
	[ID] [int] IDENTITY(1,1) NOT NULL,
	[SKU] [varchar](20) NOT NULL,
	[Name] [varchar](200) NULL,
	[PriceUS] [money] NULL,
	[PriceAUS] [money] NULL,
	[Specs] [varchar](500) NULL,
	[LAST_UPDATE] [datetime] NULL,
 CONSTRAINT [PK_tblDX] PRIMARY KEY CLUSTERED 
(
	[ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

Open in new window

Attached please find the Excel file. A number of checks will probably have to be added but try and play around with it for a bit.

What I did was add some numbers in the table just incrementing from the number you used as an example. You may want to process a maximum number of pages at a time hence the "TOP 2" in the sql, also useful for testing purposes...
DX-post.xls
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38336531
Here is the table after running the excel code, there is a problem with linefeeds, but maybe that's just in Management Studio.
capture2 - updated table
0
 

Author Comment

by:kemsypt
ID: 38336802
Excel comes up with an error when trying to run your code, since it works for you, I'm assuming its a problem with my excel.
I think it may be an issue with excel not (for some odd reason) being able to access the internet.
Once I can fix that, I shall try out your code, it looks pretty good thus far though, I am grateful for the work put in.

I posted a separate question trying to figure out my excel problem (a reasonable effort on Google didn't do much in the way of a solution)
Excel Problem - System Error &H80004005 (-2147467259) + cannot access webpage

Thanks, Kemsypt.
0
 

Author Comment

by:kemsypt
ID: 38336817
Either that, or are there any other programs that you could recommend that would do what I desire equally well, or better than excel?

Thanks, Kemsypt.
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38336878
Well, since it is basically VB script, if we take out the Excel references, you could just run it as a .vbs
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38336946
For example:
Option Explicit

Const C_PAGE = "http://www.dealextreme.com/p/"

' ADO constants (from C:\Program Files\Common Files\System\ado\ADOVBS.INC)
Const adCmdText = &H1
Const adCmdStoredProc = &H4
Const adInteger = 3
Const adCurrency = 6
Const adVarChar = 200
Const adParamInput = &H1
Const adExecuteNoRecords = &H80

Dim msg
msg=""

Call GetDetails()

MsgBox msg


Sub GetDetails()

    Dim objRE
	Set objRE = New RegExp

    Dim objConn, objRst
    Set objConn = CreateObject("ADODB.Connection")
    objConn.ConnectionString = "Provider=SQLOLEDB;Data Source=.\SQLEXPRESS;User ID=ee;Password=ee"
    objConn.Open
    Dim str_sku
    Set objRst = objConn.Execute("SELECT TOP 2 [SKU] FROM [tblDX] WHERE [LAST_UPDATE] IS NULL OR [LAST_UPDATE] < GETDATE() - 1 ORDER BY [LAST_UPDATE], ID")
    Dim arrSKUs(), intCountSKUs
    intCountSKUs = -1
    While Not objRst.EOF
        str_sku = objRst.Fields(0).Value
        intCountSKUs = intCountSKUs + 1
        ReDim Preserve arrSKUs(intCountSKUs)
        arrSKUs(intCountSKUs) = str_sku
        objRst.MoveNext
    Wend
    objRst.Close
    Set objRst = Nothing
    
    If intCountSKUs = -1 Then
        msg = "Nothing to update"
    Else
        Dim objXMLHTTP, strHtml, objCmd
        ' prepare update command
        Set objCmd = CreateObject("ADODB.Command")
        Set objCmd.ActiveConnection = objConn
        objCmd.CommandType = adCmdText
        objCmd.CommandText = "UPDATE [tblDX] SET [Name] = ?, [PriceUS] = ?, [PriceAUS] = ?, [Specs] = Replace(?,'<br />',char(13)), [LAST_UPDATE] = GETDATE() WHERE [SKU] = ?"
        objCmd.Parameters.Append objCmd.CreateParameter("Name", adVarChar, adParamInput, 200)
        objCmd.Parameters.Append objCmd.CreateParameter("PriceUS", adCurrency, adParamInput)
        objCmd.Parameters.Append objCmd.CreateParameter("PriceAUS", adCurrency, adParamInput)
        objCmd.Parameters.Append objCmd.CreateParameter("Specs", adVarChar, adParamInput, 500)
        objCmd.Parameters.Append objCmd.CreateParameter("SKU", adVarChar, adParamInput, 20)
        ' get pages for each SKU to be updated
        Set objXMLHTTP = CreateObject("Microsoft.XMLHTTP")
        For Each str_sku In arrSKUs ' Array(93909, 93910) ' numbers or strings (for letters or leading zeroes)
            objCmd.Parameters(4).Value = str_sku
            ' get page html
            objXMLHTTP.Open "GET", C_PAGE & str_sku, False
            objXMLHTTP.Send
            strHtml = objXMLHTTP.ResponseText
            ' find text parts
            objRE.IgnoreCase = True
            objRE.MultiLine = True
            objRE.Pattern = "<div class=""SectionContents"">\s*Item: (.*?)\s*<br />\s*<div><font face=""Arial"" size=2>(.*?)</font></div>\s*</div>"
            Dim objMatches, objMatch
            Set objMatches = objRE.Execute(strHtml)
            If objMatches.Count > 0 Then
                objCmd.Parameters(0).Value = objMatches(0).SubMatches(0)
                objCmd.Parameters(3).Value = objMatches(0).SubMatches(1) ' Replace(, "<br />", Chr(13)) ' now done in SQL!
            Else
                objCmd.Parameters(0).Value = "N/A"
                objCmd.Parameters(3).Value = "N/A"
            End If
            Set objMatches = Nothing
            ' search price US
            objRE.Pattern = "<strong>\s*Price:</strong>\s*<span id=""ctl00_content_Price"" style=""font-family:Arial;font-size:15pt;font-weight:bold;"">\$([0-9,.]+)</span>"
            Set objMatches = objRE.Execute(strHtml)
            If objMatches.Count > 0 Then
                objCmd.Parameters(1).Value = objMatches(0).SubMatches(0)
            Else
                objCmd.Parameters(1).Value = 0 ' default/keep existing?
            End If
            Set objMatches = Nothing
            ' search price AUD
            objRE.Pattern = "<span id='AUD'>\s*([0-9,.]+)\s*</span>"
            Set objMatches = objRE.Execute(strHtml)
            If objMatches.Count > 0 Then
                objCmd.Parameters(2).Value = objMatches(0).SubMatches(0)
            Else
                objCmd.Parameters(2).Value = 0 ' ?
            End If
            Set objMatches = Nothing
            ' update database
            objCmd.Execute , , adExecuteNoRecords
			msg = msg & "updated " & str_sku & vbCrLf
        Next
        ' clean up
        Set objRE = Nothing
        Set objXMLHTTP = Nothing
        Set objCmd.ActiveConnection = Nothing
        While objCmd.Parameters.Count > 0
            objCmd.Parameters.Delete objCmd.Parameters.Count - 1
        Wend
        Set objCmd = Nothing
        'MsgBox "Update complete"
    End If
    objConn.Close
    Set objConn = Nothing
End Sub

Open in new window

0
 

Author Comment

by:kemsypt
ID: 38360809
Haven't forgotten about this, been busy + trying to fix my excel error before i proceed.

Thanks for all your help thus far though.
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38360844
No problem, take your time.

Yeah, I took a look at that other question but really haven't a clue what could cause that.

Didn't the vb script work or is that not an option?
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38360877
By the way, both in VBA and VBS you need to change the connectionstring to your own needs.
0
 

Author Comment

by:kemsypt
ID: 38363029
Oh I need to
change the connectionstring to your own needs.
? I may have assumed it was usable straight off, how silly of me.

I have little VBA knowledge, so if you could explain what and how I go about doing that, it would be much appreciated.
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38363117
This code:
objConn.ConnectionString = "Provider=SQLOLEDB;Data Source=.\SQLEXPRESS;User ID=ee;Password=ee"

Open in new window

needs to be adjusted to access your database, so maybe on a different server, different user/password or trusted connection.
0
 

Author Comment

by:kemsypt
ID: 38363220
How would I go about determining any of this?
Do I need to install any other programs to do this? (SQL Express?)
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38363545
Well, yes. I use SQL express but any database would do I guess. In your post you stated you want to save the data in a database. Just saving it to excel is also a possibility.
0
 

Author Comment

by:kemsypt
ID: 38363942
I did state that yes, but I wasn't 100% sure with what I wanted when I wrote that.
Just saving to Excel would be great though
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38364065
Ok, then you just need to decide on a procedure. Since the code now uses the database itself to determine which sku's to download, you need a different 'trigger'.

Actually, the first code I posted could be used because that's just downloading a specific array of sku's (and put it in Excel, not a database):
For Each str_sku In Array(93909, 93910)

Open in new window


Another option would be to use the excel sheet as input for that as well, insert the sku's at the bottom of the list and the code could check for new sku's or update all rows every time. Any idea what would be a convenient way for you?
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38364151
Here is an example, updating where sku is filled and description is empty.
DX-post2.xls
0
 

Author Comment

by:kemsypt
ID: 38364334
Ok, by the looks of things, what you just gave me is pretty much absolutely perfect.
Its definitely usable for what I want, although just one more thing (if its possible to do):

Is there any way for it to, if, for example, the details of SKU 93909 have changed since the last time I clicked update, is there any way I could have it inform me thus (Eg. it saying "Updated" in column F)

I'm happy to manually delete the Updated field if its necessary, I was just wandering if it would be possible.

If its not possible, then what I have here will definitely do me will still. Thanks for all your help.
0
 

Author Comment

by:kemsypt
ID: 38364428
Just found out that the site has changed its name and layout, I would love if you could check to see if everything would still work with the new layout. Given how much help you have been I'll open it in a new question and close this one.
Much appreciation.
Kemsypt.

New Question - http://www.experts-exchange.com/Database/Miscellaneous/Q_27852625.html
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38366194
There's not really a need for a new question. I'll adapt it here, no problem. Also I think the new question would not be clear to anybody what you mean there without a reference to this question, maybe better to just delete it.

The "updated" shouldn't be a problem. The design change gives one problem straight up and that's the url which can no longer be constructed with just the sku. Should there be some search action? Oh wait, I see it's still possible to use http://dx.com/p/93909

I happen to be a client on the site so maybe it's my settings but I completely lost the currency conversions so would it be an option to use a default conversion or get that from another website? (I've seen references to that on EE before).
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38366264
For the update notification and url/design change here is a new version. For the currency I don't have a solution yet and await your answer regarding an alternative or maybe I'm missing something in the site settings.
DX-post3.xls
0
 

Author Comment

by:kemsypt
ID: 38366521
I talked to one of the Live Support people from DX and they informed me they don't have an AU currency conversion live on the new DX yet.
So I think I'll have have to make do with what I have until they introduce that feature.

So, just 2 more things then everything will be perfect:
1 - Could you integrate the Status feature into the original spreadsheet
2 - Would you be able to add, on column G, something to check whether the item is still available? (it says "Delivery:      Item is temporarily sold out." if the item is not available)

So what would be perfect is if you could make it so both the Dealextreme and the DX spreadsheets have the Status feature and have the availability feature to.

Thanks substantially for your help.
0
 
LVL 35

Accepted Solution

by:
Robert Schutt earned 2000 total points
ID: 38367004
No problem, here are workbooks 2 (old page) and 3 (new page) with new options built in. Needs some more testing and changes to the site can make it stop working any time, so making the regex patterns a bit more robust could be valuable.
DX-post2.xls
DX-post3.xls
0
 

Author Closing Comment

by:kemsypt
ID: 38369359
Brilliant help at every turn, very much appreciated.
0
 

Author Comment

by:kemsypt
ID: 38369463
Out of curiosity, how difficult was this for you to do? Did it take long?
0
 
LVL 35

Expert Comment

by:Robert Schutt
ID: 38369977
It didn't take too long, a couple of hours at the most. I know some people like to say they do stuff like this in 5 minutes but I wanted to test the changes every step, have felt embarrassed before posting something 'faulty'. I have done a number of questions similar to this so some code could be copied but each time it is of course a little different. Like I already admitted before, it would have taken longer if I had taken the time to really make the regex patterns more robust but not knowing exactly what your purpose with this is, I wanted to get something up and running first and it did seem to do the job nicely so I just thought I'd post what I had cooked up and see what your reaction was. Turns out I think we can both be happy with this, although there may be some changes needed when DX adds the currencies again or changes the layout. The best result would be of course if my code makes the process more clear for you and is easy enough to change when needed. But otherwise you know you have us here at EE to help again!
0

Featured Post

New feature and membership benefit!

New feature! Upgrade and increase expert visibility of your issues with Priority Questions.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Windows Explorer let you handle zip folders nearly as any other folder: Copy, move, change, and delete, etc. In VBA you can also handle normal files and folders, but zip folders takes a little more - and that you'll find here.
Eseutil Hard Recovery is part of exchange tool and ensures Exchange mailbox data recovery when mailbox gets corrupt due to some problem on Exchange server.
Although Jacob Bernoulli (1654-1705) has been credited as the creator of "Binomial Distribution Table", Gottfried Leibniz (1646-1716) did his dissertation on the subject in 1666; Leibniz you may recall is the co-inventor of "Calculus" and beat Isaac…
With just a little bit of  SQL and VBA, many doors open to cool things like synchronize a list box to display data relevant to other information on a form.  If you have never written code or looked at an SQL statement before, no problem! ...  give i…

872 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question