Link to home
Start Free TrialLog in
Avatar of kemsypt
kemsypt

asked on

Take details from website and add to database.

I want to make a database of items and item details from a specific website, I want to be able to  enter the Url into one field of the database, and for it to automatically fetch the information for the other fields.

The Website I wish to import this information from
http://www.dealextreme.com
Example Page
http://www.dealextreme.com/p/93909

Example Data:
Name                   |   SKU   | Price US  | Price AUS  | Specifications |
MiLi Photo Printer etc. | 93909 |  273.00   |    261.67    | Dimensions: 6.93 etc.

Open in new window

Preferably I'd like this to be able to be done in Excel, but thats not 100% needed.

I don't know how well I explained what I want here, so if there are any questions, please ask.
Thanks, Kemsypt.
Avatar of Robert Schutt
Robert Schutt
Flag of Netherlands image

You have provided too little information for me to make the database connection but here is a first setup in Excel VBA.
Option Explicit

Const C_PAGE = "http://www.dealextreme.com/p/"

Sub GetDetails()

    ' note: use menu Tools|References to add "Microsoft VBScript Regular Expressions 5.5"
    Dim objRE As New RegExp

    Dim objXMLHTTP, strHtml
    Set objXMLHTTP = CreateObject("Microsoft.XMLHTTP")

    [A2].Select
    Dim str_sku
    For Each str_sku In Array(93909, 93910) ' numbers or strings (for letters or leading zeroes)
        ' get page html
        objXMLHTTP.Open "GET", C_PAGE & str_sku, False
        objXMLHTTP.Send
        strHtml = objXMLHTTP.ResponseText
        ' find text parts
        objRE.IgnoreCase = True
        objRE.MultiLine = True
        objRE.Pattern = "<div class=""SectionContents"">\s*Item: (.*?)\s*<br />\s*<div><font face=""Arial"" size=2>(.*?)</font></div>\s*</div>"
        Dim objMatches, objMatch
        Set objMatches = objRE.Execute(strHtml)
        If objMatches.Count > 0 Then
            ActiveCell.Value = objMatches(0).SubMatches(0)
            ActiveCell.Offset(0, 1).Value = str_sku
            ActiveCell.Offset(0, 4).Value = objMatches(0).SubMatches(1)
        End If
        Set objMatches = Nothing
        ' search price US
        objRE.Pattern = "<strong>\s*Price:</strong>\s*<span id=""ctl00_content_Price"" style=""font-family:Arial;font-size:15pt;font-weight:bold;"">\$([0-9,.]+)</span>"
        Set objMatches = objRE.Execute(strHtml)
        If objMatches.Count > 0 Then
            ActiveCell.Offset(0, 2).Value = objMatches(0).SubMatches(0)
        End If
        Set objMatches = Nothing
        ' search price AUD
        objRE.Pattern = "<span id='AUD'>\s*([0-9,.]+)\s*</span>"
        Set objMatches = objRE.Execute(strHtml)
        If objMatches.Count > 0 Then
            ActiveCell.Offset(0, 3).Value = objMatches(0).SubMatches(0)
        End If
        Set objMatches = Nothing
        ' next row
        ActiveCell.Offset(1, 0).Select
    Next
    Set objRE = Nothing
    Set objXMLHTTP = Nothing
End Sub

Open in new window

Avatar of kemsypt
kemsypt

ASKER

Sorry robert_schutt, I think maybe an some Images will explain what I want more:

What I want is to be able to fill in the SKU field, and for the other information (Name, Price US, Price AUS and specifications) to automatically (or done in batch) load in the specified fields.

(Circled in red is the details I want to import.
User generated image
User generated image
Yep, that's exactly what the code does.
User generated imageThere is a list of SKU's hard coded in the routine now, probably that's not what you want.

Can you give a little more info on the database you're going to use? Which type, what does the table look like, etc. Also how do you want to trigger the code, on open or with a button for example? Then I will extend the routine and post a working excel sheet.
Avatar of kemsypt

ASKER

Table you have there is the exact layout I'm looking for. I really only need a basic table.

And I'd like the code to work on the click of a button preferably.

Thanks, Kemsypt.
Ok, I already started on a mostly generic solution. Should be easy to change to for example Access or MySQL but I made it for MSSQL (express) for the moment.

Starting with the table I created in a local database "ee" (I did that in design mode, this is the code that Management Studio generated):
CREATE TABLE [tblDX](
	[ID] [int] IDENTITY(1,1) NOT NULL,
	[SKU] [varchar](20) NOT NULL,
	[Name] [varchar](200) NULL,
	[PriceUS] [money] NULL,
	[PriceAUS] [money] NULL,
	[Specs] [varchar](500) NULL,
	[LAST_UPDATE] [datetime] NULL,
 CONSTRAINT [PK_tblDX] PRIMARY KEY CLUSTERED 
(
	[ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO

Open in new window

Attached please find the Excel file. A number of checks will probably have to be added but try and play around with it for a bit.

What I did was add some numbers in the table just incrementing from the number you used as an example. You may want to process a maximum number of pages at a time hence the "TOP 2" in the sql, also useful for testing purposes...
DX-post.xls
Here is the table after running the excel code, there is a problem with linefeeds, but maybe that's just in Management Studio.
User generated image
Avatar of kemsypt

ASKER

Excel comes up with an error when trying to run your code, since it works for you, I'm assuming its a problem with my excel.
I think it may be an issue with excel not (for some odd reason) being able to access the internet.
Once I can fix that, I shall try out your code, it looks pretty good thus far though, I am grateful for the work put in.

I posted a separate question trying to figure out my excel problem (a reasonable effort on Google didn't do much in the way of a solution)
Excel Problem - System Error &H80004005 (-2147467259) + cannot access webpage

Thanks, Kemsypt.
Avatar of kemsypt

ASKER

Either that, or are there any other programs that you could recommend that would do what I desire equally well, or better than excel?

Thanks, Kemsypt.
Well, since it is basically VB script, if we take out the Excel references, you could just run it as a .vbs
For example:
Option Explicit

Const C_PAGE = "http://www.dealextreme.com/p/"

' ADO constants (from C:\Program Files\Common Files\System\ado\ADOVBS.INC)
Const adCmdText = &H1
Const adCmdStoredProc = &H4
Const adInteger = 3
Const adCurrency = 6
Const adVarChar = 200
Const adParamInput = &H1
Const adExecuteNoRecords = &H80

Dim msg
msg=""

Call GetDetails()

MsgBox msg


Sub GetDetails()

    Dim objRE
	Set objRE = New RegExp

    Dim objConn, objRst
    Set objConn = CreateObject("ADODB.Connection")
    objConn.ConnectionString = "Provider=SQLOLEDB;Data Source=.\SQLEXPRESS;User ID=ee;Password=ee"
    objConn.Open
    Dim str_sku
    Set objRst = objConn.Execute("SELECT TOP 2 [SKU] FROM [tblDX] WHERE [LAST_UPDATE] IS NULL OR [LAST_UPDATE] < GETDATE() - 1 ORDER BY [LAST_UPDATE], ID")
    Dim arrSKUs(), intCountSKUs
    intCountSKUs = -1
    While Not objRst.EOF
        str_sku = objRst.Fields(0).Value
        intCountSKUs = intCountSKUs + 1
        ReDim Preserve arrSKUs(intCountSKUs)
        arrSKUs(intCountSKUs) = str_sku
        objRst.MoveNext
    Wend
    objRst.Close
    Set objRst = Nothing
    
    If intCountSKUs = -1 Then
        msg = "Nothing to update"
    Else
        Dim objXMLHTTP, strHtml, objCmd
        ' prepare update command
        Set objCmd = CreateObject("ADODB.Command")
        Set objCmd.ActiveConnection = objConn
        objCmd.CommandType = adCmdText
        objCmd.CommandText = "UPDATE [tblDX] SET [Name] = ?, [PriceUS] = ?, [PriceAUS] = ?, [Specs] = Replace(?,'<br />',char(13)), [LAST_UPDATE] = GETDATE() WHERE [SKU] = ?"
        objCmd.Parameters.Append objCmd.CreateParameter("Name", adVarChar, adParamInput, 200)
        objCmd.Parameters.Append objCmd.CreateParameter("PriceUS", adCurrency, adParamInput)
        objCmd.Parameters.Append objCmd.CreateParameter("PriceAUS", adCurrency, adParamInput)
        objCmd.Parameters.Append objCmd.CreateParameter("Specs", adVarChar, adParamInput, 500)
        objCmd.Parameters.Append objCmd.CreateParameter("SKU", adVarChar, adParamInput, 20)
        ' get pages for each SKU to be updated
        Set objXMLHTTP = CreateObject("Microsoft.XMLHTTP")
        For Each str_sku In arrSKUs ' Array(93909, 93910) ' numbers or strings (for letters or leading zeroes)
            objCmd.Parameters(4).Value = str_sku
            ' get page html
            objXMLHTTP.Open "GET", C_PAGE & str_sku, False
            objXMLHTTP.Send
            strHtml = objXMLHTTP.ResponseText
            ' find text parts
            objRE.IgnoreCase = True
            objRE.MultiLine = True
            objRE.Pattern = "<div class=""SectionContents"">\s*Item: (.*?)\s*<br />\s*<div><font face=""Arial"" size=2>(.*?)</font></div>\s*</div>"
            Dim objMatches, objMatch
            Set objMatches = objRE.Execute(strHtml)
            If objMatches.Count > 0 Then
                objCmd.Parameters(0).Value = objMatches(0).SubMatches(0)
                objCmd.Parameters(3).Value = objMatches(0).SubMatches(1) ' Replace(, "<br />", Chr(13)) ' now done in SQL!
            Else
                objCmd.Parameters(0).Value = "N/A"
                objCmd.Parameters(3).Value = "N/A"
            End If
            Set objMatches = Nothing
            ' search price US
            objRE.Pattern = "<strong>\s*Price:</strong>\s*<span id=""ctl00_content_Price"" style=""font-family:Arial;font-size:15pt;font-weight:bold;"">\$([0-9,.]+)</span>"
            Set objMatches = objRE.Execute(strHtml)
            If objMatches.Count > 0 Then
                objCmd.Parameters(1).Value = objMatches(0).SubMatches(0)
            Else
                objCmd.Parameters(1).Value = 0 ' default/keep existing?
            End If
            Set objMatches = Nothing
            ' search price AUD
            objRE.Pattern = "<span id='AUD'>\s*([0-9,.]+)\s*</span>"
            Set objMatches = objRE.Execute(strHtml)
            If objMatches.Count > 0 Then
                objCmd.Parameters(2).Value = objMatches(0).SubMatches(0)
            Else
                objCmd.Parameters(2).Value = 0 ' ?
            End If
            Set objMatches = Nothing
            ' update database
            objCmd.Execute , , adExecuteNoRecords
			msg = msg & "updated " & str_sku & vbCrLf
        Next
        ' clean up
        Set objRE = Nothing
        Set objXMLHTTP = Nothing
        Set objCmd.ActiveConnection = Nothing
        While objCmd.Parameters.Count > 0
            objCmd.Parameters.Delete objCmd.Parameters.Count - 1
        Wend
        Set objCmd = Nothing
        'MsgBox "Update complete"
    End If
    objConn.Close
    Set objConn = Nothing
End Sub

Open in new window

Avatar of kemsypt

ASKER

Haven't forgotten about this, been busy + trying to fix my excel error before i proceed.

Thanks for all your help thus far though.
No problem, take your time.

Yeah, I took a look at that other question but really haven't a clue what could cause that.

Didn't the vb script work or is that not an option?
By the way, both in VBA and VBS you need to change the connectionstring to your own needs.
Avatar of kemsypt

ASKER

Oh I need to
change the connectionstring to your own needs.
? I may have assumed it was usable straight off, how silly of me.

I have little VBA knowledge, so if you could explain what and how I go about doing that, it would be much appreciated.
This code:
objConn.ConnectionString = "Provider=SQLOLEDB;Data Source=.\SQLEXPRESS;User ID=ee;Password=ee"

Open in new window

needs to be adjusted to access your database, so maybe on a different server, different user/password or trusted connection.
Avatar of kemsypt

ASKER

How would I go about determining any of this?
Do I need to install any other programs to do this? (SQL Express?)
Well, yes. I use SQL express but any database would do I guess. In your post you stated you want to save the data in a database. Just saving it to excel is also a possibility.
Avatar of kemsypt

ASKER

I did state that yes, but I wasn't 100% sure with what I wanted when I wrote that.
Just saving to Excel would be great though
Ok, then you just need to decide on a procedure. Since the code now uses the database itself to determine which sku's to download, you need a different 'trigger'.

Actually, the first code I posted could be used because that's just downloading a specific array of sku's (and put it in Excel, not a database):
For Each str_sku In Array(93909, 93910)

Open in new window


Another option would be to use the excel sheet as input for that as well, insert the sku's at the bottom of the list and the code could check for new sku's or update all rows every time. Any idea what would be a convenient way for you?
Here is an example, updating where sku is filled and description is empty.
DX-post2.xls
Avatar of kemsypt

ASKER

Ok, by the looks of things, what you just gave me is pretty much absolutely perfect.
Its definitely usable for what I want, although just one more thing (if its possible to do):

Is there any way for it to, if, for example, the details of SKU 93909 have changed since the last time I clicked update, is there any way I could have it inform me thus (Eg. it saying "Updated" in column F)

I'm happy to manually delete the Updated field if its necessary, I was just wandering if it would be possible.

If its not possible, then what I have here will definitely do me will still. Thanks for all your help.
Avatar of kemsypt

ASKER

Just found out that the site has changed its name and layout, I would love if you could check to see if everything would still work with the new layout. Given how much help you have been I'll open it in a new question and close this one.
Much appreciation.
Kemsypt.

New Question - https://www.experts-exchange.com/questions/27852625/Updating-Data-Excel.html
There's not really a need for a new question. I'll adapt it here, no problem. Also I think the new question would not be clear to anybody what you mean there without a reference to this question, maybe better to just delete it.

The "updated" shouldn't be a problem. The design change gives one problem straight up and that's the url which can no longer be constructed with just the sku. Should there be some search action? Oh wait, I see it's still possible to use http://dx.com/p/93909

I happen to be a client on the site so maybe it's my settings but I completely lost the currency conversions so would it be an option to use a default conversion or get that from another website? (I've seen references to that on EE before).
For the update notification and url/design change here is a new version. For the currency I don't have a solution yet and await your answer regarding an alternative or maybe I'm missing something in the site settings.
DX-post3.xls
Avatar of kemsypt

ASKER

I talked to one of the Live Support people from DX and they informed me they don't have an AU currency conversion live on the new DX yet.
So I think I'll have have to make do with what I have until they introduce that feature.

So, just 2 more things then everything will be perfect:
1 - Could you integrate the Status feature into the original spreadsheet
2 - Would you be able to add, on column G, something to check whether the item is still available? (it says "Delivery:      Item is temporarily sold out." if the item is not available)

So what would be perfect is if you could make it so both the Dealextreme and the DX spreadsheets have the Status feature and have the availability feature to.

Thanks substantially for your help.
ASKER CERTIFIED SOLUTION
Avatar of Robert Schutt
Robert Schutt
Flag of Netherlands image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of kemsypt

ASKER

Brilliant help at every turn, very much appreciated.
Avatar of kemsypt

ASKER

Out of curiosity, how difficult was this for you to do? Did it take long?
It didn't take too long, a couple of hours at the most. I know some people like to say they do stuff like this in 5 minutes but I wanted to test the changes every step, have felt embarrassed before posting something 'faulty'. I have done a number of questions similar to this so some code could be copied but each time it is of course a little different. Like I already admitted before, it would have taken longer if I had taken the time to really make the regex patterns more robust but not knowing exactly what your purpose with this is, I wanted to get something up and running first and it did seem to do the job nicely so I just thought I'd post what I had cooked up and see what your reaction was. Turns out I think we can both be happy with this, although there may be some changes needed when DX adds the currencies again or changes the layout. The best result would be of course if my code makes the process more clear for you and is easy enough to change when needed. But otherwise you know you have us here at EE to help again!