kemsypt
asked on
Take details from website and add to database.
I want to make a database of items and item details from a specific website, I want to be able to enter the Url into one field of the database, and for it to automatically fetch the information for the other fields.
The Website I wish to import this information from
http://www.dealextreme.com
Example Page
http://www.dealextreme.com/p/93909
Example Data:
I don't know how well I explained what I want here, so if there are any questions, please ask.
Thanks, Kemsypt.
The Website I wish to import this information from
http://www.dealextreme.com
Example Page
http://www.dealextreme.com/p/93909
Example Data:
Name | SKU | Price US | Price AUS | Specifications |
MiLi Photo Printer etc. | 93909 | 273.00 | 261.67 | Dimensions: 6.93 etc.
Preferably I'd like this to be able to be done in Excel, but thats not 100% needed.I don't know how well I explained what I want here, so if there are any questions, please ask.
Thanks, Kemsypt.
ASKER
Sorry robert_schutt, I think maybe an some Images will explain what I want more:
What I want is to be able to fill in the SKU field, and for the other information (Name, Price US, Price AUS and specifications) to automatically (or done in batch) load in the specified fields.
(Circled in red is the details I want to import.
What I want is to be able to fill in the SKU field, and for the other information (Name, Price US, Price AUS and specifications) to automatically (or done in batch) load in the specified fields.
(Circled in red is the details I want to import.
Yep, that's exactly what the code does.
There is a list of SKU's hard coded in the routine now, probably that's not what you want.
Can you give a little more info on the database you're going to use? Which type, what does the table look like, etc. Also how do you want to trigger the code, on open or with a button for example? Then I will extend the routine and post a working excel sheet.
There is a list of SKU's hard coded in the routine now, probably that's not what you want.
Can you give a little more info on the database you're going to use? Which type, what does the table look like, etc. Also how do you want to trigger the code, on open or with a button for example? Then I will extend the routine and post a working excel sheet.
ASKER
Table you have there is the exact layout I'm looking for. I really only need a basic table.
And I'd like the code to work on the click of a button preferably.
Thanks, Kemsypt.
And I'd like the code to work on the click of a button preferably.
Thanks, Kemsypt.
Ok, I already started on a mostly generic solution. Should be easy to change to for example Access or MySQL but I made it for MSSQL (express) for the moment.
Starting with the table I created in a local database "ee" (I did that in design mode, this is the code that Management Studio generated):
What I did was add some numbers in the table just incrementing from the number you used as an example. You may want to process a maximum number of pages at a time hence the "TOP 2" in the sql, also useful for testing purposes...
DX-post.xls
Starting with the table I created in a local database "ee" (I did that in design mode, this is the code that Management Studio generated):
CREATE TABLE [tblDX](
[ID] [int] IDENTITY(1,1) NOT NULL,
[SKU] [varchar](20) NOT NULL,
[Name] [varchar](200) NULL,
[PriceUS] [money] NULL,
[PriceAUS] [money] NULL,
[Specs] [varchar](500) NULL,
[LAST_UPDATE] [datetime] NULL,
CONSTRAINT [PK_tblDX] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Attached please find the Excel file. A number of checks will probably have to be added but try and play around with it for a bit.What I did was add some numbers in the table just incrementing from the number you used as an example. You may want to process a maximum number of pages at a time hence the "TOP 2" in the sql, also useful for testing purposes...
DX-post.xls
ASKER
Excel comes up with an error when trying to run your code, since it works for you, I'm assuming its a problem with my excel.
I think it may be an issue with excel not (for some odd reason) being able to access the internet.
Once I can fix that, I shall try out your code, it looks pretty good thus far though, I am grateful for the work put in.
I posted a separate question trying to figure out my excel problem (a reasonable effort on Google didn't do much in the way of a solution)
Excel Problem - System Error &H80004005 (-2147467259) + cannot access webpage
Thanks, Kemsypt.
I think it may be an issue with excel not (for some odd reason) being able to access the internet.
Once I can fix that, I shall try out your code, it looks pretty good thus far though, I am grateful for the work put in.
I posted a separate question trying to figure out my excel problem (a reasonable effort on Google didn't do much in the way of a solution)
Excel Problem - System Error &H80004005 (-2147467259) + cannot access webpage
Thanks, Kemsypt.
ASKER
Either that, or are there any other programs that you could recommend that would do what I desire equally well, or better than excel?
Thanks, Kemsypt.
Thanks, Kemsypt.
Well, since it is basically VB script, if we take out the Excel references, you could just run it as a .vbs
For example:
Option Explicit
Const C_PAGE = "http://www.dealextreme.com/p/"
' ADO constants (from C:\Program Files\Common Files\System\ado\ADOVBS.INC)
Const adCmdText = &H1
Const adCmdStoredProc = &H4
Const adInteger = 3
Const adCurrency = 6
Const adVarChar = 200
Const adParamInput = &H1
Const adExecuteNoRecords = &H80
Dim msg
msg=""
Call GetDetails()
MsgBox msg
Sub GetDetails()
Dim objRE
Set objRE = New RegExp
Dim objConn, objRst
Set objConn = CreateObject("ADODB.Connection")
objConn.ConnectionString = "Provider=SQLOLEDB;Data Source=.\SQLEXPRESS;User ID=ee;Password=ee"
objConn.Open
Dim str_sku
Set objRst = objConn.Execute("SELECT TOP 2 [SKU] FROM [tblDX] WHERE [LAST_UPDATE] IS NULL OR [LAST_UPDATE] < GETDATE() - 1 ORDER BY [LAST_UPDATE], ID")
Dim arrSKUs(), intCountSKUs
intCountSKUs = -1
While Not objRst.EOF
str_sku = objRst.Fields(0).Value
intCountSKUs = intCountSKUs + 1
ReDim Preserve arrSKUs(intCountSKUs)
arrSKUs(intCountSKUs) = str_sku
objRst.MoveNext
Wend
objRst.Close
Set objRst = Nothing
If intCountSKUs = -1 Then
msg = "Nothing to update"
Else
Dim objXMLHTTP, strHtml, objCmd
' prepare update command
Set objCmd = CreateObject("ADODB.Command")
Set objCmd.ActiveConnection = objConn
objCmd.CommandType = adCmdText
objCmd.CommandText = "UPDATE [tblDX] SET [Name] = ?, [PriceUS] = ?, [PriceAUS] = ?, [Specs] = Replace(?,'<br />',char(13)), [LAST_UPDATE] = GETDATE() WHERE [SKU] = ?"
objCmd.Parameters.Append objCmd.CreateParameter("Name", adVarChar, adParamInput, 200)
objCmd.Parameters.Append objCmd.CreateParameter("PriceUS", adCurrency, adParamInput)
objCmd.Parameters.Append objCmd.CreateParameter("PriceAUS", adCurrency, adParamInput)
objCmd.Parameters.Append objCmd.CreateParameter("Specs", adVarChar, adParamInput, 500)
objCmd.Parameters.Append objCmd.CreateParameter("SKU", adVarChar, adParamInput, 20)
' get pages for each SKU to be updated
Set objXMLHTTP = CreateObject("Microsoft.XMLHTTP")
For Each str_sku In arrSKUs ' Array(93909, 93910) ' numbers or strings (for letters or leading zeroes)
objCmd.Parameters(4).Value = str_sku
' get page html
objXMLHTTP.Open "GET", C_PAGE & str_sku, False
objXMLHTTP.Send
strHtml = objXMLHTTP.ResponseText
' find text parts
objRE.IgnoreCase = True
objRE.MultiLine = True
objRE.Pattern = "<div class=""SectionContents"">\s*Item: (.*?)\s*<br />\s*<div><font face=""Arial"" size=2>(.*?)</font></div>\s*</div>"
Dim objMatches, objMatch
Set objMatches = objRE.Execute(strHtml)
If objMatches.Count > 0 Then
objCmd.Parameters(0).Value = objMatches(0).SubMatches(0)
objCmd.Parameters(3).Value = objMatches(0).SubMatches(1) ' Replace(, "<br />", Chr(13)) ' now done in SQL!
Else
objCmd.Parameters(0).Value = "N/A"
objCmd.Parameters(3).Value = "N/A"
End If
Set objMatches = Nothing
' search price US
objRE.Pattern = "<strong>\s*Price:</strong>\s*<span id=""ctl00_content_Price"" style=""font-family:Arial;font-size:15pt;font-weight:bold;"">\$([0-9,.]+)</span>"
Set objMatches = objRE.Execute(strHtml)
If objMatches.Count > 0 Then
objCmd.Parameters(1).Value = objMatches(0).SubMatches(0)
Else
objCmd.Parameters(1).Value = 0 ' default/keep existing?
End If
Set objMatches = Nothing
' search price AUD
objRE.Pattern = "<span id='AUD'>\s*([0-9,.]+)\s*</span>"
Set objMatches = objRE.Execute(strHtml)
If objMatches.Count > 0 Then
objCmd.Parameters(2).Value = objMatches(0).SubMatches(0)
Else
objCmd.Parameters(2).Value = 0 ' ?
End If
Set objMatches = Nothing
' update database
objCmd.Execute , , adExecuteNoRecords
msg = msg & "updated " & str_sku & vbCrLf
Next
' clean up
Set objRE = Nothing
Set objXMLHTTP = Nothing
Set objCmd.ActiveConnection = Nothing
While objCmd.Parameters.Count > 0
objCmd.Parameters.Delete objCmd.Parameters.Count - 1
Wend
Set objCmd = Nothing
'MsgBox "Update complete"
End If
objConn.Close
Set objConn = Nothing
End Sub
ASKER
Haven't forgotten about this, been busy + trying to fix my excel error before i proceed.
Thanks for all your help thus far though.
Thanks for all your help thus far though.
No problem, take your time.
Yeah, I took a look at that other question but really haven't a clue what could cause that.
Didn't the vb script work or is that not an option?
Yeah, I took a look at that other question but really haven't a clue what could cause that.
Didn't the vb script work or is that not an option?
By the way, both in VBA and VBS you need to change the connectionstring to your own needs.
ASKER
Oh I need to
I have little VBA knowledge, so if you could explain what and how I go about doing that, it would be much appreciated.
change the connectionstring to your own needs.? I may have assumed it was usable straight off, how silly of me.
I have little VBA knowledge, so if you could explain what and how I go about doing that, it would be much appreciated.
This code:
objConn.ConnectionString = "Provider=SQLOLEDB;Data Source=.\SQLEXPRESS;User ID=ee;Password=ee"
needs to be adjusted to access your database, so maybe on a different server, different user/password or trusted connection.
ASKER
How would I go about determining any of this?
Do I need to install any other programs to do this? (SQL Express?)
Do I need to install any other programs to do this? (SQL Express?)
Well, yes. I use SQL express but any database would do I guess. In your post you stated you want to save the data in a database. Just saving it to excel is also a possibility.
ASKER
I did state that yes, but I wasn't 100% sure with what I wanted when I wrote that.
Just saving to Excel would be great though
Just saving to Excel would be great though
Ok, then you just need to decide on a procedure. Since the code now uses the database itself to determine which sku's to download, you need a different 'trigger'.
Actually, the first code I posted could be used because that's just downloading a specific array of sku's (and put it in Excel, not a database):
Another option would be to use the excel sheet as input for that as well, insert the sku's at the bottom of the list and the code could check for new sku's or update all rows every time. Any idea what would be a convenient way for you?
Actually, the first code I posted could be used because that's just downloading a specific array of sku's (and put it in Excel, not a database):
For Each str_sku In Array(93909, 93910)
Another option would be to use the excel sheet as input for that as well, insert the sku's at the bottom of the list and the code could check for new sku's or update all rows every time. Any idea what would be a convenient way for you?
Here is an example, updating where sku is filled and description is empty.
DX-post2.xls
DX-post2.xls
ASKER
Ok, by the looks of things, what you just gave me is pretty much absolutely perfect.
Its definitely usable for what I want, although just one more thing (if its possible to do):
Is there any way for it to, if, for example, the details of SKU 93909 have changed since the last time I clicked update, is there any way I could have it inform me thus (Eg. it saying "Updated" in column F)
I'm happy to manually delete the Updated field if its necessary, I was just wandering if it would be possible.
If its not possible, then what I have here will definitely do me will still. Thanks for all your help.
Its definitely usable for what I want, although just one more thing (if its possible to do):
Is there any way for it to, if, for example, the details of SKU 93909 have changed since the last time I clicked update, is there any way I could have it inform me thus (Eg. it saying "Updated" in column F)
I'm happy to manually delete the Updated field if its necessary, I was just wandering if it would be possible.
If its not possible, then what I have here will definitely do me will still. Thanks for all your help.
ASKER
Just found out that the site has changed its name and layout, I would love if you could check to see if everything would still work with the new layout. Given how much help you have been I'll open it in a new question and close this one.
Much appreciation.
Kemsypt.
New Question - https://www.experts-exchange.com/questions/27852625/Updating-Data-Excel.html
Much appreciation.
Kemsypt.
New Question - https://www.experts-exchange.com/questions/27852625/Updating-Data-Excel.html
There's not really a need for a new question. I'll adapt it here, no problem. Also I think the new question would not be clear to anybody what you mean there without a reference to this question, maybe better to just delete it.
The "updated" shouldn't be a problem. The design change gives one problem straight up and that's the url which can no longer be constructed with just the sku. Should there be some search action? Oh wait, I see it's still possible to use http://dx.com/p/93909
I happen to be a client on the site so maybe it's my settings but I completely lost the currency conversions so would it be an option to use a default conversion or get that from another website? (I've seen references to that on EE before).
The "updated" shouldn't be a problem. The design change gives one problem straight up and that's the url which can no longer be constructed with just the sku. Should there be some search action? Oh wait, I see it's still possible to use http://dx.com/p/93909
I happen to be a client on the site so maybe it's my settings but I completely lost the currency conversions so would it be an option to use a default conversion or get that from another website? (I've seen references to that on EE before).
For the update notification and url/design change here is a new version. For the currency I don't have a solution yet and await your answer regarding an alternative or maybe I'm missing something in the site settings.
DX-post3.xls
DX-post3.xls
ASKER
I talked to one of the Live Support people from DX and they informed me they don't have an AU currency conversion live on the new DX yet.
So I think I'll have have to make do with what I have until they introduce that feature.
So, just 2 more things then everything will be perfect:
1 - Could you integrate the Status feature into the original spreadsheet
2 - Would you be able to add, on column G, something to check whether the item is still available? (it says "Delivery: Item is temporarily sold out." if the item is not available)
So what would be perfect is if you could make it so both the Dealextreme and the DX spreadsheets have the Status feature and have the availability feature to.
Thanks substantially for your help.
So I think I'll have have to make do with what I have until they introduce that feature.
So, just 2 more things then everything will be perfect:
1 - Could you integrate the Status feature into the original spreadsheet
2 - Would you be able to add, on column G, something to check whether the item is still available? (it says "Delivery: Item is temporarily sold out." if the item is not available)
So what would be perfect is if you could make it so both the Dealextreme and the DX spreadsheets have the Status feature and have the availability feature to.
Thanks substantially for your help.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Brilliant help at every turn, very much appreciated.
ASKER
Out of curiosity, how difficult was this for you to do? Did it take long?
It didn't take too long, a couple of hours at the most. I know some people like to say they do stuff like this in 5 minutes but I wanted to test the changes every step, have felt embarrassed before posting something 'faulty'. I have done a number of questions similar to this so some code could be copied but each time it is of course a little different. Like I already admitted before, it would have taken longer if I had taken the time to really make the regex patterns more robust but not knowing exactly what your purpose with this is, I wanted to get something up and running first and it did seem to do the job nicely so I just thought I'd post what I had cooked up and see what your reaction was. Turns out I think we can both be happy with this, although there may be some changes needed when DX adds the currencies again or changes the layout. The best result would be of course if my code makes the process more clear for you and is easy enough to change when needed. But otherwise you know you have us here at EE to help again!
Open in new window