Tycoon Talk
Tycoon Talk is part of Freelancer.com - find skilled workers online at a fraction of the cost.

Content & Writing Tycoon


Old 07-31-2006, 06:18 PM scraping?
$100 - $999 Monthly

Posts: 91
Trades: 0
How does scraping work? I'm looking for a way to automatically update the clothing list on my clothing site, and someone suggested scraping on a different forum.

For instance, how do I scrape each clothing item along with its brand, its price, and the URL where it appears on the site?
mason is offline
Old 07-31-2006, 06:20 PM
Karl89.co.uk's Avatar
Novice Talker

Posts: 7
Trades: 2
Well, you would write a script to do it for you. The ones I have written were in PHP, and I've used them to gather a lot of valuable information and content.
Karl89.co.uk is offline
Old 07-31-2006, 06:28 PM
$100 - $999 Monthly

Posts: 91
Trades: 0
I just don't understand how a script can "pull" information from other sites. I probably just need to look at some code examples to get a better idea.
mason is offline
Old 07-31-2006, 06:28 PM
Karl89.co.uk's Avatar
Novice Talker

Posts: 7
Trades: 2
It's easy enough to do, and there must be some code examples around. I'll see if I can find any.
Karl89.co.uk is offline
Old 07-31-2006, 06:51 PM
Lpspider's Avatar
Seniority Minority

Posts: 1,534
Trades: 0
Interesting... I'd never really heard of this until now. (well, I heard of it but didn't know what exactly it was).
__________________
AdminAddict.com
Lpspider is offline
Old 07-31-2006, 06:58 PM
Karl89.co.uk's Avatar
Novice Talker

Posts: 7
Trades: 2
Quote:
Originally Posted by Lpspider View Post
Interesting... I'd never really heard of this until now. (well, I heard of it but didn't know what exactly it was).
I'm about to create one now to pull lots and lots of images from a website.
Karl89.co.uk is offline
Old 07-31-2006, 07:12 PM
Lpspider's Avatar
Seniority Minority

Posts: 1,534
Trades: 0
Quote:
Originally Posted by Karl Evans View Post
I'm about to create one now to pull lots and lots of images from a website.
Really... that would be... very beneficial. Something very beneficial indeed... hehe.

Seriously, any chance I could get my hands on that? :devil:
Lpspider is offline
Old 07-31-2006, 07:32 PM
Karl89.co.uk's Avatar
Novice Talker

Posts: 7
Trades: 2
Quote:
Originally Posted by Lpspider View Post
Really... that would be... very beneficial. Something very beneficial indeed... hehe.

Seriously, any chance I could get my hands on that? :devil:
Haha. Maybe.

I'll see how many images it pulls and things first, see how much space it uses. :P
Karl89.co.uk is offline
Old 07-31-2006, 09:16 PM
Lpspider's Avatar
Seniority Minority

Posts: 1,534
Trades: 0
Quote:
Originally Posted by Karl Evans View Post
Haha. Maybe.

I'll see how many images it pulls and things first, see how much space it uses. :P
Well, tell me how it goes regardless. :P
Lpspider is offline
Old 07-31-2006, 09:26 PM
Tran's Avatar
Junior Talker

Posts: 1,223
Trades: 3
To "scrape" all that information, you first have to figure out which parts of the page to parse.
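
To make "figuring out what to parse" a bit more concrete, here is a minimal Python sketch for the clothing-list case from the first post. Everything about the markup is an assumption: the `product`, `name`, `brand` and `price` class names and the sample HTML are made up for illustration, and a real site will need its own rules after you inspect its page source.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collects name, brand, price and URL from hypothetical product markup."""

    FIELDS = {"name", "brand", "price"}  # assumed class names, not a standard

    def __init__(self):
        super().__init__()
        self.products = []
        self._current = None  # dict for the product being filled in
        self._field = None    # field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "product" in classes:
            # A new product container starts here.
            self._current = {}
            self.products.append(self._current)
        elif self._current is not None:
            for field in self.FIELDS.intersection(classes):
                self._field = field
                # The item's link doubles as its URL on the site.
                if field == "name" and "href" in attrs:
                    self._current["url"] = attrs["href"]

    def handle_data(self, data):
        if self._current is not None and self._field:
            previous = self._current.get(self._field, "")
            self._current[self._field] = previous + data.strip()

    def handle_endtag(self, tag):
        self._field = None
        if tag == "div":
            # Simplification: assumes no nested <div> inside a product.
            self._current = None


def parse_products(html):
    """Return a list of dicts with name, brand, price and url keys."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return parser.products


# Made-up sample page fragment used only for this sketch.
SAMPLE = """
<div class="product">
  <a class="name" href="/shirts/42">Oxford Shirt</a>
  <span class="brand">Acme</span>
  <span class="price">$29.99</span>
</div>
<div class="product">
  <a class="name" href="/jeans/7">Slim Jeans</a>
  <span class="brand">Denimco</span>
  <span class="price">$54.00</span>
</div>
"""

for product in parse_products(SAMPLE):
    print(product)
```

The point is only that each field you want has to be tied to something recognizable in the page's HTML. If the site changes its layout, rules like these break and have to be updated.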
Tran is offline
Old 08-01-2006, 12:27 AM
Junior Talker

Posts: 60
Trades: 0
I think you can use file_get_contents() in PHP, or fopen() with a full URL (http://...)

Then use the basic regex and string functions to search for things like <img tags. At least that's how I'd go about doing it.

Of course, this only gets the HTML code; the images themselves still have to be downloaded separately.

I might make a proper script later on that I can pretty much do whatever with, have some checkboxes to grab all the images and whatnot.
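
For anyone who wants to see the fetch-then-search idea spelled out, here is a rough sketch. It is a Python analogue of the PHP approach above, not the PHP itself: `urlopen` stands in for file_get_contents(), and the helper names `fetch_html` and `find_image_urls` are invented for this example. The regex is deliberately simple and will miss unusual markup, such as unquoted src attributes.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlopen

# Captures the src attribute of <img> tags. A regex is fragile against
# odd markup, so treat this as a rough first pass rather than a parser.
IMG_SRC = re.compile(
    r'<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def fetch_html(url):
    """Download a page and return its HTML as text."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def find_image_urls(html, base_url):
    """Return absolute image URLs found in the HTML, in page order."""
    return [urljoin(base_url, src) for src in IMG_SRC.findall(html)]


# Inline sample so the sketch runs without touching the network.
sample_html = '<p><img src="/img/shirt.png"> <IMG alt="jeans" src="jeans.jpg"></p>'
for image_url in find_image_urls(sample_html, "http://example.com/shop/"):
    print(image_url)
```

On a live page you would call `find_image_urls(fetch_html(page_url), page_url)` and then download each URL in a second step, since the first request only returns the HTML.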
Decepti0n is offline
Old 08-01-2006, 06:23 PM
Junior Talker

Posts: 12
Trades: 1
Perl is the classic programming language for scraping.

Check out "Perl & LWP" and "Spidering Hacks", both from O'Reilly, for more info.
imported_alan is offline
Old 08-02-2006, 12:10 AM
Lpspider's Avatar
Seniority Minority

Posts: 1,534
Trades: 0
Does anyone have a practical way to do this - namely images?
Lpspider is offline
Old 08-02-2006, 02:02 AM
Junior Talker

Posts: 138
Trades: 0
I hate to be the 'whitehat' here, but has anyone even thought about the legalities?
imported_Max is offline
Old 08-02-2006, 11:09 AM
Shpigford's Avatar
Super Talker

Posts: 108
Trades: 0
Quote:
Originally Posted by Max View Post
I hate to be the 'whitehat' here, but has anyone even thought about the legalities?
It's illegal. Unless the author of any content you "scrape" has given you permission to do so, it's illegal in every way.
__________________
Josh.
Shpigford is offline
Old 08-02-2006, 12:18 PM
$100 - $999 Monthly

Posts: 284
Trades: 1
Look, I can think of some legitimate uses of scraping a website. In fact, one of my projects would greatly benefit from the ability to scrape because it is just a content repository for other sites that want to be on my site. Some of these sites do not have RSS, so that leaves either email or scraping to automate the content retrieval process.

Then there is Wikipedia, which uses the Creative Commons license. You can scrape all you want as long as you credit the source.

So, I want to know about scraping. It is not a black or white issue. It is a grey area and some of us have legitimate uses for it.
__________________
Atomm

Atomm is offline
Old 08-02-2006, 12:25 PM
Shpigford's Avatar
Super Talker

Posts: 108
Trades: 0
Quote:
Originally Posted by Atomm View Post
It is not a black or white issue. It is a grey area and some of us have legitimate uses for it.
Umm...no, it IS black or white. It really is simple: unless you have permission (whether via Creative Commons or from the author themselves), it is illegal.
__________________
Josh.
Shpigford is offline
Old 08-02-2006, 03:36 PM
$1,000 - $4,999 Monthly

Posts: 206
Trades: 1
Ignore this, I screwed up the post
__________________
Confessions of an Affiliate marketing Virgin

lyndoman is offline
Old 08-02-2006, 03:39 PM
$1,000 - $4,999 Monthly

Posts: 206
Trades: 1
Hmmm, don't remember giving Google permission to put my site in their cache?

So are you saying Google is breaking the law?

Or is it the same old thing that if you have tons of cash you can do whatever the hell you like?

btw Google, if you read this, I really don't mind that you pinch my stuff. So keep sending the peeps.
lyndoman is offline
Old 08-02-2006, 03:52 PM
Shpigford's Avatar
Super Talker

Posts: 108
Trades: 0
If you don't want your page cached by Google: http://www.google.com/support/webmas...y?answer=35306
__________________
Josh.
Shpigford is offline