Tycoon Talk

The Database Forum


Cache creator & Crawler to detect dead links on website.
Old 06-07-2007, 04:09 PM Cache creator & Crawler to detect dead links on website.
ins
Novice Talker

Posts: 8
Trades: 0
Hi,

I have about 25,000 external links in my community website's resource database. How easy is it to create a crawler that automatically detects dead links and reports them to the admin for manual removal?

Also, how can I create and display a cache for all these external links, in case an external website is down, just like Google does?

If you can refer me to some free/cheap software or literature available, it would be great.

Thanks in advance!

Ins
 
 
Old 06-07-2007, 04:19 PM Re: Cache creator & Crawler to detect dead links on website.
Republikin
Defies a Status

Posts: 3,189
Trades: 3
I've done something similar in the past with Snoopy, although never with that many links and never with caching. PHP might not be optimal for that sort of thing anyway. Do you have a language of preference for this project?
 
Old 06-07-2007, 04:28 PM Re: Cache creator & Crawler to detect dead links on website.
ins
Novice Talker

Posts: 8
Trades: 0
Thanks Republikin,

The site is already built on PHP/MySQL, and I can't change that at this time. All I need is for the software to add a column called cache, store a cached copy of each link in it, and then regularly crawl the database (ping each link?) and generate a report of dead links.
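
A minimal sketch of the "ping each link and report" idea, written in Python purely for illustration (the site in question is PHP/MySQL). The function names, the injectable `fetch` parameter, and the rule that any status of 400 or above counts as dead are assumptions, not something from the thread:

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return None      # DNS failure, refused connection, timeout, malformed URL


def find_dead_links(urls, fetch=http_status):
    """Return (url, status) pairs for every link that looks dead."""
    dead = []
    for url in urls:
        status = fetch(url)
        if status is None or status >= 400:
            dead.append((url, status))
    return dead


def format_report(dead):
    """Build a plain-text report the admin can review before removing anything."""
    lines = [f"{len(dead)} dead link(s) found"]
    for url, status in dead:
        label = status if status is not None else "unreachable"
        lines.append(f"{label}  {url}")
    return "\n".join(lines)
```

In a real setup the URLs would come from the links table and the script would run on a schedule. Some servers reject HEAD requests, so a production checker would probably retry with GET before declaring a link dead.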

I have heard the script for this is very easy to generate, but I have no idea what and how. So, I am looking for third-party software.

Let me know if you know something on this one.

Thanks.

Ins
 
Old 06-07-2007, 05:15 PM Re: Cache creator & Crawler to detect dead links on website.
Learning Newbie
Defies a Status

Posts: 5,662
Name: John Alexander
Trades: 0
Sounds like what you really need is a programmer. When you buy a 3rd party component, how will you integrate it with what you have now?
 
Old 06-07-2007, 05:59 PM Re: Cache creator & Crawler to detect dead links on website.
ins
Novice Talker

Posts: 8
Trades: 0
Yes, having a programmer do this for my website would be better. That is why I ask how and how easy it would be, so I can make a better judgement on what to pay, what particular script guy to approach, and how long it will take. Can you give any ideas?

In addition, checking out a commercially available script may not be so bad, because I don't think this needs to be integrated into my website (I may be wrong). It would just fetch a cached copy of every link once and put it in a database field, and then send me a daily report of dead links. I believe an outside script can do this.
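
As a rough illustration of the "get the cache once and put it in a database field" step, here is a sketch in Python using SQLite so that it is self-contained. The `links` table layout, the `cache` column, and the `fetch` callable (which would download a page and return its text, or None on failure) are all assumptions; the real site would use its own MySQL schema:

```python
import sqlite3

# Hypothetical layout: one row per link, with an initially empty cache column.
SCHEMA = "CREATE TABLE links (id INTEGER PRIMARY KEY, url TEXT NOT NULL, cache TEXT)"


def cache_pages(conn, fetch):
    """Store fetch(url) in the cache column for every link not yet cached.

    Returns the number of pages stored in this run.
    """
    rows = conn.execute("SELECT id, url FROM links WHERE cache IS NULL").fetchall()
    stored = 0
    for link_id, url in rows:
        body = fetch(url)
        if body is None:  # download failed; leave the cache empty for next time
            continue
        conn.execute("UPDATE links SET cache = ? WHERE id = ?", (body, link_id))
        stored += 1
    conn.commit()
    return stored
```

Because only rows with an empty cache are selected, the script can be re-run safely until every reachable link has a stored copy.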

Ins
 
Old 06-07-2007, 06:22 PM Re: Cache creator & Crawler to detect dead links on website.
Learning Newbie
Defies a Status

Posts: 5,662
Name: John Alexander
Trades: 0
I don't know. If it were ASP/ASPX, I could do it for a couple hundred bucks, I think. But I don't have any knowledge of PHP, and while .NET code can easily talk to a MySQL database and do what you need in terms of web interaction, it would need Windows to run on.

Republikin hangs out more in these coding forums than in places like Google and SEO, and so far as I can tell he's very competent at what he sets his mind to. I've only bumped into him so much, but it sounds like he's done this sort of thing before. If you think a coder is the right road to go down, he should be your first candidate.
 
Old 06-07-2007, 06:28 PM Re: Cache creator & Crawler to detect dead links on website.
Republikin
Defies a Status

Posts: 3,189
Trades: 3
Thanks for the recommendation, Learning Newbie. The price and time frame would depend a bit on what level of integration you need. If you have an admin backend that it would need to be integrated into, it could take some more time to do it right.

PM me the exact details: exactly what you need this script to do and not do. If I can't do it, I can certainly help you find a competent and dependable programmer.
 
Old 06-07-2007, 07:36 PM Re: Cache creator & Crawler to detect dead links on website.
Learning Newbie
Defies a Status

Posts: 5,662
Name: John Alexander
Trades: 0
Quote:
Originally Posted by Republikin
The price and time frame would depend a bit on what level of integration you need. If you have an admin backend that it would need to be integrated into, it could take some more time to do it right.
This already displays more knowledge about Linux hosting and administration than I've built up over the course of my career. That's not because I'm stupid; I just work in a different environment. But for comparison's sake, you know.
 
Old 06-10-2007, 08:37 AM Re: Cache creator & Crawler to detect dead links on website.
mike_bike_kite
Average Talker

Posts: 27
Name: Mike Robinson
Location: London, England
Trades: 0
Why not just keep a status alongside each link in the database, along with the date and time it was set? If a page is not reachable, just set the flag to unavailable and don't display it. Keep checking that the page is unavailable and, perhaps after 3 weeks, delete it.
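
The status-plus-timestamp rule could look something like this in Python. This is only a sketch: the status names, the function name, and the convention of returning None to mean "delete the link" are hypothetical choices, and the 3-week grace period is taken from the suggestion above:

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(weeks=3)  # "perhaps after 3 weeks"


def apply_check(status, status_set_at, reachable, now):
    """Return the new (status, status_set_at) pair after one check.

    Returns None when the link has been unavailable for the whole grace
    period and should be deleted.
    """
    if reachable:
        if status == "ok":
            return (status, status_set_at)   # nothing changed
        return ("ok", now)                   # link came back; show it again
    if status == "ok":
        return ("unavailable", now)          # first failure; hide the link
    if now - status_set_at >= GRACE_PERIOD:
        return None                          # still down after 3 weeks
    return (status, status_set_at)           # still down, keep waiting
```

The site would then display only rows whose status is "ok", so a temporarily unreachable page disappears from view without being lost.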

Why bother keeping a cache of pages? If the page is no longer available, then just don't show it. If you want to cache 25K pages, then I suspect you might need a lot more storage.

25k links is quite a lot of links for a community web site! Remember that if you have to read all those pages each day/week/month then your web usage (I forget the proper word) costs are likely to mount.

Mike
 
Old 06-11-2007, 01:11 PM Re: Cache creator & Crawler to detect dead links on website.
Learning Newbie
Defies a Status

Posts: 5,662
Name: John Alexander
Trades: 0
I agree 25 K is a lot for a community site, a portal, even a normal directory.

But I don't think it would take a huge amount of space, at least if done well. If you figure 20 KB per page on average, 25,000 pages amounts to about 500 MB, or half a gig, although storing long character data like that in a relational, structured database will balloon the figure.
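
As a quick check of the arithmetic, assuming 25,000 pages at an average of 20 KB each (the 20 KB figure is the rough guess from this post, not a measurement), and ignoring any database overhead:

```python
PAGES = 25_000
AVG_PAGE_KB = 20  # assumed average page size

total_kb = PAGES * AVG_PAGE_KB
total_mb = total_kb / 1000
total_gb = total_mb / 1000

print(f"{total_kb:,} KB = {total_mb:,.0f} MB = {total_gb:.1f} GB")
```

That comes to 500,000 KB of raw page text, which is about 500 MB.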
 
Old 06-11-2007, 09:11 PM Re: Cache creator & Crawler to detect dead links on website.
mike_bike_kite
Average Talker

Posts: 27
Name: Mike Robinson
Location: London, England
Trades: 0
But why store pages that no longer exist? I can understand only wanting to show users good links but I can't see the benefit of showing the user the contents of a page that no longer exists. They should decide whether they want to build a community site that has good links or whether they want to cache the Internet.

Interfacing to the existing system may be easier if they leave the current links table alone (assuming there is just one table full of links) and set up a parallel table containing each link, its current status, the time it was last checked, and the time it was last OK. You'd want a batch job to run periodically to keep the two tables in sync: deleting bad links from the original table, putting them back if they come back online later, and deleting them permanently from both tables if they don't come back online within a given time frame.
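
One way the parallel table and its batch job might be sketched, using Python with SQLite so the example is self-contained. The table and column names, the 3-week expiry, and the `is_alive` callable are all assumptions, and `INSERT OR IGNORE` is SQLite syntax (MySQL spells it `INSERT IGNORE`):

```python
import sqlite3
from datetime import datetime, timedelta

EXPIRY = timedelta(weeks=3)  # assumed "given time frame"

SCHEMA = """
CREATE TABLE links (id INTEGER PRIMARY KEY, url TEXT NOT NULL);
CREATE TABLE link_status (
    link_id      INTEGER PRIMARY KEY,
    url          TEXT NOT NULL,
    status       TEXT NOT NULL,
    last_checked TEXT NOT NULL,
    last_ok      TEXT NOT NULL
);
"""


def sync(conn, is_alive, now):
    """One batch run: check every tracked link and reconcile both tables."""
    stamp = now.isoformat()
    rows = conn.execute("SELECT link_id, url, last_ok FROM link_status").fetchall()
    for link_id, url, last_ok in rows:
        if is_alive(url):
            conn.execute(
                "UPDATE link_status SET status = 'ok', last_checked = ?, last_ok = ? "
                "WHERE link_id = ?",
                (stamp, stamp, link_id),
            )
            # Put the link back if an earlier run removed it from the site.
            conn.execute(
                "INSERT OR IGNORE INTO links (id, url) VALUES (?, ?)", (link_id, url)
            )
            continue
        # Bad link: hide it from the site by removing it from the original table.
        conn.execute("DELETE FROM links WHERE id = ?", (link_id,))
        if now - datetime.fromisoformat(last_ok) >= EXPIRY:
            # Down for too long: forget it permanently.
            conn.execute("DELETE FROM link_status WHERE link_id = ?", (link_id,))
        else:
            conn.execute(
                "UPDATE link_status SET status = 'bad', last_checked = ? "
                "WHERE link_id = ?",
                (stamp, link_id),
            )
    conn.commit()
```

Since the parallel table keeps its own copy of each URL, a link that was removed from the original table can be restored without any other record of it.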

For a laugh, I once had a go at a program to check people's home pages, and it produced OK reports showing bad links, spelling mistakes, etc. If you're patient it should still work: http://www.checkmypages.com. It does limit the number of warnings and the number of pages it will check, though. Company sites tend not to allow web robots to go reading through their pages, so it's better for user home pages.

Mike

Last edited by mike_bike_kite; 06-12-2007 at 08:05 AM.. Reason: corrected link
 
Old 06-12-2007, 12:21 AM Re: Cache creator & Crawler to detect dead links on website.
ForrestCroce
Half Man, Half Amazing

Posts: 3,023
Name: Forrest Croce
Location: Seattle, WA
Trades: 0
Quote:
Originally Posted by mike_bike_kite
Interfacing to the existing system may be easier if they leave the current links table alone (assuming there is just one table full of links) and set up a parallel table containing each link, its current status, the time it was last checked, and the time it was last OK. You'd want a batch job to run periodically to keep the two tables in sync: deleting bad links from the original table, putting them back if they come back online later, and deleting them permanently from both tables if they don't come back online within a given time frame.
You could do that using triggers in SQL Server and Oracle; I would assume, or at least hope, that MySQL would let you do the same. That lets you attach a baby stored procedure to table updates, so as soon as an update happens, the parallel table is synced.
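
To make the trigger idea concrete, here is a small sketch run through Python's built-in SQLite module so it is self-contained. The trigger syntax shown is SQLite's, which differs in detail from SQL Server, Oracle, and MySQL, and the table, column, and trigger names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE links (id INTEGER PRIMARY KEY, url TEXT NOT NULL);
CREATE TABLE link_status (
    link_id INTEGER PRIMARY KEY,
    url     TEXT NOT NULL,
    status  TEXT NOT NULL DEFAULT 'unchecked'
);

-- When a link is added, start tracking it in the parallel table.
CREATE TRIGGER links_after_insert AFTER INSERT ON links
BEGIN
    INSERT INTO link_status (link_id, url) VALUES (NEW.id, NEW.url);
END;

-- When a link's URL is edited, mirror the change and force a re-check.
CREATE TRIGGER links_after_update AFTER UPDATE OF url ON links
BEGIN
    UPDATE link_status SET url = NEW.url, status = 'unchecked'
    WHERE link_id = NEW.id;
END;
""")

# Only the original table is touched; the triggers maintain link_status.
conn.execute("INSERT INTO links (url) VALUES ('http://old.example/')")
conn.execute("UPDATE links SET url = 'http://new.example/' WHERE id = 1")
rows = conn.execute("SELECT link_id, url, status FROM link_status").fetchall()
print(rows)
```

The existing site code keeps writing to `links` exactly as before, and the parallel table follows along without any application changes.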
 
Old 06-12-2007, 05:07 PM Re: Cache creator & Crawler to detect dead links on website.
mike_bike_kite
Average Talker

Posts: 27
Name: Mike Robinson
Location: London, England
Trades: 0
Quote:
You could do that using triggers in SQL Server and Oracle; I would assume, or at least hope, that MySQL would let you do the same.
Yes - MySQL version 5 onwards has triggers and stored procs.

Must admit I hate using triggers though (on any database system) - I always feel they hide the logic of what's going on. My personal preference is to nearly always go with stored procs as it's easy to see what's happening. I also find it easier to maintain the performance of systems when the data grows.

Systems with complicated cascading triggers are usually a nightmare to debug. I remember 20 years ago being responsible for the testing of an intelligent financial database where everything happened as if by magic using triggers. You'd add a row here and have no idea what other tables had changed unless you read through all the trigger code - it just put me off for life!

Mike
 