Tycoon Talk

The Google Forum


how to unsubmit a site map to google?
Posted 05-23-2010, 10:46 AM by mkatz (Michael)
I made a mistake and I'm hoping someone can help me undo it.

I had a test version of my site running with ten thousand fake user-created pages. Using google webmaster tools, I mistakenly submitted an auto-generated sitemap.xml that had individual links to those ten thousand fake user pages. I soon discovered my error, removed that bogus sitemap.xml, and used webmaster tools to resubmit a sitemap.xml containing only my real page links (a much smaller number, about 20).
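
A minimal sketch of regenerating a sitemap that leaves out the test URLs, assuming the standard sitemaps.org XML format. The page list and the `is_real_page` helper are hypothetical, made up for illustration; only the `/find.aspx` pattern comes from this thread.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def is_real_page(url):
    """Hypothetical filter: treat anything under /find.aspx as a test page."""
    return not urlparse(url).path.lower().startswith("/find.aspx")


def build_sitemap(urls):
    """Return sitemap XML as a string, listing only URLs that pass the filter."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        if is_real_page(url):
            entry = ET.SubElement(urlset, "url")
            ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical mix of real pages and one of the bogus test URLs.
candidate_urls = [
    "http://mysite.com/",
    "http://mysite.com/about.aspx",
    "http://mysite.com/find.aspx?param1=val1&param2=val2",
]
sitemap_xml = build_sitemap(candidate_urls)
print(sitemap_xml)
```

Filtering at generation time keeps the bogus links from being resubmitted by accident, but it does not by itself make a crawler forget URLs it has already seen.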

The problem is that googlebot did not forget about those 10,000 bogus links, and continued to crawl them, even weeks later, at the rate of about one every 20 or 30 seconds. This is bad, but worse is that all these bogus links seem to be keeping it from ever getting around to crawling my real links.

As I say, I submitted an updated sitemap with only the real links. When I saw that googlebot was still crawling those bogus URLs, even after multiple weeks, I also blocked in robots.txt the particular URL it was going to (the bogus URLs were of the form http://mysite.com/find.aspx?...):

Disallow: /find.aspx

Now I see that googlebot is no longer requesting those find.aspx pages. However, I also notice that it's still not getting around to crawling any of my (20 or so) real pages. So my theory is that it's still spending all its time processing all those bogus URLs, just not actually doing the gets of my pages because it's respecting robots.txt.

So the question is: how can I get google to forget that those 10,000 URLs ever existed? I did of course read google's help about "removing a URL". That's what led me to put in the Disallow above. However, I'm unsure that will work to make it forget about the URLs because the actual URLs that were submitted were of the form: http://mysite.com/find.aspx?param1=val1&param2=val2, where those param values varied for each of the fake user pages. I don't know if putting the 10,000 specific full URLs for each of the bogus users into the robots.txt file would help matters.
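
On the narrower question of whether one Disallow line covers the parameterized variants: robots.txt rules are matched as path prefixes, so a single rule should be enough for crawling purposes. The sketch below checks this with Python's standard-library parser. It assumes a `User-agent: *` group around the rule (the post shows only the Disallow line), and `about.aspx` is a hypothetical real page. It shows how this parser reads the rule, not a guarantee of how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# The Disallow line is from the post; the User-agent line is an assumption.
robots_txt = """\
User-agent: *
Disallow: /find.aspx
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

bogus_url = "http://mysite.com/find.aspx?param1=val1&param2=val2"
real_url = "http://mysite.com/about.aspx"  # hypothetical real page

# Prefix matching: the rule applies to every URL starting with /find.aspx.
bogus_blocked = not parser.can_fetch("Googlebot", bogus_url)
real_allowed = parser.can_fetch("Googlebot", real_url)
print(bogus_blocked, real_allowed)
```

Note that this only answers whether the URLs are blocked from crawling. Whether blocking makes a search engine forget them is the separate question the rest of the thread discusses.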

I understand that some of this just takes time to work its way out of the system, but google's been remembering those bogus URLs for quite a long time...

Posted 05-23-2010, 10:54 AM by chrishirst (Chris Hirst)
Permanently redirect (301 response) the "fake" URIs to one page.
__________________
Chris.

A foolish consistency is the hobgoblin of little minds
Thought for today:- Is SEO the only industry where all the cowboys are Indians?

Posted 05-23-2010, 11:25 AM by mkatz (Michael)
Thanks for the super quick reply. That makes sense. I'm actually just going to return a 410 Gone whenever googlebot tries to get one of these find.aspx pages. (And of course I'm going to re-allow gets of those find.aspx pages in robots.txt so it can learn they are gone.) That should work, right?

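// Request.UserAgent can be null, so guard before searching it.
string userAgent = Request.UserAgent ?? "";
if ( userAgent.IndexOf( "googlebot", StringComparison.OrdinalIgnoreCase ) >= 0 )
{
    Master.Log( "FindPage Page_Load telling googlebot that the following URL is gone (410): " + Request.Url );
    // Tell the crawler that this URL is permanently gone.
    Response.StatusCode = 410;
    Response.StatusDescription = "Gone";
    return;
}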

Posted 05-23-2010, 12:22 PM by chrishirst (Chris Hirst)
I've never tested the effect of a 410 response on SEs (search engines). They do treat a 404 response as "Missing In Action" and will keep on looking for many, many months before giving up completely.

Posted 05-23-2010, 02:12 PM by mkatz (Michael)
Well, at least as documented, 410 seems like what I need.

To follow up on your other suggestion, supposing I did redirect all the bad URLs to a single page, how exactly would that help? Unless that page ultimately gave a 410, why would it stop looking there? Or is the idea that that single page would be disallowed in robots.txt?

Posted 05-23-2010, 03:27 PM by chrishirst (Chris Hirst)
The 301 response will remove the "bad" URIs from the index in around six to eight weeks, whereas a 404 response will keep them hanging around for months.

Yes, as documented a 410 response would be correct and should accomplish what you need. Unfortunately, the SEs are not noted for following HTTP responses as precisely as they are documented.

Posted 05-23-2010, 03:43 PM by benicio
You can also just delete it in Google Webmaster Tools.

Posted 05-23-2010, 04:06 PM by mkatz (Michael)
Thanks benicio. I have indeed removed

http://mysite.com/find.aspx

using google's Remove URL -- but! my question is: if the actual URLs that were included in the mistaken sitemaps were of the form

http://mysite.com/find.aspx?p1=v1&p2=v2& ...

will my removal of http://mysite.com/find.aspx remove those parameterized variants also?

And if not, how do I submit thousands of URLs to the Remove URL tool? The UI I see makes it look like I have to manually submit them one at a time?

Posted 05-23-2010, 09:26 PM by Social-Media (Tom)
If the URLs never existed (actually returned a 404) then they cannot possibly exist in Google's index. So there is really no need to 301 redirect them. Using the URL removal tool is not going to solve your problem either because, again, the pages shouldn't be in the index.

You're simply seeing old crawl errors from previous crawls (NOT the current crawling of your site). WMT (Webmaster Tools) doesn't get updated in real time. Many of the screens are updated weekly or biweekly... or even less frequently. So just ignore the errors. They will roll out of Google's webmaster tools.

I would check out your analytics (if you have some) and just make sure your server is not currently seeing requests for those old bogus pages. You should be able to simply scan the list of pages resulting in 404 status codes to see if it's still a problem. But it's not likely...

Posted 05-24-2010, 04:23 AM by Bompa
Quote (originally posted by mkatz):
> Disallow: /find.aspx
>
> Now I see that googlebot is no longer requesting those find.aspx pages.
> However, I also notice that it's still not getting around to crawling any of my (20 or so) real pages. So my theory is that it's still spending all its time processing all those bogus URLs, just not actually doing the gets of my pages because it's respecting robots.txt.

Processing 10,000 URLs that it is not allowed to crawl?

Ditch that theory and take a chill pill.

Bompa

Posted 05-25-2010, 03:06 AM by welljy (sakura)
You can resubmit a corrected sitemap under the same name. It is not very hard.

Posted 06-01-2010, 02:43 AM by mkatz (Michael)
Thanks to everyone for your feedback. Unfortunately my problem is still not solved. Here is a bit more information.

I did use the google Remove URL tool to successfully Remove all of my find.aspx?... URLs. I also used the google parameters tool to effectively say that the various parameters I had used for find.aspx didn't matter. So all that seemed to help with the find.aspx URLs -- not being crawled any more, and no longer in the index.

However, there is a second set of problematic URLs that I was only reminded of once the find.aspx problem cleared up. For some reason, before I Removed find.aspx, searching on "site:trainerlist.com" only gave about 300 results. But once find.aspx was Removed, searching on "site:trainerlist.com" suddenly gave over 10,000 results. These results are a second category of URL, which is more problematic.

My site uses URL redirection to allow users to create custom URLs. For instance, Fred Smith can create http://trainerlist.com/fredsmith, which internally redirects to http://trainerlist.com/trainer?name=fredsmith. So, most unfortunately, a URL for each of my ten thousand load balance test users was present in that mistakenly submitted sitemap, with these randomly generated names, all of the form http://trainerlist.com/fredsmith, http://trainerlist.com/bobjones, or whatever -- ten thousand of them. D'oh!
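
A rough sketch of how the custom-URL lookup could tell real users apart from the removed test users, so that only the bogus names answer 410. The function name, the two name sets, and the 404 fallback for unknown names are all assumptions made for illustration; the thread does not show the site's actual routing code.

```python
from http import HTTPStatus


def resolve_custom_url(path, real_users, removed_test_users):
    """Return (status, internal_target) for a custom URL such as /fredsmith."""
    name = path.strip("/").lower()
    if name in real_users:
        # Real user: rewrite internally to the trainer page.
        return HTTPStatus.OK, f"/trainer?name={name}"
    if name in removed_test_users:
        # Former load-test user: report the URL as permanently gone.
        return HTTPStatus.GONE, None
    # Name that never existed at all.
    return HTTPStatus.NOT_FOUND, None


# Hypothetical data: one real user and one leftover test user.
real_users = {"fredsmith"}
removed_test_users = {"bobjones"}

print(resolve_custom_url("/fredsmith", real_users, removed_test_users))
print(resolve_custom_url("/bobjones", real_users, removed_test_users))
print(resolve_custom_url("/nosuchname", real_users, removed_test_users))
```

Keeping a list of the removed test names means the 410 is sent to every client, not only to requests whose user agent contains "googlebot".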

So I'm kind of back where I was before, but my problem is worse because I don't have a single page that I can use the Remove tool with. It seems like my options are:

(a) Submit these 10,000 bogus URLs one by one to the Remove tool. That seems painful, and probably a misuse of the Remove tool.

(b) Just don't worry about it, and continue to return 410 for those bogus user pages, and eventually they will drop out of the index/crawl. That sounds okay, but currently google is only hitting those bogus pages about one per hour. So that sounds like a long time to wait, especially if it takes more than one 410 to convince it that a given page is really gone.

(c) Remove my entire site.
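
To put a number on the wait in option (b), here is a back-of-the-envelope calculation. It assumes each bogus URL needs only a single visit and that the crawl rate stays constant, neither of which is guaranteed. The 25-second figure is just the midpoint of the "20 or 30 seconds" rate mentioned in the first post.

```python
BOGUS_URLS = 10_000
SECONDS_PER_DAY = 86_400


def days_to_visit_all(url_count, seconds_per_request):
    """Days needed to request every URL once at a constant rate."""
    return url_count * seconds_per_request / SECONDS_PER_DAY


hourly_rate_days = days_to_visit_all(BOGUS_URLS, 3600)  # one request per hour
earlier_rate_days = days_to_visit_all(BOGUS_URLS, 25)   # one request per ~25 s

print(f"{hourly_rate_days:.0f} days at one request per hour")
print(f"{earlier_rate_days:.1f} days at one request every 25 seconds")
```

At one request per hour a single pass over all 10,000 URLs would take roughly 417 days, compared with about 2.9 days at the earlier crawl rate.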

I am currently trying (c). I used the Remove tool to remove all of trainerlist.com. I did that, and within hours "site:trainerlist.com" gave zero results. Great, I think, they've cleared me out of the index. So then I clicked the button to "Cancel" the Remove, thinking that now it could start reindexing from scratch. But Cancel just caused the ten-thousand+ hits to come back (i.e., it actually did what it said -- canceled the remove from ever happening). So now I have Removed the whole site once again, and my hope is that by waiting a bit longer I really can get it to remove all the indexed URLs, and then I can take off the removal and start from scratch. But I don't even know if that's how it will work. And even if it works, I don't know if this will take any less time than option (b) of just letting the 410s take care of it.

Any help appreciated. Hopefully this documentation of my boneheaded move and its possible solution will be helpful to someone else...

Posted 06-01-2010, 08:20 AM by chrishirst (Chris Hirst)
Well it's only been a FEW days!

It takes WEEKS, sometimes MONTHS.

When it gets to the end of August and nothing has happened, then start wondering.