Tycoon Talk
Become a Big fish!
The number 1 forum for online business!
Post topics, ask questions, share your knowledge.
Tycoon Talk is part of Freelancer.com - find skilled workers online at a fraction of the cost.

PHP Forum


help to cut up robots file and get contents..
11-03-2007, 08:15 PM help to cut up robots file and get contents..
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
Okay, I basically made a robots.txt maker to go in the admin of the CMS I'm doing. At the moment it has two textareas: one to enter disallowed dirs and files (a new line for each)

and one for banned bots.

I've basically done it, and it produces something like this:

Code:
#DISALLOWED DIR
User-agent: *
Disallow: /admin/
Disallow: /includes/
Disallow: /modules/
Disallow: /docs/
Disallow: /dev/
Disallow: /zips/
Disallow: /themes/

#BANNED BOTS
user-agent: randombot
Disallow: /
user-agent: randombot2
Disallow: /
user-agent: randombot3
Disallow: /
However, I need to be able to undo it all so I can display them in the textareas.

So I need to be able to turn that into:

Bannedbots textarea:

randombot
randombot2
randombot3

Disallowed dir/files:

/admin/
/includes/
/modules/
/docs/
/dev/
/zips/
/themes/

How can I do this?

Thanks,
Dan
__________________
Discounted Web Hosting With XDnet!
>> Get 25% of hosting~ Promo: Webmaster-talk <<

11-04-2007, 01:11 AM Re: help to cut up robots file and get contents..
Extreme Talker

Posts: 238
Location: United States
Trades: 0
I'm really bad at explaining things, so I just wrote it.

PHP Code:
$directories = array(); // Array to contain disallowed directories
$bots = array(); // Array to contain disallowed bots

$robots = file_get_contents('robots.txt');
if ($robots === false)
    die('Error: file cannot be read.');
// explode() splits on a literal string; the regex-based split() was removed in PHP 7
$lines = explode("\n", $robots);

$dirFlag = false; // flag to look for directories
foreach ($lines as $line){ // Loop through each line
    if (stripos($line, 'User-agent: *') === 0){
        $dirFlag = true; // If we see User-agent: *, then we know we are looking for disallowed directories
    }elseif (stripos($line, 'User-agent:') === 0){
        $dirFlag = false; // If we see User-agent without the *, we want bots, not disallowed directories
        $bots[] = trim(substr($line, 11)); // strlen('User-agent:') == 11
    }elseif (stripos($line, 'Disallow:') === 0 && $dirFlag){
        $directories[] = trim(substr($line, 9)); // strlen('Disallow:') == 9
    }
    // Ignore all other lines
}
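For reference, here is the same loop wrapped in a function so it can be checked against the sample file from the first post. The name `parse_robots_txt()` is made up for this sketch, it assumes PHP 5 (for `stripos()`), and it adds one thing the loop above does not do: each line is trimmed first, so lines with leading spaces are still recognised. It returns arrays, so `implode()` is still needed before the lists go into the textareas.

```php
<?php
// Hypothetical wrapper around the parsing loop above.
// Returns array($directories, $bots) for a robots.txt string.
function parse_robots_txt($robots)
{
    $directories = array(); // Disallow: values under "User-agent: *"
    $bots = array();        // names from every other User-agent line
    $dirFlag = false;       // true while inside the "User-agent: *" group

    foreach (explode("\n", $robots) as $line) {
        $line = trim($line); // assumption: also strips leading spaces and a stray "\r"
        if (stripos($line, 'User-agent: *') === 0) {
            $dirFlag = true;
        } elseif (stripos($line, 'User-agent:') === 0) {
            $dirFlag = false;
            $bots[] = trim(substr($line, 11)); // strlen('User-agent:') == 11
        } elseif (stripos($line, 'Disallow:') === 0 && $dirFlag) {
            $directories[] = trim(substr($line, 9)); // strlen('Disallow:') == 9
        }
        // Comments, blank lines and anything else are ignored.
    }
    return array($directories, $bots);
}

$sample = "#DISALLOWED DIR\nUser-agent: *\nDisallow: /admin/\nDisallow: /includes/\n\n"
        . "#BANNED BOTS\nuser-agent: randombot\nDisallow: /\nuser-agent: randombot2\nDisallow: /\n";
list($dirs, $bots) = parse_robots_txt($sample);

// implode() turns each list into newline-separated text for a textarea.
echo implode("\n", $dirs), "\n";
echo implode("\n", $bots), "\n";
```

The `Disallow: /` lines under each bot are skipped because `$dirFlag` is false there, which is what keeps them out of the directory list.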

__________________
The interlocking pieces of web development: usability, performance, accessibility, and standards.
frost is offline
11-04-2007, 09:15 AM Re: help to cut up robots file and get contents..
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
THANKS...

But what are the two variables I need to use then?! lol, I can't see which ones!
I'm probably being really stupid.

Could you just point them out? I'll top up your TP.
11-04-2007, 10:08 AM Re: help to cut up robots file and get contents..
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
Okay, ignore that, I was being stupid.

But I'm getting an undefined function error for stripos() because my stupid host STILL hasn't bloody got PHP 5, so what's the PHP 4.x alternative?

Thanks,
Dan
11-04-2007, 10:20 AM Re: help to cut up robots file and get contents..
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
Oh wait, no, I just did a Google search and found that someone wrote a little fix for it:

PHP Code:
if (!function_exists("stripos")) {
    // PHP 4 fallback: lowercase both strings, then use the case-sensitive strpos()
    function stripos($haystack, $needle, $offset=0) {
        return strpos(strtolower($haystack), strtolower($needle), $offset);
    }
}

From what I can tell, it basically defines the function if it doesn't already exist.
Okay, now I'm not getting the error.
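The shim recreates `stripos()` only when it is missing (that is what the `function_exists()` check is for), so the same code runs on PHP 4 and PHP 5. Since the parser only ever asks "does this line start with X?", another PHP 4-friendly option is `strncasecmp()` (available since PHP 4.0.2), which compares just the first N characters, ignoring case. The helper name `starts_with_ci()` is made up for this sketch.

```php
<?php
// Hypothetical helper: case-insensitive "starts with" test.
// strncasecmp() returns 0 when the first strlen($prefix) characters
// of both strings match, ignoring case.
function starts_with_ci($line, $prefix)
{
    return strncasecmp($line, $prefix, strlen($prefix)) === 0;
}

// Could stand in for checks such as: stripos($line, 'User-agent:') === 0
var_dump(starts_with_ci('user-agent: randombot', 'User-agent:'));
var_dump(starts_with_ci('Disallowed: /admin/', 'Disallow:'));
```

The second call is false because "Disallowed:" differs from "Disallow:" at the ninth character, so a "starts with" check does not treat the two directives as the same.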

But instead of getting what's in the file, it just shows Array in both textareas, AND it appears to be deleting the file contents!

It does seem to be saving, but it deletes the contents on load.

Okay, I'll post what I have below. Please note that the function above that fixes the stripos() problem is in my functions file, and because the robots editor is included into another file, it gets included too... if that makes sense.

PHP Code:
<?php
     
//Open up the file
$fh = fopen(DOC_ROOT."/robots.txt", "w+");
if($_POST['submit'])
{
    $disallowed_dir = explode("\n", $_POST['disallowed_dir']);
    $disallowed = "User-agent: * \n";
    foreach($disallowed_dir as $line)
    {
        $disallowed .= "Disallowed: $line \n";
    }
    $disallowed .= "\n\n";
    ######
    $bannedbots = explode("\n", $_POST['bannedbots']);

    foreach($bannedbots as $bot)
    {
        $bannedbots = "User-agent: $bot \n";
        $bannedbots .= "Disallowed: \ \n\n";
    }
    $robotstxt = "
#Disallowed Dirs and Files \n\n
$disallowed
#Banned Bots \n
$bannedbots";

    //Write to the file
    fwrite($fh, "$robotstxt");
} //End if
else {
    $directories = array(); // Array to contain disallowed directories
    $bots = array(); // Array to contain disallowed bots
    $robots = file_get_contents(DOC_ROOT.'/robots.txt');
    if ($robots === false)
        die('Error: file cannot be read.');
    $lines = split("\n", $robots);
    $dirFlag = false; // flag to look for directories
    foreach ($lines as $line){ // Loop through each line
        if (stripos($line, 'User-agent: *') === 0){
            $dirFlag = true; // If we see User-agent: *, then we know we are looking for disallowed directories
        }elseif (stripos($line, 'User-agent:') === 0){
            $dirFlag = false; // If we see User-agent without the *, we want bots, not disallowed directories
            $bots[] = trim(substr($line, 11));
        }elseif (stripos($line, 'Disallow:') === 0 && $dirFlag){
            $directories[] = trim(substr($line, 9));
        }
        // Ignore all other lines
    }

    echo '<form action="" method="post">
Disallowed folders and files.
<textarea name="disallowed_dir" cols="50" rows="10">
'.$directories.'
</textarea>
<br /><br />
Banned Bots
<textarea name="bannedbots" cols="50" rows="10">
'.$bots.'
</textarea>
<br /><br />
<input type="submit" name="submit" value="Save" />
</form>';
}

//Close the file up
fclose($fh);
?>


So what's WRONG?! ARGH
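Both symptoms described in this post can be reproduced in isolation. Concatenating an array into a string gives the literal word `Array` (newer PHP versions also raise an "Array to string conversion" notice or warning), and `fopen()` in mode `w` or `w+` truncates the file to zero length the moment it is opened. Because the script above opens robots.txt that way before the `if`, every page load empties the file, whether or not the form was submitted. This sketch assumes PHP 5.2.1+ (for `sys_get_temp_dir()`) and works on a temporary file, not the real robots.txt.

```php
<?php
// Symptom 1: concatenating an array gives the word "Array", not its items.
// implode() produces the newline-separated text a textarea needs.
$bots = array('randombot', 'randombot2', 'randombot3');
$botsText = implode("\n", $bots);

// Symptom 2: fopen() with mode "w" or "w+" truncates the file immediately.
// Shown on a throwaway temp file rather than the real robots.txt.
$path = tempnam(sys_get_temp_dir(), 'robots');
$fh = fopen($path, 'w');
fwrite($fh, "User-agent: *\nDisallow: /admin/\n");
fclose($fh);

$fh = fopen($path, 'w+'); // what the script does on every page load...
fclose($fh);              // ...even though nothing is written afterwards
clearstatcache();         // stat results are cached, so refresh before filesize()
$sizeAfterLoad = filesize($path); // 0: the rules are gone

unlink($path); // clean up the temp file
```

In the editor this suggests calling `implode("\n", ...)` on `$directories` and `$bots` before echoing them, and only opening the file for writing inside the `if ($_POST['submit'])` branch.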
Last edited by dansgalaxy; 11-04-2007 at 10:26 AM..
11-04-2007, 10:47 AM Re: help to cut up robots file and get contents..
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
Another error: when it adds the banned bots, it seems to add only the last one on the list and ignore the rest. No clue why. ANYONE?
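This symptom matches the bot loop in the script posted above: inside `foreach`, `$bannedbots = "User-agent: $bot \n";` uses `=`, which replaces the whole string on every pass (and also reuses the name of the array being looped over), so only the final bot is left when the loop ends. A minimal comparison, using made-up variable names:

```php
<?php
$bots = array('randombot', 'randombot2', 'randombot3');

// "=" replaces the string on every pass, so only the last bot survives.
$overwritten = '';
foreach ($bots as $bot) {
    $overwritten = "User-agent: $bot\nDisallow: /\n";
}

// ".=" appends, so every bot gets its own group. Using a separate variable
// name also avoids overwriting the array that is being looped over.
$appended = '';
foreach ($bots as $bot) {
    $appended .= "User-agent: $bot\nDisallow: /\n";
}

echo $appended;
```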
11-04-2007, 12:00 PM Re: help to cut up robots file and get contents..
jamestl2
No scale-itch here...

Latest Blog Post:
Wordpress Relative URLs Plugin
Posts: 2,389
Name: <member type="brilliant" alt="foolish">James Lewitzke</member>
Location: / public_html / Universe / Virgo_Supercluster / Local_Group / Milky_Way / Orion_Arm / Solar_System / Earth / North_America / USA / Wisconsin
Trades: 0
It’s tough to tell without looking at your server directory hierarchy. Are all the folders you listed at the root of the directory?

I’m not exactly sure what changes the coding made (I’m a coding noob), but remember that you can also allow robots into certain areas, for example:

Code:
  user-agent: randombot1
  disallow: /
  allow: /textarea1
This would block the bot from all areas of the directory except textarea1.
__________________
Engipress -
for Wordpress Projects
11-04-2007, 12:49 PM Re: help to cut up robots file and get contents..
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
I know, but there's no need. The bot blocker is basically so that if I get any bad bots, they can be blocked from the server,

and the Disallow section is for folders that contain admin stuff, etc.

Okay, here's the latest version of the script, and I think I know which part is causing problems.

PHP Code:
<?php

if($_POST['submit'])
{
    $disallowed_dir = explode("\n", $_POST['disallowed_dir']);
    $disallowed = "User-agent: * \n";
    foreach($disallowed_dir as $line)
    {
        $disallowed .= "Disallowed: $line";
    }
    $disallowed .= "\n\n";

    ######
    $bannedbots = explode("\n", $_POST['bannedbots']);
    $banned_bots = '';
    foreach($bannedbots as $bot)
    {
        $banned_bots .= "User-agent: $bot \n";
        $banned_bots .= "Disallowed: \ \n";
    }
    $robotstxt = "
#Disallowed Dirs and Files \n\n
$disallowed
#Banned Bots \n
$banned_bots";

    //Open up the file
    $fh = fopen(DOC_ROOT."/robots.txt", "w");

    //Write to the file
    fwrite($fh, "$robotstxt");

    //Close the file up
    fclose($fh);
    echo 'Saved';
} //End if
else {
    $directories = array(); // Array to contain disallowed directories
    $bots = array(); // Array to contain disallowed bots
    $robots = file_get_contents(DOC_ROOT.'/robots.txt');
    if ($robots === false)
        die('Error: file cannot be read.');
    $lines = explode("\n", $robots);
    $dirFlag = false; // flag to look for directories
    foreach ($lines as $line){ // Loop through each line
        if (stripos($line, 'User-agent: *') === 0){
            $dirFlag = true; // If we see User-agent: *, then we know we are looking for disallowed directories
        }elseif (stripos($line, 'User-agent:') === 0){
            $dirFlag = false; // If we see User-agent without the *, we want bots, not disallowed directories
            $bots[] = trim(substr($line, 11));
        }elseif (stripos($line, 'Disallow:') === 0 && $dirFlag){
            $directories[] = trim(substr($line, 9));
        }
        // Ignore all other lines
    }

    $bots = implode("\n", $bots);
    $directories = implode("\n", $directories);

    echo '<form action="" method="post">
Disallowed folders and files.
<textarea name="disallowed_dir" cols="50" rows="10">
'.$directories.'
</textarea>
<br /><br />
Banned Bots
<textarea name="bannedbots" cols="50" rows="10">
'.$bots.'
</textarea>
<br /><br />
<input type="submit" name="submit" value="Save" />
</form>';
}
//End else

?>
And I think this part is causing my problems: it's now saving without any problems, but it isn't displaying the disallowed dirs.

And I think this is the problem:
PHP Code:
$dirFlag = false; // flag to look for directories
foreach ($lines as $line){ // Loop through each line
    if (stripos($line, 'User-agent: *') === 0){
        $dirFlag = true; // If we see User-agent: *, then we know we are looking for disallowed directories
    }elseif (stripos($line, 'User-agent:') === 0){
        $dirFlag = false; // If we see User-agent without the *, we want bots, not disallowed directories
        $bots[] = trim(substr($line, 11));
    }elseif (stripos($line, 'Disallow:') === 0 && $dirFlag){
        $directories[] = trim(substr($line, 9));
    }
}
Because as far as I can tell, it IS still picking up the User-agent: * line for both, because it doesn't tell them apart, if you know what I mean. So I think I need to have it like

user-agent: [a-z][A-Z][0-9] and so on, so that it knows it's a banned bot and not *, because a bot name would start with something other than *. So it needs to express that somehow?

But I'm clueless as to how.

Thanks,
Dan
11-05-2007, 10:25 PM Re: help to cut up robots file and get contents..
Extreme Talker

Posts: 238
Location: United States
Trades: 0
strpos() and stripos() don't use regular expressions, so * only matches a literal * character. Is your list showing a bunch of disallowed bots as "*" or a bunch of disallowed directories as "/"? I've tested the section of code that reads the file on both PHP 5 and PHP 4 (after I replaced the stripos() calls), and it seems to work properly for me. Maybe it's something to do with the format of robots.txt that I missed. I have no idea, though.

Actually, if it's not picking up the directories and it's only picking up the bots, then it is either NOT picking up User-agent: * or NOT picking up Disallow:.

And if you want to do the [a-z][A-Z] etc. thing, then you will have to use preg_match(). preg_match() could be used in this case; I just chose not to.
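If you do want the `[a-z][A-Z][0-9]` idea, `preg_match()` (also available in PHP 4) can express "User-agent: followed by a name that starts with a letter or digit", which a bare `*` can never satisfy. The helper name `banned_bot_name()` and the exact pattern are assumptions for this sketch, not part of the code posted in this thread.

```php
<?php
// Hypothetical helper: returns the bot name from a "User-agent:" line when
// the name starts with a letter or digit, otherwise null.
function banned_bot_name($line)
{
    // ^                  the line must start with the directive
    // \s*                any spaces after the colon
    // ([A-Za-z0-9]\S*)   a name whose first character is a letter or digit,
    //                    so a lone "*" does not match
    // i                  case-insensitive, so "user-agent:" matches too
    if (preg_match('/^User-agent:\s*([A-Za-z0-9]\S*)/i', $line, $m)) {
        return $m[1];
    }
    return null;
}

var_dump(banned_bot_name('User-agent: *'));          // NULL
var_dump(banned_bot_name('user-agent: randombot2')); // string(10) "randombot2"
```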
frost is offline
11-06-2007, 12:00 PM Re: help to cut up robots file and get contents..
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
I'm not sure, to tell you the truth. So it works for you?

You can see my robots.txt here: http://calm.dansgalaxy.co.uk/robots.txt
but in the admin I have it showing

Array(), then the blank textarea that the directories are in, and then the textarea with the bots.

And I'm quite clueless now.
11-06-2007, 02:44 PM Re: help to cut up robots file and get contents..
Extreme Talker

Posts: 238
Location: United States
Trades: 0
Ah, I see the problem now. The correct way to disallow a directory is to use "Disallow: /blah". Your script is writing them as "Disallowed: /blah".
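As a sketch of what the generator could write instead: one `Disallow:` line per directory under `User-agent: *`, then a `User-agent:` / `Disallow: /` pair per banned bot. Note that `Disallow: /` uses a forward slash; the script posted earlier writes `Disallowed: \ ` (with a backslash) for the bots. The function name `build_robots_txt()` is hypothetical, and it assumes both arrays already contain trimmed, non-empty entries.

```php
<?php
// Hypothetical builder: $dirs and $bots are arrays of already-cleaned lines.
function build_robots_txt($dirs, $bots)
{
    $out = "#DISALLOWED DIR\nUser-agent: *\n";
    foreach ($dirs as $dir) {
        $out .= "Disallow: $dir\n"; // "Disallow", not "Disallowed"
    }
    $out .= "\n#BANNED BOTS\n";
    foreach ($bots as $bot) {
        $out .= "User-agent: $bot\nDisallow: /\n"; // "/" blocks the whole site
    }
    return $out;
}

echo build_robots_txt(array('/admin/', '/includes/'), array('randombot'));
```

Because this writes exactly the `User-agent:` and `Disallow:` prefixes that the reading loop checks for, a file saved this way can be parsed straight back into the two lists.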
frost is offline
11-06-2007, 03:23 PM Re: help to cut up robots file and get contents..
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
YAY... but.

Okay, after tweaking, I've got it almost right. Thanks, frost, for pointing that out; it gave me the corners and sides of the puzzle!

Now here's the but.

It's working and it's reading it fine.

But it seems to add a \n to both fields when it saves, so I'm getting an extra blank
user-agent:
Disallow:

and Disallow: in the bots and disallowed dirs sections.

;/ So how do I make it delete unneeded \n from the end of each section before saving??

Could trim() do this? I'm not sure about \n and the trim function.
11-06-2007, 03:46 PM Re: help to cut up robots file and get contents..
Extreme Talker

Posts: 238
Location: United States
Trades: 0
Yes, by default trim() strips whitespace (spaces, tabs, \n, \r, \0 and \x0B) from both ends of a string, but that might not be your problem. In PHP, the following strings are equivalent (well, nearly equivalent, depending on which OS you are using, but that is irrelevant).
PHP Code:
$stringOne = "1st line
2nd line
3rd line";

$stringTwo = "1st line\n2nd line\n3rd line";
If you write either of those strings to a file, you will get three lines. However, if you do this:
PHP Code:
$stringThree = "1st line\n
2nd line\n
3rd line";

then you will get a double-spaced result.

Judging by the code you posted a couple of days ago, you may be doing something like that: the $robotstxt string there mixes \n escapes with real line breaks. If not, then it could just be an unnecessary double \n\n somewhere.
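To answer the trim() question directly, one approach is to clean the textarea input line by line before building the file. Browsers submit textarea line breaks as `\r\n`, so `explode("\n", ...)` leaves a `\r` on each line, and a trailing newline in the textarea becomes an empty final entry. The helper `clean_lines()` is a hypothetical name; it normalises line endings, trims each line and drops blank ones.

```php
<?php
// Hypothetical helper: turns raw textarea input into a list of non-empty lines.
function clean_lines($text)
{
    // Normalise "\r\n" and lone "\r" to "\n" so explode() leaves no stray "\r".
    $text = str_replace(array("\r\n", "\r"), "\n", $text);
    $lines = array();
    foreach (explode("\n", $text) as $line) {
        $line = trim($line);  // strips spaces, tabs, \r, \n, \0 and \x0B
        if ($line !== '') {   // drop blank lines, e.g. from a trailing newline
            $lines[] = $line;
        }
    }
    return $lines;
}

print_r(clean_lines("/admin/\r\n/includes/\r\n\r\n"));
```

Feeding the result into the loops that write `Disallow:` and `User-agent:` lines means an empty entry never turns into a blank directive.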
frost is offline
11-06-2007, 05:14 PM Re: help to cut up robots file and get contents..
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
Yeah, it is, because the robots.txt has got double newlines.
11-06-2007, 05:18 PM Re: help to cut up robots file and get contents..
dansgalaxy
Defies a Status

Posts: 6,521
Name: Dan
Location: Swindon
Trades: 0
Okay, it's stopped most of the double newlines.

Okay, here it is: I type the stuff in and save it.
It's fine.

If I then open it and resave (without even touching anything),
it automatically adds a new line, so on the second save it adds another newline, which then means another blank entry for both.

Get me?
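A likely cause of the extra line on every re-save is the textarea markup itself. In HTML, a single newline immediately after the `<textarea>` start tag is ignored, but a newline just before `</textarea>` is part of the field's value. The form posted earlier in the thread puts a newline before `</textarea>`, so each load-and-save round trip submits one more trailing newline, which becomes one more empty entry. A sketch of markup without that trailing newline (the helper name `textarea_html()` is made up; `htmlspecialchars()` escapes HTML special characters such as `<` and `&` in the content):

```php
<?php
// Hypothetical helper: renders a textarea whose submitted value is exactly
// the given lines joined by newlines, with no extra trailing blank line.
function textarea_html($name, $lines)
{
    $value = htmlspecialchars(implode("\n", $lines)); // escape HTML special characters
    // No newline before </textarea>: one there would be submitted back as
    // part of the value and add a blank line on every save.
    return '<textarea name="' . $name . '" cols="50" rows="10">'
         . $value . '</textarea>';
}

echo textarea_html('bannedbots', array('randombot', 'randombot2'));
```

Trimming blank lines on the server when saving would also hide the symptom, but fixing the markup stops the extra newline from being sent in the first place.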