Forum Discussion

Parveez_70209
Jan 31, 2014

Site Still Visible in Internet Search Engines After Applying robots.txt

Hi All,

I applied the following iRule to our virtual server:

when HTTP_REQUEST {
    if { [string tolower [HTTP::uri]] equals "/robots.txt" } {
        HTTP::respond 200 content "User-agent: *\nDisallow: /"
    }
}

But https://ctx.redprairie.net/ is still the first result in a Google search for "Caltex RedPrairie". More specifically, the result below comes up in Google, and we can still click through to the website directly:

https://ctx.redprairie.net/
A description for this result is not available because of this site's robots.txt.
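
For reference, here is a minimal offline check of what the robots.txt body served by the iRule above means to a standards-following crawler. It is written in Python rather than iRule syntax so it can run on any workstation, uses only the standard-library `urllib.robotparser`, and `can_crawl` is a hypothetical helper name; the `/login` path is purely illustrative. The rule forbids fetching every path for every user agent, which matches the snippet Google shows: it was not allowed to read the page, but robots.txt on its own does not stop the bare URL from being listed.

```python
from urllib.robotparser import RobotFileParser

# Body returned by the iRule for /robots.txt (copied from the post above)
ROBOTS_TXT = "User-agent: *\nDisallow: /"


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a robots.txt-compliant crawler may fetch url."""
    parser = RobotFileParser()
    # parse() expects an iterable of lines, not one string
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "Bingbot"):
        for url in ("https://ctx.redprairie.net/", "https://ctx.redprairie.net/login"):
            # "Disallow: /" matches every path, so every combination is blocked
            print(agent, url, can_crawl(ROBOTS_TXT, agent, url))
```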

Kindly assist.

Thanks and Regards

11 Replies

  • That's very interesting. Google explains it this way:

    While Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web. As a result, the URL of the page and, potentially, other publicly available information such as anchor text in links to the site, or the title from the Open Directory Project (www.dmoz.org), can appear in Google search results.

    And they suggest adding a meta tag to each page to completely block indexing:

    https://support.google.com/webmasters/answer/93710

    I think the easiest approach would be to add a robots meta tag with content "noindex" to the head of each page in the application, but if you absolutely had to do it with an iRule, it might look something like this:

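    when HTTP_REQUEST {
        # Remove Accept-Encoding so the server sends uncompressed HTML that STREAM can match
        HTTP::header remove Accept-Encoding
        # Disable STREAM by default; HTTP_RESPONSE re-enables it with the expression below
        STREAM::disable
    }
    when HTTP_RESPONSE {
        # The forum blocks raw HTML: in this expression [ stands for < and ] stands for >
        STREAM::expression "@[/head]@[meta name=\"robots\" content=\"noindex\"]\r\n[/head]@"
        STREAM::enable
    }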

    Apply an empty STREAM profile to the VIP. There are two additional concerns:

    1. Not every crawler honors this tag
    2. Google will have to crawl your site again to catch this tag and remove its listing
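
Since Google has to re-crawl before the tag counts, it is worth confirming that responses passing through the VIP really carry it. A minimal sketch, assuming you have saved the HTML of a page served through the virtual server (fetching it is out of scope here); it uses Python's standard `html.parser`, and `RobotsMetaFinder` and `has_noindex` are hypothetical names for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the content of every <meta name="robots"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # Attribute values can be None (valueless attributes), so fall back to ""
        if (attr_map.get("name") or "").lower() == "robots":
            self.directives.append((attr_map.get("content") or "").lower())


def has_noindex(html: str) -> bool:
    """Return True if any robots meta tag in html includes the noindex directive."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return any(
        "noindex" in [part.strip() for part in content.split(",")]
        for content in finder.directives
    )


if __name__ == "__main__":
    rewritten = '<html><head><title>Login</title><meta name="robots" content="noindex">\r\n</head></html>'
    print(has_noindex(rewritten))
```
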
  • Hi Kevin,

    Based on your suggestion, I created a stream profile named TEST (all default settings, nothing configured), which I am going to attach to the virtual server:

    General Properties
      Name: TEST
      Parent Profile: stream
    Settings (Custom)
      Source: not selected
      Target: not selected

    Now, looking at the iRule: what do we need to replace the square brackets with? Kindly suggest.

    Thanks and Regards Parveez

  • I used square brackets in the example code because the forum blocked regular HTML. Replace each left square bracket [ with a less-than sign (<) and each right square bracket ] with a greater-than sign (>). The idea is that you're using the STREAM iRule to replace the closing head tag (</head>) in the response HTML with the robots meta tag followed by a new closing head tag.
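
To see what that substitution does to a page without touching the BIG-IP, here is a rough Python simulation of the same search and replace, with the square brackets already swapped for angle brackets. It is only an approximation: the real STREAM filter rewrites the response as it passes through the proxy, whereas this sketch applies a plain `str.replace` to a complete string. `simulate_stream` is a hypothetical helper name.

```python
# Search and replace strings from the STREAM::expression, with [ ] swapped for < >
SEARCH = "</head>"
REPLACE = '<meta name="robots" content="noindex">\r\n</head>'


def simulate_stream(body: str) -> str:
    """Offline approximation of the STREAM rewrite: replace every </head> in body."""
    return body.replace(SEARCH, REPLACE)


if __name__ == "__main__":
    page = "<html><head><title>Login</title></head><body>...</body></html>"
    # The meta tag now sits just before the re-inserted closing head tag
    print(repr(simulate_stream(page)))
```

A page with no `</head>` passes through unchanged, and a page containing the string more than once would receive the tag more than once.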

  • Hi Kevin,

    Thank you, the iRule now looks good:

    when HTTP_REQUEST {
        HTTP::header remove Accept-Encoding
        STREAM::disable
    }
    when HTTP_RESPONSE {
        STREAM::expression "@</head>@<meta name=\"robots\" content=\"noindex\">\r\n</head>@"
        STREAM::enable
    }

    And as you said, we need to add an empty stream profile. Is the one below OK?

    TEST (all default settings, nothing configured), which I am going to attach to the virtual server:

    General Properties
      Name: TEST
      Parent Profile: stream
    Settings (Custom)
      Source: not selected
      Target: not selected

    Thanks and Regards Parveez

    • Kevin_Stewart
      Yes, an empty STREAM profile. You can actually just assign the unmodified parent STREAM profile to the VIP.
  • Hi Kevin,

    We applied this, but the site is still appearing in Internet search results.

    Thanks and Regards Parveez

  • Keep in mind that Googlebot has to re-crawl your site before it can see the tag, and you don't have any control over when that happens. You can, however, manually request removal of the listing by following the links in the search result.
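
One further point worth checking, based on Google's current documentation for the robots meta tag: a crawler can only read a noindex tag on a page it is allowed to fetch. If the robots.txt iRule from the first post is still attached to the VIP, a compliant Googlebot will not download the pages, so it never sees the new tag and the bare URL can stay listed. A minimal offline sketch of that interaction in Python; `noindex_is_visible` is a hypothetical helper, and its substring test for the tag is deliberately crude.

```python
from urllib.robotparser import RobotFileParser

NOINDEX_TAG = '<meta name="robots" content="noindex">'


def noindex_is_visible(robots_txt: str, url: str, html: str, agent: str = "Googlebot") -> bool:
    """True only if a compliant crawler may fetch url AND the fetched page carries noindex."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(agent, url):
        # Blocked by robots.txt: the page is never downloaded, so its meta tags are never read
        return False
    # Crude check: looks for the exact tag the STREAM iRule inserts
    return NOINDEX_TAG in html


if __name__ == "__main__":
    page = "<html><head>" + NOINDEX_TAG + "\r\n</head></html>"
    url = "https://ctx.redprairie.net/"
    print("robots.txt blocks all:", noindex_is_visible("User-agent: *\nDisallow: /", url, page))
    print("robots.txt allows all:", noindex_is_visible("User-agent: *\nDisallow:", url, page))
```

Put differently, for the tag to take effect the robots.txt Disallow rule would likely need to be relaxed, which is a trade-off to weigh for this site.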

  • Hi Kevin,

    We applied the new iRule along with the stream profile, but it seems to have caused an outage: the login screen appears, but after logging in, the application's options are not displayed correctly.

    Thanks and Regards Parveez

  • Hi Kevin,

    Yes, the same iRule, along with a stream profile created with default settings. The login page was displayed and login worked, but after logging in, the application content did not load.

    Thanks and Regards Parveez

  • Hi Kevin,

    Thank you, I will re-check this now as suggested and keep you posted.

    And we should still use the same empty stream profile, right?

    Thanks and Regards Parveez