Tuesday, May 13, 2008

URL Re-Write Codes

If your pages are dynamically generated and more than two special characters (?, #, @, & etc) are there in the URL then that will be a problem for indexing of the page. To avoid this you need to create a new file at the root of your website and name it as ".htaccess". Here you need to write down some codes collected from different sources.

RewriteEngine on
RewriteRule ^/([a-zA-Z0-9_-]+)/([a-zA-Z0-9_-]+)\.html$ index.php?cat=$1&page=$2

Explanation of the code

First line indicates to enable rewrites. The second line is the redirects rule,
RewriteRule has 2 parts, the first part, the from part, “^/([a-zA-Z0-9_-]+)/([a-zA-Z0-9_-]+)\.html$” is to tell apache that if you get a url which looks like this, redirect it to the second part, the to part, which may be like“index.php?cat=$1&page=$2“

The technique is you need to put the links on your page, which are static, and when some one clicks or Search engine follow them it will redirect them to the actual dynamic page

In the from part there are 2 variables which are alphanumeric, which is defined as “([a-zA-Z0-9_-]+)” (from a-z, A-Z, 0-9, _-]+)) the variable ends as soon as a character is reached which is not in the range, specified there. In this case it’s the ‘/’, then the second variable starts which is for the pages. As you can see in this case, one can use there keywords on the urls for better ranking

If you had moved a file to a new location and want all links to the old location to be forwarded to the new location. Though you shouldn’t really ever » move a file once it has been placed on the web; at least when you simply have to, you can do your best to stop any old links from breaking.

RewriteEngine on
RewriteRule ^old\.html$ new.html

Though this is the simplest example possible, it may throw a few people off. The structure of the ‘old’ URL is the only difficult part in this RewriteRule. There are three special characters in there.

* The caret, ^, signifies the start of an URL, under the current directory. This directory is whatever directory the .htaccess file is in. You’ll start almost all matches with a caret.
* The dollar sign, $, signifies the end of the string to be matched. You should add this in to stop your rules matching the first part of longer URLs.
* The period or dot before the file extension is a special character in regular expressions, and would mean something special if we didn’t escape it with the backslash, which tells Apache to treat it as a normal character. So, this rule will make your server transparently redirect from old.html to the new.html page. Your reader will have no idea that it happened, and it’s pretty much instantaneous.Forcing New Requests

Sometimes you do want your readers to know a redirect has occurred, and can do this by forcing a new HTTP request for the new page. This will make the browser load up the new page as if it was the page originally requested, and the location bar will change to show the URL of the new page. All you need to do is turn on the [R] flag, by appending it to the rule:

RewriteRule ^old\.html$ new.html [R]

Using Regular Expressions

Now we get on to the really useful stuff. The power of mod_rewrite comes at the expense of complexity. If this is your first encounter with regular expressions, you may find them to be a tough nut to crack, but the options they afford you are well worth the slog. I’ll be providing plenty of examples to guide you through the basics here.

Using regular expressions you can have your rules matching a set of URLs at a time, and mass-redirect them to their actual pages. Take this rule;

RewriteRule ^products/([0-9][0-9])/$ /productinfo.php?prodID=$1

This will match any URLs that start with ‘products/’, followed by any two digits, followed by a forward slash. For example, this rule will match an URL like products/12/ or products/99/, and redirect it to the PHP page.

The parts in square brackets are called ranges. In this case we’re allowing anything in the range 0-9, which is any digit. Other ranges would be [A-Z], which is any uppercase letter; [a-z], any lowercase letter; and [A-Za-z], any letter in either case.

We have encased the regular expression part of the URL in parentheses, because we want to store whatever value was found here for later use. In this case we’re sending this value to a PHP page as an argument. Once we have a value in parentheses we can use it through what’s called a back-reference. Each of the parts you’ve placed in parentheses are given an index, starting with one. So, the first back-reference is $1, the third is $3 etc.

Thus, once the redirect is done, the page loaded in the readers’ browser will be something like productinfo.php?prodID=12 or something similar. Of course, we’re keeping this true URL secret from the reader, because it likely ain’t the prettiest thing they’ll see all day.

Multiple Redirects

If your site visitor had entered something like products/12, the rule above won’t do a redirect, as the slash at the end is missing. To promote good URL writing, we’ll take care of this by doing a direct redirect to the same URL with the slash appended.

RewriteRule ^products/([0-9][0-9])$ /products/$1/ [R]

Multiple redirects in the same .htaccess file can be applied in sequence, which is what we’re doing here. This rule is added before the one we did above, like so:

RewriteRule ^products/([0-9][0-9])$ /products/$1/ [R]
RewriteRule ^products/([0-9][0-9])/$ /productinfo.php?prodID=$1

Thus, if the user types in the URL products/12, our first rule kicks in, rewriting the URL to include the trailing slash, and doing a new request for products/12/ so the user can see that we likes our trailing slashes around here. Then the second rule has something to match, and transparently redirects this URL to productinfo.php?prodID=12. Slick.

Match Modifiers

You can expand your regular expression patterns by adding some modifier characters, which allow you to match URLs with an indefinite number of characters. In our examples above, we were only allowing two numbers after products. This isn’t the most expandable solution, as if the shop ever grew beyond these initial confines of 99 products and created the URL productinfo.php?prodID=100, our rules would cease to match this URL.

So, instead of hard-coding a set number of digits to look for, we’ll work in some room to grow by allowing any number of characters to be entered. The rule below does just that:

RewriteRule ^products/([0-9]+)$ /products/$1/ [R]

Note the plus sign (+) that has snuck in there. This modifier changes whatever comes directly before it, by saying ‘one or more of the preceding character or range.’ In this case it means that the rule will match any URL that starts with products/ and ends with at least one digit. So this’ll match both products/1 and products/1000.

Other match modifiers that can be used in the same way are the asterisk, *, which means ‘zero or more of the preceding character or range’, and the question mark, ?, which means ‘zero or only one of the preceding character or range.’

Adding Guessable URLs

Using these simple commands you can set up a slew of ‘shortcut URLs’ that you think visitors will likely try to enter to get to pages they know exist on your site. For example, I’d imagine a lot of visitors try jumping straight into our stylesheets section by typing the URL http://www.yourhtmlsource.com/css/. We can catch these cases, and hopefully alert the reader to the correct address by updating their location bar once the redirect is done with these lines:

RewriteRule ^css(/)?$ /stylesheets/ [R]

The simple regular expression in this rule allows it to match the css URL with or without a trailing slash. The question mark means ‘zero or one of the preceding character or range’ — in other words either yourhtmlsource.com/css or yourhtmlsource.com/css/ will both be taken care of by this one rule.

This approach means less confusing 404 errors for your readers, and a site that seems to run a whole lot smoother all ’round.

Canonical Hostnames

Description:

The goal of this rule is to force the use of a particular hostname, in preference to other hostnames which may be used to reach the same site. For example, if you wish to force the use of www.example.com instead of example.com, you might use a variant of the following recipe.

Solution:

# For sites running on a port other than 80
RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteCond %{SERVER_PORT} !^80$
RewriteRule ^/(.*) http://www.example.com:%{SERVER_PORT}/$1 [L,R]

# And for a site running on port 80
RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^/(.*) http://www.example.com/$1 [L,R]

Trailing Slash Problem
Description:

Every webmaster can sing a song about the problem of the trailing slash on URLs referencing directories. If they are missing, the server dumps an error, because if you say /~quux/foo instead of /~quux/foo/ then the server searches for a file named foo. And because this file is a directory it complains. Actually it tries to fix it itself in most of the cases, but sometimes this mechanism need to be emulated by you. For instance after you have done a lot of complicated URL rewritings to CGI scripts etc.

Solution:

The solution to this subtle problem is to let the server add the trailing slash automatically. To do this correctly we have to use an external redirect, so the browser correctly requests subsequent images etc. If we only did a internal rewrite, this would only work for the directory page, but would go wrong when any images are included into this page with relative URLs, because the browser would request an in-lined object. For instance, a request for image.gif in /~quux/foo/index.html would become /~quux/image.gif without the external redirect!

So, to do this trick we write

RewriteEngine on
RewriteBase /~quux/
RewriteRule ^foo$ foo/ [R]

The crazy and lazy can even do the following in the top-level .htaccess file of their homedir. But notice that this creates some processing overhead.

RewriteEngine on
RewriteBase /~quux/
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^(.+[^/])$ $1/ [R]

301 Redirect

301 redirect is the most efficient and Search Engine Friendly method for webpage redirection. It's not that hard to implement and it should preserve your search engine rankings for that particular page. If you have to change file names or move pages around, it's the safest option. The code "301" is interpreted as "moved permanently".

You can Test your redirection with Search Engine Friendly Redirect Checker

Below are a Couple of methods to implement URL Redirection

IIS Redirect

* In internet services manager, right click on the file or folder you wish to redirect
* Select the radio titled "a redirection to a URL".
* Enter the redirection page
* Check "The exact url entered above" and the "A permanent redirection for this resource"
* Click on 'Apply'

ColdFusion Redirect

<.cfheader statuscode="301" statustext="Moved permanently">
<.cfheader name="Location" value="http://www.new-url.com">

PHP Redirect

Header( "HTTP/1.1 301 Moved Permanently" );
Header( "Location: http://www.new-url.com" );
?>

ASP Redirect

<%@ Language=VBScript %>
<%
Response.Status="301 Moved Permanently"
Response.AddHeader "Location","http://www.new-url.com/"
%>

ASP .NET Redirect

JSP (Java) Redirect

<%
response.setStatus(301);
response.setHeader( "Location", "http://www.new-url.com/" );
response.setHeader( "Connection", "close" );
%>

CGI PERL Redirect

$q = new CGI;
print $q->redirect("http://www.new-url.com/");

Ruby on Rails Redirect

def old_action
headers["Status"] = "301 Moved Permanently"
redirect_to "http://www.new-url.com/"
end

Redirect Old domain to New domain (htaccess redirect)

Create a .htaccess file with the below code, it will ensure that all your directories and pages of your old domain will get correctly redirected to your new domain.

The .htaccess file needs to be placed in the root directory of your old website (i.e the same directory where your index file is placed)

Options +FollowSymLinks
RewriteEngine on
RewriteRule (.*) http://www.newdomain.com/$1 [R=301,L]

Please REPLACE www.newdomain.com in the above code with your actual domain name.

In addition to the redirect I would suggest that you contact every backlinking site to modify their backlink to point to your new website.

Note* This .htaccess method of redirection works ONLY on Linux servers having the Apache Mod-Rewrite moduled enabled.

Redirect to www (htaccess redirect)

Create a .htaccess file with the below code, it will ensure that all requests coming in to domain.com will get redirected to www.domain.com

The .htaccess file needs to be placed in the root directory of your old website (i.e the same directory where your index file is placed)

Options +FollowSymlinks
RewriteEngine on
rewritecond %{http_host} ^domain.com [nc]
rewriterule ^(.*)$ http://www.domain.com/$1 [r=301,nc]

Please REPLACE domain.com and www.newdomain.com with your actual domain name.

Note* This .htaccess method of redirection works ONLY on Linux servers having the Apache Mod-Rewrite moduled enabled.

Sources
----------
http://httpd.apache.org/docs/2.0/misc/rewriteguide.html
http://www.yourhtmlsource.com/sitemanagement/urlrewriting.html
http://www.webconfs.com/how-to-redirect-a-webpage.php

http://www.johny.org/2008/05/13/apache-url-rewrite-for-seo/

No comments: