URL redirection, also called URL
forwarding, is a World Wide Web technique for making a web page
available under more than one URL. When a web browser attempts
to open a URL that has been redirected, a page with a different URL is
opened. For example, www.example.com is redirected to example.iana.org.
Similarly, domain redirection or domain forwarding is when all pages in a
URL domain are redirected to a different domain, as when wikipedia.com
and wikipedia.net are automatically redirected to wikipedia.org. URL
redirection can be used for URL shortening, to prevent broken links when
web pages are moved, to allow multiple domain names belonging to the
same owner to refer to a single web site, to guide navigation into and
out of a website, for privacy protection, and for less innocuous
purposes such as phishing attacks.
Purposes
There are several reasons to use URL redirection:
Similar domain names
A user might mistype a URL, for example entering "exmaple.com" instead of "example.com".
Organizations often register these misspelled domains and redirect
them to the "correct" location: example.com. The addresses example.com
and example.net could both redirect to a single domain, or web page,
such as example.org. This technique is often used to "reserve" other
top-level domains (TLD) with the same name, or make it easier for a true
".edu" or ".net" to redirect to a more recognizable ".com" domain.
Moving pages to a new domain
Web pages may be redirected to a new domain for three reasons:
• a site might desire, or need, to change its domain name;
• an author might move his or her individual pages to a new domain;
• two web sites might merge.
With URL redirects, incoming links to an outdated URL can be sent to the
correct location. These links might be from other sites that have not
realized that there is a change or from bookmarks/favorites that users
have saved in their browsers.
The same applies to search engines. They often have the older/outdated
domain names and links in their database and will send search users to
these old URLs. By using a "moved permanently" redirect to the new URL,
visitors will still end up at the correct page. Also, in the next search
engine pass, the search engine should detect and use the newer URL.
Logging outgoing links
The access logs of most web servers keep detailed information about where
visitors came from and how they browsed the hosted site. They do not,
however, log which links visitors left by. This is because the visitor's
browser has no need to communicate with the original server when the
visitor clicks on an outgoing link.
This information can be captured in several ways. One way involves URL
redirection. Instead of sending the visitor straight to the other site,
links on the site can direct to a URL on the original website's domain
that automatically redirects to the real target. This technique bears
the downside of the delay caused by the additional request to the
original website's server. As this added request will leave a trace in
the server log, revealing exactly which link was followed, it can also
be a privacy issue.[1]
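The logging technique can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the path /out, the query parameter "to", and both function names are hypothetical and not taken from any particular server. One function rewrites an outgoing link so that it points back at the original site; the other shows what a handler for that path might do, namely record the target and answer with a redirect.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical tracking endpoint on the original site (an assumption for this sketch).
TRACK_PATH = "/out"


def tracked_link(target_url):
    """Return a same-site URL that will redirect to target_url."""
    return TRACK_PATH + "?" + urlencode({"to": target_url})


def handle_tracked_request(request_path, log):
    """Log the outgoing target and return (status, headers) for the redirect."""
    query = parse_qs(urlsplit(request_path).query)
    targets = query.get("to")
    if not targets:
        return 400, {}
    target = targets[0]
    log.append(target)  # stands in for a line in the server's access log
    return 302, {"Location": target}
```

A handler like this accepts any target, so a real deployment would also need to validate where it redirects.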
The same technique is also used by some corporate websites to display a
notice that the subsequent content is at another site, and therefore
not necessarily affiliated with the corporation. In such scenarios,
displaying the warning causes an additional delay.
Short aliases for long URLs
Main article: URL shortening
Web applications often include lengthy descriptive attributes in their URLs
which represent data hierarchies, command structures, transaction paths
and session information. This practice results in a URL that is
aesthetically unpleasant and difficult to remember, and which may not
fit within the size limitations of microblogging sites. URL shortening
services provide a solution to this problem by redirecting a user to a
longer URL from a shorter one.
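As a rough sketch of how such a service might work, the Python below keeps an in-memory table from short keys to long URLs and answers lookups with a redirect. The class name, the base-36 key scheme and the choice of a 301 status are assumptions made for illustration; real services differ in all three.

```python
import itertools
import string

ALPHABET = string.digits + string.ascii_lowercase  # base-36 digits


def encode_id(n):
    """Encode a non-negative integer as a short base-36 string."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


class Shortener:
    """In-memory mapping from short keys to long URLs (illustration only)."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._by_key = {}

    def shorten(self, long_url):
        """Store long_url and return the short key assigned to it."""
        key = encode_id(next(self._counter))
        self._by_key[key] = long_url
        return key

    def resolve(self, key):
        """Return (status, headers) as a redirecting server might."""
        long_url = self._by_key.get(key)
        if long_url is None:
            return 404, {}
        return 301, {"Location": long_url}
```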
Meaningful, persistent aliases for long or changing URLs
See also: Permalink, PURL, and Link rot
Sometimes the URL of a page changes even though the content stays the same.
In that case, a redirect from the old URL keeps existing bookmarks and
links working. This is routinely done on Wikipedia whenever a page is renamed.
Post/Redirect/Get
Main article: Post/Redirect/Get
Post/Redirect/Get (PRG) is a web development design pattern that prevents some duplicate
form submissions, creating a more intuitive interface for user agents
(users).
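The pattern can be illustrated with a toy request handler in Python. Everything here is hypothetical: the /orders paths, the "item" form field and the handler signature are assumptions for the sketch, and 303 is used as one common status choice. The point is that the POST changes state once and then redirects, so reloading the resulting page only repeats a harmless GET.

```python
def handle(method, path, form, orders):
    """Return (status, headers, body) for a toy order form."""
    if method == "POST" and path == "/orders":
        orders.append(form["item"])  # the state change happens exactly once
        new_id = len(orders)
        # Redirect instead of rendering a page in response to the POST.
        return 303, {"Location": "/orders/%d" % new_id}, ""
    if method == "GET" and path.startswith("/orders/"):
        suffix = path[len("/orders/"):]
        if suffix.isdigit() and 1 <= int(suffix) <= len(orders):
            return 200, {}, "Order placed: " + orders[int(suffix) - 1]
    return 404, {}, "Not found"
```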
Manipulating search engines
Redirect techniques are used to fool search engines. For example, one page could
show popular search terms to search engines but redirect the visitors
to a different target page. There are also cases where redirects have
been used to "steal" the page rank of one popular page and use it for a
different page, usually involving the 302 HTTP status code of
"moved temporarily."[2][3]
Search engine providers have noticed the problem and are working on appropriate actions[citation needed].
As a result, today, such manipulations usually result in less rather than more site exposure.
Manipulating visitors
URL redirection is sometimes used as a part of phishing attacks that
confuse visitors about which web site they are visiting[citation needed].
Because modern browsers always show the real URL in the address
bar, the threat is lessened. However, redirects can also take users to
sites that attempt to attack them in other ways. For example, a
redirect might take a user to a site that tries to trick them
into downloading supposed antivirus software and, ironically, installing a trojan
of some sort instead.
Removing referer information
When a link is clicked, the browser sends along in the HTTP request a field
called referer which indicates the source of the link. This field is
populated with the URL of the current web page, and will end up in the
logs of the server serving the external link. Since sensitive pages may
have sensitive URLs (for example,
http://company.com/plans-for-the-next-release-of-our-product), it is not
desirable for the referer URL to leave the organization. A redirection
page that performs referrer hiding could be embedded in all external
URLs, transforming for example http://externalsite.com/page into
http://redirect.company.com/http://externalsite.com/page. This technique
also eliminates other potentially sensitive information from the
referer URL, such as the session ID, and can reduce the chance of
phishing by indicating to the end user that they passed a clear gateway
to another site.
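The link transformation described above can be sketched as follows. The gateway address comes from the example in the text; the set of internal host names and the function names are assumptions made for illustration.

```python
from urllib.parse import urlsplit

REDIRECTOR = "http://redirect.company.com/"  # gateway from the example above
INTERNAL_HOSTS = {"company.com", "www.company.com"}  # assumed internal hosts


def wrap_external(url):
    """Prefix external links with the gateway; leave internal links alone."""
    host = urlsplit(url).hostname
    if host is None or host in INTERNAL_HOSTS:
        return url  # relative or internal link: no referer leaves the organization
    return REDIRECTOR + url


def target_from_path(path):
    """Recover the external URL from the path the gateway receives."""
    return path[1:] if path.startswith("/") else path
```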
Techniques
Several different kinds of response to the browser will result in a
redirection. These vary in whether they affect HTTP headers or HTML
content. The techniques used typically depend on the role of the person
implementing it and their access to different parts of the system. For
example, a web author with no control over the headers might use a
Refresh meta tag whereas a web server administrator redirecting all
pages on a site is more likely to use server configuration.
Manual redirect
The simplest technique is to ask the visitor to follow a link to the new page, usually using an HTML anchor like:
Please follow <a href="http://www.example.com/">this link</a>.
This method is often used as a fall-back: if the browser does not support
the automatic redirect, the visitor can still reach the target document
by following the link.
HTTP status codes 3xx
In the HTTP protocol used by the World Wide Web, a redirect is a response
with a status code beginning with 3 that causes a browser to display a
different page. The different codes describe the reason for the
redirect, which allows for the correct subsequent action (such as
changing links in the case of code 301, a permanent change of address).
The HTTP standard defines several status codes for redirection:
• 300 multiple choices (e.g. offer different languages)
• 301 moved permanently
• 302 found (originally temporary redirect, but now commonly used to specify redirection for unspecified reason)
• 303 see other (e.g. for results of cgi-scripts)
• 307 temporary redirect
All of these status codes require that the URL of the redirect target be
given in the Location: header of the HTTP response. The 300 multiple
choices will usually list all choices in the body of the message and
show the default choice in the Location: header.
(Status codes 304 not modified and 305 use proxy are not redirects).
An HTTP response with the 301 "moved permanently" redirect looks like this:
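HTTP/1.1 301 Moved Permanently
Location: http://www.example.org/
Content-Type: text/html
Content-Length: 174

<html>
<head>
<title>Moved</title>
</head>
<body>
<h1>Moved</h1>
<p>This page has moved to <a href="http://www.example.org/">http://www.example.org/</a>.</p>
</body>
</html>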
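For readers who want to see how such a response is put together, here is a minimal Python sketch that assembles the raw bytes of a 301 response. The function name build_301 and the exact body markup are illustrative choices, not part of any standard; the sketch computes Content-Length from the encoded body rather than hard-coding it, and assumes an ASCII-only target URL.

```python
import html


def build_301(location):
    """Return the raw bytes of a 301 response pointing at location."""
    safe = html.escape(location, quote=True)  # the URL is echoed into HTML
    body = (
        "<html>\n<head>\n<title>Moved</title>\n</head>\n<body>\n"
        "<h1>Moved</h1>\n"
        '<p>This page has moved to <a href="%s">%s</a>.</p>\n'
        "</body>\n</html>\n"
    ) % (safe, safe)
    payload = body.encode("ascii")
    head = (
        "HTTP/1.1 301 Moved Permanently\r\n"
        "Location: %s\r\n"
        "Content-Type: text/html\r\n"
        "Content-Length: %d\r\n"
        "\r\n"  # blank line separates headers from body
    ) % (location, len(payload))
    return head.encode("ascii") + payload
```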
Using server-side scripting for redirection
Web authors producing HTML content can't usually create redirects using
HTTP headers as these are generated automatically by the web server
program when serving an HTML file. The same is usually true even for
programmers writing CGI scripts, though some servers allow scripts to
add custom headers (e.g. by enabling "non-parsed-headers"). Many web
servers will generate a 3xx status code if a script outputs a
"Location:" header line. For example, in PHP, one can use the "header"
function:
header('HTTP/1.1 301 Moved Permanently');
header('Location: http://www.example.com/');
exit();
(More headers may be required to prevent caching[4]).
The programmer must ensure that the headers are output before the body.
This may not fit easily with the natural flow of control through the
code. To help with this, some frameworks for server-side content
generation can buffer the body data. In the ASP scripting language, this
can also be accomplished using response.buffer=true and
response.redirect "http://www.example.com/".
While the original HTTP/1.1 specification (RFC 2616) says the URI in a
Location header must be absolute,[5] most browsers tolerate relative URIs,
though some display a warning to the user. The later RFC 7231 explicitly
permits relative references.
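A client that accepts relative values has to resolve them against the URL it originally requested. A minimal Python sketch using the standard library's urljoin (the function name resolve_location is an assumption for illustration):

```python
from urllib.parse import urljoin


def resolve_location(request_url, location):
    """Resolve a Location header value against the URL that was requested."""
    return urljoin(request_url, location)
```

An absolute Location value passes through unchanged, while a value such as /new.html is combined with the scheme and host of the request URL.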
Apache mod_rewrite
The Apache HTTP Server's mod_alias extension can be used to redirect
certain requests. Typical configuration directives look like:
Redirect permanent /oldpage.html http://www.example.com/newpage.html
Redirect 301 /oldpage.html http://www.example.com/newpage.html
For more flexible URL rewriting and redirection, Apache mod_rewrite can be
used, for example to redirect requests to a canonical domain name:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^([^.:]+\.)*oldsite\.example\.com\.?(:[0-9]*)?$ [NC]
RewriteRule ^(.*)$ http://newsite.example.net/$1 [R=301,L]
Such configuration can be applied to one or all sites on the server through
the server configuration files or to a single content directory through a
.htaccess file.
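To see which host names the RewriteCond pattern above accepts, the same regular expression can be tried out in Python. This is only a sketch of the matching logic, not of how Apache evaluates rules; the function name is hypothetical, and the path is assumed to arrive without a leading slash, as it does in a per-directory (.htaccess) context.

```python
import re

# Same pattern as the RewriteCond above; re.IGNORECASE plays the role of [NC].
HOST_PATTERN = re.compile(
    r"^([^.:]+\.)*oldsite\.example\.com\.?(:[0-9]*)?$", re.IGNORECASE
)


def canonical_redirect(host, path):
    """Return (301, new_url) if host matches the old site, else None."""
    if HOST_PATTERN.match(host) is None:
        return None
    return 301, "http://newsite.example.net/" + path
```

The pattern accepts any subdomain prefix, an optional trailing dot and an optional port, but rejects hosts that merely end in a similar string.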
Refresh Meta tag and HTTP refresh header
Netscape introduced the meta refresh feature which refreshes a page after a
certain amount of time. This can specify a new URL to replace one page
with another. This is supported by most web browsers. See
• HTML tag
• An exploration of dynamic documents
A timeout of zero seconds effects an immediate redirect. This is treated
like a 301 permanent redirect by Google, allowing transfer of PageRank
to the target page.[6]
This is an example of a simple HTML document that uses this technique:
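<html><head><meta http-equiv="Refresh" content="0; url=http://www.example.com/"></head>
<body><p>Please follow <a href="http://www.example.com/">this link</a>.</p></body></html>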
This technique can be used by web authors because the meta tag is contained
inside the document itself. The meta tag must be placed in the "head"
section of the HTML file. The number "0" in this example may be replaced
by another number to achieve a delay of that many seconds. The anchor
in the "body" section is for users whose browsers do not support this
feature.
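The refresh value is a delay in seconds, optionally followed by a target URL. A small Python sketch of how a client might split such a value is shown below; the function name is hypothetical and the parsing is deliberately simplified compared with what browsers actually accept.

```python
def parse_refresh(content):
    """Split a refresh value such as '0; url=http://www.example.com/'
    into (delay_seconds, url_or_None)."""
    delay_part, _, rest = content.partition(";")
    delay = int(delay_part.strip())
    rest = rest.strip()
    if rest[:4].lower() == "url=":
        return delay, rest[4:].strip()
    return delay, None  # no URL given: the page simply reloads itself
```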
The same effect can be achieved with an HTTP refresh header:
HTTP/1.1 200 OK
Refresh: 0; url=http://www.example.com/
Content-type: text/html
Content-length: 78

Please follow <a href="http://www.example.com/">this link</a>.
This response is easier to generate by CGI programs because one does not need to change the default status code.
Here is a simple CGI program that effects this redirect:
#!/usr/bin/perl
print "Refresh: 0; url=http://www.example.com/\r\n";
print "Content-type: text/html\r\n";
print "\r\n";
print 'Please follow <a href="http://www.example.com/">this link</a>!';
Note: Usually, the HTTP server adds the status line and the Content-length header automatically.
The W3C discourages the use of meta refresh, since it does not communicate
any information about either the original or the new resource to the
browser (or search engine). The W3C's Web Content Accessibility
Guidelines (7.4) discourage the creation of auto-refreshing pages, since
most web browsers do not allow the user to disable or control the
refresh rate. Some articles that they have written on the issue include
W3C Web Content Accessibility Guidelines (1.0): Ensure user control of
time-sensitive content changes, Use standard redirects: don't break the
back button! and Core Techniques for Web Content Accessibility
Guidelines 1.0 section 7.
JavaScript redirects
JavaScript can cause a redirect by setting the window.location attribute, e.g.:
window.location='http://www.example.com/'
Normally JavaScript pushes the redirector site's URL to the browser's history.
This can cause redirect loops when users hit the back button. The
following command prevents this type of behaviour by replacing the current history entry.[7]
window.location.replace('http://www.example.com/')
However, HTTP headers or the refresh meta tag may be preferred for security
reasons and because JavaScript will not be executed by some browsers and
many web crawlers.
Frame redirects
A slightly different effect can be achieved by creating a single HTML frame that contains the target page:
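<frameset rows="100%"><frame src="http://www.example.com/">
<noframes><body>Please follow <a href="http://www.example.com/">link</a>!</body></noframes></frameset>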
One main difference to the above redirect methods is that for a frame
redirect, the browser displays the URL of the frame document and not the
URL of the target page in the URL bar.
This cloaking technique may be used so that the reader sees a more memorable
URL or to fraudulently conceal a phishing site as part of website
spoofing.[8]
Redirect chains
One redirect may lead to another. For example, the URL
http://www.wikipedia.com/wiki/URL_redirection (note the domain name) is
first redirected to http://www.wikipedia.org/wiki/URL_redirection and
then to the correct URL: http://en.wikipedia.org/wiki/URL_redirection.
This is unavoidable if the different links in the chain are served by
different servers though it should be minimised by rewriting the URL as
much as possible on the server before returning it to the browser as a
redirect.
Redirect loops
Sometimes a mistake can cause a page to end up redirecting back to itself,
possibly via other pages, leading to an infinite sequence of redirects.
Browsers should stop redirecting after a certain number of hops and
display an error message.
The HTTP standard states:
A client should detect infinite redirection loops, since such loops
generate network traffic for each redirection. Previous versions of this
specification recommended a maximum of five redirections; some clients
may exist that implement such a fixed limitation.
Note that the URLs in the sequence might not repeat, e.g.:
http://www.example.com/1 -> http://www.example.com/2 ->
http://www.example.com/3 ...
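Because the sequence need not repeat, a client cannot rely on spotting a repeated URL alone; it also needs a cap on the number of hops. The Python sketch below follows redirects through a plain dictionary standing in for the network. The function name, the dictionary model and the default limit of five (taken from the older recommendation quoted above) are assumptions for illustration.

```python
def follow(start, redirects, max_hops=5):
    """Follow redirects in a dict {url: target} and return the final URL.

    Raises RuntimeError when a URL repeats or max_hops is exceeded.
    """
    seen = {start}
    url = start
    hops = 0
    while url in redirects:
        if hops == max_hops:
            raise RuntimeError("too many redirects")
        url = redirects[url]
        hops += 1
        if url in seen:
            raise RuntimeError("redirect loop at " + url)
        seen.add(url)
    return url
```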
Services
There exist services that can perform URL redirection on demand, with no need
for technical work or access to the web server your site is hosted on.
URL redirection services
A redirect service is an information management system, which provides an
internet link that redirects users to the desired content. The typical
benefit to the user is the use of a memorable domain name, and a
reduction in the length of the URL or web address. A redirecting link
can also be used as a permanent address for content that frequently
changes hosts, similarly to the Domain Name System.
Hyperlinks involving URL redirection services are frequently used in spam messages
directed at blogs and wikis. Thus, one way to reduce spam is to reject
all edits and comments containing hyperlinks to known URL redirection
services; however, this will also remove legitimate edits and comments
and may not be an effective method to reduce spam.
Recently, URL redirection services have taken to using AJAX as an efficient, user-friendly method for creating shortened URLs.
A major drawback of some URL redirection services is the use of delay pages, or frame based advertising, to generate revenue.
History
The first redirect services took advantage of top-level domains (TLD) such
as ".to" (Tonga), ".at" (Austria) and ".is" (Iceland). Their goal was to
make memorable URLs. The first mainstream redirect service was V3.com,
which boasted 4 million users at its peak in 2000. V3.com's success was
attributed to having a wide variety of short memorable domains including
"r.im", "go.to", "i.am", "come.to" and "start.at". V3.com was acquired
by FortuneCity.com, a large free web hosting company, in early 1999. In
2001, .tk (Tokelau) emerged as a TLD used for memorable names.[9] As the
sales price of top-level domains started falling from $70.00 per year to
less than $10.00, use of redirection services declined.
With the launch of TinyURL in 2002 a new kind of redirecting service was
born, namely URL shortening. Their goal was to make long URLs short, to
be able to post them on internet forums. Since 2006, with the
140-character limit on the extremely popular Twitter service, these short
URL services have been heavily used.
Referrer Masking
Redirection services can hide the referrer by placing an intermediate page between
the page the link is on and its destination. Although these are
conceptually similar to other URL redirection services, they serve a
different purpose, and they rarely attempt to shorten or obfuscate the
destination URL (as their only intended side-effect is to hide referrer
information and provide a clear gateway between other websites).
This type of redirection is often used to prevent potentially malicious
links from gaining information using the referrer, for example a session
ID in the query string. Many large community websites use link
redirection on external links to lessen the chance of an exploit that
could be used to steal account information, as well as to make it clear
when a user is leaving a service, to lessen the chance of effective
phishing.
Here is a simplistic example of such a service, written in PHP.
<?php
$url = htmlspecialchars($_GET['url']);
header( 'Refresh: 0; url=http://'.$url );
?>
Attempting to redirect to <a href="http://<?php echo $url; ?>">http://<?php echo $url; ?></a>.
Note that the above example does not check who called it (e.g. by
referrer, although that could be spoofed). Also, it does not check the
URL provided. This means that a malicious person could link to the
redirection page from any page, using a url parameter of their own
selection, and so consume the web server's resources.
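One way to add the missing check on the supplied URL is to redirect only to hosts on an allowlist. The Python sketch below illustrates that idea; the host names in ALLOWED_HOSTS, the function name and the specific rejection rules are assumptions for this example, not a complete defence.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"www.example.com", "www.example.org"}  # hypothetical allowlist


def safe_target(raw):
    """Return 'http://' + raw if raw names an allowed host, else None."""
    # Refuse whitespace and control characters (e.g. header injection via CR/LF).
    if any(ch.isspace() or ord(ch) < 0x20 for ch in raw):
        return None
    try:
        parts = urlsplit("http://" + raw)
    except ValueError:  # malformed input such as an unclosed IPv6 bracket
        return None
    # Refuse 'user@host' forms, which can disguise the real destination.
    if parts.username is not None or parts.password is not None:
        return None
    if parts.hostname not in ALLOWED_HOSTS:
        return None
    return "http://" + raw
```

A redirect page would call a check like this before emitting any Refresh or Location header, and show an error page when it returns None.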