Note: This section is still under construction and may be lacking content or contain some inaccuracies; it may be better to check back at a later date. Thank you.
This page will give you some tips on how to get your pages listed on search engines.
meta Tags
meta tags are small chunks of code that go into the <head>
section of your HTML pages.
They give web browsers, search engines and other programs
information about your page and its contents.
meta elements are usually specified something like this:
<meta name="xxx" content="xxx">
…although http-equiv is sometimes used in place of name.
Several meta tags that you can put in your pages to help search engines are listed below.
The description meta tag allows you to add a description
for your page, like so:
<meta name="description" content="A little text to describe the page">
The keywords meta tag lists keywords or phrases that a search engine can match against what the user enters into it. They are separated by commas:
<meta name="keywords" content="rick,bull,rick bull">
So that example would match the individual keywords rick and bull, or the phrase rick bull.
Some search engines require that you add an e-mail address for the author of the page:
<meta name="author" content="rickbull@rickmusic.fsnet.co.uk">
Specifying the document's language via predefined language codes can help search engines to bring up results that are in the user's desired language:
<meta name="language" content="en-gb">
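To make the role of these tags more concrete, here is a rough Python sketch of how a program might read name/content pairs out of a page. It uses only the standard html.parser module; the class name MetaCollector and the sample page are made up for this illustration, and real search engines do not necessarily work this way.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # http-equiv is sometimes used in place of name
        key = attributes.get("name") or attributes.get("http-equiv")
        if key and attributes.get("content") is not None:
            self.meta[key.lower()] = attributes["content"]


# Sample page built from the meta tags shown above
page = """<html><head>
<meta name="description" content="A little text to describe the page">
<meta name="keywords" content="rick,bull,rick bull">
<meta name="language" content="en-gb">
</head><body></body></html>"""

collector = MetaCollector()
collector.feed(page)

description = collector.meta["description"]
# Keywords and phrases are separated by commas
keywords = [k.strip() for k in collector.meta["keywords"].split(",")]
print(description)
print(keywords)
```

Splitting the keywords value on commas gives the two single keywords and the one phrase as separate entries.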
Robots are programs (run by search engines, for example) that visit your page, analyse it
and then add it to their databases so that users can search them and possibly
bring up your site as a result. There are two ways to specify which pages robots
should or should not visit: the robots.txt file,
or the robots <meta> tag.
The robots.txt file is a file that robots check for on your server; it tells
them which directories/files they should and should not read.
It is more flexible than the <meta> tag and is the preferred option,
but some servers do not allow you to modify or create this file (usually for
security reasons).
The name of the robots.txt file must be in lower case, and there should be only one such file, in the root of the website (e.g. http://www.rickbull.co.uk/ is my root address, so my robots.txt file's URI would be http://www.rickbull.co.uk/robots.txt).
There are two parts to the robots.txt file: User-agent and Disallow.
The User-agent keyword allows you to specify which robots the rules that follow apply to. For example, if we wanted to have rules apply only to the WebCrawler robot, we would use this line of code:
User-agent: WebCrawler
You then start a new line and add any Disallow rules that you want to apply to this robot. There must be a blank line between each group of user-agents and rules, as is shown in the example robots.txt file.
You may also use an asterisk (*) to denote that the following
rules apply to all robots. Note:
the User-agent line does not support wildcards or regular expressions; * has no special
meaning other than "any robot",
i.e. you cannot say, for example, web*. This will just mean a robot
named web*, and not web followed by any other characters as it would in a wildcard pattern.
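As a quick check of this behaviour, the sketch below feeds a two-line robots.txt to Python's standard urllib.robotparser module. This is just one implementation of the robots.txt rules, and the robots.txt content here is invented for the illustration, so treat the result as indicative rather than as a statement about every robot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that tries to use web* as a pattern
lines = [
    "User-agent: web*",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(lines)

# WebCrawler is not matched by the literal name web*, so the rule is not applied to it
webcrawler_allowed = parser.can_fetch("WebCrawler", "/index.htm")
# A robot literally named web* is matched, so it is shut out
literal_allowed = parser.can_fetch("web*", "/index.htm")
print(webcrawler_allowed, literal_allowed)
```

With this parser, WebCrawler is still allowed to fetch the page, while a robot whose name really is web* is not.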
Once you have identified your target robots, you need to add the directories
or files that they should ignore. You do this with the Disallow directive:
Disallow: /my_secret_files/
In this case the robot would ignore the contents of /my_secret_files/.
You can also apply this to individual files. To allow access to all files on
your server, you can use this code:
User-agent: *
Disallow:
Alternatively, create an empty robots.txt file. Also note that robots take these values as partial URIs: a rule matches any path that starts with the value given. For example, this code:
Disallow: /rick
…would not only disallow the /rick/ directory,
but also any other files in the root directory whose names start with rick,
e.g. /rick/rick_bull.htm,
/rick.php, /rickamous.xyz, etc.
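The prefix matching described above can be tried out with Python's standard urllib.robotparser module. This is only a sketch: AnyBot is a made-up robot name, /music/rick.htm is an extra path added for contrast, and other robots may implement the rules slightly differently.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /rick",
])

# The first three paths start with /rick; the last one does not
paths = ["/rick/rick_bull.htm", "/rick.php", "/rickamous.xyz", "/music/rick.htm"]
allowed = {path: parser.can_fetch("AnyBot", path) for path in paths}

for path, ok in allowed.items():
    print(path, "allowed" if ok else "disallowed")
```

Only /music/rick.htm is reported as allowed, because the rule is compared against the start of the path and not against the file name.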
One last thing to note is that you can add comments with the hash (#)
character, and all text until the end of that line will be ignored by the robot.
Below is an example robots.txt file:
User-agent: WebCrawler
Disallow: /temp/
Disallow: /some_personal_files/personal.txt
Disallow: /some_personal_files/personal2.htm

User-agent: SneakBot #Don't let SneakBot see anything
Disallow: /

User-agent: BadBoy
Disallow: /~ #Stop bad boy from seeing anything with ~ at the start
Disallow: /private.htm

User-agent: GoodRobot #Allow GoodRobot to go anywhere
Disallow:

User-agent: * #All robots
Disallow: /my_stuff/
<meta> Tag
If you do not have permission to edit the robots.txt file on your server,
you can use the robots <meta> tag instead.
There are six directives/keywords you can use:
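index: this page should be indexed
noindex: this page should not be indexed
follow: the links on this page should be followed
nofollow: the links on this page should not be followed
all: index and follow, to mean that this page should be indexed, and all links should be followed
none: noindex and nofollow, to mean that this page should not be indexed, and no links should be followed
You can use these keywords like any other <meta> tag: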
<meta name="robots" content="noindex">
You can also specify more than one directive by separating them with commas:
<meta name="robots" content="index,nofollow">
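As a rough illustration of how a robot might turn such a content value into a decision, here is a small Python sketch. The function name parse_robots_directives is made up, the defaults (index and follow) are an assumption, and letting a later directive override an earlier one is simply a choice made for this example; real robots may behave differently.

```python
def parse_robots_directives(content):
    """Return (index, follow) for a robots <meta> content value (illustrative only)."""
    # Assumption: pages are indexed and links followed unless told otherwise
    index, follow = True, True
    for directive in content.lower().split(","):
        directive = directive.strip()
        if directive == "index":
            index = True
        elif directive == "noindex":
            index = False
        elif directive == "follow":
            follow = True
        elif directive == "nofollow":
            follow = False
        elif directive == "all":
            index, follow = True, True
        elif directive == "none":
            index, follow = False, False
        # Unknown directives are ignored in this sketch
    return index, follow


print(parse_robots_directives("noindex"))         # (False, True)
print(parse_robots_directives("index,nofollow"))  # (True, False)
print(parse_robots_directives("none"))            # (False, False)
```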
Obviously though you should not specify conflicting directives, such as
"index,noindex" or "nofollow,follow".
Using headings in the correct manner (i.e. to emphasise the structure of the document) can also help some search engines to get an idea of what your page is all about. For more information on using headings please visit the basics of HTML section.