
Search engine optimization glossary of terms


A  B  C  D  F  H  I  J  K  L  M  O  Q  R  S  T  U  X 

A

Agent Name Delivery

Agent names are given to web browsers and other programs that are associated with the web. For example, the agent name for Microsoft Internet Explorer 5.5 looks like this:

Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)

The agent name for Altavista's spider is:

Scooter/2.0 G.R.A.B. V1.1.0

In theory, you can use agent names to your advantage when optimizing your page. By configuring the webserver to deliver different pages depending on which agent requests them, you can send a specifically optimized page to each visitor. For example, if Altavista's spider "Scooter/2.0 G.R.A.B. V1.1.0" comes to index your page, you can set up your webserver to deliver a page optimized specifically for Altavista. Only Altavista's spider will see the optimized page; everyone else will see the normal page.
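The selection step described above can be sketched in a few lines of Python. This is only a rough illustration: the function name choose_page and the file names altavista.html and normal.html are hypothetical, and recognising the spider by the prefix "Scooter" is an assumption about how a server might match the agent name.

```python
def choose_page(user_agent: str) -> str:
    """Return the (hypothetical) file to serve for a given agent name."""
    # Assumption: Altavista's spider always announces a name starting "Scooter".
    if user_agent.startswith("Scooter"):
        return "altavista.html"
    # Every other agent, including ordinary browsers, gets the normal page.
    return "normal.html"


print(choose_page("Scooter/2.0 G.R.A.B. V1.1.0"))
print(choose_page("Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"))
```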

ALT tags

ALT tags (strictly speaking, the ALT attribute) are part of the HTML tag "IMG". The IMG tag is used to place images on your website, and the ALT text describes those images. Search engines consider ALT text important, so it is a good idea to include it for any images on your page.

Altavista

Altavista is a search engine. You should know this.

Applet

A small program, often written in Java, which forms part of a web page. Examples include chat programs, and some media-driven websites use applets to control media within the page. The use of applets can stop spiders from indexing a page.

Ask Jeeves

A meta search engine which can be asked questions in plain English. This engine uses Direct Hit's database for search results.



B

Bait and switch

The provision of one page for a search engine or directory and a different page for other user agents at the same URL. Various methods can be used, e.g. agent name delivery or IP delivery.



C

CGI

Common Gateway Interface - a standard interface between web server software and other programs running on the same machine. CGI can be used to pass information submitted from a form to a program for processing. For example, when an order is placed online, CGI lets the web server hand the order details to a program that stores them in a database for processing.
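As a rough sketch of the form-handling case, the Python script below reads a posted form the way a CGI program typically would: the server supplies the length of the request body in the CONTENT_LENGTH environment variable and the body itself on standard input. The function name parse_form and the field names in the usage note are hypothetical, and storing the order in a database is left out.

```python
import os
import sys
from urllib.parse import parse_qs


def parse_form(body: str) -> dict:
    """Decode URL-encoded form data, keeping the first value of each field."""
    return {name: values[0] for name, values in parse_qs(body).items()}


if __name__ == "__main__":
    # Under CGI the web server passes the request to the program through
    # environment variables and standard input.
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    order = parse_form(sys.stdin.read(length))
    # A CGI program replies with headers, a blank line, then the body.
    print("Content-Type: text/plain")
    print()
    print("Order received:", order)
```

For a form body such as item=widget&quantity=2, parse_form returns a dictionary with the keys item and quantity.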

Client

A computer, program or process which makes requests for information from another computer, program or process. Web browsers are client programs, and search engine spiders are (or can be said to behave as) clients. Another example is CuteFTP, an FTP client that connects to FTP servers to transmit data.

Cloaking

The hiding of page content. Normally carried out to stop page thieves stealing optimized pages. See also Bait-and-Switch.

Comment

The HTML <!-- and --> tags are used to hide text from browsers. Some search engines ignore text between these symbols but others index such text as if the comment tags were not there. Comments are often used to hide JavaScript code from non-compliant browsers, and sometimes (notably on Excite) to provide invisible keywords to some search engines. Most search engines now frown upon this use of comments, and a page may be penalised if the comment tag is not used properly.

Crawler

See Spider



D

Dead link

An internet link which doesn't lead to a page or site, probably because the server is down or the page has moved or no longer exists. Most search engines have techniques for removing such pages from their listings automatically, but as the internet continues to increase in size, it becomes more and more difficult for a search engine to check all the pages in the index regularly. Reporting of dead links helps to keep the indexes clean and accurate, and this can usually be done by submitting the dead link to the search engine.

Directory

A server or collection of servers dedicated to indexing internet web pages and returning lists of pages which match particular queries. Directories (also known as Indexes) are normally compiled manually, by user submission (such as at dmoz.org), and often involve an editorial selection and/or categorization process (such as at LookSmart and Yahoo).

Domain

A sub-set of internet addresses. Domains are hierarchical, and lower-level domains often refer to particular web sites within a top-level domain. The most significant part of the address comes at the end: typical top-level domains are .com, .edu, .gov and .org (which sub-divide addresses into areas of use). There are also various geographic top-level domains (e.g. .ar, .ca, .fr etc.) referring to particular countries.
The relevance to search engine terminology is that web sites which have their own domain name (e.g. http://www.nativetongues.com) will often achieve better positioning than web sites which exist as a sub-directory of another organisation's domain (e.g. http://ourworld.compuserve.com/homepages/tijana/).

Doorway page

A page aimed at specific keywords or phrases. A doorway page usually has a brief description of the website with a logo and link text on the page. Here is a Doorway page example.



F

Frames

An HTML technique for combining two or more separate HTML documents within a single web browser screen. Compound interacting documents can be created to make a more effective web page presented in multiple windows or sub-windows.
A framed web site often causes great problems for search engines, and may not be indexed correctly. Search engines will often index only the part of a framed site within the <NOFRAMES> section, so make sure that the <NOFRAMES> section includes relevant text which can be indexed by the spiders. If your site uses frames, consider providing a gateway page or adding navigational links within the framed pages. Submit the main page, the one containing the <FRAMESET> tag, to the search engines. If you use a gateway page, submit this separately.



H

Hidden text

Text on a web page which is visible to search engine spiders but not visible to human visitors. This is sometimes because the text has been set to the same colour as the background, because multiple TITLE tags have been used or because the text is an HTML comment. Hidden text is often used for spamdexing. Many search engines can now detect the use of hidden text, and often remove offending pages from their database or lower such pages' positioning. Text can also be hidden using agent name delivery or IP delivery, either to present different text to different search engine spiders or to hide the real HTML source from competitors.

Hostname

The unique name given to a machine or computer on the internet.

HTML

(HyperText Markup Language) The markup language in which web pages are written for display on the World Wide Web.

HTTP

(HyperText Transfer Protocol) The protocol for moving hypertext files across the Internet. Requires an HTTP client program on one end, and an HTTP server program on the other end. HTTP is the most important protocol used in the World Wide Web.

Hyperlinks

Hyperlinks are used to "link" documents together. A web page can, for example, link to 10 different search engines, so that by clicking these links you can go to each search engine. You can also link to games, software, Microsoft Word documents, Adobe Acrobat documents and so on. Links are used to create a sort of "journey" through the web. When the web was first introduced, it was quite hard for people to find you (search engines were extremely small and not very popular). Everyone had to link to everyone else to try and get noticed on the web.



I

Image map

A set of hyperlinks attached to areas of an image. If the image map is included within the web page, the search engines should have no problem following the links, although it is good practice to provide text links too.

IP delivery

Similar to agent name delivery, this technique presents different content depending on the IP address of the client. It is very difficult to view pages hidden using this technique, because the real page is only visible if your IP address matches that of (for example) a search engine's spider.
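A minimal Python sketch of the idea follows. The address range 192.0.2.0/24 is a reserved documentation range used here as a stand-in for a spider's real addresses, and the function name page_for_ip and the file names are hypothetical.

```python
import ipaddress

# Hypothetical list of networks that the spider is assumed to crawl from.
SPIDER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def page_for_ip(client_ip: str) -> str:
    """Return the (hypothetical) file to serve to a client IP address."""
    address = ipaddress.ip_address(client_ip)
    # Serve the optimized page only when the client is inside a listed network.
    if any(address in network for network in SPIDER_NETWORKS):
        return "spider.html"
    return "normal.html"


print(page_for_ip("192.0.2.17"))
print(page_for_ip("165.113.245.2"))
```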

IP

(Internet Protocol Number) A unique number consisting of 4 parts, each between 0 and 255, separated by dots. e.g.

165.113.245.2

Every machine that is on the internet has a unique IP number.
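The four-part structure can be checked with Python's standard ipaddress module. This is just an illustrative sketch; the helper name split_ip is hypothetical.

```python
import ipaddress


def split_ip(text: str) -> list:
    """Validate a dotted IP number and return its four parts as integers."""
    # ip_address raises ValueError if the text is not a valid address.
    address = ipaddress.ip_address(text)
    return [int(part) for part in str(address).split(".")]


print(split_ip("165.113.245.2"))  # [165, 113, 245, 2]
```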



J

Java

A programming language whose programs can run on a number of different types of computers and/or operating systems. Used extensively to produce applets for web pages.

Javascript

A simple interpreted computer language used for small programming tasks within HTML web pages. The scripts are normally interpreted (or run) on the client computer by the web browser. Some search engines have been known to index these scripts, presumably erroneously.



K

Keyword

A word that forms part of a search engine query. Also used in search engine optimization to target main words of a web page.

Keyword density

The proportion of the words on a web page that are a given keyword, expressed as a percentage. Some search engines use this property for positioning web pages within their indexes.
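One simple way to calculate it is sketched below in Python. The formula (occurrences of the keyword divided by the total number of words) is an assumption for illustration; the search engines do not publish how they measure density, and the function name keyword_density is hypothetical.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Return the keyword's share of all words on the page, as a percentage."""
    # Split the text into lower-case words, ignoring punctuation.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)


page_text = "Cheap flights to Rome. Book cheap flights today."
print(keyword_density(page_text, "cheap"))  # 2 of 8 words -> 25.0
```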

Keyword phrase

A phrase that forms a search engine query.



L

Link popularity

A measure of the number and quality of links pointing to a particular site (inbound links). Many search engines use this as part of the positioning process.



M

Mirror sites

Copies of web sites or pages, often on different servers. The process of registering these multiple copies with search engines is often treated as spamdexing, because it artificially increases the relevancy of the pages. Filters such as the Infoseek Sniffer now remove multiple mirrors from the indexes.



O

Open directory project

A directory project run by thousands of volunteer editors. In principle, this is a very exciting and powerful way to organise the web. In practice, there have been some problems with the behaviour of some of the editors, which has caused some initial difficulty for the organisers. Initially known as NewHoo, the project is now part of Netscape (and therefore of AOL).



Q

Query

A word, phrase or group of words, possibly combined with other syntax, used to pass instructions to a search engine or a directory in order to locate web pages. For details of which queries are being used, visit the goto.com search inventory page.



R

Real names

An alternative website address system in operation at Altavista. Brand names used in searches are mapped directly to the appropriate website, usually because the company owning the brand name has paid a fee to Real Names. http://www.realnames.com

Refresh tag

The HTTP-EQUIV attribute of the meta tag is used to issue HTTP commands, and is frequently set to REFRESH to reload the page, or load a different one, after a given number of seconds. Gateway pages sometimes use this technique to force browsers to a different page or site. Most search engines are wise to this, and will index the final page and/or reduce the ranking. Infoseek has a strong policy against this technique, and they might penalize your site, or even ban it.
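To show what such a tag looks like and how a program might spot it, here is a small Python sketch using the standard html.parser module. The class and function names are hypothetical and the example URL is a placeholder; this is not how any particular search engine detects refreshes.

```python
from html.parser import HTMLParser


class RefreshFinder(HTMLParser):
    """Record the CONTENT value of a meta tag whose HTTP-EQUIV is REFRESH."""

    def __init__(self):
        super().__init__()
        self.refresh = None

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases tag and attribute names for us.
        attributes = dict(attrs)
        http_equiv = (attributes.get("http-equiv") or "").lower()
        if tag == "meta" and http_equiv == "refresh":
            self.refresh = attributes.get("content")


def find_refresh(html: str):
    """Return the refresh instruction found in the page, or None."""
    finder = RefreshFinder()
    finder.feed(html)
    return finder.refresh


page = '<head><META HTTP-EQUIV="refresh" CONTENT="5; URL=http://www.example.com/"></head>'
print(find_refresh(page))  # 5; URL=http://www.example.com/
```

Here the CONTENT value asks the browser to wait 5 seconds and then load the given URL.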

Relevancy algorithm

The method a search engine or directory uses to match the keywords in a query with the content of each web page, so that the web pages found can be ordered suitably in the query results. Each search engine or directory is likely to use a different algorithm, and to change or improve its algorithm from time to time.

Robot

Any browser program which follows hypertext links and accesses web pages but is not directly under human control. Examples are the search engine spiders, the "harvesting" programs which extract e-mail addresses and other data from web pages and various intelligent web searching programs. A database of web robots is maintained by Webcrawler.

robots.txt

A text file stored in the top level directory of a web site to deny access by robots to certain pages or sub-directories of the site. Only robots which comply with the Robots Exclusion Standard will read and obey the commands in this file. Robots will read this file on each visit, so that pages or areas of sites can be made public or private at any time by changing the content of robots.txt before re-submitting to the search engines. The simple example below attempts to prevent all robots from visiting the /secret directory:

User-agent: *
Disallow: /secret

For more information, please refer to the Altavista robots.txt page.
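The effect of the two-line example can be checked with Python's standard urllib.robotparser module, which applies the rules the way a compliant robot would. In this sketch the host www.example.com and the page names are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Load the rules directly instead of fetching a real robots.txt file.
rules = RobotFileParser()
rules.parse(["User-agent: *", "Disallow: /secret"])

# A compliant robot asks before fetching each URL.
print(rules.can_fetch("Scooter", "http://www.example.com/secret/plans.html"))  # False
print(rules.can_fetch("Scooter", "http://www.example.com/index.html"))         # True
```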



S

Spamdexing

The alteration or creation of a document with intent to deceive an electronic catalog or filing system. Any technique that increases the potential position of a site at the expense of the quality of the search engine's database can also be regarded as spamdexing - also known as spamming or spoofing.

Spamming

See spamdexing. Spamming is also used more generally to refer to the sending of unsolicited bulk electronic mail, and the search engine use is derived from this term.

Spider

The part of a search engine which surfs the web, storing URLs and indexing keywords and text of each page it finds. Please refer to the Search Engine Watch Spiderspotting Chart for details of individual spiders. See also Robot.

SSI

Server Side Includes. Used to add dynamically generated content to a web page. For example, you can insert a copyright disclaimer at the bottom of every page; if you want to change it, you only change it once, in the included file. Because the server assembles the page before delivering it to your browser, SSI helps when you want compatibility across all platforms and web browsers. For example, some web pages use JavaScript to insert this kind of content, but some browsers do not support JavaScript, so SSI can come in handy.

When a browser requests a file from your webserver, the server can be made to look for Server Side Includes within HTML files and execute them. Each SSI command within your HTML code is replaced by the results of that command. Even if you look at the source code of the page in your browser, the included content will look like it was always part of the page.
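The replacement step can be imitated in Python as a rough sketch. The include directive shown follows the form used by the Apache web server, which is an assumption since no particular server is named here; the function name expand_includes is hypothetical, and a dictionary stands in for the files on the server.

```python
import re

# Matches directives of the form <!--#include file="name" -->
INCLUDE = re.compile(r'<!--#include file="([^"]+)" -->')


def expand_includes(page: str, files: dict) -> str:
    """Replace each include directive with the content of the named file."""
    return INCLUDE.sub(lambda match: files[match.group(1)], page)


files = {"footer.html": "<p>Copyright notice</p>"}
page = '<body>Welcome<!--#include file="footer.html" --></body>'
print(expand_includes(page, files))
# <body>Welcome<p>Copyright notice</p></body>
```

The browser only ever receives the expanded result, which is why the directive never appears in the page source.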



T

Traffic

The visitors to a web page or web site. Also refers to the number of visitors, hits, accesses etc. over a given period.



U

URL

Uniform Resource Locator. An address which can specify any internet resource uniquely. The beginning of the address indicates the type of resource, e.g. http: for web pages, ftp: for file transfers, telnet: for computer login sessions or mailto: for e-mail addresses.
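As a small illustration, Python's standard urllib.parse module can pull out the part of the address that names the type of resource. The helper name resource_type and the example.com addresses are placeholders.

```python
from urllib.parse import urlsplit


def resource_type(url: str) -> str:
    """Return the scheme, the part of a URL before the first colon."""
    return urlsplit(url).scheme


for url in ("http://www.realnames.com",
            "ftp://ftp.example.com/pub/file.txt",
            "mailto:someone@example.com"):
    print(resource_type(url), "<-", url)
```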



X

XML

Extensible Markup Language. A new language which promises more efficient data delivery over the web. XML does nothing by itself; it must be processed using "parser" software or XSL.
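To give a feel for what "parser" software does, the Python sketch below reads a tiny XML document with the standard xml.etree.ElementTree module. The element names glossary and term are invented for this example.

```python
import xml.etree.ElementTree as ET

# A made-up XML document: the tags describe the data, not how it looks.
document = "<glossary><term name='URL'>An internet address</term></glossary>"

# The parser turns the text into a tree of elements a program can walk.
root = ET.fromstring(document)
for term in root.findall("term"):
    print(term.get("name"), "-", term.text)
```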

XSL

Extensible Stylesheet Language. An XML style sheet language supported by the newer web browsers Internet Explorer 5 and Netscape 5.

© Dion Foster 2002