Search Engine Terms
This
glossary or list of search engine terms is designed to complement the
services we offer, as well as discussions taking place in some of the
leading message boards regarding Search Engine/Internet Marketing.
- Adjacency
- A
property of the relationship between words in a search engine (or directory)
query. Search engines often allow users to specify that words should
be next to one another or somewhere near one another in the web pages
searched.
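As a rough illustration of an adjacency test, the sketch below checks whether one word follows another within a given number of intervening words. The function name `words_near` and the `max_gap` parameter are assumptions made for this example, not the syntax of any particular search engine.

```python
import re


def words_near(text, first, second, max_gap=0):
    """Return True if `second` follows `first` with at most `max_gap` words between."""
    words = re.findall(r"\w+", text.lower())
    first, second = first.lower(), second.lower()
    for i, word in enumerate(words):
        if word == first:
            # max_gap=0 means the two words must be directly next to each other.
            window = words[i + 1 : i + 2 + max_gap]
            if second in window:
                return True
    return False
```

With `max_gap=0` the words must be next to one another; a larger value allows them to be "somewhere near" one another.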
-
-
Agent Name Delivery
-
The
process of sending search engine spiders to a tailored page, yet directing
your visitors to what you want them to see. This is done using server
side includes (or other dynamic content techniques). SSI, for example,
can be used to deliver different content to the client depending on
the value of HTTP_USER_AGENT. Most normal browser software packages
have a user agent string which starts with "Mozilla" (coined from Mosaic
and Godzilla). Most search engine spiders have specific agent names,
such as "Gulliver", "Infoseek sidewinder", "Lycos spider" and "Scooter".
-
-
By
switching on the value of HTTP_USER_AGENT (a process known as agent
detection), different pages can be presented at the same URL, so
that normal visitors will never see the page submitted to search engines
(and vice versa).
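A minimal sketch of agent detection, assuming the spider names quoted above and two hypothetical page files (`spider.html` and `visitor.html`). Real user agent strings vary, so substring matching like this is only an approximation.

```python
# Spider names taken from the examples in the text; matching is case-insensitive.
SPIDER_AGENTS = ("gulliver", "infoseek sidewinder", "lycos spider", "scooter")


def select_page(user_agent):
    """Choose which page to serve based on the HTTP_USER_AGENT value."""
    agent = user_agent.lower()
    if any(name in agent for name in SPIDER_AGENTS):
        return "spider.html"   # hypothetical page tailored for search engines
    return "visitor.html"      # hypothetical page shown to normal browsers
```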
-
-
In
practice this is somewhat simplistic. Some search engines pretend to
be "plain mozilla" browsers to prevent use of agent name delivery. Effective
use of agent name delivery can be very difficult, and may not even work.
-
-
How
do you spot agent name delivery at work? This is quite difficult, as
the owners of web pages using agent name delivery can control what you
see! You may be able to guess that a page is using this technique if
it appears to be indexed incorrectly or the title or description don't
match the page you see, but this could also have been achieved by switching
pages after the relevant search engine has indexed it. If you really
want to see the search engines' tailored version of a page, write a
program (e.g. a Perl script) to retrieve the URL with HTTP_USER_AGENT
set to each of the strings used by the search engine spiders. If agent
name delivery is in use, one or more of the retrieved pages will be
different to the others!
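The check described above could be sketched in Python rather than Perl as follows. The helper names `fetch_as` and `group_agents_by_content` are assumptions for this example. Pages that change on every request for other reasons (dates, counters, rotating adverts) will also show up as different, so a mismatch is a hint rather than proof.

```python
import hashlib
import urllib.request


def fetch_as(url, user_agent):
    """Retrieve a URL while sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def group_agents_by_content(url, user_agents, fetch=fetch_as):
    """Group user agents that received byte-identical pages for the same URL."""
    groups = {}
    for agent in user_agents:
        digest = hashlib.sha256(fetch(url, agent)).hexdigest()
        groups.setdefault(digest, []).append(agent)
    # More than one group means at least one agent was served a different page.
    return list(groups.values())
```

The `fetch` parameter lets the comparison be tried against stored copies of a page instead of a live site.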
-
-
-
-
Altavista
-
A
popular search engine with the largest database on the web, indexing
more than 140 million pages. Its main URL is http://www.altavista.com/.
Until 1998, this search engine provided the search facility for Yahoo.
Altavista indexes all the words in a web page, and new pages are normally
added to the database fairly quickly, within a couple of working days.
You are asked to submit just the main page of your site. The Altavista
spider will then explore your site and index a representative sample
of the pages. Some problems with spamming have been noticed. The use
of keyword meta tags is penalised. Altavista places various alternative
options before its search results, including suggested questions (using
the Ask Jeeves service) and RealNames. Paid entries are beginning to appear
at the start of the search results.
-
-
-
The
default search engine for users of the AOL internet service provider,
and hence a busy site. Its URL is http://www.netfind.com/.
It is essentially the same engine as Excite.
-
-
Applet
-
A
small program, often written in Java, which usually runs in a web browser, as part
of a web page. It is possible that the use of such a program may cause
spiders and robots to stop indexing a page.
-
-
ArchitextSpider
-
The
name of the Excite search engine's spider.
-
-
-
-
- Bait-and-Switch
- The
provision of one page for a search engine or directory and a different
page for other user agents at the same URL. Various methods can be used,
e.g. Agent Name Delivery or IP
Delivery.
-
-
Bridge
Page
-
-
- CGI
- Common Gateway Interface - a standard interface between web server
software and other programs running on the same machine.
-
-
CGI
Program
-
Strictly, any program which handles its input and output data according
to the CGI standard. In practice, CGI programs are used to handle forms
and database queries on web pages, and to produce non-static web page
content.
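A minimal sketch of a CGI program in Python, assuming the web server is configured to run it. Under CGI the server passes request details in environment variables such as QUERY_STRING and HTTP_USER_AGENT, and the program writes headers, an empty line and then the content to standard output. The function name `build_response` is an assumption for this example.

```python
import html
import os
import sys


def build_response(environ):
    """Build a CGI response (headers, blank line, body) from request variables."""
    query = environ.get("QUERY_STRING", "")
    agent = environ.get("HTTP_USER_AGENT", "unknown")
    body = (
        "<html><body>"
        f"<p>Query string: {html.escape(query)}</p>"
        f"<p>User agent: {html.escape(agent)}</p>"
        "</body></html>"
    )
    # A CGI program writes headers, then an empty line, then the content.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The web server supplies the request details in the environment.
    sys.stdout.write(build_response(os.environ))
```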
-
-
Channels,
Channel listings
-
Lists of links to selected (and usually popular) web sites. The links
are maintained by search engines and directories and are sorted into
categories or channels. Sites are picked by a channel editor, often
because of a site's already high ranking with the search engines. Some
search engines and directories allow visitors to nominate sites for
inclusion in their channels.
-
-
Client
-
A computer, program or process which makes requests for information
from another computer, program or process. Web browsers are client programs.
Search engine spiders are (or can be said to behave as) clients.
-
-
Click
through
-
The process of clicking on a link in a search engine output page to
visit an indexed site.
-
-
This is an important link in the process of receiving visitors to
a site via search engines. Good ranking may be useless if visitors do
not click on the link which leads to the indexed site. The secret here
is to provide a good descriptive title and an accurate and interesting
description.
-
-
Cloaking
-
The hiding of page content. Normally carried out to stop page thieves
stealing optimized pages. See also Bait-and-Switch.
-
-
Clustering
-
The listing of only one page from each web site in a search engine
or directory's list of search results. This avoids occupation of all
the top results by a small number of web sites and makes the list of
results clearer and more useful to the user.
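One simple way to picture clustering is to keep only the highest-ranked page from each host. The sketch below assumes the input is a list of URLs already sorted by relevance and treats the host name as the "web site"; real search engines may define a site differently.

```python
from urllib.parse import urlsplit


def cluster_results(ranked_urls):
    """Keep only the first (highest-ranked) URL from each host."""
    seen_hosts = set()
    clustered = []
    for url in ranked_urls:
        host = urlsplit(url).hostname
        if host not in seen_hosts:
            seen_hosts.add(host)
            clustered.append(url)
    return clustered
```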
-
-
Comment
-
The HTML <!-- and --> tags are used to hide text from browsers.
Some search engines ignore text between these symbols but others index
such text as if the comment tags were not there. Comments are often
used to hide JavaScript
code from non-compliant browsers, and sometimes (notably on Excite)
to provide invisible keywords to some search engines.
-
-
Crawler
-
-
- Dead
Link
- An internet link which doesn't lead to a page or site, probably because
the server is down or the page has moved or no longer exists. Most search
engines have techniques for removing such pages from their listings
automatically, but as the internet continues to increase in size, it
becomes more and more difficult for a search engine to check all the
pages in the index regularly. Reporting of dead links helps to keep
the indexes clean and accurate, and this can usually be done by submitting
the dead link to the search engine.
-
-
De-listing
-
The removal of pages from a search engine's index.
Removal can occur for various reasons, including unreliability of the
machine that hosts a site or because of perceived attempts at spamdexing.
-
-
Description
-
Descriptive text associated with a web page and displayed, usually
with the page title and URL, when the page appears in a list of pages
generated by a search engine or directory as a result of a query. Some
search engines take this description from the DESCRIPTION Meta
tag - others generate their own from the text in the
page. Directories often use text provided at registration.
-
-
Direct
Hit
-
A system which monitors the search engine users' selections from search
engine results, counting which results are clicked on most, and how
long visitors spend at that site, so as to improve relevancy. Used by
HotBot
and as a plug-in to Apple's new innovative Sherlock search system. See
http://www.directhit.com.
-
-
Directory
-
A server or a collection of servers dedicated to indexing internet
web pages and returning lists of pages which match particular queries.
Directories (also known as Indexes) are normally compiled manually,
by user submission (such as at whatsnew.com),
and often involve an editorial selection and/or categorization process
(such as at LookSmart
and Yahoo).
-
-
-
-
-
Domain
-
A sub-set of internet addresses. Domains are hierarchical, and lower-level
domains often refer to particular web sites within a top-level domain.
The most significant part of the address comes at the end - typical
top-level domains are .com, .edu, .gov, .org (which sub-divide addresses
into areas of use). There are also various geographic top-level domains
(e.g. .ar, .ca, .fr, .ro etc.) referring to particular countries.
-
-
The relevance to search engine terminology is that web sites which
have their own domain name (e.g. http://www.nativetongues.com) will
often achieve better positioning than web sites which exist as a sub-directory
of another organisation's domain (e.g. http://ourworld.compuserve.com/homepages/tijana/).
-
-
Doorway
Page
-
-
-
Dynamic
content
-
Information on web pages which changes or is changed automatically,
e.g. based on database content or user information. Sometimes it's possible
to spot that this technique is being used, e.g. if the URL ends with
.asp, .cfm, .cgi or .shtml. It is possible
to serve dynamic content using standard (normally static) .htm
or .html type pages, though. Search engines will currently index
dynamic content in a similar fashion to static content, although they
will not usually index URLs which contain the ? character.
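The URL clues mentioned above can be checked with a few lines of Python. This is only a heuristic sketch: the extension list comes from the text, the function name `looks_dynamic` is an assumption, and a page without these signs may still be dynamic.

```python
from urllib.parse import urlsplit

# Extensions the text gives as typical signs of dynamic content.
DYNAMIC_EXTENSIONS = (".asp", ".cfm", ".cgi", ".shtml")


def looks_dynamic(url):
    """Guess whether a URL points to dynamic content from its form alone."""
    path = urlsplit(url).path.lower()
    # A "?" in the URL is the case search engines are said to avoid indexing.
    return "?" in url or path.endswith(DYNAMIC_EXTENSIONS)
```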
-
-
-
-
-
Excite
-
Regarded as one of the best search engines, with an index of 55 million
pages. It can be slow to index new sites. The URL is http://www.excite.com.
Sites using frames
must have a NOFRAMES section in order to be listed. Some spamming has
been noticed. Excite previously ignored the DESCRIPTION meta
tag, but is now using this in its listings (although
the contents do not affect relevancy, which is based mainly on the title
and body text). The use of gateway
pages and hidden
text is allowed. Excite has an audio/video search facility
which is a branded component of RealNetworks' RealPlayer G2.
- Fake
Copy Listings
- Sometimes a malicious company will steal a web page or the entire
contents of a web site, re-publish at a different URL and register with
one or more search engines. This can cause a loss of traffic from the
original site if the search engines position the copy higher in the
listings. If you find that someone has stolen your site in this way,
write to the company concerned and ask them to remove the stolen content.
Also contact the hosting service used by the company, any company that
benefits from the theft and any search engine(s) concerned. If the thieves
refuse to remove the material or ignore you, obtain legal advice. It
is also well worth having printed evidence to support your claim that
your copy of the material was there first, and that you have the copyright!
See also Mirror
Sites.
-
-
False
Drop
-
A web page retrieved from a search engine or directory which is not
relevant to the query used. This could be for one of the following reasons:
- The web page contained the keywords entered, but used in the wrong
context, with a different meaning or with a different inter-relationship
to that expected.
- The web page is an attempt at spamdexing.
- The search engine has a fault in its database or a bug in its
query program.
-
Font
and Background Spoofs
-
Various techniques used to place invisible text in a web page, to
improve positioning without affecting the appearance of the page. These
are mostly based on setting the font and background colours to the same
value (e.g. white). Most search engines now detect these tricks.
-
-
Frames
-
An HTML technique for combining two or more separate HTML documents
within a single web browser screen. Compound interacting documents can
be created to make a more effective web page presented in multiple windows
or sub-windows.
A framed web site often causes great problems for search engines,
and may not be indexed correctly. Search engines will often index
only the part of a framed site within the <NOFRAMES> section,
so make sure that the <NOFRAMES> section includes relevant text
which can be indexed by the spiders. If your site uses frames, consider
providing a gateway page or adding navigational links within the framed
pages. Submit the main page - the one containing the <FRAMESET>
tag to the search engines. If you use a gateway page, submit this
separately.
-
- Gateway
Page
- A web page submitted to a search engine (spider) to give the relevance algorithm
of that particular spider the data it needs, in the format that it needs
it, in order to place a site at the proper level of relevance for the
topic(s) in question. (This determination of topical relevance is called
"placement".)
-
-
A gateway page may present information to the spider, but obscure
it from a casual human viewer. The gateway page exists so as to allow
a web site to present one face to the spider, and another to human viewers.
There are several reasons why one might want to do this. One is that
the author may not want to publicly disclose placement tactics. Another
is that the format that is easiest for a given spider to understand
may not be the format that the author wishes to present to viewers
for aesthetic reasons. Still another is that the format that is best for
one spider may differ from that which is best for another. By using
gateway pages, you can present your site to each spider in the way which
is known or thought to be best for that particular spider.
Also known as bridge pages, doorway pages, entry pages, portals or
portal pages.
-
-
An example gateway page:
-
-
-
-
A portal partnership between Infoseek and Disney, with search capabilities
based on the Infoseek
index, at http://go.com/.
-
-
GoTo
(Now referred to as Overture)
-
A search engine, powered by Inktomi,
which only returns one URL per domain in its search results. Operates
a "pay per click" scheme where websites can pay to increase their relevancy.
The URL is http://www.goto.com/
which will redirect to http://www.overture.com.
-
-
Gulliver
-
-
Heading
-
Many search engines give extra weight and importance to the text found
inside HTML heading sections. It is generally considered good advice
to use headings when designing web pages and to place keywords inside
headings.
-
-
Hidden Text
-
Text on a web page which is visible to search engine spiders but not
visible to human visitors. This is sometimes because the text has been
set the same colour as the background, because multiple TITLE tags have
been used or because the text is an HTML comment. Hidden text is often
used for spamdexing.
Many search engines can now detect the use of hidden text, and often
remove offending pages from their database or lower such pages' positioning.
-
-
-
-
Hit
-
In the context of visitors to web pages, a hit (or site hit) is a
single access request made to the server for either a text file or a
graphic. If, for example, a web page contains ten buttons constructed
from separate images, a single visit from someone using a web browser
with graphics switched on (a "page view") will involve eleven hits on
the server. (Often the accesses will not get as far as your server because
the page will have been cached by a local internet service provider).
-
-
In the context of a search engine query, a hit is a measure of the
number of web pages matching a query returned by a search engine or
directory.
-
-
HotBot
-
One of the largest search engines, indexing 110 million pages. It is
powered by Inktomi,
and new submissions appear to take two weeks or longer to appear. The
URL is http://www.hotbot.com/.
-
-
HTML
-
HyperText Markup Language - the (main) language used to write web
pages.
-
-
HTTP
-
HyperText Transfer Protocol - the (main) protocol used to communicate
between web servers and web browsers (clients).
-
- Image
Map
- A set of hyperlinks attached to areas of an image. This may be defined
within a web page, or as an external file.
-
-
If the image map is defined as an external file, search engines may
have problems indexing your other pages, unless you duplicate the links
as conventional text hyperlinks.
-
-
If the image map is included within the web page, the search engines
should have no problem following the links, although it's good practice
to provide text links too, to aid the visually impaired and those accessing
the web with graphics switched off or using text only browsers.
-
-
Inbound
Link
-
A hypertext link to a particular page from elsewhere, bringing
traffic
to that page. Inbound links are counted to produce a measure of the
page
popularity. Searches for the inbound links to a page
can be made on Altavista,
Infoseek
and Hotbot.
-
-
Index
-
See Directory.
Also refers to the database of web pages maintained by a search engine
or directory.
-
-
-
-
-
Infoseek
-
One of the largest search engines. New sites are normally added very
quickly, within one or two business days. The URL is
http://www.infoseek.com/.
Infoseek is one of the few search engines to treat singular and plural
forms as the same word. Very sensitive to page
popularity in its positioning
algorithm.
-
-
Inktomi
-
The database used by some of the largest search engines, including
Hotbot.
Inktomi is also used by Yahoo
when no matches are found in Yahoo's own database.
-
-
IP Delivery
-
Similar to agent
name delivery, this technique presents different content
depending on the IP address of the client. It is very difficult to view
pages hidden using this technique, because the real page is only visible
if your IP address is the same as (for example) a search engine's spider.
-
- Java
- A computer programming language whose programs can run on a number
of different types of computer and/or operating system. Used extensively
to produce applets
for web pages.
-
-
JavaScript
-
A simple interpreted computer language used for small programming
tasks within HTML web pages. The scripts are normally interpreted (or
run) on the client computer by the web browser. Some search engines
have been known to index these scripts, presumably erroneously.
-
- Keyword
- A word which forms (part of) a search engine query.
-
-
Keyword
Density
-
A property of the text in a web page which indicates how often the
keywords appear relative to the total number of words on the page.
Some search engines use this property for Positioning.
Analysers are available which allow comparisons between pages. Pages
can then be produced with similar keyword densities to those found
in high ranking pages.
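One common way to measure keyword density is as the fraction of words on a page that match a keyword. The sketch below assumes that definition and a very simple notion of a "word"; commercial analysers may count differently.

```python
import re


def keyword_density(text, keyword):
    """Return the share of words in `text` that equal `keyword` (0.0 to 1.0)."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)
```

Running the same function over a high ranking page and your own page gives two figures that can be compared directly.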
-
-
Keyword
Domain Name
-
The use of keywords as part of the URL to a website. Positioning
is improved on some search engines when keywords are reinforced in the
URL.
-
-
Keyword
Phrase
-
A phrase which forms (part of) a search engine query.
-
-
Keyword
Purchasing
-
The buying of search keywords from search engines, usually to control
banner ad placement. All the major search engines (except EuroSeek
and GoTo)
insist that keyword purchasing is only used for banner ad placement,
and doesn't influence search results. The display of banner ads for
bought keywords can be studied using a service called Bannerstake
from Thomson and Thomson at http://www.namestake.com
which returns the banner ads displayed when particular queries are used.
-
-
Keyword
Stuffing
-
The repeating of keywords and keyword phrases in META tags or elsewhere.
-
-
Log File
-
A file maintained on a server
in which details of all file accesses are stored. Analysing log files
can be a powerful way to find out about a web site's visitors, where
they come from and which queries
are used to access a site. Various software packages are available to
analyse log files, and some are listed below.
-
-
-
-
-
-
-
Lycos
-
One of the largest search engines, Lycos appears to be moving towards
becoming a directory
and is using the Open
Directory for some search results. It can be slow to
index new sites. The Lycos spider ignores meta tags in pages. Lycos
can be found at http://www.lycos.com.
-
-
-
-
-
Meta Search
-
A search of searches. A query is submitted to more than one search
engine or directory, and results are reported from all the engines,
possibly after removal of duplicates and sorting. Also the meta
search engine of the same name, found at http://www.metasearch.com/.
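A sketch of how results from several engines might be merged, assuming each engine returns a list of URLs in rank order. The interleaving rule used here (all first-ranked results, then all second-ranked results, and so on) is just one possible choice and is not taken from any particular meta search service.

```python
from itertools import zip_longest


def merge_results(result_lists):
    """Interleave ranked result lists and drop duplicate URLs."""
    merged = []
    seen = set()
    # zip_longest pads shorter lists with None so no results are lost.
    for rank_group in zip_longest(*result_lists):
        for url in rank_group:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged
```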
-
-
-
-
-
Meta Tag
-
A construct placed in the HTML header of a web page, providing information
which is not visible to browsers. The most common meta tags (and those
most relevant to search engines) are KEYWORDS and DESCRIPTION.
-
-
The KEYWORDS tag allows the author to emphasise the importance of
certain words and phrases used within the page. Some search engines
will respond to this information - others will ignore it. Don't use
quotes around the keywords or keyphrases.
-
-
The DESCRIPTION tag allows the author to control the text of the summary
displayed when the page appears in the results of a search. Again, some
search engines will ignore this information.
-
-
The HTTP-EQUIV meta tag is used to issue HTTP commands, and is frequently
used with the REFRESH value to refresh page content after a given number
of seconds. Gateway pages sometimes use this technique to force browsers
to a different page or site. Most search engines are wise to this, and
will index the final page and/or reduce the ranking. Infoseek has a
strong policy against this technique, and they might penalize your site,
or even ban it.
-
Other common meta tags are GENERATOR (usually advertising the software
used to generate the page) and AUTHOR (used to credit the author of
the page, and often containing e-mail address, homepage URL and other
information).
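To see which meta tags a page carries, the standard library HTML parser can be used. The sketch below collects the content of each meta tag by its NAME or HTTP-EQUIV value; the class and function names are assumptions for this example, and it does not reproduce how any search engine actually reads these tags.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect meta tag contents, keyed by lower-cased name or http-equiv."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or attributes.get("http-equiv")
        if name:
            self.meta[name.lower()] = attributes.get("content") or ""


def read_meta_tags(html_text):
    """Return a dict of meta tag contents found in the given HTML."""
    parser = MetaTagParser()
    parser.feed(html_text)
    return parser.meta
```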
-
-
-
-
-
Mirror Sites
-
Multiple copies of web sites or web pages, often on different servers.
The process of registering these multiple copies with search engines
is often treated as spamdexing, because it artificially increases the
relevancy of the pages. Filters such as the Infoseek Sniffer now remove
multiple mirrors from the indexes.
-
-
Misspellings
-
People quite often spell words incorrectly when using search engines.
Pages which use common misspellings will quite often receive extra hits,
so it is a useful technique to include common misspellings of words
in alt tags, keywords, page names and titles. A similar effect occurs
when spaces are missed out and words are accidentally joined together.
-
-
MultiCrawl
-
-
-
Multiple
Domain Names
-
The use of several extra domains to provide gateway pages or gateway
sites to the main site.
-
-
Multiple
Keyword Tags
-
The use of more than one Keywords META tag in order to try to increase
the relevancy of the best keywords on a page. This is not recommended.
It may be detected as a spamming technique, or all but one of the tags
may simply be ignored.
-
-
Multiple
Titles
-
It used to be possible to repeat the HTML title tag in the header
section of a page several times to improve search engine positioning.
Most search engines now detect this trick.
-
-
NewHoo
-
-
-
Northern Light
-
A search engine with an additional "pay to access" special collection
of business, health and consumer publication articles. The first search
engine to ban meta
search engines from its database. The URL is
http://www.northernlight.com.
-
- Open
Directory Project
- A directory project run by thousands of volunteer editors. In principle,
this is a very exciting and powerful way to organise the web. In practice,
there have been some problems with the behaviour of some of the editors,
which has caused some initial difficulty for the organisers. Initially
known as NewHoo, the project is now part of Netscape (and therefore
of AOL). See http://directory.mozilla.org.
-
-
-
-
-
Optimization
-
Changes made to a web page to improve the positioning of that page
with one or more search engines. A means of helping potential customers
or visitors to find a web site. Optimization may involve design/layout
changes, new text for the title tags, meta tags, alt attributes, headings,
and changes to the first 200-250 words of the main text. A large image
map at the top of a page should be moved further down the page. Frames
should be avoided (unless navigational links are also provided within
the frames).
-
- Page
Popularity
- A measure of the number and quality of links to a particular page
(inbound links). Many search engines (and most noticeably Infoseek)
are increasingly using this number as part of the positioning
process. The number and quality of inbound links is becoming as important
as the optimisation of page content. A free service to measure page
popularity can be found at http://www.linkpopularity.com.
-
-
Page View
-
Used in site statistics as a measure of pages viewed rather than server
hits. Many server hits may be made to access a single page, causing
many separate log file entries. Analysis software can determine that
these server hits were generated when a visitor viewed a single page,
and group them together to provide this more useful method of counting
visitors. See also Hit
and Unique
Visitor.
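The difference between hits and page views can be shown with a short sketch. It assumes the log has already been reduced to a list of requested paths, and it treats any request for an .htm or .html file (or a directory) as one page view. Real analysis software uses more careful rules.

```python
# Assumption for this sketch: these endings identify a page rather than a graphic.
PAGE_ENDINGS = (".htm", ".html", "/")


def count_hits_and_page_views(requested_paths):
    """Return (hits, page_views) for a list of requested paths."""
    hits = len(requested_paths)
    page_views = sum(
        1 for path in requested_paths if path.lower().endswith(PAGE_ENDINGS)
    )
    return hits, page_views
```

For a page built from one HTML file and ten button images, a single visit produces eleven hits but only one page view.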
-
-
Placement
-
-
-
Politeness Window
-
In order not to overburden any particular server, most search engine
spiders limit their access to each server. If your page is hosted on
the same server as thousands of other pages, the spider may never get
the time to reach (and index) your page. This can be a powerful argument
for having your own server.
-
-
Portal
-
-
-
Portal
Page
-
-
-
-
A generic term for any site which provides an entry point to the internet
for a significant number of users.
Examples are search engines, directories, built-in default browser
or service provider homepages, sites hardwired to browser buttons,
sites offering free homepages, e-mail or personalised news and any
popular (or heavily advertised) sites that significant numbers of
people may bookmark or set as default pages.
-
-
Positioning
-
The process of ordering web sites or web pages by a search engine
or a directory so that the most relevant sites appear first in the search
results for a particular query. Software such as PositionAgent,
Rank
This and Webposition
can be used to determine how a URL is positioned for a particular search
engine when using a particular search phrase. The GoHip
Search site allows you to see positioning information
from many of the big search engines, displayed all on one page.
-
-
Positioning
Technique
-
A method of modifying a web page so that search engines (or a particular
search engine) treat the page as more relevant to a particular query
(or a set of queries).
-
- Query
- A word, a phrase or a group of words, possibly combined with other
syntax used to pass instructions to a search engine or a directory in
order to locate web pages.
-
-
-
-
RealNames
-
An alternate website address system in operation at Altavista.
Brand names used in searches are mapped directly to the appropriate
website, usually because the company owning the brand-name has paid
a fee to RealNames. http://www.realnames.com/
-
-
Referrer
-
The URL of the web page from which a visitor came. The server's referrer
log file will indicate this. If a visitor came directly from a search
engine listing, the query used to find the page will usually be encoded
in the referrer URL, making it easy to see which keywords are bringing
visitors. The referrer information can also be accessed as document.referrer
within JavaScript or via the HTTP_REFERER environment variable (accessible
from scripting languages).
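Extracting the query from a referrer URL might look like the sketch below. The parameter name `q` is an assumption: each search engine uses its own parameter name, so in practice this would need adjusting for every engine of interest.

```python
from urllib.parse import parse_qs, urlsplit


def search_terms_from_referrer(referrer, query_param="q"):
    """Return the search phrase encoded in a referrer URL, or None if absent."""
    query = parse_qs(urlsplit(referrer).query)
    values = query.get(query_param)
    return values[0] if values else None
```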
-
-
Refresh
Tag
-
See the paragraph about HTTP-EQUIV under Meta
Tag.
-
-
Registration
-
The process of informing a search engine or directory that a new web
page or web site should be indexed.
-
-
Relevancy
Algorithm
-
The method a search engine or directory uses to match the keywords
in a query with the content of each web page, so that the web pages
found can be ordered suitably in the query results. Each search engine
or directory is likely to use a different algorithm, and to change or
improve its algorithm from time to time.
-
-
Re-submission
-
Repeating the search engine registration process one or more times
for the same page or site. Under certain circumstances, this is regarded
with suspicion by the search engines, as it could indicate that someone
is experimenting with spamming techniques.
The Infoseek and Altavista search engines are particularly vulnerable
to spamming because they list sites very quickly, and are thus easy
to experiment with. Both engines de-list sites for repeated re-submission
and Infoseek, for example, does not allow more than one submission
of the same page in a 24 hour period. Occasional re-submission of
changed pages is not normally a problem.
-
-
Robot
-
Any browser program which follows hypertext links and accesses web
pages but is not directly under human control. Examples are the search
engine spiders,
the "harvesting" programs which extract e-mail addresses and other data
from web pages and various intelligent web searching programs. A database
of web robots is maintained by Webcrawler.
-
-
robots.txt
-
A text file stored in the top level directory of a web site to deny
access by robots
to certain pages or sub-directories of the site. Only robots which comply
with the Robots Exclusion Standard will read and obey the commands
in this file. Robots will read this file on each visit, so that pages
or areas of sites can be made public or private at any time by changing
the content of robots.txt before re-submitting to the search engines.
The simple example below attempts to prevent all robots from visiting
the /secret directory:
User-agent: *
Disallow: /secret
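The effect of the example above can be checked with Python's standard `urllib.robotparser` module. The sketch parses the two lines directly rather than fetching them from a site; the helper name `build_parser` and the example.com URLs are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

EXAMPLE_ROBOTS_TXT = "User-agent: *\nDisallow: /secret\n"


def build_parser(robots_txt):
    """Create a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(EXAMPLE_ROBOTS_TXT)
# A compliant robot asks before fetching each URL.
secret_allowed = parser.can_fetch("Scooter", "http://www.example.com/secret/plans.html")
index_allowed = parser.can_fetch("Scooter", "http://www.example.com/index.html")
```

Here `secret_allowed` is False and `index_allowed` is True, which is what a robot complying with the standard would conclude from the file.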
-
-
-
-
Search Engine
-
A server or a collection of servers dedicated to indexing internet
web pages, storing the results and returning lists of pages which match
particular queries. The indexes are normally generated using spiders.
Some of the major search engines are Altavista,
Excite,
Hotbot,
Infoseek,
Lycos,
Northern
Light and Webcrawler.
Note that Yahoo
is a directory,
not a search engine. The term Search Engine is also often used
to describe both directories and search engines.
-
-
Searchking
-
A smaller search engine which allows visitors to vote on the relevance
of the pages returned by their queries, thus ranking sites based on
the opinions of searchers. Unlike some of the major search engines,
there is good customer support. http://www.searchking.com.
-
-
Search
Term
-
-
-
Server
-
A computer, program or process which responds to requests for information
from a client. On the internet, all web pages are held on servers. This
includes those parts of the search engines and directories which are
accessible from the internet.
-
-
Sidewinder
-
-
-
Siphoning
-
The use of various means to steal another site's traffic. Techniques
used include the wholesale copying of web pages (with the copied page
altered slightly to direct visitors to a different site, and then registered
with the search engines) and the use of keywords or keyword phrases
"belonging" to other organisations, companies or web sites.
-
-
Site
Hit
-
-
-
Skewing
-
Artificially changing search engine results so that, for example,
popular queries will return artificially created listings. Infoseek
is currently experimenting with this technique, using a small group
of reviewers to artificially force higher relevance for certain sites.
-
-
Slurp
-
-
-
-
-
-
Sniffer
-
The name of the filter program used by the Infoseek search engine
to prevent spamdexing. It detects multiple mirror pages, font and background
spoofs, multiple title tags, keyword stuffing and possibly other types
of spamdexing.
-
-
Spamdexing
-
The alteration or creation of a document with intent to deceive an
electronic catalog or filing system. Any technique that increases the
potential position of a site at the expense of the quality of the search
engine's database can also be regarded as spamdexing - also known as
spamming or spoofing.
-
-
Spamming
-
See spamdexing.
Spamming is also used more generally to refer to the sending of unsolicited
bulk electronic mail, and the search engine use is derived from this
term.
-
-
Spider
-
That part of a search engine which surfs the web, storing the URLs
and indexing the keywords and text of each page it finds. Please refer
to the Search Engine Watch SpiderSpotting
Chart for details of individual spiders. See also Robot.
-
-
Spidering
-
The process of surfing the web, storing URLs and indexing keywords,
links and text.
-
-
Typically, even the largest search engines cannot spider all of the
pages on the net. This is due to the huge amount of data available,
the speed at which the new data appears, the use of politeness
windows and practical limits on the number of pages that
can be visited in a given time. The search engines have to make compromises
in order to visit as many sites as possible, and they do this in different
ways. For example, some only index the home pages of each site, some
only visit sites they're explicitly told about, and some make judgements
about the importance of sites (from number and quality of inbound links)
before "digging deeper" into the subpages of a site.
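The compromise described above, visiting only as many pages as a fixed budget allows, can be sketched as a breadth-first walk over a link graph. This is a minimal sketch, not how any particular engine works: the `SITE` dictionary is a made-up site structure, nothing is fetched over the network, and politeness delays are left out.

```python
from collections import deque


def spider(links, start, max_pages):
    """Breadth-first crawl over a link graph, stopping after max_pages visits."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)  # a real spider would fetch and index the page here
        for target in links.get(url, []):
            if target not in seen:  # never queue the same URL twice
                seen.add(target)
                queue.append(target)
    return visited


# Hypothetical site structure, used only for illustration.
SITE = {
    "/": ["/about", "/products"],
    "/products": ["/products/a", "/products/b"],
    "/about": ["/"],
}
```

With a budget of three pages, `spider(SITE, "/", 3)` visits the home page and its two direct links but never reaches the product subpages, which is the "home pages only" style of compromise in miniature.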
-
-
Splash Page
-
Similar to a gateway
page but provides an initial display which must be viewed
before a visitor reaches the main page. This usually acts as a kind
of "opening title" sequence, and can be extremely annoying.
-
-
Spoofing
-
-
-
SSI
-
Server Side Includes. Used (for example) to add dynamically generated
content to a web page.
-
-
Stealth Script
-
A CGI script which switches page content depending on who or what
is accessing the page. See agent
name delivery.
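The switching logic itself is simple. The sketch below assumes a hypothetical list of spider name fragments (taken from names mentioned in this glossary) and hypothetical file names; it only shows the decision, not a complete CGI script.

```python
# Hypothetical list of spider name fragments; real lists go out of date quickly.
SPIDER_NAMES = ("gulliver", "sidewinder", "lycos", "scooter", "slurp")


def is_spider(user_agent: str) -> bool:
    """Return True if the user agent string contains a known spider name."""
    ua = user_agent.lower()
    return any(name in ua for name in SPIDER_NAMES)


def choose_page(user_agent: str) -> str:
    """Pick which (hypothetical) file to serve for this user agent."""
    return "spider.html" if is_spider(user_agent) else "visitor.html"
```

A spider that announces itself as a plain "Mozilla" browser falls straight through to the visitor page, which is one reason this technique is unreliable in practice.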
-
-
Stemming
-
A function of some search engines and directories which allows results
to be returned from some or all keywords based on the same stem as the
keyword entered as a search term. For example, when stemming is switched
on, a search for the word dance will return matches for any word
whose stem is danc-, matching the keywords dance, dancer
and dancing.
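The dance example can be reproduced with a deliberately naive suffix-stripping stemmer. The suffix list and function names here are assumptions made for illustration; real search engines use far more careful rules.

```python
# Illustrative suffix list only; checked in order, longest first.
SUFFIXES = ("ing", "er", "e")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_search(query: str, keywords):
    """Return the keywords that share a stem with the query word."""
    target = stem(query)
    return [k for k in keywords if stem(k) == target]
```

Here `stem("dance")`, `stem("dancer")` and `stem("dancing")` all give `danc`, so a search for dance matches all three, while a word such as danger (stem `dang`) is left out.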
-
-
Stop Word
-
A word which is ignored in a query because the word is so commonly
used that it makes no contribution to relevancy. Examples are common
net words such as computer and web, and general words
like get, I, me, the and you.
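Dropping such ignored words (often called stop words) from a query is a one-line filter. The word list below simply reuses the examples given above; each search engine keeps its own list, so treat this one as an assumption.

```python
# Example list only, built from the words mentioned in this entry.
STOP_WORDS = {"computer", "web", "get", "i", "me", "the", "you"}


def strip_stop_words(query: str):
    """Return the query words that are not ignored, in their original order."""
    return [w for w in query.lower().split() if w not in STOP_WORDS]
```

For the query "Get me the cheapest web hosting", only `cheapest` and `hosting` are left to contribute to relevancy.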
-
-
Submission Service
-
Any agent which submits your site to many search engines and directories.
Useful to get listed with many of the minor search engines, but don't
rely on such services to get listed with the major search engines. Many
of these services are automatic and run from web sites. Others run off
line. Some are free. Beware of supplying your email address to the so-called
FFA (free for all) services - you may receive lots of spam.
-
- Title
- The text contained between the start and end HTML tags of the same
name. This text is associated with (but not displayed in) the web page
containing these tags, and is displayed in a special position (usually
at the top of the window) by the web browser.
-
- Title text is important because it normally forms the link to the
page from the search engine listings, and because the search engines
pay special attention to the title text when indexing the page.
-
- Don't confuse this text with heading text within the web page which
often looks like the title. Usually this will be rendered either using
the HTML heading tags or just rendered with a large font size.
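- To see the difference between title text and heading text, the two can be pulled out of a page separately. This is a minimal sketch using Python's standard `html.parser` module; the class name is made up for the example.

```python
from html.parser import HTMLParser

HEADING_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class TitleAndHeadings(HTMLParser):
    """Collect the <title> text and the text of each heading tag separately."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self._current = None  # tag whose text is being collected, if any

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._current = tag
        elif tag in HEADING_TAGS:
            self._current = tag
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current is not None:
            self.headings[-1] += data
```

Feeding it a page whose title is "Acme Widgets" and whose first heading is "Welcome!" gives those two strings in `title` and `headings` respectively; only the first is what a search engine listing would normally link with.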
-
-
Traffic
-
The visitors to a web page or web site. Also refers to the number
of visitors, hits, accesses etc. over a given period.
-
- Unique Visitor
- A real visitor to a web site.
-
- Web servers record the IP address of each visitor, and this is used
to estimate the number of real people who have visited a web site.
-
- If for example, someone visits twenty pages within a web site, the
server will count only one unique visitor (because the page accesses
are all associated with the same IP address) but twenty page accesses.
-
- See also hit and page view.
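- The counting described in this entry can be sketched in a few lines. The log format is an assumption (the IP address is taken to be the first field on each line, as in the common access log layouts), and the function name is made up for the example.

```python
def summarise(log_lines):
    """Return (unique_visitors, page_accesses) for a list of access log lines.

    Assumes the visitor's IP address is the first whitespace-separated field.
    """
    ips = [line.split()[0] for line in log_lines if line.strip()]
    return len(set(ips)), len(ips)
```

Twenty lines from one IP address give `(1, 20)`: one unique visitor and twenty page accesses. Because several people can share an address, and one person can appear under several, the first number is only an estimate.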
-
-
URL
-
Uniform Resource Locator. An address which can specify any internet
resource uniquely. The beginning of the address indicates the type of
resource - e.g. http: for web pages, ftp: for file transfers, telnet:
for computer login sessions or mailto: for e-mail addresses.
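Reading the resource type from the beginning of an address is something Python's standard `urllib.parse` module does directly. A minimal sketch follows; the function name and the example addresses are made up for illustration.

```python
from urllib.parse import urlsplit


def resource_type(url: str) -> str:
    """Return the scheme (the part before the first colon) of a URL."""
    return urlsplit(url).scheme
```

For example, `resource_type("http://www.yahoo.com")` gives `http` and `resource_type("mailto:someone@example.com")` gives `mailto`.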
-
-
URL Submission
-
-
-
Virtual Server
-
An account on a hosting company server, usually linked to its own
domain. This provides an inexpensive way to run a web site with its
own top level domain, and is usually indistinguishable from having a
separate physical server, except that the virtual server may share an
IP address with other virtual servers on the same machine. A virtual
server account is fine for most uses, but will often be slower to respond
than a physically separate server, and physical access to the machine
will seldom be allowed. The cost of a virtual server account is a small
fraction of that needed to run a real server, mainly because of the
expense of the dedicated line needed to connect the server continuously
to the rest of the net.
-
- Web Copywriting
- The writing of text especially for a web page. Similar to the writing
of copy for any other type of publication, good web copywriting can
have a great effect on search engine positioning, so it forms a major
part of optimization.
-
-
-
-
- XML
- Extensible Markup Language. A new language which promises more efficient
data delivery over the web. XML does nothing itself - it must be implemented
using 'parser' software or XSL.
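- As an example of parser software at work, the sketch below reads a small XML document with Python's standard `xml.etree.ElementTree` module. The element and attribute names are invented for the example; XML itself does not define them.

```python
import xml.etree.ElementTree as ET

# A made-up XML document; the tag names carry no built-in meaning.
DOC = (
    "<glossary>"
    "<term name='URL'>An address</term>"
    "<term name='SSI'>Server Side Includes</term>"
    "</glossary>"
)

root = ET.fromstring(DOC)  # the parser turns the text into a tree

# Build a plain dictionary from the parsed tree.
terms = {t.get("name"): t.text for t in root.findall("term")}
```

The XML text only describes the data. It is the parser, and the code that walks the resulting tree, that decides what to do with it.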
-
-
XSL
-
Extensible Stylesheet Language - an XML
style sheet language supported by the newer web browsers Internet Explorer
5 and Netscape 5.
-
- Yahoo
- Similar to a search engine, but with a database generated by hand,
this is the world's most used directory of web sites. The main URL is
http://www.yahoo.com.
It is notoriously difficult to get listed in Yahoo and, once listed,
even more difficult to get your listing changed or to get out! To increase
the odds of getting listed, try the following:
- Select the three categories you want to be listed in very carefully.
Consider the regional categories. Ensure that the categories match the
content of your site.
- Apply to one of their local subsidiaries for your own country or city.
- Make sure that your site is well-designed and easy to navigate.
- Ensure your site has no dead links.
- Ensure that your pages download quickly.
- Provide good contact information on your site.
- If you manage to get listed, keep the e-mail they send you. You can
e-mail the same person subsequently to get your listing changed.
-
|
|