3.4.3

Search Engine Indexing

In order for search engines to find useful results when users carry out web searches, they have to store details of as many web pages as possible.

Indexing

  • In order for search engines to find useful results when users carry out web searches, they have to store details of as many web pages as possible.
  • Search engine indexing refers to the methods used to maintain such a database.
  • Search engine indexing allows results to be found efficiently.
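
One common way to make lookups efficient is an inverted index, which maps each keyword to the pages that contain it. The sketch below is a minimal illustration, not how any particular search engine works; the page URLs, the text and the function names `build_index` and `search` are made up for the example.

```python
from collections import defaultdict

# Hypothetical toy "web": page URL -> page text (invented for illustration).
PAGES = {
    "a.example/home": "cheap flights and hotel deals",
    "b.example/blog": "hotel reviews and travel tips",
    "c.example/news": "flights delayed by weather",
}


def build_index(pages):
    """Map each keyword to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start with the pages for the first word, then narrow down.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(PAGES)
print(sorted(search(index, "hotel")))
print(sorted(search(index, "flights hotel")))
```

A search only has to look up each query word in the index, rather than scanning the text of every stored page.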

Web crawlers

  • Search engines make use of software known as web crawlers, which traverse the web by visiting web pages and following the links contained on each.
  • The web crawlers store key information about each page in an index.
  • This is then used by the search engine to find results for a web user’s search.
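
The crawling process described above can be sketched as a breadth-first traversal. To keep the example self-contained, the "web" here is a hypothetical in-memory dictionary rather than real pages fetched over a network, and the name `crawl` is chosen for illustration only.

```python
from collections import deque

# Hypothetical toy web: URL -> (page text, links found on that page).
WEB = {
    "home": ("welcome to the home page", ["about", "news"]),
    "about": ("about this site", ["home"]),
    "news": ("latest news stories", ["about", "archive"]),
    "archive": ("old news stories", []),
    "orphan": ("no page links here", []),
}


def crawl(web, start):
    """Visit pages breadth-first from start, storing keywords for each one."""
    index = {}
    queue = deque([start])
    seen = {start}
    while queue:
        url = queue.popleft()
        text, links = web[url]
        index[url] = set(text.split())  # store key information about the page
        for link in links:
            # Follow each link once, skipping links to unknown pages.
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl(WEB, "home")
print(sorted(index))
```

In this toy web the page "orphan" is never indexed, because no visited page links to it. A crawler that only follows links cannot discover pages that nothing points to.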

Indexed information

  • The information stored may include:
    • Keywords in the webpage.
    • Meta tags within the code of the page.
    • How recently the website was updated.
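
As a rough illustration of how keywords and meta tags might be pulled out of a page, the sketch below uses Python's standard `html.parser` module. The class name `PageInfoParser` and the sample HTML are invented for the example, and it does not attempt to work out when a page was last updated.

```python
from html.parser import HTMLParser


class PageInfoParser(HTMLParser):
    """Collect meta tags and visible words from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}    # meta tag name -> content
        self.words = []   # words found in the page text
        self._skip = 0    # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name, content = attrs.get("name"), attrs.get("content")
        if tag == "meta" and name and content:
            self.meta[name.lower()] = content
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words.extend(data.lower().split())


# Hypothetical sample page, made up for this example.
SAMPLE = """<html><head>
<title>Toy page</title>
<meta name="description" content="A toy page about search engines">
<meta name="keywords" content="search, index, crawler">
</head><body><h1>Search engines</h1><p>Crawlers build an index.</p></body></html>"""

parser = PageInfoParser()
parser.feed(SAMPLE)
print(parser.meta)
print(parser.words)
```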

PageRank Algorithm

The PageRank algorithm is a method of ranking the web pages in a set of Google search results.

PageRank algorithm

  • PageRank is named after Larry Page who, along with fellow Stanford University student Sergey Brin, developed the algorithm in the mid-1990s.
  • The algorithm gives each web page a score that reflects how important it is, based on the pages that link to it.
  • The higher the score for a particular page, the closer to the top of the list of results it appears.

Operation

  • Google searches often return millions of results, so PageRank is an important method of ensuring useful pages are more prominent in the list.
  • The algorithm has to be re-run continually for each web page, as the number of links pointing to the page from other pages can change at any time.

Algorithm

  • The PageRank algorithm is written as follows:
    • $PR(A) = (1-d) + d \left(\frac{PR(T_1)}{C(T_1)} + \dots + \frac{PR(T_n)}{C(T_n)}\right)$
  • Using ‘Web page A’ as an example:
    • $PR(A)$ is the PageRank of ‘Web page A’.
    • $PR(T_i)$ is the PageRank of page $T_i$, one of the $n$ pages which link to ‘Web page A’.
    • $C(T_i)$ is the number of outbound links on page $T_i$.
    • $d$ is the damping factor, which is usually set to 0.85.
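
Because each page's PageRank depends on the PageRank of the pages linking to it, one way to evaluate the formula is to start every page at the same value and apply the formula repeatedly until the scores settle. The sketch below does this for a hypothetical three-page link graph; the graph, the starting value of 1.0 and the fixed count of 50 iterations are assumptions made for the example.

```python
# Hypothetical link graph: page -> pages it links to (invented for illustration).
LINKS = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}


def pagerank(links, d=0.85, iterations=50):
    """Repeatedly apply PR(A) = (1 - d) + d * sum(PR(T_i) / C(T_i))."""
    ranks = {page: 1.0 for page in links}  # assumed starting value
    for _ in range(iterations):
        new_ranks = {}
        for page in links:
            # Sum PR(T_i) / C(T_i) over every page T_i that links to this page.
            incoming = sum(
                ranks[src] / len(outs)
                for src, outs in links.items()
                if page in outs
            )
            new_ranks[page] = (1 - d) + d * incoming
        ranks = new_ranks
    return ranks


ranks = pagerank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(score, 3))
```

In this toy graph, page C ends up with the highest score because both A and B link to it. Page B scores lowest because its only inbound link comes from A, which shares its PageRank between two outbound links.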
