Scientific American explores the science behind web searching, then looks at research on the cutting edge of the field.

First, prospective content is identified and collected on an ongoing basis. Special software called a crawler probes pages published on the Web, retrieves them along with the pages they link to, and aggregates everything in a single location. In the second step, the system counts relevant words and establishes their importance using various statistical techniques. Third, the system builds a highly efficient data structure, or tree, from the relevant terms, associating each term with specific Web pages.
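The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any real search engine is built: the "Web" is a small hypothetical in-memory dictionary (WEB, with made-up addresses), TF-IDF stands in for the "various statistical techniques", and a plain dictionary stands in for the tree the article mentions. The function names crawl, term_weights, and build_index are invented for this example.

```python
import math
import re
from collections import Counter, defaultdict, deque

# Hypothetical in-memory "Web": address -> (page text, outgoing links).
WEB = {
    "a.example": ("search engines crawl the web", ["b.example", "c.example"]),
    "b.example": ("a crawler follows links between pages", ["c.example"]),
    "c.example": ("an index maps terms to pages", []),
    "d.example": ("this page is never linked", []),
}

def crawl(seed, web):
    """Step 1: breadth-first traversal from a seed page, collecting reachable pages."""
    seen, queue, collected = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = web[url]
        collected[url] = text
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                queue.append(link)
    return collected

def term_weights(pages):
    """Step 2: count words per page and weight them with TF-IDF (an assumed choice)."""
    counts = {
        url: Counter(re.findall(r"[a-z]+", text.lower()))
        for url, text in pages.items()
    }
    # Number of pages in which each term appears at least once.
    doc_freq = Counter(term for c in counts.values() for term in c)
    n = len(pages)
    return {
        url: {t: tf * math.log(n / doc_freq[t]) for t, tf in c.items()}
        for url, c in counts.items()
    }

def build_index(weights):
    """Step 3: inverted index mapping term -> [(address, weight)], heaviest first."""
    index = defaultdict(list)
    for url, terms in weights.items():
        for term, w in terms.items():
            index[term].append((url, w))
    for postings in index.values():
        postings.sort(key=lambda p: -p[1])
    return dict(index)

pages = crawl("a.example", WEB)
index = build_index(term_weights(pages))
print([url for url, _ in index["pages"]])
```

Looking up a term in index returns the pages that contain it, ranked by weight. The page d.example never appears because nothing links to it, which is the same reason a real crawler can miss unlinked content.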



