In the early 1990s there were only around 600 websites, few enough to manage and easy to find. The web grew further when one of the first directories, the Global Network Navigator (GNN), came into being. It paved the way for many other directories that helped people find webpages, including the popular Yahoo Web Directory. As the web expanded vastly, however, a directory could still tell “what each site is about” but no longer “what is in each site”. That gap led to the idea of WebCrawler, which evolved into the search engine. Early search engines included Excite, HotBot, and Northern Light. Then, in 1998, Google arrived and pioneered a new approach that, one could argue, killed web directories.
Google uses PageRank, an algorithm that estimates how important a page is from the links pointing to it. As the World Wide Web kept growing, it again became difficult to manage, and links were the solution. “Link” is short for hyperlink: a connection to another page or to another spot on the same page. The more links point to a page, the higher its rank, but a few links from highly ranked pages count for more than many links from low-ranked ones (always remember that search engines are smart enough to detect junk links pointing to a page). Links also tell a crawler that a website exists, so its pages can be indexed. Search engines crawl web pages using software called a spider, which follows link after link. It does not just read the words on a page; it also keeps track of their order, the relationships between pages, and so on.
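To make the link-weighting idea concrete, here is a minimal PageRank sketch using power iteration over a tiny, made-up link graph (pages A to D are hypothetical). It assumes the textbook formulation with a damping factor of 0.85; Google's production ranking combines many more signals and is not public, so treat this only as an illustration of the principle.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Textbook PageRank by power iteration.

    links maps each page to the pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal rank everywhere

    for _ in range(iterations):
        # every page gets a small baseline share (the "random jump")
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # a page splits its rank evenly among the pages it links to,
                # so one link from a high-ranked page can outweigh many weak ones
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # a page with no outgoing links spreads its rank over all pages
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: each key links to the pages in its list.
toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Page C comes out on top because every other page links to it, while D, which no page links to, keeps only the baseline share of (1 - 0.85) / 4.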
With Google dominating the internet, many companies that sell clone scripts became curious about how it works and tried to build their own. Inout Scripts is one such company: fascinated by crawler software, it came up with Search Engine Script in the early 2000s. The script used API keys from major search engines (Google, Yahoo, Bing). Inout Site Search is another useful tool; it can be integrated into any website to fetch results and provides a faster search experience.
Inout Scripts’ passion for technology did not stop there: the company built a Googlebot clone, Inout Spider, an actual crawler that builds its own database. That database is scalable, built on Hadoop. The crawler starts from a URL, called the “seed”, and crawls outward from it, building its own database as it goes. In other words, you add an initial URL, the crawler begins from there, and after a day or two you will see how much your database has grown. Doesn’t that sound awesome? With a search engine script and spider-like software, you can have your own search solution.
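The seed-and-follow-links process can be sketched as a breadth-first crawl. This is a minimal illustration, not how Inout Spider is implemented: the `get_links` hook and the example.com pages are hypothetical, and an in-memory dictionary stands in for real HTTP fetching. A production spider would also download and parse each page, respect robots.txt, and store what it finds in a scalable backend.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def crawl(seed: str, get_links: Callable[[str], Iterable[str]], max_pages: int = 100) -> List[str]:
    """Breadth-first crawl outward from a seed URL.

    get_links is a hypothetical hook that returns the URLs a page links to;
    a real spider would download the page and parse its links here.
    """
    frontier = deque([seed])  # URLs waiting to be visited, oldest first
    seen = {seed}             # every URL ever queued, so cycles are not re-crawled
    visited: List[str] = []   # crawl order; a real spider would index each page

    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Hypothetical in-memory "web" so the example runs without network access.
TOY_SITE: Dict[str, List[str]] = {
    "http://example.com/": ["http://example.com/about", "http://example.com/blog"],
    "http://example.com/about": ["http://example.com/"],
    "http://example.com/blog": ["http://example.com/blog/post-1", "http://example.com/about"],
    "http://example.com/blog/post-1": [],
}

if __name__ == "__main__":
    for page in crawl("http://example.com/", lambda url: TOY_SITE.get(url, [])):
        print(page)
```

Running it prints the four pages in breadth-first order: the home page, then /about and /blog, then /blog/post-1. The `seen` set is what keeps the crawler from looping forever when pages link back to each other, and `max_pages` caps how far a single run grows the database.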
Because search engines are more helpful, people now prefer searching to browsing a web directory. Searching saves time when you are looking for specific content on a website, and it surfaces the most relevant, matching content for you. That said, DMOZ was one of the best-known directories describing what each website was about, until it closed in 2017.