Every webmaster whose site has not been heavily penalized can open the server logs in the site's backend and see the diligent spider visiting. But have you ever considered, from a programming point of view, how the spider actually works? Opinions differ. One view holds that the spider starts from seed sites (or high-weight sites) and crawls outward layer by layer, from high weight to low. Another view holds that the spider crawls its URL collection in no obvious order: the search engine works out the best time to crawl your site from how its content is usually updated, and crawls it then.
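To make the first view concrete, here is a minimal sketch of a weight-first crawl frontier, assuming every URL already has a numeric weight. The function name `weight_first_crawl`, the link graph and the weights are hypothetical; nothing in this article says any real search engine works exactly this way.

```python
import heapq


def weight_first_crawl(seeds, links, weight, limit=100):
    """Visit URLs in descending order of a (hypothetical) weight.

    seeds:  starting URLs, e.g. high-weight sites
    links:  dict mapping a URL to the URLs it links to
    weight: dict mapping a URL to a numeric weight; unknown URLs get 0.0
    """
    frontier = []  # min-heap of (-weight, url), so the highest weight pops first
    seen = set()
    for url in seeds:
        heapq.heappush(frontier, (-weight.get(url, 0.0), url))
        seen.add(url)

    order = []
    while frontier and len(order) < limit:
        _, url = heapq.heappop(frontier)
        order.append(url)  # "fetch" the page; here we only record the visit
        for nxt in links.get(url, []):
            if nxt not in seen:  # queue each newly discovered URL once
                seen.add(nxt)
                heapq.heappush(frontier, (-weight.get(nxt, 0.0), nxt))
    return order
```

For example, with seed `a` linking to `b` (weight 0.2) and `c` (weight 0.8), and `b` linking to `d` (weight 0.9), the visit order is `a, c, b, d`: `d` outranks `b`, but it is only discovered once `b` has been crawled.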
Page collection is what we usually call spider crawling. The pages a spider (also called a robot) is interested in fall into three categories.
The three are only explained and summarized in general terms here; some of the detailed technical points will be covered separately in other articles.
1. Pages the spider has never crawled.
2. Pages the spider has crawled before, but whose content has since changed.
3. Pages the spider has crawled before, but which have since been deleted.
Discovering and crawling these three kinds of pages efficiently is the core goal of spider program design. This leads to a related question: where does a spider start crawling?
Different search engines start crawling from different points. For Baidu, Mr. Zhao leans toward the second view above, that crawl timing follows each site's update pattern. The article on the index-page link completion mechanism published on Baidu's official blog (address: 贵族宝贝stblog.baidu-tech贵族宝贝/? P=2057) clearly states that the spider "will try to detect a site's publishing cycle and check the site at a reasonable frequency."
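One way to read "detect a site's publishing cycle and check it at a reasonable frequency" is sketched below, assuming the spider records the times (in seconds) at which it found new content on a site. The function name `next_check_interval`, the halving factor and the one-hour and seven-day bounds are all hypothetical choices, not parameters published by Baidu.

```python
from typing import List


def next_check_interval(change_times: List[float],
                        min_interval: float = 3600.0,
                        max_interval: float = 7 * 86400.0) -> float:
    """Estimate a site's publishing cycle and return the wait before the next visit.

    change_times: timestamps (seconds) at which the spider saw new content.
    The bounds are hypothetical defaults: at most hourly, at least weekly.
    """
    if len(change_times) < 2:
        # Not enough history to estimate a cycle: fall back to the slowest rate.
        return max_interval
    times = sorted(change_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mean_gap = sum(gaps) / len(gaps)  # estimated publishing cycle
    # Hypothetical policy: check about twice per cycle, clamped to the bounds.
    return max(min_interval, min(max_interval, mean_gap / 2))
```

For example, `next_check_interval([0, 86400, 172800])` returns `43200.0`: a site that publishes once a day would be revisited about every 12 hours.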
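The three categories of pages listed earlier can be told apart by comparing a fresh fetch with what the index already holds. This is a minimal sketch assuming the index stores a SHA-1 fingerprint of each crawled page and that a page which no longer exists (for example, one answering HTTP 404) is passed in as `None`; `fingerprint` and `classify` are hypothetical names.

```python
import hashlib
from typing import Dict, Optional


def fingerprint(content: str) -> str:
    """Hash the page content so two versions can be compared cheaply."""
    return hashlib.sha1(content.encode("utf-8")).hexdigest()


def classify(url: str, fetched: Optional[str], index: Dict[str, str]) -> str:
    """Place a URL in one of the three categories (or 'unchanged').

    index:   maps each previously crawled URL to its stored fingerprint
    fetched: the page content just downloaded, or None if the page is gone
    """
    if url not in index:
        return "new"        # category 1: never crawled before
    if fetched is None:
        return "deleted"    # category 3: crawled before, now removed
    if fingerprint(fetched) != index[url]:
        return "changed"    # category 2: crawled before, content differs
    return "unchanged"      # nothing new for the spider to process
```

The fourth result, `unchanged`, covers pages that fall into none of the three categories and need no further work.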
What matters most to a search engine? Some people would say the accuracy of its results, others the richness of its results, but neither is the most critical factor. For a search engine, the most critical factor is query time. Imagine querying a keyword on Baidu and waiting 5 minutes for the results: you would soon abandon Baidu.
To meet this demanding speed requirement (the query time of today's commercial search engines is on the order of microseconds), search engines rely on caching. In other words, the results we receive for a query are not computed at the moment we ask; they have already been prepared and cached on the server. So what does the general process of a search engine look like? It can be understood in three stages.
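The article does not say which caching policy search engines use. As one common, illustrative choice, the sketch below swaps in a least-recently-used (LRU) cache keyed by the query string; `QueryCache` and the `backend` retrieval function are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, List


class QueryCache:
    """A tiny LRU cache mapping a query string to its result list."""

    def __init__(self, capacity: int, backend: Callable[[str], List[str]]):
        self.capacity = capacity
        self.backend = backend      # hypothetical slow retrieval function
        self.store = OrderedDict()  # query -> cached results, oldest first
        self.hits = 0
        self.misses = 0

    def search(self, query: str) -> List[str]:
        if query in self.store:
            self.store.move_to_end(query)  # mark as most recently used
            self.hits += 1
            return self.store[query]
        self.misses += 1
        results = self.backend(query)      # slow path: compute the results
        self.store[query] = results
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used query
        return results
```

Popular queries stay in the cache and are answered without touching the backend. A real engine would also have to refresh stale entries when the index changes, which this sketch ignores.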