Distributed Web Search

Speaker:  Ricardo Baeza-Yates – Palo Alto, CA, United States
Topic(s):  Artificial Intelligence, Machine Learning, Computer Vision, Natural Language Processing

Abstract

In the ocean of Web data, Web search engines are the primary way to access content. As the data is on the order of petabytes, current search engines are very large centralized systems based on replicated clusters. Web data, however, is always evolving. The number of active Web sites continues to grow (close to 200 million in 2017), and there are currently hundreds of billions of indexed pages. At the same time, there are more than two billion Internet users, and billions of queries are issued each day. Soon, centralized systems are likely to become less effective under such a data and query load, suggesting the need for fully distributed search engines. Such engines need to maintain high-quality answers, fast response times, high query throughput, high availability, and scalability, despite network latency and scattered data. In this talk we present the main challenges behind the design of a distributed Web retrieval system and our research on all the components of such a Web search engine: crawling, indexing, and query processing.

About this Lecture

Number of Slides:  100
Duration:  45 - 120 minutes
Languages Available:  English, Portuguese, Spanish
Last Updated: 
