Semantic Search

Speaker:  Ricardo Baeza-Yates – Palo Alto, CA, United States
Topic(s):  Artificial Intelligence, Machine Learning, Computer Vision, Natural language processing


Semantic search lies at the crossroads of information retrieval and natural language processing and is the current frontier of search technology. The first part consists of building a semantically annotated index with the help of a knowledge base. For this, we first need to predict the language of each document and parse it according to that language. Second, we need to extract all entities and concepts mentioned in the document with the help of the knowledge base. The knowledge base infrastructure needs to be language independent, and each language is instantiated in the lexicon of the knowledge base.
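
The indexing steps above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the concept identifiers, the two tiny lexicons, and the token-overlap language predictor are hypothetical stand-ins, not the speaker's system. It only shows the idea of language-independent concepts with one lexicon per language.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: concept IDs are language independent,
# and each language contributes only its own lexicon of surface forms.
LEXICON = {
    "en": {"bond": ["C_FIN_BOND"], "bank": ["C_BANK"], "movie": ["C_FILM"]},
    "es": {"bono": ["C_FIN_BOND"], "banco": ["C_BANK"], "película": ["C_FILM"]},
}

def detect_language(text):
    """Toy language predictor: pick the lexicon that matches the most tokens."""
    tokens = text.lower().split()
    return max(sorted(LEXICON),
               key=lambda lang: sum(t in LEXICON[lang] for t in tokens))

def annotate(text):
    """Return the predicted language and the concepts mentioned in the text."""
    lang = detect_language(text)
    concepts = set()
    for token in text.lower().split():
        concepts.update(LEXICON[lang].get(token, []))
    return lang, concepts

def build_index(docs):
    """Build an inverted index from concept ID to the documents mentioning it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        _, concepts = annotate(text)
        for concept in concepts:
            index[concept].add(doc_id)
    return index

docs = {
    1: "the bank issued a bond",
    2: "el banco emitió un bono",
    3: "a movie about a bank",
}
index = build_index(docs)
```

Because both lexicons map to the same concept IDs, the English and Spanish documents about a bank end up under the same index entry, which is the point of keeping the knowledge base itself language independent.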

The second part is predicting the intention behind the query, which requires semantic query understanding. This process involves the same semantic processing that is applied to documents. Then, based on all this information, we should predict one or more possible intentions, each with a certain probability, which is particularly important for ambiguous queries. These scores will be one of the inputs for the final semantic ranking. For example, given the query "bond", possible results for query understanding are a financial instrument, the movie character, a chemical reaction, or a term of endearment.
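
A minimal sketch of intent prediction for an ambiguous query follows. The sense inventory, prior weights, cue words, and the `boost` parameter are all hypothetical values chosen for illustration. A real system would presumably learn such scores rather than hard-code them.

```python
# Hypothetical sense inventory for the surface form "bond".
# Priors and cue words are invented for this example.
SENSES = {
    "bond": {
        "financial_instrument": {"prior": 4.0, "cues": {"yield", "treasury", "price"}},
        "movie_character": {"prior": 3.0, "cues": {"james", "007", "movie"}},
        "chemistry": {"prior": 2.0, "cues": {"covalent", "ionic", "molecule"}},
        "endearment": {"prior": 1.0, "cues": {"family", "emotional"}},
    }
}

def predict_intents(query, boost=5.0):
    """Return a probability for each candidate intent of the query.

    Each sense starts from its prior and gains `boost` for every query
    token that is one of its cue words. Scores are then normalized.
    """
    tokens = query.lower().split()
    scores = {}
    for token in tokens:
        for sense, info in SENSES.get(token, {}).items():
            hits = sum(t in info["cues"] for t in tokens)
            scores[sense] = scores.get(sense, 0.0) + info["prior"] + boost * hits
    total = sum(scores.values())
    return {s: v / total for s, v in scores.items()} if total else {}

ambiguous = predict_intents("bond")
specific = predict_intents("james bond movie")
```

With no context, the query "bond" keeps all four intentions alive with different probabilities. Adding context words shifts most of the probability mass to a single intention.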

Semantic ranking refers to ranking search results using semantic information. In a standard search engine, a rank is computed by using signals or features coming from the search query, from the documents in the collection being searched, and from the search context, such as the language and device being used. In our case, we add semantic relations between the entities and concepts found in the query and the same objects in the documents, which come from different data sources. For this we use machine learning in several stages. The first stage selects the data sources that we should use to answer the query. In the second stage, each data source generates a set of answers using "learning to rank." The third and final stage ranks these data sources, selecting and ordering the intentions as well as the answers inside each intention (e.g., news) that will appear in the final composite answer. All these stages are language independent, but may use language-dependent features.
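
The three stages can be sketched as follows. This is an illustration under stated assumptions, not the speaker's method: the source names, feature names, threshold, and weights are hypothetical, and a fixed linear scoring function stands in for a trained learning-to-rank model.

```python
def select_sources(intents, source_topics, threshold=0.15):
    """Stage 1: keep sources that serve at least one sufficiently likely intent."""
    return [source for source, topics in source_topics.items()
            if any(intents.get(t, 0.0) >= threshold for t in topics)]

def rank_within_source(answers, weights):
    """Stage 2: stand-in for learning to rank, using a fixed linear model."""
    def score(answer):
        return sum(weights[f] * v for f, v in answer["features"].items())
    return sorted(answers, key=score, reverse=True)

def compose(intents, source_topics, source_answers, weights, top_k=2):
    """Stage 3: order the selected sources by the probability of their best
    matching intent and keep the top answers of each one."""
    def source_score(source):
        return max(intents.get(t, 0.0) for t in source_topics[source])
    page = []
    selected = select_sources(intents, source_topics)
    for source in sorted(selected, key=source_score, reverse=True):
        ranked = rank_within_source(source_answers[source], weights)
        page.append((source, [a["id"] for a in ranked[:top_k]]))
    return page

# Hypothetical inputs: intent probabilities from query understanding,
# plus invented sources, answers, and feature weights.
intents = {"financial_instrument": 0.4, "movie_character": 0.3,
           "chemistry": 0.2, "endearment": 0.1}
source_topics = {
    "finance_news": ["financial_instrument"],
    "movies": ["movie_character"],
    "dictionary": ["endearment"],
}
source_answers = {
    "finance_news": [
        {"id": "f1", "features": {"text_match": 0.5, "entity_match": 0.2}},
        {"id": "f2", "features": {"text_match": 0.4, "entity_match": 0.9}},
    ],
    "movies": [{"id": "m1", "features": {"text_match": 0.9, "entity_match": 0.8}}],
    "dictionary": [{"id": "d1", "features": {"text_match": 0.7, "entity_match": 0.1}}],
}
weights = {"text_match": 1.0, "entity_match": 2.0}

page = compose(intents, source_topics, source_answers, weights)
```

In this toy run the dictionary source is dropped in the first stage because its only intention falls below the threshold, and the answer with the stronger entity match is ranked first inside the finance source.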

About this Lecture

Number of Slides:  60
Duration:  45 - 90 minutes
Languages Available:  English, Portuguese, Spanish
Last Updated: 
