Parameswaran, Aditya and Kaushik, Raghav and Arasu, Arvind Efficient Parsing-based Keyword Search over Databases. Technical Report. Stanford InfoLab.
BibTeX | DublinCore | EndNote | HTML |
![]() | PDF 1416Kb |
Abstract
We study a parsing-based semantics for keyword search over databases that relies on parsing the search query using a grammar. The parsing-based semantics is often used to override the traditional “bag-of-words” semantics in web search and enterprise search scenarios. Compared to the “bag-of-words” semantics, the parsing-based semantics is richer and more customizable. While a formalism for parsing-based semantics for keyword search has been proposed in prior work and ad-hoc implementations exist, the problem of designing efficient algorithms to support the semantics is largely unstudied. In this paper, we present a suite of efficient algorithms and auxiliary indexes for this problem. Our algorithms work for a broad classes of grammars used in practice, and cover a variety of database matching functions (set- and substring-containment, approximate and exact equality) and scoring functions to filter and rank different parses. We formally analyze the running time complexity of our algorithms and provide a thorough empirical evaluation over real-world data to show that our algorithms scale well with the size of the database and grammar
Item Type: | Techreport (Technical Report) |
---|---|
ID Code: | 1057 |
Deposited By: | Aditya Parameswaran |
Deposited On: | 17 Oct 2012 20:39 |
Last Modified: | 17 Oct 2012 20:39 |
Download statistics
Repository Staff Only: item control page