Parameswaran, Aditya and Garcia-Molina, Hector and Rajaraman, Anand (2010) Towards the Web of Concepts: Extracting Concepts from Large Datasets. Proceedings of the Very Large Data Bases Conference (VLDB), 3 (1-2).
Abstract
Concepts are sequences of words that represent real or imaginary entities or ideas that users are interested in. As a first step towards building a web of concepts that will form the backbone of the next generation of search technology, we develop a novel technique to extract concepts from large datasets. We approach the problem of concept extraction from corpora as a market-baskets problem, adapting statistical measures of support and confidence. We evaluate our concept extraction algorithm on datasets containing data from a large number of users (e.g., the AOL query log data set), and we show that a high-precision concept set can be extracted.
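To make the market-basket framing concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each query is a basket of its contiguous word n-grams, uses the textbook association-rule definitions (support as the number of queries containing an n-gram, confidence as the support of the n-gram divided by the support of its prefix), and keeps multi-word n-grams that pass both thresholds. The function names, the thresholds, and the toy queries are all hypothetical and chosen only for illustration; the paper's adapted measures may differ.

```python
from collections import Counter


def ngrams(words, n):
    """Return the contiguous n-grams of a word list as tuples."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def support_counts(queries, max_n=3):
    """Count, for every n-gram of up to max_n words, how many queries contain it."""
    counts = Counter()
    for query in queries:
        words = query.lower().split()
        seen = set()
        for n in range(1, max_n + 1):
            seen.update(ngrams(words, n))
        counts.update(seen)  # each query contributes at most once per n-gram
    return counts


def confidence(counts, gram):
    """Confidence of the rule prefix -> last word: support(gram) / support(prefix)."""
    prefix = gram[:-1]
    if not prefix or counts[prefix] == 0:
        return 0.0
    return counts[gram] / counts[prefix]


def extract_candidates(queries, min_support=2, min_confidence=0.6, max_n=3):
    """Keep multi-word n-grams that meet both (hypothetical) thresholds."""
    counts = support_counts(queries, max_n)
    return sorted(
        " ".join(gram)
        for gram in counts
        if len(gram) > 1
        and counts[gram] >= min_support
        and confidence(counts, gram) >= min_confidence
    )


# Made-up toy queries, not taken from any real query log.
queries = [
    "stanford university",
    "stanford university admissions",
    "stanford university map",
    "stanford hospital",
    "university of texas",
]
print(extract_candidates(queries))  # prints ['stanford university']
```

In this toy run, "stanford university" appears in 3 queries and its prefix "stanford" in 4, giving a confidence of 0.75, so it is the only candidate that survives. Every other multi-word n-gram occurs in a single query and fails the support threshold.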
Item Type: | Article |
---|---|
Uncontrolled Keywords: | concepts, concept mining, web of concepts, information extraction, query logs, association-rule mining, algorithms, experimentation |
Projects: | Miscellaneous |
ID Code: | 917 |
Deposited By: | Aditya Parameswaran |
Deposited On: | 09 Apr 2009 13:40 |
Last Modified: | 01 Jul 2011 15:17 |