Schema-agnostic queries for large-schema databases: A distributional semantics approach

The evolution of data environments towards growth in the size, complexity, dynamicity and decentralisation (SCoDD) of schemas drastically impacts contemporary data management. The SCoDD trend emerges as a central data management concern in Big Data scenarios, where users and applications demand more complete data, produced by independent data sources under different semantic assumptions and contexts of use. Most Database Management Systems (DBMSs) today target a closed communication scenario, where the symbolic schema of the database is known a priori by the database user, who is able to interpret it unambiguously. The context in which the data is consumed and produced is well defined, and it is typically the same context in which the data was created. In contrast, data management under SCoDD conditions targets an open communication scenario, where the symbolic system of the database is unknown to the user and multiple interpretation contexts are possible. In this case the database may be created under a different context from that of the database user. The emergence of this new data environment demands revisiting the semantic assumptions behind databases and designing data access mechanisms that can support semantically heterogeneous (open communication) data environments.


This work aims at filling this gap by proposing a complementary semantic model for databases, based on distributional semantic models. Distributional semantics provides a complementary perspective to the formal perspective of database semantics, one that supports semantic approximation as a first-class database operation. Unlike models that describe uncertain and incomplete data, or probabilistic databases, distributional-relational models focus on the construction of conceptual approximation approaches for databases, supported by a comprehensive semantic model automatically built from large-scale unstructured data external to the database, which serves as a semantic/commonsense knowledge base. The semantic model can be used to support schema-agnostic queries, i.e. queries that abstract the data consumer from the specific conceptualization behind the data.
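As an illustration of semantic approximation, the following minimal Python sketch ranks schema terms by their distributional relatedness to a query term. It is not the project's implementation: the vectors are hand-made toy values (a real model would derive them from a large text corpus), and the names VECTORS, cosine and rank_schema_terms are hypothetical.

```python
import math

# Toy distributional vectors (hypothetical weights, not derived from a corpus).
# Each term is described by the contexts it is assumed to co-occur with.
VECTORS = {
    "wife":       {"marriage": 0.9, "family": 0.6, "person": 0.3},
    "spouse":     {"marriage": 0.8, "family": 0.5, "person": 0.4},
    "birthPlace": {"born": 0.9, "city": 0.7, "person": 0.2},
    "almaMater":  {"university": 0.9, "education": 0.8, "person": 0.2},
}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(dim, 0.0) for dim, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_schema_terms(query_term, schema_terms, vectors=VECTORS):
    """Rank schema terms by relatedness to a query term, best match first."""
    query_vec = vectors[query_term]
    scored = [(term, cosine(query_vec, vectors[term])) for term in schema_terms]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# The user asks about a "wife"; the (assumed) schema only has these predicates.
ranking = rank_schema_terms("wife", ["spouse", "birthPlace", "almaMater"])
for term, score in ranking:
    print(f"{term}: {score:.3f}")
```

With these toy values the query term "wife" is mapped to the schema predicate "spouse" although the two strings share no characters, which is the kind of vocabulary gap a schema-agnostic query has to bridge.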


The proposed distributional-relational semantic model is supported by a distributional structured vector space model, named τ-Space, which represents structured data under a distributional semantic model. In coordination with a query planning approach, the τ-Space supports a schema-agnostic query mechanism for large-schema databases. The query mechanism is materialized in the Treo query engine and is evaluated using schema-agnostic natural language queries.


The evaluation of the query mechanism confirms that distributional semantics provides a high-recall, medium-high precision, and low-maintenance solution to cope with the abstraction and conceptual-level differences in schema-agnostic queries over large-schema/schema-less open domain datasets. Moreover, the compositional semantic model defined by the query planning mechanism supports expressive schema-agnostic queries over such datasets. The proposed distributional-relational structured vector space model (τ-Space), materialized as an inverted index, supports the development of a schema-agnostic query mechanism with interactive query execution times.
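To give a rough idea of why an inverted index helps with interactive execution times, the sketch below stores sparse vectors for schema elements in an inverted index and scores a query by touching only the posting lists of its non-zero dimensions. The index layout, the toy weights and the names build_index and search are assumptions made for illustration; the source states only that the τ-Space is materialized as an inverted index.

```python
from collections import defaultdict

# Hypothetical sparse vectors for three schema elements (toy weights).
ELEMENTS = {
    "spouse":     {"marriage": 0.8, "family": 0.5},
    "birthPlace": {"born": 0.9, "city": 0.7},
    "almaMater":  {"university": 0.9, "education": 0.8},
}


def build_index(element_vectors):
    """Map each vector dimension to the elements that have a weight on it."""
    index = defaultdict(dict)
    for element, vector in element_vectors.items():
        for dim, weight in vector.items():
            index[dim][element] = weight
    return index


def search(index, query_vector, top_k=3):
    """Score elements by dot product, reading only the query's posting lists."""
    scores = defaultdict(float)
    for dim, query_weight in query_vector.items():
        for element, weight in index.get(dim, {}).items():
            scores[element] += query_weight * weight
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


index = build_index(ELEMENTS)
# Toy vector for a query term such as "wife" (hypothetical weights).
results = search(index, {"marriage": 0.9, "family": 0.6})
print(results)
```

Elements that share no dimension with the query are never visited, so the cost of a lookup depends on the length of the relevant posting lists rather than on the total number of schema elements.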

Team

André Freitas

Dr Edward Curry

Institution: NUI Galway

Funder

Relevant Publications

2020
[30] André Freitas, Seán O'Riáin, Edward Curry, "Querying and Searching Heterogeneous Knowledge Graphs in Real-time Linked Dataspaces", Chapter in Real-time Linked Dataspaces, Springer International Publishing, Cham, pp. 105-124, 2020.
2016
[29] André Freitas, Edward Curry, "Big Data Curation", Chapter in New Horizons for a Data-Driven Economy: A Roadmap for Usage and Exploitation of Big Data in Europe, Springer International Publishing, 2016.
2015
[28] André Freitas, João C.P. da Silva, Edward Curry, Paul Buitelaar, "Approximate and selective reasoning on knowledge graphs: A distributional semantics approach", In Data & Knowledge Engineering, vol. 100, pp. 211-225, 2015.
[27] André Freitas, Siegfried Handschuh, Edward Curry, "Distributional-Relational Models: Scalable Semantics for Databases", In AAAI Spring Symposium Series (SSS-15), pp. 57-60, 2015.
[26] André Freitas, Juliano Efson Sales, Siegfried Handschuh, Edward Curry, "How hard is this query? Measuring the Semantic Complexity of Schema-agnostic Queries", In 11th International Conference on Computational Semantics (IWCS 2015), London, UK, pp. 294-304, 2015.
2014
[25] André Freitas, Joao C. Pereira da Silva, Edward Curry, "On the Semantic Mapping of Schema-agnostic Queries: A Preliminary Study", In Natural Language Interfaces for Web of Data Workshop 2014, Riva del Garda, 2014.
[24] André Freitas, João Carlos Pereira da Silva, Edward Curry, Paul Buitelaar, "A Distributional Semantics Approach for Selective Reasoning on Commonsense Graph Knowledge Bases", Chapter in 19th International Conference on Application of Natural Language to Information Systems (NLDB 2014), Springer, Montpellier, France, pp. 21-32, 2014.
[23] André Freitas, Edward Curry, "Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach", In 18th International Conference on Intelligent User Interfaces (IUI'14), ACM, Haifa, Israel, pp. 279-288, 2014.
[22] Danilo Carvalho, Cagatay Calli, André Freitas, Edward Curry, "EasyESA: A Low-effort Infrastructure for Explicit Semantic Analysis (Demo)", In 13th International Semantic Web Conference (ISWC 2014), Springer, Riva del Garda, pp. 177-180, 2014.
[21] André Freitas, Rafael Vieira, Edward Curry, Danilo Carvalho, João Carlos Pereira da Silva, "On the Semantic Representation and Extraction of Complex Category Descriptors", Chapter in 19th International Conference on Application of Natural Language to Information Systems (NLDB 2014), Springer, Montpellier, France, pp. 45-50, 2014.
2013
[20] André Freitas, João Gabriel Oliveira, Seán O'Riain, João C.P. da Silva, Edward Curry, "Querying linked data graphs using semantic relatedness: A vocabulary independent approach", In Data & Knowledge Engineering, vol. 88, pp. 126-141, 2013.
[19] André Freitas, Sean O'Riain, Edward Curry, Joao C.P. da Silva, Danilo S. Carvalho, "Representing Texts as Contextualized Entity-Centric Linked Data Graphs", In 2013 24th International Workshop on Database and Expert Systems Applications, IEEE, Prague, pp. 133-137, 2013.
[18] André Freitas, Fabrício F. de Faria, Seán O'Riain, Edward Curry, "Answering natural language queries over linked data graphs", In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval, ACM, New York, NY, USA, pp. 1107-1108, 2013.
[17] André Freitas, Edward Curry, "Do it yourself (DIY) Jeopardy QA System (Demo)", In 12th International Semantic Web Conference (ISWC 2013), Sydney, Australia, pp. 1-4, 2013.
[16] André Freitas, Joao C. P. da Silva, Sean O'Riain, Edward Curry, "Distributional Relational Networks", In AAAI 2013 Fall Symposium on Semantics for Big Data, pp. 1-6, 2013.
[15] André Freitas, Sean O'Riain, Edward Curry, "Crossing the Vocabulary Gap for Querying Complex and Heterogeneous Databases: A Distributional-Compositional Semantics Perspective (Abstract)", In 3rd Workshop on Data Extraction and Object Search (DEOS), 29th British National Conference on Databases (BNCOD), Oxford, UK, 2013.
[14] André Freitas, Seán O'Riain, Edward Curry, "A Distributional Semantic Search Infrastructure for Linked Dataspaces", Chapter in 10th European Semantic Web Conference (ESWC), Springer, Montpellier, France, pp. 214-218, 2013.
2012
[13] Tope Omitola, André Freitas, Edward Curry, Seán O'Riain, Nicholas Gibbins, Nigel Shadbolt, "Capturing Interactive Data Transformation Operations Using Provenance Workflows", Chapter in 3rd International Workshop on Role of Semantic Web in Provenance Management (SWPM 2012), pp. 29-42, 2012.
[12] André Freitas, Benedikt Kämpgen, João Gabriel Oliveira, Seán O'Riain, Edward Curry, "Representing Interoperable Provenance Descriptions for ETL Workflows", Chapter in 3rd International Workshop on Role of Semantic Web in Provenance Management (SWPM 2012), pp. 43-57, 2012.
[11] André Freitas, Edward Curry, João Gabriel Oliveira, Sean O'Riain, "Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches, and Trends", In IEEE Internet Computing, vol. 16, no. 1, pp. 24-33, 2012.
[10] André Freitas, Edward Curry, Seán O'Riain, "A distributional approach for terminological semantic search on the Linked Data Web", In Proceedings of the 27th Annual ACM Symposium on Applied Computing - SAC '12, ACM Press, New York, New York, USA, pp. 384-392, 2012.
[9] André Freitas, Danilo Carvalho, João Carlos Silva, Sean O'Riain, Edward Curry, "A Semantic Best-Effort Approach for Extracting Structured Discourse Graphs from Wikipedia", In 1st Workshop on the Web of Linked Entities (WoLE 2012), Boston, MA, pp. 70-81, 2012.
2011
[8] André Freitas, Joao Gabriel Oliveira, Edward Curry, Seán O'Riain, "A Multidimensional Semantic Space for Data Model Independent Queries over RDF Data", In 2011 IEEE Fifth International Conference on Semantic Computing, IEEE, pp. 344-351, 2011.
[7] André Freitas, Tomas Knap, Sean O'Riain, Edward Curry, "W3P: Building an OPM based provenance model for the Web", In Future Generation Computer Systems, vol. 27, no. 6, pp. 766-774, 2011.
[6] André Freitas, Edward Curry, Joao Gabriel Oliveira, Sean O'Riain, "A Distributional Structured Semantic Space for Querying RDF Graph Data", In International Journal of Semantic Computing, vol. 5, no. 4, pp. 433-462, 2011.
[5] André Freitas, João Gabriel de Oliveira, Sean O'Riain, Edward Curry, João Carlos Pereira da Silva, "Treo: Combining Entity-Search, Spreading Activation and Semantic Relatedness for Querying Linked Data", In 1st Workshop on Question Answering over Linked Data (QALD-1), pp. 1-14, 2011.
[4] André Freitas, João Oliveira, Sean O'Riain, Edward Curry, João Pereira da Silva, "Querying Linked Data using Semantic Relatedness: A Vocabulary Independent Approach", In Proceedings of the 16th International Conference on Applications of Natural Language to Information Systems, NLDB 2011, Springer Berlin Heidelberg, vol. 6716, Berlin, Heidelberg, pp. 40-51, 2011.
[3] André Freitas, João Oliveira, Seán O'Riain, Edward Curry, João Pereira da Silva, "Natural Language Processing and Information Systems", Springer Berlin Heidelberg, vol. 6716, Berlin, Heidelberg, pp. 286-289, 2011.
2010
[2] André Freitas, Arnaud Legendre, Sean O'Riain, Edward Curry, "Prov4J: A Semantic Web Framework for Generic Provenance Management", In The Second International Workshop on Role of Semantic Web in Provenance Management (SWPM 2010), pp. 1-6, 2010.
[1] Edward Curry, André Freitas, Sean O'Riáin, "The Role of Community-Driven Data Curation for Enterprises", Chapter in Linking Enterprise Data, Springer US, Boston, MA, pp. 25-47, 2010.