Loose coupling in heterogeneous event-based systems via approximate semantic matching and dynamic enrichment

There has been a significant change in the data landscape with the emergence of the Internet of Things (IoT). Tens of billions of devices are expected to connect to the Internet in the coming years within smart buildings, smart grids, smart cities, and cyber-physical systems. A basic requirement for realizing the IoT is an infrastructure of sensing and communication solutions. Middleware, such as event processing systems, is also required to shield application developers from the underlying technologies.


Large-scale event processing environments are open, distributed, and heterogeneous in semantics and context. Interoperability is a key requirement, and it is currently addressed by top-down, fine-grained agreements on semantics, represented as ontologies and taxonomies. Such approaches do not scale, and reaching such agreements may be infeasible given the characteristics of current and future event environments such as the IoT. This thesis analyses the problem through a framework that trades off decoupling against coupling.

Event producers and consumers do not know each other; they are decoupled in space, time, and synchronization to enable scalable deployments. To communicate with other systems, they have to cross boundaries at three levels: syntactic, semantic, and pragmatic. Events are boundary objects that convey meanings signified by symbols, and they must cross all three levels of boundaries effectively to establish interoperability and communication between event agents.

The current event processing paradigm focuses on crossing the lowest, syntactic boundary. As a result, human agents are needed in the loop to cross the semantic and pragmatic boundaries through explicit agreements on event types, properties, values, and contexts, which introduces coupling into these systems. This coupling limits the paradigm and contradicts its fundamental basis of decoupling for scalability. The result is a trade-off between decoupling for scalability and coupling for interoperability.
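To make this coupling concrete, the following minimal Python sketch shows classic exact content-based matching. The event, the subscriptions, the attribute names, and the `exact_match` helper are hypothetical and invented for illustration; the point is only that a subscription matches when, and only when, the consumer uses exactly the producer's terms.

```python
def exact_match(subscription: dict, event: dict) -> bool:
    """Exact content-based matching: every attribute/value predicate in the
    subscription must appear verbatim in the event."""
    return all(event.get(attr) == value for attr, value in subscription.items())


# Hypothetical event published in the producer's vocabulary.
event = {"type": "increased energy usage", "device": "laptop", "office": "room 112"}

# The consumer means the same thing but uses different terms.
subscription = {"type": "increased power consumption", "device": "computer"}
print(exact_match(subscription, event))  # False

# Only after both sides agree on the exact terms does the event match.
agreed = {"type": "increased energy usage", "device": "laptop"}
print(exact_match(agreed, event))  # True
```

Each such agreement on terms has to be negotiated by people, which is the semantic coupling described above.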

The space, time, and synchronization dimensions of decoupling concern the transfer of events. I define two new, problematic coupling dimensions: semantic coupling and pragmatic coupling. They correspond to fine-grained, labour-intensive agreements on event semantics and contexts made by the humans who develop and use the event system. Such agreements may not be feasible in large-scale environments such as the IoT. Current approaches to semantic and context interoperability in event processing are coupled along one or both of these dimensions, which limits their scalability.


This thesis addresses two research questions: how semantic coupling and pragmatic coupling can each be loosened effectively and efficiently. I propose an approach based on four elements: subsymbolic semantics, free tagging, dynamic native enrichment, and approximation. A statistical vector-space model of semantics is built from a textual corpus that reflects the mutual understanding of event producers and consumers. Subscriptions are consumers’ expressions for matching events of interest. Free tags, called themes, are added to events and subscriptions to clarify their intended meaning. Subscriptions are also extended with indications of context, which are used to enrich events dynamically. Terms in events and subscriptions are mapped to their subsymbolic vector representations, which are then compared by an approximate probabilistic matcher that scores the relevance of each event to each subscription.
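The matching step can be sketched in Python under strong simplifying assumptions. The sketch swaps in a simple sentence-level co-occurrence model over a toy corpus for the thesis's statistical vector-space model, marks approximate subscription terms with a trailing `~`, maps each subscription predicate to its best-matching event attribute/value pair, and multiplies the per-predicate similarities into a relevance score. The corpus, stopword list, and all function names are hypothetical, and the scoring is a simplification rather than the thesis's probabilistic matcher.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for a real textual corpus.
CORPUS = [
    "energy usage of the laptop increased in the office",
    "power consumption of the computer increased in the office",
    "the laptop is a portable computer",
    "energy usage and power consumption are measured by the meter",
    "the room temperature increased in the office",
]
STOPWORDS = {"the", "of", "in", "is", "a", "and", "are", "by"}


def build_vectors(corpus):
    """Represent each word by counts of the words it co-occurs with in a sentence."""
    vectors = defaultdict(Counter)
    for sentence in corpus:
        words = [w for w in sentence.lower().split() if w not in STOPWORDS]
        for i, word in enumerate(words):
            for j, other in enumerate(words):
                if i != j:
                    vectors[word][other] += 1
    return vectors


VECTORS = build_vectors(CORPUS)


def term_vector(term):
    """A multi-word term is represented by the sum of its word vectors."""
    vec = Counter()
    for word in term.lower().split():
        vec.update(VECTORS.get(word, {}))
    return vec


def cosine(u, v):
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity(sub_term, event_term):
    """Exact comparison by default; cosine similarity when marked with '~'."""
    approximate = sub_term.endswith("~")
    sub_term = sub_term.rstrip("~")
    if sub_term == event_term:
        return 1.0
    return cosine(term_vector(sub_term), term_vector(event_term)) if approximate else 0.0


def relevance(subscription, event):
    """Product over predicates of the best attribute/value similarity in the event."""
    score = 1.0
    for attr, value in subscription.items():
        score *= max(
            (similarity(attr, e_attr) * similarity(value, e_value)
             for e_attr, e_value in event.items()),
            default=0.0,
        )
    return score


subscription = {"type": "power consumption~", "device": "computer~"}
e1 = {"type": "energy usage", "device": "laptop"}
e2 = {"type": "room temperature", "device": "laptop"}
print(f"e1: {relevance(subscription, e1):.3f}, e2: {relevance(subscription, e2):.3f}")
```

In this toy setup, e1 receives a higher relevance score than e2 even though the subscription shares no value term with either event; the ranking comes entirely from the corpus statistics rather than from an agreed vocabulary.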

The hypotheses underlying the proposed approach are validated empirically in synthetic and real-world scenarios from the smart city and energy management domains. Loose semantic coupling can be achieved with coarse-grained agreements on statistical semantics: 100 approximate subscriptions compensate for the 74,000 exact subscriptions that would otherwise be needed. The approximate matcher achieves a throughput on the order of 1,000 events/sec and an effectiveness of over 95% F1-measure. Thematic tagging requires only a small number of tags: around 2 to 7 for events and 2 to 15 for subscriptions. It delivers a throughput on the order of 800 events/sec in the worst case and an F1-measure of 85%, as opposed to a worst case of 62% for non-thematic processing.
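One way to picture how a few theme tags can sharpen meaning is to let the tags select the dimensions of the vector space on which terms are compared. The sketch below is a hypothetical illustration, not the thesis's exact thematic matcher: the concept dimensions, weights, and theme tags are invented, and each term is projected onto the dimensions activated by its own side's themes before cosine similarity is computed.

```python
import math

# Hypothetical concept-space vectors; dimension labels and weights are invented.
VECTORS = {
    "power": {"electricity": 4, "politics": 5, "engine": 2},
    "energy": {"electricity": 5, "physics": 3, "engine": 1},
    "electricity": {"electricity": 6, "physics": 2},
    "appliance": {"electricity": 3, "engine": 1},
}


def cosine(u, v):
    dot = sum(w * v.get(k, 0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def theme_dimensions(themes):
    """Dimensions activated by the theme tags: the union of their nonzero dimensions."""
    dims = set()
    for tag in themes:
        dims.update(k for k, w in VECTORS.get(tag, {}).items() if w > 0)
    return dims


def project(term, themes):
    """Keep only the term's weights on the dimensions the themes activate."""
    dims = theme_dimensions(themes)
    return {k: w for k, w in VECTORS.get(term, {}).items() if k in dims}


def thematic_similarity(event_term, event_themes, sub_term, sub_themes):
    return cosine(project(event_term, event_themes), project(sub_term, sub_themes))


plain = cosine(VECTORS["energy"], VECTORS["power"])
themed = thematic_similarity("energy", ["appliance", "electricity"], "power", ["electricity"])
print(f"without themes: {plain:.3f}, with themes: {themed:.3f}")
```

Here the hypothetical "politics" sense of "power" is dropped once the subscription carries an electricity theme, so its similarity to "energy" rises. The sketch uses one or two tags per side, which is within the small tag counts reported above.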


Loose pragmatic coupling is achieved with 4 high-level clauses in subscriptions that guide the dynamic enricher. They specify the context source, the retrieval method, the context search strategy, and the method for fusing events with context. Enrichment is instantiated with spreading activation over Linked Data graphs. It is tested with 24,000 events, using live DBpedia, a structured version of Wikipedia, as the contextual source, and it reaches an efficiency and effectiveness 7 times higher than those of other instantiations of the enricher. The research discussed in this thesis has been deployed in working systems for energy and water management, where it has had an impact on real-world applications. The model has also been developed into the concept of thingsonomies, an architecture for the Internet of Things that can tackle variety and allows IoT systems to evolve into large-scale, heterogeneous, and loosely coupled environments.
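The enrichment step can be sketched as follows. The clause names and values, the `SOURCES` registry, and the small in-memory graph standing in for live DBpedia are all hypothetical. The spreading activation shown is a generic variant (each node's activation is split evenly among its neighbours, multiplied by a decay factor, and pruned below a threshold), not necessarily the exact algorithm evaluated in the thesis.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EnrichmentClauses:
    """Hypothetical encoding of the four high-level subscription clauses."""
    source: str      # where the context comes from
    retrieval: str   # how the context is retrieved
    search: str      # context search strategy
    fusion: str      # how the context is fused with the event


# Tiny in-memory graph standing in for a Linked Data source such as DBpedia.
SOURCES = {
    "toy-graph": {
        "laptop": ["personal computer", "battery"],
        "personal computer": ["laptop", "electricity"],
        "battery": ["laptop", "electricity", "energy storage"],
        "electricity": ["energy", "personal computer", "battery"],
        "energy storage": ["battery", "energy"],
        "energy": ["electricity", "energy storage"],
    }
}


def spreading_activation(graph, seeds, decay=0.5, threshold=0.1, max_steps=3):
    """Spread activation outward from the seed nodes, pruning weak pulses."""
    activation = {node: 1.0 for node in seeds}
    frontier = dict(activation)
    for _ in range(max_steps):
        pulses = defaultdict(float)
        for node, level in frontier.items():
            neighbours = graph.get(node, [])
            for neighbour in neighbours:
                pulses[neighbour] += level * decay / len(neighbours)
        frontier = {n: a for n, a in pulses.items() if a >= threshold}
        if not frontier:
            break
        for node, level in frontier.items():
            activation[node] = activation.get(node, 0.0) + level
    return activation


def enrich(event, seed_attribute, clauses):
    if clauses.retrieval != "in-memory" or clauses.search != "spreading-activation":
        raise ValueError("only the in-memory spreading-activation variant is sketched")
    graph = SOURCES[clauses.source]
    seeds = [event[seed_attribute]]
    activation = spreading_activation(graph, seeds)
    # Rank context nodes by activation (ties broken alphabetically), excluding seeds.
    context = [n for n, _ in sorted(activation.items(), key=lambda kv: (-kv[1], kv[0]))
               if n not in seeds]
    if clauses.fusion == "append-context":
        return {**event, "context": context}
    raise ValueError(f"unsupported fusion method: {clauses.fusion}")


clauses = EnrichmentClauses(source="toy-graph", retrieval="in-memory",
                            search="spreading-activation", fusion="append-context")
event = {"type": "increased energy usage", "device": "laptop"}
print(enrich(event, "device", clauses))
```

With these parameters, activation reaches "electricity" two hops from "laptop" but dies out before "energy"; the decay factor and threshold control how far context spreads, which is the kind of decision the context search clause leaves to the subscriber rather than to a prior agreement.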

Team

Souleiman Hasan

Dr Edward Curry

Institution: NUI Galway


Relevant Publications

2021
[28] Edward Curry, Sonja Zillner, Andreas Metzger, Arne J. Berre, Sören Auer, Ray Walshe, Marija Despenic, Milan Petkovic, Dumitru Roman, Walter Waterfeld, Robert Seidl, Souleiman Hasan, Umair ul Hassan, Adegboyega Ojo, "Technical Research Priorities for Big Data", Chapter in The Elements of Big Data Value, Springer International Publishing, Cham, pp. 97-126, 2021.
2020
[27] Souleiman Hasan, Edward Curry, "Approximate Semantic Event Processing in Real-time Linked Dataspaces", Chapter in Real-time Linked Dataspaces, Springer International Publishing, Cham, pp. 209-225, 2020.
[26] Edward Curry, Willem Fabritius, Souleiman Hasan, Christos Kouroupetroglou, Umair ul Hassan, Wassim Derguech, "A Model for Internet of Things Enhanced User Experience in Smart Environments", Chapter in Real-time Linked Dataspaces, Springer International Publishing, Cham, pp. 271-294, 2020.
[25] Edward Curry, Wassim Derguech, Souleiman Hasan, Christos Kouroupetroglou, Umair ul Hassan, Willem Fabritius, "Building Internet of Things-Enabled Digital Twins and Intelligent Applications Using a Real-time Linked Dataspace", Chapter in Real-time Linked Dataspaces, Springer International Publishing, Cham, pp. 255-270, 2020.
2019
[24] Edward Curry, Wassim Derguech, Souleiman Hasan, Christos Kouroupetroglou, Umair ul Hassan, "A Real-time Linked Dataspace for the Internet of Things: Enabling “Pay-As-You-Go” Data Management in Smart Environments", In Future Generation Computer Systems, vol. 90, pp. 405-422, 2019.
2018
[23] Edward Curry, Souleiman Hasan, Christos Kouroupetroglou, Willem Fabritius, Umair ul Hassan, Wassim Derguech, "Internet of Things Enhanced User Experience for Smart Water and Energy Management", In IEEE Internet Computing, vol. 22, no. 1, pp. 18-28, 2018.
2017
[22] Piyush Yadav, Umair ul Hassan, Souleiman Hasan, Edward Curry, "The Event Crowd", In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems, ACM, New York, NY, USA, pp. 44-53, 2017.
[21] Tarek Zaarour, Niki Pavlopoulou, Souleiman Hasan, Umair ul Hassan, Edward Curry, "Automatic Anomaly Detection over Sliding Windows", In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems, ACM, New York, NY, USA, pp. 310-314, 2017.
[20] Asra Aslam, Souleiman Hasan, Edward Curry, "Challenges with Image Event Processing", In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems, ACM, New York, NY, USA, pp. 347-348, 2017.
[19] Piyush Yadav, Souleiman Hasan, Adegboyega Ojo, Edward Curry, "The Role of Open Data in Driving Sustainable Mobility in Nine Smart Cities", In 25th European Conference on Information Systems (ECIS 2017), Guimarães, Portugal, pp. 1248-1263, 2017.
[18] Umair ul Hassan, Souleiman Hasan, Wassim Derguech, Louise Hannon, Eoghan Clifford, Christos Kouroupetroglou, Sander Smit, Edward Curry, "Water Analytics and Management with Real-Time Linked Dataspaces", Chapter in Government 3.0 – Next Generation Government Technology Infrastructure and Services, Springer, pp. 173-196, 2017.
[17] Souleiman Hasan, Edward Curry, "Word Re-Embedding via Manifold Dimensionality Retention", In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, pp. 321-326, 2017.
2015
[16] Wassim Derguech, Sami Bhiri, Souleiman Hasan, Edward Curry, "Using Formal Concept Analysis for Organizing and Discovering Sensor Capabilities", In The Computer Journal, vol. 58, no. 3, pp. 356-367, 2015.
[15] Souleiman Hasan, Edward Curry, "Tackling Variety in Event-based Systems", In The 9th ACM International Conference on Distributed Event-Based Systems (DEBS 2015), ACM New York, NY, USA, Oslo, Norway, pp. 256-265, 2015.
[14] Souleiman Hasan, Edward Curry, "Thingsonomy: Tackling Variety in Internet of Things Events", In IEEE Internet Computing, vol. 19, no. 2, pp. 10-18, 2015.
2014
[13] Souleiman Hasan, Edward Curry, "Approximate Semantic Matching of Events for the Internet of Things", In ACM Transactions on Internet Technology, vol. 14, no. 1, pp. 1-23, 2014.
[12] Souleiman Hasan, Edward Curry, "Thematic Event Processing", In Proceedings of the 15th International Middleware Conference (Middleware '14), ACM, New York, NY, USA, pp. 109-120, 2014.
2013
[11] Wassim Derguech, Souleiman Hasan, Sami Bhiri, Edward Curry, "Organizing Capabilities Using Formal Concept Analysis", In 2013 Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, IEEE, Hammamet, Tunisia, pp. 260-265, 2013.
[10] James O'Donnell, Edward Corry, Souleiman Hasan, Marcus Keane, Edward Curry, "Building Performance Optimization Using Cross-Domain Scenario Modeling, Linked Data, and Complex Event Processing", In Building and Environment, vol. 62, pp. 102-111, 2013.
[9] Souleiman Hasan, Richard C. Medland, Marcus Foth, Edward Curry, "Curbing resource consumption using team-based feedback: paper printing in a longitudinal case study", In 8th International Conference on Persuasive Technology (PERSUASIVE 2013), Springer-Verlag, Sydney, NSW, pp. 75-86, 2013.
[8] Souleiman Hasan, Sean O'Riain, Edward Curry, "Towards Unified and Native Enrichment in Event Processing Systems", In 7th ACM International Conference on Distributed Event-Based Systems (DEBS 2013), ACM, Arlington, Texas, USA, pp. 171-182, 2013.
[7] Souleiman Hasan, Kalpa Gunaratna, Yongrui Qin, Edward Curry, "Approximate Semantic Matching in the COLLIDER Event Processing Engine (Demo)", In 7th ACM International Conference on Distributed Event-Based Systems (DEBS 2013), ACM, Arlington, Texas, USA, pp. 337-338, 2013.
[6] Edward Curry, James O'Donnell, Edward Corry, Souleiman Hasan, Marcus Keane, Sean O'Riain, "Linking building data in the cloud: Integrating cross-domain building data using linked data", In Advanced Engineering Informatics, vol. 27, no. 2, pp. 206-219, 2013.
2012
[5] Souleiman Hasan, Sean O'Riain, Edward Curry, "Approximate Semantic Matching of Heterogeneous Events", In 6th ACM International Conference on Distributed Event-Based Systems (DEBS 2012), ACM, Berlin, Germany, pp. 252-263, 2012.
[4] Edward Curry, Souleiman Hasan, Sean O'Riáin, "Enterprise Energy Management using a Linked Dataspace for Energy Intelligence", In The Second IFIP Conference on Sustainable Internet and ICT for Sustainability (SustainIT 2012), IEEE, Pisa, Italy, pp. 1-6, 2012.
[3] Edward Curry, Souleiman Hasan, Mark White, Hugh Melvin, "An Environmental Chargeback for Data Center and Cloud Computing Consumers", In First International Workshop on Energy-Efficient Data Centers, Springer, Madrid, Spain, pp. 117-128, 2012.
2011
[2] Souleiman Hasan, Edward Curry, Mauricio Banduk, Sean O'Riain, "Toward situation awareness for the semantic sensor web: Complex event processing with dynamic linked data enrichment", In 4th International Workshop on Semantic Sensor Networks 2011 (SSN11), pp. 69-82, 2011.
[1] Edward Curry, Souleiman Hasan, Umair ul Hassan, Micah Herstand, Sean O'Riain, "An Entity-Centric Approach To Green Information Systems", In Proceedings of the 19th European Conference on Information Systems (ECIS 2011), Helsinki, Finland, pp. 1-7, 2011.