Data to Knowledge: Modernizing Political Event Data for Big Data Social Science
Speaker: Latifur Rahman Khan – Plano, TX, United States
Topic(s): Information Systems, Search, Information Retrieval, Database Systems, Data Mining, Data Science
Abstract
We have developed software and big data infrastructure that produces machine-coded event data from news reports, drawing on both historical and real-time web sources. The project is ongoing and will produce coded news reports using NLP applications for English and Spanish news. Human annotation is used to validate the data and to support cross-lingual coding. Geolocation of events has also been improved for better spatial resolution. One of the main computational challenges we address is the efficiency and scalability of parsing online news articles in real time. In particular, we designed a distributed system with Apache Spark and Kafka to process large volumes of news articles for the event coders and the actor recommender system. The system processes articles in near real time and generates events, which are made available to end users through our REST API at http://eventdata.utdallas.edu.
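As a rough illustration of the streaming idea described above, the following minimal sketch drains a queue of articles in micro-batches and codes each batch in parallel. It is not the project's implementation: an in-memory deque stands in for a Kafka topic, a thread pool stands in for Spark workers, and `ACTION_PATTERNS`, `code_article`, `micro_batches`, and `run_pipeline` are hypothetical names with made-up labels.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

# Hypothetical verb dictionary: surface pattern -> event label.
# These labels are illustrative only, not a real coding ontology.
ACTION_PATTERNS = {
    "met with": "CONSULT",
    "attacked": "ASSAULT",
    "protested": "PROTEST",
}


def code_article(article):
    """Return one event dict per dictionary pattern found in the article text."""
    text = article["text"].lower()
    return [
        {"article_id": article["id"], "action": label}
        for pattern, label in ACTION_PATTERNS.items()
        if pattern in text
    ]


def micro_batches(queue, batch_size):
    """Drain the queue in batches of at most batch_size articles."""
    while queue:
        yield [queue.popleft() for _ in range(min(batch_size, len(queue)))]


def run_pipeline(queue, batch_size=2, workers=4):
    """Code every queued article, one micro-batch at a time."""
    events = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in micro_batches(queue, batch_size):
            # pool.map preserves input order, so events stay in arrival order.
            for article_events in pool.map(code_article, batch):
                events.extend(article_events)
    return events


# The deque is an assumption standing in for a message-broker topic.
incoming = deque([
    {"id": 1, "text": "The minister met with union leaders."},
    {"id": 2, "text": "Rebels attacked a convoy; residents protested."},
    {"id": 3, "text": "Markets were calm on Tuesday."},
])
events = run_pipeline(incoming)
```

In a deployed system the batch size and worker count would be tuned to the article arrival rate; here they are arbitrary defaults.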
However, most approaches to event data generation, including our prior work, rely on pattern-matching techniques constrained by large dictionaries that are too costly to develop, update, or extend to emerging domains or additional languages. We have recently developed an effective solution to these challenges: the 3M-Transformers (Multilingual, Multi-label, Multi-task) approach for event extraction from domain-specific multilingual corpora. It dispenses with large external repositories for this task and expands the substantive focus of analysis to organized crime, an emerging concern for security research. Our results indicate that our 3M-Transformers configurations outperform standard state-of-the-art Transformer models for event extraction on actors, actions, and locations in English, Spanish, and Portuguese.
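To make the terms "multi-label" and "multi-task" concrete, here is a minimal sketch of how predictions might be decoded from one shared sentence representation. It is an assumption-laden toy, not the 3M-Transformers architecture: the label sets, the weights in `HEADS`, and the two-dimensional `shared` vector (standing in for a multilingual Transformer embedding) are all invented for illustration. Each task has its own linear head (multi-task), and each label gets an independent sigmoid score so that several labels can fire at once (multi-label).

```python
import math

# Hypothetical label sets for two tasks; a real system would use a coding ontology.
TASK_LABELS = {
    "actor": ["government", "criminal_group", "civilians"],
    "action": ["attack", "arrest", "protest"],
}


def sigmoid(x):
    """Map a logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, bias, features):
    """Compute one logit per label: the dot product of a weight row and the features, plus bias."""
    return [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def decode(features, heads, threshold=0.5):
    """Apply every task head to the same shared features and keep
    each label whose sigmoid score reaches the threshold."""
    predictions = {}
    for task, (weights, bias) in heads.items():
        logits = linear(weights, bias, features)
        predictions[task] = [
            label
            for label, z in zip(TASK_LABELS[task], logits)
            if sigmoid(z) >= threshold
        ]
    return predictions


# Made-up head parameters: (weight matrix, bias vector) per task.
HEADS = {
    "actor": ([[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]], [0.0, 0.0, 0.0]),
    "action": ([[1.0, 1.0], [3.0, -3.0], [-2.0, -2.0]], [-1.0, 0.0, 0.0]),
}

# Stand-in for the shared encoder output of one sentence.
shared = [1.0, 0.5]
prediction = decode(shared, HEADS)
```

Using a sigmoid per label rather than a softmax over labels is what allows a single sentence to carry, for example, two actors at once; sharing the features across heads is what lets the tasks be trained jointly.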
About this Lecture
Number of Slides: 80
Duration: 60 minutes
Languages Available: English
Last Updated:
Request this Lecture
To request this particular lecture, please complete this online form.
Request a Tour
To request a tour with this speaker, please complete this online form.
All requests will be sent to ACM headquarters for review.