RSS-Feed http://example.com en-gb TYPO3 News Sun, 27 May 2018 00:03:06 +0000 Sun, 27 May 2018 00:03:06 +0000 TYPO3 EXT:news
news-2105 Fri, 27 Apr 2018 09:58:42 +0000 Data Science Conference LWDA 2018 in Mannheim http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/data-science-conference-lwda-2018-in-mannheim-1/ The Data and Web Science Group is hosting the Data Science Conference LWDA 2018 in Mannheim on August 22-24, 2018.

LWDA, which stands for „Lernen, Wissen, Daten, Analysen“ ("Learning, Knowledge, Data, Analytics"), covers recent research in areas such as knowledge discovery, machine learning and data mining, knowledge management, database management and information systems, and information retrieval.

The LWDA conference is organized by, and brings together, the various special interest groups of the Gesellschaft für Informatik (German Computer Science Society) in this area. The program comprises joint research sessions and keynotes as well as workshops organized by each special interest group.

Further information can be found on the conference website: https://www.uni-mannheim.de/lwda-2018/.

Download the conference poster.

Other Topics - Artificial Intelligence I Topics - Data Mining Topics - Decision Support Topics - Web Search and IR Chris Heiner Rainer Simone
news-2098 Tue, 17 Apr 2018 09:27:36 +0000 Paper accepted at IJCAI 2018 http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/paper-accepted-at-ijcai-2018/ Together with our colleagues Paola, Irene and Stefano at Sapienza University in Rome, we have a paper accepted at the 27th International Joint Conference on Artificial Intelligence (IJCAI), the premier conference in the field of AI:

  • Stefano Faralli, Irene Finocchi, Simone Paolo Ponzetto and Paola Velardi: Efficient Pruning of Large Knowledge Graphs.
Publications Simone Research
news-2097 Tue, 17 Apr 2018 09:24:14 +0000 Paper accepted at JCDL 2018 http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/paper-accepted-at-jcdl-2018/ We have a paper accepted at the 2018 Joint Conference on Digital Libraries (JCDL), the top conference in the field of digital libraries:

  • Federico Nanni, Simone Paolo Ponzetto and Laura Dietz: Entity-Aspect Linking: Providing Fine-Grained Semantics of Entities in Context.

The work presented in the paper is a collaboration between the DWS group and Prof. Laura Dietz at the University of New Hampshire in the context of an Elite Post-Doc grant of the Baden-Württemberg Stiftung recently awarded to Laura.

Research Publications Simone
news-2096 Tue, 17 Apr 2018 09:08:19 +0000 Paper accepted at SIGIR 2018 http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/paper-accepted-at-sigir-2018/ Together with our colleague Ivan Vulić at the University of Cambridge, we have a paper accepted at the 41st International ACM Conference on Research and Development in Information Retrieval (SIGIR), the premier conference in the field of Information Retrieval:

  • Robert Litschko, Goran Glavaš, Ivan Vulić and Simone Paolo Ponzetto: Unsupervised Cross-Lingual Information Retrieval using Monolingual Data Only.
Research Publications Simone
news-2060 Fri, 19 Jan 2018 13:07:59 +0000 Paper accepted for Digital Scholarship in the Humanities http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/paper-accepted-for-digital-scholarship-in-the-humanities/ We have a paper accepted in Digital Scholarship in the Humanities, the premier journal in the field of Digital Humanities.

Federico Nanni, Laura Dietz and Simone Paolo Ponzetto. Toward a computational history of universities: Evaluating text mining methods for interdisciplinarity detection from PhD dissertation abstracts. To appear in Digital Scholarship in the Humanities. DOI: 10.1093/llc/fqx062 (available with a free-access article link here). 

The work presented in the paper is a collaboration between the DWS group and Prof. Laura Dietz at the University of New Hampshire.

Abstract

For the first time, historians of higher education have at their disposal large data sets of primary sources that reflect the complete output of academic institutions. To analyze this unprecedented abundance of digital materials, scholars have access to a large suite of computational methods developed in the field of Natural Language Processing. However, when the intention is to move beyond exploratory studies and use the results of such analyses as quantitative evidence, historians need to take into account the reliability of these techniques. The main goal of this article is to investigate the performance of different text mining methods for a specific task: the automatic identification of interdisciplinary works from a corpus of PhD dissertation abstracts. Based on the output of our study, we provide the research community with a new data set for analyzing recent changes in interdisciplinary practices in a large sample of European universities. We show the potential of this collection by tracking the growth in adoption of computational approaches across different research fields during the past 30 years.

Research Simone Publications
news-2059 Fri, 19 Jan 2018 12:56:24 +0000 Paper accepted for Knowledge-Based Systems http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/paper-accepted-for-knowledge-based-systems/ Together with our colleagues from the Natural Language Engineering (NLE) Lab of the University of Valencia, we have a paper accepted in the journal Knowledge-Based Systems (2016 Impact Factor: 4.529).

Goran Glavaš, Marc Franco-Salvador, Simone P. Ponzetto and Paolo Rosso. A resource-light method for cross-lingual semantic textual similarity. To appear in Knowledge-Based Systems. DOI: 10.1016/j.knosys.2017.11.041. A pre-print version is available here.

Abstract

Recognizing semantically similar sentences or paragraphs across languages is beneficial for many tasks, ranging from cross-lingual information retrieval and plagiarism detection to machine translation. Recently proposed methods for predicting cross-lingual semantic similarity of short texts, however, make use of tools and resources (e.g., machine translation systems, syntactic parsers or named entity recognition) that do not exist for many languages (or language pairs). In contrast, we propose an unsupervised and very resource-light approach for measuring semantic similarity between texts in different languages. To operate in the bilingual (or multilingual) space, we project continuous word vectors (i.e., word embeddings) from one language to the vector space of the other language via the linear translation model. We then align words according to the similarity of their vectors in the bilingual embedding space and investigate different unsupervised measures of semantic similarity exploiting bilingual embeddings and word alignments. Requiring only a limited-size set of word translation pairs between the languages, the proposed approach is applicable to virtually any pair of languages for which there exists a sufficiently large corpus, required to learn monolingual word embeddings. Experimental results on three different datasets for measuring semantic textual similarity show that our simple resource-light approach reaches performance close to that of supervised and resource-intensive methods, displaying stability across different language pairs. Furthermore, we evaluate the proposed method on two extrinsic tasks, namely extraction of parallel sentences from comparable corpora and cross-lingual plagiarism detection, and show that it yields performance comparable to that of complex resource-intensive state-of-the-art models for the respective tasks.
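The projection-and-alignment idea described in the abstract can be illustrated with a small sketch. It assumes a plain least-squares linear map W learned from a seed list of word translation pairs (row vectors, so that X W ≈ Z), greedy alignment of each projected source word to its most cosine-similar target word, and, as the text similarity score, the average of those best alignment scores. That score, the function names and the toy 2-dimensional vectors are hypothetical choices made for illustration; they are not the measures or data used in the paper.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b for x (a square) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        if abs(p) < 1e-12:
            raise ValueError("singular system: use more or more diverse seed pairs")
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def learn_translation_matrix(src: Matrix, tgt: Matrix) -> Matrix:
    """Hypothetical helper: least-squares W minimising ||src @ W - tgt||^2
    via the normal equations (src^T src) W = src^T tgt."""
    src_t = transpose(src)
    return solve(matmul(src_t, src), matmul(src_t, tgt))


def cosine(u: Vector, v: Vector) -> float:
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def alignment_similarity(src_text: Matrix, tgt_text: Matrix, w: Matrix) -> float:
    """Hypothetical measure: project each source word vector with W, align it
    to its most similar target word vector, and average the best cosines."""
    projected = matmul(src_text, w)
    best = [max(cosine(p, t) for t in tgt_text) for p in projected]
    return sum(best) / len(best)


# Toy seed dictionary in which the "target space" simply swaps the two coordinates.
seed_src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
seed_tgt = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
W = learn_translation_matrix(seed_src, seed_tgt)

same = alignment_similarity([[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]], W)
different = alignment_similarity([[1.0, 0.0], [0.0, 2.0]], [[1.0, 1.0]], W)
print(round(same, 4), round(different, 4))  # prints: 1.0 0.7071
```

Real word embeddings have hundreds of dimensions, so in practice a numerical library would replace the hand-written linear algebra; the pure-Python version only keeps the sketch dependency-free.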

Research Simone Publications
news-2058 Fri, 19 Jan 2018 12:44:46 +0000 Paper accepted for the Journal of Natural Language Engineering http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/paper-accepted-for-the-journal-of-natural-language-engineering/ We have a new paper in the journal Natural Language Engineering summarizing the findings of the first part of our DFG project JOIN-T (Joining Ontologies and semantics INduced from Text) with our colleagues of the Language Technology Group of the University of Hamburg.

Chris Biemann, Stefano Faralli, Alexander Panchenko and Simone Paolo Ponzetto: A framework for enriching lexical semantic resources with distributional semantics. To appear in the Journal of Natural Language Engineering. DOI: 10.1017/S135132491700047X. A pre-print version is available here.

You can find the project homepage here.

Abstract

We present an approach to combining distributional semantic representations induced from text corpora with manually constructed lexical semantic networks. While both kinds of semantic resources are available with high lexical coverage, our aligned resource combines the domain specificity and availability of contextual information from distributional models with the conciseness and high quality of manually crafted lexical networks. We start with a distributional representation of induced senses of vocabulary terms, which are accompanied by rich context information given by related lexical items. We then automatically disambiguate such representations to obtain a full-fledged proto-conceptualization, i.e. a typed graph of induced word senses. In a final step, this proto-conceptualization is aligned to a lexical ontology, resulting in a hybrid aligned resource. Moreover, unmapped induced senses are associated with a semantic type in order to connect them to the core resource. Manual evaluations against ground-truth judgments for different stages of our method, as well as an extrinsic evaluation on a knowledge-based Word Sense Disambiguation benchmark, all indicate the high quality of the new hybrid resource. Additionally, we show the benefits of enriching top-down lexical knowledge resources with bottom-up distributional information from text for addressing high-end knowledge acquisition tasks such as cleaning hypernym graphs and learning taxonomies from scratch.

Simone Research Publications
news-2038 Tue, 12 Dec 2017 20:01:51 +0000 Understanding Euroscepticism Through the Lens of Big Data at Villa Vigoni http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/understanding-euroscepticism-through-the-lens-of-big-data-at-villa-vigoni/ During the first week of December 2017 we participated in a hackathon that brought together researchers from the fields of natural language processing and political science to look at ways of leveraging today's abundance of digital primary sources to better understand the continent-wide rise of Euroscepticism.

Colleagues from leading academic institutions such as Bocconi University (Italy), GESIS (Germany), the London School of Economics and the Alan Turing Institute (UK), among others, worked closely together to share and discuss complementary methodologies and developed new models of spatial placement from text on the topic of European integration.

The event was a joint effort organized by the Data and Web Science Group, the Digital Humanities Group at FBK Trento and Unitelma Roma. This collaboration is part of a larger ongoing effort by members of DWS to explore how expertise in artificial intelligence and natural language processing can support cutting-edge research in political and computational social sciences (see also our work in the context of our collaborative research center SFB 884 on the "Political Economy of Reforms").

Research Simone
news-2022 Tue, 07 Nov 2017 16:30:43 +0000 DepCC: A Dependency-Parsed Text Corpus from the Common Crawl http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/depcc-a-dependency-parsed-text-corpus-from-the-common-crawl/ Together with our colleagues at the University of Hamburg, we have just released a new web-scale dependency-parsed corpus based on the Common Crawl. DepCC is a large, linguistically analyzed English corpus of 365 million documents, comprising 252 billion tokens and 7.5 billion named entity occurrences in 14.3 billion sentences from a web-scale crawl.

You can find the corpus here: https://commoncrawl.s3.amazonaws.com/contrib/depcc/CC-MAIN-2016-07/index.html

A description is available in this paper: https://arxiv.org/abs/1710.01779

Research Simone
news-1988 Wed, 13 Sep 2017 08:10:14 +0000 EMNLP 2017 Outstanding Paper http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/emnlp-2017-outstanding-paper/ Our paper on Topic-Based Agreement and Disagreement in US Electoral Manifestos, coauthored with the DH Group at FBK Trento, was selected as an Outstanding Paper at EMNLP 2017, one of the premier conferences in the field of NLP.

The paper presents a topic-based analysis of agreement and disagreement in US political manifestos, which relies on a new method for topic detection based on key concept clustering. Data and software can be found here.

Research Simone