RSS-Feed http://example.com en-gb TYPO3 News Sun, 27 May 2018 03:26:55 +0000 Sun, 27 May 2018 03:26:55 +0000 TYPO3 EXT:news
news-2108 Mon, 07 May 2018 06:56:32 +0000 Roche Hypo University Challenge won by DWS-AI http://dws.informatik.uni-mannheim.deen/people/professors/prof-dr-ing-margret-keuper/singleview/detail/News/roche-hypo-university-challenge-won-by-dws-ai/ We are happy to announce that Jakob Huber and Timo Sztyler took first place in the Hypo University Challenge, which was hosted by Roche Diabetes Care GmbH and powered by IBM. The goal of the challenge was to develop an algorithm that predicts the probability of a nocturnal hypoglycemic event (severe, mild, hypo) in the upcoming 10, 20, 30, 40, and 60 minutes.

 

Today, more than 425 million people have diabetes mellitus, a metabolic disorder characterized by an elevated blood sugar level. Left untreated, it can lead to hyperglycemia, which results in confusion, abdominal pain, and coma. Treatment is lifelong, i.e., there is no cure.

 

After the challenge, they were invited to present their approach at the Roche-internal "Diagnostics R&D Fair" in Basel, where they also received a trophy for winning the challenge.

Research
news-2098 Tue, 17 Apr 2018 09:27:36 +0000 Paper accepted at IJCAI 2018 http://dws.informatik.uni-mannheim.deen/people/professors/prof-dr-ing-margret-keuper/singleview/detail/News/paper-accepted-at-ijcai-2018/ Together with our colleagues Paola, Irene and Stefano at Sapienza University in Rome we have a paper accepted at the 27th International Joint Conference on Artificial Intelligence (IJCAI), the premier conference in the field of AI:

  • Stefano Faralli, Irene Finocchi, Simone Paolo Ponzetto and Paola Velardi: Efficient Pruning of Large Knowledge Graphs.
Publications Simone Research
news-2097 Tue, 17 Apr 2018 09:24:14 +0000 Paper accepted at JCDL 2018 http://dws.informatik.uni-mannheim.deen/people/professors/prof-dr-ing-margret-keuper/singleview/detail/News/paper-accepted-at-jcdl-2018/ We have a paper accepted at the 2018 Joint Conference on Digital Libraries (JCDL), the top conference in the field of digital libraries:

  • Federico Nanni, Simone Paolo Ponzetto and Laura Dietz: Entity-Aspect Linking:  Providing Fine-Grained Semantics of Entities in Context.

The work presented in the paper is a collaboration between the DWS group and Prof. Laura Dietz at the University of New Hampshire in the context of an Elite Post-Doc grant of the Baden-Württemberg Stiftung recently awarded to Laura.

 

 

Research Publications Simone
news-2096 Tue, 17 Apr 2018 09:08:19 +0000 Paper accepted at SIGIR 2018 http://dws.informatik.uni-mannheim.deen/people/professors/prof-dr-ing-margret-keuper/singleview/detail/News/paper-accepted-at-sigir-2018/ Together with our colleague Ivan Vulić at the University of Cambridge we have a paper accepted at the 41st International ACM Conference on Research and Development in Information Retrieval (SIGIR), the premier conference in the field of Information Retrieval:

  • Robert Litschko, Goran Glavaš, Ivan Vulić and Simone Paolo Ponzetto: Unsupervised Cross-Lingual Information Retrieval using Monolingual Data Only.
Research Publications Simone
news-2075 Fri, 23 Feb 2018 14:41:28 +0000 Dmitry Ustalov has defended his PhD thesis http://dws.informatik.uni-mannheim.deen/people/professors/prof-dr-ing-margret-keuper/singleview/detail/News/dmitry-ustalov-has-defended-his-phd-thesis/ Dmitry Ustalov has successfully defended his Kandidat Nauk (PhD) thesis on “Models, Methods and Algorithms for Constructing a Word Sense Network for Natural Language Processing” («Модели, методы и алгоритмы построения семантической сети слов для задач обработки естественного языка» in Russian). The defense was held at the South Ural State University (Chelyabinsk, Russia) on February 21, 2018. Among many other contributions, the thesis proposes the Watset and Watlink methods for extracting, inducing, clustering, and linking word senses from unstructured data.

Abstract

The goal of the thesis is to develop models, methods, and algorithms for constructing a semantic network that establishes semantic links between individual word senses using weakly structured dictionaries, and to implement them as a software system for word sense network construction. Part I reviews the state of the art in natural language processing and argues for the development of new, efficient ontology induction algorithms for under-resourced languages.

Part II proposes two new algorithms, Watset and Watlink, that extract and structure the knowledge available in unstructured form. Watset is a meta-algorithm for fuzzy graph clustering. This algorithm creates an intermediate representation of the input graph that naturally reflects the “ambiguity” of its nodes. Then, it uses hard clustering to discover clusters in this intermediate graph. This makes it possible to discover synsets in a synonymy graph. Watlink is an algorithm for discovering the disambiguated hierarchical links between individual word senses. This algorithm uses the synsets obtained using Watset to contextualize the input asymmetric word links. To increase the recall of the linking, it optionally uses a regularized projection learning approach to predict additional relevant links.
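The two-stage idea behind Watset described above (build an intermediate graph in which ambiguous nodes are split, then hard-cluster that graph) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `connected_components` and `fuzzy_clusters` are hypothetical, connected components stand in for whatever hard clustering algorithm is actually used, and the rule that assigns each edge to a sense is a simplification chosen for this toy example.

```python
from collections import defaultdict


def connected_components(nodes, edges):
    """Stand-in hard clustering (an assumption): one cluster per connected component."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, clusters = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters


def fuzzy_clusters(edges):
    """Return overlapping clusters of nodes from an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Stage 1a: split every node into "senses" by hard-clustering its
    # neighbourhood with the node itself removed.
    senses = {}
    for node, nbrs in adj.items():
        local = [(a, b) for a in nbrs for b in adj[a] & nbrs if a < b]
        senses[node] = connected_components(sorted(nbrs), local)

    def sense_of(node, other):
        # Index of the sense of `node` whose context contains `other`.
        for i, context in enumerate(senses[node]):
            if other in context:
                return i

    # Stage 1b: the intermediate graph has one vertex per (node, sense) pair.
    sense_nodes = [(n, i) for n in senses for i in range(len(senses[n]))]
    sense_edges = [((u, sense_of(u, v)), (v, sense_of(v, u))) for u, v in edges]

    # Stage 2: hard-cluster the intermediate graph, then map senses back to nodes.
    return [frozenset(n for n, _ in comp)
            for comp in connected_components(sense_nodes, sense_edges)]


# Toy synonymy graph in which "bank" is ambiguous.
toy_edges = [
    ("bank", "riverbank"), ("bank", "shore"), ("riverbank", "shore"),
    ("bank", "lender"), ("bank", "depository"), ("lender", "depository"),
]
synsets = fuzzy_clusters(toy_edges)
```

On this toy graph, "bank" ends up in two clusters, one with "riverbank" and "shore" and one with "lender" and "depository", which is the kind of overlap a plain hard clustering of the original graph cannot express.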

Part III describes the implementation of the proposed models, methods, and algorithms as a software system. The system is implemented in the Python, AWK, and Bash programming languages using the scikit-learn, TensorFlow, NetworkX, and Raptor libraries. It also defines the representation of the produced word sense network as Linked Data.

Part IV reports the results of the experiments conducted on the Russian language, an under-resourced natural language. Both Watset and Watlink show state-of-the-art performance on the synset induction and hypernymy detection tasks on the RuWordNet and Yet Another RussNet gold standards.

Research Group
news-2060 Fri, 19 Jan 2018 13:07:59 +0000 Paper accepted for Digital Scholarship in the Humanities http://dws.informatik.uni-mannheim.deen/people/professors/prof-dr-ing-margret-keuper/singleview/detail/News/paper-accepted-for-digital-scholarship-in-the-humanities/ We have a paper accepted in Digital Scholarship in the Humanities, the premier journal in the field of Digital Humanities.

Federico Nanni, Laura Dietz and Simone Paolo Ponzetto. Toward a computational history of universities: Evaluating text mining methods for interdisciplinarity detection from PhD dissertation abstracts. To appear in Digital Scholarship in the Humanities. DOI: 10.1093/llc/fqx062 (available with a free-access article link here). 

The work presented in the paper is a collaboration between the DWS group and Prof. Laura Dietz at the University of New Hampshire.

Abstract

For the first time, historians of higher education have large data sets of primary sources that reflect the complete output of academic institutions at their disposal. To analyze this unprecedented abundance of digital materials, scholars have access to a large suite of computational methods developed in the field of Natural Language Processing. However, when the intention is to move beyond exploratory studies and use the results of such analyses as quantitative evidence, historians need to take into account the reliability of these techniques. The main goal of this article is to investigate the performance of different text mining methods for a specific task: the automatic identification of interdisciplinary works from a corpus of PhD dissertation abstracts. Based on the output of our study, we provide the research community with a new data set for analyzing recent changes in interdisciplinary practices in a large sample of European universities. We show the potential of this collection by tracking the growth in adoption of computational approaches across different research fields during the past 30 years.

Research Simone Publications
news-2059 Fri, 19 Jan 2018 12:56:24 +0000 Paper accepted for Knowledge-Based Systems http://dws.informatik.uni-mannheim.deen/people/professors/prof-dr-ing-margret-keuper/singleview/detail/News/paper-accepted-for-knowledge-based-systems/ Together with our colleagues from the Natural Language Engineering (NLE) Lab of the University of Valencia, we have a paper accepted for the Knowledge-Based Systems journal (2016 Impact Factor: 4.529).

Goran Glavaš, Marc Franco-Salvador, Simone P. Ponzetto and Paolo Rosso. A resource-light method for cross-lingual semantic textual similarity. To appear in Knowledge-Based Systems. DOI: 10.1016/j.knosys.2017.11.041. A pre-print version is available here.

Abstract

Recognizing semantically similar sentences or paragraphs across languages is beneficial for many tasks, ranging from cross-lingual information retrieval and plagiarism detection to machine translation. Recently proposed methods for predicting cross-lingual semantic similarity of short texts, however, make use of tools and resources (e.g., machine translation systems, syntactic parsers or named entity recognition) that for many languages (or language pairs) do not exist. In contrast, we propose an unsupervised and a very resource-light approach for measuring semantic similarity between texts in different languages. To operate in the bilingual (or multilingual) space, we project continuous word vectors (i.e., word embeddings) from one language to the vector space of the other language via the linear translation model. We then align words according to the similarity of their vectors in the bilingual embedding space and investigate different unsupervised measures of semantic similarity exploiting bilingual embeddings and word alignments. Requiring only a limited-size set of word translation pairs between the languages, the proposed approach is applicable to virtually any pair of languages for which there exists a sufficiently large corpus, required to learn monolingual word embeddings. Experimental results on three different datasets for measuring semantic textual similarity show that our simple resource-light approach reaches performance close to that of supervised and resource-intensive methods, displaying stability across different language pairs. Furthermore, we evaluate the proposed method on two extrinsic tasks, namely extraction of parallel sentences from comparable corpora and cross-lingual plagiarism detection, and show that it yields performance comparable to those of complex resource-intensive state-of-the-art models for the respective tasks.
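The pipeline in the abstract (learn a linear map from a small set of translation pairs, project source-language vectors into the target space, align words by vector similarity, and score the text pair) can be sketched as follows. This is an illustrative sketch rather than the paper's implementation: the function names, the toy two-dimensional embeddings, and the particular similarity measure (average best-match cosine in both directions) are all assumptions made for the example; the paper itself investigates several unsupervised measures.

```python
import numpy as np


def fit_projection(src_vecs, tgt_vecs):
    """Least-squares matrix W such that src_vecs @ W approximates tgt_vecs."""
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W


def cosine_matrix(A, B):
    """Pairwise cosine similarities between the rows of A and the rows of B."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T


def text_similarity(src_words, tgt_words, src_emb, tgt_emb, W):
    """Hypothetical measure: align each word with its most similar word on the
    other side and average the alignment scores in both directions."""
    S = np.array([src_emb[w] for w in src_words]) @ W  # project into target space
    T = np.array([tgt_emb[w] for w in tgt_words])
    sims = cosine_matrix(S, T)
    return 0.5 * (sims.max(axis=1).mean() + sims.max(axis=0).mean())


# Toy embeddings (made up): the "German" space is the "English" space rotated.
src_emb = {"cat": [1.0, 0.0], "car": [0.0, 1.0], "dog": [0.8, 0.6]}
tgt_emb = {"Katze": [0.0, 1.0], "Auto": [-1.0, 0.0], "Hund": [-0.6, 0.8]}

# A limited-size set of translation pairs is all that is needed to fit W.
pairs = [("cat", "Katze"), ("car", "Auto")]
X = np.array([src_emb[s] for s, _ in pairs])
Y = np.array([tgt_emb[t] for _, t in pairs])
W = fit_projection(X, Y)

sim_match = text_similarity(["dog"], ["Hund"], src_emb, tgt_emb, W)
sim_mismatch = text_similarity(["dog"], ["Auto"], src_emb, tgt_emb, W)
```

Note that "dog" and "Hund" are not among the translation pairs used to fit `W`; the projection learned from the other pairs still places "dog" next to "Hund", so `sim_match` comes out higher than `sim_mismatch`.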

Research Simone Publications
news-2058 Fri, 19 Jan 2018 12:44:46 +0000 Paper accepted for the Journal of Natural Language Engineering http://dws.informatik.uni-mannheim.deen/people/professors/prof-dr-ing-margret-keuper/singleview/detail/News/paper-accepted-for-the-journal-of-natural-language-engineering/ We have a new journal paper in the Natural Language Engineering journal summarizing the findings of the first part of our DFG JOIN-T (Joining Ontologies and semantics INduced from Text) project with our colleagues from the Language Technology Group of the University of Hamburg:

Chris Biemann, Stefano Faralli, Alexander Panchenko and Simone Paolo Ponzetto: A framework for enriching lexical semantic resources with distributional semantics. To appear in the Journal of Natural Language Engineering. DOI: 10.1017/S135132491700047X. A pre-print version is available here.

You can find the project homepage here.

Abstract

We present an approach to combining distributional semantic representations induced from text corpora with manually constructed lexical semantic networks. While both kinds of semantic resources are available with high lexical coverage, our aligned resource combines the domain specificity and availability of contextual information from distributional models with the conciseness and high quality of manually crafted lexical networks. We start with a distributional representation of induced senses of vocabulary terms, which are accompanied with rich context information given by related lexical items. We then automatically disambiguate such representations to obtain a full-fledged proto-conceptualization, i.e. a typed graph of induced word senses. In a final step, this proto-conceptualization is aligned to a lexical ontology, resulting in a hybrid aligned resource. Moreover, unmapped induced senses are associated with a semantic type in order to connect them to the core resource. Manual evaluations against ground-truth judgments for different stages of our method as well as an extrinsic evaluation on a knowledge-based Word Sense Disambiguation benchmark all indicate the high quality of the new hybrid resource. Additionally, we show the benefits of enriching top-down lexical knowledge resources with bottom-up distributional information from text for addressing high-end knowledge acquisition tasks such as cleaning hypernym graphs and learning taxonomies from scratch.

Simone Research Publications