Data Science Conference LWDA 2018 in Mannheim
Fri, 27 Apr 2018
http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/data-science-conference-lwda-2018-in-mannheim-1/

The Data and Web Science Group is hosting the Data Science Conference LWDA 2018 in Mannheim on August 22-24, 2018.

LWDA, which expands to „Lernen, Wissen, Daten, Analysen“ („Learning, Knowledge, Data, Analytics“), covers recent research in areas such as knowledge discovery, machine learning and data mining, knowledge management, database management and information systems, and information retrieval.

The LWDA conference is organized by the special interest groups of the Gesellschaft für Informatik (German Computer Science Society) working in this area and brings them together. The program comprises joint research sessions and keynotes as well as workshops organized by each special interest group.

Further information can be found on the conference website: https://www.uni-mannheim.de/lwda-2018/.

Download the conference poster.

Third Cohort of Students starts Part-time Master in Data Science
Mon, 12 Mar 2018
http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/third-cohort-of-students-starts-part-time-master-in-data-science/

The third cohort of 32 students has started their studies in the part-time master program in Data Science that professors of the DWS group offer together with the Hochschule Albstadt-Sigmaringen.

This weekend, the students of the third cohort of the master program, together with students participating in the certificate program Data Science, were in Mannheim for a data mining project weekend.

The students worked in teams on two case studies, one in the area of online marketing, the other in the area of text mining. The teams were coached by Prof. Christian Bizer, Dr. Robert Meusel, and Alexander Diete. We were very happy to see an exciting competition between the teams for the best F1 scores as well as the largest increases in sales.

38.7 billion quads of Microdata, Embedded JSON-LD, RDFa, and Microformat data published
Thu, 11 Jan 2018
http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/387-billion-quads-microdata-embedded-json-ld-rdfa-and-microformat-data-published/

The DWS group is happy to announce a new release of the WebDataCommons Microdata, Embedded JSON-LD, RDFa, and Microformat data corpus. The data has been extracted from the November 2017 version of the Common Crawl, covering 3.2 billion HTML pages which originate from 26 million websites (pay-level domains).

In summary, we found structured data within 1.2 billion HTML pages out of the 3.2 billion pages contained in the crawl (38.9%). These pages originate from 7.4 million different pay-level domains out of the 26 million pay-level domains covered by the crawl (28.4%). Approximately 3.7 million of these websites use Microdata, 2.6 million use embedded JSON-LD, and 1.2 million make use of RDFa. Microformats are used by more than 3.3 million websites within the crawl.

Background:

More and more websites annotate data describing, for instance, products, people, organizations, places, events, reviews, and cooking recipes within their HTML pages, using markup formats such as Microdata, embedded JSON-LD, RDFa, and Microformats. The WebDataCommons project extracts all Microdata, JSON-LD, RDFa, and Microformat data from the Common Crawl web corpus, the largest web corpus available to the public, and provides the extracted data for download. In addition, we publish statistics about the adoption of the different markup formats as well as the vocabularies that are used together with each format. We have run yearly extractions since 2012 and provide the dataset series as well as the related statistics at:

http://webdatacommons.org/structureddata/
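
As an aside, here is a minimal, hypothetical Java sketch of what extracting embedded JSON-LD from a single HTML page can look like; the sample page and the class name are invented for illustration. The actual WebDataCommons extraction uses the Any23 parser suite over the complete Common Crawl, which handles the many edge cases a simple regular expression cannot.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: extract embedded JSON-LD blocks from one HTML page.
public class JsonLdSniffer {

    // Matches <script type="application/ld+json"> ... </script> blocks.
    private static final Pattern SCRIPT = Pattern.compile(
            "<script[^>]*type=[\"']application/ld\\+json[\"'][^>]*>(.*?)</script>",
            Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    public static List<String> extract(String html) {
        List<String> blocks = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            blocks.add(m.group(1).trim());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // A made-up page annotating a product with schema.org in JSON-LD.
        String page = "<html><head><script type=\"application/ld+json\">"
                + "{\"@context\": \"http://schema.org\", \"@type\": \"Product\","
                + " \"name\": \"Espresso Machine\", \"sku\": \"EM-1000\"}"
                + "</script></head><body>...</body></html>";
        extract(page).forEach(System.out::println);
    }
}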

Statistics about the November 2017 Release:

Basic statistics about the November 2017 Microdata, JSON-LD, RDFa, and Microformat data sets as well as the vocabularies that are used together with each markup format are found at:

http://webdatacommons.org/structureddata/2017-12/stats/stats.html

Markup Format Adoption

The page below provides an overview of the increase in the adoption of the different markup formats as well as widely used schema.org classes from 2012 to 2017:

http://webdatacommons.org/structureddata/#toc10

Comparing the statistics of the new 2017 release to those of the October 2016 release of the data sets (http://webdatacommons.org/structureddata/2016-10/stats/stats.html), we see that the adoption of structured data keeps increasing, while Microdata remains the most dominant markup syntax. The different crawling strategy that was used makes it hard to compare absolute as well as certain relative numbers between the two releases. More concretely, we observe that the November 2017 Common Crawl corpus covers certain domains like blogspot.com and wordpress.com much more deeply, while other domains are covered more shallowly, with fewer URLs crawled than in the October 2016 Common Crawl corpus. Nevertheless, it is clear that the growth rate of Microdata and Microformats is much higher than that of RDFa and embedded JSON-LD. Although the latter format is widespread, it is mainly used to annotate metadata for search actions (80% of the domains using JSON-LD), while only a few domains use it for annotating content information such as Organizations (25% of the domains using JSON-LD), Persons (4%), or Offers (0.1%).

Vocabulary Adoption

Concerning vocabulary adoption, schema.org, the vocabulary recommended by Google, Microsoft, Yahoo!, and Yandex, continues to be the most dominant in the context of Microdata: 78% of the webmasters use it, compared to its predecessor, data-vocabulary.org, which is only used by 14% of the websites containing Microdata. In the context of RDFa, the Open Graph Protocol recommended by Facebook remains the most widely used vocabulary.

Parallel Usage of Multiple Formats

Analyzing topic-specific subsets, we discover some interesting trends. As observed in the previous extractions, content-related information is mostly described with the Microdata format or, less frequently, with the JSON-LD format, in both cases using the schema.org vocabulary. However, we find that 30% of the websites that use JSON-LD annotations to describe product-related information make use of Microdata as well as JSON-LD to cover the same topic. This is not the case for other topics, such as hotels or job postings, for which webmasters use only one format to annotate their content.

Richer Descriptions of Job Postings

Following the release of the “Google for Jobs” search vertical and the more detailed guidance by Google on how to annotate job postings (https://developers.google.com/search/docs/data-types/job-posting), we see an increase in the number of websites annotating job postings (2017: 7,023; 2016: 6,352). In addition, the job posting annotations tend to become richer in comparison to previous years: the number of JobPosting-related properties adopted by at least 30% of the websites containing job offers has increased from 4 (2016) to 7 (2017). The newly adopted properties are JobPosting/url, JobPosting/datePosted, and JobPosting/employmentType. A more extended analysis concerning specific topics, like JobPosting and Product, can be found here:

http://webdatacommons.org/structureddata/2017-12/stats/schema_org_subsets.html#extendedanalysis

Download:

The overall size of the November 2017 RDFa, Microdata, Embedded JSON-LD and Microformat data sets is 38.7 billion RDF quads. For download, we split the data into 8,433 files with a total size of 858 GB.

http://webdatacommons.org/structureddata/2017-12/stats/how_to_get_the_data.html
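
The files are in N-Quads format, where the fourth element of each quad records the URL of the page from which the triple was extracted. As a rough sketch of working with one downloaded (and decompressed) file, the following Java program counts quads per source page. It assumes well-formed, single-line quads; a real pipeline should use a proper RDF parser such as Apache Jena instead of the naive tokenization shown here.

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;

// Sketch only: count quads per source page in a decompressed N-Quads file
// whose path is given as the first command-line argument.
public class QuadGraphCounter {
    public static void main(String[] args) throws Exception {
        Map<String, Integer> quadsPerPage = new HashMap<>();
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || !line.endsWith(".")) {
                    continue; // skip blank or malformed lines
                }
                // The graph URI is the last term before the terminating dot.
                // Taking the last whitespace-separated token works even when
                // literals contain spaces, because the graph URI comes last.
                String[] terms = line.substring(0, line.length() - 1).trim().split("\\s+");
                quadsPerPage.merge(terms[terms.length - 1], 1, Integer::sum);
            }
        }
        quadsPerPage.forEach((page, n) -> System.out.println(page + "\t" + n));
    }
}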

In addition, we have created separate files for over 40 different schema.org classes, each containing all quads extracted from pages that use the specific class.

http://webdatacommons.org/structureddata/2017-12/stats/schema_org_subsets.html

Lots of thanks to:

  • the Common Crawl project for providing their great web crawl and thus enabling the WebDataCommons project.
  • the Any23 project for providing their great library of structured data parsers.
  • Amazon Web Services in Education Grant for supporting WebDataCommons.
  • the Ministry of Economy, Research and Arts of Baden-Württemberg, which supported the extraction and analysis of the November 2017 corpus through the ViCE project.

General Information about the WebDataCommons Project:

The WebDataCommons project extracts structured data from the Common Crawl, the largest web corpus available to the public, and provides the extracted data for public download in order to support researchers and companies in exploiting the wealth of information that is available on the Web. Besides the yearly extractions of semantic annotations from web pages, the WebDataCommons project also provides large hyperlink graphs, the largest public corpus of WebTables, a corpus of product data, and a collection of hypernyms extracted from billions of web pages. General information about the WebDataCommons project is found at

http://webdatacommons.org/

Have fun with the new data set!

Cheers,

Anna Primpeli, Robert Meusel, and Christian Bizer

Dominique Ritze defended her PhD Thesis
Mon, 06 Nov 2017
http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/dominique-ritze-defended-her-phd-thesis/

On November 6th, Dominique Ritze successfully defended her PhD thesis “Web-Scale Web Table to Knowledge Base Matching”. Supervisor was Prof. Christian Bizer; second reader was Prof. Kai Eckert from Hochschule der Medien Stuttgart.

Abstract of the thesis:

Millions of relational HTML tables are found on the World Wide Web. In contrast to unstructured text, relational web tables provide a compact representation of entities described by attributes. The data within these tables covers a broad topical range. Web table data is used for question answering, augmentation of search results, and knowledge base completion. Until a few years ago, only search engine companies like Google and Microsoft owned large web crawls from which web tables can be extracted. Thus, researchers outside these companies have not been able to work with web tables.

In this thesis, the first publicly available web table corpus containing millions of web tables is introduced. The corpus enables interested researchers to experiment with web tables. A profile of the corpus is created to give insights into its characteristics and topics. Further, the potential of web tables for augmenting cross-domain knowledge bases is investigated. For the use case of knowledge base augmentation, it is necessary to understand the web table content. For this reason, web tables are matched to a knowledge base. The matching comprises three tasks: instance, property, and class matching. Existing web table to knowledge base matching systems either focus on a subset of these matching tasks or are evaluated using gold standards which only cover a subset of the challenges that arise when matching web tables to knowledge bases.

This thesis systematically evaluates the utility of a wide range of different features for the web table to knowledge base matching task using a single gold standard. The results of the evaluation are used afterwards to design a holistic matching method which covers all matching tasks and outperforms state-of-the-art web table to knowledge base matching systems. In order to achieve these goals, we first propose the T2K Match algorithm, which addresses all three matching tasks in an integrated fashion. In addition, we introduce the T2D gold standard, which covers a wide variety of challenges. By evaluating T2K Match against the T2D gold standard, we identify that only considering the table content is insufficient. Hence, we include features of three categories: features found in the table itself, features from the table context such as the page title, and features based on external resources such as a synonym dictionary.

We analyze the utility of the features for each matching task. The analysis shows that certain problems cannot be overcome by matching each table in isolation to the knowledge base. In addition, relying on the features alone is not enough for the property matching task. Based on these findings, we extend T2K Match into T2K Match++, which exploits indirect matches to web tables about the same topic and uses knowledge derived from the knowledge base. We show that T2K Match++ outperforms all state-of-the-art web table to knowledge base matching approaches on the T2D and Limaye gold standards. Most systems show good results on one matching task, but T2K Match++ is the only system that achieves F-measure scores above 0.8 for all tasks. Compared to the results of the best-performing system, TableMiner+, the F-measure for the difficult property matching task is increased by 0.08, and for the class and instance matching tasks by 0.05 and 0.03, respectively.
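
To give a feel for the instance matching task discussed above, here is a deliberately simplified, hypothetical Java sketch that links a web table cell label to a knowledge base entity by Jaccard similarity over label tokens. All labels are made up; this label-only baseline is exactly the kind of isolated, content-only matching that the thesis shows to be insufficient on its own.

import java.util.*;

// Toy illustration of instance matching: link a web table row label to a
// knowledge base entity via Jaccard similarity over label tokens. T2K Match
// combines many more features (context, external resources, indirect matches).
public class LabelMatcher {

    static Set<String> tokens(String s) {
        return new HashSet<>(Arrays.asList(s.toLowerCase().split("\\W+")));
    }

    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Made-up knowledge base labels and one web table cell value.
        List<String> kbLabels = List.of("Mannheim", "University of Mannheim", "Mannheim Palace");
        String rowLabel = "Univ. of Mannheim";
        String best = null;
        double bestSim = 0.0;
        for (String label : kbLabels) {
            double sim = jaccard(tokens(rowLabel), tokens(label));
            if (sim > bestSim) {
                bestSim = sim;
                best = label;
            }
        }
        System.out.println(rowLabel + " -> " + best + " (similarity " + bestSim + ")");
    }
}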

Bibliographic meta-information and download of the thesis.

SWSA Ten-Year Award won by DBpedia Paper
Fri, 27 Oct 2017
http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/swsa-ten-year-award-won-by-dbpedia-paper/

We are happy to announce that Professor Christian Bizer has received the SWSA Ten-Year Award at the 16th International Semantic Web Conference (ISWC 2017) in Vienna for the paper “DBpedia: A Nucleus for a Web of Open Data”, which he co-authored in 2007.

The SWSA Ten-Year Award recognizes the highest impact papers from the ISWC proceedings ten years prior (i.e., in 2017 the award honors a paper from 2007). The decision is based primarily, but not exclusively, on the number of citations to the papers from the proceedings in the intervening decade.

DBpedia is a large-scale cross-domain knowledge base which we extract from Wikipedia and make available on the Web under an open license. DBpedia allows users to ask sophisticated queries against Wikipedia knowledge and serves as an interlinking hub in the Web of Linked Data. In addition, DBpedia is widely used as background knowledge for applications such as search, natural language understanding, and data integration.

According to Google Scholar, the paper “DBpedia: A Nucleus for a Web of Open Data” has been cited 2,770 times as of October 2017.

Master Thesis: Integrating Product Data using Supervision from the Web (Bizer/Paulheim/Primpeli)
Tue, 10 Oct 2017
http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/master-thesis-integrating-product-data-using-supervision-from-the-web-bizerpaulheimprimpeli/

A large number of e-shops have started to mark up structured data about products and offers in their HTML pages, using the Microdata markup standard and the schema.org vocabulary.

In the context of the WebDataCommons project, we have extracted a large corpus of product data from the Common Crawl web corpus. The product data corpus is found here (682,000,000 product records, 497,000,000 offers). A relatively small number of e-shops also publish product identifiers, which are indicated with one of the following schema.org properties: sku, productID, mpn, identifier, gtin14, gtin13, gtin12, and gtin8.

The aim of this thesis is to analyze and evaluate the utility of product identifiers found on the Web as supervision for matching product descriptions. More concretely, the goal is to investigate whether it is possible to learn enough product characteristics from the small set of e-shops that do provide product identifiers in order to detect the same products on websites that do not provide identifiers.

Specifically, the tasks involved in the thesis would be:

  • Analysis of Product Identifiers: Analyze the distribution of product identifiers published on the Web. This involves the identification of product entities and product categories for which identifiers are more frequently assigned.
  • Identity Resolution: Develop identity resolution methods for finding out which e-shops sell the same product. Product identifiers will be used as a source of supervision in order to learn classification models (a toy sketch of this pair-labeling idea follows below). The learned models will be evaluated in terms of how well they generalize to products without assigned identifiers.
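
The following hypothetical Java sketch illustrates the pair-labeling idea from the Identity Resolution task above: offers sharing a product identifier (here, a GTIN) become positive training pairs, offers with different identifiers become negatives, and offers without identifiers are left for the learned model to classify. The record fields are invented for illustration and do not reflect the schema of the WDC product corpus.

import java.util.*;

// Hypothetical sketch: derive labeled training pairs for a product matcher
// from shared identifiers. Field names are invented for illustration.
public class PairGenerator {
    record Offer(String site, String title, String gtin) {}
    record LabeledPair(Offer a, Offer b, boolean match) {}

    static List<LabeledPair> labeledPairs(List<Offer> offers) {
        List<LabeledPair> pairs = new ArrayList<>();
        for (int i = 0; i < offers.size(); i++) {
            for (int j = i + 1; j < offers.size(); j++) {
                Offer a = offers.get(i);
                Offer b = offers.get(j);
                if (a.gtin() == null || b.gtin() == null) {
                    continue; // no supervision available for this pair
                }
                pairs.add(new LabeledPair(a, b, a.gtin().equals(b.gtin())));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Offer> offers = List.of(
                new Offer("shop-a.example", "Acme Espresso Machine EM-1000", "04012345123456"),
                new Offer("shop-b.example", "Espresso Machine EM1000 by Acme", "04012345123456"),
                new Offer("shop-c.example", "Acme Milk Frother MF-20", "04012345654321"));
        for (LabeledPair p : labeledPairs(offers)) {
            System.out.println(p.a().site() + " vs " + p.b().site() + " -> " + p.match());
        }
    }
}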

Your skills:

  • Preferred Expertise: programming (Java or another language) and data mining; NLP is a plus.
  • Relevant Lectures: IE 500 Data Mining, IE 670 Web Data Integration, IE 671 Web Mining, IE 663 Information Retrieval

For more information, please contact Christian Bizer, Heiko Paulheim, or Anna Primpeli.

Paper accepted at VLDB 2017
Tue, 11 Jul 2017
http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/paper-accepted-at-vldb-2017/

We have a paper accepted at the 43rd International Conference on Very Large Data Bases (VLDB 2017), a premier conference in the field of databases and data management. The conference takes place in Munich at the end of August 2017.

Authors:
Oliver Lehmberg, Christian Bizer 

Title:
Stitching Web Tables for Improving Matching Quality

Abstract:
HTML tables on web pages ("web tables") cover a wide variety of topics. Data from web tables can thus be useful for tasks such as knowledge base completion or ad hoc table extension. Before table data can be used for these tasks, the tables must be matched to the respective knowledge base or base table. The challenges of web table matching are the high heterogeneity and the small size of the tables.
Though it is known that the majority of web tables are very small, the gold standards that are used to compare web table matching systems mostly consist of larger tables. In this experimental paper, we evaluate T2K Match, a web table to knowledge base matching system, and COMA, a standard schema matching tool, using a sample of web tables that is more realistic than the gold standards that were previously used. We find that both systems fail to produce correct results for many of the very small tables in the sample. As a remedy, we propose to stitch (combine) the tables from each web site into larger ones and to match these enlarged tables to the knowledge base or base table afterwards. For this stitching process, we evaluate different schema matching methods in combination with holistic correspondence refinement. Limiting the stitching procedure to web tables from the same web site decreases the heterogeneity and allows us to stitch tables with very high precision. Our experiments show that applying table stitching before running the actual matching method improves the matching results by 0.38 in F1-measure for T2K Match and by 0.14 for COMA. Also, stitching the tables allows us to reduce the number of tables in our corpus from 5 million original web tables to as few as 100,000 stitched tables.
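
To convey the intuition behind stitching, here is a heavily simplified, hypothetical Java sketch that unions tables from the same web site when their headers are identical. The names are invented for illustration; the actual stitching method in the paper also handles differing schemata via schema matching and holistic correspondence refinement.

import java.util.*;

// Simplified illustration of stitching: union tables from the same web site
// whose headers are identical. Exact header equality is only the trivial case.
public class TableStitcher {
    record WebTable(String site, List<String> header, List<List<String>> rows) {}

    static Collection<WebTable> stitch(List<WebTable> tables) {
        Map<String, WebTable> bySiteAndHeader = new LinkedHashMap<>();
        for (WebTable t : tables) {
            // Tables agree if they come from the same site and share a header.
            String key = t.site() + "|" + String.join(",", t.header());
            bySiteAndHeader.merge(key, t, (a, b) -> {
                List<List<String>> rows = new ArrayList<>(a.rows());
                rows.addAll(b.rows());
                return new WebTable(a.site(), a.header(), rows);
            });
        }
        return bySiteAndHeader.values();
    }

    public static void main(String[] args) {
        WebTable t1 = new WebTable("example.com", List.of("Country", "Capital"),
                List.of(List.of("Germany", "Berlin")));
        WebTable t2 = new WebTable("example.com", List.of("Country", "Capital"),
                List.of(List.of("France", "Paris")));
        stitch(List.of(t1, t2)).forEach(t ->
                System.out.println(t.header() + " -> " + t.rows().size() + " rows"));
    }
}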

Web Data Integration Framework (WInte.r) released
Fri, 23 Jun 2017
http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/web-data-integration-framework-winter-released/

We are happy to announce the release of the Web Data Integration Framework (WInte.r).

WInte.r is a Java framework for end-to-end data integration. The framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation. The methods are designed to be easily customizable by exchanging pre-defined building blocks, such as blockers, matching rules, similarity functions, and conflict resolution functions. In addition, these pre-defined building blocks can be used as foundation for implementing advanced integration methods.
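
To give a flavor of this building-block style, here is a small, illustrative Java sketch of exchangeable matching components. The interface and class names are invented and are not the actual WInte.r API; see the GitHub page linked below for the real interfaces.

import java.util.*;

// Invented, illustrative interfaces in the spirit of a pluggable data
// integration framework; NOT the actual WInte.r API.
interface SimilarityFunction<T> { double similarity(T a, T b); }
interface MatchingRule<T> { boolean isMatch(T a, T b); }

// A simple threshold-based matching rule built from any similarity function.
class ThresholdRule<T> implements MatchingRule<T> {
    private final SimilarityFunction<T> sim;
    private final double threshold;
    ThresholdRule(SimilarityFunction<T> sim, double threshold) {
        this.sim = sim;
        this.threshold = threshold;
    }
    public boolean isMatch(T a, T b) { return sim.similarity(a, b) >= threshold; }
}

public class BuildingBlocksDemo {
    public static void main(String[] args) {
        // Exchange the similarity function without touching the rest of the
        // pipeline: here, normalized overlap of lower-cased tokens.
        SimilarityFunction<String> tokenOverlap = (a, b) -> {
            Set<String> ta = new HashSet<>(Arrays.asList(a.toLowerCase().split("\\s+")));
            Set<String> tb = new HashSet<>(Arrays.asList(b.toLowerCase().split("\\s+")));
            Set<String> inter = new HashSet<>(ta);
            inter.retainAll(tb);
            return inter.size() / (double) Math.max(ta.size(), tb.size());
        };
        MatchingRule<String> rule = new ThresholdRule<>(tokenOverlap, 0.5);
        System.out.println(rule.isMatch("Apple iPhone 7", "iPhone 7 32GB")); // true
    }
}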

The WInte.r framework forms the foundation of our research on large-scale web data integration. The framework contains an implementation of the T2K Match algorithm for matching millions of web tables against a central knowledge base. The framework is also used in the context of the DS4DM research project for matching tabular data for data search.

Besides being used for research, the WInte.r framework is also used for teaching. The students of our Web Data Integration course use the framework to solve the course case study. In addition, most students use the framework as the foundation for their term projects.

Detailed information about the WInte.r framework is found at

https://github.com/olehmberg/winter

The WInte.r framework can be downloaded from the same web site. The framework can be used under the terms of the Apache 2.0 License.

Robert Meusel defended his PhD Thesis
Mon, 13 Mar 2017
http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/robert-meusel-defended-his-phd-thesis/

On March 10th, Robert Meusel successfully defended his PhD thesis “Web-Scale Profiling of Semantic Annotations in HTML Pages”. Supervisor was Prof. Christian Bizer; second reader was Prof. Wolfgang Nejdl from Leibniz Universität Hannover.

Abstract of the thesis:

The vision of the Semantic Web was coined by Tim Berners-Lee almost two decades ago. The idea describes an extension of the existing Web in which “information is given well-defined meaning, better enabling computers and people to work in cooperation” [Berners-Lee et al., 2001]. Semantic annotations in HTML pages are one realization of this vision, which has been adopted by a large number of web sites in recent years. Semantic annotations are integrated into the code of HTML pages using one of the three markup languages Microformats, RDFa, or Microdata. Major consumers of semantic annotations are the search engine companies Bing, Google, Yahoo!, and Yandex. They use semantic annotations from crawled web pages to enrich the presentation of search results and to complement their knowledge bases. However, outside the large search engine companies, little is known about the deployment of semantic annotations: How many web sites deploy semantic annotations? What are the topics covered by semantic annotations? How detailed are the annotations? Do web sites use semantic annotations correctly? Are semantic annotations useful for others than the search engine companies? And how can semantic annotations be gathered from the Web in that case?

The thesis answers these questions by profiling the web-wide deployment of semantic annotations. The topic is approached in three consecutive steps: In the first step, two approaches for extracting semantic annotations from the Web are discussed. The thesis first evaluates the technique of focused crawling for harvesting semantic annotations. Afterward, a framework to extract semantic annotations from existing web crawl corpora is described. The two extraction approaches are then compared for the purpose of analyzing the deployment of semantic annotations in the Web.

In the second step, the thesis analyzes the overall and markup language-specific adoption of semantic annotations. This empirical investigation is based on the largest web corpus that is available to the public. Further, the topics covered by deployed semantic annotations and their evolution over time are analyzed. Subsequent studies examine common errors within semantic annotations. In addition, the thesis analyzes the data overlap of the entities that are described by semantic annotations from the same and across different web sites.

The third step narrows the focus of the analysis towards use case-specific issues. Based on the requirements of a marketplace, a news aggregator, and a travel portal, the thesis empirically examines the utility of semantic annotations for these use cases. Additional experiments analyze the capability of product-related semantic annotations to be integrated into an existing product categorization schema. In particular, the potential of exploiting the diverse category information given by the web sites providing semantic annotations is evaluated.

Keywords:

Dataspace Profiling, RDFa, Microformats, Microdata, Schema.org, Crawling

Full-text:

The full text of the thesis is available from the MADOC document server.

44.2 billion quads of Microdata, Embedded JSON-LD, RDFa, and Microformat data published
Tue, 17 Jan 2017
http://dws.informatik.uni-mannheim.de/en/news/singleview/detail/News/442-billion-quads-microdata-embedded-json-ld-rdfa-and-microformat-data-published/

The DWS group is happy to announce a new release of the WebDataCommons Microdata, Embedded JSON-LD, RDFa, and Microformat data corpus.

The data has been extracted from the October 2016 version of the Common Crawl, covering 3.2 billion HTML pages which originate from 34 million websites (pay-level domains).

Altogether, we discovered structured data within 1.2 billion HTML pages out of the 3.2 billion pages contained in the crawl (38%). These pages originate from 5.6 million different pay-level domains out of the 34 million pay-level domains covered by the crawl (16.5%).

Approximately 2.5 million of these websites use Microdata, 2.1 million websites employ JSON-LD, and 938 thousand websites use RDFa. Microformats are used by over 1.6 million websites within the crawl.

Background: 

More and more websites annotate structured data within their HTML pages using markup formats such as RDFa, Microdata, embedded JSON-LD, and Microformats. The annotations cover topics such as products, reviews, people, organizations, places, events, and cooking recipes.

The WebDataCommons project extracts all Microdata, RDFa, and Microformat data, and since 2015 also embedded JSON-LD data, from the Common Crawl web corpus, the largest and most up-to-date web corpus that is available to the public, and provides the extracted data for download. In addition, we publish statistics about the adoption of the different markup formats as well as the vocabularies that are used together with each format.

Besides the markup data, the WebDataCommons project also provides large web table corpora and web graphs for download. General information about the WebDataCommons project is found at 

webdatacommons.org 


Data Set Statistics: 

Basic statistics about the October 2016 Microdata, Embedded JSON-LD, RDFa, and Microformat data sets as well as the vocabularies that are used together with each markup format are found at:

webdatacommons.org/structureddata/2016-10/stats/stats.html

Comparing the statistics to those of the November 2015 release of the data sets

webdatacommons.org/structureddata/2015-11/stats/stats.html

we see that the Microdata syntax remains the most dominant annotation format. Although it is hard to compare the adoption of the syntax between the two years in absolute numbers, as the October 2016 crawl corpus is almost double the size of the November 2015 one, a relative increase can be observed: in the October 2016 corpus, over 44% of the pay-level domains containing markup data make use of the Microdata syntax, compared to 40% one year earlier. Even though the absolute numbers concerning the adoption of the RDFa markup syntax rise, the relative increase does not keep up with the growth of the corpus size, indicating that RDFa is used by relatively fewer websites. Similar to the 2015 release, the adoption of embedded JSON-LD has considerably increased, even though the main focus of the annotation remains the search action offered by the websites (70%).

As already observed in the previous years, the schema.org vocabulary is most frequently used in the context of Microdata, while the adoption of its predecessor, data-vocabulary.org, continues to decrease. In the context of RDFa, we still find the Open Graph Protocol recommended by Facebook to be the most widely used vocabulary.

Topic-wise, the trends identified in the former extractions continue. We see that besides navigational, blog, and CMS-related meta-information, many websites annotate e-commerce-related data (Products, Offers, and Reviews) as well as contact information (LocalBusiness, Organization, PostalAddress). More concretely, the October 2016 corpus includes more than 682 million product records originating from 249 thousand websites which use the schema.org vocabulary. The new release contains postal address data for more than 291 million entities originating from 338 thousand websites. Furthermore, the content describing hotels has doubled in size in this release, reaching a total of 61 million hotel descriptions.

Visualizations of the main adoption trends concerning the different annotation formats as well as popular schema.org and RDFa classes within the time span 2012 to 2016 are found at

webdatacommons.org/structureddata/

Download:

The overall size of the October 2016 Microdata, RDFa, Embedded JSON-LD, and Microformat data sets is 44.2 billion RDF quads. For download, we split the data into 9,661 files with a total size of 987 GB. 

webdatacommons.org/structureddata/2016-10/stats/how_to_get_the_data.html

In addition, we have created separate files for over 40 different schema.org classes, each containing all quads from pages that deploy the specific class at least once.

webdatacommons.org/structureddata/2016-10/stats/schema_org_subsets.html

Lots of thanks to: 

  • the Common Crawl project for providing their great web crawl and thus enabling the WebDataCommons project.
  • the Any23 project for providing their great library of structured data parsers.
  • Amazon Web Services in Education Grant for supporting WebDataCommons.
  • the Ministry of Economy, Research and Arts of Baden-Württemberg, which supported the extraction and analysis of the October 2016 corpus by means of the ViCE project.


Have fun with the new data set. 

Anna Primpeli, Robert Meusel and Chris Bizer
