SWIS: Sophisticated Web Information Service

The SWIS project has enhanced LuxActive's scraper and crawler to efficiently search the internet for tourist data and store it in a knowledge graph. The processed data and services are accessible to third parties and can be integrated into data markets or used directly.

Short Description

Data have become a valuable raw material for many industries, as their availability and quality often give companies a competitive advantage.

LuxActive's scraper and crawler was expanded and further developed in the SWIS project. SWIS scours the World Wide Web using state-of-the-art machine learning and deep learning methods, originally to extract tourism data from websites and documents. The extracted data are enriched with geo-positions, descriptive texts, categories and opening hours and saved in the form of a knowledge graph. These data and further services are now accessible to third parties for the first time through LuxActive. In addition to the data extracted and processed from the World Wide Web, the project also provides the services that were developed as part of the scraper and crawler and are used within it.

Specifically, this means:

  1. The performance of the scraper and crawler has been improved significantly; it now processes 400 to 800 million new data records per day per server.
  2. Individual services such as address extraction, opening hours extraction or geocoding can now be purchased and used separately from the scraper and crawler by third parties.
  3. The scraper and crawler architecture has been extended to include a loosely coupled extraction service. This is built around a database similar to a multilayered data warehouse. The bottom layer contains the SWIS graph with all of the raw and extraction data. The enrichment and analysis take place in the layer located above this in order to create the target data sets for a wide range of use cases (e.g. a list of doctors or shops). For example, it can be used to identify all shops within a certain sector (e.g. sports shops) and within a radius of 50 kilometres, which makes it much easier for a chain of sports shops to plan locations for new sites, as it provides a comprehensive picture of competitors.
  4. All SWIS services use a rate limit as well as a payment and billing service, so that these services can also be used by third parties.
  5. All APIs and services are prepared in such a way that they can be easily integrated into a data market (such as Data Market Austria), but can still be acquired and used in parallel without a data market.
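The sector-and-radius query mentioned in point 3 can be illustrated with a minimal sketch. It is not the SWIS implementation: the `Place` record, the `places_within` function and the sample coordinates are assumptions made for illustration, and a plain in-memory list stands in for the target data sets. Distances are computed with the haversine formula on a spherical Earth.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


@dataclass(frozen=True)
class Place:
    """Hypothetical record for one entry of a target data set."""
    name: str
    category: str
    lat: float  # latitude in degrees
    lon: float  # longitude in degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def places_within(places, category, lat, lon, radius_km=50.0):
    """Return places of one category within radius_km, nearest first."""
    hits = []
    for place in places:
        if place.category != category:
            continue
        distance = haversine_km(lat, lon, place.lat, place.lon)
        if distance <= radius_km:
            hits.append((distance, place))
    hits.sort(key=lambda pair: pair[0])
    return [place for _, place in hits]


# Made-up sample data; the coordinates are approximate city centres.
SAMPLE = [
    Place("Sport A", "sports_shop", 48.2100, 16.3700),  # Vienna
    Place("Sport B", "sports_shop", 48.0069, 16.2308),  # Baden, roughly 25 km away
    Place("Sport C", "sports_shop", 47.0707, 15.4395),  # Graz, well over 100 km away
    Place("Bakery D", "bakery", 48.2090, 16.3720),      # Vienna, other sector
]

nearby = places_within(SAMPLE, "sports_shop", 48.2082, 16.3738)
print([place.name for place in nearby])
```

A production system would typically push this filter into a spatial index or the database itself rather than scanning every record; the linear scan here only shows the logic of the query.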

The following is an example of the text cleaning and address extraction service applied to a transmitted text. In the text, the different parts of the address are displayed in different colours. Third parties can now have these address parts identified by submitting texts of their choice to the API service.
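To make the idea of labelled address parts concrete, here is a deliberately simple, rule-based sketch. It does not reproduce the SWIS service, which according to the description above relies on machine learning and deep learning methods, and it does not call any SWIS API. The pattern, the part names (`street`, `house_number`, `postal_code`, `city`) and the sample sentence are assumptions; the pattern only covers addresses written as "Street Number, four-digit postal code City".

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for addresses of the form
# "<Street> <Number>, <4-digit postal code> <City>".
ADDRESS_PATTERN = re.compile(
    r"(?P<street>[A-ZÄÖÜ][\w.\-]*(?:\s[A-ZÄÖÜ][\w.\-]*)*)\s+"
    r"(?P<house_number>\d+[a-zA-Z]?),\s*"
    r"(?P<postal_code>\d{4})\s+"
    r"(?P<city>[A-ZÄÖÜ][\w\-]*(?:\s[A-ZÄÖÜ][\w\-]*)*)"
)


def extract_address(text: str) -> Optional[Dict[str, str]]:
    """Return the labelled parts of the first address found, or None."""
    match = ADDRESS_PATTERN.search(text)
    return match.groupdict() if match else None


sample = "Our shop is at Mariahilfer Straße 12, 1070 Wien. Open daily."
parts = extract_address(sample)
print(parts)
```

Each key of the returned dictionary corresponds to one address part that a front end could highlight in its own colour. Real-world text varies far more than a single pattern can cover, which is why a learned extraction model is the more robust choice.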

Publications

Brochure: Digital Technologies (2024)

Ready for the Future: Smart, Green and Visionary. Project Highlights of the Years 2016 to 2021. FFG: Olaf Hartmann, Anita Hipfinger, Peter Kerschl
Publisher: Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation, and Technology
English, 72 pages

Project Partners

Project management

  • LuxActive KG