About BOSS

The workshop BOSS'21 will be held in conjunction with the

47th International Conference on
Very Large Data Bases
Held in a hybrid format, August 16 - 20, 2021

Message on COVID-19 (SARS-CoV-2) and BOSS 2021

Due to the current situation with COVID-19, the VLDB Conference 2021 will have a hybrid format. See https://vldb.org/2021/?info-covid19 for more information. BOSS'21 will likewise be held in a hybrid format.

Workshop Date
  August 16th, 2021

Following the great success of the previous BOSS workshops co-located with VLDB since 2015, the seventh Workshop on Big Data Open Source Systems (BOSS'21) will again give a deep-dive introduction to several active, publicly available, open-source systems. This year we are especially interested in systems that focus on interoperability between big data systems and in components that can be used to build other data systems.

  • Each system will be presented in a tutorial by experts on that system.
  • The tutorials will cover installation, non-trivial usage, and examples for each presented system.

Workshop Program


PDT time CEST time Activity description
00:45 - 01:00 09:45 - 10:00 Welcome and introduction
01:00 - 02:30 10:00 - 11:30 Tutorial: Apache Wayang: A Big Data Cross-Platform System

Zoi Kaoudi, TU Berlin
Bertty Contreras Rojas, Scalytics Inc.
Rodrigo Pardo Meza, Scalytics Inc.

02:30 - 04:00 11:30 - 13:00 Tutorial: Apache Calcite

Abstract: Apache Calcite is a dynamic data management framework. Think of it as a toolkit for building databases: it has an industry-standard SQL parser, validator, highly customizable optimizer (with pluggable transformation rules and cost functions, relational algebra, and an extensive library of rules), but it has no preferred storage primitives. In this tutorial, the attendees will use Apache Calcite to build a fully fledged query processor from scratch with very few lines of code. This processor is a full implementation of SQL over an Apache Lucene storage engine. (Lucene does not support SQL queries and lacks a declarative language for performing complex operations such as joins or aggregations.) Attendees will also learn how to use Calcite as an effective tool for research.

Julian Hyde, Google
Stamatis Zampetakis, Cloudera

05:00 - 06:30 14:00 - 15:30 Tutorial: Apache Arrow

Wes McKinney, Ursa Computing
David Li, Ursa Computing

06:30 - 08:00 15:30 - 17:00 Tutorial: Geospatial data management and analysis with Apache AsterixDB

Abstract: The volume of geospatial data is increasing enormously, and geospatial data analysis is essential to unveiling its potential. However, managing and analysing geospatial data is expensive due to the complex representation of spatial objects and the computationally heavy operations involved. This tutorial provides hands-on experience of how Apache AsterixDB integrates geospatial support into its system components at all levels, including its flexible data model, SQL++ query language, distributed internal and external storage engine, secondary indexes, fast data ingestion layer, and scalable, data-parallel query execution. Attendees will learn topics ranging from geospatial dataset management to executing advanced spatial queries on Apache AsterixDB.

Ahmed Eldawy, University of California, Riverside
Akil Sevim, University of California, Riverside
Ian Maxon, University of California, Irvine
Mehnaz Tabassum Mahin, University of California, Riverside
Michael Carey, University of California, Irvine
Tin Vu, University of California, Riverside
Vassilis Tsotras, University of California, Riverside

08:00 - 09:00 17:00 - 18:00 Keynote: Lessons learned from building and growing Apache Spark

Abstract: Started at UC Berkeley over a decade ago, Apache Spark has become one of the most successful projects in the data space. It is widely adopted and is the foundational technology underpinning many data platform companies, including Databricks. In this talk, I will discuss the journey and some of the lessons learned in building this open source project.

Reynold Xin, Databricks

Bio: Reynold is a cofounder at Databricks, where he works on realizing the Lakehouse vision, including driving Spark development. He got involved when his advisor Mike Franklin and Ion Stoica wanted a PhD student to build a SQL engine on top of Spark (little did they know that Reynold had never taken a database class and didn't even know what an operator was). He has contributed to the project in various ways, as an evangelist (gave ~50 talks in one year), an architect (incorporated all the database goodies to make the VLDB community happy), and a code monkey (#1 in project commits).

Workshop Organization

Workshop Chairs:
  • Jorge-Arnulfo Quiané-Ruiz, TU Berlin, jorge.quiane@tu-berlin.de
  • Aaron J. Elmore, University of Chicago, aelmore@cs.uchicago.edu

Advisory Committee:
  • Tilmann Rabl, HPI
  • Michael Carey, UC Irvine
  • Volker Markl, TU Berlin

Call for tutorials
  • Important Dates:

    Proposals for tutorials are accepted until June 6, 2021

    Accepted presenters will be notified by June 27, 2021

    In order to propose a tutorial, please email
    • a short abstract with a brief description of the system,
    • an outline of the planned tutorial,
    • the technology used for the hands-on tutorial,
    • a list of presenters involved,
    • and a link to the website of your system

    to vldb.boss.workshop@gmail.com

    Note that the standard tutorial duration is 1.5 to 2 hours.

Selection Process for Tutorials

The proposals will be evaluated by the chairs and the advisory committee for system readiness, relevance, timeliness, and perceived interest among conference participants.

Previous Editions

BOSS'15 on September 4, 2015, in conjunction with VLDB 2015

BOSS'16 on September 9, 2016, in conjunction with VLDB 2016

BOSS'17 on September 1, 2017, in conjunction with VLDB 2017

BOSS'18 on August 27, 2018, in conjunction with VLDB 2018

BOSS'19 on August 26, 2019, in conjunction with VLDB 2019

BOSS'20 on September 4, 2020, in conjunction with VLDB 2020