BOSS Workshop 2021

About BOSS

The workshop BOSS'21 will be held in conjunction with the

47th International Conference on
Very Large Data Bases
Held in a hybrid format, August 16 - 20, 2021

Message on Covid-19 (SARS-CoV-2) and BOSS 2021

Due to the current situation with COVID-19, the VLDB Conference 2021 will have a hybrid format. See https://vldb.org/2021/?info-covid19 for more information. BOSS'21 will likewise be held in a hybrid format.

Workshop Date

August 16, 2021

Following the great success of the previous BOSS workshops collocated with VLDB since 2015, the seventh Workshop on Big Data Open Source Systems (BOSS'21) will again give a deep-dive introduction into several active, publicly available and open-source systems. This year we are especially interested in systems that focus on the interoperability between big data systems and components that can be used for building other data systems.

Each system will be presented in a tutorial given by experts on that system.
The tutorials will cover installation, non-trivial usage, and examples of the presented system.

Workshop Program

PDT time	CEST time	Activity description
00:45 - 01:00	09:45 - 10:00	Welcome and introduction
01:00 - 02:30	10:00 - 11:30	Tutorial: Apache Wayang: A Big Data Cross-Platform System. Presenters: Zoi Kaoudi, TU Berlin; Bertty Contreras Rojas, Scalytics Inc.; Rodrigo Pardo Meza, Scalytics Inc.
02:30 - 04:00	11:30 - 13:00	Tutorial: Apache Calcite. Abstract: Apache Calcite is a dynamic data management framework. Think of it as a toolkit for building databases: it has an industry-standard SQL parser, a validator, and a highly customizable optimizer (with pluggable transformation rules and cost functions, relational algebra, and an extensive library of rules), but it has no preferred storage primitives. In this tutorial, attendees will use Apache Calcite to build a fully fledged query processor from scratch with very few lines of code. This processor is a full implementation of SQL over an Apache Lucene storage engine. (Lucene does not support SQL queries and lacks a declarative language for performing complex operations such as joins or aggregations.) Attendees will also learn how to use Calcite as an effective tool for research. Presenters: Julian Hyde, Google; Stamatis Zampetakis, Cloudera

05:00 - 06:30	14:00 - 15:30	Tutorial: Apache Arrow. Presenters: Wes McKinney, Ursa Computing; David Li, Ursa Computing
06:30 - 08:00	15:30 - 17:00	Tutorial: Geospatial data management and analysis with Apache AsterixDB. Abstract: The volume of geospatial data is increasing enormously, and geospatial data analysis is an essential task for unveiling its potential. However, geospatial data is expensive to manage and analyse because of the complex representation of spatial objects and computationally heavy operations. This tutorial provides hands-on experience with how Apache AsterixDB integrates geospatial support into its system components at all levels, including its flexible data model, SQL++ query language, distributed internal and external storage engine, secondary indexes, fast data ingestion layer, and scalable, data-parallel query execution. Attendees will learn topics ranging from geospatial dataset management to executing advanced spatial queries on Apache AsterixDB. Presenters: Ahmed Eldawy, University of California, Riverside; Akil Sevim, University of California, Riverside; Ian Maxon, University of California, Irvine; Mehnaz Tabassum Mahin, University of California, Riverside; Michael Carey, University of California, Irvine; Tin Vu, University of California, Riverside; Vassilis Tsotras, University of California, Riverside
08:00 - 09:00	17:00 - 18:00	Keynote: Lessons learned from building and growing Apache Spark. Abstract: Started at UC Berkeley over a decade ago, Apache Spark has become one of the most successful projects in the data space. It is widely adopted and is the foundational technology underpinning many data platform companies, including Databricks. In this talk, I will discuss the journey and some of the lessons learned in building this open source project. Speaker: Reynold Xin, Databricks. Bio: Reynold is a cofounder at Databricks, where he works on realizing the Lakehouse vision, including driving Spark development. He got involved when his advisor Mike Franklin and Ion Stoica wanted a PhD student to build a SQL engine on top of Spark (little did they know that Reynold had never taken a database class and didn't even know what an operator was). He has contributed to the project in various ways, as an evangelist (gave ~50 talks in one year), an architect (incorporated all the database goodies to make the VLDB community happy), and a code monkey (#1 in project commits).

Workshop Organization

Workshop Chairs:

Jorge-Arnulfo Quiané-Ruiz, TU Berlin, jorge.quiane@tu-berlin.de
Aaron J. Elmore, University of Chicago, aelmore@cs.uchicago.edu

Advisory Committee:

Tilmann Rabl, HPI
Michael Carey, UC Irvine
Volker Markl, TU Berlin

Call for tutorials

Important Dates:
Proposals for tutorials are accepted until June 6, 2021

Accepted presenters will be notified by June 27, 2021
To propose a tutorial, please email
- a short abstract with a brief description of the system,
- an outline of the planned tutorial,
- the technology used for the hands-on tutorial,
- a list of the presenters involved, and
- a link to the website of your system
to vldb.boss.workshop@gmail.com.

Note that the standard tutorial duration is 1.5 to 2 hours.

Selection Process for Tutorials

The chairs and the advisory committee will evaluate the proposals for system readiness, relevance, timeliness, and perceived interest among conference participants.

Previous Editions

⇒ BOSS'15 on September 4, 2015, in conjunction with VLDB 2015

⇒ BOSS'16 on September 9, 2016, in conjunction with VLDB 2016

⇒ BOSS'17 on September 1, 2017, in conjunction with VLDB 2017

⇒ BOSS'18 on August 27, 2018, in conjunction with VLDB 2018

⇒ BOSS'19 on August 26, 2019, in conjunction with VLDB 2019

⇒ BOSS'20 on September 4, 2020, in conjunction with VLDB 2020