The workshop BOSS'21 will be held in conjunction with the
47th International Conference on
Very Large Data Bases
Held in a hybrid format, August 16 - 20, 2021
Due to the current COVID-19 situation, the VLDB Conference 2021 will have a hybrid format; see https://vldb.org/2021/?info-covid19 for more information. BOSS'21 will likewise be held in a hybrid format.
Following the great success of the previous BOSS workshops collocated with VLDB since 2015, the seventh Workshop on Big Data Open Source Systems (BOSS'21) will again give a deep-dive introduction into several active, publicly available and open-source systems. This year we are especially interested in systems that focus on the interoperability between big data systems and components that can be used for building other data systems.
PDT | CEST | Activity description
00:45 - 01:00 | 09:45 - 10:00 | Welcome and introduction
01:00 - 02:30 | 10:00 - 11:30 | Tutorial: Apache Wayang: A Big Data Cross-Platform System
Presenters: Zoi Kaoudi, TU Berlin; Bertty Contreras Rojas, Scalytics Inc.; Rodrigo Pardo Meza, Scalytics Inc.
02:30 - 04:00 | 11:30 - 13:00 | Tutorial: Apache Calcite
Abstract: Apache Calcite is a dynamic data management framework. Think of it as a toolkit for building databases: it has an industry-standard SQL parser, validator, highly customizable optimizer (with pluggable transformation rules and cost functions, relational algebra, and an extensive library of rules), but it has no preferred storage primitives. In this tutorial, the attendees will use Apache Calcite to build a fully fledged query processor from scratch with very few lines of code. This processor is a full implementation of SQL over an Apache Lucene storage engine. (Lucene does not support SQL queries and lacks a declarative language for performing complex operations such as joins or aggregations.) Attendees will also learn how to use Calcite as an effective tool for research.
Presenters: Julian Hyde, Google; Stamatis Zampetakis, Cloudera
05:00 - 06:30 | 14:00 - 15:30 | Tutorial: Apache Arrow
Presenters: Wes McKinney, Ursa Computing; David Li, Ursa Computing
06:30 - 08:00 | 15:30 - 17:00 | Tutorial: Geospatial data management and analysis with Apache AsterixDB
Abstract: The volume of geospatial data is increasing enormously, and geospatial data analysis is essential to unveiling its potential. However, managing and analysing geospatial data is expensive because of the complex representation of spatial objects and the computationally heavy operations involved. This tutorial provides hands-on experience with how Apache AsterixDB integrates geospatial support into its system components at all levels, including its flexible data model, SQL++ query language, distributed internal and external storage engine, secondary indexes, fast data ingestion layer, and scalable, data-parallel query execution. Attendees will learn topics ranging from geospatial dataset management to executing advanced spatial queries on Apache AsterixDB.
Presenters: Ahmed Eldawy, University of California, Riverside; Akil Sevim, University of California, Riverside; Ian Maxon, University of California, Irvine; Mehnaz Tabassum Mahin, University of California, Riverside; Michael Carey, University of California, Irvine; Tin Vu, University of California, Riverside; Vassilis Tsotras, University of California, Riverside
08:00 - 09:00 | 17:00 - 18:00 | Keynote: Lessons learned from building and growing Apache Spark
Abstract: Started at UC Berkeley over a decade ago, Apache Spark has become one of the most successful projects in the data space. It is widely adopted and is the foundational technology underpinning many data platform companies, including Databricks. In this talk, I will discuss the journey and some of the lessons learned in building this open source project.
Speaker: Reynold Xin, Databricks
Bio: Reynold is a cofounder at Databricks, where he works on realizing the Lakehouse vision, including driving Spark development. He got involved when his advisor Mike Franklin and Ion Stoica wanted a PhD student to build a SQL engine on top of Spark (little did they know that Reynold had never taken a database class and didn't even know what an operator was). He has contributed to the project in various ways: as an evangelist (gave ~50 talks in one year), an architect (incorporated all the database goodies to make the VLDB community happy), and a code monkey (#1 in project commits).
Proposals for tutorials are accepted until June 6, 2021
Accepted presenters will be notified by June 27, 2021
⇒ BOSS'15 on September 4, 2015, in conjunction with VLDB 2015
⇒ BOSS'16 on September 9, 2016, in conjunction with VLDB 2016
⇒ BOSS'17 on September 1, 2017, in conjunction with VLDB 2017
⇒ BOSS'18 on August 27, 2018, in conjunction with VLDB 2018
⇒ BOSS'19 on August 26, 2019, in conjunction with VLDB 2019
⇒ BOSS'20 on September 4, 2020, in conjunction with VLDB 2020