This Week in RisingWave #8

· 5 min read
xxchan
Tao Wu
Yuhao Su
Yuanxin Cao

This week, we celebrated RisingWave's v0.18.0 release and its first birthday! We talked about (many) new functions, system functions, generated columns, the dedup operator, and more.

RisingWave’s v0.18.0 release & first birthday 🎉🎉🎉

I'm excited to let you know that RisingWave 0.18.0 has been released this week! And it's even more special because it's also the 1st birthday of RisingWave!

In this new version, we've added several critical features like UDF support, the JSONB type, and a bunch of new SQL functions. Check out the release notes to see what's new: Release v0.18.0

Features Updates 🌟

(Many!) new functions

This week, we added several new functions to catch up with the feature set of mainstream SQL databases. The expanded SQL functionality allows users to easily migrate their Postgres queries to RisingWave.

This was made easier by last week’s refactoring of the expression framework. The refactoring significantly increased productivity and also sparked interest in contributing more functions. This is a great opportunity to start contributing to RisingWave. We welcome you to join forces with us!

Generated columns

A new mind-blowing SQL feature! 🤯😆

A generated column is a column whose value is computed from an expression rather than inserted manually. The expression can be based on other columns of the same row, or it can produce a value without reading any other column (one important use case behind this feature is proc_time()).

Here’s an example:

CREATE TABLE t1(v1 int, v2 int as v1+1);

PostgreSQL's documentation distinguishes two kinds of generated columns: STORED and VIRTUAL. In RisingWave, generated columns on TABLE and SOURCE correspond to these two kinds respectively. (Learn more about RisingWave’s TABLE and SOURCE here)

When users create a TABLE with generated columns, all generated values are stored. When users create a SOURCE with generated columns, the generated values are not stored. Instead, a Project is created when the source is queried.
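To make the difference concrete, here is a minimal Python sketch of the two behaviors. It is only a toy model, not RisingWave's implementation: StoredTable, VirtualSource, and gen_v2 are hypothetical names standing in for a TABLE, a SOURCE, and the expression v1+1 from the example above.

```python
from typing import Dict, List

Row = Dict[str, int]


def gen_v2(row: Row) -> int:
    """Hypothetical stand-in for the generated column expression `v1 + 1`."""
    return row["v1"] + 1


class StoredTable:
    """Toy model of a TABLE: the generated value is computed on insert and stored."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def insert(self, v1: int) -> None:
        row = {"v1": v1}
        row["v2"] = gen_v2(row)  # materialized once, at write time
        self.rows.append(row)

    def scan(self) -> List[Row]:
        return [dict(r) for r in self.rows]


class VirtualSource:
    """Toy model of a SOURCE: only v1 exists upstream; v2 is projected on read."""

    def __init__(self, upstream: List[int]) -> None:
        self.upstream = upstream  # nothing generated is stored here

    def scan(self) -> List[Row]:
        # The "Project" step: compute v2 for every row at query time.
        return [{"v1": v1, "v2": gen_v2({"v1": v1})} for v1 in self.upstream]


table = StoredTable()
table.insert(1)
table.insert(2)
source = VirtualSource([1, 2])

# Both return the same rows; they differ only in when v2 is computed.
print(table.scan())
print(source.scan())
```

Both scans return the same rows. The trade-off is between storage (the table keeps v2) and repeated computation (the source recomputes v2 on every query).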

Dedup operator

Previously, we planned Dedup (i.e., DISTINCT ON SQL queries) as something like a GroupTopN with limit 1. While this approach is generally acceptable, there is room for improvement. For example, being append-only is a very important property in a streaming context. An append-only stream can be much cheaper than an updatable stream. A GroupTopN stream is (unfortunately) updatable: the largest/smallest items need to be updated when new data arrives. However, deduplicating elements (by selecting the first arrived element) is actually append-only!

Therefore, we introduced a new operator, the dedup operator, to optimize DISTINCT ON SQL queries. The dedup operator preserves the stream’s append-only property, which enables watermarks and better downstream performance.
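The difference in output streams can be illustrated with a small Python simulation. This is a sketch under simplifying assumptions, not RisingWave's operator code: group_top1_smallest and dedup_first_arrival are hypothetical names, and a change is modeled as a tuple (op, key, value) where "+" is an insert and "-" is a retraction.

```python
from typing import Dict, List, Set, Tuple

# A change is (op, key, value): "+" inserts a row, "-" retracts one.
Change = Tuple[str, str, int]


def group_top1_smallest(rows: List[Tuple[str, int]]) -> List[Change]:
    """Top-1 per key by smallest value: a better row retracts the old output."""
    best: Dict[str, int] = {}
    out: List[Change] = []
    for key, value in rows:
        if key not in best:
            best[key] = value
            out.append(("+", key, value))
        elif value < best[key]:
            out.append(("-", key, best[key]))  # retract the previous winner
            out.append(("+", key, value))
            best[key] = value
    return out


def dedup_first_arrival(rows: List[Tuple[str, int]]) -> List[Change]:
    """Keep the first row seen per key: the output never needs a retraction."""
    seen: Set[str] = set()
    out: List[Change] = []
    for key, value in rows:
        if key not in seen:
            seen.add(key)
            out.append(("+", key, value))
    return out


rows = [("a", 5), ("b", 7), ("a", 3), ("b", 9)]
top1_changes = group_top1_smallest(rows)
dedup_changes = dedup_first_arrival(rows)
print(top1_changes)
print(dedup_changes)
```

On this input, the top-1 stream contains a retraction for key "a" when the smaller value 3 arrives, while the dedup stream consists of inserts only. Note that the two are not interchangeable in general: the sketch relies on "first arrived" being the desired row.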

Table size system functions

Currently we have a Grafana dashboard that shows the size of MVs, but in some cases, it might be more user-friendly to expose this information through SQL functions. @erichgess, one of our new and very active contributors, is working on this feature.

Despite my repeated mentions in previous newsletters, system tables and functions remain challenging, and we encountered a new difficulty related to this feature. As a distributed database, RisingWave’s meta node manages a lot of important information about the system (i.e., the system catalog), and the frontend node subscribes to it for changes to the system catalog. However, compute nodes, and specifically the expression framework, lack access to this system catalog. Typically, supporting a system function involves querying the system catalog directly in the frontend's binder and "inlining" the results. Unfortunately, some complex queries must be evaluated within the expression framework itself, which is currently not possible.

Reliability Improvements 🚀

Sink validation

feat(sink): reject invalid options when creating sink by xx01cyx #8757

We will implement a more rigorous validation process for user input when creating a sink. Any invalid options will be rejected instead of being ignored, and users will receive a clear error message indicating the reason for rejection.
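The idea of rejecting instead of ignoring can be sketched in a few lines of Python. This is an illustration only and does not mirror the code in the PR: ALLOWED_OPTIONS, InvalidSinkOption, and validate_sink_options are hypothetical names, and the option keys are placeholders rather than an actual list of supported sink options.

```python
from typing import Dict

# Hypothetical whitelist; real sinks define their own option sets.
ALLOWED_OPTIONS = {"connector", "topic", "format"}


class InvalidSinkOption(ValueError):
    """Raised when a sink is created with an option that is not recognized."""


def validate_sink_options(options: Dict[str, str]) -> Dict[str, str]:
    """Return the options unchanged, or fail loudly on any unknown key."""
    unknown = sorted(set(options) - ALLOWED_OPTIONS)
    if unknown:
        # Reject with a clear reason instead of silently dropping the key.
        raise InvalidSinkOption(f"unknown sink option(s): {', '.join(unknown)}")
    return dict(options)


accepted = validate_sink_options({"connector": "kafka", "topic": "events"})
print(accepted)

try:
    validate_sink_options({"connector": "kafka", "tpoic": "events"})
except InvalidSinkOption as err:
    print(f"rejected: {err}")
```

With silent ignoring, the misspelled key in the second call would go unnoticed and the sink would be created without a topic. With strict validation, the typo surfaces immediately at creation time.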

New contributors

We have two new first-time contributors this week: @gengteng and @lyang24.

In the meantime, after his 2nd PR from last week, @broccoliSpicy submitted his 3rd PR: fix(binder): Incorrect cast when specifying columns by broccoliSpicy #8770.

Have a good time hacking RisingWave while deepening your understanding of the Streaming SQL domain throughout the process!


Finally, you are welcome to join the RisingWave Slack community. Also check out the good first issue and help wanted issues if you want to join the development of an open source database system!

That's all for this week. See you next week! 🤗