
This Week in RisingWave #2

xxchan

This blog series is my personal commentary on (part of) the development of RisingWave.

Please take it as an unofficial supplement, with no promises.

Most exciting things 🤩

Common sub-plan sharing

We decided some time ago to refactor the tree-structured plan into a DAG (see RFC #28). The biggest benefit is that it enables common sub-plan sharing.

We had already implemented LogicalShare, which represents a plan node that will be shared, and used it to share sources, subqueries, CTEs, and views. Now comes the last piece of the puzzle: common sub-plan sharing.
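To make the idea concrete, here is a minimal sketch of one way sub-plan sharing can work once plans form a DAG: structurally identical sub-plans are deduplicated (hash-consed) into a single node, so a repeated sub-plan gets several parents instead of being copied. This is not RisingWave's optimizer code; `PlanNode`, `PlanDag`, and their variants are hypothetical and heavily simplified.

```rust
use std::collections::HashMap;

/// Index of a node inside `PlanDag::nodes`.
type NodeId = usize;

/// A hypothetical, simplified logical plan node. Children are referenced by
/// id, so one node can have several parents: the plan is a DAG, not a tree.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum PlanNode {
    Scan { table: String },
    Filter { predicate: String, input: NodeId },
    Join { left: NodeId, right: NodeId },
}

/// Builds a plan DAG, merging identical sub-plans into one node.
#[derive(Default)]
struct PlanDag {
    nodes: Vec<PlanNode>,
    dedup: HashMap<PlanNode, NodeId>,
}

impl PlanDag {
    /// Returns the id of an existing identical node, or inserts a new one.
    /// Children are already deduplicated, so comparing child ids is enough
    /// to detect structurally identical sub-plans.
    fn add(&mut self, node: PlanNode) -> NodeId {
        if let Some(&id) = self.dedup.get(&node) {
            return id;
        }
        let id = self.nodes.len();
        self.nodes.push(node.clone());
        self.dedup.insert(node, id);
        id
    }

    /// Counts parent edges per node; a count above 1 marks a shared sub-plan.
    fn parent_counts(&self) -> Vec<usize> {
        let mut counts = vec![0; self.nodes.len()];
        for node in &self.nodes {
            match node {
                PlanNode::Scan { .. } => {}
                PlanNode::Filter { input, .. } => counts[*input] += 1,
                PlanNode::Join { left, right } => {
                    counts[*left] += 1;
                    counts[*right] += 1;
                }
            }
        }
        counts
    }
}

fn main() {
    let mut dag = PlanDag::default();
    // Build `Filter(Scan t)` twice, as a self-join over the same subquery would.
    let scan1 = dag.add(PlanNode::Scan { table: "t".into() });
    let f1 = dag.add(PlanNode::Filter { predicate: "a > 1".into(), input: scan1 });
    let scan2 = dag.add(PlanNode::Scan { table: "t".into() });
    let f2 = dag.add(PlanNode::Filter { predicate: "a > 1".into(), input: scan2 });
    let join = dag.add(PlanNode::Join { left: f1, right: f2 });

    assert_eq!(f1, f2); // the second filter reuses the first node
    assert_eq!(dag.nodes.len(), 3); // Scan, Filter, Join
    assert_eq!(dag.parent_counts()[f1], 2); // the shared Filter feeds both join inputs
    println!("join node id = {join}");
}
```

Deduplicating at construction time is only one option; an optimizer could equally detect common sub-plans in a separate pass after the plan is built.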

Python UDF

Unlike Flink, RisingWave uses SQL as its main interface, which makes it easier to use. But sometimes users do want some custom logic... We are running some interesting experiments on this topic, and this is a small milestone: you can already play with Python UDFs!

feat(udf): minimal Python UDF SDK by wangrunji0408 #7943

Other notable things

Telemetry

Telemetry can greatly help us understand and improve how the system behaves in real-world scenarios!

Replace Bloom filter with XOR filter

Stream error reporting

RisingWave tolerates compute errors by default (see #4625), but previously the errors were only shown in the log. Now we are working on reporting them to users (see #7824).
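As a rough illustration of what tolerating compute errors can look like (the exact behavior here is an assumption, not RisingWave's code): a row whose expression fails, here a division by zero, produces NULL instead of failing the whole stream, and the error is collected so it could be shown to users rather than only logged. `ErrorReporter` and `eval_div` are hypothetical names.

```rust
/// Hypothetical collector for errors that occur while evaluating a stream.
#[derive(Default, Debug)]
struct ErrorReporter {
    errors: Vec<String>,
}

impl ErrorReporter {
    fn report(&mut self, msg: String) {
        self.errors.push(msg);
    }
}

/// Evaluates `a / b` for each row. A failing row yields `None` (SQL NULL)
/// instead of aborting the stream, and the error is recorded for the user.
fn eval_div(rows: &[(i32, i32)], reporter: &mut ErrorReporter) -> Vec<Option<i32>> {
    rows.iter()
        .enumerate()
        .map(|(i, &(a, b))| match a.checked_div(b) {
            Some(v) => Some(v),
            None => {
                // `checked_div` fails on division by zero or overflow.
                reporter.report(format!("row {i}: cannot compute {a} / {b}"));
                None
            }
        })
        .collect()
}

fn main() {
    let mut reporter = ErrorReporter::default();
    let out = eval_div(&[(6, 3), (1, 0), (9, 3)], &mut reporter);
    assert_eq!(out, vec![Some(2), None, Some(3)]);
    assert_eq!(reporter.errors.len(), 1);
    println!("{:?}", reporter.errors);
}
```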

Table schema change

ALTER TABLE is quite tricky for streaming... ADD COLUMN is the more reasonable case, so we are working on it as the first step.

New major SQL features

jsonb data type

The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.

https://www.postgresql.org/docs/current/datatype-json.html
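The trade-off can be sketched with a toy that only understands flat objects with integer values (a real JSON parser is far more involved). `JsonText` keeps the raw input and reparses it on every lookup, like `json`; `JsonBin` parses once on input into a map, like `jsonb`. Both types and the `parse_flat` helper are illustrative assumptions, not how PostgreSQL or RisingWave actually encode JSON.

```rust
use std::cell::Cell;
use std::collections::BTreeMap;

/// Toy parser for flat objects like `{"a": 1, "b": 2}` (integer values only).
/// Returns `None` on anything it does not understand.
fn parse_flat(text: &str) -> Option<BTreeMap<String, i64>> {
    let body = text.trim().strip_prefix('{')?.strip_suffix('}')?.trim();
    let mut map = BTreeMap::new();
    if body.is_empty() {
        return Some(map);
    }
    for pair in body.split(',') {
        let (k, v) = pair.split_once(':')?;
        let key = k.trim().strip_prefix('"')?.strip_suffix('"')?;
        let value: i64 = v.trim().parse().ok()?;
        map.insert(key.to_string(), value); // a later duplicate key wins
    }
    Some(map)
}

/// Like `json`: stores the exact input text and reparses it on each access.
struct JsonText {
    raw: String,
    parses: Cell<usize>, // how many times the text had to be parsed
}

impl JsonText {
    fn new(raw: &str) -> Self {
        JsonText { raw: raw.to_string(), parses: Cell::new(0) }
    }
    fn get(&self, key: &str) -> Option<i64> {
        self.parses.set(self.parses.get() + 1);
        parse_flat(&self.raw)?.get(key).copied()
    }
}

/// Like `jsonb`: pays the parsing cost once, when the value is stored.
struct JsonBin {
    map: BTreeMap<String, i64>,
}

impl JsonBin {
    fn new(raw: &str) -> Option<Self> {
        Some(JsonBin { map: parse_flat(raw)? })
    }
    fn get(&self, key: &str) -> Option<i64> {
        self.map.get(key).copied()
    }
}

fn main() {
    let input = r#"{"a": 1, "b": 2}"#;
    let text = JsonText::new(input);
    let bin = JsonBin::new(input).unwrap();
    for _ in 0..3 {
        assert_eq!(text.get("b"), Some(2));
        assert_eq!(bin.get("b"), Some(2));
    }
    // The text form was parsed three times; the binary form once, up front.
    assert_eq!(text.parses.get(), 3);
}
```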

New array function: array_to_string

feat(expr): support array_to_string by fuyufjh #8027

postgres=> select array_to_string(array[1, 2, 3, NULL, 5], ',');
array_to_string
-----------------
1,2,3,5
(1 row)

postgres=> select array_to_string(array[1, 2, 3, NULL, 5], ',', '*');
array_to_string
-----------------
1,2,3,*,5
(1 row)

It's also called array_join in some other systems, and we added that alias as well.
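The NULL handling in the two queries above fits in a few lines. This is a hypothetical Rust function modeling the semantics shown there (NULL elements are skipped unless a replacement string is given), not RisingWave's actual implementation.

```rust
/// Joins array elements with `delimiter`, like SQL `array_to_string`.
/// NULL elements (`None`) are skipped unless `null_string` is provided,
/// in which case they are rendered as that string.
fn array_to_string(array: &[Option<i32>], delimiter: &str, null_string: Option<&str>) -> String {
    array
        .iter()
        .filter_map(|elem| match (elem, null_string) {
            (Some(v), _) => Some(v.to_string()),
            (None, Some(s)) => Some(s.to_string()),
            (None, None) => None,
        })
        .collect::<Vec<_>>()
        .join(delimiter)
}

fn main() {
    let arr = [Some(1), Some(2), Some(3), None, Some(5)];
    assert_eq!(array_to_string(&arr, ",", None), "1,2,3,5");
    assert_eq!(array_to_string(&arr, ",", Some("*")), "1,2,3,*,5");
}
```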

New aggregate functions: stddev / var

feat: implement stddev/var function by shanicky #7952
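In a streaming engine these aggregates must also handle retractions (deletes). One common approach, sketched below under that assumption, keeps only the count, sum, and sum of squares, all of which can be updated in both directions, and derives the variance on demand. `VarState` is hypothetical. It follows PostgreSQL's definitions: `var_samp`/`stddev_samp` divide by n - 1 and return NULL for fewer than two rows, `var_pop`/`stddev_pop` divide by n, and `variance`/`stddev` are aliases of the sample versions. The naive sum-of-squares formula can lose floating-point precision, which a production implementation would need to address.

```rust
/// Hypothetical retractable aggregate state for variance / standard deviation.
#[derive(Default)]
struct VarState {
    count: i64,
    sum: f64,
    sum_sq: f64,
}

impl VarState {
    fn insert(&mut self, x: f64) {
        self.count += 1;
        self.sum += x;
        self.sum_sq += x * x;
    }

    /// Undoes a previous `insert(x)`, as a streaming delete requires.
    fn retract(&mut self, x: f64) {
        self.count -= 1;
        self.sum -= x;
        self.sum_sq -= x * x;
    }

    /// Sum of squared deviations from the mean: sum(x^2) - sum(x)^2 / n.
    fn sq_dev(&self) -> f64 {
        self.sum_sq - self.sum * self.sum / self.count as f64
    }

    /// Population variance (divide by n); NULL (`None`) for no rows.
    fn var_pop(&self) -> Option<f64> {
        (self.count > 0).then(|| self.sq_dev() / self.count as f64)
    }

    /// Sample variance (divide by n - 1); NULL (`None`) for fewer than two rows.
    fn var_samp(&self) -> Option<f64> {
        (self.count > 1).then(|| self.sq_dev() / (self.count - 1) as f64)
    }

    fn stddev_pop(&self) -> Option<f64> {
        self.var_pop().map(f64::sqrt)
    }

    fn stddev_samp(&self) -> Option<f64> {
        self.var_samp().map(f64::sqrt)
    }
}

fn main() {
    let mut s = VarState::default();
    for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0] {
        s.insert(x);
    }
    assert_eq!(s.var_pop(), Some(4.0));
    assert_eq!(s.stddev_pop(), Some(2.0));
    // Deleting rows only needs the state, not the original values.
    s.retract(9.0);
    s.retract(7.0);
    assert_eq!(s.var_pop(), Some(1.0));
    println!("stddev_samp = {:?}", s.stddev_samp());
}
```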

Mind-blowing SQL surprise 🤯

I didn't know much about SQL before I became a database developer, and every now and then SQL still surprises me...

Operator |/

fix(sqlparser): align operator precedence with PostgreSQL by xiangjinwu #8174

Did you know that Postgres has a square root operator, |/? Its precedence might surprise you too: as the second query shows, |/4+12 is evaluated as |/(4+12), not (|/4)+12.

postgres=> select |/4;
?column?
----------
2
(1 row)

postgres=> select |/4+12;
?column?
----------
4
(1 row)
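Why 4? In PostgreSQL's precedence table, operators other than the built-in arithmetic ones (including prefix |/) bind more loosely than + and *, so the operand of |/ absorbs them. The toy precedence-climbing parser below reproduces that; it is a hypothetical sketch handling only integers, +, *, and |/, not RisingWave's sqlparser code.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Token {
    Num(f64),
    Plus,
    Star,
    Sqrt, // the `|/` prefix operator
}

fn tokenize(src: &str) -> Vec<Token> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            ' ' => i += 1,
            '+' => { tokens.push(Token::Plus); i += 1; }
            '*' => { tokens.push(Token::Star); i += 1; }
            '|' if chars.get(i + 1) == Some(&'/') => { tokens.push(Token::Sqrt); i += 2; }
            c if c.is_ascii_digit() => {
                let start = i;
                while i < chars.len() && chars[i].is_ascii_digit() {
                    i += 1;
                }
                let text: String = chars[start..i].iter().collect();
                tokens.push(Token::Num(text.parse().unwrap()));
            }
            other => panic!("unexpected character {other:?}"),
        }
    }
    tokens
}

/// Binding power of the "any other operator" level, below `+` (3) and `*` (5).
const OTHER_OP_BP: u8 = 2;

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn next(&mut self) -> Option<Token> {
        let t = self.tokens.get(self.pos).copied();
        self.pos += 1;
        t
    }

    fn peek(&self) -> Option<Token> {
        self.tokens.get(self.pos).copied()
    }

    /// Precedence climbing: parses (and evaluates) an expression whose infix
    /// operators all bind at least as tightly as `min_bp`.
    fn expr(&mut self, min_bp: u8) -> f64 {
        let mut lhs = match self.next() {
            Some(Token::Num(n)) => n,
            // The operand of `|/` is parsed at the low level, so it swallows `+`.
            Some(Token::Sqrt) => self.expr(OTHER_OP_BP).sqrt(),
            t => panic!("unexpected token {t:?}"),
        };
        loop {
            let (l_bp, r_bp) = match self.peek() {
                Some(Token::Plus) => (3, 4),
                Some(Token::Star) => (5, 6),
                _ => break,
            };
            if l_bp < min_bp {
                break;
            }
            let op = self.next();
            let rhs = self.expr(r_bp);
            lhs = if op == Some(Token::Plus) { lhs + rhs } else { lhs * rhs };
        }
        lhs
    }
}

fn eval(src: &str) -> f64 {
    Parser { tokens: tokenize(src), pos: 0 }.expr(0)
}

fn main() {
    assert_eq!(eval("|/4"), 2.0);
    assert_eq!(eval("|/4+12"), 4.0); // |/(4+12), not (|/4)+12
    assert_eq!(eval("2*3+4"), 10.0);
}
```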

NULLs hurt our heads: FULL OUTER JOIN in streaming

Should we ban full outer join for streaming query? · Issue #8084

create table t (a int primary key);
insert into t values(null);
create materialized view v as select t1.* from t as t1 full join t as t2 on t1.a = t2.a;

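Then v will have the primary key (t1.a, t2.a), but NULL = NULL is not true in SQL, so neither NULL row finds a match. Each side emits its own NULL-padded row, and both rows end up with the same key (null, null):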

left  side: +[null] --> Full Join -> +[null, null]
right side: +[null] --> Full Join -> +[null, null]
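The collision can be reproduced in miniature. The sketch below models SQL's three-valued equality (NULL compared with anything is not true) and a view keyed by (t1.a, t2.a). `FullJoin` and `sql_eq` are hypothetical and simplified, not RisingWave's join executor: for instance, retracting a previously emitted NULL-padded row when a match arrives later is omitted because this example never needs it.

```rust
use std::collections::HashSet;

/// A nullable join-key value: `None` is SQL NULL.
type Row = Option<i32>;
/// Output row of the full join, also the view's key: (t1.a, t2.a).
type Key = (Option<i32>, Option<i32>);

/// SQL equality: NULL compared with anything (even NULL) is not true.
fn sql_eq(a: Row, b: Row) -> bool {
    matches!((a, b), (Some(x), Some(y)) if x == y)
}

/// Minimal full outer join state, fed one inserted row at a time.
#[derive(Default)]
struct FullJoin {
    left: Vec<Row>,
    right: Vec<Row>,
}

impl FullJoin {
    /// Handles an insert on the left; unmatched rows are padded with NULL.
    fn insert_left(&mut self, a: Row) -> Vec<Key> {
        self.left.push(a);
        let matches: Vec<Key> = self.right.iter().filter(|&&b| sql_eq(a, b)).map(|&b| (a, b)).collect();
        if matches.is_empty() { vec![(a, None)] } else { matches }
    }

    /// Handles an insert on the right; unmatched rows are padded with NULL.
    fn insert_right(&mut self, b: Row) -> Vec<Key> {
        self.right.push(b);
        let matches: Vec<Key> = self.left.iter().filter(|&&a| sql_eq(a, b)).map(|&a| (a, b)).collect();
        if matches.is_empty() { vec![(None, b)] } else { matches }
    }
}

fn main() {
    let mut join = FullJoin::default();
    let mut outputs = join.insert_left(None); // +[null] on the left
    outputs.extend(join.insert_right(None)); // +[null] on the right
    assert_eq!(outputs, vec![(None, None), (None, None)]);

    // Inserting both rows into a view keyed by (t1.a, t2.a): the second collides.
    let mut view = HashSet::new();
    let inserted: Vec<bool> = outputs.iter().map(|&k| view.insert(k)).collect();
    assert_eq!(inserted, vec![true, false]);
}
```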

Rusty stuff 🦀️

We ❤️ Rust! This section covers some general Rust-related topics.

Error handling

Is appropriate return a Enum Type Error? · Issue #8074

Error handling is quite a topic in Rust (or any language). We ran into this discussion again.
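For readers new to the topic, here is the kind of design being debated, as a hypothetical sketch unrelated to RisingWave's actual error types (and not a summary of where the discussion landed): an enum error lets callers match on specific variants, and a `From` impl lets `?` convert lower-level errors automatically. The cost is that each new failure mode becomes a new variant that every exhaustive `match` must handle.

```rust
use std::fmt;
use std::num::ParseIntError;

/// A hypothetical error enum: callers can branch on the variant, e.g. to
/// report bad syntax differently from an out-of-range value.
#[derive(Debug)]
enum ConfigError {
    InvalidInput(ParseIntError),
    OutOfRange(i64),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::InvalidInput(e) => write!(f, "invalid input: {e}"),
            ConfigError::OutOfRange(v) => write!(f, "value {v} is out of range"),
        }
    }
}

impl std::error::Error for ConfigError {
    /// Exposes the underlying cause, if any, for error-chain reporting.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            ConfigError::InvalidInput(e) => Some(e),
            ConfigError::OutOfRange(_) => None,
        }
    }
}

/// Lets `?` turn a `ParseIntError` into `ConfigError::InvalidInput`.
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::InvalidInput(e)
    }
}

/// Parses a hypothetical parallelism setting that must be in 1..=64.
fn parse_parallelism(s: &str) -> Result<i64, ConfigError> {
    let v: i64 = s.trim().parse()?;
    if !(1..=64).contains(&v) {
        return Err(ConfigError::OutOfRange(v));
    }
    Ok(v)
}

fn main() {
    assert_eq!(parse_parallelism("8").unwrap(), 8);
    assert!(matches!(parse_parallelism("abc"), Err(ConfigError::InvalidInput(_))));
    assert!(matches!(parse_parallelism("100"), Err(ConfigError::OutOfRange(100))));
    println!("{}", parse_parallelism("0").unwrap_err());
}
```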

ChaChaRng

feat(sqlsmith): use ChaChaRng for determinism by kwannoel #8068: use a reproducible RNG so that fuzz test results are (more) reproducible.
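Why does the choice of RNG matter? The `rand` documentation notes that `StdRng` is deterministic but should not be considered reproducible, since its algorithm may change between library versions, and recommends `rand_chacha` when reproducibility is needed (e.g. `ChaCha8Rng::seed_from_u64(seed)` via the `SeedableRng` trait). To stay dependency-free, the sketch below swaps in a hand-written SplitMix64 generator (not ChaCha, and not cryptographically secure) only to show the property that matters for fuzzing: the same seed always yields the same sequence, so a failing seed can be replayed. `SplitMix64`, `random_query`, and `generate` are hypothetical.

```rust
/// SplitMix64: a tiny, fixed-algorithm PRNG (a stand-in for ChaCha here).
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Value in `0..n`; the slight modulo bias is acceptable for fuzzing.
    fn gen_range(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Hypothetical stand-in for a fuzzer's random query generator.
fn random_query(rng: &mut SplitMix64) -> String {
    const TABLES: [&str; 3] = ["t1", "t2", "t3"];
    const AGGS: [&str; 3] = ["count", "sum", "max"];
    let table = TABLES[rng.gen_range(3) as usize];
    let agg = AGGS[rng.gen_range(3) as usize];
    let limit = rng.gen_range(100);
    format!("SELECT {agg}(a) FROM {table} LIMIT {limit}")
}

/// Generates `n` queries from `seed`; the same seed gives the same queries.
fn generate(seed: u64, n: usize) -> Vec<String> {
    let mut rng = SplitMix64::new(seed);
    (0..n).map(|_| random_query(&mut rng)).collect()
}

fn main() {
    let seed = 42;
    let run1 = generate(seed, 5);
    let run2 = generate(seed, 5);
    assert_eq!(run1, run2); // a failing run can be replayed from its seed
    println!("replay with seed {seed}: {}", run1[0]);
}
```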

Private marker trait

refactor(meta): list all implementors of MetadataModel by zwang28 #8122

Let me just show you the code! 😲

mod private {
    /// A marker trait helps to collect all implementors of `MetadataModel` in
    /// `for_all_metadata_models`. The trait should only be implemented by adding item in
    /// `for_all_metadata_models`.
    pub trait MetadataModelMarker {}
}

pub trait MetadataModel: std::fmt::Debug + Sized + private::MetadataModelMarker {
    // ...
}

macro_rules! for_all_metadata_models {
    ($macro:ident) => {
        $macro! {
            // These items should be included in a meta snapshot.
            // So be sure to update meta backup/restore when adding new items.
            { risingwave_pb::hummock::HummockVersion },
            // ...
        }
    };
}

macro_rules! impl_metadata_model_marker {
    ($({ $target_type:ty },)*) => {
        $(
            impl private::MetadataModelMarker for $target_type {}
        )*
    }
}

for_all_metadata_models!(impl_metadata_model_marker);
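To see the pattern work end to end, here is a self-contained variant of the snippet above. `HummockVersion` and `TableFragments` here are hypothetical local placeholder structs rather than the real protobuf types, and `model_name` and `all_model_names` are made up for illustration. Because `private` is a private module, code outside the enclosing module cannot name `MetadataModelMarker`, so it cannot implement it, and therefore cannot implement `MetadataModel` either; inside the module, "only via the list" remains a convention. The same list can also drive other macros, such as one that enumerates every model's name.

```rust
mod metadata {
    mod private {
        /// Implemented only via `for_all_metadata_models` below.
        pub trait MetadataModelMarker {}
    }

    /// The marker is a supertrait, so only listed types can implement this.
    pub trait MetadataModel: std::fmt::Debug + Sized + private::MetadataModelMarker {
        fn model_name() -> &'static str;
    }

    // Hypothetical placeholders standing in for the real protobuf messages.
    #[derive(Debug)]
    pub struct HummockVersion;
    #[derive(Debug)]
    pub struct TableFragments;

    /// The single source of truth: every metadata model is listed here.
    macro_rules! for_all_metadata_models {
        ($macro:ident) => {
            $macro! {
                { HummockVersion },
                { TableFragments },
            }
        };
    }

    macro_rules! impl_metadata_model_marker {
        ($({ $target_type:ty },)*) => {
            $( impl private::MetadataModelMarker for $target_type {} )*
        };
    }
    for_all_metadata_models!(impl_metadata_model_marker);

    /// A second consumer of the same list.
    macro_rules! define_all_model_names {
        ($({ $target_type:ty },)*) => {
            /// Names of all metadata models, generated from the list.
            pub fn all_model_names() -> Vec<&'static str> {
                vec![$(stringify!($target_type)),*]
            }
        };
    }
    for_all_metadata_models!(define_all_model_names);

    impl MetadataModel for HummockVersion {
        fn model_name() -> &'static str {
            "HummockVersion"
        }
    }
}

use metadata::{HummockVersion, MetadataModel};

// Would not compile here: `Other` lacks the private marker, which this
// code cannot name, so the `MetadataModel` supertrait bound is unsatisfied.
// #[derive(Debug)] struct Other;
// impl MetadataModel for Other { fn model_name() -> &'static str { "Other" } }

fn main() {
    assert_eq!(HummockVersion::model_name(), "HummockVersion");
    assert_eq!(metadata::all_model_names(), vec!["HummockVersion", "TableFragments"]);
}
```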

New Contributors

As I mentioned last time, RisingWave is an open-source project, and new contributors are always welcome. I'm so happy to see some new contributors this week:

Check out the good first issue and help wanted issues if you'd also like to join forces! (They are usually carefully chosen, not just random chores, so they're a good way to get started!)


P.S. You're welcome to join the RisingWave Slack community.

That's all for this week. (I haven't fully learned the details of many of these yet...) See you next week (hopefully)! 🤗

P.P.S. I'm also considering doing a deep dive into one or a few interesting topics each week, instead of writing a weekly summary like this. What do you think? 🤔