
This Week in RisingWave #5

· 5 min read
xxchan

This blog series collects my personal comments on (part of) the development of RisingWave.

Please take it as an unofficial supplement that comes with no promises.

Notable changes 🌟

Temporal join​

Many production scenarios involve a fact table and several dimension tables, where users want to enrich (join) the fact table with the dimension tables. Unlike regular stream joins, in the enrichment scenario users may want previous join outputs to stay unaffected when a dimension table is updated. This is because the goal is only to enrich the fact table, without duplicated outputs.

Temporal join is designed for this scenario. More technically, it joins an append-only stream (such as Kafka) with a temporal table (aka versioned table, e.g. one backed by MySQL CDC). The stream side looks up the temporal table, which means the join is driven by the stream side only.

The syntax looks like this:

SELECT * FROM stream LEFT JOIN versioned FOR SYSTEM_TIME AS OF NOW() ON stream.col = versioned.id
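
To make the "driven by the stream side only" point concrete, here is a minimal sketch in Rust. It is a toy model, not RisingWave's executor: the `Event` type, the `HashMap`-backed table, and the function name are all assumptions made up for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical event type: either a stream row arrives or the versioned table changes.
pub enum Event {
    /// Append-only stream row: (join key, payload).
    Stream(u32, &'static str),
    /// Upsert into the versioned (temporal) table: (id, value).
    Upsert(u32, &'static str),
}

/// Toy LEFT JOIN driven by the stream side only: each stream row looks up the
/// table as it is "now" and emits exactly one output row.
pub fn temporal_left_join(events: &[Event]) -> Vec<(u32, String, Option<String>)> {
    let mut table: HashMap<u32, String> = HashMap::new();
    let mut out = Vec::new();
    for event in events {
        match event {
            Event::Upsert(id, value) => {
                // Table changes are only recorded; rows already emitted are not revised.
                table.insert(*id, value.to_string());
            }
            Event::Stream(key, payload) => {
                out.push((*key, payload.to_string(), table.get(key).cloned()));
            }
        }
    }
    out
}
```

If the table is updated between two stream rows with the same key, the first output keeps the old value and only the second sees the new one. A stream row without a match still produces a row, with `None` on the table side.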

Interesting SQL features 😄

I didn't know that much about SQL before I became a database developer. Every now and then, SQL still surprises me...

Server local timezone​

Did you know that the SQL standard has two timestamp types: timestamp with time zone and timestamp without time zone?

Support SET TIME ZONE LOCAL syntax · Issue #8551

The new syntax allows us to set the server's local timezone, which is useful for local testing.

dev=> select now();
now
-------------------------------
2023-03-16 10:41:10.951+00:00
(1 row)

dev=> set time zone local;
SET_VARIABLE
dev=> select now();
now
-------------------------------
2023-03-16 11:41:36.958+01:00
(1 row)

BTW, this is done via strawlab/iana-time-zone: Rust crate to get the IANA time zone for the current system.
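
As a rough illustration of what the session above shows, the sketch below renders one UTC time of day at different fixed offsets: the instant stays the same and only the rendering changes. This is a simplification with a hypothetical helper. Real time zones have rules such as daylight saving time, which is why the system's IANA zone name is needed rather than a fixed offset.

```rust
/// Hypothetical helper: render a time of day, given as seconds since midnight UTC,
/// at a fixed UTC offset in minutes.
pub fn render_time(utc_secs_of_day: u32, offset_minutes: i32) -> String {
    const DAY: i64 = 24 * 60 * 60;
    // Shift by the offset and wrap around midnight in either direction.
    let local = (i64::from(utc_secs_of_day) + i64::from(offset_minutes) * 60).rem_euclid(DAY);
    let sign = if offset_minutes < 0 { '-' } else { '+' };
    let abs = offset_minutes.unsigned_abs();
    format!(
        "{:02}:{:02}:{:02}{}{:02}:{:02}",
        local / 3600,
        local % 3600 / 60,
        local % 60,
        sign,
        abs / 60,
        abs % 60
    )
}
```

For example, `render_time(10 * 3600 + 41 * 60 + 10, 0)` gives `10:41:10+00:00`, and the same input with an offset of `60` gives `11:41:10+01:00`.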

Interesting Bug

Inverse of column index mapping​

fix(optimizer): fix hash join distribution by chenzl25 #8598

I talked about ColIndexMapping in This Week in RisingWave #3.

Although mathematically simple and intuitive, it's not easy to do such mappings correctly in programs.

Well, then we met another bug related to ColIndexMapping this week 🥲. (Luckily, it's not very easy to trigger.) This time, it's about the inverse of the mapping. In short, suppose we have an array of index pairs [(l1, r1), (l2, r2), ...]. Naturally, we can build two mappings: l -> r and r -> l. However, the inverse of l -> r is not r -> l! Can you tell why?
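
One possible answer, sketched in Rust: the pairs need not be one-to-one. The `Mapping` type below is a simplified, hypothetical stand-in (a `Vec<Option<usize>>`), not the real ColIndexMapping API, and the example is an illustration rather than the exact case fixed in the PR.

```rust
/// Hypothetical, simplified column index mapping:
/// `map[src] == Some(dst)` means source column `src` maps to target column `dst`.
pub type Mapping = Vec<Option<usize>>;

/// Build a `from -> to` mapping from index pairs. A source index can hold only
/// one target, so a later pair overwrites an earlier one with the same source.
pub fn build(pairs: &[(usize, usize)], source_size: usize) -> Mapping {
    let mut map = vec![None; source_size];
    for &(from, to) in pairs {
        map[from] = Some(to);
    }
    map
}

/// Invert a mapping: every `src -> dst` entry becomes `dst -> src`.
pub fn inverse(map: &Mapping, target_size: usize) -> Mapping {
    let mut inv = vec![None; target_size];
    for (src, dst) in map.iter().enumerate() {
        if let Some(dst) = dst {
            inv[*dst] = Some(src);
        }
    }
    inv
}

/// Swap each pair so that `build` produces the `r -> l` mapping.
pub fn swapped(pairs: &[(usize, usize)]) -> Vec<(usize, usize)> {
    pairs.iter().map(|&(l, r)| (r, l)).collect()
}
```

Take the pairs `[(0, 0), (0, 1)]`, where one left column is paired with two right columns. Building `l -> r` keeps only `0 -> 1`, so its inverse knows only `1 -> 0`, while `r -> l` built directly from the pairs contains both `0 -> 0` and `1 -> 0`.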

Reliability Improvements 💪

The Great MadSim!​

fix: avoid panic when upstream input is closed for lookup #8529

This week, we identified a new bug through MadSim, which deterministically shuts down and restarts nodes in a RisingWave cluster. This time, the bug was in the execution path of the lookup executor. Thanks to MadSim, we were able to quickly identify the issue and resolve it.

Interval bugfixes and tests​

Intervals are a fundamental data type for a streaming SQL database, but they can also be tricky in some ways. Recently, RisingWave has enhanced its support for intervals and migrated many related tests from Postgres.

OpenDAL​

feat(test): add e2e test for OpenDAL fs backend #8528

Since February, RisingWave has been using OpenDAL as one of its underlying object storage implementations. OpenDAL greatly reduces our effort in supporting various cloud storage systems, especially HDFS. This PR uses OpenDAL's fs engine to mock the in-memory object store.

By the way, OpenDAL is now an Apache Incubator project! 🎉

Rusty stuff 🦀

We ❤️ Rust! This section is about some general Rust-related issues.

Be more careful about error creation!​

fix(expr): do not construct error for extracting time subfield by BugenZhao #8538

Error creation can be very expensive!

In This Week in RisingWave #1, I mentioned that we can use ok_or_else to create an expensive error lazily. This time, the errors are not actually needed at all: Option is enough. Basically, I mean cases like this:

// Don't do this!
fn inner() -> Result<T> {}
fn outer() -> Result<T> {
    match inner() {
        Ok(t) => Ok(t),
        Err(_) => {
            // try a different computation
            ...
        },
    }
}

My takeaway is: Think more about the definition of error types and try to keep them small. If an error type is unavoidably large, then we have to be more careful when we use it.
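
For contrast with the pattern above, here is a minimal sketch of the `Option`-based shape. All names are hypothetical and this is not the code from the PR. `ExpensiveError` stands in for an error type that is costly to construct: here it only formats a message, while real error types may do more work.

```rust
/// Hypothetical error type that is costly to build.
#[derive(Debug)]
pub struct ExpensiveError(pub String);

/// Inner step: `None` only means "this path does not apply", so no error is built.
pub fn digits_only(input: &str) -> Option<u32> {
    // At most 9 digits, so the result always fits in a u32.
    if input.is_empty() || input.len() > 9 || !input.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    Some(input.bytes().fold(0u32, |acc, b| acc * 10 + u32::from(b - b'0')))
}

/// Outer step: try the fast path, then a different computation, and construct
/// the error lazily, only when every path has failed.
pub fn parse(input: &str) -> Result<u32, ExpensiveError> {
    digits_only(input)
        .or_else(|| digits_only(input.trim()))
        .ok_or_else(|| ExpensiveError(format!("cannot parse {input:?} as a number")))
}
```

With this shape, a failed first attempt costs nothing beyond returning `None`, and the error message is formatted at most once per call.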

BTW, kudos to @BugenZhao for catching this issue (again)!

P.S., this PR brings us 1000%+ throughput improvement (🤯) on nexmark q14, which is a simple SELECT with extract(hour from date_time).

New Contributors​

Support optional parameter offset in tumble and hop by Eridanus117 #8490

This is the second PR by @Eridanus117.

feat(expr): support builtin function pi. by broccoliSpicy #8509

This is the second PR by @broccoliSpicy.

It's great to see new contributors joining in, and even better when they show interest in diving deeper and contributing continuously! 🥰

CREATE SINK panic · Issue #8482

I remember @JuchangGit had submitted 2 issues in the past. This week he submitted another one. I'd like to mention this because open source contribution is not only about code (PRs). Playing with the software and reporting issues are also very important contributions!


Finally, you are welcome to join the RisingWave Slack community. Also check out the good first issue and help wanted issues if you want to join the development of an open source database system!

That's all for this week. See you next week (hopefully)! 🤗