
· 7 min read
xxchan
Tao Wu
Dylan
Bugen Zhao
StrikeW

This week, we have a lot of updates 😆! To name a few: functional index, proc_time(), int256, CREATE CONNECTION, and ... the beta release of RisingWave Cloud!

· 5 min read
xxchan
Tao Wu
Yuhao Su
Yuanxin Cao

This week, we celebrated RisingWave's v0.18.0 release and its first birthday! We talked about (many) new functions, system functions, generated columns, the dedup operator, and more.

birthday.png

· 5 min read
xxchan
Dylan
Runji Wang

This week, we talked about psql completion, band join, significant DX improvements in the expression framework powered by procedural macros, and more.

· 5 min read
xxchan

This blog series is my personal comments about (part of) the development of RisingWave.

Please take it as an unofficial and no-promise supplement.

Feature Updates 🌟

NULLS {FIRST | LAST}

feat(common): support NULLS {FIRST | LAST} by richardchien · Pull Request #8485

NULLs can be very tricky in SQL. Did you know you can specify their ordering?

> select * from t order by x /* nulls last */; 
+--------+
| x |
|--------|
| 1 |
| 2 |
| <null> |
+--------+

> select * from t order by x nulls first;
+--------+
| x |
|--------|
| <null> |
| 1 |
| 2 |
+--------+

It was tricky to support because previously in RisingWave, we only considered ascending/descending ordering, but not the ordering of NULLs. That caused inconsistent NULL ordering around the system: in SQL query results, NULL values were sometimes the largest and sometimes the smallest.

This also blocked our progress in testing RisingWave against other databases’ test suites, because in PostgreSQL, NULLs are the largest by default when compared against non-NULL values, while in SQLite and DuckDB they are the smallest by default. With inconsistent ordering of NULLs and no support for NULLS { FIRST | LAST } in order-by clauses, we could neither align with the behavior of one of these databases, nor specify a fixed NULL ordering in test queries to align the behavior of all databases.

Recently, we did a thorough refactoring to force all components in our system to use a unified struct type representing ordering, OrderType. As a result, this week we were finally able to implement NULLS { FIRST | LAST } for order-by clauses with ease, by simply adding a new field, NullsAre, to OrderType, specifying whether NULLs are largest or smallest. By setting Largest as the default value of NullsAre, we achieved the same default ordering behavior as PostgreSQL.
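
To make the idea concrete, here is a minimal sketch of how an ordering struct with a NULL-placement field could drive comparisons. Only the names OrderType, NullsAre and Largest come from the text above; Direction, the constructors, compare and sort_column are hypothetical and are not RisingWave's actual definitions.

```rust
use std::cmp::Ordering;

/// Sort direction of an ORDER BY column (hypothetical name).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Direction {
    Ascending,
    Descending,
}

/// Whether NULLs compare as the largest or the smallest value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum NullsAre {
    Largest,
    Smallest,
}

/// Simplified stand-in for the unified ordering struct.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct OrderType {
    pub direction: Direction,
    pub nulls_are: NullsAre,
}

impl OrderType {
    /// Default ascending order: NULLs are largest, as in PostgreSQL.
    pub fn ascending() -> Self {
        Self { direction: Direction::Ascending, nulls_are: NullsAre::Largest }
    }

    /// `ORDER BY x NULLS FIRST` in ascending order means NULLs are smallest.
    pub fn ascending_nulls_first() -> Self {
        Self { direction: Direction::Ascending, nulls_are: NullsAre::Smallest }
    }

    /// Compare two nullable values under this ordering.
    pub fn compare(&self, a: Option<i32>, b: Option<i32>) -> Ordering {
        let ord = match (a, b) {
            (None, None) => Ordering::Equal,
            (None, Some(_)) => match self.nulls_are {
                NullsAre::Largest => Ordering::Greater,
                NullsAre::Smallest => Ordering::Less,
            },
            (Some(_), None) => match self.nulls_are {
                NullsAre::Largest => Ordering::Less,
                NullsAre::Smallest => Ordering::Greater,
            },
            (Some(x), Some(y)) => x.cmp(&y),
        };
        // Descending order simply reverses the ascending comparison.
        match self.direction {
            Direction::Ascending => ord,
            Direction::Descending => ord.reverse(),
        }
    }
}

/// Sort a column of nullable integers according to `order`.
pub fn sort_column(mut values: Vec<Option<i32>>, order: OrderType) -> Vec<Option<i32>> {
    values.sort_by(|a, b| order.compare(*a, *b));
    values
}
```

With this shape, every component that sorts or compares rows can share one comparison routine, which is the kind of consistency the refactoring was after.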

Rename relations

feat: support alter rename relations including table/mview/view/sink/index by yezizp2012 · Pull Request #7745

As I mentioned earlier, we are working on DDL support (ALTER TABLE), which is quite tricky. We've already done quite some work to support ADD/DROP COLUMN. Here's another DDL: renaming relations is now supported. This might be slightly easier than ADD/DROP COLUMN, but there are still some tricky parts to consider carefully, like updating all related relations at the same time.

For example, you can use the command ALTER TABLE t_1 RENAME TO t_2 to rename a table, and its related relations will be modified recursively, which you can check with a command such as SHOW CREATE MATERIALIZED VIEW mv_x. There is no effect on the stored data.

Performance Optimizations 💪

Bushy tree join ordering

feat: Bushy tree join ordering by KveinAxel · Pull Request #8316

RisingWave is a stream processing system that aims to provide real-time low latency for our users. By reducing the depth of the join tree, we can effectively minimize latency.

See the illustration below:

To reduce the distance that barriers from join inputs need to travel to the join output, we convert a left-deep tree with a height of 5 into a bushy tree with a height of 3.

https://user-images.githubusercontent.com/9352536/202991793-664ea3f9-3838-4e5f-af6c-e5416140ca40.png

https://user-images.githubusercontent.com/9352536/202991855-998a6d28-a366-4120-8765-be3d5de20474.png
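
As a rough back-of-the-envelope sketch of why the bushy shape is shallower: assuming "height" counts join levels and the bushy tree is balanced, a left-deep tree over n inputs needs n - 1 levels while a bushy tree needs about log2(n). The function names below are hypothetical, and the six-input case is an assumption chosen for illustration.

```rust
/// Number of join levels in a left-deep tree over `n` inputs.
pub fn left_deep_height(n: u32) -> u32 {
    n.saturating_sub(1)
}

/// Number of join levels in a balanced bushy tree over `n` inputs: ceil(log2(n)).
pub fn bushy_height(n: u32) -> u32 {
    let mut height = 0;
    let mut capacity: u64 = 1; // inputs that a tree of this height can hold
    while capacity < n as u64 {
        capacity *= 2;
        height += 1;
    }
    height
}
```

For six inputs this gives 5 levels for the left-deep tree and 3 for the bushy one, so a barrier entering at the deepest input passes through fewer join operators.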

Rusty stuff 🦀️

We ❤️ Rust! This section is about some general Rust-related issues.

zld

Cannot run risingwave binary on Mac OS M1 (when linked by zld) · Issue #8608

@ahmedriza reported that he could not run the binary on his MacBook M1 due to "symbol not found".

$ ./target/debug/risingwave
dyld[14116]: symbol not found in flat namespace (__ZN15protobuf_native2io23DeleteCodedOutputStreamEPN6google8protobuf2io17CodedOutputStreamE)
Abort trap: 6

After some investigation, he found that it's because he used zld in his global cargo config, and RisingWave has a repository-level cargo config that uses lld. Although it is still not clear whether the error is caused by zld or the conflict between the cargo configs, he could solve the problem by just using lld.

This is also one of the reasons why I enjoy open source: hackers are very willing to raise issues and even analyze and solve problems themselves. I can also learn a lot from their thorough analysis. From this issue, I learned:

  • How cargo config ($HOME/.cargo/config.toml, and /projects/.cargo/config.toml) works in more detail
  • There's another linker, zld. (But it's already deprecated in favor of lld.)

New Contributors

(Of course, @ahmedriza is also a great new contributor!)

feat: Add support for array_length function in psql by kamalesh0406 · Pull Request #8636

This week we have another first-time contributor, @kamalesh0406. He said "This is my first time writing rust code" 😲.

So it seems RisingWave's good first issues are a good way to learn and practice Rust. 😄 Don't hesitate and just join us if you are also interested in the development of an open source database system, or Rust!


Finally, welcome to join the RisingWave Slack community.

So much for this week. See you next week! 🤗

· 5 min read
xxchan

This blog series is my personal comments about (part of) the development of RisingWave.

Please take it as an unofficial and no-promise supplement.

Notable changes 🌟

Temporal join

Lots of production scenarios contain a fact table and several dimension tables, where users want to enrich (join) their fact table with dimension tables. Unlike regular stream joins, in the enrichment scenario users may want to keep the previous join outputs unaffected when a dimension table is updated, because we only want to enrich the fact table without producing duplicated outputs.

Temporal join is for this scenario. More technically speaking, it joins an append-only stream (such as Kafka) with a temporal table (aka versioned table, e.g. backed by MySQL CDC). The stream side looks up the temporal table, which means the join is driven by the stream side only.

The syntax is like:

SELECT * FROM stream LEFT JOIN versioned FOR SYSTEM_TIME AS OF NOW() ON stream.col = versioned.id
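
The semantics can be illustrated with a toy model. This is only a sketch under simplifying assumptions: Event and temporal_left_join are hypothetical names, the versioned table is a plain in-memory map, and nothing here reflects how the actual executor is implemented.

```rust
use std::collections::HashMap;

/// Events arriving in processing-time order (hypothetical types).
pub enum Event {
    /// A row from the append-only stream side: (join key, payload).
    Stream(u32, &'static str),
    /// An upsert on the versioned (dimension) table: (id, value).
    DimUpsert(u32, &'static str),
}

/// LEFT temporal join: each stream row looks up the *current* dimension
/// version; dimension updates never emit or retract output rows.
pub fn temporal_left_join(events: &[Event]) -> Vec<(&'static str, Option<&'static str>)> {
    let mut dim: HashMap<u32, &'static str> = HashMap::new();
    let mut output = Vec::new();
    for event in events {
        match event {
            // Only the table state changes; earlier outputs stay untouched.
            Event::DimUpsert(id, value) => {
                dim.insert(*id, *value);
            }
            // The stream side drives the join.
            Event::Stream(key, payload) => {
                output.push((*payload, dim.get(key).copied()));
            }
        }
    }
    output
}
```

Note how the row joined before the dimension update keeps its old value, which is exactly the "previous outputs unaffected" behavior described above.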

Interesting SQL features 😄

I didn't know that much about SQL before becoming a database developer. Every now and then I get a new surprise from SQL...

Server local timezone

Did you know the SQL standard has two timestamp types: timestamp with/without time zone?

Support SET TIME ZONE LOCAL syntax · Issue #8551

The new syntax allows us to set the time zone to the server's local time zone, which is useful for local testing.

dev=> select now();
now
-------------------------------
2023-03-16 10:41:10.951+00:00
(1 row)

dev=> set time zone local;
SET_VARIABLE
dev=> select now();
now
-------------------------------
2023-03-16 11:41:36.958+01:00
(1 row)

BTW, this is done via strawlab/iana-time-zone: Rust crate to get the IANA time zone for the current system.

Interesting Bug

Inverse of column index mapping

fix(optimizer): fix hash join distribution by chenzl25 #8598

I talked about ColIndexMapping in This Week in RisingWave #3.

Although mathematically simple and intuitive, it’s not easy to do such mappings correctly in programs.

Well, then we met another bug related to ColIndexMapping this week 🥲. (Luckily, it's not very easy to trigger.) This time, it's about the inverse of the mapping. In short, suppose we have an array of index pairs [(l1, r1), (l2, r2), ...]; naturally we can build two mappings, l -> r and r -> l. However, the inverse of l -> r is not r -> l! Can you tell why?

Reliability Improvements 💪

The Great MadSim!

fix: avoid panic when upstream input is closed for lookup #8529

This week, we identified a new bug through a MadSim test that deterministically shuts down and restarts nodes in a RisingWave cluster. This time, the bug was found in the execution path of the lookup executor. Thanks to MadSim, we were able to quickly identify the issue and resolve it.

Interval bugfixes and tests

Intervals are a fundamental data type for a streaming SQL database, but they can also be subtle in some ways. Recently, RisingWave has enhanced its support for intervals and migrated many related tests from Postgres.

OpenDAL

feat(test): add e2e test for OpenDAL fs backend #8528

Since February, RisingWave has been using OpenDAL as one of its underlying object storage implementations. OpenDAL greatly reduces our efforts in supporting various cloud storage systems, especially HDFS. This PR uses the OpenDAL fs engine to mock the in-memory object store.

By the way, OpenDAL is now an Apache Incubator project! 🎉

Rusty stuff 🦀️

We ❤️ Rust! This section is about some general Rust-related issues.

Be more careful about error creation!

fix(expr): do not construct error for extracting time subfield by BugenZhao #8538

Error creation can be very expensive!

In This Week in RisingWave #1, I mentioned that we can use ok_or_else to create expensive errors lazily. This time the errors are not actually needed at all, and Option is enough. Basically, I mean cases like this:

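// Don't do this!
fn inner() -> Result<T> {} // builds an (expensive) error on failure
fn outer() -> Result<T> {
    match inner() {
        Ok(t) => Ok(t),
        Err(_) => {
            // The error is thrown away, so constructing it was wasted work.
            // try a different computation
            ...
        },
    }
}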

My takeaway is: Think more about the definition of error types and try to keep it small. If it's unavoidably large, then we have to think more when we use it.
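
For contrast, here is a small sketch of the Option-based shape. It is not the code from the PR: ExpensiveError, parse_decimal, parse_hex and parse_number are hypothetical names, and the "expensive" error is only a stand-in for one that would capture a backtrace.

```rust
/// Stand-in for an error type that is expensive to build
/// (for example, one that captures a backtrace).
#[derive(Debug)]
pub struct ExpensiveError {
    pub message: String,
}

/// Fast path returning `Option`: a miss costs nothing to construct.
fn parse_decimal(s: &str) -> Option<i64> {
    s.parse().ok()
}

/// Fallback path that understands a `0x` prefix.
fn parse_hex(s: &str) -> Option<i64> {
    i64::from_str_radix(s.strip_prefix("0x")?, 16).ok()
}

/// Only the outermost function builds an error, and only when every attempt failed.
pub fn parse_number(s: &str) -> Result<i64, ExpensiveError> {
    parse_decimal(s)
        .or_else(|| parse_hex(s))
        .ok_or_else(|| ExpensiveError { message: format!("invalid number: {}", s) })
}
```

The inner helpers signal "no result" with None, so trying a different computation never pays for an error that would be discarded anyway.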

BTW, kudos to @BugenZhao for catching this issue (again)!

P.S., this PR brings us 1000%+ throughput improvement (🤯) on nexmark q14, which is a simple SELECT with extract(hour from date_time).

New Contributors

Support optional parameter offset in tumble and hop by Eridanus117 #8490

This is the second PR by @Eridanus117.

feat(expr): support builtin function pi. by broccoliSpicy #8509

This is the second PR by @broccoliSpicy.

It's great to see new contributors joining in, and even better when they show interest in diving deeper and contributing continuously! 🥰

CREATE SINK panic · Issue #8482

I remember @JuchangGit had submitted 2 issues in the past. This week he submitted another one. I'd like to mention this because open source contribution is not only about code (PRs). Playing with the software and reporting issues are also very important contributions!


Finally, welcome to join the RisingWave Slack community. Also check out the good first issue and help wanted issues if you want to join the development of an open source database system!

So much for this week. See you next week (hopefully)! 🤗

· 5 min read
xxchan

This blog series is my personal comments about (part of) the development of RisingWave.

Please take it as an unofficial and no-promise supplement.

Notable changes 🌟

Serial type

As a distributed database system, RisingWave has multiple instances of operators to achieve a high degree of parallelism. Some operators require specific input distribution to ensure the correctness of the result, so data must be shuffled at that point, which is represented by the Exchange operator.

At the same time, RisingWave generates hidden _row_id columns for sources without primary keys (aka append-only sources), also for the correctness of the system. Previously, _row_id was randomly distributed, so we had to insert a HashExchange after the source operator to enforce its distribution.

Since _row_id is fully controlled by us, why don’t we directly generate _row_id with the desired distribution? So an optimization was proposed: use a new internal type, Serial, for _row_id, with specialized shuffling logic that removes the unnecessary Exchange operators.
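
One way to picture this, purely as an illustration: if the generator stamps the target partition into the id itself, the shuffle can read it back instead of hashing. The bit layout, SEQ_BITS, RowIdGenerator and the vnode accessor below are all hypothetical and are not RisingWave's actual row id format.

```rust
/// Hypothetical layout: the top 16 bits of a row id hold the virtual node
/// (partition) it belongs to, the low 48 bits hold a local sequence number.
const SEQ_BITS: u32 = 48;

/// Newtype marking ids whose distribution is known by construction.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Serial(pub u64);

/// Per-partition generator that stamps its own vnode into every id.
/// Sequence overflow past 48 bits is ignored in this sketch.
pub struct RowIdGenerator {
    vnode: u16,
    next_seq: u64,
}

impl RowIdGenerator {
    pub fn new(vnode: u16) -> Self {
        Self { vnode, next_seq: 0 }
    }

    pub fn next_id(&mut self) -> Serial {
        let id = ((self.vnode as u64) << SEQ_BITS) | self.next_seq;
        self.next_seq += 1;
        Serial(id)
    }
}

impl Serial {
    /// Specialized "shuffle": read the vnode back instead of hashing the value.
    pub fn vnode(self) -> u16 {
        (self.0 >> SEQ_BITS) as u16
    }
}
```

Because every id generated on a partition already maps back to that partition, rows are where a downstream operator expects them and no extra exchange is needed.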

Auto execution mode selection

feat(batch): support auto execution mode #8274

Currently, RisingWave has 2 kinds of execution mode: local and distributed, which can be selected by SET query_mode = [local | distributed]. Lower latency can be achieved by running OLTP queries in local execution mode and OLAP queries in distributed execution mode.

However, we previously left the choice to users, which made our distributed execution useless to some extent, because most users don't know or understand what those options mean exactly. As a database, we have the ability to make a good choice for users, and we should, to reduce their tuning work.

feat(source): support private link for kafka connector #8247

When using a cloud environment, users may face connection issues when attempting to create a source in RisingWave because their AWS MSK service is located in a different VPC. To solve this, AWS PrivateLink can be used to establish a connection between RisingWave's VPC and the user's VPC. Users can set up an endpoint service to expose their MSK service, allowing RisingWave to easily create an endpoint to access it.

Sink Validation

feat(meta): introduce sink validation in meta #8417

We recently implemented MySQL/Postgres sink support, and we are now working hard to keep improving its stability and usability. This PR introduces validation for sinks so that we can catch errors in an earlier phase and give better error messages.

Reliability Improvements 💪

Fuzzing Tests

feat(sqlsmith): Generate more join expressions #8395

This week we added more targeted testing for join expressions. We increased the number of joins, added generation of non-equi joins, and added more equi-join predicates.

The goal of these changes is to increase the coverage of join executors in streaming and batch, since the functionality of JOINs is often complex.

Deterministic Tests

During the past week, MadSim, our deterministic testing framework, identified more than five issues that we promptly resolved. We are confident in the effectiveness of our framework. Additionally, I'm eagerly anticipating @wangrunji0408's upcoming blog post about MadSim internals. 🤩

Rusty stuff 🦀️

We ❤️ Rust! This section is about some general Rust-related issues.

Reduce debuginfo size

Although I already briefly mentioned this in the previous blog, let me elaborate on it a little here.

fix: reduce debuginfo size #8326

We noticed an unexpected 2GB of memory consumption on the meta node. The cause is quite interesting.

We found that the memory consumption happens when the meta node writes a log with a backtrace included. To get the backtrace, the debuginfo in the risingwave binary is loaded (and cached!). This is somewhat reasonable, but should the debuginfo be so large?

Why are debug symbols so huge? is a good post explaining what debuginfo is. Inspired by that, we tuned the level of debuginfo. This is a trade-off between binary size and utility (we still have the most useful information for debugging). At the same time, we also disabled debuginfo compression. This is another trade-off between binary size and memory overhead.

Capture unrecognized fields in serde

https://github.com/risingwavelabs/risingwave/pull/8325

By default, serde ignores unknown fields. With #[serde(deny_unknown_fields)], serde rejects unknown fields. But what if you want to tolerate them, but produce a useful warning about them at the same time? Here’s a useful tip: with #[serde(flatten)] you can capture the unknown fields:

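use std::collections::HashMap;

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub struct S {
    // ...

    // Any field not matched by the named fields above ends up in this map.
    #[serde(flatten)]
    pub unrecognized: HashMap<String, serde_json::Value>,
}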

Auto derive a prefixed alias for protobuf message types

refactor(proto): auto derive a Pb-prefixed alias for proto message types by BugenZhao · Pull Request #8426 · risingwavelabs/risingwave

When using protobuf for communication, one thing might be defined multiple times in different places for different purposes. For example, if we have a protobuf message Msg, a crate using Msg might have its own type for Msg at the same time, which might include useful methods and maybe a data structure representation different from the protobuf one. Previously, we needed to manually alias it to avoid conflicts, e.g.,

use pb_crate::Msg as PbMsg;

struct Msg {
    // ...
}

impl Msg {
    pub fn to_proto(self) -> PbMsg {
        // ...
    }
}

Now we came up with a method to mitigate this problem. Thanks to prost’s ability to customize generated code for protobuf messages, we can add a type alias pub type PbMsg = Msg alongside the definition of Msg. In this way, when trying to type PbMsg, the IDE will be able to find the type and automatically import it.
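
The effect of the alias can be sketched without any code generation. In the snippet below, the pb_crate module is a hand-written stand-in for what the generator would emit, and the fields id and label are made up for illustration.

```rust
/// Stand-in for the generated protobuf crate.
pub mod pb_crate {
    /// What the code generator would emit for message `Msg`.
    #[derive(Debug, Default, PartialEq)]
    pub struct Msg {
        pub id: u32,
    }

    /// The extra alias emitted next to the message definition.
    pub type PbMsg = Msg;
}

// No `as PbMsg` renaming needed: the alias is importable under its own name.
use pb_crate::PbMsg;

/// The using crate's own, richer `Msg` type.
pub struct Msg {
    pub id: u32,
    pub label: String,
}

impl Msg {
    /// Convert into the protobuf representation, dropping the local-only field.
    pub fn to_proto(self) -> PbMsg {
        PbMsg { id: self.id }
    }
}
```

Since PbMsg is a real item in the generated crate, auto-import in the IDE can resolve it directly, and the two Msg types never collide.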


Finally, welcome to join the RisingWave Slack community. Also check out the good first issue and help wanted issues if you want to join the development of an open source database system!

So much for this week. See you next week (hopefully)! 🤗

· 5 min read
xxchan

This blog series is my personal comments about (part of) the development of RisingWave.

Please take it as an unofficial and no-promise supplement.

Notable changes

Memory control policy

Recently we implemented a memory control mechanism, and now we are introducing the policy. Since RisingWave is a "streaming database", it is crucial for us to consider both batch and streaming tasks while balancing the usage of memory between them.

Move memtable

feat(storage): move memtable down to local state store by wenym1 #7183

Our streaming executors access storage via the StateTable interface, and the storage layer exposes the LocalStateStore interface. This PR moves part of the former (the memtable) into the latter. It might help to implement more features in the storage layer, and also make it easier to implement other storage engines for benchmarking purposes.

Optimizer updates

feat: Bushy tree join ordering by KveinAxel #8316

RisingWave is a stream processing system that aims to provide real-time low latency for our users. Making the join tree shallower helps to reduce the latency.

See the illustration below:

Left Deep Tree

Bushy Tree

Async expr

refactor(expr): make evaluation async by wangrunji0408 #8229

... to prevent blocking when evaluating UDFs

RFC: suspend MV

RFC: Suspend MV on Non-Recoverable Errors by hzxa21 #54

Last time, I mentioned that RisingWave currently tolerates compute errors by default and we are in the process of implementing an error reporting mechanism. However, we had previously discussed another mechanism for handling errors which involved suspending MV. This RFC reintroduces this idea.

Interesting Bug

Injectivity of column index mapping

bug: mv with join and duplicate output columns has row indices hidden #8216

ColIndexMapping is an important utility in the optimizer, used when we want to map an input column (index) to an output column. It's a partial mapping from [0, n) to [0, m), where n is the number of columns in the input, and m is the number of columns in the output.

When I began to work on RisingWave (one year ago 😲), I worked on column pruning, which removes unnecessary columns from the plan nodes. Naturally it involves ColIndexMapping a lot. Although mathematically simple and intuitive, it's not easy to do such mappings correctly in programs. I had a hard time understanding ColIndexMapping at that time, and added many comments and did some refactoring to make it less confusing and error-prone.

After a long time, we met a new bug related to ColIndexMapping again... Under certain circumstances, a plan node will have a non-injective mapping and duplicate columns in the output. In this case, the duplicate columns are unexpectedly hidden after mapping. We fixed this bug by considering the injectivity of the mapping. A more systematic solution is also proposed: Proposal (frontend): Prune duplicate columns #8277.
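
To see how a non-injective mapping can silently lose a column, consider this simplified sketch. The Mapping struct and its methods are hypothetical and much smaller than the real ColIndexMapping; the point is only to show what goes wrong when two inputs share one target.

```rust
/// Simplified stand-in for a partial column index mapping:
/// `map[i] = Some(j)` sends input column `i` to output column `j`.
pub struct Mapping {
    pub map: Vec<Option<usize>>,
    pub target_size: usize,
}

impl Mapping {
    /// True if no two input columns share an output column.
    pub fn is_injective(&self) -> bool {
        let mut seen = vec![false; self.target_size];
        for &target in self.map.iter().flatten() {
            if seen[target] {
                return false;
            }
            seen[target] = true;
        }
        true
    }

    /// Naive inverse: for each output column, remember *one* input column.
    /// When the mapping is not injective, later entries overwrite earlier ones.
    pub fn naive_inverse(&self) -> Vec<Option<usize>> {
        let mut inverse = vec![None; self.target_size];
        for (source, target) in self.map.iter().enumerate() {
            if let Some(target) = target {
                inverse[*target] = Some(source);
            }
        }
        inverse
    }
}
```

With map = [Some(0), Some(0), Some(1)], input columns 0 and 1 both land on output 0, and the naive inverse only remembers column 1, so column 0 has effectively disappeared. Checking injectivity first makes that case explicit.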

Rusty stuff 🦀️

We ❤️ Rust! This section is about some general Rust-related issues.

Publish await-tree

We have a tracing mechanism called async stack trace in our system, which allows us to “capture a snapshot” of where, why, and how long the async tasks are pending in real-time. It helped us locate some stuck issues, e.g., streaming deadlock, which would be very hard to debug in other ways.

Now we have published it to crates.io, and renamed it to await-tree. I already invited @BugenZhao to write a blog post to introduce it, and can't wait to see it. 😍

feat(dashboard): dump await-tree of compute nodes by BugenZhao #8330

We also integrated it into the RisingWave dashboard to make it more convenient to use. 😋

Large memory usage by backtrace and debuginfo

fix: reduce debuginfo size by fuyufjh #8326

We noticed unexpected memory consumption on the meta node. And the cause is (again) backtrace...

New contributor

So happy to see another 3 first-time contributors this week! 😲🥳

(BTW, I'm also quite interested in where they come from. Are they encouraged by my posts? If so, please let me know! 🤔🤣)

@erichgess submitted 2 PRs. I'm quite impressed by the thoroughness of their communication in issue and PR discussions regarding the problem's situation, cause, and solution. This is one of the biggest reasons why I love open source or working in public; transparent over-communication benefits both current collaborators' reviews and future contributors' learning. 🤗


Check out the good first issue and help wanted issues if you also want to join forces! (They are usually carefully chosen. Not just random chore work. It's a good way to get started!)


P.S. Welcome to join the RisingWave Slack community.

So much for this week. See you next week (hopefully)!

· 6 min read
xxchan

This blog series is my personal comments about (part of) the development of RisingWave.

Please take it as an unofficial and no-promise supplement.

Most exciting things 🤩

Common sub-plan sharing

We decided to refactor the tree-structured plan nodes into a DAG some time ago (see RFC #28). The largest benefit is that it enables common sub-plan sharing.

Previously we already implemented LogicalShare, which represents the plan node that will be shared, and used it to share source/subquery/CTE/view. Now comes the last piece of the puzzle: common sub-plan sharing.

Python UDF

Unlike Flink, RisingWave uses SQL as the main interface, to make it easier for users to use. But sometimes users do want some custom logic... We are doing some interesting experiments on this topic, and this is a little milestone. Basically you can already play with Python UDF now!

feat(udf): minimal Python UDF SDK by wangrunji0408 #7943

Other notable things

Telemetry

Telemetry can greatly help us understand and improve how the system behaves in real-world scenarios!

Replace Bloom filter with XOR filter

Stream error reporting

RisingWave tolerates compute errors by default (see #4625), but previously errors were only shown in the log. Now we are trying to report the errors to users (see #7824).

Table schema change

ALTER TABLE is quite tricky for streaming... ADD COLUMN is somewhat reasonable, and we are working on it as the first step.

New major SQL features

jsonb data type

The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.

https://www.postgresql.org/docs/current/datatype-json.html

New array function: array_to_string

feat(expr): support array_to_string by fuyufjh #8027

postgres=> select array_to_string(array[1, 2, 3, NULL, 5], ',');
array_to_string
-----------------
1,2,3,5
(1 row)

postgres=> select array_to_string(array[1, 2, 3, NULL, 5], ',', '*');
array_to_string
-----------------
1,2,3,*,5
(1 row)

It's also called array_join in some other systems, and we added that alias as well.
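
The NULL handling shown in the two queries above can be mirrored in a few lines of Rust. This is only an illustration of the semantics, not the implementation from the PR, and it is restricted to integer arrays for brevity.

```rust
/// Join array elements with `delimiter`. NULL elements are skipped unless
/// `null_string` is given, in which case it is printed in their place.
pub fn array_to_string(
    array: &[Option<i32>],
    delimiter: &str,
    null_string: Option<&str>,
) -> String {
    array
        .iter()
        .filter_map(|element| match element {
            Some(value) => Some(value.to_string()),
            // A NULL element contributes text only if a replacement was supplied.
            None => null_string.map(|s| s.to_string()),
        })
        .collect::<Vec<_>>()
        .join(delimiter)
}
```

Calling it with the array from the examples reproduces both results: NULL is dropped in the two-argument form and replaced by the third argument otherwise.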

New aggregate function: stddev / stdvar

feat: implement stddev/var function by shanicky #7952

Mind-blowing SQL surprise 🤯

I didn't know that much about SQL before becoming a database developer. Every now and then I get a new surprise from SQL...

Operator |/

fix(sqlparser): align operator precedence with PostgreSQL by xiangjinwu #8174

Did you know Postgres has a square root operator |/ ...? The operator precedence might also surprise you.

postgres=> select |/4;
?column?
----------
2
(1 row)

postgres=> select |/4+12;
?column?
----------
4
(1 row)
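
The second result only makes sense if the prefix operator binds more loosely than +, that is, |/4+12 is read as |/(4+12). The arithmetic for both readings, written out as a tiny sketch with hypothetical function names:

```rust
/// `|/4+12` as the output above implies: the prefix operator applies to `4 + 12`.
pub fn sqrt_binds_looser() -> f64 {
    (4.0_f64 + 12.0).sqrt()
}

/// What one might expect instead: `(|/4) + 12`.
pub fn sqrt_binds_tighter() -> f64 {
    4.0_f64.sqrt() + 12.0
}
```

The first reading gives 4, matching the query result, while the second would give 14.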

NULLs hurt our head -- FULL OUTER JOIN in streaming

Should we ban full outer join for streaming query? · Issue #8084

create table t (a int primary key);
insert into t values(null);
create materialized view v as select t1.* from t as t1 full join t as t2 on t1.a = t2.a;

Then v will have primary key (t1.a, t2.a), but ...

left  side: +[null] --> Full Join -> +[null, null]
right side: +[null] --> Full Join -> +[null, null]

Rusty stuff 🦀️

We ❤️ Rust! This section is about some general Rust-related issues.

Error handling

Is appropriate return a Enum Type Error? · Issue #8074

Error handling is quite a topic in Rust (or any language). We ran into this discussion again.

ChaChaRng

feat(sqlsmith): use ChaChaRng for determinism by kwannoel #8068: Use a reproducible RNG so that deterministic fuzz test results can be (more) reproducible.

Private marker trait

refactor(meta): list all implementors of MetadataModel by zwang28 #8122

Just show you the code! 😲

mod private {
    /// A marker trait helps to collect all implementors of `MetadataModel` in
    /// `for_all_metadata_models`. The trait should only be implemented by adding item in
    /// `for_all_metadata_models`.
    pub trait MetadataModelMarker {}
}

pub trait MetadataModel: std::fmt::Debug + Sized + private::MetadataModelMarker {
    // ...
}

macro_rules! for_all_metadata_models {
    ($macro:ident) => {
        $macro! {
            // These items should be included in a meta snapshot.
            // So be sure to update meta backup/restore when adding new items.
            { risingwave_pb::hummock::HummockVersion },
            // ...
        }
    };
}

macro_rules! impl_metadata_model_marker {
    ($({ $target_type:ty },)*) => {
        $(
            impl private::MetadataModelMarker for $target_type {}
        )*
    }
}

for_all_metadata_models!(impl_metadata_model_marker);

New Contributors

As I mentioned last time, RisingWave is an open source project, and new contributors are always welcome. So happy to see we do have some new contributors this week:

Check out the good first issue and help wanted issues if you also want to join forces! (They are usually carefully chosen. Not just random chore work. It's a good way to get started!)


P.S. Welcome to join the RisingWave Slack community.

So much for this week. (I also haven't fully learned the details of many of them yet...) See you next week (hopefully)! 🤗

P.P.S. I'm also considering deep diving into one or a few interesting things every week, instead of writing a weekly summary like this. What do you think? 🤔

· 7 min read
xxchan

RisingWave is a distributed SQL database for stream processing written in Rust. As a developer of RisingWave, I'm always excited (and also a little bit overwhelmed) about its progress every day.

So why not share the excitement with more people (and also help myself to get a better understanding)? That's why I decided to write this blog post about what's happening in the project. Hope you will enjoy it!

This blog series is my personal comments about (part of) the development of RisingWave.

Please take it as an unofficial and no-promise supplement.

Most exciting things 🤩

Huge reduction in CI time

As my last post mentioned, last week I found some stupidly effective ways to optimize Rust compile time. I managed to reduce the CI time from main 40min/PR 25min30s to main 28min/PR 16-19min, and it looks good this week!

It's quite a DX improvement. I'm very happy and would like to quote matklad's blog again:

Compilation time is a multiplier for basically everything. Whether you want to ship more features, to make code faster, to adapt to a change of requirements, or to attract new contributors, build time is a factor in that.

It also is a non-linear factor. Just waiting for the compiler is the smaller problem. The big one is losing the state of the flow or (worse) mental context switch to do something else while the code is compiling. One minute of work for the compiler wastes more than one minute of work for the human.

Let's take some time to prevent "broken windows". The effort would pay off!

DDL UX improvements

DDL can take a very long time if there is already a lot of existing data to consume. Previously you could only sit there and wait. But now, you can:

dev=> select * from rw_catalog.rw_ddl_progress;
ddl_id | ddl_statement | progress
--------+-------------------------------+----------
1026 | CREATE INDEX idx ON sbtest1(c) | 69.02%
(1 row)

Most interesting things 😄

OpenDAL into RisingWave!

OpenDAL is a unified data access layer which aims to help access data freely, painlessly, and efficiently.

Previously, RisingWave only supported s3 as a storage backend (along with other s3-compatible storage). Recently we have been trying to add more storage backends.

Last week, we used OpenDAL to add support for using HDFS as a storage backend. This week, we tried more things:

  • Tried using Google Cloud Storage in RisingWave (by wcy-fdu #7920). In our initial benchmark, it seems OpenDAL can be faster than the s3-compatible protocol!
  • Changed the implementation for oss from s3-compatible mode to OpenDAL (by wcy-fdu #7969). S3-compatible mode for oss doesn't support delete_objects, and also suffers from some stability issues.

It seems that OpenDAL is quite promising!

Rusty stuff 🦀️

We ❤️ Rust! This section is about some general Rust-related issues.

clippy::or_fun_call

Compared with or, the or_else methods on Option and Result can help to avoid eagerly evaluating an expensive fallback that is only needed in the None or Err case. (i.e., lazy evaluation!) And the clippy::or_fun_call lint is for detecting that. It used to be warn by default, but unfortunately it seems to have a high false positive rate, and is now allow by default.

But we met a case where it's very much wanted. In a function, we used ok_or::<RwError> 9 times, so 9 RwErrors were created regardless of whether they were needed. But constructing RwError is very expensive because it captures the backtrace...

What's worse, on M1 Mac, capturing backtrace is VERY SLOW (~300ms #6131) and can even cause SEGFAULT (#6205)! 😇 It's not completely resolved yet. We mitigated it by reducing unnecessary backtrace capturing, but it's still a problem.

So for this issue, of course we should use ok_or_else instead (by xxchan #7945).
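
The difference is easy to observe with a counter. In this sketch, CostlyError is a hypothetical stand-in for an error like RwError that is expensive to construct, and ERRORS_BUILT only exists to make the eager construction visible.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many errors were constructed.
pub static ERRORS_BUILT: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for an error that is expensive to construct.
#[derive(Debug)]
pub struct CostlyError;

impl CostlyError {
    pub fn new() -> Self {
        ERRORS_BUILT.fetch_add(1, Ordering::SeqCst);
        CostlyError
    }
}

/// Eager: the argument of `ok_or` is evaluated even when `value` is `Some`.
pub fn eager(value: Option<i32>) -> Result<i32, CostlyError> {
    value.ok_or(CostlyError::new())
}

/// Lazy: the function passed to `ok_or_else` only runs when `value` is `None`.
pub fn lazy(value: Option<i32>) -> Result<i32, CostlyError> {
    value.ok_or_else(CostlyError::new)
}
```

Calling eager(Some(1)) still bumps the counter, while lazy(Some(1)) leaves it untouched, which is the whole saving on the happy path.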

zip_eq & ExactSizeIterator

zip_eq is a safer version of zip which checks if the two iterators have the same length. However, it's notoriously slow (see #6856 by wangrunji0408 for a benchmark), because in the naive implementation, every item in the iterator is checked.

There's an ExactSizeIterator trait, and naturally zip_eq could have an optimized implementation for it. Such a specialized implementation does not exist because Rust doesn't support specialization yet. (See tracking issue for specialization (RFC 1210))

So last week I added two new separate traits to replace zip_eq: (code here)

  • zip_eq_debug: uses zip_eq when debug_assertions is enabled, otherwise uses zip. It's a good trade-off between safety and performance, and can be a drop-in replacement for zip_eq.
  • zip_eq_fast: specialized implementation for ExactSizeIterator. (Actually zip_eq_debug can be good enough, but I still added this after playing with specialization for a while. Just for fun...!😄)

This week, ExactSizeIterator is implemented for more of our iterators, so we can use zip_eq_fast more often... (by BugenZhao #7939)
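
A rough sketch of the two ideas as free functions follows. This is not the code linked above: the real helpers are trait methods, and the real zip_eq_debug falls back to an item-by-item zip_eq in debug builds, whereas this simplified version requires ExactSizeIterator in both cases so that it stays dependency-free.

```rust
/// Simplified `zip_eq_fast`: one O(1) length comparison up front,
/// made possible by `ExactSizeIterator::len`, then a plain `zip`.
pub fn zip_eq_fast<A, B>(a: A, b: B) -> std::iter::Zip<A::IntoIter, B::IntoIter>
where
    A: IntoIterator,
    B: IntoIterator,
    A::IntoIter: ExactSizeIterator,
    B::IntoIter: ExactSizeIterator,
{
    let (a, b) = (a.into_iter(), b.into_iter());
    assert_eq!(a.len(), b.len(), "iterators have different lengths");
    a.zip(b)
}

/// Simplified `zip_eq_debug`: check lengths only when `debug_assertions`
/// is enabled, and behave exactly like `zip` in release builds.
pub fn zip_eq_debug<A, B>(a: A, b: B) -> std::iter::Zip<A::IntoIter, B::IntoIter>
where
    A: IntoIterator,
    B: IntoIterator,
    A::IntoIter: ExactSizeIterator,
    B::IntoIter: ExactSizeIterator,
{
    let (a, b) = (a.into_iter(), b.into_iter());
    debug_assert_eq!(a.len(), b.len(), "iterators have different lengths");
    a.zip(b)
}
```

Either way, the per-item check disappears from the hot loop, which is where the slowness of the naive zip_eq comes from.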

Other notable things

System Parameters

Recently we have been doing a large refactoring of system parameters in order to achieve consistency and mutability for the cluster configurations (Tracking Issue #7381). This is a serious issue as RisingWave grows mature.

This week, the meta part for ALTER SYSTEM is implemented by Gun9niR #7954.

pg_catalog

Since RisingWave is PostgreSQL compatible, I had thought that supporting other database tools would be easy. But it turns out to be very hard 😭. The largest obstacle is that database tools usually rely heavily on the system tables in pg_catalog to get the metadata of the database so that they can provide a better UX. But there are so many features in the system tables!

We have been constantly making efforts to support more and more system tables in order to integrate RisingWave into other tools. This week, we:

  • Added pg_catalog.pg_conversion by yezizp2012 #7964. This is for DBeaver support.
  • Found it's not possible to add typarray column in pg_catalog.pg_type #7555, which affects sqlalchemy support. But sqlalchemy-risingwave can be used as a substitute for the PG native plugin.

EXPLAIN format

EXPLAIN format is another thing I had thought would be easy. Although the current situation is not so bad, it's still far from perfect.

This week, we tried to use a "global expression id" to make the plan more readable, but it still has some problems:

By the way, here's another interesting thing: rewriting the EXPLAIN implementation using the (modified) Wadler-style algebraic pretty printer! See the RFC by ice1000

Query optimizer

Although OLAP batch queries are not the main focus of RisingWave, we still want to make it better. This week, we have these optimizer improvements to make batch queries faster:

We are also trying to add constant relation in streaming (#7854). As a first step, we are doing "plan-level constant folding", e.g., merge Union with Values inputs into Values by jon-chuang #7923
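
As an illustration of what such a plan-level rewrite looks like, here is a toy version. The Plan enum and fold_union_of_values are hypothetical, and the sketch assumes UNION ALL semantics (no deduplication); the real optimizer rule operates on RisingWave's own plan nodes.

```rust
/// A tiny hypothetical logical plan.
#[derive(Clone, Debug, PartialEq)]
pub enum Plan {
    /// A constant relation: literal rows.
    Values(Vec<Vec<i64>>),
    /// UNION ALL over the inputs.
    Union(Vec<Plan>),
    /// A scan of a named table.
    Scan(String),
}

/// If every input of a `Union` is a `Values`, merge them into one `Values`.
pub fn fold_union_of_values(plan: Plan) -> Plan {
    match plan {
        Plan::Union(inputs) => {
            // Fold nested unions first so inner constants can bubble up.
            let inputs: Vec<Plan> = inputs.into_iter().map(fold_union_of_values).collect();
            if inputs.iter().all(|p| matches!(p, Plan::Values(_))) {
                let rows: Vec<Vec<i64>> = inputs
                    .into_iter()
                    .flat_map(|p| match p {
                        Plan::Values(rows) => rows,
                        _ => unreachable!("checked above"),
                    })
                    .collect();
                Plan::Values(rows)
            } else {
                Plan::Union(inputs)
            }
        }
        other => other,
    }
}
```

A union that still contains a non-constant input, such as a table scan, is left as it is.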

Last but not least

By the way, welcome to join the RisingWave Slack community. Also check out the good first issue and help wanted issues if you want to join the development of an open source database system!

So much for this week. See you next week (hopefully)! 🤗