This week, we have a lot of updates! To name a few: functional index, proc_time(), int256, CREATE CONNECTION, and ... the beta release of RisingWave Cloud!
This Week in RisingWave #8
This week, we celebrated RisingWave's v0.18.0 release and its first birthday! We talked about (many) new functions, system functions, generated columns, the dedup operator, and more.
This Week in RisingWave #7
This week, we talked about psql completion, band join, significant DX improvements in the expression framework powered by procedural macros, and more.
This Week in RisingWave #6
This blog series is my personal comments about (part of) the development of RisingWave.
Please take it as an unofficial and no-promise supplement.
Feature Updates
NULLS {FIRST | LAST}
feat(common): support NULLS {FIRST | LAST} by richardchien · Pull Request #8485
NULLs can be very tricky in SQL. Do you know you can specify their ordering?
> select * from t order by x /* nulls last */;
+--------+
| x |
|--------|
| 1 |
| 2 |
| <null> |
+--------+
> select * from t order by x nulls first;
+--------+
| x |
|--------|
| <null> |
| 1 |
| 2 |
+--------+
It was tricky to support because previously in RisingWave, we only considered ascending/descending ordering, but not the ordering of NULLs. That caused inconsistent NULL ordering around the system. When it comes to SQL query results, the NULL values were sometimes the largest, sometimes the smallest.
This also blocked our progress on testing RisingWave against other databases' test suites, because in PostgreSQL, NULLs are the largest by default when compared against non-NULL values, while in SQLite and DuckDB they are the smallest by default. With inconsistent ordering of NULLs and no support for NULLS { FIRST | LAST } in ORDER BY clauses, we could neither align with the behavior of one of these databases, nor specify a fixed NULL ordering in test queries to align the behavior of all databases.
Recently, we did a thorough refactoring to force all components in our system to use a unified struct type representing ordering, OrderType. As a result, this week we were finally able to implement NULLS { FIRST | LAST } for ORDER BY clauses with ease, by simply adding a new field, NullsAre, to OrderType, specifying whether NULLs are the largest or the smallest. By setting Largest as the default value of NullsAre, we achieved the same default ordering behavior as PostgreSQL.
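Here's a rough sketch of the shape described above. The names OrderType and NullsAre come from the post; everything else (variant names, derives) is my guess, not the actual definitions in RisingWave.

// Sketch only, not the real RisingWave code.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Direction {
    Ascending,
    Descending,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Default)]
enum NullsAre {
    // Treating NULLs as the largest value matches PostgreSQL's default:
    // they sort last under ASC and first under DESC.
    #[default]
    Largest,
    Smallest,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct OrderType {
    direction: Direction,
    nulls_are: NullsAre,
}

impl OrderType {
    /// Corresponds to `ORDER BY x ASC NULLS FIRST`.
    fn ascending_nulls_first() -> Self {
        Self {
            direction: Direction::Ascending,
            nulls_are: NullsAre::Smallest,
        }
    }
}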
Rename relations
As I mentioned earlier, we are working on DDL support (ALTER TABLE), which is quite tricky. We've already done quite some work to support ADD/DROP COLUMN. Here's another piece of DDL that is now supported: renaming relations. This might be slightly easier than ADD/DROP COLUMN, but there are still some tricky parts to consider carefully, like updating all related relations at the same time.
For example, you can use the command ALTER TABLE t_1 RENAME TO t_2 to rename a table, and its related relations will be updated recursively, which you can check with a command such as SHOW CREATE MATERIALIZED VIEW mv_x. There is no effect on the stored data.
Performance Optimizations 💪
Bushy tree join ordering
feat: Bushy tree join ordering by KveinAxel · Pull Request #8316
RisingWave is a stream processing system that aims to provide real-time, low-latency results for our users. By reducing the depth of the join tree, we can effectively reduce latency.
See the illustration below:
To reduce the distance that barriers from the join inputs need to travel to the join output, we convert a left-deep tree with a height of 5 into a bushy tree with a height of 3.
Rusty stuff 🦀️
We ❤️ Rust! This section is about some general Rust-related issues.
zld
Cannot run risingwave binary on Mac OS M1 (when linked by zld) · Issue #8608
@ahmedriza reported that he could not run the binary on his MacBook M1 due to a "symbol not found" error.
$ ./target/debug/risingwave
dyld[14116]: symbol not found in flat namespace (__ZN15protobuf_native2io23DeleteCodedOutputStreamEPN6google8protobuf2io17CodedOutputStreamE)
Abort trap: 6
After some investigation, he found that it's because he used zld in his global cargo config, while RisingWave has a repository-level cargo config that uses lld. Although it is still not clear whether the error is caused by zld itself or by the conflict between the two cargo configs, he could solve the problem by just using lld.
This is also one of the reasons why I enjoy open source: hackers are very willing to raise issues and even analyze and solve problems themselves. I can also learn a lot from their thorough analysis. From this issue, I learned:
- How cargo config ($HOME/.cargo/config.toml, and /projects/.cargo/config.toml) works in more detail
- There's another linker, zld. (But it's already deprecated in favor of lld.)
New Contributors
(Of course, @ahmedriza is also a great new contributor!)
feat: Add support for array_length function in psql by kamalesh0406 Β· Pull Request #8636
This week we have another first-time contributor @kamalesh0406. He said "This is my first time writing rust code" 😲.
So it seems RisingWave's good first issues are a good way to learn and practice Rust. Don't hesitate to join us if you are also interested in the development of an open source database system, or in Rust!
Finally, welcome to join the RisingWave Slack community.
So much for this week. See you next week!
This Week in RisingWave #5
This blog series is my personal comments about (part of) the development of RisingWave.
Please take it as an unofficial and no-promise supplement.
Notable changes
Temporal join
- Tracking: Process time temporal join · Issue #8123
- RFC: Temporal Join by chenzl25 #49 · risingwavelabs/rfcs
Lots of production scenarios contain a fact table and several dimension tables, where users want to enrich (join) their fact table with the dimension tables. Unlike regular stream joins, in this enrichment scenario users may want to keep the previous join outputs unaffected when a dimension table is updated, because we only want to enrich the fact table without producing duplicated outputs.
Temporal join is for this scenario. More technically speaking, it joins an append-only stream (such as one from Kafka) with a temporal table (a.k.a. versioned table, e.g. backed by MySQL CDC). The stream side looks up the temporal table, which means the join is driven by the stream side only.
The syntax is like:
SELECT * FROM stream LEFT JOIN versioned FOR SYSTEM_TIME AS OF NOW() ON stream.col = versioned.id
Interesting SQL features
I didn't know that much about SQL before becoming a database developer. Every now and then I get a new surprise from SQL...
Server local timezone
Do you know the SQL standard has two timestamp types: timestamp with/without time zone?
Support SET TIME ZONE LOCAL syntax · Issue #8551
The new syntax allows us to set the session time zone to the server's local time zone, which is useful for local testing.
dev=> select now();
now
-------------------------------
2023-03-16 10:41:10.951+00:00
(1 row)
dev=> set time zone local;
SET_VARIABLE
dev=> select now();
now
-------------------------------
2023-03-16 11:41:36.958+01:00
(1 row)
BTW, this is done via strawlab/iana-time-zone: Rust crate to get the IANA time zone for the current system.
Interesting Bug
Inverse of column index mapping
fix(optimizer): fix hash join distribution by chenzl25 #8598
I talked about ColIndexMapping in This Week in RisingWave #3. Although mathematically simple and intuitive, it's not easy to do such mappings correctly in programs.
Well, we met another bug related to ColIndexMapping this week 🥲. (Luckily, it's not very easy to trigger.) This time, it's about the inverse of the mapping. Briefly speaking, suppose we have an array of index pairs [(l1, r1), (l2, r2), ...]; naturally we can build two mappings, l -> r and r -> l. However, the inverse of l -> r is not r -> l! Can you tell why? (My reading of it is sketched below.)
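Here's a toy illustration of my reading of it (this is not the actual ColIndexMapping code or the actual fix): the pair list is not necessarily a bijection, so building a function in one direction already drops information, and inverting that function cannot recover what the other direction would have kept.

use std::collections::HashMap;

fn main() {
    // Hypothetical pairs: left column 0 maps to both right columns 0 and 1,
    // e.g. when an input column appears twice in the output.
    let pairs = [(0usize, 0usize), (0, 1)];

    // l -> r can keep only one target per left index...
    let mut l_to_r = HashMap::new();
    for (l, r) in pairs {
        l_to_r.insert(l, r);
    }
    // ...so its inverse ends up with a single entry,
    let inverse: HashMap<_, _> = l_to_r.iter().map(|(l, r)| (*r, *l)).collect();

    // while r -> l built directly from the pairs keeps both.
    let r_to_l: HashMap<_, _> = pairs.iter().map(|(l, r)| (*r, *l)).collect();

    assert_eq!(inverse.len(), 1);
    assert_eq!(r_to_l.len(), 2);
}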
Reliability Improvements 💪
The Great MadSim!
fix: avoid panic when upstream input is closed for lookup #8529
This week, we identified a new bug through MadSim that deterministically shuts down and restarts nodes in a RisingWave cluster. This time, the bug was found during the execution path of the lookup executor. Thanks to MadSim, we were able to quickly identify the issue and resolve it.
Interval bugfixes and tests
- fix(common): interval overflow panic / wrap during comparison and justify #8556
- fix(common): interval should have microsecond precision #8501
- test(regress): enable interval #8438
Intervals are a fundamental data type for a streaming SQL database, but they can also be subtle in some ways. Recently, RisingWave has enhanced its support for intervals and migrated many related tests from Postgres.
OpenDAL
feat(test): add e2e test for OpenDAL fs backend #8528
Since February, RisingWave has been using OpenDAL as one of its underlying object storage implementations. OpenDAL greatly reduces our effort in supporting various cloud storage systems, especially HDFS. This PR uses the OpenDAL fs engine to mock the memory object store.
By the way, OpenDAL is now an Apache Incubator project!
Rusty stuff 🦀️
We ❤️ Rust! This section is about some general Rust-related issues.
Be more careful about error creation!
fix(expr): do not construct error for extracting time subfield by BugenZhao #8538
Error creation can be very expensive!
In This Week in RisingWave #1, I mentioned that we can use ok_or_else to create expensive errors lazily. This time, the errors are not actually needed at all: Option is enough. Basically, I mean cases like this:
// Don't do this!
fn inner() -> Result<T> {}

fn outer() -> Result<T> {
    match inner() {
        Ok(t) => Ok(t),
        Err(_) => {
            // try a different computation
            ...
        }
    }
}
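For contrast, here is a minimal sketch of the Option-based shape hinted at above. The function names and types are hypothetical, not taken from the PR; the point is just that no error value is ever constructed on the fallback path.

// Sketch only: when the failure of the first attempt is expected and carries
// no information, return Option instead of building a Result error.
fn try_fast_path(input: &str) -> Option<u32> {
    input.parse().ok()
}

fn slow_path(_input: &str) -> u32 {
    0 // stand-in for the alternative computation
}

fn compute(input: &str) -> u32 {
    // The closure only runs when the fast path returns None.
    try_fast_path(input).unwrap_or_else(|| slow_path(input))
}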
My takeaway is: think more about the definition of error types and try to keep them small. If they're unavoidably large, then we have to think more when using them.
BTW, kudos to @BugenZhao for catching this issue (again)!
P.S. This PR brings us a 1000%+ throughput improvement (🤯) on Nexmark q14, which is a simple SELECT with extract(hour from date_time).
New Contributors
Support optional parameter offset in tumble and hop by Eridanus117 #8490
This is the second PR by @Eridanus117.
feat(expr): support builtin function pi. by broccoliSpicy #8509
This is the second PR by @broccoliSpicy.
It's great to see new contributors joining in, and even better when they show interest in diving deeper and contributing continuously! 🥰
CREATE SINK panic · Issue #8482
I remember @JuchangGit had submitted 2 issues in the past. This week he submitted another one. I'd like to mention this because open source contribution is not only about code (PRs). Playing with the software and reporting issues are also very important contributions!
Finally, welcome to join the RisingWave Slack community. Also check out the good first issue and help wanted issues if you want to join the development of an open source database system!
So much for this week. See you next week (hopefully)!
This Week in RisingWave #4
This blog series is my personal comments about (part of) the development of RisingWave.
Please take it as an unofficial and no-promise supplement.
Notable changes
Serial type
- feat(common): Add support for DataType::Serial
- RFC: add serial type
- remove the redundant exchange after append-only source executor
As a distributed database system, RisingWave has multiple instances of operators to achieve a high degree of parallelism. Some operators require a specific input distribution to ensure the correctness of the result, so data must be shuffled at that point, which is represented by the Exchange operator.
At the same time, RisingWave generates a hidden _row_id column for sources without primary keys (a.k.a. append-only sources), also for the correctness of the system. Previously the _row_id was randomly distributed, which meant we had to insert a HashExchange right after the source operator to enforce its distribution.
Since _row_id is fully controlled by us, why don't we directly generate _row_id with the desired distribution? So an optimization is proposed: use a new internal type Serial for _row_id with specialized shuffling logic, so that the unnecessary Exchange operators can be removed.
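My rough mental model of the trick, as a toy (the actual id layout and vnode logic in RisingWave are certainly different; everything below is illustrative only): if the generated id itself records which virtual node (vnode) the row should belong to, the distribution can be derived directly from the id, so no re-shuffle is needed after the source.

// Toy illustration, not RisingWave's actual encoding.
const VNODE_BITS: u32 = 8;

// Pack the target vnode into the high bits of the generated row id.
fn generate_row_id(vnode: u64, sequence: u64) -> u64 {
    (vnode << (64 - VNODE_BITS)) | (sequence & ((1u64 << (64 - VNODE_BITS)) - 1))
}

// Recover the vnode from a row id without any hashing or shuffling.
fn vnode_of(row_id: u64) -> u64 {
    row_id >> (64 - VNODE_BITS)
}

fn main() {
    let id = generate_row_id(42, 7);
    assert_eq!(vnode_of(id), 42);
}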
Auto execution mode selection
feat(batch): support auto execution mode #8274
Currently, RisingWave has two execution modes for batch queries: local and distributed, which can be switched with SET query_mode = [local | distributed]. Lower latency can be achieved by running OLTP queries in local execution mode and OLAP queries in distributed execution mode.
However, previously we left the choice to users, which made our distributed execution useless to some extent, because most users don't know or understand what those options exactly mean. As a database, we have the ability to make a good choice for users, and we should, to reduce their tuning work.
AWS PrivateLink support
feat(source): support private link for kafka connector #8247
When using a cloud environment, users may face connection issues when attempting to create a source in RisingWave due to their AWS MSK service being located in a different VPC. To solve this, AWS PrivateLink can be used to establish a connection between RisingWave's VPC and the user's VPC. Users can set up an endpoint service to expose their MSK service, allowing RisingWave to easily create an endpoint to access it.
Sink Validation
feat(meta): introduce sink validation in meta #8417
Recently we implemented MySQL/Postgres sink support, and we are now working hard to keep improving its stability and usability. This PR introduces validation for sinks so that we can catch errors at an earlier phase and give better error messages.
Reliability Improvements 💪
Fuzzing Tests
feat(sqlsmith): Generate more join expressions #8395
This week we added more targeted testing for join expressions. We increased the number of joins, added generation of non-equi joins, and added more equi-join predicates.
The goal of these changes is to increase the coverage of the join executors in streaming and batch, since the functionality of JOINs is often complex.
Deterministic Tests
During the past week, MadSim, our deterministic testing framework, identified more than five issues that we promptly resolved. We are confident in the effectiveness of our framework. Additionally, I'm eagerly anticipating @wangrunji0408's upcoming blog post about MadSim internals. 🤩
Rusty stuff 🦀️
We ❤️ Rust! This section is about some general Rust-related issues.
Reduce debuginfo size
Although I already briefly mentioned this in the previous blog, let me elaborate on it a little here.
fix: reduce debuginfo size #8326
We noticed an unexpected 2 GB of memory consumption on the meta node. The cause is quite interesting.
We found that the memory consumption happens when the meta node writes a log with a backtrace included. To get the backtrace, the debuginfo in the risingwave binary is loaded (and cached!). This is somewhat reasonable, but should the debuginfo be so large?
Why are debug symbols so huge? is a good post explaining what debuginfo is. Inspired by it, we tuned the level of debuginfo. This is a trade-off between binary size and utility (we still keep the most useful information for debugging). At the same time, we also disabled debuginfo compression, which is another trade-off between binary size and memory overhead.
Capture unrecognized fields in serde
https://github.com/risingwavelabs/risingwave/pull/8325
By default, serde ignores unknown fields. With #[serde(deny_unknown_fields)], serde rejects unknown fields. But what if you want to tolerate them, yet produce a useful warning at the same time? Here's a useful tip: with #[serde(flatten)] you can capture the unknown fields:
#[derive(Serialize, Deserialize)]
pub struct S {
    // ...
    #[serde(flatten)]
    pub unrecognized: HashMap<String, serde_json::Value>,
}
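A self-contained sketch of how this plays out (the field known_field and the warning message are hypothetical, just for illustration):

use std::collections::HashMap;

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub struct S {
    pub known_field: u32, // hypothetical field for the example
    #[serde(flatten)]
    pub unrecognized: HashMap<String, serde_json::Value>,
}

fn main() {
    // "oops" is not a known field, but deserialization still succeeds.
    let s: S = serde_json::from_str(r#"{"known_field": 1, "oops": true}"#).unwrap();
    if !s.unrecognized.is_empty() {
        eprintln!(
            "warning: unrecognized fields: {:?}",
            s.unrecognized.keys().collect::<Vec<_>>()
        );
    }
}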
Auto derive a prefixed alias for protobuf message types
When using protobuf for communication, one thing might be defined multiple times in different places for different purposes. E.g., if we have a protobuf message Msg, a crate using Msg might have its own type for Msg at the same time, which might include useful methods and maybe a data structure representation different from the protobuf one. Previously, we needed to manually alias it to avoid conflicts, e.g.,
use pb_crate::Msg as PbMsg;

struct Msg {
    // ...
}

impl Msg {
    pub fn to_proto(self) -> PbMsg {
        // ...
    }
}
Now we came up with a way to mitigate this problem. Thanks to prost's ability to customize the generated code for protobuf messages, we can add a type alias pub type PbMsg = Msg at the same time as defining Msg. In this way, when trying to type PbMsg, the IDE will be able to find the type and automatically import it.
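A toy sketch of the resulting pattern (the fields and module layout are hypothetical; in reality the alias is emitted by the protobuf code generation, not hand-written in a module like this):

// What the generated protobuf crate conceptually ends up containing:
mod pb_crate {
    #[derive(Debug, Default)]
    pub struct Msg {
        pub id: u64,
    }
    // The prefixed alias generated right next to the message definition.
    pub type PbMsg = Msg;
}

// A consumer crate can keep the plain name for its own richer type and let
// the IDE auto-import the aliased protobuf type.
use crate::pb_crate::PbMsg;

struct Msg {
    id: u64,
    cached_name: String, // extra in-memory state not present in the protobuf
}

impl Msg {
    fn to_protobuf(&self) -> PbMsg {
        PbMsg { id: self.id }
    }
}

fn main() {
    let msg = Msg { id: 1, cached_name: "hello".to_owned() };
    println!("{:?}", msg.to_protobuf());
}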
Finally, welcome to join the RisingWave Slack community. Also check out the good first issue and help wanted issues if you want to join the development of an open source database system!
So much for this week. See you next week (hopefully)!
This Week in RisingWave #3
This blog series is my personal comments about (part of) the development of RisingWave.
Please take it as an unofficial and no-promise supplement.
Notable changes
Memory control policy
- feat(memory): introduce memory control policy for computing tasks by xx01cyx #7767
- Tracking: refactor user-configurable memory control policy #8228
- refactor: introduce memory control policy abstraction by xx01cyx #8253
Recently we implemented a memory control mechanism, and now we are introducing policies for it. Since RisingWave is a "streaming database", it is crucial for us to consider both batch and streaming tasks while balancing the memory usage between them.
Move memtable
feat(storage): move memtable down to local state store by wenym1 #7183
Our streaming executors access storage via the StateTable interface, and the storage layer exposes the LocalStateStore interface. This PR moves part of the former (the memtable) down to the latter. It should help us implement more features in the storage layer, and also make it easier to implement other storage engines for benchmarking purposes.
Optimizer updates
- refactor(optimizer): divide logical optimizer into one for batch and one for streaming. by chenzl25 #8192
- perf(agg): reuse existing agg calls while building LogicalAgg by richardchien #8200
feat: Bushy tree join ordering by KveinAxel #8316
RisingWave is a stream processing system which aims to provide real-time, low-latency results for our users. Making the join tree shallower helps to reduce latency.
See the illustration below:
Async expr
refactor(expr): make evaluation async by wangrunji0408 #8229
... to prevent blocking when evaluating UDFs
RFC: suspend MV
RFC: Suspend MV on Non-Recoverable Errors by hzxa21 #54
Last time, I mentioned that RisingWave currently tolerates compute errors by default and we are in the process of implementing an error reporting mechanism. However, we had previously discussed another mechanism for handling errors which involved suspending MV. This RFC reintroduces this idea.
Interesting Bug
Injectivity of column index mapping
bug: mv with join and duplicate output columns has row indices hidden #8216
ColIndexMapping is an important utility in the optimizer, used when we want to map an input column (index) to an output column. It's a partial mapping from [0, n) to [0, m), where n is the number of columns in the input and m is the number of columns in the output.
When I began to work on RisingWave (one year ago 😲), I worked on column pruning, which removes unnecessary columns from plan nodes. Naturally it involves ColIndexMapping a lot. Although mathematically simple and intuitive, it's not easy to do such mappings correctly in programs. I had a hard time understanding ColIndexMapping at that time, and added many comments and did some refactoring to make it less confusing and error-prone.
After a long while, we met a new bug related to ColIndexMapping again. Under certain circumstances, a plan node will have a non-injective mapping and duplicate columns in the output. In this case, the duplicate columns are unexpectedly hidden after mapping (see the toy example below). We fixed this bug by taking the injectivity of the mapping into account. A more systematic solution is also proposed: Proposal (frontend): Prune duplicate columns #8277.
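Here's a toy version of the kind of collapse involved (this is not the actual ColIndexMapping API, just an illustration of why a non-injective pair list doesn't fit into a one-target-per-column map):

// Toy: a partial map from input column index to output column index,
// storing at most one target per input column.
struct ToyColIndexMapping {
    map: Vec<Option<usize>>, // map[input_idx] = Some(output_idx) or None (pruned)
}

fn main() {
    // Input column 0 appears twice in the output (positions 0 and 1),
    // e.g. something like `SELECT a, a FROM t`.
    let pairs = [(0usize, 0usize), (0, 1)];

    let mut map = vec![None; 1];
    for (input, output) in pairs {
        // The earlier target is silently overwritten, so one of the
        // duplicate output columns effectively disappears.
        map[input] = Some(output);
    }
    let mapping = ToyColIndexMapping { map };

    assert_eq!(mapping.map[0], Some(1)); // output position 0 has been lost
}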
Rusty stuff 🦀️
We ❤️ Rust! This section is about some general Rust-related issues.
Publish await-tree
- refactor: switch async_stack_trace to the crates.io version of await-tree by BugenZhao #8254
- await-tree - crates.io: Rust Package Registry
We have a tracing mechanism called async stack trace in our system, which allows us to "capture a snapshot" of where, why, and how long the async tasks are pending in real-time. It helped us locate some stuck issues, e.g., streaming deadlock, which would be very hard to debug in other ways.
Now we have published it to crates.io and renamed it to await-tree. I have already invited @BugenZhao to write a blog post to introduce it, and I can't wait to see it.
feat(dashboard): dump await-tree of compute nodes by BugenZhao #8330
We also integrated it into the RisingWave dashboard to make it more convenient to use.
Large memory usage by backtrace and debuginfo
fix: reduce debuginfo size by fuyufjh #8326
We noticed unexpected memory consumption on the meta node, and the cause is (again) backtrace...
New contributors
So happy to see another 3 first-time contributors this week! 😲🥳
(BTW, I'm also quite interested in where they come from. Are they encouraged by my posts? If so, please let me know! 🤣)
- feat(expr): support array_distinct by snipekill #8315
- fix(parser): disable single-quoted strings as aliases for column or table by Eridanus117 #8338
@erichgess submitted 2 PRs. I'm quite impressed by the thoroughness of their communication in issue and PR discussions regarding the problem's situation, cause, and solution. This is one of the biggest reasons why I love open source or working in public; transparent over-communication benefits both current collaborators' reviews and future contributors' learning.
- fix(exp PG compatibility): if a very large or very small operand is used, then exp errors by erichgess #8309
- bug(unit tests): Cache unit tests fail when /tmp is a tmpfs drive #8278
Check out the good first issue and help wanted issues if you also want to join force! (They are usually carefully chosen. Not just random chore work. It's a good way to get started!)
P.S. Welcome to join the RisingWave Slack community.
So much for this week. See you next week (hopefully)!
This Week in RisingWave #2
This blog series is my personal comments about (part of) the development of RisingWave.
Please take it as an unofficial and no-promise supplement.
Most exciting things 🤩
Common sub-plan sharing
We decided to refactor the tree-structured plan nodes into a DAG some time ago (see RFC #28). The largest benefit is enabling common sub-plan sharing.
Previously we had already implemented LogicalShare, which represents the plan node that will be shared, and used it to share sources/subqueries/CTEs/views. Now comes the last piece of the puzzle: common sub-plan sharing.
- feat(optimizer): Common sub-plan detection. by wsx-ucb #7865
- chore(streaming): rewrite nexmark q5 to benefit from common sub-plan sharing by chenzl25 #8159
Python UDF
Unlike Flink, RisingWave uses SQL as the main interface, to make it easier for users. But sometimes users do want some custom logic... We are doing some interesting experiments on this topic, and this is a little milestone. Basically, you can already play with Python UDFs now!
feat(udf): minimal Python UDF SDK by wangrunji0408 #7943
Other notable things
Telemetry
Telemetry can greatly help us understand and improve how the system behaves in real-world scenarios!
Replace Bloom filter with XOR filter
- perf(bloom filter): replace bloom filter with xor filter by soundOfDestiny #8081
- We are also investigating Ribbon filter · Issue #7406
Stream error reporting
RisingWave tolerates compute errors by default (see #4625), but previously errors were only shown in the log. Now we are trying to report the errors to users (see #7824).
- feat(stream): Report compute_error_count to prometheus by jon-chuang #7832
- feat(stream): source_error_count reporting to prometheus by jon-chuang #7877
- feat(stream): ErrorSuppressor for user compute errors by jon-chuang #8132
- test(stream): Test reporting of stream errors in e2e setting · Issue #8037
Table schema change
ALTER TABLE is quite tricky for streaming... ADD COLUMN is somewhat reasonable, and we are working on it as the first step.
- feat(streaming): support output indices in dispatchers by BugenZhao #8094: This ensures existing downstream MVs receive the old output columns and new downstream MVs receive all columns. It also enables a perf optimization for MV on MV.
- feat: adding column of table schema change by BugenZhao #8063
New major SQL features
jsonb data type
The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed. jsonb also supports indexing, which can be a significant advantage.
- feat(sqlparser): support jsonb operators -> and ->> by xiangjinwu #8144
- Tracking: jsonb operations · Issue #7714
New array function: array_to_string
feat(expr): support array_to_string by fuyufjh #8027
postgres=> select array_to_string(array[1, 2, 3, NULL, 5], ',');
array_to_string
-----------------
1,2,3,5
(1 row)
postgres=> select array_to_string(array[1, 2, 3, NULL, 5], ',', '*');
array_to_string
-----------------
1,2,3,*,5
(1 row)
It's also called array_join in some other systems, and we added that alias as well.
New aggregate function: stddev / stdvar
feat: implement stddev/var function by shanicky #7952
Mind-blowing SQL surprise 🤯
I didn't know that much about SQL before becoming a database developer. Every now and then I get a new surprise from SQL...
Operator |/
fix(sqlparser): align operator precedence with PostgreSQL by xiangjinwu #8174
Do you know Postgres has a square root operator |/ ...? The operator precedence might also surprise you.
postgres=> select |/4;
?column?
----------
2
(1 row)
postgres=> select |/4+12;
?column?
----------
4
(1 row)
NULLs hurt our head -- FULL OUTER JOIN in streaming
Should we ban full outer join for streaming query? · Issue #8084
create table t (a int primary key);
insert into t values(null);
create materialized view v as select t1.* from t as t1 full join t as t2 on t1.a = t2.a;
Then v will have primary key (t1.a, t2.a), but ...
left side: +[null] --> Full Join -> +[null, null]
right side: +[null] --> Full Join -> +[null, null]
Rusty stuff 🦀️
We ❤️ Rust! This section is about some general Rust-related issues.
Error handling
Is appropriate return a Enum Type Error? · Issue #8074
Error handling is quite a topic in Rust (or any language). We ran into this discussion again.
ChaChaRng
feat(sqlsmith): use ChaChaRng for determinism by kwannoel #8068: Use a reproducible RNG so that deterministic fuzz test results can be (more) reproducible.
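A minimal illustration of why a seedable RNG helps here (the seed value is made up; in practice the harness would log and reuse it):

use rand::{Rng, SeedableRng};
use rand_chacha::ChaChaRng;

fn main() {
    let seed = 42u64; // hypothetical: normally recorded by the test harness
    let mut rng = ChaChaRng::seed_from_u64(seed);
    let first: u32 = rng.gen();

    // Re-seeding with the same value replays exactly the same sequence,
    // so a failing fuzz case can be reproduced from its seed alone.
    let mut replay = ChaChaRng::seed_from_u64(seed);
    assert_eq!(first, replay.gen::<u32>());
}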
Private marker trait
refactor(meta): list all implementors of MetadataModel by zwang28 #8122
Let me just show you the code! 😲
mod private {
    /// A marker trait helps to collect all implementors of `MetadataModel` in
    /// `for_all_metadata_models`. The trait should only be implemented by adding item in
    /// `for_all_metadata_models`.
    pub trait MetadataModelMarker {}
}

pub trait MetadataModel: std::fmt::Debug + Sized + private::MetadataModelMarker {
    // ...
}

macro_rules! for_all_metadata_models {
    ($macro:ident) => {
        $macro! {
            // These items should be included in a meta snapshot.
            // So be sure to update meta backup/restore when adding new items.
            { risingwave_pb::hummock::HummockVersion },
            // ...
        }
    };
}

macro_rules! impl_metadata_model_marker {
    ($({ $target_type:ty },)*) => {
        $(
            impl private::MetadataModelMarker for $target_type {}
        )*
    }
}

for_all_metadata_models!(impl_metadata_model_marker);
New Contributors
As I mentioned last time, RisingWave is an open source project, and new contributors are always welcome. So happy to see we do have some new contributors this week:
- fix(executor): exit early when limit = 0 by Dousir9 #8013
- fix(parser): disallow empty object names by broccoliSpicy #8171
- fix(sqlparser): disallow JOIN without CROSS/ON/USING by sun-jacobi #7693
Check out the good first issue and help wanted issues if you also want to join force! (They are usually carefully chosen. Not just random chore work. It's a good way to get started!)
P.S. Welcome to join the RisingWave Slack community.
So much for this week. (I also haven't fully learned the details of many of them yet...) See you next week (hopefully)!
P.P.S. I'm also considering deep-diving into one or a few interesting things every week, instead of writing a weekly summary like this. What do you think?
This Week in RisingWave #1
RisingWave is a distributed SQL database for stream processing written in Rust. As a developer of RisingWave, I'm always excited (and also a little bit overwhelmed) about its progress every day.
So why not share the excitement with more people (and also help myself to get a better understanding)? That's why I decided to write this blog post about what's happening in the project. Hope you will enjoy it!
This blog series is my personal comments about (part of) the development of RisingWave.
Please take it as an unofficial and no-promise supplement.
Most exciting things 🤩
Huge reduction of CI time
As my last post mentioned, last week I found some [stupidly effective ways to optimize Rust compile time]({% post_url 2023-02-17-optimize-rust-comptime-en %}). I managed to reduce the CI time from main 40min/PR 25min30s to main 28min/PR 16-19min, and it looks good this week!
It's quite a DX improvement. I'm very happy and would like to quote matklad's blog again:
Compilation time is a multiplier for basically everything. Whether you want to ship more features, to make code faster, to adapt to a change of requirements, or to attract new contributors, build time is a factor in that.
It also is a non-linear factor. Just waiting for the compiler is the smaller problem. The big one is losing the state of the flow or (worse) mental context switch to do something else while the code is compiling. One minute of work for the compiler wastes more than one minute of work for the human.
Let's take some time to prevent "broken windows". The effort would pay off!
DDL UX improvements
DDL can take a very long time if there is already a lot of existing data to consume. Previously you could only sit there and wait. But now, you can:
- Show DDL's progress: (by chenzl25 #7914)
dev=> select * from rw_catalog.rw_ddl_progress;
ddl_id | ddl_statement | progress
--------+-------------------------------+----------
1026 | CREATE INDEX idx ON sbtest1(c) | 69.02%
(1 row)
- and cancel streaming jobs by ctrl-c! (by yezizp2012 #7917)
Most interesting things
OpenDAL into RisingWave!
OpenDAL is a unified data access layer which aims to help access data freely, painlessly, and efficiently.
Previously, RisingWave only supported S3 as a storage backend (and also other S3-compatible storage). Recently we have been trying to add more storage backends.
Last week, we used OpenDAL to add support for using HDFS as a storage backend. This week, we tried more things:
- Tried using Google Cloud Storage in RisingWave (by wcy-fdu #7920). In our initial benchmark, it seems OpenDAL can be faster than the S3-compatible protocol!
- Changed the implementation for OSS from S3-compatible mode to OpenDAL (by wcy-fdu #7969). S3-compatible mode for OSS doesn't support delete_objects, and also suffers from some stability issues.
It seems that OpenDAL is quite promising!
Rusty stuff 🦀️
We ❤️ Rust! This section is about some general Rust-related issues.
clippy::or_fun_call
Compared with or, the or_else methods on Option and Result can help to avoid expensive computation in the None or Err case (i.e., lazy evaluation!), and the clippy::or_fun_call lint is for detecting that. It used to be warn by default, but unfortunately it seems to have a high false-positive rate, and it is now allow by default.
But we met a case where it's very much wanted. In one function, we used ok_or::<RwError> 9 times, so 9 RwErrors were created regardless of whether they were needed. But constructing an RwError is very expensive because it captures a backtrace...
What's worse, on M1 Macs, capturing a backtrace is VERY SLOW (~300ms #6131) and can even cause a SEGFAULT (#6205)! It's not completely resolved yet. We mitigated it by reducing unnecessary backtrace capturing, but it's still a problem.
So for this issue, of course we should use ok_or_else instead (by xxchan #7945).
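A contrived sketch of the difference (function names and the error type are hypothetical; RwError's real cost comes from capturing a backtrace):

// Stand-in for an error type that is expensive to construct.
fn expensive_error() -> String {
    "something went wrong".repeat(100)
}

fn lookup(maybe: Option<i32>) -> Result<i32, String> {
    // Eager: `ok_or` builds the error even when `maybe` is `Some`.
    //     maybe.ok_or(expensive_error())
    // Lazy: with `ok_or_else`, the closure only runs on the `None` path.
    maybe.ok_or_else(expensive_error)
}

fn main() {
    assert_eq!(lookup(Some(1)), Ok(1));
    assert!(lookup(None).is_err());
}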
zip_eq & ExactSizeIterator
zip_eq is a safer version of zip which checks that the two iterators have the same length. However, it's notoriously slow (see #6856 by wangrunji0408 for a benchmark), because in the naive implementation, every item in the iterator is checked.
There's an ExactSizeIterator trait, and naturally it could have an optimized implementation. Such a specialized implementation does not exist because Rust doesn't support specialization yet. (See the tracking issue for specialization (RFC 1210).)
So last week I added two new separate traits to replace zip_eq: (code here)
- zip_eq_debug: uses zip_eq when debug_assertions is enabled, otherwise uses zip. It's a good trade-off between safety and performance, and can be a drop-in replacement for zip_eq. (A sketch of the idea follows below.)
- zip_eq_fast: a specialized implementation for ExactSizeIterator. (Actually zip_eq_debug can be good enough, but I still added this after playing with specialization for a while. Just for fun...!)
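Here's a sketch of the idea behind zip_eq_debug (in RisingWave these are extension traits; this free-function version is just my illustration, not the actual implementation): check the lengths only when debug_assertions is on, and fall back to the plain, unchecked zip in release builds.

use itertools::Itertools;

pub fn zip_eq_debug<A, B>(
    a: impl IntoIterator<Item = A>,
    b: impl IntoIterator<Item = B>,
) -> impl Iterator<Item = (A, B)> {
    // In debug builds, panic if the two iterators have different lengths.
    #[cfg(debug_assertions)]
    return a.into_iter().zip_eq(b);
    // In release builds, skip the per-item length check entirely.
    #[cfg(not(debug_assertions))]
    return a.into_iter().zip(b);
}

fn main() {
    let pairs: Vec<_> = zip_eq_debug([1, 2, 3], ["a", "b", "c"]).collect();
    assert_eq!(pairs, vec![(1, "a"), (2, "b"), (3, "c")]);
}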
This week, ExactSizeIterator was implemented for more of our iterator implementations, so we can use zip_eq_fast more often... (by BugenZhao #7939)
Other notable things
System Parameters
Recently we have been doing a large refactoring of system parameters in order to achieve consistency and mutability for the cluster configurations (Tracking Issue #7381). This becomes a serious issue as RisingWave grows mature.
This week, the meta part of ALTER SYSTEM was implemented by Gun9niR #7954.
pg_catalog
Since RisingWave is PostgreSQL-compatible, I had thought that supporting other database tools would be easy. But it turns out to be very hard. The largest obstacle is that database tools usually rely heavily on the system tables in pg_catalog to get the metadata of the database so that they can provide a better UX. But there are so many features in the system tables!
We have been constantly making efforts to support more and more system tables in order to integrate RisingWave with other tools. This week, we:
- Added pg_catalog.pg_conversion by yezizp2012 #7964. This is for DBeaver support.
- Found it's not possible to add the typarray column in pg_catalog.pg_type #7555, which affects sqlalchemy support. But sqlalchemy-risingwave can be used as a substitute for the PG native plugin.
EXPLAIN format
EXPLAIN format is another thing I had thought would be easy. Although the current situation is not so bad, it's still far from perfect.
This week, we tried to use a "global expression id" to make the plan more readable, but it still has some problems:
- feat: try to improve explain result by fuyufjh #7953
- New project explain format makes planner test unstable #8005
- refactor(optimizer): reset expression display id for explain by chenzl25 #8006
By the way, here's another interesting piece of work: rewriting the EXPLAIN implementation using a (modified) Wadler-style algebraic pretty printer! See the RFC by ice1000.
Query optimizer
Although OLAP batch queries are not the main focus of RisingWave, we still want to make it better. This week, we have these optimizer improvements to make batch queries faster:
- feat(optimizer): index accelerating TopN by Eurekaaw #7726
- feat(optimizer): support like expression rewrite by chenzl25 #7982
We are also trying to add constant relations in streaming (#7854). As a first step, we are doing "plan-level constant folding", e.g., merging a Union with Values inputs into a single Values (by jon-chuang #7923).
Last but not least
- fix: fix idle exit in playground mode of compactor by yezizp2012 #8014. During an HTTP server's graceful shutdown (serve_with_shutdown), it waits for all the connections to be closed. We ran into a situation where some connections were not closed and the server waited forever...
- Tracking: Monitoring, logging and tooling improvements #8018. Observability matters!
- Support error line/column number reporting during parsing #7863. This is a lovely feature to improve UX, but it will require hard work :)
By the way, welcome to join the RisingWave Slack community. Also check out the good first issue and help wanted issues if you want to join the development of an open source database system!
So much for this week. See you next week (hopefully)!