Google Summer of Code notes
Personal stash of notes I took while working on improving pattern matching for gccrs.
Compiler Glossary (i.e. terminologies that I encountered for the first time)
- ADT: Algebraic data types. Refers to structs and enums.
- Variants: An enum type has multiple variants defined. Variants refer to the kinds of types an enum can hold. Structs only have 1 variant.
- Scrutinee: The expression that is being matched, e.g. in match x { ... }, x is the scrutinee.
- Discriminant: Data stored by enums to distinguish their different variants. Since structs share the same ADT representation as enums in gccrs, they also have a discriminant value, though it should always be set to 0.
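To make the glossary concrete, here is a small sketch in plain Rust (not gccrs code). The Shape and Point types and the describe function are made up for illustration. It shows an enum with several variants, a struct as a single-variant ADT, a scrutinee, and how std::mem::discriminant only looks at the variant and not at the payload.

```rust
use std::mem::discriminant;

// Hypothetical ADT with three variants; each variant may carry a payload.
#[allow(dead_code)]
enum Shape {
    Circle(f64),
    Rect { w: f64, h: f64 },
    Empty,
}

// A struct is an ADT with exactly one variant.
struct Point {
    x: i32,
    y: i32,
}

fn describe(shape: &Shape) -> &'static str {
    // `shape` is the scrutinee of this match expression.
    match shape {
        Shape::Circle(_) => "circle",
        Shape::Rect { .. } => "rect",
        Shape::Empty => "empty",
    }
}

fn main() {
    let a = Shape::Circle(1.0);
    let b = Shape::Circle(2.0);
    // Same variant, different payload: the discriminants compare equal.
    assert!(discriminant(&a) == discriminant(&b));
    assert!(discriminant(&a) != discriminant(&Shape::Empty));
    println!("{}", describe(&a));

    // Destructuring a struct is irrefutable: there is only one variant.
    let Point { x, y } = Point { x: 1, y: 2 };
    println!("{} {}", x, y);
}
```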
Classes involved in pattern compilation
- Rust::Resolver::* & Rust::Resolver2_0::*: Namespaces for classes that handle name resolution. Admittedly I am not very familiar with this topic, but name resolution is carried out on the AST before lowering to HIR.
- Rust::HIR::ASTLoweringPattern: Translates patterns from AST to HIR.
- Rust::Compile::CompilePatternCheckExpr: Compiles the checking expression for every match arm of a pattern.
- Rust::Compile::CompilePatternBindings: Compiles the bindings of variables (e.g. (x, y, z) => { /* use x, y & z here */ } and Foo::Bar(i) => { /* use i here */ }).
- Rust::Compile::CompilePatternLet: Compiles let statements like let <PATTERN> = my_expression (e.g. for TuplePattern, let (x, y, z) = my_tensor).
Debugging the codebase
Important pre-requisite: Ensure that gccrs is compiled with -O0 -g flags. I used the following ../configure command for my development:
../gccrs/configure CXXFLAGS="-O0 -g" CFLAGS="-O0 -g" --enable-languages=rust --disable-bootstrap --prefix=$HOME/gccrs-install --enable-multilib --enable-checking=gimple,tree,types
Dump everything when compiling with crab1 executable: The -fdump-tree-all -frust-debug -frust-dump-all -frust-dump-ast-pretty flags are all pretty useful for checking:
- Whether AST is parsed properly.
- Whether HIR is lowered properly from AST.
- Whether the resultant GIMPLE IR file is correct.
This is the shell function I put in my shell environment, so that I can quickly call crabtest /PATH/TO/SOURCE/FILE.rs to compile a source file:
crabtest() {
/home/hyacinth/gccrs-build/gcc/crab1 "$1" -fdump-tree-all -frust-debug -frust-dump-all -frust-dump-ast-pretty -Warray-bounds -dumpbase test.rs -mtune=generic -march=x86-64 -O0 -version -fdump-tree-gimple -o test.s -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib64 -frust-incomplete-and-experimental-compiler-do-not-use
}
Print debugging
- To show what a tree holds when compiling a .rs file, the debug_tree(tree) function can be used to print out the value.
- debug_tree(tree) can also be used in gdb during runtime by executing print debug_tree(my_tree_var).
- rust_debug(fmt_string) can also be used for simpler print debugging for other kinds of variables (e.g. printing out a string value from a my_object.as_string().c_str() call).
IdentifierPattern matching
rust-compile-pattern.h: CompilePatternCheckExpr's void visit (HIR::IdentifierPattern &) override is implemented as always returning boolean_true_node, even in the case where a subpattern is specified with @. The function has to be modified to return a check expression if a subpattern is specified.
Dumping hir-pretty shows that the IdentifierPattern's subpattern is not lowered properly. Given the following simple source:
fn main() {
    let x = 2;
    match x {
        a @ 2 => {}
    }
}
We get the following AST dump, followed by the HIR dump of the match expression:
fn main() {
let x = 2;
match x {
a @ 2 => {
}
,
} /* tail expr */
}
MatchExpr [
inner_attrs: empty
mapping: [C: 0 Nid: 20 Hid: 28]
outer_attributes: empty
branch_value:
PathInExpression [
segments: x,
mapping: [C: 0 Nid: 12 Hid: 23]
outer_attributes: empty
has_opening_scope_resolution: 0
] // PathInExpression
match_arms {
MatchCase [
arm {
MatchArm [
match_arm_patterns {
IdentifierPattern [
variable_ident: a
is_ref: 0
mut: 0
to_bind: none
] // IdentifierPattern
} // match_arm_patterns
] // MatchArm
} // arm
expr:
BlockExpr [
mapping: [C: 0 Nid: 18 Hid: 25]
outer_attributes: empty
inner_attrs: empty
tail_reachable: 1
statements: empty
] // BlockExpr
] // MatchCase
} // match_arms
] // MatchExpr
Despite the subpattern showing up on the AST, the subpattern is not lowered to the HIR.
Strategy
Pattern lowering (I'm guessing rust-ast-lower-pattern.cc: void ASTLoweringPattern::visit (AST::IdentifierPattern &pattern)) has to be updated to recognize subpatterns of an IdentifierPattern. [DONE ✅]
Afterwards, reimplement void visit (HIR::IdentifierPattern &) override to compile check expressions if the to_bind subpattern is present (TO ASK: should we rename this to subpattern instead? Update: to_bind was renamed to subpattern). [DONE ✅]
There still exist other areas where subpattern support is missing (e.g. under CompileVarDecl, ClosureParamInfer), and those should be implemented with a lower priority. [DONE ✅]
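For reference, this is the behaviour the check expression has to reproduce, shown as a small plain-Rust sketch (the classify function is made up for illustration, it is not a gccrs test case). An identifier pattern with a subpattern only matches when the subpattern matches, so its check cannot be a constant true.

```rust
// Hypothetical helper showing when each identifier pattern matches.
fn classify(x: i32) -> String {
    match x {
        // Matches only when x == 2, and binds the matched value to `a`.
        a @ 2 => format!("exactly two: {}", a),
        // Matches only when 3 <= x <= 5.
        a @ 3..=5 => format!("three to five: {}", a),
        // A plain identifier pattern (no subpattern) always matches.
        other => format!("something else: {}", other),
    }
}

fn main() {
    for x in [2, 4, 7] {
        println!("{}", classify(x));
    }
}
```

If the check for a @ 2 were always true, classify(7) would wrongly take the first arm.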
TupleStructPattern matching
CompilePatternCheckExpr::visit (HIR::TupleStructPattern &pattern): payload_ref is of type INTEGER_TYPE (verified with GDB), which fails the check in the next line when passed back as parameter into Backend::struct_field_expression. This problem is a result of the code assuming that all TupleStructPattern-s being compiled here are enums, which have a different data representation from normal TupleStructPattern-s.
Strategy
I have to add a new branch in the compilation logic for TupleStructPattern for non-enum tuple structs. The compilation code for this should follow that of TuplePattern's. [DONE ✅]
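A plain-Rust sketch of the two cases, with made-up Pair and Value types. Both arms use TupleStructPattern syntax, but only the enum case needs a variant (discriminant) check before the fields can be inspected; the tuple struct case only needs the per-field checks, much like a TuplePattern.

```rust
// Hypothetical non-enum tuple struct: a single variant, no variant check needed.
struct Pair(i32, i32);

// Hypothetical enum: the variant has to be checked before reading the payload.
enum Value {
    Int(i32),
    Pair(i32, i32),
}

fn sum_pair(p: Pair) -> i32 {
    match p {
        // Only the first field is checked; there is no variant to compare.
        Pair(0, y) => y,
        Pair(x, y) => x + y,
    }
}

fn sum_value(v: Value) -> i32 {
    match v {
        // Each arm first checks which variant `v` holds.
        Value::Int(i) => i,
        Value::Pair(x, y) => x + y,
    }
}

fn main() {
    println!("{}", sum_pair(Pair(0, 5)));
    println!("{}", sum_value(Value::Pair(1, 2)));
    println!("{}", sum_value(Value::Int(9)));
}
```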
TuplePattern w/ RestPattern
Type checking and compilation of check expressions are not implemented.
Strategy
Simply complete the functions for the ItemType::RANGED cases.
- Type checking (TypeCheckPattern::visit(HIR::TuplePattern)) [DONE ✅]
- Check expression compilation (CompilePatternCheckExpr::visit(HIR::TuplePattern)) [DONE ✅]
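The kind of source these cases need to handle looks like the following plain-Rust sketch (the ends function is made up for illustration). The patterns before .. are checked against the leading tuple fields and the patterns after .. against the trailing ones.

```rust
// Hypothetical example of tuple patterns containing a rest pattern.
fn ends(t: (i32, i32, i32, i32)) -> &'static str {
    match t {
        // Lower pattern `0` checks field 0, upper pattern `0` checks field 3.
        (0, .., 0) => "zero at both ends",
        // Only a lower pattern: checks field 0.
        (0, ..) => "starts with zero",
        // Only an upper pattern: checks field 3.
        (.., 0) => "ends with zero",
        _ => "no zero at either end",
    }
}

fn main() {
    println!("{}", ends((0, 1, 2, 0)));
    println!("{}", ends((0, 1, 2, 3)));
    println!("{}", ends((1, 1, 2, 0)));
    println!("{}", ends((1, 1, 2, 3)));
}
```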
SlicePattern
Type checking and compilation of check expression are not implemented.
Strategy
Simply complete the functions.
- Type checking (TypeCheckPattern::visit(HIR::SlicePattern)) [DONE ✅]
- Check expression compilation (CompilePatternCheckExpr::visit(HIR::SlicePattern), currently just returns an always-true node) [DONE ✅]
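As a reminder of what needs to work, here is a plain-Rust sketch of slice patterns without a rest pattern, matched against both an array and a slice scrutinee (function names are made up). For the array the length is known statically, while for the slice every arm implies a runtime length check.

```rust
// Hypothetical example: slice pattern against an array scrutinee.
fn match_array(arr: [i32; 3]) -> i32 {
    match arr {
        // The length (3) is part of the type, so only elements are checked.
        [0, b, c] => b + c,
        [a, _, _] => a,
    }
}

// Hypothetical example: slice pattern against a slice scrutinee.
fn match_slice(s: &[i32]) -> i32 {
    match s {
        // Each arm only matches when the runtime length equals its arity.
        [] => 0,
        [a] => *a,
        [a, b] => a + b,
        _ => -1,
    }
}

fn main() {
    println!("{}", match_array([0, 2, 3]));
    println!("{}", match_array([7, 2, 3]));
    println!("{}", match_slice(&[4, 5]));
    println!("{}", match_slice(&[1, 2, 3]));
}
```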
SlicePattern w/ RestPattern
AST Lowering for RestPattern is not properly implemented - the RestPattern is translated into a nullptr, likely due to the absence of an AST Lowering function that translates it. It is safe to assume that type checking and compilation of check expressions are also unimplemented.
Strategy
- Implement AST Lowering for RestPattern in SlicePattern... [DONE ✅]
- Followed by type checking (should be easy)... [DONE ✅]
- And finally, full compilation support in CompilePatternCheckExpr::visit(HIR::SlicePattern). [DONE ✅]
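A plain-Rust sketch of the target behaviour (function names are made up). With a rest pattern, the length check becomes "at least as many elements as there are lower and upper patterns", and the upper patterns are indexed from the end of the slice.

```rust
// Hypothetical example: rest pattern in the middle of a slice pattern.
fn first_last(s: &[i32]) -> Option<(i32, i32)> {
    match s {
        // Matches any slice with at least 2 elements.
        [first, .., last] => Some((*first, *last)),
        [only] => Some((*only, *only)),
        [] => None,
    }
}

// Hypothetical example: binding the rest of the slice with `name @ ..`.
fn sum(s: &[i32]) -> i32 {
    match s {
        [] => 0,
        // `tail` is bound to the sub-slice covered by the rest pattern.
        [head, tail @ ..] => *head + sum(tail),
    }
}

fn main() {
    println!("{:?}", first_last(&[1, 2, 3, 4]));
    println!("{:?}", first_last(&[9]));
    println!("{}", sum(&[1, 2, 3, 4]));
}
```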
Week 1: Debugging changes made for IdentifierPattern support
Friday (2025-06-06) log:
Made some changes to support lowering of IdentifierPattern subpattern from AST to HIR. After finding out that compilation of subpatterns is faulty and doing a lot of GDB-ing, I'm still not too sure what is causing the following check to fail, which causes the expression of the subpattern to be compiled into error_mark tree node...
// gcc/rust/backend/rust-compile-expr.cc
...
void
CompileExpr::visit (HIR::LiteralExpr &expr)
{
  TyTy::BaseType *tyty = nullptr;
  if (!ctx->get_tyctx ()->lookup_type (expr.get_mappings ().get_hirid (),
                                       &tyty))
    return;
...
Gonna spend the weekend figuring out the type-check system in order to find out what change I should be making to make the type check context do this lookup properly.
Sunday (2025-06-08) log:
Turns out I needed to add respective type-checking in gcc/rust/typecheck/rust-hir-type-check-pattern.cc (Thanks Owen Avery for pointing that out!) - have to keep this in mind when implementing support for other patterns.
Week 2: Debugging bindings for IdentifierPattern
Tuesday (2025-06-10) log:
I have to make the compiler also compile the bindings of the subpattern to the match scrutinee expression, but it seems to be not as simple as adding this under CompilePatternBindings for IdentifierPattern:
if (pattern.has_subpattern ())
  {
    CompilePatternBindings::Compile (pattern.get_subpattern (),
                                     match_scrutinee_expr, ctx);
  }
Did a HIR dump comparison between subpatterned and non-subpatterned patterns. For now I am unable to see why the short snippet does not compile the bindings of subpatterns properly. I have to look at GDB again...
Friday (2025-06-13) log:
I successfully implemented and tested subpattern bindings for IdentifierPattern. It turns out that the codebase did not support name resolution of subpatterns, which I added in my pull request (Thanks Pierre-Emmanuel Patry for the guidance!).
While there are still more scenarios where IdentifierPattern subpatterns are unsupported (like this example of subpattern being used in a let statement), I think it is more important to move on to implementing minimal support for other patterns first. To start investigating TupleStructPattern compilation errors tomorrow.
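To illustrate what "bindings of the subpattern" means, here is a plain-Rust sketch (sum_and_whole is a made-up function, not one of the gccrs test cases). Both the outer identifier and the names inside the subpattern have to be resolved and bound to the scrutinee. The last lines show the same kind of pattern in a let statement.

```rust
// Hypothetical example: bindings inside the subpattern of an identifier pattern.
fn sum_and_whole(t: (i32, i32)) -> (i32, (i32, i32)) {
    match t {
        // `whole` binds the entire scrutinee, while `x` and `y` are bound
        // by the subpattern to its fields.
        whole @ (x, y) => (x + y, whole),
    }
}

fn main() {
    let (sum, whole) = sum_and_whole((1, 2));
    println!("{} {:?}", sum, whole);

    // The same kind of pattern is also valid in a let statement.
    let pair @ (left, right) = (3, 4);
    println!("{:?} {} {}", pair, left, right);
}
```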
Week 3: Fixing compilation for TupleStructPattern
A pretty unproductive week as I was busy spending time with family and friends. With a better understanding of how pattern compilation works in gccrs, I was able to deduce that the failures in compiling TupleStructPattern come from an oversight that assumes all TupleStructPattern-s are enums - this is partly due to enums being translated to TupleStructPattern-s when being lowered from AST to HIR. PR fixing this issue.
Week 4: Making RestPattern work
Thursday (2025-06-26) log:
As mentioned in my initial proposal, type checking support was not implemented for RestPattern and SlicePattern. I went about implementing type checking support for RestPattern for tuples (known as ItemType::RANGED in the codebase), which is derived from the default tuple type checking, split into more steps to type check the lower and upper patterns (i.e. patterns to the left and right of ..) separately. PR implementing this type checking.
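The lower/upper split can be sketched as follows. This is a toy model in plain Rust under my own assumptions, not the gccrs implementation: Ty, Pat, check_pat and check_tuple_with_rest are all hypothetical names, and real type checking involves inference rather than simple comparison.

```rust
// Toy model of types and patterns; all names here are hypothetical.
#[derive(Debug)]
enum Ty {
    Int,
    Bool,
}

#[allow(dead_code)]
#[derive(Debug)]
enum Pat {
    IntLit(i64),
    BoolLit(bool),
    Wildcard,
}

fn check_pat(pat: &Pat, ty: &Ty) -> Result<(), String> {
    match (pat, ty) {
        (Pat::Wildcard, _) | (Pat::IntLit(_), Ty::Int) | (Pat::BoolLit(_), Ty::Bool) => Ok(()),
        _ => Err(format!("pattern {:?} does not match type {:?}", pat, ty)),
    }
}

// Type check `(lower..., .., upper...)` against a tuple with the given fields.
fn check_tuple_with_rest(lower: &[Pat], upper: &[Pat], fields: &[Ty]) -> Result<(), String> {
    // Step 1: the rest pattern may cover zero fields, but not a negative number.
    if lower.len() + upper.len() > fields.len() {
        return Err(format!(
            "expected at most {} patterns, found {}",
            fields.len(),
            lower.len() + upper.len()
        ));
    }
    // Step 2: lower patterns line up with the leading fields.
    for (pat, ty) in lower.iter().zip(fields) {
        check_pat(pat, ty)?;
    }
    // Step 3: upper patterns line up with the trailing fields.
    let upper_start = fields.len() - upper.len();
    for (pat, ty) in upper.iter().zip(&fields[upper_start..]) {
        check_pat(pat, ty)?;
    }
    Ok(())
}

fn main() {
    // Models `(1, .., true)` against `(i32, i32, bool)`.
    let fields = [Ty::Int, Ty::Int, Ty::Bool];
    let result = check_tuple_with_rest(&[Pat::IntLit(1)], &[Pat::BoolLit(true)], &fields);
    println!("{:?}", result);
}
```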
Sunday (2025-06-29) log:
Full compilation support for RestPattern for tuples was implemented (Link to PR).
Subsequently, while waiting for the PR to be reviewed, I started looking into SlicePattern and identified the functions that require implementation, as highlighted above in my notes.
Week 5: Type checking SlicePattern (Part 1)
Tiring week, didn't get to work on gccrs much after my work hours during weekdays. Slept through Saturday too.
SlicePattern type checking is more tricky than it seems.
- rustc reference function (check_pat_slice)
- Issue: HIR implementation for ArrayType in gccrs does not have known capacity during compile-time - capacity is compiled to an expression instead. This was actually a FIXME in ArrayType's code, maybe a future challenge for me.
  - Looking at rustc's code, I don't think there is a need to check capacity against a SliceType parent...?
- To-do: Loop through all elements in the SlicePattern and type check them against the base type of the ArrayType. This probably can also be done on SliceType.
- To-do: Set inferred type to be the ArrayType or SliceType depending on the parent.
- Future challenge: AST Lowering must be updated to accommodate RestPattern's presence in the SlicePattern to split into lower and upper patterns similarly to TuplePattern - a problem for my future me to solve since I don't have experience with the AST Lowering codebase yet~
Week 6: Type checking SlicePattern (Part 2, mid-term evaluation due end of the week)
Was not diligent in updating logs for weeks 6-8, so the logs for the 3 weeks are quick summaries of what I've done within the period. PRs merged for this week:
- Basic SlicePattern type checking - Included a fix to LiteralPattern type checking to resolve an edge case.
- Add size checking to SlicePattern - This was made possible by Philip implementing const folding properly for ArrayType, thanks!
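The two steps (element type checking, then size checking for arrays only) can be modelled with a toy sketch in plain Rust. Everything here (Ty, Pat, Parent, check_slice_pattern) is a hypothetical stand-in and not the gccrs or rustc code; it only covers slice patterns without a rest pattern.

```rust
// Toy model; all names are hypothetical.
#[derive(Debug)]
enum Ty {
    Int,
    Bool,
}

#[allow(dead_code)]
#[derive(Debug)]
enum Pat {
    IntLit(i64),
    BoolLit(bool),
    Wildcard,
}

// The type of the scrutinee: an array has a known length, a slice does not.
enum Parent {
    Array { elem: Ty, len: usize },
    Slice { elem: Ty },
}

fn pat_matches(pat: &Pat, ty: &Ty) -> bool {
    matches!(
        (pat, ty),
        (Pat::Wildcard, _) | (Pat::IntLit(_), Ty::Int) | (Pat::BoolLit(_), Ty::Bool)
    )
}

fn check_slice_pattern(items: &[Pat], parent: &Parent) -> Result<(), String> {
    let elem = match parent {
        Parent::Array { elem, len } => {
            // Size check: only possible because the array length is known.
            if items.len() != *len {
                return Err(format!(
                    "pattern has {} elements but the array has {}",
                    items.len(),
                    len
                ));
            }
            elem
        }
        // For a slice the length is only known at runtime, so no size check.
        Parent::Slice { elem } => elem,
    };
    // Every element pattern is checked against the element type.
    for pat in items {
        if !pat_matches(pat, elem) {
            return Err(format!("{:?} does not match element type {:?}", pat, elem));
        }
    }
    Ok(())
}

fn main() {
    let array = Parent::Array { elem: Ty::Int, len: 2 };
    println!("{:?}", check_slice_pattern(&[Pat::IntLit(1), Pat::Wildcard], &array));
    println!("{:?}", check_slice_pattern(&[Pat::IntLit(1)], &array));
}
```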
Week 7: Codegen for SlicePattern matching (Part 1)
Worked on implementing compilation for SlicePattern matching against an ArrayType scrutinee. Didn't work on the SliceType scrutinee yet as I was still not confident about it; I had to ask Philip about it and take a bit more time to figure it out myself.
Week 8: Codegen for SlicePattern matching (Part 2)
Only worked on the PR to implement compilation for SlicePattern matching against a SliceType scrutinee. The main challenge was creating a new helper function in the backend code to generate code that accesses slice elements. It took me a long while to figure out how SliceType is represented as gcc trees within gccrs, though it's mostly thanks to the very helpful debug function in rust-gcc.cc, and I'm glad to have implemented it properly with the feedback and help from Philip as well!
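Conceptually, matching a slice pattern against a slice boils down to a length check followed by element loads. The plain-Rust sketch below writes that out by hand; match_two and match_two_by_hand are made-up names, and this is a model of the idea rather than the trees gccrs actually emits. The size assertion in main reflects the standard Rust layout of a slice reference as a data pointer plus a length.

```rust
use std::mem::size_of;

// What the user writes.
fn match_two(s: &[i32]) -> Option<i32> {
    match s {
        [a, b] => Some(a + b),
        _ => None,
    }
}

// Roughly what has to happen underneath, written out by hand.
fn match_two_by_hand(s: &[i32]) -> Option<i32> {
    // Check expression: compare the runtime length with the pattern's arity.
    if s.len() == 2 {
        // Bindings: load each element through the slice's data pointer.
        let a = s[0];
        let b = s[1];
        Some(a + b)
    } else {
        None
    }
}

fn main() {
    // A slice reference is a "fat pointer": data pointer plus length.
    assert_eq!(size_of::<&[i32]>(), 2 * size_of::<usize>());

    for s in [&[1, 2][..], &[1, 2, 3][..], &[][..]] {
        assert_eq!(match_two(s), match_two_by_hand(s));
        println!("{:?}", match_two(s));
    }
}
```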
Week 9: RestPattern support for SlicePattern
The plan is to update the parser code and every stage after it to add support for RestPattern in SlicePattern. I foresee that it will also take a while, since I need to orient myself in the parser code, which I have never touched before.
The other thing I noticed is code duplication in TupleStructItems, TuplePatternItems and (soon) SlicePatternItems. I plan to ask Philip whether it would be wise to refactor that code into 1 common PatternItems class that stores a list of multiple patterns, after I'm done with the above. If things go smoothly, my project should reach its conclusion by the end of Week 10/start of Week 11.
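One possible shape for such a shared container, sketched in plain Rust purely to think through the idea. The real classes are C++ and this PatternItems type, its variant names and its methods are all hypothetical. The point is that "with rest" and "without rest" item lists could be stored once and reused by tuple, tuple struct and slice patterns.

```rust
// Hypothetical shared container for the items of a multi-pattern.
#[derive(Debug)]
enum PatternItems<P> {
    // No rest pattern: the number of items must match exactly.
    NoRest(Vec<P>),
    // Rest pattern present: items before and after `..`.
    HasRest { lower: Vec<P>, upper: Vec<P> },
}

impl<P> PatternItems<P> {
    // Minimum number of elements the scrutinee must have.
    fn min_len(&self) -> usize {
        match self {
            PatternItems::NoRest(items) => items.len(),
            PatternItems::HasRest { lower, upper } => lower.len() + upper.len(),
        }
    }

    // Whether a scrutinee with `n` elements can match at all.
    fn accepts_len(&self, n: usize) -> bool {
        match self {
            PatternItems::NoRest(items) => n == items.len(),
            PatternItems::HasRest { .. } => n >= self.min_len(),
        }
    }
}

fn main() {
    // Models `[a, .., z]`, using strings as stand-ins for sub-patterns.
    let with_rest = PatternItems::HasRest { lower: vec!["a"], upper: vec!["z"] };
    // Models `(a, b)`.
    let exact = PatternItems::NoRest(vec!["a", "b"]);
    println!("{:?} accepts 5: {}", with_rest, with_rest.accepts_len(5));
    println!("{:?} accepts 3: {}", exact, exact.accepts_len(3));
}
```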
The remaining weeks?
Pretty much just quietly finished up my work without the need to ponder as much, as I got a lot more familiar with the codebase for pattern matching.