Google Summer of Code notes
Personal stash of notes I took while working on improving pattern matching for gccrs.
Compiler Glossary (i.e. terms that I encountered for the first time)
- ADT: Algebraic data types. Refers to structs and enums.
- Variants: An enum type has multiple variants defined. Variants are the different forms that a value of an enum can take. Structs only have 1 variant.
- Scrutinee: The expression that is being matched, e.g. in `match x { ... }`, `x` is the scrutinee.
- Discriminant: Data stored by enums to distinguish their different variants. Since structs share the same ADT representation as enums in `gccrs`, they also have a discriminant value, though it should always be set to 0.
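To make the glossary concrete for myself, here is a small Rust sketch. The `Shape` enum and both helper functions are made up for illustration; only `std::mem::discriminant` is a real standard library function.

```rust
use std::mem::discriminant;

// Hypothetical ADT with three variants, used only for illustration.
enum Shape {
    Circle(f64),
    Rect(f64, f64),
    Unit,
}

/// Two values are of the same variant exactly when their discriminants are equal.
fn same_variant(a: &Shape, b: &Shape) -> bool {
    discriminant(a) == discriminant(b)
}

/// `shape` is the scrutinee of the match expression below.
fn describe(shape: &Shape) -> &'static str {
    match shape {
        Shape::Circle(_) => "circle",
        Shape::Rect(_, _) => "rect",
        Shape::Unit => "unit",
    }
}
```

The payload values are ignored by `same_variant`: a `Circle(1.0)` and a `Circle(2.0)` share a discriminant, while a `Circle` and a `Unit` do not.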
Classes involved in pattern compilation
- `Rust::Resolver::*` & `Rust::Resolver2_0::*`: Namespaces for classes that handle name resolution. Admittedly I am not very familiar with this topic, but name resolution is carried out on the AST before lowering to HIR.
- `Rust::HIR::ASTLoweringPattern`: Translates patterns from AST to HIR.
- `Rust::Compile::CompilePatternCheckExpr`: Compiles the check expression for the pattern of every match arm.
- `Rust::Compile::CompilePatternBindings`: Compiles the bindings of variables (e.g. `(x, y, z) => { /* use x, y & z here */ }` and `Foo::Bar(i) => { /* use i here */ }`).
- `Rust::Compile::CompilePatternLet`: Compiles let statements like `let <PATTERN> = my_expression` (e.g. for `TuplePattern`, `let (x, y, z) = my_tensor`).
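A rough way I picture the split between the check expression and the bindings, written as plain Rust. This is only a hand-written approximation of the idea, not what gccrs actually emits; all function names here are made up.

```rust
/// The match as the user writes it.
fn with_match(t: (i32, i32)) -> i32 {
    match t {
        (1, y) => y * 10,
        _ => 0,
    }
}

/// Hand-written approximation of the arm `(1, y) => y * 10`:
/// first a boolean check, then the bindings, then the arm body.
fn hand_lowered(t: (i32, i32)) -> i32 {
    let check = t.0 == 1; // the "check expression" for the pattern
    if check {
        let y = t.1; // the "binding" introduced by the pattern
        y * 10
    } else {
        0
    }
}

/// A `let` statement with a tuple pattern: bindings only, no check needed.
fn let_pattern(my_tensor: (i32, i32, i32)) -> i32 {
    let (x, y, z) = my_tensor;
    x + y + z
}
```

Both `with_match` and `hand_lowered` should agree for every input, which is a handy sanity check when reasoning about what a check expression has to cover.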
`IdentifierPattern` matching
`rust-compile-pattern.h`: `CompilePatternCheckExpr`'s `void visit (HIR::IdentifierPattern &) override` is implemented as always returning `boolean_true_node`, even in the case where a subpattern is specified with `@`. The function has to be modified to return a check expression if a subpattern is specified.
Dumping `hir-pretty` shows that the `IdentifierPattern`'s subpattern is missing from the HIR. Given the following simple source:
fn main() {
    let x = 2;
    match x {
        a @ 2 => {}
    }
}
We get the following AST dump, followed by the HIR dump:
fn main() {
let x = 2;
match x {
a @ 2 => {
}
,
} /* tail expr */
}
MatchExpr [
inner_attrs: empty
mapping: [C: 0 Nid: 20 Hid: 28]
outer_attributes: empty
branch_value:
PathInExpression [
segments: x,
mapping: [C: 0 Nid: 12 Hid: 23]
outer_attributes: empty
has_opening_scope_resolution: 0
] // PathInExpression
match_arms {
MatchCase [
arm {
MatchArm [
match_arm_patterns {
IdentifierPattern [
variable_ident: a
is_ref: 0
mut: 0
to_bind: none
] // IdentifierPattern
} // match_arm_patterns
] // MatchArm
} // arm
expr:
BlockExpr [
mapping: [C: 0 Nid: 18 Hid: 25]
outer_attributes: empty
inner_attrs: empty
tail_reachable: 1
statements: empty
] // BlockExpr
] // MatchCase
} // match_arms
] // MatchExpr
Despite the subpattern showing up on the AST, the subpattern is not lowered to the HIR.
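For reference, this is the behaviour that `@` subpatterns have in Rust, and why an always-true check expression is wrong for them. The `classify` function is a made-up example.

```rust
/// `a @ <subpattern>` binds the whole value to `a`,
/// but only if the value also matches the subpattern.
fn classify(x: i32) -> String {
    match x {
        a @ 2 => format!("exactly two: {}", a),
        a @ 3..=9 => format!("small: {}", a),
        a => format!("other: {}", a),
    }
}
```

If the check for the first arm were always true, every input would land in the `a @ 2` arm.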
Strategy
- Pattern lowering (I'm guessing `rust-ast-lower-pattern.cc`: `void ASTLoweringPattern::visit (AST::IdentifierPattern &pattern)`) has to be updated to recognize subpatterns of an `IdentifierPattern`. [DONE ✅]
- Afterwards, reimplement `void visit (HIR::IdentifierPattern &) override` to compile check expressions if the `to_bind` subpattern is present (I had planned to ask whether this should be renamed to `subpattern` instead; `to_bind` has since been renamed to `subpattern`). [DONE ✅]
- There are still other areas where subpattern support is missing (e.g. under `CompileVarDecl`, `ClosureParamInfer`); those should be implemented with a lower priority. [TO-DO ❌]
`TupleStructPattern` matching
`CompilePatternCheckExpr::visit (HIR::TupleStructPattern &pattern)`: `payload_ref` is of type `INTEGER_TYPE` (verified with GDB), which fails the check in the next line when passed back as a parameter into `Backend::struct_field_expression`. This problem is a result of the code assuming that all `TupleStructPattern`-s being compiled here are enums, which have a different data representation from normal (non-enum) `TupleStructPattern`-s.
Strategy
I have to add a new branch in the compilation logic for `TupleStructPattern` for non-enum tuple structs. The compilation code for this should follow that of `TuplePattern`'s. [DONE ✅]
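The two cases side by side in Rust source, both of which use the same tuple-struct pattern syntax. `Point`, `Shape` and the two functions are made-up examples.

```rust
// A plain tuple struct: a single variant, no discriminant to test.
struct Point(i32, i32);

// An enum with a tuple variant: the discriminant must be checked first.
enum Shape {
    Rect(i32, i32),
    Empty,
}

fn point_sum(p: &Point) -> i32 {
    match p {
        Point(0, y) => *y,
        Point(x, y) => x + y,
    }
}

fn shape_area(s: &Shape) -> i32 {
    match s {
        Shape::Rect(w, h) => w * h,
        Shape::Empty => 0,
    }
}
```

Matching `Point(..)` only has to look at the fields, which is why its compilation can follow the tuple case, while `Shape::Rect(..)` additionally has to identify the variant.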
`TuplePattern` w/ `RestPattern`
Type checking and compilation of check expressions are not implemented.
Strategy
Simply complete the functions for the `ItemType::RANGED` cases.
- Type checking (`TypeCheckPattern::visit(HIR::TuplePattern)`) [DONE ✅ | Backlog TO-DO ❌: Update the type checker to continue type checking even after the size check fails]
- Check expression compilation (`CompilePatternCheckExpr::visit(HIR::TuplePattern)`) [UNDER REVIEW 🚧]
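What the feature looks like from the user's side, as a quick reminder of what has to be supported. The function names are made up.

```rust
/// `..` skips any number of elements in the middle of the tuple.
fn ends(t: (i32, i32, i32, i32)) -> (i32, i32) {
    match t {
        (first, .., last) => (first, last),
    }
}

/// `..` can also appear at the end, leaving only "lower" patterns to check.
fn starts_with_one(t: (i32, bool, char)) -> bool {
    matches!(t, (1, ..))
}
```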
`SlicePattern`
Type checking and compilation of check expressions are not implemented.
Strategy
Simply complete the functions.
- Type checking (`TypeCheckPattern::visit(HIR::SlicePattern)`) [ON-GOING 🚧]
- Check expression compilation (`CompilePatternCheckExpr::visit(HIR::SlicePattern)`, currently just returns an always-true node) [TO-DO ❌]
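Slice patterns can be matched against both arrays and slices, which is why both scrutinee types need handling. A small made-up example of each:

```rust
/// Against an array the length is part of the type, so the pattern is irrefutable.
fn sum3(arr: [i32; 3]) -> i32 {
    let [a, b, c] = arr;
    a + b + c
}

/// Against a slice the length is only known at run time,
/// so the check has to compare the length as well as the elements.
fn pair_kind(s: &[i32]) -> &'static str {
    match s {
        [] => "empty",
        [0, 0] => "two zeros",
        [_, _] => "two elements",
        _ => "some other length",
    }
}
```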
`SlicePattern` w/ `RestPattern`
AST Lowering for `RestPattern` is not properly implemented: the `RestPattern` is translated into a `nullptr`, likely due to the absence of an AST Lowering function that translates it. It is safe to assume that type checking and compilation of check expressions are also unimplemented.
Strategy
- Implement AST Lowering for `RestPattern` in `SlicePattern`... [TO-DO ❌]
- Followed by type checking (should be easy)... [TO-DO ❌]
- And finally, full compilation support in `CompilePatternCheckExpr::visit(HIR::SlicePattern)`. [TO-DO ❌]
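For reference, the source-level behaviour that this work is aiming for. Both functions are made-up examples.

```rust
/// `..` in a slice pattern matches any number of elements (including zero).
fn first_and_last(s: &[i32]) -> Option<(i32, i32)> {
    match s {
        [first, .., last] => Some((*first, *last)), // needs at least 2 elements
        [only] => Some((*only, *only)),
        [] => None,
    }
}

/// The skipped elements can also be bound as a sub-slice with `name @ ..`.
fn middle(s: &[i32]) -> &[i32] {
    match s {
        [_, rest @ .., _] => rest,
        _ => &[],
    }
}
```

Note that `[first, .., last]` does not match a one-element slice, so the length check becomes "at least 2" rather than "exactly N".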
Week 1: Debugging changes made for `IdentifierPattern` support
Friday (2025-06-06) log:
Made some changes to support lowering of the `IdentifierPattern` subpattern from AST to HIR. After finding out that compilation of subpatterns is faulty and doing a lot of GDB-ing, I'm still not too sure what is causing the following check to fail, which causes the expression of the subpattern to be compiled into an `error_mark` tree node...
// gcc/rust/backend/rust-compile-expr.cc
...
void
CompileExpr::visit (HIR::LiteralExpr &expr)
{
  TyTy::BaseType *tyty = nullptr;
  if (!ctx->get_tyctx ()->lookup_type (expr.get_mappings ().get_hirid (),
                                       &tyty))
    return;
...
Gonna spend the weekend figuring out the type-check system in order to find out what change I should be making to make the type check context do this lookup properly.
Sunday (2025-06-08) log:
Turns out I needed to add the corresponding type checking in `gcc/rust/typecheck/rust-hir-type-check-pattern.cc` (thanks Owen Avery for pointing that out!). I have to keep this in mind when implementing support for other patterns.
Week 2: Debugging bindings for IdentifierPattern
Tuesday (2025-06-10) log:
I have to make the compiler also compile the bindings of the subpattern to the match scrutinee expression, but it seems to be not as simple as adding this under `CompilePatternBindings` for `IdentifierPattern`:
if (pattern.has_subpattern ())
  {
    CompilePatternBindings::Compile (pattern.get_subpattern (),
                                     match_scrutinee_expr, ctx);
  }
Did a HIR dump comparison between subpatterned and non-subpatterned patterns. For now I am unable to see why the short snippet does not compile the bindings of subpatterns properly. I have to look at GDB again...
Friday (2025-06-13) log:
I have successfully implemented and tested subpattern bindings for `IdentifierPattern`. It turns out that the codebase did not support name resolution of subpatterns, which I added in my pull request (thanks Pierre-Emmanuel Patry for the guidance!).
While there are still more scenarios where `IdentifierPattern` subpatterns are unsupported (like this example of a subpattern being used in a `let` statement), I think it is more important to move on to implementing minimal support for other patterns first. I will start investigating `TupleStructPattern` compilation errors tomorrow.
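A small Rust example of the two situations mentioned in this log: bindings that come from inside a subpattern, and a subpattern used in a `let` statement. The functions are made up, and the `let` case is my own illustration rather than the linked example.

```rust
/// Both the outer name (`whole`) and the names inside the
/// subpattern (`a`, `b`) must be bound for the arm body.
fn describe(pair: (i32, i32)) -> String {
    match pair {
        whole @ (a, b) => format!("{:?} = {} + {}", whole, a, b),
    }
}

/// The same kind of pattern used in a `let` statement.
fn let_with_subpattern(pair: (i32, i32)) -> i32 {
    let whole @ (a, _) = pair;
    whole.1 + a
}
```

Since `a` and `b` are only introduced inside the subpattern, name resolution has to walk into it as well, otherwise the arm body cannot refer to them.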
Week 3: Fixing compilation for TupleStructPattern
A pretty unproductive week as I was busy spending time with family and friends. With a better understanding of how pattern compilation works in gccrs, I was able to deduce that the failures in compiling `TupleStructPattern` come from an oversight that assumes all `TupleStructPattern`-s are enums. This is partly due to enums being translated to `TupleStructPattern`-s when being lowered from AST to HIR. PR fixing this issue.
Week 4: Making `RestPattern` work
Thursday (2025-06-26) log:
As mentioned in my initial proposal, type checking support was not implemented for `RestPattern` and `SlicePattern`. I went about implementing type checking support for `RestPattern` in tuples (known as `ItemType::RANGED` in the codebase), which is derived from the default tuple type checking, split into more steps to type check the lower and upper patterns (i.e. patterns to the left and right of `..`) separately. PR implementing this type checking.
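The lower/upper split can be sketched as a toy checker in Rust. This is not the gccrs implementation: `Ty`, `Pat`, `check_pat` and `check_ranged_tuple` are all hypothetical names, and the error messages are invented.

```rust
// Toy types and patterns, hypothetical and only for this sketch.
#[derive(Debug)]
enum Ty {
    Int,
    Bool,
}

#[derive(Debug)]
enum Pat {
    Wildcard,
    IntLit(i64),
    BoolLit(bool),
}

/// Checks a single pattern against a single element type.
fn check_pat(pat: &Pat, ty: &Ty) -> bool {
    match (pat, ty) {
        (Pat::Wildcard, _) => true,
        (Pat::IntLit(_), Ty::Int) => true,
        (Pat::BoolLit(_), Ty::Bool) => true,
        _ => false,
    }
}

/// Type checks `(lower.., .., upper..)` against the element types of a tuple.
fn check_ranged_tuple(lower: &[Pat], upper: &[Pat], elems: &[Ty]) -> Result<(), String> {
    // Size check: the patterns around `..` must not need more elements than exist.
    let min_len = lower.len() + upper.len();
    if min_len > elems.len() {
        return Err(format!(
            "expected a tuple with at least {} elements, found {}",
            min_len,
            elems.len()
        ));
    }
    // Lower patterns are checked against the leading elements.
    for (i, (pat, ty)) in lower.iter().zip(elems).enumerate() {
        if !check_pat(pat, ty) {
            return Err(format!("mismatched types at element {}", i));
        }
    }
    // Upper patterns are checked against the trailing elements.
    let offset = elems.len() - upper.len();
    for (i, (pat, ty)) in upper.iter().zip(&elems[offset..]).enumerate() {
        if !check_pat(pat, ty) {
            return Err(format!("mismatched types at element {}", offset + i));
        }
    }
    Ok(())
}
```

This sketch stops at the first error, including a failed size check, which mirrors the backlog item about continuing to type check after the size check fails.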
Sunday (2025-06-29) log:
Full compilation support for `RestPattern` in tuples was implemented (Link to PR).
Subsequently, while waiting for the PR to be reviewed, I started looking into `SlicePattern` and identified the functions that require implementation, as highlighted above in my notes.
Week 5: Type checking `SlicePattern` (Part 1)
Tiring week, didn't get to work on gccrs much after my work hours during weekdays. Slept through Saturday too.
`SlicePattern` type checking is more tricky than it seems.
- `rustc` reference function (`check_pat_slice`)
- Issue: HIR implementation for `ArrayType` in gccrs does not have known capacity during compile-time; capacity is compiled to an expression instead. This was actually a `FIXME` in `ArrayType`'s code, maybe a future challenge for me.
  - Looking at `rustc`'s code, I don't think there is a need to check capacity against a `SliceType` parent...?
- To-do: Loop through all elements in the `SlicePattern` and type check them against the base type of the `ArrayType`. This probably can also be done on `SliceType`.
- To-do: Set inferred type to be the `ArrayType` or `SliceType` depending on the parent.
- Future challenge: AST Lowering must be updated to accommodate `RestPattern`'s presence in the `SlicePattern`, splitting it into lower and upper patterns similarly to `TuplePattern`. A problem for future me to solve since I don't have experience with the AST Lowering codebase yet~
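The two to-do items above can be sketched as a toy checker in Rust. Everything here is hypothetical (`Ty`, `Pat`, `check_elem`, `check_slice_pattern`); in particular, modelling the array capacity as `Option<usize>` is my own way of expressing "capacity may not be known at compile time" and is not how gccrs stores it.

```rust
// Toy types, hypothetical and only for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Bool,
    Array(Box<Ty>, Option<usize>), // capacity is `None` when it is not known
    Slice(Box<Ty>),
}

#[derive(Debug)]
enum Pat {
    Wildcard,
    IntLit(i64),
    BoolLit(bool),
}

/// Checks one element pattern against the element type.
fn check_elem(pat: &Pat, ty: &Ty) -> bool {
    match (pat, ty) {
        (Pat::Wildcard, _) => true,
        (Pat::IntLit(_), Ty::Int) => true,
        (Pat::BoolLit(_), Ty::Bool) => true,
        _ => false,
    }
}

/// Type checks a slice pattern without `..` and returns the inferred type.
fn check_slice_pattern(items: &[Pat], parent: &Ty) -> Result<Ty, String> {
    let elem_ty = match parent {
        Ty::Array(elem, capacity) => {
            // The size can only be checked when the capacity is known.
            if let Some(n) = capacity {
                if *n != items.len() {
                    return Err(format!("expected {} elements, found {}", n, items.len()));
                }
            }
            elem
        }
        // No capacity to compare against for a slice parent.
        Ty::Slice(elem) => elem,
        other => return Err(format!("expected an array or slice, found {:?}", other)),
    };
    // Every element pattern is checked against the base type of the parent.
    for (i, item) in items.iter().enumerate() {
        if !check_elem(item, elem_ty) {
            return Err(format!("mismatched types at element {}", i));
        }
    }
    // The inferred type is the parent type itself (array or slice).
    Ok(parent.clone())
}
```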
Week 6: Type checking `SlicePattern` (Part 2, mid-term evaluation due end of the week)
Was not diligent in updating logs for weeks 6-8, so the logs for the 3 weeks are quick summaries of what I've done within the period. PRs merged for this week:
- Basic SlicePattern type checking - Included a fix to `LiteralPattern` type checking to resolve an edge case.
- Add size checking to SlicePattern - This was made possible by Philip implementing const folding properly for `ArrayType`, thanks!
Week 7: Codegen for `SlicePattern` matching (Part 1)
Worked on implementing compilation for SlicePattern matching against an ArrayType scrutinee. Didn't work on the SliceType scrutinee as I'm still not confident about it; I had to ask Philip and take a bit more time to figure it out myself.
Week 8: Codegen for `SlicePattern` matching (Part 2)
Only worked on the PR to implement compilation for SlicePattern matching against a SliceType scrutinee. The main challenge was creating a new helper function in the backend code to generate code that accesses slice elements. It took me a long while to figure out how `SliceType` is represented as gcc trees within `gccrs`, which I managed mostly thanks to the very helpful `debug` function in `rust-gcc.cc`, and I'm glad to have implemented it properly with the feedback and help from Philip as well!
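As a mental model of what "accessing slice elements" involves, here is a Rust sketch that reads an element using only a data pointer and a length. I am assuming the usual (data pointer, length) pair here; these notes do not spell out the exact gcc tree layout, and `element_at` and `slice_ref_is_two_words` are made-up names.

```rust
use std::mem::size_of;

/// Reads element `index` using only the raw parts of the slice:
/// a pointer to the first element and the number of elements.
fn element_at(s: &[i32], index: usize) -> Option<i32> {
    let data: *const i32 = s.as_ptr();
    let len: usize = s.len();
    if index < len {
        // SAFETY: `index < len`, so the offset stays inside the slice.
        Some(unsafe { *data.add(index) })
    } else {
        None
    }
}

/// A reference to a slice occupies two machine words (pointer + length).
fn slice_ref_is_two_words() -> bool {
    size_of::<&[i32]>() == 2 * size_of::<usize>()
}
```

The length comparison is the part that a slice pattern check needs before any element can be inspected.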
Week 9: `RestPattern` support for `SlicePattern`
The plan is to update the parser code and every stage after it to add support for `RestPattern` in `SlicePattern`. I foresee that it will also take a while, since I need to orient myself in the parser code, which I have never touched before.
The other thing I noticed is code duplication in `TupleStructItems`, `TuplePatternItems` and (soon) `SlicePatternItems`. Once I'm done with the above, I plan to ask Philip whether it would be wise to refactor that code into 1 common `PatternItems` class that stores a list of multiple patterns. If things go smoothly, my project should reach its conclusion by the end of Week 10/start of Week 11.
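To think through the refactoring idea, here is one possible shape for a shared items type, sketched in Rust rather than C++. It is purely hypothetical: the variant names and methods are my own and nothing like this has been discussed or agreed on yet.

```rust
/// Hypothetical shared container for the sub-patterns of tuple,
/// tuple-struct and slice patterns.
enum PatternItems<P> {
    /// No `..` present: the number of patterns must match exactly.
    NoRest(Vec<P>),
    /// `..` present: patterns before it (lower) and after it (upper).
    HasRest { lower: Vec<P>, upper: Vec<P> },
}

impl<P> PatternItems<P> {
    /// Smallest number of elements the scrutinee must have.
    fn min_len(&self) -> usize {
        match self {
            PatternItems::NoRest(items) => items.len(),
            PatternItems::HasRest { lower, upper } => lower.len() + upper.len(),
        }
    }

    /// Whether a scrutinee with `n` elements can match at all.
    fn accepts_len(&self, n: usize) -> bool {
        match self {
            PatternItems::NoRest(items) => n == items.len(),
            PatternItems::HasRest { .. } => n >= self.min_len(),
        }
    }
}
```

With a shape like this, the size checks for all three pattern kinds could share one code path, which is the duplication I would like to remove.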