If you are new to Semgrep, I recommend checking out my previous post, where we cover the basics and create a new rule step by step.
In this post we will tackle the challenge of flagging places in our codebase where developers forgot to check the origin of a message received through postMessage. This rule helps mitigate one of the security concerns called out in the MDN documentation: origin validation.
Any window (including, for example, http://evil.example.com) can send a message to any other window, and you have no guarantees that an unknown sender will not send malicious messages.
Here’s an example of this vulnerability being exploited.
I’ve also submitted a Pull Request to the semgrep-rules repository and by the end of this post we will understand every aspect of it. :)
With the context in place, let’s start!
Patterns to flag
Before we start writing our new rule, it helps to document the patterns we want to match. Besides giving us a clear spec to work with, this also makes it easier to reason about which operators we will need.
// Inline function without origin check
window.addEventListener("message", function(evt){
  console.log('No origin check!');
});

// The following line creates a function that will
// be used as a handler by our addEventListener. This
// handler should also have an origin check.
function receiveMessage(evt) {
  console.log('No origin check!');
}
window.addEventListener("message", receiveMessage, false);

// Inline arrow function without origin check
window.addEventListener('message', (evt) => {
  console.log('No origin check!');
});

// The following line creates a function using the
// arrow function pattern which will be used as a
// handler by our addEventListener. This
// handler should also have an origin check.
const arrowHandler = (evt) => {
  console.log('No origin check!');
};
window.addEventListener("message", arrowHandler, false);
Knowing that we need to match both inline and external declarations tells us that we will need to rely on the pattern-either operator.
We might be tempted to write four different patterns for this scenario, but since Semgrep is semantic we only need to write two! This means that a pattern matching an old-style function declaration will also match code using arrow functions.
It’s worth repeating this:
Since Semgrep is semantic, a pattern matching an old-style function declaration will also match code using arrow functions.
Now, which patterns do we consider safe for origin checking?
Safe patterns
If a handler like the ones above contains any of the following patterns, we should not flag it:
if (evt.origin == "http://example.com") { ... }
if (evt.origin === "http://example.com") { ... }
if (evt.origin != "http://example.com") { ... }
if (evt.origin !== "http://example.com") { ... }
if (someRegex.test(evt.origin)) { ... }
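To make the spec concrete, here is a minimal sketch of a handler that the rule should leave alone. The TRUSTED_ORIGIN constant and the return values are assumptions made for illustration; they are not part of the rule or of the original examples.

```javascript
// Hypothetical origin we trust; replace it with your own.
const TRUSTED_ORIGIN = "https://app.example.com";

// The handler compares evt.origin with !== and bails out on a mismatch,
// which is one of the safe patterns listed above.
function receiveTrustedMessage(evt) {
  if (evt.origin !== TRUSTED_ORIGIN) {
    return null; // ignore messages from unknown senders
  }
  return evt.data;
}

// In a browser this would be registered as:
// window.addEventListener("message", receiveTrustedMessage, false);
```

The handler returns a value only so it can be exercised outside the browser with plain objects that have origin and data properties.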
We have our spec, we know what to flag and more importantly, what not to. Time to start working on our rule!
Writing our rule
Let’s take care of the inline functions first since they don’t need external context:
Inline functions
patterns:
  - pattern: |
      window.addEventListener('message', $FUNC, ...)
$FUNC will capture the inline function into a metavariable. Now we can use the handy metavariable-pattern to exclude our safe patterns.
patterns:
  - pattern: |
      window.addEventListener('message', $FUNC, ...)
  - metavariable-pattern:
      metavariable: $FUNC
      patterns:
        - pattern: |
            function($OBJ) { ... }
        - pattern-not: |
            function($OBJ){ ... if ($OBJ.origin == $X) ... }
        - pattern-not: |
            function($OBJ){ ... if ($OBJ.origin === $X) ... }
        - pattern-not: |
            function($OBJ){ ... if ($OBJ.origin != $X) ... }
        - pattern-not: |
            function($OBJ){ ... if ($OBJ.origin !== $X) ... }
        - pattern-not: |
            function($OBJ){ ... if ($REGEX.test($OBJ.origin)) ... }
We want to match function declarations (pattern) that do not contain any of our safe patterns (pattern-not). Note the ellipses surrounding our if statements: they are there to guarantee that we match the check on any line within the function, not just the first line.
A few other things are worth noting here:
- Our metavariable-pattern is matching within the code captured by $FUNC
- We need the function($OBJ) { ... } check to exclude external function declarations (our second scenario)
- The $OBJ metavariable is matching the function parameter and making sure that .origin is being accessed on it
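The two points about the ellipsis and $OBJ can be illustrated with a pair of handlers. This is only a sketch of how I would expect the rule to behave, not verified output from Semgrep: EXPECTED_ORIGIN and page are hypothetical names (page stands in for window.location), and the handlers are pulled out into constants so the snippet runs outside the browser. On a real page each function would be passed inline to window.addEventListener('message', ...).

```javascript
const EXPECTED_ORIGIN = "https://app.example.com"; // hypothetical
const page = { origin: "https://app.example.com" }; // stand-in for window.location

// The origin check is not the first statement. The leading "..." in each
// pattern-not is what should still let the rule recognise it as safe.
const lateCheck = function (evt) {
  const steps = [];
  steps.push("got a message");
  if (evt.origin !== EXPECTED_ORIGIN) {
    return steps;
  }
  steps.push(evt.data);
  return steps;
};

// Here .origin is read from `page`, not from the handler parameter, so it
// does not line up with $OBJ and the handler should still be flagged.
const wrongObject = function (evt) {
  if (page.origin === EXPECTED_ORIGIN) {
    return evt.data; // runs for every sender
  }
  return null;
};
```

Calling wrongObject with a message from any origin returns its data, which is exactly the behaviour we want the rule to catch.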
I like how explicit the rule currently is, but we will see an alternative that simplifies our pattern-not clauses by the end of this post. Bear with me for now if this is bothering you. :)
Time to match the external functions!
External functions
pattern-either:
  - pattern: |
      function $FNAME(...) { $CTX }
      ...
      window.addEventListener('message', $FNAME, ...)
  - pattern: |
      $FNAME = (...) => { $CTX }
      ...
      window.addEventListener('message', $FNAME, ...)
This is the only situation where I couldn’t use the same pattern to match arrow functions and normal function declarations. Let me know if I could simplify this pattern.
Here we are making sure that we can match either of the function declaration types with pattern-either, capturing the function name at declaration with $FNAME. Note that we reuse the same $FNAME metavariable to ensure that the same name is being passed to addEventListener. The ellipses (...) are there to match anything in between those two statements.
We are also capturing the context of the function body itself with $CTX and we will use it to exclude our safe patterns. Let’s get to it:
patterns:
  - pattern-either:
      - pattern: |
          function $FNAME(...) { $CTX }
          ...
          window.addEventListener('message', $FNAME, ...)
      - pattern: |
          $FNAME = (...) => { $CTX }
          ...
          window.addEventListener('message', $FNAME, ...)
  - metavariable-pattern:
      metavariable: $CTX
      patterns:
        - pattern-not: |
            ... if ($OBJ.origin == $X) ...
        - pattern-not: |
            ... if ($OBJ.origin === $X) ...
        - pattern-not: |
            ... if ($OBJ.origin != $X) ...
        - pattern-not: |
            ... if ($OBJ.origin !== $X) ...
        - pattern-not: |
            ... if ($REGEX.test($OBJ.origin)) ...
Nothing new here: we are ensuring that we ignore matches if they contain any of the if patterns declared by our pattern-not operators. These patterns are being checked in the context of $CTX, which is our function body. We are also making sure to match these patterns on any line by surrounding them with ellipses.
With those two cases being taken care of separately we can move to our final rule declaration.
Final pattern
To make this work we need to glue both scenarios together using pattern-either. Let’s check our final rule in the playground:
And with this new rule we can:
- Match inline function declarations in any format
- Match external function declarations in any format
- Skip handlers that contain one of the patterns we consider safe for origin checking
This rule achieves exactly what we set out to do and we could just stop here, but as I hinted earlier we can simplify things. Let’s investigate!
Alternative
Instead of explicitly checking for each equality operator in our pattern-not clauses, we could have used the deep expression operator (written <... PATTERN ...>), which basically lets you say:
“I don’t know what exactly happens in here, but I want to enforce that this variable is used somehow”.
Let’s rewrite our rule above using this operator and see how it simplifies our metavariable-pattern declaration:
Pretty cool, right?!
The downside of this approach is that any reference to $OBJ.origin in an if statement will match, even if it doesn’t perform the sort of validation that makes it safe. In this particular case I wanted to be strict and enforce the presence of certain equality operators, but that changes on a case-by-case basis.
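A small sketch of why this matters, assuming a handler that only checks for a substring of the origin. The names ALLOWED_ORIGIN, looseHandler, strictHandler and spoofed are hypothetical and exist only for this illustration.

```javascript
const ALLOWED_ORIGIN = "https://example.com"; // hypothetical

// evt.origin is referenced inside an if, so a deep expression pattern would
// treat this handler as safe, yet a substring test is easy to bypass.
function looseHandler(evt) {
  if (evt.origin.includes("example.com")) {
    return evt.data;
  }
  return null;
}

// Exact comparison: the shape the explicit pattern-not list enforces.
function strictHandler(evt) {
  if (evt.origin === ALLOWED_ORIGIN) {
    return evt.data;
  }
  return null;
}

// An attacker-controlled origin that merely contains the trusted host name.
const spoofed = { origin: "https://example.com.evil.test", data: "payload" };
```

Here looseHandler(spoofed) hands back the payload while strictHandler(spoofed) returns null, even though both handlers mention evt.origin inside an if statement.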
As a rule of thumb, if you just want to check that something is being called or referenced, use the deep expression operator; otherwise stick with explicit pattern matching.
And with this we have reached the end of this blog post. Thanks for reading and let me know if you have any tips or questions by reaching out to me on Twitter or by email and I will be happy to chat about it!