Step 1 - Data flow and taint tracking analysis

1.1 Source

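The taint source should be the bean property that reaches constraint validation. In code, this is the first parameter of ConstraintValidator.isValid.

Expected: 6 results.

Requirements:

Make sure only implementations of the method defined by the ConstraintValidator.isValid interface are captured
Only keep results that come from source code
Drop bean properties that cannot be controlled, such as program configuration files, and keep only user-controlled sources

First, define the interface and its method implementations so that later constraints are easier to write: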

import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.FlowSources
 
class TypeConstraintValidator extends Interface {
    TypeConstraintValidator() {
        this.hasQualifiedName("javax.validation", "ConstraintValidator")
    }
 
    Method getIsValidMethod() {
        result.getDeclaringType() = this and
        result.hasName("isValid")
    }
}
 
class ConstraintValidatorIsValidMethod extends Method {
    ConstraintValidatorIsValidMethod() {
        this.overridesOrInstantiates*(any(TypeConstraintValidator t).getIsValidMethod())
    }
}

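Note that the type is pinned to the interface itself, and Method is then used to restrict the results to methods that implement the interface's isValid. This satisfies the first requirement.

Now for the second requirement: only results related to source code. This can be checked directly with fromSource(). Ignoring the last requirement for now, we get: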

class BeanValidationSource extends DataFlow::Node {
    BeanValidationSource() {
        exists(ConstraintValidatorIsValidMethod m | 
            this.asParameter() = m.getParameter(0) and 
            m.fromSource()
        )
    }
}

A quick run does find the 6 parameters.

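Bonus condition

Next, consider how to constrain the source so that it is user input rather than data from some other origin.

For user input to reach this source, it should originate from a RemoteFlowSource. But RemoteFlowSource clearly cannot be used directly as the constraint here: to have user input validated, one only needs to add an annotation such as @XXX, and CodeQL cannot find a flow through that. So we can impose the following constraints instead:

Flow goes from user input to any field
The field is annotated, or the field's declaring class is annotated
The annotation specifies a validator through validatedBy
The validator's first parameter is the sink

First we need to know how a Validator is implemented: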

public class SchedulingConstraintSetValidator implements ConstraintValidator<SchedulingConstraintSetValidator.SchedulingConstraintSet, Container> {
 
    @Target({TYPE})
    @Retention(RetentionPolicy.RUNTIME)
    @Documented
    @Constraint(validatedBy = {SchedulingConstraintSetValidator.class})
    public @interface SchedulingConstraintSet {
 
        String message() default "{SoftAndHardConstraint.message}";
 
        Class<?>[] groups() default {};
 
        Class<? extends Payload>[] payload() default {};
    }
 
    @Override
    public void initialize(SchedulingConstraintSet constraintAnnotation) {
    }
 
    @Override
    public boolean isValid(Container container, ConstraintValidatorContext context) {
        if (container == null) {
            return true;
        }
        Set<String> common = new HashSet<>(container.getSoftConstraints().keySet());
        common.retainAll(container.getHardConstraints().keySet());
 
        if (common.isEmpty()) {
            return true;
        }
 
        context.buildConstraintViolationWithTemplate(
                "Soft and hard constraints not unique. Shared constraints: " + common
        ).addConstraintViolation().disableDefaultConstraintViolation();
        return false;
    }
}

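A Validator needs an annotation and a corresponding implementation class. The annotation can be written separately or, as above, nested inside the implementation class. The code above makes one thing clear: any field or class annotated with @SchedulingConstraintSetValidator.SchedulingConstraintSet is validated by SchedulingConstraintSetValidator.

In other words, we need to know which field or class is marked with this annotation, and that user input can flow into that field or class.

The validator type must first satisfy:

The annotation is itself annotated with @Constraint
The validatedBy of @Constraint names the validator class, and that class must be a ConstraintValidator

So we can write: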

module bonus {
    class TypeConstraint extends Interface {
        TypeConstraint() {
            this.hasQualifiedName("javax.validation", "Constraint")
        }
    }
 
    class ConstraintAnnotation extends Annotation {
        ConstraintAnnotation() {
            this.getType() instanceof TypeConstraint
        }
 
        predicate isValidatedBy(RefType validator) {
            this.getValue("validatedBy").(ArrayInit).getAnInit().(TypeLiteral).getTypeName().getType() = validator
        }
    }
}

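The only part that needs explanation is the predicate that matches the validator, which follows the shape of the annotation's AST.

For the @Constraint here, the validatedBy value is first cast to ArrayInit, then any of its initializers is taken and cast to TypeLiteral. getTypeName actually returns an Expr, so getType is called on it to obtain the final Type, which is then compared with validator.

Next, consider whether an annotated element is validated by a validator. Note the chain ele.getAnAnnotation().getType().getAnAnnotation(): the annotated element carries the validator's annotation, and that annotation is in turn annotated with @Constraint, which is why it is written this way: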

    predicate validatedConstraint(Annotatable ele, ConstraintAnnotation c, RefType validatorType) {
        ele.getAnAnnotation().getType().getAnAnnotation() = c and
        c.isValidatedBy(validatorType)
    }

Now consider how to track the flow:

    module UserInputToValidatedFieldConfig implements DataFlow::ConfigSig {
        predicate isSource(DataFlow::Node source) {
            source instanceof RemoteFlowSource
        }
 
        predicate isSink(DataFlow::Node sink) {
            sink.asExpr() = any(Field f).getAnAssignedValue()
        }
    }
    module UserInputToValidatedFieldFlow = TaintTracking::Global<UserInputToValidatedFieldConfig>;

This gives us the flow from user input to a field assignment. Next, combine the pieces:

    predicate validatesUserControlledBeanProperty(ConstraintValidatorIsValidMethod method, Field f, RefType validatorType, RemoteFlowSource source) {
        method.getDeclaringType() = validatorType and
        validatedConstraint(f, _, validatorType) and 
        UserInputToValidatedFieldFlow::flow(source, DataFlow::exprNode(f.getAnAssignedValue()))
    }

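There is still one problem: if the annotation is applied to a class, that class's fields may also be validated, so the predicate needs to be changed to: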

    predicate validatesUserControlledBeanProperty(ConstraintValidatorIsValidMethod method, Field f, RefType validatorType, RemoteFlowSource source) {
        method.getDeclaringType() = validatorType and
        (validatedConstraint(f, _, validatorType) or validatedConstraint(f.getDeclaringType(), _, validatorType)) and 
        UserInputToValidatedFieldFlow::flow(source, DataFlow::exprNode(f.getAnAssignedValue()))
    }
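
To make the lookup that validatedConstraint performs more concrete, here is a rough Java reflection sketch of the same walk: element, its annotation, the annotation's type, the meta-annotation, then validatedBy, checked on both the field and its declaring class. Everything in it is hypothetical and for illustration only: DemoConstraint is a stand-in for javax.validation.Constraint, and CheckedBean, CheckedField, Bean and the two validator classes do not come from the project being analyzed.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.AnnotatedElement;
import java.util.LinkedHashSet;
import java.util.Set;

public class MetaAnnotationLookup {
    // Hypothetical stand-in for javax.validation.Constraint.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.ANNOTATION_TYPE)
    @interface DemoConstraint {
        Class<?>[] validatedBy();
    }

    // Hypothetical validator classes; only their identity matters here.
    static class ClassLevelValidator {}
    static class FieldLevelValidator {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @DemoConstraint(validatedBy = {ClassLevelValidator.class})
    @interface CheckedBean {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @DemoConstraint(validatedBy = {FieldLevelValidator.class})
    @interface CheckedField {}

    @CheckedBean
    static class Bean {
        @CheckedField String name;
        String other;
    }

    // element -> annotation -> annotation type -> meta-annotation -> validatedBy
    static Set<Class<?>> validatorsOn(AnnotatedElement element) {
        Set<Class<?>> result = new LinkedHashSet<>();
        for (Annotation a : element.getAnnotations()) {
            DemoConstraint c = a.annotationType().getAnnotation(DemoConstraint.class);
            if (c != null) {
                for (Class<?> v : c.validatedBy()) {
                    result.add(v);
                }
            }
        }
        return result;
    }

    // A field counts as validated if the field or its declaring class is constrained.
    static Set<Class<?>> validatorsForField(Class<?> owner, String fieldName) {
        try {
            Set<Class<?>> result = new LinkedHashSet<>(validatorsOn(owner.getDeclaredField(fieldName)));
            result.addAll(validatorsOn(owner));
            return result;
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no such field: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validatorsForField(Bean.class, "name"));
        System.out.println(validatorsForField(Bean.class, "other"));
    }
}
```

In this sketch the field other has no annotation of its own but is still associated with the class-level validator, which is the case the `or validatedConstraint(f.getDeclaringType(), _, validatorType)` branch is meant to cover.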

That is essentially it; add the condition to the source defined earlier:

class BeanValidationSource extends DataFlow::Node {
    BeanValidationSource() {
        exists(ConstraintValidatorIsValidMethod m | 
            this.asParameter() = m.getParameter(0) and 
            m.fromSource() and
            bonus::validatesUserControlledBeanProperty(m, _, _, _)
        )
    }
}

With the bonus condition added, 4 inputs remain.

1.2 Sink

The sink is the first argument of a call to ConstraintValidatorContext.buildConstraintViolationWithTemplate.

Expected: 5 results.

This follows the same pattern as above and is straightforward:

class TypeConstraintValidatorContext extends Interface {
    TypeConstraintValidatorContext() {
        this.hasQualifiedName("javax.validation", "ConstraintValidatorContext")
    }
}
 
class BuildConstraintViolationWithTemplateMethod extends Method {
    BuildConstraintViolationWithTemplateMethod() {
        this.getDeclaringType().getASupertype*() instanceof TypeConstraintValidatorContext and
        this.hasName("buildConstraintViolationWithTemplate")
    }
}
 
class Sink extends DataFlow::Node {
    Sink() {
        exists(BuildConstraintViolationWithTemplateMethod m, MethodCall c | 
            c.getMethod() = m and
            this.asExpr() = c.getArgument(0)
        )
    }
}

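1.3 Taint tracking configuration

Before starting, the official guidance recommends checking that both the Source and the Sink match the issue in SchedulingConstraintSetValidator.java. After adding the bonus condition, however, the Source no longer includes it. The reason is that the Source itself relies on a flow to decide what counts as user input; when that flow breaks somewhere, the parameter drops out of the Source, and we would have to add our own additional steps to reconnect it. So here we first drop the bonus condition and then write the configuration.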

module ValidatorVul implements DataFlow::ConfigSig {
    predicate isSource(DataFlow::Node node) {
        node instanceof BeanValidationSource
    }
 
    predicate isSink(DataFlow::Node node) {
        node instanceof Sink
    }
}
 
module Flow = TaintTracking::Global<ValidatorVul>;
// PathGraph supplies the edges relation that a path-problem query needs
import Flow::PathGraph
 
from Flow::PathNode source, Flow::PathNode sink
where Flow::flowPath(source, sink)
select sink.getNode(), source, sink, "Custom constraint error message contains unsanitized user data"

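Of course, the result here is empty: the taint-tracking flow is broken somewhere along the way.

This is not surprising; the official answer at this step also gives 0 results.

1.4 Fixing the flow

As learned earlier, use partial flow to debug: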

int explorationLimit() { result = 20 }
module PartialFlow = Flow::FlowExplorationFwd<explorationLimit/0>;
import PartialFlow::PartialPathGraph
 
from PartialFlow::PartialPathNode source, PartialFlow::PartialPathNode node
where PartialFlow::partialFlow(source, node, _)
select node.getNode(), source, node, "From " + source.toString() + " to " + node.toString()

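Remember to add the @kind path-problem metadata to the query; otherwise the paths cannot be viewed in the alerts.

The output here is very large and has to be analyzed by hand.

1.5 & 1.6 Identifying the missing taint propagation steps

Starting from the partial-flow analysis above, the hint in the challenge lets us be a bit more precise. The challenge states that CodeQL does not propagate taint through getters, for example container.getSoftConstraints() and container.getHardConstraints(), whereas from the code we already know that taint can propagate through them. So we focus on these calls.

Refine the query to get more precise results: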

from PartialFlow::PartialPathNode source, PartialFlow::PartialPathNode node, Parameter p, int dist
where PartialFlow::partialFlow(source, node, dist) and
    p = source.getNode().asParameter() and
    p.getName() = "container"
select node.getNode(), source, node, "From " + source.toString() + " to " + node.toString() + " dist = " + dist

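Note that this selects every parameter named container as the source; the results are now limited to Container.java and SchedulingConstraintSetValidator.

The challenge says the flow breaks at the getter, but this is where I got confused: in the partial-flow result, taint already passes from container to the return value of getSoftConstraints.

So what is going on? As a formality, add an AdditionalStep for it anyway: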

class GetterTaintStep extends TaintTracking::AdditionalTaintStep {
    override predicate step(DataFlow::Node n1, DataFlow::Node n2) {
        exists(MethodCall mc | 
            (
                mc.getMethod() instanceof GetterMethod or mc.getMethod().getName().matches("get%")
            ) and
            n1.asExpr() = mc.getQualifier() and
            n2.asExpr() = mc
        )
    }
}

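Running it again gives exactly the same results, which is consistent with my reading of the partial flow above: taint was already passing from container to the return value of getSoftConstraints. My tentative conclusion is that newer CodeQL versions recognize this kind of getter automatically.

Setting that aside, the propagation path further down goes from the elements of the Set to the elements of the HashSet. This reveals a break: taint is not propagated to the return value of keySet itself, only to the elements inside it.

Add a step for it and run again: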

class KeysetTaintStep extends TaintTracking::AdditionalTaintStep {
    override predicate step(DataFlow::Node n1, DataFlow::Node n2) {
        exists(MethodCall mc | 
            mc.getMethod().(MapMethod).getName() = "keySet" and
            n1.asExpr() = mc.getQualifier() and
            n2.asExpr() = mc
        )
    }
}

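Taint now reaches the return value of keySet.

1.7 Adding a taint step for the constructor

The result above allows a further analysis: at new HashSet<String>, taint propagates only among the inner elements, so common is treated as untainted when it reaches the final method argument. In other words, the HashSet constructor also needs to be added to the path: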

 
class HashSetConstructorCall extends Call {
    HashSetConstructorCall() {
        this.(ConstructorCall).getConstructedType().getSourceDeclaration().hasQualifiedName("java.util", "HashSet")
    }
}
 
class HashSetTaintStep extends TaintTracking::AdditionalTaintStep {
    override predicate step(DataFlow::Node n1, DataFlow::Node n2) {
        exists(HashSetConstructorCall mcc | 
            n1.asExpr() = mcc.getAnArgument() and
            n2.asExpr() = mcc
        )
    }
}

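Run it again and the flow now reaches the Sink.

1.8 Complete query

Going back to the earlier taint analysis query, it should now return one result.

The retain step in the reference answer (presumably the one for retainAll) should no longer be needed.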
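
As a recap of what the keySet and HashSet constructor steps model, the following plain-Java sketch mirrors the data path of isValid from section 1.1, using two ordinary maps in place of a Container. It is only an illustration under assumptions: the class name SharedConstraintMessage, the method buildMessage and the sample key are hypothetical and not part of the analyzed project.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class SharedConstraintMessage {
    // Same shape as isValid: keySet() -> HashSet constructor -> string concatenation.
    static String buildMessage(Map<String, String> soft, Map<String, String> hard) {
        Set<String> common = new HashSet<>(soft.keySet()); // keys are copied into the new set
        common.retainAll(hard.keySet());                   // keep only keys present in both maps
        if (common.isEmpty()) {
            return null; // nothing shared, so no violation message is built
        }
        // The shared keys end up verbatim in the message template.
        return "Soft and hard constraints not unique. Shared constraints: " + common;
    }

    public static void main(String[] args) {
        Map<String, String> soft = new LinkedHashMap<>();
        Map<String, String> hard = new LinkedHashMap<>();
        String userKey = "${attacker-controlled}"; // hypothetical user-supplied key
        soft.put(userKey, "a");
        hard.put(userKey, "b");
        // prints: Soft and hard constraints not unique. Shared constraints: [${attacker-controlled}]
        System.out.println(buildMessage(soft, hard));
    }
}
```

The key supplied by the caller survives every hop unchanged, which is why each hop (keySet, the HashSet constructor, the concatenation) has to be connected for the flow to reach the sink.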

2 Solving the same problem

SchedulingConstraintValidator.java has the same problem; let's solve it.

Build the query:

int explorationLimit() { result = 5000 }
module PartialFlow = Flow::FlowExplorationFwd<explorationLimit/0>;
import PartialFlow::PartialPathGraph
 
from PartialFlow::PartialPathNode source, PartialFlow::PartialPathNode node, int dist
where PartialFlow::partialFlow(source, node, dist) and
    source.getNode().getLocation().getFile().getBaseName() = "SchedulingConstraintValidator.java"
select node.getNode(), source, node, "From " + source.toString() + " to " + node.toString() + " dist = " + dist

In the result, the first place where the flow breaks is clearly the call to stream: taint is not passed on to its return value. So:

class StreamTaintStep extends TaintTracking::AdditionalTaintStep {
    override predicate step(DataFlow::Node n1, DataFlow::Node n2) {
        exists(MethodCall mc | 
            mc.getMethod().(CollectionMethod).getName() = "stream" and
            n1.asExpr() = mc.getQualifier() and
            n2.asExpr() = mc
        )
    }
}

In fact, the map and collect calls that follow need steps as well:

class StreamMethod extends Method {
    StreamMethod() {
        this.getDeclaringType().getASourceSupertype*().hasQualifiedName("java.util.stream", "Stream")
    }
}
 
class MapTaintStep extends TaintTracking::AdditionalTaintStep {
    override predicate step(DataFlow::Node n1, DataFlow::Node n2) {
        exists(MethodCall mc | 
            mc.getMethod().(StreamMethod).getName() = "map" and
            n1.asExpr() = mc.getQualifier() and
            n2.asExpr() = mc
        )
    }
}
 
class CollectTaintStep extends TaintTracking::AdditionalTaintStep {
    override predicate step(DataFlow::Node n1, DataFlow::Node n2) {
        exists(MethodCall mc | 
            mc.getMethod().(StreamMethod).getName() = "collect" and
            n1.asExpr() = mc.getQualifier() and
            n2.asExpr() = mc
        )
    }
}

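The flow now reaches the sink.

Compared with the reference answer there are small differences, but the approach is the same; the reference answer simply defines some of the pieces as classes: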

/* Step 2 */
/** A call to the method `stream` declared in a collection type. */
class CollectionStreamCall extends MethodAccess {
  CollectionStreamCall() { this.getMethod().(CollectionMethod).getName() = "stream" }
}
 
/** Track taint from `x` to `x.stream()` where `x` is a collection. */
class CollectionStreamTaintStep extends TaintTracking::AdditionalTaintStep {
  override predicate step(DataFlow::Node n1, DataFlow::Node n2) {
    exists(CollectionStreamCall call |
      n1.asExpr() = call.getQualifier() and
      n2.asExpr() = call
    )
  }
}
 
/** The interface `java.util.stream.Stream`. */
class TypeStream extends Interface {
  TypeStream() { this.hasQualifiedName("java.util.stream", "Stream") }
}
 
/** A method declared in a stream type, that is, a subtype of `java.util.stream.Stream`. */
class StreamMethod extends Method {
  StreamMethod() { this.getDeclaringType().getASourceSupertype+() instanceof TypeStream }
}
 
/** A call to the method `map` declared in a stream type. */
class StreamMapCall extends MethodAccess {
  StreamMapCall() { this.getMethod().(StreamMethod).getName() = "map" }
}
 
/** Track taint from `stream` to `stream.map(lambda)`. */
class StreamMapTaintStep extends TaintTracking::AdditionalTaintStep {
  override predicate step(DataFlow::Node n1, DataFlow::Node n2) {
    exists(StreamMapCall call |
      n1.asExpr() = call.getQualifier() and
      n2.asExpr() = call
    )
  }
}
 
/** A call to the method `collect` declared in a stream type. */
class StreamCollectCall extends MethodAccess {
  StreamCollectCall() { this.getMethod().(StreamMethod).getName() = "collect" }
}
 
/** Track taint from `stream` to `stream.collect()`. */
class StreamCollectTaintStep extends TaintTracking::AdditionalTaintStep {
  override predicate step(DataFlow::Node n1, DataFlow::Node n2) {
    exists(StreamCollectCall call |
      n1.asExpr() = call.getQualifier() and
      n2.asExpr() = call
    )
  }
}
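
For orientation, the three steps above (stream, map, collect) correspond to a Java pipeline of roughly the following shape. This is a hypothetical sketch, not the code of SchedulingConstraintValidator.java: the class StreamMessageSketch, the method describe and the message text are made up for illustration, and the input list is assumed to be user-controlled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamMessageSketch {
    // names (assumed user-controlled) -> stream() -> map(...) -> collect(...) -> message
    static String describe(List<String> names) {
        List<String> quoted = names.stream()        // hop 1: Collection.stream()
                .map(n -> "'" + n + "'")            // hop 2: Stream.map(lambda)
                .collect(Collectors.toList());      // hop 3: Stream.collect(...)
        return "Unrecognized constraints " + quoted;
    }

    public static void main(String[] args) {
        // prints: Unrecognized constraints ['a', 'b']
        System.out.println(describe(Arrays.asList("a", "b")));
    }
}
```

Each method call in the chain returns a new object, so a step from the qualifier to the call result is needed at every hop for taint to follow the data into the final string.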

Evalexp's Digital Garden
