R to @WholeMarsBlog: We need to predict what pedestrians will do based on their behavior, including limb angle & direction of sight.
FSD currently sees all pedestrians as cuboids, so it is overly cautious.
Also, diffusion seems to be more compute-efficient than transformers for vision.