QDQ transformer incorrectly fuses Softmax/LeakyRelu to QLinear ops for DML EP nodes, causing crash #26882

@linzj

Describe the issue

Describe the bug

When loading an INT8 quantized ONNX model with DirectML Execution Provider, the session crashes because the QDQ transformer fuses DequantizeLinear → Softmax → QuantizeLinear into QLinearSoftmax for nodes that are assigned to DML EP. However, DML does not have a QLinearSoftmax kernel implementation, causing a nullptr dereference crash.

System information

  • OS: Windows
  • ONNX Runtime Version: 1.23.2
  • Execution Provider: DirectML (DmlExecutionProvider)

To reproduce

  1. Load an INT8 quantized ONNX model containing a Softmax node with QDQ pattern (DequantizeLinear → Softmax → QuantizeLinear)
  2. Create inference session with DML EP enabled
  3. Crash occurs during session initialization

Expected behavior

The model should load successfully. Either:

  1. The QDQ fusion should not apply to nodes assigned to EPs that don't support the fused QLinear op, OR
  2. The Softmax should fall back to CPU EP
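The two alternatives can be sketched as a small decision function. This is a toy model, not ONNX Runtime code: `FusionPolicy`, `FusionResult`, `ResolveFusion`, the `has_kernel` callback, and the `"CPU"` provider name are all hypothetical. It also assumes one reading of option 2, namely that the fused QLinear node is placed on the CPU EP when the original EP has no kernel for it.

```cpp
#include <functional>
#include <string>

// Hypothetical names throughout; none of these are ONNX Runtime types.
enum class FusionPolicy { kSkipFusion, kFallBackToCpu };

struct FusionResult {
  std::string op_type;  // op type of the node after the decision
  std::string ep;       // execution provider the node ends up on
};

using KernelLookup = std::function<bool(const std::string& ep, const std::string& op_type)>;

// Decide what to do with a DQ -> op -> Q group whose op is assigned to `ep`.
inline FusionResult ResolveFusion(const std::string& op_type, const std::string& ep,
                                  const KernelLookup& has_kernel, FusionPolicy policy) {
  const std::string fused = "QLinear" + op_type;
  // Safe case: the assigned EP implements the fused op.
  if (has_kernel(ep, fused)) return {fused, ep};
  // Option 2 (as interpreted here): fuse anyway, but run the fused node on CPU.
  if (policy == FusionPolicy::kFallBackToCpu && has_kernel("CPU", fused)) {
    return {fused, "CPU"};
  }
  // Option 1: leave the original node untouched on its EP.
  return {op_type, ep};
}
```

With a lookup that reports `Softmax` on both providers but `QLinearSoftmax` only on CPU, `kSkipFusion` keeps `Softmax` on DML, while `kFallBackToCpu` yields `QLinearSoftmax` on CPU. Either way, no node is left pointing at a kernel that does not exist.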

Root Cause Analysis

The issue is in onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selector_action_transformer.cc, function UnaryOpQDQRules():

  void UnaryOpQDQRules(SelectorActionRegistry& qdq_selector_action_registry) {
    // ...
    std::vector<const char*> providers = {kCpuExecutionProvider, kDmlExecutionProvider};  // <-- BUG HERE
    std::unique_ptr<NodeSelector> selector = std::make_unique<QDQ::UnarySelector>(providers);
    qdq_selector_action_registry.RegisterSelectorAndAction(action_name,
                                                           {{"AveragePool", {}},
                                                            {"LeakyRelu", {}},
                                                            {"GlobalAveragePool", {}},
                                                            {"Sigmoid", {}},
                                                            {"Softmax", {}}},  // <-- Softmax included
                                                           std::move(selector),
                                                           std::move(action));
  }

The problem:

  1. GetCapability phase: DML EP claims Softmax node (DML has Softmax kernel)
  2. QDQ transformer runs AFTER GetCapability: sees node is assigned to DML EP (which is in the providers list), so it fuses DQ → Softmax → Q into QLinearSoftmax
  3. The new QLinearSoftmax node inherits the DML EP assignment
  4. BuildPartitions phase: DML tries to find kernel for QLinearSoftmax → nullptr → crash
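The sequence above can be reproduced in a self-contained toy model. The sketch below does not use any ONNX Runtime types: `ToyNode`, `KernelRegistry`, `MakeToyRegistry`, `FuseIfSelected`, and `HasKernel` are hypothetical stand-ins, and the registry only lists the Softmax-related kernels mentioned in this issue.

```cpp
#include <set>
#include <string>
#include <utility>

// Toy model only: hypothetical types, NOT ONNX Runtime classes.
struct ToyNode {
  std::string op_type;
  std::string ep;  // execution provider assigned during the GetCapability phase
};

// Set of (ep, op_type) pairs for which a kernel exists.
using KernelRegistry = std::set<std::pair<std::string, std::string>>;

inline KernelRegistry MakeToyRegistry() {
  return {
      {"CPU", "Softmax"},
      {"CPU", "QLinearSoftmax"},
      {"DML", "Softmax"},  // DML has Softmax but no QLinearSoftmax
  };
}

// Steps 2 and 3: fuse DQ -> op -> Q into QLinear<op> when the node's EP is in
// `providers`. The fused node inherits the EP of the original node.
inline ToyNode FuseIfSelected(const ToyNode& node, const std::set<std::string>& providers) {
  if (providers.count(node.ep) == 0) return node;  // selector rejects the node
  return ToyNode{"QLinear" + node.op_type, node.ep};
}

// Step 4: kernel lookup. A `false` result corresponds to the nullptr case.
inline bool HasKernel(const KernelRegistry& registry, const ToyNode& node) {
  return registry.count({node.ep, node.op_type}) > 0;
}
```

In this model, fusing a `Softmax` node assigned to `"DML"` with providers `{"CPU", "DML"}` produces a `QLinearSoftmax` node on `"DML"` for which `HasKernel` returns `false`. With providers `{"CPU"}` the node is left as `Softmax` and the lookup succeeds.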

DML kernel availability:

  • QLinearSigmoid ✓ (registered in OperatorRegistration.cpp)
  • QLinearAveragePool ✓
  • QLinearGlobalAveragePool ✓
  • QLinearSoftmax ✗ NOT IMPLEMENTED
  • QLinearLeakyRelu ✗ NOT IMPLEMENTED
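One way to keep the selector registration in sync with this list is to derive the provider groups from a table instead of hardcoding them. The following is a minimal sketch under that assumption. The table contents mirror the list above plus the CPU support implied by the existing registration. `QLinearKernelProviders`, `GroupOpsByProviders`, and the `"CPU"`/`"DML"` strings are hypothetical placeholders, not ONNX Runtime APIs or constants.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical table: for each unary op, the EPs that have a QLinear kernel for it.
inline const std::map<std::string, std::set<std::string>>& QLinearKernelProviders() {
  static const std::map<std::string, std::set<std::string>> table = {
      {"AveragePool", {"CPU", "DML"}},
      {"GlobalAveragePool", {"CPU", "DML"}},
      {"Sigmoid", {"CPU", "DML"}},
      {"LeakyRelu", {"CPU"}},  // no DML QLinearLeakyRelu kernel
      {"Softmax", {"CPU"}},    // no DML QLinearSoftmax kernel
  };
  return table;
}

// Group ops by their provider set, so that each group could be registered
// with its own selector (one selector per distinct provider list).
inline std::map<std::set<std::string>, std::vector<std::string>> GroupOpsByProviders() {
  std::map<std::set<std::string>, std::vector<std::string>> groups;
  for (const auto& [op, providers] : QLinearKernelProviders()) {
    groups[providers].push_back(op);
  }
  return groups;
}
```

For the table above this yields two groups: `{CPU}` with `LeakyRelu` and `Softmax`, and `{CPU, DML}` with `AveragePool`, `GlobalAveragePool`, and `Sigmoid`. Adding a DML kernel later would then only require a table change.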

Suggested Fix

Split UnaryOpQDQRules registration based on actual DML kernel availability:

  void UnaryOpQDQRules(SelectorActionRegistry& qdq_selector_action_registry) {
    // Ops that both CPU and DML support
    {
      const std::string action_name{"1DQ"};
      std::unique_ptr<Action> action = std::make_unique<QDQ::UnaryReplaceWithQLinear>(kMSDomain);
      std::vector<const char*> providers = {kCpuExecutionProvider, kDmlExecutionProvider};
      std::unique_ptr<NodeSelector> selector = std::make_unique<QDQ::UnarySelector>(providers);
      qdq_selector_action_registry.RegisterSelectorAndAction(action_name,
                                                             {{"AveragePool", {}},
                                                              {"GlobalAveragePool", {}},
                                                              {"Sigmoid", {}}},
                                                             std::move(selector),
                                                             std::move(action));
    }

    // Ops that only CPU supports (DML has NO QLinearSoftmax or QLinearLeakyRelu)
    {
      const std::string action_name{"1DQ_CPU"};
      std::unique_ptr<Action> action = std::make_unique<QDQ::UnaryReplaceWithQLinear>(kMSDomain);
      std::vector<const char*> providers = {kCpuExecutionProvider};
      std::unique_ptr<NodeSelector> selector = std::make_unique<QDQ::UnarySelector>(providers);
      qdq_selector_action_registry.RegisterSelectorAndAction(action_name,
                                                             {{"LeakyRelu", {}},
                                                              {"Softmax", {}}},
                                                             std::move(selector),
                                                             std::move(action));
    }
  }

Platform

Windows

OS Version

Windows 10

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.23.2

ONNX Runtime API

C++

Architecture

X64

Execution Provider

DirectML

Labels: ep:DML, quantization
