Open
Labels
ep:DML (issues related to the DirectML execution provider), quantization (issues related to quantization)
Description
Describe the issue
Describe the bug
When an INT8 quantized ONNX model is loaded with the DirectML Execution Provider, session creation crashes. The QDQ transformer fuses DequantizeLinear → Softmax → QuantizeLinear into QLinearSoftmax for nodes that are assigned to the DML EP, but DML has no QLinearSoftmax kernel implementation, so the kernel lookup yields a nullptr that is then dereferenced.
System information
- OS: Windows
- ONNX Runtime Version: 1.23.2
- Execution Provider: DirectML (DmlExecutionProvider)
To reproduce
- Load an INT8 quantized ONNX model containing a Softmax node with QDQ pattern (DequantizeLinear → Softmax → QuantizeLinear)
- Create inference session with DML EP enabled
- Crash occurs during session initialization
Expected behavior
The model should load successfully. Either:
- The QDQ fusion should not apply to nodes assigned to EPs that don't support the fused QLinear op, OR
- The Softmax should fall back to CPU EP
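The two acceptable outcomes above can be sketched as a small decision function. This is a self-contained illustration, not ONNX Runtime code: `KernelTable`, `FusionPolicy`, `FusionDecision`, `HasKernel`, and `DecideFusion` are hypothetical names, and the EP names are plain strings rather than the real `kCpuExecutionProvider` / `kDmlExecutionProvider` constants.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for per-EP kernel registries: EP name -> supported op types.
using KernelTable = std::map<std::string, std::set<std::string>>;

// The two outcomes listed under "Expected behavior".
enum class FusionPolicy { kSkipFusion, kFallBackToCpu };

struct FusionDecision {
  bool fuse;              // true if DQ -> op -> Q should become the QLinear op
  std::string target_ep;  // EP the resulting node(s) should be assigned to
};

bool HasKernel(const KernelTable& kernels, const std::string& ep, const std::string& op) {
  auto it = kernels.find(ep);
  return it != kernels.end() && it->second.count(op) > 0;
}

FusionDecision DecideFusion(const KernelTable& kernels, const std::string& assigned_ep,
                            const std::string& fused_op, FusionPolicy policy) {
  assert(!assigned_ep.empty());
  // The assigned EP can run the fused op: fuse in place.
  if (HasKernel(kernels, assigned_ep, fused_op)) return {true, assigned_ep};
  // Option 2: move the fused node to CPU if CPU has the kernel.
  if (policy == FusionPolicy::kFallBackToCpu && HasKernel(kernels, "CPU", fused_op)) {
    return {true, "CPU"};
  }
  // Option 1: leave the DQ -> op -> Q pattern untouched on the assigned EP.
  return {false, assigned_ep};
}
```

With a table in which only CPU lists `QLinearSoftmax`, `DecideFusion(table, "DML", "QLinearSoftmax", FusionPolicy::kSkipFusion)` leaves the pattern unfused on DML, while `kFallBackToCpu` fuses and reassigns the node to CPU.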
Root Cause Analysis
The issue is in onnxruntime/core/optimizer/qdq_transformer/selectors_actions/qdq_selector_action_transformer.cc, function UnaryOpQDQRules():
void UnaryOpQDQRules(SelectorActionRegistry& qdq_selector_action_registry) {
  // ...
  std::vector<const char*> providers = {kCpuExecutionProvider, kDmlExecutionProvider};  // <-- BUG HERE
  std::unique_ptr<NodeSelector> selector = std::make_unique<QDQ::UnarySelector>(providers);
  qdq_selector_action_registry.RegisterSelectorAndAction(action_name,
                                                         {{"AveragePool", {}},
                                                          {"LeakyRelu", {}},
                                                          {"GlobalAveragePool", {}},
                                                          {"Sigmoid", {}},
                                                          {"Softmax", {}}},  // <-- Softmax included
                                                         std::move(selector),
                                                         std::move(action));
}
The problem:
- GetCapability phase: the DML EP claims the Softmax node (DML has a Softmax kernel)
- The QDQ transformer runs AFTER GetCapability: it sees that the node is assigned to the DML EP (which is in the providers list), so it fuses DQ → Softmax → Q into QLinearSoftmax
- The new QLinearSoftmax node inherits the DML EP assignment
- BuildPartitions phase: DML looks up a kernel for QLinearSoftmax, gets nullptr, and crashes
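The sequence above can be modelled with a toy registry to show where the null lookup comes from. This is a simplified sketch under stated assumptions, not the actual ONNX Runtime or DML code path: `Node`, `Kernel`, `KernelRegistry`, `FuseToQLinear`, and `DescribeLookup` are hypothetical names, and the real fusion also rewires the DQ/Q inputs and outputs, which is omitted here.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

struct Node {
  std::string op_type;
  std::string ep;  // execution provider the node is assigned to
};

struct Kernel {
  std::string name;
};

// Toy registry keyed by (EP, op type).
class KernelRegistry {
 public:
  void Register(const std::string& ep, const std::string& op) {
    kernels_[{ep, op}] = Kernel{ep + "::" + op};
  }

  // Returns nullptr when the EP has no kernel for the node's op type.
  const Kernel* Find(const Node& node) const {
    auto it = kernels_.find({node.ep, node.op_type});
    return it == kernels_.end() ? nullptr : &it->second;
  }

 private:
  std::map<std::pair<std::string, std::string>, Kernel> kernels_;
};

// Models the fusion: the replacement node keeps the original node's EP.
Node FuseToQLinear(const Node& node) {
  assert(!node.op_type.empty());
  return Node{"QLinear" + node.op_type, node.ep};
}

// Guarded lookup: reports a missing kernel instead of dereferencing nullptr.
std::string DescribeLookup(const KernelRegistry& registry, const Node& node) {
  const Kernel* kernel = registry.Find(node);
  if (kernel == nullptr) {
    return "no kernel for " + node.op_type + " on " + node.ep;
  }
  return "found " + kernel->name;
}
```

Registering `Softmax` for `"DML"` but `QLinearSoftmax` only for `"CPU"` reproduces the shape of the bug: the lookup succeeds for the original node and returns nullptr for the fused one, because the fused node still carries the DML assignment.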
DML kernel availability:
- QLinearSigmoid ✓ (registered in OperatorRegistration.cpp)
- QLinearAveragePool ✓
- QLinearGlobalAveragePool ✓
- QLinearSoftmax ✗ NOT IMPLEMENTED
- QLinearLeakyRelu ✗ NOT IMPLEMENTED
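One way to read the availability list is as a table that determines which ops may share a selector covering both CPU and DML. The sketch below encodes the list as reported in this issue (it has not been re-verified against `OperatorRegistration.cpp`); `DmlQLinearSupport`, `SelectorGroups`, and `SplitByDmlSupport` are hypothetical helper names used only for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Availability as listed in this issue: true if DML registers a QLinear kernel
// for the op, false if only the CPU EP does.
const std::map<std::string, bool>& DmlQLinearSupport() {
  static const std::map<std::string, bool> table = {
      {"AveragePool", true},
      {"GlobalAveragePool", true},
      {"Sigmoid", true},
      {"Softmax", false},
      {"LeakyRelu", false}};
  return table;
}

struct SelectorGroups {
  std::vector<std::string> cpu_and_dml;  // safe to fuse on either EP
  std::vector<std::string> cpu_only;     // fuse only when assigned to CPU
};

// Partitions the ops by DML support; output order follows the map's sorted keys.
SelectorGroups SplitByDmlSupport(const std::map<std::string, bool>& support) {
  SelectorGroups groups;
  for (const auto& entry : support) {
    (entry.second ? groups.cpu_and_dml : groups.cpu_only).push_back(entry.first);
  }
  assert(groups.cpu_and_dml.size() + groups.cpu_only.size() == support.size());
  return groups;
}
```

Applied to the table above, the two groups are {AveragePool, GlobalAveragePool, Sigmoid} and {LeakyRelu, Softmax}, which is the same split proposed under Suggested Fix.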
Suggested Fix
Split UnaryOpQDQRules registration based on actual DML kernel availability:
void UnaryOpQDQRules(SelectorActionRegistry& qdq_selector_action_registry) {
  // Ops that both CPU and DML support
  {
    const std::string action_name{"1DQ"};
    std::unique_ptr<Action> action = std::make_unique<QDQ::UnaryReplaceWithQLinear>(kMSDomain);
    std::vector<const char*> providers = {kCpuExecutionProvider, kDmlExecutionProvider};
    std::unique_ptr<NodeSelector> selector = std::make_unique<QDQ::UnarySelector>(providers);
    qdq_selector_action_registry.RegisterSelectorAndAction(action_name,
                                                           {{"AveragePool", {}},
                                                            {"GlobalAveragePool", {}},
                                                            {"Sigmoid", {}}},
                                                           std::move(selector),
                                                           std::move(action));
  }
  // Ops that only CPU supports (DML has NO QLinearSoftmax or QLinearLeakyRelu)
  {
    const std::string action_name{"1DQ_CPU"};
    std::unique_ptr<Action> action = std::make_unique<QDQ::UnaryReplaceWithQLinear>(kMSDomain);
    std::vector<const char*> providers = {kCpuExecutionProvider};
    std::unique_ptr<NodeSelector> selector = std::make_unique<QDQ::UnarySelector>(providers);
    qdq_selector_action_registry.RegisterSelectorAndAction(action_name,
                                                           {{"LeakyRelu", {}},
                                                            {"Softmax", {}}},
                                                           std::move(selector),
                                                           std::move(action));
  }
}
Urgency
No response
Platform
Windows
OS Version
Windows 10
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.23.2
ONNX Runtime API
C++
Architecture
X64
Execution Provider
DirectML
Execution Provider Library Version
No response