Skip to content

Conversation

@wujingyue
Copy link
Collaborator

@wujingyue wujingyue commented Jan 2, 2026

For #5308

Summary

  • make GetCurrentStream produce an explicit stream output and wire it through passes
  • treat SetCurrentStream as consuming a stream input and improve its printing

Testing

  • not run

@wujingyue wujingyue requested a review from mdavis36 January 2, 2026 07:58
@github-actions
Copy link

github-actions bot commented Jan 2, 2026

Description

  • Refactor GetCurrentStream to explicitly output a stream parameter instead of creating it internally

  • Update SetCurrentStream to consume stream as input rather than attribute

  • Improve IR node printing with cleaner inline format for SetCurrentStream

  • Reduce multidevice communication test sizes to decrease runtime

Changes walkthrough

Relevant files
Enhancement
ir.cpp
Update IR node implementations and printing                           

csrc/host_ir/ir.cpp

  • Remove LaunchKernel::toInlineString() method
  • Update SetCurrentStream constructor to take stream as input and
    improve toString() format
  • Refactor GetCurrentStream to accept stream parameter and output it
    instead of creating internally
  • Remove toInlineString() and sameAs() methods from SetCurrentStream
  • +7/-22   
    ir.h
    Update IR node interface definitions                                         

    csrc/host_ir/ir.h

  • Remove LaunchKernel::toInlineString() declaration
  • Update SetCurrentStream to access stream via inputs() instead of
    attributes()
  • Modify GetCurrentStream constructor to accept stream parameter and
    access via outputs()
  • Remove toInlineString() and sameAs() method declarations from
    SetCurrentStream
  • +3/-7     
    stream_parallel_type.cpp
    Update stream management pass for new GetCurrentStream interface

    csrc/host_ir/pass/stream_parallel_type.cpp

  • Create Stream object externally before passing to GetCurrentStream
    constructor
  • Update GetCurrentStream creation to use new constructor signature with
    stream parameter
  • +3/-2     
    Tests
    test_multidevice_lower_communication_cuda.cpp
    Reduce multidevice communication test sizes                           

    tests/cpp/test_multidevice_lower_communication_cuda.cpp

  • Remove 128MB and 256MB test cases from parameter combinations
  • Keep only 2MB, 8MB, and 32MB test sizes to reduce runtime
  • +1/-3     

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review
    Stream Creation Logic

    The code creates a new Stream object and immediately passes it to GetCurrentStream, which seems counterintuitive. GetCurrentStream should retrieve the current/active stream, not take an arbitrary stream as input. This suggests the semantics may be incorrect or the variable naming is misleading.

    hir::Stream* original_stream = IrBuilder::create<hir::Stream>();
    auto* get_current_stream =
        IrBuilder::create<hir::GetCurrentStream>(original_stream);
    new_top_level_exprs.push_back(get_current_stream);
    Constructor Interface Change

    GetCurrentStream constructor now requires a Stream* parameter, changing the interface from the previous parameterless constructor. This is a breaking change that requires verification that all call sites have been updated accordingly.

    GetCurrentStream(IrBuilderPasskey passkey, Stream* stream);
    Removed Methods Impact

    SetCurrentStream had toInlineString() and sameAs() methods removed. These methods may be used by serialization, comparison, or debugging systems. Their removal should be verified to not break existing functionality.

    std::string SetCurrentStream::toString(int indent_size) const {
      std::stringstream ss;
      indent(ss, indent_size) << "SetCurrentStream(" << stream()->toInlineString()
                              << ")" << std::endl;
      return ss.str();
    }
    
    GetCurrentStream::GetCurrentStream(IrBuilderPasskey passkey, Stream* stream)
        : Expr(passkey, {}, {stream}, {}) {
      NVF_ERROR(passkey.ir_container_ != nullptr);
      NVF_ERROR(passkey.ir_container_->isA<HostIrContainer>());
    }
    
    NVFUSER_DEFINE_CLONE_AND_CREATE(GetCurrentStream)
    
    std::string GetCurrentStream::toString(int indent_size) const {
      std::stringstream ss;
      indent(ss, indent_size) << stream()->toInlineString()
                              << " = GetCurrentStream()" << std::endl;
      return ss.str();
    }

    Copy link
    Collaborator Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Unrelated. I'll do it in another PR

    @greptile-apps
    Copy link
    Contributor

    greptile-apps bot commented Jan 2, 2026

    Greptile Summary

    This PR refines the host stream IR nodes to properly represent stream semantics in the intermediate representation. The key changes make GetCurrentStream produce an explicit stream output (via outputs list) and SetCurrentStream consume a stream input (via inputs list), replacing their previous attribute-based storage. This enables proper data flow tracking through IR passes. Additionally, unused stub methods (toInlineString, sameAs) are removed and string representations are improved for better readability. The PR also trims multidevice test sizes to reduce runtime without sacrificing coverage.

    Changes:

    • SetCurrentStream: Now takes stream as input (not attribute), removed toInlineString() and sameAs() stubs, improved string formatting
    • GetCurrentStream: Now takes stream parameter in constructor and outputs it (not stored as attribute)
    • Removed LaunchKernel::toInlineString() dead method
    • Updated stream_parallel_type.cpp to pass stream explicitly when creating GetCurrentStream
    • Reduced test parameters from 5 sizes to 3 (removed 128 MB and 256 MB)

    Confidence Score: 5/5

    • This PR is safe to merge with no concerns. All changes are well-scoped refactoring of stream IR semantics with proper backward compatibility maintained.
    • Score of 5 reflects: (1) Clean refactoring that properly aligns stream operations with SSA-like semantics; (2) All usages updated correctly - GetCurrentStream instantiation in stream_parallel_type.cpp matches new constructor signature; (3) No breaking changes to public API - stream() accessor methods remain compatible; (4) Removed methods (toInlineString, sameAs) were dead code with explicit TODO comments indicating they were stubs; (5) LaunchKernel::toInlineString() removal is safe - no callsites found in the codebase; (6) Test size trimming is a reasonable optimization with no functional impact; (7) String representations improved for better IR readability.
    • No files require special attention

    Important Files Changed

    Filename Overview
    csrc/host_ir/ir.h Refactored stream IR nodes to use proper input/output semantics. SetCurrentStream now consumes stream as input (not attribute), GetCurrentStream produces stream as output (not attribute), and removed stub toInlineString/sameAs methods. Changes align stream operations with SSA-like IR semantics.
    csrc/host_ir/ir.cpp Implemented refactored stream node constructors and string representations. SetCurrentStream now uses stream as input and has improved printing format. GetCurrentStream takes stream as constructor parameter and outputs it. Removed unused LaunchKernel::toInlineString method. All changes maintain backward compatibility with accessor methods.
    csrc/host_ir/pass/stream_parallel_type.cpp Updated GetCurrentStream instantiation to match new constructor signature. Stream is now created separately before GetCurrentStream construction and passed as a parameter, enabling explicit stream output wiring through the IR pass.
    tests/cpp/test_multidevice_lower_communication_cuda.cpp Trimmed test case sizes by removing 128 MB and 256 MB test parameters, keeping only 2 MB, 8 MB, and 32 MB sizes. This reduces test runtime without affecting test coverage of different size ranges.

    Sequence Diagram

    sequenceDiagram
        participant IR as IR Builder
        participant GCS as GetCurrentStream
        participant SCS as SetCurrentStream
        participant Pass as Stream Parallel Pass
        
        Pass->>IR: Create Stream object
        IR-->>Pass: stream instance
        
        Pass->>IR: Create GetCurrentStream(stream)
        IR->>GCS: stream as output
        GCS-->>Pass: GetCurrentStream expr
        Pass->>Pass: Add to top_level_exprs
        
        Pass->>IR: Create SetCurrentStream(new_stream)
        IR->>SCS: stream as input
        SCS-->>Pass: SetCurrentStream expr
        Pass->>Pass: Add to loop body
        
        Note over GCS,SCS: Stream flows through IR as input/output<br/>enabling proper data dependency tracking
    
    Loading

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    2 participants