@codeflash-ai codeflash-ai bot commented Dec 30, 2025

⚡️ This pull request contains optimizations for PR #10820

If you approve this dependent PR, these changes will be merged into the original PR branch cz/add-logs-feature.

This PR will be automatically closed if the original PR is merged.


📄 14% speedup for TransactionLogsResponse.serialize_inputs in src/backend/base/langflow/services/database/models/transactions/model.py

⏱️ Runtime: 4.58 milliseconds → 4.01 milliseconds (best of 51 runs)

📝 Explanation and details

The optimized code achieves a **14% speedup** (from 4.58 ms to 4.01 ms) through strategic short-circuit optimizations in frequently called serialization paths:

## Key Optimizations

### 1. Fast path for primitives in `serialize()`

The optimized version adds an early exit for common primitive types before the more expensive dispatcher logic:

```python
if obj is None or isinstance(obj, (str, int, float, bool)):
    return obj
```

This avoids calling `_serialize_dispatcher()` for the most common data types. Since serialization often processes nested dictionaries containing many primitive values, this check eliminates significant overhead.
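To make the shape of this change concrete, here is a minimal self-contained sketch of a dispatcher-based serializer with the primitive fast path in front. The dispatcher body below is hypothetical (the real `_serialize_dispatcher()` handles many more types); only the fast-path guard is taken from the PR description.

```python
from datetime import datetime
from typing import Any


def _serialize_dispatcher(obj: Any) -> Any:
    # Hypothetical slow path: type-by-type dispatch for complex objects.
    if isinstance(obj, dict):
        return {k: serialize(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return [serialize(v) for v in obj]
    if isinstance(obj, datetime):
        return obj.isoformat()
    return str(obj)  # generic fallback


def serialize(obj: Any) -> Any:
    # Fast path: primitives return immediately, skipping dispatch entirely.
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    return _serialize_dispatcher(obj)
```

Every primitive leaf in a nested payload now costs one `isinstance` check instead of a full trip through the dispatcher.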

### 2. Reordered checks in `sanitize_data()`

The original checks `if data is None` first, then `if not isinstance(data, dict)`. The optimized version reverses this:

```python
if not isinstance(data, dict):
    return data
if data is None:
    return None
```

Since `None` is not a dict, it already returns via the `isinstance` guard, so putting the dict-ness check first handles the common case in one test (and makes the explicit `None` check effectively unreachable). The optimized version also adds an early return for empty dicts (`if not data: return {}`), avoiding unnecessary calls to `_sanitize_dict({})`.
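A minimal sketch of the reordered function, with a stand-in `_sanitize_dict()` (the real redaction logic is more involved; the check ordering is the point here):

```python
from typing import Any


def _sanitize_dict(data: dict) -> dict:
    # Stand-in for the real sanitizer: redact password-like keys.
    return {k: "***REDACTED***" if "password" in k.lower() else v
            for k, v in data.items()}


def sanitize_data(data: Any) -> Any:
    # Non-dicts (including None) exit immediately on one isinstance check.
    if not isinstance(data, dict):
        return data
    # Empty dicts skip the sanitizer entirely.
    if not data:
        return {}
    return _sanitize_dict(data)
```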

## Why This Matters

Based on the test suite, the code frequently serializes:

- **Large nested structures** with many primitive values (strings, ints, bools)
- **Lists of dictionaries** containing both sensitive and non-sensitive data
- **Mixed-type data** with primitives alongside complex objects

The primitive fast-path optimization is particularly effective here because:

- Every nested dict/list traversal hits multiple primitive values
- Tests like `test_serialize_inputs_large_list_of_sensitive_dicts` (100 items) and `test_serialize_inputs_performance_large` (500 users) show the multiplicative benefit of avoiding dispatcher overhead on each primitive

The `sanitize_data()` optimization helps in edge cases with empty dicts or `None` values, providing small but consistent gains across the test suite.
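A useful sanity check for a short-circuit change like this is to verify that the fast-path variant is observationally equivalent to the baseline on representative nested payloads. The sketch below is generic and not taken from the PR:

```python
from typing import Any


def slow_serialize(obj: Any) -> Any:
    # Baseline: every value, primitive or not, goes through the full checks.
    if isinstance(obj, dict):
        return {k: slow_serialize(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [slow_serialize(v) for v in obj]
    return obj


def fast_serialize(obj: Any) -> Any:
    # Optimized: primitives short-circuit before any container checks.
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    if isinstance(obj, dict):
        return {k: fast_serialize(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [fast_serialize(v) for v in obj]
    return obj


# Representative payload: many primitive leaves inside nested containers.
sample = {"users": [{"name": f"u{i}", "active": i % 2 == 0, "score": i * 1.5}
                    for i in range(100)]}
assert fast_serialize(sample) == slow_serialize(sample)
```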

## Impact Assessment

The 14% speedup compounds when `serialize_inputs()` is called repeatedly in transaction-logging workflows. Since this appears to be a database model for transaction logs, these functions likely execute in high-volume scenarios where even microsecond improvements per call translate to meaningful latency reductions at scale.
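The per-call overhead argument can be measured directly with `timeit`. The harness below is illustrative only (the stand-in dispatcher is far lighter than the real one, so relative numbers will differ):

```python
import timeit


def _dispatch(obj):
    # Stand-in for a heavier _serialize_dispatcher; real dispatch costs more.
    return {k: v for k, v in obj.items()} if isinstance(obj, dict) else obj


def serialize_plain(obj):
    # Always pays the dispatch cost, even for primitives.
    return _dispatch(obj)


def serialize_fast(obj):
    # Primitive fast path in front of the same dispatch.
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    return _dispatch(obj)


# Measure the primitive-heavy case; numbers depend on dispatcher weight.
t_plain = timeit.timeit(lambda: serialize_plain(42), number=100_000)
t_fast = timeit.timeit(lambda: serialize_fast(42), number=100_000)
print(f"plain: {t_plain:.4f}s  fast-path: {t_fast:.4f}s")
```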

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 44 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

Generated Regression Tests:
from datetime import datetime, timezone
from uuid import uuid4

import pytest
from langflow.services.database.models.transactions.model import TransactionLogsResponse

# ------------------- UNIT TESTS BELOW -------------------

@pytest.fixture
def basic_instance():
    return TransactionLogsResponse(
        id=uuid4(),
        timestamp=datetime.now(timezone.utc),
        vertex_id="vertex123",
        target_id="target456",
        inputs=None,
        outputs=None,
        status="success"
    )

# 1. BASIC TEST CASES



def test_serialize_inputs_no_sensitive_keys(basic_instance):
    # Should serialize a dict with no sensitive keys unchanged
    data = {"foo": "bar", "number": 123}
    codeflash_output = basic_instance.serialize_inputs(data); result = codeflash_output
    assert result == data

def test_serialize_inputs_simple_sensitive_key(basic_instance):
    # Should mask sensitive keys (e.g., 'password')
    data = {"password": "mysecretpassword"}
    codeflash_output = basic_instance.serialize_inputs(data); result = codeflash_output
    assert result["password"] != "mysecretpassword"

def test_serialize_inputs_short_sensitive_value(basic_instance):
    # Should fully mask short sensitive values (<12 chars)
    data = {"token": "short"}
    codeflash_output = basic_instance.serialize_inputs(data); result = codeflash_output
    assert "short" not in str(result["token"])

def test_serialize_inputs_list_of_dicts_with_sensitive(basic_instance):
    # Should mask sensitive keys in list of dicts
    data = {
        "users": [
            {"username": "bob", "password": "hunter2"},
            {"username": "alice", "password": "supersecretpassword"}
        ]
    }
    codeflash_output = basic_instance.serialize_inputs(data); result = codeflash_output
    assert result["users"][0]["password"] != "hunter2"
    pw = result["users"][1]["password"]
    assert pw != "supersecretpassword"

def test_serialize_inputs_mixed_types(basic_instance):
    # Should handle ints, floats, bools, None, etc.
    data = {
        "int": 1,
        "float": 2.5,
        "bool": True,
        "none": None,
        "list": [1, 2, 3]
    }
    codeflash_output = basic_instance.serialize_inputs(data); result = codeflash_output
    assert result == data

# 2. EDGE TEST CASES

def test_serialize_inputs_sensitive_key_variants(basic_instance):
    # Should match keys like 'api_key', 'API_KEY', 'api-key', 'apiKey', 'my_api_key'
    data = {
        "API_KEY": "ABCDEF1234567890",
        "api-key": "1234567890ABCDEF",
        "my_api_key": "ZZZZZZZZZZZZZZZZ",
        "notakey": "should not mask"
    }
    codeflash_output = basic_instance.serialize_inputs(data); result = codeflash_output
    # All but 'notakey' should be masked
    for k in ["API_KEY", "api-key", "my_api_key"]:
        v = result[k]
        assert v != data[k]
    assert result["notakey"] == "should not mask"

def test_serialize_inputs_sensitive_key_with_empty_value(basic_instance):
    # Should mask empty sensitive values as "***REDACTED***"
    data = {"password": ""}
    codeflash_output = basic_instance.serialize_inputs(data); result = codeflash_output
    assert result["password"] == "***REDACTED***"

def test_serialize_inputs_sensitive_key_with_none_value(basic_instance):
    # Should mask None sensitive values as "***REDACTED***"
    data = {"token": None}
    codeflash_output = basic_instance.serialize_inputs(data); result = codeflash_output
    assert result["token"] == "***REDACTED***"

def test_serialize_inputs_sensitive_key_with_non_str_value(basic_instance):
    # Should mask non-string sensitive values as "***REDACTED***"
    data = {"api_key": 123456789}
    codeflash_output = basic_instance.serialize_inputs(data); result = codeflash_output
    assert result["api_key"] == "***REDACTED***"

def test_serialize_inputs_truncates_long_strings(basic_instance):
    # Should truncate long strings according to max_text_length
    data = {"foo": "x" * 100}
    codeflash_output = basic_instance.serialize_inputs(data); result = codeflash_output
    # Assumes the configured max text length is below 100, per this test's intent
    assert result["foo"] != "x" * 100

def test_serialize_inputs_truncates_long_lists(basic_instance):
    # Should truncate lists according to max_items_length
    data = {"lst": list(range(50))}
    codeflash_output = basic_instance.serialize_inputs(data); result = codeflash_output
    # Assumes the configured max items length is below 50, per this test's intent
    assert result["lst"] != list(range(50))

def test_serialize_inputs_excludes_code_key_nested(basic_instance):
    # Should exclude 'code' key even if nested
    data = {"outer": {"code": "should not appear", "foo": "bar"}}
    codeflash_output = basic_instance.serialize_inputs(data); result = codeflash_output
    assert "should not appear" not in str(result)
    assert result["outer"]["foo"] == "bar"

# 3. LARGE SCALE TEST CASES


def test_serialize_inputs_large_list_of_sensitive_dicts(basic_instance):
    # Should mask all sensitive keys and truncate list
    data = {"users": [{"password": f"pass{i:02d}secret"} for i in range(100)]}
    codeflash_output = basic_instance.serialize_inputs(data); result = codeflash_output
    for i, user_dict in enumerate(result["users"]):
        pw = user_dict["password"]
        assert pw != f"pass{i:02d}secret"

def test_serialize_inputs_large_nested_structure(basic_instance):
    # Deeply nested structure with sensitive keys
    data = {
        "level1": [
            {"level2": {"api_key": f"KEY{i:04d}SECRET"}} for i in range(20)
        ]
    }
    codeflash_output = basic_instance.serialize_inputs(data); result = codeflash_output
    for i, d in enumerate(result["level1"]):
        masked = d["level2"]["api_key"]
        assert masked != f"KEY{i:04d}SECRET"

def test_serialize_inputs_performance_large(basic_instance):
    # This is not a timing test, but ensures no crash and correct truncation/masking for near-limit size
    data = {
        f"user{i}": {
            "username": f"user{i}",
            "password": f"password{i}withlongtail"
        }
        for i in range(500)
    }
    codeflash_output = basic_instance.serialize_inputs(data); result = codeflash_output
    for k, v in result.items():
        pw = v["password"]
        assert pw != f"password{k[4:]}withlongtail"
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
To edit these changes, run `git checkout codeflash/optimize-pr10820-2025-12-30T19.18.44` and push.

Codeflash

Cristhianzl and others added 20 commits December 1, 2025 16:50
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Dec 30, 2025
coderabbitai bot commented Dec 30, 2025

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting `reviews.review_status` to `false` in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added the community Pull Request from an external contributor label Dec 30, 2025

codecov bot commented Dec 30, 2025

Codecov Report

❌ Patch coverage is 61.86047% with 82 lines in your changes missing coverage. Please review.
✅ Project coverage is 33.33%. Comparing base (9ce7d84) to head (3880e92).
⚠️ Report is 7 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/frontend/src/modals/flowLogsModal/index.tsx | 0.00% | 25 Missing ⚠️ |
| ...rc/modals/flowLogsModal/config/flowLogsColumns.tsx | 0.00% | 21 Missing ⚠️ |
| ...odals/flowLogsModal/components/LogDetailViewer.tsx | 0.00% | 14 Missing ⚠️ |
| ...low/services/database/models/transactions/model.py | 90.27% | 7 Missing ⚠️ |
| src/lfx/src/lfx/graph/utils.py | 30.00% | 5 Missing and 2 partials ⚠️ |
| src/lfx/src/lfx/graph/vertex/base.py | 57.14% | 4 Missing and 2 partials ⚠️ |
| src/lfx/src/lfx/services/transaction/service.py | 77.77% | 2 Missing ⚠️ |

❌ Your project status has failed because the head coverage (39.50%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main   #11173      +/-   ##
==========================================
+ Coverage   33.23%   33.33%   +0.10%     
==========================================
  Files        1394     1399       +5     
  Lines       66068    66222     +154     
  Branches     9778     9785       +7     
==========================================
+ Hits        21956    22076     +120     
- Misses      42986    43021      +35     
+ Partials     1126     1125       -1     
| Flag | Coverage Δ |
|---|---|
| lfx | 39.50% <62.50%> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Files with missing lines | Coverage Δ |
|---|---|
| src/backend/base/langflow/api/v1/monitor.py | 50.00% <100.00%> (ø) |
| ...ckend/base/langflow/serialization/serialization.py | 72.28% <ø> (-0.30%) ⬇️ |
| ...flow/services/database/models/transactions/crud.py | 78.78% <100.00%> (+40.85%) ⬆️ |
| ...kend/base/langflow/services/transaction/factory.py | 100.00% <100.00%> (ø) |
| ...kend/base/langflow/services/transaction/service.py | 100.00% <100.00%> (ø) |
| src/backend/base/langflow/services/utils.py | 81.09% <100.00%> (-0.39%) ⬇️ |
| ...s/API/queries/transactions/use-get-transactions.ts | 0.00% <ø> (ø) |
| src/lfx/src/lfx/graph/vertex/vertex_types.py | 43.38% <ø> (-0.30%) ⬇️ |
| src/lfx/src/lfx/services/deps.py | 59.49% <100.00%> (-3.67%) ⬇️ |
| src/lfx/src/lfx/services/interfaces.py | 100.00% <100.00%> (ø) |

... and 8 more

... and 4 files with indirect coverage changes


Base automatically changed from cz/add-logs-feature to main January 5, 2026 17:44
@codeflash-ai codeflash-ai bot closed this Jan 5, 2026
codeflash-ai bot commented Jan 5, 2026

This PR has been automatically closed because the original PR #10820 by Cristhianzl was closed.

@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-pr10820-2025-12-30T19.18.44 branch January 5, 2026 17:44