
fix(lfx): read local files under S3 storage to unblock the Assistant #13862

Open
Cristhianzl wants to merge 1 commit into release-1.11.0 from cz/fix-s3-assistant
Conversation

@Cristhianzl

@Cristhianzl Cristhianzl commented Jun 26, 2026


What

When LANGFLOW_STORAGE_TYPE=s3 is configured, the storage-aware file readers (read_file_bytes, read_file_text, and get_file_size) incorrectly routed every path through the S3 key parser, including genuine local filesystem paths.

As a result, attempting to read an absolute local path raised:

ValueError: Invalid S3 path format: /app/.venv/.../lfx/components/_importing.py.
Expected 'flow_id/filename'

This broke the Langflow Assistant. It injects the installed lfx components directory into a Directory node (inject_lfx_components_path), which scans the local filesystem and passes absolute file paths to parse_text_file_to_data(). Since those paths were incorrectly treated as S3 object keys, the read failed with:

Error building Component Directory

Fixes #13798.
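
To make the failure mode concrete, here is a minimal sketch. It assumes a simplified stand-in for the S3 key parser: `parse_key` is hypothetical and is not the lfx implementation; it only mimics the 'flow_id/filename' expectation quoted in the error above. The absolute path used in the demo is also made up for illustration.

```python
def parse_key(path: str) -> tuple[str, str]:
    """Hypothetical stand-in: split an S3-style key into (flow_id, filename)."""
    parts = path.split("/")
    # Only keys with exactly two non-empty segments are accepted.
    if len(parts) != 2 or not all(parts):
        msg = f"Invalid S3 path format: {path}. Expected 'flow_id/filename'"
        raise ValueError(msg)
    return parts[0], parts[1]


def describe(path: str) -> str:
    """Report how the stand-in parser treats a given path."""
    try:
        flow_id, filename = parse_key(path)
    except ValueError as exc:
        return f"rejected: {exc}"
    return f"accepted: flow_id={flow_id}, filename={filename}"


# A relative key parses; an absolute local path (illustrative) does not.
relative_result = describe("flow123/notes.txt")
absolute_result = describe("/opt/example/components/module.py")
```

An absolute path starts with an empty segment and usually has more than two parts, so any parser of this shape rejects it. That is why a local path has to be recognized before the key parser runs.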

Summary by CodeRabbit

  • Bug Fixes
    • Local files can now be read and sized correctly even when storage is set to S3.
    • Absolute file paths are treated as local files, preventing invalid S3 path errors.
    • S3-style relative file keys still route to storage correctly and won’t be mistaken for local files.
    • Text parsing now works consistently for existing local files in S3 mode.

@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Jun 26, 2026
@coderabbitai

coderabbitai Bot commented Jun 26, 2026


Walkthrough

The PR updates file-reading and file-size helpers to treat existing absolute local paths as local during S3 mode. It also adds regression tests for local-path reads, S3-key routing, and parse_text_file_to_data with a real local file.

Changes

Local files during S3 storage mode

| Layer / File(s) | Summary |
| --- | --- |
| Local-path fallback in storage helpers: src/lfx/src/lfx/base/data/storage_utils.py | Introduces _is_existing_local_file and uses it in the S3 branches of read_file_bytes and get_file_size to read existing absolute local paths from disk. |
| Regression coverage for S3 mode: src/lfx/tests/unit/base/data/test_storage_utils.py, src/lfx/tests/unit/base/data/test_utils.py | Adds tests for absolute local reads, S3-key routing with a colliding local file, read_file_text, get_file_size, and parse_text_file_to_data under storage_type="s3". |
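
The two routing cases named above (an absolute local read, and an S3 key that collides with a local file) can be sketched as a standalone check. This is only an illustration under stated assumptions: `FakeStorage` and `read_bytes` are hypothetical, synchronous stand-ins, not the PR's async helpers or its actual tests.

```python
import os
import tempfile
from pathlib import Path


class FakeStorage:
    """Hypothetical in-memory stand-in for a storage service."""

    def __init__(self, objects: dict[str, bytes]) -> None:
        self.objects = objects
        self.calls: list[str] = []

    def get(self, key: str) -> bytes:
        self.calls.append(key)
        return self.objects[key]


def read_bytes(path: str, storage_type: str, storage: FakeStorage) -> bytes:
    """Simplified router: existing absolute files come from disk, the rest from storage."""
    p = Path(path)
    if storage_type == "s3" and not (p.is_absolute() and p.is_file()):
        return storage.get(path)
    return p.read_bytes()


def run_checks() -> list[str]:
    storage = FakeStorage({"flow_id/data.txt": b"from s3"})
    with tempfile.TemporaryDirectory() as tmp:
        # Case 1: an absolute local path is read from disk; storage is untouched.
        local = Path(tmp) / "local.txt"
        local.write_bytes(b"from disk")
        assert read_bytes(str(local), "s3", storage) == b"from disk"
        assert storage.calls == []

        # Case 2: a relative key still goes to storage, even though a
        # same-named file exists relative to the current working directory.
        collide = Path(tmp) / "flow_id" / "data.txt"
        collide.parent.mkdir()
        collide.write_bytes(b"local collision")
        old_cwd = os.getcwd()
        os.chdir(tmp)
        try:
            assert read_bytes("flow_id/data.txt", "s3", storage) == b"from s3"
        finally:
            os.chdir(old_cwd)
    return storage.calls


storage_calls = run_checks()
```

The second case is the reason the check is limited to absolute paths: with a plain existence test, the colliding file under the working directory would shadow the S3 object.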

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 9
✅ Passed checks (9 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the core fix: allowing local files to be read while S3 storage is configured. |
| Linked Issues check | ✅ Passed | The changes directly address issue #13798 by routing existing absolute local paths to disk reads instead of S3 parsing. |
| Out of Scope Changes check | ✅ Passed | The code and tests stay focused on storage-aware local-file handling under S3 mode and do not introduce unrelated scope. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Test Coverage For New Implementations | ✅ Passed | Updated test_storage_utils.py and test_utils.py add real regression coverage for the S3/local-path fix, including read_file_bytes, read_file_text, get_file_size, and parse_text_file_to_data. |
| Test Quality And Coverage | ✅ Passed | New pytest regression tests cover bytes/text/size and parse_text_file_to_data, assert routing/no-S3 calls, and use proper async patterns. |
| Test File Naming And Structure | ✅ Passed | The added tests are backend pytest files named test_*.py, use clear class/test names, and include positive and negative cases with proper setup via tmp_path/patch. |
| Excessive Mock Usage Warning | ✅ Passed | Mocks are limited to settings/storage-service boundaries; the tests still exercise real tmp_path filesystem reads and stats, so mock usage isn't excessive. |

@github-actions


✅ Test Coverage Advisor

No source changes detected without accompanying tests. Thanks for keeping coverage up! 🎉

Advisory check only — never blocks merge.

@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Jun 26, 2026
@codecov

codecov Bot commented Jun 26, 2026


Codecov Report

❌ Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.07%. Comparing base (dc39921) to head (f291685).
⚠️ Report is 1 commit behind head on release-1.11.0.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/lfx/src/lfx/base/data/storage_utils.py | 80.00% | 2 Missing ⚠️ |
Additional details and impacted files


@@                Coverage Diff                 @@
##           release-1.11.0   #13862      +/-   ##
==================================================
- Coverage           59.65%   59.07%   -0.58%     
==================================================
  Files                2346     2347       +1     
  Lines              224374   225111     +737     
  Branches            31392    34472    +3080     
==================================================
- Hits               133841   132981     -860     
- Misses              89000    90595    +1595     
- Partials             1533     1535       +2     
| Flag | Coverage Δ | |
| --- | --- | --- |
| backend | 67.24% <ø> (+0.04%) | ⬆️ |
| frontend | 57.51% <ø> (-0.90%) | ⬇️ |
| lfx | 56.65% <80.00%> (+0.03%) | ⬆️ |


| Files with missing lines | Coverage Δ | |
| --- | --- | --- |
| src/lfx/src/lfx/base/data/storage_utils.py | 63.06% <80.00%> (+1.67%) | ⬆️ |

... and 288 files with indirect coverage changes
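
The percentages in the coverage diff can be checked from the raw counts. The sketch below assumes coverage is hits divided by total lines; that assumption is consistent with the report, since hits + misses + partials equals the line count on both sides. The patch line count is inferred from the 80% figure and is not stated in the report.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage, rounded to two decimals."""
    return round(100 * hits / lines, 2)


# Raw counts taken from the coverage diff (base vs. head).
base = {"lines": 224374, "hits": 133841, "misses": 89000, "partials": 1533}
head = {"lines": 225111, "hits": 132981, "misses": 90595, "partials": 1535}

# Sanity check: the three buckets add up to the line total on each side.
for side in (base, head):
    assert side["hits"] + side["misses"] + side["partials"] == side["lines"]

base_pct = coverage_pct(base["hits"], base["lines"])  # 59.65
head_pct = coverage_pct(head["hits"], head["lines"])  # 59.07
delta = round(head_pct - base_pct, 2)                 # -0.58

# Patch coverage: 2 missing lines at 80% implies about 10 measured lines.
patch_lines = round(2 / (1 - 0.80))
```

Most of the project-level drop comes from the 288 files with indirect changes, not from this patch; the patch itself raises coverage of storage_utils.py by 1.67 points.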


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lfx/src/lfx/base/data/storage_utils.py`:
- Around line 27-42: The helper _is_existing_local_file is classifying absolute
paths too narrowly by requiring the file to already exist, which lets missing
absolute local paths fall through to parse_storage_path() and surface the wrong
error. Update _is_existing_local_file so any absolute path is treated as a local
file candidate, and let the later filesystem read/stat path in the storage
helpers surface FileNotFoundError instead of ValueError. Make the same
adjustment anywhere this helper is used so absolute paths are handled
consistently before S3 key parsing.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1bcfd800-c931-45b4-84dd-32bf3e60c90b

📥 Commits

Reviewing files that changed from the base of the PR and between d2d1bdc and f291685.

📒 Files selected for processing (3)
  • src/lfx/src/lfx/base/data/storage_utils.py
  • src/lfx/tests/unit/base/data/test_storage_utils.py
  • src/lfx/tests/unit/base/data/test_utils.py

Comment on lines +27 to +42
def _is_existing_local_file(file_path: str) -> bool:
    """Return True when file_path is an absolute path to a real local file.

    Under S3 storage some callers still hand us genuine local paths (e.g. the
    Langflow Assistant injects the installed lfx components dir into a Directory
    node). A real local file is never an S3 key, so it must be read from disk
    instead of being parsed as "flow_id/filename". See issue #13798.

    The check is restricted to ABSOLUTE paths on purpose: S3 keys are always
    relative ("flow_id/filename"), so requiring an absolute path makes it
    impossible to mistake a relative S3 key for a same-named file that happens
    to exist relative to the process CWD.
    """
    try:
        path_obj = Path(file_path)
        return path_obj.is_absolute() and path_obj.is_file()
    except OSError:
        return False


🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Classify absolute local paths before checking existence.

Absolute local paths are never valid S3 keys. With the current helper, a missing path like /tmp/missing.txt still falls through to parse_storage_path() and raises ValueError instead of the documented FileNotFoundError. Treat any absolute path as local here and let the filesystem read/stat surface the real error.

Suggested fix
-def _is_existing_local_file(file_path: str) -> bool:
-    """Return True when file_path is an absolute path to a real local file.
+def _is_absolute_local_path(file_path: str) -> bool:
+    """Return True when file_path is an absolute local filesystem path.
@@
     """
     try:
-        path_obj = Path(file_path)
-        return path_obj.is_absolute() and path_obj.is_file()
+        return Path(file_path).is_absolute()
     except OSError:
         return False
@@
-        if _is_existing_local_file(file_path):
+        if _is_absolute_local_path(file_path):
             return Path(file_path).read_bytes()
@@
-        if _is_existing_local_file(file_path):
+        if _is_absolute_local_path(file_path):
             return Path(file_path).stat().st_size

Also applies to: 91-93, 180-182
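
The difference between the two helpers shows up only for a missing absolute path. The sketch below mirrors both checks from the diff above but wraps them in a hypothetical `read_under_s3` router whose fallback simply raises the S3-format `ValueError`; that fallback is an assumption standing in for parse_storage_path(), not the real function.

```python
import tempfile
from collections.abc import Callable
from pathlib import Path


def is_existing_local_file(file_path: str) -> bool:
    """The PR's check: absolute AND already present on disk."""
    try:
        p = Path(file_path)
        return p.is_absolute() and p.is_file()
    except OSError:
        return False


def is_absolute_local_path(file_path: str) -> bool:
    """The reviewer's suggested check: being absolute is enough."""
    try:
        return Path(file_path).is_absolute()
    except OSError:
        return False


def read_under_s3(file_path: str, is_local: Callable[[str], bool]) -> bytes:
    """Hypothetical router: local read if classified local, else S3 parse error."""
    if is_local(file_path):
        return Path(file_path).read_bytes()
    # Stand-in for the S3 key parser rejecting a non-key path.
    msg = f"Invalid S3 path format: {file_path}. Expected 'flow_id/filename'"
    raise ValueError(msg)


def error_type(file_path: str, is_local: Callable[[str], bool]) -> str:
    """Name of the exception raised for file_path, or 'no error'."""
    try:
        read_under_s3(file_path, is_local)
    except (ValueError, FileNotFoundError) as exc:
        return type(exc).__name__
    return "no error"


with tempfile.TemporaryDirectory() as tmp:
    missing = str(Path(tmp) / "missing.txt")  # absolute, but never created
    with_existence_check = error_type(missing, is_existing_local_file)
    with_absolute_check = error_type(missing, is_absolute_local_path)
```

With the existence check the missing file surfaces as `ValueError`; with the absolute-only check the filesystem read raises `FileNotFoundError`, which is the error the review comment describes as documented.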


@erichare erichare self-requested a review June 26, 2026 15:47

@erichare erichare left a comment


@github-actions github-actions Bot added the lgtm This PR has been approved by a maintainer label Jun 26, 2026
@github-actions


Frontend Unit Test Coverage Report

Coverage Summary

| Lines | Statements | Branches | Functions |
| --- | --- | --- | --- |
| Coverage: 44% | 44.14% (59466/134705) | 69.43% (8100/11665) | 42.6% (1349/3166) |

Unit Test Results

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 5083 | 0 💤 | 0 ❌ | 0 🔥 | 15m 15s ⏱️ |

@itxaiohanglover


Nice fix! Checking for existing local files before treating paths as S3 keys prevents the Assistant's local component paths from being parsed as storage keys. The absolute-path restriction is a smart guard against false positives.


Labels

bug Something isn't working lgtm This PR has been approved by a maintainer


3 participants