Skip to content

Replace manual JSON parsing with Instructor for structured output reliability#339

Open
pegahmansourian wants to merge 1 commit into
VectifyAI:mainfrom
pegahmansourian:fix/replace-manual-json-parsing-with-instructor
Open

Replace manual JSON parsing with Instructor for structured output reliability#339
pegahmansourian wants to merge 1 commit into
VectifyAI:mainfrom
pegahmansourian:fix/replace-manual-json-parsing-with-instructor

Conversation

@pegahmansourian

Copy link
Copy Markdown

Problem

The codebase uses manual JSON prompt instructions across 11 locations in
page_index.py, with a custom extract_json() parser in utils.py that
handles malformed output through multiple fallback strategies.

When the LLM returns slightly malformed JSON, extract_json() silently
returns {}, causing downstream KeyError crashes like:

json_content['completed']  # KeyError if extract_json returned {}

Solution

Replace manual JSON parsing with Instructor,
which is patched directly onto the existing LiteLLM client already used in the codebase.

Changes

utils.py

  • Add sync_instructor_client and async_instructor_client patched via instructor.from_litellm()
  • Add llm_structured() for sync structured calls
  • Add llm_astructured() for async structured calls
  • extract_json() and get_json_content() can be removed once all callers are updated

page_index.py

  • Add Pydantic models for all 11 structured outputs
  • Replace all extract_json() calls with llm_structured() / llm_astructured()
  • Remove "Directly return the final JSON structure" from all prompts — Instructor handles this automatically

Benefits

  • Removes 40 lines of fragile JSON parsing with multiple fallback hacks
  • Adds automatic retry on malformed output — no more silent {} returns
  • Adds type safety via Pydantic — Literal["yes", "no"] fields are always valid
  • No behavior change — same LiteLLM routing, same models, same logic
  • instructor is the only new dependency

New dependency

This PR adds one new dependency: instructor

Add to requirements.txt:

instructor

Or install directly:

pip install instructor

…iability

- Add llm_structured() and llm_astructured() in utils.py using instructor.from_litellm()
- Add Pydantic models for all 11 structured outputs in page_index.py
- Replace all extract_json() calls with typed Instructor calls
- Remove 'Directly return the final JSON structure' from all prompts
- Fixes silent KeyError crashes when extract_json() returned {} on malformed LLM output
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant