cookbook(openai-agents): use gpt-4.1-mini + sharpen Pattern-1 prompts#40

Open
DK09876 wants to merge 1 commit into main from cookbook/openai-agents-model-prompt-polish

@DK09876 commented Jun 4, 2026 (Contributor)

Summary

Two related changes to notebooks/10-openai-agents-memory.ipynb to make the memory demos work reliably end-to-end. Both pair with the integration PR vectorize-io/hindsight#1866.

What & why

1. Model upgrade: gpt-4o-mini → gpt-4.1-mini (cells 4 and 9)

Pattern 2 (memory_instructions auto-inject) was returning "I currently don't have that information" under gpt-4o-mini even though memory_instructions had injected the recalled facts into the system prompt — the model was simply not grounding its answer on them. Under gpt-4.1-mini, cell 10 now returns the recalled facts verbatim:

"You use Python and SQL as your programming languages. As for tools, you use VS Code as your code editor, and you prefer to work in dark mode."

Pattern 2 is the load-bearing demo for the auto-inject path most users will actually adopt.
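
For readers new to the auto-inject path, the mechanism described above can be pictured roughly as follows. This is a minimal sketch in plain Python, not the hindsight-openai-agents implementation: `build_instructions` and `BASE_INSTRUCTIONS` are hypothetical names, and the fact list is hard-coded where a real recall call would go.

```python
# Hypothetical illustration of auto-injecting recalled facts into a system prompt.
# None of these names come from hindsight-openai-agents.

BASE_INSTRUCTIONS = "You are a helpful assistant."


def build_instructions(base: str, recalled_facts: list[str]) -> str:
    """Append recalled facts to the base system prompt, if there are any."""
    if not recalled_facts:
        # Nothing recalled: leave the base prompt untouched.
        return base
    bullet_list = "\n".join(f"- {fact}" for fact in recalled_facts)
    return f"{base}\n\nKnown facts about the user:\n{bullet_list}"


# Stand-in for the facts a recall step might return.
facts = ["Uses Python and SQL", "Uses VS Code", "Prefers dark mode"]
system_prompt = build_instructions(BASE_INSTRUCTIONS, facts)
print(system_prompt)
```

The point of the sketch is that the facts are already in the prompt before the model runs, so a "don't have that information" answer indicates a grounding problem in the model rather than a missing recall.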

2. Pattern-1 prompt sharpening (cells 7 and 8)

Same template the Claude SDK cookbook adopted (commit d92ccab373 on cookbook main): nudge the agent to call recall_memory explicitly before answering, instead of leaving the tool-use decision to the model.

  • Cell 7: "What IDE do I use? And what's my job?" → "Recall what you know about me first, then answer: what IDE do I use and what's my job?"
  • Cell 8: same template ("Use the recall tool to refresh what you know about me first, then recommend...").
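
The recall-first template could be factored into a small helper so both cells share one wording. This is only a sketch: `RECALL_FIRST_TEMPLATE` and `recall_first` are hypothetical names that do not appear in the notebook, and the template string is taken from the cell 7 prompt above.

```python
# Hypothetical helper; the notebook itself inlines the prompts.
RECALL_FIRST_TEMPLATE = "Recall what you know about me first, then answer: {question}"


def recall_first(question: str) -> str:
    """Wrap a user question so the agent is told to recall before answering."""
    return RECALL_FIRST_TEMPLATE.format(question=question.strip())


cell_7_prompt = recall_first("what IDE do I use and what's my job?")
print(cell_7_prompt)
```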

Cell 7 stays best-effort under gpt-4.1-mini's tool-use variance — the model still sometimes asks "what's your IDE?" rather than recalling — but with the sharpened prompt the failure rate drops and the cell makes the user's intent explicit to a reader.
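
A reviewer who wants to put a number on that failure rate could repeat the cell and count the runs in which the recall tool was not called. The sketch below assumes a hypothetical `run_once` callable that returns `True` when the agent recalled; the outcomes fed to it here are made-up stand-ins, not measurements from this PR.

```python
from typing import Callable


def recall_failure_rate(run_once: Callable[[], bool], trials: int = 20) -> float:
    """Return the fraction of trials in which the agent did NOT recall."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    failures = sum(1 for _ in range(trials) if not run_once())
    return failures / trials


# Illustrative stub only: pretend the agent recalled on 3 of 4 runs.
outcomes = iter([True, True, False, True])
rate = recall_failure_rate(lambda: next(outcomes), trials=4)
print(rate)  # 0.25
```

In a real check, `run_once` would run cell 7's prompt through the agent and inspect the run for a `recall_memory` tool call; how to detect that depends on the SDK and is left out here.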

Relationship to other PRs

Test plan

  • Cookbook 11/11 cells execute against canonical PR #1866 build, 0 errors.
  • Cell 10 (Pattern 2 auto-inject) reliably returns recalled facts (Python/SQL/VS Code/dark mode).
  • Cell 7 (Pattern 1 tool recall) is best-effort under gpt-4.1-mini; cookbook narrative now matches what the user can expect.
  • Reviewer to run end-to-end against the merged hindsight-openai-agents.

🤖 Generated with Claude Code

Two related changes to make 10-openai-agents-memory.ipynb's memory demos
work reliably end-to-end:

1. Model: gpt-4o-mini → gpt-4.1-mini in both cells where the Agent is
   constructed (cell 4 explicit-tool agent, cell 9 auto-memory agent).
   Pattern 2 (memory_instructions auto-inject) was returning "I currently
   don't have that information" under gpt-4o-mini even though the
   memory_instructions hook had injected the recalled facts into the
   system prompt — the model was simply not grounding its answer on
   them. With gpt-4.1-mini, cell 10 now returns:
     "You use Python and SQL as your programming languages.  As for
      tools, you use VS Code as your code editor, and you prefer to
      work in dark mode."

2. Pattern-1 prompts (cells 7 and 8): nudge the agent to call
   recall_memory explicitly before answering, instead of leaving the
   tool-use decision to the model.  This is the same template the
   Claude SDK cookbook adopted (commit d92ccab on cookbook main).
   Cell 7 stays best-effort under gpt-4.1-mini's tool-use variance —
   the model still sometimes asks "what's your IDE" rather than
   recalling — but with the sharpened prompt the failure rate is
   visibly lower, and the cell makes the user's intent (recall first,
   then answer) explicit to a reader.

Pattern 2 (cell 10) is the load-bearing demo for the auto-inject path
that most users will actually adopt.  Verified end-to-end against the
canonical PR #1866 build of hindsight_openai_agents.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>