cookbook(openai-agents): use gpt-4.1-mini + sharpen Pattern-1 prompts#40
Open
DK09876 wants to merge 1 commit into
Two related changes to make 10-openai-agents-memory.ipynb's memory demos
work reliably end-to-end:
1. Model: gpt-4o-mini → gpt-4.1-mini in both cells where the Agent is
constructed (cell 4 explicit-tool agent, cell 9 auto-memory agent).
Pattern 2 (memory_instructions auto-inject) was returning "I currently
don't have that information" under gpt-4o-mini even though the
memory_instructions hook had injected the recalled facts into the
system prompt — the model was simply not grounding its answer on
them. With gpt-4.1-mini, cell 10 now returns:
"You use Python and SQL as your programming languages. As for
tools, you use VS Code as your code editor, and you prefer to
work in dark mode."
2. Pattern-1 prompts (cells 7 and 8): nudge the agent to call
recall_memory explicitly before answering, instead of leaving the
tool-use decision to the model. This is the same template the
Claude SDK cookbook adopted (commit d92ccab on cookbook main).
Cell 7 stays best-effort under gpt-4.1-mini's tool-use variance —
the model still sometimes asks "what's your IDE" rather than
recalling — but with the sharpened prompt the failure rate is
visibly lower, and the cell makes the user's intent (recall first,
then answer) explicit to a reader.
Pattern 2 (cell 10) is the load-bearing demo for the auto-inject path
that most users will actually adopt. Verified end-to-end against the
canonical PR #1866 build of hindsight_openai_agents.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Two related changes to `notebooks/10-openai-agents-memory.ipynb` to make the memory demos work reliably end-to-end. Both pair with the integration PR vectorize-io/hindsight#1866.

What & why
1. Model upgrade: `gpt-4o-mini` → `gpt-4.1-mini` (cells 4 and 9)

Pattern 2 (`memory_instructions` auto-inject) was returning "I currently don't have that information" under `gpt-4o-mini`. This happened even though `memory_instructions` had injected the recalled facts into the system prompt; the model was simply not grounding its answer on them. Under `gpt-4.1-mini`, cell 10 now answers from the recalled facts:

> You use Python and SQL as your programming languages. As for tools, you use VS Code as your code editor, and you prefer to work in dark mode.

Pattern 2 is the load-bearing demo for the auto-inject path most users will actually adopt.
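To make "injected the recalled facts into the system prompt" concrete, here is a minimal, hypothetical sketch of the prompt side of an auto-inject hook. `with_memory`, `BASE_INSTRUCTIONS` and the example facts are illustrative assumptions; the facts are chosen to mirror the cell 10 reply. This is not the `hindsight_openai_agents` implementation, and retrieval itself is out of scope. Separately, the OpenAI Agents SDK lets an `Agent`'s `instructions` be a callable that is evaluated per run, so a hook could plausibly plug in there. We have not checked how `memory_instructions` is actually wired.

```python
# Hypothetical sketch only: not the hindsight_openai_agents implementation.
BASE_INSTRUCTIONS = "You are a helpful assistant."


def with_memory(base: str, recalled: list[str]) -> str:
    """Return system instructions with recalled facts appended as a bullet list."""
    if not recalled:
        # Nothing recalled: leave the base instructions untouched.
        return base
    bullets = "\n".join(f"- {fact}" for fact in recalled)
    return (
        f"{base}\n\n"
        "Known facts about the user (from memory):\n"
        f"{bullets}\n"
        "Ground your answer on these facts when they are relevant."
    )


# Example facts, assumed to mirror what cell 10's reply reports.
recalled = [
    "User programs in Python and SQL.",
    "User's code editor is VS Code.",
    "User prefers dark mode.",
]
print(with_memory(BASE_INSTRUCTIONS, recalled))
```

The sketch covers only what lands in the system prompt. The `gpt-4o-mini` result above shows that the facts being present is not enough: whether the model actually answers from them depends on the model, which is why this PR pairs the prompt work with the model bump.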
2. Pattern-1 prompt sharpening (cells 7 and 8)
Same template the Claude SDK cookbook adopted (commit `d92ccab373` on cookbook main): nudge the agent to call `recall_memory` explicitly before answering, instead of leaving the tool-use decision to the model.

- Cell 7: `"What IDE do I use? And what's my job?"` → `"Recall what you know about me first, then answer: what IDE do I use and what's my job?"`
- Cell 8: the prompt now opens with the same recall-first nudge (`"Use the recall tool to refresh what you know about me first, then recommend..."`).

Cell 7 stays best-effort under `gpt-4.1-mini`'s tool-use variance. The model still sometimes asks "what's your IDE?" rather than recalling. With the sharpened prompt, however, the failure rate drops, and the cell makes the user's intent (recall first, then answer) explicit to a reader.
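If the recall-first wording is reused across cells or notebooks, a tiny helper keeps it consistent. `RECALL_FIRST` and `recall_first` are hypothetical names that do not appear in the notebook, which hard-codes its prompts. The template string is the cell 7 wording from this PR.

```python
# Hypothetical helper; the notebook itself hard-codes these prompt strings.
RECALL_FIRST = "Recall what you know about me first, then answer: {question}"


def recall_first(question: str) -> str:
    """Prefix a question with a nudge to call the recall tool before answering."""
    return RECALL_FIRST.format(question=question.strip())


print(recall_first("what IDE do I use and what's my job?"))
# -> Recall what you know about me first, then answer: what IDE do I use and what's my job?
```

This only changes the wording the model sees; tool use remains the model's decision, which is why cell 7 stays best-effort.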
Relationship to other PRs
cookbook/pattern1-prompt-polish-oa-llamaindex) for the OpenAI Agents notebook. That PR was abandoned because the prompt polish alone didn't reliably fix the Pattern 1 cells under `gpt-4o-mini`. This PR pairs the polish with the model bump, and the model bump is what makes Pattern 2 reliable.

Test plan
🤖 Generated with Claude Code