FEAT: Support image editing/remix in OpenAIImageTarget #1322
base: main
Conversation
doc/_toc.yml (outdated)
- file: code/targets/5.4_openai_tts_target
- file: code/targets/6_rate_limiting
- file: code/targets/7_http_target
- file: code/targets/8_openai_responses_target
Wouldn't this also belong in the same category? At that point almost everything is in there, so we might as well not have an additional level 🙂
Fair point. Maybe we can flatten the hierarchy and have 1. Chat target, 2. Responses target, 3. Image target, 4. Video target, 5. TTS target (and then renumber the other targets). Alternatively, since multi-level sectioning works, we could have one group for all OpenAI targets with a subpage for each target. Which do you prefer? I'll update the TOC accordingly.
I'm a fan of a flat structure 🙂 Opinions may diverge here, though.
@rlundeen2, thoughts?
In the upcoming commit I flattened it; we can always add a small section if needed.
romanlutz left a comment
This is really good! Let us know what you think about the comments 🙂
image_files = []
for img_path in image_paths:
    try:
        image_files.append(open(img_path, "rb"))
I'm unable to look it up right now, but we might have images stored in the cloud that won't be loadable this way (I think). Roman TODO: look this up.
Looked it up. The normalizer takes care of this so you should be using the serializer.
img_serializer = data_serializer_factory(
    category="prompt-memory-entries",  # folder name
    value=image_path,  # local or remote image path
    data_type="image_path",
)
image_base64 = await img_serializer.read_data_base64()
Or, if you need raw bytes, just use read_data().
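For example (same serializer object as above):
image_bytes = await img_serializer.read_data()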
Let me know if this works for you!
Nice! I updated the code. It looks like the OpenAI API expects a tuple with file name and MIME type instead of just bytes, so I had to add a couple of extra lines. For the file name, I made a conscious choice to just generate a UUID for simplicity and privacy (instead of sending the full image path). We can revise if needed.
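For reference, here is a rough sketch of what that construction could look like; the helper name, the import path, and the exact tuple layout are my assumptions from this thread, not necessarily the PR's final code:

import mimetypes
import uuid

from pyrit.models import data_serializer_factory  # assumed import path


async def _to_upload_tuple(image_path: str) -> tuple[str, bytes, str]:
    # Read the image through the serializer so cloud-stored images work too.
    serializer = data_serializer_factory(
        category="prompt-memory-entries",
        value=image_path,
        data_type="image_path",
    )
    image_bytes = await serializer.read_data()

    # The OpenAI SDK accepts (filename, file_bytes, mime_type) tuples for uploads.
    # A random UUID as the filename keeps the original local path out of the request.
    mime_type, _ = mimetypes.guess_type(image_path)
    mime_type = mime_type or "image/png"
    extension = mimetypes.guess_extension(mime_type) or ".png"
    return (f"{uuid.uuid4()}{extension}", image_bytes, mime_type)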
return [response]
return response


async def _send_edit_request_async(self, message: Message) -> Message:
I'm wondering if this is meant to be a different target (with just one method different). It is a different API, after all.
My worry: what if a different image API shows up tomorrow that takes text + image + image (say, to remix image 1 together with image 2 based on the text)? Then we would not be able to accommodate that well.
Option: name it OpenAIImageEditTarget (or Remix? Whatever the API is called is best).
From a technical point of view these are two endpoints of the image API, but from a user point of view they are conceptually very close: you are generating an image based on inputs, which can be text only or text + image (similar to the single chat target that takes text only or text + image). In my own red teaming, I like being able to switch seamlessly from one to the other based purely on the input, rather than creating two different targets.
Also, I'm not sure I fully understand the second point. The updated target already supports text + multiple (up to 16) images 😊
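To illustrate the seamless switching, a minimal dispatch sketch; the piece attribute names and the generation method name are assumptions, not the PR's exact implementation:

async def send_prompt_async(self, *, message: Message) -> Message:
    # Assumption: each piece exposes converted_value_data_type as in other PyRIT targets.
    has_image = any(
        piece.converted_value_data_type == "image_path"
        for piece in message.message_pieces
    )
    if has_image:
        # Text + 1-16 images -> the edit/remix endpoint.
        return await self._send_edit_request_async(message=message)
    # Text only -> the plain image generation endpoint (method name is illustrative).
    return await self._send_generation_request_async(message=message)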
Right, I meant something different (which is the point, I suppose). I meant that the two images are combined in a way the text outlines. For example, "take the house from image 1, and the tree from image 2, ..."
If more image APIs pop up, this could come back to bite us. Tagging @rlundeen2 to determine if I'm paranoid 😆
I am wondering if the OpenAI API supports such a scenario by using the file names instead of an ordinal... I'm not sure, I'll try on Monday.
    await image_target.send_prompt_async(message=request)
finally:
    if os.path.isfile(image_piece.original_value):
        os.remove(image_piece.original_value)
We also need integration tests: ideally for the new functionality (both Entra ID and API key based), but also for the various models. I can help identify what we have so far.
That would be great! I have never touched the integration tests...
Description
This PR adds support for image editing/remix in OpenAIImageTarget, specifically being able to pass one text prompt (always required) and 1-16 images. It also fixes a handful of minor inconsistencies with class parameters (different OpenAI models support different values).
Tests and Documentation
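For context, a hedged usage sketch of the remix flow. The Message/message_pieces naming follows the review snippets above, while MessagePiece, the import paths, and the constructor defaults are assumptions rather than details verified against the final PR:

import asyncio

from pyrit.models import Message, MessagePiece  # class names assumed from the snippets above
from pyrit.prompt_target import OpenAIImageTarget  # assumed import path


async def main():
    target = OpenAIImageTarget()  # endpoint/key typically come from environment variables

    # One text piece (required) plus one or more image pieces (up to 16)
    # triggers the edit/remix path instead of plain generation.
    message = Message(
        message_pieces=[
            MessagePiece(
                role="user",
                original_value="Turn this photo into a watercolor painting",
                original_value_data_type="text",
            ),
            MessagePiece(
                role="user",
                original_value="house.png",
                original_value_data_type="image_path",
            ),
        ]
    )
    response = await target.send_prompt_async(message=message)
    print(response)


asyncio.run(main())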