Conversation

Contributor

@fdubut fdubut commented Jan 22, 2026

Description

This PR adds support for image editing/remixing in OpenAIImageTarget, specifically the ability to pass one text prompt (always required) along with 1-16 images. It also fixes a handful of minor inconsistencies in the class parameters (different OpenAI models support different values).
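For illustration, a rough usage sketch; only OpenAIImageTarget, Message, and send_prompt_async appear in this PR, while the piece construction (MessagePiece, message_pieces, role) is an assumption made for the sake of the example:

# Hypothetical sketch; the exact message/piece construction in PyRIT may differ.
image_target = OpenAIImageTarget()  # endpoint, model, and key assumed to come from the environment

message = Message(
    message_pieces=[
        MessagePiece(role="user", original_value="Blend these photos into one scene", data_type="text"),
        MessagePiece(role="user", original_value="photo_1.png", data_type="image_path"),
        MessagePiece(role="user", original_value="photo_2.png", data_type="image_path"),
    ]
)
response = await image_target.send_prompt_async(message=message)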

Tests and Documentation

  • Updated unit tests to cover image editing/remixing and the new input type requirements.
  • Added an example of image editing/remixing to the target documentation.
  • Split up the multi-modal target notebook into four notebooks (one per output modality).

doc/_toc.yml Outdated
- file: code/targets/5.4_openai_tts_target
- file: code/targets/6_rate_limiting
- file: code/targets/7_http_target
- file: code/targets/8_openai_responses_target
Contributor

Wouldn't this also belong in the same category? At that point almost everything is in there, so we might as well not have an additional level 🙂

Contributor Author

Fair point. Yeah, maybe we can flatten the hierarchy and have 1. chat target, 2. responses target, 3. image target, 4. video target, 5. TTS target (and then renumber the other targets). Alternatively, since the multi-level sectioning works, we could have one group for all OpenAI targets with a subpage for each. Which do you prefer? I'll update the TOC accordingly.

Contributor

I'm a fan of a flat structure 🙂 Opinions may diverge here, though.

@rlundeen2 , thoughts?

Contributor Author

I flattened it in the upcoming commit; we can always add a small section later if needed.

Contributor

@romanlutz romanlutz left a comment

This is really good! Let us know what you think about the comments 🙂

image_files = []
for img_path in image_paths:
    try:
        image_files.append(open(img_path, "rb"))
Contributor

I'm unable to look it up right now, but we might have images stored in the cloud that won't be loadable this way (I think). Roman TODO: look this up

Contributor

Looked it up. The normalizer takes care of this so you should be using the serializer.

img_serializer = data_serializer_factory(
    category="prompt-memory-entries",  # folder name
    value=image_path,  # local or remote image path
    data_type="image_path"
)
image_base64 = await img_serializer.read_data_base64()

Or, if you need bytes, just use read_data():
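image_bytes = await img_serializer.read_data()  # raw bytes instead of base64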

Let me know if this works for you!

Contributor Author

Nice! I updated the code. Looks like the OpenAI API expects a tuple with file name and MIME type instead of just bytes, so I had to add a couple of extra lines. For the file name, I made a conscious choice to just generate a UUID, for simplicity and privacy (instead of sending the full image path). We can revise if needed.
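Roughly along these lines (a simplified sketch rather than the exact diff; the .png extension and MIME type are assumed here for illustration):

import uuid

# The OpenAI SDK accepts (filename, file_bytes, mime_type) tuples for image uploads;
# a random UUID as the file name keeps the local path out of the request.
image_bytes = await img_serializer.read_data()
image_files.append((f"{uuid.uuid4()}.png", image_bytes, "image/png"))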

            return [response]
        return response

    async def _send_edit_request_async(self, message: Message) -> Message:
Contributor

I'm wondering if this is meant to be a different target (with just one method being different). It is a different API, after all.

My worry: what if there's a different image API tomorrow that takes text + image + image (let's say to remix image 1 together with image 2 based on text)? Then we would not be able to accommodate that well.

Option: name it OpenAIImageEditTarget (or Remix? Whatever the API is called is best).

Contributor Author

From a technical point of view these are two endpoints of the image API, but from a user point of view they are conceptually very close: you are generating an image based on inputs, which can be text only or text + image (similar to the single chat target that takes either). In my own red teaming, I like being able to switch seamlessly from one to the other based purely on the input, rather than creating two different targets.

Also, I'm not sure I fully understand the second point. The updated target already supports text + multiple (up to 16) images 😊

Contributor

Right, I meant something different (which is the point, I suppose): the two images are combined in the way the text outlines. For example, "take the house from image 1, and the tree from image 2, ..."

If more image APIs pop up, this could come back to bite us. Tagging @rlundeen2 to determine if I'm just being paranoid 😆

Contributor Author

I am wondering if the OpenAI API supports such a scenario by referring to the file names instead of an ordinal... I'm not sure; I'll try it on Monday.
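Something like this is what I have in mind (completely untested; the file names and prompt below are hypothetical):

# Untested idea: give each upload a meaningful file name and reference it by name in the prompt.
house_bytes = open("house.png", "rb").read()
tree_bytes = open("tree.png", "rb").read()
image_files = [
    ("house.png", house_bytes, "image/png"),
    ("tree.png", tree_bytes, "image/png"),
]
prompt = "Take the house from house.png and put it next to the tree from tree.png."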

    await image_target.send_prompt_async(message=request)
finally:
    if os.path.isfile(image_piece.original_value):
        os.remove(image_piece.original_value)
Contributor

We also need integration tests, ideally for the new functionality (both Entra and API key based) but also for the various models. I can help identify what we have so far.

Contributor Author

That would be great! I have never touched the integration tests...
