GPT-4 with Vision (GPT-4V) enables users to instruct GPT-4 to analyze image inputs provided by the user, and is the latest capability we are making broadly available. Incorporating additional modalities (such as image inputs) into large language models (LLMs) is viewed by some as a key frontier in artificial intelligence research and development. Multimodal LLMs offer the possibility of expanding the impact of language-only systems with novel interfaces and capabilities, enabling them to solve new tasks and provide novel experiences for their users. In this system card, we analyze the safety properties of GPT-4V. Safety work for GPT-4V builds on the work done for GPT-4, and here we dive deeper into the evaluations, preparation, and mitigation work done specifically for image inputs.