Adding AI Vision to SketchUp with OpenAI's gpt-4o Model (Ruby Snippet)

Earlier this week, OpenAI released its new gpt-4o model. This model not only improves on OpenAI’s existing models but also adds vision capability: a user can upload an image, and the AI can then identify things in it based on a prompt. There are some great examples of the new capability on OpenAI’s website.

Since I previously implemented the OpenAI connection with SketchUp using both plain Ruby code and a handy extension, I had to find out whether the two can be connected using these new capabilities, too. One use case would be the ability to ask questions like “What may be missing in my model?” or “Is there a cat in the model?” (Not sure why you would ask the latter, but it does lead to a correct answer!)

As it turns out, giving SketchUp this AI capability is not too hard to do. All I needed was to modify my old code a bit and add the ability to automatically upload a Base64-encoded screenshot image together with the prompt. You can try this out for yourself using the code snippet at the end of this post. All you need is an account with OpenAI.

Example Results

Using a very simple 3D model, I first asked “Is there anything wrong with this building?” As you can see in the image at the top of my post, the AI suggested some valid corrections: the columns are too thin, they are badly placed, and the building lacks detail. These are all valid items that I could then address in my next modeling steps.

Another run using the prompt “How can I improve this model?” then provided the suggestions copied here verbatim:

Improving the model can involve several steps depending on the purpose and desired level of detail. Here are some suggestions:

1. **Add Textures and Colors:**

- Apply textures to the walls, roof, and columns to make the model more realistic. For example, you can use brick or stone textures for the walls and a shingle texture for the roof.

- Add colors to different elements to distinguish them and make the model visually appealing.

2. **Detail the Columns:**

- Add fluting (vertical grooves) to the columns to give them a classical look.

- Add bases and capitals to the columns for more architectural detail.

3. **Enhance the Roof:**

- Add roof tiles or shingles to give the roof a more realistic appearance.

- Include details like gutters and downspouts.

4. **Add Windows and Doors:**

- Incorporate windows and doors to make the building functional and realistic.

- Add frames, sills, and other details to the windows and doors.

5. **Improve the Surroundings:**

- Add landscaping elements such as grass, trees, and pathways around the building.

- Include additional elements like benches, lights, or other outdoor furniture.

6. **Scale and Proportion:**

- Ensure that the proportions of the building and its elements are accurate and realistic.

- Compare the size of the building to the human figure to ensure proper scale.

7. **Interior Details:**

- If the model is meant to show the interior, add interior walls, furniture, and other details.

- Include lighting fixtures and other interior elements.

8. **Structural Details:**

- Add structural details like beams, trusses, and supports if the model is meant to show construction details.

9. **Refine Geometry:**

- Ensure that all edges and surfaces are clean and well-defined.

- Remove any unnecessary geometry to optimize the model.

10. **Add Context:**

- Place the building in a context that makes sense, such as a cityscape, countryside, or other relevant environment.

By incorporating these suggestions, you can significantly enhance the realism and detail of your model.

The image below shows another example and its response. This time the prompt was “Interpret this art piece.”

Code Snippet

You can try this for yourself: Once you sign up with OpenAI and get yourself an OpenAI API key, paste the following code into the Ruby Code Editor (or SketchUp’s built-in Ruby Console), add your own API key, and then hit Run!

This code relies on Ruby’s Base64 module to encode the image as text, which is why we need to include require 'base64' at the top. Next, we save a screenshot of the current SketchUp view using the write_image method of the model’s active_view. The OpenAI API request is then similar to what we have used before, only now the message also needs an image_url entry that delivers the encoded image file.
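The steps described above can be sketched roughly as follows. This is a minimal illustration and not necessarily the exact snippet from this post: the helper names build_vision_payload and ask_openai are hypothetical, the endpoint and message layout follow OpenAI's chat completions API as I understand it, and the image size, max_tokens value, temp file name, and 'YOUR_API_KEY' placeholder are arbitrary assumptions.

```ruby
require 'base64'
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: builds the request body with one user message that
# carries both the text prompt and the screenshot as a Base64 data URL.
def build_vision_payload(prompt, image_bytes, model: 'gpt-4o')
  data_url = "data:image/png;base64,#{Base64.strict_encode64(image_bytes)}"
  {
    model: model,
    max_tokens: 500, # arbitrary cap on the length of the answer
    messages: [
      {
        role: 'user',
        content: [
          { type: 'text', text: prompt },
          { type: 'image_url', image_url: { url: data_url } }
        ]
      }
    ]
  }
end

# Hypothetical helper: POSTs the payload and returns the text of the first
# choice, or nil if the response contains no choices (e.g. on an API error).
def ask_openai(payload, api_key)
  uri = URI('https://api.openai.com/v1/chat/completions')
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'application/json'
  request['Authorization'] = "Bearer #{api_key}"
  request.body = JSON.generate(payload)
  response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) do |http|
    http.request(request)
  end
  JSON.parse(response.body).dig('choices', 0, 'message', 'content')
end

# This part only runs inside SketchUp, where the Sketchup module exists.
if defined?(Sketchup)
  path = File.join(Sketchup.temp_dir, 'openai_view.png')
  # Save the current view as a PNG screenshot.
  Sketchup.active_model.active_view.write_image(
    { filename: path, width: 1024, height: 768, antialias: true }
  )
  payload = build_vision_payload('Is there anything wrong with this building?',
                                 File.binread(path))
  puts ask_openai(payload, 'YOUR_API_KEY') # replace with your own key
end
```

Keeping the payload construction separate from the network call makes it easy to inspect what would be sent before spending any API credits.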

If you like, you can tweak the detail parameter of the submission, which controls the level of image detail used for the request. Just keep in mind that higher detail also costs more money. Information about that can be found on OpenAI’s pages.
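As a small sketch of where the detail setting would go: it sits next to the url inside the image_url entry of the message. The helper name image_part is hypothetical, and the accepted values low, high, and auto are the ones I understand OpenAI's vision documentation to list, so please check them against the current documentation.

```ruby
# Hypothetical helper: builds the image entry of a message with an explicit
# detail level. The allowed values are an assumption based on OpenAI's docs.
def image_part(data_url, detail: 'auto')
  valid = %w[low high auto]
  unless valid.include?(detail)
    raise ArgumentError, "detail must be one of: #{valid.join(', ')}"
  end
  { type: 'image_url', image_url: { url: data_url, detail: detail } }
end

# Example: request the cheaper low-detail processing for a quick check.
part = image_part('data:image/png;base64,AAAA', detail: 'low')
puts part[:image_url][:detail]
```

Starting with low detail for quick questions and switching to high only when the answer misses fine features is one way to keep costs down.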

Tutorial Video

I have just implemented this capability into my OpenAI Explorer (Experimental) extension (version 2.3). Check out the following video for some details.

Is This Useful?

I am not entirely sure yet how much this will help with my daily workflow. The examples above are the most useful ones I have found so far. There are also some important limitations with this model at the moment: it cannot count items very well, and the quality of its analysis depends on the detail in the image, which may require a higher resolution, and therefore a higher cost, for some tasks.

What is your impression of this? Can you foresee this becoming a useful tool? Mention it in the comments below. If there’s enough demand, I might add this capability to my extension as well.