Immersive translation is a visual experience improvement for camera-input translation


Lens and Translate support a magical live-view translation that enables a scan-and-read experience. In many cases, though, translated results on visually rich surfaces cannot replace the original text well. As the lead visual designer, I explored and launched visual improvements to both the live camera view and the post-capture screen.


Problem

The in-line translate filter distinguishes itself from other OCR-based verticals by adding an extra layer between the original image and the translated text. Lens conceals the original text with this layer, then renders the translation results on top of it. This approach works well for black text on white paper, but translating text within visual imagery often produces results that do not blend in seamlessly. When the in-painting layer fails to stay immersive, it creates ‘visual busyness’.

 
 

Visual clutter is not the only concern; research participants noted that live translations are difficult to read because the in-painting is inconsistent. One participant stated, “The boxes behind the text flicker frequently in the camera view, and their color changes as I move my hand, making it hard to read the translation.”


How translation layers are rendered


The background of a translation layer is rendered by picking a dominant color from the image captured by the camera (a rough sketch follows the list below). Three main causes hinder translation immersion:

 
  • Picking the wrong background color: The in-painting layer looks off when the wrong dominant color is chosen. This is particularly problematic in live translation views.

  • Shadows create gradients: The in-painting layer drifts off-color on the darker, shadowed side of a page. This is also more problematic in live translation views.

  • Text on complicated patterns: The trickiest challenge, and one that is hard to resolve with the current rendering technology. If text sits on a visual image or a complicated pattern, it is almost impossible to pick a single solid color from the background. This problem exists in both live and post-capture screens.

Since the third cause requires a new rendering technology to resolve, I decided to focus on tackling the first two issues with visual design improvements.
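To make the first two causes concrete, here is a minimal sketch of a dominant-color in-painting step, assuming a Pillow-based pipeline. The function names and the simple “most common quantized color” heuristic are illustrative, not the production Lens renderer.

```python
from collections import Counter
from PIL import Image

def pick_dominant_color(image, box):
    """Pick the most common quantized color inside the text bounding box.

    box is (left, top, right, bottom) in pixel coordinates.
    """
    region = image.crop(box).convert("RGB")
    # Quantize each channel to 16 levels so slight camera noise maps to the same bucket.
    pixels = [(r // 16 * 16, g // 16 * 16, b // 16 * 16) for r, g, b in region.getdata()]
    return Counter(pixels).most_common(1)[0][0]

def paint_inpainting_layer(image, box):
    """Cover the original text with a solid rectangle of the dominant color."""
    color = pick_dominant_color(image, box)
    patch = Image.new("RGB", (box[2] - box[0], box[3] - box[1]), color)
    out = image.copy()
    out.paste(patch, (box[0], box[1]))
    return out
```

A single solid color works for flat black-on-white paper, but any shadow gradient or background texture inside the box makes that one color visibly wrong, which is exactly the failure the first two causes describe.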

 
 

VISUAL EXPLORATIONS


Collaborating with an engineer, I explored several visual options that could improve how the in-painting layers behind translation results blend into the image:

 
 
  • Gradient in-painting: Applying a gradient by picking two dominant colors, one from the left end and one from the right end of the background (a rough sketch follows this list).

  • Rendering with triangulated polygons: Instead of rendering solid rectangles, rendering small triangulated polygons so that multiple dominant background colors can be picked and applied.

  • Blurry in-painting: Applying a layer blur to the in-painting layer to make its borders less visible.

  • Floating translation results: Separating the translation results and in-painting layers from the original image.
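As an illustration of the gradient option, here is a minimal sketch that blends between two dominant colors sampled from the left and right ends of the text box. It reuses the hypothetical pick_dominant_color helper from the earlier sketch and is not the shipped rendering code.

```python
from PIL import Image

def paint_gradient_inpainting(image, box):
    """Fill the text box with a horizontal gradient between two dominant colors."""
    left, top, right, bottom = box
    width, height = right - left, bottom - top
    third = max(1, width // 3)

    # Sample one dominant color from each end of the box (hypothetical helper
    # from the earlier sketch), then interpolate column by column.
    c_left = pick_dominant_color(image, (left, top, left + third, bottom))
    c_right = pick_dominant_color(image, (right - third, top, right, bottom))

    patch = Image.new("RGB", (width, height))
    for x in range(width):
        t = x / max(1, width - 1)
        color = tuple(round(a + (b - a) * t) for a, b in zip(c_left, c_right))
        for y in range(height):
            patch.putpixel((x, y), color)

    out = image.copy()
    out.paste(patch, (left, top))
    return out
```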


Short-term visual improvement

Although the triangulated and gradient in-painting boxes represented the optimal solutions, the team had limited resources to invest in a new rendering algorithm. Implementing either one would also significantly increase computational cost, which ran counter to the technical goals the Lens team was pursuing. Consequently, I needed to explore a purely UI-level approach to address the issue.

For the immediate improvement, I refined the in-painting visuals with minimal UI design adjustments, incorporating the following modifications:

  • Blur the borders of the in-painting boxes

  • Add a layer blur and change the color-picking logic (sketched after this list)

  • Apply the new rendering rules only to the post-capture view
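A minimal sketch of the blur treatment, assuming Pillow’s GaussianBlur and the hypothetical pick_dominant_color helper from earlier; the real implementation and its parameters may differ.

```python
from PIL import Image, ImageFilter

def paint_blurred_inpainting(image, box, blur_radius=6):
    """Soften the in-painting patch so its borders blend into the background."""
    left, top, right, bottom = box
    color = pick_dominant_color(image, box)  # hypothetical helper from the earlier sketch

    # Draw the solid patch on a transparent layer, then blur the whole layer
    # so the patch edges feather out into transparency.
    layer = Image.new("RGBA", image.size, (0, 0, 0, 0))
    patch = Image.new("RGBA", (right - left, bottom - top), color + (255,))
    layer.paste(patch, (left, top))
    layer = layer.filter(ImageFilter.GaussianBlur(blur_radius))

    return Image.alpha_composite(image.convert("RGBA"), layer)
```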


After improving the immersion of the in-painting, we encountered an unexpected issue: in some cases the readability of the translated text was compromised. Smaller and brighter text proved particularly difficult to read. To address this, I added a dark highlight behind the translated text whenever the contrast ratio fell below 4.5:1 for small text or 3.2:1 for large text. This adjustment made the in-painting layer look significantly tighter and effectively reduced visual noise compared to before.
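For reference, this is a small sketch of the contrast check that the highlight rule describes, using the standard WCAG relative-luminance and contrast-ratio formulas; the thresholds mirror the values above, while the function names are my own.

```python
def relative_luminance(rgb):
    """WCAG relative luminance of an sRGB color given as (r, g, b) in 0-255."""
    def linearize(c):
        c = c / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(foreground, background):
    """WCAG contrast ratio between two colors, always between 1 and 21."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)

def needs_dark_highlight(text_color, inpainting_color, is_large_text):
    """Add the dark highlight when contrast drops below the chosen thresholds."""
    threshold = 3.2 if is_large_text else 4.5
    return contrast_ratio(text_color, inpainting_color) < threshold
```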


Envisioning the future of translate

The team kept exploring a new way of rendering to achieve more immersive translation results. One of the innovative algorithms we experimented is GAN (Generative Adversarial Networks) by Meta Research team. GAN is a type of AI method that generates photorealistic pictures by understanding the existing image or scene. Using GAN AI, it is possible to recreate an entire image with translated results, effectively replacing the original image. In the future, this technology may allow for the rendering of translated results not only by copying complex patterns and images to create in-painting layers but also by duplicating font faces to print translated texts.