How smartphones take better and better pictures

Stacking

The iPhone sensor is about 70 times smaller than a “full-frame” sensor, and mobile vendors first had to make up for this enormous gap. It directly affects a photograph’s dynamic range: the span of brightness, from white to black, that the camera is able to capture. For the camera’s prototype, the human eye, this concept is largely irrelevant: as we look around, we take in visual information piece by piece, and the brain “stitches” the fragments together. Because the eye looks at only one point at a time and instantly adapts to differences in illumination (by dilating and constricting the pupil), we see every detail with maximum brightness and contrast.

February 16, 2020

A camera, by contrast, must capture the entire scene at once, setting the exposure (its “pupil”) relative to a single reference point, so it inherently cannot record every shade. The narrower the dynamic range, the worse the camera renders the gradations between black and white. Everyone has run into this when photographing a shaded subject on a bright day: you get either a black foreground under a colorful sky, or the subject you wanted against a “burnt-out” white background. In full-frame cameras of recent years, this problem has been largely solved: the dynamic range is sufficient for most scenes, and in difficult cases the photographer simply needs to work skillfully with light. Moreover, the desire for pictures in which “everything is visible” is typical of amateurs: well-known photographers often build a frame on contrast, using light and shadow to emphasize what matters.
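
To make the clipping problem concrete, here is a toy model, assuming a linear sensor whose usable range runs from a hypothetical noise floor up to full-well saturation (10 stops here). The function names and numbers are illustrative, not any vendor’s. Dynamic range is measured in stops, each stop being a doubling of brightness.

```python
import math

def dynamic_range_stops(brightest: float, darkest: float) -> float:
    """Dynamic range in stops: how many doublings separate two luminance levels."""
    return math.log2(brightest / darkest)

def capture(scene, exposure, full_well=1.0, noise_floor=1.0 / 1024):
    """Toy linear sensor with hypothetical parameters (a 10-stop usable range).

    Each luminance value is scaled by the exposure; anything at or above
    full_well is clipped to pure white, anything below noise_floor is lost
    in noise and recorded as black.
    """
    recorded = []
    for luminance in scene:
        signal = luminance * exposure
        if signal >= full_well:
            recorded.append(full_well)   # blown-out highlight
        elif signal < noise_floor:
            recorded.append(0.0)         # crushed shadow
        else:
            recorded.append(signal)
    return recorded

# A shaded subject and a bright sky, roughly 13.3 stops apart.
scene = [0.01, 100.0]
print(dynamic_range_stops(100.0, 0.01))   # more than the sensor's 10 stops
print(capture(scene, exposure=1 / 100))   # exposed for the sky: subject goes black
print(capture(scene, exposure=10))        # exposed for the subject: sky goes white
```

No single exposure fits both values into the sensor’s range, which is exactly the “black foreground or white sky” dilemma.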

For smartphones, however, the native dynamic range is so narrow that extending it was clearly necessary. The first step was RAW shooting: RAW files are digital negatives, and developing them on a computer recovers as much information from the image as possible. HTC and Nokia were the first to introduce it, in the early 2010s. But that was not enough. A far more effective way to expand dynamic range than “pulling” shadows and highlights in a RAW converter is to add information from another frame. This technique is called “epsilon photography” or, as is now common, stacking: the camera shoots a bracketed series of near-duplicate frames (changing one parameter by a set number of steps between them), and the frames are then overlaid on each other. This mimics the way the human eye repeatedly “scans” a scene, readapting each time. Advanced photographers have been using stacking since the 2000s for a variety of tasks: extending depth of field (bracketing focus), creating time-lapses (bracketing time) and shooting panoramas (bracketing the camera position).
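
Extending depth of field by stacking frames focused at different distances can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: the frames are already aligned, each is a single row of grayscale values, and sharpness is judged by a discrete Laplacian (how strongly a pixel differs from its neighbors). For every pixel, the merge keeps the value from whichever frame is locally sharpest.

```python
def local_contrast(row, i):
    """Discrete Laplacian magnitude: how sharply pixel i stands out from its neighbors."""
    left = row[i - 1] if i > 0 else row[i]
    right = row[i + 1] if i < len(row) - 1 else row[i]
    return abs(2 * row[i] - left - right)

def focus_stack(frames):
    """For each pixel, keep the value from the frame where that spot is sharpest."""
    width = len(frames[0])
    merged = []
    for i in range(width):
        sharpest = max(frames, key=lambda frame: local_contrast(frame, i))
        merged.append(sharpest[i])
    return merged

near_in_focus = [0, 9, 0, 4, 4, 4]  # detail on the left is crisp, right side is smeared
far_in_focus = [3, 3, 3, 0, 9, 0]   # detail on the right is crisp, left side is smeared
print(focus_stack([near_in_focus, far_in_focus]))  # [0, 9, 0, 0, 9, 0]: both details sharp
```

Real focus-stacking tools use 2D neighborhoods and blend the transitions between frames smoothly; the per-pixel “pick the sharpest” rule is only the core idea.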

Applied to dynamic range, stacking lets the camera capture darker and lighter versions of a scene (exposure bracketing), combine the information they contain and output a multi-frame HDR (high dynamic range) image. Automatic HDR was attempted as early as the iPhone 4 (2010) and the Samsung Galaxy S3 (2012), but merging was slow and often produced unnatural colors. Today things are different: instead of the classic three-shot bracket (a “normal”, a “light” and a “dark” frame), smartphones combine up to a dozen images taken with various settings.
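
A common textbook way to merge an exposure bracket is sketched below as an illustration, not as any phone maker’s actual pipeline. It assumes linear pixel values in [0, 1] and known exposure times: each pixel is divided by its exposure time to estimate scene radiance, and the estimates are averaged with a “hat” weight that trusts mid-tones and ignores clipped or pure-black values. The sensor model and function names are hypothetical.

```python
def hat_weight(value: float) -> float:
    """Trust mid-tones most; give zero weight to pure black and clipped white."""
    return 1.0 - abs(2.0 * value - 1.0)

def simulate_bracket(radiance: float, exposure_times):
    """Toy linear sensor (hypothetical): value = radiance * time, clipped at 1.0."""
    return [min(radiance * t, 1.0) for t in exposure_times]

def merge_hdr(values, exposure_times) -> float:
    """Weighted average of per-frame radiance estimates (value / exposure time)."""
    numerator = 0.0
    denominator = 0.0
    for value, time in zip(values, exposure_times):
        weight = hat_weight(value)
        numerator += weight * value / time
        denominator += weight
    if denominator == 0.0:
        raise ValueError("every frame is clipped or black; radiance is unknown")
    return numerator / denominator

times = [0.25, 1.0, 4.0]  # "dark", "normal" and "light" frames
for radiance in (0.5, 3.0):
    frames = simulate_bracket(radiance, times)
    print(radiance, frames, merge_hdr(frames, times))
```

For a radiance of 3.0 the “normal” and “light” frames both clip at 1.0, yet the merge still recovers 3.0 from the dark frame: that recovered headroom is the extra dynamic range.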

This became possible thanks to an unprecedented burst rate. Fast sensor readout, the absence of mechanical parts and near-instant autofocus thanks to a very simple optical design together let a smartphone take dozens of frames per second. Professional cameras have only recently approached this burst speed; before that, photographers struggled just to capture a single frame at the right moment, since there was a lag of 0.1–0.2 seconds between pressing the button and the shutter firing.

In mobile photography, shutter lag is not merely zero but effectively negative: as soon as the user opens the Camera app, the sensor starts streaming frames into a temporary buffer, since that is the only way to show the image on the screen. By the moment the shutter button is pressed, the buffer already holds snapshots of the immediate past: the phone selects the sharpest of them and starts processing.
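
That “negative lag” behavior maps naturally onto a fixed-size ring buffer. A minimal sketch, assuming hypothetical names (`Frame`, `ZeroLagBuffer`) and a simple gradient-energy sharpness score on one row of pixels; real camera stacks use their own frame formats and sharpness metrics.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Frame:
    timestamp_ms: int
    pixels: list  # one row of grayscale values, for simplicity

def sharpness(frame: Frame) -> float:
    """Gradient energy: sharper frames have larger pixel-to-pixel differences."""
    p = frame.pixels
    return sum((b - a) ** 2 for a, b in zip(p, p[1:]))

class ZeroLagBuffer:
    """Keeps only the most recent preview frames; older ones fall off automatically."""

    def __init__(self, capacity: int = 8):
        self.frames = deque(maxlen=capacity)

    def on_preview_frame(self, frame: Frame) -> None:
        self.frames.append(frame)  # deque drops the oldest frame when full

    def on_shutter_pressed(self) -> Frame:
        if not self.frames:
            raise RuntimeError("no frames buffered yet")
        return max(self.frames, key=sharpness)

buffer = ZeroLagBuffer(capacity=3)
rows = [[0, 10, 0], [1, 1, 1], [2, 2, 2], [0, 5, 0], [3, 3, 3]]
for index, row in enumerate(rows):
    buffer.on_preview_frame(Frame(timestamp_ms=index * 33, pixels=row))
best = buffer.on_shutter_pressed()
print(best.timestamp_ms)  # 99: the sharpest frame still in the buffer
```

Note that the very first frame, though sharpest overall, has already been evicted: the buffer only remembers the immediate past.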

iPhone Camera HDR — Photo by Nadine Shaabana on Unsplash

Today, HDR mode is enabled by default on many smartphones, and its development has clearly not hit the ceiling. iPhones, for example, gained “smart” stacking that treats users to unusual ways of merging frames, such as simulating a long exposure from many short ones in Live Photos mode (good for smoothing running water). And on Google Pixel phones, HDR+ stacking instantly develops and merges several RAW files, something rarely done even by professional photographers.
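
The long-exposure effect described above can be approximated by averaging a burst of short, already aligned frames. This is a minimal sketch of the basic idea, assuming registered frames stored as rows of linear values; it is not Apple’s implementation.

```python
def simulate_long_exposure(frames):
    """Average aligned short-exposure frames pixel by pixel.

    Static parts of the scene stay sharp, while anything that moves between
    frames (like running water) is smeared into a smooth trail.
    """
    if not frames:
        raise ValueError("need at least one frame")
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]

# A bright droplet (1.0) moving one pixel per frame across a dark background,
# next to a static rock (0.5) in the last position.
frames = [
    [1.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.5],
]
print(simulate_long_exposure(frames))  # the droplet becomes a faint streak; the rock is unchanged
```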

Multi-eyed invasion

Yet even with stacking compensating for the small sensor, smartphone developers would have made little progress in image quality had they not also found a solution to the problem of optics. After all, the artistic qualities of a picture depend on the lens: the rendering of volume, micro-contrast, the character of the out-of-focus areas. Sensors merely record the light that passes through the lens, and while they improve every year, as befits microelectronics, optics is a far more conservative industry. The light-gathering power and “airiness” of the picture depend directly on the amount of glass used: widening the maximum aperture from f/2.0 to f/1.4 roughly doubles the weight and size of a lens.
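
The arithmetic behind the f/2.0 to f/1.4 example is standard: the f-number is the focal length divided by the aperture diameter, and the light gathered scales with the aperture’s area. The sketch below (the 50 mm focal length is an arbitrary example) shows that this step widens the aperture about 1.4 times and roughly doubles the light, which is why the glass has to grow accordingly.

```python
def entrance_pupil_mm(focal_length_mm: float, f_number: float) -> float:
    """Aperture diameter: focal length divided by the f-number."""
    return focal_length_mm / f_number

def light_gain(f_from: float, f_to: float) -> float:
    """How much more light a faster aperture admits at the same focal length.

    Light scales with aperture area, i.e. with the square of the diameter,
    so the gain is (f_from / f_to) squared.
    """
    return (f_from / f_to) ** 2

print(entrance_pupil_mm(50, 2.0), entrance_pupil_mm(50, 1.4))  # 25.0 vs ~35.7 mm
print(light_gain(2.0, 1.4))                                     # ~2.04, one stop
```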

Phone lenses therefore simply cannot compare with full-size lenses (even when the manufacturer puts the names Leica or Zeiss, which work like magic on many buyers, on the body). What was the way out? This is where smartphones needed multiple cameras (for stacking alone, a single camera module would have been enough). Attempts to “clone” the rear camera go back to 2007 (the first such model was the Samsung SCH-B710), when manufacturers tried to win buyers over with 3D photos. But the feature proved useless, and the full-scale offensive of “multi-eyed” devices began only ten years later: as often happens, the idea of a second camera spread across the market after it appeared on the iPhone.

The purpose of multiple cameras has also changed. Today they primarily provide different angles of view. The standard trio consists of a main lens with a focal length of 20–25 mm (full-frame equivalent), a wide-angle lens (12–15 mm) and a telephoto lens (50 mm).
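
A full-frame-equivalent focal length maps directly to how much of the scene each camera sees. Assuming a rectilinear lens and measuring across the 36 mm width of a full-frame sensor, the horizontal angle of view is 2·atan(36 / 2f). The sketch evaluates it for 13, 25 and 50 mm, values picked from the ranges quoted above.

```python
import math

FULL_FRAME_WIDTH_MM = 36.0  # horizontal size of a 36 x 24 mm full-frame sensor

def horizontal_angle_of_view(equiv_focal_mm: float) -> float:
    """Horizontal angle of view, in degrees, for a full-frame-equivalent focal length."""
    return math.degrees(2 * math.atan(FULL_FRAME_WIDTH_MM / (2 * equiv_focal_mm)))

for name, focal in [("wide-angle", 13), ("main", 25), ("telephoto", 50)]:
    print(f"{name:>10}: {focal} mm -> {horizontal_angle_of_view(focal):.0f} degrees")
```

Halving the focal length does not simply double the angle, because the relationship goes through the arctangent; the wide-angle module sees well over 100 degrees.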

There are variations. Apple settled on two cameras, only switching the secondary module in the iPhone 11 from a telephoto to an ultra-wide. Chinese companies increasingly opt for an expanded set: the new Xiaomi Mi Note 10 has a main lens (25 mm), a portrait lens (50 mm) and an ultra-wide (13 mm), plus dedicated modules for zoom and macro photography. With such a range, there is no need for external clip-on lenses: the accessory, popular in the mid-2010s, has been left on the sidelines of progress.

But the main value of multi-camera devices is that their cameras can capture images simultaneously, as if helping each other. Together, the tiny lenses let in more light than any of them individually, simulating expensive wide-aperture optics. Today’s devices have come a long way since the first implementation of this principle (the HTC One, 2014).

For example, Huawei captures a picture with two sensors, one color and one monochrome. The reason is that a sensor’s photosites can detect only the intensity of the incoming light, not its color (wavelength). That is why in nearly all cameras the sensor is covered with a mosaic color filter in which each pixel corresponds to one of three primary colors: red, green or blue (by analogy with the three types of cones in the human eye, each sensitive to its own part of the spectrum). Each pixel thus initially records only one of the three colors (the full color image is then reconstructed by interpolating data from neighboring pixels), and photons of other wavelengths are blocked by the filter. The result is a loss of light. So Huawei makes one sensor responsible for color, while the monochrome one, which has no filter over it, catches as many photons as possible: merging the two produces a brighter, clearer picture.
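
Both halves of that idea can be sketched. The first function shows how an RGGB Bayer mosaic (the most common layout) assigns a single color to each pixel. The second is a deliberately simplified luminance/chrominance fusion, an assumption for illustration rather than Huawei’s actual algorithm: keep the color sensor’s color differences and take brightness from the cleaner monochrome signal. The BT.601 luma weights are likewise an assumed choice, and the sketch presumes the two images are already aligned, which a real pipeline must do first.

```python
from collections import Counter

def bayer_channel(row: int, col: int) -> str:
    """Color filter over a pixel in an RGGB Bayer mosaic.

    Half of all pixels are green, a quarter red and a quarter blue; each one
    records only the light that passes its own filter.
    """
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"

# Luma weights from ITU-R BT.601, used here as an illustrative assumption.
KR, KG, KB = 0.299, 0.587, 0.114

def fuse_pixel(color_rgb, mono_luma):
    """Keep the color sensor's chroma but take brightness from the mono sensor.

    Assumes both images are aligned and on the same linear scale.
    """
    r, g, b = color_rgb
    y = KR * r + KG * g + KB * b
    blue_diff = b - y   # blue-difference chroma (unscaled)
    red_diff = r - y    # red-difference chroma (unscaled)
    new_r = mono_luma + red_diff
    new_b = mono_luma + blue_diff
    new_g = (mono_luma - KR * new_r - KB * new_b) / KG  # solve the luma equation for G
    return new_r, new_g, new_b

print(Counter(bayer_channel(r, c) for r in range(4) for c in range(4)))
print(fuse_pixel((0.3, 0.3, 0.3), 0.6))  # a noisy gray pixel brightened by the mono sensor
```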

Mobile brands solved the problem of optical zoom just as elegantly. No more retractable lenses like those on early-2010s Samsungs: just smart image fusion. Shooting simultaneously with two fixed lenses, say 25 mm and 50 mm, covers the focal lengths between them and provides 2x zoom without loss of quality, a technique unheard of in professional photographic equipment. A more radical version appears in the Huawei P30 Pro: a 125 mm telephoto lens and 5x optical zoom. By the usual laws of optics, such magnification requires a lens barrel extending out of the body, but Huawei’s engineers fit everything into a slim body thanks to a periscope lens design.

Personal portraitist

For many buyers, though, the main symbol of progress in mobile cameras has been the ability to shoot portraits with shallow depth of field (the depth of the zone rendered sharp). Professionals have mixed feelings about pictures with a blurred background (bokeh): it is useful as a tool for “cropping in depth” (hiding unwanted objects in the background), yet the classics of photography preferred a large depth of field, trying to give the background a meaningful role in the composition. Gradually, the blurred background went from a lifesaver for “lazy” photographers to a kind of fetish: in the eyes of an inexperienced viewer, bokeh is an indispensable mark of “expensive”, “professional” photography. There is a physiological explanation: such images resemble how the eye works, since when it focuses on a specific point, secondary objects in the field of view also blur.

A girl against a blurred background (photo by Oleg Ivanov on Unsplash)

Smartphone makers cater to the public’s taste. In the mid-2010s, experiments began with software bokeh: the phone detected the face in the frame and blurred everything else. The results were far from ideal: the algorithms outlined the human figure inaccurately, either letting part of the hair or the frames of glasses dissolve into the blur or, conversely, surrounding the subject with a “halo” of sharp background detail. And the blur itself, a mathematical imitation of how optics render, looked unnatural, reminiscent of the Blur tool beloved by novice retouchers in Adobe Photoshop.
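
Early software bokeh boils down to two steps: blur the whole frame, then paste the sharp subject back using a segmentation mask. The sketch shows that idea on a single row of pixels, assuming a simple box blur (real phones use more elaborate, lens-like kernels) and a precomputed mask from a hypothetical face or person detector; the function names are illustrative.

```python
def box_blur(row, radius):
    """Average each pixel with its neighbors within `radius` (window clamped at the edges)."""
    n = len(row)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = row[lo:hi]
        out.append(sum(window) / len(window))
    return out

def fake_bokeh(row, subject_mask, radius=2):
    """Keep pixels marked as the subject sharp; replace the rest with blurred values.

    subject_mask comes from a (hypothetical) segmentation step. A wrong mask
    produces the classic artifacts: hair marked False gets blurred away, and
    background marked True stays sharp as a "halo".
    """
    blurred = box_blur(row, radius)
    return [pixel if keep else soft for pixel, soft, keep in zip(row, blurred, subject_mask)]

row = [0, 9, 0, 5, 5, 5]
mask = [True, True, True, False, False, False]  # subject on the left
print(fake_bokeh(row, mask, radius=1))
```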

To improve portrait mode, developers added ToF (time-of-flight) technology to camera phones: a dedicated sensor on the back emits light pulses toward the objects in the frame and measures how long the photons take to travel there and back (for objects a few meters away, only nanoseconds). It is a simplified version of lidar (LiDAR, Light Detection and Ranging), the laser rangefinder used in self-driving cars to measure distances to objects in 3D space.
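
The principle is simple arithmetic: light covers the distance to the subject and back, so the distance is half the round-trip time multiplied by the speed of light. The sketch shows why the timing must be resolved in nanoseconds. Many ToF modules measure the phase shift of modulated light rather than timing a single pulse directly, which this does not model.

```python
SPEED_OF_LIGHT_M_S = 299_792_458  # exact, by definition of the metre

def tof_distance_m(round_trip_s: float) -> float:
    """Distance to the target: the pulse travels there and back, so halve the path."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2

def round_trip_ns(distance_m: float) -> float:
    """Round-trip time in nanoseconds for a target at the given distance."""
    return 2 * distance_m / SPEED_OF_LIGHT_M_S * 1e9

print(round_trip_ns(2.0))      # a portrait subject 2 m away: about 13.3 ns
print(tof_distance_m(20e-9))   # a 20 ns echo: about 3 m
```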

Today ToF is found in Samsung and Huawei flagships: when shooting portraits, their cameras map the depth of the scene and adjust how strongly each object is blurred. Apple does without ToF, instead using a second camera as a rangefinder (while the first one shoots), and is also experimenting with Dot Projector technology, which projects a pattern of light dots onto 3D objects (used for face recognition on the front).
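
Using a second camera as a rangefinder relies on stereo triangulation: a nearby point shifts more between the two views than a distant one. Once depth is known, the blur applied to each point can grow with its distance from the plane of focus. A minimal sketch, assuming calibrated, rectified cameras and a thin-lens blur model; the focal length, baseline and `strength` values are made up for illustration.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Stereo triangulation: Z = f * B / d.

    focal_px: focal length expressed in pixels; baseline_m: distance between
    the two cameras; disparity_px: how far a point shifts between the two views.
    """
    if disparity_px <= 0:
        raise ValueError("zero disparity means the point is at infinity")
    return focal_px * baseline_m / disparity_px

def blur_radius_px(depth_m: float, focus_depth_m: float, strength: float) -> float:
    """Thin-lens approximation: blur grows with |1/focus - 1/depth|.

    `strength` lumps together the aperture and focal length of the simulated
    lens; it is a tuning knob, not a physical constant.
    """
    return strength * abs(1 / focus_depth_m - 1 / depth_m)

# Hypothetical phone: 1000 px focal length, cameras 10 mm apart.
subject = depth_from_disparity(1000, 0.010, 5.0)     # near point, large shift
background = depth_from_disparity(1000, 0.010, 1.0)  # far point, small shift
print(subject, background)
print(blur_radius_px(subject, subject, 8), blur_radius_px(background, subject, 8))
```

The subject in the focal plane gets zero blur, while the background receives a radius proportional to how far it lies behind.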

Curiously, photography has seen “multi-eyed” devices before: stereoscopes were popular at the end of the 19th century, and later twin-lens reflex cameras held the market for several decades. But they then gave way to simpler and more affordable single-lens cameras. Now smartphones are reviving a half-forgotten branch of photographic technology.

Have you ordered the details?

Having extended dynamic range and simulated fast optics, smartphones had to master the third pillar of high-quality photography: fine detail. For this they used pixel shifting, a technique based on in-body image stabilization (IBIS), in which the sensor is mounted on a moving platform that shifts to counteract the photographer’s hand shake and prevent blur. IBIS has been known in the photo industry since the mid-2000s, but mobile phone makers arrived at pixel shifting, which develops the IBIS idea into stacking by movement, sooner than most camera brands.

Xiaomi Mi Note 10: awesome details, pixel shifting

The idea is that the camera shifts the sensor by 0.5–1 pixel between shots. Having taken up to a dozen photos shifted in all directions, the phone combines them into one with many times the resolution. In effect, this is the same stacking used to build a panorama from a series of frames, only with microscopic camera shifts.
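
In the idealized case, the merge is plain interleaving: if the sensor moves by exactly half a pixel right, down and diagonally, four frames sample the scene on a grid twice as dense in each direction. The sketch assumes those exact shifts and a static scene; real pipelines must estimate irregular sub-pixel shifts and weight the samples, which is much harder. The function name is hypothetical.

```python
def pixel_shift_merge(frames):
    """Interleave four frames taken with half-pixel sensor shifts into a 2x grid.

    frames maps a (dy, dx) shift, in half-pixel units (0 or 1), to an H x W
    image (a list of rows). The output is 2H x 2W: each frame fills the
    sample positions that its shift let it see.
    """
    base = frames[(0, 0)]
    height, width = len(base), len(base[0])
    out = [[0.0] * (2 * width) for _ in range(2 * height)]
    for (dy, dx), image in frames.items():
        for y in range(height):
            for x in range(width):
                out[2 * y + dy][2 * x + dx] = image[y][x]
    return out

frames = {
    (0, 0): [[1, 2], [3, 4]],
    (0, 1): [[10, 20], [30, 40]],
    (1, 0): [[100, 200], [300, 400]],
    (1, 1): [[1000, 2000], [3000, 4000]],
}
for row in pixel_shift_merge(frames):
    print(row)
```

Four 2 x 2 frames become one 4 x 4 image, each output sample coming from exactly one exposure.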

The main purpose of pixel shifting is digital zoom without loss of quality. The phrase sounds like a contradiction in terms, since we are used to a heavily magnified picture breaking up into individual pixels (pixelation). But the solution turned out to be close at hand: several flawed photos combine into one decent one.

This let vendors implement hybrid zoom: optical zoom from the lens plus stacking-assisted digital zoom magnifies the original image 20–30 times, effectively turning the smartphone into a telescope. As recent announcements showed, this is not the limit: the Xiaomi Mi 10 offers 50x zoom and the Samsung Galaxy S20 Ultra 100x. The hybrid approach itself, in which optics, sensors and algorithms multiply each other’s effect, is a true discovery of computational photography. Traditional cameras lack this synergy between components: sometimes lens-based and sensor-based stabilization cannot even work at the same time.

Google is known to be working on improving pixel shifting (its Super Resolution technology): according to rumors, in the long run the results will be comparable to the trick popular in science fiction films, where an image can be enlarged endlessly until the criminal’s face is recovered from its reflection in the victim’s pupils. Similar work is underway at Yandex (DeepHD). And Apple learned to increase image detail without pixel shifting, by merging duplicates taken at different shutter speeds (Deep Fusion). There are other ways to restore an image (deblurring): coded shutter, coded aperture, phase coding… They are not yet mature enough for smartphones, but IT companies are studying them.

The night turned into day

Night shooting can perhaps be considered the pinnacle of computational photography. Low light was believed to be the one area where a dedicated camera would always beat a phone. But while camera brands have always relied on increasing light sensitivity (of plates, then film, and now digital sensors), the mobile industry has found a far more inventive solution. An example is Night Sight on Google Pixel smartphones.

Awesome night mode on Google Pixel smartphones — Photo by Andre Benz on Unsplash

For it, Google had to combine RAW shooting, HDR stacking, compensation for motion blur and scene recognition by neural networks (which, for example, bring a scene lit by both “warm” and “cold” lamps to a single color balance). And the addition of a second camera in last year’s Pixel 4 made Night Sight suitable even for shooting stars. Altogether, this creates a sense of magic: your eyes see pitch darkness, while the photo shows soft twilight. As forum users joke, soon you will be able to photograph a black cat in a dark room with a smartphone and actually see it.
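
The core reason stacking rescues night shots is statistical: random sensor noise is independent from frame to frame, so averaging N aligned frames reduces it by about √N while the signal stays put. The sketch assumes perfectly aligned frames and purely Gaussian noise with made-up levels; real night modes cannot count on either and must also align frames and reject moving objects.

```python
import random
import statistics

def noisy_frame(true_value, noise_sigma, size, rng):
    """One dark, noisy exposure of a flat gray patch (hypothetical noise model)."""
    return [rng.gauss(true_value, noise_sigma) for _ in range(size)]

def stack_frames(frames):
    """Average aligned frames pixel by pixel."""
    return [sum(pixel) / len(frames) for pixel in zip(*frames)]

rng = random.Random(42)
single = noisy_frame(0.1, 0.05, 2000, rng)
burst = [noisy_frame(0.1, 0.05, 2000, rng) for _ in range(16)]
stacked = stack_frames(burst)

# Random noise averages out: 16 frames cut it by roughly sqrt(16) = 4 times.
print(statistics.pstdev(single), statistics.pstdev(stacked))
```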

To be continued.
