11. November 2020 - Providentia editors

Data protection: Anonymization is the ideal solution

Anyone who tries to process personal data in compliance with the GDPR often runs into gray areas. Unless, that is, the data from autonomous vehicles or connected traffic is first anonymized and only then processed, as in Providentia++.

What is the data used – and not used – for?

If you have ever driven past a sign equipped with various cameras and radars, you probably took a frantic look at your speedometer. You could not have known that these sensors are meant for research, not for surveillance or speed checks. The Providentia++ project, for example, which uses sensor stations along the A9 highway, is interested in traffic data, which serves to digitally map vehicles’ movements. With the help of a digital twin, it is possible to analyze how drivers handle their vehicles, when they change lanes, and how they react when the car in front of them suddenly brakes. With large amounts of such data, it becomes possible to make recommendations. This data also helps manufacturers of digital assistance systems to validate their systems and make them even more precise.

What does the GDPR require?

In the EU, however, processing personal data is prohibited unless there is a legal basis for it. “The General Data Protection Regulation (GDPR) defines the possible ways of establishing a legal basis,” explains Prof. Uwe Baumgarten from the TU Munich. As stated in Article 5 of the GDPR, personal data must be “processed lawfully … and in a transparent manner in relation to the data subject.” The GDPR also specifies that personal data must be “accurate” and “processed in a manner that ensures appropriate security of the personal data.” It may only be stored for as long as is “necessary for the purposes for which the personal data [is] processed.” There is in fact some room for interpretation here.

Article 6 of the GDPR lists several conditions under which data may be processed. For example, the person concerned may consent to the use of personal images or data; then nothing stands in the way of this data being legally processed. Data can also be used for the performance of a contract. For example, anyone who buys a vehicle signs a sales contract. “If it serves to fulfill the contract, the personal data contained therein may be processed,” says Prof. Baumgarten. Customers will hardly object to this when it comes to repair shop visits or software updates. A few other conditions also allow personal data to be processed, such as “compliance with a legal obligation” or “to protect … vital interests.” But what about when image data and videos are collected? “The area between possible personal consent and contract performance is often fluid,” says Prof. Baumgarten. “This is a gray area.”

Which data processing serves to perform a contract?

A current example is the US car manufacturer Tesla, which has just been “awarded” the Big Brother Award by Digitalcourage e.V. for alleged data protection violations. While personal data may be processed in the USA, the rules in the EU are clearly laid down in the GDPR. The dilemma is self-inflicted, as the former data protection commissioner of the state of Schleswig-Holstein writes. According to his 37-page analysis, it is not clear from Tesla’s general terms and conditions whether and to what extent the company collects personal data. Binding statements on data deletion cannot be found. In addition, it is unclear which sensor data Tesla transmits and stores, and which data remains in the car and is overwritten. In total, the lawyer names nine concrete reasons why Tesla is violating European data protection and consumer protection regulations. Among other things, Tesla is “jointly responsible for unnecessary, comprehensive, unrestricted video surveillance and the subsequent processing both in driving and parking mode” and does not state “on which legal basis … the data processing it carries out is based.” The Tesla case makes one thing clear: Since far more data about drivers and vehicles is now generated than just a license plate number, the protection of personal data will become increasingly important.

Why is anonymizing data the safest method?

“So the most important question should be how personal references can be removed from data,” says Prof. Baumgarten from the TU Munich. Anonymization is one way to do this, especially since it is hardly possible to ask passing vehicles for permission. In the Providentia++ architecture, the sensor data is anonymized “close to the point of capture.” In the first step, traffic data is stored locally. Objects such as trucks, buses, and cars are identified and anonymized locally before the data is fused with all available sensor data in the next step to create a digital twin. “When it comes to autonomous driving, no one is interested in who is in the car or where the car is coming from or going to,” says Prof. Baumgarten. At Providentia++ partner Cognition Factory, only data that does not allow conclusions to be drawn about data protection–relevant information is stored for analysis and interpretation. This means, for example, that the position of a vehicle in an image and the vehicle class (car, truck, etc.) are stored for a period of one hundred days, but the image of the object itself is not stored. After one hundred days, the data disappears from the servers. In general, however, the GDPR also provides for the possibility of archiving data for “scientific or historical research purposes” (GDPR, Article 89, paragraph 1).
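The storage rule described above (keep position and vehicle class, drop the image, delete after one hundred days) can be illustrated with a minimal Python sketch. It is not the Providentia++ or Cognition Factory implementation: the names `RawDetection`, `AnonymizedDetection`, `anonymize`, and `purge_expired`, as well as the bounding-box representation of a vehicle's position, are assumptions made purely for illustration. Only the one-hundred-day period and the stored fields come from the article.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple

# Retention period stated in the article; everything else here is hypothetical.
RETENTION = timedelta(days=100)


@dataclass(frozen=True)
class RawDetection:
    """Hypothetical sensor-side detection that still carries image pixels."""
    bbox: Tuple[int, int, int, int]  # assumed format: x, y, width, height
    vehicle_class: str               # e.g. "car", "truck", "bus"
    image_crop: bytes                # pixels of the object (may show plates or faces)
    captured_at: datetime


@dataclass(frozen=True)
class AnonymizedDetection:
    """Record kept for analysis: position and class only, no image data."""
    bbox: Tuple[int, int, int, int]
    vehicle_class: str
    captured_at: datetime


def anonymize(raw: RawDetection) -> AnonymizedDetection:
    """Copy only the non-identifying fields; the image crop is never stored."""
    return AnonymizedDetection(raw.bbox, raw.vehicle_class, raw.captured_at)


def purge_expired(records: List[AnonymizedDetection],
                  now: datetime) -> List[AnonymizedDetection]:
    """Return only the records that are at most RETENTION old at time `now`."""
    return [r for r in records if now - r.captured_at <= RETENTION]
```

In such a design, the anonymized record type simply has no field for pixels, so an image cannot be stored by accident, and calling `purge_expired` on a schedule would enforce the deletion deadline.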

Can neural networks be trained with anonymized data?

Because neural networks need data for training, some tension arises here. In order to train algorithms to recognize vehicles, Dr. Claus Lenz from Cognition Factory had to use available data sets from the USA and Asia, even though the trucks there look different from those in Europe. Little by little, he was able to add his own data. Today, his algorithms can very reliably detect not only cars, but also trucks and buses. “Feeding algorithms with a large amount of data shortens their learning curve,” says Lenz. Depending on the method used to anonymize license plates and people, the images may become “noisy,” which limits their processing by algorithms. The company brighter AI, a start-up from the incubator of the automotive supplier Hella and a new associated partner of Providentia++, has found a way around this by recognizing personal image information and anonymizing it with artificially generated replacement data. “The anonymization of license plates and faces is very natural, so the algorithms can learn on the basis of this data,” says Marian Gläser, CEO of brighter AI. The technology has already been used for analyses at Deutsche Bahn, among others. The process is as follows: Personal data is transferred in encrypted form to a server and anonymized in the cloud. Only then is it processed further. The brighter AI approach anticipates Prof. Baumgarten’s greatest wish, so to speak: that the personal reference is removed from images while neural networks remain capable of learning.

Picture and video: brighter AI


