University of Georgia

Jaewoo Lee: Preserving privacy in an ocean of personal data

Photography by Andrew Davis Tucker
University of Georgia researcher Jaewoo Lee "data mining" in a gold mine.

Social media, facial recognition software, e-commerce, data mining: many of today’s technologies would have been unimaginable just a few decades ago. As technology has improved, the amount of personal data individuals share has grown along with it. What are the ramifications of this, and how can people ensure their privacy is preserved?

That’s where researchers like Jaewoo Lee, an assistant professor of computer science in the Franklin College of Arts and Sciences, come in. Lee studies privacy-preserving machine learning, focusing on how computer algorithms can collect and analyze data while protecting the personal information it contains. This year Lee received a prestigious National Science Foundation (NSF) CAREER Award to support his machine-learning research.

Lee primarily develops theoretical techniques that other researchers can build into their machine-learning algorithms, such as injecting random noise into a dataset or replacing personal information with random values. He has helped create several algorithms, including one that preserves individuals’ privacy when they share their phone call history to help detect spam calls, and one that flags images depicting self-harm and other violence on social media.
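The two ideas mentioned here, adding random noise and swapping real answers for random ones, can be sketched with two textbook differential-privacy mechanisms: the Laplace mechanism for a noisy count, and randomized response for a yes/no question. This is a minimal illustration under stated assumptions, not Lee’s own algorithms; the function names, the `epsilon` value and the 50% coin-flip probability are choices made for the example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential samples, each with
    mean `scale`, is Laplace-distributed with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records: list[bool], epsilon: float, rng: random.Random) -> float:
    """Release how many records are True, with Laplace noise added.

    Adding or removing one person changes the count by at most 1
    (sensitivity 1), so noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    """
    return sum(records) + laplace_noise(1.0 / epsilon, rng)


def randomized_response(answer: bool, rng: random.Random) -> bool:
    """Report the true answer half the time and a fair coin flip otherwise."""
    if rng.random() < 0.5:
        return answer
    return rng.random() < 0.5


def estimate_true_rate(reports: list[bool]) -> float:
    """Undo the randomization on average: P(report is True) = 0.5 * p + 0.25."""
    observed = sum(reports) / len(reports)
    return (observed - 0.25) / 0.5


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical population in which about 30% have a sensitive trait.
    population = [rng.random() < 0.3 for _ in range(10_000)]
    print("true count:", sum(population))
    print("noisy count:", round(noisy_count(population, epsilon=0.5, rng=rng), 1))
    reports = [randomized_response(x, rng) for x in population]
    print("estimated rate:", round(estimate_true_rate(reports), 3))
```

Each individual report is deniable (a true "yes" is reported as "yes" with probability 0.75 and a true "no" with probability 0.25, which works out to ln 3 differential privacy), yet the aggregate estimate still lands near the true rate. A smaller `epsilon` in `noisy_count` adds more noise and gives stronger privacy.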

How do third parties access and use our data?

There are two different paths third parties can take to access your data. The first is illegal. Many hackers spend their time poking around for systems that are not secure and installing back doors that let them break in. Often, the hacker’s real target is not the system they break into first but a bigger system that can be reached from that compromised computer.

There is also a legal way for third parties to access personal information. Many companies with large user bases, like Amazon, Facebook and Google, collect information from their users. When you sign up for a service, you have no choice but to consent to its terms and conditions. Those terms state that the service has the right to collect your information, but also that it is responsible for protecting your private information. Even when a company has no specific plan or need for the data, it collects the data anyway because it believes the data may be useful for its business operations in the future. Small companies often share their data with third parties because they cannot analyze it themselves. And if that third party has a security hole, your privacy is already breached.

How can the average person better protect their data?

I recommend people take the time to read the terms and conditions when they sign up for a service. Most of the time, they list what information will be collected and with whom it will be shared. When you provide information, think about what else can be inferred from it. For example, most people don’t mind providing their zip code, date of birth and gender, but one research paper showed that the combination of those three pieces of data can uniquely identify 78% of the U.S. population. Hackers can use that combination to breach your privacy. My recommendation is, when you share personal information, think about how it will be used and think about the combination of those attributes. How many people share the same zip code, date of birth and gender with me? Is that number large enough for me to hide in a group of individuals?
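The question of how many people share my combination is exactly what the k-anonymity notion measures: a dataset is k-anonymous if every combination of quasi-identifiers (here zip code, date of birth and gender) is shared by at least k records. The sketch below assumes a small, made-up list of records; `generalize` shows one common remedy, coarsening the attributes, and its specific rule (first three zip digits, birth year only) is an illustrative choice.

```python
from collections import Counter

# A record is (zip code, date of birth, gender); all values below are made up.
Record = tuple[str, str, str]


def group_sizes(records: list[Record]) -> Counter:
    """Count how many records share each (zip, date of birth, gender) combination."""
    return Counter(records)


def k_anonymity(records: list[Record]) -> int:
    """Return k, the size of the smallest group of indistinguishable records."""
    return min(group_sizes(records).values())


def fraction_unique(records: list[Record]) -> float:
    """Fraction of records whose combination is unique, i.e. re-identifiable."""
    sizes = group_sizes(records)
    return sum(1 for r in records if sizes[r] == 1) / len(records)


def generalize(record: Record) -> Record:
    """Coarsen a record: keep the first three zip digits and only the birth year."""
    zip_code, dob, gender = record
    return (zip_code[:3] + "**", dob[:4], gender)


if __name__ == "__main__":
    people = [
        ("30602", "1990-04-12", "F"),
        ("30602", "1990-04-12", "F"),
        ("30605", "1985-11-03", "M"),
        ("30606", "1985-02-20", "M"),
    ]
    print(k_anonymity(people))      # 1: two people are alone in their group
    print(fraction_unique(people))  # 0.5
    coarse = [generalize(p) for p in people]
    print(k_anonymity(coarse))      # 2: everyone now hides among at least 2 records
```

Coarsening trades precision for privacy: the generalized records say less about each person, but nobody can be singled out by the three attributes alone.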

There is much talk about leveraging data from personal devices in the fight against COVID-19. What are your thoughts about that?

I’m sure that data would provide useful information to help prevent the spread of COVID-19. But at the same time, it could be a very dangerous idea, because one major difficulty in releasing private data is that we don’t know how the released information might be used in the future, and that is hard to predict. To use the collected data for social benefit while protecting the privacy of members of our society, we need to build trustworthy systems that ensure the data will only be used for the purpose for which it was collected, i.e., to monitor COVID-19-related health issues.

What is the difference between machine learning and artificial intelligence?

Artificial intelligence is an umbrella term that covers any approach that gives a machine some form of intelligence. Intelligence, in this sense, means having machines perform high-level tasks rather than simply run predefined routines. This can include predicting the future or inferring some information from other information. Machine learning is a technical branch of AI in which intelligence is created by learning from data using statistical models. In the early stages of artificial intelligence, people tried to put human knowledge into machines through hard coding. But if we hard-code knowledge into a computer program, that knowledge doesn’t change, evolve or improve. Later on, people realized that we should let machines learn from datasets rather than hard-code knowledge into them. That way, if a dataset changes, the knowledge learned from it changes too, and that is where machine learning comes in.
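The contrast between hard-coded knowledge and knowledge learned from data can be made concrete with a toy spam-call filter. Everything here is hypothetical: the single feature (calls placed per day), the fixed cutoff of 100 and the tiny labeled datasets are made up for illustration. The learned rule simply picks the threshold that makes the fewest mistakes on whatever data it is given, so when the data changes, the rule changes with it.

```python
# Hypothetical data: (calls placed per day, was the number reported as spam?)
Example = tuple[float, bool]


def hard_coded_rule(calls_per_day: float) -> bool:
    """Knowledge written in by hand: the cutoff never changes, whatever the data says."""
    return calls_per_day > 100


def errors(threshold: float, data: list[Example]) -> int:
    """Count mistakes made by the rule 'spam if calls_per_day > threshold'."""
    return sum((calls > threshold) != is_spam for calls, is_spam in data)


def learn_threshold(data: list[Example]) -> float:
    """Choose the cutoff with the fewest mistakes on the labeled data.

    Candidates are every observed value plus one value below the minimum,
    which covers every distinct way a single cutoff can split the data.
    """
    values = sorted({calls for calls, _ in data})
    candidates = [values[0] - 1.0] + values
    return min(candidates, key=lambda t: errors(t, data))


if __name__ == "__main__":
    old_data = [(5, False), (50, False), (150, True), (300, True)]
    new_data = [(5, False), (10, False), (30, True), (60, True)]  # spammers slowed down
    print(learn_threshold(old_data))  # 50
    print(learn_threshold(new_data))  # 10
    print(hard_coded_rule(60), learn_threshold(new_data) < 60)  # False True
```

When the spammers in the hypothetical new data place fewer calls, the hand-written rule keeps missing them, while re-running `learn_threshold` on the new data moves the cutoff down.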

How has the field of artificial intelligence changed over the past decade?

Artificial intelligence applications have increased, and I think they will have an ever-growing influence on every aspect of our daily lives. In the past, our focus with machine learning was simply to retrieve useful information to help humans make better decisions. These days, people want machines to do more than make predictions based on historical data; we want machine-learning applications to make smart decisions by themselves and take actions. One example is autonomous vehicles, which are equipped with sensors that collect information from the environment. Based on that information, the vehicle makes predictions, such as whether a car will change lanes or a pedestrian will jump into the road. If it predicts that this will happen, the vehicle stops. If the prediction turns out to be wrong, the vehicle’s machine-learning algorithms use that outcome to adjust themselves through a process of trial and error.
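A model that adjusts itself when its prediction turns out wrong is, in its simplest textbook form, the perceptron’s mistake-driven update. The sketch below only illustrates that learning principle; it is not how any real autonomous vehicle works. The two features (distance to the curb in meters, speed toward the road in meters per second), the labels (+1 for "stepped into the road", -1 otherwise) and the four examples are hypothetical.

```python
Features = list[float]


def predict(weights: Features, bias: float, features: Features) -> int:
    """Return +1 ("pedestrian will enter the road") or -1 ("will not")."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else -1


def perceptron_step(
    weights: Features, bias: float, features: Features, outcome: int, lr: float = 1.0
) -> tuple[Features, float, bool]:
    """Predict, compare with what actually happened, and adjust only on a mistake."""
    mistake = predict(weights, bias, features) != outcome
    if mistake:
        # Nudge the decision boundary toward classifying this example correctly.
        weights = [w + lr * outcome * x for w, x in zip(weights, features)]
        bias += lr * outcome
    return weights, bias, mistake


def train_online(
    examples: list[tuple[Features, int]], n_features: int, max_passes: int = 100
) -> tuple[Features, float]:
    """Replay the examples until a full pass produces no mistakes (or give up)."""
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(max_passes):
        mistakes = 0
        for features, outcome in examples:
            weights, bias, mistake = perceptron_step(weights, bias, features, outcome)
            mistakes += mistake
        if mistakes == 0:
            break
    return weights, bias


if __name__ == "__main__":
    # Hypothetical observations: [distance to curb (m), speed toward road (m/s)].
    observations = [
        ([0.5, 1.5], 1),   # close and moving toward the road: stepped in
        ([3.0, 0.0], -1),  # far away and standing still: did not
        ([0.3, 0.0], -1),  # close but standing still: did not
        ([2.5, 2.0], 1),   # farther away but moving fast: stepped in
    ]
    w, b = train_online(observations, n_features=2)
    for features, outcome in observations:
        print(features, "predicted", predict(w, b, features), "actual", outcome)
```

Because these four examples can be separated by a straight line, the perceptron stops making mistakes after finitely many updates; on data that is not separable it would keep adjusting indefinitely, which is why `max_passes` caps the loop.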

What are the most likely ways these technologies will affect our lives in coming years?

While many people are impressed by the accuracy and performance of current AI systems, many machine-learning researchers agree that we have a long way to go. Twenty years ago, we imagined that by 2020 we would have flying cars and pills that provide all the nutrition people need. We are not there yet, and I think the same thing is happening with AI. There is a lot of hype in machine learning, but, as a researcher, I think we are still at the very early stages. We don’t yet fully understand why machine-learning algorithms make accurate predictions, but new research papers published every year improve our understanding and introduce new techniques. Whatever the pace of progress, researchers also agree that more sectors will be employing AI systems (if they aren’t already), and that their impact on our daily lives will continue to grow.

We are slowly progressing toward a future in which everyone can trust the accuracy of computer algorithms. In some domains, AI systems already outperform humans. For example, in medical image analysis, AI can detect and identify features that human eyes cannot. Once we can trust the accuracy and safety of AI systems, many human jobs that are repetitive and routine in nature will be taken over by intelligent machines. This means many jobs will disappear. At the same time, new jobs will be created in which people can focus on more complex and creative tasks. To embrace AI systems in our daily lives and use them to benefit our society, we need to ensure that those systems guarantee privacy and are implemented ethically and responsibly.