
Georgia Tech at ICLR 2022

The International Conference on Learning Representations (ICLR), taking place April 25–29, is globally renowned for presenting and publishing cutting-edge research on all aspects of deep learning used in the fields of artificial intelligence, statistics, and data science, as well as in important application areas such as machine vision, computational biology, speech recognition, text understanding, gaming, and robotics.

Explore our interactive virtual experience of Georgia Tech research at ICLR and see where the future of deep learning leads.

Gain insight into advances in deep learning research by exploring Georgia Tech’s latest work: browse the papers or search for individual authors.

ICLR 2022
Lead Author Spotlight

Chen Liang, Ph.D. student in Machine Learning

No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models

#TRAINING LARGE TRANSFORMER MODELS
We propose a novel training strategy that encourages all parameters to be trained sufficiently. Specifically, we adaptively adjust the learning rate for each parameter according to its sensitivity, a robust gradient-based measure reflecting this parameter’s contribution to the model performance. A parameter with low sensitivity is redundant, and we improve its fitting by increasing its learning rate. In contrast, a parameter with high sensitivity is well-trained, and we regularize it by decreasing its learning rate to prevent further overfitting. We conduct extensive experiments on natural language understanding, neural machine translation, and image classification to demonstrate the effectiveness of the proposed schedule.
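The core update rule lends itself to a short sketch. The following PyTorch snippet is a minimal illustration of sensitivity-guided per-parameter learning rates, not the authors’ SAGE implementation: the exact sensitivity definition, the moving-average smoothing, and the rescaling rule below are simplifying assumptions made for clarity.

    import torch

    def sensitivity(p):
        # First-order sensitivity |theta * grad|: roughly how much the loss
        # would change if this individual parameter were zeroed out.
        return (p.detach() * p.grad.detach()).abs()

    def sensitivity_guided_step(params, ema, base_lr=1e-2, beta=0.9, eps=1e-12):
        # One plain-SGD update with a per-parameter learning rate:
        # low-sensitivity ("redundant") weights take larger steps, while
        # high-sensitivity (well-trained) weights take smaller ones.
        with torch.no_grad():
            for i, p in enumerate(params):
                if p.grad is None:
                    continue
                # Smooth the noisy mini-batch estimate with a moving average
                # (an assumption of this sketch, for robustness).
                ema[i].mul_(beta).add_(sensitivity(p), alpha=1 - beta)
                # Normalize within the tensor, then map to a step-size scale
                # in [0.5, 1]: near 1 for low sensitivity, near 0.5 for high.
                scale = 1.0 - 0.5 * ema[i] / (ema[i].max() + eps)
                p.sub_(base_lr * scale * p.grad)
                p.grad = None  # clear for the next backward pass

Plain SGD is used here so that rescaling the gradient is exactly equivalent to rescaling the learning rate; with an adaptive optimizer such as AdamW, the scaling would instead have to be applied to the final update.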

Q&A with Chen Liang

If you bumped into the plenary speaker at the conference, how would you describe your paper to them in 30 seconds?

Recent research has shown the existence of significant redundancy in large Transformer models. One can prune the redundant parameters without significantly sacrificing the generalization performance. However, we question whether the redundant parameters could have contributed more if they were properly trained. To answer this question, we propose a novel training strategy that encourages all parameters to be trained sufficiently. Specifically, we adaptively adjust the learning rate for each parameter according to its sensitivity, a robust gradient-based measure reflecting this parameter’s contribution to the model performance. A parameter with low sensitivity is redundant, and we improve its fitting by increasing its learning rate. In contrast, a parameter with high sensitivity is well-trained, and we regularize it by decreasing its learning rate to prevent further overfitting.
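To make “sensitivity” concrete: a standard first-order measure of this kind (stated here as an illustrative assumption; the paper gives the exact definition) approximates how much the loss would change if a single parameter θ_j were zeroed out:

    s_j = \left| \theta_j \, \frac{\partial \mathcal{L}}{\partial \theta_j} \right| \approx \left| \mathcal{L}(\theta) - \mathcal{L}(\theta \mid \theta_j = 0) \right|

The approximation follows from a first-order Taylor expansion of the loss around θ: a large s_j means zeroing the parameter would noticeably hurt the loss (it is well-trained), while s_j near zero marks the parameter as redundant.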

How did this work push you to grow technically, as part of a team, or in applying your work to the real world?

I worked on developing parameter-efficient large neural models from a novel perspective: instead of pruning redundant parameters, we train them sufficiently so that they also contribute to model performance. This new perspective has given me inspiration for future research.

What key takeaways would you share with people to help them remember your work?
  • Large Transformer models contain a significant number of redundant parameters, which can hurt the model’s generalization performance.
  • We propose a novel training strategy, SAGE, which encourages all parameters to receive sufficient training and ultimately become useful.
  • SAGE adaptively adjusts the learning rate for each parameter based on its sensitivity, a gradient-based measure reflecting the parameter’s contribution to model performance.
  • A parameter with low sensitivity is redundant, and we improve its fitting by increasing its learning rate; a parameter with high sensitivity is well-trained, and we regularize it by decreasing its learning rate (a toy training loop illustrating this rule follows this list).
  • SAGE significantly improves model generalization on both NLP and CV tasks, in both fine-tuning and training-from-scratch settings.
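For concreteness, here is how the hedged sketch above could drive a toy training loop; the model, data, and hyperparameters are placeholders, not anything from the paper.

    import torch

    model = torch.nn.Linear(10, 1)
    params = [p for p in model.parameters() if p.requires_grad]
    ema = [torch.zeros_like(p) for p in params]  # smoothed sensitivity state

    x, y = torch.randn(32, 10), torch.randn(32, 1)
    for step in range(100):
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        sensitivity_guided_step(params, ema)  # from the sketch above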
Career or personal advice you would want from your future self?

Career advice: What is the best decision you have made in the workplace?
Personal advice: How do you better manage work-life balance?


DISCOVER MORE LEAD AUTHORS

In Pursuit of Better Data Forecasting

Yao Xie is aiming to create higher confidence in the data models that drive societal outcomes

By David Mitchell

In countless fields from law enforcement to medicine and many in between, professionals look to make educated predictions about outcomes based on available data.

Consider police who may want to find two crimes committed by the same person. Can they find the needle in the proverbial haystack of countless police reports? How about a doctor trying to make decisions that will prevent patients from developing sepsis? Can they parse enough data – blood pressure, blood sugar, temperature, heart rate, and more – taken at various times to produce a prediction with high enough confidence to inform their potentially life-and-death decisions?

Current approaches typically rely on statistical models, but those are often simplified to keep confidence levels from plummeting. A new study by researchers in Georgia Tech’s H. Milton Stewart School of Industrial and Systems Engineering (ISyE), led by Yao Xie, associate professor in the school, offers a new approach using deep learning that could change the way decisions are made in such situations.


Georgia Tech research at ICLR includes some of the latest innovations in deep learning. Discover the people and get full access to details and papers in the main papers program.


Machine Learning Center at Georgia Tech

The Machine Learning Center was founded in 2016 as an interdisciplinary research center (IRC) at the Georgia Institute of Technology. Since then, we have grown to include over 190 affiliated faculty members and 145 Ph.D. students, all publishing at world-renowned conferences. The center aims to research and develop innovative and sustainable technologies using machine learning and artificial intelligence (AI) that serve our community in socially and ethically responsible ways. Our mission is to establish a research community that leverages the Georgia Tech interdisciplinary context, trains the next generation of machine learning and AI pioneers, and is home to current leaders in machine learning and AI.

