(Written in Fall 2024, during my Meta AI internship)
It’s 2024, only three years since LLMs became (somewhat) general-purpose and able to follow instructions. Who would have guessed how much AI would shake the whole world, to the point that the Nobel Prize in Physics was awarded for “laying the foundation for today’s powerful machine learning”, and the Nobel Prize in Chemistry was awarded for “using AI to predict proteins’ complex structures”?
It is kind of surreal for me to be part of this revolution. In this post, I want to share some of my observations (some of which may be bitter).
Before I started my PhD, I was working on frame semantics. I was not really satisfied with my work because, in the back of my mind, I couldn’t see how linguistics-driven solutions could scale up to solve NLP problems across languages. Note that linguistic tasks also often don’t come with enough data, so training models on linguistic data does not yield any significant performance gains; in today’s terms, they don’t pass the vibe check.
Anyone who transitioned from computational linguistics into LLM scaling at the time (around 2020–2022) must have come across the Stochastic Parrots paper and the hype around linguistics-driven language modeling. At that time, I was really confused about whether I should work with LLMs. Many linguists were saying that “language models are simply regurgitating what they are trained on”, and that “next-token prediction cannot learn language (and therefore cannot do incredible things like humans do)”.
Ultimately, I decided to lean fully into LLMs and believe in the magic of scaling. My decision eventually came down to “if AlexNet can replace hand-engineered kernels, large language models can do the same”. You might think it is a no-brainer from today’s point of view, but at the time, I was really conflicted when I made this conscious decision.
I would say I was (at least) part of the recent LLM revolution hype train. I contributed to the first instruction-following model (T0) and its follow-up series of multilingual instruction-following models such as mT0 and Aya. I also contributed the low-resource-language jailbreaking technique to the AI safety field, and I am currently working on alignment.
What is hyped stuff? Topics that can easily be classified as viral on X, the ones people cannot stop talking about. The cornerstone paper on such a topic blows up and gains thousands of citations in less than a year (think T0/Flan for instruction following, GPT-3 for in-context learning, chain-of-thought for reasoning, etc.).
Voices around hyped stuff are often loud for a period of time; things move really fast and the space is competitive. Hyped stuff usually catches the attention of people who have never worked on it directly before, and yet they turn into huge proponents of it in academic social circles after a short while.
LLMs were hyped stuff in 2021 and are a commodity now. Everyone now works with LLMs to some degree, regardless of how much they believed in LLMs in the pre-ChatGPT era. I’d say there are now two groups of people: (1) those who are excited to work on them (usually early adopters), and (2) those who begrudgingly have to work on them (because of reviewer #2 and where funding comes from).
<aside>
This is my bitter observation. If the hype train is right, you will have to join it directly anyway if you want to stay relevant in AI research. So why not join early? Hence, study the trend, think two steps ahead, and iterate fast (Omar Khattab has really good advice on this) in the direction of the hype train (the one you think is the right direction).
</aside>
One common argument against working on hyped stuff is that it is very easy to get scooped. Yes, that is inevitable. My multilingual toxicity work (in which we were trying to collect multilingual prompts that elicit toxicity) got scooped by Microsoft and AllenAI; they released their work around the same time, and we were only halfway to completion, so we had to halt our project entirely.
But I think people who complain about being scooped and hence shy away from hyped stuff miss the big picture. Hyped stuff is nascent by nature. There are exponentially more things that remain unexplored, and you can always find another research problem to pivot to.
In my case, we stopped our project, but the project (despite being abandoned) had enabled me to fully understand what problems hadn’t been studied and what resources were available for research, so I quickly banged out a paper in just a month, which got into EMNLP 2024.