I am a first-year PhD student in Linguistics at Stony Brook University. My research interests are in Natural Language Processing, Computational Linguistics, Machine Learning, and Computational Social Science.

I employ both computational and linguistic approaches to (A) finding better and more efficient ways of building robust NLP models with small data, an effort toward what Andrew Ng has termed data-centric AI; and (B) studying the properties of languages and the linguistic and social meanings that can be inferred from actual language use in large-scale corpora. I am intrigued by computational learning theory and have a strong intellectual interest in the efficiency, sparsity, and explainability of large pretrained language models (i.e., foundation models).

I am currently looking for a summer 2023 internship in ML, NLP, computational linguistics, language engineering, or a related area, in the US or Canada.


I was born and raised in Fuqing, a small city in southeastern China. Prior to coming to Stony Brook, I completed a bachelor's degree in Chinese Language and Literature at Hunan University and a master's degree in Applied Linguistics at the University of Saskatchewan.

I am a proud self-taught and self-motivated programmer. I started learning to program in 2020 and have since made programming first relevant to, and then part of, my daily life. Looking back, I am glad to find that my experience with NLP aligns well with the three major phases of the field: rule-based methods, statistical machine learning, and deep learning.

I am a highly curious person. When I "was young", I traveled a lot. Now I prefer to experience life in one place for a while, instead of rushing around for pictures. I also like reading and writing. Here is a collection of my essays in Chinese on law, history, sociology, literature, etc. Feel free to reach out!


Here is my CV.



Deep Learning

Text Processing

Web Scraping

  • Google Scholar Analyzer: Automatically aggregates the academic profiles of researchers on Google Scholar.
  • YouTube Info Collector: An interface for scraping information (video titles, post dates, view counts, like counts, comments, etc.) from YouTube videos based on queries, video links, or channel links.