I am a first-year PhD student in Linguistics at Stony Brook University. My research interests are in Natural Language Processing, Computational Linguistics, Formal Language Theory, Machine Learning, and Computational Social Sciences.

I work with linguistic corpora, both real and synthetic. My current research focuses on understanding how, and how well, neural networks learn and generalize, viewed through the lens of formal language theory and computational learning theory. I value the insights from the traditional symbolic learning literature and believe there is tremendous benefit in using them to examine and explain the capabilities of neural networks, particularly today's prevailing large language models (or foundation models).

Inspired by the data-centric AI approach advocated by Andrew Ng, I am also interested in finding better and more efficient ways to build robust NLP models from small data. My past research was highly applied, aiming to understand the social meanings of actual language use in both spoken and written linguistic data.

I am currently looking for a 2023 summer ML/NLP or related internship in the US or Canada.

Bio

I was born and raised in Fuqing, a small town in southeastern China. Before coming to Stony Brook, I completed a bachelor's degree in Chinese Language and Literature at Hunan University and a master's degree in Applied Linguistics at the University of Saskatchewan.

I am a proud self-taught and self-motivated programmer. I started learning to program in 2020 and have since made programming first relevant to, and then part of, my daily life. Looking back, I am glad to find that my experiences with NLP align well with the field's three major phases: rule-based (symbolic) methods, statistical machine learning, and deep learning.

Feel free to reach out if you are interested in talking with me!

CV

Here is my CV.

Research

Resources

Deep Learning

Text Processing

Web Scraping

  • Google Scholar Analyzer: Automatically aggregates the academic profiles of researchers on Google Scholar.
  • YouTube Info Collector: An interface for scraping information (video titles, post dates, view counts, like counts, comments, etc.) from YouTube videos based on search queries, video links, or channel links.

Chinese-related