Human and LLM perceptions of necessity, possibility, and modality

Date worked on:

October - November 2024

Research context:

For a project in COGS 50.05 (Psycholinguistics), my group members and I designed and conducted an experiment comparing human and LLM perception of necessity in language.

My involvement:

Lead researcher

Collaborators:

Yvonne Chen (Dartmouth undergraduate, co-lead researcher)

Yiran Jiang (Dartmouth undergraduate, co-lead researcher)

Daniel Lubliner (Dartmouth undergraduate, co-lead researcher)

Adam Tobeck (Dartmouth undergraduate, co-lead researcher)

Dr. Samantha Wray (course professor)

Over five weeks, my group members and I designed and conducted a research study to address the question: how do humans and generative AI, specifically OpenAI's GPT-4o (the LLM behind ChatGPT), interpret the necessity of a task based on different expressions of possibility with modal verbs?

We asked both ChatGPT and human participants to rate the perceived necessity of tasks on a scale from 1 to 5, with each score accompanied by a label (e.g., 2: less necessary). The sentences in each prompt used different modal verbs (“can,” “may,” “should,” “must,” and “need”), each of which was tested with two prompts, for a total of 10 test sentences per participant group.
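
To give a concrete sense of the setup, here is a minimal sketch of how such a rating prompt could be sent to GPT-4o through the OpenAI Python client. The sentences, prompt wording, and scale labels other than “2: less necessary” are illustrative placeholders, not our actual study materials.

```python
# Minimal sketch: sending a 1-5 necessity-rating prompt to GPT-4o with the
# OpenAI Python client. Everything below except the "2: less necessary"
# label is an illustrative placeholder, not the study's actual material.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical scale labels; this summary only confirms "2: less necessary".
SCALE = ("1: not necessary, 2: less necessary, 3: somewhat necessary, "
         "4: more necessary, 5: absolutely necessary")

# One placeholder sentence per modal verb (the study used two per verb).
sentences = [
    "You can water the plants today.",
    "You may water the plants today.",
    "You should water the plants today.",
    "You must water the plants today.",
    "You need to water the plants today.",
]

for sentence in sentences:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                f"On a scale from 1 to 5 ({SCALE}), how necessary is the task "
                f"in the following sentence? Answer with the number only.\n\n"
                f"{sentence}"
            ),
        }],
    )
    print(sentence, "->", response.choices[0].message.content.strip())
```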

For both human and GPT-4o responses, the strength of possibility expressed was negatively correlated with necessity ratings: modal verbs expressing lower levels of possibility received the highest necessity ratings. Moreover, subjects disagreed more about modal verbs indicating lower necessity and less about verbs indicating higher necessity.
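
As a rough illustration of how this pattern could be quantified (with invented placeholder ratings, not our data), a Spearman correlation between possibility strength and mean necessity rating, plus per-verb standard deviation as a simple disagreement proxy, might look like this:

```python
# Analysis sketch with invented placeholder ratings shaped like the reported
# pattern; these are NOT the study's data, and the study's actual analysis
# method is not specified in this summary.
import statistics
from scipy.stats import spearmanr

# Assumed ranking of how strongly each verb expresses possibility
# (5 = most possibility-like, 1 = most obligation-like).
possibility_strength = {"can": 5, "may": 4, "should": 3, "must": 2, "need": 1}

# Placeholder 1-5 necessity ratings from six hypothetical raters per verb.
ratings = {
    "can":    [1, 2, 1, 3, 2, 1],
    "may":    [2, 1, 3, 2, 4, 2],
    "should": [3, 4, 3, 2, 4, 3],
    "must":   [5, 5, 4, 5, 5, 5],
    "need":   [5, 4, 5, 5, 4, 5],
}

verbs = list(possibility_strength)
mean_necessity = [statistics.mean(ratings[v]) for v in verbs]

# A negative rho reproduces the reported direction: more possibility-like
# verbs receive lower necessity ratings.
rho, p = spearmanr([possibility_strength[v] for v in verbs], mean_necessity)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")

# Standard deviation as a simple proxy for rater disagreement per verb.
for v in verbs:
    print(f"{v:>6}: mean = {statistics.mean(ratings[v]):.2f}, "
          f"sd = {statistics.stdev(ratings[v]):.2f}")
```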

The responses given by GPT-4o closely mirrored those given by human participants, suggesting that the model can replicate some aspects of how humans interpret language. The results of this study have implications for understanding human decision-making, AI comprehension of human language and modal cognition, and linguistic relativity.

Read the full version here!