Human language reflects our social values, biases, and moral judgments. Large language models (LLMs) trained on extensive human texts may therefore learn or encode such information, allowing them to generate responses within moral and ethical domains. It is thus crucial to investigate whether LLMs exhibit human-like (including potentially biased or skewed) moral judgments. Recent moral psychology research suggests that humans tend to have stronger negative reactions toward, and attribute more blame to, intelligent autonomous machines than to fellow humans for identical harm. Here we examine whether LLMs (OpenAI’s GPT-3.5 and GPT-4) exhibit a similar bias against machines in the specific domain of driverless cars. We replicate experiments from two previous studies in the USA and China and find that GPT-4 (but not GPT-3.5), similar to human participants reported previously, consistently rates machine drivers as more blameworthy and causally responsible than human drivers for identical traffic harm (Study 1), and also rates machine drivers’ actions as more harmful and morally wrong than identical actions by human drivers (preregistered Study 2). This asymmetry in moral judgments is replicated across both LLMs and human participants in a new crash scenario that is unlikely to have been included in the LLMs’ training sets (preregistered Study 3). We discuss whether the blame bias against machines might be morally justified, and also propose that its presence in humans and LLMs may arise from different mechanisms.

Original publication

DOI

10.1080/10447318.2025.2526593

Type

Journal article

Journal

International Journal of Human Computer Interaction

Publication Date

01/01/2025