Road Rage Against the Machine: Humans and LLMs Share a Blame Bias Against Driverless Cars
Chu Y., Liu P., Savulescu J., Earp B.D.
Human language reflects our social values, biases, and moral judgments. Large language models (LLMs) trained on extensive human text may learn or encode this information, enabling them to generate responses in moral and ethical domains. It is therefore crucial to investigate whether LLMs exhibit human-like, and potentially biased or skewed, moral judgments. Recent moral psychology research suggests that humans react more negatively toward, and attribute more blame to, intelligent autonomous machines than to fellow humans for identical harm. Here we examine whether LLMs (OpenAI’s GPT-3.5 and GPT-4) exhibit a similar bias against machines in the specific domain of driverless cars. Replicating experiments from two previous studies conducted in the USA and China, we find that GPT-4 (but not GPT-3.5), like the human participants in those studies, consistently rates machine drivers as more blameworthy and causally responsible than human drivers for identical traffic harm (Study 1), and rates machine drivers’ actions as more harmful and morally wrong than identical actions by human drivers (preregistered Study 2). This asymmetry in moral judgment is replicated across both LLMs and human participants in a new crash scenario that is unlikely to have been included in the LLMs’ training sets (preregistered Study 3). We discuss whether the blame bias against machines might be morally justified, and propose that its presence in humans and in LLMs may arise from different mechanisms.