LLM Themes Are Not Observations
https://towardsdatascience.com/llm-themes-are-not-observations/ (towardsdatascience.com)
Using themes that an LLM extracts from text such as support calls in causal analysis is risky, because these "generated variables" are not direct observations: they are the output of a model applied to a self-selected group. The timing of when the text was created is crucial, since treating a post-treatment theme as a pre-treatment control variable can severely bias an analysis and even reverse its conclusions. These analyses are also often distorted by selection bias, because the customers who choose to generate text differ systematically from those who do not. Failing to account for these selection, timing, and measurement issues can produce flawed insights, such as making a beneficial business strategy appear actively harmful.
0 points•by will22•18 hours ago
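
The timing point in the summary can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not the article's own analysis: the function names (`simulate`, `ols_treatment_coef`), the variable names, and every coefficient are assumptions chosen for illustration. Treatment is randomized with a true effect of +0.5 on the outcome, and the "theme" score is generated after treatment from both the treatment and an unobserved customer trait that also drives the outcome.

```python
import numpy as np


def simulate(n=100_000, seed=0):
    """Generate synthetic data; all coefficients are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    t = rng.integers(0, 2, size=n).astype(float)   # randomized treatment (0 or 1)
    u = rng.normal(size=n)                         # unobserved customer trait
    # Theme score is created AFTER treatment: it depends on treatment and on u.
    theme = t + u + rng.normal(scale=0.5, size=n)
    # Outcome: the true treatment effect is +0.5; u also raises the outcome.
    y = 0.5 * t + 2.0 * u + rng.normal(size=n)
    return t, theme, y


def ols_treatment_coef(y, *columns):
    """Fit OLS with an intercept and return the coefficient on the first column."""
    X = np.column_stack([np.ones_like(y), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


t, theme, y = simulate()

# Treatment alone: randomization makes this estimate unbiased for +0.5.
unadjusted = ols_treatment_coef(y, t)

# Treatment plus the post-treatment theme, wrongly used as a control.
adjusted = ols_treatment_coef(y, t, theme)

print(f"unadjusted estimate: {unadjusted:.2f}")
print(f"adjusted for post-treatment theme: {adjusted:.2f}")
```

Under these assumed coefficients the unadjusted estimate stays close to the true +0.5, while the estimate that controls for the theme turns negative (about -1.1 in large samples). Holding the theme score fixed forces treated customers to have a lower unobserved trait than untreated ones, so a beneficial treatment looks harmful.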