What’s the Best Way to Brainwash an LLM?

https://towardsdatascience.com/whats-the-best-way-to-brainwash-an-llm/(towardsdatascience.com)

An experiment was conducted to determine the most effective way to fine-tune a language model to adopt a specific persona, C-3PO. Three Supervised Fine-Tuning (SFT) strategies were compared: training on conversational demonstrations, first-person statements, and third-person synthetic documents. Using perplexity and human evaluation, the study found that training the model on first-person statements was surprisingly the most effective method. This approach led to the most generalized and internalized persona, suggesting that updating a model's self-representation is a powerful technique for personality adoption.

0 points•by ogg•1 month ago

Comments (0)

No comments yet. Be the first to comment!

Want to join the discussion?