Twitter has been working for some time to foster healthier conversations on its platform, and so far those efforts appear to be paying off, as its user base grows every quarter.
One of the features Twitter has been researching to limit hate speech is a warning that may appear on a user’s screen when they are about to post a reply that could hurt someone. The company began testing this alert in 2020.
At the time, Twitter stated: “When things get heated, you may say things you don’t mean. To let you rethink a reply, we’re running a limited experiment on iOS with a prompt that gives you the option to revise your reply before it’s published if it uses language that could be harmful.”
These warnings had a positive impact on the quality of conversations during the tests, and this week Twitter announced the feature’s launch. The warnings can now be seen by all iOS and Android app users who use the app in English (Twitter suggests other languages will be supported later).
A system that, according to Twitter, works
If Twitter is rolling out this feature after a year of testing, it is because the company has measured its impact on user behavior. In fact, 34% of people who saw the warning chose to revise their reply or ended up not sending one at all. And after seeing these warnings once, people composed 11% fewer offensive replies in the future. Twitter also states that someone who received this warning is less likely to receive offensive responses in return.
Improvements have also been made to the mechanism itself. Twitter admits there were many false positives when testing began: people sometimes saw the social network’s warning unnecessarily “because the algorithms powering the prompts often struggled to capture the nuance in many conversations and often didn’t differentiate between potentially offensive language, sarcasm, and friendly banter.”
To fix this, Twitter’s algorithm now takes the relationship between two users into account. If the author of a tweet and the person about to reply follow each other and interact regularly, the algorithm factors this in when deciding whether to trigger the alert.
Twitter also says it has improved its algorithm’s ability to recognize when certain words are not being used in an offensive way. In addition, Twitter collects user feedback, letting users indicate whether triggering the prompt was relevant or not.
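The signals described above — an offensiveness score, the relationship between the two users, and feedback on past prompts — could be combined in a decision rule along these lines. This is purely an illustrative sketch; the function name, signals, and thresholds are assumptions, not Twitter’s actual implementation.

```python
# Hypothetical sketch of the prompt-triggering logic described in the article.
# All names and threshold values are illustrative assumptions.

def should_show_prompt(toxicity_score: float,
                       mutual_follow: bool,
                       recent_interactions: int,
                       irrelevant_feedback: int = 0) -> bool:
    """Decide whether to show the 'review your reply' prompt.

    toxicity_score      -- model's 0-1 estimate that the reply is offensive
    mutual_follow       -- do the author and the replier follow each other?
    recent_interactions -- number of recent exchanges between the two users
    irrelevant_feedback -- times this user flagged past prompts as irrelevant
    """
    threshold = 0.8  # base trigger threshold on the 0-1 score

    # Relationship signal: mutual follows with regular discussion raise
    # the bar, reducing false positives on sarcasm and friendly banter.
    if mutual_follow and recent_interactions >= 3:
        threshold += 0.1

    # User feedback that past prompts were irrelevant also raises the bar,
    # capped so the prompt can still fire on clearly harmful language.
    threshold += min(irrelevant_feedback * 0.02, 0.05)

    return toxicity_score >= threshold
```

With this sketch, a borderline reply to a stranger triggers the prompt (`should_show_prompt(0.85, mutual_follow=False, recent_interactions=0)` returns `True`), while the same reply between two users who chat regularly does not, mirroring the false-positive fix the article describes.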
Here is the message Twitter displays when the algorithm believes a user is about to post an offensive reply: “Want to review this before Tweeting? We’re asking people to review replies with potentially harmful or offensive language.”