Would you like to know how two WhatsApp chat bots got into a fight?
Being an engineer at a tech startup, you get to witness many strange occurrences. This is one of the funniest I’ve come across.
BusinessChat.io is a customer support and marketing platform built on WhatsApp. We have a chat bot engine and a marketing campaigns module that lets our users send a message to tens of thousands of their customers with the click of a button. A user uploads a list of contacts, assigns them to a group, and reaches out to all of them by picking a message template and hitting “Send”.
When a contact replies to a marketing message, they enter a conversation with the chat bot. WhatsApp bots usually reply with quick-reply messages: interactive messages with buttons the contact can tap to tell the bot what they want. For example, the bot might ask the contact “How can I help you today?”, with buttons saying “Talk to customer support” and “Find our nearest branch”.
With BusinessChat, if the bot prompts you to tap a button and you respond with something it doesn’t understand (i.e., text that doesn’t match any of the buttons it expects), the bot replies “Sorry, I don’t understand” and prompts you to tap one of the buttons again.
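That fallback rule can be sketched in a few lines of Python. This is a hypothetical illustration, not BusinessChat’s actual code: the `QuickReplyPrompt` type, the `handle_reply` function, and the exact-match comparison against button titles are all assumptions. A reply that matches a button routes the conversation onward; anything else gets the fallback text followed by the same prompt again.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

FALLBACK_TEXT = "Sorry, I don't understand"


@dataclass
class QuickReplyPrompt:
    """Hypothetical prompt node: the question text and the button titles it accepts."""
    text: str
    buttons: list[str]


def handle_reply(prompt: QuickReplyPrompt, reply: str) -> tuple[Optional[str], list[str]]:
    """Return (matched_button, messages_to_send) for a contact's reply.

    A reply that exactly matches a button title (an assumption) is treated as a
    tap on that button, so the bot routes onward and sends nothing here.
    Any other text gets the fallback message followed by the same prompt again.
    """
    if reply in prompt.buttons:
        return reply, []
    return None, [FALLBACK_TEXT, prompt.text]


prompt = QuickReplyPrompt(
    text="How can I help you today?",
    buttons=["Talk to customer support", "Find our nearest branch"],
)
print(handle_reply(prompt, "Find our nearest branch"))  # ('Find our nearest branch', [])
print(handle_reply(prompt, "hello"))  # (None, ["Sorry, I don't understand", 'How can I help you today?'])
```

Note that the fallback path always produces another message, which matters if the “contact” on the other end is itself a bot.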
So, one peaceful day, a colleague of mine taps me on the shoulder and says, “There’s a conversation that takes forever to load.”
For some context, in BusinessChat, a conversation is the sequence of events and messages exchanged between a business and one of its contacts. You can load the conversation for a specific contact in the UI. From the beginning of the project, we decided these messages and events weren’t worth paginating or lazy-loading: the longest conversation we had ever seen was about 500 messages long, and individual messages are small. So for a long time, loading whole conversations without pagination had been fast and cheap.
My colleague and I began our investigation. We started by checking the number of messages in the troublesome conversation and, to our surprise, it had 90,000 of them.
My mind started generating hypotheses. “Is there a bug in the retry logic that endlessly retries a failed message? No, it’s hard-coded to retry only 3 times.” And: “Is there an infinite loop in one of the bots? Impossible, I validate that bots have no invalid cycles.”
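The second hypothesis rests on a graph check. Here is one way such a check could work, assuming a hypothetical model in which a bot is a directed graph of named nodes and a cycle counts as “invalid” only if none of its nodes waits for the contact’s input. `has_invalid_cycle` and the node names are made up for illustration; the function runs a depth-first search over the non-waiting nodes and reports any back edge.

```python
from __future__ import annotations


def has_invalid_cycle(edges: dict[str, list[str]], waits_for_input: set[str]) -> bool:
    """Return True if the bot can loop forever without waiting for the contact.

    Hypothetical model: a cycle is invalid only if none of its nodes pauses for
    user input. Such a cycle exists iff the subgraph of non-waiting nodes has a
    cycle, which we detect with a recursive DFS using three node states.
    """
    UNVISITED, ON_PATH, DONE = 0, 1, 2
    nodes = {n for n in edges if n not in waits_for_input}
    state = {n: UNVISITED for n in nodes}

    def visit(node: str) -> bool:
        state[node] = ON_PATH  # node is on the current DFS path
        for nxt in edges.get(node, []):
            if nxt not in nodes:
                continue  # waiting nodes (or dead ends) cannot close an invalid cycle
            if state[nxt] == ON_PATH:
                return True  # back edge: a cycle among non-waiting nodes
            if state[nxt] == UNVISITED and visit(nxt):
                return True
        state[node] = DONE  # fully explored, no invalid cycle through here
        return False

    return any(state[n] == UNVISITED and visit(n) for n in nodes)


# Hypothetical flow: "menu" shows the buttons, "fallback" sends the apology.
edges = {
    "start": ["menu"],
    "menu": ["support", "branch", "fallback"],
    "fallback": ["menu"],
    "support": [],
    "branch": [],
}
print(has_invalid_cycle(edges, waits_for_input={"menu"}))  # False
print(has_invalid_cycle(edges, waits_for_input=set()))  # True
```

Under this model, a “Sorry, I don’t understand” loop that passes through a node waiting for a reply is considered valid, since the contact has to answer before the bot speaks again.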
About five stressful minutes into the investigation, all doubts were erased when we looked at the actual content of the conversation. It consisted entirely of messages saying “Sorry, I don’t understand”, sent not only by our bot but also by the other number. It was two bots telling each other, 45,000 times each, that they couldn’t understand one another.
What triggered this battle of bots was that one of our users had imported a list of contacts in which one of the numbers was connected to a bot. When the user started a marketing campaign and sent a message to this number, war broke out and lasted 3 hours until, presumably, someone turned off the other bot.
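The mechanics of the standoff can be reproduced with a toy simulation. Assumptions: both bots use the same fallback text, the other bot’s buttons (`"Yes"`, `"No"`) and the campaign text are invented, and `max_messages` exists only so the simulation stops; in the real incident nothing capped it. Because neither bot ever sends one of the other’s button titles, every message triggers another fallback.

```python
from __future__ import annotations

from typing import Callable, Optional

FALLBACK_TEXT = "Sorry, I don't understand"

Bot = Callable[[str], Optional[str]]


def make_fallback_bot(expected_buttons: list[str]) -> Bot:
    """Build a toy bot that only understands its own button titles."""
    def reply(message: str) -> Optional[str]:
        if message in expected_buttons:
            return None  # understood: this toy bot stops replying
        return FALLBACK_TEXT  # anything else gets the fallback message
    return reply


def simulate(first_message: str, bot_a: Bot, bot_b: Bot, max_messages: int) -> list[str]:
    """Send first_message from bot_a's side, then let the bots answer each other."""
    transcript = [first_message]
    message, responder = first_message, bot_b
    while len(transcript) < max_messages:
        answer = responder(message)
        if answer is None:
            break  # one side understood, so the exchange ends
        transcript.append(answer)
        message = answer
        responder = bot_a if responder is bot_b else bot_b  # take turns
    return transcript


our_bot = make_fallback_bot(["Talk to customer support", "Find our nearest branch"])
their_bot = make_fallback_bot(["Yes", "No"])  # hypothetical buttons

# Hypothetical campaign text; only the cap stops this exchange.
transcript = simulate("Check out our new offers!", our_bot, their_bot, max_messages=10)
print(len(transcript))  # 10
```

At the scale in the story, 90,000 messages in about 3 hours works out to roughly 8 messages per second.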
I would like to believe that someone really did notice the infinite loop and turned off the bot. The alternative would be that our bot took down their service, which I really hope wasn’t the case.