Few lines of script to help devs to learn orthography
In our remote company we are heavily using text messaging tools, we’ve tried Skype, then HipChat and then Slack which we are using until now. One of the coolest features in Slack is its scripting capability. Moreover Github built Hubot – a bot running our scripts. It can be easily integrated with Slack. You deploy bot on Heroku, put custom scripts in proper directory and that’s it.
We are using Slack for some time and I’ve planned to play with custom scripting since beginning of Slack in SoftwareMill. Yet I didn’t have the any idea good enough that I would love to implement it right away. That was the case until last week when I’ve noticed that we are making quite a lot of orthographic mistakes. And when you often see words written with error, your internal ‘correct or not’ detector starts to go haywire – you are no longer sure if that word should be written this way or another. So I’ve decided to write a script that will try to spot such errors, correct and comment them in a gentle way :)
module.exports = (robot) -> robot.hear <regexp>, (msg) -> # Code reacting on any message written in any chat room that matches <regexp> msg.send 'Example response to chat room' robot.respond <regexp>, (msg) -> # Code reacting on any message written to our bot that matches <regexp> msg.send 'Example response 2 to chat room'
Because we are interested in watching every room (channel) we will only play with hear method.
At the beginning I wanted to implement something simple to check that my idea working and to see if it is easy to write and run custom script on Hubot. And after that, I could make my script more sophisticated, etc. So at first I’ve created something like this:
module.exports = (robot) -> robot.hear prepareErrorDetectingRegEx(), (msg) -> author = msg.message.user.name grammarFailure = msg.match exclamationSentence = msg.random messages msg.send '@' + author + ', ' + exclamationSentence + '! It should be *' + errors[grammarFailure.toLowerCase().trim()] + '*' prepareErrorDetectingRegEx = () -> errorWords =  for k, v of errors errorWords.push k joinedErrors = errorWords.join('|') errorDetectingRegex = new RegExp('.*(' + joinedErrors + ').*', 'i'); return errorDetectingRegex # key-value pair where key is an error and value contains correct word errors = 'wziąść' : 'wziąć' 'wziasc' : 'wziąć' 'wziaśc' : 'wziąć' 'wziasć' : 'wziąć' 'pokarze' : 'pokażę' 'pokarzę' : 'pokażę' 'żądzić' : 'rządzić' 'żadzić' : 'rządzić' 'żadzic' : 'rządzić' 'ządzic' : 'rządzić' messages = ['come on', 'gimme a break', 'are you serious?'] # for a Polish list of 'gentle' messages check source code (link at the end of the post) :-)
Entry point of our script is in robot.hear function. Wee are creating regEx in prepareErrorDetectingRegEx which loads all errors and producing OR combo with all possible mistakes. This regular expression seemed fine at first, but then turned out to be not so perfect :)
When any text written in chat matches our pattern, we know that we have our “victim”. Now we can extract author name, find correct version of word and prepare complete response like:
@user, gimme a break! It should be <correctVersionOfWordWrittenInBold>
After first release, it was time to test script on production and see how it deals with messages posted during normal work day. And of course, it turned out that many common errors are missing in errors object but also that first version of regular expression is far from perfect. It was detecting mistakes in a properly written words, for example if we had pair ‘eror’: ‘error’ it was firing also for messages like ‘abcEror’ or ‘erorSomething’.
This one can be fixed by new, better pattern checking only for separate words (or start/end of sentence)
prepareErrorDetectingRegEx = -> errorWords =  for k, v of errors errorWords.push k joinedErrors = errorWords.join('|') new RegExp '(^|\\s)(' + joinedErrors + ')($|\\s)', 'i'
That was the first issue, the second one was rather on code level: in Polish we have these national characters like “ąęłśćżźóń” but some people write words without them. And to catch all possible combinations, each word had to be added to errors in many versions. Below you can see example listing with only three errors but using all combinations requires 10 records:
errors = 'wziąść' : 'wziąć' 'wziasc' : 'wziąć' 'wziaśc' : 'wziąć' 'wziasć' : 'wziąć' 'pokarze' : 'pokażę' 'pokarzę' : 'pokażę' 'żądzić' : 'rządzić' 'żadzić' : 'rządzić' 'żadzic' : 'rządzić' 'ządzic' : 'rządzić'
This approach is tedious but also much more error-prone. To make adding more records easier, I have to simplify algorithm a bit. And when I was discussing this with Szimano he suggested escaping all polish characters first and after that applying regEx to detect errors. This approach was very easy to implement: first we accept all messages using /.*/ pattern, then replace all national characters with their standard versions:
replacePolishChars = (text) -> text.toLowerCase() .replace('ą', 'a') .replace('ć', 'c') .replace('ę', 'e') .replace('ł', 'l') .replace('ń', 'n') .replace('ó', 'o') .replace('ś', 's') .replace('ż', 'z') .replace('ź', 'z')
Now we don’t have all those different combination of one error word and our example errors list looks much simpler:
errors = 'wziasc' : 'wziąć' 'pokarze' : 'pokażę' 'zadzic' : 'rządzić'
So after two iterations we have a stable script doing what we want and keeping our orthography skills sharp. Complete source code is available at GitHub.
But as we are using this script, some new ideas started appearing and landed in my backlog for future versions:
Store most often misspelled words in a database
Store users making most mistakes and print ‘hall of fame’ table
Allow user to add new entries to errors directly from chat rooms
As you can see, scripting in text communicators like Slack or HipChat is really powerful and easy to deploy tool, you are only limited by your creativity. And you don’t need hundreds line of codes to write something useful.
Further reading: Hubot Scripting Guide