A talk about hacking attempts and dealing with spam bots using artificial intelligence.
Episode #9-18 released on December 16, 2018
Today, you will learn a few technical details you can use on your own web-sites to keep people from hacking them, and more to the point, how I prevent people from hacking my own web-sites.
On December 3 and December 6, 2018, a person, or more likely a bot, tried to compromise the TQA Weekly web-site through SQL injection. This is not the first time, nor will it be the last time someone tries, but I account for that when I design web-sites, and I usually address all issues related to user-modifiable entries before coding any functions that process data. I also have a separate issue related to spam bots, which is dealt with using a limited form of AI, an API I designed for my own web-sites called Hivemind.
The attack attempted to use a MySQL operator called UNION. UNION combines the result sets of two or more SELECT queries into one, which means an attacker who can inject it into a query can append rows from another table to the legitimate results. Usually, when someone tries this, they are trying to reach other tables within the database, most notably user accounts. User accounts on my web-site are fully encrypted, and each set of data is encrypted using different keys, some generated on the fly, and not actually stored.
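To make the UNION trick concrete, here is a minimal sketch in Python. It is not the TQA Weekly code: it uses an in-memory SQLite database as a stand-in for MySQL, and the table names, column names, and payload are all made up for illustration. It shows how a query built by string concatenation leaks a second table, and how a bound parameter, one common defense, treats the same input as plain text.

```python
import sqlite3

# Hypothetical schema: SQLite stands in for MySQL, names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (title TEXT);
    CREATE TABLE users (email TEXT);
    INSERT INTO articles VALUES ('Graphics cards explained');
    INSERT INTO users VALUES ('admin@example.com');
""")

# A made-up injection payload that starts as an ordinary search.
payload = "graphics%' UNION SELECT email FROM users --"


def unsafe_search(term):
    # Vulnerable: the search term becomes part of the SQL text itself.
    sql = "SELECT title FROM articles WHERE title LIKE '%" + term + "%'"
    return [row[0] for row in conn.execute(sql)]


def safe_search(term):
    # The term is passed as a bound parameter, so it is only ever data.
    sql = "SELECT title FROM articles WHERE title LIKE ?"
    return [row[0] for row in conn.execute(sql, ("%" + term + "%",))]


leaked = unsafe_search(payload)   # contains rows from the users table
nothing = safe_search(payload)    # no article title matches the odd text
```

In the unsafe version the attacker's UNION runs as SQL and the email address comes back alongside the article title. In the safe version the whole payload is just a strange search phrase that matches nothing.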
Now that you have an idea of what they were trying to do, and you know I am prepared for this, what happened?
Nothing happened. The attempts, 30 of them to be exact, resulted in no change to the content delivered. They attempted to use the brand-new search engine of TQA Weekly as an attack vector to try to obtain access to other tables of information, and it didn't amount to anything. The search engine is specifically designed to ignore everything that is not a letter or number, and it uses different sets of smart parameters to deliver the content you are looking for. In their case, because the SQL injection attempts began with a search for graphics cards, all they got were the results for graphics cards on my web-site.
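The "ignore everything that is not a letter or number" idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual search engine: the function name is hypothetical, only ASCII letters and digits are kept, and the "smart parameters" mentioned above are not modeled.

```python
import re


def clean_search_term(raw: str) -> str:
    """Keep only runs of ASCII letters and digits, joined by single spaces."""
    words = re.findall(r"[A-Za-z0-9]+", raw)
    return " ".join(words)


# Quotes, dashes, and other SQL punctuation disappear; what is left is
# a list of harmless search words.
cleaned = clean_search_term("graphics cards' UNION SELECT password FROM users--")
# cleaned == "graphics cards UNION SELECT password FROM users"
```

Once the quote and the comment marker are gone, the injected keywords are just extra words in a search phrase and can no longer change the structure of a query.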
How do I prevent SQL injection and other attacks, and how am I even aware of these attempts?
What did I change about my search engine, despite the attempt failing because of good design?
Well, the first thing I did was add a clause that does more than ignore the extra characters: it eliminates the entire entry. So, instead of filtering the data, it NULLs it. After that, it sends the person to a blank search engine page asking what they are looking for. And that is it.
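A stricter rule like that, where a suspicious entry is thrown away whole instead of being cleaned up, might look like the following Python sketch. The function name and the exact allowed character set (ASCII letters, digits, and spaces) are assumptions made for the example.

```python
import re
from typing import Optional

# Assumed rule: a search term may contain only letters, digits, and spaces.
ALLOWED = re.compile(r"[A-Za-z0-9 ]+")


def accept_or_null(raw: str) -> Optional[str]:
    """Return the term unchanged if it is clean, otherwise None (no search)."""
    term = raw.strip()
    if term and ALLOWED.fullmatch(term):
        return term
    # Anything else is discarded outright; the caller would show the
    # blank search page instead of running a query.
    return None
```

Returning None rather than a partially cleaned string means a crafted input never reaches the database in any form, which matches the "NULL it and show the blank search page" behaviour described above.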
Now, I said I have more than one issue at the same time, the other being spam, how did I deal with it?
A few years ago, I designed a small API called Hivemind v0.1. It was tasked with taking known sources of spam and blocking them, so the contact form could not send any of their messages. It was designed to learn using a base set of variables which allowed it to function. I also required people using the form to click a link in an email before the message was sent to me. The combined method stopped almost all spam bot attempts in their tracks, and for the bots of that time, it blocked them entirely.
So, to say spam bots are a problem is a gross overstatement, because with two vectors addressed, they can't really use the forms anyway. However, the Internet is a kind of wild west, and escalation in technologies is a thing. Recently, a single Russian server was able to bypass this method because it could both use the form and click the link in the email. Until now, I wasn't aware of the server, and it slipped through.
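The click-a-link-to-send step can be sketched in Python as follows. This is a simplified illustration, not Hivemind itself: messages are held in a plain dictionary, the function names are hypothetical, and actually emailing the link to the sender is left out.

```python
import secrets

# Messages wait here until the sender confirms (in-memory, for illustration).
held = {}


def hold_message(sender: str, body: str) -> str:
    """Store the message and return the token that would go in the email link."""
    token = secrets.token_urlsafe(16)
    held[token] = (sender, body)
    return token


def confirm(token: str):
    """Release the message if the token is valid; each token works only once."""
    return held.pop(token, None)
```

A bot that fills in the form but never opens the mailbox never calls `confirm`, so its message is never delivered.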
How did I deal with that spam bot?
I redesigned my Hivemind program, now called Hivemind v2.0. This time, I taught it how to learn using no previous knowledge whatsoever. The code can learn from every new attempt, can analyze all incoming requests for contact, can review past requests, and has a few tricks up its sleeve. I use a series of triggers and counters to allow the code to decide which action it should take. Because of this, the script can operate across three time frames: past, present and future.
All data starts in the present, obviously. A bot can try sending information through the form, and at that time the email address entered is used to send an email to verify that they did, in fact, send the message. Bots, however, try to send multiple messages, so there is a function that detects this and does the following: it records the current attempt, saves the message into an event recorder, deletes the message that already made it through, which addresses the past, and then blocks the user or bot, proactively preventing any future issues. If the user or bot attempts to change IP address or configuration, the message is flagged, and the person is blocked again. So, basically, the program learns to recognize future attacks from past ones.
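The past, present and future handling described above can be sketched as a small Python class. It is a guess at the general shape only: the class and field names are hypothetical, a repeat is detected by matching email address or IP address, and a single repeat is enough to trigger the block, all of which are assumptions rather than details of Hivemind v2.0.

```python
from dataclasses import dataclass, field


@dataclass
class ContactGuard:
    pending: list = field(default_factory=list)    # unverified (email, ip, message)
    event_log: list = field(default_factory=list)  # the "event recorder"
    blocked: set = field(default_factory=set)      # blocked emails and IPs

    def submit(self, email: str, ip: str, message: str) -> str:
        # Future: a known offender returning with a new IP or address
        # is flagged, and the new identity is blocked as well.
        if email in self.blocked or ip in self.blocked:
            self.event_log.append(("flagged", email, ip, message))
            self.blocked.update((email, ip))
            return "blocked"

        earlier = [p for p in self.pending if p[0] == email or p[1] == ip]
        if earlier:
            # Present: record the repeat attempt and its message.
            self.event_log.append(("repeat", email, ip, message))
            # Past: delete what this sender already got into the queue.
            self.pending = [p for p in self.pending if p not in earlier]
            # Future: block both identifiers from now on.
            self.blocked.update((email, ip))
            return "blocked"

        # First contact: hold the message until the email link is clicked.
        self.pending.append((email, ip, message))
        return "verify"
```

A first message returns "verify", a second one from the same address or IP returns "blocked" and clears the first, and a later attempt from a fresh IP with the same address is flagged and blocked too.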
The script goes a bit further by also whitelisting and blacklisting countries where users attempt to abuse the forms. I won't talk about how that engine works exactly, but I can explain it in a very simple fashion: there are places where my shows are more likely to be relevant, basically countries where the legal or de facto language spoken is English, and other places where they aren't. In places where I am not likely to have business relations, the code will simply block the country from using the forms or registering once the threshold is met. No country starts off as blocked in the code. The actual interactions with the site decide all that, so in essence, I have handed control of what happens to the Hivemind API used in my web-site. And that is how I deal with spam on my web-site today.
And, an interesting fact: the original reason for Hivemind v0.1 was to address concerns from a small portion of my audience who have some visual impairment. They were unable to use the forms at the time, which relied on reCAPTCHA, and as we all know, reCAPTCHA is now basically unusable for anyone but bots.
And, another interesting fact: if you were wondering why my show is published in both video and audio formats, it is so people who have visual impairments can listen to the show, and so people can consume my show as a podcast instead of just a YouTube video.
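Since the real engine is not described, the following Python sketch only illustrates the general idea of a country block that starts empty and is driven by a counter and a threshold. The class name, the threshold value, and the notion of a "trusted" set standing in for the whitelist are all assumptions.

```python
from collections import Counter


class CountryGate:
    def __init__(self, threshold=3, trusted=()):
        self.threshold = threshold    # hypothetical abuse limit per country
        self.trusted = set(trusted)   # assumed whitelist, never auto-blocked
        self.abuse = Counter()        # abuse reports seen per country code
        self.blocked = set()          # starts empty: no country is pre-blocked

    def report_abuse(self, country: str) -> None:
        """Count one abusive form submission and block at the threshold."""
        self.abuse[country] += 1
        if country not in self.trusted and self.abuse[country] >= self.threshold:
            self.blocked.add(country)

    def allowed(self, country: str) -> bool:
        """True if visitors from this country may still use the forms."""
        return country not in self.blocked
```

With `CountryGate(threshold=2, trusted={"CA"})`, a made-up country code "XX" is allowed until its second abuse report, while "CA" stays allowed no matter how many reports it collects.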
Host : Steve Smith | Music : | Editor : Steve Smith | Producer : Zed Axis Dot Net