Mobile Security Application | System Design | HLDs with use cases
For almost all of us, the last few months have drastically increased our dependence on mobile devices. Through the increased use of corporate mobile apps, virtual private networks (VPNs), internet hotspots, and other applications, mobile devices are more ubiquitous than ever.
Because of this sudden and unprecedented dependence on mobile capabilities, mobile security should be at the forefront of everybody’s minds, not just the minds of security professionals.
In this article, we will develop a high-level system design for a mobile security application that protects against cyber attacks and provides an extensive analytics system for reporting malicious activity.
In the architectural design of our system, we focus not only on protecting users against cyber attacks, bullying, phishing and other security threats, but also on keeping the system highly available, storing data effectively, integrating a firewall, optimising throughput and keeping the system cost-effective.
The scope of the design is to provide the infrastructure architecture for our project. It covers the important features, the control of cyber attacks, and a highly available, robust environment. The design uses AWS-managed resources to meet cloud infrastructure and storage requirements.
Some of the components which require high availability and almost zero downtime will run on a Kubernetes-powered cluster.
The design also covers evaluation and analytics: scores, accuracy metrics and data analysis, produced with machine-learning algorithms, to give better insight into how the different protocols behave.
Authentication Module API
This module handles login and authentication for users of our software services, so that access attempts by unauthorised users can be rejected. It also monitors users’ login attempts for security purposes.
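As a minimal sketch of the login monitoring idea, the snippet below counts failed attempts per user and rejects further logins once a limit is reached. The `LoginMonitor` class and the limit of three attempts are assumptions made for illustration; the design itself does not specify a lockout policy.

```python
from collections import defaultdict

MAX_FAILED_ATTEMPTS = 3  # assumed limit, not taken from the design


class LoginMonitor:
    """Hypothetical helper: tracks failed logins and locks noisy accounts."""

    def __init__(self, max_failed=MAX_FAILED_ATTEMPTS):
        self.max_failed = max_failed
        self.failed = defaultdict(int)  # user_id -> consecutive failures

    def is_locked(self, user_id):
        return self.failed[user_id] >= self.max_failed

    def record_attempt(self, user_id, success):
        """Return True if the login is accepted, False otherwise."""
        if self.is_locked(user_id):
            return False  # locked accounts are rejected outright
        if success:
            self.failed[user_id] = 0  # a good login resets the counter
            return True
        self.failed[user_id] += 1
        return False


monitor = LoginMonitor()
for _ in range(3):
    monitor.record_attempt("alice", success=False)
print(monitor.is_locked("alice"))
```

In a real deployment the counters would live in the user database rather than in memory, and a lock would normally expire after some time.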
Web Application Firewall(WAF)
A web application firewall helps protect our application by filtering and monitoring HTTP traffic between the application and the internet. It also protects the app from attacks such as cross-site request forgery, cross-site scripting, file inclusion and SQL injection, among others.
AWS WAF is a great tool that can help protect our Application and its APIs against common web exploits that may affect availability, compromise security, or consume excessive resources.
Load balancers distribute network and application traffic efficiently across multiple servers so that the load is spread evenly and no single server is overloaded. In the proposed system design, load balancers are incorporated into application delivery controllers (ADCs), which can improve the performance of microservices.
Given the complexity of this design, a round-robin algorithm can be used to hand incoming requests to the servers in turn.
We can also deploy multiple load balancers, which allows more connections to be served at the same time and provides failover if any one load balancer goes down.
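To make the round-robin idea concrete, here is a small sketch that rotates through a list of backends and skips any that are marked as down. The class name, the server names and the health-marking methods are hypothetical; a production setup would rely on the load balancer's own health checks.

```python
class RoundRobinBalancer:
    """Hypothetical round-robin picker that skips unhealthy backends."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server in rotation."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
first_round = [lb.pick() for _ in range(4)]
lb.mark_down("app-2")  # simulate a failed backend
after_failure = [lb.pick() for _ in range(4)]
print(first_round)
print(after_failure)
```

The same rotation logic applies one level up when several load balancers sit behind a shared entry point.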
Device Service Server(DSS)
This server is introduced between our software user interface console and Global Storage Database system, in which the admin can use DSS API to query this database and display the information on UI Console.
This information can range between anything from user information to user app usage data and app settings.
Kafka Rest API and Server
Considering a large volume of data coming from a large number of applications of different users and that too repeatedly at a fixed interval of time, this design focusses on two main challenges:-
- Collecting large volume of data
- Analysing the collected data
To overcome these challenges, we use Kafka, accessed through the Kafka REST API. Kafka is designed for distributed, high-throughput systems, and its built-in partitioning, replication and inherent fault tolerance make it a good fit for large-scale message processing.
Also, on further analysing the system requirements, Kafka can be integrated with Apache Storm and Apache Spark, for stream processing and batch processing respectively.
Data and Log Analytics
Providing meaningful log analytics is an essential part of any application that has enormous amount of log data. These logs from various sources are useful for getting valuable insights and information.
For this purpose, we have designed an effective solution for logging infrastructure and getting meaningful data out of it.
Data will be collected and transmitted using Kafka, which buffers and queues it.
Logstash will consume the data from the Kafka topic, process it using pre-defined rules and send it to Elasticsearch.
Elasticsearch will then index and map the complete log data.
Finally, we use Kibana for visualising the mapped data on the UI Console where Data Analysts can observe and analyse the data analytics of each user.
Alert System Module
To notify users of suspicious activity and give them information on protective measures, an alert system is integrated into our design for both Android and iOS users.
To serve this purpose, we use:-
- Firebase Cloud Messaging for Android Users, and
- Apple Push Notification service for iOS users.
These services also let us customise whether notifications are shown when the app is in the foreground or running in the background.
Hourly-Data-Collect-Service API and Daily File Generator
- This API will GET all application logs, app settings, data usage logs, custom data features for the web crawler, file scan statistics for malware, etc. every x hours (let’s say every 3 hours).
- The Daily File Generator will then create a daily log file by appending the hourly data.
- This daily file will be processed and analysed using a number of techniques and security features. A rank and a cumulative score will be computed for each app, and if the rank/score is higher than the threshold limit, our system will alert the user to suspicious activity.
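The hourly-to-daily step can be sketched as below: each hourly batch is appended to one CSV, and the header is written only once. The column names and sample rows are made up for illustration, and an in-memory buffer stands in for the real daily file.

```python
import csv
import io

FIELDS = ["hour", "app", "data_mb"]  # hypothetical columns


def append_hourly(daily_file, rows, write_header):
    """Append one hourly batch of rows to the open daily file."""
    writer = csv.DictWriter(daily_file, fieldnames=FIELDS)
    if write_header:
        writer.writeheader()
    writer.writerows(rows)


def build_daily_file(hourly_batches):
    """Concatenate hourly batches into a single CSV string."""
    daily = io.StringIO()  # stand-in for the real daily file
    for index, batch in enumerate(hourly_batches):
        append_hourly(daily, batch, write_header=(index == 0))
    return daily.getvalue()


batches = [
    [{"hour": 0, "app": "chat", "data_mb": 12}],
    [{"hour": 3, "app": "chat", "data_mb": 40},
     {"hour": 3, "app": "game", "data_mb": 5}],
]
daily_csv = build_daily_file(batches)
rows = list(csv.DictReader(io.StringIO(daily_csv)))
print(len(rows), "rows in the daily file")
```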
Logs Writer Module
This module reads all the different types of data received from the user, such as app logs, social media logs and data usage, and writes each type to its own database in the global storage DB.
When certain events are triggered, such as an SMS, an email, a peripheral device connection or a visit to a new website, the log writer immediately forwards the data to the event trigger API for real-time analysis and protection.
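A rough sketch of this routing logic follows. The record layout, the event type names and the `event_trigger` callback are assumptions; in-memory lists stand in for the separate databases of the global storage DB.

```python
from collections import defaultdict

# Assumed event types that need real-time analysis.
REALTIME_EVENTS = {"sms", "email", "peripheral_device", "new_website"}


class LogWriter:
    """Hypothetical writer: one store per record type, plus event forwarding."""

    def __init__(self, event_trigger):
        self.stores = defaultdict(list)  # record type -> stored records
        self.event_trigger = event_trigger

    def write(self, record):
        # Every record is persisted in the store for its own type.
        self.stores[record["type"]].append(record)
        # Selected event types are also forwarded immediately.
        if record["type"] in REALTIME_EVENTS:
            self.event_trigger(record)


forwarded = []
writer = LogWriter(event_trigger=forwarded.append)
writer.write({"type": "app_log", "user_id": "u1", "body": "app started"})
writer.write({"type": "sms", "user_id": "u1", "body": "You won a prize"})
print(len(forwarded), "record(s) forwarded for real-time analysis")
```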
Content Filtering Module
This module uses pre-defined and tested rules and algorithms for operations such as malware signature matching, spam email filtering and DNS content filtering. It puts suspicious objects in quarantine and notifies the user, so that only safe and legitimate content gets through.
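One simple form of malware signature matching is to compare a file's hash against a set of known-bad hashes. The sketch below assumes SHA-256 digests as signatures and uses made-up file contents; real signature databases are larger and also use pattern-based signatures.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Hypothetical signature set built from one fake sample.
KNOWN_BAD = {sha256_hex(b"fake-malware-sample")}


def scan(files, signatures):
    """Split files into (quarantine, clean) by exact hash match."""
    quarantine, clean = {}, {}
    for name, content in files.items():
        target = quarantine if sha256_hex(content) in signatures else clean
        target[name] = content
    return quarantine, clean


files = {
    "notes.txt": b"shopping list",
    "invoice.apk": b"fake-malware-sample",
}
quarantined, clean = scan(files, KNOWN_BAD)
print(sorted(quarantined))
```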
Proxy & Firewall
These components are an essential part of network security and can block or restrict unsafe content from the internet. They are designed and set up to protect the user clients, our application and the internal databases against security breaches.
Firewalls can block ports and programs that try to gain unauthorised access to our system, while proxy servers will basically hide internal network from the Internet. Both the firewall and proxies can be configured according to the system requirements.
File and Database Design
- Data Storage in this design is an essential feature, as it will be handling all types of data such as -
- User Information
- User data usage
- Application logs
- Malicious objects
- Quarantine data
- Test Database which contains all the pre-stored training data which will be used in filtering phishing emails, processing spam SMS, insecure wifi detecting rules, etc
- Database for logs and monitoring
- Database for machine learning algorithms and data analytics
- Database for filtering insecure web surfing, etc…
- The proposed solution for data storage uses cloud technologies such as Amazon S3, together with Hadoop, HDFS, Hive and Pig for big data processing.
- These services provide database replicas, automatic sync, security, file encryption and reliable tech support.
- The databases should also use replication and sharding to increase throughput.
- The integrated databases should guarantee either strong or eventual consistency, depending on the requirements.
- Database indexing should also be done in an optimised manner in order to speed up read queries.
- This database will store all the user information, including account profile, device information and credentials, in encrypted form. The authentication module API will use this database for authorisation and authentication of the users.
- This DB will be used to display the data in the UI console for system administrators. The user information will be stored in key-value pairs for faster searching using concept of hashing.
- For this purpose, the etcd database is a good choice for our system design. etcd is a highly available key-value store that can be used for persistent storage. Access can be tightly controlled: in a Kubernetes cluster, for example, etcd is reached only through the API on the master node, and nodes in other clusters have no access to the etcd store.
This DB will hold infected files, suspicious data and spam content for a fixed number of days. Users can review this data and recover or permanently delete items as required.
This datacenter will be used to protect and restore a database. This is achieved through database replication, which can be done for a single database or for a whole database server. It enables the creation of a duplicate instance in case the primary database crashes, is corrupted or is lost.
On evaluating the design, we found that many machine learning algorithms run for different purposes, and that data analytics for the various features must be fast and close to real time. This calls for a robust, fully managed service that offers fast iteration, support for multi-processing and batch operations, and managed servers.
AWS SageMaker is a good choice for these requirements.
With it, developers can quickly build machine learning models and deploy them into a production-ready environment, serving predictions in near real time without having to manage servers.
UI CONSOLE DESIGN
This console is designed for system Administrators to serve the following functionalities:-
1.) It will be used to review data analytics, system logging and monitoring, error reports and other statistics of the algorithms.
2.) It will be used to monitor and display all the user profiles, their device information, usage, authorisation activities, daily statistics of the user, etc.
3.) This console will serve as a UI for client-product interaction: receiving and replying to client messages, answering their queries and helping to organise a general security-awareness quiz for our clients.
4.) The UI Console will be used to control the app settings on the user’s mobile for additional security, if the user has authorised it.
5.) This console will also help in monitoring the authorisation and authentication of users and in reporting any suspicious activities by third-party users.
SYSTEM-HIGH LEVEL DESIGNS
This section contains high-level designs to counter some of the following suspicious activities and cyber attacks:
- Malicious Apps
- Insecure WiFi
- Insecure Web-Surfing
- Phishing Emails
- Phishing via other channels
Our system will monitor all existing and newly installed apps for suspicious activity and security breaches, such as the theft of confidential information or app logs, and will alert both Android and iOS users when it finds any.
This feature can help in ensuring the users that their mobile data is safe and will also protect their valuable information.
The most challenging part in this design is selecting and optimising security features across which the app will be tested.
Proposed System Design-
1. The system will scan and collect all application logs from every user at a scheduled interval (let’s say daily, every 3 hours), using the data collect API.
2. This API is critical for this feature and must be highly available with almost zero downtime. Hence, we set it up in a Kubernetes-powered cluster so that it runs in a robust environment; even if the service goes down, Kubernetes can relaunch it automatically without manual intervention.
3. App logs will be stored in Amazon S3 and appended to a single file.
The data is stored separately for every user, and the file can be named “user_id.app_log.csv”. The assigned user_id is unique and thus identifies the file for each user.
4. The file created will be processed twice a day and the processing includes-
(i)Subsetting of columns for important features based on the security criteria
(ii)Cleaning of data according to pre-defined set of rules
5. Feature selection is one of the most important steps in this design and hugely impacts the performance of the model. For selecting relevant features, we can use techniques such as univariate selection, feature importance and a correlation matrix.
6. Selecting an optimal set of features can help in reducing over-fitting, improving accuracy, and reducing training time.
7. Selected features will be processed through a rule engine to calculate an app score and this app score will be updated after every processing of log file.
8.Column “App_Score” will be appended in the DataFrame which will contain all the latest score for every app and for every user.
9. If at any point of time, score of the app is higher than the threshold limit, then application will alert the user regarding it including the cause and solution to the issue.
10. Alerting the user can be done by using-
(i) Firebase Cloud Messaging for Android users
(ii) Apple Push Notification service for iOS users
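The scoring and alerting steps above could look roughly like the following sketch. The rules, their weights, the feature names and the threshold of 50 are all invented for illustration; the function only returns the name of the notification channel and does not call any push service.

```python
THRESHOLD = 50  # assumed threshold limit

# Hypothetical rules: (cause shown to the user, predicate, points).
RULES = [
    ("reads SMS messages", lambda f: f["reads_sms"], 30),
    ("high background data usage", lambda f: f["background_mb"] > 100, 25),
    ("installed from an unknown source", lambda f: not f["from_store"], 35),
]

CHANNELS = {
    "android": "Firebase Cloud Messaging",
    "ios": "Apple Push Notification service",
}


def app_score(features):
    """Return (score, causes) for one app's selected features."""
    hits = [(cause, points) for cause, rule, points in RULES if rule(features)]
    return sum(points for _, points in hits), [cause for cause, _ in hits]


def evaluate(app, features, platform):
    """Return an alert dict if the score exceeds the threshold, else None."""
    score, causes = app_score(features)
    if score > THRESHOLD:
        return {"app": app, "score": score, "causes": causes,
                "channel": CHANNELS[platform]}
    return None


alert = evaluate(
    "flashlight",
    {"reads_sms": True, "background_mb": 20, "from_store": False},
    "android",
)
print(alert)
```

Keeping the matched causes next to the score lets the alert explain why an app was flagged, which the design asks for in step 9.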
Public-wifi can often be insecure and cyber criminals can easily penetrate inside our phones and steal personal/sensitive information. They can easily intercept our login credentials, intercept general data, spread infections and malware, steal bandwidth and use our network for illegal purposes.
The product can use a firewall/VPN gateway to monitor internet traffic and a Wi-Fi encryption level checker to monitor the level of encryption.
It will also alert the user to any suspicious activity, or if the network is being used to steal sensitive information.
Proposed System Design-
- This product has an encryption level checker module. When the device connects to a new Wi-Fi network, the module automatically checks the encryption used by that network and notifies you if it is insecure. Many public Wi-Fi networks are open (unencrypted) or still use the outdated WEP scheme, both of which are insecure: WEP has well-known security flaws that can expose a user’s personal information.
- It can use Firewall/VPN Gateway to monitor the inbound/outbound traffic, as per the requirements.On the outbound side, firewalls can be configured to prevent users from sending certain types of emails or transmitting sensitive data outside the network.
- Firewall can also be useful in case of banking transactions, when we don’t want any sensitive information to be leaked.It can allow all traffic to pass through except data that meets a predetermined set of criteria, or it can prohibit all traffic unless it meets a predetermined set of criteria.
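The two checks described in these bullets can be sketched as follows. The encryption labels are plain strings supplied by the caller (how they are read from the device is out of scope here), the "weak" rating for WPA is an assumption, and the packet layout is hypothetical. The firewall function shows both policies: allow everything except what matches, or block everything except what matches.

```python
INSECURE = {"OPEN", "WEP"}      # no encryption, or a broken scheme
DEPRECATED = {"WPA"}            # assumed to be rated as weak
ACCEPTABLE = {"WPA2", "WPA3"}


def check_encryption(encryption):
    """Rate a Wi-Fi network by its reported encryption type."""
    enc = encryption.strip().upper()
    if enc in INSECURE:
        return "insecure"
    if enc in DEPRECATED:
        return "weak"
    if enc in ACCEPTABLE:
        return "ok"
    return "unknown"


def firewall_allows(packet, rules, default_allow):
    """default_allow=True: rules list what to block.
    default_allow=False: rules list what to allow."""
    matched = any(rule(packet) for rule in rules)
    return (not matched) if default_allow else matched


def is_telnet(packet):
    return packet["port"] == 23


def is_https(packet):
    return packet["port"] == 443


print(check_encryption("wep"))
print(firewall_allows({"port": 23}, [is_telnet], default_allow=True))
print(firewall_allows({"port": 443}, [is_https], default_allow=False))
```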
1.Internet has become a potential hotspot for hackers to utilise websites for spreading viruses, steal confidential information, and perform other malicious activities.
2.This design focusses on controlling insecure web-surfing and protecting confidential information of users, and also alerting the user in real-time for any potential risk.
Proposed System Design
- This system will test a number of websites against a number of security features, rank and score them, and store the results in an etcd database. etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or cluster of machines. This DB is continuously updated by monitoring all existing and new sites against the latest security threats and vulnerabilities.
- Thus, DB will be used to store all frequently visited, past visited, popular and risk-potential websites and their rank records.
- When a user requests a website, request is sent as a POST request to Secure_Surf API, which hits the etcd DB, and returns the response of whether the website requested is secure or not.
- If the website requested is insecure, our system will warn the user in real-time regarding insecure surfing. User is free to continue or discard the surfing session, as needed.
- Since multiple users can request websites at the same time, the load on the API can grow and cause server downtime. This can be overcome by using a load balancer, which distributes the load evenly across horizontally scaled instances.
- This product will also use an additional firewall to allow or block content, control both incoming and outgoing traffic, restrict insecure webpages, warn about HTTP content and allow HTTPS content.
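A lookup along the lines of the Secure_Surf API might be sketched like this. A plain dictionary stands in for the etcd store, the scores are treated as risk scores where higher means riskier (an assumption, mirroring the app score), and the hostnames and threshold are invented.

```python
from urllib.parse import urlparse

# Stand-in for the etcd store: hostname -> risk score (assumed scale 0-100).
SITE_RISK = {"example.com": 8, "bad.example": 90}
RISK_THRESHOLD = 60  # assumed cut-off


def secure_surf(url, store=SITE_RISK):
    """Return a verdict for the requested URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    risk = store.get(host)
    if risk is None:
        verdict = "unknown"      # site not scored yet
    elif risk > RISK_THRESHOLD:
        verdict = "insecure"
    else:
        verdict = "secure"
    return {
        "host": host,
        "verdict": verdict,
        "plain_http": parsed.scheme == "http",  # warn on unencrypted content
    }


print(secure_surf("https://example.com/login"))
print(secure_surf("http://bad.example/free-prize"))
```

An "unknown" verdict is kept separate from "secure" so that unscored sites can be queued for testing instead of being silently trusted.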
- Phishing is a type of online scam in which users receive emails that can be harmful and are used to steal personal information, spread malware and infections, and even cause financial losses.
- This product can help to identify such scams by applying a number of techniques and flagging such emails to the users.
- The most challenging part in this design is to continuously update the DB , used to store information about whether the sender (of email) is a cyber criminal or not.
Proposed System Design
- The proposed design for email phishing covers both of the important aspects of email filtering:-
- Email Filtering based on Sender’s info and domain
- Email Filtering based on its contents
- Whenever a new email event is triggered, system will automatically check the sender’s address and domain. If the sender is blacklisted in system’s DB, then email will be flagged as harmful.
- The product will maintain a DB which will contain the information of all the blacklisted and whitelisted sender information. This information will be stored in key-value pair for fast searching and the DB will be continuously updated with the help of many datasets and upcoming new email scams.
- If the sender’s email information isn’t stored in the DB, the email will be further processed by using optimised machine learning techniques and classification algorithms.
- In the training module, header of email content will be pre-processed and meaningless stop words will be removed.Then, each training email will be checked to capture the values of all necessary pre-defined attributes such as-spam keywords found in sender’s name/address/title, email size, and attachments type.
- In the proposed design, “Keyword Database” will be built in advance which will contain two types of information-
- Spam Keyword Table: This table will record those suspicious keywords that are found frequently in spam emails
- Legitimate Keyword Table: This table records keywords commonly found in legitimate emails and are seldom discovered in spams
- It will subsequently maintain a Rule Engine DB that will store all the pre-defined rules, and values of decision tree based on the attributes.
- This Rule Engine DB will be used in scoring the attributes of any new incoming email and deciding whether it’s spam or legitimate.
- Based on the evaluation, email will be flagged as spam or legitimate and the etcd DB will be updated based on the results.
- Given the complexity of this design, we can use AWS SageMaker to increase the throughput and reduce the latency of the filtering algorithm, and to get real-time inferences.
- All the DBs can be stored in Amazon S3 with sufficient replica sets, so that we don’t face database server downtime at any point.
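The two-stage filter described above can be sketched as follows: a sender lookup first, then content scoring for unknown senders. This sketch swaps the decision tree for a simple weighted keyword score, and the sender list, keyword tables, weights and threshold are all invented. Only spam verdicts are written back to the sender store here, which is a design choice of the sketch rather than something the design specifies.

```python
import re

# Stand-ins for the key-value sender DB and the keyword tables.
SENDER_DB = {"scam@bad.example": "blacklisted",
             "boss@corp.example": "whitelisted"}
SPAM_KEYWORDS = {"winner": 3, "free": 2, "urgent": 2, "password": 3}
LEGIT_KEYWORDS = {"meeting": 2, "invoice": 1, "agenda": 2}
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and",
              "you", "your", "for", "are", "our"}
SPAM_THRESHOLD = 3  # assumed cut-off


def tokenize(text):
    """Lowercase, split into words and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def classify_email(sender, subject):
    sender = sender.lower()
    status = SENDER_DB.get(sender)
    if status == "blacklisted":
        return "spam"
    if status == "whitelisted":
        return "legitimate"
    # Unknown sender: score the content against both keyword tables.
    tokens = tokenize(subject)
    score = (sum(SPAM_KEYWORDS.get(t, 0) for t in tokens)
             - sum(LEGIT_KEYWORDS.get(t, 0) for t in tokens))
    if score >= SPAM_THRESHOLD:
        SENDER_DB[sender] = "blacklisted"  # remember the spam sender
        return "spam"
    return "legitimate"


print(classify_email("new@x.example",
                     "URGENT: you are a winner, claim your free prize"))
print(classify_email("colleague@corp.example", "Agenda for the meeting"))
```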
Phishing via other channels
- Phishing is a cybercrime in which a target or targets are contacted by email, telephone or text message by someone posing as a legitimate institution to lure individuals into providing sensitive data such as personally identifiable information, banking and credit card details, and passwords.
- This design proposes a solution to phishing attacks mainly through sms/instant messaging apps and calls by cyber criminals.
- It can help the users to identify such attacks and flag them as suspicious by a number of sophisticated techniques.
- It can also help in providing real-time legitimacy of any suspicious phone call to the user and the user can post any suspicious messages in the app to get a professional opinion.
- This design focusses on phishing via other channels such as sms/instant messaging app , and incoming calls etc.
- If there’s a call or SMS event trigger, then system will first identify if that number is already stored on user’s mobile through a simple query service.
- If the number is unsaved, system will search this number through hashing in already black listed numbers DB to flag it as suspicious, if it is found in DB.
- If the number isn’t in DB, it will apply pre-trained SMS filtering algorithm to identify if this is a phishing /spam message. This algorithm includes:-
- Pre-processing of the message
- Feature extraction using a TF-IDF vectoriser, and
- A machine learning algorithm, such as Naive Bayes, to classify the messages as spam or not spam.
- This design also includes scoring and metrics to evaluate the accuracy and precision of the predictions, and so the performance of the complete feature.
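The screening flow above can be sketched end to end in plain Python. For brevity this sketch uses raw word counts with a small multinomial Naive Bayes classifier instead of TF-IDF weights, and the training messages, phone numbers and function names are invented. Python sets are hash-based, so the contact and blacklist checks are hash lookups.

```python
import math
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing."""

    def fit(self, messages, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for message, label in zip(messages, labels):
            self.word_counts[label].update(tokens(message))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, message):
        total_docs = sum(self.doc_counts.values())
        best, best_lp = None, -math.inf
        for c in self.classes:
            lp = math.log(self.doc_counts[c] / total_docs)  # class prior
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for w in tokens(message):
                if w in self.vocab:  # ignore words never seen in training
                    lp += math.log((self.word_counts[c][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = c, lp
        return best


def screen_sms(sender, text, contacts, blacklist, model):
    """Contacts first, then the blacklist, then the classifier."""
    if sender in contacts:
        return "trusted"
    if sender in blacklist:
        return "suspicious"
    return "suspicious" if model.predict(text) == "spam" else "ok"


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive="spam"):
    truths = [t for t, p in zip(y_true, y_pred) if p == positive]
    return truths.count(positive) / len(truths) if truths else 0.0


train = [
    ("win a free prize now", "spam"),
    ("free cash claim your prize", "spam"),
    ("urgent your account is blocked click link", "spam"),
    ("are we still meeting at noon", "ham"),
    ("call me when you reach home", "ham"),
    ("see you at the meeting tomorrow", "ham"),
]
model = NaiveBayes().fit([m for m, _ in train], [l for _, l in train])

held_out = [("claim your free prize", "spam"), ("see you at noon", "ham")]
predictions = [model.predict(m) for m, _ in held_out]
print(predictions)
print(screen_sms("+100", "claim your free prize", {"+200"}, {"+300"}, model))
```

With a real dataset, the `accuracy` and `precision` helpers would be applied to a held-out split to track how the filter performs over time.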
Hope you enjoyed reading the blog!