Abstract — MongoDB seems to have taken the world by storm. It is a document-oriented NoSQL database and one of the most popular open-source databases that let users query their data without having to learn and master SQL. This paper reviews the security features offered by MongoDB, compares them with those of relational databases, and examines MongoDB’s security flaws.
Keywords — MongoDB, SQL, NoSQL, Security, Relational, Database
I. Introduction
A NoSQL database is used to manage huge sets of unstructured, semi-structured, and structured data. The data is not stored in tabular relations as it is in relational databases. Owing to their rich set of features, query capabilities, and transaction management, relational databases were used in a wide variety of applications in the earlier days. However, relational databases are unable to handle and process big data such as documents, e-mails, multimedia, and social media effectively. NoSQL databases emerged to overcome some of these problems. The term NoSQL was introduced by Carlo Strozzi in 1998. It refers to non-relational databases and was reintroduced in 2009 by Eric Evans. The word NoSQL has also acquired another meaning, “Not Only SQL”. A common misconception is that NoSQL databases do not store the relationships between data well. NoSQL databases can store relationship data; they simply store it in a different format from relational databases. When MongoDB is compared with SQL databases, many find modeling relationship data easier in a NoSQL database, mainly because related data does not have to be split between tables.
II. Overview Of MongoDB
MongoDB is a schema-less, document-oriented, open-source, high-performance database. Development began in 2007 at 10gen, a New York-based organization that is now called MongoDB Inc. Its developers are Kevin P. Ryan, Dwight Merriman, and Eliot Horowitz, who had worked together at DoubleClick, of which Merriman was a founder and CTO; Horowitz was also a founder of ShopWiki. The three had faced database scalability issues with relational databases while developing internet applications at the company. Because the database was meant to process data in large amounts, it was named MongoDB, a name taken from the word “humongous”. The fundamental idea behind MongoDB is to replace the concept of a “row” with a more flexible model, the “document”. The platform is developed in the C++ programming language (MongoDB Tutorial, n.d.).
Fig. 1. MongoDB Document structure 
III. Features of MongoDB
Key features of MongoDB are:
A. Aggregation Framework
The aggregation framework is a set of analytics tools within MongoDB that allows various types of reports or analyses to be run on the documents in one or more collections. It is based on the idea of a pipeline: MongoDB passes the documents of a collection through one or more stages, and each stage performs a different operation on its input. Each stage takes as input whatever the stage before it produced as output, and the inputs and outputs of all stages are streams of documents.
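The pipeline idea can be illustrated without a database. The following is a minimal sketch in plain Python, not MongoDB's implementation: the stage helpers (`match_stage`, `group_sum_stage`, `sort_stage`) and the sample `orders` data are hypothetical names chosen for illustration. Each stage consumes a stream of documents and produces a new stream for the next stage.

```python
from collections import defaultdict


def match_stage(predicate):
    """Stage that keeps only the documents satisfying the predicate."""
    def stage(docs):
        return (d for d in docs if predicate(d))
    return stage


def group_sum_stage(key, field):
    """Stage that groups documents by `key` and sums `field` per group."""
    def stage(docs):
        totals = defaultdict(int)
        for d in docs:
            totals[d[key]] += d[field]
        return ({"_id": k, "total": v} for k, v in totals.items())
    return stage


def sort_stage(field, descending=False):
    """Stage that orders the incoming documents by `field`."""
    def stage(docs):
        return iter(sorted(docs, key=lambda d: d[field], reverse=descending))
    return stage


def run_pipeline(collection, stages):
    """Feed the output of each stage into the next one."""
    stream = iter(collection)
    for stage in stages:
        stream = stage(stream)
    return list(stream)


# Hypothetical sample collection.
orders = [
    {"customer": "ann", "status": "shipped", "amount": 30},
    {"customer": "bob", "status": "pending", "amount": 20},
    {"customer": "ann", "status": "shipped", "amount": 25},
    {"customer": "bob", "status": "shipped", "amount": 10},
]

report = run_pipeline(orders, [
    match_stage(lambda d: d["status"] == "shipped"),
    group_sum_stage("customer", "amount"),
    sort_stage("total", descending=True),
])
print(report)
```

The three stages loosely mirror a filter, a grouping, and a sort; in a real deployment the same report would be expressed as stage documents and executed by the server.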
B. BSON Format
MongoDB stores data in BSON format both internally and over the network. BSON stands for “Binary JSON”, but it also contains extensions that allow the representation of data types that are not part of JSON. For example, BSON has a Date data type and a BinData type. BSON is designed to be traversed easily, which is a significant property in its role as the primary data representation for MongoDB.
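To make the binary layout concrete, the sketch below hand-encodes a small subset of BSON (strings, 32-bit integers, dates, and generic binary data) using only the Python standard library. It is an illustrative encoder under the assumption that only these four types are needed; the function names are hypothetical, and production code would use the encoder that ships with a MongoDB driver.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _cstring(text):
    """Encode a field name as a NUL-terminated UTF-8 string."""
    return text.encode("utf-8") + b"\x00"


def encode_element(name, value):
    """Encode one field as: type byte, field name, payload."""
    key = _cstring(name)
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        # String: int32 byte length (including the trailing NUL), then bytes.
        return b"\x02" + key + struct.pack("<i", len(data)) + data
    if isinstance(value, bytes):
        # Binary: int32 length, one subtype byte (0x00 = generic), then bytes.
        return b"\x05" + key + struct.pack("<i", len(value)) + b"\x00" + value
    if isinstance(value, datetime):
        # Date: int64 milliseconds since the Unix epoch.
        # The datetime must be timezone-aware in this sketch.
        millis = (value - EPOCH) // timedelta(milliseconds=1)
        return b"\x09" + key + struct.pack("<q", millis)
    if isinstance(value, int) and not isinstance(value, bool):
        # 32-bit signed integer, little-endian.
        return b"\x10" + key + struct.pack("<i", value)
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


def encode_document(doc):
    """Encode a dict as: int32 total length, elements, terminating NUL."""
    body = b"".join(encode_element(k, v) for k, v in doc.items()) + b"\x00"
    return struct.pack("<i", len(body) + 4) + body


print(encode_document({"hello": "world"}).hex())
```

The length prefixes are what make BSON easy to traverse: a reader can skip over a string or an embedded value without parsing its contents.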
C. Sharding
Sharding is one of MongoDB’s more complex features, and experts note that getting comfortable with the concept can take some time. Sharding is a mechanism that scales write operations by distributing them across multiple shards. Each document contains an associated shard key field that decides on which shard the document resides.
Fig. 2. Sharding in MongoDB 
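The role of the shard key can be sketched as follows. This is a deliberately simplified illustration: it hashes the shard key value and takes the result modulo the number of shards, whereas MongoDB assigns ranges of key values (chunks) to shards and rebalances them over time. The shard names and the `user_id` shard key are assumptions made for the example.

```python
import hashlib

# Hypothetical shard names.
SHARDS = ["shard0", "shard1", "shard2"]


def shard_for(shard_key_value, shards=SHARDS):
    """Pick a shard deterministically from the shard key value."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]


def route_writes(documents, shard_key, shards=SHARDS):
    """Distribute documents across shards according to their shard key."""
    placement = {name: [] for name in shards}
    for doc in documents:
        placement[shard_for(doc[shard_key], shards)].append(doc)
    return placement


docs = [{"user_id": f"user{i}", "score": i} for i in range(12)]
for shard, stored in route_writes(docs, "user_id").items():
    print(shard, [d["user_id"] for d in stored])
```

Because the shard is a pure function of the shard key, a later read for the same key can be sent directly to the shard that holds the document.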
D. Ad hoc Queries
Ad hoc queries are queries that are not known in advance, at the time the database structure is created. MongoDB supports such queries directly, which sets it apart in this respect. Because the query parameters can change with every request and are evaluated against the current data in real time, applications do not need predefined queries for each use case.
IV. Security Features of MongoDB
This section focuses on some of the key security features provided by MongoDB with regard to authentication, encryption, and access control.
A. Authentication
Authentication means verifying the identity of the client. This is critical because each user should have a personalized view of the data while everyone else’s data stays safeguarded. MongoDB supports various authentication mechanisms, and an appropriate one can be chosen for integration based on an organization’s existing mechanism.
MongoDB’s default authentication method is SCRAM (Salted Challenge Response Authentication Mechanism), i.e. SCRAM-SHA-1, with SCRAM-SHA-256 added in version 4.0. SCRAM is based on the Internet Engineering Task Force (IETF) standard RFC 5802, which defines best practices for authenticating users with passwords. SCRAM verifies the supplied credentials against the user’s name, password, and authentication database.
Another method, MongoDB-CR, similarly verifies a username and password against an authentication database. SCRAM replaced it as the default in version 3.0 and it was removed in version 4.0, so only older versions use it today. Neither SCRAM nor MongoDB-CR sends the password in clear text, and a different response is generated for each new session, so a captured exchange cannot simply be reused.
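The core of the SCRAM computation can be sketched with the Python standard library. The following follows the key derivation in RFC 5802 (salted password, client key, stored key, client proof) but is only a sketch: it omits the message exchange, password normalization, and server signature, and the `auth_message` used here is a placeholder standing in for the concatenated handshake messages.

```python
import hashlib
import hmac
import os


def _client_key(password, salt, iterations, algo):
    """SaltedPassword = PBKDF2(password); ClientKey = HMAC(SaltedPassword, "Client Key")."""
    salted = hashlib.pbkdf2_hmac(algo, password.encode("utf-8"), salt, iterations)
    return hmac.new(salted, b"Client Key", algo).digest()


def derive_stored_key(password, salt, iterations, algo="sha256"):
    """What the server stores: H(ClientKey), never the password itself."""
    return hashlib.new(algo, _client_key(password, salt, iterations, algo)).digest()


def client_proof(password, salt, iterations, auth_message, algo="sha256"):
    """Proof sent by the client: ClientKey XOR HMAC(StoredKey, AuthMessage)."""
    key = _client_key(password, salt, iterations, algo)
    stored = hashlib.new(algo, key).digest()
    signature = hmac.new(stored, auth_message, algo).digest()
    return bytes(a ^ b for a, b in zip(key, signature))


def server_verify(stored_key, proof, auth_message, algo="sha256"):
    """Recover ClientKey from the proof and compare its hash to StoredKey."""
    signature = hmac.new(stored_key, auth_message, algo).digest()
    recovered = bytes(a ^ b for a, b in zip(proof, signature))
    return hmac.compare_digest(hashlib.new(algo, recovered).digest(), stored_key)


salt, iterations = os.urandom(16), 4096
stored_key = derive_stored_key("correct horse", salt, iterations)

# Placeholder for the handshake transcript; fresh nonces make it differ per session.
auth_message = b"n=alice,r=" + os.urandom(12).hex().encode("ascii")
proof = client_proof("correct horse", salt, iterations, auth_message)
print(server_verify(stored_key, proof, auth_message))
```

Since the proof depends on the per-session transcript, replaying a proof captured in one session fails verification in another.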
MongoDB can also use external authentication protocols such as:
· LDAP: With the Lightweight Directory Access Protocol (LDAP), users log in to the system using their centralized passwords. LDAP is designed to help anyone locate and access the information they need in either a public or a private network.
· Kerberos: This is a secret-key authentication protocol for server/client interactions. With Kerberos, users log in only once and then use an access ticket to authenticate.
B. Authorization/Role-based security
Authorization is the step that follows authentication: once a user’s identity has been verified, authorization determines what that user may do. Role-Based Access Control (RBAC) is one of MongoDB’s best security features. RBAC governs the access of a user to the MongoDB system. A role essentially decides what permissions a user has and what they can access. Once a user has been assigned a role, the user cannot access the MongoDB system outside it.
Authorization can be activated using the --auth command-line option or the security.authorization configuration setting. The --auth option enables authorization to control a user’s access to a database and its resources. It also enforces authentication, requiring all clients to verify their identities before being given access. There are two types of roles: built-in and user-defined. Built-in roles include readWrite, dbAdmin, dbOwner, userAdmin, etc. User-defined roles are defined by the database administrator. Database administrators are responsible for creating new users and assigning them roles; they can use MongoDB’s built-in roles or create roles for a specific purpose. Access control is not enabled by default and must be turned on through the security.authorization setting. Limiting user roles limits the damage that can result from a single account being compromised.
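The way a role constrains a user can be sketched as a lookup from roles to permitted actions. The role names below are the built-in roles mentioned above, but the action sets attached to them are heavily abbreviated assumptions for illustration, and the `User` class and `is_authorized` function are hypothetical rather than part of any MongoDB API.

```python
from dataclasses import dataclass, field

# Abbreviated, illustrative action sets; the real built-in roles grant many more actions.
ROLE_PRIVILEGES = {
    "readWrite": {"find", "insert", "update", "remove"},
    "dbAdmin": {"createIndex", "dropCollection", "collStats"},
    "userAdmin": {"createUser", "dropUser", "grantRole"},
}


@dataclass
class User:
    name: str
    # Each grant is a (role, database) pair, so a role applies to one database only.
    grants: list = field(default_factory=list)


def is_authorized(user, action, database):
    """Allow the action only if some role granted on that database includes it."""
    return any(
        db == database and action in ROLE_PRIVILEGES.get(role, set())
        for role, db in user.grants
    )


app = User("orders_service", grants=[("readWrite", "sales")])
print(is_authorized(app, "insert", "sales"))          # permitted by readWrite on sales
print(is_authorized(app, "insert", "hr"))             # no role on the hr database
print(is_authorized(app, "dropCollection", "sales"))  # outside the readWrite role
```

Scoping each grant to a database is what keeps a compromised application account from reaching data it was never meant to touch.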
C. Encryption
MongoDB supports TLS/SSL (Transport Layer Security/Secure Sockets Layer) to encrypt its network traffic, i.e. the data it sends and receives over networks. TLS and SSL are both standard technologies for encrypting network traffic. This is known as Data-in-motion Encryption.
Fig. 3. MongoDB End to End encryption 
MongoDB Enterprise provides native, storage-based symmetric key encryption at the file level to encrypt data at rest. This process of database encryption is also called Transparent Data Encryption (TDE). MongoDB uses the Advanced Encryption Standard (AES) 256-bit encryption algorithm, a cipher that uses the same secret key to encrypt and decrypt data. This methodology is known as Data-at-Rest Encryption. Before version 4.2, which introduced client-side field-level encryption, MongoDB did not offer any in-house features for application-level encryption. To encrypt individual fields or documents on those versions, the MongoDB documentation suggests writing custom encryption/decryption methods or using solutions created by one of its partners.
D. Auditing
Auditing is an essential part of security because it allows database administrators to track the history of the database. MongoDB Enterprise includes an auditing capability for mongod and mongos instances. The auditing facility lets administrators and users track system activity for deployments that have multiple users and applications.
Fig. 4. MongoDB Auditing 
The auditing framework captures administrative actions (DDL) such as schema operations, as well as authentication and authorization activities. It also captures read and write (DML) operations on the database. Administrators can construct and filter audit trails for any operation against MongoDB, whether DML, DCL, or DDL, without having to depend on third-party tools. For example, it is possible to log and audit the identities of users who read or retrieved specific documents, and any updates, inserts, or deletes made to the database during their session. The audit output can be filtered by user, database, collection, or source location. This process generates a log that can be reviewed for security incidents. The log shows a security auditor that the company using MongoDB has taken the correct steps to protect its database from intrusion and to understand the depth of any intrusion should one occur. MongoDB auditing makes it possible to fully track the actions of an intruder in the environment. Auditing is available only in MongoDB Enterprise, not in the Community version. It is also available in some other open-source distributions of MongoDB, such as Percona Server for MongoDB.
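Filtering an audit trail can be sketched as follows. The records below are hypothetical and flattened for readability; real audit events carry more fields, and the `filter_events` helper is an assumption made for this example rather than a MongoDB facility. A nonzero `result` is treated here as a failed operation.

```python
# Hypothetical, simplified audit records.
audit_log = [
    {"atype": "authenticate", "user": "alice", "db": "admin", "result": 0},
    {"atype": "authCheck", "user": "mallory", "db": "sales", "command": "find", "result": 13},
    {"atype": "createCollection", "user": "alice", "db": "sales", "result": 0},
    {"atype": "authenticate", "user": "mallory", "db": "admin", "result": 18},
]


def filter_events(events, user=None, atype=None, db=None, failed_only=False):
    """Return the events matching every criterion that was supplied."""
    selected = []
    for event in events:
        if user is not None and event["user"] != user:
            continue
        if atype is not None and event["atype"] != atype:
            continue
        if db is not None and event["db"] != db:
            continue
        if failed_only and event["result"] == 0:
            continue
        selected.append(event)
    return selected


# Which operations by one user failed? Useful when reviewing a suspected intrusion.
for event in filter_events(audit_log, user="mallory", failed_only=True):
    print(event["atype"], event["db"], event["result"])
```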
V. MongoDB vs MySQL
To compare the security features of MongoDB with those of a relational database, this study focuses on the relational database MySQL. Before moving to the security comparison, this section presents a brief comparison between the two database systems. Both MongoDB and MySQL are free and open-source database software. MongoDB is one of the most popular document-oriented databases in the NoSQL family. MySQL is an open-source relational database management system (RDBMS). It was originally built by MySQL AB and is currently owned by Oracle. It stores data in rows and tables, which are in turn grouped into databases. It uses the Structured Query Language (SQL) to access and transfer the data.
· Deployment and Community support: MongoDB is owned and developed by MongoDB Inc. It is extremely simple to deploy. It is available for Web applications, SaaS, and Cloud applications, and it runs on multiple platforms including Linux, Windows, and macOS. MongoDB attracts users with its clean and simple architecture as well as its collaborative and helpful community. MySQL is owned and developed by Oracle Corporation. MySQL can be installed manually from the source code. It is likewise available for Web applications, SaaS, and Cloud applications, and it runs on multiple platforms including Linux, Windows, and macOS. One advantage of MySQL is that people have been using it for quite some time, and hence it has a strong community.
· Schema: MongoDB does not place any restrictions on schema design. A collection holds documents, and there need not be any relation among those documents. The only restriction is the set of supported data structures. Because joins and multi-document transactions were historically absent, the schema is expected to be optimized frequently based on organizational requirements. MySQL needs clearly defined tables and columns, and every row in a table must have the same columns. Because of this, there is little flexibility in how data is stored. Development and deployment are also slowed down, since even a small modification to the data model requires a change in schema design.
· Querying Language: MongoDB uses an unstructured query language. To build a query, one specifies a JSON document whose properties describe the results to be matched. Queries are typically built from a very rich set of operators linked with each other in JSON; MongoDB uses a “cascade” of operator/operand structures, typically expressed as name:value pairs. MySQL uses the Structured Query Language (SQL) to communicate with the database. It is a very powerful language that consists mainly of two parts: Data Definition Language (DDL) and Data Manipulation Language (DML).
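The query-as-document style described in the last item can be sketched with a small in-memory matcher. This is an illustration of the operator/operand structure only: the `matches` and `find` functions are hypothetical, they support a handful of comparison operators, and they do not reproduce the full semantics of the MongoDB query engine.

```python
# Comparison operators keyed by their "$" names; a missing field never matches a range.
OPERATORS = {
    "$eq": lambda value, operand: value == operand,
    "$ne": lambda value, operand: value != operand,
    "$gt": lambda value, operand: value is not None and value > operand,
    "$gte": lambda value, operand: value is not None and value >= operand,
    "$lt": lambda value, operand: value is not None and value < operand,
    "$lte": lambda value, operand: value is not None and value <= operand,
    "$in": lambda value, operand: value in operand,
}


def matches(doc, query):
    """True if the document satisfies every name:value pair in the query."""
    for field_name, condition in query.items():
        value = doc.get(field_name)
        is_operator_doc = (
            isinstance(condition, dict)
            and condition
            and all(key.startswith("$") for key in condition)
        )
        if is_operator_doc:
            if not all(OPERATORS[op](value, operand) for op, operand in condition.items()):
                return False
        elif value != condition:
            return False
    return True


def find(collection, query):
    """Return the documents of the collection that match the query document."""
    return [doc for doc in collection if matches(doc, query)]


people = [
    {"name": "Asha", "age": 34, "city": "Pune"},
    {"name": "Ben", "age": 28, "city": "Pune"},
    {"name": "Chen", "age": 41, "city": "Oslo"},
]

# Roughly the counterpart of: SELECT * FROM people WHERE city = 'Pune' AND age > 30
print(find(people, {"city": "Pune", "age": {"$gt": 30}}))
```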
VI. MongoDB and MySQL Security Comparison
A. Base Security Model
MySQL provides a privilege-based security model, i.e. a user is given access only to specific commands such as CREATE, UPDATE, DELETE, etc. Such privileges are assigned based on the roles and responsibilities of the user. MongoDB, by contrast, relies on role-based access control and supports TLS/SSL encryption so that data is accessible only to the user who is supposed to be using it.
B. Logging
MySQL offers complete logging by default, and its support for transactions and rollbacks helps ensure data integrity. In MongoDB, complete logging is not enabled by default. Additional logging is built into the operating system and application layers.
C. Access Control
Relational databases have privilege-based access control, whereas MongoDB has role-based access control. MySQL provides various access control mechanisms, such as Discretionary Access Control (the GRANT and REVOKE commands) and Role-Based Access Control. MongoDB offers only role-based access control, which is not enabled by default. It provides some built-in roles that bundle the privileges commonly required in a database.
D. Code Injections
E. Integrity Models
Both MySQL and MongoDB provide integrated complete logging, but only in relational databases is it activated by default. Transactions and rollbacks maintain consistency better in relational databases. MongoDB trades this consistency for higher availability by supporting unacknowledged writes. MySQL follows the ACID (Atomicity, Consistency, Isolation, Durability) model. A relational database that does not meet any one of these four goals is not considered reliable. MongoDB follows the BASE (Basically Available, Soft state, Eventual consistency) model. With the release of MongoDB 4.0, multi-document ACID transactions are also supported. Through snapshot isolation, transactions provide a consistent view of data while enforcing all-or-nothing execution, thus maintaining data integrity.
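All-or-nothing execution can be illustrated with a small in-memory sketch. It is not how MongoDB implements snapshot isolation; it only shows the guarantee: the operations run against a private copy of the data, and the copy replaces the original only if every operation succeeds. All names in the sketch are hypothetical.

```python
import copy


class TransactionAborted(Exception):
    """Raised by an operation to abort the whole transaction."""


def debit(account, amount):
    def op(state):
        if state[account] < amount:
            raise TransactionAborted(f"insufficient funds in {account}")
        state[account] -= amount
    return op


def credit(account, amount):
    def op(state):
        state[account] += amount
    return op


def run_transaction(store, operations):
    """Apply all operations or none of them; return True on commit."""
    working_copy = copy.deepcopy(store)  # private snapshot of the data
    try:
        for op in operations:
            op(working_copy)
    except TransactionAborted:
        return False  # the original store was never touched
    store.clear()
    store.update(working_copy)  # commit: publish every change at once
    return True


balances = {"alice": 100, "bob": 20}

# The credit succeeds on the snapshot, the debit fails, so nothing is committed.
print(run_transaction(balances, [credit("bob", 150), debit("alice", 150)]), balances)

# Both operations succeed, so both changes become visible together.
print(run_transaction(balances, [debit("alice", 30), credit("bob", 30)]), balances)
```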
VII. MongoDB Data Breach Case
One of the largest security incidents involving MongoDB began in January 2017, when a cluster of ransomware attacks wiped information from nearly 27,000 misconfigured MongoDB databases that lacked password protection. The attacks affected more than half of Internet-facing MongoDB databases and hit servers across the globe. The first attacker went by the online handle Harak1r1. Unprotected MongoDB databases have since been the source of multiple leaks and breaches. In a later campaign, reported by the sources cited in this paper as affecting 22,900 databases, the attacker used an automated script to scan for exposed MongoDB databases. When it found one, the script deleted the contents and uploaded a ransom note demanding 0.015 bitcoin to retrieve them. If the victim did not pay within two days, the attacker threatened to publish the information and report the victim to the local GDPR enforcement authority.
Ransom Note by the hacker
Part of the ransom note, which is in broken English, is as follows:
“All of your data is backed up. You must pay 0.015 BTC to [REDACTED] 48 hours to recover it. After 48 hours of expiration, we will leaked & exposed all your data. In case of refusal to pay, we will contact the General Data Protection Regulation, GDPR & notify them that you store user data in an open form & is not safe. Under the rules of the law, you face a heavy fine or arrest & your base dump will be dropped from our server.” 
VIII. Security Issues in MongoDB
A. Authentication Weakness
In early MongoDB releases, authentication was supported only in standalone mode, and there was no authentication mechanism when MongoDB was used in sharded mode. In replica mode, authentication is done using a pre-shared secret key, which is provided in the configuration file by the key file parameter. The administrator decides which password will be used to access the replica server, and this password can be kept the same for all the replica servers. Under the legacy MongoDB-CR mechanism, the stored credential is the MD5 hash of the string “username:mongo:password”, and it can be easily accessed from the admin data files. This implies that an attacker who gets access to the MD5 hash can attempt to recover the actual password for the whole database. By default, there are no password credentials when MongoDB is installed; it is left to the developers to build in the security. This weak default contributed to the 2017 attacks, in which about 27,000 MongoDB instances were compromised, as discussed above in this paper. These attacks succeeded primarily because default settings had not been changed by database administrators and users.
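The weakness of the stored MD5 credential can be sketched as follows. The digest is computed from the string format quoted above; because MD5 is fast and the only extra input is the known username and a constant, an attacker who steals the digest can test candidate passwords very quickly. The function names and the tiny word list are hypothetical.

```python
import hashlib


def legacy_digest(username, password):
    """MD5 of "username:mongo:password", the legacy stored credential format."""
    return hashlib.md5(f"{username}:mongo:{password}".encode("utf-8")).hexdigest()


def dictionary_attack(username, stolen_digest, wordlist):
    """Try each candidate password until one reproduces the stolen digest."""
    for candidate in wordlist:
        if legacy_digest(username, candidate) == stolen_digest:
            return candidate
    return None


# Assume the attacker read this digest from the admin data files.
stolen = legacy_digest("admin", "letmein")

common_passwords = ["123456", "password", "letmein", "qwerty"]
print(dictionary_attack("admin", stolen, common_passwords))
```

A slow, per-user salted derivation such as the one used by SCRAM makes each guess far more expensive, which is one reason the legacy mechanism was replaced.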
B. Authorization Weakness
When a new user is created in the MongoDB database, they have access to the entire database by default. This could be the most potent weapon for any malicious activity, because unrestricted access leads to data breaches.
C. Code Injection
Put simply, code injection is the insertion of unvalidated data into a vulnerable program. The injected data is executed as application code, which often leads to malicious activity and data breaches. A MongoDB injection occurs when a client is able to inject MongoDB commands that will be executed by the database engine. The attacker tries to place a custom object containing MongoDB commands inside the query object, which gives them access to unauthorized documents.
Fig. 5. MongoDB Injection Attack 
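An operator injection of this kind can be reproduced in a few lines. The sketch below uses a hypothetical in-memory user list and a tiny matcher that understands only equality and `$ne`; it stores a plain-text password purely to keep the example short. The unsafe login copies the request body into the query, while the safe variant accepts only plain strings.

```python
import json

# Hypothetical user collection (plain-text password only for brevity).
USERS = [{"username": "admin", "password": "s3cret"}]


def matches(doc, query):
    """Minimal matcher: equality, plus the $ne (not equal) operator."""
    for field_name, condition in query.items():
        value = doc.get(field_name)
        if isinstance(condition, dict) and "$ne" in condition:
            if value == condition["$ne"]:
                return False
        elif value != condition:
            return False
    return True


def login_unsafe(body_json):
    """Vulnerable: values from the request go straight into the query."""
    body = json.loads(body_json)
    query = {"username": body["username"], "password": body["password"]}
    return any(matches(user, query) for user in USERS)


def login_safe(body_json):
    """Hardened: reject anything that is not a plain string before querying."""
    body = json.loads(body_json)
    username, password = body.get("username"), body.get("password")
    if not isinstance(username, str) or not isinstance(password, str):
        return False
    return any(matches(user, {"username": username, "password": password}) for user in USERS)


# The attacker sends an operator object instead of a password string.
attack = '{"username": "admin", "password": {"$ne": null}}'
print(login_unsafe(attack))  # the condition "password != null" matches the admin
print(login_safe(attack))    # the operator object is rejected
```

Validating the type of every user-supplied value before it reaches the query is the essential step; the query language itself cannot tell an injected operator from an intended one.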
D. Clear Text Data
By default, MongoDB data files are stored as clear text, and no encryption mechanism is applied to them automatically. This means that a potential attacker who has access to the MongoDB file system can directly extract all of the stored information, and an unauthorized user who gets access to the database can read everything stored in it. Similarly, unencrypted data can be captured in an ARP poisoning attack or another man-in-the-middle (MITM) attack. To address these concerns, encryption must be applied to sensitive information, access levels for the database must be assigned, and file system permissions must be enforced.
E. Change User Password Function:
MongoDB has a function called changeUserPassword(), which is used to change the password of an existing user. This function takes two parameters: the username and the new password. If an unauthorized user gets inside the system, they can change a user’s password without having to know the earlier password. This function could also be misused by an insider within the organization. Changing the definition of this function would stop unauthorized access to some extent: if changeUserPassword() asked for the old password, the risk of compromise would be reduced. An improved version of the function could ask for the old password, calculate its hash, and compare it with the hash stored in the database for the user. Only if both hashes match should the password be changed. Until then, changeUserPassword() needs to be used with care.
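The improved behavior proposed above can be sketched as follows. This is not a MongoDB function: the `UserStore` class and its `change_password` method are hypothetical, and the salted PBKDF2 hash is an assumption chosen for the example. The password is changed only when the hash of the supplied old password matches the stored hash.

```python
import hashlib
import hmac
import os


def hash_password(password, salt, iterations=10_000):
    """Salted, deliberately slow password hash (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


class UserStore:
    """Hypothetical in-memory credential store."""

    def __init__(self):
        self._users = {}  # username -> (salt, password hash)

    def create_user(self, username, password):
        salt = os.urandom(16)
        self._users[username] = (salt, hash_password(password, salt))

    def verify(self, username, password):
        if username not in self._users:
            return False
        salt, stored = self._users[username]
        # Constant-time comparison of the calculated and stored hashes.
        return hmac.compare_digest(hash_password(password, salt), stored)

    def change_password(self, username, old_password, new_password):
        """Change the password only if the old password is proven first."""
        if not self.verify(username, old_password):
            return False
        salt = os.urandom(16)
        self._users[username] = (salt, hash_password(new_password, salt))
        return True


store = UserStore()
store.create_user("alice", "old-secret")
print(store.change_password("alice", "guess", "attacker-pw"))     # rejected
print(store.change_password("alice", "old-secret", "new-secret"))  # accepted
```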
IX. MongoDB Security Checklist
This section lists security measures that should be implemented during installation to protect MongoDB.
· Enable Access Control and Enforce Authentication: Access control should be enabled, and an authentication mechanism must be specified. One can use the default MongoDB authentication mechanism or an existing external framework. Authentication will ensure that all clients and servers provide valid credentials before they can connect to the system. In clustered deployments, enable authentication for each MongoDB server.
· Configure Role-Based Access Control: First, a user administrator should be created. Then additional users should be created. A unique MongoDB user should be created for each person and application that accesses the system. Roles should be created that define the exact access a set of users needs. Wherever possible, the principle of least privilege should be followed: users should be assigned only the roles they need to perform their function in the organization. No one should have unnecessarily elevated access.
· Encrypt Communication: MongoDB should be configured to use TLS/SSL for all incoming and outgoing network traffic. TLS/SSL must be used to encrypt communication between the mongod and mongos components of a MongoDB deployment as well as between all applications and MongoDB.
· Encrypt and Protect Data: Starting with MongoDB Enterprise 3.2, the WiredTiger storage engine’s native Encryption at Rest can be configured to encrypt data in the storage layer. If WiredTiger’s encryption at rest is not used, MongoDB data should be encrypted on each host using filesystem, device, or physical encryption. MongoDB data should be secured using file-system permissions. MongoDB data includes data files, configuration files, auditing logs, and key files.
· Limit Network Exposure: We need to make sure that MongoDB runs in a trusted network environment and limit the interfaces on which MongoDB instances listen for incoming connections. Only trusted clients should be allowed to access the network interfaces and ports on which MongoDB instances are available.
· Audit System Activity: All the access and changes to MongoDB database configurations and data should be tracked. MongoDB Enterprise includes a system auditing facility that can record system events (e.g. user operations, connection events) on a MongoDB instance. This audit facility should be activated in place. These audit records permit forensic analysis and allow administrators to verify proper controls.
· Run MongoDB with a Dedicated User: MongoDB processes should be run with a dedicated operating system user account. One should ensure that the account has permission to access data but no elevated permissions.
· Request a Security Technical Implementation Guide: The Security Technical Implementation Guide (STIG) contains security guidelines for deployments within the United States Department of Defense. Upon request, MongoDB Inc. provides its STIG for situations where it is required. This copy can be requested for further information.
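Part of this checklist can be verified mechanically. The sketch below inspects a configuration represented as a nested dictionary that mirrors the mongod configuration file settings security.authorization, net.bindIp, net.bindIpAll, net.tls.mode, and auditLog. The `check_config` function and its wording of the findings are hypothetical, and the check covers only four of the items above.

```python
def check_config(config):
    """Return a list of checklist findings for a mongod-style configuration dict."""
    findings = []
    security = config.get("security", {})
    net = config.get("net", {})

    # Enable Access Control and Enforce Authentication.
    if security.get("authorization") != "enabled":
        findings.append("access control is not enabled")

    # Limit Network Exposure.
    bind_ips = [ip.strip() for ip in str(net.get("bindIp", "")).split(",")]
    if net.get("bindIpAll") or "0.0.0.0" in bind_ips:
        findings.append("instance listens on all network interfaces")

    # Encrypt Communication.
    if net.get("tls", {}).get("mode") != "requireTLS":
        findings.append("TLS is not required for connections")

    # Audit System Activity (MongoDB Enterprise).
    if "auditLog" not in config:
        findings.append("auditing is not configured")

    return findings


exposed = {"net": {"bindIp": "0.0.0.0", "port": 27017}}
hardened = {
    "security": {"authorization": "enabled"},
    "net": {"bindIp": "127.0.0.1,10.0.0.5", "tls": {"mode": "requireTLS"}},
    "auditLog": {"destination": "file"},
}

print(check_config(exposed))
print(check_config(hardened))
```

A check like this could be run before a deployment goes live, so that an instance without access control is never exposed to the Internet in the first place.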
X. Conclusion
Any kind of database is prone to security attacks, so organizations need to invest heavily in protecting sensitive data. The security measures available for MongoDB span the topics discussed in this paper. Although MongoDB offers strong security features and provides a checklist that helps administrators keep their databases out of the reach of unauthorized parties, breaches continue to happen. MongoDB should promote the best practices discussed in the section above and enable all the security features. This could prevent incidents like the attacks that occurred in 2017. MongoDB has security features that can be extremely beneficial for an organization; however, it still has a long way to go in eliminating its vulnerabilities.
Acknowledgment
This paper and the research behind it would not have been possible without the exceptional support of Professor Dan Costa and Mr. Ajay Valecha (Teaching Assistant) at Carnegie Mellon University. I am grateful to them for sharing extensive knowledge of MongoDB databases and providing a vast set of study and research resources.
References
 “MongoDB Tutorial”, Java T Point, https://www.javatpoint.com/mongodb-tutorial
 “JSON and BSON”, MongoDB, https://www.mongodb.com/json-and-bson
 “The definitive guide to MongoDB security”, Opensource.com, https://opensource.com/article/19/1/mongodb-security
 “MongoDB Official documentation > Security”, MongoDB Documentation, https://docs.mongodb.com/manual/security/
 “Securing MongoDB Part 3: Database Auditing and Encryption”, MongoDB Blog, https://www.mongodb.com/blog/post/securing-mongodb-part-3-database-auditing-and-encryption
 “22,900 MongoDB Databases Affected in Ransomware Attack”, GlobalDots, https://www.globaldots.com/blog/22900-mongodb-databases-affected-in-ransomware-attack
 “Hacker Held 22,900 MongoDB Databases To Ransom By Threatening To Report Firms For GDPR Violations!”, PBSE Cyber News Group, https://www.cybernewsgroup.co.uk/hacker-held-22900-mongodb-databases-to-ransom-by-threatening-to-report-firms-for-gdpr-violations/
 “NoSQL Injection Attack”, SB Computter, https://sbcomputter.com/nosql-injection/
 “Security Checklist”, MongoDB Documentation, https://mongoing.com/docs/administration/security-checklist.html
 “What is MongoDB — Working and Features”, GeeksforGeeks, https://www.geeksforgeeks.org/what-is-mongodb-working-and-features/
“Sharding”, MongoDB Documentation, https://docs.mongodb.com/manual/sharding/