To further strengthen our commitment to providing industry-leading coverage of data technology, VentureBeat is excited to welcome Andrew Brust and Tony Baer as regular contributors. Watch for their articles in the Data Pipeline.
I’ll admit it. Originally, I saw MongoDB as a developer’s extravagance. It was steadfastly dedicated to the document model and JSON/BSON, which developers loved. But, along the way, it jettisoned so many things that had been considered intrinsic to any database — including ACID transactions, high-performing reads and indexing on anything apart from the data’s primary key — that it just seemed more of a proof-of-concept than a true enterprise platform. However, things have changed a lot over the last decade and a half, not just incrementally, but diametrically.
Today’s MongoDB is all-in on enterprise requirements and priorities, and it seems dedicated to addressing both long-standing and modern enterprise computing pain points.
At its MongoDB World conference at New York City’s Javits Convention Center, the company is making a slew of announcements around new features and capabilities in the core MongoDB product, now at version 6.0, and Atlas, Mongo’s database-as-a-service (DBaaS) that runs on all three major US hyperscaler clouds.
Headlining MongoDB’s announcement
The execs at MongoDB see two areas of innovation to be their banner announcements. These are not the areas that I necessarily see as the most exciting, but since the company sees these as the big haul, let’s start with time-series data workloads on one hand and something new called “queryable encryption” on the other.
Time-series applications are often deployed to databases specialized for those workloads (InfluxDB being a canonical example), but MongoDB feels that is inefficient and unnecessary. That’s why, following the release of MongoDB 5.0, the company enhanced that version’s time-series collections with several features, such as support for sharded clusters, data tiering, multi-deletes, cardinality handling improvements, handling of missing data points (with densification and gap-filling), and compression to reduce storage overhead.
In addition to those previously released capabilities, MongoDB also announced secondary and geo-indexing, as well as read performance improvements, specifically for time-series collections. These are important changes for performance, especially in a database that at one time was optimized almost exclusively for write performance and had little support for secondary indexes.
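To make "densification and gap-filling" concrete, here is a minimal Python sketch of the general idea. It is not MongoDB's implementation; the function names, the 15-minute step and the sample readings are all my own assumptions for illustration. Densification adds placeholder rows for missing timestamps, and gap-filling then replaces the placeholders, here by carrying the last observed value forward.

```python
from datetime import datetime, timedelta


def densify(readings, step):
    """Return one (timestamp, value) pair per step between the first and
    last reading, using None where no reading exists (hypothetical helper)."""
    by_time = dict(readings)
    start, end = min(by_time), max(by_time)
    dense = []
    current = start
    while current <= end:
        dense.append((current, by_time.get(current)))
        current += step
    return dense


def fill_locf(dense):
    """Fill None values with the last observed value (carry forward)."""
    filled, last_seen = [], None
    for timestamp, value in dense:
        if value is None:
            value = last_seen
        else:
            last_seen = value
        filled.append((timestamp, value))
    return filled


# Sample sensor readings with a gap between 09:15 and 10:00 (made-up data).
readings = [
    (datetime(2022, 6, 7, 9, 0), 21.5),
    (datetime(2022, 6, 7, 9, 15), 21.9),
    (datetime(2022, 6, 7, 10, 0), 23.0),
]

dense = densify(readings, timedelta(minutes=15))
filled = fill_locf(dense)
```

Carrying the last value forward is only one possible strategy; linear interpolation between neighboring readings is another common choice.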
Meanwhile, MongoDB is seeking to make it easier and more attractive to encrypt data, helping customers stay in compliance with data protection regulations while still maintaining queryable functionality. As it turns out, MongoDB acquired a company called Aroki Systems over a year ago; the company focused on cryptography, specifically for this purpose. The acquisition was kept quiet until today so that MongoDB could show how important this initiative is.
Using a technology called structured encryption, MongoDB users can perform equality search queries on encrypted data directly, without first decrypting it, and can do so even on data in which the encryption scheme was randomized. Not only does this seek to mitigate functionality penalties for encrypting data, but it also encourages the encryption to be stronger.
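For readers wondering how an equality search can run without decrypting anything, the Python sketch below shows a much simpler, older technique sometimes called a blind index. It is emphatically not MongoDB's structured encryption scheme, and the toy cipher is for illustration only, not production use; all names and sample values are hypothetical. The client stores a randomized ciphertext next to a keyed-hash token, and the "server" matches on the token alone.

```python
import hashlib
import hmac
import secrets


def keystream(key, nonce, length):
    """Toy keystream: HMAC-SHA256 in counter mode (illustration only)."""
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(key, plaintext):
    """Randomized encryption: a fresh nonce makes every ciphertext differ."""
    nonce = secrets.token_bytes(16)
    data = plaintext.encode()
    body = bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))
    return nonce + body


def decrypt(key, blob):
    nonce, body = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in zip(body, keystream(key, nonce, len(body)))).decode()


def search_token(key, value):
    """Deterministic keyed hash the server can compare without the key."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


def find_equal(rows, token):
    """Server-side equality match: sees only tokens, never plaintext."""
    return [row for row in rows if hmac.compare_digest(row["ssn_token"], token)]


# Keys stay on the client; the server only ever receives server_rows and tokens.
enc_key = secrets.token_bytes(32)
index_key = secrets.token_bytes(32)

server_rows = [
    {"ssn_ct": encrypt(enc_key, ssn), "ssn_token": search_token(index_key, ssn)}
    for ssn in ["123-45-6789", "987-65-4321", "123-45-6789"]
]

matches = find_equal(server_rows, search_token(index_key, "123-45-6789"))
```

The weakness of this simple approach is that identical values produce identical tokens, so the server can see which rows share a value. Schemes of the kind MongoDB describes presumably aim to leak less than that.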
My gut tells me this bit of queryable encryption is just a “taste,” and that there is more to come. When I mentioned above that MongoDB is dedicated to addressing modern enterprise computing pain points, I meant it.
Protecting data and maintaining sufficient functionality for data-driven operations is hard. MongoDB seems to see that and wants to solve what has for a long time seemed like a roadblock to progress: being both compliant with data protection regulations and technically agile with data and analytics.
By the way, in the name of making hard things easier, especially when it comes to servicing dev/test and sparse workloads, Atlas Serverless instances are now in general availability (GA). Along with that, MongoDB is announcing integration of its serverless offering with Vercel, a front-end developer service that integrates with numerous serverless solutions.
Stay in sync
Another area of major investment for Mongo seems to be in the variety of data synchronization technologies it uses. Sync is important for a variety of reasons. For example, when a customer wishes to scale up a MongoDB or Atlas cluster, that typically means adding new nodes, to which data must be synced so they can become fully participatory in the cluster.
Another application for sync is in the support of hybrid clusters, in multiple scenarios: on-premises and edge as well as cloud, multi-region, and multicloud. Because Atlas is now available in more than 95 distinct regions across all major cloud providers, a single cluster can mix and match clouds and regions, and sync technology is critical to making this work.
In each of these hybrid cases, nodes are physically distributed: sometimes to meet performance requirements, such as putting data geographically closer to the customer for a snappier experience; sometimes for high availability, keeping data in different regions and clouds to make outages far less likely; and at other times for data sovereignty requirements, where data must be stored in the data subject’s home country. Sync should also be smart: even when a sync hasn’t yet been performed, a database should be able to run partial synchronizations to service the queries submitted to it. Mongo has added features to address all of these scenarios.
The company says its new initial sync via file copy improves initial sync performance by four times, helping clusters scale up faster. Cluster to Cluster Sync powers hybrid deployments involving both Enterprise Advanced (on-premises or on the edge) and Atlas (cloud or DBaaS) clusters, bi-directionally (i.e. in either direction, not both simultaneously). According to Mongo, this capability also enables Atlas-to-Atlas synchronization for workload isolation, and facilitates disaster recovery and hot standby scenarios. In addition to all this, an erstwhile preview feature called Flexible Sync, which syncs just enough data to satisfy queries sent to the cluster, has reached GA.
The area of analytics is one where MongoDB has made major investments, with an expansive scope. The very notion of operational analytics embeds a certain conflict of interest — being able to perform analytical queries on an operational database can be powerful and helps enterprises avoid performing analyses on stale data. But it also introduces the risk that the analytical queries will tax the database at the expense of operational performance.
Even for databases that have an answer to this concern, there’s also the issue that many customers want to perform analysis on large datasets that are not stored in the database at all, but rather in cloud object storage, where the storage costs can be much lower.
To address these requirements, Mongo is doing a couple of things. For one, the company is introducing the capability to add nodes to the cluster that are dedicated to servicing analytical queries, and which can be scaled separately from the nodes dedicated to operational tasks.
But what can Mongo do for the data lake crowd? A feature called Data Federation now allows the querying and merging of data across different clusters and object storage. This sounds a bit like the external table capabilities supported by a number of relational database platforms and can be very useful.
However, some customers will prefer to work in a pure data lake environment. For them, a preview feature is being launched today called Atlas Data Lake, and it will support scheduled extracts of Atlas cluster datasets.
Now customers will have a spectrum of choices:
- Perform analytical queries in MongoDB proper.
- Perform queries that federate data in the database and object storage-based data lakes.
- Convey data out of the database and into the lake on a scheduled automated basis.
While we tend to think of analytics as involving the kind of aggregation and drill-down queries that business intelligence (BI) tools perform, that’s not always the case. Sometimes search is a more desired or effective approach, given the universality of the search metaphor on the Web and elsewhere.
Mongo has a treat for the search crowd as well in the form of general availability for the platform’s “facets” feature which allows customers to pursue categorized searches and filtering. They are highly comparable to the checkbox categories that appear on the left-hand side of department and search results pages on Amazon and other ecommerce providers — underscoring the consumer experience familiarity of using search as an analytics tool.
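As a rough illustration of what faceting computes, the following Python sketch runs a naive substring search over a made-up product list and counts the hits per category value. It makes no attempt to mirror Atlas Search syntax or internals; the data, field names and function names are assumptions of mine.

```python
from collections import Counter

# Made-up catalog for illustration.
products = [
    {"name": "Trail running shoe", "brand": "Acme", "category": "Footwear"},
    {"name": "Road running shoe", "brand": "Zenith", "category": "Footwear"},
    {"name": "Running socks", "brand": "Acme", "category": "Accessories"},
    {"name": "Rain jacket", "brand": "Zenith", "category": "Outerwear"},
]


def search_with_facets(items, term, facet_fields):
    """Return matching items plus, per facet field, a count of hits by value."""
    hits = [item for item in items if term.lower() in item["name"].lower()]
    facets = {field: Counter(item[field] for item in hits) for field in facet_fields}
    return hits, facets


def apply_filter(hits, field, value):
    """Narrow the results the way ticking a facet checkbox would."""
    return [item for item in hits if item[field] == value]


hits, facets = search_with_facets(products, "running", ["brand", "category"])
acme_only = apply_filter(hits, "brand", "Acme")
```

The counts in `facets` are what would sit next to each checkbox, and `apply_filter` stands in for the narrowing that happens when a shopper ticks one.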
Customers who want to use BI tools to analyze data in Mongo are getting new features too. While Mongo previously had a technology called the BI connector, which flattened document-based data to look more tabular, today it is announcing something more robust: Atlas SQL, a new dialect of the ubiquitous query language that is aware of the hierarchy and structure of document-based data, allowing BI tools to query the database (and, presumably, view its metadata) in a crisper, more native fashion. Atop this foundational technology, parties can build highly optimized connectors, and today MongoDB is launching one such revamped connector, specifically for Tableau, with others presumably to come.
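To show what "flattening" a document into something tabular involves, and why structure gets lost along the way, here is a small Python sketch. It is my own illustration under assumed names and sample data, not the BI connector's actual algorithm and not Atlas SQL syntax.

```python
def flatten(doc, prefix=""):
    """Turn nested sub-documents into dotted column names."""
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=name + "."))
        else:
            row[name] = value
    return row


def unwind(doc, array_field):
    """Emit one flat row per array element, repeating the parent fields."""
    parent = {k: v for k, v in doc.items() if k != array_field}
    return [flatten({**parent, array_field: element}) for element in doc[array_field]]


# Made-up order document with a nested customer and an array of line items.
order = {
    "_id": 1,
    "customer": {"name": "Ada", "address": {"city": "London"}},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

rows = unwind(order, "items")
```

One order becomes two rows with the customer details duplicated on each, which is the kind of awkwardness a hierarchy-aware SQL dialect is meant to avoid.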
That Atlas SQL theme of ambidextrously handling the continuum from tabular and relational technology over to the document-based world is applicable in more than BI use cases. It also works in scenarios where customers may wish to migrate databases from relational platforms to MongoDB and Atlas. A new tool from MongoDB called Migrator does just that, and will support Oracle, Microsoft SQL Server, MySQL and PostgreSQL databases as sources.
Rather than naively map tabular structures to simple documents in a black-box fashion, MongoDB says that Migrator will produce a recommended starting schema using more refined transformation rules to convert relational schemas to the document model, and will allow customers to override or customize the recommendations. What’s more, the Migrator will apparently take note of these overrides and customizations, and use them to further train and improve the recommendation model, hopefully resulting in better default schemas in the future.
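The difference between a naive mapping and a more document-minded one can be sketched in a few lines of Python. This is only an illustration of the general idea with made-up tables; the helper names are hypothetical, and nothing here reflects Migrator's actual transformation rules.

```python
# Two made-up relational tables linked by a foreign key.
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
orders = [
    {"id": 10, "customer_id": 1, "total": 25.0},
    {"id": 11, "customer_id": 1, "total": 40.0},
    {"id": 12, "customer_id": 2, "total": 15.5},
]


def naive_mapping(rows):
    """One flat document per row; relationships stay as foreign keys."""
    return [dict(row) for row in rows]


def embed_children(parents, children, foreign_key, embed_as):
    """Nest child rows inside their parent document as an array."""
    grouped = {}
    for child in children:
        trimmed = {k: v for k, v in child.items() if k != foreign_key}
        grouped.setdefault(child[foreign_key], []).append(trimmed)
    return [
        {
            "_id": parent["id"],
            **{k: v for k, v in parent.items() if k != "id"},
            embed_as: grouped.get(parent["id"], []),
        }
        for parent in parents
    ]


flat_docs = naive_mapping(orders)
docs = embed_children(customers, orders, "customer_id", "orders")
```

Whether to embed or keep a reference is exactly the sort of decision a customer might want to override, for example when the child array could grow without bound.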
Distribution of Migrator will be controlled, initially being made available through MongoDB sales and consulting channels, for use on what the company says are “suitable customer projects.” While customers may be disappointed that a more self-service option isn’t available, tools like Migrator typically yield the best results when implemented by practitioners specialized in the sometimes ‘dark art’ of database migration. In other words, Migrator will likely be better situated as a specialist’s productivity tool than as a “wizard” for automated migrations.
Don’t neglect the base
While talk of the core capabilities of the database might make it seem like MongoDB has put developers on the back burner, nothing could be further from the truth. Considering MongoDB has official drivers for 14 different programming languages, it couldn’t neglect its dev faithful even if it wanted to, which it clearly does not.
But perhaps one of the most interesting developer-oriented announcements is around a new way to query MongoDB in a programming language-agnostic way. The Atlas Data API provides a purely HTTPS-based access method for Mongo, that doesn’t require any SDK or driver at all. Developers simply use a REST-like API for easy access to the data sitting in the Atlas DBaaS platform.
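To give a feel for what driverless access looks like, the Python sketch below builds (but does not send) an HTTPS request using only the standard library. The base URL is a placeholder, and the `findOne` action name, the `api-key` header and the body field names are my assumptions about the general shape of such an API; check the Atlas documentation for the real details.

```python
import json
import urllib.request


def build_find_one_request(base_url, api_key, data_source, database, collection, filter_doc):
    """Assemble a POST request for a single-document lookup (hypothetical shape)."""
    body = {
        "dataSource": data_source,
        "database": database,
        "collection": collection,
        "filter": filter_doc,
    }
    return urllib.request.Request(
        url=f"{base_url}/action/findOne",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Placeholder endpoint and credentials; none of these values are real.
request = build_find_one_request(
    base_url="https://data.example.invalid/app/my-app-id/endpoint/data/v1",
    api_key="REPLACE_ME",
    data_source="Cluster0",
    database="shop",
    collection="orders",
    filter_doc={"status": "open"},
)

# Sending it would be a single call: urllib.request.urlopen(request)
```

The point is that any language able to make an HTTPS call and parse JSON can talk to the database this way, with no driver installed.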
The language-specific front isn’t being neglected, though: C# developers are getting a redesigned LINQ provider and .NET analyzer; MongoDB queries in the Compass tool can now be transformed into corresponding Ruby or Go code; the Realm Kotlin SDK with support for sync has hit GA; and Dart/Flutter developers are getting a beta SDK, also with support for sync. Python developers are getting a library called PyMongoArrow, which can export MongoDB data to DataFrames, NumPy arrays and Parquet files (with just one line of code, according to MongoDB), and the export will leverage Apache Arrow for better performance. There are new features for Node.js and Rust developers too.
Also on the developer front are a new Atlas command-line interface and registration experience, the GA of the Atlas Kubernetes Operator, and continued iteration of Atlas’ Terraform provider and AWS CloudFormation resources.
It’s not just tech
MongoDB has always been a developer-friendly platform, so the dizzying array of new technical capabilities, while impressive, is not surprising. But there’s important news beyond the tech itself. MongoDB isn’t just a platform, it’s a company too. And, in a data-driven-appropriate manner, that company is executing especially well on a number of key performance indicators.
For example, MongoDB claims it has hit a $1 billion revenue run rate. It also says it has nearly $2 billion on its balance sheet and estimates a cumulative R&D investment of nearly $1 billion by the end of its 2023 fiscal year. In terms of customer counts, the company claims to have 1,300 customers generating $100,000 or more in revenue, more than 160 customers generating $1 million or more in revenue and even now, has several generating upwards of $10 million.
The company also just reported an excellent quarter, generating a 20-cent profit per share on $285 million in revenue, while Wall Street analysts were expecting a nine-cent per share loss on just $267 million in revenue. Its stock jumped almost 15% last Thursday as a result, making headlines outside the world of tech press coverage.
The thing that used to bug me most about NoSQL vendors in their early years was their almost religious zealotry in defining themselves by what they were not, rather than by what they were. In the case of MongoDB, that situation has now flipped. The company is adding features at an accelerated rate and is paying attention to the details that it may have deemed “boring” in the early days, but which are key to acknowledging and resolving customers’ database pain points.
All the announcements detailed in this post (as well as others I simply didn’t have room to include) largely focus on facilitating new workloads, use cases, deployments and customers, emphasizing additive value rather than zero-sum logic. MongoDB isn’t the only latter-generation database company executing in this manner, but it’s an important exemplar and role model.
This puts the ball back in the courts of the relational database and cloud vendors to provide value to customers and productivity to developers. Ultimately, such positive competition is good for the whole industry, as it’s inspiring rather than combative. This is especially appropriate in a year when vendors’ events are coming out of their pandemic-driven virtual conference shells and returning to live event formats. The world at large can be a discouraging place. The vibe coming out of MongoDB World, however, is that it’s time to move forward.