There is so much talk about data that it’s almost become a cliché. It’s true that data is being generated at an ever-increasing rate. That increase brings challenges for storing and managing the data, accompanied by challenges in converting the information into insights and business value.
It’s a classic case of separating the wheat from the chaff. And there’s quite a bit of chaff. Up to 70% of all data collected and stored within a company will never get to the analytics stage. That means as little as 30% of the data you collect will actually provide value for your company.
Many companies are examining the use of GraphQL to rise to these challenges. So, how can you use GraphQL to get more data from the storage stage to the analytics stage so you can actually gain insights from that information?
ETL vs. APIs
One way businesses bring data from the collection stage to the analytics stage is through extract, transform and load (ETL) processes. ETL software pulls data from various sources and feeds it through a pipeline directly into a “data lake” or data warehouse. You can transfer the data in batches, or you can transfer data in real-time as it updates, which is called a “stream” of data. Then, various types of analytics software can sort the data and present it to your team members.
ETL is great when you need to compare big datasets. For example, a day-by-day comparison of this year’s business expenses against last year’s means working with a lot of data all at once. So, it’s helpful to have all that data in one place where you can sort and compare it more easily.
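To make the extract, transform and load steps concrete, here is a minimal sketch in Python. It assumes two small CSV exports of daily expenses and uses an in-memory SQLite database as a stand-in for a data warehouse. The sample figures, table name and function names are all hypothetical and chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical source data: two CSV exports of daily expenses.
EXPENSES_2021 = "day,amount\n01-01,120.0\n01-02,95.5\n"
EXPENSES_2022 = "day,amount\n01-01,130.0\n01-02,90.0\n"


def extract(raw_csv):
    """Extract: parse rows out of a CSV export."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def transform(rows, year):
    """Transform: convert amounts to numbers and tag each row with its year."""
    return [(year, row["day"], float(row["amount"])) for row in rows]


def load(conn, records):
    """Load: write the normalized rows into one shared table."""
    conn.executemany("INSERT INTO expenses VALUES (?, ?, ?)", records)


def run_pipeline():
    """Run all three steps for both years and return the loaded database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE expenses (year INTEGER, day TEXT, amount REAL)")
    for year, raw in ((2021, EXPENSES_2021), (2022, EXPENSES_2022)):
        load(conn, transform(extract(raw), year))
    return conn


def compare_years(conn, last_year, this_year):
    """Return (day, this year's amount minus last year's) for each shared day."""
    query = """
        SELECT cur.day, cur.amount - prev.amount
        FROM expenses AS cur
        JOIN expenses AS prev ON cur.day = prev.day
        WHERE cur.year = ? AND prev.year = ?
        ORDER BY cur.day
    """
    return conn.execute(query, (this_year, last_year)).fetchall()
```

Calling `compare_years(run_pipeline(), 2021, 2022)` returns one row per day with the year-over-year difference. The comparison is a single query only because both years already sit in the same table, which is the point of loading everything into one place first.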
Another way businesses can gather data for analytics is through application programming interfaces (APIs). APIs let software programs communicate information with each other. For example, your customer service smartphone app can use an API to connect to another smartphone app that can then alert your IT team when customers are complaining about a technical issue. Or, your apps can send data to your data analysis software through an API.
APIs can cache and temporarily store data from apps. Then, developers can use GraphQL or a similar language to send a request, called an API call or a “query,” to get the data as they need it. GraphQL queries are more specific than a normal ETL process because you can “nest” your queries to get the exact information you want. So getting data from APIs is great for analyzing smaller, more specific pieces of data.
For example, if you want to know how many women above a certain age purchased a certain product from your website in a given month, you could query your ecommerce API with GraphQL. Instead of sending a query that just asks for the total number of purchases of that product in that month, you could send a query that asks, “Among all the women who purchased this product in January, how many are above this age?” That information could help you target your advertising for that product.
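As a rough illustration, that question might look like the nested query below, wrapped in a small Python helper that packages it as an HTTP POST request. Every field, argument and type name in the query (`product`, `purchases`, `buyer`, `totalCount` and so on) is an assumption made up for this sketch, as is the endpoint URL. A real ecommerce API defines its own schema. The helper only builds the request and does not send it.

```python
import json
import urllib.request

# Hypothetical schema: these field and argument names are invented for
# illustration and will differ from any real ecommerce API.
QUERY = """
query WomenBuyers($productId: ID!, $month: String!, $minAge: Int!) {
  product(id: $productId) {
    name
    purchases(month: $month, buyer: {gender: FEMALE, minAge: $minAge}) {
      totalCount
    }
  }
}
"""


def build_request(endpoint, product_id, month, min_age):
    """Package the query and its variables as a JSON POST request (not sent)."""
    payload = {
        "query": QUERY,
        "variables": {
            "productId": product_id,
            "month": month,
            "minAge": min_age,
        },
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: build (but do not send) a request to a placeholder endpoint.
request = build_request("https://example.com/graphql", "sku-123", "2022-01", 40)
```

The nesting does the narrowing: the outer `product` field picks the product, the inner `purchases` field filters by month and buyer, and only the count comes back rather than every purchase record.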
Data challenges in APIs
We’re clearly in the age of APIs, now that around 90% of developers use them. There are literally thousands of pre-programmed APIs publicly available for any company to use for everything from improving in-office productivity to providing better customer service. So, you don’t necessarily need to worry about creating the APIs yourself. Your primary concern should be efficiently getting data from those APIs, but that’s not always easy.
With the sheer number of APIs comes a large degree of variation. There’s variation in API formats, access controls, performance levels, querying and much more. Basically, communicating data between all those different kinds of APIs can become messy because they handle data in different ways. Application developers are typically busy building the most effective user experience, and they want to avoid having to worry about precisely how APIs format and handle data. They may not have the time or expertise to wade through the different formats to get the full benefit of the data.
This is where GraphQL comes in. GraphQL is a newer query language for APIs that has taken developers and companies of all sizes by storm. GraphQL allows frontend developers, the folks whose job is to worry about the user experience, to query for backend data, irrespective of the API style or purpose. In short, GraphQL makes it easy for you to aggregate useful data from any kind of API.
GraphQL for data management
What makes GraphQL relevant to your data management goals? A central concept in GraphQL is the stitching of multiple pieces of data: you get customer data from one backend and orders data from another, and now you can ask, “Give me all the orders for customer John Doe.”
This concept of stitching is powerful and allows for compositions of subgraphs. There could be one team that builds out the customer subgraph, another team that builds out the ecommerce subgraph, and a third team that focuses on the marketing subgraph. Now, a query such as “Show me the relevant promotions for customer John Doe” could fetch data from each of the subgraphs.
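The idea can be sketched in plain Python. This is not how a GraphQL server implements stitching. It is only an illustration of one question being answered by combining three independently owned data sources. The sample records and function names are hypothetical.

```python
# Hypothetical data, standing in for three subgraphs owned by different teams.
CUSTOMERS = {"c1": {"id": "c1", "name": "John Doe"}}
ORDERS = [
    {"customer_id": "c1", "product": "tent"},
    {"customer_id": "c1", "product": "stove"},
    {"customer_id": "c2", "product": "kayak"},
]
PROMOTIONS = {"tent": "10% off sleeping bags", "kayak": "Free paddle"}


def customer_by_name(name):
    """Customer subgraph: find a customer record by name, or None."""
    return next((c for c in CUSTOMERS.values() if c["name"] == name), None)


def orders_for(customer_id):
    """Ecommerce subgraph: list the orders placed by one customer."""
    return [o for o in ORDERS if o["customer_id"] == customer_id]


def promotion_for(product):
    """Marketing subgraph: look up the promotion tied to a product, if any."""
    return PROMOTIONS.get(product)


def promotions_for_customer(name):
    """Stitch the three subgraphs together to answer one question."""
    customer = customer_by_name(name)
    if customer is None:
        return []
    promos = (promotion_for(o["product"]) for o in orders_for(customer["id"]))
    return [p for p in promos if p is not None]
```

Here `promotions_for_customer("John Doe")` walks from the customer record to that customer's orders to the promotions attached to the ordered products. Each lookup function could be replaced by a call to a separate backend without changing the caller.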
As you can see, this is revolutionary. Instead of treating your GraphQL API layer as a central monolith, you can partition it by team and then combine the parts. It can be partitioned by country (to comply with data privacy laws) and then combined. It can be partitioned by cloud (to improve performance) and then combined. The new layer is a graph of graphs. Just as the web was formed from interconnections within and across domains, the same composition can happen in the GraphQL API layer.
As you start to think about this graph of graphs, you will rightly think about performance, governance, standardization, etc. Good GraphQL implementations make these concerns easier to address. For example, building out this graph of graphs declaratively (in other words, describing what the graph structure is, rather than how it is executed) makes it easier to meet performance goals, keep governance clean, and standardize.
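One way to picture the declarative approach is a plain description of which subgraph owns which field, kept separate from the code that acts on it. The sketch below is an assumption-laden toy: the `GRAPH` layout, the subgraph names, the region labels and the `plan` function are all hypothetical and do not reflect any specific GraphQL product.

```python
# Declarative description: WHAT the graph of graphs looks like.
# Field names, subgraph names and regions are hypothetical.
GRAPH = {
    "customer": {"subgraph": "customers", "region": "eu"},
    "orders": {"subgraph": "ecommerce", "region": "us"},
    "promotions": {"subgraph": "marketing", "region": "us"},
}


def plan(requested_fields):
    """Group the requested fields by the subgraph that owns them.

    The HOW (which subgraphs to contact) is derived from the description
    above, so the description can be changed without touching this code.
    """
    grouped = {}
    for field in requested_fields:
        if field not in GRAPH:
            raise KeyError(f"No subgraph declares the field {field!r}")
        grouped.setdefault(GRAPH[field]["subgraph"], []).append(field)
    return grouped
```

Because the structure is data rather than code, a governance rule such as "fields in the `eu` region may only be served from that region" could be checked by inspecting `GRAPH` alone, before any query runs.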
In summary, a new data layer is emerging in companies: the API layer. This layer sits between the systems that store, manage, and analyze data, and the systems and apps that collect data. The best way to access the API layer is through a query language like GraphQL. GraphQL lets developers get to the data more easily without having to worry about the “how.”
Furthermore, it is naturally decomposable: its built-in graph-of-graphs concept allows for very flexible architectures. That in turn means you can process data more efficiently and get more business value from the information that your APIs collect.
Anant Jhingran is the CEO and cofounder of StepZen.