![]() Most of these tags are pretty static, but when one of the tags has high cardinality, it simply explodes the number of combinations of tags. And there can be lots of tags, so there can be lots of combinations! When people talk about cardinality in monitoring and how it’s hard to handle high-cardinality dimensions, they’re basically talking about how many distinct combinations of tags there are, and thus the number of series. (name=os.cpu.util,role=web) = (name=os.cpu.util,role=db) = (name=os.cpu.util,role=web) = īut what if “role” changes over time? What if it’s not constant, even within a single server? Most existing time series software says, well, it’ll become a new series when a tag changes because the tag is part of the series identifier: The typical time series monitoring software solves this by storing the tags with the series identifier, making it part of the identifier: Plus, most time series software typically tries to avoid N-dimensional storage because time-value pairs can be encoded efficiently-it’s much harder to build a database capable of storing these arbitrary name=value tags. This looks wasteful, doesn’t it? We’ve repeated “role=web” again and again, and we should be able to do it just once. One way to conceptualize this is to make those data points N-dimensional instead of simply timestamps and numbers: But it doesn’t contain a lot of richness: what if I have a lot of servers and want to know the average CPU utilization of, say, database servers versus web servers? How can I filter one kind versus the other? To solve this problem, many monitoring systems nowadays support tags with extra information. This data model is the canonical starting point for most monitoring products. So, for example, you might measure CPU utilization and store it in a time series database: ![]() A time series is a labeled set of values over time, stored as (timestamp, number) pairs. Generally, this refers to the number of series in a time series database. If you’ve seen discussions of “high-cardinality dimensions” or “observability requires support for high-cardinality fields,” this is what we’re talking about. ![]() Name probably has high cardinality, unless there’s more to this table than meets the eye (such as multiple rows for different product colors and other variations).Ĭardinality in Time Series DatabasesIn addition to cardinality in databases, I also want to help simplify what it means to use cardinality in monitoring. The Category column will have a lot of repetition, and it’ll have low or medium cardinality: maybe 50 or 100 different Category values. If there’s a thousand rows in the table, there’ll be a thousand different ProductID values. The ProductID column is going to have high cardinality because it’s probably the primary key of the table, so it’s totally unique. Cardinality in Database ExamplePicture a product description table in an e-commerce database: A lot of distinct values is high cardinality a lot of repeated values is low cardinality. It’s more common to simply talk about “high” and “low” cardinality. We usually don’t talk about cardinality as a number, though. This is all about how many distinct values are in a column. How to Conduct a Database Design Review High and Low Database Cardinality DefinitionThe more important definition of cardinality for query performance is data cardinality. So you’re really talking about the relationship cardinality. ![]() In this sense, cardinality means whether a relationship is one-to-one, many-to-one, or many-to-many. What Is Cardinality in Data Modeling?The first meaning of cardinality is when you’re designing the database-what’s called data modeling. Let’s explore the simple definition first and then dig into why cardinality matters for query performance. For our purposes, one matters a lot more than the other. With this definition of database cardinality in mind, it can mean two things in practice. Repeated values in the column don’t count. When applied to databases, the meaning is a bit different: it’s the number of distinct values in a table column relative to the number of rows in the table. Fear not: I’ve got you, as they say.Ĭardinality’s official, non-database dictionary definition is mathematical: the number of values in a set. But if you don’t know it-and it takes a while to get comfortable with cardinality-it’s super confusing when the DBA just drops it into the middle of a sentence without slowing down. Databases have a lot of jargon, and cardinality is one of those words experienced people tend to forget they didn’t know once upon a time.
0 Comments
Leave a Reply. |