Exploring the Promise and Challenges of Composable CDPs in Modern Analytics

SukYeon Jung
Aug 16, 2023

In the dynamic landscape of data analytics, enterprises face a mounting challenge in constructing efficient data pipelines to harness behavior data. In my own work on such pipelines, three pressing issues stand out: the loss of the authentic customer journey due to fragmented behavior data, the redundant collection of the same data, which drives up operational costs, and an inconsistent source of truth.

  1. Limited Data Scope: The first challenge is the constrained scope of data collection. Data gathered from apps and websites provides valuable insight into user behavior, but it does not capture the holistic customer journey. A significant gap remains unless data from other sources, such as Customer Relationship Management (CRM) systems, Enterprise Resource Planning (ERP) systems, and operational databases, is integrated. Without it, the result is an incomplete picture of the customer experience. Consider a customer who follows a multistage journey: engaging with an Instagram ad campaign, making an online purchase, returning the product at a physical store, and then exchanging it for a different model. Relying solely on website data fragments this narrative, losing vital Voice of Customer (VoC) insights and an accurate understanding of the customer’s trajectory.
  2. Redundant Data Collection: The second challenge is the repeated collection of identical data in disparate formats. Imagine a user visiting Company X’s website through a Facebook ad campaign. This single interaction is captured by several third-party solutions: AppsFlyer (for ad performance measurement), Google Analytics 4 (for web behavior analysis), and Braze (for push message delivery). At the same time, Company X’s internal data pipeline collects the same user data to feed its data lake on a platform like Databricks, used for internal analysis and AI/ML training. Each collection is useful on its own, but together they are needlessly redundant. Because the tools use different methodologies, the resulting datasets are inconsistent and fragmented, which makes them cumbersome to reconcile and inflates storage and processing costs.
  3. Inconsistent Source of Truth: The third challenge stems from the different rules each solution applies when it collects events. For instance, differences in the session timeout window (30 minutes vs. 1 hour) or in whether users are identified by device ID or login ID lead to inconsistent values for fundamental metrics such as Daily Active Users (DAU), Monthly Active Users (MAU), conversion rates, and average session duration. As a result, teams using different solutions work from different numbers and plan on different assumptions.
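To make the third point concrete, here is a minimal Python sketch, using made-up events, of how two tools applying different rules to the same data report different numbers. The event tuples, the 30-minute and 1-hour timeouts, and the helper names (`sessions_per_device`, `daily_active_users`) are illustrative assumptions, not any vendor’s actual logic.

```python
from datetime import datetime, timedelta

# Hypothetical events for one day: (device_id, login_id or None, timestamp).
EVENTS = [
    ("dev-a", None, datetime(2023, 8, 1, 9, 0)),        # anonymous web visit
    ("dev-a", "user-1", datetime(2023, 8, 1, 9, 40)),   # same browser, now logged in
    ("dev-b", "user-1", datetime(2023, 8, 1, 10, 30)),  # same person on the mobile app
    ("dev-c", None, datetime(2023, 8, 1, 11, 0)),
    ("dev-c", None, datetime(2023, 8, 1, 11, 50)),
]


def count_sessions(timestamps, timeout):
    """A new session starts whenever the gap since the previous event exceeds `timeout`."""
    sessions = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous > timeout:
            sessions += 1
        previous = ts
    return sessions


def sessions_per_device(events, timeout):
    """Total sessions, sessionizing each device's events independently."""
    by_device = {}
    for device_id, _login_id, ts in events:
        by_device.setdefault(device_id, []).append(ts)
    return sum(count_sessions(stamps, timeout) for stamps in by_device.values())


def daily_active_users(events, identity):
    """Distinct users keyed by device ID, or by login ID (anonymous events ignored)."""
    if identity == "device":
        return len({device_id for device_id, _login_id, _ts in events})
    return len({login_id for _device_id, login_id, _ts in events if login_id is not None})


print(sessions_per_device(EVENTS, timedelta(minutes=30)))  # 5
print(sessions_per_device(EVENTS, timedelta(hours=1)))     # 3
print(daily_active_users(EVENTS, "device"))                # 3
print(daily_active_users(EVENTS, "login"))                 # 1
```

The same five events yield five sessions under a 30-minute timeout but three under a 1-hour timeout, and three active users when counting devices but one when counting logged-in users.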

The Customer Data Platform (CDP) emerged as a partial solution to these challenges. It offers a comprehensive suite of features, encompassing:

  • Data Collection: Across web and mobile applications
  • Data Storage: Safeguarding collected data
  • Data Transformation: Preparing data for analysis
  • Integration: Connecting with third-party solutions

Once web and app events flow into the CDP, it resolves them into user identities. These identities and their corresponding events are then stored and transformed, which makes them available to downstream applications, from product analysis for websites and mobile apps to in-app or web marketing automation and third-party marketing automation. Because data is collected and stored once, in the CDP, redundant collection is avoided, and every team works from the same data source, which eliminates the problems caused by multiple collections and conflicting user identities.
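How a CDP generates identities varies by vendor, and many combine deterministic and probabilistic rules. As a minimal sketch under one simple assumption (any event carrying both a device ID and a login ID proves they belong to the same person), identity stitching can be modeled with a union-find structure. The `IdentityGraph` class and its methods are hypothetical.

```python
class IdentityGraph:
    """Deterministic identity stitching with a union-find (disjoint-set) structure."""

    def __init__(self):
        self.parent = {}

    def _find(self, node):
        """Return the root identifier of node's profile, registering node if new."""
        self.parent.setdefault(node, node)
        root = node
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every identifier on the path directly at the root.
        while node != root:
            next_node = self.parent[node]
            self.parent[node] = root
            node = next_node
        return root

    def link(self, a, b):
        """Record that identifiers a and b belong to the same person."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def add_event(self, device_id, login_id=None):
        """Register an event; a login on a device merges the two identifiers."""
        device = ("device", device_id)
        self._find(device)  # anonymous devices still get their own profile
        if login_id is not None:
            self.link(("login", login_id), device)

    def profiles(self):
        """Group every known identifier by its root, one set per unified profile."""
        groups = {}
        for node in list(self.parent):
            groups.setdefault(self._find(node), set()).add(node)
        return list(groups.values())


graph = IdentityGraph()
graph.add_event("dev-a")            # anonymous web visit
graph.add_event("dev-a", "user-1")  # same browser after logging in
graph.add_event("dev-b", "user-1")  # mobile app, same login
graph.add_event("dev-c")            # unrelated anonymous device
print(len(graph.profiles()))        # 2
```

The web browser, the mobile device, and the login collapse into one profile, while the unrelated device stays separate. Identifiers are namespaced as `("device", …)` and `("login", …)` tuples so that a device ID and a login ID with the same string value never collide.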

However, the CDP has inherent limitations. Diagram 1 portrays a key challenge: two conflicting sources of truth, which breed data inconsistency among teams. One truth originates from the CDP, the other from the company’s data warehouse. Typically, marketing and product teams rely on CDP data for its real-time analysis and automation features, while the data team favors the warehouse, which offers richer data and greater flexibility. Other concerns include higher storage costs from keeping duplicate data in separate databases, security and compliance risks, and the restrictive schema imposed by a third-party solution (the CDP).

[Diagram 1]

Amid the rapid growth of the data warehouse market, driven by competitive pricing, robust compute capabilities, and simpler data maintenance, a new concept has emerged: the composable CDP. As illustrated in Diagram 2 below, it changes where customer data lives. Rather than routing data through a traditional CDP and then syncing it with a data warehouse, companies collect and store all data directly in their data warehouse and activate it from there, without an intermediate copy. Warehouse data is also pushed to operational tools such as Salesforce (via reverse ETL) to enrich those applications. This approach delivers several benefits, including:

  • Cost Effectiveness: Eliminating redundant data collection and storage can yield substantial cost savings, since the company operates a single data warehouse that houses all of its data.
  • Security: By housing data within the existing data infrastructure, the company gains full control and governance over its data, bolstering security measures.
  • Compliance: Moving data storage from third-party solutions into the company’s own data warehouse reduces the number of external parties holding customer data, which can simplify compliance with regulations such as GDPR, although the company’s own obligations under them remain.
  • Flexibility: The composable CDP removes the fixed schemas imposed by conventional CDPs, so teams can model data to fit their own needs.

[Diagram 2]
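Reverse ETL reads modeled data from the warehouse and writes it into operational tools. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for the warehouse, a hypothetical `user_profiles` table, and a hypothetical `push_to_crm` function in place of a real Salesforce client. It syncs only the rows changed since the previous run, tracked with an `updated_at` watermark.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_profiles (
        user_id TEXT PRIMARY KEY,
        email TEXT,
        lifetime_value REAL,
        updated_at TEXT
    );
    INSERT INTO user_profiles VALUES
        ('user-1', 'a@example.com', 120.0, '2023-08-01T10:00:00'),
        ('user-2', 'b@example.com', 35.5, '2023-08-02T09:30:00'),
        ('user-3', 'c@example.com', 0.0, '2023-07-30T18:00:00');
""")


def push_to_crm(records):
    """Hypothetical destination client: maps rows to CRM-style payloads.

    A real reverse-ETL job would send these payloads to the CRM's API here.
    """
    return [
        {"ExternalId": r["user_id"], "Email": r["email"], "LTV": r["lifetime_value"]}
        for r in records
    ]


def sync_changed_profiles(conn, last_synced_at):
    """Read rows modified after the watermark, push them, and return the new watermark."""
    conn.row_factory = sqlite3.Row  # allow column access by name
    rows = conn.execute(
        "SELECT user_id, email, lifetime_value, updated_at FROM user_profiles "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_synced_at,),
    ).fetchall()
    payloads = push_to_crm(rows)
    new_watermark = rows[-1]["updated_at"] if rows else last_synced_at
    return payloads, new_watermark


payloads, watermark = sync_changed_profiles(conn, "2023-07-31T00:00:00")
print(len(payloads), watermark)  # 2 2023-08-02T09:30:00
```

A watermark keeps each sync incremental; production reverse-ETL tools also have to handle deleted rows and records that share a timestamp, which this sketch ignores.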

Building a composable CDP requires several essential components: data collection, ETL processes, data warehousing, modeling, and the downstream applications that consume the modeled data. Table 1 outlines these components.

[Table 1]
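As a rough illustration of how collection, loading, and modeling fit together in a warehouse-native setup, here is a minimal sketch assuming in-memory SQLite as the warehouse and a hypothetical JSON event schema. The `raw_events` table, the `user_summary` view, and `build_pipeline` are illustrative names, not part of any specific product.

```python
import json
import sqlite3

# Hypothetical raw events as a collection SDK might emit them (JSON lines).
RAW_EVENTS = [
    '{"user_id": "user-1", "event": "page_view", "ts": "2023-08-01T09:00:00"}',
    '{"user_id": "user-1", "event": "purchase", "ts": "2023-08-01T09:05:00", "amount": 80.0}',
    '{"user_id": "user-2", "event": "page_view", "ts": "2023-08-01T10:00:00"}',
    '{"user_id": "user-1", "event": "return", "ts": "2023-08-03T15:00:00", "amount": -80.0}',
]


def build_pipeline(raw_lines):
    """Load raw events into the stand-in warehouse and define one shared model on top."""
    warehouse = sqlite3.connect(":memory:")
    # Load: land raw events with minimal transformation (ELT style).
    warehouse.execute("CREATE TABLE raw_events (user_id TEXT, event TEXT, ts TEXT, amount REAL)")
    rows = []
    for line in raw_lines:
        event = json.loads(line)
        rows.append((event["user_id"], event["event"], event["ts"], event.get("amount")))
    warehouse.executemany("INSERT INTO raw_events VALUES (?, ?, ?, ?)", rows)
    # Model: a per-user view that BI, reverse ETL, and ML jobs can all read.
    warehouse.execute("""
        CREATE VIEW user_summary AS
        SELECT user_id,
               COUNT(*) AS events,
               SUM(CASE WHEN event = 'purchase' THEN 1 ELSE 0 END) AS purchases,
               COALESCE(SUM(amount), 0) AS net_revenue,
               MAX(ts) AS last_seen
        FROM raw_events
        GROUP BY user_id
    """)
    return warehouse


warehouse = build_pipeline(RAW_EVENTS)
for row in warehouse.execute("SELECT * FROM user_summary ORDER BY user_id"):
    print(row)
```

Because the online purchase and the later return land in the same table, the modeled view reflects the customer’s net outcome rather than only the web purchase.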

While the composable CDP presents a promising paradigm, it has limitations. Chief among them is the substantial cost and time required to implement it; in many cases, adopting standalone Software as a Service (SaaS) solutions or a traditional CDP is faster and more efficient. Another drawback is that it struggles to support real-time analysis. Because data must be loaded into the warehouse and processed before it can be analyzed, real-time product analysis becomes challenging.
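The real-time limitation can be expressed as a simple latency budget. Assuming a batch pipeline in which loads run on a fixed schedule and a modeling job runs after each load, with no overlapping runs, an event that arrives just after a load begins waits a full interval before it is even picked up. The `BatchPipeline` class and all durations below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BatchPipeline:
    """Hypothetical batch schedule; all durations are in minutes."""
    load_interval: float   # how often new events are loaded into the warehouse
    load_duration: float   # time a single load job takes
    model_duration: float  # time the modeling job takes after each load

    def worst_case_staleness(self) -> float:
        # An event arriving just after a load starts waits for the next load,
        # then for that load and the following modeling job to finish.
        return self.load_interval + self.load_duration + self.model_duration


hourly = BatchPipeline(load_interval=60, load_duration=10, model_duration=15)
frequent = BatchPipeline(load_interval=5, load_duration=2, model_duration=3)
print(hourly.worst_case_staleness())    # 85
print(frequent.worst_case_staleness())  # 10
```

Shortening the interval narrows the gap but raises compute cost, and even frequent micro-batches remain minutes behind the event stream that a real-time product analytics tool reads directly.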

As businesses navigate the landscape of data management, the merits and drawbacks of various models must be carefully considered. While the composable CDP presents an innovative approach, its practicality depends on factors such as the organization’s resources, goals, and analytical needs. Only time will tell whether the composable CDP will solidify its place as a transformative solution or if other models will rise to prominence, offering distinct advantages to the data analytics arena.

On closer examination, each model has its own advantages and disadvantages. In my experience helping companies implement data pipelines and analysis solutions, I’ve seen organizations at different stages make different decisions in response to these challenges.

Which path the mainstream will ultimately take remains a fascinating open question. The current independent SaaS players may sustain their dominance, benefiting from their established footing and specialized offerings. Alternatively, composable CDPs and data warehouse-native solutions could reshape the landscape, thanks to their potential to simplify data management and strengthen governance. A compelling middle ground is a hybrid model, in which SaaS tools handle real-time analytics while the composable CDP’s flexibility is used for more comprehensive analysis and complex model training.

As the data analytics journey unfolds, the interplay of innovation, practicality, and industry dynamics will shape its ultimate direction. The choices businesses make, driven by their unique needs and aspirations, will collectively carve the path toward a more efficient and insightful data-driven future, with each approach contributing a thread to the evolving story of data analytics.
