Project Overview

Problem

Since the inception of our analytics function, we had been storing our marketing and communications data for our reports in spreadsheets. Although this method worked in the short-term, keeping data decentralized in spreadsheets limited our ability to conduct complex marketing analysis and to overcome data size limits.

Objective

This project aimed to create a centralized BigQuery database to store all current and historical marketing and communications data. There are several use cases for this database. First, its primary use case is to be used to conduct ad hoc analysis to answer more complex questions around our marketing and communications data, particularly around cross-channel analysis, at the campaign level. The goal is to reduce the amount of analysis time by using SQL to improve the efficiency of retrieving and calculating data. Second, this database is intended to be used as a backend data source for future marketing reports in Looker Studio.

The Approach and Process

1. Conceptual Design

Defining Entities

The first step in the database design process was defining my entities and how they would systematically relate to each other to make a relational database. The way that my marketing and communications data was currently structured was by tactic. So for example, all email performance data lived in a single spreadsheet, which lived separately from organic social data. These different tactics would each become a fact entities that would become the foundation of my database. The campaign and tags tables serve as dimension entity that serve to relate the fact entities together.

Entity Relationship Diagram (ERD)

marcomm_database_ERD.PNG

This is the latest version of the entity relationship diagram for the marcomm database. Entities color coded in green represent fact entities; entities color coded in blue represent dimension entities; and entities color coded in purple represent bridge entities.

Defining Relationships

When I defined the relationships between fact entities and dimension entities in this database, I used a simple question to determine those relations:

Does Entity A have many Entity B or does Entity B have many Entity A?

For example, I used this question structure when determining the relationship between billboards and campaigns.

Does a single billboard have many campaigns or does a single campaign have many billboards?

In our instance, a single campaign has many billboards, so this would be represented as a one-to-many relationship in the ERD.

I applied this question structure when defining relationships between my fact entities and dimension entities. Many of the relationships in this database have a one-to-many relationship simply because that is how our campaigns are structured in our office. It’s important to note that because the fact entities stored tactic information such as earned media or paid digital, there weren’t any logical relationships between two fact entities.

The specific instance of a many-to-many relationship in my database was between the campaigns and tags dimension entities.