If you've already read our article on what data Faraday expects, then you're well on your way to understanding what Faraday's system needs for prediction modeling.

To drive home these best practices, here are some hypothetical examples that show how we bridge the gap between the data you send us and the models you help design to predict your desired outcomes.

Even if you already understand the concept of an event stream, it helps to see visual examples that show the shape of the data as it would appear in a database or spreadsheet.

Faraday uses four base-level data points when processing any stream of events that you give us:

  • date (datetime)

  • value (i.e., the monetary value of the event to your business)

  • product(s)

  • channel (e.g., "acquisition source")

Note: You are not limited to the fields above. Give us any other data points that are meaningful to your business outcome!
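As a rough illustration of the four base-level data points, here is a minimal Python sketch of a single event record. The class name, the field names, the `customer_id` and `event_id` identifiers, the `extra` dictionary, and all sample values are assumptions made for this example, not a schema that Faraday requires.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Event:
    """One row of a hypothetical event stream (all names are illustrative)."""

    customer_id: str          # assumed identifier for the individual
    event_id: str             # assumed identifier for the event
    date: datetime            # when the event happened
    value: float              # monetary value of the event to the business
    products: list[str]       # product(s) involved in the event
    channel: str              # e.g. the acquisition source
    # Any other fields that are meaningful to the business outcome.
    extra: dict[str, str] = field(default_factory=dict)


# A made-up order event.
order = Event(
    customer_id="C001",
    event_id="E1001",
    date=datetime(2023, 3, 14, 9, 30),
    value=130.0,
    products=["armchair", "throw pillow"],
    channel="paid search",
    extra={"store_region": "northeast"},
)
```

The `extra` field stands in for the note above: any additional columns can travel alongside the four base-level ones.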

Example 1: item-level data

In the above example, the uniqueness of each row is based on the *item* within an event. As a result, each event is repeated once for every item it contains (look at the eventids).

Example 2: customer-level data

In the above example, the uniqueness of each row is based on the *customer* record and their rolled-up totals and last event date for summary analysis. There is no indication of when each unique event occurred.
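To see why a customer-level file hides when each event occurred, here is a minimal sketch that rolls a handful of events up into per-customer summaries. The function name, the tuple layout, and the sample rows are hypothetical and only meant to show the shape of the rollup.

```python
from datetime import date

# Hypothetical event-level rows: (customer_id, event_date, value)
events = [
    ("C001", date(2023, 1, 5), 40.0),
    ("C001", date(2023, 3, 14), 130.0),
    ("C002", date(2023, 2, 20), 15.5),
]


def roll_up_by_customer(rows):
    """Collapse events into one summary record per customer."""
    summary = {}
    for customer_id, event_date, value in rows:
        entry = summary.setdefault(
            customer_id,
            {"total_value": 0.0, "event_count": 0, "last_event_date": event_date},
        )
        entry["total_value"] += value
        entry["event_count"] += 1
        # Only the most recent date survives; earlier dates are discarded.
        entry["last_event_date"] = max(entry["last_event_date"], event_date)
    return summary


customer_level = roll_up_by_customer(events)
```

After the rollup, customer C001 has a total, a count, and a last event date, but the date of the first purchase can no longer be recovered from the summary alone.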

Example 3: event-level data

In the above example, the uniqueness of each row is based on the *event* that occurred. Notice how the column for the products in the order is in a list format, separated by commas only.
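The relationship between the item-level shape (Example 1) and the event-level shape (Example 3) can be sketched as a simple grouping step. This is only an illustration under assumed column names (`event_id`, `date`, `product`, `item_value`) and made-up rows; it is not a description of how Faraday processes files internally.

```python
# Hypothetical item-level rows: one row per item, with event fields repeated.
item_rows = [
    {"event_id": "E1001", "date": "2023-03-14", "product": "armchair", "item_value": 100.0},
    {"event_id": "E1001", "date": "2023-03-14", "product": "throw pillow", "item_value": 30.0},
    {"event_id": "E1002", "date": "2023-03-15", "product": "lamp", "item_value": 45.0},
]


def to_event_level(rows):
    """Group item rows by event_id, producing one row per event."""
    events = {}
    for row in rows:
        event = events.setdefault(
            row["event_id"],
            {"event_id": row["event_id"], "date": row["date"], "value": 0.0, "products": []},
        )
        event["value"] += row["item_value"]
        event["products"].append(row["product"])
    # Store the products as a list separated by commas only.
    return [{**event, "products": ",".join(event["products"])} for event in events.values()]


event_level = to_event_level(item_rows)

# Reading the comma-separated column back into a list of products.
first_event_products = event_level[0]["products"].split(",")
```

One caveat of the comma-only format: a product name that itself contains a comma would be split incorrectly, so such names would need to be cleaned or grouped into categories first.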

Now, we understand that the examples above rest on a couple of assumptions:

  • The event examples above are based on orders. What if your business doesn't specifically operate on orders? No problem, you may provide this same shape of data for any event stream that captures an individual's behavior in your system:

    • insurance policies started

    • emails clicked or bounced

    • leads created

    • investments made


  • We also assume the product set within your file is made up of 10-20 (max) easily readable, grouped categories. These categories should index highly across the historical events you provide, so ideally they cover most of the events that have happened.

    • overall you might have 1000 different products

    • these products may need to be mapped from SKUs or pattern-matched according to some rule you have:

      • "Has the word 'deluxe' in the title."

      • "SKUs beginning with 'AM-' are our armchairs."

...if you need to map or group your products in a way that can't be expressed as a simple pattern, a "SKU mapping" spreadsheet can supplement your data! We will take your mapping spreadsheet and join it directly to your data, as if you had provided it in the main dataset.
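The two grouping approaches described above, pattern rules and a SKU mapping spreadsheet, can be sketched together. Everything here is hypothetical: the `categorize` function, the category names, the sample SKUs, the inline CSV standing in for a mapping spreadsheet, and the choice to check the mapping before the pattern rules.

```python
import csv
import io
import re

# Stand-in for a "SKU mapping" spreadsheet exported as CSV (made-up rows).
MAPPING_CSV = """sku,category
XK-2041,outdoor
QZ-0007,lighting
"""

sku_mapping = {
    row["sku"]: row["category"] for row in csv.DictReader(io.StringIO(MAPPING_CSV))
}


def categorize(sku, title, mapping):
    """Assign a product to a grouped category."""
    # Explicit mapping first (an assumed precedence for this sketch).
    if sku in mapping:
        return mapping[sku]
    # Rule: SKUs beginning with "AM-" are armchairs.
    if sku.startswith("AM-"):
        return "armchairs"
    # Rule: has the word "deluxe" in the title.
    if re.search(r"\bdeluxe\b", title, re.IGNORECASE):
        return "deluxe"
    return "uncategorized"


categories = [
    categorize("AM-1001", "Classic Armchair", sku_mapping),
    categorize("ZZ-0001", "Deluxe Sofa", sku_mapping),
    categorize("XK-2041", "Patio Set", sku_mapping),
    categorize("ZZ-0002", "Plain Stool", sku_mapping),
]
```

Products that fall through to "uncategorized" are the ones worth adding to the mapping spreadsheet, since the goal is for the grouped categories to cover most events.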


Why would Faraday be interested in all the metadata that accompanies a particular event record (e.g., value, channel, product)? You can read more about how we use these features to roll up data by individual here.
