Identification

Identification is central to web analytics. We distinguish the following “objects”:

  1. A person
  2. A user
  3. A session
  4. An event

These objects are hierarchically linked.

A person, meaning a real natural person, can appear as multiple users.

A user corresponds to a browser: when you visit a website on both your laptop and your tablet, you count as two users.

A user can have multiple sessions.

A session can have multiple events.
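The hierarchy above can be pictured as nested records. The following is only an illustrative sketch: the class and field names are hypothetical and do not describe how Harvest stores data internally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    event_id: str


@dataclass
class Session:
    session_id: str
    events: List[Event] = field(default_factory=list)


@dataclass
class User:
    user_id: str  # one per browser
    sessions: List[Session] = field(default_factory=list)


@dataclass
class Person:
    person_key: str  # a proxy for the natural person, e.g. a customer ID
    users: List[User] = field(default_factory=list)


def count_events(person: Person) -> int:
    """Count every event reachable from a person through users and sessions."""
    return sum(len(s.events) for u in person.users for s in u.sessions)


# One person browsing on a laptop and a tablet is two users.
laptop = User("u-laptop", [Session("s1", [Event("e1"), Event("e2")])])
tablet = User("u-tablet", [Session("s2", [Event("e3")])])
person = Person("p-1", [laptop, tablet])
```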

Person

As mentioned above, a person is a real natural person. In web analytics it is almost impossible to identify a person directly, which is why we use proxies. Commonly used proxies are:

  • Email
  • Customer ID
  • Phone number

Note that these identifiers are personally identifiable, so you should process them with caution. At a minimum, hash them before you store them with a third party. For more information, read our hashing documentation.
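As a rough illustration of hashing an identifier before it leaves your systems, the sketch below uses SHA-256 from the Python standard library. Both the choice of SHA-256 and the normalization step (trimming whitespace and lowercasing) are assumptions made for this example; the hashing documentation may prescribe a different method. The function name hash_identifier is hypothetical.

```python
import hashlib


def hash_identifier(value: str) -> str:
    """Return the SHA-256 hex digest of a normalized identifier.

    Normalizing first (assumed here: trim and lowercase) makes sure that
    the same email typed in slightly different ways hashes to the same value.
    """
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

With this sketch, hash_identifier(" Jane@Example.com ") and hash_identifier("jane@example.com") return the same 64-character digest, so both can serve as the same proxy for a person.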

Even after you hash an identifier, it is still personally identifiable information from your own perspective. Therefore, if a user asks you to delete their data, you have to make sure that all third parties delete the user's data as well. This can often be difficult.

At Harvest we have a function called aliasing. Aliasing is applied before data is stored in our Harvest Store, which makes it much easier to comply with privacy regulations. For more information, read the aliasing documentation.

User

A user sits below a person in the identification hierarchy. A user is created at the browser level and is identified by the userID that is stored in the harvest_user cookie. More information about how Harvest manages users can be found in the user management documentation.

Because a user is identified by a userID in a cookie, this has the following consequences:

  • When a person visits multiple domains in the same browser, the person is identified as a different user on each domain. A user is identified per browser per domain, because we use a first-party cookie.
  • When a person visits the website on multiple devices, each device will be a different user.
  • When a person visits the website on the same device through multiple browsers, each browser will be a different user.
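The three cases above follow from one rule: every combination of device, browser and domain has its own cookie store, and therefore its own userID. The simulation below is a simplified sketch of that rule. The in-memory dictionary stands in for real browser cookie storage, and the use of a random UUID as the userID is an assumption for illustration only.

```python
import uuid
from typing import Dict, Tuple

# Each (device, browser, domain) combination has its own first-party cookie store.
CookieKey = Tuple[str, str, str]
cookie_store: Dict[CookieKey, str] = {}


def get_user_id(device: str, browser: str, domain: str) -> str:
    """Return the userID for this cookie store, creating one on the first visit."""
    key = (device, browser, domain)
    if key not in cookie_store:
        cookie_store[key] = str(uuid.uuid4())  # assumed ID format
    return cookie_store[key]
```

A returning visit with the same device, browser and domain yields the same userID; changing any one of the three yields a new one.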

When many users perform actions across multiple domains, devices or browsers, this affects your analytics. There are several solutions to this issue, although unfortunately none of them is perfect.

Cross domain tracking

When you link between your own domains, it is possible to tag each link to the target domain with the userID of the origin domain. This makes sure that the target domain knows the userID from the origin. This way we can link the two userIDs and we know that both users are the same person.
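One common way to pass an ID from one domain to another is a query parameter on the link. The sketch below illustrates that general idea with the Python standard library. The parameter name harvest_uid and both function names are hypothetical; the actual mechanism is described in the cross domain tracking documentation.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

USER_ID_PARAM = "harvest_uid"  # hypothetical parameter name


def decorate_link(url: str, user_id: str) -> str:
    """Append the origin userID to a link as a query parameter."""
    parts = urlsplit(url)
    # Keep existing parameters, but drop any stale copy of our own parameter.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != USER_ID_PARAM
    ]
    query.append((USER_ID_PARAM, user_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


def read_origin_user_id(url: str) -> Optional[str]:
    """On the target domain, read the origin userID from the landing URL, if any."""
    params = dict(parse_qsl(urlsplit(url).query))
    return params.get(USER_ID_PARAM)
```

The target domain can then store the pair (origin userID, its own userID) so that both users are treated as the same person.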

More information can be found in our cross domain tracking documentation.

User stitching

As mentioned in the Person section above, it is possible to use a proxy for a person and send that ID to identify the person yourself. This way, we can link several userIDs based on that single ID. For more information, read our user stitching documentation.
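Conceptually, stitching amounts to grouping userIDs by the person ID they were seen with. The following is a minimal sketch of that grouping, not Harvest's implementation; the function name and the shape of the input (pairs of userID and person ID) are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def stitch_users(observations: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group userIDs by the person ID they were observed with."""
    users_by_person: Dict[str, Set[str]] = defaultdict(set)
    for user_id, person_id in observations:
        users_by_person[person_id].add(user_id)
    return dict(users_by_person)


# Each pair is (userID, person ID), e.g. recorded when someone logs in.
observations = [
    ("u-laptop", "p-1"),
    ("u-tablet", "p-1"),
    ("u-phone", "p-2"),
]
stitched = stitch_users(observations)
```

Here the laptop and tablet users both end up under person p-1, while the phone user belongs to p-2.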

Session

A session is a grouping of events based on time. This way we can identify the time blocks in which a user visited the site. A session is identified by a sessionID, which is stored, together with other information, in the harvest_session cookie. With each event we can send both the userID and the sessionID, so the user and the session can be matched without any further effort.
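To make "grouping of events based on time" concrete, the sketch below starts a new session whenever the gap between two consecutive events exceeds an inactivity timeout. The 30-minute timeout is an assumed value chosen for illustration, and the function name is hypothetical; the rules Harvest actually applies are covered in the session management documentation.

```python
from datetime import datetime, timedelta
from typing import List

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed value, not taken from Harvest


def split_into_sessions(timestamps: List[datetime]) -> List[List[datetime]]:
    """Group event timestamps into sessions separated by gaps longer than the timeout."""
    sessions: List[List[datetime]] = []
    for ts in sorted(timestamps):
        # Continue the current session if the previous event was recent enough.
        if sessions and ts - sessions[-1][-1] <= SESSION_TIMEOUT:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions
```

For example, events at 09:00, 09:10, 10:00 and 10:05 on the same day fall into two sessions, because the 50-minute gap between 09:10 and 10:00 exceeds the assumed timeout.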

For more information about how we handle sessions, read our session management documentation.

Event

The lowest level is the event: the actual action a user performed. Each event is identified by an eventID and contains the eventID, sessionID and userID. This way all of these IDs can be linked, and we know exactly which event occurred in which session and for which user. When there is also an identifier that matches a person, we can even link the events to that person.
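Because every event carries all three IDs, linking events to a person is a simple lookup once a mapping from userID to person exists. The sketch below illustrates this; the record layout, the field names and the user_to_person mapping are hypothetical and only mirror the IDs named above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EventRecord:
    event_id: str
    session_id: str
    user_id: str


def events_for_person(
    events: List[EventRecord],
    user_to_person: Dict[str, str],
    person_id: str,
) -> List[str]:
    """Return the eventIDs of all events whose user maps to the given person."""
    return [e.event_id for e in events if user_to_person.get(e.user_id) == person_id]


events = [
    EventRecord("e1", "s1", "u-laptop"),
    EventRecord("e2", "s1", "u-laptop"),
    EventRecord("e3", "s2", "u-tablet"),
    EventRecord("e4", "s3", "u-unknown"),  # no person identifier available
]
user_to_person = {"u-laptop": "p-1", "u-tablet": "p-1"}
```

Events from users without a known person identifier, such as e4 here, stay linked to their session and user only.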