Data Processing

Since day one at bchic, our credo has been: maximum transparency. We want to show exactly how we process and store data. For us, it's clear: We comply with all relevant data protection laws and consistently practice data minimization. That means: We only process and store what's truly necessary, useful, and privacy-friendly.

Let's dive in together and take a close look at what happens when our bchic script lands on your website and how it protects your visitors' privacy.

Imagine this: You've just embedded the bchic Analytics script on your website or activated our plugin in your CMS. The script is now live and ready to capture privacy-compliant website data. As soon as you (or another visitor) open your page in a browser...

Your cookie-free script loads instantly

The bchic embed script (a small JavaScript file) is loaded from our global Content Delivery Network (CDN). This has a huge advantage: The file arrives lightning-fast from the server closest to you, often in just 30 milliseconds.

Once the script is loaded, a page view request is sent to our servers. If the visitor is in the EU, the request goes through our EU isolation to our EU servers. Outside the EU, it runs through our US servers. The request contains information about the currently visited page and the referring website. Your browser also transmits your IP address and user agent (which contains information about your browser and device). The best part: We don't use cookies. Goodbye, annoying cookie banner blocking half the page!

Our digital fortress: the firewall

We closely monitor how many requests each IP address makes over time windows ranging from 1 second to 5 minutes. This helps us fend off DDoS and spam attacks. And yes, visitors from the EU are of course still routed through our EU infrastructure.

We fundamentally don't store personal data in our logs, as this contradicts our core values. The only exception where we temporarily store an IP address is when it's classified as an attacker (e.g., during a DDoS attack).

Security checks in detail

We track how many requests your IP address sends to our system. While we already do this at the firewall level, we additionally have our own application logic to protect against spam attacks. The data we capture looks something like this:
111.22.333.444 = 3 requests
444.22.333.444 = 2 requests
The exact retention periods vary and are not publicly communicated for security reasons. But rest assured: We apply strict data minimization here. Your IP count won't be stored longer than 24 hours after your first request, similar to our access log policy.

If we detect minor abuse, the IP address is temporarily blocked at the application level. For more serious abuse, we permanently block the IP address at the firewall level. This means the IP address remains stored to protect our systems. Fortunately, this has only happened once so far.
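
To make the per-IP counting more concrete, here is a minimal Python sketch of a sliding-window request counter. The class name, the 1-second window, and the limit of 3 requests are assumptions chosen for illustration; they are not the actual bchic thresholds or implementation.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Counts requests per IP inside a rolling time window (hypothetical helper)."""

    def __init__(self, window_seconds: float, max_requests: int) -> None:
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: float) -> bool:
        """Record one request at time `now` and report whether it is within the limit."""
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        hits.append(now)
        return len(hits) <= self.max_requests


# Assumed limit for the example: at most 3 requests per second and IP.
limiter = SlidingWindowLimiter(window_seconds=1.0, max_requests=3)
results = [limiter.allow("203.0.113.7", now=t) for t in (0.0, 0.1, 0.2, 0.3, 1.5)]
print(results)  # [True, True, True, False, True]
```

The fourth request exceeds the assumed limit and is rejected; once the window has passed, the same IP is allowed again. Several such limiters with different window lengths could run side by side to cover the range from 1 second to 5 minutes.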

Are you new here?

Once you've passed our firewall, we need to detect whether this is a new or returning visit to your website. For this, we use a unique visitor detection method we developed in 2019. In short: We use SHA-256 hashes (feel free to learn about hashes and salts!) to identify unique visitors over 24 hours, completely without privacy risk.

Here's an example of the data we receive when you visit a website using bchic:
{
  "user_signature": "e5c7a9b0c2d1e6f8a3b5c7d9e1f3a5b7c9d1e3f5a7b9c1d3e5f7a9b1c3d5e7f9",
  "page_request_signature": "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2",
  "hostname": "example.com",
  "pathname": "/blog/artikel-123",
  "referrer": "google.com",
  "screen_width": 1920,
  "screen_height": 1080,
  "time_on_page": null
}

One of our most important development principles is not to store your raw data (IP and user agent) together with your browsing activities. That wouldn't be privacy-friendly enough. Instead, we create an anonymized "signature" that we can use for later visits.

Many analytics providers store your raw IP address and user agent directly alongside your browsing activities. We strictly reject this and will never do it. We only store raw IP addresses for security purposes, and they don't appear in our customer data exports or dashboards. IP addresses are only examined more closely when we're under a DDoS attack and our protection team needs to identify malicious actors.

Here are the details about our hashes:

  • User Signature Hash: This is our base hash. It allows us to anonymously identify a visitor (you) without knowing who you really are. This enables us to capture site-wide unique visits. This hash is stored along with your page views and also serves to filter spam requests. We create it from the following data:
    • Salt: Each website has a unique salt that's renewed daily at midnight. This makes it practically infeasible for a hacker to crack the hashes via brute force. More on brute-forcing below.
    • IP address: Usually unique to your network. Even though you may sometimes be on a shared network or behind a proxy, it's usually a reliable signal.
    • User agent: Together with the IP address, the user agent often increases the uniqueness of the user signature.
    • Hostname: The website address (e.g., http://www.examplesite.com). This parameter is crucial because it prevents us from tracking browsing activities across different websites.
  • Page Request Signature Hash: Every time you visit a page, we generate a hash that signals to us that this page was viewed. This allows us to capture unique views at the page level. This hash consists of:
    • User Signature: The user signature is the basis of this hash.
    • Pathname: The requested path (e.g., /blog).
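
The two hashes described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names, the order of the inputs, and the field separator are hypothetical, since only the ingredients of each hash are known, not how they are combined.

```python
import hashlib


def sha256_hex(*parts: str) -> str:
    """Hash the parts with SHA-256, joined by a separator so fields cannot run together."""
    joined = "\x1f".join(parts)  # assumed separator (ASCII unit separator)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def user_signature(salt: str, ip: str, user_agent: str, hostname: str) -> str:
    """Site-wide visitor hash built from salt, IP address, user agent, and hostname."""
    return sha256_hex(salt, ip, user_agent, hostname)


def page_request_signature(user_sig: str, pathname: str) -> str:
    """Page-level hash built from the user signature and the requested path."""
    return sha256_hex(user_sig, pathname)


sig = user_signature("todays-salt", "203.0.113.7", "Mozilla/5.0", "example.com")
page_sig = page_request_signature(sig, "/blog")
print(len(sig), len(page_sig))  # 64 64
```

Because the hostname is part of the input, the same visitor produces a different signature on every website, and because the salt changes daily, yesterday's signature cannot be linked to today's.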

After that, we check whether these hashes already exist in our database. If they do, it's a returning visitor. If they don't, it's a new, unique visitor: we add the hashes, and they're then automatically deleted at midnight.
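
As a rough illustration of this existence check, the following Python sketch uses an in-memory set in place of a database. The class, its method names, and the 32-byte salt length are assumptions for the example, not a description of the real storage layer.

```python
import secrets


class DailySignatureStore:
    """In-memory stand-in for the database existence check (hypothetical)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self._salts: dict[str, str] = {}

    def salt_for(self, hostname: str) -> str:
        """Return today's salt for a site, creating a random one on first use."""
        if hostname not in self._salts:
            self._salts[hostname] = secrets.token_hex(32)  # assumed salt length
        return self._salts[hostname]

    def is_new(self, signature: str) -> bool:
        """Return True the first time a signature is seen, False afterwards."""
        if signature in self._seen:
            return False
        self._seen.add(signature)
        return True

    def reset_at_midnight(self) -> None:
        """Forget all signatures and salts; fresh salts are generated on demand."""
        self._seen.clear()
        self._salts.clear()


store = DailySignatureStore()
first = store.is_new("abc123")
second = store.is_new("abc123")
store.reset_at_midnight()
after_reset = store.is_new("abc123")
print(first, second, after_reset)  # True False True
```

In practice the same visitor would not even produce the same signature after midnight, because the salt that feeds the hash has been replaced as well.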

The brilliant thing about this hash system? We (or anyone else) can do absolutely nothing with these hashes. We can't "decrypt" them to see personal details. They're only valuable for the duration of a database existence check. Beyond that, they're completely, wonderfully useless.

Even if someone managed to break into AWS, where our servers are hosted, and get their hands on all our hashes and salt keys, there would still be quadrillions of possible combinations for a brute-force attack. Nobody on this planet has the resources to crack that. We've checked the numbers: Your hacking budget would need to be many times the global gross domestic product, hundreds of trillions of dollars.

We round page views to the full hour instead of capturing them to the second. Why? Because this prevents our access logs (which contain second-based data) from being matched with our database of browsing activities. We do this deliberately to protect your privacy.
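
Here is a small Python sketch of the hourly rounding. It assumes the timestamp is cut back to the start of its hour, which is what the 00:00:00 value in the page statistics example further down suggests; the function name is hypothetical.

```python
from datetime import datetime


def to_hour_bucket(ts: datetime) -> datetime:
    """Drop minutes, seconds, and microseconds so only the hour remains."""
    return ts.replace(minute=0, second=0, microsecond=0)


bucket = to_hour_bucket(datetime(2021, 1, 1, 0, 1, 5))
print(bucket)  # 2021-01-01 00:00:00
```

Every page view within the same hour ends up with the same timestamp, so a second-precise log entry can no longer be matched to one specific stored view.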

Storing the page view

We collect data for pages, referrers, and additional statistics (e.g., browser statistics, device type statistics, country statistics). We do this to guarantee you a lightning-fast dashboard experience and maximum performance.

An example of how we store a single page view:
{
  "id": 232332323234234,
  "user_signature": "5f9b9f01f747722565af71b4e602dc6239f050616b2dfa00944db79b84804c32",
  "site_id": 1234,
  "hostname": "http://www.milliondollarhomepage.com",
  "pathname": "/blog",
  "is_new_visit": true,
  "is_new_session": true,
  "is_unique": true,
  "referrer_hostname": "https://bing.com",
  "referrer_pathname": "/about",
  "timestamp": "2021-01-01 00:01:05",
  "duration": 0
}
Note: The dataset mentioned above contains no personal data whatsoever.

The user signature is technically pseudonymized according to GDPR, but that's just a formality. You can't crack a hash like this via brute force. We describe the user_signature as practically anonymous. If you really had hundreds of trillions of dollars, you might be able to crack it, but that's pure theory.

The corresponding aggregated entry for page statistics looks like this:
{
  "site_id": 1234,
  "hostname": "https://milliondollarhomepage.com",
  "pathname": "/blog",
  "pageviews": 1,
  "visits": 1,
  "timestamp": "2021-01-01 00:00:00"
}
Here, the timestamp is aggregated to the current hour. It works similarly for referrer statistics, browser statistics, and so on. For browser statistics, we naturally store browser_name and browser_version instead of hostname and pathname.
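
The aggregation into hourly page statistics can be sketched as follows. This is a minimal Python sketch: the in-memory dictionary, the function name, and the rule that visits only increases for a new visit are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Key: (site_id, hostname, pathname, hour bucket). Value: running counters.
page_stats: dict[tuple, dict[str, int]] = defaultdict(lambda: {"pageviews": 0, "visits": 0})


def record_page_stat(site_id: int, hostname: str, pathname: str,
                     ts: datetime, is_new_visit: bool) -> dict[str, int]:
    """Add one page view to the hourly counters for this page."""
    hour = ts.replace(minute=0, second=0, microsecond=0)
    row = page_stats[(site_id, hostname, pathname, hour)]
    row["pageviews"] += 1
    if is_new_visit:  # assumed rule: only new visits increase the visit count
        row["visits"] += 1
    return row


record_page_stat(1234, "milliondollarhomepage.com", "/blog", datetime(2021, 1, 1, 0, 1, 5), True)
row = record_page_stat(1234, "milliondollarhomepage.com", "/blog", datetime(2021, 1, 1, 0, 30, 0), False)
print(row)  # {'pageviews': 2, 'visits': 1}
```

Both requests fall into the same hour, so they share one row. A dashboard can then read a handful of pre-computed rows instead of scanning every single page view.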

Capturing bounces & session duration

When a user leaves a page, we attempt to send a second request. Browser-specific rules can sometimes prevent this, but it works in most cases because the most common browsers support it.

When this second request is triggered, it contains all the information from the first request, plus the time spent on the page (in seconds). Since we receive a second request, we also know: It wasn't a bounce. This allows us to update the previous page view (via the user signature), store the duration, mark it as 'not a bounce,' and then update the aggregated tables (page statistics, browser statistics, etc.).
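
The update triggered by the second request might look roughly like this in Python. The PageView class, the is_bounce field, and the lookup by user signature and pathname are assumptions for the sketch; only the duration and the "not a bounce" marking come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageView:
    user_signature: str
    pathname: str
    duration: int = 0       # seconds spent on the page
    is_bounce: bool = True  # assumed default until a second request arrives


def apply_leave_request(pageviews: list[PageView], user_signature: str,
                        pathname: str, time_on_page: int) -> Optional[PageView]:
    """Find the most recent matching page view and store the time spent on it."""
    for view in reversed(pageviews):
        if view.user_signature == user_signature and view.pathname == pathname:
            view.duration = time_on_page
            view.is_bounce = False
            return view
    return None  # no earlier page view found for this signature


views = [PageView("sig-a", "/blog"), PageView("sig-b", "/blog")]
updated = apply_leave_request(views, "sig-a", "/blog", time_on_page=42)
print(updated)
```

After the update, the aggregated tables would be adjusted in the same step, for example by adding the duration to the hourly row for that page.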

Data collection without privacy intrusion

That's it!

This is how we process your website visit with bchic, and this applies to every single visit that runs through our analytics script. We were able to gain valuable information for your business without violating your visitors' privacy. That's exactly how it should be. We don't need to invade your website visitors' privacy to deliver useful data to you. We've invested thousands of hours to develop and implement the most privacy-friendly methods, and your website visitors will thank you for it.

Now that you've read this, you have a comprehensive insight into how we work at bchic Analytics. You might now be wondering how all this fits together with data protection law. We're fortunate to have a first-class, EU-based data protection officer, access to excellent lawyers, and a genuine passion for data protection law.

Here's an overview of the privacy standards we focus on at bchic Analytics, our privacy-friendly analytics service:

  • GDPR Compliance
  • Schrems II Compliance
  • ePrivacy Compliance (Cookie Law)
  • PECR Compliance
  • COPPA Compliance
  • CCPA Compliance

Do you have questions about data processing? Contact us anytime!