DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Last call! Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Trending

  • Unmasking Entity-Based Data Masking: Best Practices 2025
  • AI-Based Threat Detection in Cloud Security
  • How Trustworthy Is Big Data?
  • Streamlining Event Data in Event-Driven Ansible

The HTTP Series (Part 3): Client Identification

In this article, we take a look at the different ways browsers can identify patterns in client behavior, and we go into the code be Cookies.

By 
Vladimir Pecanac user avatar
Vladimir Pecanac
·
Aug. 04, 17 · Analysis
Likes (9)
Comment
Save
Tweet
Share
11.9K Views

Join the DZone community and get the full member experience.

Join For Free

Up until now, you learned about the basic concepts and some of the architectural aspects of HTTP. This leads us to the next important subject to the HTTP: client identification.

In this article, you’ll learn why client identification is important and how can Web servers identify you (your Web client). You will also get to see how that information is used and stored.

This is what we have learned so far, and where we are now:

  • The HTTP series (Part 1): Overview of the basic concepts
  • The HTTP series (Part 2): Architectural aspects
  • The HTTP series (Part 3): Client Identification
  • The HTTP series (Part 4): Authentication mechanisms
  • The HTTP series (Part 5): Security
  • The HTTP Reference

First, let’s see why websites need to identify you.

Client Identification and Why It’s Extremely Important

As you are most definitely aware, every website, or at least those that care enough about you and your actions, include some form of content personalization.

What do I mean by that?

Well, that includes suggested items if you visit an e-commerce website, or “the people you might know/want to connect with” on social networks, recommended videos, ads that almost spookily know what you need, news articles that are relevant to you, and so on.

This effect feels like a double edged sword. On the one hand, it’s pretty nifty having personalized, custom content delivered to you. On the other hand, it can lead to Confirmation bias that can result in all kinds stereotypes and prejudice. There is an excellent Dilbert comic that touches upon Confirmation bias.

Yet, how can we live without knowing how our favorite team scored last night, or what celebrities did last night?

Either way, content personalization has become part of our daily lives we can’t, and we probably don’t even want to, do anything about it.

Let’s see how web servers can identify you to achieve this effect.

Different Ways to Identify the Client

There are several ways that a web server can identify you:

  • HTTP request headers
  • IP address
  • Long URLs
  • Cookies
  • Login information (authentication)

HTTP Request Headers Used for Identification

Web servers have a few ways to extract information about you directly from the HTTP request headers.

Those headers are:

  • From – contains user’s email address if provided.
  • User-Agent – contains information about the web client.
  • Referer – contains the source the user came from.
  • Authorization – contains username and password.
  • Client-ip – contains user’s IP address.
  • X-Forwarded-For – contains user’s IP address (when going through the proxy server).
  • Cookie – contains server-generated ID label.

In theory, the From header would be ideal to uniquely identify the user, but in practice, this header is rarely used due to the security concerns of email collection.

The user-agent header contains the information like the browser version or operating system. While this is important for customizing content, it doesn’t identify the user in a more relevant way.

The Referer header tells the server where the user is coming from. This information is used to improve the understanding of user behavior, but less so to identify it.

While these headers provide some useful information about the client, it is not enough to personalize content in a meaningful way.

The remaining headers offer more precise mechanisms of identification.

IP Address

The method of client identification by IP address has been used more in the past when IP addresses weren’t so easily faked/swapped. Although it can be used as an additional security check, it just isn’t reliable enough to be used on its own.

Here are some of the reasons why:

  • It describes the machine, not the user.
  • NAT firewalls – many ISPs (Internet service providers) use NAT firewalls to enhance security and deal with IP address shortage
  • Dynamic IP addresses – users often get the dynamic IP address from the ISP
  • HTTP proxies and gateways – these can hide the original IP address. Some proxies use Client-ip or X-Forwarded-For to preserve the original IP address

Long (fat) URLs

It is not that uncommon to see websites utilize URLs to improve the user experience. They add more information as the user browses the website until URLs look complicated and illegible.

You can see what the long URL looks like by browsing Amazon store.

https://www.amazon.com/gp/product/1942788002/ref=s9u_psimh_gw_i2?ie=UTF8&fpl=fresh&pd_rd_i=1942788002&pd_rd_r=70BRSEN2K19345MWASF0&pd_rd_w=KpLza&pd_rd_wg=gTIeL&pf_rd_m=ATVPDKIKX0DER&pf_rd_s=&pf_rd_r=RWRKQXA6PBHQG52JTRW2&pf_rd_t=36701&pf_rd_p=1cf9d009-399c-49e1-901a-7b8786e59436&pf_rd_i=desktop

There are several problems when using this approach.

  • It’s ugly.
  • Not shareable.
  • Breaks caching.
  • It’s limited to that session.
  • Increases the load on the server.

Cookies

The best client identification method up to date excluding authentication. Developed by Netscape, but now every browser supports them.

There are two types of cookies: session cookies and persistent cookies. A session cookie is deleted upon leaving the browser, and persistent cookies are saved on disk and can last longer. For the session cookie to be treated as the persistent cookie, Max-Age or Expiry property needs to be set.

Modern browsers like Chrome and Firefox can keep background processes working when you shut them down so you can resume where you left off. This can result in the preservation of the session cookies, so be careful.

So how do the cookies work?

Cookies contain a list of name=value pairs that the server sets using Set-Cookie or Set-Cookie2 response header. Usually, the information stored in a cookie is some kind of client id, but some websites store other information as well.

The browser stores this information in its cookie database and returns it when the user visits the page/website again. The browser can handle thousands of different cookies and it knows when to serve each one.

Here is an example flow.

1. User Agent -> Server

POST /acme/login HTTP/1.1
[form data]

The user agent identifies itself via the form input.

2. Server -> User Agent

HTTP/1.1 200 OK
Set-Cookie2: Customer="WILE_E_COYOTE"; Version="1"; Path="/acme"

The server sends the Set-Cookie2 response header to instruct the user agent (browser) to set the information about the user in a cookie.

3. User Agent -> Server

POST /acme/pickitem HTTP/1.1
Cookie: $Version="1"; Customer="WILE_E_COYOTE"; $Path="/acme"
[form data]

The user selects the item to put the shop basket.

4. Server -> User Agent

HTTP/1.1 200 OK
Set-Cookie2: Part_Number="Rocket_Launcher_0001"; Version="1"; Path="/acme"

Shopping basket contains an item.

5. User Agent -> Server

POST /acme/shipping HTTP/1.1
Cookie: $Version="1"; Customer="WILE_E_COYOTE"; $Path="/acme"; 
        Part_Number="Rocket_Launcher_0001";
[form data]

The user selects the shipping method.

6. Server -> User Agent

HTTP/1.1 200 OK
Set-Cookie2: Shipping="FedEx"; Version="1"; Path="/acme"

New cookie reflects the shipping method.

7. User Agent -> Server

POST /acme/process HTTP/1.1
Cookie: $Version="1";
        Customer="WILE_E_COYOTE"; $Path="/acme";
        Part_Number="Rocket_Launcher_0001"; $Path="/acme";
        Shipping="FedEx"; $Path="/acme"
[form data]

That’s it.

There is one more thing I want you to be aware of. Cookies are not perfect either. Besides security concerns, there is also a problem with cookies colliding with the REST architectural style (the section about misusing cookies).

You can learn more about cookies in the RFC 2965.

Conclusion

This wraps it up for this part of the HTTP series.

You have learned about the strengths of content personalization as well as its potential pitfalls. You are also aware of the different ways that servers can use to identify you. In part 4 of the series, we will talk about the most important type of client identification: authentication.

Thank you for reading and feel free to leave the comment below.

References

  • The HTTP reference: https://code-maze.com/the-http-reference
  • The HTTP series part 1: https://code-maze.com/http-protocol-overview-part1
  • The HTTP series part 2: https://code-maze.com/http-series-part-2
  • HTTP: The Definitive Guide: http://shop.oreilly.com/product/9781565925090.do
  • Confirmation bias explained: https://en.wikipedia.org/wiki/Confirmation_bias
  • REST anti-patterns: https://www.infoq.com/articles/rest-anti-patterns
  • Cookies RFC: https://www.ietf.org/rfc/rfc2965.txt

Published at DZone with permission of Vladimir Pecanac. See the original article here.

Opinions expressed by DZone contributors are their own.

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!