Ever feel like your every move is being watched when you're online? Well, you're not just being paranoid. Nearly 500 of the top 50,000 Web sites on Alexa use so-called "session replay" scripts from third-party companies to record practically everything users do while visiting their sites, according to new research from a team at Princeton University.
Among the activities being tracked and recorded are keystrokes, mouse movements, scrolling behavior, and content viewed from all the pages users visit. The researchers said it's as if "someone is looking over your shoulder" during the entire time you visit each of those sites. They found a total of 482 sites with signs indicating that data about user activity was being recorded and sent to third parties.
That level of tracking is enabled by scripts provided by third-party session replay companies that include Russia's Yandex, U.S.-based FullStory, Malta's Hotjar, U.K.-based UserReplay and SessionCam, the Czech Republic's Smartlook, and Israel-headquartered Clicktale. Among the Web sites using such scripts: WordPress, Microsoft, Adobe, Outbrain, Spotify, RT.com, Rotten Tomatoes, Sears, Costco, Ancestry.com, The Gap, CBS.com, GoFundMe, Codecademy, Fitbit, Kaspersky.com, and the U.S. Embassy.
Users See No Indication of Recording
Session replay companies tout their scripts as a way for Web sites to improve usability, identify new business and marketing opportunities, and better understand their audiences by viewing visits "through your customer's eyes," in Smartlook's words. However, some companies let the sites that use their scripts view not just anonymized data about site visits, but identifying information about individual visitors.
"[T]he extent of data collected by these services far exceeds user expectations; text typed into forms is collected before the user submits the form, and precise mouse movements are saved, all without any visual indication to the user," Princeton researchers Steven Englehardt, Gunes Acar, and Arvind Narayanan wrote last week in a blog post summarizing their research findings. "This data can't reasonably be expected to be kept anonymous. In fact, some companies allow publishers to explicitly link recordings to a user's real identity."
Such identifying capabilities could prove problematic in regions where regulations and privacy laws limit what kinds of information companies can collect about their customers. For example, the European Union's General Data Protection Regulation, which goes into effect next year, requires businesses to inform customers about the data they collect and, in many cases, seek permission first.
The Princeton researchers found that session replay companies vary widely in how they mask personally identifying text that's typed into forms on their users' Web sites. UserReplay and SessionCam, for instance, automatically replace most user input with an equivalent length of masking text, while Hotjar masks or excludes only passwords and credit card numbers, and Yandex excludes only passwords. The researchers said Web sites using session replay scripts can also specify additional input they don't want recorded, but added that such mitigations can be "complicated, error prone and costly."
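To make the idea of "an equivalent length of masking text" concrete, here is a minimal sketch in Python. It is not any vendor's actual code: the function names, the asterisk mask character, and the allow-list parameter are all assumptions chosen for illustration. It masks every form field by default and leaves readable only the fields a site has explicitly marked as safe.

```python
from __future__ import annotations


def mask_input(text: str, mask_char: str = "*") -> str:
    """Replace every character with mask_char, keeping the original length.

    Hypothetical helper: the mask character is an assumption, not taken
    from any session replay product.
    """
    return mask_char * len(text)


def sanitize_fields(
    fields: dict[str, str],
    allow: frozenset[str] = frozenset(),
) -> dict[str, str]:
    """Mask all form fields except those named in the allow-list.

    Masking by default means a field nobody thought about is hidden
    rather than recorded in the clear.
    """
    return {
        name: value if name in allow else mask_input(value)
        for name, value in fields.items()
    }


if __name__ == "__main__":
    form = {"email": "a@b.co", "search": "shoes"}
    # Only the search box is allowed through unmasked.
    print(sanitize_fields(form, allow=frozenset({"search"})))
```

A mask-by-default design like this corresponds to the stricter approach the researchers describe; a script that excludes only passwords would amount to the opposite, an exclude-list, in which every unlisted field is recorded as typed.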
Exposed Data Increases Online Risks
The researchers said in their blog post that session replay scripts leave many opportunities for things to go wrong and put people's sensitive data at risk.
"Collection of page content by third-party replay scripts may cause sensitive information such as medical conditions, credit card details and other personal information displayed on a page to leak to the third-party as part of the recording," they said. "This may expose users to identity theft, online scams, and other unwanted behavior. The same is true for the collection of user inputs during checkout and registration processes."
While the researchers noted last week that ad-blocking lists EasyList and EasyPrivacy blocked only some of the session replay scripts, the lists' maintainers have since said they now block all of the providers identified in the study. The research team said at least one session replay company, UserReplay, allows sites using its script to disable data collection for visitors who have activated the "Do Not Track" setting in their browsers, but added that none of the top sites using that script has done so.
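Browsers with the "Do Not Track" setting switched on send a "DNT: 1" header with each request. As a rough sketch of how a site might honor it, the Python function below decides whether recording should be enabled for a given request. This is not UserReplay's mechanism, which the research does not detail; the function name and the plain dictionary of headers are assumptions made for illustration.

```python
from __future__ import annotations


def recording_allowed(headers: dict[str, str]) -> bool:
    """Return False when the request carries a Do Not Track opt-out.

    Hypothetical helper: header names are matched case-insensitively,
    and only the value "1" is treated as a request not to be tracked.
    """
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"


if __name__ == "__main__":
    print(recording_allowed({"DNT": "1"}))   # visitor opted out
    print(recording_allowed({"Accept": "text/html"}))  # no preference sent
```

A site could call a check like this before deciding whether to include the replay script in the page it serves, so that no recording starts for visitors who have opted out.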
"Improving user experience is a critical task for publishers," according to the researchers. "However, it shouldn't come at the expense of user privacy."