An attacker can identify which streaming video a targeted user is watching on services such as Netflix and Youtube. The attack can be done remotely over the Internet, and requires only that the user (or even someone else on the same local network) visits a suitably-crafted website from his web browser.
- Roei Schuster, Vitaly Shmatikov[PDF] Latest version (9 April 2017):
OverviewBeauty is in the eye of the beholder, and so, to garner what we behold is to glean what we cherish. Today, much of what we behold is video content streamed from the Internet, and our choice of movies, TV shows, news clips and social media videos reveals much about our personality, preferences, socioeconomics, and mood. A bevy of parties seek to exploit this information: whether for monetization by advertisers, insurers, and price-differentiating vendors, or to castigate those who access undesired information.
Encryption of Internet traffic places a hurdle to such monitoring, but it is known that traffic analysis, inspecting merely the size and timing of network traffic, often allows deduction about the traffic's content. How effective is traffic analysis for encrypted streamed videos, and how feasible is it for potential adversaries?
We demonstrate highly effective techniques for how a an attacker can deduce what of videos are watched by a targeted user on streaming services such as Netflix and Youtube, from direct and indirect measurement of encrypted network traffic. We consider three classes of attack scenarios:
On-path attacks, the easiest scenario, involve an adversary who can passively monitor the user's traffic. This includes Internet Service Providers, malicious Wi-Fi access points, proxies, VPNs, routers, and taps on network connections. These have full and accurate visibility of individual packets, and using our techniques, they can identify which encrypted videos are watched by users of the network links they monitor.
To identify videos based on the recorded traffic patterns, we employ "deep learning" techniques using artificial neural networks, trained them on movies from several leading streaming services. The training of the neural-network video classifier needs to be done on videos streamed to a streaming device (or software) similar to the target user, and connected to a similar streaming server. When trained to recognize dozens of titles, our YouTube detector has 0 false positives with 0.988 recall, while the Netflix detector has a false positive rate of 0.0005 with 0.93 recall.
Q1: What services are vulnerable?
We tested our detectors on four services: Netflix, YouTube, Amazon Video,
and Vimeo. We found all of them to be vulnerable to our
Generally, any streaming service using the popular MPEG-DASH standard for streaming over HTTP(S), and specifically the MPEG-DASH content segmentation, is likely to cause an exploitable information leak. The inherent reason is discussed in Q6 below. All four services we tested are therefore vulnerable: YouTube directly follows the standard, while Amazon Video, Netflix, and Vimeo use very close variants which adopt its leaky use of content segmentation.
Q2: How could it be exploited?
Malicious Wi-Fi access points, proxies, routers, enterprise or national network moderators, and ISPs are all able to gain insight on user viewing habits by analyzing encrypted application layer traffic frames.
Censhorship gateways can use this information to block monitored content even under encryption.
All providers of web content accessed by the user, including advertisers, analytics providers and social networks, can also compromise the user's viewing privacy, by leveraging our side-channel attack.
Q3: How is VBR related to video content?
Compression using variable-bitrate (VBR) encoding is designed to use the minimum amount of data needed to represent a scene at a given perceptual quality. This highly dependent on the video content. For example, highly eventful action-scenes, such as the one in the video Iguana vs. Snakes, require a high bitrate to represent. This is easily seen in the diagram below, depicting the bit rate fluctuations as the video progresses through different scenes.
Q4: What are bursts in traffic pattern, and how are they related to video content?
In MPEG-DASH streaming, videos on the server are divided into segments that the video-player client then fetches. This results in an on/off, bursty traffic pattern during video streaming.
To demonstrate how this relates to actual content, we used the aforementioned Iguana video: a high-bitrate portion of the video interleaved with a low-bitrate one. The left-hand figure shows packet sizes along the time axis; observe the pattern of buffering followed by the on/off steady state. The right-hand figure shows the size of traffic bursts; the first, largest burst is the buffer.
Q5: How does the video identification work?
The root cause of information leaks in video streams is that the amount of information needed to represent a video segment, at a given perceived quality, depends on the content of the segment. For example, a nearly-still nature scene, or a talk show where most of the picture is static, can be compressed to a much smaller size than a fast-paced action scene. Streaming services use Variable Bitrate (VBR) compression schemes that take advantage of this, to reduce the amount of transmitted data. Consequentially, their traffic pattern is data-dependent.
One can eliminate VBR encoding, or slow down the adaptive response to change in video compressibility, but this reduces the compression ratio (hence, raises costs and may cause network congestion and buffering delays in video playback).
The VBR pattern is inherently observable in traffic if the duration of the client's buffered video is close to constant (or, more generally, an affine function of presentation time). Thus, one can try to foil analysis by erratically changing the buffer size, though this too reduces network efficiency and increases chance of buffering delay in video playback.
Q7: Can the attack be detected?
The on-path attack is completely passive; it cannot be detectable, neither by the user nor by network-based monitors.
The off-path attacks, in our experiments, were imperceptible to the viewer being targeted: the streaming playback proceeded without interruption despite the extra traffic induced by the attack. This is because the upstream routers try to fairly allocate the bandwidth among the multiple data streams: in our case, the video and attacker's. Since video streaming typically requires less than half of the available bandwidth, it can still be streamed at the requisite average pace. Moreover, whenever a video segment is transmitted, the routers typically give it a higher dynamic priority (i.e., a "fast pass" to the top of the queue), compared than the attacker's stream which has already used up more than its fair share. The extra traffic induced by the attacker will affect the user's total bandwidth usage, similarly to downloading a large file.