:::info Authors:
(1) Diwen Xue, University of Michigan;
(2) Reethika Ramesh, University of Michigan;
(3) Arham Jain, University of Michigan;
(4) Michalis Kallitsis, Merit Network, Inc.;
(5) J. Alex Halderman, University of Michigan;
(6) Jedidiah R. Crandall, Arizona State University/Breakpointing Bad;
(7) Roya Ensafi, University of Michigan.
:::
7 Fine-tuning for Deployment

So far, we have described features that make OpenVPN vulnerable to fingerprinting. To implement the system, we still need to quantify detection thresholds (e.g., the ACK fingerprint). In addition, factors such as packet loss and the choice of observation window can affect system performance. We fine-tune our system by quantifying these parameters.
We use two datasets here.

ISP Dataset: We collected a snapshot of network traffic passing through a server installed within Merit. Over 45 minutes on July 28, 2021, we sampled 1/30 of all flows passing through the server, yielding 461 GB of traffic that corresponds to 221,534 flows with full packet payloads. Refer to § 5 for details on how this data was handled to limit privacy risks.

VPN Dataset: We collected traces from 20 commercial VPN providers as well as 2 self-hosted OpenVPN services (Streisand, OpenVPN Access Server), following the automated process described in Section 8. Note that these 20 VPN providers do not overlap with the providers used in the evaluation. For each provider, we repeated the trace collection process 50 times in TCP mode and 50 times in UDP mode, resulting in a 7.65 GB dataset of 2,200 vanilla OpenVPN traces.
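The text does not say how the 1/30 flow sample was drawn. One common approach, shown here purely as an assumption and not as the authors' capture pipeline, is to hash each flow's 5-tuple and keep the flows that land in one of 30 buckets. Because the decision is made per flow rather than per packet, every packet of a selected flow is captured, which is what "flows with full packet payloads" requires. The names `flow_key`, `keep_flow`, and `SAMPLE_DENOMINATOR` are hypothetical.

```python
import hashlib

SAMPLE_DENOMINATOR = 30  # keep roughly 1 in 30 flows (assumed hash-based sampling)


def flow_key(src_ip: str, src_port: int, dst_ip: str, dst_port: int, proto: str) -> bytes:
    """Direction-independent key: both directions of a connection map to the same bytes."""
    low, high = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return f"{proto}|{low[0]}:{low[1]}|{high[0]}:{high[1]}".encode()


def keep_flow(src_ip: str, src_port: int, dst_ip: str, dst_port: int, proto: str) -> bool:
    """Deterministically decide whether a flow falls in the 1/SAMPLE_DENOMINATOR sample.

    Hashing the flow key (instead of sampling individual packets at random)
    keeps every packet of a selected flow in the capture.
    """
    digest = hashlib.sha256(flow_key(src_ip, src_port, dst_ip, dst_port, proto)).digest()
    return int.from_bytes(digest[:8], "big") % SAMPLE_DENOMINATOR == 0


if __name__ == "__main__":
    # 30,000 synthetic flows that differ only in client port.
    kept = sum(keep_flow("10.0.0.1", port, "192.0.2.7", 443, "tcp") for port in range(1024, 31024))
    print(f"kept {kept} of 30000 flows")
```

A cryptographic hash is used only because Python's built-in `hash()` is salted per process; any stable, well-mixed hash would serve the same purpose.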
7.1 ACK Fingerprint Thresholds

We quantify the exact ACK fingerprint based on the ISP and VPN Datasets. We include only flows with at least 150 data packets (15 bins), which leaves 24,069 ISP flows and 2,200 VPN flows. We construct a classification decision tree from the two labeled sets, applying class weights to account for their imbalanced sizes. Figure 7 shows the constructed tree (limited in depth and number of leaves; the complete tree can be found in Appendix Figure 12). The ACK fingerprint is a sequence of thresholds derived from this decision tree, as shown in Table 2, where Bin[i] refers to the number of ACK-size packets in the i-th bin.
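The feature extraction, class weighting, and threshold check described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each flow is assumed to be given as the payload sizes of its data packets in arrival order; bins are assumed to be consecutive groups of 10 packets starting at the first data packet (inferred from 150 packets corresponding to 15 bins); and the flow's ACK packet size (`ack_size`) is assumed to be already known. The "balanced" weighting (each class weighted inversely to its size) is also an assumption, since the text does not specify the weights. The function names and the rule list `EXAMPLE_RULES` are hypothetical; the rules stand in for the real thresholds in Table 2.

```python
from typing import Dict, List, Optional, Sequence, Tuple

PACKETS_PER_BIN = 10    # inferred: 150 data packets correspond to 15 bins
MIN_DATA_PACKETS = 150  # shorter flows are excluded from the analysis


def ack_bins(payload_sizes: Sequence[int], ack_size: int) -> Optional[List[int]]:
    """Return Bin[0..14]: the count of ACK-size packets in each 10-packet bin.

    Returns None for flows with fewer than MIN_DATA_PACKETS data packets.
    """
    if len(payload_sizes) < MIN_DATA_PACKETS:
        return None
    window = payload_sizes[:MIN_DATA_PACKETS]
    return [
        sum(1 for size in window[start:start + PACKETS_PER_BIN] if size == ack_size)
        for start in range(0, MIN_DATA_PACKETS, PACKETS_PER_BIN)
    ]


def balanced_weights(n_isp: int, n_vpn: int) -> Dict[str, float]:
    """Weight each class by total / (2 * class_size), so both classes carry equal total weight."""
    total = n_isp + n_vpn
    return {"isp": total / (2 * n_isp), "vpn": total / (2 * n_vpn)}


# (bin index i, lower bound, upper bound): Bin[i] must lie in [lower, upper].
Rule = Tuple[int, int, int]

# HYPOTHETICAL thresholds for illustration only; the real values are in Table 2.
EXAMPLE_RULES: List[Rule] = [(0, 1, 3), (1, 0, 2), (5, 0, 1)]


def matches_fingerprint(bins: Sequence[int], rules: Sequence[Rule]) -> bool:
    """True if every per-bin range check holds, i.e. the flow follows one tree path to a VPN leaf."""
    return all(lower <= bins[i] <= upper for i, lower, upper in rules)


if __name__ == "__main__":
    print(balanced_weights(24_069, 2_200))       # flow counts from the text
    flow = [54, 54] + [1200] * 148               # hypothetical sizes; 54 plays the ACK size
    bins = ack_bins(flow, ack_size=54)
    print(bins, matches_fingerprint(bins, EXAMPLE_RULES))
```

With the flow counts above, the balanced weights come out to about 0.55 per ISP flow and 5.97 per VPN flow, so each class contributes the same total weight (13,134.5) when the tree chooses its splits. Modeling the fingerprint as a conjunction of per-bin range checks mirrors a single root-to-leaf path; a fingerprint that combines several leaves would instead be a disjunction of such rule lists.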
:::info This paper is available on arXiv under CC BY 4.0 DEED license.
:::
All Rights Reserved. Copyright , Central Coast Communications, Inc.