EDITE: Enrico BOCCHI
Identity
Enrico BOCCHI
Academic status
Thesis defended on 2017-02-03
Subject: Measurement and analysis of transport services for cloud applications
Thesis supervision:
Laboratory:
Neighborhood
Blue ellipse: PhD student; yellow ellipse: PhD graduate; green rectangle: permanent staff; yellow rectangle: HDR holder. Green line: thesis advisor; blue line: thesis director; dotted line: mid-term evaluation committee or thesis committee.
Scientific publications
oai:hal.archives-ouvertes.fr:hal-01253606
Impact of Carrier-Grade NAT on Web Browsing
International audience

Public IPv4 addresses are a scarce resource. While IPv6 adoption is lagging, Network Address Translation (NAT) technologies have been deployed over the last years to alleviate IPv4 exiguity and their high rental cost. In particular, Carrier-Grade NAT (CGN) is a well known solution to mask a whole ISP network behind a limited amount of public IP addresses, significantly reducing expenses.

Despite its economical benefits, CGN can introduce connectivity issues which have sprouted a considerable effort in research, development and standardization. However, to the best of our knowledge, little effort has been dedicated to investigate the impact that CGN deployment may have on users’ traffic. This paper fills the gap. We leverage passive measurements from an ISP network deploying CGN and, by means of the Jensen-Shannon divergence, we contrast several performance metrics considering customers being offered public or private addresses. In particular, we gauge the impact of CGN presence on users’ web browsing experience.

Our results testify that CGN is a mature and stable technology as, if properly deployed, it does not harm users’ web browsing experience. Indeed, while our analysis lets emerge expected stochastic differences of certain indexes (e.g., the difference in the path hop count), the measurements related to the quality of users’ browsing are otherwise unperturbed. Interestingly, we also observe that CGN protects customers from unsolicited, often malicious, traffic.


6th International Workshop on TRaffic Analysis and Characterization (TRAC), Aug 2015, Dubrovnik, Croatia. pp.532-537, 2015. https://hal-institut-mines-telecom.archives-ouvertes.fr/hal-01253606 2015-08
oai:hal.archives-ouvertes.fr:hal-01254241
Personal Cloud Storage: Usage, Performance and Impact of Terminals
International audience

Personal cloud storage services such as Dropbox and OneDrive are popular among Internet users. They help in sharing content and backing up data by relying on the cloud to store files. The rise of mobile terminals and the presence of new providers question whether the usage of cloud storage is evolving. This knowledge is essential to understand the workload these services need to handle, their performance, and implications. In this paper we present a comprehensive characterization of personal cloud storage services. Relying on traces collected for one month in an operational network, we show that users of each service present distinct behaviors. Dropbox is now threatened by competitors, with OneDrive and Google Drive reaching large market shares. However, the popularity of the latter services seems to be driven by their integration into Windows and Android. Indeed, around 50% of their users do not produce any workload. Considering performance, providers show distinct trade-offs, with bottlenecks that hardly allow users to fully exploit their access line bandwidth. Finally, usage of cloud services is now ordinary among mobile users, thanks to the automatic backup of pictures and media files.


4th IEEE International Conference on Cloud Networking (IEEE CloudNet 2015), Oct 2015, Niagara Falls, Canada. pp.106-111, 2015. https://hal-institut-mines-telecom.archives-ouvertes.fr/hal-01254241 2015-10
oai:hal.archives-ouvertes.fr:hal-01254716
Network Connectivity Graph for Malicious Traffic Dissection
International audience
Malware is a major threat to security and privacy of network users. A huge variety of malware typically spreads over the Internet, evolving every day, and challenging the research community and security practitioners to improve the effectiveness of countermeasures. In this paper, we present a system that automatically extracts patterns of network activity related to a specific malicious event, i.e., a seed. Our system is based on a methodology that correlates network events of hosts normally connected to the Internet over (i) time (i.e., analyzing different samples of traffic from the same host), (ii) space (i.e., correlating patterns across different hosts), and (iii) network layers (e.g., HTTP, DNS, etc.). The result is a Network Connectivity Graph that captures the overall "network behavior" of the seed. That is a focused and enriched representation of the malicious pattern infected hosts exhibit, purified from ordinary network activities and background traffic. We applied our approach on a large dataset collected in a real commercial ISP where the aggregated traffic produced by more than 20,000 households has been monitored. A commercial IDS has been used to complement network data with alerts related to malicious activities. We use such alerts to trigger our processing system. Results show that the richness of the Network Connectivity Graph provides a much more detailed picture of malicious activities, considerably enhancing our understanding.
2015 24th International Conference on Computer Communication and Networks (ICCCN), Aug 2015, Las Vegas, United States. pp.1-9, 2015, <10.1109/ICCCN.2015.7288435>. https://hal.archives-ouvertes.fr/hal-01254716 2015-08-06
oai:hal.archives-ouvertes.fr:hal-01346257
Benchmarking personal cloud storage
International audience

Personal cloud storage services are data-intensive applications already producing a significant share of Internet traffic. Several solutions offered by different companies attract more and more people. However, little is known about each service's capabilities, architecture and – most of all – performance implications of design choices. This paper presents a methodology to study cloud storage services. We apply our methodology to compare 5 popular offers, revealing different system architectures and capabilities. The implications on performance of different designs are assessed executing a series of benchmarks. Our results show no clear winner, with all services suffering from some limitations or having potential for improvement. In some scenarios, the upload of the same file set can take seven times longer, wasting twice as much capacity. Our methodology and results are thus useful as both a benchmark and a guideline for system design.


ACM Internet Measurement Conference (ACM IMC 2013), Oct 2013, Barcelona, Spain. pp.205-212, 2013, <10.1145/2504730.2504762>. https://hal-institut-mines-telecom.archives-ouvertes.fr/hal-01346257 2013-10
oai:hal.archives-ouvertes.fr:hal-01346270
Cloud Storage Service Benchmarking: Methodologies and Experimentations
International audience

Data storage is one of today's fundamental services, with companies, universities and research centers needing to store large amounts of data every day. Cloud storage services are emerging as a strong alternative to local storage, allowing customers to save the costs of buying and maintaining expensive hardware. Several solutions are available on the market, the most famous being Amazon S3. However, it is rather difficult to access information about each service's architecture, performance, and pricing. To shed light on storage services from the customer perspective, we propose a benchmarking methodology, apply it to four popular offers (Amazon S3, Amazon Glacier, Windows Azure Blob and Rackspace Cloud Files), and compare their performance. Each service is analysed as a black box and benchmarked through crafted workloads. We take the perspective of a customer located in Europe, looking for possible service providers and the optimal data center where to deploy its applications. Finally, we complement the analysis by comparing the actual and forecast costs faced when using each service. According to collected results, all services show weaknesses under some workloads, with no all-round winner, e.g., some offers providing excellent or poor performance when exchanging large or small files. For all services, it is of paramount importance to accurately select the data center in which to deploy the applications, with throughput that varies by factors from 2x to 10x. The methodology (and tools implementing it) presented here is instrumental for potential customers to identify the most suitable offer for their needs.


3rd IEEE International Conference on Cloud Networking (IEEE CloudNet 2014), Oct 2014, Luxembourg, Luxembourg. pp.395-400, 2014, <10.1109/CloudNet.2014.6969027>. https://hal-institut-mines-telecom.archives-ouvertes.fr/hal-01346270 2014-10
oai:hal.archives-ouvertes.fr:hal-01346360
Measuring the Quality of Experience of Web users
International audience

Measuring the quality of Web users' experience (WebQoE) faces the following trade-off. On the one hand, current practice is to resort to metrics, such as the document completion time (onLoad), that are simple to measure though knowingly inaccurate. On the other hand, there are metrics, like Google’s SpeedIndex, that are better correlated with the actual user experience, but are quite complex to evaluate and, as such, relegated to lab experiments. In this paper, we first provide a comprehensive state of the art on the metrics and tools available for WebQoE assessment. We then apply these metrics to a representative dataset (the Alexa top-100 webpages) to better illustrate their similarities, differences, advantages and limitations. We next introduce novel metrics, inspired by Google’s SpeedIndex, that (i) offer a significant advantage in terms of computational complexity, while (ii) maintaining a high correlation with the SpeedIndex. These properties make our proposed metrics highly relevant and of practical use.


2016 Workshop on QoE-based Analysis and Management of Data Communication Networks (SIGCOMM Workshops 2016), Aug 2016, Florianopolis, Brazil. pp.37-42, 2016, <10.1145/2940136.2940138>. https://hal-institut-mines-telecom.archives-ouvertes.fr/hal-01346360 2016-08
oai:hal.archives-ouvertes.fr:hal-01346615
Statistical Network Monitoring: Methodology and Application to Carrier-Grade NAT
International audience

When passively collecting and then processing network traffic traces, the need to analyze raw data at several Gbps and to extract higher level indexes from the stream of packets poses typical BigData-like challenges. In this paper, we engineer a methodology to extract, collect and process passive traffic traces. In particular, we design and implement analytics that, based on a filtering process and on the building of empirical distributions, enable the comparison between two generic collections, e.g., data gathered from two different vantage points, from different populations, or at different times. The ultimate goal is to highlight statistically significant differences that could be useful to flag incidents to the network manager.

After introducing the methodology, we apply it to assess the impact of Carrier-Grade NAT (CGN), a technology that Internet Service Providers (ISPs) deploy to limit the usage of expensive public IP addresses. Since CGN may introduce connectivity issues and performance degradation, we process a large dataset of passive measurements collected from an ISP using CGN for part of its customers. We first extract detailed per-flow information by processing packets from live links. Then, we derive higher level statistics that are significant for the end-users, e.g., TCP connection setup time, HTTP response time, or BitTorrent average download throughput. Finally, we contrast figures of customers being offered public or private addresses, and look for statistically significant differences. Results show that CGN does not impair quality of service in the analyzed ISP deployment. In addition, we use the collected data to derive useful figures for the proper dimensioning of the CGN and the configuration of its parameters in order to avoid impairments on end-users’ experience.


Computer Networks "Special issue on Machine learning, data mining and Big Data frameworks for network monitoring and troubleshooting", 2016, <10.1016/j.comnet.2016.06.018>. https://hal-institut-mines-telecom.archives-ouvertes.fr/hal-01346615 2016-06
oai:hal.archives-ouvertes.fr:hal-01346612
Personal Cloud Storage Benchmarks and Comparison
International audience

The large amount of space offered by personal cloud storage services (e.g., Dropbox and OneDrive), together with the possibility of synchronizing devices seamlessly, keeps attracting customers to the cloud. Despite the high public interest, little information about system design and actual implications on performance is available when selecting a cloud storage service. Systematic benchmarks to assist in comparing services and understanding the effects of design choices are still lacking. This paper proposes a methodology to understand and benchmark personal cloud storage services. Our methodology unveils their architecture and capabilities. Moreover, by means of repeatable and customizable tests, it allows the measurement of performance metrics under different workloads. The effectiveness of the methodology is shown in a case study in which 11 services are compared under the same conditions. Our case study reveals interesting differences in design choices. Their implications are assessed in a series of benchmarks. Results show no clear winner, with all services having potential for improving performance. In some scenarios, the synchronization of the same files can take 20 times longer. In other cases, we observe a wastage of twice as much network capacity, questioning the design of some services. Our methodology and results are thus useful both as benchmarks and as guidelines for system design.


IEEE Transactions on Cloud Computing (IEEE TCC), 2015, <10.1109/TCC.2015.2427191>. https://hal-institut-mines-telecom.archives-ouvertes.fr/hal-01346612 2015-04
oai:hal.archives-ouvertes.fr:hal-01351253
Macroscopic View of Malware in Home Networks
International audience

Malicious activities on the Web are increasingly threatening users on the Internet. Home networks are one of the prime targets of attackers for hosting malware, and are commonly exploited as a stepping stone to launch a variety of further attacks. Due to diversification, existing security solutions often fail to detect malicious activities, which remain hidden and pose threats to users' security and privacy. Characterizing behavioral patterns of known malware can help to improve the classification accuracy of known threats. More importantly, since different malware can share some commonalities, studying the behavior of known malware can enable the detection of previously unknown malicious activities. We pose the research question of whether it is possible to characterize such behavioral patterns by analyzing the traffic from known infected clients. In this paper, we present our quest to discover such characterizations. Results show that commonalities arise but their identification may require some ingenuity. Also, more malicious activities can be uncovered by this analysis.


12th Annual IEEE Consumer Communications & Networking Conference (IEEE CCNC'15), Jan 2015, Las Vegas, United States. pp.262-266, 2015, <10.1109/CCNC.2015.7157987>. https://hal-institut-mines-telecom.archives-ouvertes.fr/hal-01351253 2015-01
oai:hal.archives-ouvertes.fr:hal-01351259
MAGMA: Network Behavior Classifier for Malware Traffic
International audience

Malware is a major threat to security and privacy of network users. A large variety of malware is typically spread over the Internet, hiding in benign traffic. New types of malware appear every day, challenging both the research community and security companies to improve malware identification techniques. In this paper we present MAGMA, MultilAyer Graphs for MAlware detection, a novel malware behavioral classifier. Our system is based on a Big Data methodology, driven by real-world data obtained from traffic traces collected in an operational network. The methodology we propose automatically extracts patterns related to a specific input event, i.e., a seed, from the enormous amount of events the network carries. By correlating such activities over (i) time, (ii) space, and (iii) network protocols, we build a Network Connectivity Graph that captures the overall “network behavior” of the seed. We next extract features from the Connectivity Graph and design a supervised classifier. We run MAGMA on a large dataset collected from a commercial Internet Provider where 20,000 Internet users generated more than 330 million events. Only 42,000 are flagged as malicious by a commercial IDS, which we consider as an oracle. Using this dataset, we experimentally evaluate MAGMA's accuracy and robustness to parameter settings. Results indicate that MAGMA reaches 95% accuracy, with limited false positives. Furthermore, MAGMA proves able to identify suspicious network events that the IDS ignored.


Computer Networks "Special issue on Traffic and Performance in the Big Data Era", 2016, <10.1016/j.comnet.2016.03.021>. https://hal-institut-mines-telecom.archives-ouvertes.fr/hal-01351259 2016-04
Defense
Thesis: "Network Traffic Measurements - Applications to Internet Services and Security"
Defense: 2017-02-03