F5 Labs
What Are Scrapers and Why Should You Care?
Introduction

Scrapers are automated tools designed to extract data and content from websites and APIs for various purposes, and they pose a significant threat to organizations of all sizes. Scraping can lead to intellectual property theft, erosion of competitive advantage, degraded website and API performance, and legal liability. Scraping is one of the automated threats catalogued by OWASP (OAT-011: Scraping), defined as using automation to collect application content and/or other data for use elsewhere. It affects businesses across many industries, and its legal status varies by geographic and legal jurisdiction.

What is Scraping?

Scraping involves requesting web pages, loading them, and parsing the HTML to extract the desired data and content. Heavily scraped items include flights, hotel rooms, retail product prices, insurance rates, credit and mortgage interest rates, contact lists, store locations, and user profiles.

Rather than pulling a dataset in one request, scrapers use automation to make many smaller requests and assemble the data piece by piece, often issuing tens of thousands or even millions of individual requests. In the 2024 Bad Bots Review by F5 Labs, scraping bots were responsible for high levels of automation on two of the three most targeted flows, Search and Quotes, throughout 2023 across the entire F5 Bot Defense network (see Figure 1). In addition, for enterprises without advanced bot defense solutions, up to 70% of all search traffic originates from scrapers; this percentage is based on the numerous proof-of-concept analyses done for enterprises with no advanced bot controls in place.
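To make the mechanics concrete, here is a minimal sketch of that request-and-parse loop in Python, using the requests and BeautifulSoup libraries. The target URL, pagination parameter, and CSS selectors are hypothetical placeholders, not taken from any real site.

```python
# A minimal sketch of the request-and-parse loop scrapers use.
# The URL, query parameter, and CSS selectors are hypothetical
# placeholders; real scrapers issue thousands of such requests.
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://example.com/products"  # hypothetical target

def scrape_prices(max_pages: int = 3) -> list[dict]:
    """Fetch paginated listing pages and parse out name/price pairs."""
    results = []
    for page in range(1, max_pages + 1):
        resp = requests.get(BASE_URL, params={"page": page}, timeout=10)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        # Each product is assumed to be a <div class="product"> with
        # child elements holding the name and price.
        for item in soup.select("div.product"):
            name = item.select_one(".name")
            price = item.select_one(".price")
            if name and price:
                results.append({
                    "name": name.get_text(strip=True),
                    "price": price.get_text(strip=True),
                })
    return results

if __name__ == "__main__":
    for row in scrape_prices():
        print(row)
```

At scale, this same loop, typically distributed across many source IP addresses, produces the request volumes described above.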
Scraper versus Crawler or Spider

Scrapers differ from crawlers and spiders, which search engines use to discover and index websites. Scrapers, by contrast, are designed to extract and exfiltrate data and content from a website or API so that it can be reused, resold, or otherwise repurposed as the scraper intends. Scraping typically violates the terms and conditions of most websites and APIs, though its legal standing remains unsettled, with some court cases overturning previous rulings. Most scrapers target information on the web, but activity against APIs is on the rise.

Business Models for Scraping

Many different parties are active in the scraping business, with different business models and incentives for scraping content and data. Figure 2 provides an overview of the various sources of scraping activity: search engine companies, competitors, researchers and investment firms, intellectual property owners, data aggregators, news aggregators, AI companies, and criminal organizations.

Search engine companies, such as Google, Bing, Facebook, Amazon, and Baidu, index content from websites to help users find things on the internet. Their business model is selling ads placed alongside search results.

Competitors scrape content and data from each other to win customers, market share, and revenue. Pricing scraping collects the pricing and availability of competitor products to win increased market share. Network scraping collects the names, addresses, and contact details of a company's network partners, such as repair shops, doctors, hospitals, clinics, insurance agents, and brokers. Inventory scraping steals valuable content and data from a competing site for use on the scraper's own site.

Researchers and investment firms use scraping to gather data for their research and generate revenue by publishing and selling the results of their market research. Intellectual property owners use scraping to identify possible trademark or copyright infringements and to ensure compliance with pricing and discounting guidelines. Data aggregators collect and aggregate data from various sources and sell it to interested parties; some specialize in specific industries, while news aggregators use scrapers to pull news feeds, blogs, articles, and press releases from various websites and APIs. Artificial intelligence (AI) companies scrape data across many industries, often without identifying themselves, and as the AI space continues to grow, scraping traffic is expected to increase.

Criminal organizations scrape websites and applications for various malicious purposes, including phishing, vulnerability scanning, identity theft, and intermediation. Criminals use scrapers to create replicas of a victim's website or app that require users to provide personally identifiable information (PII). They also use scrapers to test for vulnerabilities in the website or application, such as flaws that allow them to access discounted rates or back-end systems.

Costs of Scraping

The direct costs of scraping include infrastructure costs, degraded server performance and outages, loss of revenue and market share, and intermediation by third parties. Companies prefer direct relationships with customers for selling and marketing, customer retention, cross-selling and upselling, and customer experience; when intermediaries step in, companies can lose control over the end-to-end customer experience, leading to dissatisfied customers. The indirect costs include the loss of investment in intellectual property, reputational damage, legal liability, and exposure to questionable practices.

Conclusion

Scraping is a significant issue that affects enterprises worldwide across many industries. F5 Labs research shows that almost 1 in 5 search and quote transactions are generated by scrapers. Scraping is carried out by a wide range of entities, including search engines, competitors, AI companies, and malicious third parties, and its costs show up as lost revenue, profits, market share, and customer satisfaction. For a deeper dive into the impact of scraping on enterprises and effective mitigation strategies, read the full article on F5 Labs.

This Month In Security for October, 2022
This Month In Security is a partnership between the F5 Security Incident Response Team's AaronJB (Aaron Brailsford), F5 Labs' David Warburton and Tafara Muwandi, and F5 DevCentral's AubreyKingF5. This month's news includes supply chain security, guidance from CISA, and a worrisome UEFI bootkit.

F5 Labs Publishes October Update to Sensor Intel Series
F5 Labs just launched the October installment in our growing Sensor Intel Series. The sensors in question come from our data partner Efflux and allow us to get a sense of what kinds of vulnerabilities attackers are targeting from month to month. In September, the top-targeted vulnerability was CVE-2018-13379, a credential disclosure vulnerability in various versions of two Fortinet SSL VPNs. While nobody likes to see security tools with vulnerabilities, it is a change from the PHP remote code execution and IoT vulnerabilities that have made up the bulk of the targeting traffic over the last several months. We've also debuted a new visualization type for all 41 tracked vulnerabilities, making it a little easier to identify vulnerabilities with dramatic changes in targeting volume. At various times in the last nine months, CVE-2017-18368, CVE-2022-22947, and the vulnerabilities CVE-2021-22986 and CVE-2022-1388 (which are indistinguishable without examining headers in the HTTP request) have all shown growth rates at or near three orders of magnitude over a period of six to eight weeks, making them the fastest-growing vulnerabilities since we started this project. Stay tuned for the publication of the October SIS in early November. We are always looking for new CVEs to add and new ways to visualize the attack data.
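As a quick aside on the arithmetic behind that growth claim (our own illustration, not a figure from the report): three orders of magnitude is a factor of roughly 1,000, which over six to eight weeks corresponds to a sustained weekly growth factor of about 2.4x to 3.2x.

```python
# Back-of-the-envelope: what weekly growth factor corresponds to a
# 1000x (three orders of magnitude) rise over six to eight weeks?
def weekly_growth_factor(total_growth: float, weeks: int) -> float:
    return total_growth ** (1 / weeks)

for weeks in (6, 7, 8):
    factor = weekly_growth_factor(1000, weeks)
    print(f"{weeks} weeks: ~{factor:.2f}x per week")
# 6 weeks: ~3.16x per week
# 7 weeks: ~2.68x per week
# 8 weeks: ~2.37x per week
```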
Supplement To The 2021 App Protect Report

We frequently get requests to break down threats in a specific vertical. So, as a follow-up to the F5 Labs 2021 Application Protection Report (APR), we analyzed and visualized the attack chains of more than 700 data breaches, looking for relationships between sectors or industries and the tactics and techniques attackers use against them. This effort produced the F5 Labs 2021 APR Supplement: Of Sectors and Vectors, where we found that while there are some attack patterns that correspond with sectors, the relationships appear indirect and partial, and counterexamples abound. The overall conclusion is that sectors can be useful for predicting an attack vector, but only in the absence of more precise information such as vulnerabilities or published exploits. This is because the types of data and vulnerabilities in the target environment, which determine an attacker's approach, are no longer tightly correlated with the nature of the business. Look for more details about your sector (Finance, Education, Health Care, Scientific, Retail, etc.) in the F5 Labs 2021 APR Supplement: Of Sectors and Vectors.

What is Quantum Computing?
Quantum computing represents a significant shift in information processing. It leverages the principles of quantum mechanics to solve problems far beyond the capabilities of classical computers. Unlike classical computers, which use bits to represent either 0 or 1, quantum computers use qubits, which can exist in multiple states simultaneously through superposition. Additional quantum properties like entanglement and quantum interference further enhance computational efficiency, making quantum systems uniquely equipped to tackle complex, otherwise intractable problems.

This breakthrough has profound implications for cryptography. Many classical cryptosystems, such as RSA and ECC, rely on mathematical problems that are easy to compute but difficult to reverse without a secret key. Quantum algorithms like Shor's algorithm can solve these problems quickly, making traditional encryption vulnerable to quantum-based attacks. Similarly, Grover's algorithm speeds up brute-force searches, halving the effective security of symmetric cryptographic algorithms like AES.

Quantum computing has created the need for new cryptographic systems designed to protect against attacks from quantum computers. Notably, these systems don't require quantum properties themselves; instead, they employ mathematical techniques that are robust against quantum algorithms. For example, lattice-based cryptography is considered one of the most promising approaches for ensuring future-proof security.

As quantum computing capabilities progress, experts warn that classical encryption methods may soon reach the end of their "cryptographic cover time," the duration during which encrypted data remains secure. Data intercepted today could be decrypted retroactively by adversaries when quantum threat models mature, a concept referred to as "harvest now, decrypt later." This underscores the urgency of transitioning to quantum-resistant technologies. Post-quantum cryptographic algorithms, combined with hybrid approaches in protocols like TLS, can protect sensitive communications from future quantum threats. Given estimates that functional quantum computers capable of breaking RSA-2048 could emerge within the next decade, governments and organizations are advised to begin implementing these technologies now to ensure long-term data security. For a deeper exploration of quantum computing and its cryptographic implications, read the full F5 Labs article.
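To make the contrast between the two attacks concrete, here is a small illustrative sketch (our own, not an F5 Labs tool). The classical security levels shown for RSA-2048 and ECC P-256 (roughly 112 and 128 bits) are the standard NIST equivalences.

```python
# Illustration of quantum impact on common algorithms.
# Grover's algorithm roughly halves the effective security (in bits)
# of symmetric ciphers; Shor's algorithm breaks RSA/ECC outright.
ALGORITHMS = {
    # name: (classical security in bits, category)
    "AES-128":   (128, "symmetric"),
    "AES-256":   (256, "symmetric"),
    "RSA-2048":  (112, "public-key"),
    "ECC P-256": (128, "public-key"),
}

for name, (bits, category) in ALGORITHMS.items():
    if category == "symmetric":
        effective = bits // 2  # Grover: quadratic search speedup
        note = f"~{effective} effective bits post-quantum"
    else:
        note = "broken by Shor's algorithm (polynomial time)"
    print(f"{name:>10}: {bits} bits classically -> {note}")
```

This is why AES-256 is generally considered quantum-resistant (128 effective bits remain), while RSA and ECC at any practical key size are not.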
The State of Post-Quantum Crypto (PQC) on the Web

Introduction

No one knows exactly when Q-Day will arrive, but recent developments have seen the estimated number of qubits required to crack traditional encryption plummet from 1 billion in 2012, to 20 million in 2019, to just 1 million as of May 2025. Since Google is now predicting that sufficiently powerful quantum computers may be here by 2030, it may already be too late for many organizations to deploy post-quantum cryptography (PQC) to protect their web applications.

READ THE FULL REPORT HERE

Q-Day Arrival

There is a growing disconnect between the rapid pace of advancements in quantum computing and the priority CISOs assign to the associated risk. Recent predictions estimate that Q-Day (the date when quantum computers become powerful enough to break widely used public key cryptography) will arrive as early as 2029. Yet, according to the ISACA Pulse of Quantum Computing poll, only 5% of CISOs say that post-quantum cryptography (PQC) is a 'high business priority' for the near future.

This report evaluates the current state of PQC adoption among the world's top 1 million websites and the most commonly used web browsers and devices. Among the top one million websites, only 8.6% support hybrid PQC key exchange mechanisms. This reflects a broad hesitancy to transition and, more worryingly, 25% of websites still do not support TLS 1.3 at all, with 16% failing to implement quantum-resistant symmetric ciphers. Conversely, PQC adoption is more visible among the world's most popular sites, with 42% of the top 100 supporting it, though this figure drops to 26% for ranks 100–200 and averages just 21.9% across the top 1,000. Support falls further to 13.9% for the top 10,000 sites and 8.4% for the top 100,000.

Figure 1: Websites that support post-quantum cryptography.

The uptake of PQC is particularly low in some of the most security-sensitive sectors. Only 3% of banking websites support PQC, placing the industry among the lowest adopters, even within its own Financials sector (Figure 1). Healthcare and government websites are also lagging.

Websites that support post-quantum cryptography (PQC) tend to have stronger overall TLS configurations. They offer fewer and more modern cipher suites while disabling outdated protocols like SSLv3 and TLSv1.0. Sites with PQC enabled offered significantly fewer cipher suites (suggesting deliberate hardening) compared to non-PQC sites, which still commonly support weak and obsolete protocols. This contrast highlights PQC support as a strong proxy for broader cryptographic hygiene.

Geographically, TLD analysis shows that countries like Australia (.au), Canada (.ca), and the UK (.uk) are leading in PQC deployment when considering both adoption rate and volume. However, when company headquarters are considered, the United States stands out as the global frontrunner, with the UK, Canada, and Australia following closely behind.

Browser Support

On the client side, browser support plays a major role in overall PQC readiness. While 93% of Chrome requests are PQC-ready, Safari's lack of support reduces the global readiness rate to just 57%. Firefox, despite accounting for only 2% of requests, sees 85% of its traffic coming from PQC-capable versions.

The data suggest that while the technical capability for PQC adoption exists, especially given the widespread use of TLS 1.3, the practical rollout is lagging in many critical areas. For organizations with data that must remain confidential well into the future, failing to deploy PQC measures today may already be too late.
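Since the report treats TLS 1.3 as a prerequisite for hybrid PQC key exchange, a quick way to check your own site is with Python's standard ssl module, as in the minimal sketch below. Note that this API does not expose the negotiated key-exchange group, so it can confirm TLS 1.3 support but not hybrid PQC support itself; that deeper check requires newer tooling, such as a recent OpenSSL s_client.

```python
# Minimal check: does a server negotiate TLS 1.3?
# Python's ssl module does not expose the negotiated key-exchange
# group, so this cannot confirm hybrid PQC support by itself.
import socket
import ssl

def supports_tls13(host: str, port: int = 443) -> bool:
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3  # refuse anything older
    try:
        with socket.create_connection((host, port), timeout=5) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return tls.version() == "TLSv1.3"
    except (ssl.SSLError, OSError):
        return False

print(supports_tls13("example.com"))
```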
Conclusion

The full report explains the implications of quantum computing for TLS and the current state of PQC standards and protocols, examines which servers, industries, and geographies have been quickest to adopt, and suggests some steps to take if you have yet to begin your PQC journey.

READ THE FULL REPORT HERE

F5 Labs Top CWEs, CWE OWASP Top Ten Analysis, & May 2025 CVE Trends
For May's vulnerability analysis (https://www.f5.com/labs/articles/threat-intelligence/f5-labs-top-cwes-owasp-top-ten-analysis), we examine the ten most-targeted CVEs, highlighting notable shifts and ongoing trends in exploitation activity. Additionally, we analyze a year's worth of targeted CVE traffic through the lens of primary Common Weakness Enumerations (CWEs) and the OWASP Top Ten categories.

Understanding The TikTok Ban, Salt Typhoon and More | AppSec Monthly January Ep.27
In this episode of AppSec Monthly, our host MegaZone is joined by m_heath, Merlyn Albery-Speyer, and AubreyKingF5 as they dive into the latest cybersecurity news. We explore the complexities of the TikTok ban, the impact of geopolitical decisions on internet freedom, and the nuances of data sovereignty. Our experts also discuss the implications of recent breaches by Chinese state actors and the importance of using end-to-end encrypted apps to protect your data. Additionally, we shed light on the fascinating history of internet control and how it continues to evolve with emerging technologies. Stay tuned until the end for insights on the upcoming VulnCon 2025 and how you can participate. Don't forget to subscribe for more AppSec insights!

Continued Intense Scanning From One IP in Lithuania
Welcome to the September 2024 installment of the Sensor Intelligence Series (SIS), our monthly summary of vulnerability intelligence based on distributed passive sensor data. Below are a few key takeaways from this month's summary:

- Scanning for CVE-2017-9841 dropped by 10% versus August.
- CVE-2023-1389 continues to be the most scanned CVE we track, with a 400% increase over August.
- One IP address continues to be the most observed source, accounting for 43% of the overall scanning traffic we saw.
- We see a spike in scanning of CVE-2023-25157, a critical vulnerability in the GeoServer software project.

CVE Scanning

Following on from last month's analysis, scanning of CVE-2017-9841 has decreased by 10% compared to August, is down 99.8% from its high-water mark in June of 2024, and has nearly vanished from our visualizations. CVE-2023-1389, an RCE vulnerability in TP-Link Archer AX21 routers, has been the most scanned CVE for the last two months, increasing 400% over August. While this sort of swing in volume may seem remarkable, as we have noticed before, it's not unusual when we analyze the shape of the scanning for a particular CVE over time.

Following Up on an Aberration

Last month, we identified a pattern of scanning activity coming from a specific IPv4 address (141.98.11.114), suspected to be the BotPoke scanner. Despite a slight decrease in scanning traffic, this IP continued to target the same URIs and the same regions where our sensors are located, accounting for 43% of the overall scanning traffic observed.

A Brief Note on Malware Stagers Observed

Because our passive sensors do not respond to traffic, our ability to observe secondary actions after successful exploitation is limited. However, we can show that some exploitation attempts try to download malware stagers. To see an example of the most common URL observed in September attempting to exploit CVE-2023-1389, visit F5 Labs and read the full summary.

September Vulnerabilities by the Numbers

Figure 1 shows September attack traffic for the top ten CVEs, with CVE-2023-1389 dominating. Increased scanning for this vulnerability throws off the proportionality of this view; see the logarithmic scale in Figure 3 for an easier comparison. Figure 2 shows a significant increase in scanning for CVE-2023-1389 over the past year, while the decline in scanning for CVE-2017-9841 persists.

Long-Term Trends

Figure 3 shows the traffic for the top 19 CVEs, with CVE-2017-9841 and CVE-2023-1389 showing the most significant swings, while the average of the other 110 CVEs has fallen dramatically. CVE-2023-25157, a critical vulnerability in the GeoServer software project, has seen a dramatic increase in scanning. The log scale helps show changes in the other top ten scanned CVEs.

To find out more about September's CVEs and for recommendations on how to stay ahead of the curve in cybersecurity, check out the full article here. We'll see you next month!
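As a closing illustration of how a single-source share like the 43% figure above can be derived, here is a minimal sketch that groups sensor requests by source IP. The file name and schema are hypothetical placeholders, not the actual F5 Labs pipeline.

```python
# Hypothetical illustration: compute each source IP's share of total
# scanning traffic from a CSV sensor log with a "src_ip" column.
# The file name and schema are placeholders, not F5 Labs' real pipeline.
import csv
from collections import Counter

def top_scanner_shares(log_path: str, top_n: int = 5) -> list[tuple[str, float]]:
    counts = Counter()
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["src_ip"]] += 1
    total = sum(counts.values())
    return [(ip, n / total) for ip, n in counts.most_common(top_n)]

for ip, share in top_scanner_shares("sensor_log.csv"):
    print(f"{ip}: {share:.1%} of observed requests")
```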