How the F5 SIRT Looks for Malware
Introduction

Welcome to another article by F5 SIRT Security Engineer Kyle Fox. Today we explore the ways the F5 SIRT looks for malware on suspected compromised systems. We will cover some iHealth heuristics and one of the common ways malware is hidden on systems.

First, we need to get some administrative matters out of the way. The F5 SIRT and F5 Support are not able to provide computer forensics services; there are a number of reasons for this, including licensing, certifications, and legal issues. We can assist in finding indicators of compromise, identifying suspect logs or data in QKViews, and answering questions from forensic analysts about F5 products and how they work. If you suspect that you have a compromised F5 system, you can contact the F5 SIRT by opening a service request and/or follow the guidance in K11438344: Considerations and guidance when you suspect a security compromise on a BIG-IP system.

iHealth Heuristics

H511618: Malware may have infected the BIG-IP system
This heuristic compares process names with a list of known-good process names on BIG-IP systems. It does not always alert when malware has infected the system, and it sometimes alerts on expected processes such as networked FIPS HSM drivers. If it is seen, more investigation may be needed.

There are three other heuristics that also help identify how a system was compromised or, if you're looking to secure a system, how to secure it:

H444724: The management interface is allowing access from public IP addresses
This heuristic checks the IP addresses of connections to the management interface, looking for public IP addresses. It will identify when the BIG-IP is receiving such access, even if the connections are being IP- or port-forwarded to a private IP address on the BIG-IP.

H458565: Public IP addresses configured as a BIG-IP Self IP
This heuristic identifies when public IP addresses are configured as a Self IP and not properly secured using port lockdown. See K17333: Overview of port lockdown behavior (12.x - 17.x) for details on Port Lockdown.

H727910: The configuration contains user accounts with insecure passwords
This heuristic looks for default and common passwords on the user accounts. Typically it will hit on admin or root being set to something like "admin", "default", or "root", but there are a limited number of other common passwords it will look for. Remember to change the admin and root passwords to stronger ones even if you disable those accounts.

Typical Malware

The typical malware the F5 SIRT observes performs flooding, cryptocurrency mining, or both. This malware is usually named with random characters and is hidden by deleting the executable after it is run. Attackers may stop at deploying such malware or may continue trying to infiltrate the victim's networks from there; typically they avoid implanting malware when they are using the F5 system to pivot into a network.

One way we can look for this malware without using the iHealth heuristics is by looking for deleted executables in lsof output. They look like this:

From iHealth -> Commands -> Standard -> UNIX -> Networking -> lsof -n (text only)

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME [...]
xDwJc 23367 root cwd DIR 253,12 4096 106664 /var/service/restjavad
xDwJc 23367 root rtd DIR 253,9 1024 2 /
xDwJc 23367 root txt REG 253,9 282 30733 /tmp/xDwJc (deleted)
xDwJc 23367 root 0r CHR 1,3 0t0 1028 /dev/null
xDwJc 23367 root 1r CHR 1,3 0t0 1028 /dev/null
xDwJc 23367 root 2r CHR 1,3 0t0 1028 /dev/null
xDwJc 23367 root 3u sock 0,7 0t0 2873878973 protocol: TCP
xDwJc 23367 root 4u a_inode 0,10 0 6105 [eventfd]

So, let's review what we see here. The main thing we are looking for is the line with the FD column set to txt. This is the executable file the process is running from, and we see in the NAME column it says (deleted). In Linux, a file continues to exist after it has been deleted for as long as it is still open by one or more processes, so deleting the file does not cause the process to end. Malware writers have used this to hide malware from antimalware software that scans the filesystem for malicious executables.

Let's review all the columns in this listing:

COMMAND - The command being executed.
PID - The process ID; we will use this later.
USER - The user the command is running as.
FD - The file descriptor type. Here we see:
  cwd - Current working directory.
  rtd - Root directory.
  txt - Program executable.
  <number><letter> - File descriptors. The letter can be r for read, w for write, and u for both.
TYPE - Type of filesystem node associated with the file:
  DIR - Directory.
  REG - Regular file.
  CHR - Character special file, in this case a device called null.
  sock - A socket.
  a_inode - An anonymous inode (file system node), in this case a connection to the event forwarder daemon.
DEVICE - The device numbers for the file.
NODE - Node number of the file.
NAME - The file name.

From what we can see, the working directory is /var/service/restjavad. This strongly implies that the malware got onto the system using an exploit for iControl REST, but it may also have gotten on through a longer route via some other part of TMUI. If you have a forensic analyst or otherwise want to analyze the malware, you can copy the executable even though it's deleted by copying from /proc/<PID>/exe, in this case:

cp /proc/23367/exe /var/tmp/malwarefile
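If you want to sweep an entire system for this pattern rather than eyeballing lsof output, the same check can be scripted. The following is a minimal sketch in Python - an illustration of the technique described above rather than an F5-supplied tool - and it assumes a Linux /proc filesystem and sufficient privileges to read other processes' entries.

#!/usr/bin/env python3
# Sketch: list processes whose running executable has been deleted, the same
# condition that shows up as "(deleted)" in the NAME column of lsof output.
import os

def deleted_executables():
    findings = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            target = os.readlink(f"/proc/{pid}/exe")   # e.g. "/tmp/xDwJc (deleted)"
        except OSError:
            continue                                   # kernel threads, exited or restricted processes
        if target.endswith(" (deleted)"):
            findings.append((pid, target))
    return findings

if __name__ == "__main__":
    for pid, target in deleted_executables():
        print(f"PID {pid}: running from deleted file {target}")
        # Each hit is a candidate for the /proc/<PID>/exe copy shown above.

Anything this reports is only a lead; expected processes can occasionally show up here too, so treat the output the same way you would treat the H511618 heuristic - as a pointer for further investigation.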
Typical Ways Intruders Break In

In most cases of compromised systems, the attackers got in because the system had an Internet-facing Self-IP that was not secured. We recommend that you do not use Internet-facing Self-IPs for HA or administration, and that you set Port Lockdown on Internet-facing Self-IPs to Allow None. See K17333: Overview of port lockdown behavior (12.x - 17.x) for details on Port Lockdown.

Attackers typically use TMUI or iControl REST vulnerabilities to gain access to BIG-IP systems. The two major ones we see are CVE-2020-5902 and CVE-2022-1388, but many other vectors exist, so organizations should strive to keep their BIG-IP software up to date.

Attackers also try default passwords, and we have noted some systems that have default passwords in production. Make sure to change the admin and root passwords to strong ones on your BIG-IPs, even if you disable those accounts. Attackers who are trying passwords will often look for open SSH ports to try them against, so the same guidance about not exposing TMUI/REST to the Internet applies to SSH as well.

Outro

I hope this was helpful. If you're dealing with a compromised system, you may find my article on Inspecting a UCS file from a compromised BIG-IP helpful in recovering from a compromise. Finally, again, if you suspect that you have a compromised F5 system, you can contact the F5 SIRT by opening a service request and/or follow the guidance in K11438344: Considerations and guidance when you suspect a security compromise on a BIG-IP system.

GC Document AI Transitive Access Abuse, make-me-root holes in VMWare fixed and more
Hello! ArvinF is your editor for this week's edition of TWIS, covering 15-21 Sept 2024. Let's dive in.

Google Cloud Document AI flaw (still) allows data theft despite bounty payout

Google Cloud's Document AI service could be abused by data thieves to break into Cloud Storage buckets and steal sensitive information. Kat Traxler of Vectra AI detailed this attack in research published alongside a proof-of-concept (PoC) demonstrating how Document AI's access controls could be bypassed to swipe a PDF from a source Google Cloud Storage bucket, alter the file, and then return it.

https://www.vectra.ai/blog/transitive-access-abuse-data-exfiltration-via-document-ai
https://github.com/KatTraxler/document-ai-samples/tree/main

During batch processing, the service uses a Google-managed service account called a service agent. It's used as the identity in batch processing, and it ingests the data and outputs the results. Therein lies the problem: the pre-set service agent permissions are too broad, and in batch-processing mode the service uses the service agent's permissions, not the caller's permissions. The permissions granted to the service agent allow it to access any Google Cloud Storage bucket within the same project, thus allowing the service to move data that the user normally wouldn't have access to.

"This capability enables a malicious actor to exfiltrate data from GCS to an arbitrary Cloud Storage bucket, bypassing access controls and exfiltrating sensitive information," Traxler wrote. "Leveraging the service (and its identity) to exfiltrate data constitutes transitive access abuse, bypassing expected access controls and compromising data confidentiality."

Google's initial assessment through its Vulnerability Reward Program was that the researcher's report did not "meet the bar for a financial reward," though the researcher did receive an acknowledgement. Google later changed the status of the reported bug to "fixed" and awarded the bounty. However, follow-up checks by the researcher showed that the flaw could still be abused. Good on the researcher for validating the fix and providing feedback to ensure that the flaw cannot be abused.

https://www.theregister.com/2024/09/17/google_cloud_document_ai_flaw/

VMware patches remote make-me-root holes in vCenter Server, Cloud Foundation

Broadcom has emitted a pair of patches for vulnerabilities in VMware vCenter Server that a miscreant with network access to the software could exploit to completely commandeer a system. This also affects Cloud Foundation.

The first flaw, CVE-2024-38812, is a heap overflow vulnerability in the Distributed Computing Environment/Remote Procedure Calls (DCERPC) system that could be exploited over the network to achieve remote code execution on unpatched systems. Corrupting the heap could allow an attacker to execute arbitrary code on the system. Broadcom rates it as a critical fix, and it has a CVSS score of 9.8 out of 10.

The second, CVE-2024-38813, is a privilege escalation flaw with a CVSS score of 7.5 that VMware-owned Broadcom rates as important. Someone with network access to VMware's vulnerable software could exploit this to gain root privileges on the system.
Broadcom chose to pair the flaws together in its advisory and FAQ https://support.broadcom.com/web/ecx/support-content-notification/-/external/content/SecurityAdvisories/0/24968 https://blogs.vmware.com/cloud-foundation/2024/09/17/vmsa-2024-0019-questions-answers/ The discovery of both flaws stemmed from the Matrix Cup Cyber Security Competition, held in June in China, which was organized by 360 Digital Security Group and Beijing Huayunan Information Technology Company. Over 1,000 teams competed to report holes in products for $2.75 million in prizes. Zbl and srs of Team TZL at Tsinghua University were credited with discovering the bugs, which were disclosed to Broadcom to patch. https://web.archive.org/web/20240708061854/https:/360.net/about/news/article66836ac56ddf08001f91a723#menu The team bagged the competition's Best Vulnerability Award, along with a $59,360 payday, showing once again that bug bounties and competitive hacking really work. Disclosing vulnerabilities responsibly to affected vendors helps the vendor to fix the flaw and in turn help their customer base. It has a ripple effect - organizations running secure software minimizes their attack surface and contributes to the overall security of the services offered and data being protected. Chinese national accused by Feds of spear-phishing for NASA, military source code A Chinese national has been accused of conducting a years-long spear-phishing campaign that aimed to steal source code from the US Army and NASA, plus other highly sensitive software used in aerospace engineering and military applications. At least some of the spears hit their targets, and some of this restricted software made its way to China, according to a Department of Justice announcement and an indictment https://www.justice.gov/opa/pr/justice-department-announces-three-cases-tied-disruptive-technology-strike-force https://regmedia.co.uk/2024/09/16/song_wu_indictment.pdf The DoJ claims Song was employed as an engineer at Aviation Industry Corporation of China (AVIC), a Chinese state-owned aerospace and defense conglomerate headquartered in Beijing. While in that role, Song allegedly started to send phishing emails around January 2017 and continued through December 2021. One email cited in the indictment – sent on April 28, 2020 from one such "imposter email account" to "Victim 2" – requested NASCART-GT, which appears to be used in NASA projects. The email read: "Hi [Victim 2], I sent Stephen an email for a copy of NASCART-GT code, but got no response right now. He must be too busy. Will you help and sent (sic) it to me?" Some of the scams worked, according to the DoJ. While the indictment doesn't detail exactly what sensitive IP Song is alleged to have stolen, it does note that: "In some instances, the targeted victim, believing that defendant SONG … was a colleague, associate, or friend requesting the source code or software electronically transmitted the requested source code or software to defendant Song." If snared and convicted, Song faces a maximum penalty of 20 years in prison for each count of wire fraud. He also faces two-year penalties in prison for each count of aggravated identity theft. The age old technique of spear-phishing has been effective for a very long time. Granted, the spear-phishing activities were done 7 years ago and perhaps, organizations by now would have implemented technologies and safe guards against spear-phishing. Organizations should have implemented Security Awareness training on this as well. 
The victims of the spear-phishing in this report are likely very technical people in their fields, which reminds us that we should always be vigilant, keep a security mindset to identify potential spear-phishing attempts, and report them per our respective organizations' IT policies. Security is everyone's responsibility, and as end users who may be targeted by such spear-phishing attempts, we should apply care and a healthy dose of suspicion to suspicious-looking emails. If in doubt, ask - and follow the policies defined by your IT organization.

https://www.theregister.com/2024/09/17/chinese_national_nasa_phishing_indictment/

23andMe settles class-action breach lawsuit for $30 million

Also: Apple to end NSO Group lawsuit; Malicious Python dev job offers; Dark web kingpins busted; and more

Documents filed in a San Francisco federal court indicate 23andMe will fork over the pot of money to settle claims from any of the 6.4 million US citizens (per court documents) whose data was stolen during the incident. The settlement includes an agreement to provide three years of privacy, medical and genetic monitoring.

https://regmedia.co.uk/2024/09/13/23andme-settlement.pdf

23andMe, which offers genetic testing services, suffered a massive data breach in 2023 that saw millions of its customers' data stolen and put up for sale on the dark web.

https://www.theregister.com/2023/10/19/latest_23andme_data_leak_takes/

It is never good to have personal information leaked, as it opens up the opportunity for it to be used for fraud in the future, putting the original owner in potentially uncomfortable scenarios. $30M split among the 6.4M affected users is roughly under 5 dollars each. Having the privacy, medical and genetic monitoring included in the settlement helps, but it would have been better if the breach had not happened in the first place.

Apple drops suit against NSO Group

Worried the case might ultimately do more harm than good, Apple has moved to drop its lawsuit against Pegasus spyware maker NSO Group.

https://www.theregister.com/2021/11/23/apple_nso_group/
https://www.theregister.com/2024/03/01/nso_pegasus_source_code/

Court documents filed by Apple last Friday indicate the fruit cart is worried that the discovery process against Israel-based NSO Group would see sensitive Apple data reach NSO and companies like it – enabling the creation of additional spyware tools used by nation states.

https://www.theregister.com/2023/05/30/nso_owner_hacking/

Organizations have to do what protects their interests. I will leave it at that.

Beware that job offer, Pythonista: It could be a malware campaign

Malware campaigns that mimic skills tests for developers are nothing new, but this one targeting Python developers is. Reported by researchers at ReversingLabs, the malware uses a similar tactic to previously spotted campaigns that try to trick developers into downloading malicious packages masquerading as skills tests. After the victim compiles the code and solves whatever problems the packages contain, their system is infected.

https://www.reversinglabs.com/blog/fake-recruiter-coding-tests-target-devs-with-malicious-python-packages
https://www.theregister.com/2023/10/04/lazarus_group_lightlesscan_malware_upgrade/

As reported, North Korean threat actors have been behind several campaigns using fake job offers to infect systems with backdoors and infostealers.
In previous campaigns it's been fake jobs at Oracle, Disney or Amazon used as lures – this time it appears the attackers are posing as financial services firms.

https://www.theregister.com/2022/03/25/chrome_exploits_north_korea/

I remember similar news from a few months back, likely this one: https://www.bleepingcomputer.com/news/security/fake-job-interviews-target-developers-with-new-python-backdoor/ where a fake job interview was also involved and the goal was to drop/install a RAT (remote access trojan). As in any engagement, due care should be taken when downloading, installing, or executing files from unknown sources. Also, be vigilant and verify that whoever you are talking to - in this case, a job interviewer - is indeed who they claim to be.

Dark web kingpins indicted

A pair of Russian and Kazakh nationals have been arrested and charged in connection with running dark web markets, forums and training facilities for criminals. Kazakhstani Alex Khodyrev and Russian Pavel Kublitskii were arrested in Miami and charged with conspiracy to commit access device fraud and conspiracy to commit wire fraud, related to a site they ran for a decade called wwh[.]club[.]ws.

https://www.justice.gov/usao-mdfl/pr/russian-and-kazakhstani-men-indicted-running-dark-web-criminal-marketplaces-forums-and

WWH Club users could buy and sell stolen personal information, discuss best practices for conducting various types of illegal activity, and even take courses on how to commit fraud and other crimes. Khodyrev, Kublitskii and others involved in the site "profited through membership fees, tuition fees, and advertising revenue," the DoJ alleged.

Good on the authorities for taking down this fraudulent group. The stolen data, in my opinion, is the most important element, as it opens up opportunities for fraud, and taking down the site lessens the chances of the already stolen data spreading further among fraud groups.

https://www.theregister.com/2024/09/16/security_in_brief/

In closing

I hope the news I shared has been educational and kept you up to date. If this is your first TWIS, you can always read past editions. You can also check out all of the content from the F5 SIRT. Thank you, and until next time - stay safe and secure.

How to Identify and Manage Scrapers (Pt. 2)
Introduction

Welcome back to part two of the article on how to identify and manage scrapers. While part one focused on ways to identify and detect scrapers, part two will highlight various approaches to prevent, manage, and reduce scraping.

9 Ways to Manage Scrapers

We'll start by highlighting some of the top methods used to manage scrapers in order to help you find the method best suited for your use case.

1. Robots.txt
The robots.txt file on a website contains rules for bots and scrapers, but it lacks enforcement power. Scrapers often ignore these rules and scrape whatever data they want, so other scraper management techniques are needed to enforce compliance.

2. Site, App, and API Design to Limit Data Provided to Bare Minimum
One way to manage scrapers is to remove access to the data they are after, though this may not always be feasible due to business-critical requirements. Designing websites, mobile apps, and APIs to limit or remove exposed data effectively reduces unwanted scraping.

3. CAPTCHA/reCAPTCHA
CAPTCHAs (including reCAPTCHA and other tests) are used to manage and mitigate scrapers by presenting challenges intended to prove the visitor is human; passing these tests grants access to data. However, they cause friction and decrease conversion rates, and with advancements in recognition, computer vision, and AI, scrapers and bots have become more adept at solving CAPTCHAs, making them ineffective against more sophisticated scrapers.

4. Honey Pot Links
Scrapers, unlike humans, can see hidden elements on a web page, such as form fields and links. Security teams and web designers can add these to web pages, allowing them to respond to transactions performed by scrapers, such as forwarding them to a honeypot or providing incomplete results.

5. Require All Users to be Authenticated
Most scraping occurs without authentication, making it difficult to enforce access limits. To improve control, all users should be authenticated before making data requests. Less motivated scrapers may avoid creating accounts, while sophisticated scrapers may resort to fake account creation; F5 Labs published an entire article series focusing on fake account creation bots. These skilled scrapers distribute data requests among fake accounts, adhering to account-level request limits. Implementing authentication measures could discourage less-motivated scrapers and improve data security.

6. Cookie/Device Fingerprint-Based Controls
To limit requests per user, cookie-based tracking or device/TLS fingerprinting can be used; these are invisible to legitimate users but can't be applied to all users. Challenges include cookie deletion, collisions, and divisions. Advanced scrapers using tools like Browser Automation Studio (BAS) have anti-fingerprint capabilities, including fingerprint switching, which can help them bypass these types of controls.

7. WAF Based Blocks and Rate Limits (UA and IP)
Web Application Firewalls (WAFs) manage scrapers by creating rules based on user agent strings, headers, and IP addresses. These rules are ineffective against sophisticated scrapers, who use common user agent strings, large numbers of IP addresses, and common header orders to blend in.

8. Basic Bot Defense
Basic bot defense solutions use JavaScript, CAPTCHA, device fingerprinting, and user behavior analytics to identify scrapers, along with IP reputation and geo-blocking. They don't obfuscate, encrypt, or randomize their signals collection scripts, making it easy for sophisticated scrapers to reverse engineer them.
However, these solutions can be bypassed using new-generation automation tools like BAS and puppeteer, or by using high-quality proxy networks with high-reputation IP addresses. Advanced scrapers can easily craft spoofed packets to bypass the defense system.

9. Advanced Bot Defense
Advanced, enterprise-grade bot defense solutions use randomized, obfuscated signals collection to prevent reverse engineering, along with tamper protection. They use encryption and machine learning (ML) to build robust detection and mitigation systems. These solutions are effective against sophisticated scrapers, including AI companies, and adapt to varying automation techniques, providing long-term protection against both identified and unidentified scrapers.

Scraper Management Methods/Controls Comparison and Evaluation

Table 1 (below) evaluates scraper management methods and controls, providing a rating score (out of 5) for each, with higher scores indicating more effective control.

Control: Robots.txt
Pros: +Cheap +Easy to implement +Effective against ethical bots
Cons: -No enforcement -Ignored by most scrapers
Rating: 1

Control: Application redesign
Pros: +Cheap
Cons: -Not always feasible due to business need
Rating: 1.5

Control: CAPTCHA
Pros: +Cheap +Easy to implement
Cons: -Not always feasible due to business need
Rating: 1.5

Control: Honey pot links
Pros: +Cheap +Easy to implement
Cons: -Easily bypassed by more sophisticated scrapers
Rating: 1.5

Control: Require authentication
Pros: +Cheap +Easy to implement +Effective against less motivated scrapers
Cons: -Not always feasible due to business need -Results in a fake account creation problem
Rating: 1.5

Control: Cookie/fingerprint based controls
Pros: +Cheaper than other solutions +Easier to implement +Effective against low sophistication scrapers
Cons: -High risk of false positives from collisions -Ineffective against high to medium sophistication scrapers
Rating: 2

Control: Web Application Firewall
Pros: +Cheaper than other solutions +Effective against low to medium sophistication scrapers
Cons: -High risk of false positives from UA, header or IP based rate limits -Ineffective against high to medium sophistication scrapers
Rating: 2.5

Control: Basic bot defense
Pros: +Effective against low to medium sophistication scrapers
Cons: -Relatively expensive -Ineffective against high sophistication scrapers -Poor long term efficacy -Complex to implement and manage
Rating: 3.5

Control: Advanced bot defense
Pros: +Effective against the most sophisticated scrapers +Long term efficacy
Cons: -Expensive -Complex to implement and manage
Rating: 5

Conclusion

There are many methods of identifying and managing scrapers, as highlighted above, each with its pros and cons. Advanced bot defense solutions, though costly and complex, are the most effective against all levels of scraper sophistication. To read the full article in its entirety, including more detail on all the management options described here, head over to our post on F5 Labs.
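As a small, concrete illustration of the honeypot-link idea from method 4 above, here is a toy sketch in Python using Flask. The hidden path and the in-memory set of flagged clients are hypothetical, and a real deployment would persist the flags and feed them into whatever enforcement layer you already operate.

# Toy sketch of a honeypot link: a path no human should ever visit flags the client,
# and flagged clients are refused (or could be given decoy data) on later requests.
from flask import Flask, abort, request

app = Flask(__name__)
flagged_clients = set()   # client IPs that followed the hidden link

@app.before_request
def block_flagged_clients():
    if request.remote_addr in flagged_clients:
        abort(403)   # alternatively, serve incomplete or decoy results

@app.route("/assets/archive-2019.html")   # hypothetical link, hidden from humans via CSS
def honeypot():
    flagged_clients.add(request.remote_addr)
    return "Nothing to see here.", 200

@app.route("/")
def index():
    # Real pages would embed the hidden honeypot link where humans never see or click it.
    return '<a href="/assets/archive-2019.html" style="display:none">archive</a>Welcome!'

As the comparison table notes, this only catches less sophisticated scrapers, but it is cheap and easy to add.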
How to Identify and Manage Scrapers (Pt. 1)

Introduction

The latest addition in our Scraper series focuses on how to identify and manage scrapers, but we'll be splitting up the article into two parts. Part one will focus on outlining ways to identify and detect scrapers, while part two will focus on tactics to help manage scrapers.

How to Identify Scraping Traffic

The first step in identifying scraping traffic involves detecting various methods based on the scraper's motivations and approaches. Some scrapers, like benign search bots, self-identify in order to obtain network and security permission. Others, like AI companies, competitors, and malicious scrapers, hide themselves, making detection difficult. More sophisticated approaches are needed to combat these types of scrapers.

Self-Identifying Scrapers

There are several scrapers that announce themselves and make it very easy to identify them. These bots self-identify using the HTTP user agent string, indicating that they either have explicit permission or believe they provide a valuable service. These bots can be classified into three categories:

Search Engine Bots/Crawlers
Performance or Security Monitoring
Archiving

Several scraper operators publish detailed information on their scrapers, including identification details, IP addresses, and opt-out options. It's crucial to review this documentation for scrapers of interest, as unscrupulous scrapers often impersonate known ones; the operators' websites often provide tools to verify whether a scraper is real or an imposter. Links to this documentation and screenshots are provided in our full blog on F5 Labs.

Many scrapers identify themselves via the user agent string. A string is usually added to the user-agent string that contains the following:

The name of the company, service or tool that is doing the scraping
A website address for the company, service or tool that is doing the scraping
A contact email for the administrator of the entity doing the scraping
Other text explaining what the scraper is doing or who they are

A key way to identify self-identifying scrapers is to search the user-agent field in your server logs for specific strings. Table 1 below outlines common strings you can look for.

Table 1: Search strings to find self-identifying scrapers (* is a wildcard)

Self Identification method: Name of the tool or service
Search String: *Bot * or *bot*

Self Identification method: Website address
Search String: *www* or *.com*

Self Identification method: Contact Email
Search String: *@*

Examples of User Agent Strings

OpenAI searchbot user agent string:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot

Bing search bot user agent string:
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/

These scrapers have both the name of the tool or service and the website in the user-agent string, and can be identified using two of the methods highlighted in Table 1 above.

Impersonation

Because user agents are self-reported, they are easily spoofed. Any scraper can pretend to be a known entity like Googlebot by simply presenting the Googlebot user agent string. We have observed countless examples of fake bots impersonating large, known scrapers like Google, Bing and Facebook. As one example, Figure 1 below shows the traffic overview of a fake Google scraper bot. This scraper was responsible for almost a hundred thousand requests per day against a large US hotel chain's room search endpoints. The bot used the following user-agent string, which is identical to the one used by the real Google bot.
Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) (765362)

IP-based Identification

Scrapers can also identify themselves via their IP addresses. Whois lookups can reveal the IP address of a scraper, their organization, or their registered ASNs; while these do not reveal the identity of the actual entity, they can be useful in certain cases. Geolocation information can also be used to identify automated scraping activity. Reverse DNS lookups help establish a scraper's identity by using the Domain Name System (DNS) to find the domain name associated with an IP address, which can be done using free online reverse DNS lookup services (a scripted example of this check appears just before the conclusion of this article). Since IP address spoofing is non-trivial, identifying and allowlisting scrapers using IP addresses is more secure than simply using user agents.

Artificial Intelligence (AI) Scrapers

Artificial intelligence companies are increasingly using internet scraping to train models, causing a surge in data scraping. This data is often used for for-profit AI services, which sometimes compete with the scraping victims. Several lawsuits are currently underway against these companies; a California class-action lawsuit has been filed by 16 claimants against OpenAI, alleging copyright infringement due to the scraping and use of their data for model training.

Due to all the sensitivity around AI companies scraping data from the internet, a few things have happened. Growing scrutiny of these companies has forced them to start publishing details of their scraping activity, ways to identify these AI scrapers, and ways to opt out of your applications being scraped. AI companies have seen an increase in opt-outs from AI scraping, resulting in them being unable to access the data needed to power their apps. Some less ethical AI companies have since set up alternative "dark scrapers" which do not self-identify, and instead secretly continue to scrape the data needed to power their AI services.

Unidentified Scrapers

Most scrapers don't identify themselves or request explicit permission, leaving application, network, and security teams unaware of their activities on Web, Mobile, and API applications. Identifying scrapers can be challenging, but below you'll find two techniques that we have used in the past that can help identify the organization or actors behind them. To view additional techniques along with an in-depth explanation of each, head over to our blog post on F5 Labs.

1. Requests for Obscure or Non-Existent Resources

Website scrapers crawl obscure or low-volume pages, requesting resources like flight availability and pricing. They construct requests manually, sending them directly to airline origin servers. Figure 2 shows an example of a scraper that was scraping an airline's flights and requesting flights to and from a train station.

2. IP Infrastructure Analysis, Use of Hosting Infra or Corporate IP Ranges (Geo Location Matching)

Scrapers distribute traffic via proxy networks or botnets to avoid IP-based rate limits, but the infrastructure they use can itself make them easier to identify. Some of these tell-tale patterns include:

Round-robin IP or UA usage
Use of hosting IPs
Use of low-reputation IPs
Use of international IPs that do not match expected user locations

The following are additional things to keep in mind when trying to identify scrapers; we provide an in-depth overview of each in our full article on F5 Labs.

Conversion or look-to-book analysis
Not downloading or fetching images and dependencies, but just data
Behavior/session analysis
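Putting the reverse DNS idea from the Impersonation and IP-based Identification sections into practice, here is a minimal Python sketch of the reverse-then-forward DNS check that large crawler operators document for verifying their bots. The googlebot.com/google.com suffixes apply to Googlebot; the sample IP is just a commonly cited Googlebot address, and you would adapt the suffix list for whichever crawler you are trying to verify.

# Sketch: verify whether an IP address claiming to be Googlebot really belongs to Google
# by doing a reverse DNS lookup and then forward-confirming the returned hostname.
import socket

def is_real_googlebot(ip):
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)              # reverse DNS lookup
    except OSError:
        return False
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False                                           # name is not in Google's crawler domains
    try:
        forward_ips = socket.gethostbyname_ex(hostname)[2]     # forward-confirm the hostname
    except OSError:
        return False
    return ip in forward_ips

if __name__ == "__main__":
    print(is_real_googlebot("66.249.66.1"))   # spoofed sources that merely copy the UA string return False

The same pattern works for Bingbot and other self-identifying crawlers; only the expected domain suffixes change.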
Conclusion

We discussed two methods above that might be helpful in identifying a scraper. However, keep in mind that it's crucial to take into account the type of scraper and the sort of data they are targeting in order to correctly identify it. To read the full article on identifying scrapers, which includes more identification methods, head on over to our post on F5 Labs. Otherwise, continue on to part two, where we outline tactics to help manage scrapers.

Exploring the Zero Trust Models of AWS, Microsoft, and Google
In today’s world of distributed workforces, cloud services, and sophisticated cyber threats, the traditional security approach where everyone inside the network is trusted has become obsolete. The Zero Trust Model has emerged as the new paradigm, enforcing strict identity verification, granular access control, and continuous monitoring for all users, devices, and resources, regardless of their location. Big Cloud providers such as AWS (Amazon Web Services), Microsoft, and Google have each adopted their own version of Zero Trust architecture. In this article we will understand the basics of the mentioned Zero Trust models, its key principles and components. What is Zero Trust? Zero Trust is a security framework based on the principle of "never trust, always verify." Unlike traditional network security models, Zero Trust does not assume that users or devices inside the network are inherently trustworthy. Instead, every user, device, and request must be constantly verified and approved. They must be given the minimum amount of access they need based on their identity and security status. AWS Zero Trust Model The AWS Zero Trust model is a security framework that aims to protect resources by enforcing strict verification, regardless of whether access requests originate from inside or outside a traditional network perimeter. It focuses on continuous validation of trust, treating every access attempt as potentially hostile unless explicitly authenticated and authorized. To get more clearer picture, let's understand the key principles, components followed by an example. Key Principles: Identity-Centric Approach: AWS shifts the security focus to identities (users, devices, services), ensuring that every entity is authenticated and authorized for each action. Least Privilege Access: Access permissions are granted based on the minimal necessary level, reducing the impact of compromised accounts. Context-Aware Access: AWS evaluates additional signals like location, device health, and behavior before granting access to resources. Segmented and Isolated Resources: AWS employs segmentation to isolate workloads, limiting lateral movement if one component is compromised. Continuous Monitoring and Logging: AWS integrates real-time monitoring and logging to detect suspicious activities and adjust security policies dynamically. Key Components: AWS Identity and Access Management (IAM): Central to AWS's Zero Trust model, IAM allows you to manage fine-grained permissions and define access control policies for each user, role, and resource. Multi-Factor Authentication (MFA): AWS uses MFA to enforce stronger identity verification, requiring users to authenticate using something they know (password) and something they have (token or device). AWS CloudTrail and GuardDuty: These services provide continuous monitoring, logging, and threat detection, identifying unusual behavior and potential security risks. Encryption and Secure Communications: AWS enforces encryption both in transit and at rest, ensuring data integrity and confidentiality, with access controlled by encryption keys managed through AWS Key Management Service (KMS). Zero Trust Network Access (ZTNA): AWS offers solutions such as AWS PrivateLink and VPC endpoints to secure and isolate traffic, minimizing exposure to public networks. Example: Imagine an organization running an e-commerce platform on AWS, where sensitive customer data is stored in databases and accessed by employees, services, and third-party APIs. 
Instead of trusting access by default, AWS Zero Trust ensures that every access request is verified at each stage. If an employee attempts to access a customer database: IAM and MFA verify the employee’s identity and enforce role-based access control, ensuring they only have the necessary permissions. Device and Location Verification: AWS checks if the employee is using a trusted device from an expected location, applying additional security measures if an anomaly is detected (For example., logging in from an unusual location). Network Isolation: AWS VPC and PrivateLink ensures that database traffic remains isolated, preventing lateral movement even if other systems are compromised. Logging and Monitoring: AWS CloudTrail logs the access attempt, while AWS GuardDuty monitors for any suspicious behavior like abnormal data access patterns. If a threat is detected, the system can revoke access or trigger an alert. In this way, AWS Zero Trust minimizes the risk of unauthorized access and data breaches, providing continuous protection of resources, whether they are inside or outside the traditional network boundary. Microsoft Zero Trust Model The Microsoft Zero Trust model is built on real Microsoft features that work together to protect data and resources by eliminating implicit trust. The model continuously verifies identities, devices, and access requests across the entire environment, ensuring security for both internal and external access. To find out more about Microsoft's Zero Trust model, let's understand the key principles, components followed by an example. Key Principles: Verify Explicitly: Always authenticate and authorize using all available data points like identity, location, device, health, service, and anomaly detection. Least Privileged Access: Enforce least privilege by granting only the minimum level of access necessary for users to perform their tasks. Assume Breach: Operate with the mindset that a breach has already occurred, and implement strategies to limit lateral movement, detect anomalies, and mitigate risks. Key Components: Azure Active Directory (Azure AD): Azure AD provides identity verification through Single Sign-On (SSO), Multifactor Authentication (MFA), and Conditional Access, which adapts access policies based on the user’s context (For example., location, device compliance, or risk score). Microsoft Intune: For managing devices, Intune enforces compliance policies, ensuring that only secure and compliant devices can access resources. Through Mobile Device Management (MDM) and Mobile Application Management (MAM), it provides control over both corporate-owned and personal devices (BYOD). Microsoft Defender for Endpoint: This tool ensures device security by providing endpoint detection and response (EDR), identifying vulnerabilities and threats on devices, and enforcing security baselines. It continuously monitors and responds to potential breaches or compromised endpoints. Azure Information Protection (AIP): AIP helps protect sensitive data by classifying and labeling information. It also provides encryption and access control, ensuring data protection both at rest and in transit, regardless of where it is stored or shared. Microsoft Defender for Identity: This component integrates identity protection by continuously analyzing user activities and network signals to detect suspicious behaviors, compromised accounts, or insider threats. Microsoft Defender for Cloud: This feature secures cloud and hybrid infrastructure. 
It provides threat protection, vulnerability assessments, and compliance management across Azure and non-Azure environments, helping enforce Zero Trust principles on cloud workloads. Azure Sentinel: This is Microsoft's cloud-native Security Information and Event Management (SIEM) system, which provides intelligent security analytics and threat detection. It helps detect, prevent, and respond to security incidents by correlating data across multiple sources. Microsoft Endpoint Manager: This includes Intune and Configuration Manager, allowing centralized management of devices and applications while enforcing Zero Trust policies related to device compliance and security. Azure Network Security: Features like Azure Firewall, Azure DDoS Protection, Network Security Groups (NSGs), and Azure Private Link provide network-level segmentation and protection. These services prevent unauthorized lateral movement and secure network traffic through encryption and micro-segmentation. Example: Suppose a finance team member attempts to access a critical business application from a remote location. Here's how Microsoft's Zero Trust model enforces security: Identity Verification: Azure AD ensures the user's identity through MFA. A Conditional Access policy checks the user’s device compliance (managed through Intune) and location. If the login attempt is from an unusual place, additional security measures (like an extra MFA prompt) are applied. Device Compliance: Microsoft Defender for Endpoint checks if the user’s device meets security baselines (For example., updated OS, antivirus enabled). If the device is not compliant, access to the application is blocked or restricted until remediation. Access Control: Azure AD’s Conditional Access ensures that the user can only access the business application and not any other sensitive resources they don't need. Least-privilege access ensures this by restricting permissions based on role. Data Protection: Azure Information Protection encrypts any sensitive data accessed, preventing it from being exposed or mishandled even if downloaded or shared. AIP also tracks and audits access to the data. Monitoring and Threat Detection: Azure Sentinel continuously monitors the access session, using Microsoft Defender for Identity to detect any unusual or risky behavior (For example., multiple login attempts from different locations). If suspicious activity is detected, security alerts are triggered for investigation. In this way Microsoft features into the Zero Trust model ensures end-to-end protection, validating every access request and continuously monitoring for threats across identities, devices, data, and networks. Google Zero Trust Model (BeyondCorp) The Google Zero Trust model, also known as BeyondCorp, is a security framework that eliminates the need for a traditional network perimeter. Instead of assuming that internal networks are inherently secure, Google’s approach treats every access request—whether from within the corporate network or outside—as potentially risky. The model enforces “never trust, always verify” and emphasizes verifying users and devices at every step before granting access. Key Principles: Verify Every Access Request: Regardless of network location, every access request must be authenticated and authorized, using strong identity verification and device checks. Least Privilege Access: Limit user and device access to the minimum necessary, ensuring they can only access the resources required for their specific role. 
Continuous Monitoring: Continuously monitor users, devices, and behaviors to detect and respond to suspicious activity in real-time. Device Trust: Assess the security posture of the device before granting access, ensuring that only trusted, compliant devices are used. Key Components: Google Identity: Google’s identity system forms the basis of Zero Trust, enforcing strong identity verification with features like Single Sign-On (SSO) and Multi-Factor Authentication (MFA). It ensures that every user is authenticated before access is granted, whether the request originates from inside or outside the network. Access Proxy: This component of BeyondCorp acts as an intermediary between users and resources. Every access request is routed through this proxy, which enforces security policies and checks the identity, context, and device posture before granting access. Device Inventory and Management: Google maintains a detailed inventory of devices accessing corporate resources, ensuring that only compliant, up-to-date devices can connect. Device posture (For example., security patches, encryption status) is continuously assessed to maintain trust. Context-Aware Access: This feature dynamically adjusts access policies based on the user’s identity, device health, location, and risk factors. Google’s Access Control Policies are applied in real time, allowing access only if all conditions meet security requirements. Encryption and Secure Communication: All communication between users and resources is encrypted, ensuring data integrity and confidentiality. Google enforces encryption in transit and at rest for data protection. Continuous Monitoring and Threat Detection: Google uses extensive logging, monitoring, and machine learning to detect anomalies and security risks in real-time, enabling fast response to potential threats. Example: Imagine a scenario where a Google employee wants to access a sensitive cloud-based internal application while working from a public coffee shop. In a traditional security model, the internal network might trust access if the employee used a VPN. In Google’s Zero Trust model, no such implicit trust exists. Here’s how Google’s Zero Trust model would work: Identity and Device Verification: The employee attempts to log in through Google’s SSO, where their identity is verified using MFA. BeyondCorp checks if the device being used is a trusted, compliant device by consulting Google’s Device Inventory. If the device is missing a security update or is not encrypted, access is denied until the device is compliant. Context-Aware Access: Google’s Access Proxy examines additional context, such as the employee’s location (public Wi-Fi network) and device posture. Because the user is accessing from an untrusted network, the system applies stricter security policies. The employee may be asked for additional verification, such as a second MFA prompt, or have restricted access to only specific parts of the application. Real-Time Monitoring: While the employee is logged in, Google continuously monitors the session for any suspicious behavior, such as unusual data access patterns or changes in device posture. If abnormal activity is detected, Google’s system triggers an alert and can immediately terminate the session to prevent data compromise. Secure Access: Even while accessing sensitive data, the entire communication is encrypted both in transit and at rest, ensuring that no data is exposed on the public Wi-Fi network. Google’s encryption standards protect all data during access. 
In this way, Google's Zero Trust model ensures verification of identity, device, and context at every step and significantly reduces the risk of unauthorized access and breaches.

I hope that after reading the article up to this point, you are now looking for information on F5 Zero Trust security. I have collected links to some of the very good articles available on DevCentral and F5, which will definitely help you:

Zero Trust Solutions
What Is Zero Trust Security & Architecture?
Secure Corporate Apps with a Zero Trust Security Model
Zero Trust in an Application-Centric World
Zero Trust - Making use of a powerfull Identity Aware Proxy
Zero Trust Access with F5 Identity Aware Proxy and Crowdstrike Falcon | DevCentral
Leverage Microsoft Intune endpoint Compliance with F5 BIG-IP APM Access - Building Zero Trust strategy
Zero Trust building blocks - Leverage NGINX Plus Single Sign-On (SSO) with F5 XC Web App & API Protection (WAAP)
Zero Trust building blocks - F5 BIG-IP Access Policy Manager (APM) and PingIdentity

(HTTP) Redirection via Arbitrary Host Header
Does that title sound familiar to you? It is something we see through in support cases; quite often when a customer has had a PCI audit or penetration test conducted against their web properties. It sounds alarming, but often has a very simple cause, and protecting against it is often also quite simple! What is the Host header? If we go way back to the earliest webservers and HTTP/1.0, RFC1945 didn’t include a specification for a Host header. Instead, it was assumed that the host (IP address) receiving the request was the only intended destination, and that the server was only serving a single website. Obviously, it became apparent to the architects of the modern world-wide web (Tim Berners-Lee and all the others named in the HTTP RFCs) that more flexibility was required, specifically, the ability for a single target IP address to host more than one website under more than one domain (OK, there’s more to it than that – the role of Proxies is also important here, but irrelevant to our current discussion.) To enable that, the “Host:” header was added to RFC2616, the HTTP/1.1 specification document, which would allow a single server to understand which “virtual host” an incoming request was destined for and, through that, serve multiple domains on one system. There are two ways to satisfy that requirement of HTTP/1.1: By sending a “Host:” header along with the request, specifying the desired target (see fig. 1.1) By sending an “Absolute URI” rather than a relative one, with the URI containing the hostname (see fig. 1.2) (See Section 19.6.1.1 of RFC2616 for more information) GET /index.html HTTP/1.1<CRLF> Host: www.example.com<CRLF> <CRLF> Fig 1.1: An example HTTP/1.1 request with Host header GET http://www.example.com/index.html HTTP/1.1<CRLF> <CRLF> Fig 1.2: An example HTTP/1.1 request with Absolute URI What could go wrong? Quite a lot of things, it turns out! There are all sorts of potential problems – many or most of which are now, thankfully, fixed in all of the common webserver and proxy software available today, but still, we must be wary of things like: Host header confusion If a request includes both a Host: header and an Absolute URI, which is used (the RFC is clear here) and do all systems in the request path agree? Server-Side Request Forgery (SSRF) attacks By including special characters (like @) in a URI, can we coerce a proxy to forward on a request which has been modified in an unexpected fashion? Password reset attacks An attacker might be able to abuse the password reset functionality on a legitimate website by manipulating the Host header, causing the website to send a manipulated, malicious password reset link to the victim’s user account contact details, thereby tricking the victim into visiting a phishing website rather than the legitimate site. Web cache poisoning attacks This is a large and complex topic and relates to much more than just the Host header, but a system which trusts a manipulated Host header may make cache poisoning easier for an attacker to perform. Malicious redirects Finally, we arrive at the topic which started this whole article: malicious redirects to an arbitrary destination. Let’s dive into that one more deeply than the others… Redirection via Arbitrary Host Header Let’s be honest for a moment – the real problem here isn’t that you can cause the target system to generate a redirect to an injected host. 
That’s perhaps not ideal but doesn’t describe any kind of vulnerability; an attacker can’t manipulate the host header on a victim’s system (without having already compromised the victim’s system in some way) and can’t have the reflected, malicious, host header sent to anyone but themselves… ...Unless they can. In the real world, utilizing such a flaw means carrying out one of the other kinds of attack I mentioned earlier; perhaps you can trigger the server to send a redirect (a 302 response with a Location: header) to your arbitrary malicious destination and cause that response to be cached by an intermediate proxy to be subsequently served to other users? Now you’ve poisoned a web cache and anyone you send to the legitimate site via a phishing attack will ultimately be redirected to your malicious domain. Alternatively, the over-trust in the Host header, shown by its use in the responses Location header, might just be a pointer to an attacker, letting the attacker know that they should try to get the vulnerable system to emit the malicious host in other content, like a password reset email. So, what am I saying? I’m saying that the “Redirection via Arbitrary Host Header Manipulation” result we commonly see in vulnerability scans is not, in and of itself, necessarily something to be alarmed about. An attacker being able to send a manipulated redirect back to themselves is next to useless, but it’s a pointer indicating a system might be vulnerable to other attacks that a scanner can’t easily determine in an automated fashion. Unfortunately for us, it’s also often a PCI audit failure, even if the application architecture isn’t vulnerable in a meaningful way. How do we fix it? In part, that depends on why you’re seeing the problem in the first place, so let’s examine some common scenarios: iRules It’s quite common to redirect from HTTP to HTTPS using an iRule – there’s even a built-in iRule on BIG-IP called _sys_https_redirect for that purpose – and without any other checks, the following kind of rule will result in a redirect being generated for whatever host name was received (in other words, you’ll get dinged for “Redirection via Arbitrary Host Header Manipulation” on your audit): when HTTP_REQUEST { HTTP::redirect https://[getfield [HTTP::host] ":" 1][HTTP::uri] } You could fix this by hard-coding the redirect response, of course, and having a single iRule per target application, and that is the most secure option assuming each virtual server only handles traffic for one application; something like this: when HTTP_REQUEST { HTTP::redirect https://www.example.com/[HTTP::uri] } If you need to support multiple applications per virtual server, then your next-best option would be to use a Data Group to define the valid allowed hostnames and then only redirect if the incoming Host header matches one of the hosts in the data group. There’s an excellent answer for this by Kai Wilke, here: https://community.f5.com/discussions/technicalforum/handling-www-with-host-name-redirects-in-irule/27048/replies/27050 BIG-IP Local Traffic Policies It is also quite common to use Local Traffic Policies to redirect HTTP requests, for example to perform an HTTP-to-HTTPS redirect in a more performant way than an iRule. 
You can still achieve safety here by using the same techniques as for iRules: define the redirect rule to only act when expected host names are received and to drop all other traffic.

BIG-IP Advanced WAF (ASM)

To make preventing this kind of vulnerability incredibly easy, BIG-IP Advanced WAF has a feature called "HTTP redirection protection" which can be configured and enabled on any ASM policy. Configuring it is quite straightforward and is described in K04211103: Configuring HTTP redirection protection; just remember to make sure you have enabled blocking for the policy and enabled Block for the "Illegal redirection attempt" violation under Policy Building -> Learning and Blocking Settings!

NGINX

For NGINX, you just need to be careful when setting up any redirects and use a hard-coded host element rather than taking the resulting hostname from the incoming (potentially attacker-supplied) Host header. In other words, don't do this:

location / {
    return 302 https://$host$request_uri;
}

Do this instead:

location / {
    return 302 https://example.com$request_uri;
}

Something else to point out here – it's very common for administrators to use '$uri' when constructing redirects, but doing so can open you up to header injection and/or response splitting; be sure to use '$request_uri' instead, whenever possible.

That's all for now!

That's all I'm going to cover in this article – there are other ways you can be vulnerable to open redirects (for example, if you take an HTTP parameter and use that to construct a subsequent redirect) which aren't covered here and are a much broader topic. For this article, I chose to concentrate only on the exact report we see across so many PCI audits and vulnerability scans. I will say, though, that BIG-IP Advanced WAF's HTTP redirect protection will protect you against many, if not all, of the other ways you can be vulnerable, because that protection applies to the redirect itself, i.e., to the HTTP response, rather than the request. For that reason (and many, many others), I'd strongly recommend investigating BIG-IP Advanced WAF if you don't already use it!

As always, feel free to leave any comments or questions below and I'll try to get back to everyone, and thanks for reading this far!

What is Web Cache Exploitation?
Let’s talk about Web Cache Exploitation. There was a presentation at BlackHat/DEF CON 2024 discussing this, and here is the link to a writeup by the presenter: https://portswigger.net/research/gotta-cache-em-all

That article details how different HTTP servers and proxies react when presented with specially crafted URLs, and how these discrepancies can be abused in different types of web cache attacks. My goal here is to give a brief overview, discuss how NGINX can be involved, and cover possible mitigations. It is a good idea to reference the original article as well, since I am only summarizing pieces of it here and the researcher did a great job of writing it up.

Definitions:

First, here are a few terms that will be used in this article:

Web caching – the process of storing copies of web files either on the user’s device or in a third-party device such as a proxy or Content Delivery Network (CDN). The purpose is to speed up the serving of static content by presenting it from the store instead of the backend server, which saves time and resources. Web caches use keys to determine which responses should be stored; these keys usually incorporate the URL in some fashion and map to the stored response.

Web Cache Poisoning – the act of inserting fake content into the cache, causing clients to inadvertently pull content they never intended to request.

Web Cache Deception – the act of tricking a cache into storing dynamic content from the backend server because it appears to be static. This can be especially bad if the data is intended for an authenticated user.

Delimiters – one or more characters in a sequence that indicate a separation (end/beginning) of the elements in a stream of text or data. An example is the question mark in a URI indicating that a query string is starting.

Normalization – concerning web traffic, the process of standardizing data for consistency across network paths. We see this a lot with web traffic using % notation for certain characters, such as %20 for a space.

Detecting Delimiters and Normalization:

The article explains that the RFC (https://datatracker.ietf.org/doc/html/rfc3986) states which characters are used as delimiters. The issue is that the RFC is very permissive and allows each implementation to add to that list. The article then gives a few examples of how to detect which delimiters backend servers or caches actually use, which helps determine whether there is a discrepancy between them. For example, it shows sending a request for /home and then a request for /home$abcd to see whether the responses match. The same approach can be used to see whether the cached response is served up when specific delimiters are used.

The second discrepancy the article discusses is normalization. Using delimiters, the path is extracted and then normalized to resolve any encoded characters or dot-segments. To explain what those are: encoding is sometimes used when a delimiter character needs to be interpreted by the application rather than the HTTP parser, for example %2F used instead of a forward slash /. Dot-segment normalization is a way to reference a resource from a relative path, often referred to as path traversal, for example ../ used to move back one directory. The RFC specifies how to encode URLs and handle dot-segments, but it does not say how a request should be forwarded or rewritten, which makes it hard to predict which vendors will agree with each other.
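To make the probing idea concrete, here is a minimal sketch of how you might compare responses for a plain path, a path with a candidate delimiter, and an encoded dot-segment. The host target.example and the paths are hypothetical placeholders, not targets from the article:

# Baseline response for a known path
curl -s -o /dev/null -w "%{http_code} %{size_download}\n" 'https://target.example/home'

# Same path with a candidate delimiter appended; if this matches the baseline,
# the backend is likely treating '$' (and everything after it) as a delimiter
curl -s -o /dev/null -w "%{http_code} %{size_download}\n" 'https://target.example/home$abcd'

# Encoded dot-segment; a response matching /world (rather than a 404) suggests
# the component decodes %2f and then normalizes the path
curl -s -o /dev/null -w "%{http_code} %{size_download}\n" 'https://target.example/hello..%2fworld'

Repeating the same probes directly against the backend and then through the cache is what exposes a discrepancy between the two.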
Similar to what was done in the delimiter section, the article gives several examples of how to detect discrepancies in decoding behavior. For example, it provides a table listing different cache proxies and HTTP servers and how each treats a request for /hello..%2fworld: NGINX resolves this to /world, whereas Apache does not normalize it at all.

Deception:

Cache rules are used to determine whether a resource is static and should be stored. The discrepancies mentioned in the last section can be leveraged to exploit cache rules, possibly leading to dynamic content being stored. The article describes different attributes that cache proxies may use to decide whether a resource is static: static extensions, static directories, and static files.

Static extensions may include file types such as .css, .js, .pdf, and more. Some proxies have rules set up that allow responses with these extensions to be cached. An example given in the article is where the dollar sign is a delimiter on the backend server but not on the proxy, which can cause the response for a specific path to be cached when it should not be. Normalization discrepancies can be used to exploit this as well, by encoding a delimiter. Example: a request for /account$static.css will be stored by the proxy because of the .css extension, but because of the delimiter, the response from the backend is actually for /account, which may contain a client’s authorized account data.

Static directory rules are those that match the path used for the request; common examples are /static, /shared, /media, and so on. This is similar to static extensions, in that delimiter and normalization discrepancies can be used for exploitation. The trick is to hide a path traversal after a character that is a delimiter on the backend server, then place the static directory after the path traversal, so the proxy resolves it but the backend server does not. Example:

Request: /account$/..%2Fstatic/any
Cache proxy sees: /static/any
Backend server sees: /account

Static files are files that may not necessarily be in a static directory or have a static extension but are expected to stay static on every site, such as /robots.txt or /favicon.ico. Exploiting these rules is similar to exploiting static directories; the example above would look the same except with 'static/any' replaced by 'robots.txt'.

Poisoning:

If an attacker can get the cache to store a chosen response under a key of their choosing, then any user whose request maps to that key will be served the attacker’s response. Delimiter and normalization discrepancies can be exploited to carry out cache poisoning, and in combination it may be possible to get a poisoned cache key that points to a highly visited page. There are several variations, including key normalization, delimiters used by the backend server, and delimiters used by the cache on the frontend.

Key normalization may happen before the cache key is generated. This can allow poisoning of the mapped resource if the backend server interprets the path differently. It is similar to the static directory example above: if a path traversal is placed between the path meant for the backend server and the path you want cached, you may be able to map one to the other. Example:

URL: /path/../../home
Cache key: /home
Backend server: /path

As this shows, it is possible to create a cache entry whose key points to /home but whose stored response is for /path.
So, when a user visits /home they will not receive the page they expect; instead they will get the page the malicious actor wanted them to get.

Server delimiters can be used for this when the cache does not use the same delimiter. The delimiter prevents the backend server from fully resolving the path, which lets the attacker control the key under which the response is cached. This is similar to the last example, but with the delimiter placed before the path traversal. Example:

URL: /path$/../home
Cache key: /home
Backend server: /path

Cache delimiters are harder to exploit, since few of the special characters that browsers will actually send are treated as delimiters by web caches. The pound sign can work, though, as some caches use it as a delimiter. This is similar to the previous example but the other way around: the backend server’s path comes after the traversal. Example:

URL: /path#../home
Cache key: /path
Backend server: /home

Mitigation/Defense:

The first thing to note is that none of this means vendors are doing anything wrong with their products. The differences in how each handles normalization and delimiters are expected, given the freedom each has to add its own options. I also mentioned that I would discuss how NGINX could be involved in these kinds of attacks. Naturally, since NGINX can be used as both a proxy and a web server, it can be involved in these types of transactions, so it really comes down to how NGINX handles normalization and delimiters compared to any web cache in the same path. The author of the article does a great job of comparing multiple vendors for backend servers, CDNs, and frameworks.

The first defense is to choose products that align in how they parse requests, reducing the opportunities for these discrepancies to arise. The next defense, and probably the best design choice, is to use cache controls to prevent caching of pages that should never be cached. This means adding a 'Cache-Control' header with values of 'no-store' and 'private' to any dynamically generated responses, and also ensuring that no cache rules can override the header that is set (a minimal NGINX sketch of this appears at the end of this article). Another option is to add a WAF into the path of the traffic. Just looking at many of the requests used in these examples, I can see that ASM/Advanced WAF or NGINX App Protect would be pretty effective at stopping a lot of them.

Path traversal and meta-characters

One thing the article discusses in regard to NGINX is how it handles the newline-encoded byte (%0A) in a rewrite rule; this byte is used as a path delimiter by NGINX. A common use of the rewrite rule is to use the regex (.*) to carry the rest of the path over to the new location. For example:

rewrite /path/(.*) /newpath/$1 break;

This will work in most situations, but if the newline-encoded byte is present, matching stops at that delimiter. For example:

/path/test%0abcde ---> /newpath/test

You can see how the path gets cut off once the encoded byte is hit. I did some research on this and found a similar situation with the return directive in NGINX: https://reversebrain.github.io/2021/03/29/The-story-of-Nginx-and-uri-variable/ This blog shows how a Carriage Return Line Feed (CRLF) can be used to inject a header into the response.
I tested this by firing up an NGINX container and adding a location configuration to my nginx.conf file like this:

server {
    location /static/ {
        return 302 http://localhost$uri;
    }
}

I then sent a request with the encoded CRLF (%0D%0A) followed by the header I want injected:

curl "http://127.0.0.1:8081/static/%0d%0aX-Foo:%20CLRF" -v
* Trying 127.0.0.1:8081...
* Connected to 127.0.0.1 (127.0.0.1) port 8081
> GET /static/%0d%0aX-Foo:%20CLRF HTTP/1.1
> Host: 127.0.0.1:8081
> User-Agent: curl/8.6.0
> Accept: */*
>
< HTTP/1.1 302 Moved Temporarily
< Server: nginx/1.27.0
< Date: Thu, 15 Aug 2024 18:15:46 GMT
< Content-Type: text/html
< Content-Length: 145
< Connection: keep-alive
< Location: http://localhost/static/
< X-Foo: CLRF    <----- header injected
<
<html>
<head><title>302 Found</title></head>
<body>
<center><h1>302 Found</h1></center>
<hr><center>nginx/1.27.0</center>
</body>
</html>
* Connection #0 to host 127.0.0.1 left intact

That blog also describes how to avoid this by changing the return directive to use $request_uri instead of $uri or $document_uri. This made me wonder whether it is possible to similarly modify the rewrite directive to avoid the issue of the newline-encoded byte being used as a path delimiter. After searching, I found this page on GitHub: https://github.com/kubernetes/ingress-nginx/issues/11607 which then links to: https://trac.nginx.org/nginx/ticket/2452 These pages discuss the issue of the newline-encoded byte being used as a delimiter. The response in the ticket was to use the regex modifier (?s) to enable single-line mode. I re-configured my NGINX container to add another couple of locations so I could test this:

server {
    location /static/ {
        return 302 http://localhost$uri;
    }
    location /user/ {
        rewrite /user/(.*) /account/$1 redirect;
    }
    location /test/ {
        rewrite /test/(?s)(.*) /account/$1 redirect;
    }
}

So now I have two rewrite directives, one for reproducing the issue and one for testing the workaround. First, a request against the unmodified rewrite:

curl "http://127.0.0.1:8081/user/%0d%0aX-Foo:%20CLRF" -v
* Trying 127.0.0.1:8081...
* Connected to 127.0.0.1 (127.0.0.1) port 8081
> GET /user/%0d%0aX-Foo:%20CLRF HTTP/1.1
> Host: 127.0.0.1:8081
> User-Agent: curl/8.6.0
> Accept: */*
>
< HTTP/1.1 302 Moved Temporarily
< Server: nginx/1.27.0
< Date: Thu, 15 Aug 2024 18:56:48 GMT
< Content-Type: text/html
< Content-Length: 145
< Location: http://127.0.0.1/account/%0D    <--- Newline delimiter was hit.
< Connection: keep-alive
<
<html>
<head><title>302 Found</title></head>
<body>
<center><h1>302 Found</h1></center>
<hr><center>nginx/1.27.0</center>
</body>
</html>
* Connection #0 to host 127.0.0.1 left intact

For the first test, the path was cut off at the newline-encoded byte, as expected. Now to test the workaround:

curl "http://127.0.0.1:8081/test/%0d%0aX-Foo:%20CLRF" -v
* Trying 127.0.0.1:8081...
* Connected to 127.0.0.1 (127.0.0.1) port 8081
> GET /test/%0d%0aX-Foo:%20CLRF HTTP/1.1
> Host: 127.0.0.1:8081
> User-Agent: curl/8.6.0
> Accept: */*
>
< HTTP/1.1 302 Moved Temporarily
< Server: nginx/1.27.0
< Date: Thu, 15 Aug 2024 19:32:50 GMT
< Content-Type: text/html
< Content-Length: 145
< Location: http://127.0.0.1/account/%0D%0AX-Foo:%20CLRF    <------- Appears to have worked.
< Connection: keep-alive
<
<html>
<head><title>302 Found</title></head>
<body>
<center><h1>302 Found</h1></center>
<hr><center>nginx/1.27.0</center>
</body>
</html>
* Connection #0 to host 127.0.0.1 left intact

Changing the regular expression to enable single-line mode prevents any confusion being introduced by newline characters. This is just an FYI, as I thought it was interesting to see the issues raised by others in the past and the suggestions that were given.

Last Thoughts:

First of all, I would like to thank Michael Hedges and Parker Green, both from F5 Networks, for bringing this to our attention. As shown in the examples and in the researcher’s article, these types of attacks are not extremely difficult to carry out and can have very significant ramifications in specific scenarios. Taking this into account when setting up a site is key; that includes configuring pages to use cache controls and choosing which vendors to use for both web servers and web caching proxies. The article I referenced at the beginning does a good job of breaking down how each vendor handles different scenarios, which makes it a great reference point to start with.
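To make the Cache-Control guidance from the Mitigation/Defense section concrete, here is a minimal NGINX sketch. The location paths and the upstream name app_backend are hypothetical placeholders, not configuration taken from the article:

location /account/ {
    # Dynamic, per-user responses: tell shared caches never to store them
    add_header Cache-Control "private, no-store" always;
    proxy_pass http://app_backend;
}

location /static/ {
    # Genuinely static assets can still be cached
    add_header Cache-Control "public, max-age=3600";
    root /usr/share/nginx/html;
}

Whatever cache or CDN sits in front of this also needs to be configured to honor, rather than override, the origin’s Cache-Control header.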
Scuba Gear from CISA, ROBLOX Malware Campaign, and RUST backdoo-rs

Hello, this week Jordan_Zebor is your editor looking at the notable security news for Scuba Gear from CISA, a ROBLOX malware campaign, and a Rust-based meterpreter named backdoo-rs.

Scuba Gear from CISA

ScubaGear is a CISA-developed tool designed to assess and verify whether a Microsoft 365 (M365) tenant’s configuration aligns with the Secure Cloud Business Applications (SCuBA) Security Configuration Baseline. This tool ensures that organizations are following CISA’s recommended security settings for cloud environments, helping to identify vulnerabilities or misconfigurations in their M365 setup. The value of running ScubaGear lies in its ability to enhance an organization’s cybersecurity posture, mitigate risks, and maintain compliance with security standards, which is crucial for protecting sensitive data in cloud-based systems. ScubaGear addresses the growing need for secure cloud deployments by automating the assessment process, making it easier for IT and security teams to identify gaps and take corrective actions. Regular assessments with this tool can help reduce the chances of data breaches, unauthorized access, and other security threats, thereby maintaining the integrity and confidentiality of business operations. Additionally, it supports organizations in staying ahead of compliance requirements by ensuring they meet the security baselines recommended by CISA.

ROBLOX Malware Campaign

Checkmarx recently discovered a year-long malware campaign targeting Roblox developers through malicious npm packages that mimic the popular “noblox.js” library. The attackers used tactics like brandjacking and typosquatting to create malicious packages that appeared legitimate, aiming to steal sensitive data like Discord tokens, deploy additional payloads, and maintain persistence on compromised systems. Despite efforts to remove these packages, new versions keep appearing on the npm registry, indicating an ongoing threat.

RUST backdoo-rs

The article "Learning Rust for Fun and backdoo-rs" describes the author's journey of learning Rust by developing a custom meterpreter. While Rust is designed to avoid common programming errors, ensuring software is secure from the outset, using it to create red teaming tools is also a great use case. A key aspect I covered recently is how Rust helps eliminate vulnerabilities like buffer overflows and use-after-free errors. These are traditionally common in C and C++, but Rust's ownership model prevents such risks by ensuring safe memory usage. In addition, Rust's growing adoption in the cybersecurity community, driven by companies like Google and Microsoft, emphasizes its role in secure software development, underscoring the "secure by design" principles that CISA advocates for. Projects like "backdoo-rs" demonstrate Rust’s potential for secure, reliable development in any context.
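As a small illustration of the ownership guarantees mentioned above (my own sketch, not code from backdoo-rs), the following Rust snippet will not compile if the last line is uncommented, because the buffer has already been moved; the equivalent pattern in C would compile and become a use-after-free:

fn consume(buf: Vec<u8>) {
    // Takes ownership of the buffer; it is dropped (freed) when this function returns
    println!("processed {} bytes", buf.len());
}

fn main() {
    let buf = vec![0u8; 16];
    consume(buf); // ownership of `buf` moves into consume()
    // println!("{}", buf.len()); // error[E0382]: borrow of moved value: `buf`
}

The compiler enforces at build time what C and C++ leave to the programmer, which is exactly the class of memory-safety bug the article says Rust removes by construction.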