Shape
Using Shape IBD to protect OIDC IdP from Bot Attacks in Open Banking scenarios
Introduction

Open Banking implementation standards reserve ample space for describing the security controls needed to secure access to APIs, with particular focus on end-user consent management. The mechanisms ensuring that end-users can securely give banks their consent for third-party fintechs to perform banking operations on their behalf are the bedrock of Open Banking standards.

Banks are required to put in place Strong Customer Authentication (SCA) methods that allow the end-user to log in to the OIDC IdP / OAuth Authorization Server and give their consent for the access token to be generated. This implies the existence of a login form, most often reinforced with multi-factor authentication methods. While these methods provide a good measure of defense against less sophisticated threats, bot networks can still serve as platforms for launching application denial-of-service or advanced financial fraud attacks, so warding off bots keeps the defense one step ahead of the attackers.

Shape represents best-in-class bot defense available in the market today, relying on a managed service model backed by advanced AI/ML models and dedicated SOC teams. Integrated Bot Defense (IBD) is the first self-service offering from Shape, encapsulated in an easy-to-use form factor. Customers don't need to route their traffic to the Shape cloud; IBD integrates with the Shape backend systems through API calls instead. Customers can also manage the entire onboarding process through the self-service dashboard available on F5's Cloud Services portal, which allows quick addition of new applications to be protected.

IBD supports BIG-IP as an insertion point in the customer environment, with more methods to be added. To assist with BIG-IP deployment of IBD, the onboarding process makes available for download a per-application customized iApp template that doesn't require deep BIG-IP expertise to install and provides a wizard-like way of configuring IBD.
Setup

The high-level diagram of the lab used to simulate an Open Banking deployment is shown below, along with the roles performed by each element:

The Open Banking workflow for an authorized end-user is described below:

- The user logs into the Third Party Provider application ("client") and creates a new funds transfer.
- The TPP application redirects the user to the OAuth Authorization Server / OIDC IdP (PingFederate).
- The user provides their credentials to PingFederate and gets access to the consent management screen, where the required "payments" scope is listed.
- If the user agrees to give consent to the TPP client to make payments out of their account, PingFederate generates an authorization code (and an ID Token) and redirects the user to the TPP client.
- The TPP client opens an MTLS connection to the IdP, authenticates itself with a client certificate, exchanges the authorization code for a user-constrained access token, and attaches it as a bearer token to the /domestic-payments call sent to the API gateway over an MTLS session authenticated with the same client certificate.
- The API Gateway terminates the MTLS session and obtains the client certificate, authenticates the access token by downloading the JSON Web Keys from PingFederate, checks that the hashed client certificate matches the value found in the token, and grants conditional access to the backend application.
- The App Protect WAF and DoS modules perform advanced security checks before releasing the request to the backend server pod.

OpenBullet2 launching credential stuffing attacks

Integrated Bot Defense installation

- To install IBD, log in to your F5 Cloud Services account, select Integrated Bot Defense and click the Add Application button.
- Select BIG-IP as the insertion point and click Next.
- Ensure the Web App application type is selected (default) and input the name of the application. Click Save.
- Download the iApp template, import it into BIG-IP and create an Application.
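Stepping back to the API gateway step of the workflow above, the check that the hashed client certificate matches the value found in the token can be sketched as follows. This is a minimal sketch, not the lab's actual gateway configuration. It assumes the access token is a JWT that carries the certificate thumbprint in a `cnf` claim under `x5t#S256`, as RFC 8705 defines for certificate-bound access tokens; the article does not state which claim the lab uses. The function names are illustrative, and signature validation against the JSON Web Keys is deliberately left out.

```python
import base64
import hashlib
import hmac


def cert_thumbprint_s256(cert_der: bytes) -> str:
    """Base64url-encoded SHA-256 digest of a DER-encoded certificate, without padding."""
    digest = hashlib.sha256(cert_der).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def token_bound_to_cert(claims: dict, cert_der: bytes) -> bool:
    """Return True when the token's cnf thumbprint matches the presented certificate.

    `claims` is the already signature-verified JWT payload (assumption: the
    thumbprint lives at claims["cnf"]["x5t#S256"]).
    """
    expected = claims.get("cnf", {}).get("x5t#S256")
    if not expected:
        return False  # token is not certificate-bound, so refuse it
    # Constant-time comparison of the two thumbprint strings
    return hmac.compare_digest(expected, cert_thumbprint_s256(cert_der))


# Illustration only: placeholder bytes stand in for a real DER certificate.
client_cert = b"placeholder-der-bytes"
claims = {"scope": "payments", "cnf": {"x5t#S256": cert_thumbprint_s256(client_cert)}}

print(token_bound_to_cert(claims, client_cert))         # True
print(token_bound_to_cert(claims, b"some-other-cert"))  # False
```

Binding the token to the certificate means that a stolen access token is of no use to a caller who cannot also present the TPP client's certificate over MTLS.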
For step-by-step guidance and details on the iApp configuration options, consult the Integrated Bot Defense Configuration Guide for BIG-IP.

Testing

We used OpenBullet2 to simulate a malicious bot performing credential stuffing attacks against the OIDC IdP / OAuth Authorization Server, PingFederate. Although OpenBullet2 is configured to use valid credentials, no hit is registered after 3 attempts because IBD blocks all login attempts generated by bots.

Examining the F5 Cloud Services dashboard, we can see how IBD identifies and blocks bot sessions while allowing human sessions to pass through. The 3 malicious bots blocked correspond to the 3 attempts made by OpenBullet2; the valid human traffic was generated in the background.

Conclusion

Integrated Bot Defense brings the sophisticated AI/ML technologies used by Shape Defense in a package that is very easy to deploy and configure, with minimal configuration changes to infrastructure components such as BIG-IP. The ability of IBD to detect advanced tools like OpenBullet2 makes it ideal for securing high-value targets such as Open Banking consent management infrastructure. In this article we demonstrated how IBD can be deployed inside the customer's infrastructure, using a BIG-IP device as an insertion point, to protect a PingFederate server acting as OIDC IdP / OAuth Authorization Server in Open Banking scenarios.

Resources

The UDF lab environment used to build this configuration can be found here.

F5 Bot Defense for Salesforce Commerce Cloud – Protect Your E-Commerce Site From Unwanted Bots and Illegitimate Traffic (1 of 2)
This article is the first in a two-part series. Go to Part 2 here.

Introduction

Effective security matters to every retailer of every size because attacks continue to increase, whether engineered by humans or automated by bots. To help all our e-commerce customers succeed, F5 has made security easy to adopt and offers a wide range of integrations, including with cloud-based commerce platforms like Salesforce Commerce Cloud (SFCC). F5 Bot Defense integrates directly into the SFCC storefront and protects your digital business against unwanted bots and illegitimate traffic. Website owners and developers can gain full visibility and protect against credential stuffing, fraud and abuse attacks, and other advanced attacks that bypass traditional security controls.

In this article, you will learn how to configure and customize the F5 Bot Defense solution for your SFCC site. The solution is delivered as a certified cartridge and supports both legacy SiteGenesis Salesforce Commerce Cloud e-commerce sites and modern Storefront Reference Architecture (SFRA) sites.

Note: This article contains a fair number of references to Shape Security-related offerings, including Shape Enterprise Defense. Shape Security was acquired by F5 in 2020 and many of the products and offerings are currently undergoing a rebranding effort. For a period of time, you will continue to see the Shape branding reflected in the user interface, some settings, and occasionally in product references.

Figure 1: F5 Bot Defense for Salesforce Commerce Cloud

Deployment Steps

The F5 cartridge can be deployed with Storefront Reference Architecture (SFRA), a controller-based SiteGenesis site, or a pipeline-based SiteGenesis site. The deployment steps outlined below were tested for Salesforce B2C Version 21.7, Compatibility mode 21.2, and SFRA version 6.0.0.

Prerequisites

F5 Bot Defense requires an API key and a header prefix string for your e-commerce website to connect to the backend engine.
Please contact your F5 account team or F5 customer services for any help in obtaining the API key and the prefix string.

Step 1: Install the F5 Cartridge and Import the Metadata

First, you will install the F5 cartridge and set up the Business Manager to integrate F5 Bot Defense with SFCC.

Download and Install the F5 Bot Defense Cartridge

Deploy the F5 Bot Defense cartridge using Salesforce UX Studio for Eclipse. Alternatively, you can use Visual Studio Code with the Prophet Debugger extension.

- Download the F5 cartridge from the SFCC LINK Marketplace by clicking on the Download Integration button.
- Establish a new digital server connection with your SFCC instance.
- Import cartridges to the workspace in Salesforce UX Studio.
  Figure 2: F5 Bot Defense cartridge imported into the workspace within Salesforce UX Studio
- Add the cartridge to the Project Reference of Server Connection.
  Figure 3: F5 Bot Defense cartridge added to the Project Reference of Server Connection
- Wait until the Studio completes the workspace build and uploads the source code to the sandbox.

Assign the F5 Bot Defense Cartridge to the Storefront Site

- In the SFCC Business Manager portal, navigate to Administration > Sites > Manage Sites.
- On the Storefront Sites webpage, click on your site name.
- Next, click on the Settings tab on the site webpage.
- At the beginning of the cartridge path, add the following: int_f5:int_f5_sfra:
- When done, press the Apply button.
  Figure 4: F5 Bot Defense cartridge path added to the Storefront site

Import the Metadata

To add the newly configured setting to the Storefront site, you will need to import the pre-defined metadata: Open the downloaded cartridge package and navigate to the /metadata/f-five folder. Click on the Sites folder and rename the RefArch folder to the ID of your storefront site specified in the Business Manager. Then, zip the f-five folder. Navigate to Administration > Site Development > Site Import & Export.
Under the Upload Archive section, upload the f-five.zip file and click on the Import button.

Figure 5: Import the pre-defined metadata for the site using the ‘Site Import & Export’ feature

Continue reading Part 2 here.

F5 Bot Defense for Salesforce Commerce Cloud – Protect Your E-Commerce Site From Unwanted Bots and Illegitimate Traffic (2 of 2)
This article is the second in a two-part series. Go to Part 1 here.

Step 2: Set Up the Integration

You will identify the endpoints and customize several settings in the F5 cartridge.

Custom Objects

The integration uses custom objects to configure endpoints that should be protected. Custom objects are stored locally (per Site). Navigate to Merchant Tools > Custom Objects > Manage Custom Objects.

There are three custom object types:

- BotProtectedEndpoints - describes the protected endpoint behavior
- SAFEEndpoints - describes the protected endpoint behavior for SAFE mode
- GETScrapingEndpoints - describes the protected endpoint behavior ISTL

BotProtectedEndpoints and GETScrapingEndpoints have the same structure. SAFEEndpoints have only ‘id’ and ‘paths’ fields. The custom object stores a list of all protected endpoints and describes their behavior for different F5 Shape solutions.

The example below outlines how to configure the account-login-post object as a protected endpoint. Select the object type based on the subscribed mode and click on the Find button. In the results, click on the account-login-post object id and select a Mitigation Action.

Figure 6: Sample configuration to define a protected endpoint

Custom Site Preference Groups

Here, you will specify the values of various options to customize the F5 integration. Navigate to Merchant Tools > Custom Site Preferences Groups > Site Preferences > Custom Preferences and click on Shape. Enter the values for Telemetry Header Prefix, F5 Shape API hostname, and API key, obtained from F5.

Figure 7: Sample configuration to specify the values for connecting to the F5 Bot Defense back-end engine

Scroll down to Specify F5 Shape JS URL or Path. Enter the JS URL.
In the Select location for JS tag(s) option, choose one of the following, based on your preferred location to insert the JS tag:

- After head (head)
- After tail (tail)
- Before script (script)

Figure 8: Sample configuration to specify the values for F5 Shape JS URL and its path

In the Insert JS tag(s) in only specific web pages (entry pages) option, select either Yes or No.

- The No choice inserts the JS tag into all the web pages.
- The Yes choice provides an additional option to specify the web pages into which the JS tag needs to be inserted.

Figure 9: Sample configuration to assign the JS tag to specific entry pages

This completes the F5 cartridge configuration. When done, click on the Save button at the top right-hand corner of the web page.

Step 3: Verification

To test the F5 Bot Defense integration with SFCC, emulate a malicious request from a client machine to your e-commerce website.

From Browser

Access and log in to your SFCC site from the browser. Inspect the web page source; you will notice the JS inserted by the SFCC.

Figure 10: JS insertion

You will also notice the prefix string and the telemetry headers passed in the HTTP POST.

Figure 11: Telemetry headers inserted in the HTTP POST

Now, disable JavaScript support in the settings of the client browser and log in to your site. F5 Bot Defense will identify this HTTP request as malicious web traffic and will block the request ('Block' is the mitigation action selected for the account-login-post in the custom objects).

Figure 12: F5 Bot Defense blocked the request from the JS disabled browser

F5 Bot Protection Manager

Access your F5 Bot Protection Manager portal to see all the client requests to your e-commerce site. You will see all the shoppers' traffic to the storefront; the login request from the JavaScript-disabled browser that was used to emulate bot traffic will be flagged by F5 Bot Defense in red as malicious.
Figure 13: Malicious bot traffic detection by F5 Bot Defense

The F5 Bot Defense integration with SFCC using the certified cartridge is an easy-to-deploy solution that seamlessly works with the Storefront Reference Architecture. With this industry-leading MI-driven security, your digital business is safeguarded in real-time with superior accuracy & long-term efficacy. Deploy the cartridge from the SFCC Link Marketplace to minimize the impact of Bots on your business, confidently.

Additional Resources

- F5 Bot Defense integration for SFRA sites: Configuration Guide
- F5 Bot Defense integration for SiteGenesis sites: Configuration Guide
- Solution Lightboard: YouTube Video
- Salesforce partnership: Technology Alliance on F5.com

Building a Fraud Profile with Device ID+ (Part 2 - Analytics and Reporting)
Overview

Today there are at least 4.9 million websites using reCAPTCHA, including 28% of the Alexa top 10,000 sites. Google’s reCAPTCHA service is a free offering and developers have been using it for years to try and defend against automation. Many cyber security vendors embed it in their core offering where customers pay subscription service fees for these vendors to “manage” reCAPTCHA and the data it produces.

TL;DR - reCAPTCHA is probably causing revenue leakage, false positives and allowing abuse of the web properties that it’s deployed on.

How do I know? Watch this demo video I put together the other day. The link is time bookmarked to start at 5:13. The video shows me logging in across 2 different browser sessions and reCAPTCHA returning false positives. Additionally, a simple search on Google or Github reveals the problems that developers face when using reCAPTCHA.

reCAPTCHA is embedded into the web’s top sites because it’s free and seems to work. Or does it? What do I mean by “seems to work”? It depends on the business and the type of website, but “works” in this context typically means making the bot or automation problem go away.

While developers have been laser focused on solving problems around botnets, fraudulent users, and overall noise hitting the system, they’ve forgotten about user friction and revenue. In fact, revenue and user friction may be an afterthought for most developers because they don’t see the opportunity to remove friction and, again, they have tunnel vision on fixing one particular problem. (For the record I’m not blaming developers, just stating the reality of most engineering organizations and how tasks are managed.)

Let’s take a step back. Fraud detection is a framework within web applications and the creation and ongoing maintenance warrants a good bit of architecture and thought. That’s why I’ve been on a pursuit to research and expose the development of a “Fraud Framework” or “Fraud Profile” across the cybersecurity industry.
I want to give developers a resource for greenfield projects and rewrites. A place that is open and information flows freely to make the web a safer place. That’s why I started the conversation in Part 1 and I plan on writing as many articles as possible to open up this well-kept secret across the industry.

Experienced security engineers are highly sought after because they have been through the pains of developing these systems and architectures. Not only creating them, but making use of the fraud data that they generate. My goal is to make this knowledge freely accessible through this series of articles. This article and video serve as a stepping-stone in my research.

In Part 1, we reviewed a simple NodeJS web application that implements a device identifier service to keep the existing fraud system in check, remove user friction and defend against fraudulent activity. The article demonstrates how to add F5’s Device ID+ to an existing application that already uses a basic bot defense or fraud scoring system – in this case reCAPTCHA.

Let’s take a look at the live data from our example application deployed at deviceid.dev/v3 to:

- Analyze user login and transactional scoring data.
- Understand how well our fraud scoring system is working by looking at good user data.
- Gain insight into how Device ID+ improves Fraud analytics.
- Find areas to make changes to our application to remove user friction.

Conclusion and Next Steps

The example analysis of our Fraud Profile is just the beginning of what can be accomplished with a trustable device identifier. Analytics around user behavior and malicious intent can now be uncovered in new ways in fraud reporting. Device ID+ usage spans a broad set of use cases across the enterprise and is complementary to any existing fraud or bot solutions.

If you have input or ideas on how you’d like me to extend this article series, please mention them in the comments below.
For more information regarding the technical details around Device ID+, see the documentation here. If you’d like to add Device ID+ to your own application, you can sign up for a free account here.

2021 Credential Stuffing Report on F5 Labs
Over the last few years, security researchers at F5 and elsewhere have identified credential stuffing as one of the foremost threats. In 2018 and 2019, the combined threats of phishing and credential stuffing made up roughly half of all publicly disclosed breaches in the United States. Now it is February 2021 and the tech industry is reeling from the twin shocks of the theft of FireEye’s red team tools and the SolarWinds Orion supply chain attack. We at Shape & F5 Labs anticipate there will be many more announcements and unwelcome discoveries surrounding credential spills and, it is important to point out, these campaigns also presented an opportunity for attackers to achieve persistence in the environments of thousands of organizations.

For this year, we have renamed the Credential Spill Report (previously published by Shape Security, now part of F5) to the 2021 Credential Stuffing Report. We did this in order to look at the entire lifecycle of credential abuse, dedicating much time and effort not just to quantifying the trends around credential theft but also to understanding the steps that cybercriminals take to adapt to and surmount enterprise defenses.

Some key findings in the report include:

- The number of annual credential spill incidents nearly doubled between 2016 and 2020.
- Despite consensus about best practices, industry behaviors around password storage remain poor.
- Median time for discovering a credential spill between 2018 and 2020 was 120 days; the average time to discovery was 327 days.

There are many more. Head over to the F5 Labs 2021 Credential Stuffing Report to see more key findings, dive into the details around terminology and real-world data, look at lifecycle analysis around theft, fraud, sale, and abuse, and lastly, look at some steps you can take to minimize your exposure to the threats around credential stuffing.

Building a Fraud Profile with Device ID+ (Part 1)
Overview

End-to-end architecture for IT fraud and security systems is an opaque space and best practices are usually held within the silos of large corporate cybersecurity teams - and for good reason. Cyber vendors are often the only ones who can connect the dots across customers and find pain points that need to be solved. Luckily for me, I have been able to sit down with security experts across all major industry verticals to discuss those pain points. For years, I have assisted their usage of cybersecurity point solutions (e.g., WAF, Bot, Fraud, etc.) from the perspective of API security, server-side exploits, client-side vulnerabilities, and so on.

One piece of technology that is common across all cybersecurity architectures is some form of end user or device identifier. It is the single thread that runs across the entire technology stack and each organization uses it to drive fraud prevention and critical business analytics. Creation of an identifier starts when users interact with an application and provide input to it. Normally this happens when the user logs in, creates a new account, or creates a post or comment. This identifier is typically a traditional cookie from a browser fingerprinting solution created in-house or supplied by a third-party service. It is the way organizations identify and track their users and ultimately how they improve their business.

At F5, we help security teams across the world’s top organizations understand their users better. Are they lying about their identity? Are they a known good user? Are they committing fraud, or do they appear to be malicious? We have made a large investment (see Shape Security) in creating an identifier that is based on unique signals and, most importantly, trusted by the security and fraud teams who use it. This identifier is known as Device ID+ and it is now available as a free service to anyone who wants to use it.
Device ID+

Device ID+ was created to address the following problems with existing web-based identifiers and fingerprinting solutions:

- Over 30% of users cannot be tracked due to cookie churn.
- Frequent changes due to the likelihood that one browser will create multiple identifiers.
- Identifiers are reset after users clear the cache or go into incognito mode.

Device ID+ leverages JavaScript to create an identifier that solves the issues of traditional user tracking through cookies. Developers can include a simple JavaScript tag (as shown in the example code) and use it in their application to determine if a user account is good, bad, encountering a bad user experience, has been compromised, and more. One of the major strengths of Device ID+ is that it persists across users who clear or reset their browser and you’ll have an opportunity to see this in action below.

The purpose of this article is to give you a quick rundown on what Device ID+ is, why it’s important, and how to use it within your application. As a demonstration, I am going to inject Device ID+ into an existing login form that uses Google’s reCAPTCHA service.

Google reCAPTCHA

Google reCAPTCHA is the service that shows you pictures of things to verify that you are human. I am not going to address some of the most critical shortcomings of the reCAPTCHA approach, but since it’s a free service and many websites use it to manage bots, I thought it would make a great example of how Device ID+ can be used to strengthen any existing bot or WAF solution. In later articles we’ll trace Device ID+ from its creation to its consumption in fraud analytics.

Preventing Application Abuse

Since all users are born or recognized at login, I’m going to start with a simple login form. Login is where most of the fraud and malicious activity starts, and that’s why reCAPTCHA has been used over the years as a free service to try to prevent abuse.
Today we are going to create what’s known as a Fraud Profile with Device ID+ and we’ll use it in later articles to supercharge our fraud analytics and gain visibility into things like:

- Fraudulent behavior of automated bots
- Fraudulent or malicious posts and commenting
- Fraudulent user account creation
- Good user friction and unnecessary CAPTCHA challenges

About the Demo Application

This is a very simple demo application that shows how to layer Device ID+ into an existing application. See it in action at https://deviceid.dev/v3

If you wish to run this example locally as a Docker container, you can deploy it with the following command after installing Docker:

    docker run -d -p 80:8000 wesleyhales/deviceid.dev

Open a browser and visit: http://localhost/v3

Demo Walkthrough

For starters, go ahead and log in to the application with your email address or any made-up value for the username. There’s no need to enter a password.

Fig. 1-1

After you click Submit, you will see a description of the data that was captured. This is our Fraud Profile (Fig 1-2) that we have created for our users. It uses Device ID+ to encapsulate the reCAPTCHA score along with a timestamp of when the transaction took place.

Fig. 1-2

Fraud Profiles are viewed differently across the cybersecurity industry. Some security teams build Fraud Profiles around credit card transaction data and others build them throughout specific flows across web pages. Device ID+ can be applied to any Fraud Profile and is built to be used on every page of the application. The more you use it, the more you can enhance good user experiences and/or eliminate actual fraud.

The following JSON shows how the example app adds a reCAPTCHA signal to our Device ID+ Fraud Profile:

Example of Device ID+ based Fraud Profile

Fig. 1-3

Normally, developers would simply capture the score returned from the server side reCAPTCHA API and take action (0.9 in Fig 1-3 above).
This score might be used in the authentication logic within the application, simply allowing the user to log in if it’s above 0.7. It might also be sent downstream with additional user data to be recorded in a SIEM.

The Device ID+ based Fraud Profile provides a structure around existing “scores” or data. This gives us an extendable framework that is decoupled from existing solutions and makes the identifier technology abstract. In our Fraud Profile, the Device ID+ information is located immediately following the username for a couple of reasons:

- Now we can identify how many different devices a single username is using. Is this account being shared? Is it compromised? Does it violate our terms of service? All of this can be answered by using Device ID+ under the system wide unique identifier (usually this is the username, or an email address as seen in the example).
- It also brings visibility to important user experience unknowns. Is this a good user who spends money regularly but is encountering too many reCAPTCHA challenges? It is a way to keep your current bot or fraud verification system in check to ensure friction is removed for your good user interactions.

The Differentiator

As users log in, they will acquire a new Device ID+ cookie which contains the following values.

Fig. 1-4

- diA is known as the “residue-based identifier”. It is the main identifier used directly after the username in our example. This value is stored locally on the device and may be deleted if the user clears their local storage or cookies.
- diB is known as the “attribute-based identifier”. This value will remain the same even when the user clears local storage. Keep in mind, it can change if the user upgrades their browser version as it is based on environment signals that remain consistent across browser versions.

One easy way to test this feature is to log into the demo application with the same email address twice but using two different browser sessions.
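The way the two identifiers hang off a username can be sketched in a few lines of Python. This is a minimal sketch and not the demo application's code: the dictionary layout, the `signals` and `recaptcha` field names, and the sample identifier values are assumptions made for illustration; only the diA / diB names and the 0.7 threshold come from the text.

```python
LOGIN_SCORE_THRESHOLD = 0.7  # example threshold mentioned in the text


def record_login(profiles, username, di_a, di_b, recaptcha_score, timestamp):
    """Store one login under username -> residue identifier (diA)."""
    devices = profiles.setdefault(username, {})
    device = devices.setdefault(di_a, {"diB": di_b, "signals": []})
    device["signals"].append({"recaptcha": recaptcha_score, "timestamp": timestamp})


def device_counts(profiles, username):
    """Return (distinct diA values, distinct diB values) seen for a username."""
    devices = profiles.get(username, {})
    return len(devices), len({d["diB"] for d in devices.values()})


def low_score_logins(profiles, username, threshold=LOGIN_SCORE_THRESHOLD):
    """List signals that did not score above the threshold (possible user friction)."""
    return [
        signal
        for device in profiles.get(username, {}).values()
        for signal in device["signals"]
        if signal["recaptcha"] <= threshold
    ]


# Same user, same browser: once in a regular window, once in incognito mode.
# Hypothetical values: the residue identifier differs, the attribute one does not.
profiles = {}
record_login(profiles, "user@example.com", "diA-regular", "diB-1", 0.9, "2021-03-01T10:00:00Z")
record_login(profiles, "user@example.com", "diA-incognito", "diB-1", 0.3, "2021-03-01T10:05:00Z")

print(device_counts(profiles, "user@example.com"))          # (2, 1)
print(len(low_score_logins(profiles, "user@example.com")))  # 1
```

Keying the profile by username first keeps both questions cheap to answer: how many devices an account is using, and whether a known device keeps receiving low scores.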
For this test, log in once in your regular browser and then again with the same browser in incognito mode.

Fig. 1-5

In Figure 1-5, we see that the Device ID+ residue values are different for a single username, but the Device ID+ attribute is the same.

Conclusion and Next Steps

Now that we understand what makes the Device ID+ identification service unique, we can begin to craft ways to take advantage of it in our business analytics. In part 2 of this article series, we are going to analyze the user data from the live demo at https://deviceid.dev/v3 to visualize anomalies and areas where user friction might be occurring.

Device ID+ usage spans a broad set of use cases across the enterprise and is complementary to any existing fraud or bot solutions. If you have input or ideas on how you’d like me to extend this article series, please mention them in the comments below.

For more information regarding the technical details around Device ID+, see the documentation here. If you’d like to add Device ID+ to your own application, you can sign up for a free account here.

Mining JavaScript Events for Fun & (Preventing Someone Else’s) Profit
For a deep dive analysis on a specific attack or suspicious transactions on a customer’s web application, one of my favorite things to examine is the interaction with the website. There is a wealth of JavaScript information that can be collected to figure out how someone or some bot behaved on the page. In this post, I’ll review some of the key items I investigate to get a sense of the interaction with the page, specifically for desktop web browser data.

What’s up with that mouse?

I look at mouse clicks and mouse movements. For mouse clicks, one of my favorite phenomena is what we call a “Magic Mouse.” A Magic Mouse can click in all the right places, but never travels between them. It just magically appears in different locations without any trail. Of course it’s possible to miss some mouse movements depending on how often the JavaScript polls to track them. But if you know the polling conditions, you can easily rule that out as the cause of missing mouse movements.

Another interesting component to mouse clicks is where they occur. The best locations to see are negative coordinates, which means the click occurred off the visible screen. In case you are wondering how that’s humanly possible, the answer is - it’s a bot, not a human. If you’re a bot, the world is your oyster and you can click the right web elements, regardless of where they might be - on or off your screen.

For mouse movements, I like to look for straight-line tracks. Is it theoretically possible for humans to move their mouse in a straight line? Sure. But if you think this actually happens in real life, then I invite you to draw a “straight line” with your mouse and let me know how it turns out. Moreover, I’d like to know the last time you logged into your bank account while also moving your mouse in a straight line. Beyond straight lines, I also look for identifiable patterns in tracks like fixed or otherwise predictable increases/decreases in the x or y direction.
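The mouse checks described so far can be sketched as small Python functions. This is a minimal sketch under stated assumptions, not a production detector: the event format (tuples of kind, x, y), the function names, and the half-pixel tolerance are all illustrative choices, and real data would need the polling rate taken into account first.

```python
import math


def offscreen_clicks(clicks):
    """Return clicks with a negative coordinate, i.e. off the visible screen."""
    return [(x, y) for x, y in clicks if x < 0 or y < 0]


def magic_mouse_jumps(events):
    """Count consecutive clicks at different positions with no movement between them.

    `events` is a time-ordered list of (kind, x, y) tuples, kind being "move" or "click".
    """
    jumps = 0
    last_click = None
    moved = False
    for kind, x, y in events:
        if kind == "move":
            moved = True
        elif kind == "click":
            if last_click is not None and last_click != (x, y) and not moved:
                jumps += 1
            last_click = (x, y)
            moved = False
    return jumps


def is_straight_line(points, tolerance=0.5):
    """True when every point lies within `tolerance` pixels of the first-to-last line."""
    if len(points) < 3:
        return False
    (x0, y0), (x1, y1) = points[0], points[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        return False
    for x, y in points[1:-1]:
        # Perpendicular distance from (x, y) to the line through the endpoints
        distance = abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length
        if distance > tolerance:
            return False
    return True


def constant_step(points):
    """True when successive x deltas, or successive y deltas, are one fixed nonzero value."""
    if len(points) < 3:
        return False
    dxs = {b[0] - a[0] for a, b in zip(points, points[1:])}
    dys = {b[1] - a[1] for a, b in zip(points, points[1:])}
    return (len(dxs) == 1 and dxs != {0}) or (len(dys) == 1 and dys != {0})


track = [(0, 0), (10, 10), (20, 20), (30, 30)]
print(is_straight_line(track))                                      # True
print(constant_step(track))                                         # True
print(offscreen_clicks([(120, 80), (-5, 40)]))                      # [(-5, 40)]
print(magic_mouse_jumps([("click", 10, 10), ("click", 200, 300)]))  # 1
```

None of these signals is conclusive on its own; they are cheap features to compute per transaction and then compare across many transactions.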
Polling frequency can matter a lot in your ability to identify these movement patterns. They are generally more nuanced and difficult to spot but are very fun to watch. This process can also be generalized by considering the entropy of the movement. Human users typically have higher entropy in their movement data while bots may struggle to introduce the right amount of randomness in mouse moves.

How about those key events?

Under normal circumstances, every key movement should have three events: key down, key up, and key press. The first thing to check is whether every movement has all three of these events. It’s odd to find key down events with no key up events or key press events without either up or down events. Such anomalous key movements would be worth further investigation. Key events happening at the same time as mouse movements are also suspicious. It’s difficult for a human, but not for a bot.

The next thing to examine for key events is timing. I like to think I’m a quick typist, and I can hit around 85 words per minute (please hold your applause). Since the average English word is 4.7 characters, that puts me at about seven characters per second. So if I see transactions that have seven characters in 10 milliseconds, that seems just a little bit suspicious. Conversely, taking 30 seconds to type seven characters also seems a little odd. That would involve a slow hunt and peck process.

The timing between characters can also be interesting. Evenly spaced characters are typically suspicious because humans vary times between different keys based on typing strategy and keyboard layout. Sometimes I also take it a step further and look at the timing between key down and key up events. Too-regular and too-quick timings on these events can also be a red flag, as they fail to align with the organic timings from human typing.

Along with general timing and cadence, copy and paste events are always of interest.
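The key-event completeness and timing checks above can be sketched as follows. This is a minimal sketch, assuming events arrive as (kind, key, time in milliseconds) tuples that use the DOM event names keydown, keypress and keyup; the function names and the 5 ms spread threshold are illustrative assumptions rather than tuned values.

```python
import statistics
from collections import Counter


def unmatched_keys(events):
    """Return keys whose keydown, keypress and keyup counts disagree."""
    counts = {}
    for kind, key, _time_ms in events:
        counts.setdefault(key, Counter())[kind] += 1
    return sorted(
        key
        for key, c in counts.items()
        if len({c["keydown"], c["keypress"], c["keyup"]}) != 1
    )


def keystrokes_per_second(down_times_ms):
    """Typing rate computed from the intervals between successive keydown timestamps."""
    if len(down_times_ms) < 2:
        return 0.0
    elapsed_ms = down_times_ms[-1] - down_times_ms[0]
    if elapsed_ms <= 0:
        return float("inf")
    return (len(down_times_ms) - 1) * 1000 / elapsed_ms


def evenly_spaced(down_times_ms, max_spread_ms=5.0):
    """True when the gaps between keystrokes barely vary (suspiciously regular)."""
    gaps = [b - a for a, b in zip(down_times_ms, down_times_ms[1:])]
    if len(gaps) < 2:
        return False
    return statistics.pstdev(gaps) < max_spread_ms


events = [
    ("keydown", "a", 0), ("keypress", "a", 1), ("keyup", "a", 60),
    ("keydown", "b", 100),  # no keypress or keyup ever arrives for "b"
]
print(unmatched_keys(events))                      # ['b']
print(keystrokes_per_second([0, 100, 200, 300]))   # 10.0
print(evenly_spaced([0, 100, 200, 300]))           # True
print(evenly_spaced([0, 80, 250, 310, 520]))       # False
```

For scale, the roughly seven characters per second quoted above would sit comfortably below the 10.0 in this example, while seven characters in 10 milliseconds would produce a rate in the hundreds.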
By copy and paste, I specifically mean the use of command/control-c and command/control-v, not a browser autofill. Depending on the particular webpage and the completed fields, the level of suspiciousness can vary significantly for copy and paste events. Copy and paste events for a password, for example, don’t usually sway me one way or another because I envision a lot of people who need to retrieve their password from another application. Which applications are good and bad for this purpose is a subject for a different post. Copy and paste events for a username, although slightly less common perhaps, are probably also not too interesting. But copy and paste events for a social security number, or a home address, well now we are talking. All good things with context, as they say.

What are the screen dimensions?

When I consider screen dimensions, I look at the dimensions for the available real estate, meaning the screen itself, and the browser window. As a single piece of information, the screen dimensions aren’t wildly informative. A lot of real users have their screens set up for multitasking or multi-viewing. And there isn’t always a lot of variation between the screen dimensions used by millions of real humans. But sometimes the screen dimensions are useful for tracking attack traffic over time if they are sufficiently uncommon. In addition, browser windows that are small in relation to the available real estate might set off red flags, particularly if combined with mouse events like those negative-coordinate clicks. And multitasking setups, combined with visibility events, may also seem suspicious.

Speaking of visibility events, were there any?

A visibility event occurs when the user clicks off the main screen and allows another window to become active. If you envision your normal login process, you probably don’t venture off the login page. There are of course legitimate reasons you might move away based on a distraction or retrieving a password.
Combined with keyboard events though, like those suspicious copy and paste events, visibility events can be significant.

Additional data as needed

These categories, including mouse events, key events, screen dimensions, and visibility events, cover the main items I investigate. But there is a treasure trove of behavioral information that can be collected and explored through JavaScript. Moreover, there is a lot more you can do once you consider groups of transactions. Semi-odd data seen once can typically be explained away. But if that same data is seen in multiple transactions, then more red flags may be raised. I hope the items discussed here provide you with a springboard for diving into your own data to investigate suspicious traffic.

Have Hypothesis, Will Test
In this article I detail the approach we, members of the Shape Intelligence Center team, take when we do a rigorous analysis of customers’ threat data and automation. In this example I’ll walk you through a two-sample statistical test that will stand up to deep technical scrutiny, and provide valuable insight to our customers.

When we provide quarterly threat briefings to customers, an inevitable question we get is: “How am I doing compared to my peers?” The easiest way to respond to this question is to provide aggregated comparisons for specific application and platform flows. For example, we can tell Customer X that 17% of their 2020 Q2 web login traffic was automated, while 13% of 2020 Q2 web login traffic for their peers in Industry Y was automated. So, one option for a response to this question is neatly summarized in a bar chart.

Figure 1: Login Automation for 2020 Q2 for Customer X and Peers in Industry Y

Although this graphic is a stunning example of the basics of Python’s matplotlib, I’m sure all math lovers like me took one look at it and immediately asked: “Is this difference significant?” Aggregations have a sneaky way of hiding the details, and Customer X’s overall automation might be influenced by one or two massive attacks. To answer this question, we need to dive into what happened during 2020 Q2 to decide if Customer X really has more automation than their peers. And this answer is usually what customers actually want to hear, even if their question wasn’t: “What are the results of two-sample statistical tests that you ran to determine whether our automation is statistically different from that of our peers?” In this post, I’ll walk through this example to explain how to decide if 17% is truly bigger than 13%, and discuss how this work can actually tell customers how they’re doing.

Step 1: Start at the beginning

A natural way to compare all of 2020 Q2 for Customer X and their peers is to look at the data on a daily basis.
So the first step is to acquire the daily web login automation percentages.

Figure 2: Daily Login Automation % for 2020 Q2 for Customer X and Peers in Industry Y

Automation tends to be erratic and unpredictable, and the scatter plot certainly reflects that. Both sample sets appear to have some outliers, but Customer X does seem to have more automation for the first half of 2020. Our next step is to review some summary statistics for both sample sets, namely the means, medians, and standard deviations. In both cases, we can see the mean is influenced by erratic spikes in automation and the median automation percentage is subsequently lower. This result is further verified with a high standard deviation for both Customer X and Peers in Industry Y. But the similarity between those two standard deviations means it is reasonable to compare these two sets. More importantly though, Customer X has a higher mean and a higher median, indicating it’s safe to test this hypothesis. Whether we care about the mean or median specifically depends on what test we select, which is our next step.

Step 2: Assumptions can make a bad test selection out of you and me

The basic statistical testing most of us are familiar with typically assumes the data is approximately normal. I could expound upon what normal means for a long time, but that’s not the purpose of this post and I might accidentally start a war between parametric and nonparametric statisticians. For my work, I use two things: intuition about the data (usually supported by a scatter plot), and visual inspection of a probability plot. I intuitively believe daily automation percentages are non-normal, and the scatter plot from earlier appears to back this claim. The stats portion of scipy provides a built-in function to generate probability plots, so we can easily inspect them for Customer X and the peer data.
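As a rough sketch of that check, the snippet below computes the summary statistics mentioned earlier and calls `scipy.stats.probplot`, the probability-plot function in scipy's stats module. The helper name `normality_summary` and both data sets are invented for illustration; they are not Customer X's or the peers' data. Passing `plot=matplotlib.pyplot` to `probplot` would also draw the plot; here only the correlation `r` of the best-fit line is kept.

```python
import numpy as np
from scipy import stats

def normality_summary(daily_pct):
    """Mean, median, sample standard deviation, and probability-plot fit for one sample."""
    data = np.asarray(daily_pct, dtype=float)
    # probplot returns ((theoretical quantiles, ordered data), (slope, intercept, r)).
    (_, _), (_, _, r) = stats.probplot(data, dist="norm")
    return {
        "mean": float(data.mean()),
        "median": float(np.median(data)),
        "std": float(data.std(ddof=1)),
        "probplot_r": float(r),  # close to 1.0 when points hug the best-fit line
    }

# Synthetic 91-day sample: low baseline with occasional large spikes (illustrative only).
spiky = [10 + 0.5 * (i * 7 % 13) + (60 if i % 13 == 0 else 0) for i in range(91)]
# Reference sample built from evenly spaced normal quantiles (illustrative only).
bell = stats.norm.ppf((np.arange(1, 92) - 0.5) / 91)

spiky_summary = normality_summary(spiky)
bell_summary = normality_summary(bell)
```

In the spiky sample the mean sits above the median and the fit correlation falls well below that of the bell-shaped reference, which is the same pattern the post describes reading off the plots by eye.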
Figure 3: Probability Plots for Customer X and Peers in Industry Y

These plots compare the quantiles of the two sample sets against the quantiles of a normal distribution. What we want to see, in order to claim the data is approximately normal, is a scatter plot that roughly falls around the best-fit line, shown in red. What we don’t want to see, in order to claim the data is approximately normal, is exactly what we do see: a clear shape to the scatter plot that is NOT on the best-fit line. The probability plots, combined with our initial idea about the data, indicate that we should not assume normality. As a result, we have to pick a test for non-normal data.

The second common assumption of many two-sample statistical tests is that the two sets of data have equal variances. My approach is to avoid theoretical musings on this assumption, and test it directly using Levene’s test. Levene’s test is specifically designed to determine whether two or more groups have the same variance. It’s perfectly suited for figuring out what assumption is relevant for the data, and is readily available in scipy’s stats. The null hypothesis for Levene’s test is that the variances are equal. And we can see with that massive p-value that we fail to reject the null hypothesis. As a result, we now have the two assumptions we need to select our test: our data is non-normal, and our variances are equal.

Step 3: Determine where to insert “statistically significant” into your groundbreaking results

Given the assumptions, the test for us is Mann-Whitney U, which also goes by approximately 15 other names. Specifically, we want to test the hypothesis that Customer X’s login automation percentages are higher than their peers’. In a surprise to no one, we can run this test easily with scipy’s stats. Wow, look at that tiny p-value!
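For readers who want to reproduce the two tests, a minimal sketch with `scipy.stats.levene` and `scipy.stats.mannwhitneyu` might look like the following. The wrapper `compare_to_peers`, the 0.05 significance level, and the synthetic daily percentages are assumptions for illustration; the actual customer and peer data are not shown in this post.

```python
from scipy import stats

def compare_to_peers(customer_pct, peer_pct, alpha=0.05):
    """Run Levene's test, then a one-sided Mann-Whitney U test, on two daily samples."""
    # Levene's test: the null hypothesis is that the two variances are equal.
    _levene_stat, levene_p = stats.levene(customer_pct, peer_pct)
    # Mann-Whitney U: the alternative is that customer values tend to be higher.
    _u_stat, mwu_p = stats.mannwhitneyu(customer_pct, peer_pct, alternative="greater")
    return {
        "levene_p": float(levene_p),
        "equal_variances": bool(levene_p >= alpha),  # fail to reject equal variances
        "mwu_p": float(mwu_p),
        "customer_higher": bool(mwu_p < alpha),
    }

# Synthetic daily automation percentages for a 91-day quarter (illustrative only).
peers = [8 + (i * 7 % 13) + (40 if i % 13 == 0 else 0) for i in range(91)]
customer = [12 + (i * 5 % 13) + (40 if i % 13 == 6 else 0) for i in range(91)]

result = compare_to_peers(customer, peers)
```

With these made-up samples, Levene's test gives a large p-value (no evidence of unequal variances) and the one-sided Mann-Whitney U test gives a very small one, mirroring the outcome described above. The `alternative="greater"` argument is what makes the test one-sided in the direction of the hypothesis.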
Our groundbreaking results are that Customer X’s daily login automation percentages for 2020 Q2 are statistically significantly higher than their peers’ daily login automation percentages for 2020 Q2, based on two-sample testing using the Mann-Whitney U test. But perhaps these are not the exact groundbreaking results we want to convey to Customer X.

Step 4: Translate those groundbreaking results into a statement that actually makes sense

This translation can take many forms. My general approach when talking to Customer X is to say something along these lines: “Based on statistical testing, your login automation for 2020 Q2 was higher than your peers.” That’s really the key point we need to convey. And they probably are not interested in the exact methods I used, although I’m always happy to explain. Usually I would expound upon that statement a little bit more to say, “We analyzed all of the data for 2020 Q2, and confirmed that your higher automation level was not just the product of a few large attacks.” These two simple sentences convey to Customer X that we thoroughly compared how they were performing in relation to their peers. Although we are likely to still show them the simple bar chart above, we have done the work to rigorously support any conclusions we draw from that basic graphic.

As you can see, two-sample statistical testing will really let you tell a customer how they are doing relative to their peers. It is important that you select the right test, so checking your assumptions (normal data, equal variances) will make sure you provide the right results. Questions or comments are welcome.

How to Setup Shape Log Analysis in Fastly
Update 8/3: Shape Log Analysis is now a supported log streaming endpoint on Fastly. Read the full details here.

Shape Log Analysis is a non-invasive technique used to analyze HTTP and application logs for a clearer view into attackers that are bypassing current security measures. Oftentimes bad actors, botnets, and drive-by attacks will consume system resources and commit fraud against APIs in the form of Credential Stuffing, Scraping, Account Takeover and more. Without the proper defenses in place, these attacks are a pain to stop for most security teams, who are forced to play “whack-a-mole” with solutions that are not built to permanently defeat fraudulent and automated attacks.

Shape Security has a unique corpus of data from attacks that have been identified and blocked over the years for the world's largest banks, airlines, hotels and many other types of infrastructure exposed to the public internet. This anonymized attack data is used to examine application logs, revealing automation and fraud that is bypassing perimeter security mechanisms and making its way to your origin servers. By analyzing data points in Layer 7 traffic, Shape will create a threat assessment on old and new campaigns that are currently attacking specific parts of your applications.

Log Analysis Example - Figure 1

The visualization shown in Figure 1 represents all malicious and fraudulent traffic against a specific application. The green pattern hidden in the back is the normal diurnal flow of legitimate user traffic. All other colors are automated attacks driving abuse of APIs and important parts of the application. This type of reporting can be used not only to understand types of attacks and abuse but also to create a plan for integrating a mitigation solution.
Types of attacks that will be uncovered: Credential Stuffing, Account Takeover, Scraping, API Abuse, and System Resource Consumption.

Getting Started

Shape Log Analysis is a free service that is now integrated with Fastly CDN. To avoid complications of compressing, securing and manually sending log data to Shape, we now have the ability to securely send logs to Shape through Fastly's real time log streaming configuration. This is a simple “flip of the switch” configuration, doesn't involve sending any PII data to Shape, and gives organizations the visibility required to take action and prevent these types of attacks.

To configure Fastly CDN for Shape Log Analysis, follow these steps:

1) Request a secure S3 Bucket from Shape (send an email to fastly@f5.com with title "Fastly Log Streaming Setup"). Once Shape has set up your designated S3 bucket, you will receive an email with a private access key that will be required to complete the configuration in the next step. Keep in mind that Shape uses network and security access controls between Fastly and AWS to ensure data is kept private and confidential. If there are any concerns around how log data is kept safe and secure, please ask in the setup request email.

2) Follow Fastly’s well-written instructions on creating a new log endpoint and copy in the Shape specific configuration from below.
Log format for Shape Log Analysis (Non-PII data)

{
  "timestamp": "%{begin:%Y-%m-%dT%H:%M:%S%z}t",
  "ts": "%{time.start.sec}V",
  "id.orig_h": "%h",
  "status_code": "%>s",
  "method": "%m",
  "host": "%{Host}i",
  "uri": "%U%q",
  "accept_encoding": "%{Accept-Encoding}i",
  "request_body_len": "%{req.body_bytes_read}V",
  "response_body_len": "%{resp.body_bytes_written}V",
  "location": "%{Location}i",
  "x_forwarded_for": "%{X-Forwarded-For}i",
  "user_agent": "%{User-Agent}i",
  "referer": "%{Referer}i",
  "accept": "%{Accept}i",
  "accept_language": "%{Accept-Language}i",
  "content_type": "%{Content-Type}o",
  "geo_city": "%{client.geo.city}V",
  "geo_country_code": "%{client.geo.country_code}V",
  "is_tls": %{if(req.is_ssl, "true", "false")}V,
  "tls_version": "%{tls.client.protocol}V",
  "tls_cipher_request": "%{tls.client.cipher}V",
  "tls_cipher_req_hash": "%{tls.client.ciphers_sha}V",
  "tls_extension_identifiers_hash": "%{tls.client.tlsexts_sha}V"
}

S3 Bucket Details

When you receive the S3 Bucket confirmation from the fastly@f5.com email address, it will contain the following 5 items that you'll need to insert into your Fastly configuration.

1.) Bucket Name
2.) Access Key
3.) Secret Key
4.) Path
5.) Domain

Click on "Advanced options" and add the following:

After completing the setup, your configuration summary for Shape Log Analysis will look like the following:

Once the Fastly logging configuration is complete, logs will be sent to Shape's secure S3 bucket for analysis. Typically we collect around two weeks’ worth of log data to provide a comprehensive analysis of attack traffic. Additionally, an F5 or Shape representative will be available to provide support during the logging setup and a Threat Assessment Report will be provided as part of the service.

Additional Information on Shape and Fastly