White Box Web Application Pentesting


Posted on January 15, 2021 by 46o60

Tags: break info


Intro #

This writeup is all about the so-called "white box" approach to pentesting web applications. This is sometimes called "[Secure|Source] Code [Review|Audit]", but my preferred way of naming it comes from the Dynamic Application Security Testing (DAST) and Static Application Security Testing (SAST) nomenclature used in the OWASP world. While DAST and SAST are more often used to describe automated tools, I like to use the same terminology for manual pentesting activities, DAST being the typical black box or gray box approach and SAST being source code review. A new kid on the block is the so-called Interactive Application Security Testing (IAST), again mostly used for automated tools. As the way those tools function is close to what is usually referred to as the "white box" approach, using IAST to name it fits my OCD requirements very nicely. However, naming in IT and IT security is chaotic enough, so we will skip going deeper into this and keep using the ambiguous "white box" name for what I am explaining here.

Regardless of the naming, I will define what kind of pentesting I am talking about in this writeup. There will be a couple of variations but in general these are the preconditions for a pentest to be called white box pentest:

  1. Pentester has access to source code of the application.
  2. Pentester can build and run the source code.
  3. Pentester can attach a debugger to the running application.

If you replace "Pentester" with "Developer" you end up with a list of a developer's daily activities. This is why it is called white box: the pentester has access to everything that the application does. Sometimes only some combination of the above preconditions is called white box. For example, having access to the source code and to a running application, but without the ability to build and run the source code or attach a debugger. This can happen when the target application is proprietary and the source code was obtained by reverse engineering or decompiling, or was leaked through some information disclosure. In either case it is not possible to actually compile and run it, for various reasons. A more accurate name would be "beige box" or "yellow-snow-white box" but let's not get back into the naming rabbit hole again.

But how to actually execute a pentest having all these capabilities? From where to start? How to get to the end? I am writing this because I want to share my methodology on how to do this.

The main motivation to create a proper methodology came from the time I was going through the AWAE training and OSWE certification from Offensive Security and the FSWA training from Steven Seeley. I realized that having a well-defined methodology is the only way to be efficient. I was occasionally performing source code reviews even before I took those trainings, but looking back now I feel that having a good methodology brought me light years forward in the quality of the value I deliver with my pentests. The described methodology is absolutely applicable to any AWAE lab or exam machine. I created this methodology before the exam and used it for the exam, which I passed on the first try. In addition, I will enhance and extend it in the future as I learn new techniques, so keep an eye on any updates I make.

A big disclaimer is that my methodology is nothing new, all of it already exists out there. I just summarized it and presented it here in my own way.

So let's finally begin... :)

Notes #

Writing down notes in any pentest is very important, but in a "white box" approach even more so. That is because the amount of information can be overwhelming, especially if the code base is huge. You want to write down everything that you learn about the application: where a specific configuration file is located, the list of discovered routes, the locations of potential vulnerabilities, all of it.

One of the core skills you need to develop to be good at pentesting is knowing how to prioritize the things you will spend your time on. You will do this prioritization and re-prioritization every few minutes as you explore how the application works. Your notes will be your main partner in doing this efficiently. Part of my methodology is a prioritized TODO list, ordered from the most important things to the least important. Basically, whenever you notice something worth investigating you don't jump into analyzing it right away but instead you first put it on the list. Exactly where in the list the item is placed is determined by the information gathered so far during the pentest. As you learn more about the application, some items will get updated with new information, some will be removed from the list and some will be moved to higher or lower positions. This structured approach will ensure your time is spent on the most important things. Some people can jump around following whatever interesting thing they see, but that just did not work for me.
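The prioritized TODO list described above can be kept in any text file, but as a minimal sketch of the idea, here is how it could look as a small Python structure. The class and field names are made up for illustration and are not part of any tool; a lower priority number simply means "look at this sooner".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TodoItem:
    title: str
    priority: int                      # lower number = investigate sooner
    notes: list[str] = field(default_factory=list)


class TodoList:
    """Hypothetical prioritized TODO list for pentest notes."""

    def __init__(self) -> None:
        self._items: dict[str, TodoItem] = {}

    def add(self, title: str, priority: int, note: str = "") -> None:
        # New observations go on the list first, analysis comes later.
        item = TodoItem(title, priority)
        if note:
            item.notes.append(note)
        self._items[title] = item

    def reprioritize(self, title: str, priority: int, note: str = "") -> None:
        # Move an item up or down as you learn more about the application.
        item = self._items[title]
        item.priority = priority
        if note:
            item.notes.append(note)

    def remove(self, title: str) -> None:
        self._items.pop(title, None)

    def ordered(self) -> list[TodoItem]:
        # Most important items first.
        return sorted(self._items.values(), key=lambda item: item.priority)
```

The point is not the code itself but the discipline: every observation gets a position in the list before it gets any of your time.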

In the following sections I will mention many questions that you should ask yourself during the pentest, and many things that you should discover about the application while doing a white box approach. You should write down all of those answers and all of that information in your notes. Sometimes I get lazy and start keeping most of it in my head, and each and every time I regret it later, either because I get another pentest in parallel, or because the pentest is paused for some reason like a vacation, sick leave or just a longer weekend. Every time one of these situations happens and I have written proper notes, I am super grateful to my past self.

Towards the end of the writeup I provide my personal template that I use for my notes. You can, and probably should, adapt it to your way of thinking, but if you are just starting, take it as it is and see how it works for you.

Mission #

First thing that needs to be established is the high level mission that you need to carry in your head the whole time.

I am sure there are different people with different views and motivations, but for me, being somewhat of a control freak, what worked best is to have a constant attitude of trying to understand how the application works. What I mean by this is that the goal should always be to hook into the application as pervasively as possible: attaching a debugger, seeing application logs and database logs, examining every HTTP request and response (and in some cases every TCP packet), configuration files and environment variables, enumerating used frameworks and libraries, details about the operating system, running processes, open ports... we want to know EVERYTHING. The NSA and the Great Firewall should be nothing compared to the insight we want to have into the target application. Only by understanding the application can we understand its vulnerabilities.

How to start #

Starting is hard, especially when you are new to something. Some people like to have numerous lists of strings, each string representing a coding pattern associated with some vulnerability. They search the source code for these patterns and, when they find some, start their analysis from those code sections, working forwards and backwards through the call hierarchy. I will call this "the grepper's approach". While this is a valid approach, I use it in my methodology as a middle or even ending activity, not as a starting activity. As stated in our mission, we want to understand the application, and the more time spent on trying to understand it, the better. Grepping for strings will not give us deep understanding; it will focus our time on triaging true positives from false positives. However, there is important value in it, so we will use it, but only as a sanity check at the end to ensure we did not miss something, or as a tactic to get us out of situations where we feel we hit a wall. The only situation where I would use this as a starting point is when the time allocated for the pentest is too short for the size of the target, which in real life is not a rare thing.

There are two possible setups for the environment that you are testing. The ideal case is where you get access to the source code and then build the running environment yourself. This way you have control early on, and you can see and learn about the application while you are still setting it up. Also, you can make some configuration choices that will make your life easier, like running the application in debug mode, configuring multiple users with different roles, turning security features off, etc. Finally, some things can be flexible, like which operating system it will run on, and you can choose what you are most comfortable working with. The other, less ideal case is when the application is already installed in some environment. This is the case with AWAE, but it can also happen in real life. Sometimes you are testing an appliance where even setting up debugging is hard or impossible, or you are reverse engineering an application. In these cases your knowledge of the system is very limited, and you first need to enumerate everything. The following sections contain the union of things to do in both cases; use them as they make sense.

Decomposition #

We will start by investigating and breaking down the target into components and gathering basic information about each component individually. Remember, this is white box, all information and access privilege is there, you just need to organize it so it makes sense to you.

  • Operating system
    • Name and version.
    • Running processes and services.
    • Open ports and what listens on them.
    • Location of logs.
  • Web server
    • Name and version.
    • List of used configuration files.
    • Location of environmental variable declarations and their values.
    • Location of logs.
  • Database
    • Name and version.
    • Establish CLI (or GUI) connection to the database as highest privileged user.
    • List of created databases.
    • List of tables and their columns.
    • Location of logs.
  • Web application
    • Name and version.
    • Programming language.
    • Location of source code and web root.
    • List of all used frameworks, libraries, templating engines etc., both in frontend and backend.
      • Name and version for each.
    • Location of logs.

Some of this information may only be discovered later. For example, the usage of a templating engine may not be obvious at the start. That is fine; as with everything else in this methodology, the flow is flexible and going back and forth through it is expected.

Inspection #

Once you know the components of the application you can start hooking into them to see what is happening inside. The most intrusive way is of course to attach a debugger, but there are other ways to examine what is happening in the application.

Debugging #

As you build basic knowledge about the application you will become more and more confident in how to set up debugging. How to actually do it depends on the programming language of the application and the IDE that is being used. This is out of scope for this writeup, so I am giving here only a very short overview of what debugging is all about. Most importantly, I am not a master blaster of debugging techniques, so the smartest thing I can say here is that Google is your friend.

To get started you want to accomplish three things:

  • actually connect a debugger
  • successfully set a breakpoint
  • hit that breakpoint

Stop program execution

Standard debugging works by opening the source code in your IDE and running the application with an attached debugger. If you have added breakpoints on some code lines, your application will stop execution when it reaches one of them, and your IDE will switch to debug mode, showing a bunch of useful information about the application state at that moment. You can then step line by line through the execution and see exactly what is happening, tracking how different values propagate through the code. In this case the IDE did all the work of connecting the debugger to the running application, but sometimes you will have to do it manually.

Remote debugging

Remote debugging means that you have the source code locally, but the instance of the application that the debugger is attached to is running remotely. The application could be started by you or it could already be running, as is the case in the AWAE labs and exam, and you just perform the attaching step. There is also a version of remote debugging where both the source code and the running instance of the application are located remotely, on the same or different endpoints. This works by having only the IDE locally on your machine, with some part of the IDE, usually a plugin or part of a plugin, deployed to the remote location. This deployment, and later the viewing of the source and debugging information, is all done through some protocol between your local machine and the remote location. An example is Visual Studio Code with the Remote SSH plugin, again extensively used in the AWAE training.

There are other remote debugging tools. The general advice is to make sure the tools you are using are stable and as fast as possible, as any significant lag will break your focus while reading the code. Explore different options to find what works best.

Echo "debugging"

If the application is written in an interpreted language then you can directly edit the source files in the place from where they are being interpreted and executed, usually the web server's web root. You can use this approach just to indicate that a particular code section was reached based on your input, or you can even manually dump variable values to standard output or log files. This approach can be called "poor man's debugging" and is often seen in the context of PHP applications as the statement die("HERE"); but it can be applied to any other interpreted language.

Useful when no alternative is possible but spending time trying to find an alternative is time well spent.

Logging #

Debugging is basically a process of understanding what is happening in the application in real time as it is running. Examining logs would then be a process of understanding what has happened to the application in the past. If you don't have a debugger attached then logs are an even more important source of information. Locations of all the logs should already be known if you decomposed the application properly. What is left is to investigate what is inside the logs. Start with application logs and database logs but don't be afraid of opening any logs that you have access to.

Depending on the source of the log, being from web application, database, web server or even operating system, different formats will be present inside. You want to understand everything that is written inside. Check the relevant technology to understand their logging formats. If custom format is used, find where it is defined and examine it.

Logging level should be under your control. Find where it is configured and turn on maximal logging that is possible. Traffic you generate will usually be low in a white box approach, so we don't have to worry about filling up the disk with logs. If the output is too much, that is fine, use tools like grep to filter in or out what you need. One trick for this is to use some unique values when you are interacting with the web application, for example adding XXX1337XXX into your inputs. That way you can filter only for log entries that have that unique value.
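The unique value trick above can be sketched in a few lines of Python. This is only an illustration: the helper names are made up, and a plain grep does the same job. It generates a marker that is unlikely to appear anywhere else and keeps only the log lines that contain it.

```python
import uuid


def new_marker(prefix: str = "XXX") -> str:
    """Build a unique marker such as XXX3f9a1c2bXXX to paste into inputs."""
    return f"{prefix}{uuid.uuid4().hex[:8]}{prefix}"


def filter_lines(lines, marker: str) -> list:
    """Keep only the lines that contain the marker.

    `lines` can be any iterable of strings, including an open log file.
    """
    return [line for line in lines if marker in line]
```

Using a fresh marker per test case (one for the login form, another for the search field, and so on) makes it easy to tell which input ended up in which log entry.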

In some cases the application has a so-called "debug mode". Usually it is enabled by a configuration item or an environment variable, and it makes the application output the maximum level of logs; in some cases you can even see very detailed debug information in the frontend when exceptions are triggered. Be sure to identify it and turn it on.

Finally, we want to shorten that time delta between when the log was created and when you examine it. One way to almost get a feel of running a debugger while only examining logs is to watch the tail of the logs live while they are being created, for example with a command tail -f /path/to/log.log. In combination with filtering for unique values as suggested above you can instantly see what is happening in the application when you send a specific input.
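If you prefer to do the live watching from a script instead of tail -f piped into grep, a rough Python equivalent could look like the sketch below. The function name and its parameters are my own invention for illustration; `from_start` and `max_idle_polls` exist only so the function can also be run over an existing file and stop, instead of following forever.

```python
import os
import time


def follow(path, marker="", poll_seconds=0.2, from_start=False, max_idle_polls=None):
    """Yield lines appended to `path` that contain `marker`, like `tail -f | grep`.

    With max_idle_polls=None the generator follows the file until interrupted.
    """
    idle = 0
    with open(path, "r", errors="replace") as handle:
        if not from_start:
            handle.seek(0, os.SEEK_END)   # start at the end, like tail -f
        while True:
            line = handle.readline()
            if line:
                idle = 0
                if marker in line:        # an empty marker matches every line
                    yield line.rstrip("\n")
                continue
            idle += 1
            if max_idle_polls is not None and idle >= max_idle_polls:
                return
            time.sleep(poll_seconds)      # wait for the application to log more
```

A typical use would be a loop such as `for line in follow("/path/to/log.log", "XXX1337XXX"): print(line)` running in one terminal while you send requests from another.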

Proxy #

Web application means HTTP traffic. Keeping in line with our methodology, we want to see and know everything. That is why you want to set up an HTTP proxy between the frontend in the browser on the client side and the backend web application on the server side. You should use whatever gives you the capability to see, modify and reissue all generated HTTP requests. If the web application uses WebSockets then you also want to be able to see and modify those messages. The de facto standard tool is Burp Suite, the Community Edition being completely acceptable, but others are also out there.

Burp Suite specifics

How to set it up and use it is beyond this writeup, however, I will note a few features that are super useful:

  • Install extension Copy as Python requests. After finding vulnerabilities you want to create an exploit script to execute the attack as later sections will explain in detail. This extension will speed up your exploit development as it can automatically generate Python code for any HTTP request that was captured in Burp.
  • Turn off timeout for Repeater requests. Hitting a breakpoint and examining code manually takes time, and by default Burp will time out after a while. This means that once you resume the execution you will not see the actual response in the Repeater.

Other #

Sometimes additional intrusive tactics will be needed, usually when you don't have a debugger attached. Using procexp or procmon on Windows systems or tools like strace on Linux systems could reveal important information on how the application is working.

Finally, if you want to truly see what is going on, on the network level, then a proper packet sniffer is the way to go. Reserved for exotic cases or situations when you feel you are about to go crazy.

Interaction #

You have a good high level understanding of the web application, and you are hooked into its execution, now is the time to start generating some traffic. Open the browser and start interacting with the application. Basic steps that you want to do here are almost the same ones that pentesters do in "black box" and "gray box" approaches.

The main thing you want to accomplish is to start getting more and more familiar with the web application, more specifically with available functionalities. The first step is to understand what the application's purpose is. After gaining understanding, try to actually do whatever the application is built for. This is the main critical functional path and is a good place to start going deeper. Then expand to whichever other functionality is present while building your priority TODO list.

Examine JavaScript, make sure you know what libraries are used. Make sure you know what is from third party vendors, like jQuery, Bootstrap etc., and what is custom from that specific web application.

Examine what the HTTP requests look like and how they are created on the frontend side. Understand the structure of URL endpoints. Check out the names of parameters and their values. Identify how the application state is tracked on the client side (session cookies, ViewState, etc.).

At any point in time, look at the source code, hit a breakpoint, check the logs. After all, that is why all that work was done before, so that we see what is going on when we perform some action. Focus also on the structure of the source code and how it is organized. What is the directory structure, and what are the names of directories and files? How is the code organized inside those source files? What conventions are being used for anything and everything? Once you are sure you understand a convention, write down anything that deviates from it. Still, don't go into deep analysis; just write down interesting observations in your notes. At this stage we want to level up from the basic understanding gained while discovering the components to an intermediate level, where we understand the different functionalities that the application offers and also get first impressions of the source code and of how the application truly behaves on our inputs.

Analyze #

Now is the time to jump deep into the search for vulnerabilities. Categorization, naming and definition of vulnerabilities can sometimes be ambiguous. Our prime target is breaking through the web application to gain access to the underlying operating system. The best way to do this is to find a vulnerability called RCE. The acronym is used, again ambiguously, for both Remote Code Execution and Remote Command Execution. There is a distinct difference, which you can look up if you don't know it, but in both cases the desired result is to get a reverse shell call back which gives us access to the system. Finding RCE is the primary goal of AWAE lab and exam machines and if you find it in real life pentests... well, you will be pleased with yourself that day. :)

However, RCE vulnerabilities often happen in more complex parts of the application which are probably behind some authorization and authentication protection mechanisms. To access it you would need a valid user account which is a precondition that would be nice not to have. To circumvent this obstacle, you want to find a so-called auth bypass vulnerability, 'auth' in this context being used for both authorization and authentication. This will give you capability to reach functionality that would otherwise not be accessible to you.

Finding auth bypass and RCE and creating a script that chains both vulnerabilities to accomplish one click to reverse shell is a wet dream of every white box pentester. We start by exploring the public part of the web application, focusing on authorization and authentication, and then move on to the protected parts of the application. This approach works for both AWAE and real engagements. Remember that in real pentests you want to report everything you find that is vulnerable in some way, while in AWAE this is not obligatory. Keeping good notes will make it super easy to do that at the end of the pentest.

Routes #

Every application has some kind of URL scheme that it follows. The scheme could be defined by the used framework or manually by the developers. From the interaction phase you should already have basic familiarity with routes in the target web application. Now you want to find all of them.

Depending on how they are defined it could mean finding a config file like web.xml and extracting it from there or directly in the source code by finding decorators where they are defined. In any case, you first find a few of the routes and examine how they are defined. After that use regex with your IDE search capabilities or grep to find all of them. At the end you want to have as complete of a list as possible.

Have the process of finding those routes easily repeatable and if possible automated in a script or one liner. If you are pentesting a complex application you will probably miss some routes in your first pass. If you have automated or semi-automated the process of finding the routes you can then later easily add new information from your observations to get a more complete list of the routes without losing time. Try to identify what is old and what is new in your outputs compared with previous runs and if it makes sense put the newly discovered routes higher in your priority TODO list.
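As a minimal sketch of such a repeatable route finder, assume the routes are declared with Flask-style decorators like @app.route("/path"). That is purely an assumption for the example; the pattern, the file glob and the function names all have to be adapted to however the target application really defines its routes.

```python
import re
from pathlib import Path

# Assumption: routes look like @app.route("/login") or @bp.route('/admin').
ROUTE_PATTERN = re.compile(r"""@\w+\.route\(\s*["']([^"']+)["']""")


def find_routes_in_text(text: str) -> set:
    """Return every route path declared in one source file's text."""
    return set(ROUTE_PATTERN.findall(text))


def find_routes(source_root: str, glob: str = "*.py") -> set:
    """Walk the source tree and collect routes from all matching files."""
    routes = set()
    for path in Path(source_root).rglob(glob):
        routes |= find_routes_in_text(path.read_text(errors="replace"))
    return routes


def diff_routes(previous: set, current: set) -> dict:
    """Show what changed since the last run of the route finder."""
    return {
        "new": sorted(current - previous),
        "gone": sorted(previous - current),
    }
```

Saving the output of each run and feeding it back in as `previous` gives you the "what is new" list directly, which is what you want to push up your priority TODO list.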

Auth bypass #

Authorization

Every web application has some way to enforce access control or, putting it simply, logic to decide whether access to something is granted or blocked. As you figure out how routing works in the application, your next step is to understand how the application checks who can access which route and resource. First focus only on identifying where authorization is enforced and who is given access and who is not. Combining this knowledge with your list of all routes you will, in the simplest case, be able to produce a list of routes that are publicly available and a list of routes for which some form of authorization is done. In more complex scenarios the application could have multiple user levels, or access could be granted based on roles. You want to know who can access what.

Once you know this, start the audit from public parts towards private parts. We start with public parts because they have the biggest exposure and are thus more likely to be attacked by real attackers.

Authentication

If there is authorization then there is also authentication. The difference between the two can sometimes be hard to understand, but there are plenty of resources out there that explain it well. To start searching for vulnerabilities in authentication you want to focus on the login, registration, password reset and similar functionalities. As a more general idea, you want to know everything about the information on which the authorization from the previous section bases its decision whether someone is allowed to access something or not. Things like cookies, authorization headers, password reset tokens etc. For each you want to understand:

  1. Points of origin - where and how it is created and very importantly what the structure of it is.
  2. Travel paths - how it is transferred between different places and where and how it is stored.
  3. Points of verification - where and how the information is verified.

If there is an auth bypass vulnerability, you should now be able to find it based on your understanding of authorization and authentication. The exact vulnerability will vary greatly between applications, frameworks and programming languages.

RCE #

Now comes the search for the "finish him" blow.

Depending on the application, finding the RCE could be easier or harder than finding the auth bypass. If you are super lucky you could even find a pre-auth RCE, skipping the need for an auth bypass for full compromise. Whatever the situation is, to find the RCE you need to go through everything, and everything can be a lot. A big application could have a huge code base with many functionalities built in. There are two different approaches here, both valid in my opinion, and they can be combined by switching from one to the other or by using them at the same time.

Focus on vulnerability

The first one is to focus on vulnerability types. Searching for SQL injection is a good example. Queries towards the database will happen from both public and private parts of the application. By examining how the queries are executed you could end up gaining access to the database. What you find in the database can then lead to the next step in your chain towards RCE, or even allow you to escalate from the database directly to the host.

To find SQL injection you need to be thorough. It could be that all SQL queries are vulnerable, and then you are done as soon as you pick the first one. However, today this happens only in really bad applications; most commonly developers do think about SQL injection and try to prevent it. In addition, the usage of frameworks and libraries also helps in SQL injection prevention, as the concern is delegated to the developers of those frameworks and libraries. What still happens today is that the framework or library is deliberately misused, or that parts of the application are for some reason customized. To find these code patterns you need to be able to enumerate all instances of SQL queries towards the database, and then manually examine them all. Yes, all of them... or at least the ones that end up on top of your priority TODO list before the time for the pentest runs out.

Clever usage of regex is what will save you precious time. If you see in the source that a specific function is called with an SQL query as a parameter, for example run_q("SELECT username FROM users"), search the whole source for locations where that function run_q is called and then examine whether there is some input to the query that you control. If there are too many results, try to narrow them down to the ones that use string concatenation when the query is constructed, for example run_q("SELECT email FROM users where username = " + username), instead of wasting time on the ones that are parametrized. How to write a regex that does this narrowing down depends on the code in question. However, be careful with regexes as they can also backfire and make you miss something. For example, you discover that absolutely the whole application uses only one specific function run_q to send queries towards the database, but there are hundreds of occurrences in the code. You write a great regex to search for lines where that function is called and has concatenation in it, something like run_q\(.+\+.+\). You feel good about it, but you actually miss the SQL injection because in a few instances, one of which is vulnerable, the query string was so long that the developers split it over multiple lines, and your regex missed it as it assumed the query would always be on the same line as the call to the function.
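One way to avoid the multi-line trap is to stop thinking in lines and instead find each call and its matching closing parenthesis. The following is a rough sketch of that idea, not a real parser: run_q is the made-up function from the example above, the helper names are hypothetical, and the quote handling only understands simple single and double quoted strings.

```python
def code_chars(text, start=0):
    """Yield (index, char) for characters outside quoted string literals."""
    quote = None
    i = start
    while i < len(text):
        ch = text[i]
        if quote:
            if ch == "\\":
                i += 1                 # skip the escaped character
            elif ch == quote:
                quote = None           # string literal ends here
        elif ch in "\"'":
            quote = ch                 # string literal starts here
        else:
            yield i, ch
        i += 1


def find_calls(source, func="run_q"):
    """Yield (line_number, call_text) for each call, even if it spans lines."""
    needle = func + "("
    pos = source.find(needle)
    while pos != -1:
        depth, end = 0, None
        for i, ch in code_chars(source, pos + len(func)):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:         # matching close of the call
                    end = i + 1
                    break
        if end is None:                # unbalanced call, give up
            return
        yield source.count("\n", 0, pos) + 1, source[pos:end]
        pos = source.find(needle, end)


def concatenated_calls(source, func="run_q"):
    """Return only the calls that contain a '+' outside of string literals."""
    return [
        (line, call)
        for line, call in find_calls(source, func)
        if any(ch == "+" for _, ch in code_chars(call))
    ]
```

Compared to the single-line regex, this also ignores a + that only appears inside the SQL text itself. It will still match names that merely end in run_q and knows nothing about comments, so treat its output as a way to order your reading and not as proof that the rest is safe.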

SQL injection being only one example, vulnerability types can be various. As a starting point you can use the ones AWAE course covers:

  • XSS to RCE
  • File Upload
  • SQL Injection
  • Code Injection (JavaScript, PHP, etc.)
  • Deserialization
  • XXE
  • SSTI

Focus on functionality

A different approach to handling a huge codebase when searching for RCE is to focus on functionalities instead of vulnerabilities. To manage the time appropriately, you want to start from the most complex functionalities. Code complexity is correlated with the probability of vulnerabilities being present, so that is where you want to focus your efforts first.

Focusing on functionalities is also directly connected to focusing on vulnerabilities. Answering a question like should you first search for Deserialization or XXE is not really possible. Ideally both, practically sometimes not possible because of time restrictions. Using functionality complexity as a prioritization mechanism is useful to make sure your focus is on the right place.

How to evaluate the complexity of a functionality is not always obvious. In clear-cut cases, like a plugin system versus updating a user profile, it is simple to decide what to check first. However, often two parts of the application seem equally complex. Should you first focus on a custom API or on the interface for the "power user" role? Hard to say, and it is something for which you need to develop an instinct through practice.

If you already invested a good amount of effort in decomposition, inspection and interaction you should already have a list of functionalities in your notes that should be examined first. One example is seeing while interacting with the application some custom data formats being passed around, maybe binary data that could be serialized objects. Or another example would be that during decomposition steps you saw that the application takes some inputs from a different protocol. You could then prioritize these functionalities first.

Sometimes RCE is achieved not through finding a traditional vulnerability but by abusing built-in functionalities. A nice example is the ability to upload plugins to the application. A plugin system is quite a complex functionality of a web application and implementing it securely is not an easy task, so you could find a way to abuse it to gain RCE. There are even applications that have SQL injection "by design", as a feature. This is sometimes a legitimate functionality to have in a web application after all, but if you find an auth bypass and can chain it with that extremely powerful built-in functionality, it can trigger discussions at your customer about whether that functionality really needs to be part of the application. Or at least it makes them aware that having something like this in the application means extra effort should be put into other defenses, or that the kinds of queries that are available should be limited.

Hail Mary #

If everything fails and you still don't have anything concrete, start throwing Hail Mary passes.

An example of a Hail Mary approach is assuming that what is considered secure actually is not.

For example, while searching for SQL injection, if you ignored the parametrized queries, go back and examine them as well. Maybe the lines in question are actually calls to stored procedures or user-defined functions. From the application code the call seems secure, but if you check the stored procedure in the database you see that it concatenates strings inside.
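To make the stored procedure case concrete, here is a minimal sketch in Python that flags dynamic SQL execution inside procedure source text. It assumes you have already dumped the procedure definitions from the database into strings. The pattern list, the `find_dynamic_sql` helper and the sample procedure are all made up for illustration and are far from exhaustive, so treat hits only as starting points for manual review.

```python
import re

# Heuristic markers of dynamic SQL execution (illustrative, not exhaustive).
DYNAMIC_SQL_PATTERNS = [
    re.compile(r"\bEXEC(?:UTE)?\s*\(", re.IGNORECASE),       # EXEC(@sql) style
    re.compile(r"\bsp_executesql\b", re.IGNORECASE),         # SQL Server
    re.compile(r"\bEXECUTE\s+IMMEDIATE\b", re.IGNORECASE),   # Oracle PL/SQL
    re.compile(r"\bPREPARE\s+\w+\s+FROM\b", re.IGNORECASE),  # MySQL
]

# Hypothetical procedure: the application calls it with a bound parameter,
# but the body concatenates that parameter into a query string.
SAMPLE_PROC = """\
CREATE PROCEDURE find_user @name NVARCHAR(50) AS
BEGIN
    DECLARE @sql NVARCHAR(MAX) = 'SELECT * FROM users WHERE name = ''' + @name + '''';
    EXEC sp_executesql @sql;
END
"""


def find_dynamic_sql(procedure_source):
    """Return (line_number, line) pairs that execute dynamically built SQL."""
    hits = []
    for number, line in enumerate(procedure_source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in DYNAMIC_SQL_PATTERNS):
            hits.append((number, line.strip()))
    return hits


for number, line in find_dynamic_sql(SAMPLE_PROC):
    print(f"line {number}: {line}")
```

Every hit still has to be traced back by hand to see whether the executed string contains caller-controlled input.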

Another example is the assumption that third-party libraries are secure. Your job is to break the web application and nothing is off limits. If developers use a library that is unsafe, it is your task to discover and exploit this, even if this means finding previously unknown vulnerabilities in those libraries.

Sometimes something that looks like a call to a standard function is actually a call to a custom function because the developers redefined the standard function. Verifying every single line that you see in the code is not something you can usually do in parallel with all the other activities you are executing. It would just be too distracting to constantly go down the rabbit hole and double-check whether each call to a standard function is legitimate. However, if it is Hail Mary time, then it is good to go back and look over everything through super sceptical eyes.
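As one possible way to speed up that sceptical pass, here is a rough sketch that lists places where code redefines a standard name. It assumes the target is written in Python and only covers that case. The `find_shadowed_builtins` helper and the sample source are hypothetical, and the check is noisy, since class methods named like a builtin are reported as well.

```python
import ast
import builtins

BUILTIN_NAMES = set(dir(builtins))


def find_shadowed_builtins(source, filename="<unknown>"):
    """Return (name, line) pairs where the source redefines a builtin name."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.name in BUILTIN_NAMES:
                hits.append((node.name, node.lineno))
        elif isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id in BUILTIN_NAMES:
                    hits.append((target.id, node.lineno))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical application code that quietly replaces two standard functions.
SAMPLE_SOURCE = "import os\n\ndef open(path):\n    return os.popen(path)\n\neval = exec\n"

for name, line in find_shadowed_builtins(SAMPLE_SOURCE):
    print(f"{name} redefined at line {line}")
```

Other languages need their own approach, but the idea is the same: collect the definitions first, then compare them against the names you have been trusting.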

Exploit script #

Everybody who went for AWAE knows that one of the main exam goals is to write a zero-to-hero script that exploits auth bypass and RCE to completely compromise the target web application. Other than the endorphin explosion that comes from a turnkey exploit script producing a reverse shell, there is an important reason why you want to create the exploit script, and that is impact.

For anybody that already performs pentests this is maybe obvious, but finding a vulnerability is only half of the job. You are searching for vulnerabilities so that they can be fixed and unfortunately we are not living in a world where everything gets fixed immediately. We are not even living in a world where everything gets fixed. So to ensure your efforts actually have some meaning the second half of the job is persuading the customer that what you found is actually important. One of the best ways to show the impact is to create a nice exploit script that can be run by anybody.

In addition to showing the impact, it is also a great way for you to show the other side what steps are taken to exploit the vulnerabilities. Anybody can read the script and see exactly what is happening instead of losing time and energy in endless email threads, bad screenshots and demos that don't work because of a misclick.
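A skeleton for such a script could look roughly like the sketch below. Everything target-specific in it is an assumption made up for illustration: the `/login` and `/admin/run` endpoints, the `password[]` and `cmd` parameters, and the idea that the session arrives in a `Set-Cookie` header. The real steps depend entirely on what you found. The point is the shape: one function per step, readable progress output, and a single entry point.

```python
import argparse
import urllib.parse
import urllib.request


def http_send(request):
    """Send the request and return (headers, body)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers, response.read()


def exploit(base_url, command, send=http_send):
    """Chain auth bypass and RCE, return the command output as text."""
    # Step 1: auth bypass. HYPOTHETICAL endpoint and parameters.
    print("[*] Step 1: bypassing authentication")
    login = urllib.request.Request(
        base_url + "/login",
        data=urllib.parse.urlencode({"user": "admin", "password[]": ""}).encode(),
        method="POST",
    )
    headers, _ = send(login)
    cookie = (headers.get("Set-Cookie") or "").split(";", 1)[0]
    if not cookie:
        raise RuntimeError("auth bypass failed: no session cookie returned")
    print("[+] Got session: " + cookie)

    # Step 2: RCE through a built-in admin feature. HYPOTHETICAL endpoint.
    print("[*] Step 2: executing command")
    run = urllib.request.Request(
        base_url + "/admin/run",
        data=urllib.parse.urlencode({"cmd": command}).encode(),
        headers={"Cookie": cookie},
        method="POST",
    )
    _, body = send(run)
    return body.decode(errors="replace")


def main(argv=None):
    parser = argparse.ArgumentParser(description="Auth bypass + RCE chain (sketch)")
    parser.add_argument("base_url", help="e.g. http://target.example")
    parser.add_argument("--cmd", default="id", help="command to run on the target")
    args = parser.parse_args(argv)
    print(exploit(args.base_url.rstrip("/"), args.cmd))
```

Calling `main()` at the bottom of the file turns it into a command line tool. The `send` parameter exists so that the chain can be dry-run against a fake transport before pointing it at the real target.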

The end #

Ideally, you will go through all functionalities and all vulnerability types and by the end of the pentest you will feel confident everything is covered.

Report #

Because it is a white box approach your report can have snippets and screenshots of exact locations in the code where vulnerabilities are located but other than that there is not a big difference from a standard pentest report. Just assume whoever will be reading your report needs to be able to understand and reproduce everything you did, assuming basic technical knowledge.

Lessons learned #

You learn by doing. You learn better by doing and analyzing what you are doing. If you have time, you can record everything you are doing and do the analysis later. However, timewise that is a luxury, so in practice you have to do the analysis on the fly. What works for me is to write down every new insight or piece of knowledge as soon as I gain it. Later I can easily go back to the pentest notes and go through this section to remind myself of all the "aha" moments and strengthen their imprint in my head. Don't limit this only to successes. The best things to learn from are failures, and writing them down helps to avoid them in the future.

Miscellaneous #

IDE #

How to set up debugging is the most important aspect when choosing what IDE you will use to examine the application code. As stated before, explaining how to set up debugging for different IDEs and programming languages is not my goal. However, there are a few other things that should be considered when choosing the IDE in which you will examine the code.

The most important feature of an IDE for the white box approach is good search capabilities. I would even say that good is not enough, it should be amazing. The vast majority of what you will be doing is browsing the code, and good built-in search capabilities can make the difference between finding and missing a vulnerability. Here are some things that should be supported:

  • lightning fast search backed by good file content indexing - so that you do not lose focus and concentration waiting for results to show or even worse, waiting for GUI to respond to your clicks
  • support for regex searches - so that you can snipe specific lines of code
  • parallel independent search windows - so that you can investigate in parallel instead of closing one search result only to start another one
  • custom scope definitions (also with regex support) - so that you can include or exclude files and folders that, for example, are not part of the scope and reduce the noise
  • copy list of search results - for building notes and priority TODO lists
  • search result code view - so that you can immediately start examining the code around the search results with all other perks of your IDE

In addition to kick-ass search functionality, standard IDE features are needed, like code click-through, opening class and method definitions, getting lists of call references and views that show the structure of classes. Basically anything that helps you better understand and navigate the code.

Other nice quality-of-life things to have in your IDE are features like custom bookmarks, so that you can mark specific places of interest in the code, although this can also be done by adding comments with some unique string tag to the code. Even better is the ability to easily copy the file path together with the current line number (e.g. controllers/MyBigController.java:487). This one is slightly more work, as you need to paste it into your notes, optionally with a comment on what is interesting there. But this habit is independent of the IDE, and an additional benefit is that the information becomes part of your notes file instead of the IDE project files.

Depending on what programming language you are dealing with you will have to use different IDEs. Sometimes you can use one IDE for multiple languages, but do this only if you are not sacrificing debugging capabilities for it. In the end, nothing prevents you from using two or more IDEs at the same time. See what works for you and for your particular pentest. Whatever you use, invest time to get familiar with all the features of the IDE. Learning keyboard shortcuts is a great time investment, as is reading the IDE documentation to learn those hidden features that could be super useful to you.

Preparing for pentest #

So far I have only talked about the pentest itself. There is also time between pentests that you can use wisely.

Programming languages #

To really enhance your white box approach, you should strive for deeper knowledge of the programming languages you are auditing. Like applications and frameworks, programming languages also have their minor and LTS versions and release cycles. Knowing the differences between versions is knowledge that will make you stand out. Also, following the newest features that are introduced in programming languages will help you understand the application source code quicker and better once developers start using those new features. New features mean new opportunities for misuse and creation of new security issues, and it is your job to stay on top of it all. Finally, take note of things that are deprecated and why they are deprecated.

Frameworks #

Even if the scope is only one programming language, there are still many MVC frameworks out there. Being on top of them all is not really feasible. However, you could spend time on following some of them and the best way is to start from the most used ones. You can find lists of the most used frameworks for specific programming languages easily on the internet. Spending time on understanding a security fix in the newest released version of a framework can help you in your work greatly.

Or just spend time understanding how they work, how to set them up, where configuration files are etc. Spending time to learn this saves time later when you are executing the pentest.

Grep lists #

So far I have emphasized understanding how the application functions. However, I also mentioned that using lists of search strings is beneficial in some cases. A list of, for example, classes that are vulnerable to deserialization attacks in C# is useful to have, so time should be spent on composing and maintaining those lists. There are public repositories that do this, and you can use those as a start.
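As a rough sketch of how such a list can be put to work outside of an IDE, the Python snippet below searches C# sources for a few .NET formatter classes that are widely documented as unsafe for untrusted input. The list is deliberately short and only a starting point. The `grep_sources` helper and the excluded folder names are assumptions chosen for illustration. Results come back as `path:line` entries that can be pasted straight into notes.

```python
import re
from pathlib import Path

# Starting point only: .NET classes commonly flagged for insecure deserialization.
GREP_LIST = [
    "BinaryFormatter",
    "SoapFormatter",
    "LosFormatter",
    "ObjectStateFormatter",
    "NetDataContractSerializer",
]

# Folders treated as out of scope (assumed build output and vendor folders).
EXCLUDED_DIRS = {"bin", "obj", "packages", ".git"}


def grep_sources(root, names=GREP_LIST, suffix=".cs"):
    """Return 'relative/path:line: code' entries for every match under root."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    hits = []
    for path in sorted(Path(root).rglob("*" + suffix)):
        relative = path.relative_to(root)
        if EXCLUDED_DIRS.intersection(relative.parts):
            continue
        lines = path.read_text(errors="replace").splitlines()
        for number, line in enumerate(lines, start=1):
            if pattern.search(line):
                hits.append(f"{relative.as_posix()}:{number}: {line.strip()}")
    return hits
```

Running `grep_sources("/path/to/source")` gives a flat list that works as a priority TODO list. Each entry still needs a manual check of whether attacker-controlled data can reach the call.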

Really the end #

If you managed to get this far, congratulations, as I am not really a good writer. XD If you have any feedback about anything written here, I would appreciate it if you contacted me on Twitter, either to tell me that I am wrong about something or that I missed something.

Good luck on your future white box web application pentests, and I hope what you have read here will help you discover more vulnerabilities!

Notes template #

# Activities

1. Investigate components of the application
    1. Operating system? Web server? Database?
    2. How does web server run web application? How does application connect to database?
    3. Where are configuration files? Application log files? Database log files?
2. Configure debugging.
    1. Connect a debugger, set a break point to entry point (or equivalent) and hit the break point.
    2. Turn on logging to maximum.
3. Start interacting with the application
    1. Click all the things.
    2. Check all HTTP traffic.
4. Look at the web application surface - routes, endpoints, parameter values etc.
5. Figure out how authorization works - what is public and what is behind authorization
6. Figure out how authentication works
7. Hunt for specific vulnerabilities
  * XSS to RCE
  * File Upload
  * SQL Injection - normal, blind (logical or time based)
  * Code Injection (JavaScript, PHP, etc.)
  * Deserialization
  * XXE
  * SSTI
  * Mass Assignment
  * PHP Type Juggling

### TODO:

- 

# Target information

## General

IP address, DNS, SSH access...

### Source code

Location on the disk, working directories...

### Web application logins

Normal user, admin user...

## Operating System

[NAME and VERSION]

### Network

What ports are open, what is application, what is database...

### Processes

What is running with what privileges, command line arguments...

## Web Server

[NAME and VERSION]

### Configuration files

Location, values of interesting configuration items...

### Environment variables file

Location, values of interesting environment variables...

### Log file

Location...

### Other

## Web Application

Port, index route...

### Frontend - JavaScript libraries

List of libraries, what is vendor and what is custom...

### Backend

Web root, ports and protocols...

#### Configuration files

Location, values of interesting configuration items...

## Programming language

[NAME and VERSION]

## Frameworks

[NAME and VERSION] for each framework

### Templating systems

[NAME and VERSION] for each templating system

## Data

### ORM

Connection between application and database...

### Database

[NAME and VERSION]

#### Credentials

Where are they located, what are the values...

#### Config file

Location, interesting configuration items...

## Debugging

Debugging setup notes, connecting to Debugger machine if there is one...

## Routes

List of all routes, how to extract them from source code...

## Authorization

How it works, relevant code locations...

### Public routes

What is public...

### Private routes

What is private...

#### User X routes

#### Admin routes

## Authentication

How it works, relevant code locations...

# Temporary notes

## [TOPIC X]

## [FINDING X]

# Lessons learned

Anything you learned during the pentest...

