Let’s say a machine in your corporate fleet gets infected with malware. How would you detect it? How could you find out what happened on the machine? What did the malware do? Did it steal your browser’s passwords? What network connections did it make? Was it looking for cryptocurrency? With good telemetry and a good host monitoring solution for your machines, you can collect the context needed to answer these important questions.
Proper host monitoring on macOS can be very difficult for some organizations. It can be hard to find mature tools that proactively detect security incidents. Even when you do find a tool that fits all your needs, you may run into unexpected performance issues that make the machine nearly unusable for your employees. You might also see hosts shut down unexpectedly because of kernel panics. Even if you can pinpoint the cause of these issues, you may still be unable to configure the tool to keep them from recurring. We ran into difficulties like these at Dropbox, so we set out to find an alternative solution.
One of the first things we did was create a list of requirements and success criteria:
- Stability and minimal performance impact
  - Kernel panics and obvious delays or other lockups are certainly not acceptable
- Record interesting activity on the host
  - Process spawning
  - Filesystem modifications
  - Network activity
  - Details about configuration settings and installed applications
- Record details about these observables which would tell us:
  - Date and time
  - How observations are related (parent-child relationships, or shared keys which connect events, like process id)
  - Additional details to assess the relevance or impact of the event
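To make the "date and time" and "how observations are related" requirements concrete, here is a minimal sketch of a normalized event record that data from all three tools could be mapped into. The HostEvent class and its field names are hypothetical and chosen only for illustration; none of osquery, Santa, or OpenBSM emits this format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional


# Hypothetical normalized record; these field names are illustrative only.
@dataclass
class HostEvent:
    timestamp: datetime         # when the observation happened (UTC)
    source: str                 # which tool reported it: "osquery", "santa", or "audit"
    kind: str                   # e.g. "exec", "file_write", "connect"
    pid: Optional[int] = None   # shared key tying events from the same process together
    ppid: Optional[int] = None  # parent process id, used for parent-child links
    details: Dict[str, Any] = field(default_factory=dict)  # path, sha256, remote address, ...

    def is_child_of(self, other: "HostEvent") -> bool:
        """True if this event's parent pid is the other event's pid."""
        return self.ppid is not None and self.ppid == other.pid


# Example: a shell spawned by an application (values are made up).
parent = HostEvent(datetime(2017, 10, 24, 10, 0, tzinfo=timezone.utc), "santa", "exec",
                   pid=500, ppid=1, details={"path": "/Applications/Example.app/Contents/MacOS/Example"})
child = HostEvent(datetime(2017, 10, 24, 10, 0, 2, tzinfo=timezone.utc), "santa", "exec",
                  pid=610, ppid=500, details={"path": "/bin/sh"})
print(child.is_child_of(parent))  # True
```

Matching on pid alone ignores pid reuse; keeping the timestamp on every record is what lets a consumer disambiguate two processes that shared a pid at different times.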
During the investigation we reviewed a number of tools that could solve some of our problems, but no single tool met all of our requirements. After careful review, we decided not to reinvent the wheel: a combination of tools, each covering specific requirements, would serve our needs better.
We eventually landed on three open source tools (osquery, Santa, and the OpenBSM/Audit system), each serving a specific purpose:
- osquery provides periodic snapshots describing changes to the state of a machine
- Santa provides real-time process launch events containing details about the executing binary
- OpenBSM/Audit is a real-time system call monitoring module in the macOS kernel that can report network activity, file operations, administrative events, and other system interactions.
osquery is an open source operating system instrumentation framework from Facebook that runs on Windows, macOS, Linux, and FreeBSD. It lets users query the state of their system through a SQL interface. Some of its useful features are:
- The ability to parse preference and configuration files, and to list installed applications, currently running processes, file path information, and installed browser plugins.
  - This is useful if we are looking for suspicious applications or want to know whether a machine has specific configuration settings.
- osquery comes with several packs of useful queries by default, and the core application is regularly updated with new features.
Using osquery, we can search a host for IOCs (Indicators of Compromise) from threats such as the recent Proton malware.
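As a sketch of such a hunt, the Python snippet below builds an osquery query over the file table and runs it through osqueryi. The two paths are indicators publicly reported for the Proton variant distributed through a compromised HandBrake download; treat them as examples and check current threat intelligence before relying on them. The snippet assumes osqueryi is installed and on PATH. In the file table, % in a LIKE pattern matches within one directory level, so /Users/% covers every home directory.

```python
import json
import shutil
import subprocess
from typing import Dict, List

# Example Proton indicators (HandBrake-distributed variant); verify before use.
PROTON_IOC_PATHS = [
    "/Users/%/Library/RenderFiles/activity_agent.app",
    "/Users/%/Library/LaunchAgents/fr.handbrake.activity_agent.plist",
]


def build_ioc_query(paths: List[str]) -> str:
    """One SELECT per IOC path against osquery's file table, combined with UNION ALL."""
    selects = []
    for p in paths:
        escaped = p.replace("'", "''")  # SQL-escape single quotes
        selects.append(f"SELECT path, mtime, uid FROM file WHERE path LIKE '{escaped}'")
    return " UNION ALL ".join(selects) + ";"


def run_osquery(sql: str) -> List[Dict[str, str]]:
    """Run a query with osqueryi and return its rows (--json prints a JSON array)."""
    result = subprocess.run(["osqueryi", "--json", sql],
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


if __name__ == "__main__":
    query = build_ioc_query(PROTON_IOC_PATHS)
    print(query)
    if shutil.which("osqueryi"):  # only query the live host if osquery is installed
        for row in run_osquery(query):
            print("Possible Proton artifact:", row["path"])
```

Using UNION ALL of separate SELECTs keeps each path constraint simple for the file table; any returned row is a lead to investigate, not proof of infection.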
With osquery, we can get a lot of information about the current state, and possibly previous states, of the machine. This still leaves a gap: what about events that occur between scheduled osquery queries?
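The gap comes from how scheduled queries work: each query runs on an interval, and a snapshot table such as processes only sees what exists at the moment the query runs. The sketch below renders a minimal osquery configuration with one scheduled query and checks whether a short-lived process would overlap any snapshot. The query name and the five-minute interval are arbitrary examples, and the model ignores the scheduling jitter osquery can add.

```python
import json
from typing import Iterable


def scheduled_query_config(name: str, sql: str, interval_s: int) -> str:
    """Minimal osquery config with one scheduled query; interval is in seconds."""
    return json.dumps({"schedule": {name: {"query": sql, "interval": interval_s}}}, indent=2)


def seen_by_snapshots(start_s: float, end_s: float, snapshot_times: Iterable[float]) -> bool:
    """A process shows up in a processes snapshot only if a snapshot runs while it is alive."""
    return any(start_s <= t <= end_s for t in snapshot_times)


print(scheduled_query_config("running_processes", "SELECT pid, name, path FROM processes;", 300))

snapshots = range(0, 3600, 300)  # one snapshot every 5 minutes for an hour (no jitter)
print(seen_by_snapshots(601, 610, snapshots))  # False: runs entirely between snapshots
print(seen_by_snapshots(590, 610, snapshots))  # True: alive when the t=600 snapshot runs
```

Shortening the interval narrows the window but never closes it, and it costs CPU on every host, which is why an event-driven source is needed for process launches.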
Here comes Santa
Santa is an open source tool developed by Google specifically for macOS. It provides information on executed processes and some disk events. For processes, Santa can provide the following info:
- sha256 hashes of the executed binary
- Quarantine URLs — The full URL for where the binary came from if it was downloaded
- PID — process id
- PPID — parent process id, which is important for building process trees
- The “Common Name” field and sha256 hash of the cert used to sign the binary
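Santa writes these fields to its log as pipe-delimited key=value pairs. The parser below is a sketch that assumes the text santa.log layout "[timestamp] I santad: key=value|key=value|..."; field names and order can differ between Santa versions and log formats, and the sample line is made up for illustration. The args field is split off first as a precaution, since it is logged last and command-line arguments could contain the delimiter.

```python
import re
from typing import Dict, Optional

# Assumed layout: "[<timestamp>] <level> santad: key=value|key=value|...|args=..."
_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] \w santad: (?P<body>.*)$")


def parse_santa_line(line: str) -> Optional[Dict[str, str]]:
    """Turn one Santa log line into a dict of fields, or None if it doesn't match."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    # args is logged last; split it off before splitting the rest on '|'.
    body, _, args = match.group("body").partition("|args=")
    event = {"timestamp": match.group("ts")}
    for pair in body.split("|"):
        key, sep, value = pair.partition("=")
        if sep:
            event[key] = value
    if args:
        event["args"] = args
    return event


# Made-up sample line for illustration.
sample = ("[2017-10-24T10:00:02.000Z] I santad: action=EXEC|decision=ALLOW|reason=UNKNOWN"
          "|sha256=" + "a" * 64 + "|pid=612|ppid=610|uid=501|user=alice"
          "|path=/usr/bin/zip|args=zip -r /tmp/out.zip /Users/alice/Documents")
event = parse_santa_line(sample)
print(event["pid"], event["ppid"], event["path"])
```

Values stay strings here; converting pid and ppid to integers is a natural next step before linking events into trees.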
Another powerful feature that we won’t cover here is Santa’s ability to prevent execution of binaries (binary blacklisting and whitelisting).
Using the data we collect from Santa, we can investigate most execution actions performed on hosts. For example, this lets us see execution events from the recent Proton malware, such as its exfiltration (“exfil”) process.
We can even see what was exfil’d from hosts, such as 1Password vaults, Chrome browser history, etc.
Using the sha256 hashes provided in the Santa logs we can investigate the reputation of some of the files dropped by Proton.
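Before checking reputation, it helps to collapse events down to distinct hashes so each binary is looked up once, no matter how many times it ran. This sketch works on event dictionaries like the ones a Santa log parser would produce (with sha256 and path keys) and optionally drops hashes already known to be good; the known-good set and the choice of reputation service are assumptions left to the reader and are not part of the setup described here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def hashes_to_review(events: Iterable[Dict[str, str]],
                     known_good: Iterable[str] = ()) -> Dict[str, Set[str]]:
    """Map each distinct, not-known-good sha256 to the paths it executed from."""
    skip = {h.lower() for h in known_good}
    seen: Dict[str, Set[str]] = defaultdict(set)
    for ev in events:
        digest = ev.get("sha256", "").lower()
        if digest and digest not in skip:
            seen[digest].add(ev.get("path", "<unknown path>"))
    return dict(seen)


events = [
    {"sha256": "AB" * 32, "path": "/tmp/a"},
    {"sha256": "ab" * 32, "path": "/tmp/b"},   # same binary, different location
    {"path": "/tmp/no-hash"},                   # events without a hash are skipped
]
for digest, paths in hashes_to_review(events).items():
    print(digest, sorted(paths))
```

Keeping the paths alongside each hash is useful context: the same binary running from a temporary directory and from /Applications tells a different story.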
With osquery and Santa we have a really good picture of the executions that occur on a host. However, we are still missing information about the network connections and filesystem interactions of specific applications. osquery can provide some of this information through its process_open_files and process_open_sockets tables, but we could still miss events that happen between query intervals. Therefore, we need a real-time pipeline like the one Santa gives us.
To get this data we leverage the OpenBSM/Audit (or audit) system. This subsystem is built into the macOS kernel and is based on OpenBSM. It provides a real-time stream of information about the host’s activities. When configuring audit, you tell it which audit classes of system calls to monitor. For example, to monitor network events you would use the nt audit class. Audit writes these events in a binary format, which you can process with tools that ship with audit: auditreduce filters the records down to specific audit events or classes, and praudit converts the binary records into human-readable output, including XML-formatted logs.
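A minimal sketch of that pipeline in Python: auditreduce -c nt selects network-class records from a trail file, praudit -x prints them as XML, and ElementTree pulls out a few fields. The trail path /var/audit/current, the need to run as root, and the exact XML element names (record, subject, socket-inet) are assumptions based on OpenBSM's praudit output; IPv6 and other socket types are reported with different elements and are ignored here. The sample record is made up so the parser can be tried without a live trail.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import Dict, List


def read_network_records(trail: str = "/var/audit/current") -> str:
    """Pipe nt-class records through praudit -x; typically needs root to read the trail."""
    reduce_proc = subprocess.Popen(["auditreduce", "-c", "nt", trail], stdout=subprocess.PIPE)
    praudit = subprocess.Popen(["praudit", "-x"], stdin=reduce_proc.stdout,
                               stdout=subprocess.PIPE, text=True)
    reduce_proc.stdout.close()  # let auditreduce see a closed pipe if praudit exits early
    xml_text, _ = praudit.communicate()
    reduce_proc.wait()
    return xml_text


def parse_praudit_xml(xml_text: str) -> List[Dict[str, str]]:
    """Extract event name, time, pid, and IPv4 address/port from praudit -x records."""
    rows = []
    for record in ET.fromstring(xml_text).iter("record"):
        subject = record.find("subject")
        sock = record.find("socket-inet")
        rows.append({
            "event": record.get("event", ""),
            "time": record.get("time", ""),
            "pid": subject.get("pid", "") if subject is not None else "",
            "addr": sock.get("addr", "") if sock is not None else "",
            "port": sock.get("port", "") if sock is not None else "",
        })
    return rows


SAMPLE = """<?xml version='1.0' ?>
<audit>
<record version="11" event="connect(2)" modifier="0" time="Tue Oct 24 10:00:00 2017" msec=" + 12 msec">
<socket-inet type="2" port="443" addr="203.0.113.10" />
<subject audit-uid="501" uid="501" gid="20" ruid="501" rgid="20" pid="4242" sid="100" tid="0 0.0.0.0" />
<return errval="success" retval="0" />
</record>
</audit>"""

for row in parse_praudit_xml(SAMPLE):
    print(row["event"], row["pid"], f'{row["addr"]}:{row["port"]}')
```

The pid extracted from the subject element is the shared key that later lets these network records be joined with Santa's process events.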
After setting up the appropriate logging services for audit, you can configure it to produce events for the missing pieces of our puzzle: file read, file create, file write, and network events. Together, these give a much better picture of the system activity a process generates.
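On macOS, the classes audit records are chosen by the flags: line in /etc/security/audit_control, using class names defined in /etc/security/audit_class (fr for file read, fc for file create, fw for file write, nt for network). The sketch below merges the classes we want into an existing flags: line without dropping the ones already there. The sample file contents are abridged and illustrative; after writing the file back, running sudo audit -s should make the audit system re-read its configuration.

```python
from typing import Iterable

# Audit class names as defined in /etc/security/audit_class.
WANTED = ["fr", "fc", "fw", "nt"]  # file read, file create, file write, network


def add_audit_classes(audit_control: str, classes: Iterable[str]) -> str:
    """Return audit_control text with the given classes merged into its flags: line."""
    lines = audit_control.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("flags:"):  # note: "naflags:" does not match
            merged = [c for c in line[len("flags:"):].split(",") if c]
            for c in classes:
                if c not in merged:
                    merged.append(c)
            lines[i] = "flags:" + ",".join(merged)
            break
    else:
        lines.append("flags:" + ",".join(dict.fromkeys(classes)))  # no flags: line yet
    return "\n".join(lines) + "\n"


# Abridged, illustrative audit_control contents.
current = "dir:/var/audit\nflags:lo,aa\nminfree:5\nnaflags:lo,aa\n"
print(add_audit_classes(current, WANTED))
```

Merging rather than replacing keeps any login and authentication auditing (lo, aa) that was already enabled.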
All of the above interactions could be seen using individual events, which is great. However, what if we combine these events into something more?
By using timestamps, PIDs, PPIDs, network events, and file events, we can build process trees. Each process tree tells a story about what happened when a process was executed. One common attack technique that process trees make visible is an Office document with a malicious macro that pulls malware from the internet and compromises the host. Once the process trees give us a clear picture of what happened, we can make judgment calls about the actions performed by applications. These applications could look legitimate but, in the observed execution context, may be malicious.
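The core of building process trees from exec events can be sketched in a few lines: sort events by time and attach each one to the most recent earlier exec whose pid equals its ppid. Using the most recent exec, rather than any exec with that pid, is a simple way to cope with pid reuse. The Exec and Node types and the sample events (an Office-style macro chain) are hypothetical, and attaching audit's file and network events to these nodes by pid and time is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Exec:
    time: float  # exec timestamp (epoch seconds)
    pid: int
    ppid: int
    path: str


@dataclass
class Node:
    proc: Exec
    children: List["Node"] = field(default_factory=list)


def build_trees(execs: Iterable[Exec]) -> List[Node]:
    """Link each exec to the most recent earlier exec of its ppid; return the roots."""
    latest: Dict[int, Node] = {}  # pid -> node for its most recent exec seen so far
    roots: List[Node] = []
    for proc in sorted(execs, key=lambda e: e.time):
        node = Node(proc)
        parent = latest.get(proc.ppid)
        if parent is not None:
            parent.children.append(node)
        else:
            roots.append(node)  # parent never seen (e.g. launchd), so start a new tree
        latest[proc.pid] = node
    return roots


def render(node: Node, depth: int = 0) -> List[str]:
    """Indented text view of a tree, one line per process."""
    lines = ["  " * depth + f"{node.proc.path} (pid {node.proc.pid})"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


events = [
    Exec(10.0, 500, 1, "/Applications/Microsoft Word.app/Contents/MacOS/Microsoft Word"),
    Exec(12.0, 610, 500, "/bin/sh"),
    Exec(12.5, 611, 610, "/usr/bin/curl"),
    Exec(13.0, 612, 610, "/tmp/payload"),
]
for root in build_trees(events):
    print("\n".join(render(root)))
```

Here the tree shows a word processor spawning a shell that downloads and runs a file, the kind of chain that looks wrong even when every individual binary is legitimate.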
At Dropbox we are strong proponents of open source software and even stronger proponents of security. Being worthy of trust is our #1 cultural value and is core to our mission as a security team. If you’re interested in working on hard problems, our security team is always hiring talented people.
- https://www.dropbox.com/jobs/listing/1074834 (San Francisco)
- https://www.dropbox.com/jobs/listing/1078978 (Seattle)