A Gozi Banking Malware Detector – Zeek Roulette #3

I talked about Gozi malware on our eCrimeBytes podcast here:

Last Man From Gozi Banking Malware Group Sentenced To Three Years – eCrimeBytes Nibble #51

In my technical day job at Corelight, I ran into a sample of the Gozi banking malware in the wild here:


You can download a PCAP of the infection from this link too. This is the same PCAP I used to develop this detection logic in Zeek.

According to the notes at Malware Traffic Analysis, the malware C2 information is summarized by:


- port 80 - diwdjndsfnj.ru - GET /uploaded/[long base64 string with backslashes and underscores].pct
- port 80 - diwdjndsfnj.ru - POST /uploaded/[long base64 string with backslashes and underscores].dib
- port 80 - diwdjndsfnj.ru - GET /uploaded/[long base64 string with backslashes and underscores].pmg
- port 80 - iwqdndomdn.su - GET /uploaded/[long base64 string with backslashes and underscores].pmg
- port 80 - iwqdndomdn.su - POST /uploaded/[long base64 string with backslashes and underscores].dib


- port 80 - - GET /vnc32.rar
- port 80 - - GET /vnc64.rar
- port 80 - - GET /stilak32.rar
- port 80 - - GET /stilak64.rar
- port 80 - - GET /cook32.rar
- port 80 - - GET /cook64.rar

At first I thought it would be too simple to detect this malware family through the RAR files it downloads, but after searching several months of data on several customer networks I monitor, I found these RAR file names to be unique to this malware! I chose to include these file names in my detection methodology by looking for the regular expression: /\/(stilak|cook|vnc)(32|64)\.rar$/
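The file name check is easy to try outside of Zeek or awk. The following is a minimal Python sketch of the same pattern; the function name is_gozi_component and the sample URIs are made up for illustration and are not part of the detector.

```python
import re

# Same pattern as in the text: a Gozi component name at the end of the URI.
RAR_RE = re.compile(r"/(stilak|cook|vnc)(32|64)\.rar$")


def is_gozi_component(uri: str) -> bool:
    """Return True when the URI ends in one of the component file names."""
    return RAR_RE.search(uri) is not None


# Hypothetical URIs, only to show which shapes match.
for uri in ["/vnc32.rar", "/files/cook64.rar", "/backup.rar", "/vnc32.rar.txt"]:
    print(uri, is_gozi_component(uri))
```

Because the pattern is anchored only at the end, any number of directories may precede the file name.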

The other detection methodology is to look for long URLs that are base64 encoded in the unique manner Gozi uses. First, Gozi uses a real word for the first URL subdirectory. It is “uploaded” in the sample above, but I’ve seen other words used here. This word is unimportant to the malware and can be ignored for our purposes.

Then, Gozi will base64 encode the encrypted C2 data and add several random forward slashes to make it look like a URL. Gozi will eventually remove these slashes when it decodes this C2 string.

In addition, Gozi encodes the base64 “+”, “/”, “\n”, and “\r” characters as “_2B”, “_2F”, “_0A”, and “_0D”. Putting all of that information together leads us to a regular expression of: /^\/\w+\/([a-zA-Z0-9\/]|_\/?2\/?F|_\/?2\/?B|_\/?0\/?A|_\/?0\/?D){200,}\.[a-zA-Z0-9]+$/
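To make the encoding concrete, here is a small Python sketch that reverses the steps described above: remove the cosmetic slashes, undo the underscore escapes, and base64-decode what is left. The helper names are hypothetical, and two things are assumptions on my part rather than facts from the sample: that the file extension is not part of the data, and that base64 padding may be missing. The output is still encrypted, so this only recovers the ciphertext.

```python
import base64

# Escapes described in the text: underscore plus the hex code of the character.
ESCAPES = {"_2B": "+", "_2F": "/", "_0A": "\n", "_0D": "\r"}


def extract_blob(uri: str) -> str:
    """Drop the leading word and the trailing extension from the URI path.

    Assumption: everything between the first directory and the final
    ".ext" is the obfuscated data.
    """
    _, _, rest = uri.lstrip("/").partition("/")
    blob, _, _ext = rest.rpartition(".")
    return blob


def recover_base64(blob: str) -> str:
    """Undo the cosmetic slashes first, then the underscore escapes."""
    text = blob.replace("/", "")
    for token, char in ESCAPES.items():
        text = text.replace(token, char)
    return text


def recover_bytes(blob: str) -> bytes:
    """Base64-decode the blob. The result is still encrypted C2 data."""
    b64 = "".join(recover_base64(blob).split())  # discard \r and \n
    b64 += "=" * (-len(b64) % 4)  # assumption: padding may be absent
    return base64.b64decode(b64)


# Made-up URI for the demo, not taken from the PCAP.
demo = "/uploaded/QUJD/RA_/2B_2F.pct"
print(recover_base64(extract_blob(demo)))
```

Removing the slashes before replacing the escapes matters, because a slash can land in the middle of an escape such as "_/2B".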

However, I saw some rare collisions with the prior regular expression that look like false positives on a customer network. Therefore, I only alert on URLs that have more than 10 forward slashes. When I added this condition to my filter, the false positives went away. The full Unix find command combining both detection methodologies follows (note: use gawk on macOS):

find /logs -name "http*" | parallel -j 10 zcat {} :::: - | zeek-cut host uri | awk -F '\t' '$2 ~ /\/(stilak|cook|vnc)(32|64)\.rar$/ || ($2 ~ /^\/\w+\/([a-zA-Z0-9\/]|_\/?2\/?F|_\/?2\/?B|_\/?0\/?A|_\/?0\/?D){200,}\.[a-zA-Z0-9]+$/ && gsub(/\//, "/", $2) > 10)'

You will see that this filter detects 94 lines in the Zeek http.log file for our PCAP sample above:

$ cat http.log | zeek-cut host uri | gawk -F '\t' '$2 ~ /\/(stilak|cook|vnc)(32|64)\.rar$/ || ($2 ~ /^\/\w+\/([a-zA-Z0-9\/]|_\/?2\/?F|_\/?2\/?B|_\/?0\/?A|_\/?0\/?D){200,}\.[a-zA-Z0-9]+$/ && gsub(/\//, "/", $2) > 10)' | wc -l
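If gawk is not available, the same two-part filter can be approximated in Python. This is a sketch under a few assumptions: the input rows are tab-separated host and uri columns as produced by zeek-cut, the function names are hypothetical, and the sample rows are invented rather than taken from the PCAP.

```python
import re

RAR_RE = re.compile(r"/(stilak|cook|vnc)(32|64)\.rar$")
# re.ASCII keeps \w limited to [A-Za-z0-9_], like the awk version.
B64_RE = re.compile(
    r"^/\w+/(?:[a-zA-Z0-9/]|_/?2/?F|_/?2/?B|_/?0/?A|_/?0/?D){200,}\.[a-zA-Z0-9]+$",
    re.ASCII,
)
MIN_SLASHES = 10  # the filter requires strictly more than this many


def looks_like_gozi(uri: str) -> bool:
    """Component file name, or a long escaped base64 path with many slashes."""
    if RAR_RE.search(uri):
        return True
    return B64_RE.match(uri) is not None and uri.count("/") > MIN_SLASHES


def filter_rows(lines):
    """Yield (host, uri) pairs from tab-separated `zeek-cut host uri` output."""
    for line in lines:
        host, _, uri = line.rstrip("\n").partition("\t")
        if looks_like_gozi(uri):
            yield host, uri


# Invented rows; example.test is a placeholder host.
sample = [
    "example.test\t/cook32.rar\n",
    "example.test\t/index.html\n",
    "example.test\t/uploaded/" + "/".join(["AbCd1234_2F"] * 30) + ".pct\n",
]
for host, uri in filter_rows(sample):
    print(host, uri[:40])
```

Counting slashes with str.count avoids the gsub trick that the awk version needs.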

We now need to move this logic into Zeek code so we can run it on a live sensor.

We will catch the URIs in Zeek’s HTTP request event. The HTTP request event is documented at:


Translating the logic from the find command naturally leads us to the rest of the code here:

module GoziMalwareDetector;

export {
	## Log stream identifier.
	redef enum Log::ID += {
		LOG
	};

	## The notice when the C2 is observed.
	redef enum Notice::Type += {
		GoziActivity
	};

	## Record type containing the column fields of the log.
	type Info: record {
		## Timestamp for when the activity happened.
		ts: time &log;
		## Unique ID for the connection.
		uid: string &log;
		## The connection's 4-tuple of endpoint addresses/ports.
		id: conn_id &log;
		## The Gozi C2 HTTP method.
		http_method: string &log &optional;
		## The Gozi C2 command, still encoded and encrypted.
		payload: string &log &optional;
	};

	## Default hook into Gozi logging.
	global log_gozi: event(rec: Info);

	## A default logging policy hook for the stream.
	global log_policy: Log::PolicyHook;
}

# Regex - make them globals so they are compiled only once!
global rar_regex = /.*\/(stilak|cook|vnc)(32|64)\.rar$/;
global b64_regex = /^\/[^[:blank:]]+\/([a-zA-Z0-9\/]|_\/?2\/?F|_\/?2\/?B|_\/?0\/?A|_\/?0\/?D){200,}\.[a-zA-Z0-9]+$/;

redef record connection += {
	gozi: Info &optional;
};

# Initialize logging state.
hook set_session(c: connection)
	{
	if ( c?$gozi )
		return;

	c$gozi = Info($ts=network_time(), $uid=c$uid, $id=c$id);
	}

# Write the gozi.log entry, raise the notice, then clear the per-connection state.
function log_gozi_detected(c: connection)
	{
	if ( ! c?$gozi )
		return;

	Log::write(GoziMalwareDetector::LOG, c$gozi);

	NOTICE([$note=GoziMalwareDetector::GoziActivity,
	    $conn=c,
	    $msg=fmt("Potential Gozi banking malware activity between source %s and dest %s with method %s and URI %s", c$id$orig_h, c$id$resp_h, c$gozi$http_method, c$gozi$payload),
	    $identifier=cat(c$id$orig_h, c$id$resp_h)]);

	delete c$gozi;
	}

event http_request(c: connection, method: string, original_URI: string,
    unescaped_URI: string, version: string)
	{
	hook set_session(c);

	# Lowercase copy so the RAR regex only needs lowercase names.
	local uri: string = to_lower(unescaped_URI);

	# We use the entropy check below to throw out long "normal" URIs that might make it through our checks.
	# Since the underlying Gozi C2 data is encrypted, entropy should be higher than "normal".  I chose this threshold based upon empirical tests.
	if ( uri == rar_regex || ( unescaped_URI == b64_regex && count_substr(unescaped_URI, "/") > 10 && find_entropy(unescaped_URI)$entropy > 4 ) ) {
		c$gozi$http_method = method;
		c$gozi$payload = unescaped_URI;

		log_gozi_detected(c);
	}
	}

event zeek_init() &priority=5
	{
	Log::create_stream(GoziMalwareDetector::LOG, [
	    $columns=Info,
	    $ev=log_gozi,
	    $path="gozi",
	    $policy=log_policy]);
	}

After running my initial logic on a network for a while, I saw a handful of false positives with very deep URLs with many subdirectories.  Since Gozi C2 traffic is encrypted before it is base64 encoded, the entropy should be high on the C2 traffic.  Therefore, I added an entropy test on those base64 string candidates and only allow detections when the entropy is greater than 4 bits per character.
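For readers who want to see what the entropy test measures, here is a rough Python equivalent. It computes Shannon entropy over the bytes of the URI, which is my understanding of what the entropy field of Zeek's find_entropy reports; treat that equivalence as an assumption. The helper names and both sample paths are hypothetical.

```python
import math
from collections import Counter

ENTROPY_THRESHOLD = 4.0  # bits per byte, the cut-off quoted in the text


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the UTF-8 bytes of `text`, in bits per byte."""
    data = text.encode("utf-8")
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(data).values()
    )


def passes_entropy_check(uri: str) -> bool:
    """True when the URI is 'random looking' enough to keep as a candidate."""
    return shannon_entropy(uri) > ENTROPY_THRESHOLD


# Hypothetical inputs: a deep but repetitive path versus one that cycles
# through letters and digits the way encoded ciphertext tends to.
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
repetitive = "/static" + "/images/icons" * 20 + "/logo.png"
mixed = "/uploaded/" + alphabet * 4 + ".pct"
for sample in (repetitive, mixed):
    print(round(shannon_entropy(sample), 2), passes_entropy_check(sample))
```

A path built from a handful of repeated directory names can never exceed the threshold, because entropy is capped by the base-2 logarithm of the number of distinct bytes it contains.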

The next question becomes: How many of Gozi’s variants will this logic detect?

Good question.

I found samples of the Gozi variants below on Any.run, and when I spot-checked their PCAPs, Gozi was indeed detected:

  • Ursnif
  • Snifula
  • ISFB
  • Dreambot
  • Papras
  • sniful

You can view, download and install the full source code via Corelight’s repository here:


Below are examples of Gozi detections in Zeek logs. These logs are also found in Readme.md in the link above.


#separator \x09
#set_separator	,
#empty_field	(empty)
#unset_field	-
#path	gozi
#open	2023-07-24-19-00-13
#fields	ts	uid	id.orig_h	id.orig_p	id.resp_h	id.resp_p	http_method	payload
#types	time	string	addr	port	addr	port	string	string
1689201023.169002	ClEkJM2Vm5giqnMf4h	49799	80	GET	/uploaded/8jvrTb2D/c4CxLmogLgZQGC_2FQ_2B2b/Ma3ylmhq8i/MeT_2Fmtq1zDpHZZQ/2OknTIetuvPf/SqlzkcwbzWM/aFx0b70stnXODu/WDQ2wUhiUaYRbirzPbAvc/2V_2Feb1BeDPQaU0/WZs_2FbMUKJ37c4/Gf5YYgB_2B8BS7mcYa/jECotoj7R/7bH1bkdIXbwbqpU0Nryv/_2BfEwnTZ0On333QjdJ/fdFWYGQpQofXObilmWG0P_/2BS5YP7Tcj18X/cQoSxxb6/FYqDyT3sva2N6amcI32HXsv/n6pffb_2FO/UxvFILD91uIk2oNQx/DEmgyRV_2FLi/3pF67pmCpVP/mFnn0G63A1Sv9N/05KeIqFG3zHYuPPOU/1P.pct
1689201024.619787	ClEkJM2Vm5giqnMf4h	49799	80	GET	/uploaded/IfjXSAQlkk/LWNR4tkuOMiXdy5H_/2F55TZ5Ulkmf/KZSJQNnlKNL/oXNw6o8qzjnHJh/42iAsB5AuQ8n_2FyMP9YN/iAK5Z4rM1kiabXH3/Tbf6F39oZk8a8mY/9CVAQTHkAlTev6Vstr/ir_2FniSr/BvZGpW5mVB5nr9pTRisy/xPxWcp7jW1doRbtDMhT/9qJZuk9sdFnLKTsj12Ald9/3B3TYfDuu18i7/7LNZfim3/7nt5T5HqNSKfJKcDB7vJpeR/BoPpPolfrA/Z_2F56SDXpQJw4TYJ/45pcw80raC9d/yeDiXFQDftm/RIxazSR_2BPJ0F/wL6l8KasDC6CAzVNLSL_2/F535_2BDNH1/tcGkkEMCZrS/i.pct
1689201025.545122	ClEkJM2Vm5giqnMf4h	49799	80	GET	/uploaded/MsY1SyBRpp3LKx93fMeITx/w33QoEW7tSfHh/8_2Fqcx1/mAOJzt3teF4AUCrc8IBMkbm/8laKdMc_2F/A0s_2B6tfYEvyAcP_/2Bk_2Buvkjcq/80G4xrhnFAN/TCW76WZrbPDSMY/_2FhKbl7f8pmHGBET0Y_2/BvcMUzja7B7xdYYK/qwpg5fbrMu6AsJ8/Dejh0cXcFfYe8h75Yx/Fn3LZFhMy/29bQzFKZyZHoHgfyV1XJ/Lw1D14ehZUiPYfQxTWv/LlBNcxqMhthgso7PYl8tLq/JhO8H_2FKdEez/1uHkLyhp/SD1X5u0VHgJvzUkDhpUUhKb/Q6cyGyHP3A/UPnKN8bJKuY7qHpRC/r94GeIuEWP3a/G9Zo2.pct
1689201032.086164	C4J4Th3PJpwUYZZ6gc	49800	80	GET	/uploaded/DjeJ0blPQ/_2BkfrDEoFQgD04wO2F7/Ojqqobto35jEVZ1IQyU/G7zu4_2BFUfhIMJcKkibbg/fjjRaEElICvmR/e5DoJnsG/vsx3T8eOiuXp0AlWknwttvf/A_2FrNprrb/bHnMsv4916Q0BUf_2/B3XIECBmUK_2/FW3G5XPXaPV/ySf6P_2BIXQe7C/q0IvZNIlHZt2c8lCjnMGY/BP81zPWMMzAUn3VS/Y_2BCg7CLJsM0vz/MloZ0Th38yNZOadE6L/qwrs9PKza/13Lw9jqWnbyh08rIXwcG/mMM0HdBwcj6NPi6_2FH/5qnQe2GM1T/ZuHEvxYT/j.pmg
1689201032.792223	CtPZjS20MLrsMUOJi2	49801	80	GET	/vnc32.rar
1689201033.823106	CtPZjS20MLrsMUOJi2	49801	80	GET	/vnc64.rar
1689201045.338855	C4J4Th3PJpwUYZZ6gc	49800	80	POST	/uploaded/3lFSQwjxUfg8HgrTtqSS/ZKopMRdt0Jtv6ehOunO/ppBgdtF3YUX5Co9W0vX2OQ/UNZWu2BHmWHLi/4Pta3IUW/h8js8gl66mR4P51kp_2B1rV/l_2BDjFJoW/DhHjLJGFTtgPfY5qz/0jnqt8GbX_2F/M2R1QPMjPhA/1LDVR2FItKJXdJ/1qIKPwGyBNh80d_2BqfF_/2F6OR9A8MzQ8A1Mu/qqkL0hlaa4U4qkx/TrAV_2BpGP_2B6FnQc/TZAppMESi/SnlO5AU6khpDGRiOveBi/R4uVGlbArQZk3cDBamb/rekwL.dib
1689201092.103798	C37jN32gN3y3AZzyf6	


#close	2023-07-24-19-00-14


#separator \x09
#set_separator	,
#empty_field	(empty)
#unset_field	-
#path	notice
#open	2023-07-24-19-00-13
#fields	ts	uid	id.orig_h	id.orig_p	id.resp_h	id.resp_p	fuid	file_mime_type	file_desc	proto	note	msg	sub	src	dst	p	n	peer_descr	actions	email_dest	suppress_for	remote_location.country_code	remote_location.region	remote_location.city	remote_location.latitude	remote_location.longitude
#types	time	string	addr	port	addr	port	string	string	string	enum	enum	string	string	addr	addr	port	count	string	set[enum]	set[string]	interval	string	string	string	double	double
1689201023.169002	ClEkJM2Vm5giqnMf4h	49799	80	-	-	-	tcp	GoziMalwareDetector::GoziActivity	Potential Gozi banking malware activity between source and dest with method GET and URI /uploaded/8jvrTb2D/c4CxLmogLgZQGC_2FQ_2B2b/Ma3ylmhq8i/MeT_2Fmtq1zDpHZZQ/2OknTIetuvPf/SqlzkcwbzWM/aFx0b70stnXODu/WDQ2wUhiUaYRbirzPbAvc/2V_2Feb1BeDPQaU0/WZs_2FbMUKJ37c4/Gf5YYgB_2B8BS7mcYa/jECotoj7R/7bH1bkdIXbwbqpU0Nryv/_2BfEwnTZ0On333QjdJ/fdFWYGQpQofXObilmWG0P_/2BS5YP7Tcj18X/cQoSxxb6/FYqDyT3sva2N6amcI32HXsv/n6pffb_2FO/UxvFILD91uIk2oNQx/DEmgyRV_2FLi/3pF67pmCpVP/mFnn0G63A1Sv9N/05KeIqFG3zHYuPPOU/1P.pct	-	80	-	-	Notice::ACTION_LOG	(empty)	3600.000000	-	-	-	-	-
1689201032.792223	CtPZjS20MLrsMUOJi2	49801	80	-	-	-	tcp	GoziMalwareDetector::GoziActivity	Potential Gozi banking malware activity between source and dest with method GET and URI /vnc32.rar	-	80	-	-	Notice::ACTION_LOG	(empty)	3600.000000	-	-	-	-	-
1689204683.938169	CilF4d1b6woayAE906	50413	80	-	-	-	tcp	GoziMalwareDetector::GoziActivity	Potential Gozi banking malware activity between source and dest with method GET and URI /uploaded/bc1bm932Gi8AkB7CMdoi/dV6NGeXlcWdoFbsJnFC/FQZnIx7u0wQZs8gXewgk2O/_2BqEL_2F5vUH/VjpcgRrm/NgFQBF4d0ml2qUjPnz05CNh/WBeUsAyX8h/DYU_2Be6ioc_2FA9o/bkBn4zKxTE1N/1SebIT_2BB2/5hdp86s2tIvCxD/tZmUgcI6tODvZKdoJyys1/e4sK8m7GZS45MABe/ErGNCS3Es2MiUbX/8g2aKSm8DCj6uNwDbP/RnZ8rlx9a/lrUNLg4HRLMm6ysR4pwf/VtZBd5aLMF2Q_2FrqZr/qT7tLbEkySOgW4gIG9cEFr/b3tW5AqoOLBZY/_2FK4oG1/WJk_2B.pmg	-	80	-	-	Notice::ACTION_LOG	(empty)	3600.000000	-	-	-	-	-
#close	2023-07-24-19-00-14

Stay tuned to my blog and I will post updates if I find ways to improve the code.


Video Transcript:

00:00:00:00 – 00:00:23:05
Hey, today we’re going to talk about how to detect the Gozi banking malware with Zeek. And this is the third type of malware that I just randomly picked up and tried to detect with Zeek. So I’m calling this my Zeek roulette number three. All right, so what I want to tell you about this malware is that it’s been around a while.

00:00:23:06 – 00:00:48:20
It’s been, it’s pretty popular. I believe it was in 2020, it was one of the more it was like in the top three or five different malware infections out there. So getting a detection for this would be very useful for a lot of people. So what I did is I found a pcap online that had an infection in it and I studied it and did some online research and put together a detector.

00:00:48:20 – 00:01:12:04
And this video, I’m going to walk you through my methodology, good or bad. I’m going to walk you through my methodology of putting a detection together for Zeek, for the Gozi malware. And I will say Gozi has a lot of variants and spoiler alert, we’ll be able to detect variants too. So that will be a very cool side effect of what we’re putting together.

00:01:12:07 – 00:01:36:18
Okay, so what I’m going to do is switch your screen here and what you see on your screen is a draft blog that I’m putting together for this work that I’ve done. It’s likely going to change by the time this video is produced and published. So just be aware of that. I’m just using it to point out certain aspects of what I developed and the meat of what you’re seeing here should be in a blog.

00:01:36:18 – 00:02:05:23
It just may look a little different. Okay, so the very first place you’ll want to visit is this website that I’m putting my cursor on here, the malware traffic analysis website. And if you go to that website, there is a text file full of notes and there is a pcap there. And if you download both of them and the password is infected, you can open the zips and read the text file and you can look at the Pcap in Wireshark.

00:02:05:25 – 00:02:40:00
As you watch me do this video. This is the pcap that I used to write this logic. So if you want to follow along, this is the content you’ll be seeing on your screen. Okay. So if you open the notes file from malware traffic analysis, you’re going to see what I have on your screen right there, which is some notes on the C2 traffic and some notes on the components that the malware downloads in order to have its logic or functionality.

00:02:40:03 – 00:03:03:21
This is written by the author of malware traffic analysis dot net. I just put it in here verbatim, so we had it as a reference. So there’s two types of traffic you’re going to see on the wire when Gozi is active. One is going to be its C2 and I’m going to highlight it here. That’s the top part.

00:03:03:23 – 00:03:28:06
And the other activity is going to be it downloading the different components that it needs to run. And there’s like a VNC component and other components that I haven’t even gone into and studied deeply. I just know that it calls a bunch of components, and we’re going to use that fact in our detector. So at the end of the day, what I’m going to do is I’m going to write a regular expression that’s going to detect all this stuff on the top.

00:03:28:13 – 00:03:52:12
And I’m going to write another regular expression that’s going to detect all this stuff on the bottom. Then I’m going to put them together with an or like logically an OR and it’s going to make one big regular expression that’s going to detect all this stuff. Okay. So let’s talk about the two different types of activity. The first, let’s talk about the easiest one first, the components.

00:03:52:12 – 00:04:26:17
And that’s the bottom part here. Now, when Gozi installs itself on a victim, it goes out on the Internet and downloads its components via HTTP. And when it does this, it makes it look like RAR files. But I’ll tell you, it’s not RAR files. If you were to run these files through Zeek and look at the MIME types and all that kind of stuff, it doesn’t come up as RAR, it comes up blank, because it’s just encrypted binaries that are named to look like RAR files.

00:04:26:20 – 00:04:50:26
Now, I started looking at these RAR files first, because it was obviously the easiest part of the detection. And I said to myself, What are the chances that these RAR files are actually used in normal real traffic out there? That if I were to use them to detect the malware, I’d be running into false positives. So what I did is I went to a couple of customer networks I’m allowed to monitor and I search for

00:04:50:26 – 00:05:15:08
these RAR files. I totally expected to get a ton of hits because these are large universities. They have every type of traffic you would imagine. Nothing. Absolutely nothing. I was shocked. I went to another customer network, searched it, nothing. Went to another customer network, searched it, nothing. So I said, you know what? This might be a good methodology to detect this malware.

00:05:15:08 – 00:05:41:08
Now, yes, the malware can change these names and we wouldn’t detect it, but we got two different methodologies and it’s going to be a lot harder for the malware to change its C2 traffic than it would be to just change these names. Okay. So the regular expression that I came up with for these RAR files is pretty short. It’s right down here, and I’m gonna highlight it for you.

00:05:41:11 – 00:06:05:03
So it’s looking for three words, and I’m going to mispronounce probably the first one. It’s stilak, cook and VNC. Then there’s a 32 or a 64, and then there’s a dot rar. I put a slash on the front, which I had to escape with a backward slash, and I put a dollar sign at the end, which means I want to find this RAR file at the end of the string.

00:06:05:03 – 00:06:33:05
And I don’t really care about what comes in front of that slash. So there can be other subdirectories up there. I don’t really care. I just want to find these RAR files. So that regular expression right there, that short, regular expression, will detect this component activity for Gozi. So that’s now done for us. Let’s focus our attention on this C2 traffic up here.

00:06:33:08 – 00:06:52:01
So there’s a couple of things you want to know out of here. And I’m going to point out to you right in the logs here, there is and this is all again, HTTP traffic. So you have your HTTP method like a get or a post and then the C2 traffic always starts out with some human readable word.

00:06:52:06 – 00:07:07:22
I’ve seen the word uploaded, I’ve seen other ones. I think it’s like zero to hero. I’ve seen just names like a person’s name, like Drew. It could be anything. And that’s actually not part of the C2. You throw

00:07:07:22 – 00:07:13:23
that part of the string away. I’m only pointing it out to let you know that you need to throw that part away.

00:07:13:25 – 00:07:36:22
The part we’re looking for is what I’m going to highlight here, and it’s between the brackets and it’s the other part of the URL. It’s going to have a base64 string and it’s going to have some backslashes and some underscores. Now, this is the notes from the person that runs malware traffic analysis, dot net. I will tell you, because I study this, that it’s not that simple.

00:07:36:29 – 00:07:59:28
You can’t just put underscores anywhere in there because Gozi uses underscores very specifically and we’re going to use that in our detector. So that way we’re only detecting Gozi and we’re trying to throw away anything else that could be false positives of just regular web traffic. The extension on here, all we need to know is that there is an extension.

00:07:59:28 – 00:08:25:01
The actual extension is not that important because I’ve seen it be a bunch of different types of extensions. So the fact that you see a PCT or a DIB or a PMG, not a big deal, just know that we’re going to be looking for an extension. Any extension. Okay, So when Gozi sends it C2 data, it encrypts it.

00:08:25:04 – 00:08:49:21
So there is an encryption key inside the Gozi malware that it uses, and the encryption scheme that it uses escapes me. I think it’s Serpent, but I could be thinking of a different piece of malware, but it uses an encryption algorithm, encrypts the data, then base64 encodes the data. So you can imagine now it looks like a base64 string.

00:08:49:24 – 00:09:29:24
So kind of random looking characters. Then it will put underscores in there, but it uses underscores to encode base64 data. Okay, I know this is probably kind of hard to understand, but there’s two characters, the plus and the forward slash in base64 data, that Gozi encodes using the underscore character and two other characters afterwards. So for instance if it’s the plus sign, if that’s what it was in Base64, Gozi would encode that as underscore 2B.

00:09:29:27 – 00:10:02:11
If it was the forward slash sign, it would encode it with underscore 2F, if it’s a new line underscore 0A and if it’s the carriage return, it’s underscore 0D. This is the reason why you can’t just look for random underscores because underscores are used very specifically in this manner for Gozi and this helps us find that needle in the haystack when we have a whole bunch of web traffic out there and we’re only looking for the Gozi C2 web traffic.

00:10:02:13 – 00:10:31:22
Now, I’m about to get more complex on you. So if we put a regular expression together for this big base64 string, which includes that human readable string up front and the slashes and the weird underscores, it’s going to get pretty big. And I’ll highlight it for you. Now, what this regular expression is saying is up front, it’s saying, look at the beginning match at the beginning of the string.

00:10:31:23 – 00:11:06:21
That’s what the caret means. Look for a slash, it’s actually escaped with the backslash, look for the word, and that’s just slash little W plus. So just look for characters that represent a word. And this is the uploaded in our string above. Here, I’ll show you up here. You see that uploaded? That’s what we’re matching with that word. Okay, so then we start looking for the base64 characters, and that’s just a-z lowercase, uppercase, and digits.

00:11:06:24 – 00:11:34:04
We’re not looking for the plus because that’s encoded. We are looking for the forward slash because Gozi just randomly throws that in the URL just in random spots. Now here’s the caveat. It can throw that forward slash in where we have our underscore 2B, 2F, 0A, and 0D, so that makes our regular expression a little bigger.

00:11:34:06 – 00:12:08:02
And what I’m doing here is I’m showing you I’m doing an or and you start to see my underscore and then you see this slash question mark and it’s escaped with the backslash. What that says is there could be a slash here and the question mark says or not, and then you’ve got the two and then you’ve got a possible slash and then you got the F, and then you have an or and then you have the underscore possible slash 2 possible slash B or and it continues on for all the underscore possibilities.

00:12:08:04 – 00:12:38:19
So that gets us through that part of the regular expression. So over here you see 200. Now I did analysis on a lot of web traffic and I saw that Gozi tended to have more than 200 characters in it for the URL, and normal web traffic would tend to have less than 200 characters. So I used that in my detector and I said it needs at least 200 characters.

00:12:38:21 – 00:13:01:18
And then I go on to say I need some type of file extension, just any file extension. And I made it alphanumeric and it’s got the dot in front of it. And then I put the dollar sign at the end, which says this should be the end of the string. All right. So what I did is I put a find command together and I’m going to show you that in a second.

00:13:01:21 – 00:13:25:23
But I put a find command together and ran it through parallel. And I ran this regular expression, actually, both of these regular expressions, across a ton of Zeek logs that I had available on customer networks. You may want to do that for yourself. I’ve included this command for you and it is right here. I’ll highlight it for you. Now, a couple of caveats.

00:13:25:23 – 00:14:03:21
The first time I ran this, I pretty much almost had zero false positives except for about five lines worth of false positives. And I was like, Shoot, how do I get these false positives out of my hits? And I started studying the false positive versus the Gozi traffic and I noticed that real traffic doesn’t have as many subdirectories as Gozi traffic because Gozi throws in a ton of slashes just randomly to make it look like web traffic and it throws in like 20 or 30 of them usually, which is a lot more than normal web traffic will usually use.

00:14:03:23 – 00:14:37:15
So what I did is I went and I looked for more slashes than normal and I said, that number is ten or more. And I just found that empirically, by looking at data on a network, How did I do that? Well, let me explain this command to you. Now we’ve got our find command that will look in slash logs, pull out all your HTTP logs, pass the log names to the parallel command, which runs ten jobs and Zcats

00:14:37:15 – 00:15:00:10
your logs. And this says I want to read my input from standard in, which is over here. We run our data through Zeek cut and we pull out just the host and the URI, and then we dump it through AWK, and I’ll tell you, I know enough to be dangerous and I’m going to teach you some things. I’m not a guru or an expert in it.

00:15:00:10 – 00:15:04:15
I just know how to do some certain things to pull data for me

00:15:04:15 – 00:15:22:10
to see what I need to see in data. So I’ll show you something that there may be a better way for. And if you know of a better way, please do tell me I’ll make my blogs better. But I had to add the ability to look for more slashes when our base64 string was long.

00:15:22:18 – 00:15:58:01
So that way we’re only picking out the Gozi activity and not some false positive HTTP activity. So before we get to that, let me just walk through my command real quick for you. This says run awk and my delimiter is tabs. Look at my second field which is our URI remember we cut it out run this regular expression go right to there. This regular expression is the same regular expression that we talked about earlier.

00:15:58:03 – 00:16:28:19
It’s the two regular expressions and we can continue on with or we combine them with the or Boolean operation. So one or the other should be possible in order for us to say this is Gozi activity. Now what I did is I added an ampersand on the base 64 side of the regular expression and said gsub, which is a global substitution.

00:16:28:19 – 00:16:36:20
And I said, Look for forward slashes, which in regular expression language, that’s how you say forward slash because you got to escape it,

00:16:36:20 – 00:16:52:12
replace it with a forward slash. So basically do nothing. I do realize it’s probably doing something in the CPU, but for our purposes of the string, it’s doing nothing and we’re looking at the second field.

00:16:52:14 – 00:17:24:15
So we’re now counting the number of forward slashes in our URI. And we only detect when there’s more than ten of them. That’s what that’s doing for you. So if I take that logic and I go back to that pcap at the very beginning of this video that I said go download this from malware traffic analysis, if I run that through Zeek, I get an HTTP log, then I run it through this logic, just like I showed you above in that box above.

00:17:24:17 – 00:17:52:06
And I will tell you I ran this on macos, so I recommend running GAWK. So that way you’re on par with other Linux systems because it’ll be exactly the same. And I plugged in my regular expression, but you look here and I am finishing the command with a wc dash L which says count the number of lines that come out.

00:17:52:09 – 00:18:18:13
So what this whole command should do is detect any Gozi activity in HTTP log and count the number of lines. And when that happens, it says you have 94 lines that match. Yay. So it means we matched Gozi on our on our example pcap, which is what we need to do. So now that we have it working in Unix land we need to get this working in Zeek land.

00:18:18:15 – 00:18:41:26
So to do that, what we need to do is watch HTTP traffic and look for those URLs using those regular expressions that we talked about. You could do that via the event http_request and I put the link for you here if you want to go read more about it. And then I just went through the find command and made it into Zeek language.

00:18:41:26 – 00:19:10:00
And I’m going to go through this pretty quickly because a lot of this, like most of this is boilerplate. I name the module GoziMalwareDetector. We have the module exporting a bunch of stuff. We have a log file that’s created, this is going to be your Gozi.log, its output is going to match this format. And the output record is pretty simple as all the usual suspects of the connection information are up front.

00:19:10:00 – 00:19:26:19
And then you have the HTTP method and the payload, which would be those RAR URLs or the base64 URLs. We have a notice. So not only do we make a log file, but we will fire a notice if we see Gozi as well.

00:19:26:22 – 00:19:49:21
Now, down here I gave you the Gozi logging event, so that way if you want to catch the information before it went to a log, you could via an event. And if you want to mess with the filtering on the log, you can do that through the logging policy. Again, this is all boilerplate Zeek code stuff, more boilerplate.

00:19:49:21 – 00:20:18:24
We are going to create a Gozi record on the connection record using our info record. The set_session is like most other set_session hooks that you’ve probably seen before that populates this record on the connection record. Now the next function. This is really the meat of what we did. This will log our Gozi detection and what it says is if we don’t have Gozi data, don’t do anything.

00:20:18:26 – 00:20:48:09
And if we do send it to the log and generate a notice. So both of the log and the notice will be generated, then the data on that temporary record that we created will get deleted and we’re done. The other event that’s the meat of what we added is this http_request event. Now here, like most Zeek boilerplate we set session.

00:20:48:09 – 00:20:56:03
So it sets up that temporary record for us. I do a little trickery here to make our URL lowercase so that way

00:20:56:03 – 00:21:15:01
one of my regular expressions only needs to be in lowercase and then I run the regular expression through. So this is it right here. And you can see I take the lowercase of the URL and I run that RAR regular expression through there and I don’t have uppercase of anything.

00:21:15:01 – 00:21:16:05
And that automatically

00:21:16:05 – 00:21:43:14
takes care of any permutations in case sensitivity on those words. Then we or it with the base64 regular expression. And there it’s mostly the same, except a couple of changes. One is the slash w doesn’t exist in Zeek, but there is this blank character class. And I said, Don’t use blank.

00:21:43:14 – 00:22:07:07
That’s what the caret is there for. So I’m basically kind of faking the slash w in Zeek land with this character class. And the other one is we don’t have to replace a string in order to count it. We can actually call the function called count_substr and look for the number of slashes and only detect when it’s greater than ten.

00:22:07:09 – 00:22:31:12
So a little simpler in Zeek land. Now down here, all this does is it sets up the info record and then it sends it out to our function that logs it and then it returns. Pretty simple. This event down here, all that does is it sets up our log, our gozi.log with the logging policy and the logging event.

00:22:31:15 – 00:22:59:16
So immediately you should say, hey, how many variants does this detect? And I will say good question. I tested about six that I could find on any.run and it detected all of them because they all use that base64 formatted string for their C2. So I invite you to go on any.run. Look for these different types of malware, download their Pcaps and run them through here and see what it detects and what it doesn’t detect.

00:22:59:20 – 00:23:23:18
If you find something that misses, hey, let me know. We’ll try to add another detection to it. Now, if you really don’t care about all the stuff that I told you and you just want to get to the source code, I got that for you too on our, I say our, the Corelight GitHub account under Zeek Gozi detector. I put a couple examples of the logs here so we could talk through them.

00:23:23:21 – 00:23:47:12
Now, again, I told you there was a Gozi dot log that has lines generated every time there is Gozi activity seen on the network. And it looks like your usual Zeek log with usual suspects up front. And what I did is I tried to highlight the relevant information for you, and you can see the big base64 strings are all highlighted here.

00:23:47:15 – 00:24:08:07
And that’s to point out, these are the strings that we were able to detect as Gozi activity on the network. Now, if you look down here, you can see we also detected the RAR files. So there’s a VNC32 and VNC64 dot RAR also part of the Gozi malware infection. And you go down here and you see this triple dot and you’re like, why is that there

00:24:08:07 – 00:24:41:26
Keith? And that’s because we had 94 lines of detection. So most of them are these really long base64 strings. I didn’t want to keep copying and copying the data in here and have you have to scroll down my blog. So I just I’m telling you, there’s a lot of data here that you can look through. Now, our notice log looks like your general notice log, and I highlighted the human readable string for you that says there’s potential Gozi banking malware activity between this source and this destination with this method and this URL.

00:24:41:28 – 00:25:09:26
And you can see the long base64 string and you can also see the RAR files down here. Now, I have all the references for everything that I’ve used to build this methodology and you can flip through them. I have the malware traffic analysis original site where I got the pcaps. I’ve got Palo Alto Networks has a write up on this.

00:25:09:26 – 00:25:34:28
So does Medium, so does Microsoft, so does BlackBerry. Now, there are two tools that I have in the middle here that I highlighted for you. These tools, if you know the encryption key to the traffic you’ve seen on your network, you can run the data that I’ve put into Gozi dot log for you through these tools and potentially decrypt the C2 traffic, which is pretty cool.

00:25:35:00 – 00:25:53:18
I will say the malware author can change the encryption keys. So the very first thing you should do is go out to the Internet and pull all the default keys for Gozi and its variants and try those and if none of those work, then you probably have to analyze the malware and pull out what the true key is.

00:25:53:21 – 00:26:17:16
But when I was doing analysis out there, I did see people put lists of potential default Gozi encryption passwords out there. And if I remember correctly, one of these projects has a listing in the Python code too. Okay. And then there’s a bunch of any.run links down here and I put those there

00:26:17:16 – 00:26:43:20
so if you want to look at some variants and then want to do the searches yourself, these are a bunch of different variants I took a look at and ran it through the detector and they came out as detected. So with that, I really hope you got something out of this. It was a fun project to put together and if you know of any better ways to do that forward slash searching in AWK, please let me know.

00:26:43:22 – 00:26:49:15
And otherwise I hope to see you on the next Zeek detection that I write. All right. Thanks. Bye.
