- Ebook title: Antivirus-Hackers-Handbook-Joxean-Koret
- File type: PDF
- File size: 5 MB
- Number of pages:
DOWNLOAD LINK:
EXCERPT:
www.it-ebooks.info

The Antivirus Hacker's Handbook
Joxean Koret
Elias Bachaalany

The Antivirus Hacker's Handbook
Published by John Wiley & Sons, Inc., 10475 Crosspoint Boulevard, Indianapolis, IN 46256, www.wiley.com

Copyright © 2015 by John Wiley & Sons, Inc., Indianapolis, Indiana
Published simultaneously in Canada

ISBN: 978-1-119-02875-8
ISBN: 978-1-119-02876-5 (ebk)
ISBN: 978-1-119-02878-9 (ebk)

Manufactured in the United States of America
10 9 8 7 6 5 4 3 2 1

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that Internet websites listed in this work may have changed or disappeared between when this work was written and when it is read.

For general information on our other products and services please contact our Customer Care Department within the United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2015945503

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

About the Authors

Joxean Koret has been working for the past 15+ years in many different computing areas. He started as a database software developer and DBA, working with a number of different RDBMSs.
Afterward he got interested in reverse-engineering and applied this knowledge to the DBs he was working with. He has discovered dozens of vulnerabilities in products from the major database vendors, especially in Oracle software. He has also worked in other security areas, such as developing IDA Pro at Hex-Rays and doing malware analysis and anti-malware software development for an antivirus company, knowledge that he later applied to reverse-engineer and break over 14 AV products in roughly one year. He is currently a security researcher at Coseinc.

Elias Bachaalany has been a computer programmer, a reverse-engineer, an occasional reverse-engineering trainer, and a technical writer for the past 14 years. Elias has also co-authored the book Practical Reverse Engineering, published by Wiley (ISBN: 978-111-8-78731-1). He has worked with various technologies and programming languages, including writing scripts, doing web development, working with database design and programming, writing Windows device drivers and low-level code such as boot loaders or minimal operating systems, writing managed code, assessing software protections, and writing reverse-engineering and desktop security tools. Elias has also presented twice at REcon Montreal (2012 and 2013). While working for Hex-Rays SA in Belgium, Elias helped improve and add new features to IDA Pro. During that period, he authored various technical blog posts, provided IDA Pro training, developed various debugger plug-ins, amped up IDA Pro's scripting facilities, and contributed to the IDAPython project. Elias currently works at Microsoft.

Credits

Project Editor: Sydney Argenta
Technical Editor: Daniel Pistelli
Production Editor: Saleem Hameed Sulthan
Copy Editor: Marylouise Wiack
Manager of Content Development & Assembly: Mary Beth Wakefield
Production Manager: Kathleen Wisor
Marketing Director: David Mayhew
Marketing Manager: Carrie Sherrill
Professional Technology & Strategy Director: Barry Pruett
Business Manager: Amy Knies
Associate Publisher: Jim Minatel
Project Coordinator, Cover: Brent Savage
Proofreader: Nicole Hirschman
Indexer: Nancy Guenther
Cover Designer: Wiley
Cover Image: Wiley; Shield © iStock.com/DSGpro

Acknowledgments

I would like to acknowledge Mario Ballano, Ruben Santamarta, and Victor Manuel Alvarez, as well as all my friends who helped me write this book, shared their opinions and criticisms, and discussed ideas. I am most thankful to my girlfriend for her understanding and support during the time that I spent on this book. Many thanks to Elias Bachaalany; without his help, this book would not have been possible. Also, special thanks to everyone at Wiley; it has been a great pleasure to work with you on this book. I am grateful for the help and support of Daniel Pistelli, Carol Long, Sydney Argenta, Nicole Hirschman, and Marylouise Wiack.
Contents at a Glance

Introduction
Part I Antivirus Basics
  Chapter 1 Introduction to Antivirus Software
  Chapter 2 Reverse-Engineering the Core
  Chapter 3 The Plug-ins System
  Chapter 4 Understanding Antivirus Signatures
  Chapter 5 The Update System
Part II Antivirus Software Evasion
  Chapter 6 Antivirus Software Evasion
  Chapter 7 Evading Signatures
  Chapter 8 Evading Scanners
  Chapter 9 Evading Heuristic Engines
  Chapter 10 Identifying the Attack Surface
  Chapter 11 Denial of Service
Part III Analysis and Exploitation
  Chapter 12 Static Analysis
  Chapter 13 Dynamic Analysis
  Chapter 14 Local Exploitation
  Chapter 15 Remote Exploitation
Part IV Current Trends and Recommendations
  Chapter 16 Current Trends in Antivirus Protection
  Chapter 17 Recommendations and the Possible Future
Index

Contents

Introduction

Part I Antivirus Basics

Chapter 1 Introduction to Antivirus Software
  What Is Antivirus Software?
  Antivirus Software: Past and Present
  Antivirus Scanners, Kernels, and Products
  Typical Misconceptions about Antivirus Software
  Antivirus Features
  Basic Features
  Making Use of Native Languages
  Scanners
  Signatures
  Compressors and Archives
  Unpackers
  Emulators
  Miscellaneous File Formats
  Advanced Features
  Packet Filters and Firewalls
  Self-Protection
  Anti-Exploiting
  Summary

Chapter 2 Reverse-Engineering the Core
  Reverse-Engineering Tools
  Command-Line Tools versus GUI Tools
  Debugging Symbols
  Tricks for Retrieving Debugging Symbols
  Debugging Tricks
  Backdoors and Configuration Settings
  Kernel Debugging
  Debugging User-Mode Processes with a Kernel-Mode Debugger
  Analyzing AV Software with Command-Line Tools
  Porting the Core
  A Practical Example: Writing Basic Python Bindings for Avast for Linux
  A Brief Look at Avast for Linux
  Writing Simple Python Bindings for Avast for Linux
  The Final Version of the Python Bindings
  A Practical Example: Writing Native C/C++ Tools for Comodo Antivirus for Linux
  Other Components Loaded by the Kernel
  Summary

Chapter 3 The Plug-ins System
  Understanding How Plug-ins Are Loaded
  A Full-Featured Linker in Antivirus Software
  Understanding Dynamic Loading
  Advantages and Disadvantages of the Approaches for Packaging Plug-ins
  Types of Plug-ins
  Scanners and Generic Routines
  File Format and Protocol Support
  Heuristics
  Bayesian Networks
  Bloom Filters
  Weights-Based Heuristics
  Some Advanced Plug-ins
  Memory Scanners
  Non-native Code
  Scripting Languages
  Emulators
  Summary

Chapter 4 Understanding Antivirus Signatures
  Typical Signatures
  Byte-Streams
  Checksums
  Custom Checksums
  Cryptographic Hashes
  Advanced Signatures
  Fuzzy Hashing
  Graph-Based Hashes for Executable Files
  Summary

Chapter 5 The Update System
  Understanding the Update Protocols
  Support for SSL/TLS
  Verifying the Update Files
  Dissecting an Update Protocol
  When Protection Is Done Wrong
  Summary

Part II Antivirus Software Evasion

Chapter 6 Antivirus Software Evasion
  Who Uses Antivirus Evasion Techniques?
  Discovering Where and How Malware Is Detected
  Old Tricks for Determining Where Malware Is Detected: Divide and Conquer
  Evading a Simple Signature-Based Detection with the Divide and Conquer Trick
  Binary Instrumentation and Taint Analysis
  Summary

Chapter 7 Evading Signatures
  File Formats: Corner Cases and Undocumented Cases
  Evading a Real Signature
  Evasion Tips and Tricks for Specific File Formats
  PE Files
  JavaScript
  String Encoding
  Executing Code on the Fly
  Hiding the Logic: Opaque Predicates and Junk Code
  PDF
  Summary

Chapter 8 Evading Scanners
  Generic Evasion Tips and Tricks
  Fingerprinting Emulators
  Advanced Evasion Tricks
  Taking Advantage of File Format Weaknesses
  Using Anti-emulation Techniques
  Using Anti-disassembling Techniques
  Disrupting Code Analyzers through Anti-analysis
  More Anti-Anti-Anti...
  Causing File Format Confusion
  Automating Evasion of Scanners
  Initial Steps
  Installing ClamAV
  Installing Avast
  Installing AVG
  Installing F-Prot
  Installing Comodo
  Installing Zoner Antivirus
  MultiAV Configuration
  peCloak
  Writing the Final Tool
  Summary

Chapter 9 Evading Heuristic Engines
  Heuristic Engine Types
  Static Heuristic Engines
  Bypassing a Simplistic Static Heuristic Engine
  Dynamic Heuristic Engines
  Userland Hooks
  Bypassing a Userland HIPS
  Kernel-Land Hooks
  Summary

Chapter 10 Identifying the Attack Surface
  Understanding the Local Attack Surface
  Finding Weaknesses in File and Directory Privileges
  Escalation of Privileges
  Incorrect Privileges in Files and Folders
  Incorrect Access Control Lists
  Kernel-Level Vulnerabilities
  Exotic Bugs
  Exploiting SUID and SGID Binaries on Unix-Based Platforms
  ASLR and DEP Status for Programs and Binaries
  Exploiting Incorrect Privileges on Windows Objects
  Exploiting Logical Flaws
  Understanding the Remote Attack Surface
  File Parsers
  Generic Detection and File Disinfection Code
  Network Services, Administration Panels, and Consoles
  Firewalls, Intrusion Detection Systems, and Their Parsers
  Update Services
  Browser Plug-ins
  Security Enhanced Software
  Summary

Chapter 11 Denial of Service
  Local Denial-of-Service Attacks
  Compression Bombs
  Creating a Simple Compression Bomb
  Bugs in File Format Parsers
  Attacks against Kernel Drivers
  Remote Denial-of-Service Attacks
  Compression Bombs
  Bugs in File Format Parsers
  Summary

Part III Analysis and Exploitation

Chapter 12 Static Analysis
  Performing a Manual Binary Audit
  File Format Parsers
  Remote Services
  Summary

Chapter 13 Dynamic Analysis
  Fuzzing
  What Is a Fuzzer?
  Simple Fuzzing
  Automating Fuzzing of Antivirus Products
  Using Command-Line Tools
  Porting Antivirus Kernels to Unix
  Fuzzing with Wine
  Problems, Problems, and More Problems
  Finding Good Templates
  Finding Template Files
  Maximizing Code Coverage
  Blind Code Coverage Fuzzer
  Using Blind Code Coverage Fuzzer
  Nightmare, the Fuzzing Suite
  Configuring Nightmare
  Finding Samples
  Configuring and Running the Fuzzer
  Summary

Chapter 14 Local Exploitation
  Exploiting Backdoors and Hidden Features
  Finding Invalid Privileges, Permissions, and ACLs
  Searching Kernel-Land for Hidden Features
  More Logical Kernel Vulnerabilities
  Summary

Chapter 15 Remote Exploitation
  Implementing Client-Side Exploitation
  Exploiting Weakness in Sandboxing
  Exploiting ASLR, DEP, and RWX Pages at Fixed Addresses
  Writing Complex Payloads
  Taking Advantage of Emulators
  Exploiting Archive Files
  Finding Weaknesses in Intel x86, AMD x86_64, and ARM Emulators
  Using JavaScript, VBScript, or ActionScript
  Determining What an Antivirus Supports
  Launching the Final Payload
  Exploiting the Update Services
  Writing an Exploit for an Update Service
  Server-Side Exploitation
  Differences between Client-Side and Server-Side Exploitation
  Exploiting ASLR, DEP, and RWX Pages at Fixed Addresses
  Summary

Part IV Current Trends and Recommendations

Chapter 16 Current Trends in Antivirus Protection
  Matching the Attack Technique with the Target
  The Diversity of Antivirus Products
  Zero-Day Bugs
  Patched Bugs
  Targeting Home Users
  Targeting Small to Medium-Sized Companies
  Targeting Governments and Big Companies
  The Targets of Governments
  Summary

Chapter 17 Recommendations and the Possible Future
  Recommendations for Users of Antivirus Products
  Blind Trust Is a Mistake
  Isolating Machines Improves Protection
  Auditing Security Products
  Recommendations for Antivirus Vendors
  Engineering Is Different from Security
  Exploiting Antivirus Software Is Trivial
  Perform Audits
  Fuzzing
  Use Privileges Safely
  Reduce Dangerous Code in Parsers
  Improve the Safety of Update Services and Protocols
  Remove or Disable Old Code
  Summary

Index

Introduction

Welcome to The Antivirus Hacker's Handbook. With this book, you can increase your knowledge about antivirus products and reverse-engineering in general; while the reverse-engineering techniques and tools discussed in this book are applied to antivirus software, they can also be used with any other software products. Security researchers, penetration testers, and other information security professionals can benefit from this book. Antivirus developers will benefit as well because they will learn more about how antivirus products are analyzed, how they can be broken into parts, and how to prevent them from being broken or make them harder to break.

I want to stress that although this book is, naturally, focused on antivirus products, it also contains practical examples that show how to apply reverse-engineering, vulnerability discovery, and exploitation techniques to real-world applications.

Overview of the Book and Technology

This book is designed for individuals who need to better understand the functionality of antivirus products, regardless of which side of the fence they are on: offensive or defensive. Its objective is to help you learn when and how specific techniques and tools should be used and what specific parts of antivirus products you should focus on, based on the specific tasks you want to accomplish. This book is for you if any of the following statements are true:

■ You want to learn more about the security of antivirus products.
■ You want to learn more about reverse-engineering, perhaps with the aim of reverse-engineering antivirus products.
■ You want to bypass antivirus software.
■ You want to break antivirus software into pieces.
■ You want to write exploits for antivirus software.
■ You want to evaluate antivirus products.
■ You want to increase the overall security of your own antivirus products, or you want to know how to write security-aware code that will deal with hostile code.
■ You love to tinker with code, or you want to expand your skills and knowledge in the information security field.

How This Book Is Organized

The contents of this book are structured as follows:

■ Chapter 1, "Introduction to Antivirus Software": Guides you through the history of antivirus software to the present, and discusses the most typical features available in antivirus products, as well as some less common ones.
■ Chapter 2, "Reverse-Engineering the Core": Describes how to reverse-engineer antivirus software, with tricks that can be used to debug the software or disable its self-protection mechanisms. This chapter also discusses how to apply this knowledge to create Python bindings for Avast for Linux, as well as a native C/C++ tool and unofficial SDK for the Comodo for Linux antivirus.
■ Chapter 3, "The Plug-ins System": Discusses how antivirus products use plug-ins, how they are loaded, and how they are distributed, as well as the purpose of antivirus plug-ins.
■ Chapter 4, "Understanding Antivirus Signatures": Explores the most typical signature types used in antivirus products, as well as some more advanced ones.
■ Chapter 5, "The Update System": Describes how antivirus software is updated, how the update systems are developed, and how update protocols work. This chapter concludes by showing a practical example of how to reverse-engineer an easy update protocol.
■ Chapter 6, "Antivirus Software Evasion": Gives a basic overview of how to bypass antivirus software, so that files can evade detection. Some general tricks are discussed, as well as techniques that should be avoided.
■ Chapter 7, "Evading Signatures": Continues where Chapter 4 left off and explores how to bypass various kinds of signatures.
■ Chapter 8, "Evading Scanners": Continues the discussion of how to bypass antivirus products, this time focusing on scanners. This chapter looks at how to bypass some static heuristic engines, anti-disassembling, anti-emulation, and other "anti-" tricks, as well as how to write an automatic tool for portable executable file format evasion of antivirus scanners.
■ Chapter 9, "Evading Heuristic Engines": Finishes the discussion on evasion by showing how to bypass both static and dynamic heuristic engines implemented by antivirus products.
■ Chapter 10, "Identifying the Attack Surface": Introduces techniques used to attack antivirus products. This chapter will guide you through the process of identifying both the local and remote attack surfaces exposed by antivirus software.
■ Chapter 11, "Denial of Service": Starts with a discussion about performing denial-of-service attacks against antivirus software. This chapter discusses how such attacks can be launched against antivirus products both locally and remotely by exploiting their vulnerabilities and weaknesses.
■ Chapter 12, "Static Analysis": Guides you through the process of statically auditing antivirus software to discover vulnerabilities, including real-world vulnerabilities.
■ Chapter 13, "Dynamic Analysis": Continues with the discussion of finding vulnerabilities in antivirus products, but this time using dynamic analysis techniques. This chapter looks specifically at fuzzing, the most popular technique used to discover vulnerabilities today. Throughout this chapter, you will learn how to set up a distributed fuzzer with central administration to automatically discover bugs in antivirus products and be able to analyze them.
■ Chapter 14, "Local Exploitation": Guides you through the process of exploiting local vulnerabilities while putting special emphasis on logical flaws, backdoors, and unexpected usages of kernel-exposed functionality.
■ Chapter 15, "Remote Exploitation": Discusses how to write exploits for memory corruption issues by taking advantage of typical mistakes in antivirus products. This chapter also shows how to target update services and shows a full exploit for one update service protocol.
■ Chapter 16, "Current Trends in Antivirus Protection": Discusses which antivirus product users can be targeted by actors that use flaws in antivirus software, and which users are unlikely to be targeted with such techniques. This chapter also briefly discusses the dark world in which such bugs are developed.
■ Chapter 17, "Recommendations and the Possible Future": Concludes this book by making some recommendations to both antivirus users and antivirus vendors, and discusses which strategies can be adopted in the future by antivirus products.

Who Should Read This Book

This book is designed for individual developers and reverse-engineers with intermediate skills, although the seasoned reverse-engineer will also benefit from the techniques discussed here. If you are an antivirus engineer or a malware reverse-engineer, this book will help you to understand how attackers will try to exploit your software. It will also describe how to avoid undesirable situations, such as exploits for your antivirus product being used in targeted attacks against the users you are supposed to protect.

More advanced individuals can use specific chapters to gain additional skills and knowledge. As an example, if you want to learn more about writing local or remote exploits for antivirus products, proceed to Part III, "Analysis and Exploitation," where you will be guided through almost the entire process of discovering an attack surface, finding vulnerabilities, and exploiting them. If you are interested in antivirus evasion, then Part II, "Antivirus Software Evasion," is for you. So, whereas some readers may want to read the book from start to finish, there is nothing to prevent you from moving around as needed.

Tools You Will Need

Your desire to learn is the most important thing you have as you start to read this book. Although I try to use open-source "free" software, this is not always possible. For example, I used the commercial tool IDA in a lot of cases; because antivirus programs are, with only one exception, closed-source commercial products, you need to use a reverse-engineering tool, and IDA is the de facto one. Other tools that you will need include compilers, interpreters (such as Python), and some tools that are not open source but that can be freely downloaded, such as the Sysinternals tools.

What's on the Wiley Website

To make it as easy as possible for you to get started, some of the basic tools you will need are available on the Wiley website, which has been set up for this book at www.wiley.com/go/antivirushackershandbook.

Summary (From Here, Up Next, and So On)

The Antivirus Hacker's Handbook is designed to help readers become aware of what antivirus products are, what they are not, and what to expect from them; this information is not usually available to the public. Rather than discussing how antivirus products work in general, it shows real bugs, exploits, and techniques for real-world products that you may be using right now and provides real-world techniques for evasion, vulnerability discovery, and exploitation.
Learning how to break antivirus software not only helps attackers but also helps you to understand how antivirus products can be enhanced and how antivirus users can best protect themselves.

Part I Antivirus Basics

In This Part
Chapter 1: Introduction to Antivirus Software
Chapter 2: Reverse-Engineering the Core
Chapter 3: The Plug-ins System
Chapter 4: Understanding Antivirus Signatures
Chapter 5: The Update System

CHAPTER 1 Introduction to Antivirus Software

Antivirus software is designed to prevent computer infections by detecting malicious software, commonly called malware, on your computer and, when appropriate, removing the malware and disinfecting the computer. Malware, also referred to as samples in this book, can be classified into various kinds, namely, Trojans, viruses (infectors), rootkits, droppers, worms, and so on.

This chapter covers what antivirus (AV) software is and how it works. It offers a brief history of AV software and a short analysis of how it evolved over time.

What Is Antivirus Software?

Antivirus software is special security software that aims to give better protection than that offered by the underlying operating system (such as Windows or Mac OS X). In most cases, it is used as a preventive solution. However, when that fails, the AV software is used to disinfect the infected programs or to completely clean malicious software from the operating system.

AV software uses various techniques to identify malicious software, which often self-protects and hides deep in an operating system. Advanced malware may use undocumented operating system functionality and obscure techniques in order to persist and avoid being detected. Because of the large attack surface these days, AV software is designed to deal with all kinds of malicious payloads coming from both trusted and untrusted sources. Some malicious inputs that AV software tries to protect an operating system from, with varying degrees of success, are network packets, email attachments, and exploits for browsers and document readers, as well as executable programs running on the operating system.

Antivirus Software: Past and Present

The earliest AV products were simply called scanners because they were command-line scanners that tried to identify malicious patterns in executable programs. AV software has changed a lot since then. For example, many AV products no longer include command-line scanners. Most AV products now use graphical user interface (GUI) scanners that check every single file that is created, modified, or accessed by the operating system or by user programs. They also install firewalls to detect malicious software that uses the network to infect computers, install browser add-ons to detect web-based exploits, isolate browsers for safe payment, create kernel drivers for AV self-protection or sandboxing, and so on.

Since the old days of Microsoft DOS and other antiquated operating systems, software products have evolved alongside the operating systems, as is natural. However, AV software has evolved at a remarkable rate since the old days because of the incredible amount of malware that has been created. During the 1990s, an AV company would receive only a handful of malware programs in the space of a week, and these were typically file infectors (or viruses). Now, an AV company will receive thousands of unique malicious files (unique considering their cryptographic hash, like MD5 or SHA-1) daily. This has forced the AV industry to focus on automatic detection and on creating heuristics for detection of as-yet-unknown malicious software by both dynamic and static means. Chapters 3 and 4 discuss how AV software works in more depth.
The rapid evolution of malware and anti-malware software products is driven by a very simple motivator: money. In the early days, virus creators (also called vxers) used to write a special kind of file infector that focused on performing functions not previously done by others in order to gain recognition or just as a personal challenge. Today, malware development is a highly profitable business used to extort money from computer users, as well as steal their credentials for various online services such as eBay, Amazon, and Google Mail, as well as banks and payment platforms (PayPal, for example); the common goal is to make as much money as possible.

Some players in the malware industry can steal email credentials for your Yahoo or Gmail accounts and use them to send spam or malicious software to thousands of users in your name. They can also use your stolen credit card information to issue payments to other bank accounts controlled by them or to pay mules to move the stolen money from dirty bank accounts to clean ones, so their criminal activity becomes harder to trace.

Another increasingly common type of malware is created by governments, shady organizations, or companies that sell malware (spying software) to governments, who in turn spy on their own people's communications. Some software is designed to sabotage foreign countries' infrastructures. For example, the notorious Stuxnet computer worm managed to sabotage Iran's Natanz nuclear plant, using up to five zero-day exploits. Sabotage also occurs between competing countries and companies; one example is the cyberattack on Saudi Aramco, a sabotage campaign attributed to Iran that targeted the biggest oil company in Saudi Arabia.
Software can also be created simply to spy on government networks, corporations, or citizens; organizations like the National Security Agency (NSA) and Britain's Government Communications Headquarters (GCHQ), as well as hackers from the People's Liberation Army (PLA), engage in these activities almost daily. Two examples of surveillance software are FinFisher and Hacking Team. Governments, as well as law enforcement and security agencies, have purchased commercial versions of FinFisher and Hacking Team to spy on criminals, suspects, and their own citizens. An example that comes to mind is the Bahrain government, which used FinFisher software to spy on rebels who were fighting against the government.

Big improvements and the large amounts of money invested in malware development have forced the AV industry to change and evolve dramatically over the last ten years. Unfortunately, the defensive side of information security, where AV software lies, is always behind the offensive side. Typically, an AV company cannot detect malware that is as yet unknown, especially if the malware went through some quality assurance during its development. The reason is very simple: AV evasion is a key part of malware development, and for attackers it is important that their malware stay undetected as long as possible. Many commercial malware packages, both legal and illegal, are sold with a window of support time. During that support period, the malware product is updated so it bypasses detection by AV software or by the operating system. Alternatively, malware may be updated to address and fix bugs, add new features, and so on. AV software can itself be the target of an attack, as in the case of The Mask, government-sponsored malware that used a zero-day exploit targeting Kaspersky's products.

Antivirus Scanners, Kernels, and Products

A typical computer user may view the AV software as a simple software suite, but an attacker must be able to view the AV on a deeper level.
This chapter will detail the various components of an AV, namely, the kernel, command-line scanner, GUI scanner, daemons or system services, file system filter drivers, network filter drivers, and any other support utility that ships with it.

ClamAV, the only open-source AV software, is an example of a scanner. It simply performs file scanning to discover malicious software patterns, and it prints a message for each detected file. ClamAV does not disinfect or use a true (behavioral-based) heuristic system.

A kernel, on the other hand, forms the core of an AV product. For example, the core of ClamAV is the libclamav.so library. All the routines for unpacking executable programs, compressors, cryptors, protectors, and so on are in this library. All the code for opening compressed files, for iterating through all the streams in a PDF file, or for enumerating and analyzing the clusters in an OLE2 container file (such as a Microsoft Word document) is also in this library. The kernel is used by the scanner clamscan, by the resident (or daemon) clamd, or by other programs and libraries such as its Python bindings, which are called PyClamd.

NOTE AV products often use more than one AV core or kernel. For example, F-Secure uses its own AV engine and the engine licensed from BitDefender.

An antivirus product may not always offer third-party developers direct access to its core; instead, it may offer access to command-line scanners. Other AV products may not give access to command-line scanners; instead, they may only allow access to the GUI scanner or to a GUI program to configure how the real-time protection, or another part of the product, handles malware detection and disinfection. The AV product suite may also ship with other security programs, such as browsers, browser toolbars, drivers for self-protection, firewalls, and so on.
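When only the command-line scanner is exposed, a third-party tool typically drives it as a child process and parses what it prints. The following is a minimal sketch of that idea for clamscan, not an official ClamAV interface. It assumes clamscan is installed and on the PATH, and that each detection is reported on a line of the form "path: SignatureName FOUND". The helper names parse_clamscan_output and scan_path are hypothetical.

```python
import subprocess


def parse_clamscan_output(output: str) -> dict:
    """Map each detected path to the signature name clamscan printed for it."""
    detections = {}
    suffix = " FOUND"
    for line in output.splitlines():
        if line.endswith(suffix):
            # Split on the last ": " so colons inside the path are preserved.
            path, _, name = line[: -len(suffix)].rpartition(": ")
            detections[path] = name
    return detections


def scan_path(path: str, clamscan: str = "clamscan") -> dict:
    """Run the command-line scanner on a path and return its detections.

    Assumes the documented clamscan exit codes: 0 (clean), 1 (virus found),
    2 (error).
    """
    proc = subprocess.run(
        [clamscan, "--no-summary", "--infected", "--recursive", path],
        capture_output=True,
        text=True,
    )
    if proc.returncode not in (0, 1):
        raise RuntimeError("clamscan failed: " + proc.stderr.strip())
    return parse_clamscan_output(proc.stdout)
```

A wrapper like this only ever sees the scanner's verdicts; everything that produces them (unpacking, parsing, signature matching) stays hidden inside the kernel library.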
As you can see, the product is the whole software package the AV company ships to the customer, while the scanners are the tools used to scan files and directories, and the kernel includes the core features offered to higher-level software components such as the GUI or command-line scanners.

Typical Misconceptions about Antivirus Software

Most AV users believe that security products are bulletproof and that just installing AV software keeps their computers safe. This belief is not sound, and it is not uncommon to read comments in AV forums like, "I'm infected with XXX malware. How can it be? I have YYY AV product installed!"

To illustrate why AV software is not bulletproof, let's take a look at the tasks performed by modern AV products:

■ Discovering known malicious patterns and bad behaviors in programs
■ Discovering known malicious patterns in documents and web pages
■ Discovering known malicious patterns in network packets
■ Trying to adapt and discover new bad behaviors or patterns based on experience with previously known ones

You may have noticed that the word known is used in each of these tasks. AV products are not bulletproof solutions to combat malware because an AV product cannot identify what is unknown. Marketing material from various AV products may lead average users to think they are protected from everything, but this is unfortunately far from true. The AV industry is based on known malware patterns; an AV product cannot spot new unknown threats unless they are based on old known patterns (either behavioral or static), regardless of what the AV industry advertises.

Antivirus Features

All antivirus products share a set of common features, and so studying one system will help you understand another system.
The following is a short list of common features found in AV products:

■ The capability to scan compressed files and packed executables
■ Tools for performing on-demand or real-time file or directory scanning
■ A self-protection driver to guard against malware attacking the actual AV
■ Firewall and network inspection functionality
■ Command-line and graphical interface tools
■ A daemon or service
■ A management console

The following sections enumerate and briefly discuss some common features shared by most AV products, as well as more advanced features that are available only in some products.

Basic Features

An antivirus product should have some basic features and meet certain requirements in order to be usable. For example, a basic requirement is that the AV scanner and kernel should be fast and consume little memory.

Making Use of Native Languages

Most AV engines (except the old Malwarebytes software, which was not a full AV product) are written in non-managed/native languages such as C, C++, or a mix of both. AV engines must execute as quickly as possible without degrading the system's performance. Native languages fulfill these requirements because their compiled code runs natively on the host CPU at full speed. In the case of managed software, the compiled code is emitted into a bytecode format and requires an extra layer to run: a virtual machine interpreter embedded in the AV kernel that knows how to execute the bytecode. For example, Android DEX files, Java, and .NET-managed code all require some sort of virtual machine to run the compiled bytecode. This extra layer is what puts native languages ahead of managed languages. Writing code using native languages has its drawbacks, though.
Native code is harder to write, and it makes it easier to leak memory and system resources, cause memory corruption (buffer overflows, use-after-free, double-free), or introduce programming bugs that may have serious security implications. Neither C nor C++ offers any mechanism to protect against memory corruption in the way that managed or interpreted environments such as .NET, Python, and Lua do. Chapter 3 describes vulnerabilities in the parsers and reveals why this is the most common source of bugs in AV software.

Scanners

Another common feature of AV products is the scanner, which may be a GUI or command-line on-demand scanner. Such tools are used to scan whenever the user decides to check a set of files, directories, or the system's memory. There are also on-access scanners, more typically called residents or real-time scanners. The resident analyzes files that are accessed, created, modified, or executed by the operating system or other programs (like web browsers); it does this to prevent the infection of document and program files by viruses or to prevent known malware files from executing.

The resident is one of the most interesting components to attack; for example, a bug in the parser of Microsoft Word documents can expose the resident to arbitrary code execution after a malicious Word document is downloaded (even if the user doesn't open the file). A security bug in the AV's email message parser code may also trigger malicious code execution when a new email with a malicious attachment arrives and the temporary files are created on disk and analyzed by the on-access scanner. When these bugs are triggered, they can also be used for a denial-of-service attack, which makes the AV program crash or loop forever, thus disarming the antivirus temporarily or permanently until the user restarts it.

Signatures

The scanner of any AV product searches files or packets using a set of signatures to determine if the files or packets are malicious; it also assigns a name to each matched pattern.
The signatures are the known patterns of malicious files. Some typical, rather basic, signatures are consumed by simple pattern-matching techniques (for example, finding a specific string, like the EICAR string), CRCs (checksums), or MD5 hashes. Relying on cryptographic hashes, like MD5, works for only one specific file (as a cryptographic hash tries to identify just that file), while fuzzier signatures, such as a CRC computed over specific chunks of data (as opposed to a hash of the whole file), can identify various files.

AV products usually have different types of signatures, as described in Chapter 8. These signature types range from simple CRCs to rather complex heuristic patterns based on many features of the PE header, the complexity of the code at the entry point of the executable file, and the entropy of the whole file or of some section or segment in the executable file. Sometimes signatures are also based on the basic blocks discovered while performing code analysis from the entry point of the executable files under analysis, and so on.

Each kind of signature has advantages and disadvantages. For example, some signatures are very specific and less likely to be prone to a false positive (when a healthy file is flagged as malware), while others are very risky and can generate a large list of false positives. Imagine, for example, a signature that finds the word Microsoft anywhere in a file that starts with the bytes MZ\x90. This would match a large number of benign files as well as any malware it was written for, causing a long list of false positives. Signatures must be created with great care to avoid false positives, like the one in Figure 1-1, or false negatives (when true malware code is flagged as benign).
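The trade-off between specific and generic signatures described above can be sketched in a few lines of Python. This is a toy illustration only, not how any particular AV engine stores or matches its signatures. The sample bytes, the BODY pattern, and the function names are all made up for the example.

```python
import hashlib
import zlib

# Hypothetical "malicious" payload used only for this illustration.
BODY = b"HYPOTHETICAL-MALICIOUS-BODY"
sample = b"MZ\x90\x00" + BODY + b"\x00" * 16
variant = sample + b"junk appended by a repacker"      # same body, new file
benign = b"MZ\x90\x00" + b"Copyright Microsoft Corporation"


def md5_signature(data: bytes, known_md5: str) -> bool:
    """Whole-file hash: identifies exactly one file and nothing else."""
    return hashlib.md5(data).hexdigest() == known_md5


def chunk_crc_signature(data: bytes, offset: int, size: int, known_crc: int) -> bool:
    """CRC32 over one chunk: survives changes made outside that chunk."""
    chunk = data[offset:offset + size]
    return len(chunk) == size and zlib.crc32(chunk) == known_crc


def naive_signature(data: bytes) -> bool:
    """The risky pattern from the text: 'Microsoft' in a file starting MZ\\x90."""
    return data.startswith(b"MZ\x90") and b"Microsoft" in data


# "Signature database" entries derived from the original sample.
known_md5 = hashlib.md5(sample).hexdigest()
known_crc = zlib.crc32(BODY)

print(md5_signature(sample, known_md5))                    # exact file matches
print(md5_signature(variant, known_md5))                   # trivially evaded
print(chunk_crc_signature(variant, 4, len(BODY), known_crc))  # still matches
print(naive_signature(benign))                             # false positive
```

The whole-file hash misses the variant because a single appended byte changes the digest, while the chunk CRC still matches because the checked region is untouched. The naive string signature flags a harmless file, which is exactly the kind of false positive the text warns about.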
Figure 1-1: A false positive generated with Comodo Internet Security and the de facto reverse-engineering tool IDA

Compressors and Archives

Another key part of every AV kernel is the support for compressed or archived file formats: ZIP, TGZ, 7z, XAR, and RAR, to name just a few. AVs must be able to decompress and navigate through all the files inside any compressed or archived file, as well as compressed streams in PDF files and other file formats. Because AV kernels must support so many different file formats, vulnerabilities are often found in the code that deals with this variety of input. This book discusses various vulnerabilities that affect different AV products.

Unpackers

An unpacker is a routine or set of routines developed for unpacking protected or compressed executable files. Malware in the form of executables is commonly packed using freely available compressors and protectors or proprietary packers (obtained both legally and illegally). The number of packers an AV kernel must support is even larger than the number of compressors and archives, and it grows almost every month with the emergence of new packers used to hide the logic of new malware.

Some packer tools, such as UPX (the Ultimate Packer for eXecutables), just apply simple compression. Unpacking samples compressed by UPX is a very simple and straightforward matter. On the other hand, there are very complex software packers and protectors that transform the code to be packed into bytecode and then inject one or more randomly generated virtual machines into the executable so that it runs the original code that the malware author wrote. Getting rid of this virtualization layer and uncovering the logic of the malware is very hard and time-consuming.

Some packers can be unpacked using the CPU emulator of the AV kernel (a component that is discussed in the following sections); others are unpacked exclusively via static means.
Other more complex ones can be unpacked using both techniques: using the emulator up to some specific layer and then using a static routine that is faster than using the emulator when some specific values are known (such as the size of the encrypted data, the algorithm used, the key, and so on).

As with compressors and archives, unpackers are a very common area to explore when you are looking for vulnerabilities in AV software. The list of packers to be supported is immense and grows every year; some of them are used only during one specific malware campaign, so the code is likely written once and never verified or audited again.

Emulators

Most AV kernels on the market offer support for a number of emulators, with the only exception being ClamAV. The most common emulator in AV cores is the Intel x86 emulator. Some advanced AV products can offer support for AMD64 or ARM emulators. Emulators are not limited to regular CPUs, like Intel x86, AMD64, or ARM; there are also emulators for some virtual machines. For example, some emulators are aimed at inspecting Java bytecode, Android DEX bytecode, JavaScript, and even VBScript or Adobe ActionScript.

Fingerprinting or bypassing emulators and virtual machines used in AV products is an easy task: you just need to find some incongruities here and there. For example, for the Intel x86 emulator, it is unlikely, if not impossible, that the developers of the AV kernel would implement all of the instructions supported by the to-be-emulated CPUs in the same way the manufacturers of those specific CPUs do. For higher-level components that use the emulator, such as the execution environments for ELF or PE files, it is even less likely that the developers would implement the whole operating system environment or every API provided by the OS.
Therefore, it is really easy to discover many different ways to fool emulators and to fingerprint them. Many techniques for evading AV emulators are discussed in this book, as are techniques for fingerprinting them. Part III of this book covers writing exploits for a specific AV engine.

Miscellaneous File Formats

Developing an AV kernel is very complex. The previous sections discussed some of the common features shared by AV cores, and you can imagine the time and effort required to support these features. However, the situation is even worse than that: the kernel must also support a very long list of file formats in order to catch exploits embedded in the files. Some file formats (excluding compressors and archives) that come to mind are OLE2 containers (Word or Excel documents); HTML pages, XML documents, and PDF files; CHM help files and old Microsoft Help file formats; PE, ELF, and MachO executables; JPG, PNG, GIF, TGA, and TIFF image file formats; ICO and CUR icon formats; MP3, MP4, AVI, ASF, and MOV video and audio file formats; and so on.

Every time an exploit appears for some new file format, an AV engineer must add some level of support for this file format. Some formats are so complex that even their original authors may have problems handling them correctly; two examples are Microsoft and its Office file formats, and Adobe and its PDF format. So why would AV developers be expected to handle them better than the original authors, considering that they probably have no previous knowledge of the file format and may need to do some reverse-engineering work? As you can guess, this is the most error-prone area in any AV software and will remain so for a long time.

Advanced Features

The following sections discuss some of the most common advanced features supported by AV products.
Packet Filters and Firewalls

From the end of the 1990s up until around 2010, it was very common to see a new type of malware, called worms, that abused one or more remote vulnerabilities in some targeted software products. Sometimes these worms simply used default username-and-password combinations to infect network shares in Windows CIFS networks by copying themselves with catchy names. Famous examples are "I love you," Conficker, Melissa, Nimda, Slammer, and Code Red.

Because many worms used network resources to infect computers, the AV industry decided to inspect incoming and outgoing network traffic. To do so, AV software installed drivers for network traffic analysis, and firewalls for blocking and detecting the most common known attacks. As with the previously mentioned features, this is a good source of bugs. Because worms are almost gone today, this feature has not been updated in years in many AV products; as a result, it is likely suffering from a number of vulnerabilities because it has been practically abandoned. This is one of the remotely exposed attack surfaces that are analyzed in Chapter 11.

Self-Protection

As AV software tries to protect computer users from malware, the malware also tries to protect itself from the AV software. In some cases, the malware will try to kill the processes of the installed AV product in order to disable it. Many AV products implement self-protection techniques in kernel drivers to prevent the most common killing operations, such as issuing a call to ZwTerminateProcess. Other self-protection techniques used by AV software can be based on denying calls to OpenProcess with certain parameters for their AV processes or preventing WriteProcessMemory calls, which are used to inject code in a foreign process.

These techniques are usually implemented with kernel drivers; the protection can also be implemented in userland.
However, relying on code running in userland is a failing protection model that has been known not to work since around 2000; even so, many AV products still make this mistake. Various AV products that experience this problem are discussed in Part III of this book.

Anti-Exploiting

Operating systems, including Windows, Mac OS X, and Linux, now offer anti-exploiting features, also referred to as security mitigations, like Address Space Layout Randomization (ASLR) and Data Execution Prevention (DEP), but this is a recent development. This is why some AV suites offer (or used to offer) anti-exploiting solutions. Some anti-exploiting techniques can be as simple as enforcing ASLR and DEP for every single program and library linked to the executable, while other techniques are more complex, like user- or kernel-land hooks to determine if some action is allowed for some specific process.

Unfortunately, as is common with AV software, most anti-exploiting toolkits offered by the AV industry are implemented in userland via function hooking; the Malwarebytes anti-exploiting toolkit is one example. With the advent of the Microsoft Enhanced Mitigation Experience Toolkit (EMET), most anti-exploiting toolkits implemented by the AV industry either are incomplete compared to it or are simply not up to date, making them easy to bypass.

In some cases, using anti-exploiting toolkits implemented by some AV companies is even worse than not using any anti-exploiting toolkit at all. One example is the Sophos Buffer Overflow Protection System (BOPS), an ASLR implementation. Tavis Ormandy, a prolific researcher working for Google, discovered that Sophos installed a system-wide Dynamic Link Library (DLL) without ASLR being enabled. This system-wide DLL was injected into processes in order to enforce and implement a faux ASLR for operating systems without ASLR, like Windows XP.
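Whether a DLL opts in to ASLR and DEP is recorded in the DllCharacteristics field of its PE optional header, so a check of this kind can be sketched in a few lines. This is an illustrative sketch, not the method the researcher used. The function name pe_mitigations is hypothetical, while the two flag values come from the documented PE format.

```python
import struct

IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040  # image may be relocated (ASLR)
IMAGE_DLLCHARACTERISTICS_NX_COMPAT = 0x0100     # image is DEP compatible


def pe_mitigations(data: bytes) -> dict:
    """Report the ASLR and DEP opt-in flags of a PE image held in memory."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    # e_lfanew, at offset 0x3C of the DOS header, points to the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and 20-byte COFF header.
    optional = e_lfanew + 24
    (magic,) = struct.unpack_from("<H", data, optional)
    if magic not in (0x10B, 0x20B):  # PE32 or PE32+
        raise ValueError("unknown optional header magic")
    # DllCharacteristics sits at offset 70 in both PE32 and PE32+ layouts.
    (flags,) = struct.unpack_from("<H", data, optional + 70)
    return {
        "aslr": bool(flags & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE),
        "dep": bool(flags & IMAGE_DLLCHARACTERISTICS_NX_COMPAT),
    }
```

Typical usage would be pe_mitigations(open(path, "rb").read()) for each module loaded into a process; a single module reporting "aslr": False gives an attacker code at a predictable address.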
Ironically, this system-wide DLL was itself compiled without ASLR support; as a result, in operating systems offering ASLR, like Windows Vista, ASLR was effectively disabled because this DLL was not ASLR enabled. More problems with toolkit implementations in AV software are discussed in Part IV of this book.

Summary

This introductory chapter talked about the history of antiviruses, various types of malware, and the evolution of both the AV industry and the skills of malware writers, who always seem to be ahead of the game. In the second part of this chapter, the antivirus suite was dissected, and its various basic and advanced features were explained in an introductory manner, paving the way for more detailed explanation in the subsequent chapters of the book.

In summary:

■ Back in the old days when the AV industry was in its infancy, the AVs were called scanners because they were made of command-line scanners and a signature database. As the malware evolved, so did the AV. AV software now includes heuristic engines and aims at protecting against browser exploits, network packets, email attachments, and document files.
■ There are various types of malicious software, such as Trojans, malware, viruses, rootkits, worms, droppers, exploits, shellcode, and so on.
■ Black hat malware writers are motivated by monetary gains and intellectual property theft, among other motivations.
■ Governments also participate in writing malware in the form of spying or sabotage software. Often they write malware to protect their own interests, as when the Bahrain government used the FinFisher software to spy on rebels, or to sabotage other countries' infrastructures, as in the case of the Stuxnet malware that was allegedly co-written by the U.S. and the Israeli governments to target the Iranian nuclear program.
■ AV products are well marketed using all sorts of buzzwords. This marketing strategy can be misleading and gives average users a false sense of security.
■ AV software is a system made of the core or the kernel, which orchestrates the functionality between all the other components: plug-ins, system services, file system filter drivers, kernel AV components, and so on.
■ AV software needs to run fast. Languages that compile into native code are the best choice because the resulting code runs natively on the platform without the overhead of interpreters (such as VM interpreters). Some parts of the AV can be written using managed or interpreted languages.
■ AV software is made up of basic features such as the core or the kernel, the scanning engine, signatures, decompressors, emulators, and support for various file format parsing. Additionally, AV products may offer some advanced features, such as packet inspection capabilities, browser security add-ons, self-protection, and anti-exploitation.

The next chapter starts discussing how to reverse-engineer AV kernels for the sake of automated security testing and fuzzing. Fuzzing is just one way to detect security bugs in antiviruses.

CHAPTER 2

Reverse-Engineering the Core

The core of an antivirus product is the internal engine, also known as the kernel. It glues together all important components of the AV while providing supporting functionality for them. For example, the scanners use the API exported by the core to analyze files, directories, and buffers, as well as to launch other analysis types.

This chapter discusses how you can reverse-engineer the core of an antivirus product, what features are interesting from an attacker's viewpoint, and some techniques to make the reverse-engineering process easier, especially when the antivirus software tries to protect itself against being reverse-engineered.
By the end of the chapter, you will use Python to write a standalone tool that interfaces directly with the core of an AV product, thus enabling you to perform fuzzing or automated testing of your evasion techniques.

Reverse-Engineering Tools

The de facto tool for reverse-engineering is the commercial IDA disassembler. During the course of this book, it is assumed that you have a basic knowledge of IDA because you will be using it for static and dynamic analysis tasks. Other tools that this chapter covers are WinDbg and GDB, which are the standard debuggers for Windows and Linux, respectively. The examples will also use Python for automating typical reverse-engineering tasks, both from inside IDA (using the IDAPython plug-in) and in standalone scripts that do not rely on third-party plug-ins.

Because this chapter covers malware and researching AV evasion techniques, it is recommended that you use virtualization software (such as VMware, VirtualBox, or even QEMU) and carry out the experimentation in a safe, virtualized environment. As you will see in the following sections, debugging symbols will be helpful to you when they are present, and the Linux version of an AV is most likely to have debugging symbols shipped with it. For the rest of the book, it is recommended that you keep two virtual machines handy, one with Windows and the other with Linux, in case you want to do hands-on experimentation.

Command-Line Tools versus GUI Tools

All current antivirus products offer some kind of GUI for configuring them, viewing results, setting up scheduled scans, and so on. The GUI scanners are typically too dense to reverse-engineer because they interact not only with the antivirus kernel but also with many other components. Simply trying to discern which code handles GUI painting, refreshing, window events, and so on is a significant task that involves both static and dynamic work.
Fortunately, some of today's antivirus products offer independent command-line scanners. Command-line tools are smaller than their GUI counterparts and are often self-contained, making them the most interesting target to start the reverse-engineering process.

Some AV software is designed to run in a centralized server, and therefore the scanning core is used by the server component rather than by the command-line tools or the GUIs. In such cases, the server will expose a communication protocol for the command-line tools to connect to and interface with. That does not mean that the server component has to exist in its own machine; instead, it can still run locally as a system service. For example, Avast for Linux and Kaspersky antivirus products have a server, and the GUIs or command-line scanners connect to it, issue the scan queries through it, and then wait for the results. In such cases, if you attempt to reverse-engineer the command-line tool, you will only learn about the communication protocol, or if you are lucky, you may find remote vulnerabilities in the servers, but you will not be able to understand how the kernel works. To understand how the kernel works, you have to reverse-engineer the server component, which, as mentioned before, is hosting the kernel. In the following sections, the server component from Avast AV for Linux will be used as an example.

Debugging Symbols

On the Windows platform, it is unusual for products to ship with the corresponding debugging symbols. On the other hand, on Unix-based operating systems, debugging symbols often ship with third-party products (usually embedded in the binaries). The lack of debugging symbols makes reverse-engineering of the core of the antivirus product or any of its components a difficult task at first because you do not have function or label names that correspond to the disassembly listing.
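Before opening a Linux binary in a disassembler, it can be useful to check whether it still carries embedded symbols. The following sketch walks the ELF section headers and reports whether a full symbol table (.symtab, normally removed by strip) or only the dynamic symbol table (.dynsym) is present. The function name elf_symbol_sections is hypothetical, and the parser deliberately ignores corner cases such as extended section numbering.

```python
import struct

SHT_SYMTAB = 2    # full symbol table, removed when a binary is stripped
SHT_DYNSYM = 11   # dynamic symbols (imports and exports), survive stripping


def elf_symbol_sections(data: bytes) -> dict:
    """Report which symbol-table sections an ELF image contains."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    is_64bit = data[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"     # EI_DATA: 1 = little endian
    if is_64bit:
        (e_shoff,) = struct.unpack_from(endian + "Q", data, 0x28)
        e_shentsize, e_shnum = struct.unpack_from(endian + "HH", data, 0x3A)
    else:
        (e_shoff,) = struct.unpack_from(endian + "I", data, 0x20)
        e_shentsize, e_shnum = struct.unpack_from(endian + "HH", data, 0x2E)

    found = {"symtab": False, "dynsym": False}
    for index in range(e_shnum):
        header = e_shoff + index * e_shentsize
        # sh_type is the second 32-bit field in both section header layouts.
        (sh_type,) = struct.unpack_from(endian + "I", data, header + 4)
        if sh_type == SHT_SYMTAB:
            found["symtab"] = True
        elif sh_type == SHT_DYNSYM:
            found["dynsym"] = True
    return found
```

A result with "symtab": True suggests that function names for internal routines are available; a library that only has "dynsym" still exposes the names of its exported and imported functions, which is often a useful starting point.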
As you will see, there are tricks and tools that may help you discover some or all of the symbols for your target antivirus product.

When an AV product exists for various platforms, it does not make sense for the company to have different source code for these different platforms. As such, in multi-platform AV products, it is very common for the kernel to share all or some of the source code base between the various platforms. In those situations, when you reverse the core on one platform, reversing it on another platform becomes easier, as you shall see.

There are exceptions to this. For example, the AV product may not have a core for a certain platform (say, for Mac OS X) and may license it from another AV vendor. The AV vendor may decide to integrate another existing product's kernel into its own product so it only needs to change names, copyright notices, and other resources such as strings, icons, and images. This is the case with the Bitdefender product and its engine, for which many companies purchase licenses.

Returning to the original question of how to get a partial or full understanding of how the executable images work, you need to check whether the product you want to analyze offers any version for Unix-based operating systems (Linux, BSD, or Mac OS X) and hope that the symbols are embedded in the binaries. If you are lucky, you will have symbols on that platform, and because the core is most likely the same between different operating system versions (with a few differences such as the use of OS-specific APIs and runtime libraries), you will be able to transfer the debugging symbols from one platform to another.

Tricks for Retrieving Debugging Symbols

Having established that on Unix-based operating systems you are more likely to have debugging symbols for AV products, this section uses the F-Secure antivirus products as an example. Consider the fm library (fm4av.dll in Windows, and libfm-lnx32.so in Linux).
The Windows version does not have debugging symbols for that library, but the Linux version includes many symbols for this and other binaries. Figure 2-1 shows the functions list discovered by IDA for the Windows version.

Figure 2-1: F-Secure for Windows library fm4av.dll as displayed in IDA

Figure 2-2 shows the functions list with meaningful names, pulled by IDA from the embedded symbols in the binary, for the very same library but for the Linux version.

Figure 2-2: F-Secure for Linux library libfmx-linux32.so as seen in IDA

Considering that antivirus kernels are almost identical across platforms, with only a few exceptions, you can start by reverse-engineering the Linux version. The functionality will be similar in the Windows version. You can port the symbols from the Linux version to the Windows version using third-party commercial binary diffing products such as zynamics BinDiff. You can perform the bindiffing on both libraries and then import the matched symbols from the Linux version to the Windows version by right-clicking the Matched Functions tab and selecting Import Functions and Comments (see Figure 2-3).

Figure 2-3: Importing symbols from Linux to Windows

In many situations, unlike the case of F-Secure, which has partial symbols, you may retrieve full symbols with variable and even label names. In those cases, the same techniques can be applied. Figure 2-4 shows a section of disassembly of one library of Comodo Antivirus for Linux with full symbols.

Figure 2-4: Disassembly of Comodo for Linux library libPE32.so showing full symbols

Porting symbols between operating systems is not 100-percent reliable for various reasons. For example, different compilers are used for Windows, Linux, BSD, and Mac OS X.
While on Unix-based platforms, GCC (and sometimes Clang) is the most used compiler, this is not the case for Windows, where the Microsoft compiler is used. This means that the very same C or C++ code will generate different assembly code on each platform, making it more difficult to compare functions and port symbols. There are other tools for porting symbols, like the open-source IDA plug-in Diaphora, created by Joxean Koret, one of the authors of this book, which uses the Abstract Syntax Tree (AST) generated by the Hex-Rays decompiler to compare function graphs, among other techniques.

Debugging Tricks

The previous sections focused exclusively on using static analysis techniques to get information from the antivirus product you want to reverse-engineer. This section focuses on dynamic analysis approaches to reverse-engineering the antivirus product of your choice. Antivirus products, like malware, generally try to prevent reverse-engineering. The AV executable modules can be obfuscated, sometimes even implementing different obfuscation schemes for each binary (as in the case of the Avira kernel). The AV executables may also implement anti-debugging tricks. These tricks are designed to make it more difficult for a researcher to debug the components of an antivirus and get a real idea of how they detect malware or how some specific parser bug can be exploited, leading to attacker-controlled code execution.

The following sections offer some advice for debugging antivirus software. All the debugging tips and tricks focus exclusively on Windows because no antivirus has been observed trying to prevent itself from being debugged on Linux, FreeBSD, or Mac OS X.
Backdoors and Configuration Settings

While antivirus products generally prevent you from attaching to their services with a debugger, this protection is not difficult to bypass when you employ reverse-engineering techniques. The self-protection mechanisms (as the antivirus industry calls them) are usually meant to prevent malware from attaching to an antivirus service, creating a thread in the context of the antivirus software, or killing the antivirus processes (a common task in malware products). They are not meant to prevent users from disabling the antivirus in order to debug it or to do whatever they want with it. Actually, it would make no sense to prevent users from disabling (or uninstalling) the product.

Disabling the self-protection mechanism of the antivirus product is one of the first steps you must carry out to start any dynamic analysis task where a debugger is involved, unless there is a self-contained command-line analysis scanner (as in the cases of the Avira scancl tool or the Ikarus t3 Scan tool). Command-line scanners do not usually try to protect themselves because, by their nature, they are not resident and are invoked on demand.

The methods to disable the antivirus self-protection mechanism are not commonly documented because, from the point of view of the antivirus companies, this information is only relevant to the support and engineering people: they actually need to debug the services and processes to determine what is happening when a customer reports a problem. This information is not made public because a malware writer could use it to compromise a machine running the antivirus software. Most often, modifying a single registry value enables you to debug the AV services. Sometimes a programmer backdoor may allow you to temporarily disable the self-protection mechanism, as in the case of the old versions of Panda Global Protection.
Panda provided a library, called pavshld.dll (Panda Antivirus Shield), which exported one function that received only one parameter: a secret GUID. When passed, this GUID disabled the antivirus software. While there is no tool to call this function, you could easily create a tool to load this library dynamically and then call this function with the secret key, thereby disabling Panda's shield and allowing you to start performing dynamic analysis tasks with OllyDbg, IDA, or your favorite debugger. This vulnerability is discussed more in Chapter 14.

The self-protection mechanisms of an antivirus product can be implemented in userland by hooking special functions and implementing anti-debugging tricks. In kernel-land, they can be implemented using a device driver. Today's antivirus software generally implements self-protection mechanisms using kernel drivers. The latter is the correct approach, because relying on userland hooks would be a bad decision for many reasons, the simplest of which is that the hooks can be simply removed from userland processes, as discussed in Chapter 9.

If a kernel-land driver is used for the sole purpose of protecting the AV from being disabled, then it may be sufficient for you to simply prevent the kernel driver from loading, which would thus disable the self-protection mechanism. To disable kernel drivers or system services under Windows, you would simply need to open the registry editor tool (regedit.exe), go to HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services, search for any driver installed by the appropriate antivirus product, and patch the appropriate registry value. For example, say that you want to disable the self-protection mechanism (called "anti-hackers") on the Chinese antivirus product Qihoo 360.
You would need to change the Start value for the 360AntiHacker driver (360AntiHacker.sys) to 4 (see Figure 2-5), which corresponds to the SERVICE_DISABLED constant in the Windows SDK. Changing the service start type to this value simply means that it is disabled and will not be loaded by Windows. After changing this value, you may need to reboot.

Figure 2-5: How to disable the 360AntiHacker driver

It is worth mentioning that the antivirus is likely going to forbid you from disabling the driver with an "Access Denied" error message or another less meaningful message. If this occurs, you can reboot Windows in safe mode, disable the driver, and then reboot again in normal mode.

Some antivirus products may have a single driver that implements core functionality in addition to the self-protection mechanism. In that case, disabling the driver will simply prevent the antivirus from working correctly because higher-level components may need to communicate with the driver. If this occurs, you only have one option: kernel debugging.

Kernel Debugging

This section focuses on how to use a kernel debugger to debug both the antivirus drivers and the user-mode processes. Kernel debugging is the least painful method of attaching a debugger to an antivirus process, while avoiding all the user-mode anti-debugging tricks. Instead of disabling the antivirus drivers that perform self-protection, you debug the entire operating system and attach, when required, to the desired userland process. This task must be performed using one of the debuggers (WinDbg or Kd) from the Debugging Tools for Windows package or the WDK (see Figure 2-6).

Figure 2-6: The WinDbg debugger

To perform kernel debugging, you need to create a virtual machine with either the commercial VMware product or the open-source VirtualBox. The examples in this book use VirtualBox because it is free.
After creating a virtual machine with Windows 7 or any later version, you need to configure the operating system boot options to allow kernel debugging. In the old days of Windows XP, Windows 2000, and so on, you could perform kernel debugging by editing the file c:\boot.ini. Since Windows Vista, you need to use the bcdedit tool. To accomplish that, just open a command prompt (cmd.exe) with elevated privileges (run as administrator), and then execute the following two commands:

    $ bcdedit /debug on
    $ bcdedit /dbgsettings serial debugport:1 baudrate:115200

The first command enables kernel debugging for the current operating system. The second command sets the global debug settings to serial communications, using the port COM1 and a baud rate of 115,200, as shown in Figure 2-7.

Figure 2-7: Setting up kernel debugging on Windows 7 with bcdedit

After successfully configuring debugging for the current operating system, you need to shut down the current virtual machine to set up the remaining configuration settings, this time, from VirtualBox:

1. Right-click the virtual machine, select Settings, and, in the dialog box that appears, click Serial Ports on the left side.
2. Check the Enable Serial Port option, select COM1 at Port Number, and then select Host Pipe from the drop-down menu for Port Mode.
3. Check the Create Pipe option, and enter the following path in the Port/File Path field: \\.\pipe\com_1 (as shown in Figure 2-8).
4. After you have correctly completed the previous steps, reboot the virtual machine and select the operating system that says "Debugger Enabled" in its description. Voilà! You can now debug both kernel drivers and user-mode applications without worrying about the self-protection mechanism of the corresponding antivirus software.

Figure 2-8: Setting up debugging in VirtualBox

NOTE These steps assume that you are working in a Windows host running VirtualBox. Setting up kernel debugging for Windows in a Linux or Mac OS X host is a problematic process that, at the very least, requires two virtual machines and is largely dependent on the host operating system version. Although you can set up kernel debugging in a Linux or Mac OS X host with both VMware and VirtualBox, this can be very difficult. It is recommended that, when possible, you use a Windows host to perform kernel debugging.

Debugging User-Mode Processes with a Kernel-Mode Debugger

It is also possible with a kernel-mode debugger to debug just user-mode processes instead of the kernel. To do so, you have to connect the kernel debugger (WinDbg, for example) and type commands that allow the debugger to switch the current execution context to the execution context of the desired process. The required steps are listed here:

1. Open WinDbg in an elevated command prompt, and select File→Kernel Debug from the main menu.
2. In the dialog box, go to the COM tab and enter the value of the Port or File you set previously. Check the Pipe option.
3. Configure the symbols path to point to the remote Microsoft symbol server and instruct WinDbg to reload the symbols by issuing the following commands:

    .sympath srv*http://msdl.microsoft.com/download/symbols
    .reload

After you set the symbols path, WinDbg will be able to debug with the help of the public symbols. This example uses the F-Secure retail antivirus for Windows; you want to debug its user-mode service, F-Secure Scanner Manager 32-bit (fssm32.exe). To do this from WinDbg in kernel mode, you need to list all the processes running in the debugged host, search for the actual process to debug, switch the current execution context, and then start debugging.
To list all the user-mode processes from kernel mode, execute the following command:

    > !process 0 0

You can filter the results by process name by appending the name of the process to the end of the command, as shown here:

    > !process 0 0 fssm32.exe
    PROCESS 868c07a0 SessionId: 0 Cid: 0880 Peb: 7ffdf000 \
        ParentCid: 06bc
        DirBase: 62bb7000 ObjectTable: a218da58 HandleCount: 259.
        Image: fssm32.exe

The value after PROCESS in the output, 868c07a0, is the address of an EPROCESS structure. Pass this EPROCESS address to the following command: .process /r /p 868c07a0. The /p modifier makes the debugger translate the page table entries of the target process, and /r reloads the user-mode symbols once the process context is set, so you can debug the fssm32.exe process after running this command:

    lkd> .process /r /p 868c07a0
    Implicit process is now 868c07a0
    Loading User Symbols
    ..............................................

After the context switch takes place, you can list all the user-mode libraries loaded by this process with the command lm:

    lkd> lm
    start    end      module name
    00400000 00531000 fssm32 (deferred)
    006d0000 006ec000 fs_ccf_id_converter32 (deferred)
    00700000 0070b000 profapi (deferred)
    00750000 00771000 json_c (deferred)
    007b0000 007cc000 bdcore (deferred)
    00de0000 00e7d000 fshive2 (deferred)
    01080000 010d2000 fpiaqu (deferred)
    01e60000 01e76000 fsgem (deferred)
    02b20000 02b39000 sechost (deferred)
    07f20000 07f56000 daas2 (deferred)
    0dc60000 0dc9d000 fsuss (deferred)
    0dce0000 0dd2b000 KERNELBASE (deferred)
    10000000 10008000 hashlib_x86 (deferred)
    141d0000 14469000 fsgeme (deferred)
    171c0000 17209000 fsclm (deferred)
    174b0000 174c4000 orspapi (deferred)
    178d0000 17aad000 fsusscr (deferred)
    17ca0000 1801e000 fsecr32 (deferred)
    20000000 20034000 fsas (deferred)
    21000000 2101e000 fsepx32 (deferred)
    (...)

Now you can debug user-mode processes from kernel mode.
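If you save the debugger output to a log file, the lookup step above can be scripted. The following Python sketch pulls the EPROCESS address for a given image name out of `!process 0 0` output and builds the matching `.process` command. The helper names (`find_eprocess`, `process_switch_command`) are hypothetical and not part of WinDbg, and the regular expression only assumes the output layout shown above; other debugger versions may format it differently.

```python
import re

# Matches "PROCESS <address> ... Image: <name>" in "!process 0 0" output.
# The backtick is allowed because 64-bit addresses are printed as
# xxxxxxxx`xxxxxxxx. This layout is an assumption based on the sample above.
PROCESS_RE = re.compile(
    r"PROCESS\s+([0-9a-fA-F`]+)\s.*?Image:\s+(\S+)", re.DOTALL
)


def find_eprocess(output, image_name):
    """Return the EPROCESS address (as text) for image_name, or None."""
    for match in PROCESS_RE.finditer(output):
        address, image = match.groups()
        if image.lower() == image_name.lower():
            return address
    return None


def process_switch_command(output, image_name):
    """Build the .process command that switches to image_name's context."""
    address = find_eprocess(output, image_name)
    if address is None:
        return None
    return ".process /r /p %s" % address


if __name__ == "__main__":
    sample = (
        "PROCESS 868c07a0 SessionId: 0 Cid: 0880 Peb: 7ffdf000 "
        "ParentCid: 06bc\n"
        "    DirBase: 62bb7000 ObjectTable: a218da58 HandleCount: 259.\n"
        "    Image: fssm32.exe\n"
    )
    print(process_switch_command(sample, "fssm32.exe"))
```

The resulting command string can then be pasted into the debugger prompt or fed to whatever debugger automation you already use.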
If you would like to learn more debugging tricks for WinDbg, it is highly recommended that you read Chapter 4 in Practical Reverse Engineering (Dang, Gazet, Bachaalany, and Josse 2014; Wiley, ISBN-13: 978-1-118-78731-1).

Analyzing AV Software with Command-Line Tools

Sometimes, you may be lucky enough to find a completely self-contained command-line tool. If this is the case, you don't need to mess with the antivirus in order to disable the protection mechanism or to set up kernel debugging. You can use any debugger you want to dynamically analyze the core of the antivirus product. There are various types of antivirus software for Windows that offer such command-line tools (Avira and Ikarus are two examples). However, many antivirus products do not offer any independent command-line tool for Windows because either they dropped this feature or it is exclusively used by the engineers or the support people. If that is the case, you may want to find out which other operating systems are supported. If there is a Linux, BSD, or Mac OS X version, odds are that there is an independent, self-contained command-line tool that you can debug. This is the case with Avira, Bitdefender, Comodo, F-Secure, Sophos, and many others.

Debugging the command-line tool does not mean you are going to always debug it interactively with a tool such as WinDbg, IDA, OllyDbg, or GDB. You may want to write fuzzers using a debugging interface, such as the LLDB bindings, Vtrace debugger (from Kenshoto), or PyDbg and WinAppDbg Python APIs.

NOTE A fuzzer, or fuzz-testing tool, is a program written with the intent to feed a given program invalid or unexpected input. Depending on the program you are fuzzing, the input may vary. For example, when fuzzing an antivirus, you feed the AV modified or incomplete samples.
The goal of fuzzers will vary, from finding software bugs or software security bugs, to discovering how a program operates under certain input, and so on. In order to write fuzzers, you need a way to automate the task of modifying the input and then feeding it to the program to be fuzzed. Usually fuzzers run hundreds, if not thousands, of input mutations (modifications to the inputs) before they catch noteworthy bugs.

Porting the Core

This section discusses how to decide what platform and tools to automate. Choosing the appropriate operating system for automation and the right tool from the AV to be emulated puts you on the right path for your reverse-engineering and automation efforts.

For automation in general or fuzz automation, the best operating systems are Unix based, especially Linux because it requires less memory and disk space and offers a plethora of tools to automate tasks. In general, it is easier to run a set of Linux-based virtual machines with QEMU, KVM, VirtualBox, or VMware than to do the same with a set of Windows virtual machines. Because of this, it is recommended that you run the fuzzing automations with antivirus software in Linux.

Antivirus companies, like regular software companies, usually try to target popular operating systems such as Windows. If the antivirus product does not have a Linux version, but only Windows versions, it will still be possible to run the Windows version of the AV scanner using the Wine (Wine Is Not an Emulator) compatibility layer, at almost native speed. Wine is best known for running Windows binaries in non-Windows operating systems, such as Linux. Winelib (Wine's supporting library), on the other hand, can be used to port Windows-specific applications to Linux.
Some example applications that were successfully ported to Linux using Winelib were Picasa (an image viewer for organizing and editing digital photos, created by Google), Kylix (a compiler and integrated development environment once available from Borland but later discontinued), WordPerfect9 for Linux from Corel, and WebSphere from IBM. The idea behind using Wine or Winelib is that you can choose to run Windows-only command-line tools using Wine or reverse-engineer the core libraries to write a C or C++ wrapper for Linux, using Winelib, that invokes functions exported by a Windows-only DLL.

Both mechanisms can be used successfully to run automations with, for example, the Windows-only command-line tool Ikarus t3 Scan (as shown in Figure 2-9) and the mpengine.dll library used by the Microsoft Security Essentials antivirus product (again, exclusive to Windows). This option is recommended when there is no other way to automate the process of running the targeted antivirus product under Linux because the automation in Windows environments is too complex or requires excessive resources.

Figure 2-9: Ikarus t3 Scan running in Linux with Wine

A Practical Example: Writing Basic Python Bindings for Avast for Linux

This section gives you a practical example of how to reverse-engineer an antivirus component to create bindings. In short, when bindings are discussed here, they refer to writing tools or libraries that you can plug in to your fuzzers. The idea is that once you can interact with your own tools instead of with the tools supplied by the antivirus vendor, you can automate other tasks later (such as creating your own scanner or fuzzer). This example uses Avast antivirus for Linux as a target and the Python language as the automation language. This antivirus version is so simple that reverse-engineering it with the aim of writing bindings should take only an hour or two.
A Brief Look at Avast for Linux

Avast for Linux has only two executables: avast and scan. The first executable is the server process responsible for unpacking the virus database file (the VPS file), launching scans, querying URLs, and so on. The second executable is the client tool to perform these queries. Incidentally, the distributed binaries contain partial symbols, as shown in Figure 2-10, which shows the client tool scan.

Figure 2-10: A list of functions and disassembly of the scan_path function in the "scan" tool from Avast

Thanks to the partial symbols, you can start analyzing the file with IDA and easily determine what it does. Start with the main function:

    .text:08048930 ; int __cdecl main(int argc, const char **argv, const char **envp)
    .text:08048930 public main
    .text:08048930 main proc near ; DATA XREF: _start+17 o
    .text:08048930
    .text:08048930 argc = dword ptr 8
    .text:08048930 argv = dword ptr 0Ch
    .text:08048930 envp = dword ptr 10h
    .text:08048930
    .text:08048930 push ebp
    .text:08048931 mov ebp, esp
    .text:08048933 push edi
    .text:08048934 push esi
    .text:08048935 mov esi, offset src ; "/var/run/avast/scan.sock"
    .text:0804893A push ebx
    .text:0804893B and esp, 0FFFFFFF0h
    .text:0804893E sub esp, 0B0h
    .text:08048944 mov ebx, [ebp+argv]
    .text:08048947 mov dword ptr [esp+28h], 0
    .text:0804894F mov dword ptr [esp+20h], 0
    .text:08048957 mov dword ptr [esp+24h], 0
    .text:0804895F
    .text:0804895F loc_804895F: ; CODE XREF: main+50 j
    .text:0804895F ; main+52 j ...
    .text:0804895F mov eax, [ebp+argc]
    .text:08048962 mov dword ptr [esp+8],offset shortopts ; "hvVfpabs:e:"
    .text:0804896A mov [esp+4], ebx ; argv
    .text:0804896E mov [esp], eax ; argc
    .text:08048971 call _getopt
    .text:08048976 test eax, eax
    .text:08048978 js short loc_8048989
    .text:0804897A sub eax, 3Ah ; switch 61 cases
    .text:0804897D cmp eax, 3Ch
    .text:08048980 ja short loc_804895F
    .text:08048982 jmp ds:off_804A5BC[eax*4] ; switch jump

At address 0x08048935, there is a pointer to the C string /var/run/avast/scan.sock, which is loaded into the ESI register. Later on, there is a call to the function getopt with the string hvVfpabs:e:. These are the options that the scan tool supports, and the previous path is the Unix socket that the client tool needs to connect to. You can verify it later on, at the address 0x08048B01:

    .text:08048B01 lea edi, [esp+0BCh+socket_copy]
    .text:08048B05 mov [esp+4], esi
    .text:08048B05 ; ESI points to our previously set socket's path
    .text:08048B09 mov [esp], edi ; dest
    .text:08048B0C mov [esp+18h], dl
    .text:08048B10 mov word ptr [esp+42h], 1
    .text:08048B17 call _strcpy
    .text:08048B1C mov dword ptr [esp+8], 0 ; protocol
    .text:08048B24 mov dword ptr [esp+4], SOCK_STREAM ; type
    .text:08048B2C mov dword ptr [esp], AF_UNIX ; domain
    .text:08048B33 call _socket

The socket's path is copied (using strcpy) to a stack variable (socket_copy), and then a Unix domain socket is created. This socket is then connected via the connect function call to the scan.sock socket:

    .text:08048B50 mov eax, [esp+0BCh+socket]
    .text:08048B54 lea edx, [esp+42h]
    .text:08048B58 mov [esp+4], edx ; addr
    .text:08048B5C mov [esp], eax ; fd
    .text:08048B5F neg ecx
    .text:08048B61 mov [esp+8], ecx ; len
    .text:08048B65 call _connect
    .text:08048B6A test eax, eax

It is now clear that the client (command-line scanner) wants to connect to the server process and send it scan requests using sockets.
The next section looks at how the client communicates with the server.

Writing Simple Python Bindings for Avast for Linux

In the previous section, you established what the client program does; now, you verify this theory by trying to connect to the socket from the Python prompt:

    $ python
    >>> import socket
    >>> s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    >>> sock_name="/var/run/avast/scan.sock"
    >>> s.connect(sock_name)

It works! You can connect to the socket. Now you need to determine what the client tool sends to the server and what responses it receives. Right after the connect call, it calls the function parse_response and expects the result to be the magical value 220:

    .text:08048B72 mov eax, [esp+0BCh+socket]
    .text:08048B76 lea edx, [esp+0BCh+response]
    .text:08048B7A call parse_response
    .text:08048B7F cmp eax, 220

Now you try to read 1,024 bytes from the socket after connecting to it:

    $ python
    >>> import socket
    >>> s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    >>> sock_name="/var/run/avast/scan.sock"
    >>> s.connect(sock_name)
    >>> s.recv(1024)
    '220 DAEMON\r\n'

Mystery solved: you know now that the 220 response code comes directly from the server as an answer. In your bindings, you need to get the number that is received from the welcome message that the Avast daemon sends and check if the answer is 220, which means everything is all right. Continuing with the main function, there is a call to the av_close function.
The following is its disassembly:

    .text:08049580 av_close proc near
    .text:08049580 fd = dword ptr -1Ch
    .text:08049580 buf = dword ptr -18h
    .text:08049580 n = dword ptr -14h
    .text:08049580
    .text:08049580 push ebx
    .text:08049581 mov ebx, eax
    .text:08049583 sub esp, 18h
    .text:08049586 mov [esp+1Ch+n], 5 ; n
    .text:0804958E mov [esp+1Ch+buf], offset aQuit ; "QUIT\n"
    .text:08049596 mov [esp+1Ch+fd], eax ; fd
    .text:08049599 call _write
    .text:0804959E test eax, eax
    .text:080495A0 js short loc_80495C1
    .text:080495A2
    .text:080495A2 loc_80495A2: ; CODE XREF: av_close+4D
    .text:080495A2 mov [esp+1Ch+fd], ebx ; fd
    .text:080495A5 call _close
    .text:080495AA test eax, eax
    .text:080495AC js short loc_80495B3

The client calls av_close after finishing its tasks. This function sends the string QUIT\n to the daemon, to tell it that it has finished and that it should close the client connection, and then closes the socket. Now you create a minimal class to communicate with the Avast daemon, basically to connect and successfully close the connection. This is the content of basic_avast_cli1.py containing your first implementation:

    #!/usr/bin/python
    # Python 2 listing: sockets send and receive plain str objects.

    import socket

    SOCKET_PATH = "/var/run/avast/scan.sock"

    class CBasicAvastClient:
        def __init__(self, socket_name):
            self.socket_name = socket_name
            self.s = None

        def connect(self):
            # Connect to the daemon and return its welcome banner
            self.s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            self.s.connect(self.socket_name)
            banner = self.s.recv(1024)
            return repr(banner)

        def close(self):
            # Mirror av_close: send QUIT, then close the descriptor
            self.s.send("QUIT\n")
            self.s.close()

    def main():
        cli = CBasicAvastClient(SOCKET_PATH)
        print(cli.connect())
        cli.close()

    if __name__ == "__main__":
        main()

You try your script:

    $ python basic_avast_cli1.py
    '220 DAEMON\r\n'

It works! You have your own code to connect to the daemon server and close the connection. Now it is time to discover more commands, including the most interesting one: the command to analyze a sample file or directory.
At address 0x08048D3B, there is an interesting function call:

    .text:08048D34 mov edx, [ebx+esi*4]
    .text:08048D37 mov eax, [esp+0BCh+socket]
    .text:08048D3B call scan_path

Because you have partial symbols, you can easily determine what this function is for: to scan a path. Take a look at the scan_path function:

    .text:08049F00 scan_path proc near ; CODE XREF: main+40B
    .text:08049F00 ; .text:08049EF1
    .text:08049F00
    .text:08049F00 name = dword ptr -103Ch
    .text:08049F00 resolved = dword ptr -1038h
    .text:08049F00 n = dword ptr -1034h
    .text:08049F00 var_1030 = dword ptr -1030h
    .text:08049F00 var_102C = dword ptr -102Ch
    .text:08049F00 var_1028 = dword ptr -1028h
    .text:08049F00 var_1024 = dword ptr -1024h
    .text:08049F00 var_1020 = dword ptr -1020h
    .text:08049F00 var_101C = byte ptr -101Ch
    .text:08049F00 var_10 = dword ptr -10h
    .text:08049F00 var_C = dword ptr -0Ch
    .text:08049F00 var_8 = dword ptr -8
    .text:08049F00 var_4 = dword ptr -4
    .text:08049F00
    .text:08049F00 sub esp, 103Ch
    .text:08049F06 mov [esp+103Ch+resolved], 0 ; resolved
    .text:08049F0E mov [esp+103Ch+name], edx ; name
    .text:08049F11 mov [esp+103Ch+var_10], ebx
    .text:08049F18 mov ebx, eax
    .text:08049F1A mov [esp+103Ch+var_8], edi
    .text:08049F21 mov edi, edx
    .text:08049F23 mov [esp+103Ch+var_C], esi
    .text:08049F2A mov [esp+103Ch+var_4], ebp
    .text:08049F31 mov [esp+103Ch+var_102C], offset storage
    .text:08049F39 mov [esp+103Ch+var_1028], 1000h
    .text:08049F41 mov [esp+103Ch+var_1024], 0
    .text:08049F49 mov [esp+103Ch+var_1020], 0
    .text:08049F51 call _realpath
    .text:08049F56 test eax, eax
    .text:08049F58 jz loc_804A040
    .text:08049F5E
    .text:08049F5E loc_8049F5E: ; CODE XREF: scan_path+1CE j
    .text:08049F5E mov ds:storage, 'NACS'
    .text:08049F68 mov esi, eax
    .text:08049F6A mov ds:word_804BDE4, ' '

There is a call to the function realpath (which returns the canonical absolute path of the given file or directory) and you can also see the 4-byte string (in little-endian
format) SCAN, followed by some spaces. Without actually reverse-engineering the entire function, and given the format of the previous command implemented for the close method in the basic Python bindings for Avast, it seems that the command you want to send to the daemon to scan a file or directory is SCAN /some/path. Now you add the additional code that sends the scan command to the daemon and see the result it returns:

    #!/usr/bin/python
    # Python 2 listing: sockets send and receive plain str objects.

    import socket

    SOCKET_PATH = "/var/run/avast/scan.sock"

    class CBasicAvastClient:
        def __init__(self, socket_name):
            self.socket_name = socket_name
            self.s = None

        def connect(self):
            # Connect to the daemon and return its welcome banner
            self.s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            self.s.connect(self.socket_name)
            banner = self.s.recv(1024)
            return repr(banner)

        def close(self):
            # Mirror av_close: send QUIT, then close the descriptor
            self.s.send("QUIT\n")
            self.s.close()

        def scan(self, path):
            # Ask the daemon to scan a file or directory
            self.s.send("SCAN %s\n" % path)
            return repr(self.s.recv(1024))

    def main():
        cli = CBasicAvastClient(SOCKET_PATH)
        print(cli.connect())
        print(cli.scan("malware/xpaj"))
        cli.close()

    if __name__ == "__main__":
        main()

When you run the script, you get the following output:

    $ python basic_avast_cli1.py
    '220 DAEMON\r\n'
    '210 SCAN DATA\r\n'

This code does not produce useful data because you need to read more packets from the socket: the response 210 SCAN DATA\r\n tells the client that more packets will be sent, with the actual results. Actually, you need to read until you receive a packet with the form 200 SCAN OK\r\n. Now you can modify the code of the scan method as follows (a lazy approach that, nevertheless, works):

    def scan(self, path):
        self.s.send("SCAN %s\n" % path)
        while 1:
            ret = self.s.recv(8192)
            print(repr(ret))
            if ret.find("200 SCAN OK") > -1:
                break

Now you try the code again.
This time, you see a different output with the data you expected:

    $ python basic_avast_cli1.py
    '220 DAEMON\r\n'
    '210 SCAN DATA\r\n'
    'SCAN /some/path/malware/xpaj/00908235ee9e267fa2f4c83fb4304c63af976cbc\t
    [L]0.0\t0 Win32:Hoblig\\ [Heur]\r\n'
    '200 SCAN OK\r\n'
    None

Marvelous! The Avast server answered that the file 00908235ee9e267fa2f4c83fb4304c63af976cbc was identified as the malware Win32:Hoblig. (The trailing None is printed because the modified scan method no longer returns a value.) Now you have a working set of basic Python bindings that, at the very least, can scan paths (either files or directories) and get the scan result; therefore, you can adapt the code to write a fuzzer based on the protocol format. You may want to check whether Avast antivirus for Windows uses the same protocol, and port your bindings to Windows; if this is not the case, then you may want to continue fuzzing under Linux and attach GDB or another debugger to the /bin/avast daemon and use your bindings to feed malformed (fuzzed) input files to the Avast server and wait for it to crash. Remember, the core is the same for both Windows and Linux (although, according to the Avast authors, the Linux core version is not always the latest version of their core). If you have a crash in the Linux version of the tool, the odds of it affecting the Windows version are very high. Indeed, this very same method has been used to find a vulnerability in the parsing of RPM files in the Linux version that affected all Avast-supported platforms.

The Final Version of the Python Bindings

You can download the final version of the Python bindings from the following GitHub project page: https://github.com/joxeankoret/pyavast. The bindings are exhaustive, covering almost all protocol features discovered in April 2014.
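The lazy read loop shown earlier assumes that every recv call returns whole messages, which a stream socket does not guarantee: a reply such as 200 SCAN OK can be split across two reads. The following Python 3 sketch buffers the stream, splits it on CRLF, and collects the result lines until the terminator arrives. It is not the pyavast implementation; the function names are hypothetical, and the tab-separated layout of a result line is inferred only from the single sample response shown above.

```python
import socket

SOCKET_PATH = "/var/run/avast/scan.sock"  # path seen in the scan client


def iter_response_lines(recv, bufsize=8192):
    """Yield complete CRLF-terminated lines from a recv-like callable."""
    pending = b""
    while True:
        chunk = recv(bufsize)
        if not chunk:  # the peer closed the connection
            break
        pending += chunk
        while b"\r\n" in pending:
            line, pending = pending.split(b"\r\n", 1)
            yield line.decode("utf-8", "replace")


def collect_scan_results(recv):
    """Return (path, other_fields) tuples until '200 SCAN OK' is seen."""
    results = []
    for line in iter_response_lines(recv):
        if line.startswith("200 SCAN OK"):
            break
        if line.startswith("SCAN "):
            # Assumed layout: "SCAN <path>\t<field>\t<field>..."
            fields = line[len("SCAN "):].split("\t")
            results.append((fields[0], fields[1:]))
    return results


def scan(path, socket_path=SOCKET_PATH):
    """Connect to the daemon, scan one path, and return the parsed results."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(socket_path)
        banner = s.recv(1024)
        if not banner.startswith(b"220"):
            raise RuntimeError("unexpected banner: %r" % banner)
        s.sendall(("SCAN %s\n" % path).encode("utf-8"))
        results = collect_scan_results(s.recv)
        s.sendall(b"QUIT\n")
        return results
    finally:
        s.close()
```

Because `collect_scan_results` only needs a recv-like callable, it can be exercised with canned byte chunks instead of a live daemon, which is convenient when you later mutate responses or inputs in a fuzzing harness.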
A Practical Example: Writing Native C/C++ Tools for Comodo Antivirus for Linux

If a server is available, interfacing with one that is listening for commands on a given port is an easy way to automate tasks with various antivirus products. Unlike AVG or Avast for Linux, not all products offer such a server interface. In those cases, you need to reverse-engineer the command-line scanner, if there is one, as well as the core libraries, to reconstruct the required internal structures, the relevant functions, and their prototypes so you know how to call those functions using automation.

This example creates an unofficial C/C++ SDK for Comodo Antivirus for Linux. Fortunately for you, it comes with full symbols, so discovering the interfaces, structures, and so on will be relatively easy. Start by analyzing the Comodo command-line scanner for Linux (called cmdscan), which is installed in the following directory:

    /opt/COMODO/cmdscan

Open the binary in IDA, wait until the initial auto-analysis finishes, and then go to the main function.
You should see a disassembly like this one:

    .text:00000000004015C0 ; __int64 __fastcall main(int argc, char **argv, char **envp)
    .text:00000000004015C0 main proc near
    .text:00000000004015C0
    .text:00000000004015C0 var_A0= dword ptr -0A0h
    .text:00000000004015C0 var_20= dword ptr -20h
    .text:00000000004015C0 var_1C= dword ptr -1Ch
    .text:00000000004015C0
    .text:00000000004015C0 push rbp
    .text:00000000004015C1 mov ebp, edi
    .text:00000000004015C3 push rbx
    .text:00000000004015C4 mov rbx, rsi ; argv
    .text:00000000004015C7 sub rsp, 0A8h
    .text:00000000004015CE mov [rsp+0B8h+var_1C], 0
    .text:00000000004015D9 mov [rsp+0B8h+var_20], 0
    .text:00000000004015E4
    .text:00000000004015E4 loc_4015E4:
    .text:00000000004015E4
    .text:00000000004015E4 mov edx, offset shortopts ; "s:vh"
    .text:00000000004015E9 mov rsi, rbx ; argv
    .text:00000000004015EC mov edi, ebp ; argc
    .text:00000000004015EE call _getopt
    .text:00000000004015F3 cmp eax, 0FFFFFFFFh

Here, it's checking the command-line options s:vh with the standard getopt function. If you run the command /opt/COMODO/cmdscan without arguments, it prints out the usage of this command-line scanner:

    $ /opt/COMODO/cmdscan
    USAGE: /opt/COMODO/cmdscan -s [FILE] [OPTION...]
    -s: scan a file or directory
    -v: verbose mode, display more detailed output
    -h: this help screen

The command-line options identified in the disassembly, s:vh, are documented. The most interesting one in this case is the -s flag, which instructs the tool to scan a file or directory. Continue analyzing the disassembly to understand how this flag works:

    .text:00000000004015F8 cmp eax, 's'
    .text:00000000004015FB jz short loc_401613
    (...)
    .text:0000000000401613 loc_401613:
    .text:0000000000401613 mov rdi, cs:optarg ; name
    .text:000000000040161A xor esi, esi ; type
    .text:000000000040161C call _access
    .text:0000000000401621 test eax, eax
    .text:0000000000401623 jnz loc_40172D
    .text:0000000000401629 mov rax, cs:optarg
    .text:0000000000401630 mov cs:src, rax ; Path to scan
    .text:0000000000401637 jmp short next_cmdline_option

When the -s flag is specified, it checks whether the next argument is an existing path by calling access. If the path exists, it saves the pointer to the path to scan (a filename or directory) in the src static variable and continues parsing more command-line arguments. Now you can analyze the code after the command-line arguments are parsed:

    .text:0000000000401649 loc_401649: ; CODE XREF: main+36 j
    .text:0000000000401649 cmp cs:src, 0
    .text:0000000000401651 jz no_filename_specified
    .text:0000000000401657 mov edi, offset dev_aflt_fd ; a2
    .text:000000000040165C call open_dev_avflt
    .text:0000000000401661 call load_framework
    .text:0000000000401666 call maybe_IFrameWork_CreateInstance

The code checks whether the path to scan, src, was specified; if not, it goes to a label that shows the usage help and exits. Otherwise, it calls an open_dev_avflt function, then load_framework, and later maybe_IFrameWork_CreateInstance. You do not really need to reverse-engineer the open_dev_avflt function, as the device /dev/avflt is not actually required for scanning. Skip that function and go directly to load_framework, the function that is responsible for loading the Comodo kernel.
The following is the entire pseudo-code for this function:

    void *load_framework()
    {
        int filename_size; // eax@1
        char *self_dir; // rax@2
        int *v2; // rax@3
        char *v3; // rax@3
        void *hFramework; // rax@6
        void *CreateInstance; // rax@7
        char *v6; // rax@9
        char filename[2056]; // [sp+0h] [bp-808h]@1

        filename_size = readlink("/proc/self/exe", filename, 0x800uLL);
        if ( filename_size == -1
          || (filename[filename_size] = 0,
              self_dir = dirname(filename),
              chdir(self_dir)) )
        {
            v2 = __errno_location();
            v3 = strerror(*v2);
    LABEL_4:
            fprintf(stderr, "%s\n", v3);
            exit(1);
        }
        hFramework = dlopen("./libFRAMEWORK.so", 1);
        hFrameworkSo = hFramework;
        if ( !hFramework )
        {
            v6 = dlerror();
            fprintf(stderr, "error is %s\n", v6);
            goto LABEL_10;
        }
        CreateInstance = dlsym(hFramework, "CreateInstance");
        FnCreateInstance = (int (__fastcall *)
            (_QWORD, _QWORD, _QWORD, _QWORD))CreateInstance;
        if ( !CreateInstance )
        {
    LABEL_10:
            v3 = dlerror();
            goto LABEL_4;
        }
        return CreateInstance;
    }

The decompiled code looks nice, doesn't it? You could just copy this function from the pseudo-code view to your C/C++ source file. In summary, the pseudo-code does the following:

■ It resolves its path by reading the symbolic link created by the Linux kernel, /proc/self/exe, and then makes the directory containing that path the current working directory.
■ It dynamically loads the libFRAMEWORK.so library, resolves the function CreateInstance, and stores the pointer in the FnCreateInstance global variable.
■ The CreateInstance function simply loads the kernel, which seems to reside inside libFRAMEWORK.so, and resolves the base function required to create a new instance of the framework.
Next, you need to reverse-engineer the maybe_IFrameWork_CreateInstance function:

    .text:0000000000401A50 maybe_IFrameWork_CreateInstance proc near
    .text:0000000000401A50
    .text:0000000000401A50 hInstance= qword ptr -40h
    .text:0000000000401A50 var_38= qword ptr -38h
    .text:0000000000401A50 maybe_flags= qword ptr -28h
    .text:0000000000401A50
    .text:0000000000401A50 push rbp
    .text:0000000000401A51 xor esi, esi
    .text:0000000000401A53 xor edi, edi
    .text:0000000000401A55 mov edx, 0F0000h
    .text:0000000000401A5A push rbx
    .text:0000000000401A5B sub rsp, 38h
    .text:0000000000401A5F mov [rsp+48h+hInstance], 0
    .text:0000000000401A68 lea rcx, [rsp+48h+hInstance]
    .text:0000000000401A6D call cs:FnCreateInstance

The function the program resolved before, FnCreateInstance, is being called now, passing the address of a local variable called hInstance. Naturally, it is going to create an instance of the Comodo Antivirus interface. Right after it creates the instance, the following pseudo-code is executed:

    BYTE4(maybe_flags) = 0;
    LODWORD(maybe_flags) = -1;
    g_FrameworkInstance = hInstance;
    cur_dir = get_current_dir_name();
    hFramework = g_FrameworkInstance;
    cur_dir_len = strlen(cur_dir);
    if ( hFramework->baseclass_0->CFrameWork_Init(
             hFramework,
             cur_dir_len + 1,
             cur_dir,
             maybe_flags,
             0LL) < 0 )
    {
        fwrite("IFrameWork Init failed!\n", 1uLL, 0x18uLL, stderr);
        exit(1);
    }
    free(cur_dir);

This code is initializing the framework by calling hFramework->baseclass_0->CFrameWork_Init. It receives the hFramework instance that was just created, the size of the directory path buffer, the directory with all the other kernel files, and what appear to be the flags given to CFrameWork_Init. The current directory is the path of the actual cmdscan program, /opt/COMODO/, as it changed the current working directory earlier.
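Calls such as hFramework->baseclass_0->CFrameWork_Init(hFramework, ...) are how a C++ virtual call looks once the decompiler flattens it: the object's first field points to a table of function pointers, and the object itself is passed as the first argument. The sketch below reproduces that shape with entirely hypothetical types (Demo, DemoVtbl); none of these names or layouts are Comodo's real definitions.

```c
#include <assert.h>
#include <stddef.h>

struct Demo;

/* Hypothetical table of "virtual" functions. */
struct DemoVtbl {
    int (*Init)(struct Demo *self, int path_size, const char *path);
};

/* Hypothetical object: the vtable pointer sits at offset 0, which is
 * why the decompiler names the field baseclass_0. */
struct Demo {
    const struct DemoVtbl *baseclass_0;
    int initialized;
};

static int Demo_Init(struct Demo *self, int path_size, const char *path)
{
    if (path == NULL || path_size <= 0)
        return -1;              /* negative means failure, as in cmdscan */
    self->initialized = 1;
    return 0;
}

static const struct DemoVtbl demo_vtbl = { Demo_Init };

/* Plays the role of CreateInstance: returns the object through an
 * out parameter and a status code through the return value. */
int Demo_CreateInstance(struct Demo **out)
{
    static struct Demo instance = { &demo_vtbl, 0 };

    assert(out != NULL);
    *out = &instance;
    return 0;
}
```

A caller obtains the object with Demo_CreateInstance(&obj) and then invokes obj->baseclass_0->Init(obj, size, path), treating any negative return value as an error, which matches the pattern used throughout the cmdscan pseudo-code.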
After this, more functions are called in order to correctly load the kernel:

    LODWORD(v8) = -1;
    BYTE4(v8) = 0;
    if ( g_FrameworkInstance->baseclass_0->CFrameWork_LoadScanners(
             g_FrameworkInstance,
             v8) < 0 )
    {
        fwrite("IFrameWork LoadScanners failed!\n", 1uLL, 0x20uLL, stderr);
        exit(1);
    }
    if ( g_FrameworkInstance->baseclass_0->CFrameWork_CreateEngine(
             g_FrameworkInstance, (IAEEngineDispatch **)&g_Engine) < 0 )
    {
        fwrite("IFrameWork CreateEngine failed!\n", 1uLL, 0x20uLL, stderr);
        exit(1);
    }
    if ( g_Engine->baseclass_0->CAEEngineDispatch_GetBaseComponent(
             g_Engine,
             (CAECLSID)0x20001,
             (IUnknown **)&g_base_component_0x20001) < 0 )
    {
        fwrite("IAEEngineDispatch GetBaseComponent failed!\n", 1uLL, 0x2BuLL, stderr);
        exit(1);
    }

This code loads the scanner routines by calling CFrameWork_LoadScanners, creates a scanning engine by calling CFrameWork_CreateEngine, and gets a base dispatcher component (whatever that means to Comodo) by calling CAEEngineDispatch_GetBaseComponent. Although the next part can be safely ignored, it is good to understand the functionality anyway:

    v4 = operator new(0xB8uLL);
    v5 = (IAEUserCallBack *)v4;
    *(_QWORD *)v4 = &vtable_403310;
    pthread_mutex_init((pthread_mutex_t *)(v4 + 144), 0LL);
    memset(&v5[12], 0, 0x7EuLL);
    g_user_callbacks = (__int64)v5;
    result = g_Engine->baseclass_0->CAEEngineDispatch_SetUserCallBack(g_Engine, v5);
    if ( result < 0 )
    {
        fwrite("SetUserCallBack() failed!\n", 1uLL, 0x1AuLL, stderr);
        exit(1);
    }

This code is used to set a few callbacks. For example, you could install callbacks to be notified every time a new file is opened, created, read, written, and so on. Do you want to write a generic unpacker using the Comodo engine? Install a notification callback and wait for it to be called, copy the temporary file or buffer, and you are done! Generic unpackers based on antivirus engines are popular.
This is interesting, but the purpose of this demonstration is to reverse-engineer the core to get sufficient information about how to write a C/C++ SDK to interact with the Comodo kernel. Now that the maybe_IFrameWork_CreateInstance function has been analyzed, go back and look at the main function. The code after the call to the previously analyzed function will be similar to the following pseudo-code:

    if ( __lxstat(1, filename, &v7) == -1 )
    {
        v5 = __errno_location();
        v6 = strerror(*v5);
        fprintf(stderr, "%s: %s\n", filename, v6);
    }
    else
    {
        if ( verbose )
            fwrite("-----== Scan Start ==-----\n", 1uLL, 0x1BuLL, stdout);
        if ( (v8 & 0xF000) == 0x4000 )
            scan_directory(filename, verbose, (__int64)&scanned_files, (__int64)&virus_found);
        else
            scan_stream(filename, verbose, &scanned_files, &virus_found);
        if ( verbose )
            fwrite("-----== Scan End ==-----\n", 1uLL, 0x19uLL, stdout);
        fprintf(stdout, "Number of Scanned Files: %d\n", (unsigned int)scanned_files);
        fprintf(stdout, "Number of Found Viruses: %d\n", (unsigned int)virus_found);
    }

This code checks whether the path to scan exists. If it does, the code calls either scan_directory or scan_stream, depending on the file mode returned by the call to __lxstat. The test (v8 & 0xF000) == 0x4000 appears to be the usual directory check: 0xF000 is the S_IFMT file-type mask and 0x4000 is S_IFDIR. The function to scan directories is likely calling scan_stream for each discovered element. You can now delve deeper into this function to see what it does:

    int __fastcall scan_stream(
        char *filename,
        char verbose,
        _DWORD *scanned_files,
        _DWORD *virus_found)
    (...)
    SCANRESULT scan_result; // [sp+10h] [bp-118h]@1
    SCANOPTION scan_option; // [sp+90h] [bp-98h]@1
    ICAVStream *inited_to_zero; // [sp+E8h] [bp-40h]@1

    memset(&scan_option, 0, 0x49uLL);
    memset(&scan_result, 0, 0x7EuLL);
    scan_option.ScanCfgInfo = (x1)-1;
    scan_option.bScanPackers = 1;
    scan_option.bScanArchives = 1;
    scan_option.bUseHeur = 1;
    scan_option.eSHeurLevel = 2;
    base_component_0x20001 = *(struct_base_component_0x20001_t **)g_base_comp;
    scan_option.dwMaxFileSize = 0x2800000;
    scan_option.eOwnerFlag = 1;
    inited_to_zero = 0LL;
    result = base_component_0x20001->pfunc50(
                 g_base_comp,
                 (__int64 *)&inited_to_zero,
                 (__int64)filename,
                 1LL,
                 3LL,
                 0LL);

This code segment is really interesting. It starts by initializing a SCANRESULT and a SCANOPTION object and specifying the required options, such as whether archives should be scanned, whether heuristics are enabled, and the maximum file size (0x2800000 bytes, or 40 MiB). Then, the code calls a member function, pfunc50, passing a lot of arguments to it, such as the base component, the filename, and so on. You do not know what the function pfunc50 does, but do you really need to? Remember, the current task is not to fully understand how the Comodo kernel works but, rather, to interface with it. Continue with the following code:

    err = result;
    if ( result >= 0 )
    {
        memset((void *)(g_user_callbacks + 12), 0, 0x7EuLL);
        err = g_Engine->baseclass_0->CAEEngineDispatch_ScanStream(g_Engine,
                  inited_to_zero, &scan_option, &scan_result);
    (...)

This is the code that is actually scanning the file. It seems that the local variable inited_to_zero that was passed to the call to pfunc50 has all the required information to analyze the file. It is given to the function call CAEEngineDispatch_ScanStream, as well as other arguments. The most interesting of these arguments are the SCANOPTION and SCANRESULT objects, which have an obvious purpose: to specify the scanning options and get the results of the scan.
The code also zeroes part of the global callbacks object right before calling CAEEngineDispatch_ScanStream, but you can skip this part and all the other parts in this function that use the callbacks. The next interesting part is the following one:

    if ( err >= 0 )
    {
        ++*scanned_files;
        if ( verbose )
        {
            if ( scan_result.bFound )
            {
                fprintf(stdout, "%s ---> Found Virus, Malware Name is %s\n",
                        filename, scan_result.szMalwareName);
                result = fflush(stdout);
            }
            else
            {
                fprintf(stdout, "%s ---> Not Virus\n", filename);
                result = fflush(stdout);
            }
        }
    }

This code snippet checks whether the local variable err is non-negative (that is, whether the scan succeeded), increments the scanned_files variable, and, in verbose mode, prints out the discovered malware name if the bFound member of the SCANRESULT object evaluates to true. The last step in this function is to simply increase the count of viruses found if malware was detected:

    if ( scan_result.bFound )
    {
        if ( err >= 0 )
            ++*virus_found;
    }

It's now time to go back to the main function. The last code after calling the scan_* functions is the following one:

    uninit_framework();
    dlclose_framework();
    close_dev_aflt_fd(&dev_aflt_fd);

This is the code for cleaning up; it un-initializes the framework and cancels any possible remaining scan:

    g_base_component_0x20001 = 0LL;
    if ( g_Engine )
    {
        g_Engine->baseclass_0->CAEEngineDispatch_Cancel(g_Engine);
        result = g_Engine->baseclass_0->CAEEngineDispatch_UnInit(g_Engine, 0LL);
        g_Engine = 0LL;
    }
    if ( g_FrameworkInstance )
    {
        result = g_FrameworkInstance->baseclass_0->CFrameWork_UnInit(
                     g_FrameworkInstance, 0LL);
        g_FrameworkInstance = 0LL;
    }

Finally, you close the used libFRAMEWORK.so library:

    void __cdecl dlclose_framework()
    {
        if ( hFrameworkSo )
            dlclose(hFrameworkSo);
    }

You now have all the information required to write your own C/C++ code to interface with Comodo Antivirus!
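Looking back at the result handling in scan_stream, the two counters follow one simple rule: a negative status means the scan failed and nothing is counted, while a successful scan always increments the file counter and increments the virus counter only when something was found. The sketch below condenses that rule into a single function. The name tally_scan is hypothetical and the function is not part of cmdscan.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: folds one scan outcome into the running totals.
 * err follows the convention seen in cmdscan (negative means failure);
 * found corresponds to SCANRESULT.bFound. */
void tally_scan(int err, int found, int *scanned_files, int *virus_found)
{
    assert(scanned_files != NULL && virus_found != NULL);

    if (err < 0)
        return;                 /* failed scans are not counted at all */

    ++*scanned_files;
    if (found)
        ++*virus_found;
}
```

Keeping the rule in one place makes it harder to count a detection for a scan that actually failed, which the decompiled code guards against with its second err >= 0 test.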
Fortunately, this antivirus ships with all the necessary structures, so you can export all the structure and enumeration definitions to a header file. To do so, in IDA, select View→Open Subviews→Local Types, right-click the Local Types window, and select the Export to Header File option from the pop-up menu. Check the Generate Compilable Header File option, select the correct path to write the header file, and click Export. After you fix compilation errors in it, this header file can be used in a common C/C++ project. The process of fixing the header file in order to use it with a common compiler is a nightmare. However, in this case, you do not need to go through this process. You can download the header file from https://github.com/joxeankoret/tahh/tree/master/comodo.

Once you download this header file, you can get started. First, you create a command-line tool similar to Comodo cmdscan, but one that exports more interesting internal information. You start by adding the following required include files:

    [eleven #include directives for system headers; the header names are missing from this copy]
    #include "comodo.h"

These are the header files that you will need. You can now copy most of the pseudo-code created by the Hex-Rays decompiler into your project. However, you should do it step-by-step instead of copying the entire decompiled file. Start by adding the required calls to initialize, scan, and clean up the core in the function main:

    int main(int argc, char **argv)
    {
        int scanned_files = 0;
        int virus_found = 0;

        if ( argc == 1 )
            return 1;

        load_framework();
        maybe_IFrameWork_CreateInstance();
        scan_stream(argv[1], verbose, &scanned_files, &virus_found);
        printf("Final number of Scanned Files: %d\n", scanned_files);
        printf("Final number of Found Viruses: %d\n", virus_found);
        uninit_framework();
        dlclose_framework();
        return 0;
    }

In this code, the first command-line argument represents the file to scan.
You start by loading the framework and creating an instance. You then call scan_stream, print a summary of the scanned files, un-initialize the framework, and unload the library that was used. You need to implement many functions here: load_framework, maybe_IFrameWork_CreateInstance, scan_stream, uninit_framework, and dlclose_framework. You can simply copy these functions from the Hex-Rays decompiler: go through each function and copy the pseudo-code. It will look like this:

    //----------------------------------------------------------------------
    void uninit_framework()
    {
        g_base_component_0x20001 = 0;
        if ( g_Engine )
        {
            g_Engine->baseclass_0->CAEEngineDispatch_Cancel(g_Engine);
            g_Engine->baseclass_0->CAEEngineDispatch_UnInit(g_Engine, 0);
            g_Engine = 0;
        }
        if ( g_FrameworkInstance )
        {
            g_FrameworkInstance->baseclass_0->CFrameWork_UnInit(
                g_FrameworkInstance, 0);
            g_FrameworkInstance = 0;
        }
    }

    //----------------------------------------------------------------------
    int scan_stream(char *src, char verbosed, int *scanned_files, int *virus_found)
    {
        struct_base_component_0x20001_t *base_component_0x20001;
        int result;
        HRESULT err;
        SCANRESULT scan_result;
        SCANOPTION scan_option;
        ICAVStream *inited_to_zero;

        memset(&scan_option, 0, sizeof(SCANOPTION));
        memset(&scan_result, 0, sizeof(SCANRESULT));
        scan_option.ScanCfgInfo = -1;
        scan_option.bScanPackers = 1;
        scan_option.bScanArchives = 1;
        scan_option.bUseHeur = 1;
        scan_option.eSHeurLevel = enum_SHEURLEVEL_HIGH;
        base_component_0x20001 =
            *(struct_base_component_0x20001_t **)g_base_component_0x20001;
        scan_option.dwMaxFileSize = 0x2800000;
        scan_option.eOwnerFlag = enum_OWNER_ONDEMAND;
        scan_option.bDunpackRealTime = 1;
        scan_option.bNotReportPackName = 0;
        inited_to_zero = 0;
        result = base_component_0x20001->pfunc50(
                     g_base_component_0x20001,
                     (__int64 *)&inited_to_zero,
                     (__int64)src,
                     1LL,
                     3LL,
                     0);
        err = result;
        if ( result >= 0 )
        {
            err = g_Engine->baseclass_0->CAEEngineDispatch_ScanStream(
                      g_Engine, inited_to_zero, &scan_option, &scan_result);
            if ( err >= 0 )
            {
                (*scanned_files)++;
                if ( scanned_files )  // always true here: the pointer was dereferenced above
                {
                    //printf("Got scan result? %d\n", scan_result.bFound);
                    if ( scan_result.bFound )
                    {
                        printf("%s ---> Found Virus, Malware Name is %s\n",
                               src, scan_result.szMalwareName);
                        result = fflush(stdout);
                    }
                    else
                    {
                        printf("%s ---> Not Virus\n", src);
                        result = fflush(stdout);
                    }
                }
            }
        }
        if ( scan_result.bFound )
        {
            if ( err >= 0 )
                (*virus_found)++;
        }
        return result;
    }

    //----------------------------------------------------------------------
    int maybe_IFrameWork_CreateInstance()
    {
        char *cur_dir;
        CFrameWork *hFramework;
        int cur_dir_len;
        CFrameWork *hInstance;
        int *v8;
        int *maybe_flags;

        hInstance = 0;
        if ( FnCreateInstance(0, 0, 0xF0000, &hInstance) < 0 )
        {
            fwrite("CreateInstance failed!\n", 1uLL, 0x17uLL, stderr);
            exit(1);
        }
        BYTE4(maybe_flags) = 0;
        LODWORD(maybe_flags) = -1;
        g_FrameworkInstance = hInstance;
        cur_dir = get_current_dir_name();
        hFramework = g_FrameworkInstance;
        cur_dir_len = strlen(cur_dir);
        if ( hFramework->baseclass_0->CFrameWork_Init(
                 hFramework, cur_dir_len + 1, cur_dir, maybe_flags, 0) < 0 )
        {
            fwrite("IFrameWork Init failed!\n", 1uLL, 0x18uLL, stderr);
            exit(1);
        }
        free(cur_dir);
        LODWORD(v8) = -1;
        BYTE4(v8) = 0;
        if ( g_FrameworkInstance->baseclass_0->CFrameWork_LoadScanners(
                 g_FrameworkInstance, v8) < 0 )
        {
            fwrite("IFrameWork LoadScanners failed!\n", 1uLL, 0x20uLL, stderr);
            exit(1);
        }
        if ( g_FrameworkInstance->baseclass_0->CFrameWork_CreateEngine(
                 g_FrameworkInstance, (IAEEngineDispatch **)&g_Engine) < 0 )
        {
            fwrite("IFrameWork CreateEngine failed!\n", 1uLL, 0x20uLL, stderr);
            exit(1);
        }
        if ( g_Engine->baseclass_0->CAEEngineDispatch_GetBaseComponent(
                 g_Engine,
                 (CAECLSID)0x20001,
                 (IUnknown **)&g_base_component_0x20001) < 0 )
        {
            fwrite("IAEEngineDispatch GetBaseComponent failed!\n", 1uLL, 0x2BuLL, stderr);
            exit(1);
        }
        return 0;
    }

    //----------------------------------------------------------------------
    void dlclose_framework()
    {
        if ( hFrameworkSo )
            dlclose(hFrameworkSo);
    }

    //----------------------------------------------------------------------
    void load_framework()
    {
        int filename_size;
        char *self_dir;
        int *v2;
        char *v3;
        void *hFramework;
        char *v6;
        char filename[2056];

        filename_size = readlink("/proc/self/exe", filename, 0x800uLL);
        if ( filename_size == -1
          || (filename[filename_size] = 0,
              self_dir = dirname(filename),
              chdir(self_dir)) )
        {
            v2 = __errno_location();
            v3 = strerror(*v2);
            fprintf(stderr, "Directory error: %s\n", v3);
            exit(1);
        }
        hFramework = dlopen("./libFRAMEWORK.so", 1);
        hFrameworkSo = hFramework;
        if ( !hFramework )
        {
            v6 = dlerror();
            fprintf(stderr, "Error loading libFRAMEWORK: %s\n", v6);
            exit(1);
        }
        FnCreateInstance = (FnCreateInstance_t)dlsym(hFramework, "CreateInstance");
        if ( !FnCreateInstance )
        {
            v3 = dlerror();
            fprintf(stderr, "%s\n", v3);
            exit(1);
        }
    }

You only need to add the forward declarations of the functions right after the last include directive, as well as the global variables:

    //----------------------------------------------------------------------
    // Function declarations
    int main(int argc, char **argv, char **envp);
    void uninit_framework();
    int scan_stream(char *src, char verbosed, int *scanned_files, int *virus_found);
    int maybe_IFrameWork_CreateInstance();
    void dlclose_framework();
    void load_framework();
    void scan_directory(char *src, unsigned __int8 a2, __int64 a3, __int64 a4);

    //----------------------------------------------------------------------
    // Data declarations
    char *optarg;
    char *src;
    char verbose;
    __int64 g_base_component_0x20001;
    __int64 g_user_callbacks;
    CAEEngineDispatch *g_Engine;
    CFrameWork *g_FrameworkInstance;
    typedef int (__fastcall *FnCreateInstance_t)(_QWORD, _QWORD, _QWORD, CFrameWork **);
    int (__fastcall *FnCreateInstance)(_QWORD, _QWORD, _QWORD, CFrameWork **);
    void *hFrameworkSo;
    vtable_403310_t *vtable_403310;

You are now done with the very basic version of the Comodo command-line scanner. You can compile it with the following command on a Linux machine:

    $ g++ cmdscan.c -o mycmdscan -fpermissive \
          -Wno-unused-local-typedefs -ldl

In order to test it, you need to copy it to the /opt/COMODO directory, using the following command:

    $ sudo cp mycmdscan /opt/COMODO

You can now test this program to see whether it is working like the original cmdscan from Comodo:

    $ /opt/COMODO/mycmdscan /home/joxean/malware/eicar.com.txt
    /home/joxean/malware/eicar.com.txt ---> Found Virus, \
    Malware Name is Malware
    Number of Scanned Files: 1
    Number of Found Viruses: 1

It works! Now, it is time to print more information regarding the detected or undetected file. If you look at the SCANRESULT structure, you will find some interesting members:

    struct SCANRESULT
    {
        char bFound;
        int unSignID;
        char szMalwareName[64];
        int eFileType;
        int eOwnerFlag;
        int unCureID;
        int unScannerID;
        int eHandledStatus;
        int dwPid;
        __int64 ullTotalSize;
        __int64 ullScanedSize;
        int ucrc1;
        int ucrc2;
        char bInWhiteList;
        int nReserved[2];
    };

You can, for example, get the signature identifier that matched your malware, the scanner identifier, and the CRCs (checksums) that were used to detect your file, as well as whether the file is white-listed.
In the scan_stream routine, you replace the line printing the discovered malware name with the following lines:

    printf("%s ---> Malware: %s\n", src, scan_result.szMalwareName);
    if ( scan_result.unSignID )
        printf("Signature ID: 0x%x\n", scan_result.unSignID);
    if ( scan_result.unScannerID )
        printf("Scanner : %d (%s)\n", scan_result.unScannerID,
               get_scanner_name(scan_result.unScannerID));
    if ( scan_result.ullTotalSize )
        printf("Total size : %lld\n", scan_result.ullTotalSize);
    if ( scan_result.ullScanedSize )
        printf("Scanned size: %lld\n", scan_result.ullScanedSize);
    if ( scan_result.ucrc1 || scan_result.ucrc2 )
        printf("CRCs : 0x%x 0x%x\n", scan_result.ucrc1, scan_result.ucrc2);
    result = fflush(stdout);

Now, replace the line where the "Not Virus" message is printed with the following lines:

    printf("%s ---> Not Virus\n", src);
    if ( scan_result.bInWhiteList )
        printf("INFO: The file is white-listed.\n");
    result = fflush(stdout);

The last step is to add the following function before the scan_stream routine to resolve scanner identifiers to scanner names:

    //----------------------------------------------------------------------
    const char *get_scanner_name(int id)
    {
        switch ( id )
        {
            case 15: return "UNARCHIVE";
            case 28: return "SCANNER_PE64";
            case 27: return "SCANN...