Mobile-Security-Framework-MobSF

MobSF Docker image: https://hub.docker.com/r/opensecurity/mobile-security-framework-mobsf/

References

https://my.oschina.net/u/4346652/blog/4318594
https://www.wandouip.com/t5i194106/
https://toments.com/2874458/
https://www.ctolib.com/topics-121483.html
1. Similar to REF 16
https://www.mad-coding.cn/2019/10/11/%E4%BD%BF%E7%94%A8docker%E5%AE%89%E8%A3%85%E7%A7%BB%E5%8A%A8%E5%AE%89%E5%85%A8%E6%A1%86%E6%9E%B6%EF%BC%88MobSF%EF%BC%89/
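1. Worth consulting
https://www.liqixin.net/archives/127
1. Worth a look: a concise installation guide, written in April 2020
https://www.juduo.cc/tech/137501.html
1. Worth a look: an introductory overview
https://www.pianshen.com/article/49881524763/
1. Similar to REF 16
https://www.freebuf.com/column/220190.html
1. Best read together with REF 15 and REF 17
2. Good background introduction
3. Covers a version close to the current one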
https://www.itdaan.com/tw/7190acfac5ca7d25eb4845b4d4d9bb2e
https://blog.csdn.net/vivian_ll/article/details/81092231
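1. Similar to REF 16; the installation steps are overly detailed. For reference only.
https://blog.csdn.net/hellomanshan/article/details/78229613
1. Similar to REF 16; the installation steps are overly detailed. For reference only.
https://my.oschina.net/u/4346652/blog/4318594
1. Similar to REF 16; the installation steps are overly detailed. For reference only.
https://www.twblogs.net/a/5b8cfa492b7177188338206d
1. Similar to REF 16; for reference only
http://blog.jason.tools/2019/09/2020-ironman-03.html
1. Worth reading: it has a concise installation walkthrough, plus a background introduction and basic usage.
2. Pairs well with REF 17.
https://www.itread01.com/p/45521.html
1. A 2018 article covering MobSF 0.9.2 (the current version is v3.1 beta)
2. Does not use Docker
3. Installs on a Linux guest in VirtualBox
4. For reference only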
https://ithelp.ithome.com.tw/articles/10209033
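1. From the "Sophomore CS student" series, DAY 25: "Finally installed MobSF successfully". Covers installing with Docker and installing directly from the GitHub source.
2. Installing with Docker
  1. Install Docker first, then run:
    docker pull opensecurity/mobile-security-framework-mobsf
    docker run -it -p 8000:8000 opensecurity/mobile-security-framework-mobsf:latest
  2. Once the container is up, open http://0.0.0.0:8000 in a browser to log in (try http://localhost:8000 if your browser rejects 0.0.0.0)
3. Installing directly from GitHub
  1. Pull the source from GitHub
  2. Install the Python requirements
  3. Set up a Java environment (JDK)
4. Trying Docker first is recommended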

https://dotblogs.com.tw/johnes/2018/08/18/mobilesecurityframeworkinstallation

Where are the Android Malware and Datasets?

DroidCat, http://www.people.vcu.edu/~rashidib/Res_files/DroidCatDataset.htm
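1. Uses DroidCat to collect behavior logs of Android apps
2. DroidCat was published in the Computers & Security journal and at the CNS conference (2016)
3. If you need to analyze Android app behavior traces, you can request the dataset from the authors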
Drebin, https://www.sec.cs.tu-bs.de/~danarp/drebin/
1. https://www.sec.cs.tu-bs.de/pubs/2014-ndss.pdf
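2. A sizeable dataset drawn from the Android Market, Chinese markets, and Russian markets, plus http://www.malgenomeproject.org; the paper's evaluation uses 123,453 applications and 5,560 malware samples.
3. Drebin is the lightweight Android malware detection technique proposed by this research team. It classifies apps from static features, aiming to ease the performance limits of Android AV while keeping accuracy high.
4. Ten AV engines on VirusTotal vote on each sample; a sample flagged by at least two of them is labeled malicious.
5. The paper is a good model for writing research papers, especially for its logical flow and argument structure.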
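
The labeling rule in item 4 can be sketched in a few lines of Python. This is only an illustration: the verdict dictionary and the engine names are made up, and the only thing taken from the notes is the rule "flagged by at least two of ten engines".

```python
def label_sample(verdicts, min_votes=2):
    """Return 'malicious' when at least `min_votes` engines flag the sample.

    `verdicts` maps an AV engine name to True (flagged) or False (clean).
    The engine names used below are hypothetical placeholders.
    """
    votes = sum(1 for flagged in verdicts.values() if flagged)
    return "malicious" if votes >= min_votes else "benign"


# Ten hypothetical engines, two of which flag the sample.
example = {"engine%02d" % i: False for i in range(10)}
example["engine03"] = True
example["engine07"] = True
print(label_sample(example))
```

Raising `min_votes` trades recall for fewer false positives in the ground-truth labels.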
https://zeltser.com/malware-sample-sources/
1. General malware sample sources
https://www.researchgate.net/post/Where_can_I_get_Android_Malware_Samples
1. The poster is looking for Android malware datasets for static analysis research
2. 298 samples: https://github.com/ashishb/android-malware (description: http://sanddroid.xjtu.edu.cn:8080/) (*****)
  1. Among them, xHelper is an Android trojan active in 2020: https://blog.malwarebytes.com/android/2020/02/new-variant-of-android-trojan-xhelper-reinfects-with-help-from-google-play/
3. Christian Camilo Urcuqui López's compilation is very good (it includes the well-known Android malware datasets)
4. DroidBench: https://github.com/secure-software-engineering/DroidBench
https://github.com/sk3ptre/AndroidMalware_2020
1. Collect these soon (*****)
https://www.unb.ca/cic/datasets/andmal2017.html
1. Download the Android Malware Dataset (CIC-AndMal2017) and the Android Adware and General Malware Dataset (CIC-AAGM2017) (a registration form is required)

New information

Using MobSF for APP testing

Android malware download:
- https://github.com/sk3ptre/AndroidMalware_2020
- https://github.com/sk3ptre?tab=repositories
Start up from Docker images of MobSF
- docker run -it -p 8000:8000 opensecurity/mobile-security-framework-mobsf:latest
- testing the APP from browser
MobSF info
- https://mobsf.github.io/docs/#/extras?id=rest-api
- https://mobsf.github.io/Mobile-Security-Framework-MobSF/
MobSF API with Python example
- usage: https://gist.github.com/ajinabraham/0f5de3b0c7b7d3665e54740b9f536d81
- sudo pip install requests_toolbelt
- python .... (first confirm the MobSF API key and the source folder of APKs)
Task: extract the static analysis results for a batch of Android malware samples through MobSF's API with Python.
- Please refer to the above example
  - Source folder: /Users/ching-haomao/Downloads/AndroidMalware_2020-master
  - API Key: look it up in your running MobSF instance
  - You may need to adapt the example for batch processing of APKs:
    - unzip the archives the samples are shipped in
    - pick out the .apk files
    - decide how to store the analysis results
  - The outcome is the static analysis result for each sample, stored in structured form (in a database)
  - For analysis and plotting, see this example: https://seaborn.pydata.org/examples/structured_heatmap.html
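
The first two batch-processing steps (unzip, then pick out the .apk files) might look like the sketch below. It uses only the standard library. The `password` parameter is an assumption: malware collections are often shipped as password-protected archives, so pass whatever the dataset's README documents. Note that Python's `zipfile` cannot decrypt AES-encrypted archives; such archives are reported in the returned list and need an external tool.

```python
import os
import zipfile


def extract_archives(source_dir, password=None):
    """Unzip every .zip under source_dir next to the archive itself.

    `password` (bytes or None) is an assumption about the dataset.
    Archives that cannot be opened are skipped and returned.
    """
    skipped = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            if not name.lower().endswith(".zip"):
                continue
            archive = os.path.join(root, name)
            try:
                with zipfile.ZipFile(archive) as zf:
                    zf.extractall(root, pwd=password)
            except (zipfile.BadZipFile, RuntimeError, NotImplementedError):
                skipped.append(archive)
    return skipped


def find_apks(source_dir):
    """Return a sorted list of every .apk path below source_dir."""
    found = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            if name.lower().endswith(".apk"):
                found.append(os.path.join(root, name))
    return sorted(found)
```

Typical use is `extract_archives(folder)` once, followed by `for apk in find_apks(folder): ...` to feed each file to the upload code.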

Research

https://ieeexplore.ieee.org/document/6234407

Graph Analysis for Android Malware

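Two-tailed Scorpion (雙尾蠍), APT-C-23 (https://www.freebuf.com/articles/system/129223.html)
1. Regional and targeted: the Middle East, education-related targets; unusually, it targets both Israel and Palestine
2. One of the few APT groups that mainly uses mobile apps
3. Heavily obfuscated; the attack modules were finished two to three months before the attacks, which suggests planned campaigns
4. C&C (domain name, IP), binary (MD5, Android, iOS)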

2. Malware Data Science (ch4)- Identifying Attack Campaigns using Malware Networks

Book, https://www.malwaredatascience.com/

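Using the VM is recommended: https://www.malwaredatascience.com/ubuntu-virtual-machine

The VM password is the same as the username.

DIR = ~/malware_data_science/ch4

data: APT1 samples and IoCs

code: ch4 examples

Focus on Listing 4-8 and Listing 4-12. They read each binary's features from the IoCs, build a bipartite graph between those features and the binaries, and use Python's NetworkX package to write the graph model to a .dot file.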
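
To see what ends up inside such a .dot file, here is a dependency-free sketch that writes a bipartite graph by hand. It is not the book's code (the listings use NetworkX and write_dot); the sample names and indicators below are hypothetical, and names containing double quotes would need escaping.

```python
def bipartite_dot(relations):
    """Render {sample: iterable of attributes} as a Graphviz DOT string.

    Samples are drawn as boxes and attributes as ellipses so the two
    sides of the bipartite graph are easy to tell apart.
    """
    lines = ["graph bipartite {"]
    attributes = sorted({a for attrs in relations.values() for a in attrs})
    for sample in sorted(relations):
        lines.append('  "%s" [shape=box];' % sample)
    for attribute in attributes:
        lines.append('  "%s" [shape=ellipse];' % attribute)
    for sample in sorted(relations):
        for attribute in sorted(relations[sample]):
            lines.append('  "%s" -- "%s";' % (sample, attribute))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical indicators, for illustration only.
relations = {
    "sample_a": {"c2.example.com", "10.0.0.1"},
    "sample_b": {"c2.example.com"},
}
dot_text = bipartite_dot(relations)
print(dot_text)
```

Saving `dot_text` to a file and rendering it with Graphviz shows the two samples joined through the indicator they share.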

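Wish: take the Android samples from https://github.com/sk3ptre/AndroidMalware_2020 , analyze them with MobSF, extract each Android app's resources, cluster them, and study the relationships between Android malware families.

How do we build this pipeline?

What to prepare:

Android malware dataset: https://github.com/sk3ptre/AndroidMalware_2020
A Python program that automatically submits each APK to MobSF's API, retrieves the static and dynamic analysis results, and saves them to files: https://gist.github.com/ajinabraham/0f5de3b0c7b7d3665e54740b9f536d8
A second Python program that extracts features from the static and dynamic analysis results and clusters the Android malware.
A visualization that shows the Android malware clusters.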

Android Malware Clustering?

Android Malware Clustering through Malicious Payload Mining, https://arxiv.org/pdf/1707.04795.pdf

Crowdroid: Behavior-Based Malware Detection System for Android, https://dl.acm.org/doi/pdf/10.1145/2046614.2046619?casa_token=nI2v4YxbLLMAAAAA:v5uzIBeTMA7903AcWxXO50mq4QV0CPeydyONBxkxc8OuGtNOlJCDbpxxMPAl7gDxFJXkXHr1xvAU

Familial Clustering For Weakly-labeled Android Malware Using Hybrid Representation Learning, https://www.researchgate.net/profile/Yulei_Sui4/publication/336599998_Familial_Clustering_For_Weakly-labeled_Android_Malware_Using_Hybrid_Representation_Learning/links/5f67fab0a6fdcc008631ce68/Familial-Clustering-For-Weakly-labeled-Android-Malware-Using-Hybrid-Representation-Learning.pdf

Android Malware Clustering using Community Detection on Android Packages Similarity Network (2020), (*****) https://arxiv.org/pdf/2005.06075.pdf

EC2: Ensemble Clustering and Classification for Predicting Android Malware Families (*****), https://kclpure.kcl.ac.uk/portal/files/126815599/EC2_Ensemble_Clustering_and_CHACKRABORTY_Acc2Aug2017_GREEN_AAM.pdf

Implementation

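Step 3: a second Python program that extracts features from the static and dynamic analysis results and clusters the Android malware.

See Malware Data Science, ch. 4, Identifying Attack Campaigns Using Malware Networks: https://www.malwaredatascience.com/

Bipartite networks; Building networks with networkx

Building a Shared image relationship network Page 54

***

Change into ~/malware_data_science/ch4/code/

Run ./run-listing-4-12.sh (this shell script runs python listing-4-12.py with the data folder and other paths already filled in).

*Key concept

listing-4-12.py analyzes the network behavior of the APT1 family; specifically, it works from the malware's IoCs.

Apply the same idea elsewhere!

We can treat MobSF's output for each Android malware sample the way the listing treats IoCs: extract features from it and build an analysis graph.

Please study listing-4-12.py until you understand it (it will be the basis for our Android family analysis).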

APK preprocessing and features extraction

***** Complete Step 1 and Step 2 (automated static and dynamic analysis of a batch of APKs through MobSF). Input: https://github.com/sk3ptre/AndroidMalware_20 ; output: three DOT files, apk-network.dot, apk.dot, and network.dot.

Recall how we used Python code to send an APK to MobSF and produce the analysis result:

MobSF API with Python example
usage: https://gist.github.com/ajinabraham/0f5de3b0c7b7d3665e54740b9f536d81
sudo pip install requests_toolbelt
python .... (first confirm the MobSF API key and the source folder of APKs)

Also remember that MobSF must be started first

Installing with Docker

Install Docker first, then run:
docker pull opensecurity/mobile-security-framework-mobsf
docker run -it -p 8000:8000 opensecurity/mobile-security-framework-mobsf:latest
Once the container is up, open http://0.0.0.0:8000 in a browser to log in (try http://localhost:8000 if your browser rejects 0.0.0.0)

"""
MOBSF REST API Python Requests
"""

import json
import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder

# Change the three values below
SERVER = "http://0.0.0.0:8000"
FILE = '/Users/ching-haomao/Downloads/AndroidMalware_2020-master-2/actionSpy/5f529573d5d4d067700e981f09c48069.apk'
APIKEY = 'dcd7f2740deed93676a6c1973cec14f65f48f52f93381d5f937aa6713f19aec8'


def upload():
    """Upload File"""
    print("Uploading file")
    multipart_data = MultipartEncoder(fields={'file': (FILE, open(FILE, 'rb'), 'application/octet-stream')})
    headers = {'Content-Type': multipart_data.content_type, 'Authorization': APIKEY}
    response = requests.post(SERVER + '/api/v1/upload', data=multipart_data, headers=headers)
    print(response.text)
    return response.text


def scan(data):
    """Scan the file"""
    print("Scanning file")
    post_dict = json.loads(data)
    headers = {'Authorization': APIKEY}
    response = requests.post(SERVER + '/api/v1/scan', data=post_dict, headers=headers)
    print(response.text)


def pdf(data):
    """Generate PDF Report"""
    print("Generate PDF report")
    headers = {'Authorization': APIKEY}
    data = {"hash": json.loads(data)["hash"]}
    response = requests.post(SERVER + '/api/v1/download_pdf', data=data, headers=headers, stream=True)
    with open("report.pdf", 'wb') as flip:
        for chunk in response.iter_content(chunk_size=1024):
            if chunk:
                flip.write(chunk)
    print("Report saved as report.pdf")


def json_resp(data):
    """Generate JSON Report"""
    print("Generate JSON report")
    headers = {'Authorization': APIKEY}
    data = {"hash": json.loads(data)["hash"]}
    response = requests.post(SERVER + '/api/v1/report_json', data=data, headers=headers)
    print(response.text)


def delete(data):
    """Delete Scan Result"""
    print("Deleting Scan")
    headers = {'Authorization': APIKEY}
    data = {"hash": json.loads(data)["hash"]}
    response = requests.post(SERVER + '/api/v1/delete_scan', data=data, headers=headers)
    print(response.text)

# TODO: extend this to process multiple APK files from a folder
# The malware2020 dataset must be unzipped first
RESP = upload()
scan(RESP)
json_resp(RESP)
pdf(RESP)
delete(RESP)

3. Download https://github.com/sk3ptre/AndroidMalware_2020 into a folder

Homework:

Automate the processing of a single APK
Automate the processing of a batch of APKs
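
One possible shape for the batch homework is a small driver that is independent of the HTTP details. In this sketch `analyze` is a hypothetical callable you supply, for example a wrapper around upload(), scan(), and json_resp() from the script above that returns the parsed JSON report instead of printing it. The output layout (one .json file per APK) is also an assumption.

```python
import json
import os


def process_apks(apk_paths, out_dir, analyze):
    """Run `analyze` on each APK and save one JSON report per file.

    `analyze` takes an APK path and returns a dict. Failures are
    recorded and do not stop the rest of the batch.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved, failed = [], []
    for apk in apk_paths:
        name = os.path.splitext(os.path.basename(apk))[0]
        target = os.path.join(out_dir, name + ".json")
        try:
            report = analyze(apk)
        except Exception as error:  # keep going when one sample fails
            failed.append((apk, str(error)))
            continue
        with open(target, "w", encoding="utf-8") as handle:
            json.dump(report, handle, indent=2)
        saved.append(target)
    return saved, failed
```

Keeping the per-APK reports on disk means the clustering step can be rerun without scanning the samples again.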

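What can APK clustering be used for?

It can separate Android APKs by function
1. By the types of permissions they request
2. By the information in the manifest
What if all of these Android APKs are malicious?
1. Clustering the bad actors shows which small groups they fall into
2. Families?
3. Habitual packages and habitual techniques
Start by building a relation
1. The mapping between malicious APKs and permissions forms a bipartite graph
2. U is the set of APKs, V is the set of permissions
3. Represent the bipartite graph as a matrix, i.e. an adjacency matrix (a data structure question...)
4. Is there an existing example we can refer to?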
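
As a minimal sketch of item 3, the APK-permission relation can be stored as a 0/1 matrix with one row per APK (the U side) and one column per permission (the V side). The two APKs and their permissions are placeholder names.

```python
def biadjacency(apk_permissions):
    """Build the APK x permission 0/1 matrix of a bipartite graph.

    Returns (row labels, column labels, matrix as a list of rows).
    """
    apks = sorted(apk_permissions)
    permissions = sorted({p for ps in apk_permissions.values() for p in ps})
    matrix = [
        [1 if p in apk_permissions[a] else 0 for p in permissions]
        for a in apks
    ]
    return apks, permissions, matrix


apk_permissions = {
    "APK001": {"p1", "p2", "p3"},
    "APK002": {"p3", "p4"},
}
apks, permissions, matrix = biadjacency(apk_permissions)
print(permissions)
for apk, row in zip(apks, matrix):
    print(apk, row)
```

Each row is also the feature vector of that APK, which is the form clustering algorithms expect.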

Core analysis modules (core)

C1: Feature extraction: extracting the features from MobSF output

C2: APK-Permissions graph construction: constructing the bipartite graph for profiling the permissions behavior

C3: APK privilege clustering: clustering the permission behavior based on APK-permissions graph

Input: APK001- p1,p2,p3; APK002- p3, p4

Output: APK001-c1, APK002-c1, APK003-c2,....

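Good news: chapter 4 of Malware Data Science has similar code and a worked case. Look for listing-4-12.py; it is the core engine for this semester's course. We will modify it to fit the permission behavior of Android malware (it covers only C1 and C2); C3 performs the clustering.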

hard clustering vs soft clustering: k-means vs c-means: https://medium.com/fintechexplained/machine-learning-hard-vs-soft-clustering-dc92710936af#:~:text=What%20Is%20Hard%20Clustering%3F,positive%20or%20a%20negative%20tweet.

What about C3? (Hint: study the k-means and related clustering examples in sklearn.) Its input is the C2 APKs-Permissions graph; its output is the set of clusters.
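
To make the input and output of C3 concrete, here is a miniature k-means (Lloyd's algorithm) in plain Python. It is a teaching sketch, not a replacement for sklearn: the starting centroids are passed in explicitly (an assumption made to keep the run deterministic), and the four permission vectors are invented.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, max_iterations=100):
    """Plain Lloyd's algorithm; returns one cluster index per point."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(old)  # an empty cluster keeps its centroid
        if updated == centroids:
            break
        centroids = updated
    return labels


# Rows are APKs, columns are permissions p1..p4 (1 = requested).
vectors = {
    "APK001": [1, 1, 1, 0],
    "APK002": [1, 1, 0, 0],
    "APK003": [0, 0, 1, 1],
    "APK004": [0, 0, 0, 1],
}
names = sorted(vectors)
labels = kmeans([vectors[n] for n in names],
                centroids=[vectors["APK001"], vectors["APK003"]])
for name, label in zip(names, labels):
    print(name, "c%d" % (label + 1))
```

Here APK001 and APK002 land in one cluster and APK003 and APK004 in the other, which matches the "APK001-c1, ..." output format described for C3.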

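Study modules

After class, study on your own: using listing-4-12.py as a reference, build the APK-permissions graph.
1. How do you extract the APK and permission information from MobSF?
2. How do you build the APK-permission relations with the NetworkX package?
3. How do you render the graph? (text, drawing)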
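
For question 1, a possible starting point is sketched below. It ASSUMES that the JSON report returned by MobSF has a top-level "permissions" object whose keys are permission names; inspect a real report from your MobSF version and adjust the key if it differs. The sample text is hand-written, not real MobSF output.

```python
import json


def permissions_from_report(report):
    """Return the set of permission names found in one parsed report.

    ASSUMPTION: the report has a top-level "permissions" object whose
    keys are permission names. A missing key yields an empty set.
    """
    permissions = report.get("permissions", {})
    return set(permissions)


# A hand-written stand-in for a report, not real MobSF output.
sample_text = """
{
  "file_name": "example.apk",
  "permissions": {
    "android.permission.INTERNET": {"status": "normal"},
    "android.permission.READ_SMS": {"status": "dangerous"}
  }
}
"""
report = json.loads(sample_text)
print(sorted(permissions_from_report(report)))
```

Collecting these sets into a dictionary keyed by APK name gives the input for the graph construction step.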

Note: extract the permissions from MobSF's output files, create the APK and permission nodes and edges with NetworkX's networkx.Graph(), then project the graph with bipartite.projected_graph(network, APK or Permissions).
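
What the projection step computes can be written out by hand. The sketch below projects onto the APK side without NetworkX: two APKs are linked when they share at least one permission. The edge weight (number of shared permissions) is an addition for illustration, and APK003 with p5 is a made-up isolated sample.

```python
from itertools import combinations


def project(apk_permissions):
    """Project the APK-permission graph onto the APK side.

    Two APKs are linked when they share at least one permission; the
    edge weight is the number of shared permissions.
    """
    edges = {}
    for first, second in combinations(sorted(apk_permissions), 2):
        shared = set(apk_permissions[first]) & set(apk_permissions[second])
        if shared:
            edges[(first, second)] = len(shared)
    return edges


apk_permissions = {
    "APK001": {"p1", "p2", "p3"},
    "APK002": {"p3", "p4"},
    "APK003": {"p5"},
}
print(project(apk_permissions))
```

Swapping the roles of APKs and permissions in the input gives the projection onto the permission side.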

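After class, study on your own: https://github.com/qwp8510/Machine-Learning-K-means-clustering/blob/master/K-means%20Clustering%20in%20Python.ipynb
1. Goal: understand how to use k-means through sklearn
2. The input data format and the output data format
3. How the model learns (parameter settings, common training and testing practices)
4. How to present the results (text, plots)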

Core- APKs clustering

Overwhelmed and looking for something to hold on to? We need a clustering example, ideally one about malware => Malware Data Science ch5

(1) Clustering needs a similarity measure. To cluster APKs we must compute Sim(APK1, APK2): how do we calculate the similarity between APKs?

(2) To calculate the similarity between APKs, extract feature vectors and choose a similarity function (or design one for your purpose).

Similarity options (https://medium.com/qiubingcheng/%E6%AD%90%E6%B0%8F%E8%B7%9D%E9%9B%A2%E8%88%87%E9%A4%98%E5%BC%A6%E7%9B%B8%E4%BC%BC%E5%BA%A6%E7%9A%84%E6%AF%94%E8%BC%83-c78163ad51b)
- Euclidean similarity (feature vectors in absolute space)
  - https://blog.csdn.net/ifnoelse/article/details/7766038
- Cosine similarity (feature vectors in vector space)
  - https://zh.wikipedia.org/zh-tw/%E4%BD%99%E5%BC%A6%E7%9B%B8%E4%BC%BC%E6%80%A7
- Edit similarity (https://codertw.com/%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80/431711/ )
- Jaccard similarity
  - S1 = "ABCD", S2 = "CDE": Jaccard_Sim(S1, S2) = |{C, D}| / |{A, B, C, D, E}| = 2/5 = 0.4
- ...or one you design yourself
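
The four options can be compared side by side with short standard-library implementations. This is a sketch for experimentation: Euclidean and edit measures are distances (smaller means more alike), so they need converting if you want a similarity score, for example with 1 / (1 + d), which is one common choice and not something the notes prescribe.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def edit_distance(s, t):
    """Levenshtein distance, computed row by row."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(jaccard_similarity("ABCD", "CDE"))  # 2 shared of 5 distinct = 0.4
```

For permission sets, Jaccard is the natural fit because it ignores permissions that neither APK requests.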

Malware Data Science: Attack Detection and Attribution

APK privilege pattern analysis, also called similarity analysis of APK permission usage, is the process by which we compare two APK samples by estimating the percentage of permissions they share. It differs from shared attribute analysis, which compares APK samples based on individual permissions (such as ACCESS_FINE_LOCATION, ACCESS_COARSE_LOCATION, or BLUETOOTH_ADMIN).

In MobSF static analysis results, APK privilege pattern analysis helps identify APK samples that can be analyzed together (because they were generated from the same Android malware toolkit or are different versions of the same Android malware family), which can help determine whether the same developers were responsible for a group of Android malware samples.

Symantec discovered Android Malware Toolkit named Dendroid (The Hacker News)

Android Malware Toolkit Poses as Porn Apps Targeting Chinese-speaking Users (Symantec)

APK1: P1, P2, P3 -> (1,1,1,0) over (P1, P2, P3, P4)

APK2: P2, P3, P4 -> (0,1,1,1)

Sim(APK1, APK2) => Jaccard similarity = |{P2, P3}| / |{P1, P2, P3, P4}| = 2/4 = 0.5

APK-Permission matrix -> similarity calculation -> APK-APK matrix -> choose a threshold for constructing the similarity graph (lambda = 0.8?)
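
The pipeline above can be sketched end to end on permission sets. The code below computes the pairwise Jaccard similarity (the APK-APK matrix, stored sparsely as a dictionary) and keeps only pairs above the threshold as graph edges. APK3 is a hypothetical third sample added so that one pair survives the 0.8 threshold.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(apk_permissions, threshold):
    """Return {(apk_a, apk_b): similarity} for pairs above threshold."""
    edges = {}
    for a, b in combinations(sorted(apk_permissions), 2):
        score = jaccard(apk_permissions[a], apk_permissions[b])
        if score > threshold:
            edges[(a, b)] = score
    return edges


apk_permissions = {
    "APK1": {"P1", "P2", "P3"},
    "APK2": {"P2", "P3", "P4"},
    "APK3": {"P1", "P2", "P3"},  # hypothetical near-duplicate of APK1
}
print(similarity_edges(apk_permissions, threshold=0.8))
```

With the threshold at 0.8 only the APK1-APK3 pair is connected; APK2 shares half of its permissions with each and stays isolated.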

An example of the kind of visualization you will learn to create in this chapter, showing shared permission relationships between some of the [2020 latest Android malware samples]

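First, run Listing-5-1.py on the APT1 binaries (../data). It compares the samples pairwise by the Jaccard index of their extracted strings and writes the resulting graph to similarity_graph.dot: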

#!/usr/bin/python

import argparse
import os
import networkx
from networkx.drawing.nx_pydot import write_dot
import itertools
import pprint

"""
Copyright (c) 2015, Joshua Saxe
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    * Neither the name 'Joshua Saxe' nor the
      names of its contributors may be used to endorse or promote products
      derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL JOSHUA SAXE BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""



def jaccard(set1,set2):
    """
    Compute the Jaccard index of two sets by taking
    their intersection, union and then dividing the number
    of elements in the intersection by the number of elements
    in their union.
    """
    intersection = set1.intersection(set2)
    intersection_length = float(len(intersection))
    union = set1.union(set2)
    union_length = float(len(union))
    return intersection_length / union_length

def getstrings(fullpath):
    """
    Extract strings from the binary indicated by the 'fullpath'
    parameter, and then return the set of unique strings in
    the binary.
    """
    strings = os.popen("strings '{0}'".format(fullpath)).read()
    strings = set(strings.split("\n"))
    return strings

def pecheck(fullpath):
    """
    Do a cursory sanity check to make sure 'fullpath' is
    a Windows PE executable (PE executables start with the
    two bytes 'MZ')
    """
    with open(fullpath, "rb") as handle:
        return handle.read(2) == b"MZ"

if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="Identify similarities between malware samples and build similarity graph"
    )

    parser.add_argument(
        "target_directory",
        help="Directory containing malware"
    )

    parser.add_argument(
        "output_dot_file",
        help="Where to save the output graph DOT file"
    )

    parser.add_argument(
        "--jaccard_index_threshold","-j",dest="threshold",type=float,
        default=0.8,help="Threshold above which to create an 'edge' between samples"
    )

    args = parser.parse_args()
    malware_paths = [] # where we'll store the malware file paths
    malware_attributes = dict() # where we'll store the malware strings
    graph = networkx.Graph() # the similarity graph

    for root, dirs, paths in os.walk(args.target_directory):
        # walk the target directory tree and store all of the file paths
        for path in paths:
            full_path = os.path.join(root,path)
            malware_paths.append(full_path)

    # filter out any paths that aren't PE files; list() keeps the result
    # reusable, because it is iterated twice below
    malware_paths = list(filter(pecheck, malware_paths))

    # get and store the strings for all of the malware PE files
    for path in malware_paths:
        attributes = getstrings(path)
        print("Extracted {0} attributes from {1} ...".format(len(attributes),path))
        malware_attributes[path] = attributes

        # add each malware file to the graph
        graph.add_node(path,label=os.path.split(path)[-1][:10])

    # iterate through all pairs of malware
    for malware1,malware2 in itertools.combinations(malware_paths,2):

        # compute the Jaccard index for the current pair
        jaccard_index = jaccard(malware_attributes[malware1],malware_attributes[malware2])

        # if the Jaccard index is above the threshold add an edge
        if jaccard_index > args.threshold:
            print(malware1,malware2,jaccard_index)
            graph.add_edge(malware1,malware2,penwidth=1+(jaccard_index-args.threshold)*10)

    # write the graph to disk so we can visualize it
    write_dot(graph,args.output_dot_file)

import
- import argparse
  - Handles command-line arguments in Python (e.g., python listing51.py arg1 arg2)
  - Online tutorial link
- import os
- import networkx
  - The key package for building graphs
- from networkx.drawing.nx_pydot import write_dot
- import itertools
  - Efficient iterators (reference)
- import pprint
  - pretty print
def
- jaccard: computes the similarity
- getstrings: extracts the set of unique strings from the file at the given path (for us: the attributes of the APK at an APK path)
- pecheck: checks whether a file is a binary to be analyzed (filters out ioc, yara, and txt files; for us: is it an APK or not?)
main
- Parse the arguments with argparse first
- line 91: APK_paths
- line 92: APKattributes (APK_permission)
- line 93: pecheck is passed as a callback function: filter(pecheck, malware_paths)
- line 120: args.threshold (see line 87, default=0.8)

malware_attributes maps each path to its attribute set, e.g. m1 (path): {p1, p2, p3}, m2: {p2, p3, p4}
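
For the Android version, pecheck needs an APK counterpart. One possible check, offered as a sketch and not as the course's official solution, relies on the fact that an APK is a ZIP archive containing AndroidManifest.xml at its top level. The function name apkcheck is made up here.

```python
import zipfile


def apkcheck(fullpath):
    """Cursory check that 'fullpath' looks like an Android package.

    An APK is a ZIP archive with AndroidManifest.xml at its top
    level, so this tests for both conditions.
    """
    if not zipfile.is_zipfile(fullpath):
        return False
    try:
        with zipfile.ZipFile(fullpath) as archive:
            return "AndroidManifest.xml" in archive.namelist()
    except zipfile.BadZipFile:
        return False
```

It can be dropped into the same filtering position as pecheck, for example `apk_paths = [p for p in all_paths if apkcheck(p)]`.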

Exercise: how do threshold = 0.8 and threshold = 0.3 differ? (Try 0.3, 0.4, ..., 0.9, 0.99.)
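
A quick way to explore this is to count how many edges survive at each threshold. The sketch below uses the same strict "greater than" comparison as the listing; m1 and m2 follow the example attribute sets above, while m3 and m4 are hypothetical additions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edge_count(attributes, threshold):
    """Number of sample pairs whose Jaccard index exceeds threshold."""
    return sum(
        1
        for a, b in combinations(sorted(attributes), 2)
        if jaccard(attributes[a], attributes[b]) > threshold
    )


attributes = {
    "m1": {"p1", "p2", "p3"},
    "m2": {"p2", "p3", "p4"},
    "m3": {"p1", "p2", "p3", "p4"},  # hypothetical
    "m4": {"p9"},                    # hypothetical outlier
}
for threshold in (0.3, 0.5, 0.8, 0.99):
    print(threshold, edge_count(attributes, threshold))
```

A low threshold merges loosely related samples into one dense cluster, while a high threshold leaves only near-duplicates connected, so the interesting family structure usually shows up somewhere in between.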

Collaboration workflow for the final report

Everyone, please create a GitHub account
Fork https://github.com/ericmao/malware-app-analysis
Install https://desktop.github.com/
Install VSCode https://code.visualstudio.com/download
Try editing readme.md
commit to local repository, push from local repository


Last updated 4 years ago
