Mobile-Security-Framework-MobSF
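MobSF Docker image: https://hub.docker.com/r/opensecurity/mobile-security-framework-mobsf/
References
https://www.liqixin.net/archives/127
Worth a look: a streamlined installation guide, written in 2020/04.
https://www.freebuf.com/column/220190.html
Recommended reading alongside REF 15 and REF 17.
Good background introduction.
The MobSF version it covers is close to the current one.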
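https://blog.csdn.net/vivian_ll/article/details/81092231
Similar to REF 16. The installation steps are tedious. For reference only.
https://blog.csdn.net/hellomanshan/article/details/78229613
Similar to REF 16. The installation steps are tedious. For reference only.
https://my.oschina.net/u/4346652/blog/4318594
Similar to REF 16. The installation steps are tedious. For reference only.
https://www.twblogs.net/a/5b8cfa492b7177188338206d
Similar to REF 16. For reference only.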
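http://blog.jason.tools/2019/09/2020-ironman-03.html
Worth reading. Besides a streamlined installation procedure, it also covers background and basic usage.
Can be read alongside REF 17.
https://www.itread01.com/p/45521.html
A 2018 technical article that covers MobSF 0.9.2 (the current version is v3.1 beta).
It does not use Docker.
It installs MobSF on a Linux VM in VirtualBox.
For reference only.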
https://ithelp.ithome.com.tw/articles/10209033
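"Sophomore CS Student, Day 25: Finally Installed MobSF" covers two ways to install MobSF: with Docker, and directly from the GitHub source code.
Installing with Docker
Install Docker first, then run: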
docker pull opensecurity/mobile-security-framework-mobsf
docker run -it -p 8000:8000 opensecurity/mobile-security-framework-mobsf:latest
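Once it is running, open http://0.0.0.0:8000 in a browser to log in. If that address does not load, try http://localhost:8000.
Installing directly from GitHub
Pull the source code from GitHub
Install the Python requirements
Set up a Java environment (JDK)
Docker is the recommended option to try first.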


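Where can we find Android malware samples and datasets?
DroidCat, http://www.people.vcu.edu/~rashidib/Res_files/DroidCatDataset.htm
DroidCat is used to obtain behavior logs of Android apps.
DroidCat was published in the journal Computers & Security and at the CNS conference (2016).
If you need to analyze the behavior traces of Android apps, you can request the dataset from the authors.
Drebin, https://www.sec.cs.tu-bs.de/~danarp/drebin/
A large dataset collected from the Android Market, Chinese markets and Russian markets, plus http://www.malgenomeproject.org. It contains more than 140,000 binaries in total, of which 5,000+ are malicious.
Drebin is a lightweight Android malware detection technique proposed by this research team. It classifies apps from static features. The goal is to get past the performance limits of Android antivirus software while keeping accuracy high.
Labels come from a vote of ten antivirus engines on VirusTotal. A sample flagged by at least two engines is labeled malicious.
This paper is a good model for writing research papers, especially in its logical flow and argument structure.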
https://www.researchgate.net/post/Where_can_I_get_Android_Malware_Samples
The poster was looking for an Android malware dataset for static analysis research.
298 samples: https://github.com/ashishb/android-malware (description: http://sanddroid.xjtu.edu.cn:8080/) (*****)
Among them, xHelper is an Android malware (trojan) from 2020: https://blog.malwarebytes.com/android/2020/02/new-variant-of-android-trojan-xhelper-reinfects-with-help-from-google-play/
Christian Camilo Urcuqui López's compilation is excellent. It includes well-known Android malware datasets.
https://github.com/sk3ptre/AndroidMalware_2020
Collect these soon (*****)
https://www.unb.ca/cic/datasets/andmal2017.html
Download the Android Malware Dataset (CIC-AndMal2017) and the Android Adware and General Malware Dataset (CIC-AAGM2017). A registration form is required.
New information
Using MobSF for app testing
Android malware download:
Start MobSF from its Docker image:
docker run -it -p 8000:8000 opensecurity/mobile-security-framework-mobsf:latest
Test the app from the browser
MobSF API with Python example
sudo pip install requests_toolbelt
python .... (first confirm the MobSF API key and the source folder of the APKs)
Task: extract the static analysis results from a batch of Android malware samples with Python, using MobSF's API.
Please refer to the above example
Source folder: /Users/ching-haomao/Downloads/AndroidMalware_2020-master
API key: find it on MobSF's REST API documentation page
You may need to modify the example above to batch-process the APKs:
unzip the batched archive files
keep only the .apk files
decide how to store the analysis results
This gives you the static analysis results for these malware samples, stored in structured form in a database.
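The batch steps above (unzip, keep .apk files, store results) can be sketched with the standard library alone. This is a minimal sketch, not a tested MobSF client. The names `extract_archives`, `find_apks`, `open_store`, `process_batch` and the `reports` table are hypothetical. `analyze` is a placeholder you would implement with the upload, scan and report_json calls from the MobSF example. The optional `password` exists because sample repositories often ship password-protected archives (check the repository README). Python's `zipfile` can only decrypt legacy ZipCrypto archives, not AES-encrypted ones.

```python
import json
import sqlite3
import zipfile
from pathlib import Path


def extract_archives(src_dir, dest_dir, password=None):
    """Extract every .zip under src_dir into dest_dir/<archive name>/.

    password is optional; zipfile can only decrypt legacy ZipCrypto archives.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    pwd = password.encode() if password else None
    for archive in sorted(Path(src_dir).rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest / archive.stem, pwd=pwd)
    return dest


def find_apks(root):
    """Return every .apk file under root, sorted for reproducible batches."""
    return sorted(
        p for p in Path(root).rglob("*") if p.is_file() and p.suffix.lower() == ".apk"
    )


def open_store(db_path):
    """Open an SQLite store with one row per analyzed APK, keyed by hash."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reports "
        "(hash TEXT PRIMARY KEY, file_name TEXT, report TEXT)"
    )
    return conn


def process_batch(conn, apk_paths, analyze):
    """Call analyze(path) -> (hash, report_dict) on each APK and store the result.

    analyze is a placeholder for the MobSF upload -> scan -> report_json calls;
    re-running a batch overwrites rows with the same hash.
    """
    for apk in apk_paths:
        file_hash, report = analyze(apk)
        conn.execute(
            "INSERT OR REPLACE INTO reports VALUES (?, ?, ?)",
            (file_hash, apk.name, json.dumps(report)),
        )
    conn.commit()
```

A real `analyze` would POST the APK to `/api/v1/upload`, pass the response to `/api/v1/scan`, fetch `/api/v1/report_json`, and return the upload response's `hash` together with the parsed report. Keeping it as a parameter lets you test the batch logic without a running MobSF server.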
Research
Graph Analysis for Android Malware
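Two-tailed Scorpion (APT-C-23): https://www.freebuf.com/articles/system/129223.html
Regional and targeted. It operates in the Middle East, and its targets include students taking college entrance exams. Unusually, it targets both Israel and Palestine.
It is one of the few APT groups that mainly use mobile apps.
Its code is heavily obfuscated. The attack modules were finished two to three months before the attacks were launched, which suggests planned attacks.
IoCs: C&C (domain names, IPs), binaries (MD5; Android, iOS)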

2. Malware Data Science (Ch. 4): Identifying Attack Campaigns Using Malware Networks
Book: https://www.malwaredatascience.com/
Using the book's VM is recommended: https://www.malwaredatascience.com/ubuntu-virtual-machine
The VM password is the same as the username.
DIR = ~/malware_data_science/ch4
data: APT1 samples and IoCs
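code: Ch. 4 examples
Focus on two examples, Listing 4-8 and Listing 4-12. They read binary features from the IoCs and use Python's NetworkX package to build a bipartite graph linking the binaries to those features. They then write the graph model to a .dot file.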
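Here is a standard-library sketch of the kind of .dot file those listings produce. It builds a bipartite sample-to-IoC graph and serializes it to Graphviz DOT. This is not the book's code, which uses NetworkX and `write_dot`. The function names and the sample/IoC values are illustrative assumptions.

```python
def quote(name):
    """Quote a node name for DOT, escaping embedded double quotes."""
    return '"' + str(name).replace('"', '\\"') + '"'


def bipartite_to_dot(sample_iocs, graph_name="bipartite"):
    """Serialize {sample: set_of_iocs} as an undirected bipartite DOT graph.

    Samples are drawn as boxes and IoCs as ellipses; each edge links a
    sample to one IoC observed in it.
    """
    iocs = sorted({ioc for found in sample_iocs.values() for ioc in found})
    lines = [f"graph {graph_name} {{"]
    lines += [f"  {quote(s)} [shape=box];" for s in sorted(sample_iocs)]
    lines += [f"  {quote(i)} [shape=ellipse];" for i in iocs]
    for sample in sorted(sample_iocs):
        for ioc in sorted(sample_iocs[sample]):
            lines.append(f"  {quote(sample)} -- {quote(ioc)};")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(bipartite_to_dot({"s1": {"evil.example"}, "s2": {"evil.example", "10.0.0.1"}}))
```

Write the result with `Path("apk-network.dot").write_text(...)` and render it with Graphviz, for example `fdp -Tpng apk-network.dot -o apk-network.png`. Samples that share an IoC end up connected through that IoC's node.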
Wish: take the Android samples from https://github.com/sk3ptre/AndroidMalware_2020 and analyze them with MobSF. Then extract each app's resources and cluster the apps to explore the relationships between Android malware families.

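How do we build this architecture?
What we need:
An Android malware dataset: https://github.com/sk3ptre/AndroidMalware_2020
A Python program that automatically submits APKs to the MobSF API, retrieves the static and dynamic analysis results, and saves them to files: https://gist.github.com/ajinabraham/0f5de3b0c7b7d3665e54740b9f536d8
Another Python program that extracts features from the static and dynamic analysis results and clusters the Android malware.
A visualization that reveals the clusters of Android malware.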
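The feature-extraction program can start with the permissions. This is a minimal sketch. It assumes that each MobSF static-analysis report was saved as `<name>.json` in one folder. It also assumes that the report has a top-level `"permissions"` object keyed by permission name. Verify both assumptions against a report from your own MobSF version. The function names are hypothetical.

```python
import json
from pathlib import Path


def permissions_from_report(report):
    """Return the permission names in one MobSF JSON report as a set.

    Assumption: the report has a top-level "permissions" object keyed by
    permission name; adjust the key if your MobSF version differs.
    """
    return set(report.get("permissions") or {})


def load_apk_permissions(report_dir):
    """Map each saved report <name>.json in report_dir to its permission set."""
    apk_permissions = {}
    for path in sorted(Path(report_dir).glob("*.json")):
        with open(path, encoding="utf-8") as fh:
            apk_permissions[path.stem] = permissions_from_report(json.load(fh))
    return apk_permissions
```

The resulting `{apk: {permission, ...}}` dict has the same shape as the `malware_attributes` dict in listing-5-1.py, so it can feed the graph and clustering steps directly.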
Android Malware Clustering?
Related Work
Android Malware Clustering through Malicious Payload Mining, https://arxiv.org/pdf/1707.04795.pdf

Crowdroid: Behavior-Based Malware Detection System for Android, https://dl.acm.org/doi/pdf/10.1145/2046614.2046619
Familial Clustering For Weakly-labeled Android Malware Using Hybrid Representation Learning, https://www.researchgate.net/profile/Yulei_Sui4/publication/336599998_Familial_Clustering_For_Weakly-labeled_Android_Malware_Using_Hybrid_Representation_Learning/links/5f67fab0a6fdcc008631ce68/Familial-Clustering-For-Weakly-labeled-Android-Malware-Using-Hybrid-Representation-Learning.pdf
Android Malware Clustering using Community Detection on Android Packages Similarity Network (2020), (*****) https://arxiv.org/pdf/2005.06075.pdf

EC2: Ensemble Clustering and Classification for Predicting Android Malware Families (*****), https://kclpure.kcl.ac.uk/portal/files/126815599/EC2_Ensemble_Clustering_and_CHACKRABORTY_Acc2Aug2017_GREEN_AAM.pdf
Implementation
Step 3: write another Python program that extracts features from the static and dynamic analysis results and clusters the Android malware.
See Malware Data Science, Ch. 4: Identifying Attack Campaigns Using Malware Networks, https://www.malwaredatascience.com/
Bipartite networks; building networks with NetworkX
Building a shared image relationship network, p. 54
***
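Change to ~/malware_data_science/ch4/code/
Run ./run-listing-4-12.sh. This shell script runs python listing-4-12.py with the folder paths and other arguments preset.

*Key concepts
listing-4-12.py analyzes the network behavior of the APT1 family. Specifically, it works from the malware's IoCs.
Apply the same idea by analogy!
We can treat the results of sending Android malware to MobSF like IoCs: extract features from them and build an analysis graph.
Study listing-4-12.py after class. It will later serve as the basis for the Android malware family analysis.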
APK preprocessing and features extraction
***** Complete Step 1 and Step 2, which run automated static and dynamic analysis of a batch of APKs through MobSF. The input is https://github.com/sk3ptre/AndroidMalware_20 and the output is three DOT files: apk-network.dot, apk.dot, and network.dot.
Recall from earlier how to use Python code to submit an APK to MobSF and generate the analysis results:
MobSF API with Python example
sudo pip install requests_toolbelt
python .... (first confirm the MobSF API key and the source folder of the APKs)
Also remember to start MobSF first:
Installing with Docker
Install Docker first, then run:
docker pull opensecurity/mobile-security-framework-mobsf
docker run -it -p 8000:8000 opensecurity/mobile-security-framework-mobsf:latest
Once it is running, open http://0.0.0.0:8000 in a browser to log in.
"""
MOBSF REST API Python Requests
"""
import json
import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder
//要修改下面三個資訊
SERVER = "http://0.0.0.0:8000"
FILE = '/Users/ching-haomao/Downloads/AndroidMalware_2020-master-2/actionSpy/5f529573d5d4d067700e981f09c48069.apk'
APIKEY = 'dcd7f2740deed93676a6c1973cec14f65f48f52f93381d5f937aa6713f19aec8'
def upload():
"""Upload File"""
print("Uploading file")
multipart_data = MultipartEncoder(fields={'file': (FILE, open(FILE, 'rb'), 'application/octet-stream')})
headers = {'Content-Type': multipart_data.content_type, 'Authorization': APIKEY}
response = requests.post(SERVER + '/api/v1/upload', data=multipart_data, headers=headers)
print(response.text)
return response.text
def scan(data):
"""Scan the file"""
print("Scanning file")
post_dict = json.loads(data)
headers = {'Authorization': APIKEY}
response = requests.post(SERVER + '/api/v1/scan', data=post_dict, headers=headers)
print(response.text)
def pdf(data):
"""Generate PDF Report"""
print("Generate PDF report")
headers = {'Authorization': APIKEY}
data = {"hash": json.loads(data)["hash"]}
response = requests.post(SERVER + '/api/v1/download_pdf', data=data, headers=headers, stream=True)
with open("report.pdf", 'wb') as flip:
for chunk in response.iter_content(chunk_size=1024):
if chunk:
flip.write(chunk)
print("Report saved as report.pdf")
def json_resp(data):
"""Generate JSON Report"""
print("Generate JSON report")
headers = {'Authorization': APIKEY}
data = {"hash": json.loads(data)["hash"]}
response = requests.post(SERVER + '/api/v1/report_json', data=data, headers=headers)
print(response.text)
def delete(data):
"""Delete Scan Result"""
print("Deleting Scan")
headers = {'Authorization': APIKEY}
data = {"hash": json.loads(data)["hash"]}
response = requests.post(SERVER + '/api/v1/delete_scan', data=data, headers=headers)
print(response.text)
//need to modify for muliple APK files from a folder
//malware2020 dataset needs to unzip
RESP = upload()
scan(RESP)
json_resp(RESP)
pdf(RESP)
delete(RESP)
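3. Download https://github.com/sk3ptre/AndroidMalware_2020 and save it to a folder.
Homework:
Automate the processing of a single APK
Automate the processing of a batch of APKs
What can APK clustering do?
It can group Android APKs by functionality:
by the types of permissions they request
by the information in their manifests
What if these Android APKs are all malicious?
Clustering the bad actors shows which small groups of them belong together:
Families?
Preferred libraries, preferred techniques
First, build a relationship:
the mapping between malicious APKs and permissions, as a bipartite graph
U is the set of APKs; V is the set of permissions
Represent the bipartite graph as a matrix, i.e. an adjacency matrix (see data structures). For a bipartite graph, the |U| × |V| biadjacency matrix is enough.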
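A minimal sketch of the biadjacency matrix follows. Rows are APKs (U), columns are permissions (V), and a cell is 1 when the APK requests that permission. The full (|U| + |V|)-square adjacency matrix just places this block and its transpose off the diagonal, so the U × V block carries all the information. The function name is hypothetical. The toy input matches the one in the core-module description below (APK001: p1, p2, p3; APK002: p3, p4).

```python
def biadjacency(apk_perms):
    """Return (apks, permissions, matrix) for an APK-permission bipartite graph.

    matrix[i][j] is 1 when apks[i] requests permissions[j], else 0.
    Both axes are sorted so the matrix layout is reproducible.
    """
    apks = sorted(apk_perms)
    permissions = sorted(set().union(*apk_perms.values()))
    matrix = [[1 if p in apk_perms[a] else 0 for p in permissions] for a in apks]
    return apks, permissions, matrix


if __name__ == "__main__":
    apks, permissions, matrix = biadjacency(
        {"APK001": {"p1", "p2", "p3"}, "APK002": {"p3", "p4"}}
    )
    print(permissions)  # ['p1', 'p2', 'p3', 'p4']
    for apk, row in zip(apks, matrix):
        print(apk, row)  # APK001 [1, 1, 1, 0], then APK002 [0, 0, 1, 1]
```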
Is there an existing example we can refer to?

Core analysis modules (core)
C1: Feature extraction: extracting the features from MobSF output
C2: APK-Permissions graph construction: constructing the bipartite graph for profiling the permission behavior
C3: APK privilege clustering: clustering the permission behavior based on APK-permissions graph
Input: APK001- p1,p2,p3; APK002- p3, p4
Output: APK001-c1, APK002-c1, APK003-c2,....
Good news: Chapter 4 of Malware Data Science has similar code and case studies. Look for listing-4-12.py, which is the core engine of this semester's course. We will adapt it to the needs of Android malware permission behavior. It only covers C1 and C2; C3 does the clustering.
Hard vs soft clustering (k-means vs fuzzy c-means): https://medium.com/fintechexplained/machine-learning-hard-vs-soft-clustering-dc92710936af
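The difference shows up in the assignment step. This sketch fixes the centroids and compares the two assignments. Hard (k-means) assignment gives a point only its nearest centroid. Soft (fuzzy c-means) assignment gives every cluster a membership weight from the standard formula u_j = 1 / Σ_k (d_j / d_k)^(2/(m-1)), with fuzzifier m > 1. Both full algorithms also iterate centroid updates, which this sketch omits. The function names are hypothetical.

```python
import math


def hard_assign(point, centroids):
    """k-means (hard): index of the single nearest centroid."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def soft_assign(point, centroids, m=2.0):
    """Fuzzy c-means (soft): membership in every cluster, summing to 1.

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), with fuzzifier m > 1.
    """
    distances = [math.dist(point, c) for c in centroids]
    zeros = distances.count(0.0)
    if zeros:  # the point coincides with centroid(s): split membership there
        return [1.0 / zeros if d == 0.0 else 0.0 for d in distances]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exponent for dk in distances) for dj in distances]


if __name__ == "__main__":
    centroids = [(0.0, 0.0), (4.0, 0.0)]
    print(hard_assign((1.0, 0.0), centroids))  # 0
    print(soft_assign((1.0, 0.0), centroids))  # approximately [0.9, 0.1]
```

For APKs, soft membership can express a sample that mixes the permission profiles of two families.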

What about C3? (Hint: study sklearn's k-means and related clustering examples.) The C2 APK-permissions graph is the input, and the output is the set of clusters.

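Research modules
Self-study after class: using listing-4-12.py as a reference, build the APK-permissions graph.
How do you extract the APK and permission information from MobSF?
How do you build the APK-permission relationships with the NetworkX package?
How do you present the graph (as text, as a drawing)?
Note: extract the permissions from MobSF's output files. Use NetworkX's networkx.Graph() to create the APK nodes, the permission nodes, and the edges between them. Then project the graph with bipartite.projected_graph(network, apk_nodes) or bipartite.projected_graph(network, permission_nodes).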
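The note above can be sketched with NetworkX as follows. This is a minimal version, not listing-4-12.py itself. The `kind` node attribute and the function names are illustrative assumptions. `bipartite=0/1` is the attribute convention the NetworkX bipartite module uses.

```python
import networkx as nx
from networkx.algorithms import bipartite


def build_apk_permission_graph(apk_perms):
    """Bipartite graph: APK nodes (bipartite=0) linked to permission nodes (bipartite=1)."""
    graph = nx.Graph()
    for apk, perms in apk_perms.items():
        graph.add_node(apk, bipartite=0, kind="apk")
        for perm in perms:
            graph.add_node(perm, bipartite=1, kind="permission")
            graph.add_edge(apk, perm)
    return graph


def project(graph, kind):
    """Project onto one node set.

    'apk' links APKs that share a permission; 'permission' links permissions
    requested by the same APK.
    """
    nodes = {n for n, k in graph.nodes(data="kind") if k == kind}
    return bipartite.projected_graph(graph, nodes)


if __name__ == "__main__":
    g = build_apk_permission_graph(
        {"APK001": {"p1", "p2", "p3"}, "APK002": {"p3", "p4"}, "APK003": {"p5"}}
    )
    print(sorted(tuple(sorted(e)) for e in project(g, "apk").edges()))
```

`bipartite.weighted_projected_graph` is a drop-in alternative that stores the number of shared permissions as the edge weight. To get DOT files such as apk-network.dot, apk.dot and network.dot, call `networkx.drawing.nx_pydot.write_dot(graph, path)` on the bipartite graph and on each projection, as listing-5-1.py does. This requires the pydot package.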
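Goal: understand how to use k-means through sklearn
the input data format and the output data format
how the model learns (parameter settings, training and testing practices)
how to present the results (text, plots)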
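A minimal scikit-learn sketch covers the questions above. The input is an (n_APKs × n_permissions) 0/1 array, one row per APK. The output is one integer cluster label per row. The main parameters are `n_clusters`, `n_init` and `random_state`. Euclidean k-means on binary permission vectors is a simple baseline here, not necessarily the best method for set-valued features. The function name and toy data are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_apks(apks, matrix, n_clusters, seed=0):
    """Cluster APKs with k-means on their 0/1 permission vectors.

    Input: APK names plus an APK x permission 0/1 matrix (one row per APK).
    Output: {apk: cluster_id}. Cluster ids are arbitrary integers.
    """
    X = np.asarray(matrix, dtype=float)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(X)
    return {apk: int(label) for apk, label in zip(apks, labels)}


if __name__ == "__main__":
    apks = ["a", "b", "c", "d"]
    matrix = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
    print(cluster_apks(apks, matrix, n_clusters=2))
```

k-means is unsupervised, so there is no train/test split in the classifier sense. You fit on the whole matrix and can call `model.predict` for newly analyzed APKs. To choose `n_clusters`, you can compare `sklearn.metrics.silhouette_score` across several values of k.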
Core: APK clustering
Overwhelmed?! Need a lifeline? We need a clustering example, ideally one related to malware => Malware Data Science, Ch. 5
(1) Clustering needs a similarity measure. To cluster APKs we must compute Sim(APK1, APK2). How do we calculate the similarity between APKs?
(2) To calculate the similarity between APKs, extract feature vectors and choose a similarity function (or design one for your purpose).
Similarity options (https://medium.com/qiubingcheng/%E6%AD%90%E6%B0%8F%E8%B7%9D%E9%9B%A2%E8%88%87%E9%A4%98%E5%BC%A6%E7%9B%B8%E4%BC%BC%E5%BA%A6%E7%9A%84%E6%AF%94%E8%BC%83-c78163ad51b)
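Euclidean similarity (compares feature vectors by their distance in absolute space)
Cosine similarity (compares feature vectors by their direction in vector space)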
Edit similarity (https://codertw.com/%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80/431711/ )
Jaccard similarity
S1= "ABCD", S2= "CDE" Jaccard_Sim(S1, S2)= (CD)/ABCDE=2/5=0.4
...designed by yourself
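The four options above can be sketched in plain Python. Each function is a standard definition. The choice of 1 / (1 + distance) for turning Euclidean distance into a similarity is one common convention among several. The edit similarity normalizes the Levenshtein distance by the longer string's length. The function names are hypothetical.

```python
import math


def euclidean_similarity(u, v):
    """Map Euclidean distance into (0, 1]: identical vectors score 1."""
    return 1.0 / (1.0 + math.dist(u, v))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return dot / norm if norm else 0.0


def edit_similarity(s, t):
    """1 - Levenshtein distance / longer length, via the classic DP table."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            # deletion, insertion, substitution (cost 0 when characters match)
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    longest = max(len(s), len(t))
    return 1.0 - prev[-1] / longest if longest else 1.0


def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B|, defined as 1 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    print(jaccard_similarity("ABCD", "CDE"))  # 0.4
```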
Malware Data Science: Attack Detection and Attribution
APK privilege pattern analysis, also called similarity analysis of APK permission usage, compares two APK samples by estimating the percentage of permissions they share. It differs from shared attribute analysis, which links APK samples through individual permissions they have in common (for example, ACCESS_FINE_LOCATION, ACCESS_COARSE_LOCATION, or BLUETOOTH_ADMIN).
Applied to MobSF static analysis results, APK privilege pattern analysis helps identify APK samples that can be analyzed together. Such samples may come from the same Android malware toolkit or be different versions of the same Android malware family. This can help determine whether the same developers could have been responsible for a group of Android malware samples.
With the permissions ordered (P1, P2, P3, P4):
APK1: P1, P2, P3 -> (1, 1, 1, 0)
APK2: P2, P3, P4 -> (0, 1, 1, 1)
Sim(APK1, APK2) => Jaccard similarity = |{P2, P3}| / |{P1, P2, P3, P4}| = 2/4 = 0.5

APK-permission matrix -> similarity calculation -> APK-APK similarity matrix -> choose a threshold λ (0.8?) for constructing the similarity graph -> similarity graph
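A standard-library sketch of this pipeline follows. It starts from the 0/1 APK-permission matrix, computes the symmetric APK-APK Jaccard matrix, and keeps an edge only when the similarity exceeds λ. The strict `>` mirrors `jaccard_index > args.threshold` in listing-5-1.py. The function names and the toy matrix are illustrative.

```python
from itertools import combinations


def jaccard_rows(row1, row2):
    """Jaccard similarity of two 0/1 rows: shared 1s / positions where either is 1."""
    both = sum(1 for a, b in zip(row1, row2) if a and b)
    either = sum(1 for a, b in zip(row1, row2) if a or b)
    return both / either if either else 1.0


def similarity_matrix(matrix):
    """APK-permission matrix -> symmetric APK-APK Jaccard similarity matrix."""
    n = len(matrix)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = jaccard_rows(matrix[i], matrix[j])
    return sim


def similarity_edges(apks, sim, threshold=0.8):
    """Edges of the similarity graph: APK pairs whose similarity exceeds threshold."""
    return [
        (apks[i], apks[j], sim[i][j])
        for i, j in combinations(range(len(apks)), 2)
        if sim[i][j] > threshold
    ]


if __name__ == "__main__":
    apks = ["APK1", "APK2", "APK3"]
    matrix = [[1, 1, 1, 0], [0, 1, 1, 1], [1, 1, 1, 0]]  # columns: P1..P4
    sim = similarity_matrix(matrix)
    print(similarity_edges(apks, sim, threshold=0.8))  # [('APK1', 'APK3', 1.0)]
```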

[Figure: an example of the kind of visualization created in this chapter, showing shared-permission relationships among some of the latest (2020) Android malware samples]
First, use listing-5-1.py to build a similarity graph of the APT1 binaries in ../data and write it to similarity_graph.dot:

#!/usr/bin/python
import argparse
import os
import networkx
from networkx.drawing.nx_pydot import write_dot
import itertools
import pprint
"""
Copyright (c) 2015, Joshua Saxe
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name 'Joshua Saxe' nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL JOSHUA SAXE BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""
def jaccard(set1, set2):
    """
    Compute the Jaccard index (similarity) between two sets by taking
    their intersection, union and then dividing the number
    of elements in the intersection by the number of elements
    in their union.
    """
    intersection = set1.intersection(set2)
    intersection_length = float(len(intersection))
    union = set1.union(set2)
    union_length = float(len(union))
    return intersection_length / union_length

def getstrings(fullpath):
    """
    Extract strings from the binary indicated by the 'fullpath'
    parameter, and then return the set of unique strings in
    the binary.
    """
    strings = os.popen("strings '{0}'".format(fullpath)).read()
    strings = set(strings.split("\n"))
    return strings

def pecheck(fullpath):
    """
    Do a cursory sanity check to make sure 'fullpath' is
    a Windows PE executable (PE executables start with the
    two bytes 'MZ')
    """
    # read in binary mode and compare bytes; close the file afterwards
    with open(fullpath, 'rb') as f:
        return f.read(2) == b"MZ"
if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description="Identify similarities between malware samples and build similarity graph"
    )
    parser.add_argument(
        "target_directory",
        help="Directory containing malware"
    )
    parser.add_argument(
        "output_dot_file",
        help="Where to save the output graph DOT file"
    )
    parser.add_argument(
        "--jaccard_index_threshold", "-j", dest="threshold", type=float,
        default=0.8, help="Threshold above which to create an 'edge' between samples"
    )
    args = parser.parse_args()
    malware_paths = []  # where we'll store the malware file paths
    malware_attributes = dict()  # where we'll store the malware strings
    graph = networkx.Graph()  # the similarity graph
    for root, dirs, paths in os.walk(args.target_directory):
        # walk the target directory tree and store all of the file paths
        for path in paths:
            full_path = os.path.join(root, path)
            malware_paths.append(full_path)
    # filter out any paths that aren't PE files; list() is needed in Python 3
    # because filter() is one-shot and malware_paths is iterated twice below
    malware_paths = list(filter(pecheck, malware_paths))
    # get and store the strings for all of the malware PE files
    for path in malware_paths:
        attributes = getstrings(path)
        print("Extracted {0} attributes from {1} ...".format(len(attributes), path))
        malware_attributes[path] = attributes
        # add each malware file to the graph
        graph.add_node(path, label=os.path.split(path)[-1][:10])
    # iterate through all pairs of malware
    for malware1, malware2 in itertools.combinations(malware_paths, 2):
        # compute the Jaccard index for the current pair
        jaccard_index = jaccard(malware_attributes[malware1], malware_attributes[malware2])
        # if the Jaccard index is above the threshold add an edge
        if jaccard_index > args.threshold:
            print(malware1, malware2, jaccard_index)
            graph.add_edge(malware1, malware2, penwidth=1 + (jaccard_index - args.threshold) * 10)
    # write the graph to disk so we can visualize it
    write_dot(graph, args.output_dot_file)
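Functions (def)
jaccard: computes the Jaccard similarity (index) of two attribute sets
getstrings: takes a file path (for us, an APK path) and returns the set of unique strings in that file, which serve as the sample's attributes
pecheck: checks whether a file is a binary to analyze, filtering out IoC, YARA, and .txt files (for us: is the file an APK or not?)
main
First, parse the command-line arguments with argparse.
line 91: malware_paths (for us, the APK paths)
line 92: malware_attributes (for us, each APK's permissions)
line 93: pecheck is used as a callback function
line 120: args.threshold (see line 87; default=0.8)
malware_paths = filter(pecheck, malware_paths) (in Python 3, wrap this in list() because the result is iterated twice)
malware_attributes: {m1 (path): {p1, p2, p3}, m2: {p2, p3, p4}}
Analyze: what difference does threshold = 0.8 vs 0.3 make? (Try 0.3, 0.4, ..., 0.9, 0.99.)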
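Here is a sketch of how listing-5-1.py's pieces might carry over to APKs. It assumes each APK's permissions are already in a dict shaped like `malware_attributes`. `is_apk` is a hypothetical replacement for `pecheck`. APKs are ZIP archives, so it checks the ZIP local-file-header signature (the bytes `PK\x03\x04`). That proves only that the file is a ZIP; checking for AndroidManifest.xml inside would be stricter. `threshold_sweep` is a hypothetical helper that counts how many APK pairs would get an edge at each threshold. This is one quick way to see the 0.8 vs 0.3 difference.

```python
from itertools import combinations
from pathlib import Path


def is_apk(path):
    """APK counterpart of pecheck: APKs are ZIP files starting with PK\\x03\\x04."""
    with open(path, "rb") as fh:
        return fh.read(4) == b"PK\x03\x04"


def jaccard(set1, set2):
    """Jaccard index of two permission sets (1.0 for two empty sets)."""
    union = set1 | set2
    return len(set1 & set2) / len(union) if union else 1.0


def threshold_sweep(apk_perms, thresholds):
    """For each threshold, count the APK pairs whose Jaccard index exceeds it."""
    scores = [
        jaccard(apk_perms[a], apk_perms[b])
        for a, b in combinations(sorted(apk_perms), 2)
    ]
    return {t: sum(s > t for s in scores) for t in thresholds}


if __name__ == "__main__":
    perms = {"m1": {"p1", "p2", "p3"}, "m2": {"p2", "p3", "p4"}, "m3": {"p1", "p2", "p3"}}
    print(threshold_sweep(perms, [0.3, 0.8]))  # {0.3: 3, 0.8: 1}
```

Lower thresholds add edges and tend to merge groups into larger clusters. Higher thresholds keep only near-identical permission profiles. Use `list(filter(is_apk, paths))` where listing-5-1.py filters with `pecheck`.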


Final project collaboration workflow
Everyone should sign up for a GitHub account.
Install VS Code: https://code.visualstudio.com/download
Try editing readme.md
Commit to the local repository, then push from the local repository to GitHub.