威胁情报质量测试——针对TI Feeds的数据可视化和分析

2017年03月25日

1. 研究情况简介

威胁情报质量测试,全称“Threat Intelligence Quotient Test”,简称“tiq-test”,是对威胁情报指标feeds的数据可视化和统计分析(Dataviz and Statistical Analysis of Threat Intelligence Indicator feeds)。

该研究在BSides LV 2014、DEF CON 22、OpenDNS S4 IRespond和HushCon 2014上以“测量你的威胁情报feeds的智商(Measuring the IQ of your threat intelligence feeds)”,以及在nbtcon 2014和SANS CTI Summit 2015上以“从威胁情报到智慧防御:一种数据科学方法(From Threat Intelligence to Defense Cleverness: A Data Science Approach)”两个议题进行公开介绍和分享。

本文也围绕并总结这两个议题,对威胁情报质量测试的具体方法和内容进行介绍和分析。

2.两个议题内容归纳

议题:测量你的威胁威胁情报feeds的智商,对该部分的解读,将结合参考【3】的网页文章以及参考【4】的PPT进行介绍。

议题:从威胁情报到智慧防御:一种数据科学方法,对该部分的解读,将结合参考【6】的网页文章以及参考【7】的PPT进行介绍。

对两个议题进行归纳总结,主要包括对TIQ-Test的介绍及相关测试步骤,如下。

2.1 准备测试步骤

该部分内容主要参考【3】和【4】。

  • 1.从indicator feeds中提取原始数据信息

主要提取IP地址和主机名等信息。

代码:

outbound.ti = tiq.data.loadTI("raw", "public_outbound", "20140701")
outbound.ti[, list(entity, type, direction, source, date)]

提取结果:

##                      entity type direction     source       date
##     1:         1.224.163.26 IPv4  outbound alienvault 2014-07-01
##     2:         1.242.99.155 IPv4  outbound alienvault 2014-07-01
##     3:           1.85.2.118 IPv4  outbound alienvault 2014-07-01
##     4:           1.93.1.162 IPv4  outbound alienvault 2014-07-01
##     5:         1.93.161.204 IPv4  outbound alienvault 2014-07-01
##    ---                                                          
## 16298:         winscoft.com FQDN  outbound       zeus 2014-07-01
## 16299:           wmzbase.ru FQDN  outbound       zeus 2014-07-01
## 16300:       zhabademon.net FQDN  outbound       zeus 2014-07-01
## 16301: zhangleetranding.com FQDN  outbound       zeus 2014-07-01
## 16302:         znatnydom.by FQDN  outbound       zeus 2014-07-01
  • 2.选择Feeds

数据根据网络流方向,分为“inbound”和“outbound”,“ inbound”可以看作网络流的来源,“outbound”可以看作网络流的目标。

outbound代码

outbound.ti = tiq.data.loadTI("raw", "public_outbound", "20140701")
unique(outbound.ti$source)

outbound结果

##  [1] "alienvault"        "botscout"          "malcode"          
##  [4] "malcode_zones"     "malwaredomainlist" "malwaredomains"   
##  [7] "malwaregroup"      "palevotracker"     "spyeye"           
## [10] "zeus"

inbound代码

inbound.ti = tiq.data.loadTI("raw", "public_inbound", "20140701")
unique(inbound.ti$source)

inbound结果

##  [1] "alienvault"        "autoshun"          "blocklistde"      
##  [4] "bruteforceblocker" "charleshaley"      "ciarmy"           
##  [7] "dragonresearch"    "dshield"           "honeypot"         
## [10] "openbl"            "packetmail"        "virbl"
  • 3.数据预处理和扩充

移除IP中的局域网的IP地址。

对于每个IP记录,添加asnumber和asname(使用Maxmind ASN DB)、添加国家信息(使用Max Mind GeoLite DB)、添加rhost信息(使用DNSDB)

数据预处理和扩充代码:

enrich.ti = tiq.data.loadTI("enriched", "public_outbound", "20140710")
enrich.ti = enrich.ti[, notes := NULL]
enrich.ti[c(2,22264, 22266)]

填充后的部分数据样式:

##            entity type direction     source       date asnumber
## 1:   1.224.163.26 IPv4  outbound alienvault 2014-07-10     9318
## 2: 95.181.178.177 IPv4  outbound       zeus 2014-07-10    57311
## 3: 98.131.185.136 IPv4  outbound       zeus 2014-07-10    32392
##                                    asname country
## 1:                    Hanaro Telecom Inc.      KR
## 2: FOP ILIUSHENKO VOLODYMYR OLEXANDROVUCH      GB
## 3:                  Ecommerce Corporation      US
##                          host                   rhost
## 1:                         NA                      NA
## 2:           newdomaininfo.ru host178-177.neohost.net
## 3: projects.globaltronics.net                      NA
2.2测试数据

结合两个议题的内容,主要的测试内容有新鲜度测试(Novelty Test)、覆盖测试(Overlap Test)、地域测试(Population Test)、过期测试(Aging Test)、非凡测试(Uniqueness Test)。测试的内容主要参考【6】【7】。

  • 1.新鲜度测试

新鲜度测试是用来测量和比较添加或删除指标的情况(Measuring added and droppend indicators)。

Inbound测试代码如下:

inbound.novelty = tiq.test.noveltyTest("public_inbound", "20141001", "20141130", 
                                             select.sources=c("alienvault", "blocklistde", 
                                                                "dshield", "charleshaley"),
                                                                             .progress=FALSE)
tiq.test.plotNoveltyTest(inbound.novelty, title="Novelty Test - Inbound Indicators")

Inbound测试结果如下:

TIQ-Test 01

Outbound测试代码如下:

outbound.novelty = tiq.test.noveltyTest("public_outbound", "20141001", "20141130", 
                                        select.sources=c("alienvault", "malwaregroup", 
                                                         "malcode", "zeus"),
                                                                             .progress=FALSE)
tiq.test.plotNoveltyTest(outbound.novelty, title="Novelty Test - Outbound Indicators")

Outbound测试结果如下:

TIQ-Test 02

  • 2.覆盖测试

覆盖测试用来测量和比较数据的重复情况。

覆盖度测试代码如下:

overlap = tiq.test.overlapTest("public_inbound", "20141101", "enriched", 
                               select.sources=NULL)
tiq.test.plotOverlapTest(overlap, title="Overlap Test - Inbound Data - 20141101")

inbound数据覆盖度测试结果如下:

TIQ-Test 03

  • 3.地域测试

地域测试是用来测量和比较流入、流出数据的地域分布情况。

地域测试代码如下:

outbound.pop = tiq.test.extractPopulationFromTI("public_outbound", "country", 
                                                date = "20141111",
                                                select.sources=NULL, split.ti=F)
inbound.pop = tiq.test.extractPopulationFromTI("public_inbound", "country", 
                                               date = "20141111",
                                               select.sources=NULL, split.ti=F)

complete.pop = tiq.data.loadPopulation("mmgeo", "country")
tiq.test.plotPopulationBars(c(inbound.pop, outbound.pop, complete.pop), "country")

地域测试结果:

TIQ-Test 04

地域测试参考代码:

outbound.pop = tiq.test.extractPopulationFromTI("public_outbound", "country", 
                                                date = "20141111",
                                                select.sources=NULL,
                                                split.ti=FALSE)
complete.pop = tiq.data.loadPopulation("mmgeo", "country")
tests = tiq.test.populationInference(complete.pop$mmgeo, 
                                     outbound.pop$public_outbound, "country",
                                     exact = TRUE, top=10)

# Whose proportion is bigger than it should be?
tests[p.value < 0.05/10 & conf.int.end > 0][order(conf.int.end, decreasing=T)]

地域测试参考结果:

##    country conf.int.start conf.int.end       p.value
## 1:      CN    0.065323470   0.08150539  4.762343e-97
## 2:      RU    0.037520304   0.04752410 3.878014e-141
## 3:      HK    0.036605616   0.04563014 2.614466e-265
## 4:      UA    0.025893666   0.03373351 1.034106e-168
## 5:      NL    0.013268828   0.02084241  7.764204e-30
## 6:      DE    0.009467889   0.01878853  3.370376e-11
  • 4.过期测试

过期测试是用来观察指标信息最终是否被清除的情况。

过期测试代码:

outbound.aging = tiq.test.agingTest("public_outbound", "20141001", "20141130")
tiq.test.plotAgingTest(outbound.aging, title="Aging Test - Outbound Data")

过期测试结果:

TIQ-Test 05

  • 5.非凡测试

非凡测试代码:

uniqueTest = rbind(
    tiq.test.uniquenessTest("public_inbound", "20141001","20141001", "raw", split.tii = T),
    tiq.test.uniquenessTest("public_inbound", "20141001","20141015", "raw", split.tii = T),
    tiq.test.uniquenessTest("public_inbound", "20141001","20141030", "raw", split.tii = T),
    tiq.test.uniquenessTest("public_inbound", "20141001","20141129", "raw", split.tii = T)
)

uniqueTest[count == 1]

非凡测试结果:

##    count     ratio days
## 1:     1 0.9759170    1
## 2:     1 0.9737684   15
## 3:     1 0.9727630   30
## 4:     1 0.9710713   60

测试结果图示:

TIQ-Test 06

2.个人总结及点评

本方案主要基于威胁情报信息的统计特征进行分析,通过对统计特征的时间属性、地理位置属性等进行统计分析判断,并根据统计分析结果判断不同威胁情报feeds源的质量。

本情报测试的测试结果,仅仅针对威胁情报源的feeds的特征,而没有考虑情报数据之间的关联属性。而根据实际基于威胁情报分析的场景和用例,对威胁情报进行关联分析判断的结果,对于判断威胁情报本身的质量,是起着重要的作用的。

参考

【1】tiq-test - Threat Intelligence Quotient Test,https://github.com/mlsecproject/tiq-test

【2】tiq-test-Summer2014,https://github.com/mlsecproject/tiq-test-Summer2014

【3】Measuring the IQ of your Threat Intelligence - Summer 2014,http://rpubs.com/alexcpsec/tiq-test-Summer2014-2

【4】Measuring the IQ of your Threat Intelligence Feeds (#tiqtest),https://www.slideshare.net/AlexandrePinto10/defcon-22-measuring-the

【5】tiq-test-Winter2015,https://github.com/mlsecproject/tiq-test-Winter2015

【6】From Threat Intelligence to Defense Cleverness: A Data Science Approach,http://rpubs.com/alexcpsec/tiq-test-Winter2015

【7】From Threat Intelligence to Defense Cleverness: A Data Science Approach ,https://digital-forensics.sans.org/summit-archives/cti2015/From-Threat-Intelligence-to-Defense-Cleverness–A-Data-Science-Approach–Alex-Pinto-Niddel.pdf


版权声明:本文为博主原创文章,转载请注明出处 本文总阅读量