Scrapy

Solving Problems with Scrapy (2)

Learning Objectives

Use the Scrapy framework in Python to implement a multi-level crawler that solves the problem of collecting specific content.

Key Concepts

Most websites that provide a service offer a directory or search feature for locating information. If all we know is a keyword, how do we first search for that keyword and then collect the specific content behind each result? That is the problem this tutorial solves: we implement a multi-level crawler with the Scrapy framework in Python, using YouTube videos as the example.

Introduction to the Scrapy Architecture

We will implement the multi-level crawler with a Scrapy program that works in three stages:

  1. Preparation stage: open the keyword-search page via its URL.
  2. Search stage: from the search-results page, obtain the URLs of the content matching the keyword.
  3. Collection stage: from each matching URL, collect the specific content.

The data flow in the Scrapy architecture involves five main components:

  1. Scrapy Engine: coordinates the data flow between all other components.
  2. Scheduler: queues requests received from the Engine and feeds them back on demand.
  3. Downloader: fetches web pages and hands the responses back to the Engine.
  4. Spiders: user-written classes that parse responses and yield items or follow-up requests.
  5. Item Pipeline: processes the items extracted by the Spiders (cleaning, validation, storage).

Scrapy Program Example

The example below uses mainly three of these components: the Scrapy Engine, Spiders, and Downloader. Please refer to the following Scrapy program example.

import scrapy

class example(scrapy.spiders.Spider):
    name = 'example'
    # Preparation stage: open the keyword-search page via its URL
    keyword = 'keyword'
    start_urls = ['http://www.example.com?q=' + keyword]

    # Search stage: from the search-results page, obtain the URL of the matching content
    def parse(self, response):
        url = response.xpath('…').extract_first()
        keyword = response.xpath('…').extract_first()
        yield scrapy.http.Request(url, callback=self.parse2, meta={'keyword': keyword})

    # Collection stage: from the matching URL, collect the specific content
    def parse2(self, response):
        content = response.xpath('…').extract()
        yield {'keyword': response.meta['keyword']}
        yield {'url': response.url}
        yield {'content': content}

The scrapy.http.Request object represents an HTTP request. Its main parameters are url, callback, and meta: url is the address to request, callback is the function that will handle the response, and meta carries values forward to that callback. Requests are generated in the Spider and executed by the Downloader, which produces a scrapy.http.Response object. The Response object represents the HTTP response and is passed back to the Spider for processing.
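
To make the hand-off concrete, here is a minimal sketch of a callback reading standard Response attributes and the value carried in meta (the method and field names are illustrative, not part of the original spiders):

def parse_detail(self, response):
    # Standard Response attributes populated by the Downloader
    page_url = response.url    # the URL that was fetched
    status = response.status   # HTTP status code, e.g. 200
    # Values attached to the Request travel to its callback via meta
    keyword = response.meta['keyword']
    yield {'keyword': keyword, 'url': page_url, 'status': status}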

Applying the Scrapy Program

Finally, we apply the example to the YouTube site to collect the view, like, and dislike counts of videos matching specific keywords, again in three stages:

  1. Preparation stage: open the YouTube keyword-search page via its URL.
  2. Search stage: from the YouTube search-results page, obtain the URLs of the matching YouTube videos.
  3. Collection stage: from each matching video URL, collect the video's view, like, and dislike counts.

Editing the YoutubeDataSpider.py code

import scrapy
import urllib.parse

class YoutubeDataSpider(scrapy.spiders.Spider):
    # Preparation stage: open the YouTube search page for each keyword
    name = 'YoutubeDataSpider'
    keywords = ['SAS Viya', 'SAS 9']
    start_urls = []
    for keyword in keywords:
        start_urls.append('https://www.youtube.com/results?search_query=' + urllib.parse.quote_plus(keyword))

    # Search stage: from the search-results page, obtain the URL of the first matching video
    def parse(self, response):
        keyword = response.xpath('//title/text()')[0].extract()
        url = 'https://www.youtube.com' + response.xpath('//div[@class="yt-lockup-content"]//a//@href')[0].extract()
        yield scrapy.http.Request(url, callback=self.parse2, meta={'keyword': keyword})

    # Collection stage: from the video page, collect the view, like, and dislike counts
    def parse2(self, response):
        views = response.xpath('//div[@class="watch-view-count"]/text()')[0].extract().replace("views", "").replace(",", "").replace(" ", "")
        likes = response.xpath('//button[contains(@aria-label, "like")]//@aria-label')[0].extract().replace("like this video along with", "").replace(" other people", "").replace(" other person", "").replace(",", "").replace(" ", "")
        dislikes = response.xpath('//button[contains(@aria-label, "dislike")]//@aria-label')[0].extract().replace("dislike this video along with", "").replace(" other people", "").replace(" other person", "").replace(",", "").replace(" ", "")
        yield {'keyword': response.meta["keyword"]}
        yield {'url': response.url}
        yield {'view': views}
        yield {'like': likes}
        yield {'dislike': dislikes}
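
The chained replace() calls above strip label text from strings such as "9,860 views". An equivalent and arguably clearer approach (a sketch, not part of the original spider) keeps only the digits with a regular expression:

import re

def extract_count(text):
    # Keep only the digits, e.g. '9,860 views' -> '9860'
    digits = re.sub(r'\D', '', text)
    return digits or '0'

print(extract_count('9,860 views'))  # 9860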

Running the YoutubeDataSpider.py code

> scrapy runspider YoutubeDataSpider.py -o result.json
2017-10-28 20:09:33 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapybot)
2017-10-28 20:09:33 [scrapy.utils.log] INFO: Overridden settings: {'FEED_FORMAT': 'json', 'FEED_URI': 'result.json', 'SPIDER_LOADER_WARN_ONLY': True}
2017-10-28 20:09:33 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2017-10-28 20:09:33 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-10-28 20:09:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-10-28 20:09:33 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-10-28 20:09:33 [scrapy.core.engine] INFO: Spider opened
2017-10-28 20:09:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-10-28 20:09:33 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-10-28 20:09:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.youtube.com/results?search_query=SAS+9> (referer: None)
2017-10-28 20:09:34 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.youtube.com/results?search_query=SAS+Viya> (referer: None)
2017-10-28 20:09:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.youtube.com/watch?v=6cIppmnzL6M> (referer: https://www.youtube.com/results?search_query=SAS+9)
2017-10-28 20:09:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.youtube.com/watch?v=6cIppmnzL6M>
{'keyword': 'SAS 9 - YouTube'}
2017-10-28 20:09:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.youtube.com/watch?v=6cIppmnzL6M>
{'url': 'https://www.youtube.com/watch?v=6cIppmnzL6M'}
2017-10-28 20:09:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.youtube.com/watch?v=6cIppmnzL6M>
{'view': '9860'}
2017-10-28 20:09:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.youtube.com/watch?v=6cIppmnzL6M>
{'like': '25'}
2017-10-28 20:09:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.youtube.com/watch?v=6cIppmnzL6M>
{'dislike': '1'}
2017-10-28 20:09:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.youtube.com/watch?v=2wXAUBLJGuo&list=PLVBcK_IpFVi8gMnQgAwBWrn0yqjyBnCLV> (referer: https://www.youtube.com/results?search_query=SAS+Viya)
2017-10-28 20:09:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.youtube.com/watch?v=2wXAUBLJGuo&list=PLVBcK_IpFVi8gMnQgAwBWrn0yqjyBnCLV>
{'keyword': 'SAS Viya - YouTube'}
2017-10-28 20:09:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.youtube.com/watch?v=2wXAUBLJGuo&list=PLVBcK_IpFVi8gMnQgAwBWrn0yqjyBnCLV>
{'url': 'https://www.youtube.com/watch?v=2wXAUBLJGuo&list=PLVBcK_IpFVi8gMnQgAwBWrn0yqjyBnCLV'}
2017-10-28 20:09:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.youtube.com/watch?v=2wXAUBLJGuo&list=PLVBcK_IpFVi8gMnQgAwBWrn0yqjyBnCLV>
{'view': '3866'}
2017-10-28 20:09:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.youtube.com/watch?v=2wXAUBLJGuo&list=PLVBcK_IpFVi8gMnQgAwBWrn0yqjyBnCLV>
{'like': '22'}
2017-10-28 20:09:35 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.youtube.com/watch?v=2wXAUBLJGuo&list=PLVBcK_IpFVi8gMnQgAwBWrn0yqjyBnCLV>
{'dislike': '4'}
2017-10-28 20:09:35 [scrapy.core.engine] INFO: Closing spider (finished)
2017-10-28 20:09:35 [scrapy.extensions.feedexport] INFO: Stored json feed (10 items) in: result.json
2017-10-28 20:09:35 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1260,
'downloader/request_count': 4,
'downloader/request_method_count/GET': 4,
'downloader/response_bytes': 154656,
'downloader/response_count': 4,
'downloader/response_status_count/200': 4,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2017, 10, 28, 12, 9, 35, 613798),
'item_scraped_count': 10,
'log_count/DEBUG': 15,
'log_count/INFO': 8,
'request_depth_max': 1,
'response_received_count': 4,
'scheduler/dequeued': 4,
'scheduler/dequeued/memory': 4,
'scheduler/enqueued': 4,
'scheduler/enqueued/memory': 4,
'start_time': datetime.datetime(2017, 10, 28, 12, 9, 33, 973120)}
2017-10-28 20:09:35 [scrapy.core.engine] INFO: Spider closed (finished)
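
Note that because parse2 yields each field as a separate item, result.json contains ten single-field objects for the two videos, as the log above shows. If one record per video is preferred, a variant of parse2 could yield a single combined dictionary instead (a sketch using the same variables as above):

yield {
    'keyword': response.meta['keyword'],
    'url': response.url,
    'view': views,
    'like': likes,
    'dislike': dislikes,
}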

In summary, we used the Scrapy framework in Python to implement a multi-level crawler, taking YouTube videos as the example and collecting each video's view, like, and dislike counts.

Related Resources

Solving Problems with Scrapy (1)

Learning Objectives

Solve the problem of installing the Scrapy open-source crawler package on the Windows operating system.

Key Concepts

To install the Scrapy open-source crawler package on Windows, first install Python 3.x.

Then complete the prerequisites by installing three packages:

  1. Install the pywin32 package.
  2. Install the lxml package.
  3. Install the twisted package.

Installing the pywin32 package

pywin32 is a library for accessing the Windows operating system API, which lets us build Win32 applications in Python.

Install the pywin32 package directly with the pip command.

> pip install pypiwin32
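
To confirm that the package can be imported (assuming a standard pywin32 setup), a quick check can be run from the command line:

> python -c "import win32api; print('pywin32 OK')"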

Installing the lxml package

lxml is a high-performance Python XML library built on top of the libxml2 and libxslt libraries, used mainly for tasks such as XML parsing and transformation.

First download the whl file matching your Python version from http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml.

Install the lxml package with the pip command; note that the file name should match the file you actually downloaded.

> pip install lxml-3.8.0-cp36-cp36m-win32.whl
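
As a quick sanity check that lxml parses XML correctly, a minimal sketch:

from lxml import etree

# Parse a small XML document and read a child element's text
root = etree.fromstring('<root><item>hello</item></root>')
print(root.findtext('item'))  # hello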

Installing the twisted package

twisted is an event-driven networking engine framework that supports many common transport and application layer protocols, such as TCP, HTTP, and SSL.

First download the whl file matching your Python version from http://www.lfd.uci.edu/~gohlke/pythonlibs/#twisted.

Install the twisted package with the pip command; note that the file name should match the file you actually downloaded.

> pip install Twisted-17.5.0-cp36-cp36m-win32.whl
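
A minimal sketch of twisted's event-driven model: schedule a delayed call on the reactor, start the event loop, and stop it after one second:

from twisted.internet import reactor

# Schedule reactor.stop to run after 1 second, then start the event loop
reactor.callLater(1, reactor.stop)
reactor.run()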

Installing the Scrapy open-source crawler package

Next, install the Scrapy package itself; this is very simple and requires only the command below.

Install the Scrapy package with the pip command.

> pip install Scrapy

Check the Scrapy package version.

> scrapy -v
Scrapy 1.4.0 - no active project

Testing the Scrapy crawler package

Finally, to test the Scrapy package, we crawl the titles of the latest posts on the SAS blog. First create a myspider.py code file, then run it with Scrapy and write the results to a result.json file.

Contents of the myspider.py code file.

import scrapy

class BlogSpider(scrapy.spiders.Spider):
    name = 'blogspider'
    start_urls = ['http://blogs.sas.com/content/all-posts/']

    def parse(self, response):
        for title in response.css('article.post > div.content > a'):
            yield {'title': title.css('a ::text').extract_first()}
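
As the run log below shows, the extracted titles keep leading newline and tab characters from the page markup. A variant of the yield line that trims them (and guards against a missing match) could read:

yield {'title': (title.css('a ::text').extract_first() or '').strip()}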

Run the myspider.py code file.

> scrapy runspider myspider.py -o result.json
2017-09-13 23:28:52 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapybot)
2017-09-13 23:28:52 [scrapy.utils.log] INFO: Overridden settings: {'FEED_FORMAT': 'json', 'FEED_URI': 'result.json', 'SPIDER_LOADER_WARN_ONLY': True}
2017-09-13 23:28:52 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2017-09-13 23:28:52 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-09-13 23:28:52 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-09-13 23:28:52 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-09-13 23:28:52 [scrapy.core.engine] INFO: Spider opened
2017-09-13 23:28:52 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-09-13 23:28:52 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-09-13 23:28:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://blogs.sas.com/content/all-posts/> (referer: None)
2017-09-13 23:28:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://blogs.sas.com/content/all-posts/>
{'title': '\n\t\t\t\t\t\tBusiness intelligence, business users and agility'}
2017-09-13 23:28:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://blogs.sas.com/content/all-posts/>
{'title': '\n\t\t\t\t\t\tUsing government data for good: Hear leaders share analytics stories'}
2017-09-13 23:28:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://blogs.sas.com/content/all-posts/>
{'title': '\n\t\t\t\t\t\tCould your company survive a fake news attack?'}
2017-09-13 23:28:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://blogs.sas.com/content/all-posts/>
{'title': '\n\t\t\t\t\t\tBaby Led Weaning: What it is and How to do it'}
2017-09-13 23:28:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://blogs.sas.com/content/all-posts/>
{'title': '\n\t\t\t\t\t\tEdge Analytics - So kommt Analytics in den Truck (Teil 3) HEUTE: Konfiguration der Software'}
2017-09-13 23:28:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://blogs.sas.com/content/all-posts/>
{'title': '\n\t\t\t\t\t\tSimulate multivariate clusters in SAS'}
2017-09-13 23:28:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://blogs.sas.com/content/all-posts/>
{'title': '\n\t\t\t\t\t\tInvertir en gestión y análisis de datos será decisivo para las empresas colombianas'}
2017-09-13 23:28:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://blogs.sas.com/content/all-posts/>
{'title': '\n\t\t\t\t\t\tTop machine learning techniques: Add features to training data'}
2017-09-13 23:28:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://blogs.sas.com/content/all-posts/>
{'title': '\n\t\t\t\t\t\tHow did one school system save money, improve local traffic and make students happier? With fewer bus stops and better bus schedules'}
2017-09-13 23:28:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://blogs.sas.com/content/all-posts/>
{'title': '\n\t\t\t\t\t\tWhere do hurricanes strike Florida? (110 years of data)'}
2017-09-13 23:28:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://blogs.sas.com/content/all-posts/>
{'title': '\n\t\t\t\t\t\t3 new outcomes you can expect from self-service data preparation'}
2017-09-13 23:28:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://blogs.sas.com/content/all-posts/>
{'title': '\n\t\t\t\t\t\tSAS Viya: What’s in it for me, the business?'}
2017-09-13 23:28:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://blogs.sas.com/content/all-posts/>
{'title': '\n\t\t\t\t\t\tProblem Relatives: Google Doc Add-on vs. Wordy and Misplaced Clauses'}
2017-09-13 23:28:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://blogs.sas.com/content/all-posts/>
{'title': '\n\t\t\t\t\t\t6 Gründe warum für Reisebloggerin Kerstin Beck das australische Perth das Paradies ist'}
2017-09-13 23:28:58 [scrapy.core.scraper] DEBUG: Scraped from <200 http://blogs.sas.com/content/all-posts/>
{'title': '\n\t\t\t\t\t\tSymbolic derivatives in SAS'}
2017-09-13 23:28:58 [scrapy.core.engine] INFO: Closing spider (finished)
2017-09-13 23:28:58 [scrapy.extensions.feedexport] INFO: Stored json feed (15 items) in: result.json
2017-09-13 23:28:58 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 229,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 54085,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2017, 9, 13, 15, 28, 58, 104852),
'item_scraped_count': 15,
'log_count/DEBUG': 17,
'log_count/INFO': 8,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2017, 9, 13, 15, 28, 52, 792298)}
2017-09-13 23:28:58 [scrapy.core.engine] INFO: Spider closed (finished)

Contents of the result.json crawl output file.

[
{"title": "\n\t\t\t\t\t\tBusiness intelligence, business users and agility"},
{"title": "\n\t\t\t\t\t\tUsing government data for good: Hear leaders share analytics stories"},
{"title": "\n\t\t\t\t\t\tCould your company survive a fake news attack?"},
{"title": "\n\t\t\t\t\t\tBaby Led Weaning: What it is and How to do it"},
{"title": "\n\t\t\t\t\t\tEdge Analytics - So kommt Analytics in den Truck (Teil 3) HEUTE: Konfiguration der Software"},
{"title": "\n\t\t\t\t\t\tSimulate multivariate clusters in SAS"},
{"title": "\n\t\t\t\t\t\tInvertir en gesti\u00f3n y an\u00e1lisis de datos ser\u00e1 decisivo para las empresas colombianas"},
{"title": "\n\t\t\t\t\t\tTop machine learning techniques: Add features to training data"},
{"title": "\n\t\t\t\t\t\tHow did one school system save money, improve local traffic and make students happier? With fewer bus stops and better bus schedules"},
{"title": "\n\t\t\t\t\t\tWhere do hurricanes strike Florida? (110 years of data)"},
{"title": "\n\t\t\t\t\t\t3 new outcomes you can expect from self-service data preparation"},
{"title": "\n\t\t\t\t\t\tSAS Viya: What\u2019s in it for me, the business?"},
{"title": "\n\t\t\t\t\t\tProblem Relatives: Google Doc Add-on vs. Wordy and Misplaced Clauses"},
{"title": "\n\t\t\t\t\t\t6 Gru\u0308nde warum f\u00fcr Reisebloggerin Kerstin Beck das australische Perth das Paradies ist"},
{"title": "\n\t\t\t\t\t\tSymbolic derivatives in SAS"}
]

In summary, once the Scrapy package is installed on the Windows operating system, we can crawl data from the web very efficiently for subsequent data preparation.

Related Resources