This will change your $PATH so its first entry is the virtualenv’s bin/ directory. (You have to use source because it changes your shell environment in-place.)

Now, when you install any module with pip, it belongs to this virtual environment.
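To confirm from inside Python that the activated environment is really the one in use, a minimal sketch like the following can help. The function names are my own, and it assumes a reasonably recent Python 3 (older virtualenv releases are handled through `sys.real_prefix`).

```python
import os
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # venv and recent virtualenv keep the system Python in sys.base_prefix;
    # older virtualenv releases exposed it as sys.real_prefix instead.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


def first_path_entry():
    """Return the first directory on PATH (the virtualenv's bin/ after activation)."""
    return os.environ.get("PATH", "").split(os.pathsep)[0]


print("inside a virtual environment:", in_virtualenv())
print("first PATH entry:", first_path_entry())
```

After `source`-ing the activate script, the first line should report True and the second should point at the environment's bin/ directory.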

Leave the virtual environment ¶

deactivate

This command is used to leave the virtual environment.

Export installed modules ¶

pip freeze > requirements.txt
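To get a feel for what pip freeze writes out, here is a rough sketch that builds similar `name==version` lines from the standard library. It assumes Python 3.8+ (for `importlib.metadata`); `freeze_lines` is a name I made up, and the real pip freeze handles more cases (editable installs, VCS URLs) than this does.

```python
from importlib.metadata import distributions


def freeze_lines():
    """Return 'name==version' strings for every distribution visible to this interpreter."""
    lines = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no name in their metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


for line in freeze_lines():
    print(line)
```

Run inside the activated environment, this lists only the modules that belong to it.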

Remove the virtual environment ¶

Make sure you have left the virtual environment first (run deactivate), then delete its directory:

rm -r temp

For the more advanced usage, go check the documentation!!!

On macOS you can use virtualenvwrapper to manage your virtualenvs. Very handy!!!

20200109-1

Source1: Virtualenv Guide

Source2: Python_Virtualenv 簡單筆記

Removing dependencies ¶

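I wanted to clean up my modules but could not tell which ones were safe to delete, which made it a real hassle.

There used to be a module called pip-autoremove for this, but the author seems to have abandoned it.

So it has problems on Python 3. Yes, I used it, and yes, I ran into them.

Even so, it can still be used to find dependencies, which you then delete by hand one by one (extremely tedious).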

pip-autoremove -l <module>

Then delete the listed modules by hand, one at a time, with pip uninstall.
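Before deleting a module by hand, it helps to know whether anything else still needs it. The sketch below is one way to check using only the standard library (Python 3.8+ assumed). `requirement_name` and `required_by` are hypothetical helper names, and the parsing is deliberately simple: it also counts requirements that only apply to optional extras.

```python
import re
from importlib.metadata import distributions


def requirement_name(requirement):
    """Extract the normalised project name from a string such as 'idna (>=2.5)'."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    if match is None:
        return None
    # Treat '-', '_' and '.' as equivalent and ignore case, as pip does for names.
    return re.sub(r"[-_.]+", "-", match.group(0)).lower()


def required_by(target):
    """List installed distributions that declare `target` as a dependency."""
    wanted = requirement_name(target)
    dependents = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        for requirement in dist.requires or []:
            if name and requirement_name(requirement) == wanted:
                dependents.add(name)
    return sorted(dependents)


print(required_by("urllib3"))
```

An empty list suggests nothing installed declares the module as a dependency, so removing it is less likely to break something.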

20200109-2

Warning ¶

Never use pip-autoremove to actually remove modules; things get extremely complicated.

Source: stackoverflow

Source: pip-autoremove

pipdeptree ¶

It shows your dependencies as a tree.

pip install pipdeptree

For general usage, see the documentation; here I only cover how to export a graph.

There are four commands you can use:

$ pipdeptree --graph-output dot > dependencies.dot
$ pipdeptree --graph-output pdf > dependencies.pdf
$ pipdeptree --graph-output png > dependencies.png
$ pipdeptree --graph-output svg > dependencies.svg

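This requires GraphViz, both the system package and the Python package:

brew install graphviz

pip install --user graphviz

You need to run both!!!

At first I only ran one of them, and it cost me ages QQ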
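For a sense of what ends up in a .dot file, here is a small sketch that turns a dependency mapping into Graphviz DOT text. This is not pipdeptree's actual output format, just an illustration; `to_dot` and the example mapping are my own.

```python
def to_dot(dependencies):
    """Render a {package: [dependency, ...]} mapping as Graphviz DOT text."""
    lines = ["digraph dependencies {"]
    for package in sorted(dependencies):
        lines.append(f'    "{package}";')  # declare the node even if it has no edges
        for dep in sorted(dependencies[package]):
            lines.append(f'    "{package}" -> "{dep}";')  # edge: package needs dep
    lines.append("}")
    return "\n".join(lines)


# Illustrative mapping only; real data would come from the installed packages.
example = {"scrapy": ["twisted", "parsel"], "parsel": ["lxml"]}
print(to_dot(example))
```

GraphViz then takes text like this and lays it out as a pdf, png or svg, which is why it has to be installed.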

Source: pipdeptree

Scrapy ¶

A web crawler, basically.

Install ¶

pip install scrapy

Installing it inside a virtual environment is strongly recommended.

Create Project ¶

scrapy startproject IUBook

Then write your own spider under the spiders folder:

touch IUBook/spiders/book_spider.py

Write the code; adapting the example from the official site is enough.

Set up the cookies and headers first, then send them along with the request:

cookies={'GUEST_PASSWORD': '000000'}
headers={'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11'}

yield scrapy.Request(url=url, headers=headers, cookies=cookies, callback=self.parse)

Start ¶

scrapy crawl book

It is book because that is the name I gave my spider:

name = "book"

Downloading Files ¶

In Scrapy, files are downloaded with FilesPipeline.

Setup ¶

Put this in settings.py:

For image:

ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
IMAGES_STORE = '/path/to/valid/dir'

For files:

ITEM_PIPELINES = {'scrapy.pipelines.files.FilesPipeline': 1} 
FILES_STORE = '/path/to/valid/dir'
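One thing to watch: if both snippets go into settings.py as written, the second ITEM_PIPELINES assignment replaces the first. A sketch of enabling both at once could look like this; the priority numbers and the two sub-directories are my own choices, and the paths are still placeholders.

```python
# settings.py (fragment): enable both media pipelines in a single dict
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,  # needs Pillow installed
    "scrapy.pipelines.files.FilesPipeline": 2,
}

# Placeholder directories; point these at real, writable locations.
IMAGES_STORE = "/path/to/valid/dir/images"
FILES_STORE = "/path/to/valid/dir/files"
```

The numbers only set the order in which the pipelines run; lower values run first.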

From here on I was lost. You need to use an item, which I don't really understand, so everything below is copied from the internet.

Item ¶

class IUBookPic(scrapy.Item):
    title = scrapy.Field()
    file_urls = scrapy.Field()
    files = scrapy.Field()

Parse ¶

def parse(self, response):
    fileurls = response.css('span').xpath('@data-url').getall()
    yield IUBookPic(file_urls=fileurls)

robots.txt ¶

You may run into this error: Forbidden by robots.txt

Make sure the user-agent from earlier is set properly.

Then add this to settings.py:

ROBOTSTXT_OBEY=False
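Since this setting makes Scrapy ignore robots.txt altogether, it can be worth checking first what the site actually disallows. Here is a small sketch with the standard library's `urllib.robotparser`; the robots.txt content and URLs are invented so that it runs without touching the network.

```python
from urllib.robotparser import RobotFileParser

# Invented robots.txt content; a real one lives at https://<site>/robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, user_agent="*"):
    """Return True if robots.txt permits `user_agent` to fetch `url`."""
    return parser.can_fetch(user_agent, url)


print(allowed("https://example.com/books/1"))
print(allowed("https://example.com/private/page"))
```

If the pages you want are not disallowed, the Forbidden error may come from somewhere else (the user-agent, for instance), and ROBOTSTXT_OBEY can stay on.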

Source: Scrapy

Source: Scraping images with Python and Scrapy

Source: getting Forbidden by robots.txt: scrapy

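I feel like there was something else I meant to write here, but I forgot. I'll add it once I remember.

Oh, I remember now: my poster arrived today!!!

I'm so excited, aaaaah!!!

Writing a post is really tiring.

I'll write a few more while I have nothing else to do, haha.

A little reward for those of you who read to the end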