Ipython的高级用法总结

时间:2023-01-30 19:45:39

Ipython的高级用法总结


转自:http://www.111cn.net/phper/python/79550.htmipython 是一个 python 的交互式 shell,比默认的python shell 好用得多,支持变量自动补全,自动缩进,支持 bash shell 命令,内置了许多很有用的功能和函数。
今天我们由浅入深的看看ipython, 本文作为读者的你已经知道ipython并且用了一段时间了.


%run


这是一个magic命令, 能把你的脚本里面的代码运行, 并且把对应的运行结果存入ipython的环境变量中:

$cat t.py
# coding=utf-8
l = range(5)

$ipython
In [1]: %run t.py # `%`可加可不加

In [2]: l # 这个l本来是t.py里面的变量, 这里直接可以使用了
Out[2]: [0, 1, 2, 3, 4]

alias

In [3]: %alias largest ls -1sSh | grep %s
In [4]: largest to
total 42M
 20K tokenize.py
 16K tokenize.pyc
8.0K story.html
4.0K autopep8
4.0K autopep8.bak
4.0K story_layout.html

PS 别名需要存储的, 否则重启ipython就不存在了:

In [5]: %store largest
Alias stored: largest (ls -1sSh | grep %s)

下次进入的时候%store -r

bookmark - 对目录做别名

In [2]: %pwd
Out[2]: u'/home/vagrant'

In [3]: %bookmark dongxi ~/shire/dongxi

In [4]: %cd dongxi
/home/vagrant/shire/dongxi_code

In [5]: %pwd
Out[5]: u'/home/vagrant/shire/dongxi_code'

ipcluster - 并行计算

其实ipython提供的方便的并行计算的功能. 先回答ipython做并行计算的特点:

1.

$wget http://www.gutenberg.org/files/27287/27287-0.txt

第一个版本是直接的, 大家习惯的用法.

In [1]: import re

In [2]: import io

In [3]: non_word = re.compile(r'[Wd]+', re.UNICODE)

In [4]: common_words = {
   ...: 'the','of','and','in','to','a','is','it','that','which','as','on','by',
   ...: 'be','this','with','are','from','will','at','you','not','for','no','have',
   ...: 'i','or','if','his','its','they','but','their','one','all','he','when',
   ...: 'than','so','these','them','may','see','other','was','has','an','there',
   ...: 'more','we','footnote', 'who', 'had', 'been',  'she', 'do', 'what',
   ...: 'her', 'him', 'my', 'me', 'would', 'could', 'said', 'am', 'were', 'very',
   ...: 'your', 'did', 'not',
   ...: }

In [5]: def yield_words(filename):
   ...:     import io
   ...:     with io.open(filename, encoding='latin-1') as f:
   ...:         for line in f:
   ...:             for word in line.split():
   ...:                 word = non_word.sub('', word.lower())
   ...:                 if word and word not in common_words:
   ...:                     yield word
   ...:

In [6]: def word_count(filename):
   ...:     word_iterator = yield_words(filename)
   ...:     counts = {}
   ...:     counts = defaultdict(int)
   ...:     while True:
   ...:         try:
   ...:             word = next(word_iterator)
   ...:         except StopIteration:
   ...:             break
   ...:         else:
   ...:             counts[word] += 1
   ...:     return counts
   ...:

In [6]: from collections import defaultdict # 脑残了 忘记放进去了..
In [7]: %time counts = word_count(filename)
CPU times: user 88.5 ms, sys: 2.48 ms, total: 91 ms
Wall time: 89.3 ms

现在用ipython来跑一下:

ipcluster start -n 2 # 好吧, 我的Mac是双核的

先讲下ipython 并行计算的用法:

In [1]: from IPython.parallel import Client # import之后才能用%px*的magic

In [2]: rc = Client()

In [3]: rc.ids # 因为我启动了2个进程
Out[3]: [0, 1]

In [4]: %autopx # 如果不自动 每句都需要: `%px xxx`
%autopx enabled

In [5]: import os # 这里没autopx的话 需要: `%px import os`

In [6]: print os.getpid() # 2个进程的pid
[stdout:0] 62638
[stdout:1] 62636

In [7]: %pxconfig --targets 1 # 在autopx下 这个magic不可用
[stderr:0] ERROR: Line magic function `%pxconfig` not found.
[stderr:1] ERROR: Line magic function `%pxconfig` not found.

In [8]: %autopx # 再执行一次就会关闭autopx
%autopx disabled

In [10]: %pxconfig --targets 1 # 指定目标对象, 这样下面执行的代码就会只在第2个进程下运行

In [11]: %%px --noblock # 其实就是执行一段非阻塞的代码
   ....: import time
   ....: time.sleep(1)
   ....: os.getpid()
   ....:
Out[11]: <AsyncResult: execute>

In [12]: %pxresult # 看 只返回了第二个进程的pid
Out[1:21]: 62636

In [13]: v = rc[:] # 使用全部的进程, ipython可以细粒度的控制那个engine执行的内容

In [14]: with v.sync_imports(): # 每个进程都导入time模块
   ....:     import time
   ....:
importing time on engine(s)

In [15]: def f(x):
   ....:     time.sleep(1)
   ....:     return x * x
   ....:

In [16]: v.map_sync(f, range(10)) # 同步的执行

Out[16]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [17]: r = v.map(f, range(10)) # 异步的执行

In [18]: r.ready(), r.elapsed # celery的用法
Out[18]: (True, 5.87735)

In [19]: r.get() # 获得执行的结果
Out[19]: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

入正题:

In [20]: def split_text(filename):
....:    text = open(filename).read()
....:    lines = text.splitlines()
....:    nlines = len(lines)
....:    n = 10
....:    block = nlines//n
....:    for i in range(n):
....:        chunk = lines[i*block:(i+1)*(block)]
....:        with open('count_file%i.txt' % i, 'w') as f:
....:            f.write('n'.join(chunk))
....:    cwd = os.path.abspath(os.getcwd())
....:    fnames = [ os.path.join(cwd, 'count_file%i.txt' % i) for i in range(n)] # 不用glob是为了精准
....:    return fnames

In [21]: from IPython import parallel

In [22]: rc = parallel.Client()

In [23]: view = rc.load_balanced_view()

In [24]: v = rc[:]

In [25]: v.push(dict(
   ....:     non_word=non_word,
   ....:     yield_words=yield_words,
   ....:     common_words=common_words
   ....: ))
Out[25]: <AsyncResult: _push>

In [26]: fnames = split_text(filename)

In [27]: def count_parallel():
   .....:     pcounts = view.map(word_count, fnames)
   .....:     counts = defaultdict(int)
   .....:     for pcount in pcounts.get():
   .....:         for k, v in pcount.iteritems():
   .....:             counts[k] += v
   .....:     return counts, pcounts
   .....:

In [28]: %time counts, pcounts = count_parallel() # 这个时间包含了我再聚合的时间
CPU times: user 47.6 ms, sys: 6.67 ms, total: 54.3 ms # 是不是比直接运行少了很多时间?
Wall time: 106 ms # 这个时间是

In [29]: pcounts.elapsed, pcounts.serial_time, pcounts.wall_time
Out[29]: (0.104384, 0.13980499999999998, 0.104384)

更多地关于并行计算请看这里: nbviewer.ipython.org/url/www.astro.washington.edu/users/vanderplas/Astr599/notebooks/21_IPythonParallel.ipynb


学习下写ipython的magic命令. 好, magic是什么? 它是ipython自带的一些扩展命令, 类似%history, %prun, %logstart..

想查看全部的magic可以使用ismagic, 列出可用的全部magics

%lsmagic

magic分为2类:

    line magic: 一些功能命令
    cell magic: 主要是渲染ipython notebook页面效果以及执行某语言的代码

idb - python db.py shell extension

idb是我最近写的一个magic. 主要是给ipython提供db.py的接口,我们直接分析代码(我只截取有代表性的一段):

import os.path
from functools import wraps
from operator import attrgetter
from urlparse import urlparse

from db import DB # db.py提供的接口
from IPython.core.magic import Magics, magics_class, line_magic # 这三个就是我们需要做magic插件的组件


def get_or_none(attr):
    return attr if attr else None


def check_db(func):
    @wraps(func)
    def deco(*args):
        if args[0]._db is None: # 每个magic都需要首页实例化过db,so 直接加装饰器来判断
            print '[ERROR]Please make connection: `con = %db_connect xx` or `%use_credentials xx` first!'  # noqa
            return
        return func(*args)
    return deco


@magics_class  # 每个magic都需要加这个magics_class装饰器
class SQLDB(Magics): # 要继承至Magics
    _db = None # 每次打开ipython都是一次实例化

    @line_magic('db_connect') # 这里用了line_magic 表示它是一个line magic.(其他2种一会再说) magic的名字是db_connect. 注意 函数名不重要
                              # 最后我们用 %db_connect而不是%conn
    def conn(self, parameter_s): # 每个这样的方法都接收一个参数 就是你在ipython里输入的内容
        """Conenct to database in ipython shell.
        Examples::
            %db_connect
            %db_connect postgresql://user:pass@localhost:port/database
        """
        uri = urlparse(parameter_s) # 剩下的都是解析parameter_s的逻辑

        if not uri.scheme:
            params = {
                'dbtype': 'sqlite',
                'filename': os.path.join(os.path.expanduser('~'), 'db.sqlite')
            }
        elif uri.scheme == 'sqlite':
            params = {
                'dbtype': 'sqlite',
                'filename': uri.path
            }
        else:
            params = {
                'username': get_or_none(uri.username),
                'password': get_or_none(uri.password),
                'hostname': get_or_none(uri.hostname),
                'port': get_or_none(uri.port),
                'dbname': get_or_none(uri.path[1:])
            }

        self._db = DB(**params) # 这里给_db赋值

        return self._db # return的结果就会被ipython接收,显示出来

    @line_magic('db') # 一个新的magic 叫做%db -- 谨防取名冲突
    def db(self, parameter_s):
        return self._db

    @line_magic('table')
    @check_db
    def table(self, parameter_s):
        p = parameter_s.split() # 可能传进来的是多个参数,但是对ipython来说,传进来的就是一堆字符串,所以需要按空格分隔下
        l = len(p)
        if l == 1:
            if not p[0]:
                return self._db.tables
            else:
                return attrgetter(p[0])(self._db.tables)
        else:
            data = self._db.tables
            for c in p:
                if c in ['head', 'sample', 'unique', 'count', 'all', 'query']:
                    data = attrgetter(c)(data)()
                else:
                    data = attrgetter(c)(data)
            return data

def load_ipython_extension(ipython): # 注册一下. 假如你直接去ipython里面加 就不需要这个了
    ipython.register_magics(SQLDB)

PS:

    调试中可以使用%reloa_ext idb 的方式重启magic
    %install_ext 之后默认放在你的ipython自定义目录/extensions里. 我这里是~/.ipython/extensions

好了,大家是不是觉得ipython的magic也不是很难嘛

来了解ipython都提供了什么?

    magic装饰器的类型:

    line_magic # 刚才我们见识了, 就是%xx, xx就是magic的名字
    cell_magic # 就是%%xx
    line_cell_magic # 可以是%xx, 也可以是%%xx

先说cell_magic 来个例子,假如我想执行个ruby,本来应该是:

In [1]: !ruby -e 'p "hello"'
"hello"

In [2]: %%ruby # 也可以这样
   ...: p "hello"
      ...:
      "hello"

再说个notebook的:

In [3]: %%javascript
   ...: require.config({
   ...:     paths: {
   ...:         chartjs: '//code.highcharts.com/highcharts'
   ...:     }
   ...: });
   ...:
   <IPython.core.display.Javascript object>
});

然后再说line_cell_magic:

In [4]: %time 2**128
CPU times: user 2 µs, sys: 1 µs, total: 3 µs
Wall time: 5.01 µs
Out[4]: 340282366920938463463374607431768211456L

In [5]: %%time
   ...: 2**128
   ...:
   CPU times: user 4 µs, sys: 0 ns, total: 4 µs
   Wall time: 9.06 µs
   Out[5]: 340282366920938463463374607431768211456L

Ps: line_cell_magic方法的参数是2个:

@line_cell_magic
def xx(self, line='', cell=None):

带参数的magic(我直接拿ipython源码提供的magic来说明):

一共2种风格:

    使用getopt: self.parse_options
    使用argparse: magic_arguments

self.parse_options

@line_cell_magic
def prun(self, parameter_s='', cell=None):
    opts, arg_str = self.parse_options(parameter_s, 'D:l:rs:T:q',
                                       list_all=True, posix=False)
    ...

getopt用法可以看这里 http://pymotw.com/2/getopt/index.html#module-getopt

我简单介绍下’D:l:rs:T:q’就是可以使用 -D, -l, -r, -s, -T, -q这些选项. :号是告诉你是否需要参数,split下就是: D:,l:,r,s:,T:,q 也就是-r和-q不需要参数其他的都是参数 类似 %prun -D
magic_arguments

@magic_arguments.magic_arguments() # 最上面
@magic_arguments.argument('--breakpoint', '-b', metavar='FILE:LINE',
    help="""
    Set break point at LINE in FILE.
    """
) # 这种argument可以有多个
@magic_arguments.argument('statement', nargs='*',
    help="""
    Code to run in debugger.
    You can omit this in cell magic mode.
    """
)
@line_cell_magic
def debug(self, line='', cell=None):
    args = magic_arguments.parse_argstring(self.debug, line) # 要保持第一个参数等于这个方法名字,这里就是self.debug
    ...

还有个magic方法集: 用于并行计算的magics: https://github.com/ipython/ipython/blob/master/IPython/parallel/client/magics.py