Radonで始めるコードリファクタリング　【ランクAよりもPython力をつけろ】編

スマートPythonプログラミング、引き続き読んでます。

sandmark.hateblo.jp

読み進めていると、Radonという循環的複雑度を調査するパッケージが紹介されていたので、今回はそれを用いてsandmark/unmoのdictionary.pyのリファクタリングに挑戦します。

先に言っておきますが、結果は失敗です。今の私の力ではリファクタリングできなかったという事実がわかったことが何よりの成果です。それに伴って課題が見つかったので、個人的には有意義な時間でした。

教科書

スマートPythonプログラミング: Pythonのより良い書き方を学ぶ

作者: もみじあめ
発売日: 2016/03/12
メディア: Kindle版
この商品を含むブログ (1件) を見る

Cyclomatic Complexity - 循環的複雑度

「ソースコードは読みやすく簡潔に保つ」というのは、プログラマが心がけるべき暗黙的なルールです。しかし一言で「読みやすい」と言っても基準が欲しいものです。『循環的複雑度』はその基準のひとつと言えます。

Radonが指標とする複雑度は、制御構造やインデントです。例えば、ブロックを作り出さないリスト内包表記であっても、偶数のみのリストを作る[x for x in range(10) if x%2==0]というコードにはifが含まれています。これは制御構造があると見なされ、循環的複雑度は+1されます。

Radonはメソッドや関数、クラスに対してそれぞれの循環的複雑度を調査してくれます。AからFでランク付けされ、Fに近づくほど循環的複雑度が高い≒見づらいという結果が導かれます。とはいえ、すべてAランクに保ったからといって見やすいコードになるとは限りません。今回はその典型的な失敗例を紹介します。

`dictionary.py`のCC

『循環的複雑度』では漢字ばかりで読みづらいので、'CC'と略すことにいたしましょう。まずはradonをインストールします。

pip install radon

CCを計測するにはradon cc [filename]と呼び出します。今回検証するのは次のファイル。Unmoがとりあえず動作するようになり、「そのうちリファクタリングしようかな」で放置していたときのdictionary.pyです。

sandmark/unmo/dictionary.py at 543f3cdcdee5335d9301f81176711c4c8757e842

では計測してみましょう。デフォルトではランクしか表示されないので、CCがいくつなのかを表示する-sオプションを渡します。

$ radon cc -s dictionary.py

dictionary.py
M 75:4 Dictionary.study_pattern - B (7)
M 89:4 Dictionary.save - B (7)
M 125:4 Dictionary.load_template - B (6)
M 53:4 Dictionary.study_template - A (5)
M 104:4 Dictionary.load_random - A (5)
M 114:4 Dictionary.load_pattern - A (5)
C 7:0 Dictionary - A (3)
M 156:4 Dictionary.make_pattern - A (3)
M 69:4 Dictionary.study_random - A (2)
M 141:4 Dictionary.load_markov - A (2)
M 35:4 Dictionary.__init__ - A (1)
M 42:4 Dictionary.study - A (1)
M 49:4 Dictionary.study_markov - A (1)
M 151:4 Dictionary.pattern_to_line - A (1)
M 164:4 Dictionary.random - A (1)
M 169:4 Dictionary.pattern - A (1)
M 174:4 Dictionary.template - A (1)
M 179:4 Dictionary.markov - A (1)

行頭のMはメソッド、Cはクラスです。他にもFがありますが、dictionary.pyでは関数定義していません。

おおむねAランクになっています。study_pattern, save, load_templateメソッドがBランクで、全体的なCCとしては「まぁ読みやすい」範囲に留まっています。

しかし「とても読みやすい」ではないのが気になる。18個ある項目のうち15個がAランクなら、残りの3個もAランクにしたいと思うのが抗いがたい人情なのです。Bランクのメソッドを見てみましょう。

def study_pattern(self, text, parts):
    """ユーザーの発言textを、形態素partsに基づいてパターン辞書に保存する。"""
    for word, part in parts:
        if morph.is_keyword(part):  # 品詞が名詞であれば学習
            # 単語の重複チェック
            # 同じ単語で登録されていれば、パターンを追加する
            # 無ければ新しいパターンを作成する
            duplicated = next((p for p in self._pattern if p['pattern'] == word), None)
            if duplicated:
                if text not in duplicated['phrases']:
                    duplicated['phrases'].append(text)
            else:
                self._pattern.append({'pattern': word, 'phrases': [text]})

def save(self):
    """メモリ上の辞書をファイルに保存する。"""
    with open(Dictionary.DICT['random'], mode='w', encoding='utf-8') as f:
        f.write('\n'.join(self.random))

    with open(Dictionary.DICT['pattern'], mode='w', encoding='utf-8') as f:
        f.write('\n'.join([Dictionary.pattern_to_line(p) for p in self._pattern]))

    with open(Dictionary.DICT['template'], mode='w', encoding='utf-8') as f:
        for count, templates in self._template.items():
            for template in templates:
                f.write('{}\t{}\n'.format(count, template))

    self._markov.save(Dictionary.DICT['markov'])

def load_template(filename):
    """filenameをテンプレート辞書として読み込み、ハッシュを返す"""
    templates = defaultdict(lambda: [])
    try:
        with open(filename, encoding='utf-8') as f:
            for line in f:
                count, template = line.strip().split('\t')
                if count and template:
                    count = int(count)
                    templates[count].append(template)
        return templates
    except IOError as e:
        print(format_error(e))
        return templates

「言われてみれば確かに」という感じで、読みやすいかと聞かれたら「はい」とは答えられません……。

リファクタリング - save

比較的改善の余地が見つけやすいsaveメソッドからリファクタリングしていきます。

def save(self):
    """メモリ上の辞書をファイルに保存する。"""
    with open(Dictionary.DICT['random'], mode='w', encoding='utf-8') as f:
        f.write('\n'.join(self.random))

    with open(Dictionary.DICT['pattern'], mode='w', encoding='utf-8') as f:
        f.write('\n'.join([Dictionary.pattern_to_line(p) for p in self._pattern]))

    with open(Dictionary.DICT['template'], mode='w', encoding='utf-8') as f:
        for count, templates in self._template.items():
            for template in templates:
                f.write('{}\t{}\n'.format(count, template))

    self._markov.save(Dictionary.DICT['markov'])

要するにwithブロックが多いんですよ。分割しましょう。

    def save(self):
        """メモリ上の辞書をファイルに保存する。"""
        self._save_random()
        self._save_pattern()
        self._save_template()
        self._markov.save(Dictionary.DICT['markov'])

    def _save_template(self, dicfile=None):
        """テンプレート辞書をdicfileに保存する。
        dicfileのデフォルト値はDictionary.DICT['template']"""
        dicfile = dicfile if dicfile is not None else Dictionary.DICT['template']
        with open(Dictionary.DICT['template'], mode='w', encoding='utf-8') as f:
            for count, templates in self._template.items():
                for template in templates:
                    f.write('{}\t{}\n'.format(count, template))

    def _save_pattern(self, dicfile=None):
        """パターン辞書をdicfileに保存する。
        dicfileのデフォルト値はDictionary.DICT['pattern']"""
        dicfile = dicfile if dicfile is not None else Dictionary.DICT['pattern']

        lines = [Dictionary.pattern_to_line(p) for p in self._pattern]
        with open(dicfile, mode='w', encoding='utf-8') as f:
            f.write('\n'.join(lines))

    def _save_random(self, dicfile=None):
        """ランダム辞書をdicfileに保存する。
        dicfileのデフォルト値はDictionary.DICT['random']"""
        dicfile = dicfile if dicfile is not None else Dictionary.DICT['random']
        with open(dicfile, mode='w', encoding='utf-8') as f:
            f.write('\n'.join(self.random))

メソッドを分割しました（直球）。radon cc -s dictionary.pyした結果はオールランクAです。しかし「読みやすくなった？」の答えは「ノー」でしょう。我ながら手を加える前のほうがよほど読みやすかったと思います。

結局ツールを使ったところでPythonの基礎が身についていなければ猫に小判なのです。今回のケースでは、

ファイルを開く
引数のデフォルト値のチェックをする
ファイルに書き込む

という3つの処理が各メソッドに分散されただけで、むしろコード量としては増えています。これでは意味がないどころかマイナスなので、リファクタリングするにはまだ他の何かが必要です。

次回はデコレータを使ってみよう

気になりつつも手を出してこなかったのが、Python特有のデコレータです。これまで何も考えずに@staticmethodとか@propertyとか使ってきましたが、あれを使って真の意味でのリファクタリングをしてみます。

すなぶろ

Pythonistaを目指しているつもりが、Hyのおかげで無事脱線。Clojurian目指すことになりました。

Radonで始めるコードリファクタリング　【ランクAよりもPython力をつけろ】編

教科書

Cyclomatic Complexity - 循環的複雑度

`dictionary.py`のCC

リファクタリング - save

次回はデコレータを使ってみよう

教科書

Cyclomatic Complexity - 循環的複雑度

dictionary.pyのCC

リファクタリング - save

次回はデコレータを使ってみよう

`dictionary.py`のCC