Generating a word cloud from an Emacs buffer

I write a lot of content, and sometimes I like to look back and re-read what I wrote in the past. I also like to run analytics on that content to identify its themes quickly. One tool for that is a word cloud: a visualization of the most frequent words in a text, where the size of each word is proportional to the number of times it occurs in the text.
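
To make "most frequent words" concrete, here is a rough sketch of the counting behind a word cloud using standard Unix tools. The `top_words` function and the `notes.org` file name are hypothetical names of mine, and this is not necessarily how any particular word cloud tool counts words internally.

```bash
#!/bin/bash
# top_words FILE MIN_LENGTH [COUNT]
# Print the COUNT most frequent words in FILE (default 10) that are at
# least MIN_LENGTH characters long, one "count word" pair per line.
# Hypothetical helper for illustration only.
top_words() {
    local file=$1 min_length=$2 count=${3:-10}
    # 1. Turn every run of non-letters into a newline: one word per line.
    # 2. Lowercase everything so "Apple" and "apple" count together.
    # 3. Keep only the words that are long enough.
    # 4. Count identical words, then sort by count (descending), then word.
    tr -cs '[:alpha:]' '\n' < "$file" \
        | tr '[:upper:]' '[:lower:]' \
        | awk -v min="$min_length" 'length($0) >= min' \
        | sort \
        | uniq -c \
        | sort -k1,1nr -k2 \
        | head -n "$count"
}

# Example (notes.org is a placeholder file name):
#   top_words notes.org 8 20
```

A word cloud is essentially this table drawn as a picture, with the count mapped to font size.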

I found a word cloud CLI tool and wanted to be able to run it quickly on a note from Emacs. First I built a shell script that takes two arguments: a file and a minimum word length. Before invoking wordcloud_cli, the script strips some org-mode markup and a few words that carry no useful information:

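#!/bin/bash
# Usage: wordcloud FILE MIN_WORD_LENGTH
# Strip org-mode markup and filler words, then render the cloud to /tmp/img.png.
# The first sed turns [[target][description]] links into their description;
# [^]]* (rather than .*) keeps several links on one line from merging.
sed 's|\[\[[^]]*\]\[\([^]]*\)\]\]|\1|g' "$1" \
    | sed 's|properties||gi' \
    | sed 's|title||gi' \
    | sed 's|thing||gi' \
    | sed 's|#+.*||gi' \
    > /tmp/words
wordcloud_cli --text /tmp/words --imagefile /tmp/img.png --width 1280 --height 1280 --min_word_length "$2"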
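
The link-stripping step is the least obvious part of the pipeline, so here it is in isolation as a small sketch. `strip_org_links` is a hypothetical helper name, and the sample links are made up. The pattern uses `[^]]*` instead of `.*`: with a greedy `.*`, two links on the same line are matched as one and everything between them is lost.

```bash
#!/bin/bash
# strip_org_links: read text on stdin and replace every org-mode link of
# the form [[target][description]] with just its description.
# Hypothetical helper for illustration only.
strip_org_links() {
    # [^]]* matches any run of characters that contains no "]", so each
    # link is matched on its own even when a line holds several of them.
    sed 's|\[\[[^]]*\]\[\([^]]*\)\]\]|\1|g'
}

# Two links on one line, both reduced to their descriptions.
printf '%s\n' 'See [[https://example.com][the docs]] and [[file:notes.org][my notes]].' \
    | strip_org_links
# prints: See the docs and my notes.
```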

I wired the script to Emacs using the following Emacs Lisp function:

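(defun wordcloud (arg)
  "Create a word cloud from the current file.
Prefix ARG sets the minimum word length (default 8)."
  (interactive "P")
  ;; `prefix-numeric-value' also copes with a bare C-u, whose raw value is a list.
  (let ((len (if arg (prefix-numeric-value arg) 8))
        (file (shell-quote-argument (buffer-file-name))))
    (message "%s" (shell-command-to-string (format "wordcloud %s %d" file len)))
    (call-process "open" nil 0 nil "/tmp/img.png")))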

It takes a prefix argument that sets the minimum word length to consider, with a default of 8; for instance, C-u 5 M-x wordcloud lowers the minimum to 5. For example, running it on the content of an article about perfectionism that was recently on Hacker News (https://arunkprasad.com/log/unlearning-perfectionism/) rendered:

/assets/perfectionism.png