Generating a word cloud from an Emacs buffer
I write a lot of content, and sometimes I like to look back and re-read what I wrote in the past. I also like to run analytics on that content to identify its themes quickly. One tool for that is a word cloud: a visualization of the most frequent words in a text, where the size of each word is proportional to its number of occurrences in the text.
I found a word cloud CLI tool, wordcloud_cli, and wanted to be able to run it quickly on a note from Emacs. First I built a shell script that takes two arguments: a file and a minimum word length. Before invoking wordcloud_cli, the script strips some org-mode markup and irrelevant words:
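#!/bin/bash
# Usage: wordcloud FILE MIN_WORD_LENGTH
# Unwrap org links to their descriptions, drop a few noise words and
# anything from "#+" to the end of the line, then render the cleaned text.
cat "$1" \
  | sed "s|\[\[.*\]\[\(.*\)\]\]|\1|g" \
  | sed "s|properties||gi" \
  | sed "s|title||gi" \
  | sed "s|thing||gi" \
  | sed "s|#+.*||gi" \
  > /tmp/words
wordcloud_cli --text /tmp/words --imagefile /tmp/img.png --width 1280 --height 1280 --min_word_length "$2"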
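To make the cleanup step concrete, here is a small sketch that runs the link-stripping rule on made-up org lines. The sample text and the function names `strip_links` and `strip_links_strict` are hypothetical, chosen only for illustration. The second function is an assumed variant, not what the script uses: because `.*` is greedy, the original pattern can mangle a line that contains more than one link, whereas `[^]]*` cannot cross a closing bracket and so treats each link separately.

```shell
#!/bin/bash
# Sketch only: the sample input and both function names are hypothetical.

# The link rule from the script: keep the description of [[target][description]].
strip_links() {
  sed 's|\[\[.*\]\[\(.*\)\]\]|\1|g'
}

# Assumed variant: [^]]* stops at the first closing bracket,
# so two links on the same line are unwrapped independently.
strip_links_strict() {
  sed 's|\[\[[^]]*\]\[\([^]]*\)\]\]|\1|g'
}

echo 'Read [[https://example.com][my notes]] first' | strip_links
echo 'See [[a.org][one]] and [[b.org][two]] today' | strip_links_strict
```

The first command prints "Read my notes first" and the second prints "See one and two today". If notes rarely carry several links per line, the original rule is probably good enough for a word cloud.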
With the script available as wordcloud on my PATH, I wired it to Emacs using the following Emacs Lisp function:
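(defun wordcloud (arg)
  "Create a word cloud from the current file.
With prefix ARG, use it as the minimum word length (default 8)."
  (interactive "P")
  ;; `prefix-numeric-value' also copes with a bare C-u, which is passed as a list.
  (message "%s" (shell-command-to-string
                 (format "wordcloud %s %d"
                         (shell-quote-argument (buffer-file-name))
                         (if arg (prefix-numeric-value arg) 8))))
  (shell-command-to-string "open /tmp/img.png"))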
The function takes a prefix argument that sets the minimum word length to consider, with a default of 8. For instance, C-u 5 M-x wordcloud only counts words of at least five letters. As an example, running it on an article about perfectionism that was recently on Hacker News (https://arunkprasad.com/log/unlearning-perfectionism/) rendered: