I may be stuck in the past, or maybe I just like punishment, but my editor of choice is still VIM. However, certain tricks seem to be hard to find with Google searches, so I'm going to compile them here:
- Creating custom commands and keyboard mappings is easy in VIM. To create a custom command, add a line like the following to your .vimrc file. The % character expands to the current buffer's filename in the shell command:
command CommandName execute "!shellcommand %"
This command can then be run in VIM with the usual :CommandName syntax. To map the new command to a keyboard shortcut, use the map command in your .vimrc file:
map <F5> :CommandName<CR>
I'm always using command-line shortcuts for various tasks, and I often have to look up the tricks every time I need to do something remotely fancy. Here are some of my most-used helpful hints:
- To remove the leading spaces and tabs from each line of text on standard input (so feed it the input through a pipe), this sed command works well:
sed -e 's/^[ \t]*//'
- To reformat XML/HTML files so that line breaks inside tags are removed:
xmllint --format --noblanks infile.xml > outfile.xml
Large streams of data, mostly unlabeled.
Machine learning is an approach to fitting models to data. How does it work? Take the raw data, hypothesize a model class, and use a learning algorithm to find the model parameters that best match the data.
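As a minimal sketch of that loop (my own toy example, not from the notes): generate noisy data, hypothesize a linear model class, and let least squares play the role of the learning algorithm that picks the parameters.

```python
import numpy as np

# Raw data: noisy samples from a process unknown to the learner
# (here secretly y = 2x + 1 plus Gaussian noise).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

# Hypothesized model class: y = a*x + b.
# Learning algorithm: least squares chooses (a, b) to match the data.
A = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print(a, b)  # recovered parameters, close to the true 2.0 and 1.0
```

Swap in a richer model class or a different learning algorithm and the same three-step recipe still applies.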
What makes a good machine learning algorithm?
- Performance guarantees (statistical consistency and finite-sample bounds)
- Handles real-world sensors, data, and resources (high-dimensional, large-scale, ...)
For many types of dynamical systems, learning is provably intractable. You must choose the right class of model, or else all bets are off!
- Spectral Learning approaches to machine learning
- Topology: Encompasses the global shape of the data, and the relations between data points or groups within the global structure
- Google Pagerank Algorithm
- Example: Cosmic Crystallography
  - Torus universe (zero curvature)
  - Spherical universe (positive curvature)
  - Other universe (negative curvature)
- Data: Hyperspectral Imagery
- Gradient Flow Algorithm
  - For each data point, identify the neighbor with the highest density (an arrow points from that point to that particular neighbor)
  - Follow the arrows to identify clusters
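The gradient-flow steps above can be sketched as follows. This is a toy implementation under my own assumptions (a Gaussian-kernel density estimate and a k-nearest-neighbor neighborhood); the function name and parameters are hypothetical, not from the talk.

```python
import numpy as np

def gradient_flow_clusters(points, k=5):
    """Toy gradient-flow clustering: each point gets an arrow to the
    densest point in its neighborhood; following arrows yields clusters."""
    n = len(points)
    # Pairwise distances and a simple Gaussian-kernel density estimate.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    density = np.exp(-d**2).sum(axis=1)

    # Step 1: point each arrow at the highest-density point among the
    # k nearest neighbors (a point keeps the arrow on itself if it is
    # the densest, making it a local density peak).
    parent = np.empty(n, dtype=int)
    for i in range(n):
        neighbors = np.argsort(d[i])[:k + 1]  # includes i itself
        parent[i] = neighbors[np.argmax(density[neighbors])]

    # Step 2: follow the arrows uphill until they stop at a peak;
    # the peak reached labels the cluster.
    def root(i):
        while parent[i] != i:
            i = parent[i]
        return i

    return np.array([root(i) for i in range(n)])

# Usage: two well-separated blobs of points.
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
labels = gradient_flow_clusters(pts)
print(sorted(set(labels)))  # one density peak per recovered cluster
```

Because every arrow points to a point of equal or higher density, following arrows always terminates at a density peak, so the chains are guaranteed to converge.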
Found an interesting paper by Nicolas Christin and his group at CMU, available here. The authors took the encrypted passwords of everyone at the university and ran password-guessing algorithms against them, then broke the results down by demographics and by how many attempts it took to guess each password. What's interesting? Check out Figure 1! Business students have the most guessable passwords, while computer science students have the least. I encourage everyone to check out this paper, or at least browse through the graphs!