Occasionally, I blog about things that interest me.
We've started using Google BigQuery extensively at Cruise as a data warehouse. The syntax for querying arrays in BigQuery isn't obvious and this post will explain how it works.
Consider a table that stores a company and its executives with the following schema:
companies
- name
- founding_year
- executives
* name
* title
with some sample data
WITH companies AS (
SELECT
"Apple" as name, 1976 as founding_year,
[
(SELECT AS STRUCT 'Tim Cook' name, 'CEO' title),
(SELECT AS STRUCT 'Jony Ive' name, 'Chief Design Officer' title),
(SELECT AS STRUCT 'Jeff Williams' name, 'COO' title)
] as executives UNION ALL SELECT
"Amazon" as name, 1994 as founding_year,
[
(SELECT AS STRUCT 'Jeff Bezos' name, 'CEO' title),
(SELECT AS STRUCT 'Brian T. Olsavsky' name, 'CFO' title)
] as executives UNION ALL SELECT
"Twitter" as name, 2006 as founding_year,
[
(SELECT AS STRUCT 'Jack Dorsey' name, 'CEO' title),
(SELECT AS STRUCT 'Ned Segal' name, 'CFO' title)
] as executives UNION ALL SELECT
"AirBNB" as name, 2008 as founding_year,
[
(SELECT AS STRUCT 'Brian Chesky' name, 'CEO' title),
(SELECT AS STRUCT 'Joe Gebbia' name, 'CPO' title)
] as executives UNION ALL SELECT
"Square" as name, 2009 as founding_year,
[
(SELECT AS STRUCT 'Jack Dorsey' name, 'CEO' title)
] as executives
) SELECT * FROM companies;
Row | name | founding_year | executives.name | executives.title |
---|---|---|---|---|
1 | Apple | 1976 | Tim Cook | CEO |
 | | | Jony Ive | Chief Design Officer |
 | | | Jeff Williams | COO |
2 | Amazon | 1994 | Jeff Bezos | CEO |
 | | | Brian T. Olsavsky | CFO |
3 | Twitter | 2006 | Jack Dorsey | CEO |
 | | | Ned Segal | CFO |
4 | AirBNB | 2008 | Brian Chesky | CEO |
 | | | Joe Gebbia | CPO |
5 | Square | 2009 | Jack Dorsey | CEO |
We can use familiar SQL queries to query top-level fields like name or founding_year:
SELECT * FROM companies WHERE founding_year > 2000;
Row | name | founding_year | executives.name | executives.title |
---|---|---|---|---|
1 | Twitter | 2006 | Jack Dorsey | CEO |
 | | | Ned Segal | CFO |
2 | AirBNB | 2008 | Brian Chesky | CEO |
 | | | Joe Gebbia | CPO |
3 | Square | 2009 | Jack Dorsey | CEO |
To query nested fields like executives.name or executives.title, we use a combination of CROSS JOIN and UNNEST.
For example, this query returns all companies with Jack Dorsey as an executive:
SELECT * FROM companies CROSS JOIN UNNEST(executives) as executive WHERE executive.name = 'Jack Dorsey';
Row | name | founding_year | executives.name | executives.title |
---|---|---|---|---|
1 | Twitter | 2006 | Jack Dorsey | CEO |
 | | | Ned Segal | CFO |
2 | Square | 2009 | Jack Dorsey | CEO |
To understand how this query works, notice that CROSS JOIN with UNNEST returns a row for each item in the array, with that item's fields flattened into the row.
For example, consider this query
SELECT c.*, executive FROM companies c CROSS JOIN UNNEST(c.executives) as executive
This returns a row for each combination of company and executive.
Row | name | founding_year | executives.name | executives.title | executive.name | executive.title |
---|---|---|---|---|---|---|
1 | Apple | 1976 | Tim Cook | CEO | Tim Cook | CEO |
 | | | Jony Ive | Chief Design Officer | | |
 | | | Jeff Williams | COO | | |
2 | Apple | 1976 | Tim Cook | CEO | Jony Ive | Chief Design Officer |
 | | | Jony Ive | Chief Design Officer | | |
 | | | Jeff Williams | COO | | |
3 | Apple | 1976 | Tim Cook | CEO | Jeff Williams | COO |
 | | | Jony Ive | Chief Design Officer | | |
 | | | Jeff Williams | COO | | |
4 | Amazon | 1994 | Jeff Bezos | CEO | Jeff Bezos | CEO |
 | | | Brian T. Olsavsky | CFO | | |
5 | Amazon | 1994 | Jeff Bezos | CEO | Brian T. Olsavsky | CFO |
 | | | Brian T. Olsavsky | CFO | | |
6 | Twitter | 2006 | Jack Dorsey | CEO | Jack Dorsey | CEO |
 | | | Ned Segal | CFO | | |
7 | Twitter | 2006 | Jack Dorsey | CEO | Ned Segal | CFO |
 | | | Ned Segal | CFO | | |
8 | AirBNB | 2008 | Brian Chesky | CEO | Brian Chesky | CEO |
 | | | Joe Gebbia | CPO | | |
9 | AirBNB | 2008 | Brian Chesky | CEO | Joe Gebbia | CPO |
 | | | Joe Gebbia | CPO | | |
10 | Square | 2009 | Jack Dorsey | CEO | Jack Dorsey | CEO |
Once we have this table, it's easy to see that WHERE clauses work as we expect on the executive.name and executive.title fields.
Given the results above - what would this query return?
SELECT c.name, executive.name as ceo FROM companies c CROSS JOIN UNNEST(c.executives) as executive WHERE executive.title='CEO'
If you guessed right, it returns the name and CEO of each company in our table.
Row | name | ceo |
---|---|---|
1 | Apple | Tim Cook |
2 | Amazon | Jeff Bezos |
3 | Twitter | Jack Dorsey |
4 | AirBNB | Brian Chesky |
5 | Square | Jack Dorsey |
Tip: You can use a comma in place of CROSS JOIN in your queries for brevity.
SELECT c.* FROM companies c CROSS JOIN UNNEST(executives) as executive WHERE executive.name = 'Jack Dorsey';
is equivalent to
SELECT c.* FROM companies c, UNNEST(executives) as executive WHERE executive.name = 'Jack Dorsey';
I hope this helps you explore datasets in BigQuery more easily.
BigQuery has worked well for us as a data warehouse. From engineers to data analysts and managers, I've seen users of all technical abilities use BigQuery at Cruise in their day-to-day work to build a self-driving robotaxi.
I spent 5 years in the Computer Science BS-MS program. I was a peer mentor for CS111 for 3 years.
I've met a wide range of students in the CS department. I've helped students with assignments, and explained concepts to them over and over again. This blog post explains the patterns for success and failure that I have noticed.
Good performance in the intro courses builds the foundation for your CS knowledge. Without it, you are as useful as a mathematician who cannot add fractions or solve linear equations.
By the time you get through CS111, you should have a clear mental model of how problem solving is accomplished through code. It's easy to get distracted as a 111 student by the complexities of Java, Eclipse, jEdit, your weekly assignments, and course project. This is especially true if you come to 111 with no programming experience. It might feel like all you're doing is staying afloat week after week, assignment after assignment, milestone after milestone. Someone tells you that using Eclipse is good/bad. Someone tells you that you should put curly braces on the same line as a function, as opposed to putting them on their own line. It can be very overwhelming.
If at the end of 111, you are able to take a problem given to you, think about how it's solved, write code to solve it, and debug your code until it works, you have done well. This seems like a hard thing to measure, so I'm going to give you a couple of litmus tests to gauge your performance.
Consider the problem below:
Write a program that prints the numbers from 1 to 100. But for multiples of three print Lemon instead of the number, and for multiples of five print Juice. For numbers which are multiples of both three and five print LemonJuice.
Here is what the output of the program looks like from 1 to 20.
1
2
Lemon
4
Juice
Lemon
7
8
Lemon
Juice
11
Lemon
13
14
LemonJuice
16
17
Lemon
19
Juice
This problem should be a breeze for you after 111. If you cannot write code to solve this in 30 minutes, without someone's help, you are in trouble.
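For reference, here is one way a solution might be sketched in Python (the function name is my own):

```python
def lemon_juice(n):
    """Return the Lemon/Juice output for the numbers 1 through n."""
    lines = []
    for i in range(1, n + 1):
        if i % 15 == 0:            # multiple of both three and five
            lines.append("LemonJuice")
        elif i % 3 == 0:
            lines.append("Lemon")
        elif i % 5 == 0:
            lines.append("Juice")
        else:
            lines.append(str(i))
    return lines

print("\n".join(lemon_juice(100)))
```

If you can write something like this from a blank editor, debug it until it works, and explain why the multiple-of-fifteen check has to come first, you've passed the test.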
Here is another litmus test. If you can solve the first 5 problems on Project Euler without any help, you have learned problem solving. You can start worrying about other things. If you can't do these problems, you need to spend more time learning to problem solve, writing code, and debugging code. You will not get very far in a CS degree before you do this.
You can write code to solve a Project Euler problem, and submit the solution by creating an account. This is also a great way to learn a new programming language (This is how I learned the basics of Python).
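For instance, the first Project Euler problem (sum all the multiples of 3 or 5 below 1000) comes down to a couple of lines of Python:

```python
# Project Euler problem 1: add up every number below 1000 divisible by 3 or 5.
total = sum(i for i in range(1000) if i % 3 == 0 or i % 5 == 0)
print(total)  # 233168
```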
Data Structures builds on the problem solving you learned in 111, and teaches you the clever ways in which programmers organize data and design algorithms.
You can ask any upperclassman what they think the most important CS course is, and Data Structures will almost certainly be one of their top 3.
Now, let me tell you the story of Alice. Alice is a metaphorical student that represents 50% of the CS112 roster.
Alice took CS111, did fairly well in it (got better than a B). In 111, Alice often found it hard to listen to the professor in lecture. Every once in a while, she zoned out, and couldn't figure out what Tjang or Sesh was saying. She went home, looked at the weekly assignment, and finished it (perhaps with some help from a friend, TA, or someone at the iLabs). Since every concept in lecture was reinforced in a weekly assignment, she understood most of the material that was covered in 111.
When Alice took 112, she followed a similar path. Tjang/Sesh was lecturing about Heaps, Linked Lists, Graphs, Tail Recursion, the efficiency analysis of Insertion Sort, etc. She went on Facebook on her laptop or got a text from a roommate, subsequently zoned out in class, and quickly lost track of what the professor was saying. Alice went home; there was a project due in two weeks. Alice worked very hard on the project. She definitely had some issues along the way, but she was able to ask Sesh/TAs/peers/people at the iLabs for help and get it done. She got an 85+ on all five projects in CS112. Alice got around a 50 on both Data Structures exams, ended Data Structures with a C/C+, and moved on.
Alice is the canonical example of a student that does poorly in the Rutgers CS degree, and here's why.
In 111, Alice had projects every week that forced her to learn material. The only thing in 112 that forces you to learn material is the exam, and that happens twice a semester. Sometimes, the first 112 exam is so early in the semester that there's very little material covered on it. This means that the final covers a ton of material, and because Alice has been zoning out of 112 lecture on a regular basis, she does not know why Heapify is O(n), or why you would want to use a min heap as a frontier when implementing Dijkstra's shortest path algorithm.
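If that last sentence sounds foreign, here is a rough Python sketch of the idea, using the standard library's heapq as the min-heap frontier (an illustration of the concept, not course material):

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source. The min heap always hands us the
    closest unsettled node next, which is why it makes a good frontier."""
    dist = {source: 0}
    frontier = [(0, source)]  # (distance, node) pairs, ordered by distance
    while frontier:
        d, node = heapq.heappop(frontier)
        if d > dist[node]:
            continue  # stale entry; a shorter path was found earlier
        for neighbor, weight in graph[node]:
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(frontier, (nd, neighbor))
    return dist
```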
In order to avoid Alice's pitfalls, you need to make sure that you understand every concept in Data Structures. Practically every sentence that Sesh or Tjang says in 112 is important. I understand that you are human, and cannot keep perfect concentration in an 80 minute lecture. But you need to be proactive about understanding all the concepts that are covered. This means that if you zoned out in the lecture about AVL Trees, you still need to make sure you learn it. You can go read the textbook after class. You can look up the material online. There are some very good online courses that cover data structures. I strongly recommend the lectures from Stanford's Programming Abstractions course. You can ask upperclassmen to explain the concept to you.
You need to do this before the day of your data structures exam, because the course covers a lot of important material, and you cannot learn all of it in a day.
Prerequisites don't matter.
Stop focusing on grades. Focus on the concepts you're learning.
Your professors haven't written code in 15+ years. They're not going to teach you how to develop software - don't waste your time taking CS431.
Start writing code in your free time to solve any problem that you have. I wrote RUBUS when I was in 112. I used to maintain static web pages for a board game my roommate played with his high school friends. Everyone that's good at programming has put lots of time into it, and you will need to do the same.
Talk to upperclassmen about ideas you have, and the things that they've built. Hang out in the CAVE, attend USACS events, and go to hackathons. All of this experience will add up, and help you land your first paid programming gig.
When I started taking CS classes, I didn't understand why people got paid to sit at a computer and write for-loops. I thought the mysterious, legendary developers getting internships and part time jobs must know so much more about the web/mobile apps/algorithms. This is a classic case of impostor syndrome, and it creates a mental roadblock until you get paid to write code.
Use your summers for internships or part time work (not for folding clothes). For your first development job, I recommend Student System Administration at OSS, System Administration at LCSR (under Doug Motto), an internship at Too Much Media, or HackNY if you are lucky. You'll want to do a more traditional tech internship at companies like Microsoft, Google or Etsy, but you'll have an easier time getting these once you have some experience. If you hang around CS folks, you'll hear about such opportunities frequently. Apply early and often - you'll have an easier time in October than in April.
The Rutgers CS community, while not perfect, has grown a great deal over the last few years. You've got access to events like HackRU, HackNY, PennApps, and HackTCNJ. You've got access to the CAVE, the Hackerspace, and the Makerspace. USACS, RuMAD and the Rutgers Hackathon Club are active, and full of smart people. People taking the same classes as you have gone on to amazing jobs, start companies, and sell companies. Start reading hacker news, and /r/programming.
Don't forget to give back to the community. Teach CS111 recitation. Join the USACS board, and help plan events. Hang out at the CAVE and help underclassmen understand difficult concepts. Help the noobs out at hackathons.
Make friends out of your peers. Impossible looking homework assignments will become easier. You'll spend a silly amount of time working on a CTF challenge, or writing a game. You'll get one letter Github usernames together. After college, they'll help you find jobs and offer you their couches.
Rutgers is a great place to study Computer Science, and I hope your time there will be as memorable as mine was.
If you're like me, you've written loads of web apps, but you rarely set up SSL on them. SSL is a must for any production-grade web application, especially if you're authenticating users or taking personal information from them. Otherwise, the contents of your HTTP requests are sent in plaintext: user logins, passwords, cookies, etc.
Usually, SSL certificates cost lots of money (Verisign charges over $100 / month) and are annoying to set up. After paying for domains and hosting, this is the last thing you want to shell out money for. Thus, StartSSL Free is a very appealing product because it gives you a free SSL certificate, valid for 1 year, that's accepted in all major browsers. I'm using it to serve flipdclass.com over SSL.
Save your private key from StartSSL as domain_key.enc. When creating the certificate, add www as the subdomain; you'll need to create a certificate for each subdomain that you want to access over HTTPS, unless you get a wildcard SSL cert (not available through StartSSL Free). Save the certificate as domain.crt. I would also recommend saving the intermediate and root CA certs, because you'll need them for your webserver setup.

I'll walk you through setting up nginx or apache for SSL.

Copy the files (domain_key.enc, domain.crt, ca.pem, sub.class1.server.ca.pem) to your server, probably via scp, into a directory like /etc/nginx/ssl/. Then decrypt domain_key.enc, so that it can be read by your web server. Without this, your webserver will prompt you for a password every time it is restarted.

$ openssl rsa -in domain_key.enc -out domain.key
$ chmod 400 domain.key # only root should be able to read this.
For Apache, configure your SSL virtual host like this:

<VirtualHost _default_:443>
ServerName domain.com
SSLEngine On
SSLCertificateFile /path/to/certs/domain.crt
SSLCertificateKeyFile /path/to/certs/domain.key
SSLCertificateChainFile /path/to/certs/sub.class1.server.ca.pem
...
</VirtualHost>
Nginx does not have a directive for SSL certificate chains, so you will need to concatenate your certificate with the intermediate and root CA certs.
$ cat domain.crt sub.class1.server.ca.pem ca.pem > domain.chained.crt
Then you can configure your virtual host as follows.
server {
listen 443 default_server ssl;
ssl_certificate /path/to/certs/domain.chained.crt;
ssl_certificate_key /path/to/certs/domain.key;
...
}
Now, you can reload your webserver, and if you did everything correctly, you should get a successful HTTPS connection to your web app. Make sure that you test your site on a few different browsers, because not all browsers will behave the same way with SSL certificates.
You should also consider configuring your webserver to redirect all traffic to HTTPS, in order to prevent users from leaking their sensitive data by mistake.
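With nginx, for example, a minimal redirect might look like this (domain.com is a placeholder for your own server name):

```nginx
server {
    listen 80;
    server_name domain.com;
    # send all plain HTTP traffic to the HTTPS version of the site
    return 301 https://domain.com$request_uri;
}
```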
SSH is a powerful protocol that lets you access machines remotely and run commands on them. Rutgers has a cluster of linux machines for CS students, and I often run programs on them. Sometimes, I leave a program running for a while, and forget which machine it was on. In this situation, PDSH comes in handy. It lets me run ps aux | grep -i <username> quickly across all the machines.
PDSH lets you run a command in parallel across a bunch of machines. I start by creating a text file with a list of machines I want to shell into:
cd.cs.rutgers.edu
cp.cs.rutgers.edu
grep.cs.rutgers.edu
kill.cs.rutgers.edu
less.cs.rutgers.edu
ls.cs.rutgers.edu
man.cs.rutgers.edu
pwd.cs.rutgers.edu
rm.cs.rutgers.edu
top.cs.rutgers.edu
vi.cs.rutgers.edu
cpp.cs.rutgers.edu
java.cs.rutgers.edu
perl.cs.rutgers.edu
basic.cs.rutgers.edu
assembly.cs.rutgers.edu
pascal.cs.rutgers.edu
php.cs.rutgers.edu
lisp.cs.rutgers.edu
prolog.cs.rutgers.edu
adapter.cs.rutgers.edu
command.cs.rutgers.edu
decorator.cs.rutgers.edu
facade.cs.rutgers.edu
flyweight.cs.rutgers.edu
mediator.cs.rutgers.edu
patterns.cs.rutgers.edu
singleton.cs.rutgers.edu
state.cs.rutgers.edu
template.cs.rutgers.edu
visitor.cs.rutgers.edu
builder.cs.rutgers.edu
composite.cs.rutgers.edu
design.cs.rutgers.edu
factory.cs.rutgers.edu
interpreter.cs.rutgers.edu
null.cs.rutgers.edu
prototype.cs.rutgers.edu
specification.cs.rutgers.edu
strategy.cs.rutgers.edu
utility.cs.rutgers.edu
Let's say I save this file as machines.txt
. I can then run a command in parallel across all these machines:
$ pdsh -R ssh -w ^machines "<command>"
Here are some things you can do with PDSH that you might find useful
Find all python processes running on these machines.
$ pdsh -R ssh -w ^machines "ps aux | grep -i python"
Kill any processes being run by my user. (Super useful if you forget to log out of a lab machine.)
$ pdsh -R ssh -w ^machines "killall -u `whoami`"
Check a specific log file for errors.
$ pdsh -R ssh -w ^machines "grep -i error /path/to/log"
It's a handy UNIX tool to have in your arsenal when working with lots of machines. I'm only showing the most basic usage of pdsh here. Check out PDSH on Google Code for a more detailed description of everything PDSH can do.
Web scraping is a super useful technique that lets you get data out of web pages that don't have an API. I often scrape pages to turn unstructured HTML into structured data, and Python is my language of choice for quick scripts.
In the past, I used Beautiful Soup almost exclusively to do this kind of scraping. BeautifulSoup is a great library for web scraping - it has great docs, and it gets the job done most of the time. I've used it on lots of projects. However, I find that it doesn't fit my workflow.
Let's say I wanted to scrape some data off a web page. I usually inspect the element in the Chrome Dev Console, and guess at a selector that might give me the data I want. Perhaps I guess div.foo li a. I quickly check whether this works by running the selector in the console as $('div.foo li a'), and modify it if it doesn't.
Even after using BeautifulSoup for a while, I find that I have to go back and read the docs to write code that scrapes this selector. I always forget how to select classes in BeautifulSoup's find_all method, and I don't remember how to write a CSS attribute selector such as a[href*=foo]. It doesn't let me write code at the speed of thought.
lxml is a robust library for parsing XML and HTML in Python - BeautifulSoup can even use it as its underlying parser. I don't know much about lxml, except that I can use CSS Selectors with it very easily, thanks to lxml.cssselect. Look at the example code below to see how easy this is.
import lxml.html
from lxml.cssselect import CSSSelector

# get some html
import requests
r = requests.get('http://url.to.website/')

# build the DOM Tree
tree = lxml.html.fromstring(r.text)

# print the parsed DOM Tree
print(lxml.html.tostring(tree))

# construct a CSS Selector
sel = CSSSelector('div.foo li a')

# Apply the selector to the DOM tree.
results = sel(tree)
print(results)

# print the HTML for the first result.
match = results[0]
print(lxml.html.tostring(match))

# get the href attribute of the first result
print(match.get('href'))

# print the text of the first result.
print(match.text)

# get the text out of all the results
data = [result.text for result in results]
As you can see, it's really easy to use CSS Selectors with Python and lxml. Instead of spending time reading BeautifulSoup docs, spend time writing your application.
lxml and cssselect are both Python packages that you can install easily via pip. In order to install lxml via pip, you will need the libxml2 and libxslt development headers. On a standard Ubuntu installation, you can simply do
sudo apt-get install libxml2-dev libxslt1-dev
pip install lxml cssselect
Check out the lxml installation page and lxml.cssselect for more information.
Having used Linux almost exclusively for the last four years, I miss efficient window management on Macs. Coming from the awesome window manager, I find that OS X does not have good support for a two monitor multiple workspace workflow out of the box. After tinkering with third party software, I believe I've found a good solution for most of my complaints, and have a workflow that I feel productive with. In my experience, this works best with multiple monitors, a standard keyboard (think Dell not Apple), and a three button mouse (I'm not a fan of touchpads or Apple mice).
I really like the use of multiple desktops in my workflow. I usually set up four desktops. I keep Spotify open on the very last one. The middle ones are my "work" desktops that I use for terminals, browsers, IDEs, and documentation. The first one is usually a "distraction workspace" that will have my email, and Adium open. This helps me keep my windows organized, and keep focus when I need to.
In order to set this up, I add additional desktops (up to 4). The easiest way to do this is to open up Mission Control
(usually Control-Up), hover over the Desktops, and click the Plus button on the top right.
Once this is done, I would recommend setting up easy keybindings to switch between desktops. To do this, go to System Preferences > Keyboard > Keyboard Shortcuts > Mission Control. Then you can set up keybindings for Move left a space, Move right a space, and Switch to Desktop 1-4. I use Ctrl-Alt-Left/Right to move between desktops, and Command-1/2/3/4 to jump to a desktop.
On OS X, it's sometimes pretty cumbersome to perform window management tasks like moving windows between monitors and maximizing windows efficiently. This is where Slate comes in. Slate is a configurable third-party window management application that makes these tasks super easy. I will explain how I use Slate day-to-day.
You can install Slate pretty simply by downloading the Slate dmg. After installing and starting Slate, you will want to make sure it's properly configured.
Here is my ~/.slate.js configuration file, which describes the keybindings I use with it. Right click on the Slate icon in the topbar > Relaunch and Load Config to apply configuration changes.
//Save this in ~/.slate.js
//This configuration file came from http://vverma.net.
//Credit to Gerard O'Neill, http://goneill.net for introducing me to Slate.
var left = {
'x': 'screenOriginX',
'y': 'screenOriginY',
'width': 'screenSizeX/2',
'height': 'screenSizeY',
};
var right = {
'x': 'screenOriginX + screenSizeX/2',
'y': 'screenOriginY',
'width': 'screenSizeX/2',
'height': 'screenSizeY',
};
//half screen.
slate.bind('right:ctrl,cmd', function(win) {
var screen = slate.screen().rect();
var win_rect = win.rect();
// if we are at the edge of a screen on the right.
if(Math.abs(screen.x + screen.width - win_rect.x - win_rect.width) < 5) {
var curr_screen = slate.screen().id();
slate.log(curr_screen);
if(curr_screen < slate.screenCount() - 1) {
var shift_screen = _.clone(left);
shift_screen['screen'] = curr_screen + 1;
win.doOperation(slate.operation('move', shift_screen));
}
} else {
win.doOperation(slate.operation('move', right));
}
});
//half screen.
slate.bind('left:ctrl,cmd', function(win) {
var screen = slate.screen().rect();
var win_rect = win.rect();
// if we are at the edge of a screen on the left.
if(screen.x == win_rect.x) {
var curr_screen = slate.screen().id();
if(curr_screen > 0) {
var shift_screen = _.clone(right);
shift_screen['screen'] = curr_screen - 1;
win.doOperation(slate.operation('move', shift_screen));
}
} else {
win.doOperation(slate.operation('move', left));
}
});
//maximize
slate.bind('up:ctrl,cmd', function(win) {
win.doOperation(slate.operation('move', {
'x': 'screenOriginX',
'y': 'screenOriginY',
'width': 'screenSizeX',
'height': 'screenSizeY',
}));
});
//center
slate.bind('down:ctrl,cmd', function(win) {
win.doOperation(slate.operation('move', {
'x': 'screenOriginX + screenSizeX/4',
'y': 'screenOriginY + screenSizeY/4',
'width': 'screenSizeX/2',
'height': 'screenSizeY/2',
}));
});
Here is the mapping of keybindings:
Cmd-Ctrl-Left - Split window in half vertically and move to the left. (Moves to the next screen if you are at the edge.)
Cmd-Ctrl-Right - Split window in half vertically and move to the right. (Moves to the next screen if you are at the edge.)
Cmd-Ctrl-Up - Maximize window.
Cmd-Ctrl-Down - Center window in its current screen.
Feel free to modify this slate configuration to suit your needs. You might find the Slate documentation helpful.
If you use Adium as a chat client on your machine, I recommend setting up a Global Keyboard Shortcut. This allows you to switch the focus to Adium anytime on your machine by pressing the key sequence. It's super handy to instantly switch to Adium when you get an Adium notification.
I set my global keyboard shortcut to Cmd-Shift-/. To use this, you'll have to get rid of the keybinding for the Help Center first. Do this by removing the keybinding in System Preferences > Keyboard > Keyboard Shortcuts > Help Center.

To set the global keyboard shortcut in Adium, go to Preferences > General > Global Shortcut.

Now, you can press Cmd-Shift-/ to switch to Adium, and Cmd-/ to show/hide your buddy list.
You should already be using Alfred as your primary application launcher. It lets you launch applications with your keyboard really easily, and do much more. It's also way faster than Spotlight.
If you use multiple desktops, sometimes you'll want to create multiple windows of the same application. Alfred will get in your way here, because if you try to launch an application that already has a window open, it will take you to that window instead of opening a new one. This happens to me all the time when I want to create a Chrome window on one desktop while I already have one open on another.
The easiest way around this is to go to an existing Chrome window, press Cmd-N to create another window, then drag the newly created window and, while dragging it, press Cmd-1/2/3/4 to take it to another desktop.
Like I mentioned before, I use Spaces in my development workflow. One thing that I really dislike about Spaces is that when you move between Spaces using Ctrl-Alt-Left/Right, it takes a second to animate the movement. I don't like this because it feels clunky.
You can run this in a terminal to make this animation a lot faster.
defaults write com.apple.dock expose-animation-duration -int 0; killall Dock
Credit to Gerard O'Neill for showing me a lot of this workflow.
Rutgers Open Systems Solutions mirrors a bunch of Linux distributions, and you can use these to download packages quickly when you're on campus. When downloading on the Rutgers campus, your bandwidth will also not be throttled which significantly improves your download speeds.
To add a mirror, open up your /etc/apt/sources.list file.
sudo [editor] /etc/apt/sources.list
It should look something like this:
# See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to
# newer versions of the distribution.
deb http://us.archive.ubuntu.com/ubuntu/ [distrib] main restricted
deb-src http://us.archive.ubuntu.com/ubuntu/ [distrib] main restricted
## Major bug fix updates produced after the final release of the
## distribution.
deb http://us.archive.ubuntu.com/ubuntu/ [distrib]-updates main restricted
deb-src http://us.archive.ubuntu.com/ubuntu/ [distrib]-updates main restricted
## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team. Also, please note that software in universe WILL NOT receive any
## review or updates from the Ubuntu security team.
deb http://us.archive.ubuntu.com/ubuntu/ [distrib] universe
deb-src http://us.archive.ubuntu.com/ubuntu/ [distrib] universe
deb http://us.archive.ubuntu.com/ubuntu/ [distrib]-updates universe
deb-src http://us.archive.ubuntu.com/ubuntu/ [distrib]-updates universe
## N.B. software from this repository is ENTIRELY UNSUPPORTED by the Ubuntu
## team, and may not be under a free licence. Please satisfy yourself as to
## your rights to use the software. Also, please note that software in
## multiverse WILL NOT receive any review or updates from the Ubuntu
## security team.
deb http://us.archive.ubuntu.com/ubuntu/ [distrib] multiverse
deb-src http://us.archive.ubuntu.com/ubuntu/ [distrib] multiverse
deb http://us.archive.ubuntu.com/ubuntu/ [distrib]-updates multiverse
deb-src http://us.archive.ubuntu.com/ubuntu/ [distrib]-updates multiverse
## N.B. software from this repository may not have been tested as
## extensively as that contained in the main release, although it includes
## newer versions of some applications which may provide useful features.
## Also, please note that software in backports WILL NOT receive any review
## or updates from the Ubuntu security team.
deb http://us.archive.ubuntu.com/ubuntu/ [distrib]-backports main restricted universe multiverse
deb-src http://us.archive.ubuntu.com/ubuntu/ [distrib]-backports main restricted universe multiverse
...
[distrib] here is the name of your distribution. This is one of natty, oneiric, precise, quantal, raring, or saucy. You can find a full list on the Ubuntu Wikipedia page.
You are going to add the following lines at the top of the file.
deb http://mirrors.rutgers.edu/ubuntu [distrib] main restricted universe multiverse
deb http://mirrors.rutgers.edu/ubuntu [distrib]-updates main restricted universe multiverse
deb http://mirrors.rutgers.edu/ubuntu [distrib]-backports main restricted universe multiverse
deb http://mirrors.rutgers.edu/ubuntu [distrib]-security main restricted universe multiverse
Be sure to replace [distrib] with the name of your distribution. Now, save the file, quit, and run
sudo apt-get update
You should now be downloading packages from the Rutgers mirror. Try to install a package, and make sure that you're making requests to http://mirrors.rutgers.edu.
When building web applications, you sometimes want to retrieve JSON data from APIs and domains that are external to your service. Because of the Same Origin policy in browsers, you cannot retrieve data from other domains via AJAX.
Usually to get around this, APIs will have endpoints that support JSON-P. JSON-P is a nifty technique that loads JSON data via <script> tags instead of via XMLHttpRequests (AJAX). To understand this, let's look at an example.
Let's say you have a service at http://myservice.com/data.json that returns the following JSON.
{
"from": "myservice",
"status": 200,
"data": ["foo", "bar", "baz"]
}
An application that lives on http://anotherapplication.com cannot access data.json in client-side JS via AJAX, because anotherapplication.com and myservice.com are not the same domain.
As the author of myservice.com, you can solve this problem by turning your JSON endpoint into a JSON-P endpoint. To do this, you write myservice.com in such a way that hitting http://myservice.com/data.json?callback=procedureName returns the following:
procedureName({
  "from": "myservice",
  "status": 200,
  "data": ["foo", "bar", "baz"]
})
Now, the author of anotherapplication.com can load data.json by adding the following script tag dynamically to the client-side DOM.
<script type="text/javascript" src="http://myservice.com/data.json?callback=procedureName"></script>
Now, the function procedureName will get called with the data from data.json. Using this trick does mean that you have to trust http://myservice.com, because any content it returns will be executed by your client-side JS.
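To make the mechanics concrete, here is a small sketch you can run outside the browser. The string stands in for the script body the server would return, and eval stands in for the browser executing the loaded <script>; myservice.com is the hypothetical service from the examples above.

```javascript
// The "callback" that the server wraps its JSON in.
var received = null;
function procedureName(payload) {
  received = payload;
}

// What http://myservice.com/data.json?callback=procedureName would return:
var jsonpResponse =
  'procedureName({"from": "myservice", "status": 200, "data": ["foo", "bar", "baz"]})';

// The browser runs the script body as soon as the <script> tag loads;
// eval stands in for that step here.
eval(jsonpResponse);

console.log(received.from);         // → "myservice"
console.log(received.data.length);  // → 3
```

The key point is that the response is not data at all: it is executable JavaScript that happens to carry your data as a function argument.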
Most web services will support JSON-P if they expect you to retrieve their data on the client side, but some do not.
For services that do not support JSON-P that live on the internet, you can use YQL to proxy the request through Yahoo's servers, and retrieve data in the same way.
Here is a snippet of jQuery code that would normally hit http://myservice.com/data.json:
$.ajax({
  'url': 'http://myservice.com/data.json',
  'dataType': 'json',
  'success': function(response) {
    console.log(response);
  }
});
Here is how you modify it to proxy via Yahoo's servers.
var yql_url = 'https://query.yahooapis.com/v1/public/yql';
var url = 'http://myservice.com/data.json';

$.ajax({
  'url': yql_url,
  'data': {
    'q': 'SELECT * FROM json WHERE url="' + url + '"',
    'format': 'json',
    'jsonCompat': 'new'
  },
  'dataType': 'jsonp',
  'success': function(response) {
    console.log(response);
  }
});
The snippet above will send a request to Yahoo and get back data from myservice.com as a response. This does mean that http://myservice.com needs to live on the open web (not on an internal server), so that Yahoo's servers can hit it.
jQuery will automatically add a callback parameter (with a generated function name) to the request, and wire that callback to your success function so that it gets called appropriately.
When building web applications, I used to spend a lot of time deploying them via ssh and scp. Then I used Heroku for a few projects, and I really liked that deploying to Heroku was as easy as it could be:
git push heroku master
I wanted a similar deployment scheme for my own projects that aren't hosted on Heroku.
Since git is a distributed version control system, you can push the code on your machine to another machine very easily over ssh. Your first instinct might be to set up a repo in the location where your code needs to be deployed and push to it via git. This is a good instinct, but git does not allow you to push code to a working copy. Instead, you will create a bare repository on your server and push to that, with a git hook that automatically deploys your application whenever code is pushed to the bare repository.
Before you start, your codebase needs to be in a git repository. This could be a GitHub repository that you use for version control. I will assume that your codebase lives in one directory called project on your development machine, which I will refer to as develop. This codebase will be deployed to your server, which I will refer to as deploy.
Now, you are going to create a bare git repository on deploy, and you will be able to push to it from develop.
username@deploy:~$ mkdir repos # this is the dir where all your repos will be stored.
username@deploy:~$ cd repos
username@deploy:~/repos$ mkdir project.git
username@deploy:~/repos$ cd project.git # You can replace this with the name of your project.
username@deploy:~/repos/project.git$ git init --bare
# Initialized empty Git repository in /home/username/repos/project.git
You will now set up your codebase on develop to push to the repos/project.git directory on deploy.
username@develop:~$ cd /path/to/my/project
username@develop:~/code/project$ git status
# This must be a git repo.
username@develop:~/code/project$ git remote add deploy username@deploy:~/repos/project.git # This is the path to your bare repo.
username@develop:~/code/project$ git push deploy master
This will push your codebase to the bare repository you just created on deploy. You can verify this by cloning the bare repository if you'd like.
username@develop:~$ cd /tmp
username@develop:/tmp$ git clone username@deploy:~/repos/project.git
# Cloning into 'project'...
# remote: Counting objects: 666, done.
# remote: Compressing objects: 100% (417/417), done.
# remote: Total 666 (delta 255), reused 632 (delta 221)
# Receiving objects: 100% (666/666), 621.96 KiB | 462 KiB/s, done.
# Resolving deltas: 100% (255/255), done.
username@develop:/tmp$ cd project
username@develop:/tmp/project$ ls
# make sure your files are here.
Now that we are pushing to the repos/project.git directory on deploy, let's set up our repository to actually deploy its code. I'll assume that your application gets deployed to /var/www/myproject.com.
username@deploy:~$ cd repos/project.git
username@deploy:~/repos/project.git$ ls
# HEAD branches config description hooks info objects refs
username@deploy:~/repos/project.git$ cd hooks
username@deploy:~/repos/project.git/hooks$ [editor] post-receive
The post-receive hook gets called by git right after code gets pushed to a repository (right after git push deploy master). We will make this hook deploy your application to /var/www/myproject.com. Using an editor of your choice, place the following in the post-receive file.
#!/bin/bash
### This file gets run when code is pushed to the project.git directory.
GIT_WORK_TREE=/var/www/myproject.com git checkout -f
Make the hook executable.
username@deploy:~/repos/project.git/hooks$ chmod +x post-receive
Make sure that your user has permissions to write to /var/www/myproject.com. This is it! You can now deploy your code anytime you want by running:
username@develop:~/code/project$ git push deploy master
Verify that your code is deployed when you push, and you should never need to use scp to deploy again.
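If you want to try the whole flow before touching a real server, it can be rehearsed locally with temporary directories standing in for develop and deploy. All paths below are throwaway assumptions, and a "www" directory plays the role of /var/www/myproject.com:

```shell
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/www" "$tmp/project"

# The "deploy" side: a bare repo whose post-receive hook checks the
# pushed code out into the www directory, just like the hook above.
git init -q --bare "$tmp/repos/project.git"
cat > "$tmp/repos/project.git/hooks/post-receive" <<EOF
#!/bin/bash
GIT_WORK_TREE=$tmp/www git checkout -f
EOF
chmod +x "$tmp/repos/project.git/hooks/post-receive"

# The "develop" side: commit a file and push it to the bare repo.
cd "$tmp/project"
git init -q
git symbolic-ref HEAD refs/heads/master   # ensure the branch is named master
echo "hello" > index.html
git add index.html
git -c user.email=you@example.com -c user.name=you commit -q -m "add index"
git remote add deploy "$tmp/repos/project.git"
git push -q deploy master

# The hook should have deployed index.html into the www directory.
cat "$tmp/www/index.html"
```

The hook runs with its working directory inside the bare repo, so setting GIT_WORK_TREE is what points the checkout at the deployment directory.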