05 Jun 2013
You ever need to create a dictionary?
Of course you do! Heck, I do it all the time!
Oh, stop looking at me like that. I don’t just mean language
dictionaries! Address books, blogrolls, bibliographies, calendar events
grouped by date… anywhere you have a list of
words/phrases/terms/names/etc. and their corresponding
definitions/explanations/details/etc., you’ve got a dictionary.
For example, suppose you wanted to create a list of tourist attractions
in your favorite city:
Cerro San Cristóbal: A really big hill on the north side of town.
La Vega: Awesome place to buy veggies.
Patio Bellavista: Great nightlife and dining.
I’m going to go ahead and take the liberty of assuming that your
favorite city is Santiago de Chile, but if this is completely
unacceptable to you, feel free to follow along with your own list.
Not too shabby. Now, let’s tweak it a little bit by putting everything
in quotes:
"Cerro San Cristóbal": "A really big hill on the north side of Santiago."
"La Vega": "Awesome place to buy veggies."
"Patio Bellavista": "Great nightlife and dining."
And just so that it’s clear where each entry in the list ends, we’ll add
a comma at the end:
"Cerro San Cristóbal": "A really big hill on the north side of Santiago.",
"La Vega": "Awesome place to buy veggies.",
"Patio Bellavista": "Great nightlife and dining."
And for good measure, let’s decorate it with some curly braces because
they’re just so goshdarned pretty:
{
"Cerro San Cristóbal": "A really big hill on the north side of Santiago.",
"La Vega": "Awesome place to buy veggies.",
"Patio Bellavista": "Great nightlife and dining."
}
Well, would you look at that — we now have ourselves a
properly-formatted JSON object! Not only is it still perfectly readable
by humans, but you can now feed it into a script in just about any
language.
And since most JSON readers don’t mind a little whitespace, you can make
it even prettier:
{
"Cerro San Cristóbal":
"A really big hill on the north side of Santiago.",
"La Vega":
"Awesome place to buy veggies.",
"Patio Bellavista":
"Great nightlife and dining."
}
With a little bit of JSON, you can easily create attractive definition
lists that are readable both by humans and computers!
28 May 2013
Let’s pretend that you are tasked with writing a one-off script to process a massive amount of backlogged data (a few of you might not have to pretend very hard). Because this script is ultimately going to be run once and then discarded, it would be a waste to spend lots of time building error handling routines and making the process super robust — it would be far less expensive just to expect failure and come up with an efficient way to deal with it.
And if there’s one thing you can count on, it’s failure. Servers will go offline, files will disappear, hamsters will chew through the power cables… one way or another, you’re going to run into a situation where a script or job that you’ve written will not be able to complete its task successfully.
And then what?
Well, you’ve got two options:
- Clear out all the partially-processed data (assuming that’s even possible!), run the job again and hope extra hard that it works this time!
- Periodically save your progress to a file so that your program knows exactly where to leave off in the event of a crash.
I don’t know about you, but I am rather partial to option #2.
For the purposes of this example, let’s say that you manage a website with a huge number of hosted image files (cc-by-sa, of course). One day, you noticed that other website owners are using your images without properly crediting your site. I know that barely ever happens in real life, but bear with me here.
p So you’ve gotta write a script that does two things:
- Submit each photo to a reverse image search API to locate unauthorized uses of the image and log the offending websites to a file for later follow-up.
- Apply a watermark to each image so that future content copiers will attribute the images correctly whether they want to or not.
Step 1 could potentially take awhile, and there are a lot of opportunities for something to get screwed up somewhere along the line. Network outages, connection timeouts, the remote server sends back a 503 — there are a lot of ways this process could fail.
Even the watermarking process could go boom. A corrupted or oddly-formatted image might cause the image processor to crash.
Since you can’t spend hours and hours making your script fail-proof, you’ll have to find a way to keep track of your script’s progress so that in the event that something does break, the program can recover gracefully (and something will break – usually right after you go home for the evening, wasting an entire night’s worth of processing time).
JSON to the rescue! Yeah, I bet you were wondering when I’d get to the point already.
Imagine you started off with a status.json
file that looked something like this:
{
"img_1856322": {
"search": false,
"watermark": false
},
"img_1856323": {
"search": false,
"watermark": false
},
... and so on ...
}
How this file was generated is outside the scope of this article, but given that you’ve made it this far, I bet you’ve already thought of at least 3 devilishly clever ways to do it.
When your script starts up, it reads the JSON object from the status file and starts traversing it, looking for tasks that haven’t been completed yet. As it finishes each step, it updates the object with the results (and don’t forget to write the JSON back to status.json
each time it is updated so that you don’t lose your status in the event that the entire script gets crash-y!).
After awhile, status.json
starts to show signs of life (explanatory comments added):
{
"img_1856322": {
"search": [], // Oh good; nobody's using this image! It's mine! All mine!!
"watermark": true // Watermark completed successfully (:
},
"img_1856323": {
"search": ["http://www.istealimages.com/buy_my_poster/totally_legit.jpg"], // Somebody is abusing our image! Release the hounds!
"watermark": ["DivideByZeroError", "Unexpected divide by zero in region 42."] // Oops; something went wrong during the watermarking! Wait... *unexpected* divide by zero?
}
... you get the idea ...
}
Notice something else cool about status.json
? It doubles as a log file once the script is finished running! Not only do you have a lean placemarker file for your script, but it’s also in a format that’s easy for a human to scan through.
For really complex jobs or periodic tasks, you will probably find that it is worth the time to integrate a fully-featured job manager and spend the extra effort tuning your script for robustness. But when you have a one-off task to crunch through, consider using a JSON-formatted status file so that both you and your script can keep a handle on things.
21 May 2013
From time to time, your data models might suffer from field creep — you
have to keep adding more and more fields to your models to work around
edge cases and unique business requirements.
In the world of relational databases, this usually manifests either as
additional fields tacked onto an existing table, or a separate “detail”
table with a one-to-one relation to its corresponding source model — and
these are just the least complicated approaches; there are far more
complex ways to address the situation!
One approach to addressing the need for ever-more-specific-use-case
fields: add more (increasingly ridiculously-named) fields to the table.
An alternate approach: move all of those pesky “detail” fields into a
separate table that you can hide away in the closet whenever company
comes over.
In addition to cluttering up your ERD, you are also leveraging the power
(and overhead!) of your relational database engine for things that don’t
actually need it.
Your pet store application might never need to filter a select query by
userstory183274_marker value, but thanks to user story 183274, you
still have to store that information with each pet.
Perhaps there’s another approach, one that is more scalable and doesn’t
clutter up your database tables with (almost) useless fields… and yet
still makes this custom information accessible to your application.
Well, as it happens, there is! Combine all of those extra “detail”
fields into a single detail field that uses – you guessed it – JSON to
store key/value pairs!
Much easier on the eyes!
Let’s peek at some values in our newly JSON-ified Pet table:
pet_id |
owner_id |
species_id |
name |
detail |
1 |
1 |
1 |
‘Bowser’ |
{"collar_manufacturer": "PetProtector", "rabies_flag": true} |
2 |
1 |
2 |
‘Oliver’ |
{"has_name_tag": false, "music_preference": "Light Jazz"} |
3 |
2 |
1 |
‘Marcus’ |
{"userstory183274_marker": "FILENOTFOUND"} |
Notice that in the time it took to draft up this table, we added a new
feature: Musical Preference. Oliver likes Light Jazz.
Some of the benefits of storing “extra” data about your models this way:
- You can add additional properties without having to alter your
database schema.
- Keeps your database schema tidy; only “essential” fields show up in
your tables.
There are a few costs to this approach that you should be aware of,
however:
- It requires additional application logic to encode/decode the JSON
in the detail field. This has implications both for performance and
development time.
- These fields are not indexable! If you need to search/filter by a
property in a detail field, you will need to move it to its own
column in the database table or use a system such as Lucene to index
and search that value.
- Extracting data from a detail field is an all-or-nothing operation;
you have to pull the entire JSON string from the database and decode
it in your application to find a specific property. This is usually
not an issue for modern ORMs, since all columns are usually
automatically pulled in every time they fetch a record from
the database.
For storing one-off properties and flag values that you only need at the
application level (i.e., you do not need to index/filter/search these
values), think about whether you can store these values in a single
JSON-formatted string instead of bloating your tables with extra fields.
And if you find that you’re using this feature a lot in your
relational-database-powered application, you might want to consider
whether a NoSQL solution such as CouchDB is a better fit for some or all
of your data!
20 May 2013
Anyone who’s been writing applications for awhile has run into the problem of making the application’s behavior configurable in different operating environments. From DSNs to administrator email addresses, it’s important to have a set of configuration values that live outside the codebase.
But how to store these values in a format that a computer can easily read and write?
A lot of solutions have been tried over the years, including INI, XML and YAML. Each of these formats has its strengths, but they all lack the combination of power and simplicity that JSON offers.
Consider a configuration file that stores database connection strings for different operating modes. Let’s start by seeing how you might write it with INI syntax:
[prod]
sql_username = dbuser
sql_password = dbpass
sql_hostname = 192.168.113.172
sql_portnum = 3306
sql_database = app_database
redis_hostname = 192.168.113.173
redis_portnum = 6379
[test]
sql_username = dbuser
sql_password = dbpass
sql_hostname = 192.168.113.172
sql_portnum = 3306
sql_database = test_app_database
redis_hostname = 192.168.113.175
redis_portnum = 6379
Well, it gets the job done, but it’s kind of… verbose. Plus, if that config file gets long enough, it will get really difficult to separate the different sections (prod, test, etc.).
What about an XML config file?
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE configurations SYSTEM "path/to/config.dtd">
<configurations>
<configuration mode="prod">
<option key="sql">
<value key="username" type="string">dbuser</value>
<value key="password" type="string">dbpass</value>
...
</option>
<option key="redis">
<value key="hostname" type="inet_addr">192.168.113.173</value>
<value key="portnum" type="integer">6379</value>
</option>
</configuration>
<configuration mode="test">
...
</configuration>
</configurations>
Yuck — and I thought INI was verbose! I didn’t even want to type the entire thing out, it got so huge! And by the way, good luck trying to store array values!
How about YAML?
---
prod:
sql:
username: dbuser
password: dbpass
...
redis:
hostname: 192.168.113.173
portnum: 6379
test:
sql:
username: dbuser
password: dbpass
...
redis:
hostname: 192.168.113.175
portnum: 6379
Not bad. Definitely easy for a human to read, and there are libraries in practically every programming language to read and write YAML files. But I’ll share with you a little secret: YAML is JSON-compatible!
That’s right; the above configuration file could also be written like this:
---
prod:
sql: {"username": "dbuser", "password": "dbpass", ...}
redis: {"hostname": "192.168.113.173", "portnum": 6379}
test:
sql: {"username": "dbuser", "password": "dbpass", ...}
redis: {"hostname": "192.168.113.175", "portnum": 6379}
p Well, if you're going to go and use a configuration file format that's compatible with JSON... why not use JSON?
pre.
{
"prod": {
"sql": {
"username": "dbuser",
"password": "dbpass",
...
},
"redis": {
"hostname": "192.168.113.173",
"portnum": 6379
}
},
"test": {
...
}
}
Not only do you get an easy to read syntax (for both humans and computers), but you also get the advantage of being able to explicitly supply a type for your configuration values.
Try to set a configuration value to null in an INI or YAML file (credit where it’s due, at least XML can handle this… if you don’t mind typing a dozen extra characters!).
What about true or false? In JSON, this is just part of the syntax:
{
"prod": {
"use_caching": true
},
"test": {
"use_caching": false
}
}
And with JSON libraries available in nearly every programming language (or supported natively if you are developing a Node.js app!), it’s incredibly easy to integrate a JSON configuration into your application!
19 May 2013
We’ve all been there before: You finished putting together an awesome class or function whose behavior can be configured down to the smallest degree, and now to wire up all the different configuration values:
function jsonapi(url, data, pre_execute_hook, success_hook, failure_hook, error_hook, post_execute_hook, async, method, jsonp) {
...
}
What a mess. Can you imagine trying to call that monstrosity?
jsonapi('http://www.example.com/rest/doSomethingAwesome', {"subscribe": ["peter", "henrietta", "xavier"]}, null, mySuccessHandler, myFailureHandler, myErrorHandler, null, null, null, true);
At least you have some job security — nobody is going to want to maintain that thing!
However, there’s a better way to organize your function parameters using JSON:
function jsonapi(options) {
// Merge options with default values.
options = $.extend(
{
// Standard Options
'url': '',
// Hooks
'pre_execute': null,
'success': null,
'failure': null,
'error': null,
'post_execute': null,
// Advanced Options
'async': true,
'method': 'post',
'jsonp': null,
'data': null
},
(options || {})
);
}
Note that we are using the extend() function from the jQuery library to merge values in the options parameter.
When you are invoking the function, pass a JSON-encoded object as the value of the options parameter:
jsonapi({
"url": 'http://www.example.com/rest/doSomethingAwesome',
"data": {"subscribe": ["peter", "henrietta", "xavier"]},
"success": mySuccessHandler,
"failure": myFailureHandler,
"error": myErrorHandler,
"jsonp": true
});
You might have noticed several advantages to this method:
- The values you are passing at invocation time are self-documenting — even without looking at the function signature, you can tell which option each value corresponds to.
- You can omit any options that should use the default value — in the above example, the we did not have to pass a null value to the pre_execute, post_execute, method and async options; the function automatically uses default values if they are omitted from the JSON object passed to it.
- Individual options can be documented in the function body itself.
p One thing to be careful about though, is that the different options are no longer part of the function’s signature, so IDE tooltips and auto-completion won’t work. Make sure you document well your functions and the options they expect if you are using this technique!