Sunday, May 20, 2012

What are here docs and how are they implemented in Ruby?

A here document (or heredoc) is a way to write a multiline string literal. It's not unique to Ruby: shell scripts, PHP, Perl, and several other scripting languages support it too. In Ruby, ordinary double- or single-quoted strings can already span multiple lines, so here documents are rarely strictly necessary.
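To make that concrete, here is a minimal sketch of ordinary quoted strings spanning several source lines. The variable names are made up for illustration; each line break inside the quotes simply becomes a "\n" in the resulting string.

```ruby
# A double-quoted literal can span lines; the embedded line breaks
# become "\n" characters in the string.
greeting = "Hello there!
This string spans
three lines."

# Single-quoted literals behave the same way with respect to line breaks.
single = 'single quotes
work too'

puts greeting
puts single
```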

Of course, in true Ruby style, an implementation is available anyway. To define a heredoc, you write <<ID, where ID is an identifier of your choosing, such as EOF or MY_COOL_LIST. The convention, as I understand it, is to write this identifier in all caps. The lines that follow make up the string itself, and Ruby ends the string when it encounters the identifier again on a line of its own. For example:

puts <<HEREDOC
Hello there!
This is a heredoc, which means
that       if you print it, line breaks and spacing will be preserved.
HEREDOC

Will print:

Hello there!
This is a heredoc, which means
that       if you print it, line breaks and spacing will be preserved.
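One way to see exactly what the heredoc produces is to compare it with the same text written as an escaped double-quoted string. This sketch assigns the heredoc to a variable (the names text and escaped are just for illustration). Note that the newline before the closing HEREDOC is kept, so the string ends in "\n".

```ruby
# The heredoc from the example above, assigned to a variable instead of printed.
text = <<HEREDOC
Hello there!
This is a heredoc, which means
that       if you print it, line breaks and spacing will be preserved.
HEREDOC

# The same string as an ordinary double-quoted literal with escapes,
# including the trailing "\n" from the last body line.
escaped = "Hello there!\nThis is a heredoc, which means\nthat       if you print it, line breaks and spacing will be preserved.\n"

puts text == escaped  # prints "true"
```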

A couple of things to note. First, the text is not literally enclosed between <<ID and ID; rather, writing <<ID announces that a heredoc body will begin on the following line. The <<ID token itself is an expression that evaluates to the finished string, so you can use it anywhere an object is expected, as with puts in the example above. Second, if two heredocs are started on one line, as in func(<<DOC1, <<DOC2), the body of DOC2 begins on the line after the closing DOC1. For example, you can do this (copied from Nicholas Evans on Jay Fields' blog):

array_of_long_pasted_in_strings = [<<FOO, <<BAR, <<BLATZ]
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
FOO
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum
BAR
I recently discovered that I can use multiple heredocs as parameters.
Isn't that neat? Because almost nothing needs to be escaped with heredocs, I prefer to use it when strings are pasted in from elsewhere. Because the syntax for using it inside parameter lists is so nice, I prefer to use it whenever a multiline string literal is being passed into something as an argument.
BLATZ
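Here is a small sketch of both points: calling a method directly on the <<ID token, and passing two heredocs as arguments on one line. The helper join_docs is hypothetical, defined only for this illustration.

```ruby
# The <<LIST token evaluates to the string, so a method can be called on it
# right where it appears. The body still starts on the next line.
shopping = <<LIST.split("\n")
eggs
milk
bread
LIST

# join_docs is a hypothetical helper used only for this illustration.
def join_docs(first, second)
  first + second
end

# Two heredocs on one line: DOC1's body comes first, and DOC2's body
# starts on the line after the closing DOC1.
combined = join_docs(<<DOC1, <<DOC2)
first document
DOC1
second document
DOC2

p shopping  # => ["eggs", "milk", "bread"]
p combined  # => "first document\nsecond document\n"
```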

Another quick note: with plain <<ID, the closing ID must be at the very start of its line. If it is indented, Ruby does not treat it as the terminator; that line becomes part of the string, and if no unindented ID follows, Ruby reports a syntax error. If you add a dash, as in <<-ID, the closing identifier may be preceded by whitespace. The dash only relaxes the rule for the terminator, though: leading whitespace on the body lines is still kept. Refer to the blog post linked above for a good example of this.
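A minimal sketch contrasting the two forms (the variable and method names are illustrative only). With plain <<ID, an indented "  ID" line is ordinary content; with <<-MSG, the indented terminator is accepted, but the body keeps its indentation.

```ruby
# Plain <<ID: the indented "  ID" line is NOT the terminator, so it is kept
# as content; the unindented ID on the next line ends the string.
plain = <<ID
  ID
ID

# <<-MSG: the closing MSG may be indented to match the surrounding code...
def indented_message
  <<-MSG
    still indented
    MSG
end

dashed = indented_message

# ...but the body's own leading spaces are preserved.
p plain   # => "  ID\n"
p dashed  # => "    still indented\n"
```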

So why would you use heredocs instead of regular multiline strings? The best case I can think of is a method call with several arguments, one of which is a multiline string. For example: open_address(:write, <<ADDRESS, 10, ' '). The string's body can then go below the call, instead of awkwardly making the call itself span multiple lines.
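The call from that paragraph can be sketched as follows. open_address is not a real library method; this hypothetical version just collects its arguments so the result can be inspected, and the meaning of 10 and ' ' is left unspecified, as in the text. The address itself is made up.

```ruby
# Hypothetical stand-in for open_address: it only gathers its arguments.
# The extra arguments (10 and ' ' in the call below) are collected as-is.
def open_address(mode, address, *rest)
  { mode: mode, address: address, rest: rest }
end

# The whole call fits on one line; the heredoc body follows underneath.
result = open_address(:write, <<ADDRESS, 10, ' ')
123 Example Street
Springfield
ADDRESS

p result[:address]  # => "123 Example Street\nSpringfield\n"
```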

So, that's what heredocs are, and that's how they're implemented in Ruby.
