I recently read Chad Fowler’s The Passionate Programmer, and he mentions the importance of reading other people’s code. This reminded me of something I’ve kind of had in mind for a while: doing a deep dive into some parts of the Ruby on Rails code base to better understand how certain things work. The first thing I’d like to explore is the startup process, and the first part of understanding that is learning how Rails loads all of the relevant code before the application starts.
For the first part of this exploration, I’m going to look at Ruby’s lowest level primitives for loading code (i.e. the require method and related methods). In the second part, I’ll take a look at how RubyGems works and in the third part, I plan to tackle how bundler comes into play.
The require method
To start off, the documentation for the require method on ruby-doc.org explains a few of the details. When the require method is called, the following happens:
- If an explicit path to a file is given (i.e. an absolute path, or a path starting with a dot), that file is loaded.
- If no path is given, the directories contained in the variable $: or $LOAD_PATH (both names refer to the same array) are searched for the requested file.
- If the requested name has no extension, various extensions are tried (.rb, .so, .o, .dll, etc.).
- If the file is loaded successfully, its full path is added to the $" or $LOADED_FEATURES array.
- A file is only loaded once, so it is safe to call require as many times as you want (later calls just return false).
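To see these rules with the real require, here is a small sketch that puts a throwaway file on the load path and requires it twice. The file name hello_feature.rb and its temporary directory are made up for the demo:

```ruby
require 'tmpdir'
require 'fileutils'

# Demo of the real Kernel#require rules listed above.
# "hello_feature.rb" is a made-up file written to a temporary directory.
dir = Dir.mktmpdir
File.write(File.join(dir, 'hello_feature.rb'), "puts 'loading hello_feature'\n")

$LOAD_PATH.unshift(dir)          # make the directory searchable by require

first  = require 'hello_feature' # found via $LOAD_PATH (.rb is added), code runs
second = require 'hello_feature' # already in $LOADED_FEATURES, file is skipped

# The full path of the loaded file is now recorded in $" / $LOADED_FEATURES.
recorded = $LOADED_FEATURES.find { |f| f.end_with?('hello_feature.rb') }

puts first     # true
puts second    # false
puts recorded  # full path inside the temporary directory

# Clean up the demo directory and load path entry.
$LOAD_PATH.delete(dir)
FileUtils.remove_entry(dir)
```

The "loading hello_feature" line is printed only once, even though require is called twice.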
In a modern version of ruby, RubyGems is loaded at startup and augments the logic of the require method, so things get a lot more complicated. We will look at RubyGems in part 2, but we want to ignore it for now, so for demonstration purposes we’ll disable RubyGems and poke around a bit.
```
$ RUBYOPT="--disable-gems" irb
```
This passes the --disable-gems option to the underlying ruby when IRB starts, so we can use the primitive built-in require method without the RubyGems functionality. The first thing we can explore is the default load path:
```
2.1.2 :003 > puts $:.join("\n")
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/site_ruby/2.1.0
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/site_ruby/2.1.0/x86_64-darwin13.0
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/site_ruby
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/vendor_ruby/2.1.0
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/vendor_ruby/2.1.0/x86_64-darwin13.0
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/vendor_ruby
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/x86_64-darwin13.0
=> nil
```
Remember, $: or $LOAD_PATH is the list of directories that are searched when require is called. This shows the current list of paths in our fresh IRB console. So, for example, if you type require 'foo', ruby will search each of those directories, in order, for foo.rb, foo.so, etc.
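The directory-by-directory search can be approximated in a few lines. This is a hypothetical helper (locate_feature is my own name, not a Ruby API) that only tries .rb and the platform’s native-extension suffix from RbConfig::CONFIG['DLEXT'], and ignores $LOADED_FEATURES and RubyGems, so treat its answer as a rough guide rather than a guarantee of what require will pick:

```ruby
require 'rbconfig'

# Hypothetical helper: approximate which file `require name` would find by
# walking the load path in order and, for each directory, trying each suffix.
def locate_feature(name, load_path = $LOAD_PATH)
  # .rb first, then the native extension suffix (.so on Linux, .bundle on macOS)
  suffixes = ['.rb', ".#{RbConfig::CONFIG['DLEXT']}"]
  load_path.each do |dir|
    suffixes.each do |suffix|
      candidate = File.join(File.expand_path(dir.to_s), name + suffix)
      return candidate if File.file?(candidate)
    end
  end
  nil # not found anywhere on the load path
end

found = locate_feature('net/http')
puts found # a path ending in .../net/http.rb inside ruby's standard library directory
```

This can be handy when a require picks up a different copy of a file than you expected, since the first matching directory wins.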
The load path can be altered in a few ways, including the -I option to ruby:
```
$ ruby -I /foo/bar -e 'print $:.join("\n")'
/foo/bar
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/site_ruby/2.0.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/site_ruby/2.0.0/x86_64-darwin12.5.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/site_ruby
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/vendor_ruby/2.0.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/vendor_ruby/2.0.0/x86_64-darwin12.5.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/vendor_ruby
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/2.0.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/x86_64-darwin12.5.0
```
…or the RUBYLIB variable:
```
$ RUBYLIB=/foo ruby -e 'print $:.join("\n")'
/foo
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/site_ruby/2.0.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/site_ruby/2.0.0/x86_64-darwin12.5.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/site_ruby
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/vendor_ruby/2.0.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/vendor_ruby/2.0.0/x86_64-darwin12.5.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/vendor_ruby
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/2.0.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/x86_64-darwin12.5.0
```
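Besides -I and RUBYLIB, a program can also edit the array itself at runtime. A minimal sketch; the two /tmp directories are made-up names and don’t need to exist, since require simply finds nothing there:

```ruby
# $: and $LOAD_PATH are two names for the same Array object.
same_array = $:.equal?($LOAD_PATH)

# Hypothetical directories, used only to show where new entries end up.
$LOAD_PATH.unshift('/tmp/my_libs')  # prepended: searched before the defaults
$LOAD_PATH.push('/tmp/extra_libs')  # appended: searched after the defaults

puts same_array        # true
puts $LOAD_PATH.first  # /tmp/my_libs
puts $LOAD_PATH.last   # /tmp/extra_libs
```

A common real-world form is $LOAD_PATH.unshift(File.expand_path('lib', __dir__)), which puts a project’s lib directory at the front of the search regardless of the current working directory.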
Now, let’s see what’s currently loaded:
```
2.1.2 :004 > puts $".join("\n")
enumerator.so
enc/encdb.so
enc/trans/transdb.so
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/e2mmap.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/init.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/workspace.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/inspector.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/context.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/extend-command.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/output-method.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/notifier.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/slex.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/ruby-token.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/ruby-lex.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/src_encoding.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/magic-file.rb
readline.so
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/input-method.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/locale.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/completion.rb
/Users/alwold/.rvm/scripts/irbrc.rb
=> nil
```
As mentioned before, the $" or $LOADED_FEATURES variable holds the list of files that have already been loaded in the current ruby process. We have a few .so entries (native code rather than Ruby source), as well as a bunch of ruby files for IRB. The IRB-related files are loaded by the irb script (it calls require 'irb'), so they wouldn’t be there in a plain ruby instance without IRB running:
```
$ ruby --disable-gems -e 'puts $".join("\n")'
enumerator.so
enc/encdb.so
enc/trans/transdb.so
```
So, you can see there are just a few files loaded at startup with a plain ruby.
Now, let’s see what happens if we load another library from the ruby standard library (we don’t have RubyGems loaded, so we can’t load gems right now).
```
2.1.2 :005 > original = $".dup
=> ["enumerator.so", "enc/encdb.so", "enc/trans/transdb.so", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/e2mmap.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/init.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/workspace.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/inspector.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/context.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/extend-command.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/output-method.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/notifier.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/slex.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/ruby-token.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/ruby-lex.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/src_encoding.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/magic-file.rb", "readline.so", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/input-method.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/locale.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/completion.rb", "/Users/alwold/.rvm/scripts/irbrc.rb"]
2.1.2 :006 > require 'net/http'
=> true
2.1.2 :011 > puts ($" - original).join("\n")
socket.so
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/socket.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/protocol.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/uri/common.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/uri/generic.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/uri/ftp.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/uri/http.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/uri/https.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/uri/ldap.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/uri/ldaps.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/uri/mailto.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/uri.rb
zlib.so
stringio.so
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http/exceptions.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http/header.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http/generic_request.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http/request.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http/requests.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http/response.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http/responses.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http/proxy_delta.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http/backward.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb
=> nil
```
Here, we make a copy of the original value of $", then check what was added to it after calling require 'net/http'. When we call require 'net/http', ruby searches its load path, which includes /Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0 (as we saw earlier), for net/http with the various extensions, and finds /Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb. That file then requires various other files, some of which require other files, and so on, which results in all of the files we see in the previous output being loaded. Notice that net/http.rb itself comes last: a file is added to $" only after it has finished loading, which includes all of its own requires.
Each file that is loaded is evaluated by the interpreter, defining methods, classes, modules, etc. and executing code at the top level. In our case, this results in the Net::HTTP class (and lots of other stuff) being defined, at which point we can call its methods and make HTTP connections.
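The IRB experiment above can also be run as a standalone script, which additionally shows that Net::HTTP only exists once its file has been evaluated. This sketch leaves RubyGems enabled, so the exact list of new files will likely differ from the output above depending on the ruby version:

```ruby
# Snapshot what has been loaded so far, like `original = $".dup` above.
before = $LOADED_FEATURES.dup
defined_before = defined?(Net::HTTP) # normally nil: nothing has loaded it yet

require 'net/http'

defined_after = defined?(Net::HTTP)  # "constant": net/http.rb defined the class
new_files = $LOADED_FEATURES - before

puts defined_before.inspect
puts defined_after
puts new_files # net/http.rb plus everything it pulled in (uri, socket, ...)
```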
require_relative, load and autoload
Another method of interest is require_relative. It is similar to require, except that it loads a file using a path relative to the file that contains the call to require_relative. So, for example, if you call require_relative 'bla' from within the file /foo/bar/baz.rb, the file /foo/bar/bla.rb is loaded. This is useful when you have a set of files and want to require one of them from another. With the normal require method, a relative path such as ./bla is resolved against the current working directory, which depends on where the program was started from and can be pretty unpredictable. With this in mind, it’s nice to be able to refer to a file relative to the path of the calling file.
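Here is a small sketch of that difference. It builds a made-up two-file project (main.rb and helper.rb, names chosen only for the demo) in a temporary directory, then runs main.rb while the working directory is somewhere else:

```ruby
require 'tmpdir'

# A made-up two-file "project": main.rb pulls in helper.rb from its own folder.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'helper.rb'), "HELPER_LOADED = true\n")
  File.write(File.join(dir, 'main.rb'), "require_relative 'helper'\n")

  # Run main.rb while the working directory is a different folder.
  # require_relative resolves 'helper' against main.rb's own location, so it
  # still finds helper.rb; require './helper' would look in Dir.pwd instead.
  Dir.chdir(Dir.tmpdir) { load File.join(dir, 'main.rb') }
end

puts HELPER_LOADED # true
```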
Other methods of interest are load and autoload. The load method is similar to require, except that it always loads the file, even if it has already been loaded. autoload registers a file to be loaded automatically the first time a particular constant is referenced. One thing to note is that only the require method is subject to the RubyGems magic I mentioned earlier.
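A short script can show these behaviours side by side. It is only a sketch; the files counter.rb and lazy_thing.rb, the LOAD_LOG array and the LazyThing module are all made up for the demo:

```ruby
require 'tmpdir'
require 'fileutils'

LOAD_LOG = [] # records each time counter.rb's code actually runs

# Two throwaway example files in a temporary directory.
dir = Dir.mktmpdir
counter = File.join(dir, 'counter.rb')
File.write(counter, "LOAD_LOG << :ran\n")
File.write(File.join(dir, 'lazy_thing.rb'), "module LazyThing; VALUE = 42; end\n")

load counter    # load always evaluates the file...
load counter    # ...so the code runs again
require counter # load never touched $", so require runs it a third time
require counter # now it is in $LOADED_FEATURES, so this call is skipped
puts LOAD_LOG.size # 3

# autoload only registers the file; nothing is loaded until LazyThing is used.
autoload :LazyThing, File.join(dir, 'lazy_thing.rb')
puts Object.autoload?(:LazyThing)         # the registered path (not loaded yet)
puts LazyThing::VALUE                     # first reference triggers the load: 42
puts Object.autoload?(:LazyThing).inspect # nil, since it has now been loaded

FileUtils.remove_entry(dir)
```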
The load and autoload methods are less commonly used, so I won’t go into much more detail on them, but if you want to learn more, this article explains them well:
https://practicingruby.com/articles/ways-to-load-code
The require method is pretty simple, but it’s the basic building block for most of the code loading mechanisms in ruby. I think knowing the details of how it works will help in understanding how RubyGems and bundler work in the next parts of this exploration. It should also be helpful in diagnosing future code loading issues.
I just started learning Ruby and I’m making my students use Rails in my software engineering class! Just gotta stay a week ahead…
Cool. I know a lot of youngsters don’t like that most CS curricula are still based on Java, so they should appreciate that. Let me know if there are any other good topics I should cover 🙂