Loading Code in Ruby, Part 1: The require method and friends

I recently read Chad Fowler’s The Passionate Programmer, in which he mentions the importance of reading other people’s code. This reminded me of something I’ve had in mind for a while: doing a deep dive into some parts of the Ruby on Rails code base, in order to better understand how certain things work. The first thing I’d like to explore is the startup process, and the first step in understanding that is learning how Rails loads all of the relevant pieces of code before things start.

For the first part of this exploration, I’m going to look at Ruby’s lowest-level primitives for loading code (i.e. the require method and related methods). In the second part, I’ll take a look at how RubyGems works, and in the third part, I plan to tackle how Bundler comes into play.

The require method

To start off, the documentation for the require method on ruby-doc.org explains a few of the details. When the require method is called, the following happens:

  • If an explicit path to a file was specified (i.e. an absolute path or a path starting with a dot), that file will be loaded.
  • If no explicit path is specified, the directories contained in the variable $: or $LOAD_PATH (both refer to the same array) are searched for the file requested.
  • Various extensions are tried with the name requested (.rb, .so, .o, .dll, etc.)
  • If the file is successfully loaded, the full path will be added to the $" or $LOADED_FEATURES array.
  • The file will only be loaded once, so it is safe to call require as many times as you want.
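The steps above can be sketched in plain Ruby. This is only a rough illustration, not how the interpreter actually implements require: resolve_feature, naive_require and $sketch_loaded are hypothetical names invented for this example, and the sketch only tries the .rb extension, whereas the real method also tries native extensions.

```ruby
require 'tmpdir'

# Stand-in for $LOADED_FEATURES, so the sketch does not touch the real array.
$sketch_loaded = []

# Hypothetical helper: map a requested name to a file, or nil if none is found.
def resolve_feature(name, load_path = $LOAD_PATH)
  explicit = name.start_with?('/', './', '../')
  # An explicit path is used as-is; otherwise each load path entry is tried.
  bases = explicit ? [name] : load_path.map { |dir| File.join(dir.to_s, name) }
  bases.each do |base|
    candidate = base.end_with?('.rb') ? base : "#{base}.rb"
    return File.expand_path(candidate) if File.file?(candidate)
  end
  nil
end

# Hypothetical helper: load a feature once and remember that it was loaded.
def naive_require(name, load_path = $LOAD_PATH)
  path = resolve_feature(name, load_path)
  raise LoadError, "cannot load such file -- #{name}" if path.nil?
  return false if $sketch_loaded.include?(path) # already loaded: do nothing

  load path              # evaluate the file
  $sketch_loaded << path # record it only after a successful load
  true
end

Dir.mktmpdir do |dir|
  # A throwaway feature file that exists only for this demonstration.
  File.write(File.join(dir, 'sketch_demo.rb'), "puts 'sketch_demo evaluated'\n")
  p naive_require('sketch_demo', [dir]) # first call evaluates the file
  p naive_require('sketch_demo', [dir]) # second call skips it
end
```

The return value mirrors the real method: true when the file was loaded by this call, false when it had already been loaded.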

In a modern version of Ruby, RubyGems is loaded at startup, and it augments the logic of the require method, so things get a lot more complicated. We will be looking at RubyGems in part 2, but we want to ignore it for now. For demonstration purposes, we’ll disable RubyGems and poke around a bit.

$ RUBYOPT="--disable-gems" irb

This will cause the underlying ruby to use the --disable-gems option when IRB is started, so we can access the primitive built-in require method without the RubyGems functionality. The first thing we can explore is the default load paths:

2.1.2 :003 > puts $:.join("\n")
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/site_ruby/2.1.0
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/site_ruby/2.1.0/x86_64-darwin13.0
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/site_ruby
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/vendor_ruby/2.1.0
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/vendor_ruby/2.1.0/x86_64-darwin13.0
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/vendor_ruby
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/x86_64-darwin13.0
 => nil

Remember, $: or $LOAD_PATH is the list of directories that are searched when require is called. This shows the current list of paths with our fresh IRB console. So, for example, if you type require 'foo', ruby will search each of those directories for a foo.rb, foo.so, etc.

The load path can be altered in a few ways, including the -I option to ruby:

$ ruby -I /foo/bar -e 'print $:.join("\n")'
/foo/bar
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/site_ruby/2.0.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/site_ruby/2.0.0/x86_64-darwin12.5.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/site_ruby
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/vendor_ruby/2.0.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/vendor_ruby/2.0.0/x86_64-darwin12.5.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/vendor_ruby
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/2.0.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/x86_64-darwin12.5.0

…or the RUBYLIB variable:

$ RUBYLIB=/foo ruby -e 'print $:.join("\n")'
/foo
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/site_ruby/2.0.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/site_ruby/2.0.0/x86_64-darwin12.5.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/site_ruby
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/vendor_ruby/2.0.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/vendor_ruby/2.0.0/x86_64-darwin12.5.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/vendor_ruby
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/2.0.0
/Users/alwold/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/x86_64-darwin12.5.0
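The load path can also be changed from inside a running program, since $LOAD_PATH is an ordinary array. Here is a minimal sketch of that approach; prepend_load_path and the load_path_demo file are hypothetical names made up for this example.

```ruby
require 'tmpdir'

# Hypothetical helper: put dir at the front of the load path so that it is
# searched first, unless it is already listed.
def prepend_load_path(dir)
  dir = File.expand_path(dir)
  $LOAD_PATH.unshift(dir) unless $LOAD_PATH.include?(dir)
  dir
end

Dir.mktmpdir do |dir|
  # A throwaway feature file that exists only for this demonstration.
  File.write(File.join(dir, 'load_path_demo.rb'), "$load_path_demo = :loaded\n")

  added = prepend_load_path(dir)
  require 'load_path_demo' # found via the directory we just added
  p $load_path_demo

  $LOAD_PATH.delete(added) # tidy up before the directory is removed
end
```

Like -I and RUBYLIB, this puts the new directory ahead of the defaults, so a file there wins over a file of the same name further down the list.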

Now, let’s see what’s currently loaded:

2.1.2 :004 > puts $".join("\n")
enumerator.so
enc/encdb.so
enc/trans/transdb.so
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/e2mmap.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/init.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/workspace.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/inspector.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/context.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/extend-command.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/output-method.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/notifier.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/slex.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/ruby-token.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/ruby-lex.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/src_encoding.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/magic-file.rb
readline.so
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/input-method.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/locale.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/completion.rb
/Users/alwold/.rvm/scripts/irbrc.rb
 => nil

As mentioned before, the $" or $LOADED_FEATURES variable shows the list of files that have already been loaded in the current instance of ruby. We have a few .so native shared library files loaded, as well as a bunch of ruby files for IRB. The IRB-related files are loaded by the irb script (it calls require 'irb'), so they wouldn’t be there in a plain ruby instance without IRB running:

$ ruby --disable-gems -e 'puts $".join("\n")'
enumerator.so
enc/encdb.so
enc/trans/transdb.so

So, you can see there are just a few files loaded at startup with a plain ruby.

Now, let’s see what happens if we load another library from the standard ruby library (we don’t have RubyGems loaded, so we can’t load gems right now).

2.1.2 :005 > original = $".dup
 => ["enumerator.so", "enc/encdb.so", "enc/trans/transdb.so", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/e2mmap.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/init.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/workspace.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/inspector.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/context.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/extend-command.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/output-method.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/notifier.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/slex.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/ruby-token.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/ruby-lex.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/src_encoding.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/magic-file.rb", "readline.so", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/input-method.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/locale.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb.rb", "/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/irb/completion.rb", "/Users/alwold/.rvm/scripts/irbrc.rb"]
2.1.2 :006 > require 'net/http'
 => true
2.1.2 :011 > puts ($" - original).join("\n")
socket.so
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/socket.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/timeout.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/protocol.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/uri/common.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/uri/generic.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/uri/ftp.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/uri/http.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/uri/https.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/uri/ldap.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/uri/ldaps.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/uri/mailto.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/uri.rb
zlib.so
stringio.so
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http/exceptions.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http/header.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http/generic_request.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http/request.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http/requests.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http/response.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http/responses.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http/proxy_delta.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http/backward.rb
/Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb
 => nil 

Here, we save a copy of the original value of $" and, after calling require 'net/http', check what was added to it. When require 'net/http' is called, ruby looks through its load paths, which include /Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0 (which we saw earlier). In each one, it looks for net/http with various extensions, eventually finding /Users/alwold/.rvm/rubies/ruby-2.1.2/lib/ruby/2.1.0/net/http.rb. That file then requires various other files, some of which require other files, and so on, resulting in the eventual loading of all of the files we see in the previous output.
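The snapshot-and-diff trick used in the IRB session can be wrapped in a small helper so it is easy to reuse. This is just a sketch; features_added_by is a hypothetical name, not something Ruby provides.

```ruby
# Hypothetical helper: run the block and return the entries that were
# added to $LOADED_FEATURES while it ran.
def features_added_by
  before = $LOADED_FEATURES.dup
  yield
  $LOADED_FEATURES - before
end

# Prints whatever requiring net/http pulls in on this particular ruby.
# The list is empty if net/http was already loaded.
puts features_added_by { require 'net/http' }
```

The exact list depends on the ruby version and on what is already loaded, so it will not necessarily match the output shown above.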

Each file that is loaded is evaluated by the interpreter, defining methods, classes, modules, etc. and executing code at the top level. In our case, this results in the Net::HTTP class (and lots of other stuff) being defined, at which point we can call its methods and make HTTP connections.
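To see this evaluation step in isolation, here is a small sketch that writes a throwaway file and requires it twice by absolute path. The names require_twice_demo, EvalDemo and $eval_demo_runs are all made up for the example.

```ruby
require 'tmpdir'

# Hypothetical helper: writes a small file, requires it twice by absolute
# path, and reports what happened.
def require_twice_demo
  $eval_demo_runs = 0
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'eval_demo.rb')
    File.write(path, <<~'RUBY')
      $eval_demo_runs += 1 # top-level code runs when the file is evaluated

      class EvalDemo       # definitions become available afterwards
        def greet
          'hello from EvalDemo'
        end
      end
    RUBY

    first  = require(path) # evaluates the file
    second = require(path) # already loaded, so nothing is evaluated
    { first: first, second: second, runs: $eval_demo_runs,
      greeting: EvalDemo.new.greet }
  end
end

p require_twice_demo
```

The counter shows that the top-level code ran only once even though require was called twice, and the class defined in the file is usable after the first call.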

require_relative, load and autoload

Another method of interest is require_relative. It is similar to require, except that it loads a file using a path relative to the file that contains the call to require_relative. So, for example, if you call require_relative 'bla' from within the file /foo/bar/baz.rb, the file /foo/bar/bla.rb is loaded. This is useful when you have a set of files and want to require one file from another. With the normal require method, relative paths (those starting with a dot) use the current directory as the starting point, and the current directory can be pretty unpredictable. With this in mind, it’s nice to be able to refer to a file relative to the path of the calling file.
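Here is a small sketch of the baz.rb and bla.rb scenario, using a temporary directory in place of /foo/bar. The helper name require_relative_demo and the $bla_loaded flag are hypothetical and exist only for this example.

```ruby
require 'tmpdir'

# Hypothetical helper: builds baz.rb and bla.rb side by side, then requires
# baz.rb while the current directory is somewhere unrelated.
def require_relative_demo
  $bla_loaded = false
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, 'bla.rb'), "$bla_loaded = true\n")
    File.write(File.join(dir, 'baz.rb'), "require_relative 'bla'\n")

    # The working directory is deliberately not the one holding the files.
    Dir.chdir('/') do
      require File.join(dir, 'baz.rb')
    end
  end
  $bla_loaded
end

p require_relative_demo
```

bla.rb is found because the lookup starts from the directory of baz.rb, not from the current directory.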

Other methods of interest are the load and autoload methods. The load method is similar to require, except that it will always load the file, even if it has already been loaded. autoload can be used to trigger automatic loading of a file when particular constants are referenced. One thing to note is that only the require method is subject to the RubyGems magic I mentioned earlier.
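A short sketch of both behaviours, under the assumption that throwaway files in a temporary directory are good enough stand-ins for real libraries. load_and_autoload_demo, LazyThing and the two global variables are hypothetical names used only here.

```ruby
require 'tmpdir'

# Hypothetical helper: shows load running a file twice, and autoload
# deferring a file until its constant is first referenced.
def load_and_autoload_demo
  Dir.mktmpdir do |dir|
    # load: the file is evaluated on every call.
    counter = File.join(dir, 'counter.rb')
    File.write(counter, "$load_demo_runs += 1\n")
    $load_demo_runs = 0
    2.times { load counter }

    # Start from a clean slate so the helper can be called more than once.
    Object.send(:remove_const, :LazyThing) if Object.const_defined?(:LazyThing, false)

    # autoload: register the constant now, load the file on first reference.
    lazy = File.join(dir, 'lazy_thing.rb')
    File.write(lazy, "$lazy_thing_loaded = true\nclass LazyThing; end\n")
    $lazy_thing_loaded = false
    Object.autoload(:LazyThing, lazy)
    loaded_before = $lazy_thing_loaded
    LazyThing.new # first reference triggers the load

    { load_runs: $load_demo_runs, loaded_before: loaded_before,
      loaded_after: $lazy_thing_loaded }
  end
end

p load_and_autoload_demo
```

Note that load is given the full file name including the .rb extension; unlike require, it does not try extensions for you.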

The load and autoload methods are less commonly used, so I didn’t go too deep into detail on them, but if you want to learn more, this article does a good job explaining in more detail:

https://practicingruby.com/articles/ways-to-load-code

The require method is pretty simple, but it’s the basic building block for most of the code loading mechanisms in Ruby. I think knowing the details of how it works will be helpful in understanding the way RubyGems and Bundler work in further explorations. It should also be helpful in diagnosing any future issues with code loading.