I’m looking to use a before :all hook to speed up some of my specs. I know that typically you should use something like destroy in an after :all hook to clean up after yourself.
I’m using DatabaseCleaner, and in the example group I’m working with, no examples need to turn on truncation. I’m considering setting the cleaning strategy to transaction in my before :all and starting the cleaner there as well. Then in after :all, I would roll back with DatabaseCleaner.clean. I briefly looked at DatabaseCleaner’s source code, and saw that when you call DatabaseCleaner.start, it looks like they are counting open transactions. Because of this, I’m guessing that when DatabaseCleaner.start is called for each example, it will open a nested transaction, but I haven’t actually tried this approach yet.
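To make the nesting idea concrete, here is a minimal pure-Ruby sketch of the behaviour I’m hoping for. ToyDb and its start/clean/insert methods are hypothetical names made up for illustration; this is a toy model of nested transactions as a stack of snapshots, not DatabaseCleaner’s actual implementation, and whether DatabaseCleaner really nests this way is exactly what I haven’t verified.

```ruby
# Toy in-memory "database" that models nested transactions as a stack of
# snapshots. Hypothetical illustration only; not DatabaseCleaner's code.
class ToyDb
  attr_reader :rows

  def initialize
    @rows = []
    @snapshots = []
  end

  # Open a (possibly nested) transaction by remembering the current rows.
  def start
    @snapshots.push(@rows.dup)
  end

  # Roll back the innermost open transaction.
  def clean
    raise "no open transaction" if @snapshots.empty?
    @rows = @snapshots.pop
  end

  def insert(row)
    @rows << row
  end

  def open_transactions
    @snapshots.size
  end
end

db = ToyDb.new

# Plays the role of before :all: open the group-level transaction and
# build the expensive shared data once.
db.start
db.insert(:seed_foo)
db.insert(:seed_bar)

# Plays the role of the per-example start/clean: each example gets a
# nested transaction that is rolled back when the example finishes.
seen_by_examples = []
2.times do |i|
  db.start
  db.insert(:"example_#{i}_row")
  seen_by_examples << db.rows.dup
  db.clean
end

rows_after_examples = db.rows.dup

# Plays the role of after :all: roll back the group-level transaction.
db.clean
rows_after_group = db.rows.dup
```

In this model each example sees the shared data plus its own rows, the shared data survives every per-example rollback, and the final rollback leaves the database empty. That is the behaviour I would need from the real thing for this approach to work.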
Assuming what I outlined above works, is there a reason I should not use it, and instead stick to using destroy?
One article I read/considered when thinking about this was Cleaning up after before(:all) blocks, but I don’t want to use this idea for every example group in the test suite, just for one specific group. Also, I’m not really sure if the suggestion in the article is a good idea (they kind of allude to the fact that it generally isn’t), and I’m not sure that the suggestion in the comment even works (especially if an example turns on truncation).
This is roughly how fixtures work in Rails: it starts a transaction, commits the fixtures, and then rolls back after each test.
If you definitely need all that data for every test, what you’re describing sounds like a pretty efficient way to do it. However, I think it could cause a few issues:
If you ever have a test where you need a clean database, you’ll need to start decorating tests so that they clear the database before they start.
With lots of preloaded data, it’s easy to develop accidental dependencies and assumptions based on that data. If you change the global data, you can end up with unexpectedly failing tests, or (worse) tests which pass but no longer actually test anything.
If you ever need a multi-threaded test (Capybara), you’ll run into issues sharing data between connections.
If it’s possible to trim your fixture data down to the point that you no longer have to resort to preloading, I think you’ll save yourself a lot of pain in the end.
I left out some details that I probably should have included. I’m currently working on building a rake task and corresponding helper modules to create seed data to populate development databases with, primarily so that we can have somewhat realistic data in the database when we’re structuring and styling our views.
I’m only using this group transaction idea for testing that the seed data is created the way I intended; I’m definitely not using it for the “application specs” like model, controller, etc. specs. The rake task creates a lot of records (via the modules it calls), and so I’d like to avoid creating them for every test, and instead have each test just test one specific thing about the seed data that I created in the before :all hook.
I have heard a lot about your second point - that having tests share data can be bad because of the reason you mention - but I’m kind of thinking that this case might be an exception. What do you think?
Normally, I would just have my tests only create a couple of records, and then this wouldn’t be an issue. However, I’ve designed the modules that create the seed data to use constants (or logic that depends on some constants) to determine how many records are created, e.g. create between MIN and MAX Foos and about 150% more Bars than Foos. And of course that could change so that I use parameters instead of constants, but I didn’t want to change my modules just to make testing easier. I like having my MakeFoos module be the one place that defines how I should seed Foo data, including how many Foo records are created.
It sounds like you’re not using the transactions to cache fixture data for tests, but to avoid re-running the same, slow exercise over and over. I think that’s pretty different than how I understood it originally. Ideally, the system you’re testing is fast enough that you don’t have to worry about rerunning it for different assertions. However, assuming the scope is limited to that one system, I think this is a reasonable optimization to make in your tests for that system.