This is a companion discussion topic for the original entry at https://thoughtbot.com/upcase/videos/extract-class
You said there is an exercise for us, is that true?
How often do you guys parse CSV in your projects?
I know it was a simple example in this video but I’m very interested in approaches you take to avoid CSV parsing errors like encoding, col/row separator, quote_char, etc.
My current project has a feature where our Rails app parses uploaded CSV file, the file is most of the times a conversion from an excel/numbers spreadsheet. I’m on my 3rd rewrite/refactor/revision of my parser class attempt but it feels like I still miss some small gotchas with encoding and quotation.
Great video, as always! I had a question on the forums a couple of weeks ago with a strikingly similar example, so this was perfect timing to help me work through that question and you guys did an excellent job of easing my concerns and explaining your approach.
I have historically been afraid to remove or alter the higher level tests after this type of refactoring, but seeing you and Joe keep the tests around until they are back to green, and then changing the test to use doubles to isolate it makes perfect sense. Joe also alluded to the potential changing the constructor of the original class to take an instance of the new class as a further refactoring, so hopefully we can look forward to more videos with more details around that type of inversion of control.
Another great video guys, thanks a lot. Really glad you mentioned the issue of dependencies towards the end, and how you could simply pass a
Sender instead of the CSV data. I think that makes total sense, but for me, the
Sender is not really expecting a parser object, but more an object that plays the role of a
RecipientList (or similar) which implements the the
That way, it just expects an object to give it an array of hashes. Then we could have
JSONParser or whatever, that plays the
RecipientList role. What do you think? (And just to say, this line of thinking is totally from just watching you guys in action over the last few weeks with this series!)
Probably I’ve been asking this for a long time now. But doesn’t adding
spec_helper implies that you are not isolating that particular test from Rails? In this particular example do you actually need to load Rails to get this test passing? It’s seems that you really don’t need it. Can you elaborate on that? Great video though; I probably will keep adding questions because I spend the whole week just trying to practice what you say in there.
While It may be loading probably more from the stack. The slow thing is when you start hitting the database. So loading
spec_helper doesn’t imply or avoid.
Also, with the new changes in rails, like adding Spring, most of the pain from loading the whole stack may be ignored, so keeping the spec_helper, is more like avoiding to manually load things like each of the files tested or libraries like factory girl.
@Cicloid Even in that case, I think that loading factoryGirl is something that for this particular refactoring is not needed. Also I prefer to just add the exact dependencies that I need if something on the performance run go wrong I will be able to see it then without traveling in the entire list of things that Rails loaded from the
spec_helper but then again that’s my approach at the moment.
HI Alex, have you thought about using the
importeroo gems? Especially if you’re working from Excel, it would probably be helpful to let that gem do some of the lifting for you.
@alexbush we run into CSV parsing and generating pretty frequently. I’ve used the built-in CSV support from Ruby for the past few years, and it’s held up pretty well. There are a few things I find helpful when dealing with it:
- Use a dedicated parser class that encapsulates CSV parsing. This also serves as a good place to collect errors.
CSV.parsehandles parse errors, but you can make them prettier.
- Avoid calling
to_csvor anything like that on collections. It inevitably results in trying to generate CSV for a huge collection, resulting in memory bloat. Do it row-by-row instead.
- Try to do both parsing and generating in background jobs, as large CSV files will slow down your HTML requests.
If there are any specific examples or questions, feel free to start a new thread on the subject.
@danscotton I agree - if you inject the dependency, you can totally hide the fact that parsing is going on altogether. I would probably make the
Parser implement the
Enumerable interface and then just accept a
recipients parameter in
initialize. Then, you can pass an
Parser, or the results from a database query, and
Sender doesn’t have to change at all, even semantically.
@Rafael_George This particular example actually doesn’t actually use Rails! The
spec_helper.rb I was using in this example just loaded
active_support, and files from
In general, I haven’t found a lot of benefit from having separate
spec_helper.rb files, even in Rails projects. There are two potential benefits to a slimmer
- Speed: loading Rails and its dependencies takes a while. However, using something like spring largely removes this concern.
- Isolation: loading Rails makes it harder to examine your dependencies. Unfortunately, just having ActiveSupport (necessary for most framework classes like models, views, controllers, or mailers) is enough to bring auto loading into the picture. Because Ruby doesn’t isolate files from each other - a file
required from one location is available everywhere - and because ActiveSupport automatically tries to load dependencies anyway, I find that you really can’t have dependency isolation in a Rails project. Attempting to do so just provides a false sense of security, as there’s no way to enforce the isolation - you never know where your dependencies are coming from.
@jferris Normally when I see
spec_helper I tend to think that’s loading Rails stuffs. In another note what I’m doing is just to be explicit about my dependencies requiring them one by one; also I tend to use rspec instead of
rake spec and use the
.rspec file to set which directories I want rspec to check. I’m talking from the point of view when I’m testing just unit after that I normally tend to autoload everything for my integration tests which are fewer. But I see your point regarding you don’t know where the magic is coming from. Thanks for your reply.
Another great episode, I have two questions:
- Do you prefer to have attr_reader or access instance variables directly like
- After it refactored, do you usually remove private method
recipientsand access it directly
I don’t think it matters much. There are pros and cons to each approach, but the differences are fairly small.
Nope. That method is there (and private) because it helps hide the details of how we access the recipients. Moving it into the
send method would be adding unnecessary detail into
Yep! It’s available on the dashboard, or here: https://github.com/thoughtbot/extract-class-exercise
@benorenstein Can we talk publicly about this ex. I’m talking about blogs and stuffs? Because it’s private are you already know.
When we extract private methods to a new class, don’t we think it’s just a implementation detail? So their behavior should be tested by their higher level?