Data structure in Rails


(João Daniel) #1

I’m involved in developing an application that has a somewhat complex data structure. My first attempt was not successful: some objects ended up with more than one responsibility, while other responsibilities were split across more than one model.

I’d like your opinion to help me arrive at a good proposal for the data structure.

Each user (or group of users) of the application belongs to an organization. Think of each organization as a completely independent environment, such as a CRM or ERP, i.e., the data of one company do not relate to the data from another company.

Each organization has several customers (which we will call Atom). Every month the user will upload the consolidated purchases of each customer (total and by product). This data will be crunched and three different analyses will be generated: Performance Analysis, Goal Analysis and Price Analysis. The point I’m most concerned about is that we’re dealing with monthly data.

I would not want to rely on a date attribute to relate (i) the total purchase; (ii) the purchase by product; (iii) the performance analysis; (iv) the goal analysis and (v) the price analysis, all within the same month. I think the app would be more organized if I could relate all these models to a Month object.

For example, when displaying the page of a client, I would like to show all of that client’s analysis results. I could fetch them by date. But the analyses do not belong to different days; they are consolidated by month, so it makes more sense to look for the analyses that belong to a Month.

I’m picturing something like:

MonthData

AtomTotalData
  belongs_to MonthData
AtomProductData
  belongs_to MonthData

MonthAnalysis

PerformanceAnalysis
  belongs_to MonthAnalysis
GoalAnalysis
  belongs_to MonthAnalysis
PriceAnalysis
  belongs_to MonthAnalysis

where I separate raw data from analysis. The problem is that, this way, I do not have a direct relationship between the raw data and their analysis. They refer to the same month, but within the app they belong to different objects.
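
To make that gap concrete, here is a minimal sketch in plain Ruby (no Rails, no database). The `year`, `month`, `records` and `analyses` fields, the sample values and the `same_period?` helper are assumptions made for illustration; only the class names come from the outline above.

```ruby
# Plain-Ruby sketch of the first option; fields and values are illustrative.
MonthData     = Struct.new(:year, :month, :records)
MonthAnalysis = Struct.new(:year, :month, :analyses)

AtomTotalData       = Struct.new(:atom, :total)
PerformanceAnalysis = Struct.new(:atom, :score)

data     = MonthData.new(2014, 9, [AtomTotalData.new("Acme", 1200)])
analysis = MonthAnalysis.new(2014, 9, [PerformanceAnalysis.new("Acme", 0.8)])

# Raw data and analyses hang off different parents, so the only thing
# tying them together is that both parents carry the same year and month.
def same_period?(a, b)
  [a.year, a.month] == [b.year, b.month]
end

puts same_period?(data, analysis)
```

Nothing in this structure stops a `MonthAnalysis` from existing without a matching `MonthData`, which is the missing direct relationship described above.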

I could also do:

Month

AtomTotalData
  belongs_to Month
AtomProductData
  belongs_to Month

PerformanceAnalysis
  belongs_to Month
GoalAnalysis
  belongs_to Month
PriceAnalysis
  belongs_to Month

In this case, every object belongs to the same Month object, but I feel like I’m violating the Single Responsibility Principle, because the Month object aggregates both raw data and analysis.

A third option would be similar to the first, but with the MonthData and MonthAnalysis objects belonging to a Month object. I feel that in this case I would be adding too many objects.
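
A rough plain-Ruby sketch of this second option, again without Rails. The `sales` and `analyses` collections, the `atom`, `total` and `goal_met` fields and the sample values are hypothetical; they only stand in for the associations listed above.

```ruby
# Plain-Ruby sketch of the second option; names beyond Month are illustrative.
class Month
  attr_reader :year, :number, :sales, :analyses

  def initialize(year, number)
    @year = year
    @number = number
    @sales = []     # raw uploaded data (AtomTotalData, AtomProductData)
    @analyses = []  # derived results (PerformanceAnalysis, GoalAnalysis, ...)
  end
end

AtomTotalData = Struct.new(:atom, :total)
GoalAnalysis  = Struct.new(:atom, :goal_met)

september = Month.new(2014, 9)
september.sales    << AtomTotalData.new("Acme", 1200)
september.analyses << GoalAnalysis.new("Acme", true)

# Everything for one atom in one month is reachable from the same object.
acme_sales    = september.sales.select    { |s| s.atom == "Acme" }
acme_analyses = september.analyses.select { |a| a.atom == "Acme" }
```

The single parent makes the lookup simple, at the cost of `Month` knowing about both kinds of record.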

So I have some questions:

  • Do you think that my concern about creating a Month or MonthData object, so that I do not depend only on the date attribute, makes sense?

  • Which of the alternatives above do you think is best? I have some concerns about performance.

  • Is there another option I’m not considering?

Thank you!


(Ben Orenstein) #2

First, I wouldn’t include “Data” in your class names. http://c2.com/cgi/wiki?DontNameClassesObjectManagerHandlerOrData

Second, I’d be a little concerned about modeling everything around a Month. Will the system be flexible if your users suddenly want data for different time-spans? Is that a likely feature request?

If you do end up grouping things by month, I’d probably go with the first option.

I’d wait to worry about performance until it works.

Always :smile:


(João Daniel) #3

Thanks for your answer @benorenstein!

Indeed, every ActiveRecord object stores data, so there’s no sense in adding “Data” to the name of some objects. I changed the names to TotalSales and ProductSales because that’s what this data represents.

Regarding the month, it’s not likely that different time-spans will be necessary. Those analyses are based on consolidated monthly data and only make sense for a complete month cycle. There will never be a “reference” date like “2014-09-15”. It’s always about a whole month.

I was thinking about doing something like this:

[UML diagram not preserved]

But I got curious about why you would go with the first option (the one with separate MonthData and MonthAnalysis). Is it because it makes sense not to group “data” and analysis together?

I was leaning towards grouping them under the same object because the sales data and the analyses not only refer to the same month; the analysis objects are also computed from that month’s data. So they’re closely related.


(João Daniel) #4

My reason for relating all the Analysis objects to some common object is to ensure that they were created from the same data.

However, the UML diagram I posted before does not solve my problem. I have one more complication.

When I’m viewing the AtomType page, which is a set of Atoms, I need to ensure that all the analyses shown there also relate to the same month. So, following my logic, I should have a “Month” object for AtomType too, so I can retrieve all the analyses from all the Atoms related to that month. That starts to get a little more complicated.

Maybe I should just follow your recommendation about not attaching the data to a month, and have start_date and end_date attributes instead. Then, every time I show more than one analysis on the same page, I need to check that their start and end dates match?
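
One possible shape for that check, as a plain-Ruby sketch using only the standard library. The `Analysis` struct, its `kind` field and the `same_period?` helper are hypothetical; only `start_date` and `end_date` come from the post above.

```ruby
require "date"

# Hypothetical analysis record that carries its own period.
Analysis = Struct.new(:kind, :start_date, :end_date) do
  def period
    [start_date, end_date]
  end
end

# True when every analysis in the list covers exactly the same period.
def same_period?(analyses)
  analyses.map(&:period).uniq.size <= 1
end

# A negative day counts back from the end of the month,
# so day -1 is the last day of that month.
sept_start = Date.new(2014, 9, 1)
sept_end   = Date.new(2014, 9, -1)

page = [
  Analysis.new(:performance, sept_start, sept_end),
  Analysis.new(:goal,        sept_start, sept_end),
]
mixed = page + [Analysis.new(:price, Date.new(2014, 10, 1), Date.new(2014, 10, -1))]
```

Here `same_period?(page)` holds while `same_period?(mixed)` does not, so a page could refuse to render, or flag, a set of analyses whose periods differ.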


(Ben Orenstein) #5

That still seems like the best option to me.

I wouldn’t put a ton of weight on my opinion, since I don’t fully know your domain, but I think that’s how I’d start.