3

I am new to database design and have started developing my first schema.

Some of my tables (e.g. user_profile) I am struggling to come up with an exhaustive list of fields for the table. A particular use case could be that a user wants to add an additional field to their profile e.g. favourite color.

I have read a few articles that say the way to capture this use case is to include an 'other' field to the database. This field then maybe contains a json string of additional fields.

Is this method the best practice way to add additional fields to a table? It seems somewhat hacky to me. The obvious con is that the additional fields will not be as accessible when the database runs a query.

Note: I can see that there may be two different scenarios here. One where users have a personalised list of fields (in which case i think the method above will be the only way). Another where over time I realise all users would benefit from having an additional field in their profile. Comments on both scenarios would be appreciated.

GWed
  • 509
  • 9
  • 24

2 Answers2

1

Where info is personalised it should be normalised into linked tables that only join to rows where that optional column has been added. Also if you might be adding many columns or long data it is common to have a core user table with extended info tables mapped on a one to one basis. These methods allow for smaller indexes more specialised to each query scenario.

Generally, good normalisation and matching tables well to real world entities rather than theoretical ones gives the most adaptable and future proof design.

JamesRyan
  • 658
  • 3
  • 14
1

You can't design a schema to cope with every possible eventuality, so don't try. Couple of options:

  • Design for what you know now. Revisit and modify when new requirements emerge.
  • Use an EAV model to extend the entities which require a flexible property bag of additional fields. You can use a traditional normalised model for the common/base attributes and EAV for expansion.

For a use case such as yours there's a good argument for reading the user profile attributes once per session and caching until exit. This avoids the major EAV caveat, crappy performance.

Mark Storey-Smith
  • 31,687
  • 8
  • 89
  • 124
  • thanks. Have read the link. Seems like EAV should only be used as a last resort. Considering the case where I decide that all users should have a 'favourite_color' field. Will there be any issues if i just add the field to my table? (new to dbs so sorry if it's a noob question) – GWed Feb 19 '13 at 10:26
  • The problem with extra columns is that it is always easiest at the time to just tack it on, but then you end up after some time with 30 odd columns added that shouldn't really be there. Bloating indexes, wasting diskspace with many nulls. And by the time you notice it is all in use so very hard to normalise properly afterwards. – JamesRyan Feb 19 '13 at 10:43
  • 2
    EAV is good as long as the data has no meaning to you, but then why are you collecting it? One strategy might be to use that to see what fields are used and then convert commonly chosen fields into a proper normalised structure later. – JamesRyan Feb 19 '13 at 10:51
  • @JamesRyan I disagree on your first point. I'm not a fan of EAV but there are valid use cases. Your last suggestion is something I've proposed in an answer here (somewhere) previously, it can certainly help. – Mark Storey-Smith Feb 19 '13 at 11:04
  • 1
    EAV is the worst choice for design unles the data changes are totally unpreodicatble. It is far better to adjust the schema when you need new fields. EAV are performance killers. – HLGEM Feb 19 '13 at 16:02
  • 1
    @HLGEM In this scenario, where you are essentially providing a per tenant/client customisation of profile attributes, you would modify the schema? 1000 users with 1 custom attribute each would result in 999 null columns... no thanks. – Mark Storey-Smith Feb 19 '13 at 16:30
  • The traditional way to do it is to create additional columns custom_1, custom2, etc, up to ten of them and then let the customers use those. It is a rare customer who would need more than ten columns to customize. EAvs are only useful if you willhave many fields thatt can't be defined at design time (such as fields for lab tests). THey scertainly shoudl not be a substitute for failure of teh devs to do due dilgence and find out what fields are needed which is the most common current use. – HLGEM Feb 19 '13 at 16:41
  • @MArk Storey-Smith if 1000 different users have chosen their own custom column then what use is the data to you without going through and interpreting each individually? Databases provide uniformity so that we can do something useful with a set, if you sidestep the uniformity to take generic data then you also sidestep the usefulness. EAV has a place but it shouldn't be an excuse to be lazy in design. – JamesRyan Feb 21 '13 at 16:45
  • You are both (HLGEM & JamesRyan) missing the point by such a wide margin that I'm not sure where to start. Feel free to pop into chat at some point. – Mark Storey-Smith Feb 21 '13 at 20:16