1

I have a table with the following structure:

Table_1
--- UserId (int)
--- UserName (varchar)

A user can have a relation with other users. Meaning, a UserId from Table_1 can have a relation with other UserIds. So, the question is, how should I structure a second table to implement this?

The second table needs to contain the UserId of a user and other UserIds that the user has relations with. So, should I structure the second table as follows:

Table_2
--- UserId (int - Primary Key)
--- OtherId (varchar)

In the structure above, in OtherId I was planning to store all other UserIds in conjunction, like one column would contain all other UserIds that the principal UserId is related with, but each UserId in the OtherId column would be separated with a comma (,). The way I would retrieve them would be by taking the value from the OtherId column and then, in code-behind, splitting the string by (,) into an array. Then I would use each value from the array to retrieve the UserName from Table_1.

This was the idea I had. I tested it on a database of 300,000 records in Table_1 and one relationship string consisting of almost 3000 UserIds in the second table. This took a significant amount of time (10s) to process all the data, including the splitting of the string.

I am almost certain that my idea is nowhere near to ideal practices. Can someone just give me some ideas on how I should go about this?

Andriy M
  • 22,983
  • 6
  • 59
  • 103

1 Answers1

4

It seems what you're describing for Table_2 is a linking table for a self-reference for Table_1. In the case where there is only one possible reference for each UserID, you would store this information on Table_1 itself (UserID and ReferenceUserID with a foreign key back to UserID).

What you have described, however, is a many-to-many relationship, where a UserID can have many related UserIDs and vice-versa. In this scenario, Table_2 should not use a string holding all of the related userIds for your principal. There are several reasons for this, but the two most important are:

  1. Performance will be terrible when querying because you will have to do string manipulation to split the CSV string into rows, then join on those extracted IDs
  2. You will not be able to store information about individual relationships.

To illustrate #2 (using hypothetical use cases for the table), say this database holds information about family relationships. Users can be associated with 0 to many other users in their family tree. If you want to store information about each relationship (not DB relationship, but IRL) then each relationship should be a separate row.

For example, Dave has a UserID of 1. Pete is Dave's brother and has a UserID of 5. In Table_2, you would store Dave's UserID in the UserID column and Pete's in the OtherID column. You can then add fields to record the relationship (Brother) and other information (maiden name for marriages etc).

With the CSV column for OtherID, you could not efficiently store this information for the potentially many relationships a single UserID might have.

Therefore, consider this as the base design for the second table:

CREATE TABLE Table_2
(
  UserID          int FOREIGN KEY REFERENCES Table_1 (UserID),
  ReferenceUserID int FOREIGN KEY REFERENCES Table_1 (UserID),
  PRIMARY KEY (UserID, ReferenceUserID)
);

You can add other columns to this setup to store additional details about each relationship, as discussed above.

Andriy M
  • 22,983
  • 6
  • 59
  • 103
HandyD
  • 9,942
  • 1
  • 11
  • 27
  • So, what should i do then? If i use Table_1 itself to hold ReferenceId as a foreign key pointing back to the UserId, how am i supposed to add multiple ReferenceId fro each UserId? – Aousaf Rashid May 03 '19 at 09:57
  • @zackraiyan hold those relations in a second table. Something like Table_2 (UserID, ReferenceUserID) as Handy has said above – George.Palacios May 03 '19 at 10:47
  • @George.Palacios, in the 2nd table, should UserId and ReferenceId both be foreign keys? – Aousaf Rashid May 03 '19 at 10:47
  • 2
    @zackraiyan yes, both referencing Table_1 (UserId) – George.Palacios May 03 '19 at 10:48
  • @George.Palacios, thanks man! I appreciate it. Just one last question. For example, as each userId would have multiple relation to other userIds, does it mean that Table_2 should record each relation is individual rows? – Aousaf Rashid May 03 '19 at 11:29
  • 2
    As the OP kept asking questions, I took the liberty of explicitly showing in your answer what the second table should look like. Feel free to edit further as you see fit. – Andriy M May 03 '19 at 11:42