I am trying to figure out whether I am following best practices while passing arguments to functions. I will give a simplified example directly from the project I am working on to describe my conflict.
Say I have the following class definition:

    class PatternA():
        def __init__(self):
            self.my_var2 = 1
            self.my_var3 = 2

        def move_data_to_database(self):
            self.my_var_vA = self.clean_data_vA(self.my_var2, self.my_var3)
            self.clean_data_vB()

        def clean_data_vA(self, input_var2, input_var3):
            mod_var = input_var2 * input_var3
            return mod_var

        def clean_data_vB(self):
            self.my_var_vB = self.my_var2 * self.my_var3
The actual PatternA class is much more complicated than the code I have provided above.
I originally implemented the clean_data function in the second form, i.e. as clean_data_vB(self), where the variables self.my_var2 and self.my_var3 are members of the PatternA class. Compared with clean_data_vA, this reduces the number of variables I need to pass and the complexity of writing the function interfaces. Further, instead of returning a variable mod_var, as clean_data_vA does, I can assign the result directly to self.my_var_vB, which also reduces the number of lines of code I have to write.
However, as my code base grew, I started realizing that it's very difficult to make sure that the function clean_data, written in the form clean_data_vB, is truly independent of the other functions or of the state of the PatternA instance. This is because it accesses two instance variables, self.my_var2 and self.my_var3, which could change with the state of the instance. Further, to test the clean_data function in the clean_data_vB form, I would have to instantiate an object of type PatternA, assign the correct values to self.my_var2 and self.my_var3, and only then could I test clean_data_vB.
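To make that concrete, here is a minimal sketch of what such a test might look like, using a trimmed-down copy of the class. The test function name is made up for illustration and is not part of my project.

```python
class PatternA:
    """Trimmed-down copy of the class, keeping only what this test needs."""

    def __init__(self):
        self.my_var2 = 1
        self.my_var3 = 2

    def clean_data_vB(self):
        # Reads its inputs from, and writes its result to, instance state.
        self.my_var_vB = self.my_var2 * self.my_var3


def test_clean_data_vB():  # hypothetical test name
    obj = PatternA()       # 1. build a whole instance
    obj.my_var2 = 3        # 2. put the instance into the state under test
    obj.my_var3 = 4
    obj.clean_data_vB()    # 3. call the method
    assert obj.my_var_vB == 12  # 4. read the result back from instance state


test_clean_data_vB()
```

In the real class, steps 1 and 2 would involve far more setup than shown here, and any other method that modifies my_var2 or my_var3 could change the outcome.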
On the other hand, if I had to test clean_data_vA, I could do that much more simply with PatternA.clean_data_vA(PatternA, 1, 2). Further, because the variables used in clean_data_vA are passed directly through the function interface, they do not depend on any other variables in the class or on how those variables change. So it's easier to make sure that clean_data_vA works reliably.
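For comparison, a sketch of the equivalent test for clean_data_vA, again with a trimmed-down copy of the class and a made-up test name. Passing the class itself in place of self only works here because the method never touches self.

```python
class PatternA:
    """Trimmed-down copy of the class, keeping only what this test needs."""

    def clean_data_vA(self, input_var2, input_var3):
        # Depends only on its arguments; self is never used.
        mod_var = input_var2 * input_var3
        return mod_var


def test_clean_data_vA():  # hypothetical test name
    # No instance is built and no state is prepared beforehand.
    assert PatternA.clean_data_vA(PatternA, 1, 2) == 2
    assert PatternA.clean_data_vA(PatternA, 3, 4) == 12


test_clean_data_vA()
```

The inputs and the expected result are all visible in a single line of the test, which is what makes this form feel easier to reason about.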
I have read Code Complete 2, which states:

"I think both these rules are simplistic and miss the most important consideration: what abstraction is presented by the routine’s interface? If the abstraction is that the routine expects you to have three specific data elements, and it is only a coincidence that those three elements happen to be provided by the same object, then you should pass the three specific data elements individually. However, if the abstraction is that you will always have that particular object in hand and the routine will do something or other with that object, then you truly do break the abstraction when you expose the three specific data elements."
In this case, when the function clean_data_vA is called, I do have the object in hand. However, I am unclear as to what the author means by "If the abstraction is that the routine expects you to have three specific data elements". The routine does expect specific data elements (two here, rather than three), but that's only because of the way I chose to write it.
In Clean Code, the author Robert Martin mentions that we should try to minimize the number of arguments passed, but that 2 arguments are OK. In this case, using 2 arguments seems justified to me, because it leads to better encapsulation of clean_data_vA.
Questions:
- To me it seems that clean_data_vA is better encapsulated and is thus a better design. I face this kind of situation a lot, so I wonder what your opinion is on the best design for these functions?
- None of the books talk about designing functions that are easily testable. To me, it seems that clean_data_vA is better encapsulated, so it's easier to test. Should I consider testability when I write functions?