Site Specific Equilibrium Constants: The Need for Standardized Conventions

In biological chemistry, small biomolecules play important roles as building blocks for biopolymers, as regulators of biological processes, as receptor agonists or antagonists, and in cell signaling. When a list of biomolecules [1] is perused, it is quickly obvious that many of them are poly-functional weak acids. In developing methods to detect and quantitate these molecules, ensemble properties are usually sufficient. However, in determining biological activity, the actual speciation is often important. An example may be found in a paper by Béla Noszál et al. [2]. Thus pharmaceutical researchers are increasingly characterizing the micro speciation of drugs. That the complete physicochemical characterization of a poly-functional weak acid requires a complete characterization of all its micro species was recognized soon after equilibrium constants were defined. However, different groups of scientists have defined site-specific equilibrium constants in ways that are ambiguous.

In the late 19th century, Rud Weyscheider [3] discussed in detail the behavior of an unsymmetrical, dibasic acid. (Note: the term dibasic acid is used in the medical field and in biology; analytical and physical chemists use the term diprotic acid.) Using the notation HA-BH for the acid, he indicated that ionization led to the charged forms (-)A-BH, HA-B(-), and (-)A-B(-). Using an unsymmetrical, dicarboxylic acid as a specific example, he said that dissociation constants $k_1$ and $k_2$ could be assigned to the respective carboxyl groups. Mathematically he wrote expressions for $k_1$ and $k_2$ that corresponded to the following respective equilibria:
$$\mathrm{HA{-}BH} \rightleftharpoons \mathrm{H^+} + \mathrm{(-)A{-}BH} \quad\text{and}\quad \mathrm{HA{-}BH} \rightleftharpoons \mathrm{H^+} + \mathrm{HA{-}B(-)}. \tag{1}$$
Ignoring the formation of (-)A-B(-), he proceeded to show that the affinity constant, $k$, for the acid could be expressed in terms of the affinity constants $k_1$ and $k_2$ as $k = k_1 + k_2$. Thus it is seen that ambiguity was introduced at the very beginning: the same symbols were called dissociation constants and then affinity constants, yet the two are diametric opposites. An affinity constant is "(1) a mathematical constant that describes the bonding affinity between two molecules at equilibrium and (2) the reverse of dissociation constant." [4]. "In chemistry and biochemistry, the affinity constant is the reciprocal of the dissociation constant." [5].
Elliot Adams considered the complete system of relationships among all four micro species of the diprotic acid. He seems to have been the first researcher to introduce a diagram for the process [6], one that we would recognize today as a network graph. As shown in Figure 1, each vertex was assigned to a micro species (here we use the same micro species notation as Weyscheider), and the edges were double arrows, each accompanied by an equilibrium constant. The network graph was oriented so that micro species with the same number of ionizable H atoms appear in the same layer. In this case, the layers were rows, with the completely protonated micro species in the top layer and the completely unprotonated micro species in the bottom layer.
This may be contrasted with the sub-network graph given by Benesch and Benesch [7], shown in Figure 2. The same notation used in Figure 1 has been used to represent the micro species. In this case, the diprotic acid is the carboxylate ion of cysteine. Benesch and Benesch assumed that the loss of one proton from cysteine occurs exclusively from the carboxyl group. Elliot Adams symbolized the successive ionization constants of a diprotic acid as K′ and K″, but the $K_i$ were assigned to individual one-proton, site-specific ionizations. Thus the $K_i$ symbols themselves conveyed nothing about which sites were being ionized. The same can be said for the $K_A$ through $K_D$ site-specific ionization constants of Benesch and Benesch. The latter authors did not symbolize the ionization constants for cysteine, as they wrote no equations relating the ionization constants of cysteine to their site-specific constants. Undoubtedly this was done because their assumptions eliminated 4 micro species and 7 site-specific ionization constants from the actual network graph of cysteine.
The layered network graph of Figure 1 should be preferred to that of Figure 2. The former shows the relationship between the number of micro species in a layer and the binomial coefficients. Calling the top layer "layer 0", corresponding to no protons lost from the completely protonated species, it is evident that the number of micro species in each layer is equal to ${}_{n}C_{j}$, where $n$ is the number of protons in the completely protonated species and $j$ is the number of protons lost from it. Thus for the diprotic acid, there are 3 layers populated in the ratio 1:2:1. For the triprotic acid, there are 4 layers populated in the ratio 1:3:3:1, etc.
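The layer populations described above can be checked by direct enumeration. The following is a minimal Python sketch, not taken from any of the cited works: it assumes that each micro species can be represented as the set of site numbers that have lost a proton, and the function names `layers` and `count_site_specific_constants` are hypothetical. It also counts the edges of the network graph, on the assumption that each one-proton step between adjacent layers carries exactly one site-specific constant.

```python
from itertools import combinations
from math import comb


def layers(n):
    """Group the micro species of an n-protic acid by number of protons lost.

    Assumption for illustration: a micro species is represented by the
    frozenset of site numbers (1..n) that have already lost a proton.
    Layer 0 holds the completely protonated species.
    """
    sites = range(1, n + 1)
    return [[frozenset(c) for c in combinations(sites, j)] for j in range(n + 1)]


def count_site_specific_constants(n):
    """Count one-proton steps (edges) in the layered network graph.

    A micro species in layer j still has n - j protonated sites, and each
    of them can ionize, giving one edge per remaining site.
    """
    return sum(len(layer) * (n - j) for j, layer in enumerate(layers(n)))


for n in (2, 3, 4):
    populations = [len(layer) for layer in layers(n)]
    # Each layer population equals the binomial coefficient nCj.
    assert populations == [comb(n, j) for j in range(n + 1)]
    print(n, populations, count_site_specific_constants(n))
```

For n = 2 and n = 3 the printed populations are 1:2:1 and 1:3:3:1, matching the ratios given in the text.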
The designation of the site-specific constants should be such that the network graph gives information on the identity of the specific sites being ionized. An unambiguous scheme is one proposed by Terrell Hill [8]. He used $k_1$ to represent the loss of a proton from site 1. If site 1 lost a proton and then site 2 lost a proton, the constant was designated $k_{12}$. Thus, if one replaces $K_1$ with $k_1$, $K_2$ with $k_2$, $K_4$ with $k_{12}$, and $K_3$ with $k_{21}$ in Figure 1, then a site-specific equilibrium expression can be written just by looking at the constant. When the relationships between ionization constants and site-specific constants are expressed, they should be written as follows:
$$K_{a1}(k_1, k_2) = k_1 + k_2 \quad\text{and}\quad \frac{1}{K_{a2}(k_{12}, k_{21})} = \frac{1}{k_{12}} + \frac{1}{k_{21}} \tag{2}$$
for the diprotic acid case. $K_{a1}$, $K_{a2}$, etc. are the IUPAC recommendations [9] for the designation of the acidity constants (aka macro-ionization constants) of poly-functional weak acids. Written as above, this indicates that the site-specific constants (aka micro-ionization constants) are acidity constants. The downside of this notation is that the maximum number of subscripts for a site-specific constant increases with $n$. For a triprotic acid, it is 3, for example $k_{123}$. Experimentally, determination of values for all the site-specific constants is likely to be limited to diprotic and triprotic acids, because the number of site-specific constants increases "astronomically" with $n$. There are 4 for the diprotic acid, 12 for the triprotic acid, 32 for the tetraprotic acid, etc., that is, $n \cdot 2^{n-1}$ in general.
Why is it important to use the subscripts a1, a2, etc.? Because there is a second approach: a protonation scheme may be used. Here, the starting point is the ligand, which has no ionizable protons. The ligand can form bonds with protons, just as it can form bonds with metal ions. So, using the micro species shown in Figure 1, the starting point would be (-)A-B(-). In designating the site-specific protonation constants, Béla Noszál [10] used a letter-coded superscript to indicate the site being protonated. Thus, $k^{A}$ is the protonation constant when the ligand is protonated at site A to give HA-B(-), etc. When a second proton is added to HA-B(-) at site B, the already protonated site is shown as a subscript, and the site being protonated is shown as a superscript, thus $k_{A}^{B}$. Obviously, this scheme is not ideal for a dicarboxylate ligand. One could use numerical subscripts to indicate the site being protonated. How does one then distinguish these site-specific protonation constants from the site-specific ionization constants? They are distinguished by showing the relationship to the protonation constants of the ligand. According to IUPAC, these should be designated as $K_{Hn}$ for the addition of the $n$th proton to a neutral or charged ligand [11]. Thus, when the relationships between protonation constants and site-specific protonation constants are expressed, they should be written as follows:
$$K_{H1}(k_1, k_2) = k_1 + k_2 \quad\text{and}\quad \frac{1}{K_{H2}(k_{12}, k_{21})} = \frac{1}{k_{12}} + \frac{1}{k_{21}} \tag{3}$$
for the diprotic acid case.
One should always make sure that the network graph matches the treatment in the text. In references [12] and [13], the authors present network graphs that are labeled as exhibiting protonation equilibria, but in the text all constants are defined as ionization constants. Finally, do not mix terminology. S. F. Mason, in a study of N-heteroaromatic hydroxyl compounds, presented a network graph like that in Figure 1, but with site-specific constants $K_1$, $K_2$, $K_4$, $K_3$ labeled $K_A$, $K_B$, $K_C$, $K_D$, respectively. $K_A$ and $K_B$ were said to be basic ionization constants, while $K_C$ and $K_D$ were said to be acidic ionization constants. However, an examination of his calculated results showed that they were all site-specific acidity constants.
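The diprotic relationships in equations (2) and (3) can be checked numerically, which also shows why the $K_{a}$ and $K_{H}$ labels matter: the same algebraic form yields different macroscopic constants depending on whether the site-specific constants are ionization or protonation constants. The sketch below is illustrative only. The numerical values are invented, the function and variable names are hypothetical, and two assumptions not stated in the text are made: the four site-specific ionization constants close a thermodynamic cycle ($k_1 k_{12} = k_2 k_{21}$), and each site-specific protonation constant is the reciprocal of the ionization constant on the same edge of the network graph.

```python
import math


def macro_ionization(k1, k2, k12, k21):
    """Equation (2): acidity constants from site-specific ionization constants."""
    Ka1 = k1 + k2
    Ka2 = 1.0 / (1.0 / k12 + 1.0 / k21)
    return Ka1, Ka2


def macro_protonation(p1, p2, p12, p21):
    """Equation (3): protonation constants from site-specific protonation constants."""
    KH1 = p1 + p2
    KH2 = 1.0 / (1.0 / p12 + 1.0 / p21)
    return KH1, KH2


# Hypothetical site-specific ionization constants (Hill-style subscripts).
k1, k2, k12 = 1.0e-3, 4.0e-4, 2.0e-6
# Assumption: cycle closure fixes the fourth constant, k1*k12 == k2*k21.
k21 = k1 * k12 / k2

Ka1, Ka2 = macro_ionization(k1, k2, k12, k21)

# Assumption: a protonation step is the reverse of the ionization step on
# the same edge, so its constant is the reciprocal.
p1 = 1.0 / k21   # bare ligand gains a proton at site 1
p2 = 1.0 / k12   # bare ligand gains a proton at site 2
p12 = 1.0 / k2   # site 1 already protonated, site 2 gains a proton
p21 = 1.0 / k1   # site 2 already protonated, site 1 gains a proton

KH1, KH2 = macro_protonation(p1, p2, p12, p21)

# The two macroscopic descriptions are reciprocal, in reversed order.
assert math.isclose(KH1, 1.0 / Ka2)
assert math.isclose(KH2, 1.0 / Ka1)
assert math.isclose(Ka1 * Ka2, k1 * k12)
print(f"Ka1={Ka1:.3e} Ka2={Ka2:.3e} KH1={KH1:.3e} KH2={KH2:.3e}")
```

Under these assumptions the first protonation constant is the reciprocal of the second acidity constant, so a constant reported without an a or H subscript cannot be interpreted from its value alone.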