Transcriptional regulation of haematopoietic transcription factors

The control of differential gene expression is central to all metazoan biology. Haematopoiesis represents one of the best understood developmental systems where multipotent blood stem cells give rise to a range of phenotypically distinct mature cell types, all characterised by their own distinctive gene expression profiles. Small combinations of lineage-determining transcription factors drive the development of specific mature lineages from multipotent precursors. Given their powerful regulatory nature, it is imperative that the expression of these lineage-determining transcription factors is under tight control, a fact underlined by the observation that their misexpression commonly leads to the development of leukaemia. Here we review recent studies on the transcriptional control of key haematopoietic transcription factors, which demonstrate that gene loci contain multiple modular regulatory regions within which specific regulatory codes can be identified, that some modular elements cooperate to mediate appropriate tissue-specific expression, and that long-range approaches will be necessary to capture all relevant regulatory elements. We also explore how changes in technology will impact on this area of research in the future.

Transcription factors (TFs) play important roles during haematopoiesis, from stem cell maintenance to lineage commitment and differentiation. However, relatively little is known about how regulatory information is encoded in the genome and how individual TFs are integrated into wider regulatory networks. Analyses of recent large-scale efforts to reconstruct tissue-specific regulatory networks suggest that transcriptional regulatory networks are characterised by a high degree of connectivity between TFs and transcriptional cofactors. Extensive cross- and autoregulatory links therefore create densely connected regulatory circuits that control the large numbers of tissue-specific effector proteins (enzymes, structural proteins) [3,4] (Figure 1). To understand the functionality of large mammalian regulatory networks, it will therefore be important both to identify the downstream target genes of specific TFs and to gain insight into combinatorial TF interactions. This in turn will not only provide fundamental insights into normal development, but also advance our understanding of how deregulation of these networks contributes to pathology.
The cis-regulatory regions of a gene locus can be thought of as distinct modules, each with a specific role, such as driving expression of the gene in a particular subset of cells or a particular tissue type. The activity of each regulatory region is controlled by a distinct set of upstream regulators. The individual regulatory regions within a given gene locus may have overlapping or very distinct upstream regulators, and it is the combined activity of all these regions that ultimately controls gene expression. Comprehensive identification and characterisation of truly functional cis-regulatory regions is therefore an essential prerequisite for integrating important regulatory genes into wider transcriptional networks. Traditionally, DNaseI hypersensitivity mapping was used to identify regions of open, accessible chromatin. More recently, comparative genomic sequence analysis has been used to identify highly conserved sequences, which were taken to represent candidate regulatory elements on the premise that sequence conservation indicates an important function [5][6][7]. The most recent development has been massively parallel sequencing, which, when coupled with chromatin immunoprecipitation assays (ChIP-Seq), allows genome-wide mapping of the chromatin status for a given histone modification [8]. Though more predictive than previous approaches, these techniques still yield only candidate elements, which require functional validation through in vivo and in vitro experiments to establish the true function of a given candidate regulatory region.
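Comparative sequence analysis of the kind described above is often framed as a sliding-window identity scan over a pairwise alignment. The Python sketch below assumes the alignment has already been computed (equal-length strings, '-' for gaps); the 100 bp window, 10 bp step and 70% identity threshold are illustrative defaults rather than parameters from the cited studies, and the function names are hypothetical.

```python
from typing import List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percentage of alignment columns holding the same (non-gap) base."""
    if len(a) != len(b) or not a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-")
    return 100.0 * matches / len(a)


def conserved_windows(a: str, b: str, window: int = 100, step: int = 10,
                      min_identity: float = 70.0) -> List[Tuple[int, int, float]]:
    """Slide a fixed window along the alignment and keep windows whose identity
    is at least min_identity. Returns 0-based, half-open (start, end, identity)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    hits = []
    for start in range(0, len(a) - window + 1, step):
        end = start + window
        pid = percent_identity(a[start:end], b[start:end])
        if pid >= min_identity:
            hits.append((start, end, pid))
    return hits


def merge_windows(hits: List[Tuple[int, int, float]]) -> List[Tuple[int, int]]:
    """Merge overlapping or abutting windows into candidate conserved regions."""
    regions: List[Tuple[int, int]] = []
    for start, end, _ in sorted(hits):
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


# Toy alignment: the first 100 columns are identical, the last 100 all differ.
mouse = "ACGT" * 50
other = "ACGT" * 25 + "TGCA" * 25
hits = conserved_windows(mouse, other)
print([h[0] for h in hits])  # window starts passing the threshold
print(merge_windows(hits))   # merged candidate region(s)
```

Such a scan only nominates conserved candidates; as the text stresses, conservation does not by itself demonstrate regulatory function.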
Several gene loci encoding TFs essential for haematopoiesis have been characterised using combinations of the above techniques. Collectively, these studies have provided important insights into TF hierarchies and the core circuits of regulatory networks [9][10][11]. This review focuses specifically on three haematopoietic loci, encoding the key haematopoietic regulators Scl/Tal1, Lmo2 and Gfi1.

Transcriptional regulation of Scl
The basic helix-loop-helix TF Scl/Tal1 is a key regulator of haematopoiesis, with additional important roles in the development of the vascular and central nervous systems [12][13][14][15][16]. Within the haematopoietic system, Scl is essential for the development of haematopoietic stem cells (HSCs) as well as for subsequent differentiation into the erythroid and megakaryocytic lineages [17].
Since correct spatiotemporal expression of Scl is crucial for the appropriate execution of its biological functions, much effort has been invested in understanding how Scl is regulated. Using a combination of long-range comparative sequence analysis and both in vitro and in vivo functional analysis, multiple cis-regulatory elements have been identified in the murine Scl locus, each of which directs expression to a subdomain of endogenous Scl expression when tested in transgenic mice (Figure 2). Scl has three promoters, associated with exons 1a, 1b and 4, none of which displayed haematopoietic activity when tested in transgenic mice. A search for additional cis-regulatory elements led to the identification of three haematopoietic enhancers, located at -4, +19 and +40 kb. The -4 Scl enhancer, characterised by the presence of five Ets sites, drives expression in endothelium and fetal blood progenitors [18]. The +19 Scl enhancer drives expression of Scl in HSCs, haematopoietic progenitors and endothelial cells [19][20][21] and depends critically on an Ets/Ets/GATA composite motif that is bound in vivo by Elf-1, Fli-1 and Gata2 [22]. Of note, the +19 enhancer is flanked by a nearby DNaseI hypersensitive site (the +18 Scl element), which does not function as an enhancer on its own but contains a mammalian interspersed repeat that is essential for its ability to 'boost' the activity of the +19 element [23]. The +40 Scl enhancer drives expression in erythroid cells [24,25] as well as in the midbrain, and is characterised by the presence of two Gata/E-box motifs. Mutation or deletion of either one of these motifs abolishes enhancer function [24,25].
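The regulatory codes described for the Scl enhancers (multiple Ets sites, an Ets/Ets/GATA composite, paired Gata/E-box motifs) are arrangements of short consensus sequences. The Python sketch below scans a sequence for simplified versions of these motifs. Everything beyond the motif names is an assumption: the consensus patterns (Ets core as GGAA, GATA as WGATAR, E-box as CANNTG), the 60 bp window, the Ets-Ets-GATA order and the E-box-to-GATA spacing are illustrative choices, and all function names are hypothetical. Real analyses usually score position weight matrices, and a sequence match alone says nothing about in vivo occupancy.

```python
import re
from typing import Dict, List, Tuple

# Simplified consensus patterns written for both strands (forward|reverse
# complement). Illustrative assumptions only, not position weight matrices.
MOTIFS: Dict[str, str] = {
    "Ets": r"GGAA|TTCC",                   # Ets core, simplified to GGAA
    "GATA": r"[AT]GATA[AG]|[CT]TATC[AT]",  # WGATAR and its reverse complement
    "Ebox": r"CA..TG",                     # CANNTG is its own reverse complement
}


def find_sites(seq: str, motif: str) -> List[int]:
    """0-based start positions of all (possibly overlapping) matches."""
    pattern = re.compile(f"(?=(?:{MOTIFS[motif]}))")
    return [m.start() for m in pattern.finditer(seq.upper())]


def ets_ets_gata(seq: str, window: int = 60) -> List[Tuple[int, int, int]]:
    """Candidate Ets/Ets/GATA composites: two non-overlapping Ets sites followed
    by a GATA site, all starting within `window` bp of the first Ets site.
    The order and window size are hypothetical choices."""
    ets = find_sites(seq, "Ets")
    gata = find_sites(seq, "GATA")
    hits = []
    for i, e1 in enumerate(ets):
        for e2 in ets[i + 1:]:
            if e2 - e1 > window:
                break          # sites are sorted, so later ones are further away
            if e2 - e1 < 4:
                continue       # skip overlapping 4 bp Ets matches
            hits.extend((e1, e2, g) for g in gata if e2 + 4 <= g <= e1 + window)
    return hits


def gata_ebox(seq: str, min_gap: int = 7, max_gap: int = 9) -> List[Tuple[int, int]]:
    """Candidate E-box ... GATA pairs in which the GATA site starts min_gap to
    max_gap bp after the end of the 6 bp E-box (hypothetical spacing)."""
    eboxes = find_sites(seq, "Ebox")
    gatas = find_sites(seq, "GATA")
    return [(e, g) for e in eboxes for g in gatas if min_gap <= g - (e + 6) <= max_gap]


print(ets_ets_gata("ccGGAAccGGAAccAGATAAcc"))
print(gata_ebox("CAGCTG" + "T" * 8 + "AGATAA"))
```

A hit list of this kind only nominates candidates; the +18/+19 and +40 examples above show that function depends on context and was established by mutation in transgenic mice.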
Taken together, these studies have highlighted the presence of three haematopoietic enhancers within the murine Scl locus, with distinct yet overlapping regulatory codes that together contribute to the correct overall spatiotemporal expression of Scl. Interestingly, a recent study comparing the mouse Scl enhancers with their chicken counterparts suggested that elements shared by mammals and lower vertebrates exhibit functional differences, with binding-site turnover between widely separated cis-regulatory modules [26]. Remarkably, however, the regulatory inputs and overall expression patterns remain the same across these species. This in turn suggests that significant regulatory changes may be widespread, applying not only to genes with altered expression patterns but also to those whose expression is highly conserved.
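The modular logic summarised above, in which each enhancer contributes a subdomain of the overall Scl pattern, can be expressed as simple set operations. This sketch encodes the expression domains listed for the -4, +19 and +40 enhancers (with simplified tissue labels, e.g. 'endothelial cells' recorded as 'endothelium') and assumes a purely additive model in which the locus is active wherever any element is active. That assumption is a deliberate simplification: the +18 element boosting +19, and the Lmo2 enhancers discussed below, show that elements can also cooperate non-additively. Function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

# Expression domains of the murine Scl enhancers in transgenic mice,
# as summarised in the text (tissue labels simplified).
SCL_ENHANCERS: Dict[str, FrozenSet[str]] = {
    "-4": frozenset({"endothelium", "fetal blood progenitors"}),
    "+19": frozenset({"HSCs", "haematopoietic progenitors", "endothelium"}),
    "+40": frozenset({"erythroid cells", "midbrain"}),
}


def combined_domain(elements: Dict[str, FrozenSet[str]]) -> Set[str]:
    """Additive model: the locus is active wherever any element is active."""
    combined: Set[str] = set()
    for domain in elements.values():
        combined |= domain
    return combined


def unique_contribution(elements: Dict[str, FrozenSet[str]]) -> Dict[str, Set[str]]:
    """Tissues covered by one element and by no other element."""
    result = {}
    for name, domain in elements.items():
        others = set().union(*(d for n, d in elements.items() if n != name))
        result[name] = set(domain) - others
    return result


def elements_for(tissue: str, elements: Dict[str, FrozenSet[str]]) -> List[str]:
    """Which elements drive expression in a given tissue (redundancy check)."""
    return sorted(n for n, d in elements.items() if tissue in d)


print(sorted(combined_domain(SCL_ENHANCERS)))
print(unique_contribution(SCL_ENHANCERS))
print(elements_for("endothelium", SCL_ENHANCERS))
```

Tissues reached by more than one element (here endothelium) are where overlapping regulatory codes provide redundancy; tissues with a single contributing element are where deleting that element would be expected to matter most under this additive model.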

Transcriptional regulation of Lmo2
The LIM domain only 2 gene (Lmo2) encodes a transcriptional cofactor that is essential for haematopoiesis [27,28]. The Lmo2 protein does not bind DNA directly but instead participates in the formation of multipartite DNA-binding complexes with other factors, such as Ldb1, Scl/Tal1, E2A and Gata1 or Gata2 [29][30][31]. Lmo2 is widely expressed throughout haematopoiesis, with the exception of mature T-lymphoid cells, in which its aberrant expression results in T-cell leukaemias [32].
The Lmo2 locus contains three promoters: the proximal promoter, which drives the majority of expression in endothelial cells [33]; the distal promoter, which is active in the fetal liver and in specific T-cell acute lymphoblastic leukaemia (T-ALL) cell lines [34]; and the intermediate promoter, whose activity was detected in CD34+ cells and which has been implicated in mediating LMO2 expression in T-ALL patients with high levels of LMO2 in the absence of any translocation involving the LMO2 locus [35]. However, none of the three promoters on its own displayed robust expression when tested in transgenic mice [33,36]. A search for additional elements subsequently identified eight enhancer elements, dispersed over 100 kb, that together could recapitulate the expression of Lmo2 in normal haematopoiesis [36]. Of note, while individual elements augmented endothelial expression from the proximal promoter, robust haematopoietic expression was only observed when they were combined (Figure 3). This type of combinatorial collaboration between regulatory elements to achieve haematopoietic activity has also been seen for other gene loci, such as Endoglin [37], suggesting a process of step-wise, modular activation of the locus during the development of blood and endothelial cells from their common precursor.

Transcriptional regulation of Gfi1
The Growth factor independence 1 gene (Gfi1) was originally identified in a retroviral screen designed to identify regulatory pathways that could confer interleukin-2 independence on T cells [38]. Within the haematopoietic system, Gfi1 is expressed in HSCs [39], specific subsets of T cells [40], granulocytes, monocytes and activated macrophages [41]. Gfi1-/- mice lack neutrophils [41,42], and Gfi1-/- HSCs are unable to maintain long-term haematopoiesis because elevated levels of proliferation lead to eventual exhaustion of the stem cell pool [39,43]. Outside the haematopoietic system, Gfi1 is also expressed in sensory epithelia, the lungs, neuronal precursors, the inner ear, intestinal epithelia and the developing mammary gland [44][45][46][47].
A recent study used a combination of comparative genomics, locus-wide chromatin immunoprecipitation assays and functional validation in cell lines and transgenic animals to identify cis-regulatory regions within the Gfi1 locus [48]. Four regulatory regions (the -3.4 kb and -1.2 kb minimal promoters and the +5.8 kb and +35 kb enhancers) recapitulated endogenous Gfi1 expression patterns in the central nervous system, gut, limbs and developing mammary glands, but no haematopoietic staining was observed. However, a genome-wide ChIP-Seq experiment [49] revealed binding of Scl/Tal1 to a region situated 35 kb upstream of the Gfi1 promoter, within the last intron of its 5' flanking gene, Evi5. This element was subsequently validated in transgenic assays, which demonstrated lacZ staining at multiple sites of haematopoietic stem/progenitor cell emergence (vitelline vessels, fetal liver and dorsal aorta).
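Assigning a distal binding site to its target gene is a common source of misinterpretation in genome-wide data. The sketch contrasts three simple assignment rules using hypothetical coordinates that only mimic the layout described above (a peak 35 kb upstream of the Gfi1 TSS, inside an intron of Evi5); the gene extents and the Evi5 orientation are invented for illustration, and the Gene class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost coordinate of the gene body
    end: int     # rightmost coordinate of the gene body
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        """Transcription start site: the 5' end on the gene's own strand."""
        return self.start if self.strand == "+" else self.end


def overlapping_genes(peak: int, genes: List[Gene]) -> List[str]:
    """Rule 1: assign the peak to any gene whose body contains it."""
    return [g.name for g in genes if g.start <= peak <= g.end]


def nearest_tss(peak: int, genes: List[Gene]) -> str:
    """Rule 2: assign the peak to the gene with the closest TSS."""
    return min(genes, key=lambda g: abs(g.tss - peak)).name


def genes_within(peak: int, genes: List[Gene], max_distance: int) -> List[str]:
    """Rule 3: assign the peak to every gene whose TSS lies within max_distance."""
    return [g.name for g in genes if abs(g.tss - peak) <= max_distance]


# Hypothetical coordinates in bp, relative to the Gfi1 TSS at 0.
GENES = [
    Gene("Evi5", start=-250_000, end=-30_000, strand="+"),
    Gene("Gfi1", start=0, end=10_000, strand="+"),
]
PEAK = -35_000

print(overlapping_genes(PEAK, GENES))
print(nearest_tss(PEAK, GENES))
print(genes_within(PEAK, GENES, 50_000))
```

In this toy layout an 'overlapping gene' rule assigns the peak to Evi5, whereas the nearest-TSS and distance-window rules point to Gfi1. None of these heuristics establishes the true target, which is why transgenic validation of the element was needed.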
Moreover, the element was also shown to be bound by other TFs known to be critical for haematopoiesis, including Scl/Tal1, Pu.1/Sfpi1, Runx1, Erg, Meis1 and Gata2, thus integrating Gfi1 into the wider HSC regulatory network. This study therefore supports the notion that important regulatory elements can be located at a significant distance from the gene they control (Figure 4), and thus emphasises the need for careful interpretation of genome-wide TF binding datasets [49,50].

Figure 3 (b). Transgenic animals were generated with many different combinations of the identified regulatory elements. The -75 enhancer with the proximal promoter (pP) showed strong expression in endothelium, circulating erythrocytes and fetal liver. The -70 enhancer with pP showed weak staining in endothelium and haematopoietic progenitor cells. The -25 or the -12 enhancer with pP showed strong expression in endothelium and fetal liver. The +1 enhancer with pP gave rise to lacZ staining in the tail, the apical ridge of the limbs and fetal liver, and strong staining in endothelium. Only when these elements were combined was a staining pattern corresponding to endogenous Lmo2 expression seen [36]. Strength of staining is indicated: +++, very strong; ++, intermediate; +, weak; -, not present.

Figure 4. ... [50] were transformed into a density plot for each transcription factor and loaded into the UCSC genome browser as custom tracks above the UCSC tracks for gene structure and mammalian homology. A discrete binding event for all ten TFs (Scl/Tal1, Lyl1, Lmo2, Gata2, Runx1, Meis1, Pu.1, Fli1, Erg and Gfi1b) can be seen in the last intron of the 5' flanking gene, Evi5 (indicated by an asterisk). This region was subsequently shown to drive expression in early haematopoietic cells in transgenic mouse embryos [48].
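The Figure 4 caption describes converting ChIP-Seq data into per-factor density tracks for the UCSC browser and spotting a site bound by all factors. A minimal Python sketch of those two steps, assuming aligned read start positions on a single chromosome are already available; the 50 bp bin size, the read threshold, the placeholder chromosome name "chrN" and the toy reads for three of the caption's factors are all hypothetical. The output follows the UCSC bedGraph layout (a track line, then chromosome, 0-based start, end and value per line).

```python
from collections import Counter
from typing import Dict, Iterable, List


def bin_density(read_starts: Iterable[int], bin_size: int = 50) -> Counter:
    """Count reads per fixed-width bin; keys are 0-based bin start coordinates."""
    return Counter((pos // bin_size) * bin_size for pos in read_starts)


def to_bedgraph(name: str, chrom: str, density: Counter, bin_size: int = 50) -> str:
    """Render one density track in UCSC bedGraph layout (0-based, half-open)."""
    lines = [f'track type=bedGraph name="{name}"']
    for start in sorted(density):
        lines.append(f"{chrom}\t{start}\t{start + bin_size}\t{density[start]}")
    return "\n".join(lines)


def co_bound_bins(densities: Dict[str, Counter], min_reads: int) -> List[int]:
    """Bins in which every factor reaches min_reads (a crude co-occupancy call)."""
    if not densities:
        return []
    passing = [{b for b, n in d.items() if n >= min_reads} for d in densities.values()]
    return sorted(set.intersection(*passing))


# Toy read start positions (hypothetical numbers, placeholder chromosome).
reads = {
    "Scl": [1010, 1020, 1030, 5000],
    "Gata2": [1005, 1040, 1045, 7000],
    "Runx1": [1001, 1049, 3000, 3010],
}
densities = {tf: bin_density(pos) for tf, pos in reads.items()}
print(to_bedgraph("Scl", "chrN", densities["Scl"]))
print(co_bound_bins(densities, min_reads=2))
```

Fixed bins are the simplest choice; a binding event that straddles a bin boundary would be split, which is one reason dedicated peak callers use smoothed or shifted read densities instead.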

Transcriptional regulation of other key haematopoietic transcription factors
The transcriptional control of several other TFs known to play important roles in haematopoiesis has also been investigated. Runx1 is transcribed from two promoters, both of which collaborate with the Runx1 +23 kb enhancer to drive Runx1 expression at sites of HSC emergence [51][52][53]. Moreover, the Runx1 +23 kb region was shown to be regulated by important haematopoietic TFs (Gata2, Fli1, Elf1, Pu.1, Scl, Lmo2, Ldb1 and Runx1 itself) [53,54]. The Lyl1 promoter region can be divided into two separate promoter elements that drive Lyl1 expression in endothelial, haematopoietic progenitor and megakaryocytic cells [55]. These promoter elements contain conserved Ets and Gata motifs that are bound in vivo by Fli1, Elf1, Erg, Pu.1 and Gata2. Multiple elements within the Gata2 locus have been identified (-77 kb, -3.9 kb, -3 kb, -2.8 kb, -1.8 kb, +9.5 kb and the IS promoter) [56][57][58], with the -1.8 kb region being essential for maintaining Gata2 repression in terminally differentiating cells [58]. The Elf1 locus contains four promoter elements (-55 kb, -49 kb, -21 kb and proximal), which are used in a cell-type-specific manner in combination with a lineage-specific -14 kb enhancer element [59]. Enhancer elements utilising the Ets/Ets/Gata regulatory code, originally defined in the Scl +19 enhancer, have also been identified in the Fli1, Gata2, Hhex/Prh and Smad6 gene loci [5,57]. The emerging picture, therefore, is that transcriptional control of important haematopoietic TF loci is achieved through multiple regulatory elements, but that the number of upstream regulators may be relatively small. The same binding motifs are found repeatedly, but it is their precise arrangement within a single element, as well as the interactions between elements, that ultimately controls expression.
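The binding data summarised in this and earlier sections can be treated as a small directed graph (regulator to target locus), which makes the 'few regulators, many elements' observation and the cross- and autoregulatory links concrete. The sketch encodes only interactions stated in this review, one element per locus, so it is an incomplete toy network; names are normalised (e.g. Elf-1 as Elf1, Pu.1/Sfpi1 as Pu.1) and the function names are hypothetical. Note also that 'bound' is not the same as 'functionally required'.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Target locus -> TFs reported in this review as binding one of its elements.
BINDING: Dict[str, Set[str]] = {
    "Scl": {"Elf1", "Fli1", "Gata2"},                            # +19 enhancer
    "Runx1": {"Gata2", "Fli1", "Elf1", "Pu.1", "Scl",
              "Lmo2", "Ldb1", "Runx1"},                           # +23 kb enhancer
    "Lyl1": {"Fli1", "Elf1", "Erg", "Pu.1", "Gata2"},            # promoter elements
    "Gfi1": {"Scl", "Pu.1", "Runx1", "Erg", "Meis1", "Gata2"},   # element ~35 kb upstream
}


def autoregulated(binding: Dict[str, Set[str]]) -> List[str]:
    """Targets whose own product binds one of their elements."""
    return sorted(t for t, regs in binding.items() if t in regs)


def regulator_usage(binding: Dict[str, Set[str]]) -> Counter:
    """Number of target loci bound by each regulator."""
    return Counter(r for regs in binding.values() for r in regs)


def shared_regulators(binding: Dict[str, Set[str]]) -> Set[str]:
    """Regulators bound at every target locus in the table."""
    return set.intersection(*binding.values()) if binding else set()


def tf_to_tf_edges(binding: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Cross-regulatory edges (regulator, target) between loci in the table."""
    return sorted((r, t) for t, regs in binding.items()
                  for r in regs if r in binding and r != t)


print(autoregulated(BINDING))
print(shared_regulators(BINDING))
print(regulator_usage(BINDING))
print(tf_to_tf_edges(BINDING))
```

Even in this fragment, ten regulators account for all four loci, and one factor (Gata2) is bound at every element listed, consistent with the view that a relatively small set of upstream regulators is reused across many elements.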

Conclusion
Recent analysis of gene regulatory networks controlling pluripotency in embryonic stem cells suggests that a finite number of major combinatorial interactions are critical in controlling cellular phenotypes [60,61]. Identification and subsequent functional characterisation of specific regulatory elements provide a powerful route into deciphering these combinatorial regulatory interactions. Whilst traditional methods of identifying regulatory elements should not be abandoned, it is essential to integrate new genome-wide methods to ensure that regulatory elements lying outside traditional gene locus boundaries are not missed. With genome-wide mapping of TF binding events now eminently feasible, the importance of sequence conservation as a primary technique for identifying regulatory elements will diminish.
Nevertheless, genome-wide mapping of binding events is descriptive and therefore no substitute for conventional functional assays, which are likely to remain an important component of any research programme aimed at elucidating transcriptional control mechanisms.

Competing interests
The authors declare that they have no competing financial interests.