Supplemental Table S1.  Family Binding Profiles

| PFAM acc# | PFAM family | Profile | Source |
|---|---|---|---|
| PF00847 | AP2-domain | GGgAw.yGTGy | AlignACE |
| PF00170 | bZIP | s.TGACGy | AlignACE |
| PF00170 | bZIP | Tkr.GyMA | AlignACE |
| PF00170 | bZIP | TGAsTCAs | AlignACE |
| PF00170 | bZIP | caCGTGgc | AlignACE |
| PF00170 | bZIP | TGCAAyms | AlignACE |
| PF00170 | bZIP | mCACGTGk | DimerFinder |
| PF00170 | bZIP | aTGACGTCAt | DimerFinder |
| PF00170 | bZIP | aTGAsTCAt | DimerFinder |
| PF00170 | bZIP | aTTg..cAAt | DimerFinder |
| PF00170 | bZIP | gcCACGTGgc | DimerFinder |
| PF00170 | bZIP | gTGacGTG | DimerFinder |
| PF00170 | bZIP | TkACGTmA | DimerFinder |
| PF03131 | bZIP Maf | GCtgaGTCA | AlignACE |
| PF03131 | bZIP Maf | TGAsTCA | DimerFinder |
| PF02045 | CBFB NFYA | CCAAysrg | AlignACE |
| PF00859 | CTF NFI | tTGSC... | AlignACE |
| PF00859 | CTF NFI | TGGc.....gCCA | DimerFinder |
| PF02376 | CUT | ..ATyGRT | AlignACE |
| PF02376 | CUT | CgATcG | DimerFinder |
| PF02376 | CUT | cgATCGAT | DimerFinder |
| PF00178 | Ets | smGGAagy | AlignACE |
| PF00250 | Forkhead | rYAAACAa | AlignACE |
| PF00250 | Forkhead | AryMAATA | AlignACE |
| PF00250 | Forkhead | AATATT | DimerFinder |
| PF00250 | Forkhead | kTTGTT | DimerFinder |
| PF00250 | Forkhead | TGTTTrTT | DimerFinder |
| PF00250 | Forkhead | TTTrTTTA | DimerFinder |
| PF00320 | GATA | .mGAyArG | AlignACE |
| PF00320 | GATA | yGATArs. | AlignACE |
| PF02183 | HALZ | AAT.ATTG | AlignACE |
| PF02183 | HALZ | cAAT.ATTg | DimerFinder |
| PF00010 | HLH | s.CrsGTG | AlignACE |
| PF00010 | HLH | CAgcTG | DimerFinder |
| PF00010 | HLH | cCACGTGg | DimerFinder |
| PF00010 | HLH | tCACGTGa | DimerFinder |
| PF00505 | HMG | AACAAwRr | AlignACE |
| PF04814 | HNF-1 N | g.yRAw.ATTAAC | AlignACE |
| PF04814 | HNF-1 N | GtTAAT.ATTAaC | DimerFinder |
| PF00046 | Homeobox | TAAKKrss | AlignACE |
| PF00046 | Homeobox | AAgyrcTT | DimerFinder |
| PF00046 | Homeobox | AaT.AtT | DimerFinder |
| PF00046 | Homeobox | TAATt.aATTA | DimerFinder |
| PF00046 | Homeobox | TAATTAat | DimerFinder |
| PF00447 | HSF DNA-bind | GAA..YTCkmG | AlignACE |
| PF00447 | HSF DNA-bind | GAA..TTC | DimerFinder |
| PF00447 | HSF DNA-bind | TTCtaGAA | DimerFinder |
| PF00447 | HSF DNA-bind | TTCyaGAAg.TTC | DimerFinder |
| PF00605 | IRF | gAAA.yGAAAs | AlignACE |
| PF00605 | IRF | sTTTCrcTTT | DimerFinder |
| PF00605 | IRF | gTTTCrsTTTC | DimerFinder |
| PF03165 | MH1 | tGGCw... | AlignACE |
| PF03165 | MH1 | TGGc.....gCCA | DimerFinder |
| PF00249 | Myb DNA-binding | yAACsG.c | AlignACE |
| PF00249 | Myb DNA-binding | CATaCAT | DimerFinder |
| PF00249 | Myb DNA-binding | GGTwGGT | DimerFinder |
| PF01056 | Myc N term | CACGTGs.. | AlignACE |
| PF01056 | Myc N term | cCACGTGg | DimerFinder |
| PF00105 | NHR (zf-C4) | ..rGGTCA | AlignACE |
| PF00105 | NHR (zf-C4) | aGaACA...TGTtCt | DimerFinder |
| PF00105 | NHR (zf-C4) | AGGTCAc.gTGACCT | DimerFinder |
| PF00105 | NHR (zf-C4) | AGGTCATGACCT | DimerFinder |
| PF00105 | NHR (zf-C4) | tcAAGkTCAag | DimerFinder |
| PF00105 | NHR (zf-C4) | TGACCT...kTGACCT | DimerFinder |
| PF00105 | NHR (zf-C4) | TGACCTTTGACCyy | DimerFinder |
| PF00292 | PAX | raSCgKGrm | AlignACE |
| PF00292 | PAX | CGT.ACG | DimerFinder |
| PF00292 | PAX | TCA.gc.TGA | DimerFinder |
| PF03792 | PBX | tGATTGAT | AlignACE |
| PF03792 | PBX | TGATTGAT | DimerFinder |
| PF00157 | POU | ATGCAAAT | AlignACE |
| PF00157 | POU | ATGmATaw | AlignACE |
| PF00157 | POU | ATAAwTTAT | DimerFinder |
| PF02257 | RFX DNA binding | GTTGCcr.G..rm | AlignACE |
| PF00554 | RHD | GGrAa.yCCc | AlignACE |
| PF00554 | RHD | GGaawttCC | DimerFinder |
| PF00554 | RHD | GGawwtCC | DimerFinder |
| PF00554 | RHD | GGGGAwTCCCC | DimerFinder |
| PF00853 | Runt | yTGyGGT. | AlignACE |
| PF00319 | SRF-TF | CCwwAwaTrG | AlignACE |
| PF00319 | SRF-TF | CTATwwATAG | DimerFinder |
| PF00319 | SRF-TF | GGATCC | DimerFinder¹ |
| PF00319 | SRF-TF | tCCwTwwAwGGa | DimerFinder |
| PF02864 | STAT bind | TTCy.GGAA | AlignACE |
| PF02864 | STAT bind | GyyTGTCTrrsGwsrkmGC | AlignACE¹ |
| PF02864 | STAT bind | gACAAGCTTGTc | DimerFinder¹ |
| PF02864 | STAT bind | GCGACGTCGC | DimerFinder¹ |
| PF02864 | STAT bind | GTCTGTCT | DimerFinder¹ |
| PF02864 | STAT bind | TGAGmkCTCA | DimerFinder¹ |
| PF02864 | STAT bind | TGAGsTswsAsCTCA | DimerFinder¹ |
| PF02864 | STAT bind | TGAGGTGAG | DimerFinder¹ |
| PF02864 | STAT bind | TTyC..GrAA | DimerFinder |
| PF02864 | STAT bind | wTTCy.rGAAw | DimerFinder |
| PF00352 | TBP | G.ATATAwA | AlignACE |
| PF00352 | TBP | TATwTAT | DimerFinder |
| PF01285 | TEA | GGAATG.rr | AlignACE |
| PF03299 | TF AP-2 | GsSwssgss | AlignACE |
| PF03529 | TF Otx | kgrGaTTAgtg | AlignACE |
| PF02319 | Winged helix | GCGssAAa | AlignACE |
| PF03106 | WRKY | cgGtCamcg | AlignACE |
| PF02701 | zf-Dof | ...wAAAG. | AlignACE |
| PF00172 | Zn clus | CGG..g.. | AlignACE |
| PF00172 | Zn clus | CGGa..acwgt..tCCG | DimerFinder |

Notes:

In a number of cases, a motif found by AlignACE is similar to a motif found by DimerFinder (for example, the motifs TGAsTCAs and aTGAsTCAt in the bZIP family). We have not attempted to remove these redundancies.

¹ This motif is not a valid Family Binding Profile. It derives from a fixed sequence flanking a region that was randomized in a selection experiment.
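
The redundancy between profiles mentioned in the notes can be explored with a short script. The sketch below is illustrative only and is not the procedure used to build the table. It assumes that profile letters follow the standard IUPAC degenerate nucleotide codes (r = A/G, y = C/T, s = C/G, w = A/T, k = G/T, m = A/C), that "." matches any base, and that letter case can be ignored for matching. The function names `profile_to_regex`, `find_sites`, and `best_overlap` are hypothetical.

```python
import re

# Standard IUPAC degenerate codes; "." is assumed to mean "any base".
BASES = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
    "N": {"A", "C", "G", "T"}, ".": {"A", "C", "G", "T"},
}


def profile_to_regex(profile):
    """Turn a profile such as 'TGAsTCAs' into a regex string (case ignored)."""
    return "".join(
        "[" + "".join(sorted(BASES[ch.upper()])) + "]" for ch in profile
    )


def find_sites(profile, sequence):
    """Return start positions of all (possibly overlapping) matches."""
    pattern = re.compile("(?=" + profile_to_regex(profile) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]


def best_overlap(p, q):
    """Slide the shorter profile along the longer one and return the best
    fraction of aligned positions whose allowed bases intersect."""
    short, long_ = sorted((p, q), key=len)
    best = 0.0
    for offset in range(len(long_) - len(short) + 1):
        hits = sum(
            1
            for i, ch in enumerate(short)
            if BASES[ch.upper()] & BASES[long_[offset + i].upper()]
        )
        best = max(best, hits / len(short))
    return best


if __name__ == "__main__":
    print(profile_to_regex("TGAsTCAs"))
    print(find_sites("TGAsTCAs", "ccTGACTCAGtt"))
    print(best_overlap("TGAsTCAs", "aTGAsTCAt"))
```

Under these assumptions, the two bZIP profiles cited in the note agree at seven of eight aligned positions, which is one simple way to flag candidate redundancies for manual review.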


Supplemental Table S2.  Restricted Family Binding Profiles and Associated Refined Motifs

 

| Protein | Label | Profile | Refined Motif | Mean Cross-Validation Error |
|---|---|---|---|---|
| c-Rel | A | gGGr.tTyC | gGGr.tTyC | 0.35 |
| c-Rel | B | krGAAAa.y | .gGrAAwcc | 0.42 |
| c-Rel | C | GGaawttCC | GGaawttCC | 0.34 |
| c-Rel | D | GGawwtCC | GgrwwycC | 0.38 |
| c-Rel | E | GGGgAwTcCCC | gGGrawtyCCc | 0.35 |
| E2F4 | A | GCGssaaa | GCGssaaa | 0.35 |
| HNF3b | A | arTAAACA | .GYaAACA | 0.39 |
| HNF3b | B | kTTGTT | gkyGTt | 0.46 |
| HNF3b | C | TGTTTrTT | TGTTtrY. | 0.44 |
| HNF4a | A | ..RGGTCA | marGGyCA | 0.40 |
| HNF4a | B | rGwaCA...tGTwC | rg.rCw..rkGkmC | 0.48 |
| HNF4a | C | aGaACA...TGTtCt | aGaACa...tGTtCt | 0.46 |
| HNF4a | D | AGGTCAc.gTGACCT | .gG.cwc.gwg.Cc. | 0.42 |
| HNF4a | E | AGGTCATGACCT | rGkyC..GrmCy | 0.42 |
| HNF4a | F | tcAAGkTCAag | tcaaGgtCaag | 0.44 |
| HNF4a | G | TGACCT...kTGACCT | tkaCCyymw.tkmyCy | 0.43 |
| HNF4a | H | TGACCTTTGACCyy | tGgmCytTGmCcy. | 0.30 |
| HNF6 | A | ATCGAT.s | ATCGAT.s | 0.32¹ |
| HNF6 | B | CAcm.Ata..TaTkG | CAcm.Ata..TaTkG | 0.47 |
| HNF6 | C | CgATcG | cgATcg | 0.43 |
| HNF6 | D | cgATCGAT | cgATCGAT | 0.32¹ |
| Nanog | A | TAATTrsy | tAAtkrsy | 0.42 |
| Nanog | B | AAgyrcTT | AAgyrcTT | 0.43 |
| Nanog | C | AaT.AtT | Aak.mtT | 0.44 |
| Nanog | D | TAATt.aATTA | taat...atta | 0.44 |
| Nanog | E | TAATTAat | tAAtkr.t | 0.44 |
| NeuroD1 | A | cCACGTGg | cCamktGg | 0.42 |
| NeuroD1 | B | CgCaCGC | CgCaCGC | 0.46 |
| NeuroD1 | C | rCAgcTGy | rCAgcTGy | 0.35 |
| NeuroD1 | D | tCACGTGa | tCACGTGa | 0.44 |
| Oct4 | A | ATGCAAAT | ATGCAAAt | 0.40 |
| Oct4 | B | TAAwTTA | kaAwTtm | 0.44 |
| p50 | A | GraAw.cCCm | GGraAwyCCC | 0.30 |
| p52 | A | GGrAw.yCCc | GGrAw.yCCc | 0.28 |
| p52 | B | GGaawttCC | GGaawttCC | 0.30 |
| p52 | C | GGawwtCC | GGawwtCC | 0.33 |
| p52 | D | GGGgAwTcCCC | GGGrawtyCCC | 0.21 |
| p65 | A | GGrAw.mCCc | ssRrAwycCc | 0.40¹ |
| p65 | B | GGGGAwTCCCC | sggrawtyccs | 0.40¹ |
| P-CREB | A | rTGACgyr | rTGaCGy. | 0.44 |
| P-CREB | B | ttrtGYAA | tkrcGtMA | 0.44 |
| P-CREB | C | caCGTGGc | caCGTGGc | 0.47 |
| P-CREB | D | mCACGTGk | w.aCGt.w | 0.45 |
| P-CREB | E | aTGACGTCAt | aTgACGTcAt | 0.40 |
| P-CREB | F | aTGAsTCAt | .w.msk.w. | 0.49 |
| P-CREB | G | aTTg..cAAt | .wwscgsww. | 0.46 |
| P-CREB | H | gcCACGTGgc | .ysaCGtsr. | 0.41 |
| P-CREB | I | GtG.CaC | skkwmms | 0.50 |
| P-CREB | J | gTGacGTG | rTGaCGt. | 0.43 |
| P-CREB | K | TtACGTaA | TkaCGtmA | 0.41 |
| P-CREB | L | tTGCAa | tyGCra | 0.48 |
| RelB | A | GGrAw.yCCc | GGrAw.yCCc | 0.30 |
| RelB | B | GGaawttCC | GGrawtyCC | 0.32 |
| RelB | C | GGawwtCC | GGawwtCC | 0.39 |
| RelB | D | GGGGAwTCCCC | gGGrawtyCCc | 0.33 |
| Sox2 | A | AACAAWRr | AACAAwrr | 0.39 |

¹ For two factors, HNF6 and p65, the two best profiles tested gave very similar mean cross-validation errors. In both cases the refined motifs are also quite similar.
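
As a small illustration of how such near-ties can be flagged, the sketch below picks, for one protein, every profile whose mean cross-validation error lies within a tolerance of the lowest error. The error values are copied from the table above; the function name `best_profiles` and the tolerance of 0.005 are assumptions made for this example and are not part of the original analysis.

```python
def best_profiles(errors, tol=0.005):
    """Return the labels whose error is within `tol` of the minimum.

    `errors` maps a profile label to its mean cross-validation error.
    `tol` is a hypothetical tolerance chosen for illustration.
    """
    lowest = min(errors.values())
    return sorted(label for label, err in errors.items() if err - lowest <= tol)


# Mean cross-validation errors taken from the table above.
hnf6 = {"A": 0.32, "B": 0.47, "C": 0.43, "D": 0.32}
p65 = {"A": 0.40, "B": 0.40}
hnf4a = {"A": 0.40, "B": 0.48, "C": 0.46, "D": 0.42,
         "E": 0.42, "F": 0.44, "G": 0.43, "H": 0.30}

if __name__ == "__main__":
    for name, errors in [("HNF6", hnf6), ("p65", p65), ("HNF4a", hnf4a)]:
        print(name, best_profiles(errors))
```

With the tabulated values, HNF6 and p65 each return two labels (A and D, and A and B, respectively), while HNF4a returns the single label H.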

 

 

Supplemental Table S3.  ChIP-chip experiments

| Protein | PFAM Domain Family | Species | Cell type | Technology | Array Version | Reference |
|---|---|---|---|---|---|---|
| c-Rel | PF00554 (RHD) | Human | U937 | PCR | Hu13K | [4] |
| E2F4 | PF02319 (Winged helix) | Human | HepG2 | PCR | Hu19K | This study |
| HNF3b | PF00250 (Forkhead) | Human | Liver | PCR | Hu19K | This study |
| HNF4a | PF00105 (NHR) | Human | Liver | PCR | Hu13K | [1] |
| HNF6 | PF02376 (CUT) | Human | Liver | PCR | Hu13K | [1] |
| Nanog | PF00046 (Homeobox) | Human | HES | Tiled oligo | Agilent 10-array | [3] |
| NeuroD1 | PF00010 (HLH) | Mouse | MIN6 | PCR | Mm13K | This study |
| Oct4 | PF00157 (POU) | Human | HES | Tiled oligo | Agilent 10-array | [3] |
| p50 | PF00554 (RHD) | Human | U937 | PCR | Hu13K | [4] |
| p52 | PF00554 (RHD) | Human | U937 | PCR | Hu13K | [4] |
| p65 | PF00554 (RHD) | Human | U937 | PCR | Hu13K | [4] |
| P-CREB | PF00170 (bZIP) | Human | HEK293T | PCR | Hu19K | [2] |
| RelB | PF00554 (RHD) | Human | U937 | PCR | Hu13K | [4] |
| Sox2 | PF00505 (HMG) | Human | HES | Tiled oligo | Agilent 10-array | [3] |

 


Supplemental Table S4.  Importance of Hypothesis Testing

| Factor | THEME, Uninformative Hypothesis: Motif | THEME, Uninformative Hypothesis: Mean 3-fold CV error | AlignACE: Rank¹ | AlignACE: Mean test error² | THEME: Mean 3-fold CV error³ |
|---|---|---|---|---|---|
| c-Rel | Not Found | 0.46 | Not Found | 0.40 | 0.34 |
| E2F4 | Not Found | 0.36 | Not Found | 0.39 | 0.34 |
| HNF3b | Not Found | 0.47 | Not Found | 0.47 | 0.39 |
| HNF4 | Not Found | 0.40 | Not Found | 0.48 | 0.30 |
| HNF6 | Found | 0.34 | Not Found | 0.50 | 0.32 |
| Nanog | Not Found | 0.45 | Not Found | 0.47 | 0.42 |
| NeuroD1 | Not Found | 0.49 | 1 | 0.44 | 0.35 |
| Oct4 | Not Found | 0.43 | 1 | 0.45 | 0.41 |
| p50 | Not Found | 0.40 | 1 | 0.32 | 0.30 |
| p52 | Not Found | 0.42 | 1 | 0.26 | 0.21 |
| p65 | Not Found | 0.45 | Not Found | 0.46 | 0.40 |
| P-CREB | Not Found | 0.43 | Not Found | 0.47 | 0.40 |
| RelB | Not Found | 0.46 | 1 | 0.33 | 0.30 |
| Sox2 | Not Found | 0.44 | 3 | 0.44 | 0.39 |

¹ Rank of the motif matching the known specificity.

² AlignACE motifs were ranked by enrichment score. THEME was used without refinement to evaluate the classification error of the top-ranked AlignACE motif. For Sox2, the motif that matched the known specificity was used in place of the top-ranked motif.

³ Cross-validation error for the THEME results shown in Table 1.

 

 

Supplemental Table S5.  NeuroD1 Results Obtained Using Hypotheses Derived from Binding Sites

| Binding Site | Initial Hypothesis | Refined Hypothesis | Optimal b | Mean 3-fold CV Error |
|---|---|---|---|---|
| CAAATG | | | 0.05 | 0.34 |
| CAGTTG | | | 0.05 | 0.32 |
| CAGGTG | | | 0.05 | 0.36 |

 


Supplemental Table S6.  Top-ranked Family Determined by THEME after Testing with Profiles from All Families¹

| Factor | PFAM Family | Hypothesis | Refined Motif | Mean 3-fold CV error | Rank |
|---|---|---|---|---|---|
| c-Rel | PF00554 (RHD) | GGrAw.yCCc | GGrAw.yCCc | 0.34 | 1 |
| E2F4 | PF02319 (Winged helix) | GCGSsAAa | GCGssAAa | 0.30 | 1 |
| HNF3b | PF00250 (Forkhead) | rYAAACAa | ryAAACA. | 0.41 | 1 |
| HNF4 | PF00105 (NHR) | TGACCTTTGACCyy | tGgmCytTGsCcy. | 0.28 | 1 |
| HNF6 | PF02376 (CUT) | cgATCGAT | srATCgAT | 0.31 | 1 |
| Nanog | PF00172 (Zn clus) | CGGm.ga. | CgG..... | 0.41 | 1 |
| Nanog | PF00046 (Homeobox) | TAATTrsy | yAAtkrsy | 0.43 | 8 |
| NeuroD1 | PF00170 (bZIP) | gcCACGTGgc | rsCAgcTGsy. | 0.38 | 1 |
| NeuroD1 | PF00010 (HLH) | cCACGTGg | sCAgcTGs | 0.41 | 4 |
| Oct4 | PF02257 (RFX) | GTTGCya.G..am | .ttgw.atg..aa | 0.40 | 1 |
| Oct4 | PF00157 (POU) | ATGCAAAT | ATGcaaAt | 0.41 | 4 |
| p50 | PF00554 (RHD) | GGGGAwTCCCC | GGGrawtyCCC | 0.22 | 1 |
| p52 | PF00554 (RHD) | GGGGAwTCCCC | GGGGAwTCCCC | 0.23 | 1 |
| p65 | PF00554 (RHD) | GGGGAwTCCCC | sggrawtyccs | 0.35 | 1 |
| P-CREB | PF00170 (bZIP) | aTGACGTCAt | .TgACGTcA. | 0.40 | 1 |
| RelB | PF00554 (RHD) | GGrAw.yCCc | GGrAw.yCCc | 0.29 | 1 |
| Sox2 | PF02376 (CUT) | cgATCGAT | racAAw.g | 0.37 | 1 |
| Sox2 | PF00505 (HMG) | AACAAWRr | AACAAwrr | 0.41 | 5 |

¹ The top-ranked motif is always shown. Where this motif derives from a family other than that of the immunoprecipitated protein, the results for the expected family are also shown. Similarities between these motifs and the top-ranked motif are indicated by the underlined letters. We excluded from the analysis the profiles marked with footnote 1 in Table S1; these derive not from binding sites but from fixed sequences flanking a region that was randomized in a selection experiment.

 

 


Supplemental Table S7.  Motifs Obtained from Hypotheses with 40% Noise

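| Protein | Refined Noisy Hypothesis | Refined Original Hypothesis | CV error | z-score |
|---|---|---|---|---|
| c-Rel | | | 0.32 | 5.5 |
| E2F4 | | | 0.39 | 12.0 |
| HNF3b | | | 0.39 | 6.2 |
| HNF4a | | | 0.28 | 9.5 |
| HNF6 | | | 0.31 | 14.6 |
| Nanog | | | 0.43 | 10.5 |
| NeuroD1 | | | 0.37 | 12.4 |
| Oct4 | | | 0.40 | 16.1 |
| p50 | | | 0.29 | 14.3 |
| p52 | | | 0.22 | 8.4 |
| p65 | | | 0.34 | 6.2 |
| P-CREB | | | 0.39 | 12.3 |
| RelB | | | 0.29 | 10.0 |
| Sox2 | | | 0.38 | 23.1 |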

 


 

References:

1. Odom DT, Zizlsperger N, Gordon DB, Bell GW, Rinaldi NJ, et al. (2004) Control of pancreas and liver gene expression by HNF transcription factors. Science 303: 1378-1381.

2. Zhang X, Odom DT, Koo SH, Conkright MD, Canettieri G, et al. (2005) Genome-wide analysis of cAMP-response element binding protein occupancy, phosphorylation, and target gene activation in human tissues. Proc Natl Acad Sci U S A 102: 4459-4464.

3. Boyer LA, Lee TI, Cole MF, Johnstone SE, Levine SS, et al. (2005) Core transcriptional regulatory circuitry in human embryonic stem cells. Cell, in press.

4. Schreiber J, Jenner R, Murray HL, Gerber GK, Gifford DK, et al. (2005) Coordinated action of NF-κB family members in the response of human cells to lipopolysaccharide. Submitted.