Man page of BULK_EXTRACTOR
BULK_EXTRACTOR
Section: User Commands (1)
Updated: MAY 2010
Index of this MAN page
Back To MAN Pages From BackTrack 5 R1 Master List
NAME
bulk_extractor - Scans a disk image for regular expressions and other content.
SYNOPSIS
bulk_extractor -o
output_dir
[options]
image
DESCRIPTION
bulk_extractor
scans a disk image (or any other file) for a large number of pre-defined regular expressions and
other kinds of content. These items are called
features.
When it finds a feature,
bulk_extractor
writes the output to an output file. Each line of the output file
contains a byte offset at which the feature was found, a tab, and the actual
feature. Features therefore cannot contain the end-of-line character.
bulk_extractor
includes native support for EnCase (.E01) and AFFLIB (.aff) files, if
it compiled and linked on a system containing those libraries.
bulk_extractor
is multi-threaded. By specifying the
-j
option, multiple copies of the program can be run. Each thread writes
its results into its own feature file. The files are then combined by
the primary thread when all of the secondary threads complete.
bulk_extractor
is a two-phase program. In phase 1 the features are extracted. In
phase 2 a histogram is created of relevant features.
bulk_extractor
will also create a
wordlist
of all the words that are found in the disk image. This can be used as
a dictionary for cracking encryption.
The options are as follows:
- -o outdir
-
Specifies the output directory, which will be created by
bulk_extractor.
- -b bannerfile.txt
-
Read the contents of
bannerfile.txt
and stamp it at the beginning of each output file. This might be useful if
you have some kind of privacy banner that needs to be stamped at the
top of all of your files.
- -r alert_list.txt
-
Specifies an
alert list,
(or red list), which is a list of terms that, if found, will be specifically flagged
in a special
alert file
that begins with the letters
ALERT.
The alert list may contain individual terms, which
must be found in their entirity and are case-sensitive, or wildcards with standard Unix
globbing (e.g. *@company.com). Globbed terms are case-insensitive.
- -w stop_list.txt
-
Specifies a
stop list,
(or white list),
which is a list of terms that, if found, will be placed in a special
stopped
file (rather than in the main file). The whitelist may also contain
globbed terms.
- -s context_stoplist.txt
-
Specifies a
context-dependent stoplist,
which is a list of tokens to be stopped (but only when they have a
particular context).
- -p path/format
-
Open a disk image and print the information found at
path.
The
format
specification may be
r
for raw output and
h
for hex output.
- -q nn
-
Quiet mode. Only prints every
nn
status reports.
These commands are useful for tuning operation:
- -g n
-
Specifies the size of the margin in bytes.
- -j n
-
Use
n
threads for analysis.
- -Wn1:n2
-
The
scan_wordlist
scanner should only extract words that are between
n1
and
n2
characters in length.
The following commands are useful for debugging:
- -V
-
Print the version number
- -c
-
Enable crash protection.
- -R outdir
-
Restarts the program from where it left off for a particular directory.
- -dN
-
Enable debugging level
N.
- -B nn
-
Set the dedup Bloom filter to
nn
bits.
- -M nn
-
Specifies a maximum recursion depth of
nn.
- -z pagenum
-
Start on page number
pagenum.
- -Y offset
-
Start at offset
- offset.
-
HISTORY
bulk_extractor
is based on a feature extractor and named entity
recognizer developed for SBook in 1989. The feature extractor was
repurposed for disk images in 2003. The stand-alone
bulk_extractor
program was rewritten in 2005 and publicly released in 2007. The
multi-threaded
bulk_extractor
was released in May 2010.
AUTHOR
Simson Garfinkel <simsong@acm.org>
Index
- NAME
-
- SYNOPSIS
-
- DESCRIPTION
-
- HISTORY
-
- AUTHOR
-
This document was created by
man2html,
using the manual pages.
Time: 07:34:21 GMT, September 13, 2011