US Extract API - Address extraction made easy
This page describes how to use the extraction endpoint to find and validate addresses in arbitrary text input.
Contents
- HTTP request
- HTTP response
- Supplementary materials
HTTP request: URL composition
Proper URL construction is required for all API requests. Here is an example URL:
https://us-extract.api.smarty.com?auth-id=123&auth-token=abc
Here is a more granular examination of the example above:
URL components | Values | Notes |
---|---|---|
Scheme | https | NOTE: Non-secure http requests are not supported. |
Hostname | us-extract.api.smarty.com | |
Query string | ?auth-id=123&auth-token=abc | Authentication information, inputs, etc. Additional query string parameters are introduced in the next section. Note: When utilizing any of our APIs, the license parameter is optional. See License Selection for guidance. |
For additional information about URLs, please read our article about URL components.
Please note that all query string parameter values must be URL-encoded (spaces become + or %20, for example) to ensure that the data is transferred correctly. A common mistake we see is a non-encoded pound sign (#), as in an apartment number (# 409). This character, when properly encoded in a URL, becomes %23. When not encoded, this character functions as the fragment identifier, which is ignored by our API servers.
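The encoding rules above can be checked with Python's standard library. This is a minimal sketch, not part of the API itself: the token value "abc #409" is a made-up value chosen only because it contains a space and a pound sign.

```python
from urllib.parse import quote, quote_plus, urlencode, urlsplit

# quote() encodes a space as %20; quote_plus() encodes it as +.
# Both encode the pound sign as %23.
encoded_percent = quote("# 409", safe="")
encoded_plus = quote_plus("# 409")

# urlencode() applies quote_plus() to every value in a query string.
# The token value here is hypothetical, used only to show the encoding.
query = urlencode({"auth-id": "123", "auth-token": "abc #409"})

# Left unencoded, everything after '#' is parsed as the fragment,
# so it never reaches the server as part of the query string.
parts = urlsplit("https://us-extract.api.smarty.com/?auth-token=abc#409")

print(encoded_percent)
print(encoded_plus)
print(query)
print(parts.query, "|", parts.fragment)
```

Building the query string with urlencode rather than by hand avoids the unencoded pound sign mistake entirely.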
HTTP request: Supported methods/verbs
HTTP requests can be categorized according to their HTTP method. Most HTTP requests are defined using the GET method. We call these "get requests." Other common methods are PUT, POST, and DELETE.
The following methods are supported by this API:
- POST
- OPTIONS (for "pre-flight" cross-domain requests)
Note: When calling any of our APIs using "embedded key" authentication, only the HTTP GET method is allowed. Because this API does not accept GET requests, embedded keys are NOT supported in this API. With "secret key" authentication, only the HTTP POST method is allowed.
Send the text with addresses to extract as the body of the request. Set the value of the Content-Type header to text/plain; charset=utf-8. Each request body is limited to a maximum length of 64 kilobytes. Here's an example POST request submitted using the curl command:
curl -v 'https://us-extract.api.smarty.com/?auth-id=YOUR_AUTH_ID&auth-token=YOUR_AUTH_TOKEN' \
  -H 'Content-Type: text/plain; charset=utf-8' \
  --data-binary '
There are addresses everywhere.
1109 Ninth 85007
Smarty can find them.
3785 Las Vegs Av.
Los Vegas, Nevada
That is all.'
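The same request can be assembled in Python with only the standard library. This is a sketch under stated assumptions rather than an official client: build_extract_request and MAX_BODY_BYTES are illustrative names, and the 64-kilobyte limit is assumed to mean 64 * 1024 bytes of UTF-8 encoded text.

```python
import urllib.request
from urllib.parse import urlencode

EXTRACT_URL = "https://us-extract.api.smarty.com/"
# Assumption: "64 kilobytes" is interpreted as 64 * 1024 bytes.
MAX_BODY_BYTES = 64 * 1024


def build_extract_request(text, auth_id, auth_token):
    """Build (but do not send) a POST request carrying the text to scan."""
    body = text.encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError("request body exceeds the 64 kilobyte limit")
    # urlencode() takes care of URL-encoding the credential values.
    query = urlencode({"auth-id": auth_id, "auth-token": auth_token})
    return urllib.request.Request(
        f"{EXTRACT_URL}?{query}",
        data=body,
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


request = build_extract_request(
    "There are addresses everywhere.\n1109 Ninth 85007\n",
    "YOUR_AUTH_ID",
    "YOUR_AUTH_TOKEN",
)
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it. The sketch stops short of that so it can be run without credentials.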
HTTP request: Input fields
Along with the body of your POST request (which is the input string from which to extract addresses), several other parameters affect address extraction behavior. These parameters, which are submitted as query string parameters, are detailed in the table below:
Name | Default | Description |
---|---|---|
html | derived | HTML input is automatically detected and stripped, but you can manually specify whether your input is formatted as HTML by setting this to true or false. |
aggressive | false | Aggressive mode may use more lookups on your account, but it can find addresses in populous cities without needing a state and ZIP Code, as well as finding addresses in some messy inputs. |
addr_line_breaks | true | This parameter specifies if addresses in your input will ever have line breaks. |
addr_per_line | 0 | Limits the extractor to a certain number of addresses per input line. Generally, you will not need this parameter unless you are submitting structured data that you know will only have a certain number of addresses per line. Set to 0 (default) for no limit. |
license | derived | Specifies the license or licenses (comma separated) to use for this lookup. Valid values can be found in your account's Subscriptions page. If multiple licenses are specified, they are considered in left to right order. |
match | strict | The match output strategy to be employed for this lookup. See more here. |
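As an illustration of how these parameters combine with the credentials in one query string, here is a small sketch. The helper name build_query is hypothetical. It assumes that boolean parameters are sent as the lowercase strings true and false, as the table describes for html, and that leaving a parameter out lets the documented default apply.

```python
from urllib.parse import urlencode


def build_query(auth_id, auth_token, **options):
    """Return a URL-encoded query string for the extraction endpoint."""
    params = {"auth-id": auth_id, "auth-token": auth_token}
    for name, value in options.items():
        if value is None:
            continue  # omit the parameter so the default is used
        if isinstance(value, bool):
            # Assumption: booleans are written as lowercase true/false.
            value = "true" if value else "false"
        params[name] = value
    return urlencode(params)


query = build_query(
    "123",
    "abc",
    aggressive=True,
    addr_line_breaks=False,
    addr_per_line=1,
    html=None,  # left out; HTML detection stays automatic
)
print(query)
```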
HTTP request: Headers
You must include the following required HTTP headers in all requests:
Header | Description | Example |
---|---|---|
Content-type | The purpose of the Content-type field is to describe the data contained in the body fully enough that the receiving user agent can pick an appropriate agent or mechanism to present the data to the user, or otherwise deal with the data in an appropriate manner. | Content-type: text/plain; charset=utf-8 |
Host | The Host request header field specifies the internet host and port number of the resource being requested. | Host: us-extract.api.smarty.com |
HTTP response: Status codes and results
Responses will have a status header with a numeric value. This value is what you should check for when writing code to parse the response. The only response body that should be read and parsed is a 200 response.
Status code | Response and explanation |
---|---|
401 | Unauthorized: The credentials were provided incorrectly or did not match any existing, active credentials. |
402 | Payment required: There is no active subscription for the account associated with the credentials submitted with the request. |
400 | Bad request (malformed payload): The request body was blank or otherwise malformed. |
422 | Unprocessable entity: Returns errors describing what needs to be corrected. |
429 | Too many requests: When using public embedded key authentication, we restrict the number of requests coming from a given source over too short of a time. If you use embedded key authentication, you can avoid this error by adding your IP address as an authorized host for the embedded key in question. |
413 | Request entity too large: The request body was larger than 64 kilobytes. |
200 | OK (success!): The response body is a JSON object containing metadata about the results and zero or more extracted addresses from the input provided with the request. See the annotated example below for details. |
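One way to follow the advice of parsing only a 200 response is sketched below. The names ExtractError and parse_extract_response are illustrative and not part of any Smarty library. The messages are shortened from the table above.

```python
import json

# Short summaries of the documented non-200 status codes.
STATUS_MEANINGS = {
    400: "Bad request (malformed payload)",
    401: "Unauthorized",
    402: "Payment required",
    413: "Request entity too large",
    422: "Unprocessable entity",
    429: "Too many requests",
}


class ExtractError(Exception):
    """Raised for any response whose status code is not 200."""

    def __init__(self, status, meaning):
        super().__init__(f"{status}: {meaning}")
        self.status = status


def parse_extract_response(status, body):
    """Parse the body only on 200; otherwise raise without reading it."""
    if status == 200:
        return json.loads(body)
    raise ExtractError(status, STATUS_MEANINGS.get(status, "Unexpected status code"))


result = parse_extract_response(200, '{"meta": {"address_count": 0}, "addresses": []}')
print(result["meta"]["address_count"])
```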
HTTP response: An annotated example
Rather than writing your own code to parse the JSON response, we recommend using a tried and tested JSON parser that is specific to your programming language. There is a comprehensive list of such tools (as well as the complete JSON specification) at json.org.
NOTE: Any returned fields that are not defined within this document should be considered experimental and may be changed or discontinued at any time without notice.
curl -v 'https://us-extract.api.smarty.com?auth-id=YOUR+AUTH-ID+HERE&auth-token=YOUR+AUTH-TOKEN+HERE' \
  -H 'Content-Type: text/plain; charset=utf-8' \
  --data-binary '
<div>
<p>
Meet me at 5732 Lincoln Drive Minneapolis MN
</p>
</div>'
The above sample request yields the following JSON output. NOTE: We have modified the output with // comment statements (which are actually NOT valid JSON) as minimal documentation. Also, it is important to notice that the api_output field has structural parity with the response of the address verification endpoint:
{
  "meta":{
    // How many total lines of input were received?
    "lines":6,
    // Did the text have unicode characters or was it plain ASCII?
    "unicode":false,
    // How many addresses were found in the input?
    "address_count":1,
    // How many of the found addresses were valid?
    "verified_count":1,
    // Length of the input in bytes:
    "bytes":53,
    // Length of the input in characters:
    "character_count":53
  },
  // Array of addresses extracted from the input.
  "addresses":[
    {
      // The actual input text:
      "text":"5732 Lincoln Drive Minneapolis MN",
      // Was this address verified successfully?
      "verified":true,
      // The starting line of the 'text' in the input:
      "line":4,
      // The starting character index of the 'text':
      "start":16,
      // The ending character index of the text:
      "end":49,
      // The actual response from the US Street API:
      "api_output":[
        {
          "candidate_index":0,
          "delivery_line_1":"5732 Lincoln Dr",
          "last_line":"Minneapolis MN 55436-1608",
          "delivery_point_barcode":"554361608327",
          "components":{
            "primary_number":"5732",
            "street_name":"Lincoln",
            "street_suffix":"Dr",
            "city_name":"Minneapolis",
            "state_abbreviation":"MN",
            "zipcode":"55436",
            "plus4_code":"1608",
            "delivery_point":"32",
            "delivery_point_check_digit":"7"
          },
          "metadata":{
            "record_type":"S",
            "zip_type":"Standard",
            "county_fips":"27053",
            "county_name":"Hennepin",
            "carrier_route":"C009",
            "congressional_district":"03",
            "rdi":"Commercial",
            "elot_sequence":"0035",
            "elot_sort":"A",
            "latitude":44.90127,
            "longitude":-93.40045,
            "precision":"Zip9",
            "time_zone":"Central",
            "utc_offset":-6,
            "dst":true
          },
          "analysis":{
            "dpv_match_code":"Y",
            "dpv_footnotes":"AABB",
            "dpv_cmra":"N",
            "dpv_vacant":"N",
            "active":"Y",
            "footnotes":"N#"
          }
        }
      ]
    }
  ]
}
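Once the response has been parsed by a JSON library, walking the structure is straightforward. The sketch below uses a trimmed copy of the sample output (comments removed, most fields omitted). The function name verified_addresses is illustrative. It assumes only the fields shown in the annotated example.

```python
def verified_addresses(response):
    """Collect (input text, delivery line, last line) for verified matches."""
    results = []
    for found in response.get("addresses", []):
        if not found.get("verified"):
            continue  # skip text that looked like an address but did not verify
        for candidate in found.get("api_output", []):
            results.append(
                (found["text"], candidate["delivery_line_1"], candidate["last_line"])
            )
    return results


# Trimmed copy of the annotated example, as a parsed Python dict.
sample = {
    "meta": {"lines": 6, "address_count": 1, "verified_count": 1},
    "addresses": [
        {
            "text": "5732 Lincoln Drive Minneapolis MN",
            "verified": True,
            "line": 4,
            "api_output": [
                {
                    "candidate_index": 0,
                    "delivery_line_1": "5732 Lincoln Dr",
                    "last_line": "Minneapolis MN 55436-1608",
                }
            ],
        }
    ],
}

for text, line1, last_line in verified_addresses(sample):
    print(f"{text!r} -> {line1}, {last_line}")
```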
Subscription usage
With the extraction endpoint, the usage on your subscription varies depending on your input. One request to the extraction API will use zero or more lookups on your subscription. Aggressive mode will probably use more lookups, but it may find more addresses.
Credits
The US Extract API is brought to you, in part, by the following source code package:
github.com/glenn-brown/golang-pkg-pcre
Copyright (c) 2011 Florian Weimer. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.