The YML format and data parsing. Comparing JSON and YAML

Stanislav Shashalevich

Content Parser is our universal, advanced solution for parsing catalogs, pages, and RSS feeds. It might seem there is nothing more to ask of this module. Not so. Our clients do not stand still and constantly ask us to develop the solution further, and we are only happy about that. We can now announce that we have fulfilled another very important customer request: parsing XML files. The parser now works not only with the rss, page, and catalog data types, but also with xml. Most importantly, this functionality does not affect the cost of the solution: the price of 14,990 rubles remains unchanged.

Parsing XML files makes it possible to parse YML files, a format that is very useful for online stores. That is why the xml parser is configured by default for parsing YML feeds. At this point our customers may ask: how does your YML import differ from similar solutions in the Marketplace? Here are some of the advantages of our module over its analogues:

  • the ability to convert currencies
  • the ability to change prices
  • the ability to edit the names and properties of products
  • the ability to specify default properties
  • the ability to authorize on a third-party server
  • various actions on items that are missing from the current feed (do nothing, delete, deactivate)
  • automatic text translation
  • periodic launch (agents, cron)
  • the ability to specify which fields and properties to update
  • the ability to use a proxy server

Compared with catalog parsing, xml parsing is simpler even at first glance: there are fewer tabs, fields, and other settings. Loading is also faster, because there are not many heavy requests to third-party sites.

The essence of parsing remains the same: the XML file is processed using selectors and attributes. So if you have already configured a catalog parser, setting up the new parser type will be easy.

Now let's take a closer look at the functionality of the new data type:

Parser tab:

Parser type - the type of parser: rss, page, catalog, or xml.

Parser mode - the mode in which the parser works. There are two modes: debug and work. Debug is the default, and it is the mode in which you should configure the parser. In debug mode, only the first 30 elements of the XML file are parsed.

Note that in the trial version of the "Content Parser" module, the parser works only in debug mode.

Additional URLs of XML files - you can include other XML file URLs in the import. Enter each one on a new line.

Infoblock-catalog ID - the infoblock into which sections and products will be loaded.

Section ID - the section of the infoblock into which sections and products will be loaded.

Number of products processed in one parser step - the number of products that the parser processes in one step. Default: 300.

A parser step is a concept that applies when the parser is launched manually. In that case, the parser disconnects from the feed and reconnects at each step. Adjust this value to the capabilities of your hosting. If the parser is launched from an agent (cron), the step is ignored and the import is done in a single request.

Active, Sort, Name, Last run time - self-explanatory fields that need no comment.

Encoding - the encoding of the XML file. Deprecated field. The encoding is currently detected automatically, but if you run into encoding problems, specify it manually.

Basic settings tab - Categories

Example XML file for categories:

Category name attribute selector - specifies the path to the category name. If empty, the name is taken from the value of the category element itself.

Attribute selector containing category id - the path to the category id.

Attribute selector containing the id of the parent category - to nest sections, you must specify the path to the parent category id.
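As a rough illustration of what these three selectors point at, here is a minimal Java sketch. It assumes a YML-style fragment in which each `category` element carries an `id` attribute, an optional `parentId` attribute, and the name as element text. The class name `CategoryDemo` and the sample data are invented for the example; this is not the module's own code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class CategoryDemo {
    // Hypothetical sample fragment; a real feed would be loaded from a URL or file.
    public static final String XML =
          "<categories>"
        + "<category id=\"1\">Books</category>"
        + "<category id=\"2\" parentId=\"1\">Fiction</category>"
        + "</categories>";

    /** Maps each category id to its name, noting the parent id when one is present. */
    public static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("category");
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element category = (Element) nodes.item(i);
            String id = category.getAttribute("id");           // category id
            String parent = category.getAttribute("parentId"); // empty string when absent
            String name = category.getTextContent().trim();    // name taken from the element value
            result.put(id, parent.isEmpty() ? name : name + " (parent " + parent + ")");
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        parse(XML).forEach((id, label) -> System.out.println(id + ": " + label));
    }
}
```

Here the empty name selector case corresponds to reading the element text, and the parent id is what lets the sections be nested.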

Basic settings tab - Products

An example XML file for products:

Product specific selector - the path to the container of a specific product.

Attribute selector containing product id - the path to the product id.

Product name attribute selector - the path to the product name.

Price attribute selector - the container that holds the product price.

Description attribute selector - the container that holds the product description.

Image preview attribute selector - the path to the preview image.

Detailed image attribute selector - the path to the detailed image.
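To make the product selectors more concrete, here is a small Java sketch. It assumes a YML-style feed in which each product is an `offer` element with an `id` attribute and child elements `name`, `price`, `description`, and `picture`. The class `OfferDemo`, the helper `text`, and the sample data are hypothetical, and a real feed may use different element names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class OfferDemo {
    // Hypothetical sample feed fragment with two products.
    public static final String XML =
          "<offers>"
        + "<offer id=\"101\"><name>Basketball</name><price>450</price>"
        + "<description>Indoor ball</description>"
        + "<picture>http://example.com/ball.jpg</picture></offer>"
        + "<offer id=\"102\"><name>Super Hoop</name><price>2392</price></offer>"
        + "</offers>";

    // Text of the first descendant element with the given tag, or "" if it is missing.
    static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? "" : list.item(0).getTextContent().trim();
    }

    /** Reads every offer into a field-name to value map. */
    public static List<Map<String, String>> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList offers = doc.getElementsByTagName("offer"); // product container selector
        List<Map<String, String>> result = new ArrayList<>();
        for (int i = 0; i < offers.getLength(); i++) {
            Element offer = (Element) offers.item(i);
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("id", offer.getAttribute("id"));
            fields.put("name", text(offer, "name"));
            fields.put("price", text(offer, "price"));
            fields.put("description", text(offer, "description"));
            fields.put("picture", text(offer, "picture"));
            result.add(fields);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        parse(XML).forEach(System.out::println);
    }
}
```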

Properties tab

Additional pictures property - if there are additional images, specify the property into which they will be uploaded.

Selector and attribute listing the additional pictures - the selector and attribute of the additional pictures, for example: picture. Specified relative to the product selector.

Property default values - property values that are filled in automatically when products are created.

Parsing by selector - you can specify a particular property selector located inside the product selector in the XML. For example: vendor, barcode.

Delete characters - you can also remove extra characters from properties (units of measurement, etc.).

Parsing properties and automatic creation - lets you automatically create, fill in, and update the properties listed in the XML file.

In this case, properties are matched by name.

Automatic property generation - if the checkbox is checked, a missing property is created. If the property already exists

Property enumeration attribute selector - a general selector that contains the information about a property.

Property name attribute selector - the path to the property name. Remember that this is an important parameter, because in this case properties are matched by exactly this value.

Property value attribute selector - the path to the property value. If nothing is set, the value is taken directly from the property selector.

Select the type of properties to create - properties that do not exist yet will be created. You must select the type of the new properties: List or String.

Delete characters - lets you remove extra characters from properties.

Adding/removing field and property characters - lets you add and remove characters in product names and in product properties.

The Trade catalog, Additional settings, Updates/uniqueness, Logs, and Video instructions tabs are identical to those of the catalog parser type, so we will not cover them in detail.

Trading Catalog tab

The tab lets you work with prices flexibly:

  • specify price and currency parameters
  • convert currency
  • change prices
  • round prices
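The three price operations can be pictured as a small pipeline. The Java sketch below is only an illustration under assumptions: the method `adjust`, its parameters, and the sample rate and markup are invented here, and the module's actual settings and rounding rules may differ.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class PriceDemo {
    /**
     * Converts a source price, applies a percentage markup, and rounds the result.
     *
     * @param price         price in the source currency
     * @param rate          target currency units per one source unit (assumed input)
     * @param markupPercent markup in percent, e.g. 10 for +10%
     * @param scale         number of decimal places to keep
     */
    public static BigDecimal adjust(BigDecimal price, BigDecimal rate,
                                    BigDecimal markupPercent, int scale) {
        BigDecimal converted = price.multiply(rate);                            // convert currency
        BigDecimal factor = BigDecimal.ONE.add(markupPercent.movePointLeft(2)); // 10 -> 1.10
        return converted.multiply(factor)                                       // change price
                        .setScale(scale, RoundingMode.HALF_UP);                 // round price
    }

    public static void main(String[] args) {
        // 450 at a made-up rate of 1.5 with a 10% markup, rounded to whole units: prints 743
        System.out.println(adjust(new BigDecimal("450"), new BigDecimal("1.5"),
                                  new BigDecimal("10"), 0));
    }
}
```

`BigDecimal` is used instead of `double` so that the rounding step behaves predictably for money values.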

Advanced settings tab:

Update/Unique tab:

The tab lets you set the uniqueness parameters and configure which product fields are updated.

  1. Introduction. Brief description of Yandex XML parsing technology. Application options: import from other store engines + creation of stores (or product sections) to earn money on affiliate programs.
  1. Solution of the FIRST problem: automated transfer of goods from the old store to the new one
  2. Installing a test store on your own
  3. Setting up hosting (free period) to install the store
  • Installing the store with the installer
  • Logging in to the admin panel of the installed store and the first steps to activate the store
    1. Clearing demo content from the test store: deleting posts, categories, tags, pages, slides, banners, menus
    1. Installing the parser plugin through the plugins admin panel
    2. Creating a project for parsing
    3. A brief overview of what Yandex XML format is
    4. Populate project data, run YML source analysis
    5. Starting the import
    6. Explanation of the difference between the free version of the parser (limited to 100 products) and the paid one (unlimited)
    7. Review of the parsing results, with attention to a mistake that was made
    8. Import rollback: delete imported content, check that everything is deleted.
    9. Re-import, review of results: headings, entries.
    10. Overview of the imported record in the admin panel: title, description, product price, thumbnail
    11. Overview of transferred products on the site frontend: the product category archive and a single product page
    1. Creating the test store's home page, with a showcase. The test store is ready! The store prototype, based on your products from the old store, can be tested.
    1. The WP Shop studio's service for transferring content from the old store to the new one
    2. Brief description of the service
    3. Clearing the test store to import the sample file that the customer receives after the service has been rendered
    4. Import sample file with built-in WordPress importer
    5. Overview of import results
    6. Additional information about the service. We will solve any difficulties.
    1. Solving the SECOND task: creating a pseudo-shop (or a section with goods) for selling goods from other online stores
    2. A general overview of situations where you want or need to place affiliate products on your website or store.
    3. The main difficulty is automating the transfer of goods and the periodic updating of the assortment. The plugin solves all these problems
    1. Practical case: placing a partner's product in our store
    2. One of the advantages of the themes from the WP Shop studio: replacing the click action of the “buy” button in the case of an affiliate link
    3. MANDATORY BACKUP before importing third-party products by the parser
    4. Installing the parser, explaining the difference between a free parser and a paid one. Demonstration of the parser capabilities on the paid version
    1. Editing the scraping template to add affiliate links
    2. Parsing template editor overview: entry content area, additional fields area
    3. Compiling an affiliate link in the parsing template editor
    4. Parsing launch, results review: new headings, new products. Overview of a new product.
    5. Demonstration of what happens when the "buy" button is clicked: the visitor goes to the website of the supplier store.
    1. Updating data and synchronizing the assortment with the source store
    2. Updates when prices or the assortment at the source store change. Automation saves a huge amount of time and effort!
    3. Reaction to price changes: update example, result overview
    1. Reaction to the removal of goods from the source: update example, result overview. The product is not deleted but is switched to "out of stock"
    1. Reaction to new products in the source: update example, result overview
    1. Updating products automatically on a schedule via server cron.
    2. Update URL overview: …/wp-admin/tools.php?iy-ajax&iy-project-id=1&iy-project-action=update
    3. Configuring cron on HostLand hosting: launch command syntax and setting the launch frequency
    1. Results of the job triggered from cron: viewing the results
    1. Importing goods from three different sources
    2. Copying a template from a previous project
    3. Changing the affiliate link structure
    4. Launching the import of goods from the second store. Overview of the XML source of the second store. Viewing Import Results
    1. Explanations on importing from the "param" fields: they are automatically written to custom fields
    1. Overview of import results from the second store
    2. Overview of the logic for displaying “related products” in a product record
    1. Import from third store
    2. An overview of the features of the XML feed from the partner aggregator Mixmarket.biz to configure its parsing
    3. Editing the import template for the third store
    4. Starting the import
    5. Review of results
    1. Correction of errors revealed after parsing
    2. Deleting imported content
    3. Changing the affiliate link, deleting an extra parameter
    4. IMPORTANT information on the risk of pessimization by search engines when content is transferred directly from other sites: you need to mark the transferred content NOINDEX and NOFOLLOW!
    5. Explanation of why you need to exclude imported goods from indexing on your site
    6. An explanation of how to build a page from other people's products while increasing its originality, by "mixing" products from different affiliate programs with articles and other content
    7. Explanation of how to technically exclude an imported product record from indexing, using the “robots” meta tag and the Platinum SEO plugin
    8. Correcting the import template to set a ban on search-engine indexing for all imported products
    9. Starting the import and viewing the result. We make sure that all imported records are excluded from indexing. The risk of pessimization for plagiarism is reduced (removed).
    1. Conclusion. The WP Shop parser is a handy tool for moneymakers. The WP Shop team will support everyone, but primarily those users who use paid products or services.
    2. An example of parsing a catalog of 14,000 products on a "powerful" server. For those who want to do parsing on an "industrial scale", there are additional services: installation and configuration of servers and custom improvements to the parser.

    The day came when the configuration files for our application had grown so large that the managers hinted that the JSON configs contained suspiciously many curly and non-curly braces, and that they would like to get rid of them. A subtle hint was dropped that it would be nice to take a closer look at YAML, because rumor has it that it is very human-readable. And there are no brackets. And the lists are beautiful. Naturally, we could not ignore our elders, so we had to study the question and look at the differences and the pros and cons of both formats. Obviously, such comparisons are made only to confirm the leaders' opinion; and even if it is not confirmed, they will find a reason why they are right and why the change is worth making :)

    I am sure many readers are familiar with these formats, but I will still give a short description from Wikipedia:

    JSON (JavaScript Object Notation) is a text-based data exchange format based on JavaScript and commonly used with that language. Like many other text formats, JSON is easy for humans to read. Despite its origins in JavaScript (more precisely, in a subset of the language of the 1999 ECMA-262 standard), the format is considered language-independent and can be used with almost any programming language. For many languages, there is ready-made code for creating and processing data in JSON format.

    YAML is a human-readable data serialization format, conceptually close to markup languages, but focused on the convenience of input/output of the typical data structures of many programming languages. The name YAML is a recursive acronym for YAML Ain't Markup Language ("YAML is not a markup language"). The name reflects the history of development: in the early stages the language was called Yet Another Markup Language ("Another markup language") and was even considered a competitor to XML, but it was later renamed in order to focus on data rather than on document markup.

    So, here is what we need to do:

    • create the same complex document in JSON and in YAML
    • determine the parameters by which we will compare them
    • deserialize each into Java objects 30 times
    • compare the speed results
    • compare the readability of the files
    • compare how convenient each format is to work with

    Obviously, we are not going to write our own parsers, so to begin with, we will choose an existing parser for each format.
    For JSON we will use gson (from Google), and for YAML, snakeyaml (from don't-know-who).

    As you can see, everything is simple: you just need to create a fairly complex model that simulates the complexity of the config files, and write a module that tests the YAML and JSON parsers. Let's get started.
    We need a model of approximately this complexity: 20 attributes of different types + 5 collections of 5-10 elements + 5 nested objects with 5-10 elements and 5 collections each.
    This stage of the comparison can safely be called the most tedious and uninteresting. Classes were created with unremarkable names like Model, Embedded1, and so on. But we are not chasing code readability (at least in this part), so we will leave it like that.

    file.json

    "embedded2": {
      "strel1": "el1",
      "strel2": "el2",
      "strel4": "el4",
      "strel5": "el5",
      "strel6": "el6",
      "strel7": "el7",
      "intel1": 1,
      "intel2": 2,
      "intel3": 3,
      "list1": [ 1, 2, 3, 4, 5 ],
      "list2": [ 1, 2, 3, 4, 5, 6, 7 ],
      "list3": [ "1", "2", "3", "4" ],
      "list4": [ "1", "2", "3", "4", "5", "6" ],
      "map1": { "3": 3, "2": 2, "1": 1 },
      "map2": { "1": "1", "2": "2", "3": "3" }
    }


    file.yml

    embedded2:
      intel1: 1
      intel2: 2
      intel3: 3
      list1:
      - 1
      - 2
      - 3
      - 4
      - 5
      list2:
      - 1
      - 2
      - 3
      - 4
      - 5
      - 6
      - 7
      list3:
      - "1"
      - "2"
      - "3"
      - "4"
      list4:
      - "1"
      - "2"
      - "3"
      - "4"
      - "5"
      - "6"
      map1:
        "3": 3
        "2": 2
        "1": 1
      map2:
        1: "1"
        2: "2"
        3: "3"
      strel1: el1
      strel2: el2
      strel4: el4
      strel5: el5
      strel6: el6
      strel7: el7


    I agree that human readability is a rather subjective parameter. But still, in my opinion, yaml is a little more pleasing to the eye and more intuitive.

    yaml parser

    import org.yaml.snakeyaml.DumperOptions;
    import org.yaml.snakeyaml.Yaml;
    import org.yaml.snakeyaml.error.YAMLException;

    import java.io.FileInputStream;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.InputStream;

    public class BookYAMLParser implements Parser {
        private final String filename;

        public BookYAMLParser(String filename) {
            this.filename = filename;
        }

        @Override
        public void serialize(Book book) {
            // Block style gives the indented, bracket-free YAML layout
            DumperOptions options = new DumperOptions();
            options.setDefaultFlowStyle(DumperOptions.FlowStyle.BLOCK);
            Yaml yaml = new Yaml(options);
            try (FileWriter writer = new FileWriter(filename)) {
                yaml.dump(book, writer);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        @Override
        public Book deserialize() {
            try (InputStream input = new FileInputStream(filename)) {
                Yaml yaml = new Yaml();
                return (Book) yaml.load(input);
            } catch (IOException e) {
                e.printStackTrace();
            } catch (YAMLException e) {
                // Add the file name to parser errors; unchecked so the signature stays as is
                throw new RuntimeException("Exception in file " + filename + ", " + e.getMessage(), e);
            }
            return null;
        }
    }

    json parser

    import com.google.gson.Gson;
    import com.google.gson.GsonBuilder;
    import com.google.gson.stream.JsonReader;

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;

    public class BookJSONParser implements Parser {
        private final String filename;

        public BookJSONParser(String filename) {
            this.filename = filename;
        }

        @Override
        public void serialize(Book book) {
            Gson gson = new GsonBuilder().setPrettyPrinting().create();
            try (FileWriter writer = new FileWriter(filename)) {
                writer.write(gson.toJson(book));
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        @Override
        public Book deserialize() {
            Gson gson = new Gson();
            try (JsonReader jsonReader = new JsonReader(new BufferedReader(new FileReader(filename)))) {
                return gson.fromJson(jsonReader, Book.class);
            } catch (IOException e) {
                e.printStackTrace();
            }
            return null;
        }
    }

    As we can see, both formats are supported in Java. But for JSON the choice of libraries is much wider, that's for sure.
    The parsers are ready; now let's look at the implementation of the comparison. Here, too, everything is simple and obvious: there is a simple method that deserializes objects from a file 30 times. For anyone interested, the code is below.

    testing code

    import org.apache.commons.lang3.time.StopWatch; // assumed: the Apache Commons Lang StopWatch

    public class ParserBenchmark { // class name assumed; the original shows only main()
        private static final int LOOPS = 30; // each file is deserialized 30 times

        public static void main(String[] args) {
            String jsonFilename = "file.json";
            String yamlFilename = "file.yml";

            // Write the same object to both files first
            BookJSONParser jsonParser = new BookJSONParser(jsonFilename);
            jsonParser.serialize(new Book(new Author("name", "123-123-123"), 123, "dfsas"));
            BookYAMLParser yamlParser = new BookYAMLParser(yamlFilename);
            yamlParser.serialize(new Book(new Author("name", "123-123-123"), 123, "dfsas"));

            // json deserialization
            StopWatch stopWatch = new StopWatch();
            stopWatch.start();
            for (int i = 0; i < LOOPS; i++) {
                Book e = jsonParser.deserialize();
            }
            stopWatch.stop();
            System.out.println("json worked: " + stopWatch.getTime());
            stopWatch.reset();

            // yaml deserialization
            stopWatch.start();
            for (int i = 0; i < LOOPS; i++) {
                Book e = yamlParser.deserialize();
            }
            stopWatch.stop();
            System.out.println("yaml worked: " + stopWatch.getTime());
        }
    }

    As a result, we get the following:
    json worked: 278
    yaml worked: 669

    As you can see, JSON files are parsed about 2.4 times faster. But at our scale the absolute difference is not critical, so this is not a strong argument in favor of JSON.
    This happens because JSON is parsed on the fly: it is read character by character and stored in the object immediately, so the object is built in a single pass through the file. Actually, I do not know exactly how this parser works, but that is the general scheme.
    YAML, in turn, is more deliberate. Processing is divided into three stages. First, a tree of objects is built. Then it is transformed in some way. Only after that is it converted into the required data structures.
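    One caveat about numbers like these: a 30-iteration loop on the JVM also measures class loading and JIT warm-up, so the first iterations weigh heavily on the total. The sketch below shows one possible way to separate that out, using a few untimed warm-up runs before the timed loop. The class `TimingDemo` and the method `timeMillis` are names invented for this example, not part of the original test, and the stand-in workload would be replaced by `parser.deserialize()`.

```java
public class TimingDemo {
    /**
     * Runs the task `warmup` times without timing, then `loops` times with timing.
     * Returns the elapsed time of the timed loop in milliseconds.
     */
    public static long timeMillis(Runnable task, int warmup, int loops) {
        for (int i = 0; i < warmup; i++) {
            task.run(); // untimed: lets classes load and the JIT kick in
        }
        long start = System.nanoTime();
        for (int i = 0; i < loops; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Stand-in workload; in the comparison this would call a parser's deserialize()
        Runnable work = () -> Integer.toString(12345).hashCode();
        long ms = timeMillis(work, 5, 30);
        System.out.println("30 loops took " + ms + " ms");
    }
}
```

    This does not change the conclusion by itself; it only makes the two measurements easier to compare.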

    A small comparative table ("+" - advantage, "-" - lag, "+-" - no clear advantage):

    How can this be summed up?
    Everything is obvious here: if speed matters to you, choose JSON; if human readability matters, choose YAML. You just need to decide which is more important. For us it turned out to be the second.
    In fact, there are many more arguments in favor of each format, but I think these two points are the most important.

    Later, when working with YAML, I had to deal with rather ugly exception handling, especially for syntax errors. I also had to test various YAML libraries, and in the end some kind of validation had to be written. We tried schema validation (which required calling Ruby gems) and bean validation based on JSR-303. If any of these topics interests you, I will be happy to answer questions.
    Thank you for your attention :)

    P.S.
    While finishing this article, I came across the following comparison of YAML and JSON.

    (PECL yaml >= 0.4.0)

    yaml_parse - Parses a YAML stream

    Description

    yaml_parse (string $input [, int $pos = 0 [, int &$ndocs [, array $callbacks = NULL ]]]) : mixed

    Converts all or part of a YAML document stream to a PHP variable.

    Parameter List

    input - the string to parse as a YAML stream.

    pos - the document to extract from the stream (-1 for all documents, 0 for the first document, ...).

    ndocs - if ndocs is provided, it is filled with the number of documents found in the stream.

    Return Values

    Returns the value encoded in input as the appropriate PHP type, or FALSE on failure. If pos is -1, an array is returned containing one entry for each document found in the stream.

    Examples

    Example #1 yaml_parse() usage example

    <?php
    $yaml = <<<EOD
    ---
    invoice: 34843
    date: "2001-01-23"
    bill-to: &id001
      given: Chris
      family: Dumars
      address:
        lines: |-
          458 Walkman Dr.
                  Suite #292
        city: Royal Oak
        state: MI
        postal: 48046
    ship-to: *id001
    product:
    - sku: BL394D
      quantity: 4
      description: Basketball
      price: 450
    - sku: BL4438H
      quantity: 1
      description: Super Hoop
      price: 2392
    tax: 251.420000
    total: 4443.520000
    comments: Late afternoon is best. Backup contact is Nancy Billsmer @ 338-4338.
    ...
    EOD;

    $parsed = yaml_parse($yaml);
    var_dump($parsed);
    ?>

    The result of running this example will be something like this:

    array(8) {
      ["invoice"]=>
      int(34843)
      ["date"]=>
      string(10) "2001-01-23"
      ["bill-to"]=>
      &array(3) {
        ["given"]=>
        string(5) "Chris"
        ["family"]=>
        string(6) "Dumars"
        ["address"]=>
        array(4) {
          ["lines"]=>
          string(34) "458 Walkman Dr.
            Suite #292"
          ["city"]=>
          string(9) "Royal Oak"
          ["state"]=>
          string(2) "MI"
          ["postal"]=>
          int(48046)
        }
      }
      ["ship-to"]=>
      &array(3) {
        ["given"]=>
        string(5) "Chris"
        ["family"]=>
        string(6) "Dumars"
        ["address"]=>
        array(4) {
          ["lines"]=>
          string(34) "458 Walkman Dr.
            Suite #292"
          ["city"]=>
          string(9) "Royal Oak"
          ["state"]=>
          string(2) "MI"
          ["postal"]=>
          int(48046)
        }
      }
      ["product"]=>
      array(2) {
        [0]=>
        array(4) {
          ["sku"]=>
          string(6) "BL394D"
          ["quantity"]=>
          int(4)
          ["description"]=>
          string(10) "Basketball"
          ["price"]=>
          int(450)
        }
        [1]=>
        array(4) {
          ["sku"]=>
          string(7) "BL4438H"
          ["quantity"]=>
          int(1)
          ["description"]=>
          string(10) "Super Hoop"
          ["price"]=>
          int(2392)
        }
      }
      ["tax"]=>
      float(251.42)
      ["total"]=>
      float(4443.52)
      ["comments"]=>
      string(68) "Late afternoon is best. Backup contact is Nancy Billsmer @ 338-4338."
    }

    1. In the list of products, click "Export"

    After the products have been loaded into the parser, click the "Export" button on the page with the list of products.

    2. Set the format settings

    In the window that appears, select the "Yandex.Market (YML)" format and set the format settings: how to export characteristics and how to split properties.

    A detailed description of each setting can be found below on this page.

    3. Export started

    An export progress indicator will appear. If you do not want to wait, you can turn off your computer or close your browser: the export will continue without you.

    What is YML?

    YML (Yandex Market Language) is a standard developed by Yandex for accepting and placing information in the Yandex.Market database. YML is based on the XML standard.

    Format settings:

    Generate offer id from - lets you choose how the id attribute of the offer tag, which identifies the product offer, is generated.

    Split multi-value properties - lets you choose how the selected properties (sizes, colors, etc.) are separated: either with a repeated param tag, or by splitting the offer into individual products linked by group_id, according to the YML specification.
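    The two ways of separating a multi-value property can be sketched as follows. This Java example is only an illustration: the class `MultiPropertyDemo`, both method names, and the per-variant id scheme (`7v1`, `7v2`) are invented here, the values are not XML-escaped, and the service's real output may differ.

```java
import java.util.Arrays;
import java.util.List;

public class MultiPropertyDemo {
    /** Variant 1: a single offer in which the property is repeated as several param tags. */
    public static String repeatedParam(String id, String name, String prop, List<String> values) {
        StringBuilder sb = new StringBuilder();
        sb.append("<offer id=\"").append(id).append("\"><name>").append(name).append("</name>");
        for (String value : values) {
            sb.append("<param name=\"").append(prop).append("\">").append(value).append("</param>");
        }
        return sb.append("</offer>").toString();
    }

    /** Variant 2: one offer per value, all tied together by a shared group_id. */
    public static String splitByGroup(String id, String name, String prop, List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            sb.append("<offer id=\"").append(id).append('v').append(i + 1) // hypothetical id scheme
              .append("\" group_id=\"").append(id).append("\">")
              .append("<name>").append(name).append("</name>")
              .append("<param name=\"").append(prop).append("\">")
              .append(values.get(i)).append("</param>")
              .append("</offer>");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> sizes = Arrays.asList("S", "M");
        System.out.println(repeatedParam("7", "Shirt", "Size", sizes));
        System.out.println(splitByGroup("7", "Shirt", "Size", sizes));
    }
}
```

    The first variant keeps one product with several values; the second produces a separate product per value, which suits stores that track each size or color as its own item.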

    Items out of stock - sets how "Out of stock" products are shown on the market: as available to order, or as completely out of stock.

    General settings:

    Export products - lets you choose which products to export based on the "Availability" attribute on the supplier's website.

    Export order - lets you choose the order in which products are exported, including reverse order if desired.

    Allow HTML markup in product fields - enables or disables HTML markup in product fields. Very rarely used by online stores.

    Image export - lets you change the number of images exported or the method of exporting them.

    Characteristics export - lets you export product properties (colors, sizes, etc.) as separate fields in the file or simply add them to the general product description. When they are added to the description, the columns themselves remain. Choose depending on the capabilities of your online store or group-buying website.

    Split into multiple files - lets you split the export into several files: by category or by brand.

    Found an error in the export to this format?

    If you find an error in the Yandex.Market (YML) export format, please let us know in the chat on the site. We will try to fix the export as soon as possible.
